Thursday, February 5, 2026

Counting the integral points in a 0-1 knapsack polytope

main

The Problem

\(I = [0,n)\), \(a \in \mathbb{Z}^{I}\), \(a(i) \geq 0 \ \forall i \in I\)

\(W \in \mathbb{Z}\), \(W \geq 0\)

\(F_{I,a,W}=\{J \mid J \subseteq I, \sum_{i \in J} a(i) \leq W\}\)

What is \(|F_{I,a,W}|\)?

\(F_{I,a,W}\) can be decomposed into a finite union of disjoint sets.

\(F_{I,a,w}^{'}=\{J \mid J \subseteq I, \sum_{i \in J} a(i) = w\}\)

\(F_{I,a,W} = \cup_{w\leq W} F_{I,a,w}^{'}\)

Because \(a(i) \geq 0\), \(F_{I,a,w}^{'}=\emptyset\) for all \(w < 0\), so:

\(F_{I,a,W} = \cup_{0 \leq w\leq W} F_{I,a,w}^{'}\)

If \(J_0 \subseteq J_1 \subseteq \dots \subseteq I\) is a chain, then \(F_{J_0,a,w}^{'} \subseteq F_{J_1,a,w}^{'} \subseteq \dots \subseteq F_{I,a,w}^{'}\) is a chain. A chain of subsets that reaches \(I\) can be used to count \(|F_{I,a,w}^{'}|\).

Let \(J_i=[0,i)\). Then, \(J_0=[0,0)=\emptyset\) and \(I = J_{n}=[0,n)\)

If \(w=0\), then \(|F_{\emptyset,a,w}^{'}|=1\)

If \(w \neq 0\), then \(|F_{\emptyset,a,w}^{'}|=0\)

\(|F_{J_{i},a,w}^{'}|=|F_{J_{i-1},a,w}^{'}|+|F_{J_{i-1},a,w-a(i-1)}^{'}|\)

\(|F_{I,a,W}|=\sum_{w=0}^{W} |F_{I,a,w}^{'}|\)

If you fix \(J\), then \(|F_{J,a,w}^{'}|\) is a total function of \(w\). \(|F_{J_i,a,w}^{'}|\) can be expressed as a convolution.

\(|F_{J_i,a,w}^{'}| = |F_{\emptyset,a,w}^{'}|*|F_{\{0\},a,w}^{'}|*\dots * |F_{\{i-1\},a,w}^{'}|\)

\(|F_{I,a,w}^{'}| = |F_{\emptyset,a,w}^{'}|*|F_{\{0\},a,w}^{'}|*\dots * |F_{\{n-1\},a,w}^{'}|\)

You can arrive at this convolution with generating functions too. There’s a nice signal processing intuition for this: the functions being convolved are essentially linear two-tap unattenuated forward echo filters. And applying the recurrence successively just performs a sparse convolution.

Finally,

\(|F_{I,a,W}|=\sum_{w=0}^{W} |F_{I,a,w}^{'}|\)

Supposedly, there are more counting schemes beyond this. I’m not surprised since this problem is coupled with many other counting problems in algebra and geometry: for example, the constraint on the set system is that of the 0-1 knapsack problem, and the feasible points for that integer linear program are exactly the characteristic functions/vectors of feasible sets in the set system \(F_{I,a,W}\).

I thought of another problem for later while solving this one. If \(F_{I,a,W}\) is an independence system over a finite \(I\), then it’s equal to the union of the ideals of its maximal elements. How do you count or enumerate those maximal elements?

Here’s code for the counting scheme, where the convolutions are computed using the direct method. I will make a post about the number theoretic transform later.

#include <cassert>
#include <iostream>
#include <vector>

int main() {
    int n;
    int W;
    std::cin >> n;
    std::cin >> W;
    assert(W >= 0);
    std::vector<int> a(n); // the weight sequence a
    for (int i = 0; i < n; ++i) {
        std::cin >> a[i];
        assert(a[i] >= 0);
    }
    // count_F[w] holds |F'_{J_i,a,w}|, updated in place as i grows
    std::vector<long long> count_F(W + 1);
    count_F[0] = 1; // |F'_{empty,a,0}| = 1
    for (int i = 0; i < n; ++i) {
        // sparse convolution with the two-tap filter delta_0 + delta_{a(i)};
        // iterating w downward uses each index at most once
        for (int w = W; w >= a[i]; --w) {
            count_F[w] = count_F[w] + count_F[w - a[i]];
        }
    }
    long long count = 0;
    for (int w = 0; w <= W; ++w) {
        std::cout << count_F[w] << '\n';
        count = count + count_F[w];
    }
    std::cout << count << '\n';
}

Tuesday, February 3, 2026

easy problem tackled with set system decomposition

maximum_sum_subarray

Here’s an easy discrete optimization problem that many computing students see: "maximum sum subarray".

\(I\) is some finite chain, e.g. \(I=[0,n)\).

Let \(F = \{[a,b] \mid a,b\in I\}\) be the set of intervals in \(I\).

There’s a finite sequence of integers \(s\in\mathbb{Z}^{I}\).

For interval \(J \in F\), let \(f(J)=\sum_{i\in J}s(i)\).

Find \[\max_{J\in F} f(J)\]

solution via set system decomposition

Let \(F_b = \{[a,b]\mid a\in I, a\leq b\}\). \(F_b\) is the set of intervals in \(I\) whose least upper bound is \(b\).

\[F = \{[a,b] \mid a,b\in I\} = \{\emptyset\}\cup\{[a,b] \mid a,b\in I,a\leq b\}=\{\emptyset\}\cup\bigcup_{b\in I}F_b\]

If \(b\) is not the maximum element of I, then let \(b + 1\) be the unique element that covers \(b\), i.e. \(b \leq c \leq b + 1\) implies that \(c=b\) or \(c=b+1\)

For \(b,b+1\in I\), \[F_{b+1}=\{J\cup \{b+1\}\mid J\in F_{b}\}\cup\{\{b+1\}\}\]

\(f\) is modular, so

\[\{f(J) \mid J\in F_{b+1}\}=\{f(J)+ f(\{b+1\}) \mid J\in F_{b}\}\cup\{f(\{b+1\})\}\]

\(f(\{b+1\})=s(b+1)\), so

\[\{f(J) \mid J\in F_{b+1}\}=\{f(J)+ s(b+1) \mid J\in F_{b}\}\cup\{s(b+1)\}\]

Recalling the decomposition of \(F\) and the optimization program over it,

\[\max_{J\in F} f(J)=\text{max}(f(\emptyset),\max_{b\in I}\max_{J\in F_b}f(J))\]

Recall the relation between \(F_{b+1}\) and \(F_{b}\)

\[\max_{J\in F_{b+1}}f(J)=\text{max}(\max_{J\in F_b}(f(J)+s(b+1)),s(b+1))\] \[\max_{J\in F_{b+1}}f(J)=\text{max}((\max_{J\in F_b}f(J))+s(b+1),s(b+1))\]

\(\text{max}(x+z,y+z)=\text{max}(x,y)+z\), so

\[\max_{J\in F_{b+1}}f(J)=\text{max}((\max_{J\in F_b}f(J)),0)+s(b+1)\]

This leads to a very simple dynamic programming solution over the \(b\) axis.

I’ve seen other "explanations" of this problem that use examples and heuristic thinking, but I don’t like those all that much. I’m happy with what I put together here because it follows a very neat pattern: (1) decompose the set system (feasible set); (2) put the pieces in relation with one another; (3) exploit the relation. This is more valuable than example churning and "guesswork-ish" dynamic programming relation ansatzes.

Here are interesting extensions to this problem that I thought about. What if \(s\) is an infinite sequence with some regularity condition tacked onto it, e.g. some recurrence holds over it, or it’s periodic? What if \(I\) is not a chain?

Sunday, February 1, 2026

Distinct Values Subarrays

\(I\) is an index set: \(I=[0,n) = \{0, 1, \dots, n - 1 \}\)

\([i,j] = \{i,i+1,\dots,j\}\), \(|[i,j]| = j - i + 1\)

\((I, F)\) is a set family: \(F = \{[i,j] \mid i,j \in I, i \leq j \} = \cup_{j \in I} \{[i,j] \mid i \in I, i \leq j\}\)

\(|F|=|\cup_{j \in I} \{[i,j] \mid i \in I, i \leq j\}|=\sum_{j \in I}|\{[i,j] \mid i \in I, i \leq j\}|=\sum_{j=0}^{n-1}(j+1)\)

\(|F|=\binom{n + 1}{2}\)

\(A\) is some finite set: \(A = \{\text{a}, \text{b}, \dots\}\). \(s\) is a sequence over \(A\): \(s \in A^{I}\)

\(F_s\) is the family of index sets corresponding to all subarrays of \(s\) that have distinct elements: \(F_s=\{[i,j] \mid i,j \in I, i \leq j, \forall k,l \in [i,j] \, (k \neq l \rightarrow s(k) \neq s(l))\}\).

\(F_s\) is a finite, hereditary (downward-closed) set family, so it equals the union of the ideals of its maximal elements. It's not an independence system though, because the empty set is not in \(F_s\). The maximal elements of \(F_s\) are index sets corresponding to distinct subarrays that are not included in any other such index set: these subarrays can't be extended without repeating an element. For example, consider this sequence: \(s\equiv\langle \text{abcbd}\rangle\). The maximal distinct subarrays are \(\langle \text{abc}\rangle\) and \(\langle \text{cbd}\rangle\). The corresponding index sets are \(\{0,1,2\}\) and \(\{2,3,4\}\). \(F_s = I(\{0,1,2\}) \cup I(\{2,3,4\})\).

\(I(X)=\{Y \mid Y \in F_s, Y \subseteq X\}\) is the ideal of \(X\). A "sliding window" algorithm for computing \(|F_s|\) follows from this observation: expand a subarray to find the maximal distinct subarrays and use the inclusion-exclusion principle to count their union: \(|F_s|=|I(J_1)| + |I(J_2)| - |I(J_1) \cap I(J_2)|\). This is easy to implement because only adjacent maximal distinct subarrays overlap.


Distinct Values Subarrays problem on CSES: https://cses.fi/problemset/task/3420/

My solution program: https://cses.fi/paste/10b865ef59bbe20ef6257a/


Here's another solution program that is only different in the way it computes the combinations: https://cses.fi/paste/10b865ef59bbe20ef6257a/


laminar set families, modular set functions, segment trees

polynomial_queries

For sets \(A,B\), let \(A\dotdiv B=(A-B)\cup(B-A)\) be their symmetric difference.

The set of finite covers of set \(A\) is \[\bigcup_{k\in\mathbb{N}}\{\{A_1,\dots,A_k\}\mid A_i\subseteq A, A_1\cup\dots\cup A_k=A\}\]

The set of finite partitions of set \(A\) is \[\bigcup_{k\in\mathbb{N}}\{\{A_1,\dots,A_k\}\mid A_i\subseteq A,\ A_1\cup\dots\cup A_k=A,\ A_i\dotdiv A_j=A_i\cup A_j \text{ for } i\neq j\}\]

For \(A'\subseteq A\), let \(\chi_{A'} \in \{0,1\}^A\) be the characteristic function of set \(A'\).

\[a\in A' \leftrightarrow \chi_{A'}(a)=1\]

Let \(I\) be some finite set.

Consider the set system \(2^{I}\) and an integer set function \(f\in\mathbb{Z}^{2^{I}}\). \(f\) is modular when the following holds:

\[f(J_1\cup J_2)+f(J_1\cap J_2)=f(J_1)+f(J_2)\]

Let \(s\in\mathbb{Z}^I\). \(f(J)=\sum_{i\in J}s(i)\) is modular.

segment trees

Let \(I=[0,2^k)\): \(I\) is an interval over the natural numbers.

Let \(S(l,j)=[2^{l}j,2^l(j+1))\). \(S(l,j)\) is a dyadic interval.

Now, consider the following family of intervals over \(I=[0,2^k)\):

\[F = \bigcup_{l \in [0,k]}\{S(l,j)\mid j\in [0,2^{k-l})\}\]

\((F,\subseteq)\) is a laminar set family. Since there’s a unique maximum element, \(F\) is a tree. Sets \(S(l-1,2j)\) and \(S(l-1,2j+1)\) partition \(S(l,j)\).

It follows from modularity of \(f\) and the tree structure of \(F\subseteq 2^{I}\) that:

\[f(S(l,j))=f(S(l-1,2j))+f(S(l-1,2j+1))\]

Consider this algorithm.

Split(\(l, j, J\)):
    if \(S(l, j) \cap J=\emptyset\) then
        return \(\emptyset\)
    else if \(S(l, j) \subseteq J\) then
        return \(\{S(l, j)\}\)
    else
        return \(Split(l-1, 2j, J) \cup Split(l-1, 2j+1, J)\)

\(Split(k,0,J)\subseteq F\) partitions \(J\)

This can be used to solve problems involving range updates and queries over sequences. Here’s a C++ excerpt from a solution to such a problem. Ignore the "lazy" segment trees.

if (disjoint(c,d,a,b)) {
    return 0;
}
else {
    if (subseteq(c,d,a,b)) {
        return f_S[l][j];
    }
    else {
        long long left = get(lazy1_f_S, lazy2_f_S, f_S, k, l-1,2*j,a,b);
        long long right = get(lazy1_f_S, lazy2_f_S, f_S, k, l-1,2*j+1,a,b);
        return left + right;
    }
}

full solution

Saturday, December 13, 2025

A Rigorous Conversion of Natural Language to SQL

Some time back, I was thinking about the problem of converting natural language to a relational query language, e.g. SQL. Here's PostgreSQL's parser.

The typical language models nowadays are not robust estimators, and they're even worse at generating formal languages. They're not fit for performing their own uncertainty quantification either. This has in part to do with the "largeness" of their parametric forms, the parametric form itself, and the optimization programs used to train them. But I won't explain that in full just now.

How do you convert a natural language query to a relational query from noisy outputs of a language model?

First, remember that statistical estimation can be posed as a statistical inverse problem. You can imagine the "natural language to relational query language" problem to be some unknown partial function (since not all natural language sentences are queries), and estimating that function can be cast as an inverse problem. What's the corresponding forward problem? It's simply the "relational query language to natural language" problem.

Considering the limitations of present language models, I think this forward problem is special because it's much more well-posed than the inverse if you do it in a bottom-up manner. SQL is inductively defined by a formal grammar. Imagine that you have two relations and a natural language analogue to both of them. Now, what's the natural language analogue of the equijoin of those relations? That's way easier for today's language models to solve.

But, what's the use of solving the forward problem if we want to actually solve the inverse problem? Well, given some noisy results of the inverse (language model NL-to-SQL conversions), you can retrieve the relatively less noisy forward results of those outputs, and then measure pairwise consistency/similarity.

I can imagine a number of parametric and non-parametric statistics that you could form out of those measurements. But, whatever. This basically reduces the difficulty of NL-to-SQL to semantic similarity over natural language, which is very easy nowadays for natural language models.

You could make this process more rigorous with intermediate DSLs that have to do with your relational schema. That's not a coincidence: this problem is essentially that of a compiler since this is a matter of mapping one language to another. Just as LLVM converts parsed C programs to machine code, the inverse problem converts natural language to SQL. An intermediate language does not really change things. In fact, LLVM does have an intermediate language/representation: LLVM IR.

Monday, October 13, 2025

Baltic Olympiad in Informatics 2024 (BOI 2024) problem: jobs

Problem Statement 

Maximize the sum of profits for a subset of jobs \(S\):

\[\max_{S\subseteq\{1,\dots,n\}}\sum_{i \in S} x_i\]

such that if job \(i\) is picked, then job \(p(i)\) is also picked:

\[i \in S \rightarrow p(i) \in S\]

and there exists an order \(\sigma\) for the jobs in \(S\) in which no debt is incurred for all prefixes of the order:

\[\exists \sigma:\quad s + \sum_{j=1}^{k} x_{\sigma(j)} \geq 0, \quad k = 1, \dots, |S|\]

Without the second constraint, the answer is simply \(y(0)\), where \(y(i) = x(i) + \sum_{j \,:\, p(j) = i}\max(0, y(j))\), i.e. job \(i\)'s profit plus the nonnegative parts of its children's values. That can easily be computed bottom-up since \(p(i) < i\). This case is equivalent to the case of having enough initial euros to foot any possible intermediate debt: \(s = 10^{18}\). Note that \(x(0) = s\) and \(p(0) = 0\): job 0 is a dummy job whose profit is the initial balance.

The final job picks will form a single connected component rooted at job 0 because of the first constraint: if job i is picked, then job p(i) is picked. So, picking a job is akin to merging that job with its parent. This is easy if both x(i), x(p(i)) > 0, but not if x(i) > 0 and x(p(i)) < 0. So, you have to remember the minimum "cost to take" the new merged job that has profit x(i) + x(p(i)). You'd also need to remember the connected components across merges. Both can be done with auxiliary memory, e.g. cost_to_take(i) and a union-find data structure (U(i), find, merge).
 
How do you go about merging jobs to construct the solution then? Do you construct it in a top-down manner by growing the solution at its edges? I found that direction to be a "red herring", and all my attempts at such a job-merging strategy broke down for this example:
6 0
4 0
-3 
-1 2
-1 2
3 3
3 4
 
If x(i) > 0 and job p(i) is picked, then job i will certainly be picked. Further, if x(i) > 0 and cost_to_take(i) ≤ x(p(i)), then job i will certainly be picked if job p(i) is picked. There are more of these, but notice that these hold all across the job tree. It turns out that performing such job merges all across the tree greedily gives the answer.
 
The greedy job-merging solution is to merge the job that is least expensive to take and has positive profit with its parent job, i.e. out of all the jobs with x(i) > 0, take the one with the smallest cost_to_take(i) and merge it to its parent. I won't go into the actual casework for job-merging as it can be seen in the solution. Here's an additional description of the greedy algorithm. Also, there's a small catch for merging with the zeroth job.

Here's the code for my solution.

Additional observation and questions:
There's a greedy solution to this problem, but I did not reach it from this starting point (the optimization program). Is there some structure in this optimization program that I am missing? I suspect that there is and that exploiting it makes the greedy solution very apparent. But I have not spotted it. I am wondering about this because there are lots of other optimization problems with greedy solutions, e.g. Huffman coding (minimize expected codeword length), basis pursuit (maximize sparsity of signal decomposition), ...
 
Possible bottom-up dynamic programming solution:
Let z(i, k) = maximum profit attainable if you take job i and have k euros initially. This is enough for a bottom-up dynamic programming solution (convince yourself of this). I could not put together an implementation for it though; someone else supposedly did. One interesting thing to note is that if the job subtree rooted at job i is replaced with another job tree that has the exact same z(i,k) profile, then the problem is unchanged. So, z(i,k) carries all information needed about a job tree for this problem.