Volume 4

2025

1. Finite-valued Streaming String Transducers

Emmanuel Filiot ; Ismaël Jecker ; Christof Löding ; Anca Muscholl ; Gabriele Puppis ; Sarah Winter.

A transducer is finite-valued if for some bound k, it maps any given input to at most k outputs. For classical, one-way transducers, it is known since the 80s that finite valuedness entails decidability of the equivalence problem. This decidability result is in contrast to the general case, which makes finite-valued transducers very attractive. For classical transducers, it is also known that finite valuedness is decidable and that any k-valued finite transducer can be decomposed as a union of k single-valued finite transducers. In this paper, we extend the above results to copyless streaming string transducers (SSTs), answering questions raised by Alur and Deshmukh in 2011. SSTs strictly extend the expressiveness of one-way transducers via additional variables that store partial outputs. We prove that any k-valued SST can be effectively decomposed as a union of k (single-valued) deterministic SSTs. As a corollary, we obtain equivalence of SSTs and two-way transducers in the finite-valued case (those two models are incomparable in general). Another corollary is an elementary upper bound for checking equivalence of finite-valued SSTs. The latter problem was already known to be decidable, but the proof complexity was unknown (it relied on Ehrenfeucht's conjecture). Finally, our main result is that finite valuedness of SSTs is decidable. The complexity is PSpace, and even PTime when the number of variables is fixed.

2. An Alternate Proof of Near-Optimal Light Spanners

Greg Bodwin.

In 2016, a breakthrough result of Chechik and Wulff-Nilsen [SODA '16] established that every $n$-node graph $G$ has a $(1+\varepsilon)(2k-1)$-spanner of lightness $O_{\varepsilon}(n^{1/k})$, and recent followup work by Le and Solomon [STOC '23] generalized the proof strategy and improved the dependence on $\varepsilon$. We give a new proof of this result, with the improved $\varepsilon$-dependence. Our proof is a direct analysis of the often-studied greedy spanner, and can be viewed as an extension of the folklore Moore bounds used to analyze spanner sparsity.

3. Approximate Counting for Spin Systems in Sub-Quadratic Time

Konrad Anand ; Weiming Feng ; Graham Freifeld ; Heng Guo ; Jiaheng Wang.

We present two randomised approximate counting algorithms with $\widetilde{O}(n^{2-c}/\varepsilon^2)$ running time for some constant $c>0$ and accuracy $\varepsilon$: (1) for the hard-core model with fugacity $\lambda$ on graphs with maximum degree $\Delta$ when $\lambda=O(\Delta^{-1.5-c_1})$ where $c_1=c/(2-2c)$; (2) for spin systems with strong spatial mixing (SSM) on planar graphs with quadratic growth, such as $\mathbb{Z}^2$. For the hard-core model, Weitz's algorithm (STOC, 2006) achieves sub-quadratic running time when correlation decays faster than the neighbourhood growth, namely when $\lambda = o(\Delta^{-2})$. Our first algorithm does not require this property and extends the range where sub-quadratic algorithms exist. Our second algorithm appears to be the first to achieve sub-quadratic running time up to the SSM threshold, albeit on a restricted family of graphs. It also extends to (not necessarily planar) graphs with polynomial growth, such as $\mathbb{Z}^d$, but with a running time of the form $\widetilde{O}\left(n^2\varepsilon^{-2}/2^{c(\log n)^{1/d}}\right)$ where $d$ is the exponent of the polynomial growth and $c>0$ is some constant.

4. On Approximability of Steiner Tree in $\ell_p$-metrics

Henry Fleischmann ; Surya Teja Gavva ; Karthik C. S.

In the Continuous Steiner Tree problem (CST), we are given as input a set of points (called terminals) in a metric space and ask for the minimum-cost tree connecting them. Additional points (called Steiner points) from the metric space can be introduced as nodes in the solution. In the Discrete Steiner Tree problem (DST), we are given in addition to the terminals, a set of facilities, and any solution tree connecting the terminals can only contain the Steiner points from this set of facilities. Trevisan [SICOMP'00] showed that CST and DST are APX-hard when the input lies in the $\ell_1$-metric (and Hamming metric). Chleb\'ik and Chleb\'iková [TCS'08] showed that DST is NP-hard to approximate to factor of $96/95\approx 1.01$ in the graph metric (and consequently $\ell_\infty$-metric). Prior to this work, it was unclear if CST and DST are APX-hard in essentially every other popular metric. In this work, we prove that DST is APX-hard in every $\ell_p$-metric. We also prove that CST is APX-hard in the $\ell_{\infty}$-metric. Finally, we relate CST and DST, showing a general reduction from CST to DST in $\ell_p$-metrics.

5. Computing Threshold Budgets in Discrete-Bidding Games

Guy Avni ; Suman Sadhukhan.

In a two-player zero-sum graph game, the players move a token throughout a graph to produce an infinite play, which determines the winner of the game. Bidding games are graph games in which in each turn, an auction (bidding) determines which player moves the token: the players have budgets, and in each turn, both players simultaneously submit bids that do not exceed their available budgets, the higher bidder moves the token, and pays the bid to the lower bidder (called Richman bidding). We focus on discrete-bidding games, in which, motivated by practical applications, the granularity of the players' bids is restricted, e.g., bids must be given in cents. A central quantity in bidding games is threshold budgets: a necessary and sufficient initial budget for winning the game. Previously, thresholds were shown to exist in parity games, but their structure was only understood for reachability games. Moreover, the previously-known algorithms have a worst-case exponential running time for both reachability and parity objectives, and output strategies that use exponential memory. We describe two algorithms for finding threshold budgets in parity discrete-bidding games. The first is a fixed-point algorithm. It reveals, for the first time, the structure of threshold budgets in parity discrete-bidding games. Based on this structure, we develop a second algorithm that shows that the problem of finding threshold budgets is in NP and coNP for both reachability and parity objectives. […]

6. A $(3+\varepsilon)$-Approximate Correlation Clustering Algorithm in Dynamic Streams

Mélanie Cambus ; Fabian Kuhn ; Etna Lindy ; Shreyas Pai ; Jara Uitto.

Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming algorithms for correlation clustering, where each pair of elements is labeled either similar or dissimilar. The task is to partition the elements and the objective is to minimize disagreements, that is, the number of dissimilar elements grouped together and similar elements that get separated. Our main contribution is a semi-streaming algorithm that achieves a $(3 + \varepsilon)$-approximation to the minimum number of disagreements using a single pass over the stream. In addition, the algorithm also works for dynamic streams. Our approach builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08] that obtains a $3$-approximation in the centralized setting. Our design allows us to sparsify the input graph by ignoring a large portion of the nodes and edges without a large extra cost as compared to the analysis of PIVOT. This sparsification makes our technique applicable in models such as semi-streaming, where sparse graphs can typically be handled much more efficiently. Our work improves on the approximation ratio of the recent single-pass $5$-approximation algorithm and on the number of passes of the recent $O(1/\varepsilon)$-pass $(3 + \varepsilon)$-approximation algorithm [Behnezhad, Charikar, Ma, Tan FOCS'22, SODA'23]. Our algorithm is also more robust and can be applied in dynamic streams. […]

7. Multidimensional Quantum Walks, with Application to $k$-Distinctness

Stacey Jeffery ; Sebastian Zur.

While the quantum query complexity of $k$-distinctness is known to be $O\left(n^{3/4-1/4(2^k-1)}\right)$ for any constant $k \geq 4$, the best previous upper bound on the time complexity was $\widetilde{O}\left(n^{1-1/k}\right)$. We give a new upper bound of $\widetilde{O}\left(n^{3/4-1/4(2^k-1)}\right)$ on the time complexity, matching the query complexity up to polylogarithmic factors. In order to achieve this upper bound, we give a new technique for designing quantum walk search algorithms, which is an extension of the electric network framework. We also show how to solve the welded trees problem in $O(n)$ queries and $O(n^2)$ time using this new technique, showing that the new quantum walk framework can achieve exponential speedups.

8. Regular Languages in the Sliding Window Model

Moses Ganardi ; Danny Hucke ; Markus Lohrey ; Konstantinos Mamouras ; Tatiana Starikovskaya.

We study the space complexity of the following problem: For a fixed regular language $L$, we receive a stream of symbols and want to test membership of a sliding window of size $n$ in $L$. For deterministic streaming algorithms we prove a trichotomy theorem, namely that the (optimal) space complexity is either constant, logarithmic or linear, measured in the window size $n$. Additionally, we provide natural language-theoretic characterizations of the space classes. We then extend the results to randomized streaming algorithms and we show that in this setting, the space complexity of any regular language is either constant, doubly logarithmic, logarithmic or linear. Finally, we introduce sliding window testers, which can distinguish whether a sliding window of size $n$ belongs to the language $L$ or has Hamming distance $> \epsilon n$ to $L$. We prove that every regular language has a deterministic (resp., randomized) sliding window tester that requires only logarithmic (resp., constant) space.

9. Exponential Resolution Lower Bounds for Weak Pigeonhole Principle and Perfect Matching Formulas over Sparse Graphs

Susanna F. de Rezende ; Jakob Nordström ; Kilian Risse ; Dmitry Sokolov.

We show exponential lower bounds on resolution proof length for pigeonhole principle (PHP) formulas and perfect matching formulas over highly unbalanced, sparse expander graphs, thus answering the challenge to establish strong lower bounds in the regime between balanced constant-degree expanders as in [Ben-Sasson and Wigderson '01] and highly unbalanced, dense graphs as in [Raz '04] and [Razborov '03, '04]. We obtain our results by revisiting Razborov's pseudo-width method for PHP formulas over dense graphs and extending it to sparse graphs. This further demonstrates the power of the pseudo-width method, and we believe it could potentially be useful for attacking also other longstanding open problems for resolution and other proof systems.

10. Learning Algorithms for Verification of Markov Decision Processes

Tomáš Brázdil ; Krishnendu Chatterjee ; Martin Chmelik ; Vojtěch Forejt ; Jan Křetínský ; Marta Kwiatkowska ; Tobias Meggendorfer ; David Parker ; Mateusz Ujma.

We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.

11. Ortho-Radial Drawing in Near-Linear Time

Yi-Jun Chang.

An orthogonal drawing is an embedding of a plane graph into a grid. In a seminal work of Tamassia (SIAM Journal on Computing 1987), a simple combinatorial characterization of angle assignments that can be realized as bend-free orthogonal drawings was established, thereby allowing an orthogonal drawing to be described combinatorially by listing the angles of all corners. The characterization reduces the need to consider certain geometric aspects, such as edge lengths and vertex coordinates, and simplifies the task of graph drawing algorithm design. Barth, Niedermann, Rutter, and Wolf (SoCG 2017) established an analogous combinatorial characterization for ortho-radial drawings, which are a generalization of orthogonal drawings to cylindrical grids. The proof of the characterization is existential and does not result in an efficient algorithm. Niedermann, Rutter, and Wolf (SoCG 2019) later addressed this issue by developing quadratic-time algorithms for both testing the realizability of a given angle assignment as an ortho-radial drawing without bends and constructing such a drawing. In this paper, we further improve the time complexity of these tasks to near-linear time. We establish a new characterization for ortho-radial drawings based on the concept of a good sequence. Using the new characterization, we design a simple greedy algorithm for constructing ortho-radial drawings.

12. PANDA: Query Evaluation in Submodular Width

Mahmoud Abo Khamis ; Hung Q. Ngo ; Dan Suciu.

In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in times that are bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes a Shannon-inequality that underlies the bound, and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].

13. A $4/3$ Approximation for $2$-Vertex-Connectivity

Miguel Bosch-Calvo ; Fabrizio Grandoni ; Afrouz Jabal Ameli.

The 2-Vertex-Connected Spanning Subgraph problem (2VCSS) is among the most basic NP-hard (Survivable) Network Design problems: we are given an (unweighted) undirected graph $G$. Our goal is to find a spanning subgraph $S$ of $G$ with the minimum number of edges which is $2$-vertex-connected, namely $S$ remains connected after the deletion of an arbitrary node. 2VCSS is well-studied in terms of approximation algorithms, and the current best (polynomial-time) approximation factor is $10/7$ by Heeger and Vygen [SIDMA'17] (improving on earlier results by Khuller and Vishkin [STOC'92] and Garg, Vempala and Singla [SODA'93]). Here we present an improved $4/3$ approximation. Our main technical ingredient is an approximation preserving reduction to a conveniently structured subset of instances which are ``almost'' 3-vertex-connected. The latter reduction might be helpful in future work.

14. Minimizing Tardy Processing Time on a Single Machine in Near-Linear Time

Nick Fischer ; Leo Wennmann.

In this work we revisit the elementary scheduling problem $1||\sum p_j U_j$. The goal is to select, among $n$ jobs with processing times and due dates, a subset of jobs with maximum total processing time that can be scheduled in sequence without violating their due dates. This problem is NP-hard, but a classical algorithm by Lawler and Moore from the 60s solves this problem in pseudo-polynomial time $O(nP)$, where $P$ is the total processing time of all jobs. With the aim to develop best-possible pseudo-polynomial-time algorithms, a recent wave of results has improved Lawler and Moore's algorithm for $1||\sum p_j U_j$: First to time $\tilde O(P^{7/4})$ [Bringmann, Fischer, Hermelin, Shabtay, Wellnitz; ICALP'20], then to time $\tilde O(P^{5/3})$ [Klein, Polak, Rohwedder; SODA'23], and finally to time $\tilde O(P^{7/5})$ [Schieber, Sitaraman; WADS'23]. It remained an exciting open question whether these works can be improved further. In this work we develop an algorithm in near-linear time $\tilde O(P)$ for the $1||\sum p_j U_j$ problem. This running time not only significantly improves upon the previous results, but also matches conditional lower bounds based on the Strong Exponential Time Hypothesis or the Set Cover Hypothesis and is therefore likely optimal (up to subpolynomial factors). Our new algorithm also extends to the case of $m$ machines in time $\tilde O(P^m)$. In contrast to the previous improvements, we take a different, more direct approach […]

15. Parameterized algorithms for block-structured integer programs with large entries

Jana Cslovjecsek ; Martin Koutecký ; Alexandra Lassota ; Michał Pilipczuk ; Adam Polak.

We study two classic variants of block-structured integer programming. Two-stage stochastic programs are integer programs of the form $\{A_i \mathbf{x} + D_i \mathbf{y}_i = \mathbf{b}_i\textrm{ for all }i=1,\ldots,n\}$, where $A_i$ and $D_i$ are bounded-size matrices. On the other hand, $n$-fold programs are integer programs of the form $\{{\sum_{i=1}^n C_i\mathbf{y}_i=\mathbf{a}} \textrm{ and } D_i\mathbf{y}_i=\mathbf{b}_i\textrm{ for all }i=1,\ldots,n\}$, where again $C_i$ and $D_i$ are bounded-size matrices. It is known that solving these kind of programs is fixed-parameter tractable when parameterized by the maximum dimension among the relevant matrices $A_i,C_i,D_i$ and the maximum absolute value of any entry appearing in the constraint matrix. We show that the parameterized tractability results for two-stage stochastic and $n$-fold programs persist even when one allows large entries in the global part of the program. More precisely, we prove that: - The feasibility problem for two-stage stochastic programs is fixed-parameter tractable when parameterized by the dimensions of matrices $A_i,D_i$ and by the maximum absolute value of the entries of matrices $D_i$. That is, we allow matrices $A_i$ to have arbitrarily large entries. - The linear optimization problem for $n$-fold integer programs that are uniform -- all matrices $C_i$ are equal -- is fixed-parameter tractable when parameterized by the dimensions of matrices $C_i$ and $D_i$ and by the maximum absolute […]

16. A Simple $(1-ε)$-Approximation Semi-Streaming Algorithm for Maximum (Weighted) Matching

Sepehr Assadi.

We present a simple semi-streaming algorithm for $(1-ε)$-approximation of bipartite matching in $O(\log{\!(n)}/ε)$ passes. This matches the performance of state-of-the-art "$ε$-efficient" algorithms -- the ones with much better dependence on $ε$ albeit with some mild dependence on $n$ -- while being considerably simpler. The algorithm relies on a direct application of the multiplicative weight update method with a self-contained primal-dual analysis that can be of independent interest. To show case this, we use the same ideas, alongside standard tools from matching theory, to present an equally simple semi-streaming algorithm for $(1-ε)$-approximation of weighted matchings in general (not necessarily bipartite) graphs, again in $O(\log{\!(n)}/ε)$ passes.