2025
A transducer is finite-valued if for some bound k, it maps any given input to at most k outputs. For classical, one-way transducers, it is known since the 80s that finite valuedness entails decidability of the equivalence problem. This decidability result is in contrast to the general case, which makes finite-valued transducers very attractive. For classical transducers, it is also known that finite valuedness is decidable and that any k-valued finite transducer can be decomposed as a union of k single-valued finite transducers. In this paper, we extend the above results to copyless streaming string transducers (SSTs), answering questions raised by Alur and Deshmukh in 2011. SSTs strictly extend the expressiveness of one-way transducers via additional variables that store partial outputs. We prove that any k-valued SST can be effectively decomposed as a union of k (single-valued) deterministic SSTs. As a corollary, we obtain equivalence of SSTs and two-way transducers in the finite-valued case (those two models are incomparable in general). Another corollary is an elementary upper bound for checking equivalence of finite-valued SSTs. The latter problem was already known to be decidable, but the proof complexity was unknown (it relied on Ehrenfeucht's conjecture). Finally, our main result is that finite valuedness of SSTs is decidable. The complexity is PSpace, and even PTime when the number of variables is fixed.
In 2016, a breakthrough result of Chechik and Wulff-Nilsen [SODA '16] established that every $n$-node graph $G$ has a $(1+\varepsilon)(2k-1)$-spanner of lightness $O_{\varepsilon}(n^{1/k})$, and recent followup work by Le and Solomon [STOC '23] generalized the proof strategy and improved the dependence on $\varepsilon$. We give a new proof of this result, with the improved $\varepsilon$-dependence. Our proof is a direct analysis of the often-studied greedy spanner, and can be viewed as an extension of the folklore Moore bounds used to analyze spanner sparsity.
We present two randomised approximate counting algorithms with $\widetilde{O}(n^{2-c}/\varepsilon^2)$ running time for some constant $c>0$ and accuracy $\varepsilon$: (1) for the hard-core model with fugacity $\lambda$ on graphs with maximum degree $\Delta$ when $\lambda=O(\Delta^{-1.5-c_1})$ where $c_1=c/(2-2c)$; (2) for spin systems with strong spatial mixing (SSM) on planar graphs with quadratic growth, such as $\mathbb{Z}^2$. For the hard-core model, Weitz's algorithm (STOC, 2006) achieves sub-quadratic running time when correlation decays faster than the neighbourhood growth, namely when $\lambda = o(\Delta^{-2})$. Our first algorithm does not require this property and extends the range where sub-quadratic algorithms exist. Our second algorithm appears to be the first to achieve sub-quadratic running time up to the SSM threshold, albeit on a restricted family of graphs. It also extends to (not necessarily planar) graphs with polynomial growth, such as $\mathbb{Z}^d$, but with a running time of the form $\widetilde{O}\left(n^2\varepsilon^{-2}/2^{c(\log n)^{1/d}}\right)$ where $d$ is the exponent of the polynomial growth and $c>0$ is some constant.
In the Continuous Steiner Tree problem (CST), we are given as input a set of points (called terminals) in a metric space and ask for the minimum-cost tree connecting them. Additional points (called Steiner points) from the metric space can be introduced as nodes in the solution. In the Discrete Steiner Tree problem (DST), we are given in addition to the terminals, a set of facilities, and any solution tree connecting the terminals can only contain the Steiner points from this set of facilities. Trevisan [SICOMP'00] showed that CST and DST are APX-hard when the input lies in the $\ell_1$-metric (and Hamming metric). Chleb\'ik and Chleb\'iková [TCS'08] showed that DST is NP-hard to approximate to factor of $96/95\approx 1.01$ in the graph metric (and consequently $\ell_\infty$-metric). Prior to this work, it was unclear if CST and DST are APX-hard in essentially every other popular metric. In this work, we prove that DST is APX-hard in every $\ell_p$-metric. We also prove that CST is APX-hard in the $\ell_{\infty}$-metric. Finally, we relate CST and DST, showing a general reduction from CST to DST in $\ell_p$-metrics.
In a two-player zero-sum graph game, the players move a token throughout a graph to produce an infinite play, which determines the winner of the game. Bidding games are graph games in which in each turn, an auction (bidding) determines which player moves the token: the players have budgets, and in each turn, both players simultaneously submit bids that do not exceed their available budgets, the higher bidder moves the token, and pays the bid to the lower bidder (called Richman bidding). We focus on discrete-bidding games, in which, motivated by practical applications, the granularity of the players' bids is restricted, e.g., bids must be given in cents. A central quantity in bidding games is threshold budgets: a necessary and sufficient initial budget for winning the game. Previously, thresholds were shown to exist in parity games, but their structure was only understood for reachability games. Moreover, the previously-known algorithms have a worst-case exponential running time for both reachability and parity objectives, and output strategies that use exponential memory. We describe two algorithms for finding threshold budgets in parity discrete-bidding games. The first is a fixed-point algorithm. It reveals, for the first time, the structure of threshold budgets in parity discrete-bidding games. Based on this structure, we develop a second algorithm that shows that the problem of finding threshold budgets is in NP and coNP for both reachability and parity objectives. […]
Grouping together similar elements in datasets is a common task in data mining and machine learning. In this paper, we study streaming algorithms for correlation clustering, where each pair of elements is labeled either similar or dissimilar. The task is to partition the elements and the objective is to minimize disagreements, that is, the number of dissimilar elements grouped together and similar elements that get separated. Our main contribution is a semi-streaming algorithm that achieves a $(3 + \varepsilon)$-approximation to the minimum number of disagreements using a single pass over the stream. In addition, the algorithm also works for dynamic streams. Our approach builds on the analysis of the PIVOT algorithm by Ailon, Charikar, and Newman [JACM'08] that obtains a $3$-approximation in the centralized setting. Our design allows us to sparsify the input graph by ignoring a large portion of the nodes and edges without a large extra cost as compared to the analysis of PIVOT. This sparsification makes our technique applicable in models such as semi-streaming, where sparse graphs can typically be handled much more efficiently. Our work improves on the approximation ratio of the recent single-pass $5$-approximation algorithm and on the number of passes of the recent $O(1/\varepsilon)$-pass $(3 + \varepsilon)$-approximation algorithm [Behnezhad, Charikar, Ma, Tan FOCS'22, SODA'23]. Our algorithm is also more robust and can be applied in dynamic streams. […]
While the quantum query complexity of $k$-distinctness is known to be $O\left(n^{3/4-1/4(2^k-1)}\right)$ for any constant $k \geq 4$, the best previous upper bound on the time complexity was $\widetilde{O}\left(n^{1-1/k}\right)$. We give a new upper bound of $\widetilde{O}\left(n^{3/4-1/4(2^k-1)}\right)$ on the time complexity, matching the query complexity up to polylogarithmic factors. In order to achieve this upper bound, we give a new technique for designing quantum walk search algorithms, which is an extension of the electric network framework. We also show how to solve the welded trees problem in $O(n)$ queries and $O(n^2)$ time using this new technique, showing that the new quantum walk framework can achieve exponential speedups.
We study the space complexity of the following problem: For a fixed regular language $L$, we receive a stream of symbols and want to test membership of a sliding window of size $n$ in $L$. For deterministic streaming algorithms we prove a trichotomy theorem, namely that the (optimal) space complexity is either constant, logarithmic or linear, measured in the window size $n$. Additionally, we provide natural language-theoretic characterizations of the space classes. We then extend the results to randomized streaming algorithms and we show that in this setting, the space complexity of any regular language is either constant, doubly logarithmic, logarithmic or linear. Finally, we introduce sliding window testers, which can distinguish whether a sliding window of size $n$ belongs to the language $L$ or has Hamming distance $> \epsilon n$ to $L$. We prove that every regular language has a deterministic (resp., randomized) sliding window tester that requires only logarithmic (resp., constant) space.
We show exponential lower bounds on resolution proof length for pigeonhole principle (PHP) formulas and perfect matching formulas over highly unbalanced, sparse expander graphs, thus answering the challenge to establish strong lower bounds in the regime between balanced constant-degree expanders as in [Ben-Sasson and Wigderson '01] and highly unbalanced, dense graphs as in [Raz '04] and [Razborov '03, '04]. We obtain our results by revisiting Razborov's pseudo-width method for PHP formulas over dense graphs and extending it to sparse graphs. This further demonstrates the power of the pseudo-width method, and we believe it could potentially be useful for attacking also other longstanding open problems for resolution and other proof systems.
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.
An orthogonal drawing is an embedding of a plane graph into a grid. In a seminal work of Tamassia (SIAM Journal on Computing 1987), a simple combinatorial characterization of angle assignments that can be realized as bend-free orthogonal drawings was established, thereby allowing an orthogonal drawing to be described combinatorially by listing the angles of all corners. The characterization reduces the need to consider certain geometric aspects, such as edge lengths and vertex coordinates, and simplifies the task of graph drawing algorithm design. Barth, Niedermann, Rutter, and Wolf (SoCG 2017) established an analogous combinatorial characterization for ortho-radial drawings, which are a generalization of orthogonal drawings to cylindrical grids. The proof of the characterization is existential and does not result in an efficient algorithm. Niedermann, Rutter, and Wolf (SoCG 2019) later addressed this issue by developing quadratic-time algorithms for both testing the realizability of a given angle assignment as an ortho-radial drawing without bends and constructing such a drawing. In this paper, we further improve the time complexity of these tasks to near-linear time. We establish a new characterization for ortho-radial drawings based on the concept of a good sequence. Using the new characterization, we design a simple greedy algorithm for constructing ortho-radial drawings.
In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in times that are bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes a Shannon-inequality that underlies the bound, and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].