The Complexity of Iterated Reversible Computation

We study a class of functional problems reducible to computing $f^{(n)}(x)$ for inputs $n$ and $x$, where $f$ is a polynomial-time bijection. As we prove, the definition is robust against variations in the type of reduction used, and in whether we require $f$ to have a polynomial-time inverse or to be computable by a reversible logic circuit. These problems are characterized by the complexity class $\mathsf{FP}^{\mathsf{PSPACE}}$, and include natural $\mathsf{FP}^{\mathsf{PSPACE}}$-complete problems in circuit complexity, cellular automata, graph algorithms, and the dynamical systems described by piecewise-linear transformations.


Introduction
Reversible logic circuits, made from Boolean gates with bijective input-output mappings, can simulate any other combinational logic circuit [3]. Other natural models of computing based on low-level reversible operations include reversible cellular automata [26,27,51,40], reversible Turing machines [25,8,35], and even reversibility in LISP programs, where the pair of fundamental operations (car, cdr) can be seen as inverse to cons [30,14]. Early motivation for reversible computing came from a lower bound on the energy needed for each transition of a conventional logic gate, arising from fundamental physics [33], and the realization that reversible devices can transcend this bound and compute with arbitrarily small amounts of energy [4,54]. More recently, reversible computing has gathered interest and attention from the recognition that quantum circuits must be reversible [45,9]. Deterministic reversible logic gates do not have the full power of quantum logic, but are the only kind of deterministic gates that can be incorporated into quantum circuits, so understanding their power is important for understanding the power of quantum computing more generally [1]. Our central problem, the complexity of function iteration, can be seen as a space-complexity analogue of these questions, and is one that we resolve in this work.
We are led by these considerations to the following class of computational problems: Let f be a polynomial-time function whose numbers of input and output bits are equal, and assume in addition that one of the following is true: f is a bijection from its inputs to its outputs, f is a bijection whose inverse function is also computable in polynomial time, or f can be computed by a uniform family of reversible logic circuits. What is the complexity of computing f^(n)(x) from n and x?
As we prove, the three variations in assumptions about f lead to equivalent complexity for this problem, even though they may describe different classes of functions. To show this, we formulate a complexity class of functional problems encapsulating the iteration of polynomial-time bijections. The resulting complexity class turns out to equal FP PSPACE, the class of functional problems that can be solved in polynomial time with access to a PSPACE oracle.¹ We demonstrate the applicability of this theory by finding concrete FP PSPACE -complete problems coming from circuit complexity, graph algorithms, cellular automata, and dynamical systems.
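For concreteness, the kind of problem we study can be sketched as follows. The bijection below is a toy example chosen for this illustration, not one of the problems from the paper: x ↦ (3x + 1) mod 2^k is a bijection on k-bit values because 3 is odd, and it happens to have an easy inverse.

```python
# A toy polynomial-time bijection on k-bit values: x -> (3*x + 1) mod 2^k.
# (3 is odd, hence invertible mod 2^k, so this map is a bijection.)
K = 16
M = 1 << K

def f(x):
    return (3 * x + 1) % M

def f_inverse(y):
    inv3 = pow(3, -1, M)          # modular inverse of 3 mod 2^k (Python >= 3.8)
    return ((y - 1) * inv3) % M

def iterate(n, x):
    """Naive computation of f^(n)(x): apply f to x, n times in a row."""
    for _ in range(n):
        x = f(x)
    return x

# Sanity checks: f is a bijection, and iteration composes as expected.
assert f_inverse(f(12345)) == 12345
assert iterate(3, 7) == f(f(f(7)))
```

The point of the complexity question is that n is given in binary, so this naive loop takes time exponential in the input length; the results below pin down exactly how much better one can hope to do.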

New results
We prove the following results.
We define a family of functional complexity classes based on the iteration of bijective polynomial-time functions, with nine variations depending on whether we incorporate reductions from other problems into the definition, which kind of reduction is allowed, and which specific class of polynomial-time bijection is allowed (Definition 3.1). Despite this variability, we show that six of these classes, the ones allowing either Turing or many-one reductions and any of the three types of polynomial-time bijection, are equal to each other (Theorem 3.6).
We observe that these equivalent complexity classes have a complete problem in circuit complexity, falling naturally out of one of our equivalent formulations: given a reversible logic circuit, a number n, and an initial value for the circuit's input wires, what is the result of feeding the outputs of the circuit back through the inputs for n iterations through the circuit? Equivalently, what would be the output of the circuit formed by composing in sequence n copies of the given circuit? (Observation 3.8.)

¹ Note that FP PSPACE ≠ FPSPACE, the class of functional problems solvable in polynomial space, because FPSPACE (as defined e.g. by Ladner [32]) does not count its output against its space complexity, and includes problems with exponentially long outputs. In contrast, FP PSPACE is limited to outputs of polynomial size.

We consider a family of computational problems on implicit graphs (graphs defined procedurally by polynomial-time algorithms for listing the neighbors of each vertex) in which the graph is undirected with maximum degree two, the input is a leaf (a vertex of degree one), and the desired output is the other leaf in the same connected component.
Such "second-leaf problems" are known to be hard for FP PSPACE [2,44,34], and include producing the same result as Thomason's lollipop algorithm for a second Hamiltonian cycle in a cubic graph [50].We show that all second-leaf problems can be transformed into an equivalent iterated bijection (Theorem 4.2), and as a consequence that the complexity class of iterated bijections, under any of its six equivalent definitions, equals FP PSPACE (Theorem 4.3).
We study finding the configuration of a reversible cellular automaton after n steps from a given initial configuration. We show that this is complete for FP PSPACE for the billiard-ball model, a two-dimensional reversible cellular automaton of Margolus [36] (Theorem 5.1).
Although certain one-dimensional reversible cellular automata were known to be Turing-complete, their proofs of universality cannot be used for FP PSPACE -completeness, because they use space proportional to time. Instead, we find a new family of one-dimensional reversible cellular automata that can simulate any two-dimensional reversible cellular automaton with the Margolus neighborhood and suitable boundary conditions (Theorem 5.6). Finding the configuration after n steps for these one-dimensional automata is FP PSPACE -complete (Theorem 5.7).
We consider a family of polynomial-time bijections defined by piecewise linear transformations, and we show that finding their iterated values is FP PSPACE -complete (Theorem 6.6).
However, a natural special case of this problem has a non-obvious polynomial-time algorithm, obtained by a transformation into computational topology (Theorem 6.12).
In these problems, as is standard in computational complexity theory, all input values are assumed to be represented as binary sequences, and all time bounds are based on the Turing machine model or on polynomially-equivalent models such as random access machines with logarithmically-bounded word sizes. In particular, our model of computation does not allow unit-cost arithmetic operations on arbitrarily large numbers or exact real-number arithmetic.

Related work
Our work combines ideas from structural complexity theory, circuit complexity theory, the theory of cellular automata, graph algorithms, and dynamical systems theory, which we survey in more detail in the relevant sections and summarize here.
In structural complexity theory, many early complexity classes such as P, PSPACE, and NP were defined by bounding resources such as space or time in computational models such as deterministic or nondeterministic Turing machines. More recently, it has become common instead to see complexity classes defined by reducibility to certain fundamental problems or classes of problems: for instance, PPA and PPAD are based on searching for specific structures in graphs [44], while ∃R is based on reducibility to problems in the existential theory of the real numbers [47]. Similarly, one can redefine NP by reducibility to brute-force search algorithms on deterministic machines. Our work takes inspiration from this shift in perspective, and similarly describes a class of problems that are reducible to iteration of bijections. One difference, however, is that while PPA, PPAD, and ∃R are all in some sense "near" NP, the resulting class is more similar to PSPACE.
In the theory of reversible circuits, a central result is the ability of these circuits to simulate non-reversible combinational logic circuits [3]. Reversible logic gates or families of gates, such as the Fredkin gate [17] or the Toffoli gate [52], are said to be universal when they can be used for these simulations. This simulation is always possible when additional padding bits are included in the inputs and outputs of the simulated circuits. Without padding, it is not possible for these gates to compute all functions or even all bijective functions. Fredkin gate circuits can only compute functions that preserve Hamming weight [17], and some easily-computed bijections cannot be computed by any reversible logic circuit with gates of bounded complexity (Observation 2.1).
Reversible circuits and reversible cellular automata are connected through the simulation of Fredkin-gate circuits by Margolus's billiard-ball block-cellular automaton [36]. This provides one route to Turing completeness of reversible cellular automata, for which many more constructions are known [51,40,12,22,28,37]. Although Turing completeness is closely related to the completeness properties studied here, these constructions generally involve infinite arrays of cells and in some cases initial conditions with infinite support, in contrast to our focus on space-bounded complexity. Another route to Turing completeness is the simulation of other cellular automata by reversible cellular automata. The billiard-ball model can simulate any other two-dimensional locally reversible cellular automaton [11], and Toffoli simulates arbitrary (non-reversible) d-dimensional cellular automata by (d + 1)-dimensional reversible cellular automata [51]. Our construction of a one-dimensional FP PSPACE -complete (and Turing-complete) reversible cellular automaton that simulates a two-dimensional one stands in sharp contrast to a result of Hertling [21] that (under weak additional assumptions) simulations of cellular automata by reversible ones must increase the dimension, as Toffoli's construction does. Our construction does not meet Hertling's assumptions.
Another well-established connection relates algorithmic problems on implicitly-defined graphs to problems in computational complexity, by considering graphs that describe the state spaces of Turing machines or other computational models. It is standard, for instance, to reinterpret Savitch's theorem relating nondeterministic and deterministic space complexity as providing a quadratic-space algorithm for reachability in implicit directed graphs. Similarly, the Immerman-Szelepcsényi theorem on closure of nondeterministic space classes under complement has an equivalent algorithmic form, a nondeterministic linear-space algorithm for non-reachability in directed graphs [55]. The complexity classes PPA and PPAD were formulated in the same way from algorithmic problems on implicit graphs [44]. A specific algorithm that has been frequently studied in this light is Thomason's lollipop algorithm for finding a second Hamiltonian cycle in an (explicit) cubic graph by following a path in a much larger implicit graph defined from the given graph [50]. Although some inputs cause this algorithm to take exponential time [6,7,58], the complexity of finding a second Hamiltonian cycle in a different way is unknown, and was one of the motivating problems for the definition of PPA [44]. We formulate the same question in a different way, asking how hard it is to find the same cycle that Thomason's algorithm finds, but again the complexity of this problem remains unknown.
An important precursor of our work is the result of Bennett [2] and of Lange, McKenzie, and Tapp [34] that reversible Turing machines with polynomial space can compute functions complete for FP PSPACE. Citing Bennett, Papadimitriou [44] rephrased this result in terms of implicit graphs: it is complete for FP PSPACE to find the other end of a path component in an implicit graph, given a vertex at one end of the path. Our proof that the iterated functional problems we study are complete for FP PSPACE is based on this result, and on a reduction converting this path problem into an iterated bijection. For related time-space tradeoffs in the power of reversible Turing machines, see also Williams [56].
Our final section concerns the iteration of invertible piecewise linear functions. This topic is well studied in the theory of dynamical systems; previously studied functions of this type include Arnold's cat map [13] and the baker's map [43], both acting on the unit square, and the interval exchange transformations on a one-dimensional interval [29]. The focus of past work on these transformations has been on their chaotic dynamics, rather than on the computational complexity of computing their iterates. We also consider perfect shuffle permutations, formulated as piecewise linear functions; their iterates have again been studied, notably to determine their order in the symmetric group [10].

Invertability, reversibility, and reversible logic
We define a bijection of bitstrings to be a function on binary strings of arbitrary length that, on n-bit inputs, produces n-bit outputs, and is one-to-one for each n. We define a polynomial-time bijection to be a bijection of bitstrings computable in polynomial time, and we define a polynomial-time invertible bijection to be a polynomial-time bijection whose inverse function is also a polynomial-time bijection. We define a reversible logic gate to be a Boolean logic gate with equally many input and output bits that computes a bijection from inputs to outputs; the number of inputs and outputs is its arity. Finally, we define a polynomial-time reversible function to be a bijection of bitstrings that, for each n, can be computed by a circuit of reversible logic gates of fixed arity, constructed from the argument n in time polynomial in n. Any polynomial-time reversible function or its inverse can be computed in polynomial time by constructing and simulating its circuit, so the polynomial-time reversible functions are a subset of the polynomial-time invertible bijections.
As in classical logic, certain reversible gates have been identified as universal. These include the three-input Fredkin gate, in which one control input is passed through unchanged but determines whether to swap the other two inputs [17], and the Toffoli gate, in which the conjunction of two control inputs determines whether to negate a third input [52]. Universality, in this context, has sometimes been incorrectly stated as meaning that all bijections can be implemented with these gates. This is impossible for any finite set of gates:
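The two gates just described can be sketched directly as truth-table functions; this minimal check (not part of the paper's development) confirms that each computes a bijection on three bits, and that each is its own inverse:

```python
from itertools import product

def fredkin(c, a, b):
    """Controlled swap: if c == 1, swap a and b; the control c passes through."""
    return (c, b, a) if c else (c, a, b)

def toffoli(a, b, t):
    """Controlled-controlled-not: negate t exactly when a and b are both 1."""
    return (a, b, t ^ (a & b))

for gate in (fredkin, toffoli):
    # Bijectivity: all 8 three-bit inputs map to 8 distinct outputs.
    outputs = {gate(*bits) for bits in product((0, 1), repeat=3)}
    assert len(outputs) == 8
    # Both gates are involutions: applying the gate twice restores the input.
    for bits in product((0, 1), repeat=3):
        assert gate(*gate(*bits)) == bits

# The Fredkin gate preserves Hamming weight, as noted in the text [17].
for bits in product((0, 1), repeat=3):
    assert sum(fredkin(*bits)) == sum(bits)
```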
We do not expect permutation parity to be the only obstacle to the existence of reversible circuits. For instance, modifying a function by adding an extra input that is passed unchanged to the output and otherwise does not affect the result (unlike a padding bit, which must be zero for correct output) can change the permutation parity from odd to even, but appears unlikely to affect the existence of a circuit, although we do not prove this. Nevertheless, every logic circuit (even an irreversible one) can be simulated by a reversible circuit of approximately the same size with more inputs and outputs (equally many total inputs and outputs). The added "dummy" inputs must all be set to zero in the simulation, and the added "garbage" outputs produce irrelevant values, discarded in the simulated output. Each gate of the simulated circuit can be transformed into O(1) reversible gates that use up O(1) dummy inputs and produce O(1) garbage outputs [3]. A stronger version of this simulation, for polynomial-time invertible bijections, uses the same zero-input padding but produces zeros for the output bits instead of garbage, allowing the resulting functions to be iterated. This is, essentially, a result of Jacopini, Mentrasti, and Sontacchi [25], but we include the proof because it is central to our later results, because we use similar techniques in other proofs, and because Jacopini, Mentrasti, and Sontacchi phrased it in terms of reversible Turing machines rather than reversible logic circuits. Define pad(p, x) to be the result of padding a binary string x by prepending p zero-bits.
That is, on strings consisting of p(n) zeros followed by an n-bit string x, the padded function behaves like the evaluation of f on the final n bits, leaving the zeros unchanged.
PROOF. Construct a circuit on padded inputs that performs the following computations:

1. Replace n padding bits with their bitwise exclusive or with the input x, transforming a padded input of the form 0, 0, x (where the first 0 denotes the remaining unused padding bits, the second 0 denotes the padding bits replaced in this computation, and the comma denotes concatenation) into 0, x, x, the remaining unused padding bits together with two copies of x.
2. Expand the polynomial-time computation of f, on n-bit inputs, into a classical logic circuit, and use the simulation of classical logic by reversible logic to compute f on one of the copies of x, replacing more padding bits by garbage bits. After this step, the input has been transformed into the form g, f(x), x, where g is the garbage produced by the simulation.
3. Use additional bitwise exclusive ors to transform the input to the form g, f(x), f(x) ⊕ x.
4. Reverse the circuit of Step 2 to transform the input to the form 0, x, f(x) ⊕ x.
5. Use additional bitwise exclusive ors to transform the input to the form 0, f(x), f(x) ⊕ x.
6. Use the simulation of classical by reversible logic to compute f⁻¹ on input f(x), transforming the input to the form h, x, f(x) ⊕ x, where h is the garbage produced by the simulation.
7. Use additional bitwise exclusive ors to transform the input to the form h, x, f(x).
8. Reverse the circuit of Step 6 to transform the input to the form 0, f(x), f(x).
9. Use additional bitwise exclusive ors to transform the input to the form 0, 0, f(x). ■

This proof produces circuits that combine Fredkin or Toffoli gates with additional two-input two-output exclusive or gates (also called controlled not gates). If a circuit using only one type of gate is desired, then these controlled not gates can be simulated by Fredkin gates or Toffoli gates by using additional (reusable) dummy bits that, like the padding bits, can be passed unchanged from the input to the output of the resulting circuit. We omit the details.
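The nine steps of the proof can be sketched at the register level. In this sketch (an illustration, not the circuit-level construction itself), the compute/uncompute phases are treated as black boxes, so the garbage bits of Steps 2 and 6 are suppressed, and a toy bijection stands in for f:

```python
# Register-level sketch of the nine-step in-place evaluation. Each XOR is a
# reversible operation; the f/f_inv calls stand in for the reversible-circuit
# simulations in the proof (their garbage bits are suppressed here).
K = 8
M = 1 << K

def f(x):            # toy bijection on 8-bit values (3 is odd, invertible mod 2^k)
    return (3 * x + 1) % M

def f_inv(y):
    return ((y - 1) * pow(3, -1, M)) % M

def in_place_f(x):
    a, b = 0, x              # padded input (0, x); unused padding omitted
    a ^= b                   # 1. (x, x)
    a = f(a)                 # 2. (f(x), x)       compute f, garbage suppressed
    b ^= a                   # 3. (f(x), f(x)^x)
    a = f_inv(a)             # 4. (x, f(x)^x)     reverse Step 2
    a ^= b                   # 5. (f(x), f(x)^x)
    a = f_inv(a)             # 6. (x, f(x)^x)     compute f^-1, garbage suppressed
    b ^= a                   # 7. (x, f(x))
    a = f(a)                 # 8. (f(x), f(x))    reverse Step 6
    a ^= b                   # 9. (0, f(x))
    assert a == 0            # padding restored to zero, so the result can be iterated
    return b

assert all(in_place_f(x) == f(x) for x in range(M))
```

The final assertion that the working register returns to zero is the whole point: because the padding is restored, the padded function can be applied again to its own output.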

Complexity classes and their equivalences
We will consider two different types of reduction in our definitions of completeness: A polynomial-time functional Turing reduction (Turing reduction, for short) from one functional problem g to another functional problem f can be described as a polynomial-time oracle Turing machine for problem g, using an oracle for problem f. That is, it is an algorithm for computing the function g that is allowed to make subroutine calls to an algorithm for function f, and that takes polynomial time outside of those calls.
A polynomial-time functional many-one reduction (many-one reduction, for short) consists of two polynomial-time algorithms t1 and t2, such that g = t2 ∘ f ∘ t1. That is, we can compute g by translating its input in polynomial time (using t1) into an input for function f, computing a single value of f, and then translating the computed value of f in polynomial time (using t2) into the value of g. Equivalently, this can be thought of as a Turing reduction that is limited to a single oracle call.
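The shape of a many-one reduction can be sketched in a few lines; the functions below are illustrative stand-ins, not problems from this paper:

```python
# A many-one reduction g = t2 . f . t1: translate the input, make exactly one
# call to f, translate the result. All of these functions are toy examples.
def f(x):                 # the target problem: here, squaring
    return x * x

def t1(x):                # input translation, so that g(x) = (x + 1)^2
    return x + 1

def t2(y):                # output translation (the identity, in this toy case)
    return y

def g(x):
    return t2(f(t1(x)))   # a single oracle call to f

assert g(4) == 25
```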
Additionally, as we have already seen, we have three choices of which type of bijection to use in the iteration.This naturally gives rise to the nine variant complexity classes defined below.
However, we will later see that six of these are actually the same as each other (and all the same as the known complexity class FP PSPACE ): as long as the definition of complexity class includes one of these two types of reduction, the choice of reduction type and bijection type does not matter.

DEFINITION 3.1. Define the nine complexity classes IB ρ,β, for ρ ∈ {T, M, −} and β ∈ {b, i, r} (the name IB is short for "iterated bijection"), as follows.

The complexity classes IB −,b, IB −,i, and IB −,r denote the classes of problems for which the input is a pair (n, x) and the output is f^(n)(x), where the n-times iterated function f is respectively a polynomial-time bijection, a polynomial-time invertible bijection, or a polynomial-time reversible function.

The complexity classes IB T,β denote the classes of problems having a polynomial-time functional Turing reduction to a problem in IB −,β, for each β in {b, i, r}.

The complexity classes IB M,β denote the classes of problems having a polynomial-time functional many-one reduction to a problem in IB −,β, for each β in {b, i, r}.
Padding an input by zeros and unpadding the output in the same way is a many-one reduction, and every many-one reduction is also a Turing reduction. For both kinds of reduction, the composition of two reductions is another reduction. Therefore, the following is an immediate consequence of Lemma 2.2, according to which every polynomial-time invertible bijection can be padded to an equivalent polynomial-time reversible function.

PROOF. Given a problem g in IB T,i (or, respectively, g ∈ IB M,i), computed by a polynomial-time oracle algorithm A, we construct a different polynomial-time bijection h whose iteration will simulate the behavior of algorithm A. In preparation for doing so, we expand A into a (conventional logic) circuit of polynomial size, consisting of the standard Boolean logic gates together with a special many-input many-output gate that takes as input n and x and produces as output f^(n)(x), implementing the oracle calls of algorithm A. We will simulate this circuit gate-by-gate, in a topological ordering of its gates, by a function h that operates on triples (c1, c2, w), where:

The value c1 (the "big hand of the clock") will indicate the progression of the simulation through the gates of the expanded circuit for algorithm A.

The value c2 (the "little hand of the clock") will indicate the progression of the simulation through an iteration of the function f, within a single oracle gate of the circuit.

The value w will indicate the Boolean values on all wires of the circuit, with zeros for wires whose value has not yet been determined by the simulation.

Initially, these values will all be zero, except for the values in w that describe input wires of the simulated circuit; the part of the many-one reduction that determines the initial value of the iterated function can easily calculate what these input wire values should be.

The function h that performs a step of the simulation will always increase c2 by one modulo a suitable value N2, and if the result is zero it will increase c1 by one modulo a suitable value N1. These moduli are chosen so that N2 is larger than the largest possible argument n in an oracle call to f^(n)(x), and so that N1 is larger than the number of gates in the simulated circuit. Because these increments are performed in modular arithmetic, they are bijective and invertible. We will iterate h for N1 N2 iterations, so that the big hand will increase for at least as many steps as the number of gates to be simulated. Each iteration of h will also perform additional invertible operations, depending on c1 and c2:

If c1 is the position of a standard logic gate G in the topological ordering of the circuit for algorithm A, and c2 = 0, then let b be the correct output of G, let wG be the wire where that output should go, and let h replace wG by its exclusive or with b. This operation is bijective and invertible (it is its own reverse).

If c1 is the position of a standard logic gate G, and c2 ≠ 0, then h does nothing beyond incrementing its counters.

If c1 is the position in the topological ordering of an oracle gate with input n, x and output y, computing y = f^(n)(x), and c2 = 0, then h uses bitwise exclusive ors (as in the proof of Lemma 2.2) to copy x onto y.

If c1 is the position in the topological ordering of an oracle gate with input n, x and output y, and 0 < c2 ≤ n, then h replaces y by f(y). If f is bijective, this operation is bijective, and if f is invertible, this step is invertible.

If c1 is the position in the topological ordering of an oracle gate with input n, x and output y, and c2 > n, then h does nothing beyond incrementing its counters.

Finally, the part of the many-one reduction that maps the output of the iteration of h to the value of g does so simply by copying the output bits of the simulated circuit.

■
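The "big hand / little hand" clock used in this proof can be sketched on its own. The moduli below are illustrative; the point is that a nested modular increment is a bijection on pairs with an explicit inverse:

```python
# One tick of the two-handed clock: increment c2 mod N2 and, on wraparound,
# increment c1 mod N1. Illustrative moduli; any positive N1, N2 work.
N1, N2 = 5, 7

def tick(c1, c2):
    c2 = (c2 + 1) % N2
    if c2 == 0:
        c1 = (c1 + 1) % N1
    return c1, c2

def untick(c1, c2):
    if c2 == 0:
        c1 = (c1 - 1) % N1
    c2 = (c2 - 1) % N2
    return c1, c2

# tick is a bijection: untick inverts it on every pair, and N1*N2 ticks
# return the clock to its starting position.
for c1 in range(N1):
    for c2 in range(N2):
        assert untick(*tick(c1, c2)) == (c1, c2)

state = (0, 0)
for _ in range(N1 * N2):
    state = tick(*state)
assert state == (0, 0)
```

Because every tick is invertible, layering further invertible operations on top of the clock (as the proof does, keyed on the current values of c1 and c2) keeps the overall step function a bijection.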
The next observation reduces the computation of a polynomial-time bijection (for which we do not necessarily have a polynomial-time inverse) to the iteration of a different polynomial-time invertible bijection (for which we do have the inverse). We include it here to introduce a counting trick in its proof, which we will use in a more complicated way in what follows.
To do so, define f̂(y) to be f(x) if y = x, and 0 otherwise, so that f(x) is the sum of the values of f̂. Create an invertible function s that operates on pairs (i, t), and maps (i, t) ↦ ((i + 1) mod N, t + f̂(i)), invertible via the map (i, t) ↦ ((i − 1) mod N, t − f̂(i − 1)), where N is the number of possible inputs to the function f. Then we can simply evaluate f(x) as the t-component of s^(N)(0, 0). ■

To simulate the iteration of a polynomial-time bijection using invertible steps, we combine the trivial-summation idea of Observation 3.4, the alternating forward and backward steps of Lemma 2.2, and the big-hand little-hand timing idea of Lemma 3.3, as follows.
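The counting trick can be exercised directly on a small example. Here f is a toy bijection and the names (x0, f_hat, s) are chosen for this sketch; exactly one term of the sum is nonzero, so after N iterations the second component holds f(x0):

```python
# The counting trick: evaluate f at x0 by iterating an explicitly invertible
# function s, even though no inverse of f is ever used. Toy bijection and
# moduli; the t-component is kept mod N so s acts on a finite set.
K = 8
N = 1 << K              # number of possible inputs to f

def f(x):               # toy polynomial-time bijection on K-bit values
    return (3 * x + 1) % N

x0 = 77                 # the point at which we want f(x0)

def f_hat(y):           # f_hat(y) = f(x0) when y == x0, and 0 otherwise
    return f(x0) if y == x0 else 0

def s(i, t):            # (i, t) -> (i+1 mod N, t + f_hat(i)); invertible
    return (i + 1) % N, (t + f_hat(i)) % N

def s_inv(i, t):
    return (i - 1) % N, (t - f_hat((i - 1) % N)) % N

assert s_inv(*s(3, 5)) == (3, 5)

# After N iterations from (0, 0), the sum has picked up its single nonzero
# term, so the t-component equals f(x0).
state = (0, 0)
for _ in range(N):
    state = s(*state)
assert state == (0, f(x0))
```

Note that computing s never requires inverting f: the inverse direction only needs to subtract f̂, which is again a forward evaluation of f.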
Let f be a polynomial-time bijection for which we wish to compute f^(n)(x), the form taken by all problems in IB −,b. We must show that f^(n)(x) can be computed in IB M,i, by iterating a polynomial-time invertible bijection. To do so, we define an invertible bijection q on 5-tuples (c1, c2, p, y, z), where c1 and c2 are the big hand and little hand of the big-hand little-hand timing technique, p is an adequate supply of polynomially many padding bits (zero before and after each iteration), y is the current iterated value (initially the starting value x), and y and z are equally-long values used within the iteration. If the inputs and outputs to f have n bits, we will choose the lengths of the values in these 5-tuples to all be monotonic and easily-computed functions of n, so that the computation of q can determine n and decode the 5-tuple into its components in polynomial time; we omit the details of this decoding process. For inputs to q whose length is not of the correct form to be decoded into a 5-tuple in this way, we define q to be the identity function.

Otherwise, as in Lemma 3.3, we define q so that in each iteration it increments c2 modulo some sufficiently large number N2 and, if the resulting value of c2 is zero, it also increments c1 modulo some sufficiently large number N1 > n. Each increase of c1 will correspond to one more iteration of the function f, so that f^(n)(x) may be obtained by iterating q exactly N1 N2 times, starting from the tuple with c1 = c2 = 0, y = x, and p and z zero, and examining the y component of the resulting tuple. If y has n bits, N2 is chosen to be greater than 2^n + 2. The effect of q on the p, y, and z components of the 5-tuple is determined by the value of c2: If c2 = 0, function q sets z = z ⊕ f(y).

PROOF. This is the definitional problem for IB −,r, and its hardness for IB follows from the composition of reductions with problems in that class. We must also show that this problem itself belongs to IB, but this is easy: it is the problem of computing the n-th iterate of an invertible polynomial-time function that takes as input a specification of a reversible circuit and an assignment to its input wires, and that produces as output the unchanged specification of the circuit and the assignment to its output wires obtained by simulating the circuit. The inverse of this function can be obtained by applying it to the reversal of the specified circuit.

Implicit linear forests
An implicit graph is a graph whose vertices are represented as binary strings of a given length, and whose edges are determined by a computational process (an oracle or subroutine for computing the neighbors of each vertex) rather than being listed explicitly in an adjacency list or other data structure. These typically represent state spaces of computations or of combinatorial structures, and the use of implicit graphs is common in complexity theory. Savitch's theorem, for instance, can be interpreted as defining an algorithm for finding a path between two selected vertices in an implicit directed graph, in low deterministic space complexity [55].
The problems we have already considered can easily be reformulated in the language of implicit graphs: a bijection can be thought of as a directed graph with the values on which it operates as vertices, and with in-degree and out-degree both exactly one at each vertex. The iteration problem we have been considering, rephrased in this language, asks for the vertex that one would reach by following a path of length n in this graph. However, this is somewhat artificial as a graph problem. Instead, we consider undirected implicit graphs in which every connected component is a path, known as linear forests, or more generally implicit graphs with maximum degree two. Given a leaf vertex (a vertex of degree one) in such a graph, how easy is it to find the other leaf of the same path? As we show in this section, this provides an alternative equivalent formulation of the class IB that is based on graph search rather than on bijective functional iteration.
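The obvious algorithm for the second-leaf problem is simply to walk the path. A minimal sketch, with a toy neighbor oracle standing in for a problem-specific subroutine:

```python
# Walking a path component of an implicit graph of maximum degree two:
# starting at a leaf, repeatedly step to the neighbor we did not just come
# from, until the other leaf is reached. The oracle below is an illustrative
# stand-in: an explicit path 0-1-2-...-9.
def neighbors(v):
    """Neighbor oracle for the path 0-1-...-9 (every vertex has degree <= 2)."""
    return [u for u in (v - 1, v + 1) if 0 <= u <= 9]

def other_leaf(start):
    assert len(neighbors(start)) == 1      # start must be a leaf
    prev, cur = None, start
    while True:
        nxt = [u for u in neighbors(cur) if u != prev]
        if not nxt:                        # no unvisited neighbor: other leaf
            return cur
        prev, cur = cur, nxt[0]

assert other_leaf(0) == 9
assert other_leaf(9) == 0
```

On an implicit graph with exponentially many vertices, this walk can take exponentially many steps, which is why the complexity of second-leaf problems is nontrivial and why hardness results such as [2,44,34] are possible.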

Thomason's lollipop algorithm
Before proving the equivalence of this formulation, we briefly discuss a prototypical example of a problem of this type, Thomason's lollipop algorithm for a second Hamiltonian cycle. In a 3-regular undirected graph, the number of Hamiltonian cycles through any fixed edge is even [53]. A proof of this fact by Thomason [50] constructs a state space, or implicit graph, as follows (Figure 1): The states of the state space are Hamiltonian paths starting at a fixed endpoint of a fixed edge. The initial Hamiltonian cycle can be transformed into one of these states by choosing arbitrarily one of its vertices and edges as the fixed vertex and edge of the state space, and removing the other Hamiltonian cycle edge that is incident to the chosen vertex.
Each state can transition to at most two other states, by adding one more edge to the far end of the Hamiltonian path from the fixed edge. The number of choices for this added edge is exactly two, because the given graph has degree three and one of the edges at the end vertex is already used as part of the path. If this edge is incident to the fixed vertex, adding it produces a Hamiltonian cycle; otherwise, adding it produces a "lollipop", a spanning subgraph in the form of a cycle with a dangling path. When it produces a lollipop, we can break the cycle at the other edge incident to the dangling path, and produce a new state.
Therefore, the states that can form Hamiltonian cycles by the addition of an edge have exactly one neighbor, while the other states have exactly two neighbors.
The evenness of the number of Hamiltonian cycles through a fixed edge follows immediately from this construction: After fixing an orientation for the fixed edge, each Hamiltonian cycle corresponds to a degree-one state in this state space, which can only belong to a path of states.Every path has exactly two degree-one states, so the paths in the state space group the Hamiltonian cycles into pairs [50].
The same argument also provides an algorithm for finding a second Hamiltonian cycle, given as input a single Hamiltonian cycle in an (explicitly represented) graph.One simply chooses arbitrarily an edge of this cycle to be a fixed edge and the orientation of this chosen edge, constructs the state space as above, and walks along the path in the state space from the initial state to another degree-one state, which must come from a different Hamiltonian cycle [50].

Equivalence to iterated bijection
We will formalize a class of computational problems like the one solved by Thomason's algorithm, rather than a single problem, in order to make the neighbor-finding subroutines by which we define an implicit graph be part of the problem definition rather than part of the input. However, we also need input data, used by those subroutines to specify the implicit graph. For instance, in the problem formalizing the input-output behavior of Thomason's lollipop algorithm, the definition of the problem includes the fact that its state space consists of Hamiltonian paths with fixed starts in a cubic graph, rather than being some other kind of implicit graph. However, the specific cubic graph containing these paths is input data rather than part of the problem specification. Thus, we make the following definitions.

■
Because of this equivalence, from now on we will generally refer to this class by its conventional name, $\mathsf{FP}^{\mathsf{PSPACE}}$, instead of the nonce name IB.
Although producing the same output as Thomason's lollipop algorithm belongs to $\mathsf{FP}^{\mathsf{PSPACE}}$, by Theorem 4.2, we do not know whether it is $\mathsf{FP}^{\mathsf{PSPACE}}$-complete, just as we do not know whether finding an arbitrary second Hamiltonian cycle in a cubic graph is PPA-complete.

Reversible cellular automata
A cellular automaton has a finite set of states, and a periodic system of cells. For us, these cells will form one-dimensional or two-dimensional arrays; although it is common to treat these arrays as infinite, we will form finite computational problems by using arrays of varying size with periodic boundary conditions. A configuration of the automaton assigns a state to each cell.
The automaton is updated by simultaneously computing for each cell a new state, determined in a translation-invariant way as a function of the states of a constant number of neighboring cells.
The resulting cellular automaton is reversible if the transformation from one configuration to the next is a bijection. When a cellular automaton is reversible, its inverse transformation can also be described by a reversible cellular automaton [20,46]. A periodic array of reversible logic gates would define a reversible automaton whose reverse dynamics uses the same neighborhood structure, but other reversible cellular automata can have reverse neighborhoods that are much larger than the forward ones [26]. Every one-dimensional or two-dimensional reversible cellular automaton can be defined by a rule with locally reversible steps, as would be obtained by an array of reversible gates, but for higher dimensions this remains unknown [27]. Just as irreversible circuits can be simulated by reversible ones, irreversible cellular automata can be simulated by reversible ones at the cost of an increase in dimension [51] or of the simulation becoming asynchronous [40].
For a fixed reversible cellular automaton rule, a simulation should take as input an initial configuration $x$ and a number of steps $n$, and produce as output the configuration of the automaton after $n$ steps. We do not require this simulation to be performed by directly calculating the transformations from each configuration to the next; for instance, for the (non-reversible) Conway's Game of Life automaton, hashing techniques have been successful at running simulations using computation time substantially sublinear in the number of simulated steps [18].
What is the complexity of simulating reversible cellular automata?
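As a concrete baseline, direct simulation with periodic boundary conditions can be sketched as follows. This is a minimal illustration (the radius-1 rule interface, the `shift` rule, and all names are our own assumptions), not a sublinear method like the hashing techniques mentioned above.

```python
from itertools import product

def step(rule, config):
    """Apply a radius-1, translation-invariant rule once, with periodic
    boundary conditions: cell i's new state depends on cells i-1, i, i+1."""
    n = len(config)
    return tuple(rule(config[(i - 1) % n], config[i], config[(i + 1) % n])
                 for i in range(n))

def simulate(rule, config, steps):
    """Direct simulation of `steps` updates (time linear in `steps`)."""
    for _ in range(steps):
        config = step(rule, config)
    return config

def is_reversible_on(rule, states, n):
    """Brute-force check that the global map is a bijection on all
    configurations of n cells (feasible only for tiny n)."""
    images = {step(rule, c) for c in product(states, repeat=n)}
    return len(images) == len(states) ** n

# A trivially reversible example rule: every cell copies its left
# neighbor, so each step rotates the configuration one position rightward.
shift = lambda left, center, right: left
```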

Billiard-ball model
By observation 3.8, simulating the behavior of a reversible logic circuit, with its outputs fed back into its inputs, for a given number of steps, is complete for $\mathsf{FP}^{\mathsf{PSPACE}}$. We outline a many-one reduction from this problem to the simulation of BBM patterns, using previously-described ways of simulating circuit components in BBM. The reduction lays out the given circuit as a BBM pattern, including wire-bending and delay circuits that feed the output signals from the circuit back into the inputs, delayed so that the outputs all return to the inputs in synchrony.
We use known methods for laying out circuits or other bounded-degree planar graphs onto grid graphs of polynomial area [49], with the layout oriented diagonally with respect to the BBM cell grid, in accordance with the diagonal movement of the circuit signals. The circuit requires only a bounding box of size polynomial in the circuit size, requires only a polynomial amount of delay for re-synchronization (and a corresponding amount of area for the delay circuits), and performs all its simulations of the given reversible logic gates within a polynomial number of steps.
Hundreds of published NP-completeness proofs already follow this same approach of using orthogonal layouts of circuits (often, of circuits for 3-satisfiability problems) in their reductions; for a typical example, see [38]. The use of delay gadgets to correctly synchronize or desynchronize signals within the billiard-ball model is also standard [11]. Therefore, we omit the details of these constructions.
The output of the given circuit after a given number of iterations can be obtained by simulating the translated BBM pattern, with input signals set to match the inputs to the given circuit, for a number of steps equal to the product of the number of iterations for the circuit and the time for an input signal to return to the same point in the BBM pattern.

Other known universal reversible cellular automata
The completeness of simulating other universal reversible cellular automaton rules would need to be considered case-by-case, depending on how the universality of those other rules has been proved. For instance, Toffoli [51] transforms arbitrary non-reversible cellular automata of dimension $d$ into reversible automata of dimension $d + 1$ by making the higher-dimensional automaton construct the entire time-space diagram of the lower-dimensional automaton. However, this also has the effect of increasing the space (number of cells) required for the higher-dimensional automaton to accurately perform this simulation, to be proportional to the product of the space and the number of simulated steps of the lower-dimensional automaton. Because the space bound for Toffoli's method is not polynomial in the space of the simulated automaton, this method cannot be formulated as a polynomial-time many-one or Turing reduction from one problem to another, and cannot be used for proving $\mathsf{FP}^{\mathsf{PSPACE}}$-completeness. Similarly, Morita [41] has shown how to simulate cyclic tag systems by a finite pattern in a universal one-dimensional reversible cellular automaton, but the correct behavior of this automaton requires a number of cells proportional to the number of steps of the automaton, so that garbage states from the automaton do not wrap around into the part of the automaton used for describing the rules of the tag system. Again, this need for a number of cells that depends in some way on the time complexity of the simulated computation prevents this method from being used to prove $\mathsf{FP}^{\mathsf{PSPACE}}$-completeness.

Dimension reduction
The $\mathsf{FP}^{\mathsf{PSPACE}}$-completeness of the two-dimensional BBM automaton, and our failure to translate the existing universality proofs of one-dimensional reversible cellular automata into $\mathsf{FP}^{\mathsf{PSPACE}}$-completeness, raise a natural question: can simulating a one-dimensional reversible cellular automaton be $\mathsf{FP}^{\mathsf{PSPACE}}$-complete? We answer this question affirmatively, by providing a one-dimensional simulation of any two-dimensional Margolus-neighborhood reversible cellular automaton, using the following ingredients:

Tracks. It will be convenient to think of each cell of a one-dimensional cellular automaton as being composed of multiple tracks, each containing a finite state, with possibly different sets of states for different tracks. It is possible for each track to have an update rule that is independent of the states in other tracks, with the value of a single track in a cell computed as a combination of the values in the same track of neighboring cells. Figure 3 shows an example with two tracks, in which the update rule for the top track copies the left neighbor while the update rule for the bottom track copies the right neighbor, causing the states of the tracks to move relative to each other while remaining otherwise unchanged. (This example is from a family of reversible automata described by Boykett [5] as having an update rule that acts on the values of whole cells by combining the left and right neighbors using an algebraic structure called a rectangular band.) Alternatively, the state of one track can control the update rule of another track. As long as these controlled update rules remain individually reversible, the whole automaton will again be reversible, with a reverse dynamics that computes the predecessor value of the controlling track and then uses it to control the update rule of the other track.
The number of states of the whole cell is then the product of the numbers of states within each track. These numbers grow quickly, so the automata resulting from multi-track constructions will in general have many states. However, if we have a fixed number of tracks with a fixed number of states in each track, the total number of states remains finite, as is required for a cellular automaton.
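The two-track example of Figure 3 can be sketched directly; the function names and the (top, bottom) pair representation are our own illustrative assumptions. The inverse dynamics just slides the tracks back the other way.

```python
def two_track_step(config):
    """One update of a two-track automaton (sketch): each cell holds a
    (top, bottom) pair; the top track copies its left neighbor (so it
    slides rightward) and the bottom track copies its right neighbor
    (sliding leftward), with periodic boundary conditions."""
    n = len(config)
    return [(config[(i - 1) % n][0], config[(i + 1) % n][1])
            for i in range(n)]

def two_track_step_inverse(config):
    """Reverse dynamics: slide both tracks back in the opposite directions."""
    n = len(config)
    return [(config[(i + 1) % n][0], config[(i - 1) % n][1])
            for i in range(n)]
```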

Partitioning automata.
A general construction for one-dimensional reversible automata of Imai and Morita [23] can be thought of as having three tracks per cell: a left track, center track, and right track. The left and right tracks must have equal sets of available states; the states of the center track can differ. The update rule for the automaton performs two operations (as a single automaton step): Swap the value in the right track of each cell with the value in the left track of its right neighbor.
Apply a bijective transformation to the state of each cell (the combination of the states of all three of its tracks), independently of the states of its neighbors.
All one-dimensional cellular automata defined in this way are automatically reversible. The reverse dynamics can be described similarly, as applying the inverse bijective transformation and then swapping values in the same way. Although this is also a cellular automaton, it is not a partitioning automaton of the same type, because the bijection and the swap are performed in a different order.
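A single step of such a partitioning automaton, and its reverse dynamics, can be sketched as follows. The representation and names are our own; `cell_bijection` stands for an arbitrary per-cell bijection on (left, center, right) triples.

```python
def partition_step(config, cell_bijection):
    """One step of a one-dimensional partitioning automaton (sketch).
    Each cell is a (left, center, right) triple. First swap each cell's
    right value with the left value of its right neighbor (periodic
    boundaries), then apply a bijection to each whole triple."""
    n = len(config)
    swapped = [(config[(i - 1) % n][2], config[i][1], config[(i + 1) % n][0])
               for i in range(n)]
    return [cell_bijection(cell) for cell in swapped]

def partition_step_inverse(config, cell_inverse):
    """Reverse dynamics: undo the per-cell bijection, then perform the
    same swap again (the swap is an involution)."""
    n = len(config)
    unmixed = [cell_inverse(cell) for cell in config]
    return [(unmixed[(i - 1) % n][2], unmixed[i][1], unmixed[(i + 1) % n][0])
            for i in range(n)]
```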
Firing squad synchronization. We use a reversible solution to the firing squad synchronization problem found by Imai and Morita [23].

L E M M A 5 . 2 (Imai and Morita [23]). There is a one-dimensional reversible cellular automaton with the following behavior. First, its states can be partitioned into three sets: quiescent, active, and firing. Second, a cell that is quiescent remains quiescent throughout the evolution of the automaton; therefore, the behavior of any pattern can be described purely by considering its contiguous subsequences of non-quiescent cells. Third, for every $n$ there exists a pattern $P_n$, consisting of $n$ cells in active states, bounded on both sides by quiescent cells, with the following property: for all $0 \le i < 3n$, the pattern resulting from $P_n$ after $i$ steps consists only of active states, but the pattern resulting from $P_n$ after exactly $3n$ steps consists only of firing states.
Some additional detail on how this firing squad computation works will be important. It is a partitioning automaton, where the center track holds the state of each cell: quiescent, active, or firing, with additional information about several more specific types of active cells. On quiescent cells, the bijective transformation of the partitioning automaton is the identity.

Strobed synchronization. We will need to use update rules that, every $s$th step for a variable numerical parameter $s$, perform a different step than the usual computation in the other steps. This can be thought of by analogy to a strobe light, which provides brief flashes of one condition (bright light) interspersed with longer periods of a different condition (darkness). This is not something that can be directly defined into the behavior of a cellular automaton, because directly storing the number of the current step modulo $s$ would use a number of bits of information that is logarithmic in $s$, rather than being encodable into a finite state. Instead, we will simulate this behavior by using tracks that perform the firing squad synchronization computation of Imai and Morita, repeating spatially. We say that a state of a one-dimensional cellular automaton is spatially repeating with pattern $P$ and period $p$ if $P$ is a sequence of automaton states of length $p$ and the state is formed by concatenating an infinite sequence of copies of $P$. (Such a state will also be repeating for any period that is a multiple of $p$.) The states of such an automaton continue repeating with the same period for all subsequent time steps.
They have the same behavior as an automaton run with the same rules on a finite cycle of cells of length $p$ containing a single copy of $P$. (Connecting the start and end of $P$ in this way to form a cycle of cells is commonly referred to as using periodic boundary conditions.) Consider an arrangement of these four-cell squares into an infinite horizontal strip. We define helical boundary conditions for this strip, with circumference $c$, by adding vertical connections from each of these four-cell squares upwards to the square offset from it by $c/2$ units leftward along the strip, and downward to the square offset from it by $c/2$ units rightward, as depicted in Figure 4. In order to get the squares to line up, we require that $c$ be even. This creates a pattern of cell connectivity that locally (within regions of width less than $c$) is indistinguishable from the infinite square grid, although globally it has the topology of a cylinder, not the same as a grid. If we map a system of cells, connected in this way, onto a two-track one-dimensional automaton, in which each cell of the one-dimensional automaton holds two cells of the Margolus neighborhood, it will be easy for the one-dimensional automaton to simulate the updates in the Margolus neighborhood that use four-cell squares in odd-numbered steps, aligned with the given strip. However, the updates in even-numbered steps combine information from cells $c$ units apart from each other in the strip, and it is not obvious how to perform those updates using a one-dimensional automaton whose neighborhood size does not depend on $c$. Our eventual solution to this problem will use strobing synchronization to permute the cell states into a position where interacting cells are again adjacent within the one-dimensional strip.
We define toroidal boundary conditions of circumference $c$ and period $p$ (requiring $p$ to be even and greater than $c$) by using both periodic boundary conditions of period $p$ for the one-dimensional strip of four-cell squares, and helical boundary conditions to define vertical neighbors of each square. The resulting system of cells is again locally (within regions of width less than $c$ and height less than $p/c$) indistinguishable from the infinite square grid, although globally it has the topology of a torus.
With all of the pieces we need now defined, we are ready to describe our one-dimensional simulation of two-dimensional Margolus-neighborhood reversible automata.
T H E O R E M 5 .6. Every reversible cellular automaton with the two-dimensional Margolus neighborhood and with $q$ states per cell, running on a system of cells with helical boundary conditions with any even circumference $c$, can be simulated by a one-dimensional cellular automaton with $O(q^2)$ states per cell, with a system of states and an update rule that does not depend on $c$, and with $c/2 + 1$ steps of the one-dimensional automaton for every simulated step of the two-dimensional automaton. The simulated automaton can be made to have toroidal boundary conditions for any period larger than the circumference, giving the one-dimensional automaton periodic boundary conditions with the same period.
We simulate the two-dimensional automaton using a multi-track one-dimensional automaton, two of whose tracks represent the upper and lower rows of cells in the helical boundary conditions, and we use strobing synchronization with strobe period $s = c/2 + 1$, using more tracks. We use one more track to store a single bit of information for each one-dimensional cell, indicating whether its first two tracks should be combined with the cell to the left or with the cell to the right to form the four-cell squares of the Margolus neighborhood; we set the initial state of these bits in strict alternation between consecutive cells.
As in Lemma 5.5, all states of the strobing track will be top-lit in one step, followed by $s - 1$ steps in which they are not top-lit, in a temporally-repeating pattern. When a cell is top-lit, we perform the update in the Margolus neighborhood given by the reversible dynamics of the given two-dimensional automaton, with the following small modification: we swap the resulting cell values between the top and bottom tracks. As a result of this step, the cell values of the two-dimensional automaton are all computed correctly, but are placed in cells that are not adjacent to their neighbors in the next update. The cell values that are now in the top track of the one-dimensional simulation need to be paired with values that are now in the bottom track but are $c$ units farther to the right. To fix this incorrect placement, in each of the $s - 1$ subsequent steps of the one-dimensional automaton, we slide the top track rightward one step and the bottom track leftward one step, according to the rectangular band dynamics depicted in Figure 3. After these sliding movements of all the cell states, they will once again be placed in a position where the one-dimensional automaton can perform a Margolus-neighborhood update.
Between one top-lit step and the next, the values that were in two vertically-adjacent squares (according to the adjacency pattern of the helical boundary conditions, although far from each other in the one-dimensional simulation) are shifted halfway around the helix in opposite directions, landing on different tracks of a single one-dimensional cell. In this same span of steps, we need to update the bit of information on the final track indicating whether each cell should look left or right to form a Margolus neighborhood in the next step. This update should be done in a way that matches the correct alignment of these squares, halfway around the helix, in alternating steps of the two-dimensional automaton. The required update to the final track depends on the parity of the number of Margolus-neighborhood squares in a single cycle around the helix, $c/2 = s - 1$. When there are evenly many squares in this cycle (true when $s$ is odd), the final-track states should all be flipped from one top-lit step to the next; otherwise, when $s$ is even, they should all be left unchanged. This can most easily be accomplished by flipping these states at every step, regardless of the state of the strobing track.

■
It seems likely that the divisibility condition on the circumference of the helical boundary conditions in Theorem 5.6 can be relaxed using a more general strobing synchronization mechanism, but we do not need this added generality for the following result: There is a one-dimensional reversible cellular automaton for which simulating any given number of iterations, with periodic boundary conditions, is complete for $\mathsf{FP}^{\mathsf{PSPACE}}$.
P R O O F . Apply Theorem 5.6 to produce a one-dimensional simulation of the BBM two-dimensional automaton, for a hard instance of BBM generated by Theorem 5.1, and for toroidal boundary conditions with circumference and period both large enough to make no difference to the dynamics of BBM within the bounding box of live cells of the instance.

■
The number of states in the automaton resulting from this construction is $2^3 \cdot 90^2 = 64800$, a finite but large number. (The two strobing tracks have 90 states rather than the 99 states of Imai and Morita because they do not use quiescent states.) It would be of interest to find an $\mathsf{FP}^{\mathsf{PSPACE}}$-complete one-dimensional reversible cellular automaton with significantly fewer states.

Piecewise linear bijections
Both the iterated behavior of piecewise linear maps and the iterated behavior of bijective maps are central to the theory of dynamical systems. Well-known mappings in this area that combine both characteristics include Arnold's cat map $(x, y) \mapsto (2x + y, x + y) \bmod 1$ and the baker's map, both on the unit square [43,13].
A prominent family of one-dimensional systems is given by the interval exchange transformations [29]. These are piecewise linear bijections that partition a half-open interval into subintervals, permute the subintervals, and translate each subinterval into its permuted position (Figure 5). The computational complexity of iterated interval exchange transformations, and their application in modeling light reflections within mirrored polygons, was an initial motivation for this paper. Even the simplest nontrivial interval exchange, the transformation $x \mapsto (x + \alpha) \bmod 1$, has interesting iterated behavior, including Steinhaus's three-gap theorem, according to which there are at most three distinct lengths among the intervals between consecutive values in the sorted sequence of the first $n$ iterates [48].
For the purposes of computational complexity it is more convenient to consider mappings that act on discrete sets rather than on continuous spaces like the entire unit square. In this light, it is common, for instance, to study the effect of Arnold's cat map on grid points, such as the positions of a discrete set of pixels [13]; indeed, its name comes from an example given by Arnold of a picture of a cat being transformed in this way. It is important to note, however, that restricting the domain of a function in this way can change whether it is bijective. Arnold's cat map is bijective on square grids, for instance, but the baker's map is not: each step halves the vertical separation of the grid.
As an example in the other direction, of a piecewise linear transformation that is bijective on integers but not on continuous intervals, consider the familiar perfect riffle shuffle of a deck of $n$ cards [10], which (if the cards are represented by integers in the range from $0$ to $n - 1$) can be expressed as the piecewise linear transformation that maps $x$ to $2x$ when $2x < n$, and otherwise maps $x$ to $2x + 1 - n$ when $n$ is even or to $2x - n$ when $n$ is odd.
See Figure 6 for an example with $n = 13$. We will use these shuffle transformations as components in a hardness proof for a problem of computing iterated piecewise linear bijections. To test whether an input of this form describes a piecewise linear bijection, sort the endpoints of the specified pieces to check that they form disjoint intervals whose union is a single interval, check that each of the specified transformations maps each piece into this union, and check that no two specified transformations have intersecting images.
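Under this reading of the shuffle (a sketch; the function name is ours), the map is a bijection on the integers $0, \dots, n - 1$ even though the real intervals covered by its pieces' images overlap:

```python
def riffle(x, n):
    """Perfect riffle shuffle on card positions 0..n-1, written as a
    piecewise linear map: x -> 2x when 2x < n; otherwise x -> 2x + 1 - n
    for even n, or x -> 2x - n for odd n."""
    if 2 * x < n:
        return 2 * x
    return 2 * x + 1 - n if n % 2 == 0 else 2 * x - n
```

For odd $n$ the two pieces together compute $x \mapsto 2x \bmod n$, and for even $n$ the images of the two pieces are the even and odd positions, interleaved.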

The piecewise linear bijection problem
The images of any two linear pieces of the given input lie in two arithmetic progressions, and testing whether they intersect can be done using greatest common divisors to form the intersection of these progressions [42]. In more detail, suppose that the image of one piece lies within an interval of the progression of values that are $a \bmod m$, and that the image of a second piece lies within an interval of a progression of values that are $b \bmod k$. Let $g = \gcd(m, k)$; then the intersection of the two progressions is either empty (if $a \ne b \bmod g$) or has period $mk/g$ (otherwise). When it is non-empty, the extended Euclidean algorithm can be used to find numbers $m'$ and $k'$ with $m'm + k'k = g$, and the intersection of the progressions consists of the values congruent to $(ak'k + bm'm)/g \bmod mk/g$. To test whether the two images intersect, we need only compute the coefficients of this progression and test whether it has any values in the interval between the upper and lower ends of both images.
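This test can be sketched with the extended Euclidean algorithm; the helper names are illustrative, and the final step reduces the test to checking one arithmetic progression against an interval.

```python
def ext_gcd(a, b):
    """Return (g, p, q) with p*a + q*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def intersect_progressions(a, m, b, k):
    """Intersect the progressions {a mod m} and {b mod k}. Return None if
    they are disjoint, else (c, l): the values congruent to c mod l,
    where l = m*k / gcd(m, k)."""
    g, p, q = ext_gcd(m, k)
    if (a - b) % g:
        return None            # the residues disagree modulo gcd(m, k)
    l = m // g * k
    # shift a by a multiple of m, chosen via the Euclid coefficient p,
    # so that the result is also congruent to b modulo k
    c = (a + (b - a) // g * p % (k // g) * m) % l
    return c, l

def progression_meets_interval(c, l, lo, hi):
    """Does the progression c mod l contain a value in [lo, hi]?"""
    first = lo + (c - lo) % l   # smallest member that is >= lo
    return first <= hi
```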
To implement a single iteration of the bijection, on input $x$, find the piece containing $x$ and apply its transformation. To implement the inverse of the bijection, find the piece whose image contains $x$ and apply the inverse of its linear transformation. ■ Although the inverse of a piecewise linear bijection is linear on each image of a piece, this may not describe the inverse as a piecewise linear bijection, because the images might not be intervals. Describing the inverse of a piecewise linear bijection as another piecewise linear bijection could produce significantly more pieces. To allow for more general descriptions of transformations, it will be convenient for us to consider compositions of piecewise linear bijections $f_0, f_1, \dots, f_{k-1}$ on a range $[0, r)$, repeated in a fixed sequence. As the following lemma shows, this can be done by combining the sequence into a single piecewise linear bijection on a larger range, without a significant increase in complexity. For the final iteration $f_{k-1}$, the image of this transformation should be taken modulo $kr$, so that it wraps around to $[0, r)$. The transformation $f$ defined by combining these pieces of transformations has the property that, when it is iterated $k$ times on a starting value $x$ in the range $[0, r)$, the $i$th iteration maps $x$ into the range $[ir, (i + 1)r)$, and that the behavior of this iteration modulo $r$ is the same as the composition $f_{i-1} \circ \cdots \circ f_0$.
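The combination described here can be sketched as follows, assuming each of the $k$ given bijections acts on the integer range $[0, r)$; the function and parameter names are our own.

```python
def combine(fs, r):
    """Combine bijections f_0..f_{k-1} on range(r) into one bijection F on
    range(k*r): block i applies f_i and forwards the result into block
    i+1, with the last block wrapping around modulo k*r."""
    k = len(fs)
    def F(x):
        i, y = divmod(x, r)          # block index and value within block
        return ((i + 1) * r + fs[i](y)) % (k * r)
    return F
```

Iterating the combined map $k$ times on a value in $[0, r)$ applies the whole sequence once and returns to $[0, r)$.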

Permuting the bits of a binary number
Binary rotation, or circular shift, operates on numbers in the range $[0, 2^k)$ as follows. Represent any number as a $k$-bit binary string, with the most significant bit on the left and least significant bit on the right. A left circular shift by $s$ units, for $1 \le s < k$, moves each bit value into the position $s$ steps to its left, with the most significant $s$ bits wrapping around into the least significant $s$ positions.

■
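Concretely, the one-unit left rotation is the two-piece map $x \mapsto 2x$ for $x < 2^{k-1}$ and $x \mapsto 2x + 1 - 2^k$ otherwise. A sketch, with a composition helper for multi-unit shifts (names are ours):

```python
def rotate_left_1(x, k):
    """Left circular shift by one position on k-bit values, written as a
    two-piece piecewise linear map."""
    if x < 2 ** (k - 1):           # top bit is 0: just double
        return 2 * x
    return 2 * x + 1 - 2 ** k      # top bit is 1: double, wrap it around

def rotate_left(x, k, s):
    """Left rotation by s units, as a composition of one-unit rotations."""
    for _ in range(s):
        x = rotate_left_1(x, k)
    return x
```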
This is just the riffle shuffle example described earlier, in the case where the number of values being shuffled is a power of two. By applying Lemma 6.3 we can compose multiple one-unit shifts to obtain circular shifts of larger numbers of units. We can also compose these shifts in more complex ways to obtain other bit permutations: L E M M A 6 .5. Let $S$ be a subset of $[0, k)$, interpreted as bit positions in the $k$-bit binary values, with 0 as the least significant (rightmost) position, and $k - 1$ as the most significant (leftmost) position. Then there exists a function $f$ on $k$-bit binary values that permutes the bits so that the positions in $S$ are moved to the $|S|$ most significant bits, such that both $f$ and $f^{-1}$ can be expressed as compositions of $O(k\,|S|)$ piecewise linear bijections with $O(1)$ pieces each.
P R O O F . We express $f$ as a composition of piecewise linear bijections using induction on $|S|$.
As base cases, if $|S| = 0$, we may let $f = f^{-1}$ be the identity function, expressed as a piecewise linear bijection with one piece. If $|S| = 1$, we let $f$ be the composition of a sufficient number of two-piece left-rotations (observation 6.4) to place the single element of $S$ into the most significant position; in this case, $f^{-1}$ is just the composition of a complementary number of left-rotations, modulo $k$. Otherwise, we construct the function $f$ as a composition of piecewise linear bijections as follows: Remove a single element from $S$, producing the smaller set $S'$. By induction, perform a composition $f'$ of piecewise linear bijections that places $S'$ into the most significant positions of the resulting permuted bit sequence. Let $p$ be the position in which these bijections leave the remaining element that was removed from $S$ to produce $S'$.
Perform $k - 1 - p$ left circular shifts using two-piece piecewise linear bijections, as described in observation 6.4. As a result, in the value resulting from these shifts, the bit that started out in position $p$ will be in the most significant position. The subinterval $[0, 2^{k-1})$ will contain the inputs for which this most significant bit is zero, and the subinterval $[2^{k-1}, 2^k)$ will contain the inputs for which it is one. The remaining bits of $S'$ will form a contiguous block elsewhere in the bit sequence; let $q$ be the number of positions separating the most significant bit from this contiguous block.
Perform $q$ left circular shifts of the low-order $k - 1$ bits. Each of these circular shifts can be performed as a four-piece piecewise linear bijection, obtained by applying observation 6.4 separately to the two subintervals $[0, 2^{k-1})$ and $[2^{k-1}, 2^k)$.
To construct the inverse $f^{-1}$ of the function $f$ that we constructed above, we simply reverse these steps. If the circuit has $g$ gates, each operating on $O(1)$ bits of a $w$-bit state, then the composition of the sequences described above for each gate gives us an overall sequence of $t = O(gw)$ piecewise linear transformations, with $O(1)$ pieces each, that implements the same function as the given circuit.
By Lemma 6.3 we can find an equivalent single piecewise linear transformation $F$, with the same number of pieces, operating on values of polynomially many bits, such that $F^{(t)}(x)$, for $x \in [0, 2^w)$, performs a single iteration of the given circuit. Therefore, for any $n$, we can apply the circuit $n$ times to input value $x$, by solving the piecewise linear bijection problem of computing $F^{(tn)}(x)$.
The construction of $F$ and of this problem instance from the circuit is a polynomial-time transformation, so this gives a many-one reduction from finding iterated values of reversible logic circuits to the piecewise linear bijection problem. ■

Integer interval exchange transformations
We conclude this section with a special case of the piecewise linear bijection problem that has a non-obvious polynomial time algorithm. We define the iterated integer interval exchange transformation problem to be the special case of the piecewise linear bijection problem in which each of the linear bijections of the given bijection has multiplier 1. In these piecewise linear bijections, each piece is just a translation, and the whole bijection is an interval exchange transformation.

Traversing the normal curve upwards from its central horizontal line, through the glued edges from top to bottom, and continuing upwards back to the same central line, permutes the branches of the curve according to the integer interval exchange transformation that maps [0, 3] ↦ [11, 14], [4, 5] ↦ [0, 1], 6 ↦ 10, and [7, 14] ↦ [2, 9]. Conversely, suppose we are given any system of normal coordinates that obey the triangle inequality and sum to an even number in each triangle. Consider a triangle whose three edges $i$, $j$, and $k$ have normal coordinates $c_i$, $c_j$, and $c_k$. Because the sum $c_i + c_j + c_k$ is even, by assumption, $c_i + c_j - c_k$ is even, as it differs from $c_i + c_j + c_k$ by the even number $2c_k$. Additionally, $c_i + c_j - c_k$ is non-negative, by the triangle inequality. It follows that if we define $x = (c_j + c_k - c_i)/2$, $y = (c_i + c_k - c_j)/2$, and $z = (c_i + c_j - c_k)/2$, then $x$, $y$, and $z$ are non-negative integers. We can construct a normal curve having this system of normal coordinates by placing $c_i$ crossing points on each edge $i$. Then, within each triangle, with $x$, $y$, and $z$ calculated as above, we draw $x$ line segments connecting the crossing points on edges $j$ and $k$ that are the $x$ nearest crossings to the shared vertex of $j$ and $k$. Symmetrically, we draw $y$ line segments connecting the crossing points on edges $i$ and $k$ that are the $y$ nearest crossings to the shared vertex of $i$ and $k$, and we draw $z$ line segments connecting the crossing points on edges $i$ and $j$ that are the $z$ nearest crossings to the shared vertex of $i$ and $j$. The resulting systems of line segments within each triangle link up to form a normal curve, whose normal coordinates as calculated above are exactly the numbers we are given.
Any two normal curves with the same coordinates necessarily have the same number of crossing points on each edge of the triangulation (given by the normal coordinates) and the same pattern of segments of the curve within each triangle (as described above). They can be mapped to each other by a homeomorphism of the surface that fixes the vertices of the triangulation, maps each edge to itself in a way that takes the crossing points of one normal curve to the crossing points of the other normal curve, and then maps the interior of each triangle to itself in a way that deforms one system of segments from one normal curve into the corresponding system of segments from the other normal curve.

■
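Applying an integer interval exchange transformation by direct iteration can be sketched as follows, using the example exchange above that maps [0, 3] to [11, 14], [4, 5] to [0, 1], 6 to 10, and [7, 14] to [2, 9]. This naive evaluation is illustrative only; it is not the non-obvious polynomial-time algorithm, and the helper names are ours.

```python
def make_iet(pieces):
    """Build an integer interval exchange transformation from pieces
    given as (lo, hi, offset): each x in [lo, hi] maps to x + offset."""
    def f(x):
        for lo, hi, offset in pieces:
            if lo <= x <= hi:
                return x + offset
        raise ValueError(f"{x} is outside the domain")
    return f

# The example transformation from the text, on the domain [0, 14].
example = make_iet([(0, 3, 11), (4, 5, -4), (6, 6, 4), (7, 14, -5)])
```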
In order to analyze the computational complexity of algorithms on normal curves we also need the following.
O B S E R VAT I O N 6 .1 0. The number of bits needed to specify a triangulated surface with $t$ triangles, and a normal curve on that surface with $n$ segments, is $O(t \log(t + n))$.

P R O O F .
The surface can be specified by numbering and orienting the triangles and, for each triangle, specifying its three neighboring triangles. This specification uses $3\lceil\log_2 t\rceil$ bits per triangle. Additionally, each normal coordinate is at most $n$ and specifying it takes at most $1 + \log_2 n$ bits for each of the $3t/2$ edges. ■ We need to specify, not only a curve on a surface, but a crossing point of the curve with an edge of the triangulated surface. To do so, it is helpful to introduce index numbers for these crossing points. We use two different forms of indexing, edge coordinates and arc coordinates. Choose an arbitrary orientation for each edge of a triangulated surface with a specified normal curve. Then, for this orientation, the edge coordinate of a point where a normal curve crosses an edge is just its position among the crossings on that edge.
Similarly, choose an arbitrary orientation for each arc of the specified normal curve, and designate one of the crossing points of each arc (chosen arbitrarily) as its starting point. Then, for this data, the arc coordinate of a point where a normal arc crosses an edge is its position among all of the crossings along the arc, in the order they are reached by following the arc from its starting point in the direction of the specified orientation.
Erickson and Nayyeri [16] provide several useful algorithms for manipulating normal curves, normal coordinates, and the edge coordinates and arc coordinates of their crossings. In particular, they show that the following computations can all be done in time polynomial in the bit complexity of the normal curve, as given by Observation 6.10:

Given a normal curve and the edge coordinate of a crossing point $p$ in this curve, find the normal coordinates that describe the arc of the normal curve containing $p$ [16, Theorem 6.2]. The algorithm constructs a street complex describing this arc, from which it is possible to convert edge coordinates in the given curve into edge coordinates in the arc and vice versa.
Given a normal curve consisting of a single arc, the edge coordinate of a crossing point $p$, and a choice of a starting point on that arc, find the arc coordinate of $p$ [16, Theorem 6.3].
Given a normal curve consisting of a single arc, the arc coordinate of a crossing point $p$, and a choice of a starting point on that arc, find the edge coordinate of $p$ [16, Theorem 6.4].

More precisely, the time for each of these operations is quadratic in the number of triangles in the triangulation and logarithmic in the total number of crossings of the normal curve.

OBSERVATION 3.4. Let $f$ be a polynomial-time bijection. Then the problem of iterating $f$ belongs to $\mathsf{IB}_{M,i}$.

PROOF. We reduce the computation of $f^{(n)}(x)$ to the iteration of an invertible function through the trivial summation method.

Figure 1. State space and transitions for Thomason's lollipop algorithm. From any Hamiltonian path (center) with a fixed starting vertex and edge (green), extending the other end of the path by one more edge (blue) can either produce a Hamiltonian cycle (left) or a "lollipop", a shorter cycle with a dangling path (right). Removing one edge from the cycle in a lollipop (red X) produces another Hamiltonian path with the same fixed starting vertex and edge.
Finding a second Hamiltonian cycle is one of the prototypical examples of a problem in the complexity class PPA, defined more generally in terms of finding a second odd-degree vertex in an implicit graph, although it is not known to be complete for PPA [44]. However, a solution of the PPA version of the problem is not required to be in the same component of the state space as the given Hamiltonian cycle, so PPA does not characterize the complexity of finding the same Hamiltonian cycle as the one found by Thomason's algorithm. Instead, we are interested in the complexity of a more specific problem, solved by Thomason's algorithm: given a Hamiltonian cycle, a fixed edge, and a fixed orientation for that edge, find the other Hamiltonian cycle from the same component of Thomason's state space. This is an instance of the second leaf problem, in an implicit graph of maximum degree two (not necessarily a linear forest). It is known that some instances may cause Thomason's lollipop algorithm to take an exponential number of steps [31,7,58,6], but while this settles the complexity of this specific algorithm, it leaves open the complexity of the functional problem solved by the algorithm.
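A single transition of this state space can be sketched in Python. This is an illustrative implementation under our own representation (a Hamiltonian path stored as a vertex list whose first vertex and first edge stay fixed); it is not code from the text:

```python
def lollipop_step(adj, path, next_vertex):
    """Perform one step of Thomason's lollipop walk in a cubic graph.

    adj: dict mapping each vertex to its set of neighbors.
    path: list of vertices forming a Hamiltonian path; path[0] and the
          edge path[0]-path[1] are the fixed starting vertex and edge.
    next_vertex: a neighbor of path[-1], other than path[-2], chosen as
          the edge along which to extend the free end.

    Returns ('cycle', path) if the extension closes a Hamiltonian cycle,
    or ('path', new_path) giving the next Hamiltonian path otherwise.
    """
    end = path[-1]
    assert next_vertex in adj[end] and next_vertex != path[-2]
    if next_vertex == path[0]:
        # The extension reaches the fixed start: a Hamiltonian cycle.
        return 'cycle', path
    # Otherwise a lollipop forms: next_vertex is interior to the path.
    # Removing the cycle edge that follows it reverses the tail.
    i = path.index(next_vertex)
    new_path = path[:i + 1] + path[i + 1:][::-1]
    return 'path', new_path
```

On the complete graph $K_4$, for example, the path `[0, 1, 2, 3]` either closes into a cycle (extending to vertex 0) or becomes the path `[0, 1, 3, 2]` (extending to vertex 1).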
We will consider in more depth the billiard-ball model, or BBM, block-cellular automaton, devised by Margolus to simulate reversible logic, universal Turing machines, and other reversible cellular automata [36]. This cellular automaton uses the Margolus neighborhood, in which the cells of a square grid are grouped into square blocks of four cells in two alternating ways, with the corners of the square blocks in even generations of the automaton forming the centers of the square blocks in odd generations (Figure 2, left). Cells have two states, dead or alive. The transition function of the automaton acts independently within each square block. In a block with a single live cell, the updated state again has a single live cell, in the diagonally opposite position. In a block with two diagonally placed live cells, all four cells change from live to dead or vice versa. All other blocks remain unchanged (Figure 2, right).

Figure 2. The billiard-ball model. Left: the Margolus neighborhood breaks up the square grid of cells into 2 × 2 square blocks in two alternating ways, as shown by the blue and red blocks. Right: blocks with one live cell, or with two diagonal live cells, change in the ways shown; all other blocks remain unchanged.
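The block rule can be implemented directly. A minimal Python sketch, assuming a toroidal grid represented as a set of live-cell coordinates (the representation and function name are ours):

```python
def bbm_step(live, width, height, parity):
    """One step of the billiard-ball model on a toroidal grid.

    live: set of (x, y) coordinates of live cells.
    parity: 0 for even generations, 1 for odd (blocks offset by one cell).
    Within each 2x2 block: a single live cell moves to the diagonally
    opposite corner; two diagonal live cells complement all four cells;
    every other block is unchanged.
    """
    new_live = set()
    for bx in range(parity, width + parity, 2):
        for by in range(parity, height + parity, 2):
            # cells in order (0,0), (0,1), (1,0), (1,1); the diagonal
            # pairs are indices (0, 3) and (1, 2)
            cells = [((bx + dx) % width, (by + dy) % height)
                     for dx in (0, 1) for dy in (0, 1)]
            state = [c in live for c in cells]
            n = sum(state)
            if n == 1:
                # move the single live cell to the opposite corner
                i = state.index(True)
                new_live.add(cells[3 - i])
            elif n == 2 and state[0] == state[3]:
                # two diagonal live cells: complement the whole block
                new_live.update(c for c, s in zip(cells, state) if not s)
            else:
                # all other blocks are unchanged
                new_live.update(c for c, s in zip(cells, state) if s)
    return new_live
```

A lone live cell travels diagonally, and a diagonal pair flips to the opposite diagonal, as the rule in the caption describes.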

Figure 3. A two-track automaton in which the update rule for the top track copies the left neighbor and the update rule for the bottom track copies the right neighbor.

5.3. The left and right tracks hold "signals" that move leftwards or rightwards through the automaton, interacting with other cells as they do. The leftmost cell of the initial pattern is an active cell in a "general" state, with the remaining active cells being "soldiers". The general sends out two signals to its left, and transitions to a "waiting" state, waiting for a signal to return from the left. The faster of the two signals sent out in the same direction by the general bounces off the boundary of the pattern, and meets the slower signal in the middle of the pattern. When the signals meet, their interaction produces two new generals in central cells, one sending signals to the left and the other to the right, each of which again transitions to a waiting state, waiting for a signal to return from the direction in which it was sent. In this way, the pattern is split recursively into two subpatterns that behave in the same way, recursively splitting into smaller subpatterns, until at the base level of the recursion all of the constant-length patterns fire simultaneously.

An important consideration, for our purposes, is the behavior of the "waiting" states. In a cell waiting for a signal from the left, the bijective transformation acts as the identity on the right track, and this track otherwise does not affect the behavior of the cell. The cells waiting for signals from the right are symmetric. Effectively, these states partition the pattern into parts that do not interact with each other, without the need for quiescent states. If the initial pattern of Imai and Morita, of length $n$, is run for $3n/2 + O(1)$ steps, it reaches a configuration in which the leftmost active cell is waiting on a signal from its right, and a central active cell is waiting on a signal from its left. The pattern of length $n/2$ between these two waiting cells will then fire in $3n/2 - O(1)$ more steps, regardless of any modification to the states in any other part of the pattern, because these two waiting cells block all interaction from other
parts of the pattern. We can eliminate the division by two in these formulas by doubling the size of the initial pattern, and formalize this as the following observation:

OBSERVATION. For the Imai–Morita firing squad automaton, for all $n$, there exist patterns of $n$ consecutive active cells that will remain active for $3n - O(1)$ steps and then simultaneously fire, regardless of how the cells outside this pattern are initialized. Additionally, these patterns can be constructed in time polynomial in $n$.

Figure 4. Helical boundary conditions for the two-dimensional Margolus neighborhood (shown here with circumference 32 in an exploded view with spacing between rows of squares) transform its behavior for a single time step into that of a two-track one-dimensional cellular automaton.

Figure 5. Left: the interval exchange transformation $x \mapsto (x + c) \bmod 1$. Right: more complicated interval exchange transformations can be used to model reflections in mirrored polygons.

Figure 6. A perfect riffle shuffle of $n$ cards can be represented as a piecewise linear transformation with two pieces acting on the first $\lceil n/2 \rceil$ cards (blue arrows) and the remaining $\lfloor n/2 \rfloor$ cards (magenta arrows).

A right circular shift by $s$ units performs the opposite transformation, moving each bit value $s$ steps to the right, with the least significant $s$ bits wrapping around into the most significant $s$ positions. A right circular shift by $s$ units is the same as a left circular shift by $b - s$ units.

OBSERVATION 6.4. A left circular shift by one unit on the range $[0, 2^b)$ can be expressed as a piecewise linear bijection with two pieces.

PROOF. It is the transformation that maps $x$ to $2x$ on the range $[0, 2^{b-1})$, and maps $x$ to $2x - 2^b + 1$ on the range $[2^{b-1}, 2^b)$. ■
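The two pieces of Observation 6.4 can be written out and checked against ordinary bit rotation. A Python sketch (the function name is ours):

```python
def left_rotate_one(x, b):
    """Left circular shift by one on b-bit values, written as a piecewise
    linear bijection with two pieces, each with multiplier 2."""
    half = 1 << (b - 1)
    if x < half:
        return 2 * x                  # piece 1: domain [0, 2^(b-1))
    return 2 * x - (1 << b) + 1       # piece 2: domain [2^(b-1), 2^b)
```

Both pieces have multiplier 2, so this is not yet an interval exchange; the interval exchange case restricts the multipliers to 1.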

Figure 7. A normal curve (light blue) on a triangulated double torus (black triangles and red vertices, glued from top to bottom and from left side to right side with the pairing indicated by the letters).

DEFINITION 6.8. The normal coordinates of a normal curve on a triangulated surface are a labeling of each edge $e$ of the triangulation by a non-negative integer $c_e$, the number of points of intersection between the curve and edge $e$ (Figure 8).

The following is standard:

OBSERVATION 6.9. The normal coordinates of any normal curve on any triangulated surface obey the triangle inequality in each triangle of the surface, and sum to an even number in each triangle of the surface. Any system of non-negative integer edge labels obeying these constraints defines a normal curve, which is unique up to homeomorphisms of the surface that map each vertex and edge of the triangulation to itself.

PROOF. Consider any normal curve $C$ on any triangulated surface, and a triangle $\Delta$ of the triangulation with edges $i$, $j$, and $k$. Within $\Delta$, $C$ must consist of some number $a \ge 0$ of segments crossing from $j$ to $k$, some number $b \ge 0$ of segments crossing from $i$ to $k$, and some number $c \ge 0$ of segments crossing from $i$ to $j$. Then its normal coordinates on these three edges are $c_i = b + c$, $c_j = a + c$, and $c_k = a + b$. Their sum is $2a + 2b + 2c$, an even number. They obey the triangle inequality because $c_i + c_j = 2c + a + b \ge a + b = c_k$.

Figure 8. A triangle in a triangulated surface (black), part of a normal curve (blue), and the normal coordinates of the three triangle edges (green).

THEOREM 6.12 (Bell). The iterated integer interval exchange transformation problem, for $n$ iterations of a transformation on $k$ intervals over the integers in the range $[0, m - 1]$, can be solved in time polynomial in $k$, $\log m$, and $\log n$.

PROOF. Given an integer interval exchange transformation $f$, a number of iterations $n$, and a starting value $x$, perform the following steps to compute $f^{(n)}(x)$:

Let $-$ be the polynomial-time invertible bijection that takes a $b$-bit binary number $x$ in two's-complement notation to its negation $-x$, with zero and the most negative value (binary $100 \cdots 0$) taken to themselves. Then $-$ is not a polynomial-time reversible function. More strongly, no $b$-input, $b$-output reversible logic circuit with gates of arity less than $b$ can compute $-$.

PROOF. Given any reversible circuit, match up inputs and outputs at each gate to form $b$ longer wires running through the entire circuit, and describe the function of the circuit as a permutation on the $2^b$ truth assignments to these $b$ wires, obtained by composing, in topological order, permutations at each gate. Each gate's permutation is even, because each cycle in its permutation is paired with another cycle, obtained by negating a value unused by the gate. Therefore, the function of the whole circuit is also an even permutation. However, binary negation swaps $(2^b - 2)/2$ pairs of values, an odd number for $b > 1$, giving an odd permutation.
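The parity count in this proof can be confirmed exhaustively for small $b$: two's-complement negation fixes exactly two values and is an odd permutation. A Python sketch (the function names are ours):

```python
def permutation_parity(perm):
    """Return True if the permutation (given as a list) is odd, by
    counting transpositions cycle by cycle."""
    seen = [False] * len(perm)
    transpositions = 0
    for i in range(len(perm)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            transpositions += length - 1  # a cycle of length L needs L-1 swaps
    return transpositions % 2 == 1

def negation_permutation(b):
    """x -> -x mod 2^b, i.e. two's-complement negation on b-bit values."""
    n = 1 << b
    return [(-x) % n for x in range(n)]
```

For every $b > 1$ the permutation consists of $(2^b - 2)/2$ transpositions plus two fixed points, so its parity is odd, matching the proof.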

OBSERVATION 3.2. $\mathsf{IB}_{T,i} = \mathsf{IB}_{T,r}$ and $\mathsf{IB}_{M,i} = \mathsf{IB}_{M,r}$.
LEMMA 3.3. $\mathsf{IB}_{T,b} = \mathsf{IB}_{M,b}$ and $\mathsf{IB}_{T,i} = \mathsf{IB}_{M,i}$.

PROOF. Let $F$ be a functional problem in $\mathsf{IB}_{T,b}$ or $\mathsf{IB}_{T,i}$. This means that $F$ can be solved by an algorithm $A_F$ that takes polynomial time outside of a polynomial number of calls to a subroutine for computing $f^{(n)}(x)$ for a fixed polynomial-time bijection $f$.

Our overall simulation will only perform this step with the second and third components initially zero, transforming $(x, 0, 0)$ to $(x, f(x), 0)$, corresponding to Step 2 of Lemma 2.2. If $c_2 = 1$, the function $g$ replaces the first component with its exclusive-or with the second. When applied to $(x, f(x), 0)$, this transforms it into $(x \oplus f(x), f(x), 0)$, corresponding to Step 3 of Lemma 2.2. If $1 < c_2 < 2^m + 2$, the function $g$ checks whether $f$, applied to the candidate value determined by the counter $c_2$, equals the second component, and if so replaces the first component with its exclusive-or with that candidate; then, regardless of the outcome of the check, it increments $c_2$ modulo $2^m$. When applied to $(x \oplus f(x), f(x), 0)$, these steps use the trivial summation method to find $x$ and exclusive-or it into the first component, producing $(f(x), f(x), 0)$ as in Step 8 of Lemma 2.2. If $c_2 = 2^m + 2$, the function $g$ replaces the second component with its exclusive-or with the first. When applied to $(f(x), f(x), 0)$, this produces $(f(x), 0, 0)$, ready for another iteration. For all other values of $c_2$, $g$ does nothing to the three components. The changes to $c_1$ and $c_2$ in each computation of $g$ are easily inverted, and other than those changes the only effect of $g$ is to perform an exclusive-or into one of the three components, with a value computed only from the other two components, an operation that is its own inverse. Therefore, $g$ is a polynomial-time invertible bijection, and we have shown how to compute the iterated values of a polynomial-time bijection $f$ by iterating a different polynomial-time invertible bijection $g$. ■

THEOREM 3.6. For $X \in \{M, T\}$ and $y \in \{b, i, r\}$, the complexity classes $\mathsf{IB}_{X,y}$ are all equal.

PROOF. From their definitions, these classes are naturally partially ordered by inclusion, with $\mathsf{IB}_{M,y} \subseteq \mathsf{IB}_{T,y}$ and $\mathsf{IB}_{X,r} \subseteq \mathsf{IB}_{X,i} \subseteq \mathsf{IB}_{X,b}$, so we need only show that every problem from the largest class in this partial order, $\mathsf{IB}_{T,b}$, is contained within the smallest class in this partial order, $\mathsf{IB}_{M,r}$. Therefore, let $F$ be a functional problem in $\mathsf{IB}_{T,b}$, meaning that it can be reduced by a Turing reduction to the iteration of a polynomial-time bijection $f$. By composing this Turing reduction with the many-one reduction of Lemma 3.5 to produce another Turing reduction, $F \in \mathsf{IB}_{T,i}$. By Lemma 3.3, $F \in \mathsf{IB}_{M,i}$. And by Observation 3.2, $F \in \mathsf{IB}_{M,r}$. ■

Because of this equivalence, it is justified to drop the subscripts and use $\mathsf{IB}$ to refer to any of the six equivalent complexity classes of Theorem 3.6. We will later see (Theorem 4.3) that $\mathsf{IB} = \mathsf{FP}^{\mathsf{PSPACE}}$. For now, we prove the easy direction of this equivalence.

OBSERVATION 3.7. $\mathsf{IB} \subseteq \mathsf{FP}^{\mathsf{PSPACE}}$.

PROOF. Any function in $\mathsf{IB}$ can be computed by performing a polynomial-time reduction and then repeatedly computing a polynomial-time function, using a counter to keep track of the number of iterations performed. The reduction, the computation of the function, and the counter all use only polynomial space. Therefore, the $i$th bit of output of a function in $\mathsf{IB}$ belongs to $\mathsf{PSPACE}$, and all polynomially many bits of output can be obtained by using a polynomial number of calls to a $\mathsf{PSPACE}$ oracle to obtain the bits for each different value of $i$. ■

OBSERVATION 3.8. Define $C$ to be the functional problem whose input is a specification of a reversible logic circuit composed of universal reversible logic gates, a number $n$, and a Boolean assignment to each input wire of the circuit, and whose output is the result of applying the circuit $n$ times, passing its outputs back to its inputs. Then $C$ is complete for $\mathsf{IB}$ under both many-one and Turing reductions.
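The "trivial summation method" invoked above inverts a bijection without a known inverse by testing every candidate preimage and exclusive-oring the unique match into an accumulator. A Python sketch of this idea, with our own naming:

```python
def invert_by_trivial_summation(f, y, m):
    """Find the unique x in [0, 2^m) with f(x) == y, for a bijection f
    on m-bit values, by XOR-accumulating every candidate that maps to y.
    Exactly one candidate matches, so the accumulator ends up holding
    the preimage of y. The loop is exponential in m, but each individual
    test is cheap, which is what lets the loop be spread across the
    counter-driven iterations of an invertible bijection."""
    acc = 0
    for z in range(1 << m):
        if f(z) == y:
            acc ^= z
    return acc
```

Because the accumulation is a sequence of self-inverse exclusive-or operations, each individual step of the loop is invertible even though $f^{-1}$ is never computed directly.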
A parameterized family of implicit graphs is defined by a polynomial-time function $N(G, v)$ that takes as input two bitstrings $G$ and $v$, where $G$ identifies a specific implicit graph and $v$ names a vertex within that graph, and that produces as output a finite sequence of distinct bitstrings of equal length to $v$, naming the neighbors of $v$ in $G$. If $v$ is invalid (meaning that it does not name a vertex in the graph specified by $G$), $N$ should output a failure condition, again in polynomial time. A parameterized family is undirected if, whenever $u$ belongs to the output of $N(G, v)$, $v$ symmetrically belongs to the output of $N(G, u)$. It is bivalent if every call to $N$ produces at most two neighbors.

DEFINITION 4.1. A connected leaf problem is defined by an undirected bivalent family of implicit graphs, defined by a polynomial-time function $N$. An input to a problem defined in this way consists of input values $G$ and $v$ such that the output of $N(G, v)$ has length exactly one. The output of a connected leaf problem, defined in this way, is a bitstring describing a vertex of degree one in the same connected component as $v$ of the implicit graph defined by $N$ and $G$. When the input does not have the correct form ($N(G, v)$ produces a failure condition or the wrong number of neighbors), the output is undefined.

Thus, the problem of duplicating the output of Thomason's lollipop algorithm is a connected leaf problem in which $G$ describes the underlying cubic graph in which a second Hamiltonian cycle is to be found, $v$ encodes a description of a Hamiltonian path in this underlying graph, and $N$ performs a single step of the lollipop algorithm described above, for each of the two ways of extending the path described by $v$, and outputs the Hamiltonian path or paths that result from this step.

THEOREM 4.2. Every connected leaf problem belongs to $\mathsf{IB}$.

PROOF. Given a connected leaf problem defined by a polynomial-time function $N(G, v)$, we find a Turing reduction from instances $(G, v)$ to equivalent problems of iterated bijection, as follows. Let $b$ be the number of bits in the bitstring $v$; by the way we have defined parameterized families of implicit graphs, all vertices in the connected component of $v$ in the implicit graph specified by $G$ have the same number $b$ of bits in their descriptions. We construct a polynomial-time bijection $g$ whose inputs and outputs are triples $(c, u, w)$ of $b$-bit values. In these triples, $c$ can be interpreted as a number modulo $2^b$, at least as large as the number of vertices in the component of $v$. The two remaining values $u$ and $w$ in these triples should be interpreted as describing adjacent vertices in the graph specified by $G$. However, to formulate this as a problem in $\mathsf{IB}$, the bijection $g$ that we construct cannot depend on this interpretation: it must be capable of handling values $u$ and $w$ that do not specify vertices, or that specify vertices that are not adjacent. We compute the value of $g$ according to the following case analysis. First, use $N$ to compute the neighbors $N(G, u)$ and $N(G, w)$ of $u$ and $w$. If either call returns a failure condition, or if the two vertices are not neighbors, return the input $(c, u, w)$ unchanged as the output. If $c > 0$, check whether $u$ has one neighbor. If so, return $((c + 1) \bmod 2^b, u, w)$. However, if $u$ has two neighbors, return $(c, u, w)$ unchanged. In the remaining case, $c = 0$. If $w$ has one neighbor, return $(1, w, u)$. Otherwise, $w$ has two neighbors, $u$ and another vertex $t$. In this case, return $(0, w, t)$.

If $u$ is a leaf vertex in the implicit graph described by $G$, and $w$ is its one neighbor, then iterating this function starting from $(0, u, w)$ has the effect of walking in one direction along a path, waiting $2^b$ steps, and then walking in the same way in the opposite direction along the path, acting bijectively on all of the triples of values seen in this walk. If $u$ and $w$ are neighbors in a cycle of the implicit graph, then iterating this function starting from $(0, u, w)$ has the effect of walking from arc to arc around the cycle, with no waiting steps. For all of the remaining triples of values, this function acts as the identity. Therefore, in all cases it is bijective. Each step involves only two calls to the polynomial-time function $N$, and simple case analysis, so it takes polynomial time to compute $g$.

We can solve a connected leaf problem with neighbor function $N$ and data $G$, $v$ by first using $N$ to find the neighbor $u$ of $v$, then iterating the function $g$ constructed above for $2^b$ iterations starting from $(0, v, u)$ to produce another triple $(c', u', w')$, and finally returning $u'$. By the construction of $g$, iterating it will necessarily reach the leaf at the other end of the component of $v$ in fewer than $2^b$ steps and then wait for $2^b$ steps while incrementing the first component of the triple modulo $2^b$ until it reaches zero again. If we iterate $g$ for exactly $2^b$ steps, the resulting triple $(c', u', w')$ will necessarily be part of this waiting stage of the dynamics of $g$, and the returned value $u'$ will necessarily be the other leaf connected to $v$, as desired. ■

THEOREM 4.3. $\mathsf{IB} = \mathsf{FP}^{\mathsf{PSPACE}}$.

PROOF. This follows immediately from the fact that $\mathsf{IB} \subseteq \mathsf{FP}^{\mathsf{PSPACE}}$ (Observation 3.7), from Theorem 4.2, and from the known $\mathsf{FP}^{\mathsf{PSPACE}}$-completeness of the connected leaf problem [2, 44, 34]. ■
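For comparison, a connected leaf problem can also be solved directly, without the bijection machinery, by walking the degree-at-most-two component from one leaf to the other; this baseline takes time proportional to the component size, which may be exponential in the bit length of a vertex name. A Python sketch (names are ours):

```python
def other_leaf(neighbors, G, v):
    """Walk a degree-<=2 component of an implicit graph from the leaf v
    to the leaf at its other end. neighbors(G, u) returns the tuple of
    neighbors of u in the graph identified by G. Requires v to have
    exactly one neighbor. Runs in time linear in the component size,
    which can be exponential in the bit length of v."""
    (u,) = neighbors(G, v)  # unpacking enforces that v is a leaf
    prev, cur = v, u
    while True:
        ns = neighbors(G, cur)
        if len(ns) == 1:
            return cur  # reached the other leaf
        a, b = ns
        prev, cur = cur, (b if a == prev else a)  # step away from prev
```

The point of Theorem 4.2 is that the same answer can be produced within $\mathsf{IB}$, where the walk is packaged as iteration of a single polynomial-time bijection.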
We formulate the iterated piecewise linear bijection problem as follows: the input is a triple $(n, x, f)$, where $n$ and $x$ are integers, and $f$ is a piecewise linear bijection on a range of integers that includes $x$, described by specifying the integer endpoints of each piece of the bijection, together with two integer coefficients for the linear transformation of that piece. The output is $f^{(n)}(x)$.
PROOF. Let $g$ be the composition of a sequence of $k$ piecewise linear bijections $f_1, f_2, \ldots, f_k$ on the integers in the range $[0, m)$. Then there exists a single piecewise linear bijection $h$ on the range $[0, m)$ such that, for all $x$ in $[0, m)$, $h(x) = g(x)$. The number of pieces needed to define $h$ as a piecewise linear transformation is at most the sum of the numbers of pieces in each $f_i$.
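One way such a composition bound can support fast iteration, in the spirit of Theorem 6.12, is to represent an integer interval exchange as a list of translation pieces, compose pieces explicitly, and apply binary exponentiation so that only $O(\log n)$ compositions are needed. A Python sketch under our own representation, assuming the pieces partition a common range $[0, m)$:

```python
def apply_iet(pieces, x):
    """pieces: list of (start, end, offset); maps x in [start, end) to x+offset."""
    for s, e, d in pieces:
        if s <= x < e:
            return x + d
    raise ValueError("x outside domain")

def compose(f, g):
    """Pieces of the map x -> g(f(x)). Breakpoints of the composition are
    breakpoints of f together with f-preimages of breakpoints of g, so the
    piece count is bounded by the sum of the two piece counts."""
    cuts = set()
    for s, e, d in f:
        cuts.update((s, e))
        for s2, e2, _ in g:
            # pull g's breakpoints back through the translation of this piece
            for c in (s2 - d, e2 - d):
                if s < c < e:
                    cuts.add(c)
    cuts = sorted(cuts)
    out = []
    for s, e in zip(cuts, cuts[1:]):
        if any(fs <= s < fe for fs, fe, _ in f):
            d = apply_iet(g, apply_iet(f, s)) - s  # constant on [s, e)
            out.append((s, e, d))
    return out

def iterate_iet(f, n):
    """Pieces of f^(n), by binary exponentiation: O(log n) compositions."""
    result = [(s, e, 0) for s, e, _ in f]  # identity on f's domain
    power = f
    while n:
        if n & 1:
            result = compose(result, power)
        power = compose(power, power)
        n >>= 1
    return result
```

For the cyclic rotation $x \mapsto (x + 1) \bmod 8$, written as the two pieces `[(0, 7, 1), (7, 8, -7)]`, the third iterate computed this way agrees with $x \mapsto (x + 3) \bmod 8$ on the whole range.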