An Algorithm For Synthesis of Reversible Logic Circuits
An Algorithm For Synthesis of Reversible Logic Circuits
Abstract—Reversible logic finds many applications, especially in Computation in which there is no information loss is called
the area of quantum computing. A completely specified n-input, reversible and the gates that perform such computation are
n-output Boolean function is called reversible if it maps each called reversible gates. Bennett [3] showed that zero energy
input assignment to a unique output assignment and vice versa.
Logic synthesis for reversible functions differs substantially from dissipation is possible only if a circuit contains reversible gates;
traditional logic synthesis and is currently an active area of re- hence, reversibility will be an important issue in future circuit
search. The authors present an algorithm and tool for the synthesis design. Reversible logic finds many applications, especially in
of reversible functions. The algorithm uses the positive-polarity the area of quantum computing. It is believed that quantum
Reed–Muller expansion of a reversible function to synthesize the computing has the ability to considerably speed up some clas-
function as a network of Toffoli gates. At each stage, candidate
factors, which represent subexpressions common between the sical problems such as factorization (using Shor’s algorithm)
Reed–Muller expansions of multiple outputs, are explored in the or search (using Grover’s algorithm). Quantum gates are re-
order of their attractiveness. The algorithm utilizes a priority- versible by nature [4], which provides a powerful motivation
based search tree, and heuristics are used to rapidly prune the to study reversible circuits.
search space. The synthesis algorithm currently targets the gen- The problem of reversible logic synthesis is concerned with
eralized n-bit Toffoli gate library. However, other algorithms
exist that can convert an n-bit Toffoli gate into a cascade of the ability to automatically generate a reversible circuit given a
smaller Toffoli gates. Experimental results indicate that the au- reversible specification. It is different from traditional Boolean
thors’ algorithm quickly synthesizes circuits when tested on the logic synthesis in five major ways [5]. 1) Unlike Boolean
set of all reversible functions of three variables. Furthermore, it is circuits, reversible circuits have an equal number of inputs and
able to quickly synthesize all four-variable and most five-variable outputs. 2) Fanout is not allowed, and the circuit structure is
reversible functions that were in the test suite. The authors
also present results for some benchmark functions widely dis- constrained to a cascade of reversible gates. 3) Every output of
cussed in literature and some new benchmarks that the authors a gate that is not used in the circuit is a garbage signal. A good
have developed. The algorithm is shown to synthesize many, but synthesis method minimizes the number of garbage signals.
not all, randomly generated reversible functions of as many as 4) The total number of constants at inputs of the gates should
16 variables with a maximum gate count of 25. be kept as low as possible. 5) There are no feedback paths, and
Index Terms—Quantum computing, reversible computing, hence, the circuit is acyclic. These differences make traditional
reversible logic synthesis. synthesis methods inapplicable.
In this paper, we present Reed–Muller reversible logic
I. I NTRODUCTION synthesizer (RMRLS), an algorithm and tool that uses the
positive-polarity Reed–Muller (PPRM) expansion of a re-
Fig. 3. (a) NOT gate, (b) CNOT gate, (c) n-bit Toffoli gate, and (d) circuit for
function in Fig. 1.
B. Reversible Gates
Fig. 2. Augmented full-adder. (a) Original truth table. (b) Possible reversible A reversible gate implements a reversible function. There are
specification. two main types of reversible gates, namely: 1) Toffoli [8] and
2) Fredkin [9]. Since our algorithm currently does not utilize the
described in detail in Section IV. Our experimental results Fredkin gate, we do not discuss it further in this paper. An n-bit
are presented in Section V. The conclusions are given in Toffoli gate, denoted by TOFn(x1 , x2 , . . . , xn ), passes the first
Section VI. n − 1 inputs (referred to as control bits) to the output unaltered
and inverts the nth input (referred to as target bit) if the first
II. B ACKGROUND n − 1 inputs are all one. That is
We present some preliminary concepts in this section. yi = xi , for 1 ≤ i < n
yn = xn ⊕ x1 x2 · · · xn−1 . (1)
A. Reversible Functions
A completely specified n-input, n-output Boolean function A one-bit Toffoli gate inverts the input unconditionally and
is reversible if it maps each input assignment to a unique is called the NOT gate. A two-bit Toffoli gate is called the
output assignment and vice versa [6]. A reversible function of Feynman or CNOT gate. Fig. 3 presents a graphical represen-
n variables can be defined as a truth table or as a permutation tation of NOT, CNOT, and n-bit Toffoli gates and the circuit
on the set of integers {0, 1, . . . , 2n − 1}. For example, the implementation of the reversible function in Fig. 1.
reversible function in Fig. 1 can also be specified as {1, 0, 7,
2, 3, 4, 5, 6}. An irreversible function can be converted into
C. Reed–Muller Expansions
a reversible function by adding extra outputs, called garbage
outputs, such that the input–output mapping is unique. If the Any Boolean function can be described using an EXOR
maximum number of identical output vectors is p, then the sum-of-products (ESOP) expansion [10], [11]. The PPRM
total number of garbage outputs needed is equal to log2 p expansion uses only uncomplemented variables and can be
[2]. Of course, constant garbage inputs must then be added, as derived from the function’s sum-of-products expression. The
necessary, to balance the number of inputs and outputs. PPRM expansion of a function is canonical and of the form
To demonstrate the process of converting an irreversible
function into a reversible one, consider the augmented full- f (x1 , x2 , . . . , xn ) = a0 ⊕ a1 x1 ⊕ · · · ⊕ an xn ⊕ a12 x1 x2
adder, which produces carry (co ), sum (so ), and propagate (po ) ⊕ a13 x1 x3 · · · ⊕ an−1,n xn−1 xn
signals. Its truth table is shown in Fig. 2(a). The function is not ⊕ · · · ⊕ a12···n x1 x2 · · · xn (2)
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
GUPTA et al.: AN ALGORITHM FOR SYNTHESIS OF REVERSIBLE LOGIC CIRCUITS 2319
where ai ∈ {0, 1} and xi are all uncomplemented (positive In this paper, the PPRM expansion of a function was obtained
polarity). For example, the PPRM expansion of the function in one of the following ways.
in Fig. 1 is as follows: 1) Completely specified reversible functions: EXORCISM-
4 was used to convert the specification into the ESOP
ao = a ⊕ 1 form, which was then converted into the PPRM form.
2) Benchmark functions: For the benchmarks for which
bo = b ⊕ c ⊕ ac
the specification was incomplete, the PPRM expansion
co = b ⊕ ab ⊕ ac. (3) was derived manually by examining the synthesized cir-
cuits reported by other researchers [13]. For the bench-
marks that we have developed, we provide the complete
D. Quantum Cost specification.
The quantum cost of a reversible circuit is the sum of the
quantum cost of its gates. The quantum cost of a gate G is the III. P REVIOUS W ORK
number of elementary quantum operations required to realize In this section, we present a survey of some of the other
the function given by G [2]. These elementary operations are reversible logic synthesis algorithms that have been proposed.
performed by the NOT, CNOT, and three-bit Toffoli gates. NOT An exhaustive algorithm is presented in [16], which
and CNOT gates have a quantum cost of one. However, they are generates all possible circuits containing k gates for increasing
not complete because they only realize linear functions. The values of k until a circuit is found that implements the given
addition of the three-bit Toffoli gate makes the set of gates specification. This is an iterative-deepening approach, and it
complete (i.e., have the ability to implement any reversible can be seen that the result will be optimal. However, the ap-
function). However, the three-bit Toffoli gate cannot be realized plicability of this method is restricted to reversible functions of
as a single elementary operation. Fortunately, a realization for at most three or four variables that require eight or fewer gates
the three-bit Toffoli gate with a quantum cost of five has been in their implementation. Important theoretical results on the
found [12]. Larger Toffoli gates have a higher quantum cost synthesis properties of even and odd permutation functions are
due to the number of elementary quantum operations required also presented. In [17], the authors introduce local optimization
for their realizations. We use the cost table available from [13] (similar to peephole optimization in compilers) for reversible
to calculate the cost for such gates. circuits where suboptimal subcircuits are replaced with
Given the current state of technologies for implementing smaller counterparts to simplify and reduce the overall size of
quantum circuits, it is believed that an n-bit Toffoli (n > 3) gate the circuit.
will have a high technological cost. These gates are expected An algorithm based on the Rademacher–Walsh spectrum is
to be macros that will be implemented by elementary gates. presented in [18]. At any given stage, the circuit is synthesized
Theoretical lower and upper bounds on the number of elemen- from inputs to outputs or vice versa depending upon the best
tary gates required to implement an n-bit Toffoli gate are given translation (i.e., an application of a generalized n-bit Toffoli
in [12] and [14]. (GT) gate) that is possible. The best translation is determined
to be that which results in the maximum positive change in
the complexity measure of the function. Because there is no
E. Generating PPRM Expansions
backtracking or look-ahead, an error is declared if no translation
Generating the PPRM expansion of a Boolean function is can be found. The authors are working on a formal proof to
nontrivial. There are two issues that complicate the process. show that the algorithm will always synthesize a valid circuit
First, most functions encountered in practice are irreversible given enough time and memory. Their experimental results
and/or incompletely specified. To make a function reversible, indicate that the method holds promise and needs to be further
it is necessary to add garbage inputs and outputs and assign investigated.
bit values (i.e., 0/1) to them. Determining which combination, In [5], the authors present an iterative algorithm to implement
among the possible assignments, will lead to the best synthe- an incompletely specified Boolean function as a cascade of
sized circuit, given the optimization criteria (e.g., least number reversible complex Maitra terms (also known as reversible wave
of Toffoli gates, minimum quantum cost, etc.), is a challenging cascades). The remarkable feature about this algorithm is that
and open problem. The second problem is that it is necessary to it requires at most one constant input and no garbage outputs.
convert a reversible function into the ESOP form before it can The basic idea is that a cascade implementation of the original
be expanded into the PPRM form. Fortunately, this problem incompletely specified function is equivalent to the cascade
has been addressed, and researchers have developed a state- implementation of a completely specified function that has the
of-the-art tool called EXORCISM-4 [15] that uses efficient same ON-set (minterms) and OFF-set (maxterms) as the original
heuristics and look-ahead strategies to quickly find the ESOP function. First, the set of completely specified functions repre-
form of a Boolean function. Once the ESOP form has been senting a stage of a cascade is computed. Next, the remainder
obtained, it can be transformed into the PPRM form by making function is calculated assuming that the completely specified
the substitution ā = a ⊕ 1 on all the complemented variables, functions will be used in the cascade. The stages are added to
algebraically expanding the product terms, and canceling out an the synthesized circuit if the remainder function is independent
even number of identical product terms. of at least one (or more) variable(s).
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
2320 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 11, NOVEMBER 2006
A bidirectional algorithm is described in [7], in which syn- total number of terms in the PPRM expansion of f is stored in
thesis proceeds by adding gates to the circuit either at its initT erms, and a timer (T imer) is created that specifies the
inputs or outputs. Synthesis is complete when the reversible time limit for synthesis.
specification has been transformed into the identity function The root node (rootN ode) of the search tree is initial-
(i.e., aout = a, bout = b, cout = c, . . .). This is achieved by ized next. The depth and factor of this node are set to 0
inspecting the truth table in lexicographical order until the first and NULL, respectively. The PPRM expansions of all output
output assignment is encountered, which is not equal to the variables vout,i in f are obtained in terms of all its input
input assignment located in the same row of the table. A series variables vi and stored in rootN ode.pprm. rootN ode.terms
of Toffoli and/or Fredkin gates must be added in such a way that and rootN ode.elim contain the total number of terms in the
the output assignment becomes the same as the input assign- current PPRM expansion and the total number of terms that
ment. However, mapping each output back to its corresponding have been eliminated from the original PPRM expansion once
input is often not the best mapping. An output permutation, a substitution is made, respectively. Because this is the root
which maps output vout to a different input rather than its node, rootN ode.elim is set to zero. In addition, its priority
corresponding input v (e.g., aout = b, bout = a, cout = c, . . .), rootN ode.priority is set to infinity. Finally, an empty priority
is used to find a better mapping. An output permutation is useful queue P Q is initialized, and the root node is pushed onto the
because it may lead to a simpler specification that needs to priority queue. This will be the first node that will be explored
be synthesized. The interesting feature about this algorithm is during synthesis. The priority queue maintains a list of nodes
that it is guaranteed to synthesize a valid circuit given enough sorted with respect to their priorities.
time and memory. However, the method synthesizes circuits After the initialization phase, the algorithm enters a loop.
that frequently contain sequences of gates that can be simpli- The most promising node for further exploration is removed
fied. Thus, a processing step is applied postsynthesis where from the priority queue and stored in parentN ode (line 15).
such sequences, called templates, are identified and simplified. Lines 16 and 17 check to see whether parentN ode is worth
Templates were first introduced in [19] and later generalized exploring or not. If the depth of parentN ode is greater than
in [20]–[22]. or equal to bestDepth − 1, then parentN ode can be ignored
Finally, the algorithm in [6] has the ability to utilize any as it cannot possibly lead to a better solution than the best one
arbitrary gate library (i.e., arbitrary subsets of NOT, CNOT, seen so far.
n-bit Toffoli, SWAP, and n-bit Fredkin gates) during synthesis. We explore parentN ode by examining each output variable
The main idea is that given a reversible specification, all gates in vout,i in the PPRM expansion of f (lines 18 and 19). For each
a given library are possible candidates for application. For each input variable vi , we search for factors in the PPRM expansion
gate, assuming it is used, the remainder function is calculated of vout,i contained in parentN ode.pprm that do not contain
using shared binary decision diagrams with complementary vi (line 20). For example, if aout = a ⊕ 1 ⊕ bc ⊕ ac, then the
edges. The complexity measure of the remainder function is appropriate factors are 1 and bc, as neither contains literal a. For
calculated, and the gate that results in the lowest complexity each factor (f actor) that has been identified in this manner,
measure is chosen to be added to the synthesized circuit. Syn- the substitution vi = vi ⊕ f actor is made in the PPRM
thesis then proceeds on the remainder function. If two or more expansions of parentN ode. A new node is created, which
gates yield the lowest complexity, multiple paths are explored is a child of parentN ode. The depth of the child node is
simultaneously. incremented by one (line 23), and a copy of f actor is stored
(line 24). The PPRM of the child node is set to be the PPRM
obtained once the substitution has been made (line 25).
IV. S YNTHESIS A LGORITHM
The number of terms in the new PPRM and the number
We describe our synthesis algorithm in this section. We of terms eliminated by making the substitution are stored
describe the basic algorithm first. We then describe some in childN ode.terms and childN ode.elim, respectively
heuristics that are used to improve the performance of the basic (lines 26 and 27). Finally, childN ode is analyzed, and one of
algorithm. The input to our algorithm is a PPRM expansion of the following actions is taken.
a reversible function f (v1 , v2 , v3 , . . . , vn ) that is to be synthe-
sized. The output is a network of Toffoli gates that realizes f . 1) If the synthesis of f has been completed (i.e., the PPRM
We illustrate the application of our algorithm with an example expansions for all vout,i contain only vi ), then the values
and comment on the data structures that we use and whether the of bestDepth and bestSolN ode are updated if this solu-
algorithm converges or not. tion improves upon the best solution found so far (lines
28–30).
2) If the number of terms in the PPRM expansion
A. Algorithm
has not decreased by making the substitution (i.e.,
Fig. 4 outlines the pseudocode for the main steps in the basic childN ode.elim ≤ 0), then the node is also disregarded
version of our synthesis algorithm. Initialization takes place (line 31). This guarantees that we only explore those
in lines 1–13. Variable bestDepth, which stores the number nodes where the number of terms in the PPRM expansion
of gates in the best circuit synthesized for f so far, is set to is decreasing monotonically with the application of each
infinity. Variable bestSolN ode stores a pointer to a leaf node, substitution. Otherwise, the priority of childN ode is cal-
which represents the last gate of the synthesized circuit. The culated, and the node is inserted into the priority queue.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
GUPTA et al.: AN ALGORITHM FOR SYNTHESIS OF REVERSIBLE LOGIC CIRCUITS 2321
The priority of childN ode is calculated as follows: explore. The latter condition states that we have reached the
time limit for synthesis. Upon termination, bestSolN ode con-
childN ode.priority = α∗childN ode.depth tains a pointer to the leaf node, which represents the last gate of
the synthesized circuit. The path from rootN ode of the search
β∗childN ode.elim
+ − γ∗f actor.literalCount (4) tree to bestSolN ode represents the series of Toffoli gates in
childN ode.depth the synthesized circuit. The edges of the path represent the
substitutions that were made. For each node n along this path,
where α, β, and γ are weights that sum up to one. In (4), the n.f actor contains a copy of the substitution vi = vi ⊕ f actor.
first term gives preference to nodes at a larger depth (i.e., depth- Hence, vi is the target bit, and the literals in f actor represent
first search) as all things being equal, they are more likely to be the control bits of the Toffoli gate.
closer to a solution. The second term addresses the primary ob-
jective of minimizing the number of Toffoli gates. The number
B. Example
of terms eliminated per stage is used to measure a node’s effec-
tiveness. Finally, the third term addresses the secondary objec- Fig. 5 illustrates the application of our basic synthesis al-
tive of minimizing the number of control bits of the individual gorithm to the reversible function in Fig. 1. In the figure, the
Toffoli gates. After careful experimentation, values of 0.3, 0.6, darkly shaded nodes have already been explored, lightly shaded
and 0.1 for α, β, and γ, respectively, were used in this paper. nodes have been added in the current stage, and nodes with no
The algorithm repeats the above process until the priority shading are yet to be considered. Initially, the PPRM expansion
queue becomes empty or the timer expires. The former con- of the function is stored in N ode 0, which is inserted into
dition implies that there are no more candidate nodes left to the priority queue. Fig. 5(a) shows the search tree and priority
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
GUPTA et al.: AN ALGORITHM FOR SYNTHESIS OF REVERSIBLE LOGIC CIRCUITS 2323
queue at this point. In the next step, N ode 0 is popped from we do not need to store the PPRM expansion at the intermediate
the priority queue (it is the only item present) and examined for nodes. We only need to store the substitution that was made
possible substitutions. The algorithm identifies three possible in the nonleaf nodes. This memory optimization is very useful
substitutions, namely: 1) a = a ⊕ 1; 2) b = b ⊕ c; and 3) b = when synthesizing larger functions.
b ⊕ ac. For each substitution, a new node is created, the factor
that is identified is substituted in the PPRM expansion to get
D. Additional Substitutions
the new PPRM expansion. Because all three substitutions result
in fewer terms in the new PPRM expansion, all the nodes are So far, the substitutions that we have considered are of the
added to the priority queue. N ode 1.0 has a higher priority form vi = vi ⊕ f actor, which require variable vi to be present
than N ode 1.1 and N ode 1.2 [see Fig. 5(b)]. In the next step, in the PPRM expansion for the corresponding output variable
N ode 1.0 is popped from the priority queue and analyzed. vout,i . This requirement is somewhat stringent and needs to be
b = b ⊕ ac and c = c ⊕ ab are the two possible substitutions relaxed to be able to synthesize functions of larger variables.
identified for this node. N ode 2.0 and N ode 2.1 corresponding Enlarging the set of substitutions presents a drawback in that
to these substitutions are inserted into the priority queue with more substitutions need to be considered when exploring a
N ode 2.0 at the head of the queue with the highest priority node. However, it also increases the likelihood of good sub-
[see Fig. 5(c)]. In the next iteration, N ode 2.0 is popped from stitutions being found that eliminate more terms in the PPRM
the priority queue. The only possible substitution is c = c ⊕ ab, expansion, thus leading to a solution quicker. We allow for two
which leads to a solution [see Fig. 5(d)]. The solution (i.e., the additional types of substitutions.
substitutions shown on the path from the root of the search • We no longer require that output variable vout,i contains
tree to N ode 3.0) and its depth (i.e., the length of the path) variable vi for the algorithm to consider factors from the
are cached. At this point, no new nodes are inserted into the PPRM expansion of vout,i . For example, in the synthesis
priority queue. N odes 1.1 and 1.2 are popped in that order, for the reversible function in Fig. 1 just presented, the
respectively, but fail to give better solutions from any of their algorithm would also select c = c ⊕ b and c = c ⊕ ab as
substitutions. Their children are not added to the queue because possible substitutions in the first stage.
we have already found a solution of depth three, which they will • For any variable vi , we also allow the substitution vi =
not be able to beat. Lastly, N ode 2.1 is popped but discarded vi ⊕ 1 even if the PPRM expansion of vout,i does not
because its depth is too large to be useful. The solution illus- contain 1. In the synthesis of the reversible function in
trated in Fig. 3(d) is the best solution that the algorithm finds. Fig. 1, substitutions b = b ⊕ 1 and c = c ⊕ 1 would be
added as possible substitutions in the first stage. For this
special substitution only, we make an exception in that
C. Data Structures
we allow the number of terms in the PPRM expansion to
The data structures that are used in our algorithm are well increase.
known. A priority queue, implemented as a max heap, is Fig. 6 shows what the initial search tree would look like if the
utilized to determine which node is processed next. Doubly additional substitutions identified above are also considered.
linked lists are used to store the PPRM expansion of a function,
and substitutions are made by traversing the list. The way
E. Heuristics for Basic Algorithm
the algorithm is implemented requires repetitive searching in
the forward direction of the linked list only. To speed up this Given enough time and memory, the basic algorithm in Fig. 4
process, a pointer is saved at the last match, and the search will always find a valid solution. We prove this claim in the
continues from this point onward the next time. A tree data next section. In practice, however, the applicability of the basic
structure is utilized to keep track of the search space. Because algorithm is limited to reversible functions of at most five vari-
leaf nodes represent the only candidates for further exploration, ables before it exceeds the available memory. Consequently, we
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
2324 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 11, NOVEMBER 2006
introduce a few heuristics to the basic algorithm to improve its search space will be searched. Therefore, the algorithm will
performance on reversible functions with a much larger number always find a valid solution. This concludes the proof.
of variables. The price we pay is that these heuristics can no As mentioned in the previous section, heuristics must be
longer guarantee that a solution will be found if one exists. utilized to improve the performance of the basic algorithm on
We abandon the entire search process and backtrack to the larger examples. The heuristics choose a subset of the list of
first level of the search tree if we have not been able to find a possible substitutions based on their order of attractiveness.
solution in a user-specified number (e.g., ∼ 10 000) of steps. Depending upon the heuristic, it is possible to miss making
The intuition behind this is that if we made a poor substitution, a critical substitution that is required to find a solution for
chances are that we might have made such a substitution very a given example. Because the heuristics described above are
early (i.e., at higher depths of the search tree) during synthesis. greedy in nature, they may prune such a critical substitution
Given that we have not been able to find a solution after because other substitutions may have higher priorities than the
considerable effort, we choose to abandon the search and restart one that is pruned. Consequently, if the basic algorithm is used
the search from the top of the search tree with a different in conjunction with the heuristics described above, there is no
substitution (i.e., an alternative path). guarantee that a valid solution will be found.
There are many substitutions that need to be considered at
each level of the search tree. Consequently, the branching factor
V. E XPERIMENTAL R ESULTS
in the basic algorithm will be large for larger functions. To
resolve this issue, we apply the greedy method in that only In this section, we present our experimental results. The
the best substitution or the best k substitutions with the highest synthesis algorithm has been implemented in C and is a part of a
score are added to the priority queue at every iteration for each tool called RMRLS [23]. All experiments were conducted on a
input variable vi . The score of each substitution is given by Dell PowerEdge 600SC server featuring a Pentium IV 1.6-GHz
its priority, which is calculated using (4). If there are n input processor, 512-kB cache, 768-MB RAM, and running RedHat
variables, then either n or kn factors are added to the priority Linux 8.0.
queue at each level of the search tree. The value of k used in our
heuristics varies from three to five. This heuristic considerably
A. Three-Variable Reversible Functions
reduces the memory requirements, speeds up the search, and
enables the basic algorithm to handle large functions. We first synthesized all reversible functions of three vari-
ables, which number 8! or 40 320. Although our algorithm
F. Algorithm Convergence targets the GT gate library, circuits of three variables can
contain gates from the NCT library, which contains the NOT,
The remaining question that needs to be addressed is whether CNOT , and three-bit Toffoli gates only. Consequently, we take
the algorithm will always synthesize a valid circuit, i.e., termi- the liberty of saying that we are using the NCT library for this
nate with a valid solution. We claim that this is indeed the case case only. Table I compares our results with those reported by
for the basic algorithm, and the proof is given next. other researchers. The column “No. gates” shows the number of
Given a PPRM expansion, a solution is found if and only gates in the circuit, whereas the data in other columns indicate
if for each input variable vi , vout,i = vi . That is the PPRM the number of such circuits that were synthesized. The synthesis
expansion, through repeated applications of substitutions, has methods in [6] and [7] utilize the more general NCTS library,
been reduced to the identity function. The substitutions that which, in addition to the gates in the NCT library, contains a
are applied are of the form vi = vi ⊕ f actor, where vi is the SWAP gate. A SWAP gate unconditionally exchanges a pair of
target bit of a Toffoli gate and the literals in f actor are the inputs. Optimal results for three-variable reversible functions,
control bits of the Toffoli gate. Note that the literals in f actor using both the NCT and NCTS libraries, are from [16]. As can
cannot contain vi (i.e., the substitution vi = vi ⊕ vi · f actor is be seen from Table I, our algorithm synthesizes good quality
illegal) because vi cannot both be the target and control bit in a circuits for the case of three-variable functions. Furthermore, it
Toffoli gate. took less than half a second to synthesize each function.
The basic algorithm allows for three types of substitutions: By extending our algorithm to include the SWAP gate, it
1) vi = vi ⊕ f actor, where vi is present in the PPRM ex- may be possible to achieve results comparable to those in [6].
pansion of output variable vout,i ; The reason why our algorithm achieves better results than [7]
2) vi = vi ⊕ f actor, where vi is not present in the PPRM despite the lack of a SWAP gate is mainly because substitution
expansion of output variable vout,i ; of common subexpressions in a PPRM expansion seems to be
3) vi = vi ⊕ 1. a more powerful technique in reducing a reversible function to
The union of these types of substitutions are all the possible the identity function than trying to map each output back to
substitutions that can be made in a PPRM expansion. There are its corresponding input. Of course, templates [20]–[22] provide
no other valid substitutions that can be applied. Consequently, a very useful method for improving the synthesized circuit
the algorithm considers all the possible substitutions that can further and should be utilized as a postprocessing step by
be applied at each stage. All of these candidates will be stored any synthesis algorithm. In fact, it was reported in a personal
in the priority queue. Given enough time and memory, each of communication [24] that the average circuit size was improved
the possible substitutions will eventually be analyzed. Although from 6.10 to 6.05 when our results were postprocessed through
this occurs at every stage, in the worst case scenario, the entire a template-based simplification tool [21]. However, because we
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
GUPTA et al.: AN ALGORITHM FOR SYNTHESIS OF REVERSIBLE LOGIC CIRCUITS 2325
TABLE II
RANDOM FOUR-VARIABLE REVERSIBLE FUNCTIONS
TABLE III
RANDOM FIVE-VARIABLE REVERSIBLE FUNCTIONS
Specification: {0, 7, 6, 9, 4, 11, 10, 13, 8, 15, 14, 1, 12, 3, 2, Example 12: The 5one013 function outputs one if the num-
5}. ber of ones in the binary encoding of the input vector is equal
Toffoli circuit: TOF3(b, a, d) TOF2(a, b) TOF3(c, b, d) to zero, one, or three. Otherwise, it outputs zero. Four garbage
TOF2(b, c). outputs are necessary to make the function reversible.
The circuit implementation is shown in Fig. 8. Specification: {16, 17, 18, 3, 19, 4, 5, 20, 21, 6, 7, 22, 8, 23,
Example 9: The rd53 example is from the MCNC [25] 24, 9, 25, 10, 11, 26, 12, 27, 28, 13, 14, 29, 30, 15, 31,
benchmark suite and has five inputs and three outputs. The 0, 1, 2}.
output vector is the binary encoding of the number of ones Toffoli circuit: TOF2(d, a) TOF1(e) TOF2(e, d) TOF2(e, c)
in the input vector. Thus, {00000} yields {000}, {00101} TOF2(e, b) TOF2(a, b) TOF2(c, a) TOF2(c, e)
yields {010}, and {11111} yields {101}. Output vectors {010} TOF5(a, b, c, d, e) TOF1(a) TOF2(d, c)
and {011} both occur ten times, and hence, we need to add TOF4(a, b, e, c) TOF1(e) TOF1(d) TOF2(b, e)
log2 10 = 4 garbage outputs for a total of seven outputs. This TOF2(e, b) TOF3(b, c, d) TOF3(a, e, b)
requires the addition of two more constant inputs to the five TOF5(a, b, c, e, d).
existing ones. Function 5one245 is similar but outputs one if the number of
Specification: We use the same specification as that in [18]. ones in the binary encoding of the inputs is equal to two, four,
Toffoli circuit: TOF3(a, b, f ) TOF2(b, a) TOF3(a, c, f ) or five. Functions 6one135 and 6one0246 are an extension of
TOF2(c, a) TOF5(a, b, c, d, g) TOF3(a, d, f ) Example 12 to six variables. Results for these functions will be
TOF2(a, d) TOF4(b, d, e, g) TOF2(c, b) TOF3(d, e, f ) presented later.
TOF5(a, b, d, e, g) TOF5(b, c, d, e, g) TOF2(d, e). Example 13: The Boolean specification for the alu function
is given in Fig. 9. There are three control signals, i.e.: 1) C0 ;
The next set of examples is introduced for the first time in 2) C1 ; and 3) C2 , and two data inputs, i.e.: 1) A and 2) B.
this work. The control signals determine the logic operation performed on
Example 10: The majority5 function outputs one if three the data inputs. Four garbage outputs are necessary to make the
or more of the five inputs are one. Otherwise, it outputs zero. function reversible.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
GUPTA et al.: AN ALGORITHM FOR SYNTHESIS OF REVERSIBLE LOGIC CIRCUITS 2327
TABLE IV
REVERSIBLE LOGIC BENCHMARKS
Specification: {16, 17, 18, 19, 0, 20, 21, 22, 23, 24, 25, 11,
12, 26, 27, 15, 28, 13, 14, 29, 8, 9, 10, 30, 31, 1, 2, 3, 4,
5, 6, 7}.
Toffoli circuit: TOF2(e, c) TOF2(c, e) TOF2(e, d) TOF1(e)
TOF4(a, b, d, e) TOF3(b, c, a) TOF3(a, c, e)
TOF3(d, e, a) TOF2(c, d) TOF4(a, d, e, b)
TOF3(b, c, a) TOF4(a, c, e, b) TOF2(d, c)
TOF5(a, b, c, e, d) TOF1(c) TOF2(e, c) TOF3(d, e, c)
TOF4(b, d, e, c).
Example 14: The shifter function has two control signals,
i.e.: 1) s0 and 2) s1 , and n inputs. Depending on the value
of the control signals, the function does a wraparound shift
of zero, one, two, or three positions on the input. The control
signals are passed unchanged to the output. For example, if the
control signals are 10, then {0, 1, 2, 3, . . . , 2n − 1} will become
{2, 3, . . . , 2n − 1, 0, 1}. When n is 10, 15, and 28, respectively,
the synthesized circuit contains 27, 30, and 56 Toffoli gates,
respectively.
D. Benchmark Functions
We next present synthesis results for many of the benchmarks
available from [13]. Note that the results presented in [13] are
the best available for each benchmark from various sources
in the literature, i.e., these results are not all due to the same
algorithm from the literature. Table IV shows the circuit name, we have developed. We will make these examples available to
how many real and garbage inputs it contains, the number of the reversible logic community. Due to memory constraints, our
gates required for its implementation, and its quantum cost. The algorithm was not able to find a solution to some examples,
quantum cost is calculated by using the cost table for Toffoli namely, in the ham#, hwb#, and #symm family of functions
gates available in [13]. To make a fair comparison, we compare given in [13].
our results with the best published results for the GT library
available from [13]. In cases where the circuit contains gates
E. Scalability
from the NCT library only, we compare our results with the best
published results for the NCT library. The † symbol annotating In the final set of experiments, we wanted to test the scal-
the benchmark name in Table IV highlights such cases. ability of our algorithm on reversible functions with a rel-
In general, our results are on par with those from [13]. In atively large number of inputs. Experiments with four- and
some examples, we get identical results. In other cases, we five-variable functions indicate that the algorithm can synthe-
synthesize a circuit with fewer gates but with a higher quantum size circuits with gate counts up to around 45. Consequently,
cost or vice versa. There are only two cases, namely, 5mod5 we generated random reversible logic circuits of 6–16 vari-
and shift10, in which our tool synthesizes a circuit that contains ables with a prespecified number of gates. The circuit was
more Toffoli gates and a higher quantum cost. With the aid of constructed by picking a gate at random from a given library
the postprocessing step using the template-based simplification (GT or NCT). The gate was then concatenated to the end
tool [21], it may be possible to further reduce both the gate of the circuit. The process was repeated until the specified
count and quantum cost of the circuits synthesized by our tool. number of gates had been selected and added. In the case of
Note that no comparison can be made for the examples that the GT library, the number of control bits for each Toffoli
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
2328 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 11, NOVEMBER 2006
TABLE V
RANDOM REVERSIBLE FUNCTIONS WITH MAXIMUM GATE COUNT OF 15
TABLE VI
RANDOM REVERSIBLE FUNCTIONS WITH MAXIMUM GATE COUNT OF 20
gate was determined randomly as well. The circuits were then From Tables V and VI, we see that a vast majority of
simulated to obtain their reversible specifications. Next, the the circuits could be synthesized in 60 s. If more time were
PPRM expansions for this specification were derived. Finally, allowed, the circuit size would improve. However, we see a
synthesis was performed on the PPRM specification. The CPU large proportion of circuits failing to synthesize in Table VII in
time limit for synthesis for each function was again set to 60 s, the allotted time. This is again because we are pruning a lot of
and the greedy option for substitution pruning was used. potential substitutions at each iteration with the greedy option.
Tables V–VII show the results for three different scenarios. Better pruning strategies need to be developed to improve
In Table V, 500 examples each, containing 6–16 variables, with this algorithm. Nonetheless, it is comforting to see that the
at most 15 gates, were generated randomly. In Tables VI and algorithm can still quickly synthesize more than half (50%) of
VII, 1000 such examples each were generated requiring at most the circuits in such cases.
20 and 25 gates, respectively. For testing the scalability of our
algorithm, all we were interested in observing was whether a
VI. C ONCLUSION
solution (not necessarily the best) could be found or not. As
soon as a solution was found, we chose to move on to the next We have described an algorithm and tool that uses a PPRM
example. That is why, for example, Table V contains results expansion of a reversible function to synthesize a network
that require more than 15 gates, although we know that all of Toffoli gates. The algorithm searches the tree of possible
the examples require at most 15 gates. The column “Failed” factors in priority order to try to find the best possible solution.
indicates the number of examples that failed to synthesize in The use of extensive pruning leads to very fast synthesis. We
the allotted time. applied our algorithm to all 40 320 reversible functions of three
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
GUPTA et al.: AN ALGORITHM FOR SYNTHESIS OF REVERSIBLE LOGIC CIRCUITS 2329
TABLE VII
RANDOM REVERSIBLE FUNCTIONS WITH MAXIMUM GATE COUNT OF 25
variables and obtained good quality results. We also showed [7] D. M. Miller, D. Maslov, and G. W. Dueck, “A transformation based
that it can handle all four-variable functions and most five- algorithm for reversible logic synthesis,” in Proc. Des. Autom. Conf.,
Jun. 2003, pp. 318–323.
variable functions that were in our test suite. We presented [8] T. Toffoli, “Reversible computing,” MIT, Cambridge, MA, 1980.
results on several benchmarks and also introduced some new Tech. Rep.
ones. Experiments on scalability indicate that our algorithm can [9] E. Fredkin and T. Toffoli, “Conservative logic,” Int. J. Theor. Phys.,
vol. 21, no. 3/4, pp. 219–253, 1982.
quickly find solutions to a good proportion of the randomly [10] T. Sasao, Logic Synthesis and Optimization. Dordrecht, The Nether-
generated functions with 6–16 inputs. However, it has most lands: Kluwer, 1993.
success in synthesizing those functions that require no more [11] M. Thornton, Fixed Polarity Reed–Muller Forms. [Online]. Available:
than 25 gates on average. http://engr.smu.edu/~mitch/class/8391/week12/esop.pdf
[12] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus,
As part of future work, we would like to incorporate Fredkin P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, “Elementary gates
gates into our algorithm. A Fredkin gate is equivalent to three for quantum computation,” Phys. Rev. A, Gen. Phys., vol. 52, no. 5,
Toffoli gates. Thus, the use of Fredkin gates could yield a pp. 3457–3467, Nov. 1995.
[13] D. Maslov. Reversible Logic Synthesis Benchmarks Page. (2005, May).
significant improvement in circuit quality. We also want to [Online]. Available: http://www.cs.uvic.ca/~dmaslov/
improve the pruning process, so that we can handle circuits [14] V. V. Shende, I. L. Markov, and S. S. Bullock, “Smaller two-qubits circuits
that are currently beyond the capabilities of our algorithm. In for quantum communication and computation,” in Proc. Des. Autom. Test
Eur. Conf., Feb. 2004, pp. 980–985.
particular, there is a need for more powerful heuristics. We are [15] A. Mishchenko and M. Perkowski, “Fast heuristic minimization of ex-
also working on ways to efficiently synthesize functions with clusive sum-of-products,” in Proc. Int. Workshop Appl. Reed–Muller
“don’t cares.” We currently preassign values to “don’t care” Expansion Circuit Des., Aug. 2001, pp. 242–250.
outputs. It would be better if we could find a way to dynamically [16] V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes, “Synthesis
of reversible logic circuits,” IEEE Trans. Comput.-Aided Des. Integr.
assign these values during synthesis. Circuits Syst., vol. 22, no. 6, pp. 710–722, Jun. 2003.
[17] V. V. Shende, A. K. Prasad, K. N. Patel, I. L. Markov, and J. P. Hayes,
“Scalable simplification of reversible circuits,” in Proc. Int. Workshop
ACKNOWLEDGMENT Logic Synthesis, May 2003.
[18] D. M. Miller and G. W. Dueck, “Spectral techniques for reversible logic
The authors would like to thank D. Maslov for pointing out synthesis,” in Proc. Int. Symp. Represent. Methodol. Future Comput.
an error in a circuit presented in the conference version [26] of Technol., Mar. 2003, pp. 56–62.
[19] K. Iwama, Y. Kambayashi, and S. Yamashita, “Transformation rules for
this paper. designing CNOT-based quantum circuits,” in Proc. Des. Autom. Conf.,
Jun. 2002, pp. 419–425.
[20] D. Maslov, G. W. Dueck, and D. M. Miller, “Simplification of
R EFERENCES Toffoli networks via templates,” in Proc. Symp. Integr. Circuits Syst. Des.,
[1] R. Landauer, “Irreversibility and heat generation in the computing Sep. 2003, pp. 53–58.
process,” IBM J. Res. Develop., vol. 5, no. 3, pp. 183–191, Jul. 1961. [21] ——, “Fredkin/Toffoli templates for reversible logic synthesis,” in Proc.
[2] D. Maslov, “Reversible logic synthesis,” Ph.D. dissertation, Dept. Int. Conf. Comput.-Aided Des., Nov. 2003, pp. 256–261.
Comput. Sci., Univ. New Brunswick, Fredericton, NB, Canada, 2003. [22] ——, “Templates for Toffoli network synthesis,” in Proc. Int. Workshop
[3] C. Bennett, “Logical reversibility of computation,” IBM J. Res. Develop., Logic Synthesis, May 2003, pp. 320–326.
vol. 17, no. 6, pp. 525–532, Nov. 1973. [23] RMRLS. [Online]. Available: http://www.princeton.edu/~cad/projects.
[4] M. Nielsen and I. Chuang, Quantum Computation and Quantum Informa- html
tion. Cambridge, U.K.: Cambridge Univ. Press, 2000. [24] D. Maslov, Private communication, Dec. 2004.
[5] A. Mishchenko and M. Perkowski, “Logic synthesis of reversible [25] N.C.S.U. Benchmark Archives at CBL Collaborative Benchmarking Lab.,
wave cascades,” in Proc. Int. Workshop Logic Synthesis, Jun. 2002, Dept. of Comput. Science, North Carolina State Univ. [Online]. Available:
pp. 197–202. http://www.cbl.ncsu.edu/benchmarks/
[6] P. Kerntopf, “A new heuristic algorithm for reversible logic synthesis,” in [26] A. Agrawal and N. K. Jha, “Synthesis of reversible logic,” in Proc. Des.
Proc. Des. Autom. Conf., Jun. 2004, pp. 834–837. Autom. Test Eur. Conf., Feb. 2004, pp. 21384–21385.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.
2330 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 11, NOVEMBER 2006
Pallav Gupta (S’99) received the B.S. degree in computer engineering (summa Niraj K. Jha (S’85–M’85–SM’93–F’98) received
cum laude) from the University of Arizona, Tucson, in 2002, and is currently the B.Tech. degree in electronics and electrical com-
working toward the Ph.D. degree in electrical engineering at Princeton Univer- munication engineering from the Indian Institute of
sity, Princeton, NJ. Technology, Kharagpur, India, in 1981, the M.S.
His research interests include computer-aided design for emerging nano- degree in electrical engineering from the State Uni-
technologies such as quantum cellular automata, resonant tunneling diodes, versity of New York, Stony Brook, in 1982, and
carbon nanotubes, reversible logic, and computing paradigms for emerging the Ph.D. degree in electrical engineering from the
technologies. University of Illinois, Urbana, in 1985.
Mr. Gupta was the recipient of the Wallace Memorial Fellowship for He is a Professor of electrical engineering at
the 2005–2006 academic year given to top graduate students at Princeton Princeton University, Princeton, NJ. He served as the
University. Director of the Center for Embedded System-on-a-
Chip Design funded by the New Jersey Commission on Science and Technol-
ogy. He has coauthored three books entitled Testing and Reliable Design of
CMOS Circuits (Norwell, MA: Kluwer, 1990), High-Level Power Analysis and
Optimization (Norwell, MA: Kluwer, 1998), and Testing of Digital Systems
(Cambridge, U.K.; Cambridge University Press, 2003). He has also authored
six book chapters. He has authored or coauthored more than 280 technical
papers. He has received 11 U.S. patents. His research interests include low-
power hardware and software design, CAD of integrated circuits and systems,
digital-system testing, and distributed computing.
Dr. Jha is an ACM Fellow. He is the recipient of the AT&T Foundation
Award and NEC Preceptorship Award for research excellence, NCR Award
for teaching excellence, and Princeton University Graduate Mentoring Award.
He has coauthored six papers which have won the Best Paper Award at
Abhinav Agrawal received the B.S. degree in elec- ICCD’93, FTCS’97, ICVLSID’98, DAC’99, PDCS’02, and ICVLSID’03.
trical engineering (summa cum laude) from Prince- Another paper of his was selected for “The Best of ICCAD: A collection of
ton University, Princeton, NJ, in 2004. the best IEEE International Conference on Computer-Aided Design papers of
His undergraduate research work examined the the past 20 years.” He has served as an Associate Editor of IEEE TRANS -
vulnerabilities of the Intel NetBurst microarchitec- ACTIONS ON C IRCUITS AND S YSTEMS II: A NALOG AND D IGITAL S IGNAL
ture to heat and voltage fluctuations, which he PROCESSING. He is currently serving as an Editor of IEEE TRANSACTIONS ON
presented at the Intel Undergraduate Research Sym- COMPUTER-AIDED DESIGN, IEEE TRANSACTIONS ON VERY LARGE SCALE
posium. Currently, he is a Business Analyst with INTEGRATION (VLSI) SYSTEMS, Journal of Electronic Testing: Theory and
McKinsey and Company, New York, NY, working Applications (JETTA), and Journal of Embedded Computing. He has served as
with clients on strategy, operational improvement, the Guest Editor for the JETTA special issue on high-level test synthesis. He
sales, and marketing. has also served as the Program Chair of the 1992 Workshop on Fault-Tolerant
Mr. Agrawal was elected to Phi Beta Kappa and Tau Beta Pi, Parallel and Distributed Systems and the 2004 International Conference on
upon graduation. Embedded and Ubiquitous Computing.
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 15,2024 at 21:47:42 UTC from IEEE Xplore. Restrictions apply.