Andrew Ling FPGA Technology Mapping: A Study of Optimality

FPGA Technology Mapping: A Study of Optimality

(DAC Conference)

ABSTRACT

This paper attempts to quantify the optimality of FPGA technology mapping algorithms. We develop an algorithm, based on Boolean satis_ability (SAT), that is able to map a small subcircuit into the smallest possible number of lookup

tables (LUTs) needed to realize its functionality. We iteratively apply this technique to small portions of circuits that have already been technology mapped by the best available mapping algorithms for FPGAs. In many cases, the optimal

mapping of the subcircuit uses fewer LUTs than is obtained by the technology mapping algorithm. We show that for some circuits the total area improvement can be up to 67%.

1. INTRODUCTION

FPGAs (Field Programmable Gate Arrays) are recon_gurable integrated circuits that are characterized by a sea of programmable logic blocks surrounded by a programmable routing structure. Most modern FPGA devices contain programmable logic blocks that are based on theK-input lookup table (K-LUT) where a K-LUT contains 2K truth table conguration bits so it can implement any K-input function. Figure 1 illustrates the general structure of a 2-LUT. The

number of LUTs needed to implement a given circuit determines the size and cost of the FPGA-based realization.Thus one of the most important phases of the FPGA CAD ow is the technology mapping step that maps an optimized circuit description into a LUT network present in the target FPGA architecture. The goal of the technology mapping step is to reduce area, delay, or a combination thereof in the

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for pro_t or commercial advantage and that copies bear this notice and the full citation on the _rst page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior speci_c permission and/or a fee. network of programmable logic blocks that is produced. In this work, we assess state-of-the-art FPGA technology mapping algorithms in terms of area-optimality. Timing-driven technology mapping is not covered in this study. The process of technology mapping is often treated as a covering problem. For example, consider the process of mapping a circuit into LUTs as illustrated in Figure 2a illustrates the initial gate-level network, Figure 2b illustrates a possible covering of the initial network using 4-LUTs, and Figure 2c illustrates the LUT network produced by the covering. In the mapping given, the gate labeled x is covered by both LUTs and is said to be duplicated. Somewhat surprisingly, gate duplication is often necessary to minimize the area of LUT networks [6].

The fundamental question that we ask in this paper is: Given the LUT-level network created by a technology mapping algorithm, how much can its area be reduced? For small subcircuits, it is possible to answer this question in an optimal manner. Consider an arbitrary function f(i0; i1; : : : ; in). Suppose that we seek to determine if it can be implemented in three or fewer 2-LUTs. This problem can be solved by considering the Con_gurable Virtual Network (CVN) shown in Figure 3. The CVN consists of input lines connected to the variables i0 : : : in and three 2-LUTs. A crossbar allows the LUT inputs to select any of the input lines or outputs from other LUTs. Each \switch" on the crossbar is con_gured by a virtual con_guration bit. A 0 indicates that the crosspoint intersection is unconnected, while a 1 indicates a connection at the crosspoint. Clearly, it is possible to enumerate every possible circuit con_guration involving three 2-LUTs by manipulating the virtual con_guration bits for the crossbar as well as the truth table con_guration bits for each of the 2-LUTs. In fact, as detailed in Section 3, we can express the output of the network fnet as a Boolean formula involving the variables i0 : : : in, the virtual con_guration bits V1 : : : Vm and the truth table con_guration bits L1 : : : Lo. Given this formula, we ask the question: is there an assignment of V1 : : : Vm and L1 : : : Lo that will cause fnet(i0; i1; : : : ; in; V1 : : : Vm; L1 : : : Lo) to be identical to f(i0; i1; : : : ; in) for all values of i0 : : : in? This question can be answered exactly with Boolean satis_ability (SAT): Given a Boolean expression in Conjunctive-Normal-Form (CNF), where the expression consists of a product of clauses and each clause consists of a sum of literals, seek an assignment of variables so that each clause has at least one literal set to true. The solution space of this problem grows exponentially with respect to the input size, n, of the virtual network. However, we show in this paper that modern SAT solvers can be used to exactly resynthesize small circuits. Given this optimal resynthesis method for small subcircuits, we simply iteratively apply this technique to small portions of a larger circuit in a sliding window fashion until no additional improvement can be achieved. This approach does not guarantee the mapping optimality of the large circuit, but it does give us some indication of the area \left on the table" by the original technology mapping solution. The remainder of this paper is organized as follows. First we review some of the key literature on area-driven LUT mapping. We then describe our optimal resynthesis approach based on SAT. Next, we describe the application of our optimal resynthesis approach to the 4-LUT networks produced by the best known FPGA technology mapper and provide a set of results. Finally, we provide concluding remarks and directions for future work.

3. RESYNTHESIZING FOR AREA

When resynthesizing for area, one must take an existingLUT circuit and attempt to reduce the number of LUTs 428 in the circuits yet maintain the original functionality. The more LUTs that can be removed, the farther the original circuit is from the optimal mapping. We mentioned previously that reducing the number of LUTs can be achieved by resynthesizing smaller subcircuits and applying this in a sliding window fashion over the larger circuit. The subcircuits that we focus on form a cone. Thus by resynthesizing several cones, this will reduce the LUT count of the overall LUT network.

3.1 Converting Resynthesis Problem into

Boolean Satisability

Determining if an n-feasible cone implemented with X n-LUTs can be resynthesized into an n-feasible cone implemented with X-1 n-LUTs or less can be veri_ed with SAT This is achieved by generating a n-feasible FFC containing less LUTs than the original cone, then expressing the FFC as a Boolean expression in CNF. Next, the CNF expression is assigned the truth table values of the function expressed by the original cone. If this CNF expression is satis_able, resynthesis to the new FFC is possible.

Figure 4: Resynthesis of Three Input Cone of Logic To illustrate this process, consider Figure 4. The original cone 4a consists of three 2-LUTs which implements a three input function. Since only three inputs enter the cone, it may be possible to resynthesize 4a into 4b to save one LUT. To determine if resynthesis from 4a to 4b is possible, 4b must be converted into a CNF expression (a detailed description of converting digital circuits to CNF expressions can be found in [14]). As stated previously, if the expression is satis_able, resynthesis can occur.

Figure 5: AND gate CNF formulae

The CNF equation serves to express all valid vectors of the circuit. For example, consider Figure 5. Equation 1 will be satis_ed if and only if the signals A,B, and Z correspond to an AND gate functionality (e.g. (A = 0;B = X; Z = 0), (A = X;B = 0; Z = 0) or (A = 1;B = 1; Z = 1)). Similarily, equation 2 will be satis_ed only for valid vectors such as (A = 1;B = 1; C = 0; Z = 1; Y = 0).

To apply this for resynthesis checking, we _rst form the CNF equation. Next, we constrain the cone input and output variables in the CNF equation according to the truth table of the cone. Finally, we check for a satis_able assignment using a SAT solver. For example, let us attempt to map a two-input cone to the second circuit in Figure 5. Consider input ?? to be a con_guration bit. To check if the two-input function f(1; 1) = 1 is feasible, we determine if F2(A = 1;B = 1; C; Z; Y = 1) is satis_able. Clearly, this will return true with C = 1. However, this only shows that f(1; 1) = 1 is possible (i.e. one truth table cube). To verify if a resynthesis circuit can implement an entire cone (i.e. an entire truth table), its CNF equation is replicated 2n times to form a new CNF equation, where n represents the fanin size of the cone being mapped. Each replicant of the basic circuit CNF equation represents one cube in the cone's truth table. Again, SAT is performed on the new CNF formula to check if the original cone can _t into the resynthesis circuit.

Figure 6: Detailed Diagram of Resynthesis Logic. Going back to the original LUT example shown in Figure 4, the following steps illustrate how this conversion oc-

curs. Figure 6 is a detailed picture of Figure 4b with internal wires and con_guration bits clearly labeled for CNF construction. In steps to come, f represents the function of the cone under consideration for resynthesis, Step 1: Create a CNF equation for individual elements in resynthesis circuit.

Step 2: Formulate the circuit CNF equation from equations 3 and 4. Note that equation 5 is an expression dependent on the circuit's inputs and output (x1 ?? 3,f), internal wire (M), and con_guration (L1 ?? 8) variables.

Step 3: Replication of equation 5 and constraining of inputs and output variables according to function f.

In equation 6, the con_guration bits are represented by the same variables (L1 - 8) in each Gresynth whereas all other signals are given unique variables in each instance. This ensures that only one con_guration will exist for all cubes of the truth table. Finally, equation 6 is passed into a SAT solver which will return true if the cone _ts in the resynthesis structure. For simplicity, in the previous example we ignored the exibility of FPGA routing which allow LUT inputs to be permuted. This is extremely important since it increases the number of functions a given resynthesis structure can represent. For example, Figure 7 shows how a three input function can be converted to another by simply swapping inputs x1 and x2. Extending Figure 7 to all input permutations Figure 7: Conversion from f(x1; x2; x3) to f(x2; x1; x3) increases the number of functions a resynthesis structure represents by a factor of n!, where n is the fanin size of the resynthesis cone. In order to represent permutable inputs in our CNF expression, virtual multiplexers are added to the inputs shown in Figure 8. These are virtual in the sense that they do not exist in the resynthesis structure, but only serve to allow us to permute the inputs in the CNF expression for SAT.