# Sizing and Placement of Charge Recycling Transistors in MTCMOS Circuits

### Ehsan Pakbaznia
Dep. of Electrical Engineering
University of Southern California
Los Angeles, U.S.A.
pakbazni@usc.edu

### Farzan Fallah
Fujitsu Labs of America
Sunnyvale, U.S.A.
farzan@fla.fujitsu.com

### Massoud Pedram
Dep. of Electrical Engineering
University of Southern California
Los Angeles, U.S.A.
pedram@usc.edu

ABSTRACT - A downside of using the Multi-Threshold CMOS (MTCMOS) technique for leakage reduction is the energy consumption during transitions between sleep and active modes. Previously, a charge recycling (CR) MTCMOS architecture was proposed to reduce the large amount of energy consumed during the mode transitions in power-gated circuits. Considering the RC parasitics of the virtual ground and VDD lines, proper sizing and placement of charge-recycling transistors is key to achieving the maximum power saving. In this paper, we show that the sizing and placement problems of charge-recycling transistors in CR-MTCMOS can be formulated as a linear programming problem, and hence, can be efficiently solved using standard mathematical programming packages. The proposed sizing and placement techniques allow us to employ the CR-MTCMOS solution in large row-based standard cell layouts while achieving nearly the full potential of this power-gating architecture, i.e., we achieve 44% saving in switching energy due to the mode transition in CR-MTCMOS compared to standard MTCMOS.

### I. INTRODUCTION

Reducing the threshold voltage of transistors in sub-micron CMOS technology compensates for the performance degradation that results from supply voltage reduction. Threshold voltage reduction, however, increases the sub-threshold leakage current exponentially [1].
The power gating technique provides low-leakage, high-performance operation by using low-V<sub>t</sub> transistors for logic cells and high-V<sub>t</sub> devices as sleep transistors that disconnect the logic cells from the power supply and/or ground [2]. This Multi-threshold CMOS (MTCMOS) technology reduces the leakage in the sleep mode.

One of the key concerns in MTCMOS is the wake up latency of the circuit, which is defined as the time required to turn on the circuit after receiving the wake up signal. Reducing the wake up latency is an important issue since it can affect the overall performance of the VLSI circuit. Another important issue in power gating is minimizing the energy wasted during mode transition, i.e., while switching from active to sleep mode and vice versa. As we will discuss in Section II, both the virtual ground and virtual V<sub>DD</sub> nodes experience a voltage change during mode transition. Since a considerable number of cells are connected to the virtual ground and virtual supply nodes, the total switching capacitance at these nodes is large, and as a result the switching power consumption during mode transition can be significant.

Sleep transistor sizing is an important issue in designing MTCMOS circuits. References [2][4][5][6] discuss different techniques to size the sleep transistor(s) of an arbitrary circuit to meet a performance constraint. None of these techniques proposes a method to minimize the power consumption during the sleep-to-active and active-to-sleep mode transitions.

The charge recycling technique was recently proposed to reduce the energy consumption during mode transitions of MTCMOS circuits [7]. It has been shown that by applying this technique, up to 46% of the switching energy due to mode transition can be saved [7]. In this paper, we apply the charge recycling technique between consecutive rows of a standard cell design. We propose algorithms for the placement and sizing of the charge recycling transistors.
The remainder of this paper is organized as follows. In Section II, we revisit the concept of using the charge recycling technique in MTCMOS circuits. In Section III, the overall idea of applying the charge recycling method to a standard cell design is introduced. In Section IV we propose an algorithm to concurrently place and size the charge-recycling transistors. Section V presents the simulation results, and finally Section VI concludes the paper.

### II. CHARGE RECYCLING TECHNIQUE

Figure 1 shows the charge recycling configuration. The charge recycling transmission gate is turned on right before going from sleep to active and right after going from active to sleep. In [7] it has been discussed that during the active mode the voltage values of nodes G and P are close to 0 and $V_{DD}$, respectively. In the sleep mode, however, the situation is reversed and the voltage values of nodes G and P are close to V<sub>DD</sub> and 0, respectively. The charge recycling technique has been proposed to reduce this mode transition switching energy consumption.

Figure 1. The proposed charge recycling configuration in power gating structures [7].

At the sleep-to-active transition edge and right before turning on the sleep transistors, we put the circuit into a half-wakeup state by turning the charge recycling circuitry on. After charge recycling is complete, the charge recycling circuitry is turned off and the sleep transistors are turned on to completely wake up the circuit. A similar strategy is used at the active-to-sleep transition edge. After the sleep transistors are completely turned off, the charge-recycling circuitry is turned on to help charge the virtual ground and discharge the virtual supply nodes.

Next we show that the assumption that node G is charged to $V_{DD}$ in the sleep mode is valid. Consider sub-circuit $C_1$ in Figure 1.
We show that the only case where this assumption is invalid is when the outputs of all logic cells in $C_1$ are set to logic 1 (i.e., the pull-down sections of these cells are OFF) immediately before the active-to-sleep transition occurs. However, this case hardly happens in practice, because if there is at least one cell in C<sub>1</sub> with its output value set to logic 0 (i.e., its pull-down section is ON) before the active-to-sleep transition and if the sleep period is sufficiently long, then the steady-state value of the virtual ground voltage after entering the sleep mode will be nearly V<sub>DD</sub>. Clearly, considering that a sub-circuit will typically contain tens of logic cells, the probability of at least one of them having a logic 0 at its output (before entering the sleep mode) is nearly 1, i.e., the virtual ground of sub-circuit C<sub>1</sub> will indeed rise and reach nearly V<sub>DD</sub> after sufficient time is spent in the sleep mode.

To confirm this empirically, we show in Figure 2 the voltage waveforms of the virtual ground node for four different cases. In each case we have used an NMOS sleep transistor (the case with a PMOS sleep transistor is similar except that the corresponding output states are reversed). The first case is that of having a single inverter cell in sub-circuit C<sub>1</sub>. We force the output of this inverter to logic 1 before entering the sleep mode. As the figure shows, after entering the sleep mode, the virtual ground voltage of the inverter cell rises to about 200 mV, which is much less than the $V_{DD}$ of 1.2 V (see the green waveform). The next case corresponds to the same sub-circuit C<sub>1</sub>, this time with the output of the inverter forced to logic 0. Here, the virtual ground voltage rises to 0.95 V, which is close to V<sub>DD</sub> and a suitable level for the charge-recycling purpose (see the blue waveform).
The next two cases correspond to C<sub>1</sub> comprising four inverter cells, each driven by an input to $C_1$. In one case, three of the inverter outputs are 1 and only one inverter output is 0. In this case, the virtual ground voltage rises to an even higher level than in case 2, reaching a final steady-state voltage level of 1 V (see the red waveform), which is again suitable for the charge-recycling purpose. In the last case, two inverter outputs are set to logic 1 while the other two are set to logic 0. Clearly in this case, after entering the sleep mode, the virtual ground node is expected to rise and achieve a level even closer to $V_{DD}$ than before. This is confirmed by the black waveform in the figure, which shows that a level of nearly 1.2 V is achieved by the virtual ground of sub-circuit C<sub>1</sub>.

In summary, as long as there is a rather large number of logic cells in a sub-circuit that uses an NMOS sleep transistor, the probability that one of these cells will have a logic 0 output value before entering the sleep mode is quite high (in fact it is nearly one), so the virtual ground voltage of such a sub-circuit will gradually rise and stabilize at a level near V<sub>DD</sub>. This stabilization occurs after only a relatively short period of sleep time (on the order of microseconds), which then provides us with the opportunity for charge recycling between this sub-circuit and another one that uses a PMOS sleep transistor.

Figure 2. Virtual ground voltage during the sleep mode, VDD=1.2 V.

### III. CHARGE RECYCLING FOR STANDARD CELL DESIGNS

Figure 3 shows a sample cell row. There is a cavity for charge-recycling transistors for each cell row, and all the corresponding charge-recycling transistors for that row are placed in this cavity. The configuration is similar to what was adopted in [5][6]. The virtual GND rail is not shown in this figure. Note that for each cell row we only use one type of sleep transistor (NMOS or PMOS, but not both).
Furthermore, cell rows alternate between NMOS and PMOS sleep transistor types, i.e., row 1 cells are connected to virtual GND through an NMOS sleep transistor, whereas row 2 cells are connected to virtual VDD through a PMOS sleep transistor, and so on.

Figure 3. A cell row in standard cell layout. This row uses NMOS sleep transistors placed in the sleep transistor cavity.

Figure 4 depicts the virtual GND line model of a single row of Figure 3. Here, $G_i$ denotes the connection node of the $i^{th}$ cell on the virtual GND line, $r_{w-Gi}$ denotes the wiring resistance between $G_i$ and $G_{i+1}$, and $c_{int-Gi}$ represents the interconnect capacitance at $G_i$. In the presence of RC parasitics of the virtual GND and virtual $V_{DD}$ lines, the charge recycling time is defined as the minimum time the charge recycling transistors must remain ON in order to have at least $(1-\delta) \times 100$ percent of the full charge recycling completed. It is determined by the sizes of the logic cells connected to the virtual GND and virtual $V_{DD}$, the sizes of the charge-recycling transistors, and the connection points of the charge recycling transistors to the virtual GND and virtual $V_{DD}$ lines. In the remainder of the discussion we assume that charge recycling between each pair of nodes in virtual GND and virtual $V_{DD}$ is performed using an NMOS pass transistor instead of a transmission gate. In practice, this is sufficient, although one can use a transmission gate as well.

Figure 4. Virtual GND modeled using an RC network.

### IV. SIZING AND PLACEMENT OF THE CHARGE-RECYCLING TRANSISTORS

We consider charge recycling between two rows with M cells in each row (M is set to the smaller of the two cell counts if the rows have different numbers of cells). Figure 5 shows how charge recycling is applied between two consecutive rows by placing charge-recycling transistors between the two rows.
In this figure each charge-recycling transistor, CRT<sub>i</sub>, connects the virtual GND node of a cell in the upper row to the virtual V<sub>DD</sub> node of a cell in the lower row. For example, CRT<sub>1</sub> connects the virtual GND node of cell 1 to the virtual V<sub>DD</sub> node of cell 5. To simplify the optimization problem and to reduce the routing complexity, the only allowed connections are of the form G<sub>i</sub>-P<sub>i</sub> (a connection of the form G<sub>i</sub>-P<sub>j</sub>, where i≠j, is not allowed). The connections between the charge-recycling transistors and the virtual V<sub>DD</sub> line are not shown in Figure 5 to save space.

Figure 5. Charge-recycling between two consecutive rows.

During charge recycling, i.e., when the charge recycling transistors are ON, each charge-recycling transistor, CRT<sub>i</sub>, can be replaced by its resistive model, R<sub>i</sub>, which connects node G<sub>i</sub> on the virtual GND line to its corresponding node, P<sub>i</sub>, on the virtual V<sub>DD</sub> line as shown in Figure 6. In this figure we have replaced the virtual GND and virtual V<sub>DD</sub> lines by their equivalent RC interconnect models in the same way that we did for the row in Figure 4. Note that $r_{w-Pi}$ and $c_{int-Pi}$ in the virtual $V_{DD}$ line are defined in the same manner as $r_{w-Gi}$ and $c_{int-Gi}$ in the virtual GND line. $C_{Gi}$ and $C_{Pi}$ in Figure 6 are defined as follows:

$$C_{G_i} = c_{\text{int}-G_i} + C_{d-G_i}, \qquad C_{P_i} = c_{\text{int}-P_i} + C_{d-P_i} \tag{1}$$

where $C_{d-Gi}$ and $C_{d-Pi}$ are the total diffusion capacitances of nodes G<sub>i</sub> and P<sub>i</sub>, respectively. Note that for nodes that are directly connected to a sleep transistor, the diffusion term also includes the diffusion capacitance of the sleep transistor. As stated before, in the sleep mode, all $C_{Gi}$ capacitances are charged to $V_{DD}$, whereas all $C_{Pi}$ capacitances are discharged to zero.
In the active mode, all $C_{Pi}$ capacitances will be charged to $V_{DD}$, while all $C_{Gi}$ capacitances will be completely discharged. Before going from the sleep to the active mode, we allow a portion of the charge of the virtual GND capacitances to migrate to the virtual V<sub>DD</sub> capacitances to reduce the overall energy consumption during the mode transition. We must thus decide on the number of CRTs, their connection points to the virtual rails, and their sizes. To answer these questions, we formulate an optimization problem in which we maximize the total energy saving ratio for charge recycling between two rows subject to at most a $\gamma$ percent violation of the wake up delay of the original circuit (i.e., the wake up delay when no charge recycling is applied). The wake up time in each case is defined as the time needed for the slowest node in the virtual GND to reach within $100 \times \delta$ percent of its final value, zero, during the sleep-active transition.

Figure 6. Equivalent circuit model during the charge recycling.

With this definition of wake up time, we can write the set of constraints as follows:

$$t_{w_i}^{(CR)} \le (1+\gamma) \times t_w \qquad \forall \ 1 \le i \le M \tag{2}$$

where $t_w$ is the wake up time of this row in the original circuit and $t_{wi}^{(CR)}$, which is defined for the circuit with the charge recycling technique, is the wake up time of the $i^{th}$ cell in the same row, i.e., the cell connected to node $G_i$ on the virtual ground line.
$t_{wi}^{(CR)}$ may be written as:

$$t_{w_i}^{(CR)} = d_i^{CR} + t_{rem_i} \qquad \forall \ 1 \le i \le M \tag{3}$$

where $d_i^{CR}$ is the charge recycling delay for node $G_i$, defined as the time it takes the voltage of node $G_i$ to drop from $V_{DD}$ to within $100 \times \delta$ percent of its final value, $\alpha V_{DD}$, and $t_{remi}$ is the remaining time needed for $G_i$ to drop from $\alpha V_{DD}$ to 0 by turning on the sleep transistor(s) after the completion of the charge recycling. From the discussion presented in [7], $\alpha$ depends on the ratio of the total capacitances on the virtual GND and virtual V<sub>DD</sub> rails. For the case of equal total capacitance on the virtual rails, we have $\alpha = 0.5$. Using (3), the constraint set in (2) may be rewritten as:

$$d_i^{CR} \le (1+\gamma) \times t_w - t_{rem_i} \qquad \forall \ 1 \le i \le M \tag{4}$$

By definition, $t_w$ is independent of the location and size of the charge-recycling transistors, and if we ignore the diffusion capacitances of the charge-recycling transistors, $t_{remi}$ is also independent of their location and size. For an already placed design with known sleep transistor sizing and placement information, $t_w$ and the $t_{remi}$ values for each row can be calculated using the Elmore delay model [9]. We use this set of constraints to solve the problem of maximizing the total energy saving ratio for adjacent standard cell rows, ESR<sub>row</sub>:

$$ESR_{row} = \frac{\left(E_{conv.} - E_{cr}\right) - E_{cr-overhead}}{E_{conv.}} = ESR - \frac{E_{cr-overhead}}{E_{conv.}} \tag{5}$$

where $E_{cr-overhead}$ is the total dynamic and leakage energy consumption in the charge recycling transistors for one complete sleep-active cycle.
From [7] we know that the first term in (5), ESR, depends only on the total capacitance ratio of the virtual ground and virtual V<sub>DD</sub> lines and does not depend on the charge recycling circuitry. Therefore, the problem of maximizing ESR<sub>row</sub> is equivalent to the problem of minimizing $E_{cr-overhead}$, or equivalently minimizing the power overhead due to the charge-recycling transistors. The total power overhead in each row can be written as the sum of the dynamic and leakage power consumptions of the charge recycling transistors:

$$P_{cr-overhead} = \sum_{i=1}^{M} C_{g_i} f V_{DD}^2 + \sum_{i=1}^{M} I_{leak_i} V_{DD} \tag{6}$$

where the first and second summation terms are the total dynamic and leakage power consumptions due to the CR transistors in the row under consideration, f is the mode transition frequency, $C_{gi}$ is the input gate capacitance of the $i^{th}$ charge-recycling transistor in the row, and $I_{leaki}$ is the sub-threshold leakage current of the $i^{th}$ charge-recycling transistor.

TABLE 1. TECHNOLOGY PARAMETERS USED FOR SIMULATIONS

| Technology parameter | $V_{DD}$ (V) | $V_{tLn}$ (V) | $V_{tLp}$ (V) | $V_{tHn}$ (V) | $V_{tHp}$ (V) | $c_{int}$ (fF/$\mu$m) | $r_{int}$ ($\Omega/\mu$m) |
|----------------------|--------------|---------------|---------------|---------------|---------------|-----------------------|---------------------------|
| Value | 1.2 | 0.39 | -0.34 | 0.54 | -0.49 | 0.166 | 0.6 |

For the purpose of this paper, the gate capacitance of the $i^{th}$ charge-recycling transistor, $C_{gi}$, can be estimated as:

$$C_{gi} = C_{ox} W_i L \tag{7}$$

where $W_i$ is the width of the $i^{th}$ charge-recycling transistor.
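
As a rough illustration of (6) and (7), the following Python sketch evaluates the overhead power for a given set of transistor widths. The function names and every numeric value except the 1.2 V supply are assumptions chosen only for illustration, and the per-transistor leakage currents are simply passed in as inputs.

```python
def gate_capacitance(c_ox, width, length):
    """Eq. (7): C_g = C_ox * W * L."""
    return c_ox * width * length


def cr_overhead_power(widths, leak_currents, c_ox, length, f_transition, vdd):
    """Eq. (6): dynamic plus leakage power of the charge-recycling transistors."""
    if len(widths) != len(leak_currents):
        raise ValueError("one leakage current per transistor is required")
    dynamic = sum(gate_capacitance(c_ox, w, length) * f_transition * vdd ** 2
                  for w in widths)
    leakage = sum(i_leak * vdd for i_leak in leak_currents)
    return dynamic + leakage


# Placeholder values: assumptions for illustration, except VDD = 1.2 V.
C_OX = 1.5e-2        # F/m^2, assumed gate-oxide capacitance per unit area
L_CHANNEL = 90e-9    # m, assumed channel length
F_MODE = 1e5         # Hz, assumed mode-transition frequency
VDD = 1.2            # V, supply voltage used in the paper

example_widths = [1e-6, 2e-6]    # m, hypothetical transistor widths
example_leaks = [1e-9, 2e-9]     # A, hypothetical leakage currents
example_power = cr_overhead_power(example_widths, example_leaks,
                                  C_OX, L_CHANNEL, F_MODE, VDD)
print(f"overhead power: {example_power:.3e} W")
```

Because the dynamic term scales with the sum of the widths, doubling every width doubles that term, which is the property the linear formulation relies on.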
The sub-threshold leakage current of the $i^{th}$ charge-recycling transistor, $I_{leaki}$, can also be written as [10]:

$$I_{leak_i} = \mu_0 \frac{\varepsilon_{ox}}{T_{ox}} \frac{W_i}{L} v_T^2 e^{1.8} \exp\left(\frac{V_{gs} - V_{th}}{S v_T}\right) \left(1 - \exp\left(-\frac{V_{ds}}{v_T}\right)\right) \tag{8}$$

where $V_{gs}$ and $V_{ds}$ are the gate-source and drain-source voltages of the charge-recycling transistor. The leakage current is important in the sleep mode, when the charge-recycling transistor is OFF and $V_{gs}=0$. Here, $V_{ds}$ for each charge-recycling transistor is the absolute voltage difference between virtual GND and virtual $V_{DD}$ at the connection nodes of that charge-recycling transistor. From (8), we can ignore the dependency of the sub-threshold leakage current of the transistor on $V_{ds}$ for $V_{ds} \geq 75$ mV, since the term $\exp(-V_{ds}/v_T)$ is then small. For a typical MTCMOS circuit this usually happens soon after the mode transition. Hence, for the purpose of our analysis, we can ignore the dependency of the leakage current of a charge-recycling transistor on its drain-source voltage. We thus conclude that the total leakage current of a charge-recycling transistor is proportional to its width. From (7) and (8), the total power overhead in (6) can be written as a linear function of the widths of the charge-recycling transistors:

$$P_{cr-overhead} = A \sum_{i=1}^{M} W_i \tag{9}$$

where A is defined as:

$$A = L C_{ox} f V_{DD}^2 + \frac{\mu_0 \varepsilon_{ox}}{L T_{ox}} V_{DD} v_T^2 e^{1.8} \exp\left(\frac{-V_{th}}{S v_T}\right) \tag{10}$$

Therefore, minimizing the power overhead is equivalent to minimizing the total charge-recycling transistor width.

Next we formulate the timing constraints in (2). There are M separate timing constraints in (2), one for each $G_i$ node in the virtual ground. All nodes in the virtual GND are charged to $V_{DD}$ in the sleep mode.
They remain charged to $V_{DD}$ until the end of the sleep mode, right before the beginning of the charge-recycling operation. Satisfying the constraints in (2) ensures that the maximum increase in the discharging time over all the nodes in the virtual ground is less than $\gamma$ percent of the wake up time of the original circuit.

Consider the discharging of node $G_i$ in Figure 6. In this figure each charge-recycling transistor is replaced by its equivalent resistive model in the linear region. The value of the equivalent resistance can be calculated as follows:

$$R_i = \frac{\eta}{W_i} \tag{11}$$

where $\eta$ is defined as:

$$\eta = \frac{L}{\mu C_{ox}(V_{DD} - V_{th})} \tag{12}$$

where L is the length of the charge-recycling transistor. There are M different resistors contributing to the charge-recycling operation in Figure 6. These resistors provide discharging paths between virtual GND and virtual $V_{DD}$. In order to simplify the discharging scenario for each node $G_i$ in the virtual GND, we replace all of the resistors $R_1$ to $R_M$ with a single equivalent resistor, $R_{eqi}$, between $G_i$ and $P_i$. Since there are M nodes in the row, there will be M equivalent resistors, $R_{eq1}$ to $R_{eqM}$, one for each node, each representing a discharging scenario. $R_{eqi}$ is defined as follows:

$$R_{eq_i} = \frac{\eta}{W_{eq_i}} = \frac{\eta}{\sum_{j=1}^{M} \left(1 - \alpha_j \left| x_i - x_j \right| \right) W_j} \qquad 1 \le i, j \le M \tag{13}$$

where $W_{eqi}$ is the width of the equivalent NMOS transistor whose linear-region resistance is $R_{eqi}$, $x_i$ and $x_j$ are the x coordinates of nodes $G_i$ and $G_j$ on the virtual GND line, and $\alpha_j$ is a coefficient defined by the designer which depends on the total capacitance at nodes $G_i$ and $P_i$, and also on the interconnect resistance per unit length of the virtual GND line.
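
The equivalent-width model in (11) to (13) can be sketched in a few lines of Python. Here the numerator of (13) is taken to be $\eta$ from (12), which is what the link to (11) implies, and the distance weights are indexed as $\alpha_j$; both are readings of the text rather than something it spells out. The function names, the mobility, the oxide capacitance, and the coordinates are hypothetical values used only to make the example concrete.

```python
def eta_coefficient(length, mobility, c_ox, vdd, vth):
    """Eq. (12): eta = L / (mu * C_ox * (V_DD - V_th))."""
    return length / (mobility * c_ox * (vdd - vth))


def equivalent_widths(widths, xs, alphas):
    """Denominator of Eq. (13): W_eq_i = sum_j (1 - alpha_j*|x_i - x_j|) * W_j."""
    m = len(widths)
    return [sum((1.0 - alphas[j] * abs(xs[i] - xs[j])) * widths[j]
                for j in range(m))
            for i in range(m)]


def equivalent_resistances(widths, xs, alphas, eta):
    """Eq. (13): R_eq_i = eta / W_eq_i for every node i."""
    return [eta / w_eq for w_eq in equivalent_widths(widths, xs, alphas)]


# Hypothetical three-node row; all lengths in metres.
XS = [0.0, 10e-6, 20e-6]          # node x coordinates (assumed)
ALPHAS = [1e4, 1e4, 1e4]          # 1/m, assumed designer coefficients
WIDTHS = [1e-6, 0.0, 2e-6]        # a zero width means no transistor at that node
ETA = eta_coefficient(length=90e-9, mobility=0.03, c_ox=1.5e-2,
                      vdd=1.2, vth=0.54)   # mobility and C_ox are assumed

W_EQ = equivalent_widths(WIDTHS, XS, ALPHAS)
R_EQ = equivalent_resistances(WIDTHS, XS, ALPHAS, ETA)
for i, (w_eq, r_eq) in enumerate(zip(W_EQ, R_EQ), start=1):
    print(f"node {i}: W_eq = {w_eq * 1e6:.2f} um, R_eq = {r_eq:.1f} ohm")
```

A transistor placed far from node $G_i$ contributes less to $W_{eq_i}$ than one placed directly at it, which is how the model captures the resistance of the rail between them.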
$W_{eqi}$ in (13) is defined as a weighted sum of the widths of all charge-recycling transistors, where the weight of each charge-recycling transistor is based on its distance from the cell under consideration. Note that $R_{eqi}$ and $W_{eqi}$ are related through (11). From (13), $W_{eqi}$ can be written as:

$$W_{eq_i} = \sum_{j=1}^{M} b_{ij} W_j \qquad \forall i \quad 1 \le i \le M \tag{14}$$

where the $b_{ij}$ coefficients are defined as follows:

$$b_{ij} = 1 - \alpha_j \left| x_i - x_j \right| \qquad 1 \le i, j \le M \tag{15}$$

Equation (14) gives the value of each $W_{eqi}$ as a linear function of all the $W_j$ values. The circuit can be further simplified by replacing the RC interconnect networks in the virtual GND and virtual $V_{DD}$ by their equivalent RC-lumped models seen at nodes $G_i$ and $P_i$, respectively. This simplified model is shown in Figure 7. The RC-lumped model elements for virtual GND, $R_i^{(G)}$ and $C_i^{(G)}$, can be calculated as [11]:

$$C_{i}^{(G)} = Y_{G,1i}, \qquad R_{i}^{(G)} = -\frac{Y_{G,2i}}{Y_{G,1i}^{2}} \tag{16}$$

where $Y_{G,1i}$ and $Y_{G,2i}$ are the first and second moments of the total admittance at node $G_i$ in the virtual GND RC tree and can be calculated from the Taylor series expansion of the total admittance at node $G_i$, $Y_{Gi}(s)$, that is:

$$Y_{G_{i}}(s) = Y_{G,1i}s + Y_{G,2i}s^{2} + \dots + Y_{G,ki}s^{k} + \dots \tag{17}$$

Figure 7. The simplified circuit model during the charge-recycling.

The elements of the RC-lumped model of the virtual $V_{DD}$ can be calculated in a similar fashion. The first and second moments of the total admittance at any node of the virtual GND or virtual $V_{DD}$ RC tree can be calculated recursively [12]. The details of this approach are omitted for brevity.

The charge recycling delay in the circuit given in Figure 7 is defined as the time it takes for the voltage of node $G_i$ to drop from $V_{DD}$ to within $100 \times \delta$ percent of its final value.
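
Since the details of the moment computation are omitted in the text, the following Python sketch shows one possible way to obtain the two moments and the lumped model of (16). It assumes the virtual rail is a simple chain of nodes joined by wire segments, with no branches, and it uses the series expansion $Y/(1+rY) = Y - rY^2 + \dots$ to push an admittance through a series resistor. The function names are hypothetical and this is not claimed to be the exact procedure of [12].

```python
def branch_moments(caps, ress):
    """First two admittance moments looking into one side of the rail.

    caps[k] is the capacitance of the k-th node away from the viewpoint and
    ress[k] is the resistance of the wire segment leading to it (nearest first).
    """
    y1, y2 = 0.0, 0.0
    # Walk from the far end of the branch back toward the viewpoint.
    for c, r in zip(reversed(caps), reversed(ress)):
        y1 += c                # a shunt capacitor adds to the first moment
        y2 -= r * y1 * y1      # series resistor: Y/(1 + rY) = Y - r*Y^2 + ...
    return y1, y2


def node_moments(i, node_caps, seg_res):
    """Moments of the total admittance at node i of a chain-shaped rail.

    seg_res[k] is the wire resistance between node k and node k + 1.
    """
    right = branch_moments(node_caps[i + 1:], seg_res[i:])
    left = branch_moments(node_caps[:i][::-1], seg_res[:i][::-1])
    y1 = node_caps[i] + left[0] + right[0]
    y2 = left[1] + right[1]
    return y1, y2


def lumped_rc(y1, y2):
    """Eq. (16): C = Y1 and R = -Y2 / Y1^2."""
    return y1, -y2 / (y1 * y1)


# Sanity check on a trivial rail: node 0 has no capacitance of its own and
# sees a 2-unit capacitor through a 3-unit resistor.
c_lumped, r_lumped = lumped_rc(*node_moments(0, [0.0, 2.0], [3.0]))
print(c_lumped, r_lumped)
```

For a single resistor in series with a single capacitor, the lumped model reproduces the original elements, which is a quick way to check the sign convention in (16).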
We can show that the charge recycling delay for node $G_i$ can be calculated as:

$$d_i^{CR} = \ln\left(\frac{1}{\delta}\right) \times \frac{\left(R_i^{(G)} + R_{eq_i} + R_i^{(P)}\right) C_i^{(G)} C_i^{(P)}}{C_i^{(G)} + C_i^{(P)}} \tag{18}$$

Substituting $R_{eq_i} = \eta / W_{eq_i}$ (see (11) and (13)) and the expression for $W_{eq_i}$ in (14) into (18), the set of constraints in (4) can be written as:

$$\sum_{j=1}^{M} b_{ij} W_{j} \ge W_{\min-i} \qquad \forall i \quad 1 \le i \le M \tag{19}$$

where $W_{\min-i}$ is a lower bound on $W_{eqi}$ and can be calculated as:

$$W_{\min-i} = \eta \left[ \frac{(1 + \gamma) t_w - t_{rem_i}}{\ln(1/\delta)} \times \frac{C_i^{(G)} + C_i^{(P)}}{C_i^{(G)} C_i^{(P)}} - R_i^{(G)} - R_i^{(P)} \right]^{-1} \tag{20}$$

Now, having defined the set of linear constraints in (19) and with the objective of minimizing the total power overhead in (9), the optimization problem can be formulated as follows and solved by standard mathematical programming packages:

$$\begin{aligned} \text{Minimize} \quad & \sum_{i=1}^{M} W_{i} \\ \text{s.t.:} \quad & \sum_{j=1}^{M} b_{ij} W_{j} \ge W_{\min-i} \qquad \forall i \quad 1 \le i \le M \\ & W_{i} \ge 0 \qquad \forall i \quad 1 \le i \le M \end{aligned} \tag{21}$$

The optimization problem defined in (21) is a linear programming (LP) problem, and thus it is solvable in polynomial time.

### V. SIMULATION RESULTS

ISCAS-85 benchmark circuits have been used in this paper. We use SIS to generate optimized gate-level netlists. All the benchmark circuits are first optimized using "script.rugged" in SIS. We use a 90nm technology library to perform timing-driven technology mapping. Only one sleep transistor is used per cell row. The placement of the sleep transistors is fixed, and the leftmost corner of each cell row is reserved for sleep transistor placement. The sleep transistor of each row is then sized for a maximum 10% delay penalty.
After sleep transistor sizing and placement, we extract the resulting gate-level netlist as well as the virtual ground and virtual $V_{DD}$ interconnect values into a file, which is the output of SIS. We use this information to calculate the $b_{ij}$ values in (15) and the $W_{min-i}$ values in (20). Table 1 shows the technology parameters that we have used for our simulations in this paper. After calculating the $b_{ij}$ and $W_{min-i}$ values, we pass them to an LP solver to solve the optimization problem in (21). MATLAB is used to solve the LP problem in (21) in this paper. Finally, knowing the total virtual rail capacitance value for each row and the total required charge recycling transistor width for every pair of rows, we can calculate the total energy overhead in (5). Here we only consider the dynamic energy overhead.

Table 2 shows the results for the ISCAS-85 benchmark circuits. Since there is no known method for sizing and placement of charge-recycling transistors, we compare the proposed technique with two other schemes. In the first scheme, which we call single charge recycling MTCMOS (single CRMTCMOS), we use only one charge recycling transistor between two cell rows, connecting the virtual GND and virtual $V_{DD}$ lines at x=0, i.e., at the leftmost corners of both lines. The second scheme, which is called uniform CRMTCMOS, uses three charge-recycling transistors per cell row. The three charge-recycling transistors are uniformly distributed along the virtual GND or virtual $V_{DD}$ lines. We find the minimum size of the charge-recycling transistors in the single and uniform CRMTCMOS schemes such that the wake up time violation is at most $\gamma$ percent compared to the wake up time of the original MTCMOS circuit. Then we compare the energy saving ratios for these cases. According to Table 2, the ESR value of the proposed approach is, on average, 18.5 and 8.8 percentage points greater than that of the single CRMTCMOS and uniform CRMTCMOS schemes, respectively.
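
To make the structure of the LP in (21) concrete, the sketch below solves a tiny instance in pure Python. It is not the solver used in the paper (MATLAB); it is a brute-force enumeration of the vertices of the feasible region, which is only practical for a handful of variables. The function names and the example coefficients are hypothetical.

```python
import itertools


def solve_linear_system(a, b):
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def min_total_width(b, w_min, tol=1e-9):
    """Minimise sum(W) subject to B W >= w_min and W >= 0, as in Eq. (21)."""
    n = len(w_min)
    # Write every constraint in the common form row . W >= rhs.
    rows = [list(r) for r in b]
    rows += [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    rhs = list(w_min) + [0.0] * n
    best = None
    # A vertex is the intersection of n constraints that hold with equality.
    for active in itertools.combinations(range(2 * n), n):
        w = solve_linear_system([rows[k] for k in active],
                                [rhs[k] for k in active])
        if w is None:
            continue
        feasible = all(sum(c * x for c, x in zip(rows[k], w)) >= rhs[k] - tol
                       for k in range(2 * n))
        if feasible and (best is None or sum(w) < sum(best) - tol):
            best = w
    return best


# Hypothetical three-node row: b_ij = 1 - 0.01 * |x_i - x_j| with x = 0, 10, 20.
B_EXAMPLE = [[1.0, 0.9, 0.8],
             [0.9, 1.0, 0.9],
             [0.8, 0.9, 1.0]]
W_MIN_EXAMPLE = [2.0, 2.0, 2.0]
w_opt = min_total_width(B_EXAMPLE, W_MIN_EXAMPLE)
print("widths:", w_opt, "total:", sum(w_opt))
```

In this symmetric example the minimum total width is 20/9, and it can be reached with a single transistor at the middle node. For realistic row sizes a proper LP solver should be used instead of enumeration, since the number of candidate vertices grows combinatorially with M.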
Next we discuss the effect of the sleep and active durations on the total energy saving ratio that is achieved using CRMTCMOS. For charge recycling to provide the maximum ESR, the sleep period of the circuit must be long enough that the virtual GND and virtual $V_{DD}$ lines finish their full voltage transitions before the charge-recycling operation begins at the end of the sleep period. On the other hand, if the sleep period is too long, the overhead of the charge-recycling approach will increase because of the additional leakage path through the charge-recycling transistors [7]. This leads us to look for a range of appropriate values for the active and sleep durations. Fortunately, our simulations show that the charge-recycling approach works well over an acceptable range of active/sleep durations. In order to find appropriate ranges for the active/sleep durations, we fixed the active mode duration and measured the saving achieved for different sleep mode durations. Figure 8 shows the results of HSPICE simulations for a chain of inverters in 90nm technology. Each curve represents a fixed active duration. Figure 8 indicates that for a given active duration, there is an optimum sleep duration which results in the maximum ESR. Figure 8 also shows that the total ESR decreases as the sleep duration increases further. That is because the total saving is fixed while the total leakage overhead keeps increasing. However, since the charge recycling transistors are high-V<sub>t</sub> devices, the leakage overhead is very low, which results in high ESR values (20%) even for long sleep durations.

### VI. CONCLUSIONS

There is no known prior work addressing the placement and sizing problems of charge-recycling MTCMOS (CRMTCMOS). In this paper, for the first time, we addressed and solved the placement and sizing problems for CRMTCMOS in the presence of RC interconnects.
We showed that the placement and sizing problems for CRMTCMOS in the presence of RC interconnects can be formulated as an LP problem, and hence, can be efficiently solved using standard mathematical programming packages. The technique can save up to 44% of the switching energy due to mode transition.

Figure 8. Energy saving ratio (%) versus sleep period for 3 different fixed active periods for a chain of inverters in 90nm technology operating at a 4 GHz clock frequency.

#### REFERENCES

- [1] J. Kao, S. Narendra, and A. Chandrakasan, "Subthreshold leakage modeling and reduction techniques," in *Proc. Int'l Conf. on Computer-Aided Design*, pp. 141-148, Nov. 2002.
- [2] S. Mutoh et al., "1-V power supply high-speed digital circuit technology with multi threshold-voltage CMOS," *IEEE JSSC*, vol. 30, pp. 847-854, Aug. 1995.
- [3] J. Kao, A. Chandrakasan, and D. Antoniadis, "Transistor Sizing Issues and Tool for Multi Threshold CMOS Technology," in *Proc. Design Automation Conference*, pp. 409-414, 1997.
- [4] J. Kao, S. Narendra, and A. Chandrakasan, "MTCMOS hierarchical sizing based on mutual exclusive discharge patterns," in *Proc. Design Automation Conference*, pp. 495-500, 1998.
- [5] M. Anis, S. Areibi, and M. Elmasry, "Design and Optimization of Multithreshold CMOS (MTCMOS) Circuits," *IEEE Transactions on CAD of Integrated Circuits and Systems*, October 2003.
- [6] V. Khandelwal and A. Srivastava, "Leakage Control through Fine-Grained Placement and Sizing of Sleep Transistors," in *Proc. Int'l Conference on Computer Aided Design*, pp. 533-536, 2004.
- [7] E. Pakbaznia, F. Fallah, and M. Pedram, "Charge recycling in MTCMOS circuits: concept and analysis," in *Proc. Design Automation Conference*, pp. 97-102, 2006.
- [8] A. Abdollahi, F. Fallah, and M. Pedram, "An effective power mode transition technique in MTCMOS circuits," in *Proc. Design Automation Conference*, pp. 37-42, 2005.
- [9] W. C. Elmore, "The Transient Response of Damped Linear Network with Particular Regard to Wideband Amplifier," *J. Appl. Phys.*, vol. 19, no. 1, pp. 55-63, 1948.
- [10] S. Mukhopadhyay and K. Roy, "Modeling and Estimation of Total Leakage Current in Nano-scaled CMOS Devices Considering the Effect of Parameter Variation," in *Proc. Int'l Symp. on Low Power Electronics and Design*, pp. 172-175, 2003.
- [11] P. R. O'Brien and T. L. Savarino, "Modeling the Driving-Point Characteristics of Resistive Interconnect for Accurate Delay Estimation," in *Proc. of IEEE Int'l Conf. on Computer Aided Design*, pp. 512-515, 1989.
- [12] A. B. Kahng and S. Muddu, "Improved effective capacitance computations for use in logic and layout optimization," in *Proc. of VLSI Design*, pp. 578-582, 1999.

Table 2. Comparing energy consumption of the proposed scheme with single CRMTCMOS and uniform CRMTCMOS schemes ($\gamma$=10%).

| Circuit | # of cells | # of rows | Total sleep tx width | Total charg | | | | ESR (proposed) (%) | Comparison (%) | |
|---------|------------|-----------|----------------------|-------------|---|---|---|--------------------|----------------|---|
| | | | | MTCMOS | Single CRMTCMOS | Uniform CRMTCMOS | Proposed CRMTCMOS | | Proposed vs. single | Proposed vs. uniform |
| 9Sym | 276 | 4 | 7152 | 12 | 10 | 8 | 7 | 42 | 25 | 8 |
| C432 | 204 | 2 | 4600 | 8 | 5.5 | 4.9 | 4.5 | 44 | 13 | 5 |
| C880 | 432 | 6 | 9936 | 17 | 14.1 | 12 | 10 | 41 | 24 | 12 |
| C1355 | 526 | 6 | 11320 | 21 | 15.6 | 14.3 | 12 | 43 | 17 | 11 |
| C3540 | 1295 | 10 | 30656 | 75 | 53 | 49 | 42 | 44 | 15 | 9 |
| C5315 | 1727 | 10 | 38992 | 123 | 88 | 77 | 67 | 46 | 17 | 8 |
| average | - | - | - | 42.6 | 31 | 27.5 | 23.7 | 44.4 | 18.5 | 8.8 |
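
The last three columns of Table 2 can be cross-checked from its four energy columns. The Python sketch below assumes that the ESR is the relative saving of the proposed scheme over conventional MTCMOS and that each comparison column is the difference between two ESR values in percentage points. This reading is an interpretation that reproduces the tabulated numbers to within rounding, not a definition stated in the text.

```python
def esr_percent(e_conventional, e_scheme):
    """Energy saving ratio relative to conventional MTCMOS, in percent."""
    return 100.0 * (e_conventional - e_scheme) / e_conventional


# Per circuit: energy for (MTCMOS, single, uniform, proposed), followed by the
# published (ESR, proposed vs. single, proposed vs. uniform) columns.
TABLE2 = {
    "9Sym": ((12, 10, 8, 7), (42, 25, 8)),
    "C432": ((8, 5.5, 4.9, 4.5), (44, 13, 5)),
    "C880": ((17, 14.1, 12, 10), (41, 24, 12)),
    "C1355": ((21, 15.6, 14.3, 12), (43, 17, 11)),
    "C3540": ((75, 53, 49, 42), (44, 15, 9)),
    "C5315": ((123, 88, 77, 67), (46, 17, 8)),
}


def recompute(energies):
    """Return (ESR, gain over single, gain over uniform) in percent."""
    mtcmos, single, uniform, proposed = energies
    esr = esr_percent(mtcmos, proposed)
    return (esr,
            esr - esr_percent(mtcmos, single),
            esr - esr_percent(mtcmos, uniform))


for name, (energies, published) in TABLE2.items():
    computed = recompute(energies)
    print(name, [f"{value:.1f}" for value in computed], "published:", published)
```

Each recomputed value lies within half a point of the published integer, which is consistent with the table having been rounded to whole percentages.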