Current Projects

 

USC SPORT: System Power Optimization and Regulation Technologies

Project URL: SPORT Lab

We investigate power estimation and low-power design of CMOS VLSI circuits and systems at all levels of abstraction. Our emphasis is on developing mathematically rigorous analysis and optimization algorithms, as well as power-aware design methodologies, for solving problems of practical interest and importance. Our most recent work has focused on dynamic power/thermal management, voltage and frequency scaling, low-power displays, ASIC design with power gating and multiple voltage islands, and current source based modeling of power and timing in VLSI circuits. More details about our ongoing projects are given below.

 

Stochastic Approaches for Dynamic Thermal Management in High Performance Microprocessor Chips

Sponsor: National Science Foundation - Computer Systems Research

Project Summary: Peak power dissipation and the resulting temperature rise have become the dominant factors limiting processor performance and a significant component of design cost. Expensive packaging and heat-removal solutions are needed to keep substrate and interconnect temperatures acceptable in high-performance microprocessors. Current thermal solutions are designed to limit the peak processor power dissipation so as to ensure reliable operation under worst-case scenarios. However, the peak power and the ensuing peak temperature are hardly ever observed in practice. Dynamic thermal management (DTM) has been proposed as a class of micro-architectural solutions and software strategies that achieve the highest processor performance under a peak temperature limit. When the chip approaches its thermal limit, a DTM controller initiates hardware reconfiguration, slowdown, or shutdown to lower the chip temperature. Possible response mechanisms include micro-architectural adaptations (e.g., fetch toggling, register file resizing, and issue width reduction) and/or on-the-fly performance adjustments (e.g., dynamic voltage and frequency scaling and functional unit shutdown). The proposed research aims to develop a new DTM solution that takes a global, predictive approach based on constructing and utilizing a continuous-time Markovian decision process model of the microprocessor chip and the application programs. The offline algorithms developed in this framework are provably optimal, whereas their online versions are easily deployable and highly flexible. The project thus produces temperature-aware policies and techniques which ensure that microprocessor chips operate within the allowed temperature zone while delivering the maximum possible performance, without being over-designed.

A Stochastic Local Hot Spot Alerting Technique -- In an ASPDAC-08 conference paper, we addressed the questions of how and when to identify and issue a hot spot alert in a microprocessor. These are important questions because temperature reports from thermal sensors may be erroneous, noisy, or arrive too late to allow effective application of thermal management mechanisms that avoid chip failure. More precisely, we presented a stochastic technique for identifying and reporting local hot spots under probabilistic conditions induced by uncertainty in the chip junction temperature and the system power state. In particular, we introduced a stochastic framework for estimating the chip temperature and the power state of the system based on a combination of Kalman filtering (KF) and a Markovian decision process (MDP) model. Experimental results demonstrated the effectiveness of the framework and showed that the proposed technique alerts about thermal threats accurately and in a timely fashion, in spite of noisy or sometimes erroneous temperature sensor readings.
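The Kalman-filtering half of such a framework can be illustrated with a minimal sketch. It assumes (none of this is from the paper) a scalar random-walk model of junction temperature, hypothetical noise variances `q` and `r`, and a fixed alert threshold; the MDP power-state model is omitted entirely. The point is only that filtering lets a single erroneous reading pass without an alert while a sustained rise still triggers one.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter for junction temperature (random-walk model, assumed)."""
    x: float         # temperature estimate (deg C)
    p: float         # variance of the estimate
    q: float = 0.05  # process-noise variance (hypothetical value)
    r: float = 4.0   # sensor-noise variance (hypothetical value)

    def step(self, z: float) -> float:
        # Predict: the random-walk model keeps x, but its uncertainty grows by q.
        self.p += self.q
        # Update: weight the new sensor reading z by the Kalman gain k.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def hot_spot_alerts(readings, threshold, x0, p0=1.0):
    """Indices of readings at which the filtered temperature reaches the threshold."""
    kf = ScalarKalman(x=x0, p=p0)
    return [i for i, z in enumerate(readings) if kf.step(z) >= threshold]
```

For example, ten readings of 70, one spurious 95, and ten more readings of 70 produce no alert at an 85 threshold, whereas a step from 70 to a sustained 100 produces alerts a few samples after the step.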

Continuous Frequency Adjustment Technique Based on Dynamic Workload Prediction -- In a VLSI Design-08 conference paper, we presented a technique for continuous frequency adjustment (CFA), which adjusts the frequencies of the various functional blocks in the system at a very fine granularity so as to minimize energy while meeting a performance constraint. A key feature of the proposed technique is that the workload characteristics of the functional blocks are captured at runtime to generate a continuously adjusted frequency value, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. The workload prediction is accomplished by solving an initial value problem (IVP). Applying CFA to a real-time system in 65nm CMOS technology, we demonstrated the effectiveness of the proposed technique, reporting a 13.6% energy saving under a performance constraint.
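The paragraph does not spell out which IVP is solved. Purely as a hedged illustration, assume the predicted workload w(t), expressed directly as a required clock frequency, obeys the first-order ODE dw/dt = (u(t) - w)/tau with initial value w(0) = w0, where u is the sampled demand and tau is a hypothetical smoothing constant. Integrating it with forward Euler yields a frequency that moves continuously instead of jumping between a few power-saving modes.

```python
def cfa_frequencies(demand, w0, tau, dt, f_min, f_max):
    """Continuously adjusted frequency trace (Hz) from per-step demand samples (Hz).

    Hypothetical model: dw/dt = (u - w) / tau with w(0) = w0, integrated by
    forward Euler with step dt; the result is clamped to [f_min, f_max].
    """
    w = w0
    freqs = []
    for u in demand:
        w += dt * (u - w) / tau          # one Euler step of the IVP
        freqs.append(min(max(w, f_min), f_max))
    return freqs
```

With dt/tau <= 1, each update is a convex combination of the previous estimate and the new sample, so the predicted frequency cannot overshoot the demand; a larger step would make it oscillate.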

A Unified Framework for System-level Design: Modeling and Performance Optimization of Scalable Networking System -- In an ISQED-07 conference paper, we presented a new unified modeling framework, called the extended queuing Petri net (EQPN), which combines extended stochastic Petri net and G/M/1 queuing models, to enable the design of reliable systems at design time while improving the accuracy and robustness of power and temperature optimization for high-speed scalable networking systems. The EQPN model is employed to represent the performance behavior of the system and to minimize its power consumption under performance constraints through mathematical programming formulations. Modeling the system with the EQPN enables users to design a reliable and optimized system at the beginning of the design cycle. The proposed system model was compared with existing stochastic models using real simulation data.
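The G/M/1 component of such a model can be evaluated numerically. For a G/M/1 queue with exponential service rate mu and interarrival-time Laplace-Stieltjes transform A*(s), the key quantity sigma is the unique root in (0, 1) of sigma = A*(mu(1 - sigma)), which exists when utilization is below 1; the mean time in system is then 1/(mu(1 - sigma)). The sketch below finds sigma by fixed-point iteration. It covers only the queuing part, not the Petri-net part of the EQPN, and the arrival processes in it are illustrative choices.

```python
import math
from typing import Callable, Tuple


def gm1_sigma(lst: Callable[[float], float], mu: float,
              tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Root in (0, 1) of sigma = A*(mu * (1 - sigma)) for a stable G/M/1 queue.

    lst: Laplace-Stieltjes transform A*(s) of the interarrival-time distribution.
    mu:  exponential service rate.
    """
    sigma = 0.0  # starting below the root, the iterates increase monotonically to it
    for _ in range(max_iter):
        nxt = lst(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("no convergence; check that utilization is below 1")


def gm1_metrics(lst: Callable[[float], float], mu: float) -> Tuple[float, float, float]:
    """Return (sigma, mean time in system, mean waiting time in queue)."""
    sigma = gm1_sigma(lst, mu)
    sojourn = 1.0 / (mu * (1.0 - sigma))
    wait = sigma / (mu * (1.0 - sigma))
    return sigma, sojourn, wait


def poisson_lst(s: float, lam: float = 1.0) -> float:
    """Exponential interarrivals at rate lam (M/M/1 case)."""
    return lam / (lam + s)


def deterministic_lst(s: float, lam: float = 1.0) -> float:
    """Constant interarrival time 1/lam (D/M/1 case)."""
    return math.exp(-s / lam)
```

For Poisson arrivals the iteration recovers the M/M/1 result sigma = rho; with the same utilization, deterministic arrivals give a smaller sigma and hence shorter delays.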

Minimizing Power Dissipation during Write Operation to Register Files -- In an ISLPED-07 conference paper, we introduced a power reduction mechanism for the write operation in register files (RegFiles), which adds a conditional charge-sharing structure to the pair of complementary bit-lines in each column of the RegFile. Because the read and write ports of the RegFile are implemented separately, it is possible to avoid precharging the bit-line pair between consecutive writes. More precisely, when the same value is written to cells in the same column of the RegFile in consecutive writes, the energy consumed by precharging the bit-line pair can be eliminated. When opposite values are written, the energy consumed in charging the bit-line pair can be reduced thanks to charge sharing. Motivated by these observations, we modified the bit-line structure of the write ports in the RegFile, removing the per-cycle bit-line precharging and employing conditional, data-dependent charge sharing. Experimental results on a set of SPEC2000INT / MediaBench benchmarks showed an average of 61.5% power savings with 5.1% area overhead and a 16.2% increase in write access delay. The lower power dissipation also resulted in a lower substrate temperature in the RegFile.
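The data dependence of the savings can be seen with a toy, normalized energy model of one RegFile column. The model is an idealization, not the paper's circuit-level analysis: in the conventional scheme each write recharges one fully discharged bit-line (1 unit); in the charge-sharing scheme a repeated value costs nothing, and an opposite value costs half a unit, because sharing charge between two equal bit-line capacitances leaves both at VDD/2, so the supply only restores the remaining VDD/2 swing.

```python
def bitline_write_energy(column_bits):
    """Return (conventional, charge_sharing) normalized bit-line energy for one column.

    Idealized assumptions: 1 unit = C * VDD**2 for one bit-line; only write cycles
    are counted; the first write in the charge-sharing scheme costs a full unit.
    """
    conventional = float(len(column_bits))  # one discharged bit-line recharged per write
    proposed = 0.0
    prev = None
    for b in column_bits:
        if prev is None:
            proposed += 1.0      # first write: full swing assumed
        elif b != prev:
            proposed += 0.5      # opposite value: charge sharing halves the supply energy
        # same value as the previous write: bit-lines already hold the right levels
        prev = b
    return conventional, proposed
```

For the write sequence 1, 1, 0, 0, 1 this toy model gives 5 units conventionally and 2 units with charge sharing; real savings depend on the value statistics of the workload and on circuit overheads the model ignores.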

Active Bank Switching for Temperature Control of the Register File in a Microprocessor -- In a GLS-VLSI-07 paper, we described an effective thermal management scheme, called active bank switching, for temperature control in the register file of a microprocessor. The idea is to divide the physical register file into two equal-sized banks and to alternate between the two banks when allocating new registers to instruction operands. Experimental results show that this periodic active bank switching scheme reduces the steady-state temperature by 3.4℃ with a mere 0.75% average performance penalty.
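The allocation side of the scheme can be sketched as a register allocator with one free list per bank. The switching period and the fall-back rule (borrow from the inactive bank when the active one is exhausted) are assumptions made for illustration; the actual rename logic in the paper may differ.

```python
from collections import deque


class BankedRegisterFile:
    """Physical register file split into two equal banks with periodic bank switching."""

    def __init__(self, num_regs: int, period: int):
        self.half = num_regs // 2
        self.free = [deque(range(self.half)), deque(range(self.half, 2 * self.half))]
        self.period = period  # cycles between bank switches (hypothetical parameter)

    def active_bank(self, cycle: int) -> int:
        return (cycle // self.period) % 2

    def allocate(self, cycle: int) -> int:
        bank = self.active_bank(cycle)
        # Allocate from the active bank so the other bank sees little activity and
        # can cool; borrow from the inactive bank only if the active one is full.
        for b in (bank, 1 - bank):
            if self.free[b]:
                return self.free[b].popleft()
        raise RuntimeError("no free physical register")

    def release(self, reg: int) -> None:
        # Return the register to the free list of the bank it belongs to.
        self.free[reg // self.half].append(reg)
```

With eight registers and a period of 100 cycles, allocations at cycles 0, 150, and 250 come from bank 0, bank 1, and bank 0, respectively.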

Dynamic Thermal Management for MPEG-2 Decoding -- In an ISLPED-06 paper, we presented an effective dynamic thermal management (DTM) scheme for MPEG-2 decoding that allows some degree of spatiotemporal quality degradation. Given a target MPEG-2 decoding time, we dynamically select either an intra-frame spatial degradation or an inter-frame temporal degradation strategy to ensure that the microprocessor chip stays in a thermally safe state of operation, albeit with a certain amount of image/video quality loss. For our experiments, we used the MPEG-2 decoder program of MediaBench, and we modified and combined Wattch and HotSpot for the power and thermal simulations and measurements, respectively. Our experimental results demonstrated that we can achieve a thermally safe state with a spatial quality degradation of 0.12 RMSE and a frame drop rate of 12.5% on average.

Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach -- In an ICCD-06 paper, we introduced a stochastic DTM technique for high-performance VLSI systems with special attention to the uncertainty in temperature observation. More specifically, we presented a stochastic thermal management framework that improves the accuracy of decision making in DTM and performs dynamic voltage and frequency scaling to minimize total power dissipation and on-chip temperature. Multi-objective optimization with the aid of a mathematical programming solver was used to reduce the operating temperature. Experimental results with a 32-bit embedded RISC processor demonstrated the effectiveness of the technique and showed that the proposed algorithm ensures thermal safety under performance constraints.
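The framework above relied on a mathematical programming solver; the following sketch swaps in plain value iteration to show how a Markovian decision model turns thermal-state dynamics into a DVFS policy. The three thermal states, two frequency levels, transition probabilities, and cost weights are all hypothetical.

```python
# Hypothetical thermal MDP: states 0 = cool, 1 = warm, 2 = hot;
# actions 0 = low frequency, 1 = high frequency.
POWER = [1.0, 3.0]            # power cost per action
PERF_LOSS = [2.5, 0.0]        # performance penalty per action
THERMAL = [0.0, 1.0, 10.0]    # penalty for occupying each thermal state
COST = [[POWER[a] + PERF_LOSS[a] + THERMAL[s] for a in range(2)] for s in range(3)]
P = [
    [[0.9, 0.1, 0.0], [0.6, 0.4, 0.0], [0.0, 0.7, 0.3]],  # low frequency
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.2, 0.8]],  # high frequency
]


def value_iteration(P, cost, gamma=0.9, tol=1e-9):
    """Minimize expected discounted cost; returns (values, policy)."""
    n_states, n_actions = len(cost), len(P)
    V = [0.0] * n_states
    while True:
        # Q[s][a] = immediate cost + discounted expected cost-to-go.
        Q = [[cost[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        V_new = [min(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            policy = [min(range(n_actions), key=Q[s].__getitem__) for s in range(n_states)]
            return V_new, policy
        V = V_new
```

Calling `value_iteration(P, COST)` yields one frequency choice per thermal state; with these weights the hot state selects the low frequency, because the expected future thermal penalty outweighs the performance loss.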

 

System-Wide Dynamic Voltage Scaling and Power Management in Battery-Powered Embedded Systems

Sponsor: National Science Foundation - Computer Systems Research

Project Summary: One of the key problems confronting computer system designers is the management and conservation of energy sources. This challenge manifests itself in a number of ways. The goal may be to extend the battery lifetime of a computer system comprising a processor and a number of memory modules, I/O cores, and bridges. This is especially important in light of the fact that power consumption in a typical portable electronic system is increasing rapidly, whereas the gravimetric energy density of its battery source is improving at a much slower pace. Other goals may be to limit the cooling requirements of a computer system or to reduce the financial burden of operating a large computing facility. The objective of this research is to develop system-wide power optimization algorithms and techniques that eliminate waste or overhead and allow energy-efficient use of the various memory and I/O devices while meeting an overall performance requirement. More precisely, this project tackles two related problems: (1) dynamic voltage and frequency scaling targeting the minimization of the total system energy dissipation, and (2) global power management in a system comprising modules that are potentially managed by their own local power management policies, yet must closely interact with one another in order to yield maximum system-wide energy efficiency. The broader impacts of this project include the development of energy-aware computer systems as the key to the cost-effective realization of a large number of high-performance applications running on battery-powered portable platforms, and the education and training of young researchers and engineers who can address the complex and intertwined energy-efficiency/performance challenges that arise in designing next-generation information technology products and services.

Flow-Through-Queue based Power Management for Gigabit Ethernet Controller -- Computer networking is beginning to support multi-gigabit data transfer rates. In an ASPDAC-07 paper, we presented an energy-efficient packet interface architecture and a power management technique for gigabit Ethernet controllers that achieve the low latency and high bandwidth demanded by extremely high frame-rate data. More specifically, we presented a predictive-flow-queue (PFQ) based packet interface architecture that adjusts the operating frequencies of various functional blocks in the system at a fine granularity so as to minimize the total system energy dissipation while meeting the performance constraints. A key feature of the proposed architecture is runtime prediction of the network traffic workload, which generates a continually adjusted operating frequency, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. Furthermore, a modeling approach based on Markov processes and queuing models is employed, which allows one to apply mathematical programming formulations for energy optimization. Experimental results with a 65nm gigabit Ethernet controller that we designed show that the proposed energy-efficient architecture and power management technique can achieve system-wide energy savings under tighter performance constraints.
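One way to picture a predictive, queue-driven frequency setting is the following sketch. It assumes, hypothetically, that the next interval's packet arrivals are predicted with an exponentially weighted moving average and that the controller picks the frequency that would drain the current backlog plus the predicted arrivals within one interval; the PFQ architecture in the paper is not claimed to work this way.

```python
def pfq_frequencies(arrivals, cycles_per_pkt, interval_s, f_min, f_max, alpha=0.5):
    """Per-interval operating frequency (Hz) from predicted packet arrivals.

    arrivals[k]: packets that actually arrive in interval k.
    The EWMA predictor and the drain-the-queue rule are illustrative assumptions.
    """
    predicted, backlog = 0.0, 0.0
    freqs = []
    for a in arrivals:
        # Frequency needed to serve the backlog plus the predicted new packets.
        f = (backlog + predicted) * cycles_per_pkt / interval_s
        f = min(max(f, f_min), f_max)
        freqs.append(f)
        capacity = f * interval_s / cycles_per_pkt  # packets served this interval
        backlog = max(0.0, backlog + a - capacity)
        predicted = alpha * a + (1 - alpha) * predicted
    return freqs
```

With a steady 1000 packets per 1 ms interval at 200 cycles per packet, the frequency starts at the floor, briefly overshoots to clear the backlog, and then settles at 200 MHz.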

Dynamic Voltage and Frequency Management Based on Variable Update Intervals for Frequency Setting -- In an ICCAD-06 paper, we developed an efficient adaptive method to perform dynamic voltage and frequency management (DVFM) for minimizing the energy consumption of microprocessor chips. Instead of using a fixed update interval, our DVFM system uses adaptive update intervals for optimal frequency and voltage scheduling. This enables the system to track workload changes rapidly so as to meet soft real-time deadlines. The method, which is based on the concept of an effective deadline, exploits the correlation between consecutive values of the workload. Because the frequency and voltage are updated at variable intervals whose lengths are set dynamically, voltage fluctuations on the power network are also minimized. The technique, which can be implemented in simple hardware and is completely transparent to the application, achieves power savings of up to 60% for highly correlated workloads compared to DVFM systems based on fixed update intervals.
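The benefit of variable update intervals can be illustrated with a simple doubling/halving rule. This is a hypothetical stand-in, not the effective-deadline method of the paper: when consecutive workload samples are close (highly correlated), the interval grows so that voltage and frequency are updated less often; when the workload shifts, the interval shrinks so the system can track it quickly.

```python
def adaptive_intervals(workload, t_min, t_max, rel_thresh=0.1):
    """Update-interval lengths that grow while the workload is stable.

    workload[k]: measured workload at the k-th update point.
    The doubling/halving rule and rel_thresh are illustrative assumptions.
    """
    interval = t_min
    intervals = [interval]
    for prev, cur in zip(workload, workload[1:]):
        change = abs(cur - prev) / max(abs(prev), 1e-12)
        if change < rel_thresh:
            interval = min(interval * 2, t_max)  # correlated workload: update less often
        else:
            interval = max(interval / 2, t_min)  # workload shifted: track it quickly
        intervals.append(interval)
    return intervals
```

For a workload that is flat for four samples and then doubles, with intervals bounded to [1, 8], the sequence of interval lengths is 1, 2, 4, 8, 4.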

Power-Aware Scheduling and Voltage Setting for Tasks Running on a Hard Real-Time System -- In an ASPDAC-06 paper, we presented a solution to the problem of minimizing energy consumption of a computer system performing periodic hard real-time tasks with precedence constraints. In the proposed approach, dynamic power management and voltage scaling techniques are combined to reduce the energy consumption of the CPU and devices. The optimization problem is initially formulated as an integer programming problem. Next, a three-phase heuristic solution, which integrates power management, task scheduling and task voltage assignment, is provided. Experimental results show that the proposed approach outperforms existing methods by an average of 18% in terms of the system-wide energy savings.
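For a handful of tasks, the integer program can be solved exactly by enumeration, which gives a useful reference point for any heuristic. The sketch assumes tasks that run back to back within one period, a hypothetical table of discrete (frequency, voltage) levels, and dynamic energy of c_eff * V^2 per cycle; it ignores device power management and precedence beyond sequential order, so it is not the three-phase heuristic described above.

```python
from itertools import product

# Hypothetical discrete operating points: (frequency in Hz, supply voltage in V).
LEVELS = [(200e6, 0.8), (300e6, 1.0), (400e6, 1.2)]


def assign_levels(cycles, deadline, levels=LEVELS, c_eff=1e-9):
    """Exhaustively choose one level per task (tasks run back to back).

    Minimizes dynamic energy sum(c_eff * V**2 * N) subject to sum(N / f) <= deadline.
    Returns (energy, choice) or None if no assignment meets the deadline.
    """
    best = None
    for choice in product(range(len(levels)), repeat=len(cycles)):
        time = sum(n / levels[i][0] for n, i in zip(cycles, choice))
        if time > deadline:
            continue
        energy = sum(c_eff * levels[i][1] ** 2 * n for n, i in zip(cycles, choice))
        if best is None or energy < best[0]:
            best = (energy, choice)
    return best
```

Enumeration grows exponentially with the number of tasks, which is why a heuristic is needed for realistic task sets.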

Hierarchical Power Management with Application to Scheduling -- In an ISLPED-05 paper, we presented a hierarchical power management (HPM) architecture that aims to facilitate power awareness in an energy-managed computer (EMC) system with multiple self-power-managed components. The proposed architecture divides the PM function into two layers: system level and component level. Although the system-level PM has detailed information about the global state of the EMC and its various computational and memory resources, it cannot directly control the power management policies of the constituent components, which are typically designed and manufactured by different IC vendors. Instead, the system-level PM resorts to adaptive service request flow regulation and online application scheduling to force the component-level PMs to function in a way that minimizes the total system energy dissipation while meeting an overall performance target. Preliminary experimental results show that HPM achieves a 25% reduction in the total system energy compared to the "best" component-level PM policies.

Dynamic Voltage and Frequency Scaling for Energy-Efficient System Design -- This talk, which was given at NSTU, Taiwan, in 2005, summarizes the results of our research in the area of dynamic voltage and frequency scaling (DVFS). More precisely, the first part of this talk describes an intra-process DVFS technique targeted toward non-real-time applications running on an embedded system platform. The key idea is to make use of runtime information about the external memory access statistics in order to perform CPU voltage and frequency scaling with the goal of minimizing the energy consumption while translucently controlling the performance penalty. The proposed DVFS technique relies on dynamically constructed regression models that allow the CPU to calculate the expected workload and slack time for the next time slot, and thus adjust its voltage and frequency in order to save energy while meeting soft timing constraints. This is in turn achieved by estimating and exploiting the ratio of the total off-chip access time to the total on-chip computation time. The proposed technique has been implemented on an XScale-based embedded system platform, and actual energy savings have been calculated from current measurements in hardware. The second part of this talk describes a DVFS technique that minimizes the total system energy consumption for performing a task while satisfying a given execution time constraint. We first show that in order to guarantee minimum energy for task execution using DVFS, it is essential to divide the system power into fixed, idle, and active power components. Next, we present a new DVFS technique, which considers not only the active power but also the idle and fixed power components of the system. This is in sharp contrast to previous DVFS techniques, which only consider the active power component. The fixed plus idle components of the system power are measured by monitoring the system power when it is idle. 
The active component of the system power is estimated at run time by a technique known as workload decomposition, whereby the workload of a task is decomposed into on-chip and off-chip components based on statistics reported by a performance monitoring unit (PMU). We have implemented the proposed DVFS technique on the BitsyX platform, an Intel PXA255-based platform manufactured by ADS Inc., and performed detailed energy measurements.
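The argument for separating the power components can be reproduced with a simple energy model. Assume, hypothetically, that a task consists of n_on on-chip cycles, whose duration scales as 1/f, plus a fixed off-chip time t_off (the two parts of a workload decomposition); that fixed plus idle power p_static is drawn for the whole execution time; and that active energy per cycle is c_eff * V^2. The operating points below are illustrative, not measured PXA255 values.

```python
# Hypothetical (frequency in Hz, supply voltage in V) operating points.
OPERATING_POINTS = [(200e6, 1.0), (300e6, 1.1), (400e6, 1.3)]


def task_energy(f, v, n_on, t_off, p_static, c_eff):
    """Return (execution time, total energy) of one task at frequency f and voltage v."""
    exec_time = n_on / f + t_off  # on-chip part scales with 1/f, off-chip part does not
    return exec_time, p_static * exec_time + c_eff * v * v * n_on


def best_frequency(n_on, t_off, deadline, p_static, c_eff=1e-9, points=OPERATING_POINTS):
    """Lowest-energy frequency that meets the deadline, or None if none does."""
    feasible = []
    for f, v in points:
        t, e = task_energy(f, v, n_on, t_off, p_static, c_eff)
        if t <= deadline:
            feasible.append((e, f))
    return min(feasible)[1] if feasible else None
```

When p_static is set to zero (active power only), the lowest feasible frequency always wins. With 2e8 on-chip cycles, 0.5 s of off-chip time, a 2 s deadline, and 1 W of fixed plus idle power, the model instead selects 400 MHz, because finishing earlier cuts the static energy by more than the extra active energy.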

 

Hardware/Software Support and Algorithms for Dynamic Backlight Scaling in TFT LCDs

Sponsor: National Science Foundation - Computing Processes and Artifacts

Project Summary: Display components have become a key focus of efforts to maximize battery lifetime in a wide range of portable, display-equipped microelectronic systems and products. A particularly effective technique for reducing the power consumption of all kinds of displays is dynamic backlight scaling, where the intensity of the backlight lamp and the LCD transmittance function are changed concurrently and in proportion, so that the same visual perception is created in the human eye at much lower levels of power consumption. This research therefore aims to develop spatiotemporal and/or color-aware backlight scaling techniques for pixel transformation of the displayed still images or video streams so as to maximize the energy saving in a target platform. The new techniques, which take advantage of the characteristics of the human visual system to minimize distortion between the original and backlight-scaled images/videos, will be implemented and demonstrated on the Apollo Testbed II hardware platform. The broader impact of the research is to significantly reduce the power consumption of typical handheld devices, increasing their discharge-cycle lifetime and thereby enabling more widespread and convenient use of such devices. The backlight dimming technology can also be applied in AC-powered systems, where the key concern is the energy cost to the individual user as well as to society at large. This technology has the potential to reduce the typical energy bill of a desktop computer by about 30% (while the system is in use). If successful, this research will expedite the introduction of advanced display technologies (such as LED-based backlighting for LCDs, or organic LED-based displays), since it will reduce their power cost without sacrificing quality.

LCD (Liquid Crystal Display) TVs are becoming mainstream in the FPD (Flat Panel Display) market. In spite of their superior performance (e.g., vivid image representation and high native resolution) compared to other types of TVs such as PDP (Plasma Display Panel) TVs, LCDs suffer from a number of well-known shortcomings, such as the motion blur artifact, low contrast ratio, and low brightness. Furthermore, backlighting for modern LCD panels is typically provided by a 2-D array of individually luminance-controlled white LEDs, each of which serves as the backlight for a fixed-size region of the LCD panel. We are currently investigating dimming and scanning of the 2-D LED array with the aid of appropriately time-shifted and duty-cycle-adjusted Pulse Width Modulation (PWM) signals. The goal is to minimize the total power dissipation of the LED array drivers while improving the static contrast ratio and eliminating the motion blur artifact in LCD TVs. More precisely, we are developing a 2-D PWM-driven backlight dimming technique which simultaneously dims certain regions of the LCD screen and sets the pixel values by applying an optimal pixel value transformation function. In addition, we are investigating a 2-D backlight scanning technique which determines a new duty cycle for the PWM signal of each white LED driver so as to preserve the original backlight intensity of the LED while ensuring that the LED can be completely turned off for a period of time during each frame. This off time, which is about 8ms in the target display system, greatly reduces motion blur. At the same time, if the pixel value updates due to the refresh operation take place during this off time, the viewer will see only the updated pixel values corresponding to the new frame and will not be subjected to effects arising from pixel value transitions while the pixels are being exposed to the backlight.
Both of the proposed ideas are being implemented in a Xilinx FPGA (Spartan 3E) and tested on a Samsung 40-inch LCD TV.
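The timing side of backlight scanning can be sketched as a per-row PWM schedule. Assumptions (not from the project description): a 60 Hz frame, the 8 ms dark window quoted above starting when each row is refreshed, rows refreshed top to bottom at a uniform rate, and, when the original on-time no longer fits into the remaining part of the frame, a higher peak LED current so that on-time times current, and hence the per-frame light output, is unchanged.

```python
def scanning_schedule(duty, rows, frame_s=1 / 60, off_s=8e-3):
    """Per-row (on_start_s, on_len_s, current_scale) for one frame of a scanning backlight.

    duty: original PWM duty cycle of an LED row (0..1).
    Each row's dark window starts at that row's refresh time; on-windows may wrap
    into the next frame. All timing parameters are illustrative assumptions.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be in [0, 1]")
    available = frame_s - off_s
    wanted = duty * frame_s              # original on-time per frame
    on_len = min(wanted, available)
    # If the original on-time does not fit, raise the peak current so that
    # on-time x current (per-frame light output) is preserved (assumption).
    current_scale = wanted / on_len if on_len > 0 else 0.0
    schedule = []
    for r in range(rows):
        refresh_t = r * frame_s / rows   # rows are refreshed one after another
        schedule.append(((refresh_t + off_s) % frame_s, on_len, current_scale))
    return schedule
```

At a 25% duty cycle the original on-time fits and the current is unchanged; at 80% only about 8.7 ms remain per frame, so the sketch scales the peak current by roughly 1.5, which a real driver would have to check against the LED ratings.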

B2Sim: A Fast Micro-Architecture Simulator Based on Basic Block Characterization -- State-of-the-art architectural simulators support cycle-accurate pipeline execution of application programs. However, simulating even a moderate-size program can take days or weeks. During the execution of a program, its behavior does not change randomly but changes over time in a predictable/periodic manner. This behavior provides the opportunity to limit the use of a pipeline simulator. More precisely, in a CODES-06 paper, we presented a hybrid simulation engine, named B2Sim for (cycle-characterized) Basic Block based Simulator, in which a fast cache simulator (e.g., sim-cache) and a slow pipeline simulator (e.g., sim-outorder) are employed together. B2Sim reduces the runtime of architectural simulation by making use of the instruction behavior within executed basic blocks. We integrated B2Sim into SimpleScalar and achieved an average speedup of 3.3x on the SPEC2000 and MediaBench programs compared to a conventional pipeline simulator, while maintaining the accuracy of the simulation results with less than 1% CPI error on average.
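The core idea of reusing per-basic-block timing can be sketched as follows. The `detailed` callable is a hypothetical stand-in for a cycle-accurate simulator such as sim-outorder, and the fixed warm-up count per block is an assumption; the sketch also ignores the cache behavior that B2Sim tracks with its fast simulator.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def hybrid_simulate(trace: Iterable[Tuple[int, int]],
                    detailed: Callable[[int, int], float],
                    warmup: int = 2) -> Tuple[float, int]:
    """Estimate total cycles for a trace of (block_id, n_instructions) pairs.

    The first `warmup` executions of each basic block are timed with the slow
    `detailed` simulator; afterwards the block's average cycle count is reused.
    Returns (estimated_cycles, number_of_detailed_calls).
    """
    stats: Dict[int, List[float]] = {}  # block_id -> [cycle_sum, samples]
    total, calls = 0.0, 0
    for block, n_insts in trace:
        s = stats.setdefault(block, [0.0, 0])
        if s[1] < warmup:
            cycles = detailed(block, n_insts)   # slow, cycle-accurate path
            s[0] += cycles
            s[1] += 1
            calls += 1
        else:
            cycles = s[0] / s[1]                # fast path: characterized block
        total += cycles
    return total, calls
```

In a loop-heavy trace, only the first few executions of each block invoke the slow simulator, which is where the speedup comes from; accuracy then depends on how stable each block's timing is.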

Backlight Dimming in Power-Aware Mobile Displays -- In a DAC-06 paper, we introduced a temporally aware backlight scaling technique for video streams. The goal is to maximize energy saving in the display system by means of dynamic backlight dimming subject to a video distortion tolerance. The video distortion comprises (1) an intra-frame (spatial) distortion component due to frame-sensitive backlight scaling and transmittance function tuning, and (2) an inter-frame (temporal) distortion component due to large-step backlight dimming across frames, modulated by the psychophysical characteristics of the human visual system. The proposed backlight scaling technique efficiently computes the flickering effect online and then uses a measure of the temporal distortion to adjust the slack on the intra-frame spatial distortion, thereby achieving a good balance between the two sources of distortion while maximizing the backlight dimming-driven energy saving in the display system and meeting an overall video quality figure of merit.
The proposed dynamic backlight scaling approach is amenable to highly efficient hardware realization and has been implemented on the Apollo Testbed II. Actual current measurements demonstrate the effectiveness of the proposed technique compared to previous backlight dimming techniques, which ignore the temporal distortion effect.

DTM: Dynamic Tone Mapping for Backlight Scaling -- In a DAC-05 paper, we presented an approach for pixel transformation of the displayed image to increase the potential energy saving of the backlight scaling method. The proposed approach takes advantage of human visual system (HVS) characteristics and tries to minimize distortion between the perceived brightness values of the individual pixels in the original image and those of the backlight-scaled image. This is in contrast to previous backlight scaling approaches which simply match the luminance values of the individual pixels in the original and backlight-scaled images. Moreover, the proposed dynamic backlight scaling approach, which is based on tone mapping, is amenable to highly efficient hardware realization because it does not need information about the histogram of the displayed image. Experimental results show that the dynamic tone mapping for backlight scaling method results in about 35% power saving with an effective distortion rate of 5% and 55% power saving for a 20% distortion rate.
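For contrast, the baseline that the paragraph mentions, matching the luminance of individual pixels, can be written in a few lines. This is not DTM itself: it assumes a linear transmittance model and normalized pixel values, and it shows where the distortion of the baseline comes from, namely pixels brighter than the dimmed backlight level clip at full transmittance.

```python
def luminance_matched(pixels, beta):
    """Baseline backlight scaling: dim the backlight to beta and boost pixels by 1/beta.

    pixels: normalized values in [0, 1]; a linear transmittance model is assumed,
    so displayed luminance beta * x' equals the original x unless x' clips at 1.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return [min(1.0, x / beta) for x in pixels]


def clipped_fraction(pixels, beta):
    """Fraction of pixels whose original value cannot be reproduced at backlight beta."""
    return sum(1 for x in pixels if x > beta) / len(pixels)
```

Dimming the backlight to 50% reproduces the pixels 0.0, 0.25, and 0.5 exactly but clips 0.9, so a quarter of this small image is distorted; a perception-based mapping trades such clipping against brightness in a less abrupt way.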

HEBS: Histogram Equalization for Backlight Scaling -- In a DATE-05 paper, we presented a method for finding a pixel transformation function that minimizes the backlight intensity while maintaining a pre-specified image distortion level for a liquid crystal display. This is achieved by first finding a pixel transformation function, which maps the original image histogram to a new histogram with a lower dynamic range. Next, the contrast of the transformed image is enhanced so as to compensate for the brightness loss that arises from backlight dimming. The proposed approach relies on an accurate definition of the image distortion, which accounts for both the pixel value differences and a model of the human visual system, and is amenable to highly efficient hardware realization. Experimental results show that histogram equalization for backlight scaling results in about 45% power saving with an effective distortion rate of 5% and 65% power saving for a 20% distortion rate. These power savings are higher than those of previously reported dynamic backlight scaling approaches.
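The first step, mapping the histogram onto a lower dynamic range, can be sketched with standard histogram equalization onto a compressed output range. This is a simplified sketch: `top` (the fraction of the full range kept, which is also the backlight factor) is a free parameter here, and the contrast enhancement and HVS-based distortion metric used in HEBS are omitted.

```python
import math
from collections import Counter
from typing import List


def equalize_to_range(pixels: List[int], levels: int, top: float) -> List[int]:
    """Histogram-equalize integer pixels in [0, levels) onto [0, top*(levels-1)].

    No output pixel exceeds top*(levels-1), so the backlight can be dimmed to
    `top` and the transmittance scaled by 1/top without clipping.
    """
    counts = Counter(pixels)
    n = len(pixels)
    cdf, running = [0.0] * levels, 0
    for v in range(levels):
        running += counts.get(v, 0)
        cdf[v] = running / n
    cdf_min = cdf[min(pixels)]
    span = top * (levels - 1)
    if cdf_min == 1.0:  # a single gray level: nothing to equalize, just clip
        return [min(v, math.floor(span)) for v in pixels]
    # Standard equalization (cdf - cdf_min) / (1 - cdf_min), scaled to the reduced range.
    return [math.floor(span * (cdf[v] - cdf_min) / (1.0 - cdf_min)) for v in pixels]
```

For an 8-bit image with half its pixels at 0 and half at 255, keeping half the range maps the pixels to 0 and 127, so the backlight can run at half intensity.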

 

Design Techniques and Tools to Enable and Enhance Coarse-Grain Power Gating in ASIC Designs

Sponsor: National Science Foundation - Computing Processes and Artifacts

Project Summary: The semiconductor industry's 2006 revenue of $261B does not accurately reflect its crucial role in enabling a $47T ($61T on a PPP basis) world economy to thrive and grow. This industry underpins the systems and technologies on which the people and governments of the world rely for future prosperity. The industry is currently facing some extraordinary challenges, including the variability of nanoscale devices and excessive power dissipation in circuits and systems. For the industry to continue to expand and prosper, it is critical to address these challenges head-on. The proposed research takes on one of these two fundamental challenges, i.e., the "power crisis". More precisely, this project focuses on coarse-grain power gating in ASIC designs, which switches entire blocks/rows of standard cells. This choice is due to the lower cost and greater leakage savings of coarse-grain power gating compared to its fine-grain counterpart, which inserts a header or footer in each standard cell of the ASIC design library. The project results are expected to include the following: (i) distributed sleep transistor placement and sizing; (ii) sleep signal scheduling to minimize the peak current demand on wakeup; (iii) mode transition energy minimization to enable more frequent mode transitions; (iv) local sleep signal generation for autonomous power gating; and (v) power gating to enable multiple power modes. This project aims to address each of these tasks by developing algorithmic or mathematical programming solutions for each step, and by developing a design flow and prototype software tools that enable widespread adoption of this important technology in ASIC design.

Coarse-Grain MTCMOS Sleep Transistor Sizing Using Delay Budgeting -- Current state-of-the-art sleep transistor sizing algorithms minimize the total sleep transistor width subject to a maximum IR voltage drop on the virtual node of each MTCMOS switch cell. In these approaches, the DC noise constraint for the virtual node of a switch cell is only indirectly related to the tolerable delay increase in the circuit. Using a single maximum IR voltage drop value for all virtual nodes over-constrains the problem. Instead, we would like to set the DC noise constraint for the virtual node of each MTCMOS switch based on the minimum tolerable delay increase (i.e., the positive timing slack) of any logic cell in the corresponding module. The voltage drop allocation on the virtual nodes of the MTCMOS switches should thus be closely related to the timing slack allocation to individual cells in the circuit. In a DATE-08 paper, we introduced a new approach for minimizing the total sleep transistor width of a coarse-grain MTCMOS circuit, assuming a given standard cell and sleep transistor placement. Our algorithm takes a maximum allowed circuit slowdown factor and produces the sizes of the various sleep transistors in the standard cell layout while considering the DC parasitics of the virtual ground net. We showed that the problem can be formulated as a sizing-with-delay-budgeting problem and solved efficiently using a heuristic sizing algorithm, which implicitly performs maximum current calculation through the sleep transistors while accounting for the different current flow paths in the virtual ground net through adjacent sleep transistors. This technique uses at least 40% less total sleep transistor width than other approaches.
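Why a single IR-drop limit over-constrains the problem can be shown with textbook first-order models. All constants are hypothetical. The sketch uses the alpha-power law to convert a tolerable slowdown into an allowed virtual-ground drop, and a linear-region sleep transistor whose width is inversely proportional to that drop. It sizes each switch independently, so it does not capture the current sharing through the virtual ground net that the paper's heuristic accounts for.

```python
# Hypothetical alpha-power-law and sleep-transistor parameters.
VDD, VTH, ALPHA = 1.0, 0.3, 1.3
K_LIN = 5e-4  # linear-region conductance per unit width, A/(V*um), hypothetical


def allowed_drop(slowdown: float) -> float:
    """Virtual-ground drop (V) that slows a gate by `slowdown` (e.g. 0.05 = 5%).

    Solves ((VDD - VTH) / (VDD - dV - VTH))**ALPHA = 1 + slowdown for dV.
    """
    return (VDD - VTH) * (1.0 - (1.0 + slowdown) ** (-1.0 / ALPHA))


def sleep_width(i_max: float, slowdown: float) -> float:
    """Width (um) of a linear-region sleep transistor passing i_max (A) at the allowed drop."""
    return i_max / (K_LIN * allowed_drop(slowdown))


def total_width(modules, uniform: bool) -> float:
    """modules: list of (i_max, tolerable_slowdown) pairs, one per switch cell.

    uniform=True applies the tightest module's slowdown to every switch cell
    (a single IR-drop limit); uniform=False budgets each module by its own slack.
    """
    tightest = min(s for _, s in modules)
    return sum(sleep_width(i, tightest if uniform else s) for i, s in modules)
```

Because the allowed drop grows with the tolerable slowdown, giving modules with more slack a larger drop budget always reduces their sleep transistor width relative to the single-limit sizing.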

Sizing and Placement of Charge Recycling Transistors in MTCMOS Circuits -- In an ICCAD-07 paper, we showed that the sizing and placement problems of charge-recycling transistors in charge-recycling multi-threshold CMOS (CR-MTCMOS) can be formulated as a linear programming problem, and hence, can be efficiently solved using standard mathematical programming packages. The proposed sizing and placement techniques allow us to employ the CR-MTCMOS solution in large row-based standard cell layouts while achieving nearly the full potential of this power-gating architecture, i.e., we achieve 44% saving in switching energy due to the mode transition in CR-MTCMOS compared to standard MTCMOS.

Charge Recycling in MTCMOS Circuits: Concept and Analysis -- Design of a suitable power gating structure (e.g., multi-threshold CMOS or super cutoff CMOS) is an important and challenging task in sub-90nm VLSI circuits, where leakage currents are significant. In designs where mode transitions are frequent, a significant amount of energy is consumed to turn the power gating structure on or off. It is thus desirable to develop a power gating solution that minimizes the energy consumed during mode transitions. In a DAC-06 paper and an IEEE SSCS DLP talk in October 2006, we described such a solution, which recycles charge between the virtual power and ground rails immediately after entering the sleep mode and just before wakeup. The proposed method can save up to 43% of the dynamic energy wasted during mode transitions while maintaining the wake-up time of the original circuit. It also reduces the peak negative voltage value and the settling time of the ground bounce.

 

Statistical Static Timing Analysis and Circuit Optimization: A Current Source Model-Based Approach

Sponsor: Semiconductor Research Corp.

Project Summary: The downscaling of layout geometries to 45nm and below has significantly increased the packing density and operating frequency of VLSI circuits. Conventional static timing analysis (STA) techniques model signal transitions as saturated ramps with known arrival and transition times and propagate these timing parameters from the circuit primary inputs to the primary outputs. However, different waveforms with identical arrival and slew (transition) times applied to the input of a logic cell or an interconnect line can result in very different propagation delays through the component, depending on the exact shape of the applied waveform. In addition, as device feature sizes shrink to 45nm and below, process variations are becoming an ever-increasing concern in the design of high-performance integrated circuits. Process variations can cause excessive uncertainty in timing calculation, which in turn calls for sophisticated analysis techniques to reduce this uncertainty.

Recent Results of the Current Source Model-Based Approach for Timing Analysis -- Our work focuses on the development of an accurate current source model (CSM) of a CMOS logic cell, with extensions to handle multiple input switching and statistical parameter variability. The work also includes the development of efficient methods to generate the CSMs of logic cells, such as those typically present in a standard cell library. Finally, the work addresses the integration of logic-cell CSMs with a waveform propagation engine to produce a highly efficient and robust CSM-based static timing analyzer.
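The basic idea of CSM-based waveform propagation, integrating a voltage-controlled output current rather than propagating only an arrival time and slew, can be sketched for a single inverter. This Python toy is not our CSM: it assumes hypothetical square-law devices (`k`, `vt` are made-up parameters), a lumped load capacitance, no Miller or internal capacitances, and forward-Euler integration of C·dVout/dt = I(Vin(t), Vout).

```python
# Toy current-source-model (CSM) evaluation of an inverter driving a load.
# Hypothetical assumptions: square-law devices, lumped load capacitance,
# no Miller/internal capacitances, forward-Euler time stepping.


def square_law(vgs, vds, k, vt):
    """Drain current (A) of a hypothetical long-channel device."""
    vov = vgs - vt
    if vov <= 0.0 or vds <= 0.0:
        return 0.0
    if vds < vov:  # linear region
        return k * (vov * vds - 0.5 * vds * vds)
    return 0.5 * k * vov * vov  # saturation


def inverter_current(vin, vout, vdd, k, vt):
    """Net current into the output node: PMOS pull-up minus NMOS pull-down."""
    i_p = square_law(vdd - vin, vdd - vout, k, vt)
    i_n = square_law(vin, vout, k, vt)
    return i_p - i_n


def propagate(vin_of_t, c_load, vdd=1.0, k=1e-4, vt=0.3, dt=1e-13, t_end=1e-9):
    """Integrate C dVout/dt = I(Vin(t), Vout) for a full input waveform."""
    t, vout = 0.0, vdd  # rising input, so the output starts high
    times, vouts = [t], [vout]
    while t < t_end:
        vout += dt * inverter_current(vin_of_t(t), vout, vdd, k, vt) / c_load
        t += dt
        times.append(t)
        vouts.append(vout)
    return times, vouts


def crossing(times, values, level):
    """First time a sampled waveform crosses `level` (linear interpolation)."""
    for i in range(1, len(values)):
        a, b = values[i - 1], values[i]
        if (a - level) * (b - level) <= 0.0 and a != b:
            return times[i - 1] + (level - a) / (b - a) * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the level")


def ramp(t0, slew, vdd=1.0):
    """Saturated-ramp input rising from 0 to vdd, starting at t0."""
    return lambda t: min(vdd, max(0.0, (t - t0) / slew * vdd))


if __name__ == "__main__":
    ts, vo = propagate(ramp(t0=50e-12, slew=50e-12), c_load=10e-15)
    t_in = 50e-12 + 25e-12  # 50% point of the input ramp
    print(f"50%-50% delay: {1e12 * (crossing(ts, vo, 0.5) - t_in):.1f} ps")
```

Because `vin_of_t` can be any function of time, the same routine accepts non-ramp input waveforms, which is exactly where a slew-only STA model loses accuracy.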

 

Optimal Design of Power Delivery Network for System on Chip

Partial support from the National Science Foundation

Project Summary: Using multiple voltage domains (also known as voltage islands) is one of the most effective techniques for minimizing overall power dissipation, both dynamic and leakage, while meeting a performance constraint. In a system designed with multiple voltage domains, the power delivery network (PDN) is responsible for delivering power at the appropriate voltage levels to the different functional blocks (FBs) on the chip. Voltage regulator modules (VRMs), which perform voltage conversion and regulation, are indispensable components of this network, and the selection of appropriate VRMs plays a critical role in the power efficiency of the PDN.

Design of an Efficient Power Delivery Network in an SoC to Enable Dynamic Power Management -- In an ISLPED-07 paper, we introduced a new technique for designing the power delivery network of an SoC to support dynamic voltage scaling. In this technique, the power delivery network is composed of two layers. In the first layer, DC-DC converters with fixed output voltages generate all the voltage levels needed by the different loads in the SoC. In the second layer, a power switch network dynamically connects the power supply terminals of each load to the appropriate DC-DC converter output in the first layer. Experimental results demonstrate the efficacy of this technique.

Optimal Selection of Voltage Regulator Modules in a Power Delivery Network -- Typically, a star configuration of VRMs, where only one VRM resides between the power supply and each FB, is used to deliver currents at the appropriate voltage levels to the different loads in the circuit. In a DAC-07 paper, we showed that a tree topology of suitably chosen VRMs between the power source and the FBs yields higher power efficiency in the PDN. We formulated the problem of selecting the best set of VRMs in a tree topology as a dynamic program and solved it efficiently.
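The objective that such a VRM selection optimizes, the total power drawn from the source, can be evaluated recursively on any star or tree topology. The Python sketch below only compares two fixed topologies; it does not implement the DAC-07 dynamic program. All efficiency values are hypothetical placeholders (real VRM efficiency depends on load current and conversion ratio), chosen so that two smaller step-down stages are more efficient than one large step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A VRM (eff < 1), a functional block (load_w > 0), or the source."""
    name: str
    eff: float = 1.0
    load_w: float = 0.0
    children: list = field(default_factory=list)


def input_power(node):
    """Power a node draws from its parent: (own load + children's draw) / eff."""
    delivered = node.load_w + sum(input_power(c) for c in node.children)
    return delivered / node.eff


def fb(name, watts):
    """A functional block modeled as a constant-power load."""
    return Node(name, load_w=watts)


# Hypothetical efficiencies for a star and a two-level tree feeding the same FBs.
STAR = Node("source", children=[
    Node("vrm_a", eff=0.80, children=[fb("FB1", 1.0)]),
    Node("vrm_b", eff=0.82, children=[fb("FB2", 0.5)]),
])
TREE = Node("source", children=[
    Node("vrm_root", eff=0.93, children=[
        Node("vrm_a2", eff=0.92, children=[fb("FB1", 1.0)]),
        Node("vrm_b2", eff=0.93, children=[fb("FB2", 0.5)]),
    ]),
])

if __name__ == "__main__":
    for label, root in (("star", STAR), ("tree", TREE)):
        p_in = input_power(root)
        print(f"{label}: {p_in:.3f} W drawn, PDN efficiency {1.5 / p_in:.1%}")
```

A selection algorithm would evaluate this same cost for candidate VRMs at each tree position; the recursion over subtrees is what makes a dynamic-programming formulation natural.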

 

Power Efficient SRAM Cell and Array Design

Partial support from the National Science Foundation

Project Summary: In many modern microprocessors, caches occupy a large portion of the die. For example, in Intel's Itanium 2 Montecito processor, more than 80% of the die is dedicated to caches. Since leakage power dissipation is roughly proportional to circuit area, the leakage power of caches is one of the major sources of power consumption in high-performance microprocessors. Our research on SRAM design focuses on leakage reduction in such memory structures, in particular on the judicious use of multiple-Vth and multiple-tox transistors in large SRAM arrays and on power-ground-gated, data-retentive SRAM cells.

Low-Leakage SRAM Design in Deep Submicron Technologies -- This January 2008 presentation has two parts. In the first part, a method based on dual-Vt and dual-Tox assignment is presented to reduce the total leakage power dissipation of SRAMs while maintaining their performance. The method is based on the observation that the read and write delays of a memory cell in an SRAM block depend on the physical distance of the cell from the sense amplifier and the decoder. The idea is thus to deploy different configurations of six-transistor SRAM cells, corresponding to different threshold voltage and oxide thickness assignments for their transistors. Unlike other techniques for low-leakage SRAM design, the proposed technique incurs neither area nor delay overhead, and it requires only a minor change in the SRAM design flow. The leakage saving achieved by this technique is a function of the high threshold voltage and oxide thickness values, as well as the number of rows and columns in the cell array. Simulation results with a 65nm process demonstrate that this technique can reduce the total leakage power dissipation of a 64 × 512 SRAM array by 33% and that of a 32 × 512 SRAM array by 40%. In the second part, a gated-supply, gated-ground data retention technique for CMOS SRAM cells is presented that enables the design of robust, ultra-low-power caches in very deep submicron CMOS technologies. We show that, for a fixed voltage difference across the power rails of the SRAM cell in standby mode, the proposed power-ground-gating (PG-gating) solution achieves significantly higher leakage power savings than either power supply (P) gating or ground (G) gating while improving the static noise margin and soft error rate. In particular, optimum ground and supply voltage levels exist for which the SRAM cell leakage is minimized subject to a hold static noise margin constraint.
When the PG-gated cell is not accessed for read/write operations, it is biased to these optimum ground and supply voltages, resulting in minimum leakage power consumption. Simulation results demonstrate that the PG-gating technique offers higher hold and read static noise margins, a lower soft error rate, and higher leakage savings than single P or G gating, at the expense of increased area overhead. The PG-gated cell also exhibits less leakage variability under process and temperature variations than single P or G gating, and its hold static noise margin is more robust to process variations. For a 64Kb SRAM array designed in 130nm CMOS technology with Vdd = 1.3V and a 180mV hold static noise margin, the leakage power of the PG-gated design is 60% lower than that of a low-power G-gated design.
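The distance-based idea in the first part of the talk can be sketched as a per-cell choice of the lowest-leakage 6T configuration that still fits within the slack left by the array's slowest cell. Everything in this Python sketch is hypothetical: the linear delay-versus-position model, the four configurations with their delay penalties and relative leakages, and the default parameters. It is not the presented method, and its output is not a substitute for the reported 65nm results.

```python
# Hypothetical configurations: (name, added access delay in ps, relative leakage).
CONFIGS = [
    ("lowVt_thinOx", 0.0, 1.00),
    ("lowVt_thickOx", 10.0, 0.70),
    ("highVt_thinOx", 15.0, 0.55),
    ("highVt_thickOx", 25.0, 0.30),
]


def assign(rows, cols, base_ps=200.0, per_row_ps=0.2, per_col_ps=0.05):
    """Pick the lowest-leakage configuration per cell that keeps the array's
    worst-case access delay unchanged; return (counts, leakage saving)."""
    def delay(r, c):
        # Hypothetical: farther from the sense amps (row) / decoder (column)
        # means a longer access delay.
        return base_ps + per_row_ps * r + per_col_ps * c

    worst = delay(rows - 1, cols - 1)  # farthest cell, fastest configuration
    counts = {name: 0 for name, _, _ in CONFIGS}
    total_leak = 0.0
    for r in range(rows):
        for c in range(cols):
            slack = worst - delay(r, c)
            name, _, leak = min((cfg for cfg in CONFIGS if cfg[1] <= slack),
                                key=lambda cfg: cfg[2])
            counts[name] += 1
            total_leak += leak
    return counts, 1.0 - total_leak / (rows * cols)


if __name__ == "__main__":
    counts, saving = assign(64, 512)
    print(counts, f"toy leakage saving: {saving:.0%}")
```

Because no cell is made slower than the original worst-case cell, the array's access time is unchanged, which mirrors the "no delay overhead" property claimed for the technique.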


 

Minimizing Leakage Power in CMOS Designs

Support from miscellaneous sources

Project Summary: In many new designs, the leakage component of power consumption is comparable to the dynamic component. Many reports indicate that, at sub-65nm CMOS technology nodes, 40% or more of the total power consumption is due to transistor leakage, and this percentage will increase with technology scaling unless effective techniques are used to bring leakage under control. This research focuses on minimizing leakage in CMOS VLSI circuits.

Minimizing Leakage Power in CMOS: Technology and Design Issues -- This tutorial, given at EPFL in July 2008, focuses on circuit techniques and design methods for minimizing leakage. The first part of the presentation provides an overview of the basic physics, technology, and scaling trends that have resulted in the significant increase in subthreshold and gate leakage currents. This part also provides an in-depth description of multiple-Vdd, multiple-Vt, and multiple-Tox techniques for leakage minimization in light of process variations and substrate temperature changes. The second part of the presentation describes a number of design optimization techniques for controlling leakage current, including state assignment, technology mapping, and precomputation-based signal guarding. It also presents runtime mechanisms for leakage control, including body bias control, transition to a minimum leakage state, and power gating.

Circuit and Design Automation Techniques for Leakage Minimization of CMOS VLSI Circuits -- This tutorial, given at Samsung Research in October 2006, focuses on circuit techniques and design methods for leakage minimization in CMOS VLSI circuits. The first part of the presentation provides an overview of the basic physics, technology, and scaling trends that have resulted in the significant increase in subthreshold and gate leakage currents. This part also provides an in-depth description of multiple-Vdd, multiple-Vt, and multiple-Tox techniques for leakage minimization in light of process variations and substrate temperature changes, and it addresses the use of high-permittivity gate dielectrics, metal gates, novel device structures, and circuit-based techniques for controlling the gate tunneling current. The second part of the presentation describes a number of design optimization techniques for controlling leakage current, including state assignment, technology mapping, and precomputation-based signal guarding. It also presents runtime mechanisms for leakage control, including body bias control, transition to a minimum leakage state, and power gating.

 

Battery Aware Hierarchical Wireless Sensor Network for Distributed Data Collection

Project Summary: Wireless sensor networks (WSNs) have gained considerable attention in applications where spatially distributed events are to be monitored. Recent technological advances have led to the emergence of small battery-powered sensors with considerable processing and communication capabilities. We consider a distributed, hierarchical wireless sensor network of energy-constrained nodes. Each node in this network has limited computation and storage resources, wireless communication capability, and a limited energy source in the form of a battery. This network of autonomous nodes performs collaborative problem solving, such as providing situational and tactical awareness to first responders in an emergency, carrying out automatic intrusion detection and deterrence, or recognizing and tracking objects. The problem of interest is maximizing the network lifetime while meeting a minimum quality-of-service requirement subject to performance constraints (e.g., response time). Energy is treated as a key network resource that must be allocated and dispensed properly to maximize the network lifetime. We analyze network and wireless link properties and develop protocols that account for the effects of extreme variations in wireless link dependability, the many-to-one nature of communication in a mixed multi-tier WSN, local high-contention nodes in the network, and the relatively high cost of maintenance. This research treats battery awareness of a monitoring sensor network as an intrinsic aspect of the distributed data collection task. This project will produce battery-aware algorithms and techniques for wireless sensor network design and deployment as the key enabler for cost-effective realization of many applications. The broader impact of this project will be to assist the critical ongoing efforts to deploy networks of energy-constrained sensors and distribution/collection nodes for environmental, medical, and security applications.

Lifetime-Aware Hierarchical Wireless Sensor Network Architecture with Mobile Overlays -- With power efficiency and lifetime awareness becoming critical design concerns, we focus on the energy-aware design of the different layers of the WSN protocol stack. In a RAW-07 conference paper, we presented and analyzed a hierarchical wireless sensor network with mobile overlays, along with a mobility-aware multi-hop routing scheme, in order to optimize network lifetime, delay, and local storage size. We also show how certain physical layer attributes may affect the overall network lifetime; more specifically, we have investigated how certain adaptive modulation schemes may affect energy balancing in the network and hence its lifetime. Finally, we are investigating new lifetime models that can be used to obtain more practical design criteria for energy-aware system design.

 

Controlling Uncertainty and Handling Variability in System-Level Dynamic Power Management

Project Summary: Variability represents diversity or heterogeneity in a well-characterized population. Fundamentally a property of nature, variability is usually not reducible through further measurement or study. For example, different dies have different leakage power dissipations, no matter how carefully we measure them. Uncertainty represents partial ignorance or lack of perfect information about poorly characterized phenomena or models. Fundamentally a property of the observer, uncertainty is usually reducible through further measurement or study. For example, even though an observer may not know the leakage power dissipation of every die coming out of a manufacturing plant, he or she can take more samples to gain additional (albeit still imperfect) information about the leakage power distribution. With increasing variability in the characteristics of nanoscale CMOS devices and VLSI interconnects and continued uncertainty in the operating conditions of VLSI circuits, achieving power efficiency and high performance in electronic systems under process, voltage, and temperature variations, as well as current stress, device aging, and interconnect wear-out, has become a daunting yet vital task. This research tackles the problem of system-level dynamic power management (DPM) in systems that are manufactured in nanoscale CMOS technologies and operated under widely varying conditions over their lifetime. Such systems are greatly affected by increasing levels of process variations, typically materializing as random or systematic sources of variability in device and interconnect characteristics, and by widely varying workloads and temperature fluctuations, usually appearing as sources of uncertainty. At the system level, this variability and uncertainty are beginning to undermine the effectiveness of traditional DPM approaches.
It is thus critically important that we develop the mathematical basis and practical applications of a variability-aware, uncertainty-reducing DPM approach with the following unique features and capabilities.

Improving the Efficiency of Power Management Techniques by Using Bayesian Classification -- In an ISQED-08 paper, we presented a supervised-learning-based dynamic power management (DPM) framework for a multicore processor, in which a power manager (PM) learns to predict the system performance state from readily available input features (such as service queue occupancy and task arrival rate) and then uses the predicted state to look up the optimal power management action in a precomputed policy table. The motivation for using supervised learning in the form of a Bayesian classifier is to reduce the overhead of the PM, which must repeatedly determine and issue voltage-frequency setting commands to each processor core in the system. Experimental results show that the proposed Bayesian-classification-based DPM technique achieves system-wide energy savings under rapidly and widely varying workloads.
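The classify-then-look-up structure described above can be sketched with a minimal Gaussian naive Bayes classifier written in plain Python. The training samples, the three state labels, and the voltage-frequency policy table are all hypothetical, and naive Bayes with Gaussian features is one simple choice of Bayesian classifier, not necessarily the one used in the ISQED-08 work.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes classifier (features assumed independent)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.priors, self.stats = {}, {}
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # A small floor keeps every variance strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                         for col, m in zip(cols, means)]
            self.stats[label] = (means, variances)
        return self

    def log_posterior(self, x, label):
        """Unnormalized log posterior: log prior + sum of Gaussian log-likelihoods."""
        means, variances = self.stats[label]
        lp = math.log(self.priors[label])
        for xi, m, var in zip(x, means, variances):
            lp -= 0.5 * math.log(2.0 * math.pi * var) + (xi - m) ** 2 / (2.0 * var)
        return lp

    def predict(self, x):
        return max(self.stats, key=lambda label: self.log_posterior(x, label))


# Hypothetical training data: (queue occupancy, task arrival rate per ms).
X = [(1, 0.5), (2, 0.8), (1, 0.6), (0, 0.4),
     (5, 2.0), (6, 2.5), (4, 1.8), (5, 2.2),
     (10, 4.5), (12, 5.0), (11, 4.0), (9, 4.8)]
Y = ["low"] * 4 + ["medium"] * 4 + ["high"] * 4

# Hypothetical precomputed policy table: state -> (Vdd in V, frequency in MHz).
POLICY = {"low": (0.8, 600), "medium": (1.0, 1200), "high": (1.2, 2000)}

pm = GaussianNaiveBayes().fit(X, Y)


def decide(features):
    """Classify the current performance state, then look up the stored action."""
    return POLICY[pm.predict(features)]
```

At run time the PM only evaluates a few log-likelihoods and one table lookup per decision, which illustrates why a classifier-plus-table design keeps the manager's overhead low.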

Resilient Dynamic Power Management under Uncertainty -- In a DATE-08 paper, we presented a stochastic framework to improve the accuracy of decision making during dynamic power management while accounting for uncertainties induced by the manufacturing process and/or the design. More precisely, the uncertainties are captured by a partially observable semi-Markov decision process, and the policy optimization problem is formulated as a mathematical program based on this model. Experimental results with a RISC processor in 65nm technology demonstrate the effectiveness of the technique and show that the proposed uncertainty-aware power management technique achieves system-wide energy savings under statistical circuit parameter variations.
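The core mechanism of partially observable power management, maintaining a belief over hidden states from noisy observations, can be sketched with a discrete-time Bayes filter. This is a plainly simplified substitute for the DATE-08 model: it uses a discrete-time POMDP rather than a semi-Markov one, a greedy expected-cost action rule instead of a mathematically optimized policy, and hypothetical states, actions, transition, observation, and cost tables.

```python
# Hypothetical two-state model: hidden state 0 = "nominal", 1 = "stressed"
# (e.g., a leaky, hot operating regime); actions 0 = "keep V/f",
# 1 = "scale down"; observations 0 = "sensor reads cool", 1 = "sensor reads hot".
T = [  # T[a][s][s2]: state transition probabilities under action a
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.95, 0.05], [0.3, 0.7]],
]
O = [  # O[a][s2][o]: probability of observing o in new state s2 (noisy sensor)
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.8, 0.2], [0.3, 0.7]],
]
COST = [[1.0, 5.0], [2.0, 2.5]]  # COST[a][s]: hypothetical energy/performance penalty


def belief_update(belief, a, o):
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    n = len(belief)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * belief[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / z for p in unnorm]


def choose_action(belief):
    """Greedy one-step rule: minimize expected cost under the current belief."""
    return min(range(len(COST)),
               key=lambda a: sum(b * c for b, c in zip(belief, COST[a])))


if __name__ == "__main__":
    b = belief_update([0.5, 0.5], a=0, o=1)  # a "hot" reading while keeping V/f
    print(b, "-> action", choose_action(b))
```

Acting on the belief rather than on the raw sensor reading is what lets such a manager remain robust when observations are only imperfectly correlated with the true operating state.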

 

Design Methodologies and Techniques for Optimizing Power Consumption and Performance in Pipeline Circuits

Project Summary: Excessive power dissipation and the resulting temperature rise have become key factors limiting processor performance and a significant component of its cost. In modern microprocessors, expensive packaging and heat removal solutions are required to achieve acceptable substrate and interconnect temperatures. Due to their high utilization, the pipeline circuits of a high-performance microprocessor are major contributors to its overall power consumption and, consequently, one of the main sources of heat generation on the chip. Our research aims to develop techniques for minimizing power consumption in pipeline circuits at different design levels and, at the same time, to produce guidelines and tools for optimizing their power dissipation.

A Mathematical Solution to Power Optimal Pipeline Design by Utilizing Soft Edge Flip Flops -- In an ISLPED-08 paper, we presented a technique for reducing the power consumption of a synchronous linear pipeline based on the idea of using soft-edge flip-flops (SEFFs) for time borrowing and voltage scaling in the pipeline stages. We described a unified methodology for optimally selecting the supply voltage level of a linear pipeline and optimizing the transparency window of the SEFFs so as to achieve minimum power consumption subject to a total computation time constraint. We formulated the problem as a quadratic program that can be solved optimally in polynomial time. Our experimental results demonstrated that this technique is quite effective in reducing the power consumption of a pipeline circuit under a performance constraint. Next, we will improve the pipeline stages by using optimally designed flip-flops. We will also consider the effect of higher-order constraints, such as the interdependency between setup and hold times, and then generalize the problem to nonlinear pipelines with multi-stage feed-forward and feedback paths.
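Why time borrowing enables a lower supply voltage can be illustrated with a small feasibility check. This Python sketch is not the ISLPED-08 quadratic program: it fixes the transparency window instead of optimizing it, searches for a single supply voltage by bisection, ignores setup/hold times, assumes the last stage is captured on a hard edge, and uses a hypothetical alpha-power-law delay model with made-up stage delays.

```python
def delay_scale(v, v_nom=1.0, vt=0.3, alpha=1.3):
    """Alpha-power-law delay at supply v relative to v_nom: V / (V - Vt)**alpha."""
    return (v / (v - vt) ** alpha) / (v_nom / (v_nom - vt) ** alpha)


def borrow_profile(stage_delays, period, window):
    """Time each stage borrows from the next through soft-edge flip-flops.

    Returns None if any borrow exceeds the transparency window, or if the
    last stage borrows (its output is assumed to be captured on a hard edge).
    """
    borrowed, profile = 0.0, []
    for d in stage_delays:
        # Data launched `borrowed` late still needs d; the excess over one
        # clock period is borrowed from the following stage.
        borrowed = max(0.0, borrowed + d - period)
        if borrowed > window:
            return None
        profile.append(borrowed)
    return profile if profile[-1] == 0.0 else None


def min_voltage(nominal_delays, period, window, v_nom=1.0, vt=0.3, iters=60):
    """Lowest single supply voltage that keeps the pipeline feasible (bisection)."""
    def feasible(v):
        scaled = [d * delay_scale(v, v_nom, vt) for d in nominal_delays]
        return borrow_profile(scaled, period, window) is not None

    if not feasible(v_nom):
        return None
    lo, hi = vt + 1e-3, v_nom
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    stages = [0.80, 0.95, 0.60, 0.85]  # hypothetical stage delays (ns) at 1.0 V
    v_hard = min_voltage(stages, period=1.0, window=0.0)  # ordinary flip-flops
    v_soft = min_voltage(stages, period=1.0, window=0.2)  # SEFFs, 0.2 ns window
    print(f"hard-edge: {v_hard:.3f} V, soft-edge: {v_soft:.3f} V, "
          f"dynamic energy ratio ~ {(v_soft / v_hard) ** 2:.2f}")
```

With hard edges the slowest stage alone sets the minimum voltage, whereas the soft edges let that stage borrow from a faster neighbor, so the whole pipeline can run at a lower supply for the same clock period.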

 

Performance and Reliability Analysis and Optimization in Sub-45nm CMOS Circuits

Project Summary: With CMOS technology in the nanometer regime, reliability is becoming a major design concern, and designers will likely need to make power-performance-reliability tradeoffs at all levels of VLSI circuit and system design. In this area, our current research focuses on building accurate, fast, and easy-to-use fault and reliability device models and incorporating these models into CAD tools. Because of reliability concerns, the physical scaling of CMOS has already slowed. Many emerging nanotechnologies offer devices an order of magnitude smaller than CMOS, but all of them fall far below CMOS in terms of reliability. Our current research also focuses on discovering new hybrid architectures that promise continued VLSI scaling at the system level in future technologies.

Probabilistic Error Propagation in a Logic Circuit Using the Boolean Difference Calculus -- A gate-level probabilistic error propagation model is presented that takes as input the Boolean function of the gate, the signal probabilities (i.e., the probability of each signal being "1") and error probabilities at the gate inputs, and the gate error probability, and generates the error probability at the output of the gate. The model uses the Boolean difference calculus and can be applied efficiently to calculating the error probability at the primary outputs of a multi-level Boolean circuit, with a time complexity that is linear in the number of gates in the circuit. This is done by starting from the primary inputs and moving toward the primary outputs using a post-order (reverse DFS) traversal. Experimental results demonstrate the accuracy and efficiency of the proposed approach compared with other known methods for error calculation in VLSI circuits.
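The per-gate computation and the single input-to-output pass can be sketched in Python. For one input x, the Boolean difference f|x=0 XOR f|x=1 is exactly the condition under which a flip of x reaches the output; this sketch checks that condition by enumerating input values and flip patterns, which is equivalent for a single gate and also covers simultaneous errors on several inputs, rather than manipulating Boolean differences symbolically. It assumes independent inputs (reconvergent fanout is ignored), and the gate functions and circuit format are hypothetical.

```python
from itertools import product


def gate_error(fn, p_one, p_err, eps):
    """Return (P(output = 1), P(output erroneous)) for a single gate.

    fn    : Boolean function of the gate on a tuple of 0/1 input values
    p_one : error-free signal probabilities of the inputs
    p_err : error (flip) probabilities of the inputs
    eps   : probability that the gate itself flips its output
    Inputs are assumed independent, so reconvergent fanout is ignored.
    """
    n = len(p_one)
    p_out_one, p_prop = 0.0, 0.0
    for vals in product((0, 1), repeat=n):
        p_vals = 1.0
        for v, p in zip(vals, p_one):
            p_vals *= p if v else 1.0 - p
        good = fn(vals)
        p_out_one += p_vals * good
        for flips in product((0, 1), repeat=n):
            p_flips = 1.0
            for f, e in zip(flips, p_err):
                p_flips *= e if f else 1.0 - e
            # Boolean-difference condition: do these flips change the output?
            if fn(tuple(v ^ f for v, f in zip(vals, flips))) != good:
                p_prop += p_vals * p_flips
    # A gate error flips the output; a propagated error plus a gate error cancel.
    return p_out_one, eps * (1.0 - p_prop) + (1.0 - eps) * p_prop


def AND(v): return int(all(v))
def OR(v): return int(any(v))
def XOR(v): return sum(v) % 2
def NOT(v): return 1 - v[0]


def propagate_errors(circuit, primary_inputs, eps):
    """circuit: list of (output, fn, input names) in topological order.
    primary_inputs: name -> (P(=1), P(error)). One pass, linear in gate count."""
    sig = dict(primary_inputs)
    for out, fn, ins in circuit:
        sig[out] = gate_error(fn, [sig[i][0] for i in ins],
                              [sig[i][1] for i in ins], eps)
    return sig
```

Listing gates in topological order plays the role of the post-order traversal described above: every gate is visited once, after all of its fanin signals have been computed.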