# A Programmable 34 nW/Channel Sub-Threshold Signal Band Power Extractor on a Body Sensor Node SoC

Alicia M. Klinefelter, *Student Member, IEEE*, Yanqing Zhang, *Student Member, IEEE*, Brian Otis, *Senior Member, IEEE*, and Benton H. Calhoun, *Senior Member, IEEE* 

Abstract—We present a synthesizable, sub-threshold, four-channel signal band power extractor for a batteryless body sensor node system on chip (SoC). The power extractor consists of a programmable 30-tap finite impulse response (FIR) filter and signal power circuit (SPC). The filter uses a serial, resource-shared architecture to reduce area, leakage, and power. The FIR supports a programmable number of taps, number of active channels, and coefficient register data. The SPC uses power-of-two arithmetic for reduced complexity, area, and power. The design was synthesized in 130-nm CMOS and consumes 34 nW (32 nW for FIR, 2 nW for SPC) per channel at 350 mV and 29 kHz.

*Index Terms*—Body sensor node (BSN), electroencephalography (EEG), finite impulse response (FIR) filter, spectral analysis, sub-threshold, ultra-low-power (ULP).

## I. INTRODUCTION

Wireless sensor nodes, body sensor nodes (BSNs), and other ultra-low-power (ULP) systems demand energy-efficient signal processing to meet their stringent power budgets. For example, a BSN system on chip (SoC) with four input channels for electrocardiography (ECG), electroencephalography (EEG), and electromyography (EMG) data (collectively, ExG), along with an ADC, microcontroller (MCU), radio, power management, and energy harvesting, runs entirely on power harvested from body heat with no battery [1]. To enable this, the chip's average power must stay below the harvested power, which is typically in the 30–50 μW range. When extracting heart rate from ECG and sending regular RF updates, the entire chip consumes just 19 μW [1]. Because the chip relies on harvested power, each processing block has a tight power budget so that the total average chip power stays below the harvested power and the energy on the storage capacitor is not depleted.

Filtering is frequently required for processing ExG data, and spectral power analysis is used in various neural applications, so we present a custom ULP finite impulse response (FIR) accelerator block and signal power circuit (SPC) integrated on the SoC in [1].

Manuscript received July 18, 2012; revised September 18, 2012; accepted October 26, 2012. Date of publication December 21, 2012; date of current version February 1, 2013. This brief was recommended by Associate Editor M. Alioto.

Digital Object Identifier 10.1109/TCSII.2012.2231041

TABLE I EEG Frequency Bands of Interest [2]

| Neurological State                | Frequency Band    |
|-----------------------------------|-------------------|
| Visual processing/motor planning  | 8–12 Hz (α)       |
| Awake and alert                   | 18–26 Hz (β)      |
| Consciousness/Awareness (Present) | 70–100 Hz (γ)     |
| Consciousness/Memories (Past)     | 30–50 Hz (low-γ)  |
| Light Sleep                       | 4–7 Hz (θ)        |
| Deep Sleep                        | 0.5–3 Hz (δ)      |

A primary use of the power extractor on the SoC is processing EEG signals to determine the amount of cortical neuronal activity in the brain. For each electrode, the band power can be determined by filtering the data and averaging its square over a window of time [2]. Most of the power of EEG signals is concentrated at frequencies < 200 Hz, and it can indicate neuronal activity such as processing or movement, sleep events, and seizure prediction [2]. By filtering the EEG data before processing it with the SPC, the signal energy within a single frequency band can be determined and used to classify specific brain activities when the energy passes a known threshold. Normal neuronal activity from an awake subject can be extracted from four key EEG frequency bands:  $\alpha$  (8–12 Hz),  $\beta$  (18–26 Hz), low- $\gamma$  (30–50 Hz), and  $\gamma$  (70–100 Hz). To begin to classify sleep, the  $\delta$  (0.5–3 Hz) and  $\theta$  (4–7 Hz) bands are also used, as listed in Table I. The filter is programmable to allow subject-specific frequency bands or general-purpose on-node filtering. The presented FIR was designed with these bands of interest in mind but is not limited to this application.

Other power extractor circuits have been presented for EEG/ECoG systems, but they consume more power and area because of an analog or mixed-signal implementation, as in [4] and [5], or because they use more versatile but costly spectral density algorithms, as in [6]. Because the filter consumes > 90% of the power in this implementation of the power extractor, reducing FIR power was a priority; we achieved this with a serial architecture (Fig. 1, with the corresponding timing diagram in Fig. 2) and sub-threshold operation. Other sub-threshold filters have been proposed that use body biasing to reduce the effects of variation, as in [7], or that boost a sub-threshold supply voltage into the super-threshold region so that the circuit no longer operates in sub-threshold, as in [8]. We avoided body biasing to keep our synthesizable design process portable, and we kept a sub-threshold supply voltage because it provided ample performance for our set of BSN applications. The SPC uses power-of-two arithmetic to eliminate multiplier and divider circuits and thereby reduce power, as seen in Fig. 3.

A. M. Klinefelter, Y. Zhang, and B. H. Calhoun are with the University of Virginia Department of Electrical and Computer Engineering, Charlottesville, VA 22904 USA (e-mail: amk5vx@virginia.edu).

B. Otis is with the Electrical Engineering Department, University of Washington, Seattle, WA 98195 USA (e-mail: botis@uw.edu).

Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.

Fig. 1. Filter resource-shared architecture of the four-channel FIR including detailed diagram of the channel.

Fig. 2. Timing diagram of the FIR filter showing serial operation.

Fig. 3. Block diagram of the signal power circuit (SPC).

## II. DESIGN CONSIDERATIONS

### A. Sub-Threshold Operation

The effects of variation on the threshold voltage,  $V_{\rm t}$ , are pronounced in the sub-threshold region because the transistor drain current depends exponentially on  $V_{\rm t}$ . To limit the impact on the power extractor's performance, static CMOS gates with short stacks were used throughout the design for robustness. To avoid ratioed circuits and keep the design synthesizable, data and coefficients were stored in standard-cell registers built from high- $V_{\rm t}$  devices for reduced leakage, rather than in 6-T SRAM cells [9].
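To give a sense of scale for that exponential dependence, a common first-order model is $I_D \propto \exp\big((V_{GS}-V_{\rm t})/(n\,kT/q)\big)$. The sketch below evaluates how much a $V_{\rm t}$ shift scales the drain current. The slope factor $n = 1.5$ and the 300-K temperature are illustrative assumptions, not values characterized for this 130-nm process.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19

def thermal_voltage(temp_k=300.0):
    """kT/q in volts (about 25.9 mV at 300 K)."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C

def drain_current_ratio(delta_vt, n=1.5, temp_k=300.0):
    """I_D(V_t + delta_vt) / I_D(V_t) under the first-order sub-threshold model
    I_D ~ exp((V_GS - V_t) / (n * kT/q)). n = 1.5 is an assumed slope factor."""
    return math.exp(-delta_vt / (n * thermal_voltage(temp_k)))

for shift_mv in (10, 30, 50):
    ratio = drain_current_ratio(shift_mv * 1e-3)
    print(f"V_t +{shift_mv} mV -> I_D scaled by {ratio:.2f}")
```

Under these assumptions a +30 mV shift cuts the drain current to roughly 0.46 of its nominal value, so a gate on that device becomes about 2.2× slower, which is why robust static gates with short stacks are preferred.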



Fig. 4. PMOS headers and clock gating for power savings. Table of measured energy savings for the FIR (energy bottleneck of the system) compared to using the on-chip microcontroller (MCU).

### B. Energy Savings

Because the filter and power extractor are not always in use by the reconfigurable datapath on chip, they require extremely low leakage during idle periods. To reduce leakage energy, PMOS headers were added that cut off the supply voltage to the gates in idle mode. Header sizes were chosen iteratively: starting from a width of ~10% of the total NFET width of the design, the header width was increased or decreased until simulation showed a supply voltage droop of < 2% during normal circuit operation. The filter and extractor can both be clock gated at the block or individual channel level, which prevents excess switching energy from an active system clock when these blocks are not in use, as seen in Fig. 4.
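The iterative header sizing described above can be sketched as a simple search. The function `size_header`, the callable `droop_of` (a stand-in for the circuit simulation), the geometric step of 1.1, and the choice to return the narrowest header that still meets the 2% droop limit are all our assumptions; the toy 1/W droop model in the usage line is purely illustrative.

```python
def size_header(total_nfet_width, droop_of, max_droop=0.02, step=1.1, max_iter=200):
    """Hypothetical sizing loop: start at ~10% of the total NFET width, then grow
    or shrink the header until the simulated supply droop is just below max_droop.

    droop_of(width) -> fractional droop (e.g. 0.015 for 1.5%); it stands in for
    a circuit simulation and is not a real tool API.
    """
    width = 0.10 * total_nfet_width
    if droop_of(width) >= max_droop:
        # Too much droop: widen the header until the spec is met.
        for _ in range(max_iter):
            width *= step
            if droop_of(width) < max_droop:
                return width
        raise RuntimeError("droop spec not met within max_iter steps")
    # Spec already met: narrow the header (less area and off-state leakage)
    # for as long as the droop spec still holds.
    for _ in range(max_iter):
        narrower = width / step
        if droop_of(narrower) >= max_droop:
            return width
        width = narrower
    return width

# Usage with a toy droop model (droop falls as 1/width); widths in micrometers.
header_um = size_header(1000.0, lambda w: 2.5 / w)
```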

## III. SUB-THRESHOLD POWER EXTRACTOR

### A. FIR Filter Design

To support concurrent ExG processing on data from the four analog input channels on chip, the FIR has four independent channels, as seen in Fig. 1. The SoC uses a 200-kHz crystal oscillator (XTAL) both to modulate data for the Medical Implant Communication Service (MICS) band radio and as a digital clock. Since the typical ExG sampling rate is much lower than 200 kHz (< 1024 Hz for our chip), we use a serial architecture [10] instead of the traditional direct-form FIR architecture to reduce power, area, and leakage. The direct-form architecture computes each output in parallel, using as many adders and multipliers as there are taps. This yields far more throughput than this application space requires, as well as a large area penalty for up to 30 taps replicated over four channels. Instead, the arithmetic units are reused by serially processing data on the faster 200-kHz clock between input samples. This raises the utilization of each arithmetic unit per sample, which reduces leakage from idle or redundant circuitry whether or not the block is active on the datapath. At every rising edge of the sampling clock, a new input sample is received and processed at the faster clock rate, and the result is computed within a fraction of the sample period, as seen in Fig. 2.
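The serial scheme can be illustrated with a small behavioral model: one multiply-accumulate (MAC) unit is reused for every tap, one tap per fast-clock cycle, so a new output needs as many fast cycles as there are active taps. The class and attribute names are hypothetical, and the model counts only MAC cycles, not any extra controller cycles the real design may spend.

```python
from collections import deque

class SerialFIRChannel:
    """Behavioral model of one serial (time-multiplexed) FIR channel:
    a single MAC unit is reused for every tap, one tap per fast-clock cycle."""

    def __init__(self, coeffs, num_taps=None):
        self.coeffs = list(coeffs)
        # Programmable tap count (e.g. 15 or 30); defaults to all stored coefficients.
        self.num_taps = len(self.coeffs) if num_taps is None else num_taps
        if not 1 <= self.num_taps <= len(self.coeffs):
            raise ValueError("num_taps must be between 1 and len(coeffs)")
        # Delay line: after a push, index k holds x[n-k].
        self.delay = deque([0] * len(self.coeffs), maxlen=len(self.coeffs))
        self.mac_cycles = 0  # fast-clock cycles spent computing outputs

    def push(self, sample):
        """Accept x[n] on the slow sampling clock; return y[n] = sum_k h[k] x[n-k]."""
        self.delay.appendleft(sample)      # oldest sample falls off the far end
        acc = 0
        for k in range(self.num_taps):     # one MAC per fast-clock cycle
            acc += self.coeffs[k] * self.delay[k]
            self.mac_cycles += 1
        return acc


# Cycle budget per input sample, using the clock and sample rates from the text.
F_CLK_HZ = 200_000
F_SAMPLE_MAX_HZ = 1024
CYCLES_PER_SAMPLE = F_CLK_HZ // F_SAMPLE_MAX_HZ   # 195 fast cycles between samples
```

With 30 taps the MAC in this model is busy for 30 of the roughly 195 fast cycles available per sample, which leaves ample idle time in which the channel can be clock gated.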



Fig. 5. FIR frequency response for the $\alpha$, $\beta$, low-$\gamma$, $\gamma$, $\delta$, and $\theta$ bands of EEG for tap lengths of 15 and 30 and a sampling rate of 256 Hz.

Each channel contains one 8-bit Baugh-Wooley multiplier, one 16-bit ripple-carry adder, coefficient registers, and a small filter controller for channel synchronization and state retention. During active operation, each channel is individually clock gated for the remaining cycles after its result is computed, further reducing energy. Sharing arithmetic units within each channel reduced the overall area by 6–12× per filter compared to prior work [7], [8] and by 12× per channel compared to a traditional parallel architecture. The smaller area also reduces leakage, which lowers the energy drawn from the off-chip storage capacitor, particularly during idle periods.
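For readers unfamiliar with the Baugh-Wooley scheme named above, the sketch below is a bit-level model of an n-bit two's-complement multiplier: it forms the positive-weight partial products, adds the negatively weighted cross terms in complemented form, and adds the correction constant $2^n + 2^{2n-1}$. It illustrates the technique only; the function name is hypothetical and this is not the synthesized netlist.

```python
def baugh_wooley_multiply(a, b, n=8):
    """Multiply two n-bit two's-complement integers with the Baugh-Wooley
    partial-product array; returns the signed 2n-bit product."""
    mask = (1 << n) - 1
    a_bits = [((a & mask) >> i) & 1 for i in range(n)]
    b_bits = [((b & mask) >> i) & 1 for i in range(n)]
    total = 0
    # Positive-weight partial products among the n-1 low bits of each operand.
    for i in range(n - 1):
        for j in range(n - 1):
            total += (a_bits[i] & b_bits[j]) << (i + j)
    # The product of the two sign bits has positive weight 2^(2n-2).
    total += (a_bits[n - 1] & b_bits[n - 1]) << (2 * n - 2)
    # Negative-weight cross terms (sign bit x low bit) are added complemented...
    for i in range(n - 1):
        total += (1 - (a_bits[i] & b_bits[n - 1])) << (i + n - 1)
        total += (1 - (a_bits[n - 1] & b_bits[i])) << (i + n - 1)
    # ...and the constant 2^n + 2^(2n-1) corrects for the complementing.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1          # keep 2n bits, as the hardware does
    # Reinterpret the 2n-bit pattern as a two's-complement value.
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total
```

For example, `baugh_wooley_multiply(-3, 7)` returns -21. The 16-bit product width matches the 16-bit adder that accumulates the products.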

The SoC's MCU can program the filter for several modes of operation. Programmability matters because the energy on the storage capacitor fluctuates during power harvesting: the on-chip power-management controller can reduce the number of taps or the number of active channels to lower power based on the available energy. A 30-tap filter met the accuracy specifications for this set of applications, because 30 taps is the point at which the bandpass magnitude responses for the targeted (non-sleep) bands attenuate frequencies below the lower cutoff frequency, as seen in Fig. 5. When throughput and energy are more critical than accuracy, a 15-tap mode is available. Each input data stream can be filtered by one channel, or by two channels simultaneously with different filter coefficients (e.g., for two EEG bands). Fig. 6 shows the frequency spectra of a filtered EEG signal for an awake subject. The FIR coefficients used for testing were designed with MATLAB's filter toolbox using a Kaiser window with  $\beta = 0.5$ .
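The coefficients here came from MATLAB; as a dependency-free stand-in, the sketch below applies the standard window method (ideal bandpass impulse response multiplied by a Kaiser window with β = 0.5) and then quantizes to 8-bit signed integers. Scaling to unit gain at the band center and peak-based 8-bit quantization are our assumptions, so the resulting values should not be expected to match the chip's coefficient registers. The 256-Hz sampling rate follows the Fig. 5 caption.

```python
import math

def bessel_i0(x, terms=30):
    """Modified Bessel function I0(x) via its power series sum_k ((x/2)^k / k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kaiser_window(num_taps, beta):
    """Kaiser window w[n] = I0(beta * sqrt(1 - (2n/(N-1) - 1)^2)) / I0(beta)."""
    m = num_taps - 1
    return [bessel_i0(beta * math.sqrt(1.0 - (2.0 * n / m - 1.0) ** 2)) / bessel_i0(beta)
            for n in range(num_taps)]

def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def kaiser_bandpass(num_taps, f_lo, f_hi, fs, beta=0.5):
    """Linear-phase bandpass FIR by the window method, scaled (our choice)
    to unit gain at the band center."""
    mid = (num_taps - 1) / 2.0
    lo, hi = f_lo / fs, f_hi / fs                      # cutoffs in cycles/sample
    window = kaiser_window(num_taps, beta)
    h = [window[n] * (2 * hi * sinc(2 * hi * (n - mid)) - 2 * lo * sinc(2 * lo * (n - mid)))
         for n in range(num_taps)]
    fc = (lo + hi) / 2.0
    gain = sum(c * math.cos(2 * math.pi * fc * (n - mid)) for n, c in enumerate(h))
    return [c / gain for c in h]

def quantize(h, bits=8):
    """Assumed scheme: map the largest coefficient to full scale, round to signed ints."""
    full_scale = (1 << (bits - 1)) - 1                 # 127 for 8 bits
    peak = max(abs(c) for c in h)
    return [round(c / peak * full_scale) for c in h]

def magnitude(h, f, fs):
    """|H(e^{jw})| of the filter at frequency f (Hz)."""
    w = 2 * math.pi * f / fs
    re = sum(c * math.cos(w * n) for n, c in enumerate(h))
    im = sum(c * math.sin(w * n) for n, c in enumerate(h))
    return math.hypot(re, im)

alpha = kaiser_bandpass(30, 8.0, 12.0, fs=256.0)   # alpha band, 30 taps
alpha_q = quantize(alpha)                           # hypothetical 8-bit register contents
```

Because β = 0.5 makes the Kaiser window nearly rectangular, this design favors a narrow main lobe over sidelobe suppression, which suits short filters such as the 15- and 30-tap modes.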

In our serial design, energy per sample grows linearly with the number of taps, because each additional tap requires one more multiply-accumulate operation; varying the number of taps therefore trades energy for fidelity. This also allows different types of data with different fidelity requirements to be processed on the same filter.
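As a worked example of this linear relationship, the energy per tap in Table III follows from the measured power and clock frequency if one tap is processed per clock cycle, as the serial architecture implies. Scaling it by the tap count gives a first-order per-sample estimate; ignoring any fixed per-sample overhead is our assumption:

$$E_{\text{tap}} = \frac{P}{f_{\text{clk}}} = \frac{32\ \text{nW}}{29\ \text{kHz}} \approx 1.10\ \text{pJ}, \qquad E_{\text{sample}} \approx N_{\text{taps}}\,E_{\text{tap}} \approx \begin{cases} 33\ \text{pJ}, & N_{\text{taps}} = 30 \\ 16.5\ \text{pJ}, & N_{\text{taps}} = 15 \end{cases}$$

For the parallel filters in [7] and [8], which process every tap in each cycle, the same table entry corresponds to $P/(f_{\text{clk}} N_{\text{taps}})$, for example $114\ \text{nW}/(12\ \text{kHz} \times 8) \approx 1.19\ \text{pJ}$.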



Fig. 6. Original EEG motor activity signal spectrum [11] and measured filtered waveforms using 30 taps.

### B. SPC Design

The SPC receives output data from the FIR filter and computes the average signal power within a specific frequency band. This block has four input channels corresponding to the channel outputs of the filter and can save state for each channel during a channel switch from the ADC. The extractor also has a programmable summing window size and number of active channels. The circuit directly implements the signal power equation shown in

$$p_x = \frac{1}{N} \sum_{n=0}^{N-1} |x[n]|^2 \tag{1}$$

where  $p_x$  is the average power of a signal  $x$  and  $N$  is the summing window size. This is an alternative to directly computing the power spectral density of the signal, which gives energy per hertz, as in (2), where  $\omega$  is the angular frequency. Power spectral density is a robust way to determine EEG signal characteristics as a function of frequency [2]:

$$\varphi(\omega) = \left| \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} x[n] e^{-j\omega n} \right|^2 = \frac{F(\omega) F^*(\omega)}{2\pi} \tag{2}$$

Here  $F(\omega) = \sum_{n=-\infty}^{\infty} x[n] e^{-j\omega n}$  is the discrete-time Fourier transform of  $x$ ; in practice, (2) is evaluated by computing the fast Fourier transform (FFT) of  $x$  and multiplying the result by its complex conjugate. The FFT is computationally expensive and would also require post-processing to determine the signal power.
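The link between (1) and (2) follows from Parseval's theorem: for an $N$-point window, $\frac{1}{N}\sum_{n}|x[n]|^2 = \frac{1}{N^2}\sum_{k}|X[k]|^2$, so the average power in (1) equals the total of the discrete spectrum $X[k]X^*[k]$. Applying (1) to band-pass-filtered data therefore captures approximately the part of that spectrum inside the band, without computing a transform. The sketch below checks the identity numerically with a direct $O(N^2)$ DFT (an FFT would return the same $X[k]$ faster); the function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform; an FFT computes the same values faster."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def power_time_domain(x):
    """Average power as in (1): (1/N) * sum |x[n]|^2."""
    return sum(abs(v) ** 2 for v in x) / len(x)

def power_from_spectrum(x):
    """The same quantity via Parseval: (1/N^2) * sum_k X[k] X*[k]."""
    spectrum = dft(x)
    return sum((v * v.conjugate()).real for v in spectrum) / len(x) ** 2

# A 10-Hz tone sampled at 256 Hz over a 128-sample window (5 whole cycles).
samples = [math.sin(2 * math.pi * 10 * n / 256) for n in range(128)]
print(power_time_domain(samples), power_from_spectrum(samples))   # both ~0.5
```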

Fig. 7. Relative error due to a power-of-two implementation of the SPC for an awake subject completing 1-D random motion of both hands.

Fig. 8. Body sensor node chip micrograph with power extractor and FIR.

Implementing the signal power equation in (1) directly requires a squaring circuit, an accumulator (adder), and a division circuit. Because replicating these costly operations over four channels is expensive, computational complexity was reduced by working with the data in powers of two, which reduces the division to a series of shifts (Fig. 3). The window size is a programmable input and can be set to any power of two in the range of 4–128. To replace the multiplier required for squaring, data were rounded to the nearest power of 4, and the squared results came from a lookup table built from high- $V_{\rm t}$  standard cells to reduce leakage. The rounding reduced the number of bits required during data transformation because the lower two bits were always 0. Fig. 7 shows the resulting error for an example EEG signal from an awake subject performing random 1-D movements of the left and right hand [11]. With a window size of 128, the SPC shows higher relative error in frequency bands that contain little of the overall signal power and low error in the bands describing motor planning and awareness. The absolute error between the ideal and power-of-two methods is small enough to still allow accurate detection.
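A behavioral sketch of the power-of-two SPC next to the direct form of (1) follows. Two assumptions are labeled in the code: the rounding step is read as quantizing magnitudes to a multiple of 4 (the reading consistent with the lower two bits always being 0), and the high- $V_{\rm t}$ lookup table is modeled as a Python list indexed by the remaining upper bits. Neither detail is taken from the chip's RTL, and all names are hypothetical.

```python
import random

# Hypothetical square LUT, indexed by the rounded |x| >> 2 (0..64 for |x| <= 255).
SQUARE_LUT = [(k << 2) ** 2 for k in range(65)]

def spc_reference(samples):
    """Direct form of (1): mean of |x[n]|^2 over the window (true division)."""
    return sum(x * x for x in samples) / len(samples)

def spc_power_of_two(samples):
    """Multiplier-free, divider-free approximation of (1) for 8-bit data."""
    window = len(samples)
    shift = window.bit_length() - 1
    if window != 1 << shift or not 4 <= window <= 128:
        raise ValueError("window must be a power of two in 4..128")
    acc = 0
    for x in samples:
        index = (abs(x) + 2) >> 2   # assumed: round |x| to nearest multiple of 4, keep upper bits
        acc += SQUARE_LUT[index]    # squaring by table lookup instead of a multiplier
    return acc >> shift             # divide by N with a right shift

rng = random.Random(0)
window = [rng.randint(0, 255) for _ in range(128)]   # stand-in for 8-bit ADC data (mean ~128)
exact = spc_reference(window)
approx = spc_power_of_two(window)
print(f"exact={exact:.1f} approx={approx} relative error={abs(approx - exact) / exact:.2%}")
```

The right shift truncates rather than rounds, which adds a small downward bias; under this reading the rounding error dominates, and both are far below the < 7% average error reported for the chip.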

## IV. EXPERIMENTAL RESULTS

The filter and power extractor were synthesized using only standard cells and fabricated in 130-nm CMOS as part of the BSN SoC (Fig. 8). They operate correctly across the target range of 0.3–0.7 V, with a corresponding frequency range of 8 kHz–6 MHz, as seen in Table II. Fig. 4 shows the mechanisms for additional power reduction in both blocks. Clock gating channels after the result is ready reduces switching energy by 4×, and the MCU power gates the filter and extractor using PMOS headers when the blocks are unused, reducing leakage by up to 15×. The frequency responses of the FIR in the EEG energy extraction bands show that coefficient quantization had little effect on cutoff steepness (Fig. 5). Fig. 5 also shows the accuracy gained by moving from the 15-tap to the 30-tap mode in the α, β, low-γ, γ, δ, and θ bands. Fig. 9 plots measured energy versus delay; the minimum energy-delay product occurs at 350 mV and 29 kHz for one active channel with 30 taps and an SPC window size of 128. Fig. 10 shows energy and delay as functions of the supply voltage.

TABLE II DESIGN SUMMARY

| Parameter         | Value             |
|-------------------|-------------------|
| Technology        | 130-nm 1.2-V CMOS |
| Taps/Channel      | 15, 30            |
| Voltage Range     | 0.3–0.7 V         |
| Frequency Range   | 8 kHz–6 MHz       |
| Threshold Voltage | ~550 mV           |

Fig. 9. Measured energy-delay curves for one active channel with 30 taps, two active filters using 30 taps, and one active channel using 15 taps including the SPC with a window size of 128.

Fig. 10. Relationships between the supply voltage and the energy and delay of the power extractor.

Recent sub-threshold FIR filters have used fewer than 15 taps and consumed more power and more area, with a much larger FOM and without the flexibility demonstrated in this design, as seen in Table III. Our filter has the smallest FOM of the state-of-the-art designs compared. The design in [4] implements an analog multi-channel extractor with fourth-order bandpass filters followed by an integrator, with 12× larger area and 24× higher power consumption than our design.

TABLE III FIR COMPARISON TABLE

|              | This Work      | [7]          | [8]           | [4]              |
|--------------|----------------|--------------|---------------|------------------|
| Type         | 30-tap, 8-bit  | 8-tap, 8-bit | 14-tap, 8-bit | 4th-order analog |
| Channels     | 4              | 1            | 1             | 4                |
| Programmable | ✓              | ×            | ×             | ✓                |
| Technology   | 0.13 µm        | 0.13 µm      | 0.13 µm       | 0.13 µm          |
| Supply       | 350 mV         | 200 mV       | 270 mV        | 1.2 V            |
| Frequency    | 29 kHz         | 12 kHz       | 20 MHz        | 20 kHz           |
| Energy/Tap   | 1.10 pJ        | 1.19 pJ      | 1.11 pJ       | 39 pJ (total)    |
| Power        | 32 nW          | 114 nW       | 310 µW        | 780 nW           |
| FOM*         | 0.57           | 18.55        | 17.37         | N/A              |
| Area/Channel | 0.058 mm²      | 1.54 mm²     | 0.38 mm²      | 0.7 mm²          |

\*FIR FOM: power (nW)/frequency (MHz)/number of taps/input bit length/coefficient bit length.
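The FOM defined in the Table III footnote can be recomputed directly from the tabulated entries, as in the short sketch below. It assumes 8-bit coefficients for every design, since the table gives a single bit width per design; the names are illustrative.

```python
def fir_fom(power_nw, freq_mhz, taps, input_bits, coeff_bits):
    """FIR figure of merit from the Table III footnote (lower is better)."""
    return power_nw / freq_mhz / taps / input_bits / coeff_bits

# Table III entries: (power in nW, frequency in MHz, taps, input bits, coefficient bits).
# 8-bit coefficients are assumed for every design.
TABLE_III = {
    "This work": (32.0, 0.029, 30, 8, 8),
    "[7]": (114.0, 0.012, 8, 8, 8),
    "[8]": (310_000.0, 20.0, 14, 8, 8),   # 310 uW expressed in nW
}

for name, row in TABLE_III.items():
    print(f"{name:10s} FOM = {fir_fom(*row):.2f}")
```

This prints 0.57, 18.55, and 17.30. The last value is slightly below the tabulated 17.37 for [8], a small gap that may come from rounding of the listed 310 µW, though that is our assumption.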

To measure the signal power within different frequency bands, filtered EEG data were input to the SPC. Compared to a direct implementation of the power extraction equation in (1), the power-of-two SPC has < 7% error in the average case for random input values with a mean of 128, such as the data provided by the 8-bit on-chip ADC. The SPC occupies a total area of 180 μm × 180 μm and always consumed < 10% of the energy of the total power extractor, including the FIR. Our FIR filter and SPC blocks help the BSN SoC maintain its robust, ULP energy-harvesting operation.

## V. CONCLUSION

A programmable, sub-threshold signal band power extractor was designed for use in a low-throughput, ultra-low-power body sensor node SoC. Both the filter and the full extractor circuit have the lowest power among the compared state-of-the-art designs. Although presented here for a single biomedical application, the filter is flexible enough for general-purpose DSP applications. By serializing the traditional FIR architecture, the number of adders and multipliers required for the design was reduced, thereby reducing active and leakage energy for the overall design. Individual channel clock gating was used to reduce switching energy in on-chip modes that use fewer than the maximum of four electrodes. Similarly, the SPC uses power-of-two format to reduce computational complexity, area, and power with little loss in data fidelity. As the extractor was synthesized using only standard cells, the design is highly portable to new technologies.

## ACKNOWLEDGMENT

The authors acknowledge the team members and advisors at the University of Virginia and the University of Washington that worked on the SoC that contained this design.

## REFERENCES

- [1] F. Zhang, Y. Zhang, J. Silver, Y. Shakhsheer, M. Nagaraju, A. Klinefelter, J. Pandey, J. Boley, E. Carlson, A. Shrivastava, B. Otis, and B. Calhoun, "A batteryless 19 μW MICS/ISM-band energy harvesting body area sensor node SoC," in *Proc. IEEE ISSCC*, Feb. 2012, pp. 298–300.
- [2] G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, H. Ramoser, A. Schlogl, B. Obermaier, and M. Pregenzer, "Current trends in Graz brain-computer interface (BCI) research," *IEEE Trans. Rehab. Eng.*, vol. 8, no. 2, pp. 216–219, Jun. 2000.
- [3] A.-T. Avestruz, W. Santa, D. Carlson, R. Jensen, S. Stanslaski, A. Helfenstine, and T. Denison, "A 5 μW/channel spectral analysis IC for chronic bidirectional brain-machine interfaces," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 3006–3024, Dec. 2008.
- [4] F. Zhang, A. Mishra, A. G. Richardson, S. Zanos, and B. P. Otis, "A low-power multi-band ECoG/EEG interface IC," in *Proc. IEEE CICC*, Sep. 2010, pp. 1–4.
- [5] K. Abdelhalim and R. Genov, "915-MHz wireless 64-channel neural recording SoC with programmable mixed-signal FIR filters," in *Proc. ESSCIRC*, Sep. 2011, pp. 223–226.
- [6] N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P. Chandrakasan, "A micro-power EEG acquisition SoC with integrated feature extraction processor for a chronic seizure detection system," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 804–816, Apr. 2010.
- [7] M.-E. Hwang, A. Raychowdhury, K. Kim, and K. Roy, "A 85 mV 40 nW process-tolerant subthreshold 8 × 8 FIR filter in 130 nm technology," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2007, pp. 154–155.
- [8] W.-H. Ma, J. C. Kao, V. S. Sathe, and M. C. Papaefthymiou, "187 MHz subthreshold-supply charge-recovery FIR," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 793–803, Apr. 2010.
- [9] B. H. Calhoun, J. F. Ryan, S. Khanna, M. Putic, and J. Lach, "Flexible circuits and architectures for ultralow power," *Proc. IEEE*, vol. 98, no. 2, pp. 267–282, Feb. 2010.
- [10] W. R. Davis, N. Zhang, K. Camera, F. Chen, D. Markovic, N. Chan, B. Nikolic, and R. W. Brodersen, "A design environment for high throughput, low power dedicated signal processing systems," in *Proc. IEEE Conf. Custom Integr. Circuits*, 2001, pp. 545–548.
- [11] Brain Computer Interface research at NUST Pakistan. EEG motor activity data set. [Online]. Available: https://sites.google.com/site/projectbci/