
Gert Cauwenberghs
Professor · University of California, San Diego · Biomedical Engineering
Active 1990–2025
About
Gert Cauwenberghs is the Principal Investigator and Co-Director at UC San Diego, with a focus on research related to neuromorphic hardware, spike-based computing, spiking neuron systems and algorithms, and machine learning. His work involves high-performance hardware architecture, High Bandwidth Memory (HBM4), Network-on-Chip (NOC) routing, and systolic array accelerators for machine learning. He is engaged in advancing neuromorphic hardware and bio-instrumentation, including neural prostheses such as retinal implants and cochlear devices, as well as developing low power vision sensing and memristive neuromorphic computing for biomedical applications. His research interests also encompass neuroengineering, computational neuroscience, and the interface of human and artificial intelligence, contributing to the development of implantable brain-machine interfaces and neural systems.
Research topics
- Computer Science
- Artificial Intelligence
- Computer hardware
- Engineering
- Computer architecture
- Parallel computing
- Medicine
- Operating system
- Biology
- Computer engineering
- Biomedical engineering
- Internal medicine
- Electrical engineering
- Embedded system
- Chemistry
- Neuroscience
Selected publications
2025-06-08
preprint · Open access
Clo-HDnn is an on-device learning (ODL) accelerator designed for emerging continual learning (CL) tasks. Clo-HDnn integrates hyperdimensional computing (HDC) along with a low-cost Kronecker HD Encoder and weight clustering feature extraction (WCFE) to optimize accuracy and efficiency. Clo-HDnn adopts gradient-free CL to efficiently update and store the learned knowledge in the form of class hypervectors. Its dual-mode operation enables bypassing costly feature extraction for simpler datasets, while progressive search reduces complexity by up to 61% by encoding and comparing only partial query hypervectors. Achieving 4.66 TFLOPS/W (FE) and 3.78 TOPS/W (classifier), Clo-HDnn delivers 7.77× and 4.85× higher energy efficiency compared to SOTA ODL accelerators.
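The progressive search described above can be illustrated with a minimal Python sketch. It is not the Clo-HDnn implementation: the bipolar hypervectors, the chunk size, the helper names (`random_hv`, `progressive_search`) and the stopping rule are all assumptions made for illustration. The rule used here stops as soon as the unread dimensions can no longer change the winning class, so the result always matches a full-length comparison.

```python
import random


def random_hv(dim, rng):
    """Random bipolar hypervector with entries in {-1, +1} (illustrative)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def progressive_search(query, class_hvs, chunk):
    """Compare the query against class hypervectors chunk by chunk.

    Returns (best_label, dims_used). Hypothetical stopping rule: with bipolar
    entries, each unread dimension can shift the gap between two class scores
    by at most 2, so the search stops once the lead exceeds 2 * remaining.
    """
    dim = len(query)
    scores = {label: 0 for label in class_hvs}
    used = 0
    while used < dim:
        end = min(used + chunk, dim)
        for label, hv in class_hvs.items():
            scores[label] += sum(q * c for q, c in zip(query[used:end], hv[used:end]))
        used = end
        ranked = sorted(scores.values(), reverse=True)
        remaining = dim - used
        if len(ranked) < 2 or ranked[0] - ranked[1] > 2 * remaining:
            break
    return max(scores, key=scores.get), used


# Synthetic demo: four random class hypervectors, query = class "b" with 5% of entries flipped.
rng = random.Random(0)
dim = 1024
class_hvs = {label: random_hv(dim, rng) for label in "abcd"}
query = [-v if i % 20 == 0 else v for i, v in enumerate(class_hvs["b"])]

label, used = progressive_search(query, class_hvs, chunk=128)
full_label, _ = progressive_search(query, class_hvs, chunk=dim)
print(f"predicted {label} using {used} of {dim} dimensions (full search: {full_label})")
```

A looser, threshold-based stopping rule would exit earlier at the cost of occasional mismatches with the full search; the abstract does not say which rule the accelerator uses.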
bioRxiv (Cold Spring Harbor Laboratory) · 2025-11-03
preprint · Open access
Organic Electrochemical Transistors (OECTs) are witnessing rapid growth in biomedical applications and are increasingly becoming an integral part of bio-electronic interfaces. High-performing OECTs are typically fabricated using multistep photolithography and conventional spin-coating and lift-off processes, and while printing techniques have emerged as promising alternatives, they still face challenges in achieving comparable resolutions, reproducibility and performance metrics. Several groups have demonstrated printed OECTs using PEDOT:PSS as the channel material, highlighting the promise of additive manufacturing for scalable bioelectronics. In this work, we build upon these advances and develop an optimized inkjet-printed OECT platform that achieves transconductance values up to 15 mS and sub-millisecond response times as low as 0.31 ms. Our approach systematically optimizes OECT geometrical parameters (channel width, length, and thickness) through precise patterning and oxygen plasma surface modification to overcome longstanding limitations in inkjet printing resolution and reproducibility. The resulting devices exhibit outstanding electrical stability, high amplification, and fast dynamic response. Using a configuration optimized for biosensing, we demonstrate the detection of the heart failure biomarker NT-proBNP within a clinically relevant range of 10–400 pg/mL, with a sensitivity of 0.038% ΔI_DS per pg/mL. In a separate configuration on a flexible substrate tailored for in vivo biopotential recording, we showcase the devices’ capabilities by effectively capturing epileptic seizure progression in a rat model with high signal fidelity. This work demonstrates how careful process and geometry optimization can close the performance gap between printed and conventionally fabricated OECTs, enabling scalable, reproducible, and substrate-flexible bioelectronic platforms.
A Dynamic Spike Sorter for Multiscale Nanoelectrode Array Recordings
Advanced Materials Interfaces · 2025-06-18 · 2 citations
article · Open access
High-throughput, intracellular electrophysiology is crucial for advancing the understanding of neuronal processing and network dynamics. Nanoelectrode arrays (NEA) offer a promising approach by directly capturing intracellular signals across sub-neuronal compartments, including action potentials, postsynaptic potentials, and low-frequency membrane fluctuations. However, the complexity of NEA datasets, characterized by multiscale events of varying amplitude and duration, demands novel analytical strategies. In this work, a dynamic spike sorting pipeline is introduced to isolate, extract, and sort these diverse electrical signals within a landscape of spontaneous electrical behavior. Estimates of signal attenuation and distortion are obtained using a bespoke biophysical circuit simulation designed to match the specific nanoelectrode interface. Based on these observations, bounds are set for filtering and extracting multiscale waveforms, and their isolation is validated using pharmacological data. Finally, it is shown that multiscale analysis of spontaneous electrical recordings reveals interrelationships between high-frequency events, such as action potentials, and low-frequency membrane potential fluctuations, which may inform models of neuronal network excitability. Advanced sorting algorithms tailored for nanoelectrode array recordings are essential for unlocking the full potential of next-generation, high-throughput neuroelectronic devices and achieving a deeper understanding of neuronal dynamics.
A noise-tolerant human–machine interface based on deep learning-enhanced wearable sensors
Nature Sensors · 2025-11-17 · 10 citations
article
2025-08-24
article
Clo-HDnn is an on-device learning (ODL) accelerator designed for emerging continual learning (CL) tasks. Clo-HDnn integrates hyperdimensional computing (HDC) along with a low-cost Kronecker HD Encoder and weight clustering feature extraction (WCFE) to optimize accuracy and efficiency. Clo-HDnn adopts gradient-free CL to efficiently update and store the learned knowledge in the form of class hypervectors. Its dual-mode operation enables bypassing costly feature extraction for simpler datasets, while progressive search reduces complexity by up to 61% by encoding and comparing only partial query hypervectors. Achieving 4.66 TFLOPS/W (FE) and 3.78 TOPS/W (classifier), Clo-HDnn delivers 7.77× and 4.85× higher energy efficiency compared to SOTA ODL accelerators.
2025-02-16 · 1 citation
article
Physiological data, such as EEG and ECG, are crucial in delivering vital information for medical diagnostics and research applications. Recently, the demand for biopotential recording using two electrodes has grown thanks to its better user experience and lower cost than counterparts with three electrodes [1], [2]. However, a two-electrode recording IC suffers from a large common-mode interference (CMI) over $100\,\mathrm{V_{PP}}$ [3], potentially saturating an analog front-end (AFE) or resulting in large CM to differential-mode (DM) conversion. These challenges require biopotential AFEs to have a large CMI tolerance as well as a high total common-mode rejection ratio (T-CMRR) while providing excellent noise efficiency with low power consumption. Figure 15.7.1(a) depicts a simplified electrical model of CMI coupling from a power source $(\mathrm{V_{PO}})$ in a two-electrode recording system attached to a human body [4]. In a ground-isolated system, a displacement current $(\mathrm{I_{d}})$ splits into $\mathrm{I_{b}}$ and $\mathrm{I_{GND}}$, which flow through $\mathrm{C_{Body}}$ and $\mathrm{C_{GND}}$, respectively.
The high CM input impedance of the IA $(\mathrm{Z_{IN-CM-C}})$ converts $\mathrm{I_{GND}}$ into a large CMI voltage $(\mathrm{V_{CMI-C}})$ relative to the chip ground. $\mathrm{C_{GND}}$ represents the parasitic capacitance between the floating chip ground and earth ground, which ranges from 1 to $3\,\mathrm{pF}$ depending on the size of the recording IC and battery [4]. CMI cancellation techniques based on the CM charge pump (CMCP) in Fig. 15.7.1(b) can tolerate CMI up to $20\,\mathrm{V_{PP}}$ and achieve high T-CMRR [1], [2], [5], [6]. However, these approaches introduce significant noise $(> 2\,\mu\mathrm{V_{rms}})$ and increase power consumption due to the CMCP, rendering them unsuitable for EEG measurements. An alternative method, the CM averaging unit (CMAU) reported in [7], reduces CMI by driving the chip ground to match the CMI. However, this design overlooks $\mathrm{C_{GND}}$, making it feasible only for specific systems without the parasitic $\mathrm{C_{GND}}$.
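For a sense of scale, the short Python calculation below evaluates the impedance magnitude of the 1 to 3 pF parasitic capacitance using the standard relation |Z| = 1/(2πfC). The 60 Hz interference frequency is an assumption (mains-frequency coupling); the abstract does not state a frequency, and the function name is illustrative.

```python
import math


def capacitor_impedance_ohms(freq_hz, cap_farads):
    """Magnitude of an ideal capacitor's impedance: |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farads)


# Assumed 60 Hz mains interference; 1 pF and 3 pF bound the range quoted for C_GND.
for c_pf in (1.0, 3.0):
    z = capacitor_impedance_ohms(60.0, c_pf * 1e-12)
    print(f"C_GND = {c_pf:.0f} pF -> |Z| = {z / 1e9:.2f} GOhm at 60 Hz")
```

Under that assumption the path through the parasitic capacitance is on the order of a gigaohm, so a ground-driving scheme that ignores it is being compared against input impedances of similar magnitude.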
ArXiv.org · 2025-03-20
preprint · Open access · Senior author
In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 billion synapses (roughly twice the neurons of a mouse brain) at faster than real time. This system, which is currently under construction at the UC San Diego Supercomputing Center, comprises a co-designed hardware and software stack that is optimized for run-time massively parallel processing and hierarchical address-event routing (HiAER) of spikes while promoting memory-efficient network storage and execution. Our architecture efficiently handles both sparse connectivity and sparse activity for robust and low-latency event-driven inference for both edge and cloud computing. A Python programming interface to HiAER-Spike, agnostic to hardware-level detail, shields the user from complexity in the configuration and execution of general spiking neural networks with virtually no constraints in topology. The system is made easily available over a web portal for use by the wider community. In the following, we provide an overview of the hardware and software stack, explain the underlying design principles, demonstrate some of the system's capabilities, and solicit feedback from the broader neuromorphic community.
Brain-Body Coupling in Listening to Metronomic Sounds and Music
2025-07-14
article · Senior author
Wearables continue to expand their ability to sample more signals from the brain and body while reducing form factors and limiting on-device computing to meet real-world comfort and power constraints. While evolving multimodal sensing platforms can combine electrophysiological (ExG), optical, chemical, and mechanical sensors for holistic brain-body state estimation, understanding of cross-modal coupling mechanisms remains limited. Here, we collect and analyze bio-signals in a brain-body coupling experiment using simultaneous electroencephalogram (EEG), electrocardiogram (ECG), and respiratory measurements. Our experimental paradigm contrasted listening to self-selected music, preceded by tempo-matched isochronous cues. The results show a clear entrainment of cardiac rhythms with the underlying beat of auditory stimuli for both metronome cues and subject-selected music. Magnitude-squared coherence analysis showed frequency coupling across recorded modalities, with ECG, EEG, and breath exhibiting peak coherence near the heard or underlying musical beat or its harmonics. We present minimally pre-processed data to motivate future methodological explorations of multiscale brain-body entrainment to rhythmic stimuli.
Clinical relevance: These findings have implications for designing music therapy and biofeedback interventions that could adapt to ongoing brain-body rhythms. Understanding brain-body entrainment mechanisms and validating simplified measurement approaches could enable more accessible and effective rhythmic interventions in clinical settings, particularly for conditions benefiting from audio-based therapies or cardiac rehabilitation.
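As an illustration of the magnitude-squared coherence analysis mentioned above, the following standard-library Python sketch estimates coherence between two synthetic signals that share a common rhythm. Everything here is assumed for illustration and is not the authors' pipeline: the signals are synthetic stand-ins (named `ecg_like` and `eeg_like`), the 32 Hz sampling rate, 2 Hz beat and noise level are arbitrary, and the estimator is a simple Welch-style average over non-overlapping Hann-windowed segments.

```python
import cmath
import math
import random


def dft(x):
    """One-sided discrete Fourier transform (direct O(n^2) sum, fine for short segments)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def msc(x, y, nperseg):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy), averaged over segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / nperseg) for i in range(nperseg)]
    nbins = nperseg // 2 + 1
    pxx, pyy, pxy = [0.0] * nbins, [0.0] * nbins, [0j] * nbins
    for start in range(0, len(x) - nperseg + 1, nperseg):
        xs = dft([w * v for w, v in zip(window, x[start:start + nperseg])])
        ys = dft([w * v for w, v in zip(window, y[start:start + nperseg])])
        for k in range(nbins):
            pxx[k] += abs(xs[k]) ** 2
            pyy[k] += abs(ys[k]) ** 2
            pxy[k] += xs[k] * ys[k].conjugate()
    return [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] * pyy[k] > 0 else 0.0
            for k in range(nbins)]


# Synthetic stand-ins: both signals carry a 2 Hz (120 BPM) rhythm plus independent noise.
rng = random.Random(0)
fs, beat_hz, nperseg = 32.0, 2.0, 64
t = [i / fs for i in range(1024)]
ecg_like = [math.sin(2 * math.pi * beat_hz * ti) + rng.gauss(0, 0.5) for ti in t]
eeg_like = [0.5 * math.sin(2 * math.pi * beat_hz * ti + 1.0) + rng.gauss(0, 0.5) for ti in t]

coh = msc(ecg_like, eeg_like, nperseg)
beat_bin = round(beat_hz * nperseg / fs)  # bin k corresponds to k * fs / nperseg Hz
print(f"coherence at {beat_hz} Hz: {coh[beat_bin]:.2f}")
```

Coherence is high at the shared beat frequency and low at frequencies where the two signals contain only independent noise, which is the signature the abstract reports across ECG, EEG, and breath. With real recordings a library routine such as `scipy.signal.coherence` would normally replace the hand-written estimator.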
2025-05-25
article · Senior author
Many demanding applications of signal processing and neural networks require scalable, precise and energy efficient matrix-vector multipliers. Translinear MOS circuits, which use logarithmically compressed subthreshold voltages to control linear currents, could offer a particularly dense solution with high dynamic range. However, their susceptibility to Process, Voltage and Temperature (PVT) variations has thus far hindered their use in large arrays. Therefore, we propose a scalable translinear matrix-vector multiplier array, comprising only two transistors and one capacitor per cell, that offers very high PVT invariance. We employ two techniques to achieve this: first, we use dynamic current mirroring within the individual multiplier cells. Second, we store the matrix coefficients using a correlated double sampling (CDS) scheme that is highly invariant to process and voltage variations. To improve retention, stored weights are periodically refreshed through the same CDS scheme, which also compensates for 1/f noise and temperature variations. This refresh is fast and row parallel, allowing our proposed array to scale to millions of weights. We designed and verified the proposed architecture through transistor-level simulation in a 22 nm CMOS process. At less than 0.5 fJ per multiply-accumulate operation, this architecture is especially promising for scalable low-power applications.
Where to cut: Efficient ADC quantization for analog in-memory computing with discrete values
2025-05-25
article · Senior author
Many proposed in-memory-computing systems use analog memristive crossbars to compute matrix-vector products over discrete domains. This yields analog outputs distributed around discrete values across a wide nominal range. Lossless quantization of this range requires costly high-precision analog-to-digital converters (ADCs), which limits the applicability of this approach. But typical results are highly concentrated in a small central region; hence, an ADC with lower resolution that only operates in this central region can achieve almost full accuracy at a fraction of the cost. In this paper, we explore how to appropriately choose ADC resolution and the covered region of interest, specifically for low-precision applications in approximate in-memory-computing. Our results reveal two distinct strategies: ADCs with sufficient resolution should (at least) capture the region of interest without loss, whereas lower-resolution ADCs should space their levels just enough to cover the region of interest. We argue that using this scheme could drastically improve power efficiency and thus scalability of compute-in-memory architectures.
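The concentration argument above can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the paper: a 256-row crossbar column computing a dot product of binary inputs and binary weights, each bit independently 1 with probability 0.5, so that the ideal column output follows a Binomial(256, 0.25) distribution. The function names are illustrative. The sketch reports how much of that distribution a unit-step ADC window centred on the mean captures without loss.

```python
import math


def binomial_pmf(n, p, k):
    """Probability of exactly k ones among n independent Bernoulli(p) products."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def covered_mass(n, p, bits):
    """Probability that the ideal output lands inside a unit-step ADC window.

    The window has 2**bits adjacent integer levels centred on the mean n*p;
    outputs outside it would be clipped (illustrative model, assumed here).
    """
    levels = 2 ** bits
    if levels >= n + 1:
        return 1.0  # the ADC spans the full nominal range 0..n
    lo = round(n * p) - levels // 2
    lo = max(0, min(lo, n + 1 - levels))  # keep the window inside 0..n
    return sum(binomial_pmf(n, p, k) for k in range(lo, lo + levels))


rows, p_one = 256, 0.25  # assumed column height and probability of a 1*1 product
lossless_bits = math.ceil(math.log2(rows + 1))
print(f"lossless quantization of 0..{rows} needs {lossless_bits} bits")
for bits in (3, 4, 5, 6):
    print(f"{bits}-bit window covers {covered_mass(rows, p_one, bits):.4f} of outputs")
```

This covers only the first strategy named in the abstract, where the ADC resolves every discrete value inside the region of interest. The second strategy, widening the level spacing of a lower-resolution ADC so that it spans the region, would trade clipping error for rounding error and is not modelled here.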
Recent grants
EFRI-M3C: Distributed Brain Dynamics in Human Motor Control
NSF · $1.9M · 2011–2015
NIH · $1.2M · 2010
Collaborative Research: Visual Cortex on Silicon
NSF · $750k · 2013–2018
NSF · $1.5M · 2018–2022
PFI:BIC - Unobtrusive Neurotechnology and Immersive Human-Computer Interface for Enhanced Learning
NSF · $1.0M · 2017–2021
Frequent coauthors
- 83 shared
Georges Gielen
- 73 shared
John Baillieul
- 73 shared
Thomas Siegert
University of West Florida
- 73 shared
Vaishali Damle
- 73 shared
Christofer Hierold
- 73 shared
Isabel Trancoso
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
- 73 shared
Dawn Melley
- 73 shared
Patrick Kempf
Institute of Electrical and Electronics Engineers
Labs
Neuromorphic engineering, computational neuroscience, and bioelectronics
Education
- 1994
PhD, Electrical Engineering
California Institute of Technology
Awards & honors
- Francqui Fellow of the Belgian American Educational Foundati…
- Fellow of the Institute of Electrical and Electronic Enginee…
- Fellow of the American Institute for Medical and Biological…
- National Science Foundation Career Award
- Office of Naval Research Young Investigator Award