David Kaeli

Distinguished Professor of Electrical and Computer Engineering, College of Engineering | Program Director of Data Science, College of Engineering | Affiliated Faculty of Bioengineering, College of Engineering | Affiliated Faculty, Khoury College of Computer Sciences · Verified

Northeastern University · Biomedical Engineering

Active 1989–2026

h-index 44

Citations 7.9k

Papers 472 · 87 last 5y

Funding $16.7M · 2 active

About

David Kaeli is a College of Engineering Distinguished Professor on the Electrical and Computer Engineering faculty at Northeastern University. He received a BS and PhD in Electrical Engineering from Rutgers University and an MS in Computer Engineering from Syracuse University. He directs the Northeastern University Computer Architecture Research Laboratory (NUCAR), which is an AMD Strategic Academic Partner. Prior to joining Northeastern in 1993, he spent 12 years at IBM, including seven years at the T.J. Watson Research Center. His research focuses on computer architecture, GPUs, heterogeneous computing, performance analysis, security, data privacy and encryption, hardware reliability and recovery, simulation, and workload characterization. He is a Fellow of the IEEE and the ACM and a Distinguished Scientist of the ACM, and his honors include the NSF CAREER Award, the Søren Buus Outstanding Research Award, and recognition as a top scientist worldwide by Stanford University. Through his leadership of NUCAR and his involvement in various research centers and initiatives, he contributes to high-performance, reliable, and secure computing hardware and software systems.

Research topics

Computer Science
Artificial Intelligence
Business
Machine Learning
Engineering
Computer Security
Computer engineering
Parallel computing
Operating system
Programming language
Embedded system
Distributed computing
Marketing
Psychology
Microeconomics
Industrial organization
Human–computer interaction
Computer network
Management science
Engineering management
Economics
Risk analysis (engineering)

Selected publications

Human Cognition in Machines: A Unified Perspective of World Models
arXiv (Cornell University) · 2026-04-17
preprint · Open access
This comprehensive report distinguishes prior works by the cognitive functions they innovate. Many works claim an almost "human-like" cognitive capability in their world models. To evaluate these claims requires a proper grounding in first principles in Cognitive Architecture Theory (CAT). We present a conceptual unified framework for world models that fully incorporates all the cognitive functions associated with CAT (i.e. memory, perception, language, reasoning, imagining, motivation, and meta-cognition) and identify gaps in the research as a guide for future states of the art. In particular, we find that motivation (especially intrinsic motivation) and meta-cognition remain drastically under-researched, and we propose concrete directions informed by active inference and global workspace theory to address them. We further introduce Epistemic World Models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied across video, embodied, and epistemic world models, suggests research directions where prior taxonomies have not.
Towards high-accuracy bacterial taxonomy identification using phenotypic single-cell Raman spectroscopy data
ISME Communications · 2025-01-01 · 6 citations
article · Open access
Single-cell Raman Spectroscopy (SCRS) emerges as a promising tool for single-cell phenotyping in environmental ecological studies, offering non-intrusive, high-resolution, and high-throughput capabilities. In this study, we obtained a large and the first comprehensive SCRS dataset that captured phenotypic variations with cell growth status for 36 microbial strains, and we compared and optimized analysis techniques and classifiers for SCRS-based taxonomy identification. First, we benchmarked five dimensionality reduction (DR) methods, 10 classifiers, and the impact of cell growth variances using a SCRS dataset with both taxonomy and cellular growth stage labels. Unsupervised DR methods and non-neural network classifiers are recommended for a balance between accuracy and time efficiency, achieving up to 96.1% taxonomy classification accuracy. Second, the accuracy variance caused by cellular growth variance (<2.9% difference) was found to be smaller than the influence of model selection (up to 41.4% difference). Remarkably, simultaneous high accuracy in growth stage classification (93.3%) and taxonomy classification (94%) was achievable using an innovative two-step classifier model. Third, this study is the first to successfully apply models trained on pure culture SCRS data to achieve taxonomic identification of microbes in environmental samples at an accuracy of 79%, with validation via Raman-FISH (fluorescence in situ hybridization). This study lays the groundwork for standardizing SCRS-based biotechnologies in single-cell phenotyping and taxonomic classification beyond laboratory pure culture to real environmental microorganisms and promises advances in SCRS applications for elucidating organismal functions, ecological adaptability, and environmental interactions.
Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs
2025-05-11
article
Dynamic binary instrumentation (DBI) is a widely used technique for collecting detailed, fine-grained information from program execution without requiring recompilation or access to the program's source code. DBI provides several benefits over static instrumentation, including full code discovery and the ability to selectively toggle profiling during runtime. Luthier is an open-source DBI framework targeting AMD GPUs, designed to integrate and run seamlessly on the ROCm software stack. During runtime, Luthier allows inspection of loaded GPU code objects and carries out instrumentation by either manually modifying instructions or inserting calls to special device functions (i.e., "hooks") at user-specified locations in the program. Luthier hooks allow inspection and modification of the device-visible state, and can communicate with the host via host-accessible device memory buffers. Luthier also supports switching between instrumented and un-instrumented versions of a kernel. In this paper, we describe some of the key design challenges we encountered when developing this open-source DBI framework. We then showcase Luthier's user-facing APIs and internal components, providing example use cases implemented using our framework. While Luthier incurs a 50X runtime overhead (on average) when running an instrumented application, this overhead is 10 times lower than that of the state-of-the-art GPU-based DBI framework when running equivalent tools on the same workload written in CUDA.
Unlocking the Power of Data Harmonization in Environmental Health Sciences: A Comprehensive Exploration of Significance, Use Cases, and Recommendations for Standardization Efforts
Environmental Health Perspectives · 2025-06-06 · 4 citations
review · Open access
BACKGROUND: The field of environmental health sciences increasingly demands comprehensive and diverse datasets, particularly in response to emerging research areas such as climate change, mixtures, and exposomics. The data needed to address the complexity of environmental health research questions often extend beyond the boundaries of a single study or data resource. Traditional data management approaches struggle to harmonize the ever-expanding and heterogeneous data sources needed for research in the environmental health sciences. Harmonization may help address this issue as it involves aligning and standardizing various elements of data to allow comprehensive analysis, data pooling and interpretation across studies. OBJECTIVES: The primary objective is to inform researchers about the transformative potential of embracing harmonization methodologies and to motivate contributions to ongoing efforts, thereby fostering advancements. METHODS: Using the Environmental Health Language Collaborative's Data Harmonization Use Case, we provide a practical illustration of existing data harmonization approaches, identify gaps, and emphasize future research and application directions. We selected two publicly available environmental epidemiology studies on the topic of childhood asthma and three studies on the topic of biomarkers of metals exposure during pregnancy and birth outcomes and applied several existing harmonization approaches to assess interoperability. DISCUSSION: Our process revealed the potential limitations of many existing harmonization approaches, with notable failures to identify common variables across independent datasets and lack of agreement between human and computer-based approaches. This use case identified various challenges with existing approaches, including reliance on often incomplete data documentation and large amounts of manual effort. 
To address these challenges, we recommend the continued advancement and dissemination of community data standards, the development of software and tools to facilitate harmonization through automation, and strategic efforts to promote engagement in data harmonization within the environmental health sciences community. Collaborative science is needed to advance our understanding of environmental contributors to health, and realizing the harmonization potential of our scientific data is a step toward improved collaboration. https://doi.org/10.1289/EHP15410.
Association between organic micropollutants in tap water and human exposure and birth outcomes: Implications for environmental health in northern Puerto Rico
Journal of Hazardous Materials · 2025-03-18
article
Accelerating mesh-based Monte Carlo simulations using contemporary graphics ray-tracing hardware
ArXiv.org · 2025-11-27
preprint · Open access
Significance: Monte Carlo (MC) methods are the gold standard for modeling light-tissue interactions due to their accuracy. Mesh-based MC (MMC) offers enhanced precision for complex tissue structures using tetrahedral mesh models. Despite significant speedups achieved on graphics processing units (GPUs), MMC performance remains hindered by the computational cost of frequent ray-boundary intersection tests. Aim: We propose a highly accelerated MMC algorithm, RT-MMC, that leverages the hardware-accelerated ray traversal and intersection capabilities of ray-tracing cores (RT-cores) on modern GPUs. Approach: Implemented using NVIDIA's OptiX platform, RT-MMC extends graphics ray-tracing pipelines towards volumetric ray-tracing in turbid media, eliminating the need for challenging tetrahedral mesh generation while delivering significant speed improvements through hardware acceleration. It also intrinsically supports wide-field sources without complex mesh retessellation. Results: RT-MMC demonstrates excellent agreement with traditional software-ray-tracing MMC algorithms while achieving 1.5x to 4.5x speedups across multiple GPU architectures. These performance gains significantly enhance the practicality of MMC for routine simulations. Conclusion: Migration from software- to hardware-based ray-tracing not only greatly simplifies MMC simulation workflows, but also results in significant speedups that are expected to increase further as ray-tracing hardware rapidly gains adoption. Adoption of graphics ray-tracing pipelines in quantitative MMC simulations enables leveraging of emerging hardware resources and benefits a wide range of biophotonics applications.
RAGs to Riches: RAG-like Few-shot Learning for Large Language Model Role-playing
ArXiv.org · 2025-09-15
preprint · Open access
Role-playing large language models (LLMs) are increasingly deployed in high-stakes domains such as healthcare, education, and governance, where failures can directly impact user trust and well-being. A cost-effective paradigm for LLM role-playing is few-shot learning, but existing approaches often cause models to break character in unexpected and potentially harmful ways, especially when interacting with hostile users. Inspired by Retrieval-Augmented Generation (RAG), we reformulate LLM role-playing into a text retrieval problem and propose a new prompting framework called RAGs-to-Riches, which leverages curated reference demonstrations to condition LLM responses. We evaluate our framework with LLM-as-a-judge preference voting and introduce two novel token-level ROUGE metrics: Intersection over Output (IOO) to quantify how much an LLM improvises and Intersection over References (IOR) to measure the utilization rate of few-shot demonstrations during the evaluation tasks. When simulating interactions with a hostile user, our prompting strategy incorporates in its responses during inference an average of 35% more tokens from the reference demonstrations. As a result, across 453 role-playing interactions, our models are consistently judged as being more authentic, and remain in-character more often than zero-shot and in-context learning (ICL) methods. Our method presents a scalable strategy for building robust, human-aligned LLM role-playing frameworks.
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
ArXiv.org · 2025-07-07
preprint · Open access
Recent large-scale Vision Language Action (VLA) models have shown superior performance in robotic manipulation tasks guided by natural language. However, current VLA models suffer from two drawbacks: (i) generation of massive tokens leading to high inference latency and increased training cost, and (ii) insufficient utilization of generated actions resulting in potential performance loss. To address these issues, we develop a training framework to finetune VLA models for generating significantly fewer action tokens with high parallelism, effectively reducing inference latency and training cost. Furthermore, we introduce an inference optimization technique with a novel voting-based ensemble strategy to combine current and previous action predictions, improving the utilization of generated actions and overall performance. Our results demonstrate that we achieve superior performance compared with state-of-the-art VLA models, achieving significantly higher success rates and 39× faster inference than OpenVLA with 46 Hz throughput on edge platforms, demonstrating practical deployability. The code is available at https://github.com/LukeLIN-web/VOTE.
FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs
2025-05-11 · 3 citations
article
Word-wise Fully Homomorphic Encryption (FHE) schemes, such as CKKS, are gaining significant traction due to their ability to provide post-quantum-resistant, privacy-preserving approximate computing, an especially desirable feature in the Machine-Learning-as-a-Service (MLaaS) paradigm. In this work, we introduce FIDESlib, the first open-source server-side CKKS GPU library that is fully interoperable with well-established client-side OpenFHE operations. Unlike other existing open-source GPU libraries, FIDESlib provides the first implementation featuring heavily optimized GPU kernels for all CKKS primitives, including bootstrapping. Our library also integrates robust benchmarking and testing, ensuring it remains adaptable to further optimization. Comparing our scheme against Phantom (the previously top open-source CKKS library), we show that FIDESlib offers superior performance and scalability. For bootstrapping, FIDESlib achieves no less than 70× speedup over the AVX-optimized OpenFHE implementation. FIDESlib is available on GitHub: https://github.com/CAPS-UMU/FIDESlib.

Recent grants

SHF: Small: The Cross-layer Reliability Stack
NSF · $350k · 2010–2014
CSR: Small: Collaborative Research: Leveraging Intra-chip/Inter-chip Silicon-Photonic Networks for Designing Next-Generation Accelerators
NSF · $266k · 2015–2019
STARSS: Small: Side-Channel Analysis and Resiliency Targeting Accelerators
NSF · $300k · 2016–2020
Project 3 - Effect of Extreme Weather on Potential Exposure of Contaminant Mixtures in Karst Water Systems
NIH · $13.7M · 2010–2026
Collaborative Research: CSR: Medium: Architecting GPUs for Practical Homomorphic Encryption-based Computing
NSF · $600k · 2023–2027

Frequent coauthors

Dana Schaa
Universidad del Noreste
62 shared
Perhaad Mistry
Advanced Micro Devices (United States)
58 shared
Yifan Sun
Ningbo University
35 shared
John Kim
30 shared
Jennifer Dy
Northeastern University
29 shared
Lee Howes
28 shared
José Luis Abellán
Universidad de Murcia
28 shared
Ajay Joshi
25 shared

Labs

Northeastern University Computer Architecture Research Laboratory (NUCAR) · PI

Education

PhD Electrical Engineering, Electrical and Computer Engineering
Rutgers The State University of New Jersey
1992

Awards & honors

NSF CAREER Award (1996)
Fellow of the IEEE
Fellow of the ACM
Søren Buus Outstanding Research Award (2009)
Distinguished Scientist, Association for Computing Machinery
