Murali Annavaram

Lloyd F. Hunt Chair of Electrical Power Engineering and Professor of Electrical and Computing Engineering and Computer Science

University of Southern California · Thomas Lord Department of Computer Science

Active 2001–2026

h-index: 45

Citations: 7.5k

Papers: 270 (83 in the last 5 years)

Funding: $2.3M (1 active)

About

Murali Annavaram has been a faculty member in the Ming-Hsieh Department of Electrical Engineering at the University of Southern California since 2007, where he currently holds the Robert G. and Mary G. Lane Early Career Chair. His research focuses on the energy efficiency and reliability of computing platforms, with particular attention to mobile platforms, sensor management for health monitoring, and computer systems architecture that addresses reliability challenges in future CMOS technologies. His awards include the NSF CAREER award in 2010 and the IBM Faculty Partnership award in 2009. Annavaram previously worked as a senior research scientist at Intel Microprocessor Research Labs from 2001 to 2007, where he contributed to energy-efficient server design and 3D stacking architectures, and was a visiting researcher at Nokia Research Center in 2007, where he worked on traffic sensing technologies. His work on Energy Per Instruction Throttling influenced Intel's Core i7 processor, and his research on Virtual-Trip-Lines laid the foundation for Nokia Traffic Works, a real-time traffic sensing product. He holds a Ph.D. in Computer Engineering from the University of Michigan, Ann Arbor, and is a Senior Member of IEEE and ACM.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Algorithm
Computer engineering
Geography
Operating system

Selected publications

Differentially Private Retrieval-Augmented Generation
ArXiv.org · 2026-02-16
Article · Open access · Senior author
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs) on domain-specific tasks by retrieving relevant documents from a database to support accurate responses. However, when the database contains sensitive corpora, such as medical records or legal documents, RAG poses serious privacy risks by potentially exposing private information through its outputs. Prior work has demonstrated that one can practically craft adversarial prompts that force an LLM to regurgitate the augmented contexts. A promising direction is to integrate differential privacy (DP), a privacy notion that offers strong formal guarantees, into RAG systems. However, naively applying DP mechanisms to existing systems often leads to significant utility degradation. Particularly for RAG systems, DP can reduce the usefulness of the augmented contexts, leading to an increased risk of hallucination from the LLMs. Motivated by these challenges, we present DP-KSA, a novel privacy-preserving RAG algorithm that integrates DP using the propose-test-release paradigm. DP-KSA follows from a key observation that most question-answering (QA) queries can be sufficiently answered with a few keywords. Hence, DP-KSA first obtains an ensemble of relevant contexts, each of which is used to generate a response from an LLM. We utilize these responses to obtain the most frequent keywords in a differentially private manner. Lastly, the keywords are augmented into the prompt for the final output. This approach effectively compresses the semantic space while preserving both utility and privacy. We show that DP-KSA provides formal DP guarantees on the generated output with respect to the RAG database. We evaluate DP-KSA on two QA benchmarks using three instruction-tuned LLMs, and our empirical results demonstrate that DP-KSA achieves a strong privacy-utility tradeoff.
PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior
arXiv (Cornell University) · 2026-05-12
Preprint · Open access · Senior author
Large language models (LLMs) are increasingly used to simulate human behavior, but their ability to simulate individual privacy decisions is not well understood. In this paper, we address the problem of evaluating whether a core set of user persona attributes can drive LLMs to simulate individual-level privacy behavior. We introduce PrivacySIM, an evaluation suite that benchmarks LLM simulation of user privacy behavior against the ground-truth responses of 1,000 users. These users are drawn from five published user studies on privacy spanning LLM healthcare consultations, conversational agents, and chatbots. Drawing on these user studies, we hypothesize three persona facets as plausible predictors of privacy decision-making: demographics, previous experiences, and stated privacy attitudes. We condition nine frontier LLMs on subsets of these three facets and measure how often each model's response to a data-sharing scenario matches the user's actual response. Our findings show that (1) privacy persona conditioning consistently improves simulation quality over no-persona conditioning, but even the strongest model (40.4% accuracy) remains far from faithfully simulating individual privacy decisions; (2) a user's stated privacy attitudes alone may not be the best predictor because they often diverge from the user's actual privacy behavior; and (3) users with high AI/chatbot experience but low stated privacy attitudes are the most challenging to simulate. PrivacySIM is a first step toward understanding and improving the capabilities of LLMs to simulate user privacy decisions. We release PrivacySIM to enable further evaluation of LLM privacy simulation.
LRD-MPC: Efficient MPC Inference through Low-rank Decomposition
Open MIND · 2026-02-16
Preprint · Senior author
Secure Multi-party Computation (MPC) enables untrusted parties to jointly compute a function without revealing their inputs. Its application to machine learning (ML) has gained significant attention, particularly for secure inference services deployed across multiple cloud virtual machines (VMs), where each VM acts as an MPC party. Model providers secret-share model weights, and users secret-share inputs, ensuring that each server operates only on random shares. While MPC provides strong cryptographic guarantees, it incurs substantial computational and communication overhead. Deep neural networks rely heavily on convolutional and fully connected layers, which require costly matrix multiplications in MPC. To reduce this cost, we propose leveraging low-rank decomposition (LRD) for linear layers, replacing one large matrix multiplication with two smaller ones. However, this decomposition introduces two overheads in MPC. First, each matrix multiplication in MPC incurs a round of communication, so decomposing one matrix multiplication into two adds a communication round. Second, the added matrix multiplication requires an additional truncation step to maintain numerical precision. Since truncation itself requires communication and computation, these overheads can offset the gains from decomposition. To address this, we introduce two complementary optimizations: truncation skipping and efficient linear layer concatenation. Truncation skipping removes the extra truncation induced by LRD, while linear layer concatenation pipelines operations to hide the additional communication round. Together, these techniques mitigate the main overheads of LRD in MPC and improve overall efficiency. Our approach is broadly applicable across MPC protocols. Experiments show up to 25% speedup in n-PC and 33% in 3-PC protocols over full-rank baselines, along with up to 52% GPU energy savings and 88% reduction in offline-phase latency.
SAGERec: Sampling and Gating for Enhanced Long-Tail Item Recommendations
2026-02-16
Article · Open access · Senior author
Recommendation systems are an integral part of daily life, influencing how people interact with and access information. The content recommended to users shapes their perceptions, making it crucial to eliminate biases that could negatively impact those perceptions. One such bias is the popularity bias which causes the long-tail item recommendation problem, where systems tend to favor popular items while overlooking less popular yet relevant ones.
Infrastructure for Valuable, Tradable, and Verifiable Agent Memory
arXiv (Cornell University) · 2026-03-25
Preprint · Open access · Senior author
Every API token you spend is your accumulated wealth; once you can prove its value and the effort behind it, you can resell it. As autonomous agents repeatedly call models and tools, they accumulate memories that are your intellectual property. But today these memories remain private and non-transferable, as there is no way to validate their value. We argue that agent memory can serve as an economic commodity in the agent economy, if buyers can verify that it is authentic, effort-backed, and produced in a compatible execution context. To realize this idea, we propose clawgang, which binds memory to verifiable computational provenance, and meowtrade, a market layer for listing, transferring, and governing certified memory artifacts. Together, they transform one-shot API token spending into reusable and tradable assets, enabling timely memory transfer, reducing repeated exploration, and opening a memory trade market.
Meta-Learn to Unlearn: Enhanced Exact Machine Unlearning in Recommendation Systems with Meta-Learning
Proceedings on Privacy Enhancing Technologies · 2025-07-13 · 1 citation
Article · Open access · Senior author
Recommendation systems are used widely to recommend items such as movies, products, or news to users. The performance of a recommendation model depends on the quality of the embeddings that are associated with users and items, which are generally learned by tracking user behavior, such as their click history. Recent legislative requirements allow users to withdraw their consent to learning from some of their behaviors, even if they provided such consent initially. Once a user withdraws their consent, the models are supposed to unlearn the user behavior. This requirement has led to the emergence of machine unlearning, a research area that proposes a class of privacy policy-compliant techniques aimed at maintaining good model utility after deleting user information. Machine unlearning techniques are generally divided into two categories: exact unlearning, which may be accomplished by retraining the model from scratch after removing a data point from the training data; and approximate unlearning, which approximates the model parameters that would result from removing a specific user's data, avoiding a complete retraining of the model to minimize computational costs. In this work, we propose an enhanced exact machine unlearning (EEMU) strategy that leverages meta-learning to reduce the loss of recommendation performance while ensuring efficient and exact unlearnability. We demonstrate our results using four public datasets and show a significant improvement in recommendation performance over state-of-the-art baselines while preserving the privacy guarantees of exact unlearning.

Recent grants

CSR-PSCE,SM: A Holistic Design Approach to Reliability Using 3D Stacked
NSF · $419k · 2008–2013
SHF:Small: Accelerating Graph Analytics Through Coordinated Storage, Memory and Computing Advances
NSF · $400k · 2017–2020
CT-ISG: A Game Theoretic Framework for Privacy Preservation in Community-Based Mobile Applications
NSF · $266k · 2008–2012
SHF: Small: ML Accelerator Cohort Architecture
NSF · $600k · 2022–2026
CAREER: From Nonstop-Monitoring to Nano-ISA: An Adaptive Multi-Dimensional Framework for Processor Reliability
NSF · $444k · 2010–2017

Frequent coauthors

Salman Avestimehr
60 shared
Michel Dubois
53 shared
Yongqin Wang
Wuhan University of Technology
42 shared
Gunjae Koo
Korea University
34 shared
Zhifeng Lin
Fuzhou University
33 shared
Hanieh Hashemi
31 shared
Hyeran Jeon
University of California, Merced
30 shared
Krishna Giri Narra
30 shared

Education

Ph.D., Computer Science
University of California, Los Angeles
1996
M.S., Computer Science
University of California, Los Angeles
1993
B.S., Electrical Engineering
Indian Institute of Technology, Madras
1991

Awards & honors

NSF CAREER Award (2010)
Body Computing Award (2009)
IEEE International Conference on Distributed Computing in Se…
ACM Senior Membership (2009)
IEEE Senior Membership (2009)
