Sharad Mehrotra
Distinguished Professor · University of California, Irvine · Computer Science
Active 1991–2026
About
Sharad Mehrotra is a Distinguished Professor in the School of Information and Computer Science at the University of California, Irvine, and serves as the Director of the Center for Emergency Response Technologies (CERT) at UCI. He is also the Director and Principal Investigator of the RESCUE project (Responding to Crisis and Unexpected Events), which is funded by the NSF through its large ITR program and spans seven schools with 60 members. Mehrotra is associated with the Cal-IT2 institute, a multidisciplinary research facility spanning UC Irvine and UC San Diego. His research expertise lies in data management and distributed systems, with pioneering contributions including the concept of 'database as a service' and the use of information retrieval techniques, particularly relevance feedback, in multimedia search. His work has earned numerous awards and nominations, including the SIGMOD Best Paper award in 2001 and best paper awards at DASFAA 2004. His current research focuses on building sentient spaces using multimodal sensors, data privacy, and data quality, with recent efforts emphasizing situational awareness from multimodal input such as conversational speech data. Many of his research contributions have been incorporated into software used at various first responder sites.
Research topics
- Computer Science
- Data Mining
- Computer Security
- Artificial Intelligence
- Information Retrieval
- Medicine
- Programming language
- Database
- Mathematics
- Engineering
- Virology
- Data science
- World Wide Web
- Operating system
Selected publications
Proceedings of the ACM on Management of Data · 2026-05-18
article · Open access · Senior author
Large Language Models (LLMs) vary significantly in metrics such as accuracy, latency, and cost, making it challenging for users and applications to decide which model to invoke for each query. This paper presents OctoSelector, a framework for LLM selection that satisfies user-defined objectives and constraints across multiple metrics. In the pre-processing phase, OctoSelector learns difficulty-aware representations of queries based on both input and output complexity, clustering them into similar difficulty groups to enable efficient performance estimation across multiple LLMs. During inference, OctoSelector supports LLM selection for batched workload, formulating it as an Integer Linear Programming (ILP) problem that optimizes a user-defined objective (e.g., minimizing cost or latency, or maximizing accuracy) while enforcing constraints on other metrics. We evaluate OctoSelector on two types of tasks: NL2SQL using the Spider and BIRD benchmarks, and sentiment analysis using the IMDb benchmark. When optimizing for cost under accuracy and latency constraints, OctoSelector achieves up to a 67.7% cost reduction on NL2SQL tasks for batched workloads compared to state-of-the-art approaches.
DIM-SUM: Dynamic IMputation for Smart Utility Management
Proceedings of the VLDB Endowment · 2025-07-01 · 2 citations
article
Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.
Search over Secret-Shared Datasets
2025-01-01
book-chapter · 1st author · Corresponding
Modeling Inhabited Smart Spaces to Support Interoperable IoT-Based Applications
2025-06-02
article · Open access
IoT deployments in smart spaces can enable the development of useful services for their inhabitants. However, the diversity of smart spaces and their sensor infrastructures makes it challenging to develop space-agnostic applications. Moreover, existing schemas addressing interoperability challenges often lack the vocabulary needed to represent the integration of smart space systems and their inhabitants. We present a schema to annotate inhabited smart spaces in support of inhabitant-oriented applications. Our schema integrates well-known ontologies to represent inhabitants, events/activities, and the space itself, along with their interconnections. It also supports the representation of uncertain information from IoT and mobile sensors (e.g., a person's location or occupancy/attendance at an event). Additionally, we introduce an annotation tool that uses an easy-to-use GUI to describe a smart space based on our schema. We demonstrate the potential of our approach through a series of SPARQL queries and a system deployed at the UCI campus that annotates sensor data to support a space-agnostic occupancy monitoring application.
2025-01-01
book-chapter
Meaningful Data Erasure in the Presence of Dependencies
Proceedings of the VLDB Endowment · 2025-06-01 · 1 citation
article · Open access
Data regulations like GDPR require systems to support data erasure but leave the definition of "erasure" open to interpretation. This ambiguity makes compliance challenging, especially in databases where data dependencies can lead to erased data being inferred from remaining data. We formally define a precise notion of data erasure that ensures any inference about deleted data, through dependencies, remains bounded to what could have been inferred before its insertion. We design erasure mechanisms that enforce this guarantee at minimal cost. Additionally, we explore strategies to balance cost and throughput, batch multiple erasures, and proactively compute data retention times when possible. We demonstrate the practicality and scalability of our algorithms using both real and synthetic datasets.
Graph structure prompt learning: A novel methodology to improve performance of graph neural networks
Applied Intelligence · 2025-11-01
article · Senior author
Towards Secure Data Management using Multi-Cryptographic Solutions (Invited)
2025-06-22
article
Several secure data outsourcing systems incorporate various cryptographic techniques to balance security, functionalities, and efficiency. However, their security properties can be ad hoc and sometimes obscure. Our recent work, Secure Normal Form (SNF) [ICDE’24], presents a principled approach that allows data owners to define acceptable leakages of nonsensitive aspects of their data. This approach enables efficient processing of queries while ensuring no unintended leakage of sensitive information. In this paper, we discuss the benefits and challenges of implementing SNF within advanced computational environments and modern data management architectures. We argue that its applicability may extend beyond merely offloading secure query execution to the cloud.
2025-01-01
book-chapter
DIM-SUM: Dynamic IMputation for Smart Utility Management
ArXiv.org · 2025-06-24
preprint · Open access
Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.
Recent grants
Information Technology Research (ITR): Responding to the Unexpected
NSF · $9.5M · 2003–2010
ITR: Privacy in Database-As-A-Service (DAS) Model
NSF · $595k · 2002–2007
RAPID: An Organizational Scale Approach to Privacy-Enabled Contact Tracing in COVID-19
NSF · $100k · 2020–2022
III: Small: Query and Goal Driven Entity Resolution Framework
NSF · $569k · 2011–2015
III: Small: EnrichDB - Supporting Enrichment in Database Systems
NSF · $532k · 2020–2025
Frequent coauthors
- Nalini Venkatasubramanian · 184 shared
- Roberto Yus · University of Maryland, Baltimore County · 68 shared
- Shantanu Sharma · 59 shared
- Andrew Chio · University of California, Irvine · 55 shared
- Daokun Jiang · University of California, Irvine · 54 shared
- Dmitri V. Kalashnikov · Voronezh State University · 42 shared
- Peeyush Gupta · University of California, Irvine · 40 shared
- Daniela Nicklas · University of Bamberg · 36 shared
Awards & honors
- Outstanding Graduate Student Mentor Award (2005)
- C. W. Gear Outstanding Junior Faculty Award
- SIGMOD Best Paper award (2001)
- Best of VLDB 1994 submissions
- Best Paper Award at DASFAA (2004)