Raul Castro Fernandez

Assistant Professor of Computer Science · Verified

University of Chicago · Computer Science

Active 1970–2025

h-index: 28
Citations: 3.3k
Papers: 139 (57 in the last 5 years)

About

Raul Castro Fernandez is an Assistant Professor of Computer Science at the University of Chicago. His research focuses on data management, data science, databases, systems, and the value of data. He is interested in understanding how to make the best use of data by building systems to share, discover, prepare, integrate, and process data, often utilizing techniques from data management, statistics, and machine learning. Fernandez is part of ChiData, the data systems research group at the University of Chicago, where he conducts research on large-scale video analysis, efficient data processing systems, and the economics of data. His work aims to advance the understanding of data sharing markets and improve data utilization, contributing to the fields of systems, architecture, and networking.

Research topics

  • Computer Science
  • Data Mining
  • Artificial Intelligence
  • Information Retrieval
  • Computer Security
  • Machine Learning
  • Data science
  • Database
  • World Wide Web
  • Distributed computing
  • Business
  • Operating system
  • Economics
  • Finance
  • Computer network

Selected publications

  • Mass-Scale Analysis of In-the-Wild Conversations Reveals Complexity Bounds on LLM Jailbreaking

    Lecture notes in computer science · 2025-11-23

    book-chapter
  • What is the Value of Data? A Theory and Systematization

    ACM / IMS Journal of Data Science · 2025-03-31

    article · Open access · 1st author · Corresponding

    Data powers economies, shapes societies, and fuels decision-making, yet its value remains poorly understood. Despite its centrality, we lack a unified framework for defining, measuring, and reasoning about data’s worth. This article develops a theory and systematization of the value of data—explaining why, how, and when data generates value. We distinguish data from documents, separate objective value from subjective judgments, and identify key dimensions of data’s worth. Our framework reconciles disparate notions of information, knowledge, and utility, offering insights that validate known principles while uncovering new opportunities to extract value from data. More than a taxonomy, this work provides a conceptual foundation for integrating perspectives from computer science, economics, and beyond. The conceptual foundation clarifies data’s role in technology, markets, and governance, advancing our ability to systematically understand and harness its value.

  • Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System

    Proceedings of the ACM on Management of Data · 2025-06-17 · 6 citations

    article · Open access · Senior author

    Finding relevant tables among databases, lakes, and repositories is the first step in extracting value from data. Such a task remains difficult because assessing whether a table is relevant to a problem does not always depend only on its content but also on the context, which is usually tribal knowledge known to the individual or team. While tools like data catalogs and academic data discovery systems target this problem, they rely on keyword search or more complex interfaces, limiting non-technical users' ability to find relevant data. The advent of large language models (LLMs) offers a unique opportunity for users to ask questions directly in natural language, making dataset discovery more intuitive, accessible, and efficient. In this paper, we introduce Pneuma, a retrieval-augmented generation (RAG) system designed to efficiently and effectively discover tabular data. Pneuma leverages large language models (LLMs) for both table representation and table retrieval. For table representation, Pneuma preserves schema and row-level information to ensure comprehensive data understanding. For table retrieval, Pneuma augments LLMs with traditional information retrieval techniques, such as full-text and vector search, harnessing the strengths of both to improve retrieval performance. To evaluate Pneuma, we generate comprehensive benchmarks that simulate table discovery workload on six real-world datasets including enterprise data, scientific databases, warehousing data, and open data. Our results demonstrate that Pneuma outperforms widely used table search systems (such as full-text search and state-of-the-art RAG systems) in accuracy and resource efficiency.

  • Where Does Academic Database Research Go From Here?

    ArXiv.org · 2025-04-11

    preprint · Open access · Senior author

    Panel proposal for an open forum to discuss and debate the future of database research in the context of industry, other research communities, and AI. Includes summaries of past panels, positions from panelists, as well as positions from a sample of the data management community.

  • Where Does Academic Database Research Go from Here?

    Proceedings of the VLDB Endowment · 2025-08-01

    article · Senior author

    An open forum to discuss and debate the future of database research in the context of industry, other research communities, and AI.

  • Not-So-Bitter Pill to Swallow: Slipstreaming Memory Safe Programming via Rust as part of a Database Systems Course

    2025-06-22

    article · Open access · Senior author
  • Core Hours and Carbon Credits: Incentivizing Sustainability in HPC

    2025-11-12 · 2 citations

    article · Open access

    Efforts to reduce the environmental impact of HPC often focus on resource providers, but choices made by users, e.g., concerning where to run, can be equally consequential. Here we present evidence that new accounting methods that charge users for energy used can incentivize significantly more efficient behavior. We first survey 300 HPC users and find that fewer than 30% are aware of their energy consumption, and that energy efficiency is a low priority concern. We then propose two new multi-resource accounting methods that charge for computations based on their energy consumption or carbon footprint, respectively. Finally, we conduct both simulation studies and a user study to evaluate the impact of these two methods on user behavior. We find that while only providing users feedback on their energy use had no impact on their behavior, associating energy with cost incentivized users to select more efficient resources, and use 40% less energy.

  • Core Hours and Carbon Credits: Incentivizing Sustainability in HPC

    arXiv (Cornell University) · 2025-01-16

    preprint · Open access

    Realizing a shared responsibility between providers and consumers is critical to manage the sustainability of HPC. However, while cost may motivate efficiency improvements by infrastructure operators, broader progress is impeded by a lack of user incentives. We conduct a survey of HPC users that reveals fewer than 30 percent are aware of their energy consumption, and that energy efficiency is among users' lowest priority concerns. One explanation is that existing pricing models may encourage users to prioritize performance over energy efficiency. We propose two transparent multi-resource pricing schemes, Energy- and Carbon-Based Accounting, that seek to change this paradigm by incentivizing more efficient user behavior. These two schemes charge for computations based on their energy consumption or carbon footprint, respectively, rewarding users who leverage efficient hardware and software. We evaluate these two pricing schemes via simulation, in a prototype, and a user study.

  • Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior

    ArXiv.org · 2025-06-03

    preprint · Open access · Senior author

    As hypothesis generation becomes increasingly automated, a new bottleneck has emerged: hypothesis assessment. Modern systems can surface thousands of statistical relationships (correlations, trends, causal links) but offer little guidance on which ones are novel, non-trivial, or worthy of expert attention. In this work, we study the complementary problem to hypothesis generation: automatic hypothesis assessment. Specifically, we ask: given a large set of statistical relationships, can we automatically assess which ones are novel and worth further exploration? We focus on correlations as they are a common entry point in exploratory data analysis that often serve as the basis for forming deeper scientific or causal hypotheses. To support automatic assessment, we propose to leverage the vast knowledge encoded in LLMs' weights to derive a prior distribution over the correlation value of a variable pair. If an LLM's prior expects the correlation value observed, then such correlation is not surprising, and vice versa. We propose the Logit-based Calibrated Prior, an LLM-elicited correlation prior that transforms the model's raw output logits into a calibrated, continuous predictive distribution over correlation values. We evaluate the prior on a benchmark of 2,096 real-world variable pairs and it achieves a sign accuracy of 78.8%, a mean absolute error of 0.26, and 95% credible interval coverage of 89.2% in predicting Pearson correlation coefficient. It also outperforms a fine-tuned RoBERTa classifier in binary correlation prediction and achieves higher precision@K in hypothesis ranking. We further show that the prior generalizes to correlations not seen during LLM pretraining, reflecting context-sensitive reasoning rather than memorization.

  • Where Does Academic Database Research Go From Here?

    2025-06-17

    article · Senior author

Education

  • Ph.D., Computer Science

    University of Chicago

    2010
  • M.S., Computer Science

    University of Chicago

    2006
  • B.S., Computer Science

    University of Chicago

    2004

Awards & honors

  • 2025 Sloan Fellowship
  • 2024 NSF CAREER Award
  • 2023 SIGMOD Test of Time Award