
Kristian J. Hammond
· Bill and Cathy Osborn Professor of Computer Science · Northwestern University · Chemical Engineering
Active 1963–2026
About
Kristian J. Hammond is the Bill and Cathy Osborn Professor of Computer Science at Northwestern University. He is the Director of the Master of Science in Artificial Intelligence Program and the Director of the Center for Advancing Safety of Machine Intelligence (CASMI). His research interests encompass artificial intelligence, natural language generation, narrative generation, conversational interfaces, and the future of work, with a focus on bringing human capabilities to machines. Hammond co-founded Narrative Science, a startup that combines artificial intelligence and journalism to transform raw data into natural language. His work explores AI safety, harm, and the ethical implications of artificial intelligence, contributing to the understanding and development of intelligent systems that enhance human capabilities.
Research topics
- Computer Science
- Political Science
- Computer Security
- Psychology
- Machine Learning
- World Wide Web
- Internet privacy
- Medicine
- Artificial Intelligence
- Environmental health
- Gerontology
- Data science
- Business
- Pathology
- Law
- Criminology
Selected publications
arXiv (Cornell University) · 2026-01-09
preprint · Open access · Senior author
Recent advances in text-to-SQL systems have been driven by larger models and improved datasets, yet progress is still limited by the scarcity of high-quality training data. Manual data creation is expensive, and existing synthetic methods trade off reliability and scalability. Template-based approaches ensure correct SQL but require schema-specific templates, while LLM-based generation scales easily but lacks quality and correctness guarantees. We introduce RingSQL, a hybrid data generation framework that combines schema-independent query templates with LLM-based paraphrasing of natural language questions. This approach preserves SQL correctness across diverse schemas while providing broad linguistic variety. In our experiments, we find that models trained using data produced by RingSQL achieve an average gain in accuracy of +2.3% across six text-to-SQL benchmarks when compared to models trained on other synthetic data. We make our code available at https://github.com/nu-c3lab/RingSQL.
Ophthalmic surgery, lasers & imaging retina · 2024-05-16 · 3 citations
article · Open access
Machine teaching, a machine learning subfield, may allow for rapid development of artificial intelligence systems able to automatically identify emerging ocular biomarkers from small imaging datasets. We sought to use machine teaching to automatically identify retinal ischemic perivascular lesions (RIPLs) and subretinal drusenoid deposits (SDDs), two emerging ocular biomarkers of cardiovascular disease. IRB approval was obtained. Four small datasets of SD-OCT B-scans were used to train and test two distinct automated systems, one identifying RIPLs and the other identifying SDDs. An open-source interactive machine-learning software program, RootPainter, was used to perform annotation and training simultaneously over a 6-hour period. For SDDs at the B-scan level, test-set accuracy = 92%, sensitivity = 100%, specificity = 88%, positive predictive value (PPV) = 82%, and negative predictive value (NPV) = 100%. For RIPLs at the B-scan level, test-set accuracy = 90%, sensitivity = 60%, specificity = 93%, PPV = 50%, and NPV = 95%. Machine teaching demonstrates promise within ophthalmic imaging to rapidly allow for automated identification of novel biomarkers from small image datasets. [Ophthalmic Surg Lasers Imaging Retina 2024;55:475–478.]
Satyrn: A Platform for Analytics Augmented Generation
arXiv (Cornell University) · 2024-06-17 · 1 citation
preprint · Open access · Senior author
Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter in which just 57% of claims are accurate.
Satyrn: A Platform for Analytics Augmented Generation
2024-01-01
article · Open access · Senior author
Marko Sterbentz, Cameron Barrie, Shubham Shahi, Abhratanu Dutta, Donna Hooshmand, Harper Pack, Kristian J Hammond. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
Lightweight Knowledge Representations for Automating Data Analysis
arXiv (Cornell University) · 2023-10-15
preprint · Open access · Senior author
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.
Deep Learning for Cardiovascular Imaging
JAMA Cardiology · 2023 · 56 citations
- Artificial Intelligence
- Machine Learning
- Computer Science
Importance: Artificial intelligence (AI), driven by advances in deep learning (DL), has the potential to reshape the field of cardiovascular imaging (CVI). While DL for CVI is still in its infancy, research is accelerating to aid in the acquisition, processing, and/or interpretation of CVI across various modalities, with several commercial products already in clinical use. It is imperative that cardiovascular imagers are familiar with DL systems, including a basic understanding of how they work, their relative strengths compared with other automated systems, and possible pitfalls in their implementation. The goal of this article is to review the methodology and application of DL to CVI in a simple, digestible fashion toward demystifying this emerging technology. Observations: At its core, DL is simply the application of a series of tunable mathematical operations that translate input data into a desired output. Based on artificial neural networks that are inspired by the human nervous system, there are several types of DL architectures suited to different tasks; convolutional neural networks are particularly adept at extracting valuable information from CVI data. We survey some of the notable applications of DL to tasks across the spectrum of CVI modalities. We also discuss challenges in the development and implementation of DL systems, including avoiding overfitting, preventing systematic bias, improving explainability, and fostering a human-machine partnership. Finally, we conclude with a vision of the future of DL for CVI. Conclusions and Relevance: Deep learning has the potential to meaningfully affect the field of CVI. Rather than a threat, DL could be seen as a partner to cardiovascular imagers in reducing technical burden and improving efficiency and quality of care. High-quality prospective evidence is still needed to demonstrate how the benefits of DL CVI systems may outweigh the risks.
The Promise of AI in an Open Justice System
AI Magazine · 2022-03-31
article · Open access
To craft effective public policy, modern governments must gather and analyze data on both the performance of their public functions and the responses by the public. Federal administrative agencies such as the Patent Office and Centers for Disease Control routinely do this, as does the United States Congress. More importantly, they make such data freely accessible. Within the United States government, however, the judicial branch is a conspicuous outlier. In theory, federal court records could be used to evaluate the efficiency and fairness of the justice system. In practice, court records are effectively out of reach because they sit behind a government paywall. This financial barrier, along with an equally important myriad of technical obstacles, has forestalled the development of AI-driven analysis that could enable a systematic understanding and evaluation of the work of the courts. The Systematic Content Analysis of Litigation EventS Open Knowledge Network (SCALES OKN) seeks to address this situation by transforming the transparency and accessibility of court records. The SCALES OKN will potentiate the development of new AI solutions that will benefit the judiciary, legal scholars, and the public. In this article, we outline some of the key financial, technical, and policy challenges to developing novel AI solutions.
Research gaps and opportunities in precision nutrition: an NIH workshop report
American Journal of Clinical Nutrition · 2022 · 91 citations
- Political Science
- Gerontology
- Psychology
The Promise of AI in an Open Justice System
AI Magazine · 2022-03-01 · 10 citations
article · Open access
Frequent coauthors
- Jay Budzik · 27 shared
- Larry Birnbaum · 23 shared
- David A. Shamma · Toyota Industries (United States) · 17 shared
- Sara Owsley · 14 shared
- Robin Burke · 14 shared
- Shannon Bradshaw · 14 shared
- Timothy M. Converse · 13 shared
- Colleen M. Seifert · Purdue University System · 12 shared