Kristian J. Hammond

Bill and Cathy Osborn Professor of Computer Science

Northwestern University · Chemical Engineering

Active 1963–2026

h-index: 29

Citations: 5.3k

Papers: 206 · 21 last 5y

Funding: $87k

About

Kristian J. Hammond is the Bill and Cathy Osborn Professor of Computer Science at Northwestern University. He directs the Master of Science in Artificial Intelligence Program and the Center for Advancing Safety of Machine Intelligence (CASMI). His research interests include artificial intelligence, natural language generation, narrative generation, conversational interfaces, and the future of work, with a focus on bringing human capabilities to machines. Hammond co-founded Narrative Science, a startup that combines artificial intelligence and journalism to transform raw data into natural language. His work also explores AI safety, harm, and the ethical implications of artificial intelligence, contributing to the understanding and development of intelligent systems that enhance human capabilities.

Research topics

Computer Science
Political Science
Computer Security
Psychology
Machine Learning
World Wide Web
Internet privacy
Medicine
Artificial Intelligence
Environmental health
Gerontology
Data science
Business
Pathology
Law
Criminology

Selected publications

RingSQL: Generating Synthetic Data with Schema-Independent Templates for Text-to-SQL Reasoning Models
arXiv (Cornell University) · 2026-01-09
preprint · Open access · Senior author
Recent advances in text-to-SQL systems have been driven by larger models and improved datasets, yet progress is still limited by the scarcity of high-quality training data. Manual data creation is expensive, and existing synthetic methods trade off reliability and scalability. Template-based approaches ensure correct SQL but require schema-specific templates, while LLM-based generation scales easily but lacks quality and correctness guarantees. We introduce RingSQL, a hybrid data generation framework that combines schema-independent query templates with LLM-based paraphrasing of natural language questions. This approach preserves SQL correctness across diverse schemas while providing broad linguistic variety. In our experiments, we find that models trained using data produced by RingSQL achieve an average gain in accuracy of +2.3% across six text-to-SQL benchmarks when compared to models trained on other synthetic data. We make our code available at https://github.com/nu-c3lab/RingSQL.
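The idea of a schema-independent template can be illustrated with a minimal sketch. This is not RingSQL's implementation: the template text, the `instantiate`, `paraphrase`, and `compiles` helpers, and both schemas are hypothetical, and the paraphrasing step is a stub that returns its input where the abstract describes an LLM. Table and column names are assumed to be simple SQL identifiers.

```python
import sqlite3

# Hypothetical template pair: the slots name roles, not concrete tables or columns.
SQL_TEMPLATE = "SELECT {group_col}, COUNT(*) FROM {table} GROUP BY {group_col}"
QUESTION_TEMPLATE = "How many {table} are there for each {group_col}?"


def instantiate(schema):
    """Fill the template once per (table, column) pair in a schema dict."""
    pairs = []
    for table, columns in schema.items():
        for col in columns:
            question = QUESTION_TEMPLATE.format(table=table, group_col=col)
            sql = SQL_TEMPLATE.format(table=table, group_col=col)
            pairs.append((question, sql))
    return pairs


def paraphrase(question):
    """Stand-in for the LLM paraphrasing step; returns the question unchanged."""
    return question


def compiles(schema, sql):
    """Return True if SQLite accepts the query against an empty copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, columns in schema.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Two made-up schemas with nothing in common; the same template serves both.
SCHEMAS = [
    {"cases": ["court", "judge"]},
    {"patients": ["clinic"]},
]

dataset = [
    (paraphrase(question), sql)
    for schema in SCHEMAS
    for question, sql in instantiate(schema)
]
all_valid = all(
    compiles(schema, sql)
    for schema in SCHEMAS
    for _, sql in instantiate(schema)
)
```

Because the SQL comes from a fixed template, only the question wording would vary once a real paraphraser replaces the stub; the query itself stays checkable against each schema.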
Machine Teaching Allows for Rapid Development of Automated Systems for Retinal Lesion Detection From Small Image Datasets
Ophthalmic surgery, lasers & imaging retina · 2024-05-16 · 3 citations
article · Open access
Machine teaching, a machine learning subfield, may allow for rapid development of artificial intelligence systems able to automatically identify emerging ocular biomarkers from small imaging datasets. We sought to use machine teaching to automatically identify retinal ischemic perivascular lesions (RIPLs) and subretinal drusenoid deposits (SDDs), two emerging ocular biomarkers of cardiovascular disease. IRB approval was obtained. Four small datasets of SD-OCT B-scans were used to train and test two distinct automated systems, one identifying RIPLs and the other identifying SDDs. An open-source interactive machine-learning software program, RootPainter, was used to perform annotation and training simultaneously over a 6-hour period. For SDDs at the B-scan level, test-set accuracy = 92%, sensitivity = 100%, specificity = 88%, positive predictive value (PPV) = 82%, and negative predictive value (NPV) = 100%. For RIPLs at the B-scan level, test-set accuracy = 90%, sensitivity = 60%, specificity = 93%, PPV = 50%, and NPV = 95%. Machine teaching demonstrates promise within ophthalmic imaging to rapidly allow for automated identification of novel biomarkers from small image datasets. [Ophthalmic Surg Lasers Imaging Retina 2024;55:475–478.]
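For readers unfamiliar with the five metrics reported in this abstract, the sketch below shows how each is derived from a confusion matrix. The counts are hypothetical and are not the study's data; they were picked only so that the rounded percentages resemble the SDD figures quoted above. The `Confusion` class and `metrics` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # lesion present, system says present
    fp: int  # lesion absent, system says present
    tn: int  # lesion absent, system says absent
    fn: int  # lesion present, system says absent


def metrics(c):
    """Compute the standard binary-classification metrics as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts (NOT from the paper), for illustration only.
example = Confusion(tp=9, fp=2, tn=15, fn=0)
percentages = {name: round(value * 100) for name, value in metrics(example).items()}
```

With a test set this small, one or two reclassified scans move each percentage by several points, which is worth keeping in mind when reading metrics from small image datasets.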
Satyrn: A Platform for Analytics Augmented Generation
arXiv (Cornell University) · 2024-06-17 · 1 citation
preprint · Open access · Senior author
Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter in which just 57% of claims are accurate.
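The analytics augmented generation flow described in this abstract (analyze structured data, convert the resulting facts to text, pass them to an LLM) can be sketched as follows. This is an assumption-laden illustration and not Satyrn's code: the rows, field names, and the `analyze`, `fact_to_text`, and `build_prompt` functions are all hypothetical, and the final LLM call is left out so the example stays self-contained.

```python
from statistics import mean

# Hypothetical structured data standing in for a large database.
ROWS = [
    {"district": "A", "days_open": 120},
    {"district": "A", "days_open": 100},
    {"district": "B", "days_open": 90},
    {"district": "B", "days_open": 60},
]


def analyze(rows):
    """Run a standard analytic step: group by district, then count and average."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["district"], []).append(row["days_open"])
    return [
        {"district": district, "n": len(values), "mean_days": mean(values)}
        for district, values in sorted(grouped.items())
    ]


def fact_to_text(fact):
    """Convert one computed fact into a sentence."""
    return (
        f"District {fact['district']} had {fact['n']} cases "
        f"with a mean duration of {fact['mean_days']:.1f} days."
    )


def build_prompt(facts):
    """Assemble the fact set into a prompt, as retrieved passages would be in RAG."""
    lines = "\n".join(f"- {fact_to_text(fact)}" for fact in facts)
    return "Write a short report using only these facts:\n" + lines


prompt = build_prompt(analyze(ROWS))
```

The numbers in the prompt are computed before generation, so the language model is asked only to phrase them, not to derive them.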
Satyrn: A Platform for Analytics Augmented Generation
2024-01-01
article · Open access · Senior author
Marko Sterbentz, Cameron Barrie, Shubham Shahi, Abhratanu Dutta, Donna Hooshmand, Harper Pack, Kristian J Hammond. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
Lightweight Knowledge Representations for Automating Data Analysis
arXiv (Cornell University) · 2023-10-15
preprint · Open access · Senior author
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.
Deep Learning for Cardiovascular Imaging
JAMA Cardiology · 2023 · 56 citations
- Artificial Intelligence
- Machine Learning
- Computer Science
Importance: Artificial intelligence (AI), driven by advances in deep learning (DL), has the potential to reshape the field of cardiovascular imaging (CVI). While DL for CVI is still in its infancy, research is accelerating to aid in the acquisition, processing, and/or interpretation of CVI across various modalities, with several commercial products already in clinical use. It is imperative that cardiovascular imagers are familiar with DL systems, including a basic understanding of how they work, their relative strengths compared with other automated systems, and possible pitfalls in their implementation. The goal of this article is to review the methodology and application of DL to CVI in a simple, digestible fashion toward demystifying this emerging technology.
Observations: At its core, DL is simply the application of a series of tunable mathematical operations that translate input data into a desired output. Based on artificial neural networks that are inspired by the human nervous system, there are several types of DL architectures suited to different tasks; convolutional neural networks are particularly adept at extracting valuable information from CVI data. We survey some of the notable applications of DL to tasks across the spectrum of CVI modalities. We also discuss challenges in the development and implementation of DL systems, including avoiding overfitting, preventing systematic bias, improving explainability, and fostering a human-machine partnership. Finally, we conclude with a vision of the future of DL for CVI.
Conclusions and Relevance: Deep learning has the potential to meaningfully affect the field of CVI. Rather than a threat, DL could be seen as a partner to cardiovascular imagers in reducing technical burden and improving efficiency and quality of care. High-quality prospective evidence is still needed to demonstrate how the benefits of DL CVI systems may outweigh the risks.
The Promise of AI in an Open Justice System
AI Magazine · 2022-03-31
article · Open access
To craft effective public policy, modern governments must gather and analyze data on both the performance of their public functions and the responses by the public. Federal administrative agencies such as the Patent Office and Centers for Disease Control routinely do this, as does the United States Congress. More importantly, they make such data freely accessible. Within the United States government, however, the judicial branch is a conspicuous outlier. In theory, federal court records could be used to evaluate the efficiency and fairness of the justice system. In practice, court records are effectively out of reach because they sit behind a government paywall. This financial barrier, along with an equally important myriad of technical obstacles, has forestalled the development of AI-driven analysis that could enable a systematic understanding and evaluation of the work of the courts. The Systematic Content Analysis of Litigation EventS Open Knowledge Network (SCALES OKN) seeks to address this situation by transforming the transparency and accessibility of court records. The SCALES OKN will potentiate the development of new AI solutions that will benefit the judiciary, legal scholars, and the public. In this article, we outline some of the key financial, technical, and policy challenges to developing novel AI solutions.
Research gaps and opportunities in precision nutrition: an NIH workshop report
American Journal of Clinical Nutrition · 2022 · 91 citations
- Political Science
- Gerontology
- Psychology
The Promise of AI in an Open Justice System
AI Magazine · 2022-03-01 · 10 citations
article · Open access

Recent grants

SGER: A Proposal for Deploying News at Seven: Providing Research Visibility While Serving Underrepresented Populations
NSF · $87k · 2008–2009

Frequent coauthors

Jay Budzik · 27 shared
Larry Birnbaum · 23 shared
David A. Shamma · Toyota Industries (United States) · 17 shared
Sara Owsley · 14 shared
Robin Burke · 14 shared
Shannon Bradshaw · 14 shared
Timothy M. Converse · 13 shared
Colleen M. Seifert · Purdue University System · 12 shared
