
Ernest Davis
Professor of Computer Science · New York University · Computer Science
Active 1948–2024
About
Ernest Davis is a professor in the Department of Computer Science at New York University, affiliated with the Courant Institute of Mathematical Sciences. His research focuses on automated commonsense reasoning, with particular interest in benchmarks and datasets for this area. He has contributed to surveys of the state of the art in commonsense reasoning, covering the Winograd Schema Challenge and related topics. Davis teaches courses on artificial intelligence and maintains a website with resources and notes on AI topics. His publications include books, research papers, surveys, essays, and reports, and he has written about artificial intelligence for a general audience.
Research topics
- Artificial Intelligence
- Computer Science
- Mathematics
- Psychology
- Cognitive psychology
- Physics
- Epistemology
- Programming language
- Philosophy
- Linguistics
- Mathematics education
- Cognitive science
- Mathematical economics
- Theoretical computer science
- Pure mathematics
Selected publications
The Defeat of the Winograd Schema Challenge (Abstract Reprint)
Proceedings of the AAAI Conference on Artificial Intelligence · 2024-03-24
article · Open access
The Winograd Schema Challenge - a set of twin sentences involving pronoun reference disambiguation that seem to require the use of commonsense knowledge - was proposed by Hector Levesque in 2011. By 2019, a number of AI systems, based on large pre-trained transformer-based language models and fine-tuned on these kinds of problems, achieved better than 90% accuracy. In this paper, we review the history of the Winograd Schema Challenge and discuss the lasting contributions of the flurry of research that has taken place on the WSC in the last decade. We discuss the significance of various datasets developed for WSC, and the research community's deeper understanding of the role of surrogate tasks in assessing the intelligence of an AI system.
Mathematics, word problems, common sense, and artificial intelligence
Bulletin of the American Mathematical Society · 2024 · 72 citations
1st author · Corresponding
- Computer Science
- Artificial Intelligence
The paper discusses the capacities and limitations of current artificial intelligence (AI) technology to solve word problems that combine elementary mathematics with commonsense reasoning. No existing AI systems can solve these reliably. We review three approaches that have been developed, using AI natural language technology: outputting the answer directly, outputting a computer program that solves the problem, and outputting a formalized representation that can be input to an automated theorem verifier. We review some benchmarks that have been developed to evaluate these systems and some experimental studies. We discuss the limitations of the existing technology at solving these kinds of problems. We argue that it is not clear whether these kinds of limitations will be important in developing AI technology for pure mathematical research, but that they will be important in applications of mathematics, and may well be important in developing programs capable of reading and understanding mathematical content written by humans.
The defeat of the Winograd Schema Challenge
Artificial Intelligence · 2023-07-11 · 30 citations
article
Benchmarks for Automated Commonsense Reasoning: A Survey
arXiv (Cornell University) · 2023-02-09 · 6 citations
preprint · Open access · 1st author · Corresponding
More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed and many aspects of common sense remain untested. Consequently, we do not currently have any reliable way of measuring to what extent existing AI systems have achieved these abilities. This paper surveys the development and uses of AI commonsense benchmarks. We discuss the nature of common sense; the role of common sense in AI; the goals served by constructing commonsense benchmarks; and desirable features of commonsense benchmarks. We analyze the common flaws in benchmarks, and we argue that it is worthwhile to invest the work needed to ensure that benchmark examples are consistently high quality. We survey the various methods of constructing commonsense benchmarks. We enumerate 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 simulated physical environments. We discuss the gaps in the existing benchmarks and aspects of commonsense reasoning that are not addressed in any existing benchmark. We conclude with a number of recommendations for future development of commonsense AI benchmarks.
Benchmarks for Automated Commonsense Reasoning: A Survey
ACM Computing Surveys · 2023-09-11 · 55 citations
review · 1st author · Corresponding
More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed, and many aspects of common sense remain untested. Consequently, there is currently no reliable way of measuring to what extent existing AI systems have achieved these abilities. This article surveys the development and uses of AI commonsense benchmarks. It enumerates 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 based in simulated physical environments. It gives more detailed descriptions of twelve of these, three from each category. It surveys the various methods used to construct commonsense benchmarks. It discusses the nature of common sense, the role of common sense in AI, the goals served by constructing commonsense benchmarks, desirable features of commonsense benchmarks, and flaws and gaps in existing benchmarks. It concludes with a number of recommendations for future development of commonsense AI benchmarks; most importantly, that the creators of benchmarks invest the work needed to ensure that benchmark examples are consistently high quality.
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
arXiv (Cornell University) · 2023-08-10 · 9 citations
preprint · Open access · 1st author · Corresponding
This report describes a test of the large language model GPT-4 with the Wolfram Alpha and the Code Interpreter plug-ins on 105 original problems in science and math, at the high school and college levels, carried out in June-August 2023. Our tests suggest that the plug-ins significantly enhance GPT's ability to solve these problems. Having said that, there are still often "interface" failures; that is, GPT often has trouble formulating problems in a way that elicits useful answers from the plug-ins. Fixing these interface failures seems like a central challenge in making GPT a reliable tool for college-level calculation problems.
Mathematics, word problems, common sense, and artificial intelligence
arXiv (Cornell University) · 2023-01-23 · 8 citations
preprint · Open access · 1st author · Corresponding
The paper discusses the capacities and limitations of current artificial intelligence (AI) technology to solve word problems that combine elementary knowledge with commonsense reasoning. No existing AI systems can solve these reliably. We review three approaches that have been developed, using AI natural language technology: outputting the answer directly, outputting a computer program that solves the problem, and outputting a formalized representation that can be input to an automated theorem verifier. We review some benchmarks that have been developed to evaluate these systems and some experimental studies. We discuss the limitations of the existing technology at solving these kinds of problems. We argue that it is not clear whether these kinds of limitations will be important in developing AI technology for pure mathematical research, but that they will be important in applications of mathematics, and may well be important in developing programs capable of reading and understanding mathematical content written by humans.
Physical Reasoning in an Open World
arXiv (Cornell University) · 2022-01-22
preprint · Open access · Senior author
Most work on physical reasoning, both in artificial intelligence and in cognitive science, has focused on closed-world reasoning, in which it is assumed that the problem specification specifies all relevant objects and substances, all their relations in an initial situation, and all exogenous events. However, in many situations, it is important to do open-world reasoning; that is, making valid conclusions from very incomplete information. We have implemented in Prolog an open-world reasoner for a toy microworld of containers that can be loaded, unloaded, sealed, unsealed, carried, and dumped.
The Defeat of the Winograd Schema Challenge
arXiv (Cornell University) · 2022-01-07 · 4 citations
preprint · Open access
The Winograd Schema Challenge - a set of twin sentences involving pronoun reference disambiguation that seem to require the use of commonsense knowledge - was proposed by Hector Levesque in 2011. By 2019, a number of AI systems, based on large pre-trained transformer-based language models and fine-tuned on these kinds of problems, achieved better than 90% accuracy. In this paper, we review the history of the Winograd Schema Challenge and discuss the lasting contributions of the flurry of research that has taken place on the WSC in the last decade. We discuss the significance of various datasets developed for WSC, and the research community's deeper understanding of the role of surrogate tasks in assessing the intelligence of an AI system.
Limits of an AI program for solving college math problems
arXiv (Cornell University) · 2022 · 41 citations
1st author · Corresponding
- Computer Science
- Artificial Intelligence
Drori et al. (2022) report that "A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level ... [It] automatically answers 81% of university-level mathematics problems." The system they describe is indeed impressive; however, the above description is very much overstated. The work of solving the problems is done, not by a neural network, but by the symbolic algebra package Sympy. Problems of various formats are excluded from consideration. The so-called "explanations" are just rewordings of lines of code. Answers are marked as correct that are not in the form specified in the problem. Most seriously, it seems that in many cases the system uses the correct answer given in the test corpus to guide its path to solving the problem.
Recent grants
Automating Commonsense Reasoning for Elementary Physical Science
NSF · $329k · 2006–2010
Frequent coauthors
- Gary Marcus · 35 shared
- Leora Morgenstern · Palo Alto Research Center · 18 shared
- Yanjun Ma · China Southern Power Grid (China) · 16 shared
- Kenneth Church · 16 shared
- Valia Kordoni · Humboldt-Universität zu Berlin · 16 shared
- Zeyu Chen · Soochow University · 16 shared
- Thomas Lukasiewicz · 9 shared
- Vid Kocijan · 7 shared