Richard Scheines

The Bess Family Dean of the Marianna Brown Dietrich College of Humanities and Social Sciences

Carnegie Mellon University · Philosophy

Active 1986–2018

h-index: 35

Citations: 13.7k

Papers: 141

Funding: not listed


About

Richard Scheines is the Bess Family Dean of the Marianna Brown Dietrich College of Humanities and Social Sciences at Carnegie Mellon University, where he has been a professor since 2003 and Dean since 2014. His research focuses on causal discovery, in particular the problem of learning about causation from statistical evidence. This work is embodied in the TETRAD project, which represents nearly 25 years of collaboration with Clark Glymour, Peter Spirtes, and others and builds efficient causal discovery algorithms that draw on both computer science and philosophy. Scheines holds a Ph.D. in History and Philosophy of Science from the University of Pittsburgh, with a thesis on causal models in the social sciences.

His areas of specialization are Philosophy of Science (Causation), Artificial Intelligence (Machine Learning), and Educational Computing (Online Courses and Virtual Labs), and he holds courtesy appointments in the Machine Learning Department and the Human-Computer Interaction Institute at Carnegie Mellon. He has been a visiting scholar at UC Berkeley, UCLA, and the University of Groningen, and his awards include the 2013 Causality in Statistics Education Award. Beyond causal inference, his work extends to educational software development and policy advisory roles, with contributions to the philosophy of science, machine learning, and educational data mining.

Research topics

Computer science
Mathematics
Econometrics
Artificial intelligence
Psychology

Selected publications

Analysis of Microarray Data for Treated Fat Cells
Figshare · 2018-06-29
article · Open access
DNA microarrays are well suited for comparing gene expression across different populations of cells. An important application of microarray techniques is identifying genes that are activated by a particular drug of interest. This process will allow biologists to identify therapies targeted to particular diseases and, eventually, to gain more knowledge about the biological processes in organisms. Such an application is described in this paper. It focuses on diabetes and obesity, which are genetically heterogeneous diseases, meaning that multiple defective genes are responsible for them. The paper is divided into three parts, each addressing a different problem in our study. First, we validate the data from our microarray experiment. We identified significant systematic sources of variability that are potential issues for other microarray datasets. Second, we applied multiple hypothesis testing to identify differentially expressed genes. We found a set of genes that appear to change in expression level over time in response to a drug treatment. Third, we tried to address the problem of identifying co-expressed genes using cluster analysis. This last problem is still under discussion.
Student Profiling from Tutoring System Log Data: When do Multiple Graphical Representations Matter?
Figshare · 2018-06-29 · 2 citations
article · Open access · Senior author
We analyze log data generated by an experiment with Mathtutor, an intelligent tutoring system for fractions. The experiment compares the educational effectiveness of instruction with single and with multiple graphical representations. We extract the error-making and hint-seeking behaviors of each student to characterize their learning strategy. Using an expectation-maximization approach, we cluster the students by their strategic profile. We find that (a) experimental condition and learning outcome are clearly associated, (b) experimental condition and learning strategy are not, and (c) almost all of the association between experimental condition and learning outcome is found among students implementing just one of the learning strategies we identify. This class of students is characterized by relatively high rates of error as well as a marked reluctance to seek help. They also show the greatest educational gains from instruction with multiple rather than single representations. The behaviors that characterize this group illuminate the mechanism underlying the effectiveness of multiple representations and suggest strategies for tailoring instruction to individual students. Our methodology can be implemented in an online tutoring system to dynamically tailor individualized instruction.
An Experimental Comparison of Alternative Proof Construction Environments
Research Showcase @ Carnegie Mellon University (Carnegie Mellon University) · 2018-01-01 · 3 citations
article · Open access · 1st author · Corresponding
Abstract: "In this paper we compare computerized environments in which students complete proof construction exercises in formal logic. After being given a pretest for logical aptitude, three matched groups were presented with identical computer-delivered course material on logic for approximately five weeks. During the treatment, all students were required to complete several hundred proof construction exercises. The three groups did the exercises and the midterm in different environments. The group with a more sophisticated interface performed better on the midterm. Nearly all the difference in performance showed up in the harder problems. In a follow-up experiment in which flexible strategic problem-solving help was added to the environment, performance improved slightly, but the data are inconclusive."
Genetic Algorithm Search Over Causal Models
Figshare · 2018-06-29 · 14 citations
article · Open access · Senior author
Shane Harwood and Richard Scheines. Genetic Algorithm Search Over Causal Models.
Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation
Figshare · 2018-06-29 · 1 citation
article · Open access · 1st author · Corresponding
The statistical evidence for the detrimental effect of exposure to low levels of lead on the cognitive capacities of children has been debated for several decades. In this paper I describe how two techniques from artificial intelligence and statistics help make the statistical evidence for the accepted epidemiological conclusion seem decisive. The first is a variable-selection routine in TETRAD III for finding causes; the second is a Bayesian estimate of the parameter reflecting the causal influence of Actual Lead Exposure, a latent variable, on the measured IQ score of middle-class suburban children.
Time and Attention: Students, Sessions, and Tasks
Research Showcase @ Carnegie Mellon University (Carnegie Mellon University) · 2018-01-01 · 11 citations
article
Students in two classes in fall 2004 who made extensive use of online courseware were logged as they visited over 500 different “learning pages” that varied in length and difficulty. We computed the time each student spent on each page during each session they were logged in. We then modeled the time spent on a particular visit as a function of the page itself, the session, and the student. Surprisingly, the average time a student spent on learning pages (over their whole course experience) was of almost no value in predicting how long they would spend on a given page, even controlling for the session and page difficulty. The page itself was highly predictive, but so was the average time spent on learning pages in a given session. This indicates that local considerations, e.g., mood and deadline proximity, play a much greater role in determining student pace and attention than intrinsic student traits do. We also consider the average time spent on learning pages as a function of the time of semester. Students spent less time on pages later in the semester, even for more demanding material.
Is the Doer Effect Robust across Multiple Data Sets?
Educational Data Mining · 2018-07-01 · 9 citations
article
Causation, Truth, and the Law
Brooklyn Law Review · 2018-06-29 · 7 citations
article · Open access · 1st author · Corresponding
Department of Philosophy technical report
Searching for Variables and Models to Investigate Mediators of Learning from Multiple Representations
Figshare · 2018-06-29 · 14 citations
article · Open access · Senior author
Although learning from multiple representations has been shown to be effective in a variety of domains, little is known about the mechanisms by which it occurs. We analyzed log data on error rate, hint use, and time spent obtained from two experiments with a Cognitive Tutor for fractions. The goal of the experiments was to compare learning from multiple graphical representations of fractions to learning from a single graphical representation. Finding that a simple statistical model did not fit data from either experiment, we searched over all possible mediation models consistent with background knowledge, finding several that fit the data well. We also searched over alternative measures of student error rate, hint use, and time spent to see whether our data were better modeled with simple monotonic or U-shaped non-monotonic relationships. We found no evidence for non-monotonicity. No matter which measures we used, time spent was irrelevant, and hint use was only occasionally relevant. Although the total effect of multiple representations on learning was positive, they also had a negative effect on learning, mediated by a higher error rate. Our evidence suggests that multiple representations increase error rate, which in turn inhibits learning. The mechanisms by which multiple representations improve learning are as yet unmodeled.
Unidimensional Linear Latent Variable Models
Research Showcase @ Carnegie Mellon University (Carnegie Mellon University) · 2018-06-29 · 3 citations
article · Open access · 1st author · Corresponding
Abstract: "Linear structural equation models with latent (unmeasured) variables are used widely in sociology, psychometrics, and political science. When such models have a unidimensional (pure) measurement model (Gerbing and Anderson 82, 88; Scheines 92), they imply constraints on the measured covariances that can be used either to confirm unidimensionality or to find submodels that are unidimensional. Assuming unidimensionality, the causal relations among the latent variables can be partially determined by examining other (related) constraints on the measured covariances. In this paper I prove, first, that unidimensionality is detectable from constraints on only the measured covariances, no matter what the structure among the latent variables, and second, that in a structural equation model with a unidimensional measurement model, for any three latents $T_i$, $T_j$, and $T_k$, $\rho_{T_i T_j \cdot T_k} = 0$ only if certain constraints hold on only the measured covariances."
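The abstract above says a unidimensional (pure) measurement model implies constraints on the measured covariances. One standard family of such constraints is the vanishing tetrad difference: if indicators $X_i = \lambda_i T + \varepsilon_i$ share a single latent $T$ with mutually uncorrelated errors, then $\sigma_{ij} = \lambda_i \lambda_j \operatorname{var}(T)$ for $i \ne j$, so $\sigma_{12}\sigma_{34} = \sigma_{13}\sigma_{24} = \sigma_{14}\sigma_{23}$. The Python sketch below is only an illustration under those assumptions, not the paper's proof or TETRAD's implementation; the function names, loadings, factor covariances, and unit error variances are hypothetical. It computes model-implied covariances exactly with `fractions.Fraction` and checks the three tetrad differences for a one-factor and a two-factor model.

```python
from fractions import Fraction


def implied_covariance(loadings, factor_of, phi, error_var=1):
    """Model-implied covariance of indicators X_i = loadings[i] * T_{factor_of[i]} + e_i.

    phi is the covariance matrix of the latent factors. Error terms are assumed
    mutually uncorrelated with a common, hypothetical variance error_var.
    """
    n = len(loadings)
    sigma = [
        [loadings[i] * loadings[j] * phi[factor_of[i]][factor_of[j]] for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        sigma[i][i] += error_var  # errors only add to the diagonal
    return sigma


def tetrad_differences(sigma, i, j, k, l):
    """The three tetrad differences among indicators i, j, k, l (0-based)."""
    return (
        sigma[i][j] * sigma[k][l] - sigma[i][k] * sigma[j][l],
        sigma[i][j] * sigma[k][l] - sigma[i][l] * sigma[j][k],
        sigma[i][k] * sigma[j][l] - sigma[i][l] * sigma[j][k],
    )


# One latent factor measured by four pure indicators (hypothetical loadings).
one_factor = implied_covariance(
    [Fraction(9, 10), Fraction(7, 10), Fraction(1, 2), Fraction(4, 5)],
    [0, 0, 0, 0],
    [[Fraction(2)]],
)

# Two latents correlated at 1/2: X0, X1 measure T0; X2, X3 measure T1.
two_factor = implied_covariance(
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [[1, Fraction(1, 2)], [Fraction(1, 2), 1]],
)

print(tetrad_differences(one_factor, 0, 1, 2, 3))
print(tetrad_differences(two_factor, 0, 1, 2, 3))
```

For the one-factor model all three differences are exactly zero. For the two-factor model with unit loadings, unit latent variances, and latent correlation 1/2, the first two differences equal 3/4 and only $\sigma_{13}\sigma_{24} - \sigma_{14}\sigma_{23}$ vanishes, which is the kind of pattern that lets constraints on measured covariances distinguish measurement structures. Exact rational arithmetic is used so that "vanishes" means exactly zero rather than zero up to floating-point error.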

Frequent coauthors

Clark Glymour
82 shared
Peter Spirtes
Carnegie Mellon University
78 shared
Justin Sytsma
Victoria University of Wellington
16 shared
Édouard Machery
16 shared
Jonathan Livengood
University of Illinois Urbana-Champaign
16 shared
Adam Feltz
University of Oklahoma
16 shared
Kevin T. Kelly
15 shared
Christopher Meek
7 shared

Education

Ph.D., History and Philosophy of Science
University of Pittsburgh
1987

Awards & honors

Causality in Statistics Education Award – 2013
Best Paper Award – 2013 6th International Workshop on Educat…
Best Paper Award – 2008 1st International Workshop on Educat…
