Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime

David Karger

Verified

Massachusetts Institute of Technology · Electrical Engineering & Computer Science

Active 1992–2024

h-index: 95
Citations: 53.4k
Papers: 384 (51 in the last 5 years)
Funding: $1.0M

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Psychology
  • Data Mining
  • Natural Language Processing
  • Epistemology
  • Engineering
  • Data science
  • Aerospace engineering
  • Mathematics
  • Mathematics education
  • Combinatorics

Selected publications

  • New Methods for Confusion Detection in Course Forums: Student, Teacher, and Machine

    IEEE Transactions on Learning Technologies · 2021 · 9 citations

    Senior author · Corresponding
    • Computer Science
    • Artificial Intelligence

    This article provides computational and rule-based approaches for detecting confusion that is expressed in students' comments in course forums. To obtain reliable, ground truth data about which posts exhibit student confusion, we designed a decision tree that facilitates the manual labeling of forum posts by experts. However, manual labeling is costly in time and resources, which limits the amount of data that can be generated using this process. Our strategy for overcoming these limitations was to generate rules for detecting confusion based on student input via hashtags, which reflect the student's affective states. We show that the resulting rules closely align with the ground truth judgement of experts. We next applied these rules to datasets of students' forum posts in a large-scale biology course, thereby automatically generating thousands of labeled instances of “confused posts.” Finally, the resulting dataset was used to train a machine learning model for detecting whether students' posts exhibit confusion in the absence of hashtags. In this task, the pretrained language model based on bidirectional encoder representation from transformers (BERT) was able to outperform traditional machine learning models for classifying confusion in posts. This model was also able to generalize and detect student confusion across different offerings of the same course. Ultimately, the use of pretrained language models of this type will provide teachers with better technologies for detecting and alleviating confusion in online discussion forums by leveraging the combined input of teachers and students.

  • Seeding Course Forums using the Teacher-in-the-Loop

    2021 · 3 citations

    • Computer Science
    • Mathematics education

    Online forums are an integral part of modern day courses, but motivating students to participate in educationally beneficial discussions can be challenging. Our proposed solution is to initialize (or “seed”) a new course forum with comments from past instances of the same course that are intended to trigger discussion that is beneficial to learning. In this work, we develop methods for selecting high-quality seeds and evaluate their impact over one course instance of a 186-student biology class. We designed a scale for measuring the “seeding suitability” score of a given thread (an opening comment and its ensuing discussion). We then constructed a supervised machine learning (ML) model for predicting the seeding suitability score of a given thread. This model was evaluated in two ways: first, by comparing its performance to the expert opinion of the course instructors on test/holdout data; and second, by embedding it in a live course, where it was actively used to facilitate seeding by the course instructors. For each reading assignment in the course, we presented a ranked list of seeding recommendations to the course instructors, who could review the list and filter out seeds with inconsistent or malformed content. We then ran a randomized controlled study, in which one group of students was shown seeds that were recommended by the ML model, and another group was shown seeds that were recommended by an alternative model that ranked seeds purely by the length of discussion that was generated in previous course instances. We found that the group of students that received posts from either seeding model generated more discussion than a control group in the course that did not get seeded posts. Furthermore, students who received seeds selected by the ML-based model showed higher levels of engagement, as well as greater learning gains, than those who received seeds ranked by length of discussion.

  • #Confused and beyond

    2020 · 13 citations

    • Computer Science
    • Artificial Intelligence

    Students' confusion is a barrier for learning, contributing to loss of motivation and to disengagement with course materials. However, detecting students' confusion in large-scale courses is both time and resource intensive. This paper provides a new approach for confusion detection in online forums that is based on harnessing the power of students' self-reported affective states (reported using a set of pre-defined hashtags). It presents a rule for labeling confusion, based on students' hashtags in their posts, that is shown to align with teachers' judgement. We use this labeling rule to inform the design of an automated classifier for confusion detection for the case when there are no self-reported hashtags present in the test set. We demonstrate this approach in a large scale Biology course using the Nota Bene annotation platform. This work lays the foundation to empower teachers with better support tools for detecting and alleviating confusion in online courses.

  • ARDA

    Proceedings of the VLDB Endowment · 2020 · 63 citations

    Senior author · Corresponding
    • Computer Science
    • Machine Learning

    Automatic machine learning (AML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmentation. Automatic data augmentation involves finding new features relevant to the user's predictive task with minimal "human-in-the-loop" involvement. We present ARDA, an end-to-end system that takes as input a dataset and a data repository, and outputs an augmented data set such that training a predictive model on this augmented dataset results in improved performance. Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes out noisy or irrelevant features from the resulting join. We perform an extensive empirical evaluation of different system components and benchmark our feature selection algorithm on real-world datasets.
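
The two components described in the ARDA abstract (joining candidate data onto the input, then pruning irrelevant features) can be illustrated with a toy sketch. This is not ARDA's actual algorithm: the correlation-threshold filter is a simple stand-in for its feature selection, and the function names, the `threshold` parameter, and the sample tables are all hypothetical.

```python
from math import sqrt
from statistics import mean


def left_join(base_rows, repo_rows, key):
    """Attach columns from repo_rows to each base row that shares `key`."""
    lookup = {row[key]: row for row in repo_rows}
    joined = []
    for row in base_rows:
        merged = dict(row)
        for col, value in lookup.get(row[key], {}).items():
            if col != key:
                merged[col] = value
        joined.append(merged)
    return joined


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(rows, target, candidates, threshold=0.5):
    """Keep candidate columns whose |correlation| with target >= threshold.

    Stand-in filter (an assumption), not the selection method from the paper.
    """
    kept = []
    for col in candidates:
        # Use only rows where the join actually supplied this column.
        pairs = [(row[col], row[target]) for row in rows if col in row]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        if abs(pearson(xs, ys)) >= threshold:
            kept.append(col)
    return kept


# Hypothetical input dataset and data repository table.
base = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": 20.0},
    {"id": 3, "price": 30.0},
    {"id": 4, "price": 40.0},
]
repo = [
    {"id": 1, "area": 1.0, "noise": 5.0},
    {"id": 2, "area": 2.1, "noise": 1.0},
    {"id": 3, "area": 2.9, "noise": 1.0},
    {"id": 4, "area": 4.2, "noise": 5.0},
]

joined = left_join(base, repo, "id")
kept = select_features(joined, "price", ["area", "noise"])
print(kept)
```

In this toy data, `area` tracks `price` closely while `noise` is uncorrelated with it, so only `area` survives the filter. A real system would need to handle many candidate tables, fuzzy or many-to-one joins, and a more robust selection criterion.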

Education

  • Ph.D., Computer Science

    Stanford University

    1995
  • A.B., Computer Science

    Harvard University

    1989