Jennifer Dy

Professor, Jointly Appointed with College of Engineering

Northeastern University · Artificial Intelligence and Data Science

Active 1998–2025

h-index: 49
Citations: 11.3k
Papers: 350 (124 in the last 5 years)
Funding: $5.6M

About

Jennifer G. Dy is a professor in the College of Engineering and Khoury College of Computer Sciences at Northeastern University, based in Boston. She received her PhD and MS in Electrical and Computer Engineering from Purdue University and her BS in Electrical Engineering from the University of the Philippines. Her research interests include machine learning, data mining, statistical pattern recognition, computer vision, and image processing. She received an NSF CAREER Award in 2004 and has served on the editorial board of the journal Machine Learning since 2004. She has also been a program committee member for conferences including ICML, ACM SIGKDD, AAAI, and SIAM SDM, and was publications chair for the International Conference on Machine Learning in 2004.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Computer Security
  • Cognitive psychology
  • Computer network
  • Telecommunications

Selected publications

  • On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection

    The Journal of Machine Learning for Biomedical Imaging · 2025-10-29

    article · Open access · Senior author

    Artificial Intelligence (AI) models have demonstrated expert-level performance in melanoma detection, yet their clinical adoption is hindered by performance disparities across demographic subgroups such as gender, race, and age. Previous efforts to benchmark the performance of AI models have primarily focused on assessing model performance using group fairness metrics that rely on the Area Under the Receiver Operating Characteristic curve (AUROC), which does not provide insights into a model’s ability to provide accurate estimates. In line with clinical assessments, this paper addresses this gap by incorporating calibration as a complementary benchmarking metric to AUROC-based fairness metrics. Calibration evaluates the alignment between predicted probabilities and observed event rates, offering deeper insights into subgroup biases. We assess the performance of the leading skin cancer detection algorithm of the ISIC 2020 Challenge on the ISIC 2020 Challenge dataset and the PROVE-AI dataset, and compare it with the second- and third-place models, focusing on subgroups defined by sex, race (Fitzpatrick Skin Tone), and age. Our findings reveal that while existing models enhance discriminative accuracy, they often over-diagnose risk and exhibit calibration issues when applied to new datasets. This study underscores the necessity for comprehensive model auditing strategies and extensive metadata collection to achieve equitable AI-driven healthcare solutions.

  • H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition

    ArXiv.org · 2025-10-23

    preprint · Open access

    We introduce H-SPLID, a novel algorithm for learning salient feature representations through the explicit decomposition of salient and non-salient features into separate spaces. We show that H-SPLID promotes learning low-dimensional, task-relevant features. We prove that the expected prediction deviation under input perturbations is upper-bounded by the dimension of the salient subspace and the Hilbert-Schmidt Independence Criterion (HSIC) between inputs and representations. This establishes a link between robustness and latent representation compression in terms of the dimensionality and information preserved. Empirical evaluations on image classification tasks show that models trained with H-SPLID primarily rely on salient input components, as indicated by reduced sensitivity to perturbations affecting non-salient features, such as image backgrounds. Our code is available at https://github.com/neu-spiral/H-SPLID.

  • Deep Learning of Suboptimal Spirometry to Predict Respiratory Outcomes and Mortality

    Research Square · 2025-06-30 · 1 citation

    preprint · Open access

  • 0123 Predicting malignancy in longitudinally monitored skin lesions with AI

    Journal of Investigative Dermatology · 2025-07-21

    article · Open access

  • LVT: Large-Scale Scene Reconstruction via Local View Transformers

    2025-12-08

    preprint · Open access

    Large transformer models are proving to be a powerful tool for 3D vision and novel view synthesis. However, the standard Transformer’s well-known quadratic complexity makes it difficult to scale these methods to large scenes. To address this challenge, we propose the Local View Transformer (LVT), a large-scale scene reconstruction and novel view synthesis architecture that circumvents the need for the quadratic attention operation. Motivated by the insight that spatially nearby views provide more useful signal about the local scene composition than distant views, our model processes all information in a local neighborhood around each view. To attend to tokens in nearby views, we leverage a novel positional encoding that conditions on the relative geometric transformation between the query and nearby views. We decode the output of our model into a 3D Gaussian Splat scene representation that includes both color and opacity view-dependence. Taken together, the Local View Transformer enables reconstruction of arbitrarily large, high-resolution scenes in a single forward pass. See our project page for results and interactive demos: https://toobaimt.github.io/lvt/.

  • Author Correction: Context-aware experience sampling reveals the scale of variation in affective experience

    UNC Libraries · 2025-05-30

    article · Open access · 1st author · Corresponding

  • Role of CT-quantified Local Histogram Emphysema Patterns in Disease Clustering and COPD Outcomes

    American Journal of Respiratory and Critical Care Medicine · 2025-05-01

    article

    Rationale: Emphysema, a hallmark of COPD, exhibits substantial heterogeneity in severity and anatomical distribution. The relationship between specific emphysema patterns and clinical outcomes remains incompletely characterized. Objectives: To determine if distinct emphysema patterns, identified through CT-based local histogram analysis and clustering techniques, are associated with specific COPD-related outcomes. Methods: We performed local histogram analysis of lung density from Visit 1 chest CT scans in the COPDGene cohort, quantifying low attenuation areas as paraseptal, centrilobular, or panlobular emphysema. K-medoids clustering was applied to identify distinct emphysema pattern groups. Cross-sectional and longitudinal associations with COPD-related outcomes were assessed using univariable and multivariable analyses. Clinical and imaging differences between MM and MZ smokers were also analysed. Results: In 9,167 non-Hispanic White and African American smokers, four distinct clusters emerged, characterized by varying distributions of paraseptal, panlobular, and centrilobular emphysema (P-values<0.001). These clusters demonstrated significant associations with smoking status, dyspnea scores, frequency of respiratory exacerbations, 5-year lung function decline and emphysema progression, and self-reported cardiometabolic comorbidities. MZ smokers exhibited greater emphysema per pack-year of smoking compared to MM smokers and were predominantly represented in the severe emphysema cluster. All associations remained significant after adjustment for potential confounders. Conclusion: CT-based emphysema patterns identified through local histogram analysis and clustering are associated with distinct clinical outcomes. The findings also underscore the heightened susceptibility of MZ smokers to severe emphysema patterns. Ongoing multi-omics and validation studies may reveal the molecular mechanisms of emphysema and foster personalized treatment approaches.

  • DISCO: Disentangled Communication Steering for Large Language Models

    ArXiv.org · 2025-09-20

    preprint · Open access · Senior author

    A variety of recent methods guide large language model outputs via the inference-time addition of steering vectors to residual-stream or attention-head representations. In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that a greater portion of these spaces exhibit high linear discriminability of concepts --a key property motivating the use of steering vectors-- than attention head outputs. We analytically characterize the effect of our method, which we term DISentangled COmmunication (DISCO) Steering, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention inputs, which implicitly modifies queries and values in a rigid manner. In contrast, DISCO's direct modulation of these components enables more granular control. We find that DISCO achieves superior performance over a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scoring up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods.

  • Structured variation in daily life experience within and across individuals

    2025-07-17

    article · Open access

    Human experience varies across contexts and individuals. Yet, psychological studies typically constrain rather than discover this structured variation. We demonstrate an alternative approach that samples deeply and broadly to discover reliable person-specific, multimodal patterns of daily life experience. Ninety-seven healthy adults wore cardiac monitors for 8 hours/day for 14 days and reported current valence, arousal, primary activity, social context, and emotions (via free report) when prompted following a substantial cardiac interbeat interval change (and twice randomly each day). From each event (10,755 total, M=110.9 events/person), we extracted cardiovascular, postural, affective, and contextual features. Integrative clustering of these features identified 313 multimodal patterns (M=3.2 patterns/person), which were largely person-specific, with 81.7% of patterns being unique to one person. The pattern-distinguishing features also varied by person. Finally, self-generated emotion labels had many-to-many mappings with multimodal patterns. Our approach has broad utility and provides further evidence that emotions are diverse populations of instances.

  • OrdShap: Feature Position Importance for Sequential Black-Box Models

    ArXiv.org · 2025-07-16

    preprint · Open access · Senior author

    Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume fixed feature ordering - conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model's predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sanchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.

Labs

  • Khoury College of Computer Sciences · PI

Education

  • Ph.D., Electrical and Computer Engineering

    Purdue University
  • M.S., Electrical and Computer Engineering

    Purdue University
  • B.S., Electrical Engineering

    University of the Philippines

Awards & honors

  • NSF CAREER Award (2004)