Yue Lu

Gordon McKay Professor of Electrical Engineering and of Applied Mathematics · Verified

Harvard University · Biostatistics

Active 2004–2026

h-index: 30
Citations: 3.8k
Papers: 173 (46 in the last 5 years)
Funding: $1.4M
About

Yue M. Lu is the Gordon McKay Professor of Electrical Engineering and of Applied Mathematics at the Harvard John A. Paulson School of Engineering and Applied Sciences, with an affiliate appointment in the Department of Statistics. Since 2024, he has also held the title of Harvard College Professor. His research lies at the intersection of probability, statistics, signal processing, information theory, and data science, with an emphasis on randomness and structure in high-dimensional systems. He develops probabilistic methods for high-dimensional statistical inference and learning, including the fundamental limits of estimation and algorithmic performance. This work aims to bring mathematical clarity to problems in modern statistical learning, signal processing, and information processing, contributing to a deeper understanding of these rapidly developing fields. During the 2024–26 academic years, he is serving as Director of Graduate Studies for Applied Mathematics. He is also a faculty affiliate of the Harvard Data Science Initiative and the Center of Mathematical Sciences and Applications (CMSA).

Research topics

  • Computer Science
  • Artificial Intelligence
  • Chromatography
  • Mathematics
  • Applied mathematics
  • Organic chemistry
  • Chemistry
  • Materials science
  • Physics
  • Quantum mechanics
  • Statistical physics

Selected publications

  • Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

    ArXiv.org · 2026-05-06

    article · Open access · Senior author

    How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.
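
The top-1 criterion described above can be illustrated with a toy simulation of a correlation matrix memory M = sum_i y_i x_i^T, where key x_i retrieves the target j with the largest score y_j^T M x_i. This is a sketch under stated assumptions, not the paper's analysis: the N(0, 1/d) entry scaling, the sizes d and n, and the helper names (gaussian_vectors, build_memory, top1_accuracy, simulate) are all hypothetical choices made for illustration. It only shows the qualitative effect that top-1 accuracy drops as the load n/d^2 grows; a single seed at d = 32 says nothing about the sharp threshold constant.

```python
import math
import random


def gaussian_vectors(n, d, rng):
    """n vectors in R^d with i.i.d. N(0, 1/d) entries (assumed scaling; squared norm about 1)."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(n)]


def build_memory(keys, targets):
    """Correlation matrix memory: M = sum_i y_i x_i^T, a d x d matrix of superposed outer products."""
    d = len(keys[0])
    M = [[0.0] * d for _ in range(d)]
    for x, y in zip(keys, targets):
        for r in range(d):
            row, y_r = M[r], y[r]
            for c in range(d):
                row[c] += y_r * x[c]
    return M


def top1_accuracy(keys, targets, M):
    """Fraction of keys x_i whose best-scoring target (score y_j^T M x_i) is j = i."""
    d = len(keys[0])
    # Row j of A is y_j^T M, so the score of target j for key x_i is A[j] . x_i.
    A = [[sum(y[r] * M[r][c] for r in range(d)) for c in range(d)] for y in targets]
    hits = 0
    for i, x in enumerate(keys):
        scores = [sum(a_c * x_c for a_c, x_c in zip(a, x)) for a in A]
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == i:
            hits += 1
    return hits / len(keys)


def simulate(n, d, seed=0):
    """Store n random key-target pairs in a d x d memory and return top-1 accuracy."""
    rng = random.Random(seed)
    keys = gaussian_vectors(n, d, rng)
    targets = gaussian_vectors(n, d, rng)
    return top1_accuracy(keys, targets, build_memory(keys, targets))


if __name__ == "__main__":
    d = 32
    for n in (10, 100, 300):
        acc = simulate(n, d)
        print(f"d={d}  n={n}  load n/d^2={n / d**2:.3f}  top-1 accuracy={acc:.2f}")
```

The sketch sticks to the standard library so each score is visibly a plain dot product; for larger d a vectorized array library would be the practical choice.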

  • Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning

    Open MIND · 2026-01-30

    preprint · Senior author

    We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset, and the predictions of the first estimator enter as an argument in the loss function of the second. This setting, which arises naturally in active learning and reweighting schemes, introduces intricate statistical dependencies across samples and fundamentally distinguishes the problem from classical single-stage ERM analyses. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp asymptotic characterization of the test error in the high-dimensional regime where the sample size and ambient dimension scale proportionally. Our results provide explicit, fully asymptotic predictions for the performance of the second-stage estimator despite the reuse of data and the presence of prediction-dependent losses. We apply this theory to revisit a well-studied pool-based active learning problem, removing oracle and sample-splitting assumptions made in prior work. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection, rather than model size or sample count.

  • Sharp Spectral Thresholds for Multi-View Spiked Wigner Models

    ArXiv.org · 2026-05-19

    article · Open access · Senior author

    Motivated by multimodal estimation, we study a multi-view spiked Wigner model in which several noisy matrix observations contain correlated latent spikes. We derive a spectral estimator for the latent spikes by linearizing approximate message passing (AMP). Our main result is an explicit sharp transition formula for its spectrum: for $L \geq 2$ views, letting $\lambda$ be the $L$-dimensional vector of spike strengths and $B$ the $L\times L$ limiting Gram matrix of the spikes, the critical parameter is $\mathsf{SNR}(\lambda,B)=\lambda_{\max}\big[\mathrm{Diag}(\sqrt{\lambda})\,(B \odot B)\,\mathrm{Diag}(\sqrt{\lambda})\big]$. When $\mathsf{SNR}(\lambda,B)<1$, the linearized AMP matrix has no outlier beyond the right edge of its bulk spectrum. When $\mathsf{SNR}(\lambda,B)>1$, an informative outlier is pinned at the distinguished point $1$, and the associated eigenvector has explicit, nontrivial overlaps with the latent signals. Thus $\mathsf{SNR}(\lambda,B)=1$ gives the exact spectral weak-recovery threshold for the linearized AMP method. To establish our results, we analyze the correlated Gaussian noise matrix through a matrix Dyson equation and combine this deterministic description with finite-rank perturbation arguments adapted to the multi-view spike structure. We also show that, for a broad class of spike priors, the spectral threshold $\mathsf{SNR}(\lambda,B)=1$ coincides with the information-theoretic threshold for weak recovery, ruling out a statistical-computational gap for this class of priors.
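
The critical parameter SNR(λ, B) defined above can be evaluated numerically for a given spike-strength vector λ and Gram matrix B. The helper below is hypothetical (the name snr and the power-iteration approach are not from the paper); it forms K = Diag(√λ)(B ⊙ B)Diag(√λ) and estimates its largest eigenvalue. Because K is symmetric with nonnegative entries, starting the iteration from the all-ones vector cannot miss the top eigenvector. As a sanity check of the formula itself, B = I gives K = Diag(λ), so SNR(λ, I) = max_ℓ λ_ℓ.

```python
import math


def snr(lam, B, iters=10_000, tol=1e-12):
    """Largest eigenvalue of K = Diag(sqrt(lam)) (B o B) Diag(sqrt(lam)).

    lam: length-L list of nonnegative spike strengths.
    B:   L x L Gram matrix as a list of lists; (B o B) is its entrywise square.
    """
    L = len(lam)
    s = [math.sqrt(v) for v in lam]
    # K_ij = sqrt(lam_i) * B_ij**2 * sqrt(lam_j): symmetric, entrywise >= 0,
    # and positive semidefinite when B is a genuine Gram matrix.
    K = [[s[i] * B[i][j] ** 2 * s[j] for j in range(L)] for i in range(L)]

    # Power iteration from the normalized all-ones vector. Since K >= 0 entrywise,
    # Perron-Frobenius gives a nonnegative top eigenvector, so the start vector
    # is not orthogonal to it.
    v = [1.0 / math.sqrt(L)] * L
    est = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(L)) for i in range(L)]
        new_est = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient, v has unit norm
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # K is the zero matrix
        v = [x / norm for x in w]
        if abs(new_est - est) <= tol * max(1.0, abs(new_est)):
            return new_est
        est = new_est
    return est


if __name__ == "__main__":
    value = snr([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
    print(f"SNR = {value:.4f}; above spectral threshold: {value > 1}")
```

For example, with λ = (2, 2) and unit-diagonal B with off-diagonal 0.5, K = [[2, 0.5], [0.5, 2]], whose largest eigenvalue is 2 × (1 + 0.5²) = 2.5, above the threshold of 1.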

  • A Random Matrix Theory of Masked Self-Supervised Regression

    ArXiv.org · 2026-01-30

    article · Open access · Senior author

    In the era of transformer models, masked self-supervised learning (SSL) has become a foundational training paradigm. A defining feature of masked SSL is that training aggregates predictions across many masking patterns, giving rise to a joint, matrix-valued predictor rather than a single vector-valued estimator. This object encodes how coordinates condition on one another and poses new analytical challenges. We develop a precise high-dimensional analysis of masked modeling objectives in the proportional regime where the number of samples scales with the ambient dimension. Our results provide explicit expressions for the generalization error and characterize the spectral structure of the learned predictor, revealing how masked modeling extracts structure from data. For spiked covariance models, we show that the joint predictor undergoes a Baik--Ben Arous--Péché (BBP)-type phase transition, identifying when masked SSL begins to recover latent signals. Finally, we identify structured regimes in which masked self-supervised learning provably outperforms PCA, highlighting potential advantages of SSL objectives over classical unsupervised methods.

  • The impact of non-steroidal anti-inflammatory drugs on postoperative bleeding in children undergoing tonsillectomy: a meta-analysis of randomized controlled trials

    International Journal of Surgery · 2025-05-16 · 1 citation

    review · Open access · 1st author

    BACKGROUND: Tonsillectomy in children is a common procedure to treat obstructive sleep apnea and other respiratory conditions. However, it is associated with postoperative complications, including bleeding, pain, and postoperative nausea and vomiting (PONV). Non-steroidal anti-inflammatory drugs (NSAIDs) are often used for pain management, but their effect on postoperative bleeding risk remains controversial. This meta-analysis aims to evaluate the impact of NSAIDs on postoperative bleeding and PONV in children undergoing tonsillectomy by synthesizing evidence from randomized controlled trials (RCTs). MATERIALS AND METHODS: This study was conducted strictly in accordance with the PRISMA guidelines. After a thorough search of databases such as CNKI, Wanfang, Sinomed, PubMed, Embase, and the Cochrane Library, 26 randomized controlled trials were included. Meta-analysis was performed using STATA software, and the impact on postoperative bleeding and postoperative nausea and vomiting was evaluated by relative risk (RR) and 95% confidence interval (CI). RESULTS: A total of 26 randomized controlled trials involving 2717 pediatric patients were included in the meta-analysis. The risk of total postoperative bleeding [RR = 1.19, 95% CI (0.90-1.58)], primary bleeding [RR = 1.13, 95% CI (0.77-1.65)], and secondary bleeding [RR = 1.36, 95% CI (0.86-2.14)] was not significantly affected by the use of NSAIDs during the perioperative period. Subgroup analysis showed that different types of NSAIDs and administration methods did not significantly increase the risk of postoperative bleeding. In addition, NSAIDs significantly reduced the incidence of PONV [RR = 0.78, 95% CI (0.67-0.92)]. CONCLUSION: This study did not identify any correlation between the use of NSAIDs and an elevated risk of bleeding after tonsillectomy in children. It supports the notion that these drugs can serve as an effective analgesic alternative after pediatric tonsillectomy, which helps reduce opioid use and consequently mitigates the associated adverse effects. However, more high-quality RCTs are needed to confirm these results and refine postoperative management guidelines.

  • Driving pressure-guided dynamic PEEP titration reduces atelectasis and improves oxygenation in pediatric laparoscopy: a randomized trial on personalized ventilation strategies

    BMC Anesthesiology · 2025-08-20 · 1 citation

    article · Open access

    Pediatric laparoscopic surgery often induces atelectasis due to pneumoperitoneum, postural changes, and immature respiratory physiology, increasing postoperative pulmonary complications (PPCs). Fixed PEEP may fail to address perioperative variability. This study evaluated whether dynamic PEEP adjustment reduces atelectasis and improves oxygenation. Children at moderate or high risk of PPCs undergoing elective laparoscopic surgery were randomized into two groups. Group A underwent driving pressure-guided individualized PEEP titration at three specified time points: after intubation, before pneumoperitoneum initiation, and after pneumoperitoneum completion. Group B underwent individualized PEEP titration only after intubation, and this PEEP was maintained until the end of ventilation. Both groups received alveolar recruitment maneuvers (ARMs). Observations were conducted at 5 min after tracheal intubation (T1), 20 min post-pneumoperitoneum (T2), 60 min post-pneumoperitoneum (T3), at the end of surgery (T4), and at extubation (T5). The primary outcome was the intraoperative lung ultrasound score. Secondary outcomes included incidence of atelectasis, oxygenation index, peak airway pressure, plateau pressure, PEEP, driving pressure, dynamic lung compliance, mean arterial pressure, and heart rate. At T4 and T5, Group A showed significantly lower subpleural consolidation scores, total lung ultrasound scores, and atelectasis rates than Group B (P < 0.05). Oxygenation indices in Group A were higher at T3–T5 (P < 0.05). Post-pneumoperitoneum, Group A’s median PEEP increased to 8 cmH2O (vs. Group B), with lower driving pressure and higher dynamic compliance (P < 0.05). Hemodynamic parameters showed no intergroup differences (P > 0.05). Driving pressure-guided dynamic PEEP titration reduces postoperative lung ultrasound abnormalities and atelectasis while improving oxygenation and respiratory mechanics in pediatric laparoscopy, without compromising hemodynamic stability. This strategy supports personalized PEEP optimization. This trial was registered on Clinical Trials.gov (Registration No. ChiCTR2300070193, Registration date: 2023-04-04). The trial was registered retrospectively because enrollment began before registration.
    • Compared to static PEEP strategies, driving pressure-guided dynamic PEEP titration during pediatric laparoscopic surgery significantly reduces intraoperative atelectasis, improves oxygenation, and shortens postoperative extubation time.
    • Driving pressure-guided dynamic PEEP titration optimizes respiratory mechanics under pneumoperitoneum, offering a feasible non-invasive approach to personalized lung protection in children.
    • Driving pressure-guided dynamic PEEP titration demonstrated robust safety by maintaining hemodynamic stability while enhancing pulmonary compliance and minimizing ventilator-induced lung injury.

Frequent coauthors

  • Martin Vetterli

    École Polytechnique Fédérale de Lausanne

    28 shared
  • Ameya Agaskar

    17 shared
  • Pier Luigi Dragotti

    Imperial College London

    14 shared
  • N. Minh

    VinUniversity

    13 shared
  • Oussama Dhifallah

    Microsoft (United States)

    12 shared
  • Chenhui Hu

    Zhejiang University of Technology

    10 shared
  • Lenka Zdeborová

    École Polytechnique Fédérale de Lausanne

    10 shared
  • Hong Hu

    University of Pennsylvania

    10 shared

Education

  • Ph.D., Statistics

    Harvard University

    2010
  • M.S., Statistics

    Harvard University

    2006
  • B.S., Mathematics

    University of Science and Technology of China

    2003