Sitan Chen

Assistant Professor of Computer Science · Verified

Harvard University · Computer Science

Active 2012–2026

h-index: 13
Citations: 1.1k
Papers: 85 · 73 in last 5y
Funding: $150k

About

Sitan Chen is an Assistant Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. His research areas include applied mathematics, data science, machine learning, theory of computation, artificial intelligence, applied physics, and quantum engineering. He pursues this work through research and teaching at Harvard.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Physics
  • Quantum mechanics
  • Algorithm
  • Theoretical computer science

Selected publications

  • When audio-visual deep learning meets TCM facial inspection: A novel depression detection method for Chinese population

    Expert Systems with Applications · 2026-05-07

    article
  • Quantum Probe Tomography

    ArXiv.org · 2025-10-09

    preprint · Open access · 1st author · Corresponding

    Characterizing quantum many-body systems is a fundamental problem across physics, chemistry, and materials science. While significant progress has been made, many existing Hamiltonian learning protocols demand digital quantum control over the entire system, creating a disconnect from many real-world settings that provide access only through small, local probes. Motivated by this, we introduce and formalize the problem of quantum probe tomography, where one seeks to learn the parameters of a many-body Hamiltonian using a single local probe access to a small subsystem of a many-body thermal state undergoing time evolution. We address the identifiability problem of determining which Hamiltonians can be distinguished from probe data through a new combination of tools from algebraic geometry and smoothed analysis. Using this approach, we prove that generic Hamiltonians in various physically natural families are identifiable up to simple, unavoidable structural symmetries. Building on these insights, we design the first efficient end-to-end algorithm for probe tomography that learns Hamiltonian parameters to accuracy $\varepsilon$, with query complexity scaling polynomially in $1/\varepsilon$ and classical post-processing time scaling polylogarithmically in $1/\varepsilon$. In particular, we demonstrate that translation- and rotation-invariant nearest-neighbor Hamiltonians on square lattices in one, two, and three dimensions can be efficiently reconstructed from single-site probes of the Gibbs state, up to inversion symmetry about the probed site. Our results demonstrate that robust Hamiltonian learning remains achievable even under severely constrained experimental access.

  • ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems

    ArXiv.org · 2025-06-12

    preprint · Open access · Senior author

    There has been a flurry of activity around using pretrained diffusion models as informed data priors for solving inverse problems, and more generally around steering these models using reward models. Training-free methods like diffusion posterior sampling (DPS) and its many variants have offered flexible heuristic algorithms for these tasks, but when the reward is not informative enough, e.g., in hard inverse problems with low signal-to-noise ratio, these techniques veer off the data manifold, failing to produce realistic outputs. In this work, we devise a simple wrapper, ReGuidance, for boosting both the sample realism and reward achieved by these methods. Given a candidate solution $\hat{x}$ produced by an algorithm of the user's choice, we propose inverting the solution by running the unconditional probability flow ODE in reverse starting from $\hat{x}$, and then using the resulting latent as an initialization for DPS. We evaluate our wrapper on hard inverse problems like large box in-painting and super-resolution with high upscaling. Whereas state-of-the-art baselines visibly fail, we find that applying our wrapper on top of these baselines significantly boosts sample quality and measurement consistency. We complement these findings with theory proving that on certain multimodal data distributions, ReGuidance simultaneously boosts the reward and brings the candidate solution closer to the data manifold. To our knowledge, this constitutes the first rigorous algorithmic guarantee for DPS.

  • Information-Computation Gaps in Quantum Learning via Low-Degree Likelihood

    ArXiv.org · 2025-05-28

    preprint · Open access · 1st author · Corresponding

    In a variety of physically relevant settings for learning from quantum data, designing protocols that can computationally efficiently extract information remains largely an art, and there are important cases where we believe this to be impossible, that is, where there is an information-computation gap. While there is a large array of tools in the classical literature for giving evidence for average-case hardness of statistical inference problems, the corresponding tools in the quantum literature are far more limited. One such framework in the classical literature, the low-degree method, makes predictions about hardness of inference problems based on the failure of estimators given by low-degree polynomials. In this work, we extend this framework to the quantum setting. We establish a general connection between state designs and low-degree hardness. We use this to obtain the first information-computation gaps for learning Gibbs states of random, sparse, non-local Hamiltonians. We also use it to prove hardness for learning random shallow quantum circuit states in a challenging model where states can be measured in adaptively chosen bases. To our knowledge, the ability to model adaptivity within the low-degree framework was open even in classical settings. In addition, we also obtain a low-degree hardness result for quantum error mitigation against strategies with single-qubit measurements. We define a new quantum generalization of the planted biclique problem and identify the threshold at which this problem becomes computationally hard for protocols that perform local measurements. Interestingly, the complexity landscape for this problem shifts when going from local measurements to more entangled single-copy measurements. We show average-case hardness for the "standard" variant of Learning Stabilizers with Noise and for agnostically learning product states.

  • Catching the Blackdog Easily: A Convenient Depression Diagnosis Method Based on Audio-Visual Deep Learning

    IEEE Transactions on Affective Computing · 2025-05-20

    article

    Depression has currently become a serious social problem worldwide. However, the need for experienced doctors and tedious medical examinations greatly increases the inconvenience in diagnosing the depression. A convenient depression diagnosis method can significantly improve the medical experience of depression patients, and can greatly reduce the workload of doctors. In this paper, a Convenient Depression Diagnosis method based on Audio-Visual Deep Learning (CDD-AVDL) is proposed. CDD-AVDL exploits the videos of testers reading a specially-designed text, and note that the videos contain many subconscious human reactions (e.g., micro expressions, voice variations), which are difficultly affected by the artificial interventions, thus enabling the depression diagnosis results more accurate. In CDD-AVDL, the source features are first extracted from audios and visuals, and then the time-sequential features are extracted. Finally, a full connection layer and a convolution layer fusion are responsible for fusing the audio-visual features to yield the depression probabilities. Extensive experiments and clinical tests show that CDD-AVDL outperforms the state-of-the-arts in terms of the accuracy of depression diagnosis. Moreover, the data collection manner in CDD-AVDL is convenient, and the training cost of CDD-AVDL is very low.

  • A Provably Efficient Method for Tensor Ring Decomposition and Its Applications

    ArXiv.org · 2025-11-30

    preprint · Open access

    We present the first deterministic, finite-step algorithm for exact tensor ring (TR) decomposition, addressing an open question about the existence of such procedures. Our method leverages blockwise simultaneous diagonalization to recover TR-cores from a limited number of tensor observations, providing both algebraic insight and practical efficiency. We extend the approach to the symmetric TR setting, where parameter complexity is significantly reduced and applications arise naturally in physics-based modeling and exchangeable data analysis. To handle noisy observations, we develop a robust recovery scheme that couples our initialization with alternating least squares, achieving faster convergence and improved accuracy compared to classic methods. As applications, we obtain new algorithms for questions in other domains where tensor ring decomposition is a key primitive, namely matrix product state tomography in quantum information, and provable learning of pushforward distributions in the foundations of machine learning. These contributions advance the algorithmic foundations of TR decomposition and open new opportunities for scalable tensor network computation.

  • Efficient Pauli Channel Estimation with Logarithmic Quantum Memory

    PRX Quantum · 2025-05-02 · 4 citations

    article · Open access · 1st author · Corresponding

    In this work, we consider one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating eigenvalues of an $n$-qubit Pauli noise channel. Prior work [Chen et al., Phys. Rev. A 105, 032435 (2022)] has proved no-go theorems for this task in the practical regime in which one has a limited amount of quantum memory; i.e., any protocol with $\le 0.99n$ ancilla qubits of quantum memory must make exponentially many measurements, provided that it is non-concatenating. Such protocols can only interact with the channel by repeatedly preparing a state, passing it through the channel, and measuring immediately afterward. Surprisingly, in this work we show that protocols with an extremely small amount of quantum memory achieve an exponential advantage for this task. We give a protocol that can estimate any prescribed set of eigenvalues $A$ of a Pauli channel to error $\varepsilon$ using only $O(\log\log(|A|)/\varepsilon^2)$ ancilla qubits, $\tilde{O}(n^2/\varepsilon^2)$ measurements, and $O(n^2|A|/\varepsilon^2)$ queries to the Pauli channel. In contrast, we show that any protocol with zero ancilla, even a concatenating one, must make $\Omega(2^n/\varepsilon^2)$ measurements. We also prove that the number of queries cannot be improved significantly: with $k$ ancilla qubits, $\Omega(2^{(n-k)/3})$ queries to the channel are necessary to learn all eigenvalues, even for concatenating protocols.

  • Unlocking Mobile Phones by Rolling Wrists: A Novel Motion-Based Biometric Recognition Method

    IEEE Transactions on Information Forensics and Security · 2025-01-01

    article

    With the rapid development of Internet of Things and the increasingly popularized smart devices, the biometric recognition becomes a crucial component of facilitating some basic activities of daily living, and the biometric recognition has the outstanding advantages in terms of reliability and convenience. Various biometric characteristics have been applied to realize the biometric recognition for basic human activities, and new biometric characteristics are worth exploring to further enhance the convenience of our daily lives. This paper explores a new biometric characteristic (the wrist-rolling motion). By taking the mobile phone unlocking as the typical application of the biometric recognition, we verify that the wrist-rolling motion can become an available biometric characteristic with the aid of our designed deep learning model termed RTimesNet. Specifically, RTimesNet is composed of TimesBlocks, decomposition modules, and a multi-head ProbSparse self-attention module, and it exploits the periodicity of wrist-rolling motion to extract the time series features. TimesBlocks extract the features hidden in the wrist-rolling motion, and the decomposition modules decompose the output data of TimesBlocks into the trend-cyclical data and seasonal data, which are then evenly divided and inputted into the multi-head ProbSparse self-attention module for concatenation. In addition, a federated learning manner is adopted for the motion-based biometric recognition, thus avoiding the exchange of local data and protecting the privacy of users. Extensive experiments have been conducted, and the results demonstrate that the wrist-rolling motion can become an available biometric characteristic. Compared with other biometric recognition methods, our proposed method shows a faster unlocking speed and requires less data storage with a satisfactory biometric recognition accuracy.

  • Blink of an eye: a simple theory for feature localization in generative models

    ArXiv.org · 2025-02-02

    preprint · Open access · Senior author

    Large language models can exhibit unexpected behavior in the blink of an eye. In a recent computer use demo, a language model switched from coding to Googling pictures of Yellowstone, and these sudden shifts in behavior have also been observed in reasoning patterns and jailbreaks. This phenomenon is not unique to autoregressive models: in diffusion models, key features of the final output are decided in narrow "critical windows" of the generation process. In this work we develop a simple, unifying theory to explain this phenomenon using the formalism of stochastic localization samplers. We show that it emerges generically as the generation process localizes to a sub-population of the distribution it models. While critical windows have been studied at length in diffusion models, existing theory heavily relies on strong distributional assumptions and the particulars of Gaussian diffusion. In contrast to existing work our theory (1) applies to autoregressive and diffusion models; (2) makes no distributional assumptions; (3) quantitatively improves previous bounds even when specialized to diffusions; and (4) requires basic tools and no stochastic calculus or statistical-physics-based machinery. We also identify an intriguing connection to the all-or-nothing phenomenon from statistical inference. Finally, we validate our predictions empirically for LLMs and find that critical windows often coincide with failures in problem solving for various math and reasoning benchmarks.

  • Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

    ArXiv.org · 2025-02-10

    preprint · Open access · Senior author

    In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essentially arbitrary order. In this work, we closely examine these two competing effects. On the training front, we theoretically and empirically demonstrate that MDMs indeed train on computationally intractable subproblems compared to their autoregressive counterparts. On the inference front, we show that a suitable strategy for adaptively choosing the token decoding order significantly enhances the capabilities of MDMs, allowing them to sidestep hard subproblems. On logic puzzles like Sudoku, we show that adaptive inference can boost solving accuracy in pretrained MDMs from $<7$% to $\approx 90$%, even outperforming ARMs with $7\times$ as many parameters and that were explicitly trained via teacher forcing to learn the right order of decoding.

Labs

  • Sitan Chen's Lab · PI
