Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Top matches · Balanced preset
Dr. Sarah Chen
Stanford · Interpretability · NLP
91
Dr. Marcus Holloway
MIT · Robotics · RL
84
Dr. Aisha Okonkwo
CMU · Fairness · HCI
82
Nova · Professor Researcher · re-ranking top 20…
David Hogg

Professor of Physics · Verified

New York University · Chemistry

Active 1964–2026

h-index: 126
Citations: 109.2k
Papers: 897 (214 in the last 5 years)
Funding: $1.7M

About

David W. Hogg is a Professor of Physics and Data Science in the Center for Cosmology and Particle Physics within the Department of Physics at New York University. He also serves as a Senior Research Scientist in the Astronomical Data Group at the Center for Computational Astrophysics of the Flatiron Institute and holds an affiliation with the Max-Planck-Institut für Astronomie. His main research interests are in observational cosmology, especially approaches that use galaxies to infer the physical properties of the Universe. He also works on the properties and kinematics of stars in the Galaxy and on the discovery and measurement of planets around other stars. Across all of this work, he emphasizes building the engineering systems that make these scientific projects possible.

Research topics

  • Computer Science
  • Astrophysics
  • Astronomy
  • Physics
  • Geology
  • Remote sensing
  • Meteorology
  • Geography
  • Operating system
  • Chemistry
  • Database

Selected publications

  • Principled Point-source Detection in Collections of Astronomical Images

    The Astronomical Journal · 2026-03-17 · 1 citation

    Article · Open access · Senior author

    Abstract There is almost no data analysis operation more important to astronomy than the detection of sources (stars or galaxies, say) in imaging. Here we write down a set of reasonable assumptions for well-understood (or well-calibrated) background-dominated imaging (faint sources) and find the detection methods that flow from those assumptions. Our methods are hypothesis comparisons, involving matched filters. We show that they are generally preferable to one-hypothesis (p-value or n-sigma-deviation) methods, especially for avoiding spurious detection of nonstar image features. We consider the case in which there are multiple images at each point on the sky, and—more importantly for our purposes—when those images are taken through different bandpasses. Detection in multiband imaging involves making choices about the range of colors or spectral energy distributions to which the method will be most sensitive; we deliver methods based on Bayesian decision theory and also frequentist methods that deliver similar outcomes in real-data tests. We discuss relationships between these methods and standard practices. The methods we present perform well, but our main point is that methods should be principled—that is, they should flow from our fundamental assumptions about the data.

  • The Milky Way’s Circular Velocity Curve Measured Using Element Abundance Gradients

    The Astrophysical Journal · 2026-03-10

    Article · Open access

    Abstract Spectroscopic surveys now supply precise stellar label measurements such as element abundances for large samples of stars throughout the Milky Way. These element abundances are known to correlate with orbital actions or other dynamical invariants. We present a new data-driven method for empirically measuring the circular velocity curve of the Galaxy that uses element abundance gradients in the plane of radial kinematics. We use stellar surface abundances from the APOGEE survey combined with kinematic data from the Gaia mission. Our results confirm the ordered structure of the Milky Way’s disk in terms of the average [Fe/H] and [Mg/Fe] abundance ratios, and suggest that 〈[Fe/H]〉 traces the radial positions of stars in the disk, while 〈[Mg/Fe]〉 traces the orbital excursions around this radius. Our method uses the radial orbit structure in the Galaxy to enable an empirical measurement of the circular velocity curve, epicyclic and azimuthal frequencies, and kinematic gradients across the Milky Way’s disk. From these measurements, we infer a value of the circular velocity curve at the solar radius of $v_{c,\odot} = 235.3^{+2.8}_{-3.7}\ \mathrm{km\,s^{-1}}$ using the most constraining abundance ratio, [Mg/Fe]. We also measure the radial and azimuthal frequencies for a circular orbit at the solar radius, $\kappa_{0,R_\odot} = 36.9^{+0.8}_{-1.0}\ \mathrm{km\,s^{-1}\,kpc^{-1}}$ and $\Omega_{0,R_\odot} = 28.5^{+0.4}_{-0.1}\ \mathrm{km\,s^{-1}\,kpc^{-1}}$, respectively. These values lead to estimates of the Oort constants of $A = 16.5^{+0.1}_{-0.1}\ \mathrm{km\,s^{-1}\,kpc^{-1}}$ and $B = -11.9^{+0.1}_{-0.3}\ \mathrm{km\,s^{-1}\,kpc^{-1}}$. We measure the radial acceleration at the solar radius to be $\left(\frac{\partial \Phi}{\partial R}\right)$…

  • On the Effects of Parameters on Galaxy Properties in CAMELS and the Predictability of $\Omega_\mathrm{m}$

    The Astrophysical Journal · 2025-07-14 · 1 citation

    Article · Open access · Corresponding author

    Abstract Recent analyses of cosmological hydrodynamic simulations from CAMELS have shown that machine learning models can predict the parameter describing the total matter content of the universe, $\Omega_\mathrm{m}$, from the features of a single galaxy. We investigate the statistical properties of two of these simulation suites, IllustrisTNG and ASTRID, confirming that $\Omega_\mathrm{m}$ induces a strong displacement on the distribution of galaxy features. We also observe that most other parameters have little to no effect on the distribution, except for the stellar-feedback parameter $A_\mathrm{SN1}$, which introduces some near-degeneracies that can be broken with specific features. These two properties explain the predictability of $\Omega_\mathrm{m}$. We use optimal transport to further measure the effect of parameters on the distribution of galaxy properties, which is found to be consistent with physical expectations. However, we observe discrepancies between the two simulation suites, both in the effect of $\Omega_\mathrm{m}$ on the galaxy properties and in the distributions themselves at identical parameter values. Thus, although $\Omega_\mathrm{m}$’s signature can be easily detected within a given simulation suite using just a single galaxy, applying this result to real observational data may prove significantly more challenging.

  • Identification of 30,000 White Dwarf–Main-sequence Binary Candidates from Gaia DR3 BP/RP (XP) Low-resolution Spectra

    The Astrophysical Journal Supplement Series · 2025-07-31 · 5 citations

    Article · Open access

    Abstract White dwarf–main-sequence (WD–MS) binary systems are essential probes for understanding binary stellar evolution and play a pivotal role in constraining theoretical models of various transient phenomena. In this study, we construct a catalog of WD–MS binaries using Gaia DR3’s low-resolution BP/RP (XP) spectra. Our approach integrates a model-independent neural network for spectral modeling with Gaussian process classification to accurately identify WD–MS binaries among over 10 million stars within 1 kpc. This study identifies approximately 30,000 WD–MS binary candidates, including 1700 high-confidence systems confirmed through spectral fitting. Our technique is shown to be effective at detecting systems where the MS star dominates the spectrum—cases that have historically challenged conventional methods. Validation using Galaxy Evolution Explorer photometry reinforces the reliability of our classifications: 70% of candidates with an absolute magnitude $M_G > 7$ exhibit UV excess, a characteristic signature of WD companions. Our all-sky catalog of WD–MS binaries expands the available data set for studying binary evolution and WD physics and sheds light on the formation of WD–MS binaries.

  • Lux: A Generative, Multioutput, Latent-variable Model for Astronomical Data with Noisy Labels

    The Astronomical Journal · 2025-05-19 · 2 citations

    Article · Open access

    Abstract The large volume of spectroscopic data available now and from near-future surveys will enable high-dimensional measurements of stellar parameters and properties. Current methods for determining stellar labels from spectra use physics-driven models, which are computationally expensive and have limitations in their accuracy due to simplifications. While machine learning methods provide efficient paths toward emulating physics-based pipelines, they often do not properly account for uncertainties and have complex model structure, both of which can lead to biases and inaccurate label inference. Here we present Lux: a data-driven framework for modeling stellar spectra and labels that addresses prior limitations. Lux is a generative, multioutput, latent-variable model framework built on JAX for computational efficiency and flexibility. As a generative model, Lux properly accounts for uncertainties and missing data in the input stellar labels and spectral data and can either be used in probabilistic or discriminative settings. Here, we present several examples of how Lux can successfully emulate methods for precise stellar label determinations for stars ranging in stellar type and signal-to-noise ratio from the APOGEE survey. We also show how a simple Lux model is successful at performing label transfer between the APOGEE and GALAH surveys. Lux is a powerful new framework for the analysis of large-scale spectroscopic survey data. Its ability to handle uncertainties while maintaining high precision makes it particularly valuable for stellar survey label inference and cross-survey analysis, and the flexible model structure allows for easy extension to other data types.

  • Group Averaging for Physics Applications: Accuracy Improvements at Zero Training Cost

    arXiv.org · 2025-11-11

    Preprint · Open access

    Many machine learning tasks in the natural sciences are precisely equivariant to particular symmetries. Nonetheless, equivariant methods are often not employed, perhaps because training is perceived to be challenging, or the symmetry is expected to be learned, or equivariant implementations are seen as hard to build. Group averaging is an available technique for these situations. It happens at test time; it can make any trained model precisely equivariant at a (often small) cost proportional to the size of the group; it places no requirements on model structure or training. It is known that, under mild conditions, the group-averaged model will have a provably better prediction accuracy than the original model. Here we show that an inexpensive group averaging can improve accuracy in practice. We take well-established benchmark machine learning models of differential equations in which certain symmetries ought to be obeyed. At evaluation time, we average the models over a small group of symmetries. Our experiments show that this procedure always decreases the average evaluation loss, with improvements of up to 37% in terms of the VRMSE. The averaging produces visually better predictions for continuous dynamics. This short paper shows that, under certain common circumstances, there are no disadvantages to imposing exact symmetries; the ML4PS community should consider group averaging as a cheap and simple way to improve model accuracy.

  • Optical Spectroscopy Reveals Hidden Neutron-capture Elemental Abundance Differences among APOGEE-identified Chemical Doppelgängers

    The Astrophysical Journal · 2025-10-23 · 3 citations

    Article · Open access

    Abstract Grouping stars by chemical similarity has the potential to reveal the Milky Way’s evolutionary history. The APOGEE stellar spectroscopic survey has the resolution and sensitivity for this task. However, APOGEE lacks access to strong lines of neutron-capture elements (Z > 28), which have nucleosynthetic origins that are distinct from those of the lighter elements. We assess whether APOGEE abundances are sufficient for selecting chemically similar disk stars by identifying 25 pairs of chemical “doppelgängers” in APOGEE DR17 and following them up with the Tull spectrograph, an optical, R ∼ 60,000 echelle on the McDonald Observatory 2.7 m telescope. Line-by-line differential analyses of pairs’ optical spectra reveal neutron-capture (Y, Zr, Ba, La, Ce, Nd, and Eu) elemental abundance differences of Δ[X/Fe] ∼ 0.020 ± 0.015 to 0.380 ± 0.15 dex (4%–140%), and up to 0.05 dex (12%) on average, a factor of 1–2 times higher than intracluster pairs. This is despite the pairs sharing nearly identical APOGEE-reported abundances and [C/N] ratios, a tracer of giant-star age. This work illustrates that even when APOGEE abundances derived from spectra with a signal-to-noise ratio > 300 are available, optically measured neutron-capture element abundances contain critical information about composition similarity. These results hold implications for the chemical dimensionality of the disk, mixing within the interstellar medium, and chemical tagging with the neutron-capture elements.

  • Signal-preserving CMB component separation with machine learning

    Physical Review D · 2025-03-28 · 5 citations

    Article · Open access · Senior author

    Analysis of microwave sky signals, such as the cosmic microwave background, often requires component separation using multifrequency methods, whereby different signals are isolated according to their different frequency behaviors. Many so-called blind methods, such as the internal linear combination (ILC), make minimal assumptions about the spatial distribution of the signal or contaminants, and only assume knowledge of the frequency dependence of the signal. The ILC produces a minimum-variance linear combination of the measured frequency maps. In the case of Gaussian, statistically isotropic fields, this is the optimal linear combination, as the variance is the only statistic of interest. However, in many cases the signal we wish to isolate, or the foregrounds we wish to remove, are non-Gaussian and/or statistically anisotropic (in particular for the case of Galactic foregrounds). In such cases, it is possible that machine learning (ML) techniques can be used to exploit the non-Gaussian features of the foregrounds and thereby improve component separation. However, many ML techniques require the use of complex, difficult-to-interpret operations on the data. We propose a hybrid method whereby we train an ML model using only combinations of the data that do not contain the signal, and combine the resulting ML-predicted foreground estimate with the ILC solution to reduce the error from the ILC. We demonstrate our methods on simulations of extragalactic temperature and Galactic polarization foregrounds and show that our ML model can exploit non-Gaussian features, such as point sources and spatially varying spectral indices, to produce lower-variance maps than ILC—e.g., reducing the variance of the B-mode residual by factors of up to 5—while preserving the signal of interest in an unbiased manner. Moreover, we often find improved performance even when applying our ML technique to foreground models on which it was not trained.

  • Equivariant geometric convolutions for dynamical systems on vector and tensor images

    Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences · 2025-06-05 · 1 citation

    Article · Open access

    Machine learning methods are increasingly being employed as surrogate models in place of computationally expensive and slow numerical integrators for a bevy of applications in the natural sciences. However, while the laws of physics are relationships between scalars, vectors and tensors that hold regardless of the frame of reference or chosen coordinate system, surrogate machine learning models are not coordinate-free by default. We enforce coordinate freedom by using geometric convolutions in three model architectures: a ResNet, a Dilated ResNet and a UNet. In numerical experiments emulating two-dimensional compressible Navier-Stokes, we see better accuracy and improved stability compared with baseline surrogate models in almost all cases. The ease of enforcing coordinate freedom without making major changes to the model architecture provides an exciting recipe for any convolutional neural network-based method applied to an appropriate class of problems. This article is part of the theme issue 'Partial differential equations in data science'.

  • $Lux$: A generative, multi-output, latent-variable model for astronomical data with noisy labels

    arXiv.org · 2025-02-03

    Preprint · Open access

    The large volume of spectroscopic data available now and from near-future surveys will enable high-dimensional measurements of stellar parameters and properties. Current methods for determining stellar labels from spectra use physics-driven models, which are computationally expensive and have limitations in their accuracy due to simplifications. While machine learning methods provide efficient paths toward emulating physics-based pipelines, they often do not properly account for uncertainties and have complex model structure, both of which can lead to biases and inaccurate label inference. Here we present $Lux$: a data-driven framework for modeling stellar spectra and labels that addresses prior limitations. $Lux$ is a generative, multi-output, latent variable model framework built on JAX for computational efficiency and flexibility. As a generative model, $Lux$ properly accounts for uncertainties and missing data in the input stellar labels and spectral data and can either be used in probabilistic or discriminative settings. Here, we present several examples of how $Lux$ can successfully emulate methods for precise stellar label determinations for stars ranging in stellar type and signal-to-noise from the $APOGEE$ surveys. We also show how a simple $Lux$ model is successful at performing label transfer between the $APOGEE$ and $GALAH$ surveys. $Lux$ is a powerful new framework for the analysis of large-scale spectroscopic survey data. Its ability to handle uncertainties while maintaining high precision makes it particularly valuable for stellar survey label inference and cross-survey analysis, and the flexible model structure allows for easy extension to other data types.
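The hypothesis-comparison idea in "Principled Point-source Detection in Collections of Astronomical Images" can be illustrated with a textbook single-band matched filter. This is a sketch, not the paper's method: it assumes one background-subtracted image patch with white Gaussian noise of known σ, and a known PSF centered on the patch (here a hypothetical Gaussian built by `gaussian_psf`). It compares H1 (a point source of non-negative flux at the center) against H0 (background only) through a log-likelihood ratio.

```python
import numpy as np


def gaussian_psf(size: int, sigma_psf: float) -> np.ndarray:
    """Unit-sum 2-D Gaussian PSF on a size x size grid (assumed PSF model)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    psf = np.exp(-(x**2 + y**2) / (2.0 * sigma_psf**2))
    return psf / psf.sum()


def matched_filter_llr(patch: np.ndarray, psf: np.ndarray, sigma_noise: float):
    """Log-likelihood ratio of H1 (source with flux >= 0) versus H0 (no source).

    Assumes background-subtracted pixels with independent Gaussian noise of
    known standard deviation sigma_noise, and a PSF centered on the patch.
    """
    psf_norm2 = np.sum(psf**2)
    # Matched-filter (maximum-likelihood) flux estimate: sum(P * d) / sum(P^2)
    f_hat = np.sum(psf * patch) / psf_norm2
    # One-sigma uncertainty on that flux estimate
    sigma_f = sigma_noise / np.sqrt(psf_norm2)
    # Flux is constrained to be non-negative; if f_hat <= 0 the best H1 fit is
    # f = 0, which is identical to H0, so the log-likelihood ratio is zero.
    f_best = max(f_hat, 0.0)
    llr = 0.5 * (f_best / sigma_f) ** 2
    return f_best, sigma_f, llr


rng = np.random.default_rng(42)
psf = gaussian_psf(size=11, sigma_psf=1.5)
patch = 20.0 * psf + rng.normal(0.0, 0.1, size=psf.shape)
f_best, sigma_f, llr = matched_filter_llr(patch, psf, sigma_noise=0.1)
print(f"flux = {f_best:.2f} +/- {sigma_f:.2f}, ln(L1/L0) = {llr:.1f}")
```

A threshold on `llr` then replaces a one-hypothesis n-sigma cut; the paper's multiband Bayesian and frequentist detectors go well beyond this single-band case.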
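The Oort constants quoted in "The Milky Way’s Circular Velocity Curve Measured Using Element Abundance Gradients" can be cross-checked against the reported frequencies with the standard relations for circular orbits in an axisymmetric disk, $\Omega_0 = A - B$ and $\kappa_0^2 = -4B(A - B)$. This back-of-the-envelope check uses only the central values and is not a reproduction of the paper's analysis.

```python
import math

# Central values of the Oort constants reported in the abstract (km/s/kpc)
A = 16.5
B = -11.9

# Standard relations for an axisymmetric disk at radius R0:
#   Omega0 = A - B            (angular frequency of a circular orbit)
#   kappa0^2 = -4 B (A - B)   (epicyclic frequency squared)
omega0 = A - B
kappa0 = math.sqrt(-4.0 * B * (A - B))

print(f"Omega0 = {omega0:.1f} km/s/kpc (reported 28.5)")
print(f"kappa0 = {kappa0:.1f} km/s/kpc (reported 36.9)")
```

Both derived values (about 28.4 and 36.8 km/s/kpc) sit within the quoted uncertainties of the directly measured frequencies.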
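"On the Effects of Parameters on Galaxy Properties in CAMELS" uses optimal transport to quantify how a parameter shifts the distribution of galaxy properties. As a minimal one-dimensional illustration (an assumption; the paper works with multi-dimensional galaxy features), the 1-Wasserstein distance between two equal-size samples is the mean absolute difference of their sorted values. The two samples below are hypothetical stand-ins for one galaxy feature under two parameter settings.

```python
import numpy as np


def wasserstein_1d(x: np.ndarray, y: np.ndarray) -> float:
    """1-Wasserstein (earth mover's) distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of x with the i-th smallest value of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(x - y)))


rng = np.random.default_rng(0)
# Hypothetical feature values under a "low" and a "high" parameter setting
feature_low = rng.normal(loc=10.0, scale=0.5, size=5000)
feature_high = rng.normal(loc=10.3, scale=0.5, size=5000)
print(f"W1 = {wasserstein_1d(feature_low, feature_high):.3f}")
```

For unequal sample sizes or weighted samples, `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity.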
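The validation cut in the white dwarf–main-sequence binary paper is stated in Gaia absolute magnitude $M_G$. From apparent magnitude $G$ and parallax $\varpi$ in milliarcseconds, $M_G = G + 5\log_{10}\varpi - 10$ when extinction is ignored. The sketch applies that formula together with the "within 1 kpc" (ϖ > 1 mas) and $M_G > 7$ selections; the input magnitudes and parallaxes are made up, and extinction and parallax errors are ignored.

```python
import numpy as np


def absolute_g(g_mag: np.ndarray, parallax_mas: np.ndarray) -> np.ndarray:
    """Absolute G magnitude from apparent G and parallax in mas (no extinction)."""
    # Distance modulus m - M = 5 log10(d / 10 pc), with d[pc] = 1000 / parallax[mas]
    return g_mag + 5.0 * np.log10(parallax_mas) - 10.0


# Hypothetical stars: apparent G magnitude and parallax (mas)
g_mag = np.array([12.0, 15.0, 18.5, 14.0])
parallax = np.array([10.0, 10.0, 2.0, 0.5])

m_g = absolute_g(g_mag, parallax)
within_1kpc = parallax > 1.0   # d < 1 kpc  <=>  parallax > 1 mas
faint = m_g > 7.0              # the M_G > 7 subset used for the UV-excess check
print(m_g, within_1kpc & faint)
```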
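Lux, described in the two Lux entries above, ties stellar labels and spectra to a shared latent vector. A heavily simplified sketch, assuming both maps are linear (labels ≈ A z, flux ≈ B z) with already-known matrices `A` and `B` (hypothetical here), shows how a new star's labels follow from its spectrum, with per-pixel uncertainties as weights and missing pixels given infinite uncertainty. This illustrates the latent-variable idea only; it is not Lux's actual model or training procedure, which is built on JAX.

```python
import numpy as np


def infer_latents(flux, flux_err, B):
    """Weighted least-squares latent vector z such that flux ≈ B @ z.

    Pixels with non-finite flux or uncertainty are treated as missing
    (zero weight), mimicking infinite uncertainty.
    """
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    good = np.isfinite(flux) & np.isfinite(flux_err) & (flux_err > 0)
    w = np.zeros_like(flux)
    w[good] = 1.0 / flux_err[good]          # inverse-sigma weights
    design = B * w[:, None]                  # weighted design matrix
    target = np.where(good, flux, 0.0) * w   # weighted data, zero where missing
    z, *_ = np.linalg.lstsq(design, target, rcond=None)
    return z


def predict_labels(flux, flux_err, A, B):
    """Infer latents from a spectrum, then map them to labels."""
    return A @ infer_latents(flux, flux_err, B)


rng = np.random.default_rng(1)
n_pix, n_latent, n_labels = 200, 4, 3
A = rng.normal(size=(n_labels, n_latent))   # hypothetical latent -> labels map
B = rng.normal(size=(n_pix, n_latent))      # hypothetical latent -> flux map
z_true = rng.normal(size=n_latent)

flux_err = np.full(n_pix, 0.01)
flux = B @ z_true + rng.normal(0.0, 0.01, size=n_pix)
flux[10:20] = np.nan                         # a gap in the spectrum
flux_err[10:20] = np.inf

print(predict_labels(flux, flux_err, A, B), A @ z_true)
```

Label transfer between surveys follows the same pattern: infer `z` from one survey's data and read labels off a map trained on another.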
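The test-time group averaging in "Group Averaging for Physics Applications" can be sketched for the four-element group of 90° rotations acting on scalar images, where the averaged model is $\bar f(x) = \frac{1}{|G|}\sum_{g \in G} g^{-1} f(g\,x)$. The function `toy_model` is a hypothetical, deliberately non-equivariant stand-in for a trained network. For vector or tensor fields the components would also have to be rotated, which this scalar sketch omits.

```python
import numpy as np


def group_average_c4(model, image: np.ndarray) -> np.ndarray:
    """Average a scalar-image model over the 4 rotations by 90 degrees.

    For each rotation g: rotate the input, apply the model, undo the rotation
    on the output, then average. The result is exactly equivariant to
    90-degree rotations, whatever the model is.
    """
    outputs = [np.rot90(model(np.rot90(image, k)), -k) for k in range(4)]
    return np.mean(outputs, axis=0)


rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 16))


def toy_model(x: np.ndarray) -> np.ndarray:
    # Hypothetical "trained model": pixel-wise weights break rotation symmetry
    return weights * x + np.roll(x, 1, axis=0)


x = rng.normal(size=(16, 16))
lhs = group_average_c4(toy_model, np.rot90(x))
rhs = np.rot90(group_average_c4(toy_model, x))
print(np.allclose(lhs, rhs))  # equivariance holds up to float round-off
```

The cost is four model evaluations instead of one, in line with the abstract's point that the overhead scales with the size of the group.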
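The chemical-doppelgänger paper quotes abundance differences both in dex and as percentages. A difference of Δ dex corresponds to an abundance ratio of $10^{\Delta}$, so the percentage difference is $(10^{\Delta} - 1) \times 100$:

```python
def dex_to_percent(delta_dex: float) -> float:
    """Percentage difference corresponding to a logarithmic difference in dex."""
    return (10.0 ** delta_dex - 1.0) * 100.0


for delta in (0.05, 0.38):
    print(f"{delta:.2f} dex -> {dex_to_percent(delta):.0f}%")
```

This reproduces the abstract's conversions of 0.05 dex to about 12% and 0.38 dex to about 140%.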
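The internal linear combination described in "Signal-preserving CMB component separation with machine learning" has a closed form: with frequency-frequency covariance $C$ of the maps and signal response vector $a$ (all ones for the CMB in thermodynamic temperature units), the weights $w = C^{-1}a / (a^\top C^{-1} a)$ minimize the variance of $w^\top d$ subject to $w^\top a = 1$. The sketch also builds an orthonormal basis of weight vectors with zero signal response ($v^\top a = 0$), i.e. map combinations from which the signal cancels; treating these as ML inputs is an illustrative reading of the hybrid method, not the paper's code, and the covariance is a made-up example.

```python
import numpy as np


def ilc_weights(cov: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Minimum-variance weights w with unit response to the signal (w @ a == 1)."""
    cinv_a = np.linalg.solve(cov, a)
    return cinv_a / (a @ cinv_a)


def signal_free_basis(a: np.ndarray) -> np.ndarray:
    """Orthonormal rows v spanning all weight vectors with v @ a == 0."""
    # For the 1 x N matrix a^T, the rows of Vt after the first span its null space.
    _, _, vt = np.linalg.svd(a[None, :])
    return vt[1:]


rng = np.random.default_rng(3)
n_freq = 5
m = rng.normal(size=(n_freq, n_freq))
cov = m @ m.T + n_freq * np.eye(n_freq)   # hypothetical positive-definite covariance
a = np.ones(n_freq)                        # CMB response in thermodynamic units

w = ilc_weights(cov, a)
null = signal_free_basis(a)
print("w @ a =", w @ a)
print("ILC variance:", w @ cov @ w)
print("max |v @ a| over signal-free rows:", np.abs(null @ a).max())
```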
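The coordinate-freedom requirement in "Equivariant geometric convolutions for dynamical systems on vector and tensor images" is easiest to see on a vector image: a 90° rotation must move the pixels and also rotate each vector. The sketch below (not the paper's geometric-convolution layers) implements that transformation for a vector image of shape `(H, W, 2)`, with components along array axes 0 and 1, and checks that the finite-difference gradient, a simple operator from scalar images to vector images, commutes with it.

```python
import numpy as np


def rotate_scalar(s: np.ndarray) -> np.ndarray:
    """Rotate a scalar image by 90 degrees (numpy's rot90 convention)."""
    return np.rot90(s)


def rotate_vector(v: np.ndarray) -> np.ndarray:
    """Rotate a vector image of shape (H, W, 2) by 90 degrees.

    Components are along array axes 0 and 1. Under np.rot90 the new pixel
    (i, j) comes from old pixel (j, W-1-i), so the new axis-0 component is
    minus the old axis-1 component and the new axis-1 component is the old
    axis-0 component: the grid moves and the components rotate.
    """
    v0 = np.rot90(v[..., 0])
    v1 = np.rot90(v[..., 1])
    return np.stack([-v1, v0], axis=-1)


def gradient_image(s: np.ndarray) -> np.ndarray:
    """Vector image of finite-difference derivatives along axes 0 and 1."""
    d0, d1 = np.gradient(s)
    return np.stack([d0, d1], axis=-1)


rng = np.random.default_rng(7)
s = rng.normal(size=(12, 12))
lhs = gradient_image(rotate_scalar(s))
rhs = rotate_vector(gradient_image(s))
print(np.allclose(lhs, rhs))  # the gradient is an equivariant operator
```

Rotating only the grid, without mixing the components, breaks this check, which is the kind of frame dependence the paper's geometric convolutions are designed to rule out.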

Frequent coauthors

  • Hans‐Walter Rix

    Max Planck Institute for Astronomy

    172 shared
  • Michael R. Blanton

    New York University

    166 shared
  • Daniel Foreman-Mackey

    165 shared
  • Adrian M. Price-Whelan

    151 shared
  • Melissa Ness

    Columbia University

    146 shared
  • David J. Schlegel

    122 shared
  • Daniel J. Eisenstein

    Center for Astrophysics Harvard & Smithsonian

    102 shared
  • Donald P. Schneider

    99 shared

Education

  • PhD, Physics

    California Institute of Technology

    1998
  • SB, Physics

    Massachusetts Institute of Technology

    1992