Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Luis Carvalho

Associate Professor · Verified

Boston University · Mathematics

Active 2006–2025

h-index 13
Citations 1.3k
Papers 61 (22 in the last 5 years)
Funding $64k

About

Luis Carvalho is an Associate Professor and a member of the Probability and Statistics research group at Boston University. His academic role involves teaching and research within the Department of Mathematics & Statistics. For more information about his work and background, please refer to his personal webpage.

Research topics

  • Computer Science
  • Data Mining
  • Artificial Intelligence
  • Geography
  • Medicine
  • Computational biology
  • Biology
  • Economics
  • Mathematics
  • Meteorology
  • Environmental health
  • Environmental engineering
  • Mathematical optimization
  • Genetics
  • Environmental science
  • Ecology
  • Atmospheric sciences

Selected publications

  • Bernstein Polynomial Processes for Continuous Time Change Detection

    arXiv.org · 2025-04-24

    Preprint · Open access · Senior author

    There is a lack of methodological results for continuous time change detection due to the challenges of noninformative prior specification and efficient posterior inference in this setting. Most methodologies to date assume data are collected according to uniformly spaced time intervals. This assumption incurs bias in the continuous time setting where, a priori, two consecutive observations measured closely in time are less likely to change than two consecutive observations that are far apart in time. Models proposed in this setting have required MCMC sampling which is not ideal. To address these issues, we derive the heterogeneous continuous time Markov chain that models change point transition probabilities noninformatively. By construction, change points under this model can be inferred efficiently using the forward backward algorithm and do not require MCMC sampling. We then develop a novel loss function for the continuous time setting, derive its Bayes estimator, and demonstrate its performance on synthetic data. A case study using time series of remotely sensed observations is then carried out on three change detection applications. To reduce falsely detected changes in this setting, we develop a semiparametric mean function that captures interannual variability due to weather in addition to trend and seasonal components.

  • Understanding the hydrological and landscape connectivity of lakes

    Landscape Ecology · 2025-07-03 · 6 citations

    Article · Open access

    Context: Connectivity is a key property of water, enabling the flow of energy, material and individuals within and between sites. Climate and land use changes can profoundly modify connectivity, yet few studies have quantified the patterns in connectivity among lakes at national scales. Objectives: Our objectives were: i) to examine relationships between a broad range of lake connectivity metrics, ii) to evaluate how lake connectivity varies nationally, regionally and in relation to land cover. Methods: We calculated hundreds of metrics of freshwater connectivity for all lakes in Great Britain > 1 ha (n = 10,095), quantifying connectedness in their catchments and surrounding landscape. Patterns of metrics, as well as their correlations and inter-connectedness, were examined at multiple scales. Results: Strong correlations existed within groups of metrics for lake, pond and river connectivity. However, both pond and river metrics varied independently of lake metrics. The most and least urban river basin districts showed noticeable differences in metric correlation. Lake area, pond count and river length in catchments were selected as a core set of connectivity metrics, which explain most of the variation across national and regional scales. Conclusions: … in the zone nearest the lake. When interpreting ecological responses, the connectivity metric within each core group can be selected based on suitability and data availability. The minimum set of three metrics is recommended to support comparative, global studies.

  • Associations between in-home environmental exposures and lung function in a safety net population of children with asthma using electronic health records and geospatial data

    Annals of Epidemiology · 2025-04-06

    Article · Open access

  • Unsupervised Neural Architecture for Sensorimotor Mapping in Perceptually Aliased Environments

    2024-07-29 · 1 citation

    Article · 1st author · Corresponding

    This paper addresses the challenge of autonomous navigation in environments with perceptual aliasing, where observations are not unique, posing difficulties for current Simultaneous Localisation and Mapping (SLAM) systems. The importance of developing cognitive maps inspired by the hippocampal/entorhinal system (H/E-S) for spatial and relational memory tasks for intelligent behaviour and flexible navigation is discussed. The paper introduces the Merge Expand when Required Clone Structured Representation Yielding explainability (MERCURY) network, an unsupervised neural architecture that learns sensorimotor maps in aliased environments through continuous self-organisation. Experimental results demonstrate MERCURY's improved performance in mapping aliased environments compared to other approaches. The paper concludes with a discussion on the limitations and directions for future research to enhance the robustness and applicability of the proposed approach.

  • Expanding N-glycopeptide identifications by modeling fragmentation, elution, and glycome connectivity

    Nature Communications · 2024-07-22 · 17 citations

    Article · Open access

    Accurate glycopeptide identification in mass spectrometry-based glycoproteomics is a challenging problem at scale. Recent innovation has been made in increasing the scope and accuracy of glycopeptide identifications, with more precise uncertainty estimates for each part of the structure. We present a dynamically adapting relative retention time model for detecting and correcting ambiguous glycan assignments that are difficult to detect from fragmentation alone, a layered approach to glycopeptide fragmentation modeling that improves N-glycopeptide identification in samples without compromising identification quality, and a site-specific method to increase the depth of the glycoproteome confidently identifiable even further. We demonstrate our techniques on a set of previously published datasets, showing the performance gains at each stage of optimization. These techniques are provided in the open-source glycomics and glycoproteomics platform GlycReSoft available at https://github.com/mobiusklein/glycresoft .

  • Modeling urban crime occurrences via network regularized regression

    The Annals of Applied Statistics · 2024-10-31 · 1 citation

    Article · Senior author

    Analyses of occurrences of residential burglary in urban areas have shown that crime rates are not spatially homogeneous: rates vary across the network of city streets, resulting in some areas being far more susceptible to crime than others. The explanation for why a certain segment of the city experiences high crime may be different than why a neighboring area experiences high crime. Motivated by the importance of understanding spatial patterns such as these, we consider a statistical model of burglary defined on the street network of Boston, Massachusetts. Leveraging ideas from functional data analysis, our proposed solution consists of a generalized linear model with vertex-indexed covariates, allowing for an interpretation of the covariate effects at the street level. We employ a regularization procedure cast as a prior distribution on the regression coefficients under a Bayesian setup so that the predicted responses vary smoothly according to the connectivity of the city. We introduce a novel variable selection procedure, examine computationally efficient methods for sampling from the posterior distribution of the model parameters, and demonstrate the flexibility of our proposed modeling structure. The resulting model and interpretations provide insight into the spatial network patterns and dynamics of residential burglary in Boston.

  • RAMZIS: a bioinformatic toolkit for rigorous assessment of the alterations to glycoprotein composition that occur during biological processes

    Bioinformatics Advances · 2024-01-01

    Article · Open access

    Motivation: Glycosylation elaborates the structures and functions of glycoproteins; glycoproteins are common post-translationally modified proteins and are heterogeneous and non-deterministically synthesized as an evolutionarily driven mechanism that elaborates the functions of glycosylated gene products. Glycoproteins, accounting for approximately half of all proteins, require specialized proteomics data analysis methods due to micro- and macro-heterogeneities as a given glycosite can be divided into several glycosylated forms, each of which must be quantified. Sampling of heterogeneous glycopeptides is limited by mass spectrometer speed and sensitivity, resulting in missing values. In conjunction with the low sample size inherent to glycoproteomics, a specialized toolset is needed to determine if observed changes in glycopeptide abundances are biologically significant or due to data quality limitations. Results: … Identifications by Similarity (RAMZIS), that uses similarity metrics to guide researchers to a more rigorous interpretation of glycoproteomics data. RAMZIS uses a permutation test to generate contextual similarity, which assesses the quality of mass spectral data and outputs a graphical demonstration of the likelihood of finding biologically significant differences in glycosylation abundance datasets. Investigators can assess dataset quality, holistically differentiate glycosites, and identify which glycopeptides are responsible for glycosylation pattern change. RAMZIS is validated by theoretical cases and a proof-of-concept application. RAMZIS enables comparison between datasets too stochastic, small, or sparse for interpolation while acknowledging these issues in its assessment. Using this tool, researchers will be able to rigorously define the role of glycosylation and the changes that occur during biological processes. Availability and implementation: https://github.com/WillHackett22/RAMZIS.

  • Computational Approaches for Exponential-Family Factor Analysis

    arXiv (Cornell University) · 2024-03-22 · 1 citation

    Preprint · Open access · Senior author

    We study a general factor analysis framework where the $n$-by-$p$ data matrix is assumed to follow a general exponential family distribution entry-wise. While this model framework has been proposed before, we here further relax its distributional assumption by using a quasi-likelihood setup. By parameterizing the mean-variance relationship on data entries, we additionally introduce a dispersion parameter and entry-wise weights to model large variations and missing values. The resulting model is thus not only robust to distribution misspecification but also more flexible and able to capture mean-dependent covariance structures of the data matrix. Our main focus is on efficient computational approaches to perform the factor analysis. Previous modeling frameworks rely on simulated maximum likelihood (SML) to find the factorization solution, but this method was shown to lead to asymptotic bias when the simulated sample size grows slower than the square root of the sample size $n$, eliminating its practical application for data matrices with large $n$. Borrowing from expectation-maximization (EM) and stochastic gradient descent (SGD), we investigate three estimation procedures based on iterative factorization updates. Our proposed solution does not show asymptotic biases, and scales even better for large matrix factorizations with error $O(1/p)$. To support our findings, we conduct simulation experiments and discuss its application in four case studies.

  • Carbon, indoor air, energy and financial benefits of coupled ventilation upgrade and enhanced rooftop garden installation: An interdisciplinary climate mitigation approach

    Sustainable Cities and Society · 2023-07-12 · 4 citations

    Article · Open access

    Building energy use contributes to urban carbon dioxide (CO2) emissions while inadequate ventilation can yield indoor CO2 build up from human respiration. However, increasing ventilation rates can add to energy costs and climate burdens. Our objective was to quantify changes in emissions, energy, and financial cost when rooftop garden and ventilation upgrades are done simultaneously, with an opportunity to enhance plant growth from exhausted CO2. We measured indoor CO2 concentrations, calculated ventilation rates, and modeled five scenarios to assess these impacts. The indoor CO2 concentration maximum was 2210 ppm, median was 840 ppm, and 33% of the daytime was spent above 1000 ppm. The estimated ventilation rate was 4 L/s. Our model calculations show that increasing ventilation to recommended levels (7 L/s) would increase total CO2 emissions, energy use, and cost (1-4%), but this could be counterbalanced by rooftop garden installation benefits, which yielded a net decrease of 23-46% in CO2 emissions, 12-13% in energy use, and 12-16% in cost. This novel integration of data collection and modeling provides support for the co-benefits of simultaneously installing improved ventilation systems and indoor CO2-enhanced rooftop gardens.

  • RAMZIS: a bioinformatic toolkit for rigorous assessment of the alterations to glycoprotein structure that occur during biological processes

    bioRxiv (Cold Spring Harbor Laboratory) · 2023-06-01

    Preprint · Open access

    Motivation: Glycosylation elaborates the structures and functions of glycoproteins; glycoproteins are common post-translationally modified proteins and are heterogeneous and non-deterministically synthesized as an evolutionarily driven mechanism that elaborates the functions of glycosylated gene products. While glycoproteins account for approximately half of all proteins, their macro- and micro-heterogeneity requires specialized proteomics data analysis methods as a given glycosite can be divided into several glycosylated forms, each of which must be quantified. Sampling of heterogeneous glycopeptides is limited by mass spectrometer speed and sensitivity, resulting in missing values. In conjunction with the low sample size inherent to glycoproteomics, this necessitated specialized statistical metrics to identify if observed changes in glycopeptide abundances are biologically significant or due to data quality limitations. Results: … Identifications by Similarity (RAMZIS), that uses similarity metrics to guide biomedical researchers to a more rigorous interpretation of glycoproteomics data. RAMZIS uses contextual similarity to assess the quality of mass spectral data and generates graphical output that demonstrates the likelihood of finding biologically significant differences in glycosylation abundance datasets. Investigators can assess dataset quality, holistically differentiate glycosites, and identify which glycopeptides are responsible for glycosylation pattern expression change. Herein, the RAMZIS approach is validated by theoretical cases and by a proof-of-concept application. RAMZIS enables comparison between datasets too stochastic, small, or sparse for interpolation while acknowledging these issues in its assessment. Using our tool, researchers will be able to rigorously define the role of glycosylation and the changes that occur during biological processes.

Frequent coauthors

  • Ian Johnston

    Miltenyi Biotec (Germany)

    8 shared
  • Joseph Zaia

    Boston University

    7 shared
  • Alessandro Baccini

    Boston University

    6 shared
  • Timothy Hancock

    Kantonsspital Aarau

    6 shared
  • Hiroshi Mamitsuka

    Kyoto University

    6 shared
  • Joshua Klein

    Boston University

    5 shared
  • Wayne Walker

    Woodwell Climate Research Center

    5 shared
  • Catherine L. Connolly

    Boston University

    5 shared