Ronald Coifman

Sterling Professor of Mathematics and Professor of Computer Science

Yale University · Department of Mathematics

Active 1965–2026

h-index: 74

Citations: 39.6k

Papers: 361 (40 in the last 5 years)

Funding: $1.9M



About

Ronald Coifman is the Sterling Professor of Mathematics and a Professor of Computer Science at Yale University. His research areas include nonlinear analysis, scattering theory, real and complex analysis, singular integrals, and numerical analysis. He earned his Ph.D. from the University of Geneva in 1965. Coifman is a recipient of the National Medal of Science and a member of the National Academy of Sciences and the American Academy of Arts and Sciences. His work has contributed to both mathematical analysis and computational methods.

Research topics

Computer Science
Artificial Intelligence
Statistics
Biology
Mathematics
Physics
Chemistry
Neuroscience
Psychology

Selected publications

Blaschke products and unwinding in higher dimensions
Open MIND · 2026-03-07
preprint · 1st author · Corresponding
We give a necessary and sufficient condition for the convergence of an infinite product of rational inner functions on the polydisk, and explore generalization to the polydisk of Malmquist-Takenaka bases and various versions of unwinding.
Blaschke products and unwinding in higher dimensions
arXiv (Cornell University) · 2026-03-07
article · Open access · 1st author · Corresponding
We give a necessary and sufficient condition for the convergence of an infinite product of rational inner functions on the polydisk, and explore generalization to the polydisk of Malmquist-Takenaka bases and various versions of unwinding.
BDN: Blaschke Decomposition Networks
2025-10-31
article · Open access
We introduce the Blaschke Decomposition Network (BDN), a novel neural network architecture for analyzing continuous real-valued or complex-valued 1-D and 2-D signals, data types that existing architectures, such as transformers or recurrent networks, are not designed to model. These signals are common in medicine, biology, and other scientific domains, yet their analytic structure is often underutilized in machine learning. Our approach is based on the Blaschke decomposition, which "unwinds" a signal into a sequence of factors determined by its roots, the points in the complex unit disk where the analytic continuation of the signal vanishes. By iteratively peeling off these factors, the decomposition isolates oscillatory components of the signal and produces a compact representation. BDNs are trained to predict these roots directly, and we show that they provide powerful and interpretable representations for downstream tasks. We first design the architecture for 1-D signals and then extend it to 2-D using a wedge-based factorization, enabling the same framework to handle images and other spatially varying signals. Experiments on sensor-derived biomedical data, including electrocardiograms and phase holographic microscopy, show that BDNs achieve strong predictive performance while using fewer parameters than transformers, convolutional, or recurrent networks.
On Complex Analytic Tools, and the Holomorphic Rotation Methods
Applied and numerical harmonic analysis · 2025-01-01
book-chapter · 1st author · Corresponding
Extracting Dual Analytic Geometries of Linear Transformations to Achieve Efficient Computation
ArXiv.org · 2025-06-13
preprint · Open access · Senior author
We propose a novel framework for fast integral operations by uncovering hidden geometries in the row and column structures of the underlying operators. This is accomplished through the Questionnaire algorithm, an iterative procedure that constructs adaptive hierarchical partition trees, revealing latent multiscale organizations and exposing local low-rank structures within the data. Guided by these geometries, we employ two complementary techniques: (1) the Butterfly algorithm, which exploits the learned hierarchical low-rank structure; and (2) adaptive eGHWT, which selects the best tilings in both space and frequency using all levels of the generalized Haar-Walsh wavelet packets. These techniques enable efficient matrix factorization and multiplication. We call our algorithms Questionnaire Factorization and Fast Transform (QFFT). Unlike classical approaches that rely on prior knowledge of the underlying geometry, QFFT is fully data-driven and applicable to matrices arising from irregular or unknown distributions. Even when the rows and columns both appear mutually orthogonal, our framework identifies the intrinsic ordering of orthogonal vectors that reveals hidden sparsity of the kernel. We demonstrate the effectiveness of our approach on matrices associated with heterogeneous operators and families of orthogonal polynomials. The resulting compressed representations reduce storage complexity from O(N^2) to O(N log N), enabling fast computation and scalable implementation.
Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity
ArXiv.org · 2025-06-18
preprint · Open access · Senior author
We examine the intrinsic (within the attention head) and extrinsic (amongst the attention heads) structure of the self-attention mechanism in transformers. Theoretical evidence for invariance of the self-attention mechanism to softmax activation is obtained by appealing to paradifferential calculus (and is supported by computational examples); this argument relies on the intrinsic organization of the attention heads. Furthermore, we use an existing methodology for hierarchical organization of tensors to examine network structure by constructing hierarchical partition trees with respect to the query, key, and head axes of network 3-tensors. Such an organization is consequential since it allows one to profitably execute common signal processing tasks on a geometry where the organized network 3-tensors exhibit regularity. We exemplify this qualitatively, by visualizing the hierarchical organization of the tree composed of attention heads and the diffusion map embeddings, and quantitatively, by investigating network sparsity with the expansion coefficients of individual attention heads and the entire network with respect to the bi-Haar and tri-Haar bases, respectively, on the space of queries, keys, and heads of the network. To showcase the utility of our theoretical and methodological findings, we provide computational examples using vision and language transformers. The ramifications of these findings are two-fold: (1) a subsequent step in interpretability analysis is theoretically admitted, and can be exploited empirically for downstream interpretability tasks; and (2) one can use the network 3-tensor organization for empirical network applications such as model pruning (by virtue of network sparsity) and network architecture comparison.
From disorganized data to emergent dynamic models: Questionnaires to partial differential equations
PNAS Nexus · 2025-01-21 · 1 citation
article · Open access
Starting with sets of disorganized observations of spatially varying and temporally evolving systems, obtained at different (also disorganized) sets of parameters, we demonstrate the data-driven derivation of parameter dependent, evolutionary partial differential equation (PDE) models capable of generating the data. This tensor type of data is reminiscent of shuffled (multidimensional) puzzle tiles. The independent variables for the evolution equations (their “space” and “time”) as well as their effective parameters are all emergent, i.e. determined in a data-driven way from our disorganized observations of behavior in them. We use a diffusion map based questionnaire approach to build a smooth parametrization of our emergent space/time/parameter space for the data. This approach iteratively processes the data by successively observing them on the “space,” the “time” and the “parameter” axes of a tensor. Once the data become organized, we use machine learning (here, neural networks) to approximate the operators governing the evolution equations in this emergent space. Our illustrative examples are based (i) on a simple advection–diffusion model; (ii) on a previously developed vertex-plus-signaling model of Drosophila embryonic development; and (iii) on two complex dynamic network models (one neuronal and one coupled oscillator model) for which no obvious smooth embedding geometry is known a priori. This allows us to discuss features of the process like symmetry breaking, translational invariance, and autonomousness of the emergent PDE model, as well as its interpretability.
Joint Hierarchical Representation Learning of Samples and Features via Informed Tree-Wasserstein Distance
arXiv (Cornell University) · 2025-01-07
preprint · Open access
High-dimensional data often exhibit hierarchical structures in both modes: samples and features. Yet, most existing approaches for hierarchical representation learning consider only one mode at a time. In this work, we propose an unsupervised method for jointly learning hierarchical representations of samples and features via Tree-Wasserstein Distance (TWD). Our method alternates between the two data modes. It first constructs a tree for one mode, then computes a TWD for the other mode based on that tree, and finally uses the resulting TWD to build the second mode's tree. By repeatedly alternating through these steps, the method gradually refines both trees and the corresponding TWDs, capturing meaningful hierarchical representations of the data. We provide a theoretical analysis showing that our method converges. We show that our method can be integrated into hyperbolic graph convolutional networks as a pre-processing technique, improving performance in link prediction and node classification tasks. In addition, our method outperforms baselines in sparse approximation and unsupervised Wasserstein distance learning tasks on word-document and single-cell RNA-sequencing datasets.
From clutter to clarity: Emergent neural operators via questionnaire metrics
Computers & Chemical Engineering · 2025-06-05
article
Estimating Position-Dependent and Anisotropic Diffusivity Tensors from Molecular Dynamics Trajectories: Existing Methods and Future Outlook
Journal of Chemical Theory and Computation · 2024-05-30 · 9 citations
article
Confinement can substantially alter the physicochemical properties of materials by breaking translational isotropy and rendering all physical properties position-dependent. Molecular dynamics (MD) simulations have proven instrumental in characterizing such spatial heterogeneities and probing the impact of confinement on materials' properties. For static properties, this is a straightforward task and can be achieved via simple spatial binning. Such an approach, however, cannot be readily applied to transport coefficients due to lack of natural extensions of autocorrelations used for their calculation in the bulk. The prime example of this challenge is diffusivity, which, in the bulk, can be readily estimated from the particles' mobility statistics, which satisfy the Fokker-Planck equation. Under confinement, however, such statistics will follow the Smoluchowski equation, which lacks a closed-form analytical solution. This brief review explores the rich history of estimating profiles of the diffusivity tensor from MD simulations and discusses various approximate methods and algorithms developed for this purpose. Besides discussing heuristic extensions of bulk methods, we overview more rigorous algorithms, including kernel-based methods, Bayesian approaches, and operator discretization techniques. Additionally, we outline methods based on applying biasing potentials or imposing constraints on tracer particles. Finally, we discuss approaches that estimate diffusivity from mean first passage time or committor probability profiles, a conceptual framework originally developed in the context of collective variable spaces describing rare events in computational chemistry and biology. In summary, this paper offers a concise survey of diverse approaches for estimating diffusivity from MD trajectories, highlighting challenges and opportunities in this area.

Recent grants

CDS&E/Collaborative Research: The Integration of Data-Mining with Multiscale Engineering Computations
NSF · $475k · 2013–2016
CRCNS: Sensory-Motor Integration in Mammalian Brain: experiment, analysis, modeling
NIH · $1.0M · 2016–2021
Geometric Harmonic Analysis
NSF · $354k · 2005–2009

Frequent coauthors

Ioannis G. Kevrekidis
Johns Hopkins University
66 shared
C. W. Gear
Princeton University
40 shared
Yves Meyer
40 shared
Anastasia Georgiou
Johns Hopkins University
38 shared
Eliodoro Chiavazzo
Polytechnic University of Turin
37 shared
Roberto Covino
Frankfurt Institute for Advanced Studies
37 shared
Gerhard Hummer
37 shared
Harlan M. Krumholz
Yale New Haven Health System
35 shared

Awards & honors

National Medal of Science
National Academy of Sciences
American Academy of Arts and Sciences
