
Florentina Bunea
Professor of Statistics and Data Science · Cornell University · Computer Science
Active 2003–2025
About
Florentina Bunea is a professor in the Department of Statistics and Data Science at Cornell University and a member of the Graduate Fields of Statistics, Applied Mathematics, and Computer Science. She is actively involved in promoting workforce diversity in data science disciplines as a member of the Diversity and Inclusion Council of the Bowers Computing and Information Science College. Her research broadly focuses on statistical machine learning theory and high-dimensional statistical inference, with an emphasis on developing new methodologies supported by rigorous theoretical foundations to address various problems in data science. Recent projects include work on estimation and theory for soft-max mixtures to better understand LLM/AI algorithms, optimal transport for high-dimensional mixture distributions, inference for the Wasserstein distance between sparse mixing measures in topic models, high-dimensional latent-space clustering, cluster-based inference, network modeling, and inference in high-dimensional models with hidden latent structures and topic models. She maintains a strong interest in model selection, sparsity, and dimension reduction in high-dimensional settings, with applications spanning genetics, systems immunology, neuroscience, sociology, and economics. Her research is partially funded by the National Science Foundation (NSF-DMS). Professor Bunea is a Fellow of the Institute of Mathematical Statistics (IMS) and an IMS Medallion Award recipient. She has served or is currently serving as an Associate Editor for several prestigious journals including the Annals of Statistics, Bernoulli, JASA, JRSS-B, EJS, and the Annals of Applied Statistics, and she is a co-editor for the Chapman and Hall Statistics and Applied Probability Monograph Series.
Research topics
- Computer Science
- Statistics
- Artificial Intelligence
- Machine Learning
- Data Mining
- Combinatorics
- Mathematical optimization
- Algorithm
- Mathematics
- Discrete mathematics
- Applied mathematics
- Biology
Selected publications
ArXiv.org · 2025-05-07
preprint · Open access
This work proposes new estimators for discrete optimal transport plans that enjoy Gaussian limits centered at the true solution. This behavior stands in stark contrast with the performance of existing estimators, including those based on entropic regularization, which are asymptotically biased and only satisfy a CLT centered at a regularized version of the population-level plan. We develop a new regularization approach based on a different class of penalty functions, which can be viewed as the duals of those previously considered in the literature. The key feature of these penalty schemes is that they give rise to preliminary estimates that are asymptotically linear in the penalization strength. Our final estimator is obtained by constructing an appropriate linear combination of two penalized solutions corresponding to two different tuning parameters so that the bias introduced by the penalization cancels out. Unlike classical debiasing procedures, therefore, our proposal entirely avoids the delicate problem of estimating and then subtracting the estimated bias term. Our proofs, which apply beyond the case of optimal transport, are based on a novel asymptotic analysis of penalization schemes for linear programs. As a corollary of our results, we obtain the consistency of the naive bootstrap for fully data-driven inference on the true optimal solution. Simulation results and two data analyses strongly support the benefits of our approach relative to existing techniques.
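The bias-cancellation idea in this abstract can be illustrated with a toy calculation. The sketch below is not the paper's estimator: it assumes a hypothetical scalar estimate whose bias is exactly linear in the penalization strength, and the names `penalized_estimate` and `debiased_combination` are illustrative only.

```python
def penalized_estimate(theta, slope, lam):
    """Hypothetical penalized estimate: the truth plus a bias linear in lam."""
    return theta + slope * lam


def debiased_combination(est1, lam1, est2, lam2):
    """Combine two penalized estimates so that a bias linear in lam cancels.

    The weights sum to one and satisfy w1 * lam1 + w2 * lam2 = 0, so the
    linear bias term drops out without ever being estimated.
    """
    if lam1 == lam2:
        raise ValueError("the two tuning parameters must differ")
    w1 = lam2 / (lam2 - lam1)
    w2 = -lam1 / (lam2 - lam1)
    return w1 * est1 + w2 * est2


# Toy values (assumptions, not taken from the paper).
theta, slope = 3.0, 5.0
lam1, lam2 = 0.1, 0.2
combined = debiased_combination(
    penalized_estimate(theta, slope, lam1), lam1,
    penalized_estimate(theta, slope, lam2), lam2,
)
```

In this idealized setting `combined` recovers `theta` up to floating-point error, even though the slope of the bias is never used by `debiased_combination`.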
Harnessing Waste Heat from Air Conditioning Units with Thermoelectric Generators
Lecture notes in civil engineering · 2025-11-22
book-chapter · Senior author
Energy Recovery from Residential Air Conditioning Systems with a Thermoelectric Generator
2025-10-23
article
This study investigates a novel approach for recovering waste heat from air conditioning (AC) systems using thermoelectric generators (TEGs). A custom-designed heat recovery system, incorporating a thermoelectric module and two heat exchangers, was integrated into an AC unit to harness residual thermal energy. Experimental tests were performed in a thermally controlled environment under three operating temperatures: 18 °C, 22 °C, and 24 °C. The performance of the TEG system was evaluated by measuring the electrical power output generated from the temperature gradient across the modules. The results show that the highest power output, approximately 3.3 W, was obtained when the AC unit operated at 24 °C. These findings demonstrate the feasibility of using TEGs for energy recovery in residential AC systems and provide a basis for future optimization and integration strategies.
SLIDE: Significant Latent Factor Interaction Discovery and Exploration across biological domains
Nature Methods · 2024 · 25 citations
- Computer Science
- Machine Learning
Statistics and Learning Theory in the Era of Artificial Intelligence
Oberwolfach Reports · 2024-11-25
article · Open access · 1st author · Corresponding
The workshop highlighted recent theoretical advances on inference in high-dimensional statistical models based on the interplay of techniques from mathematical statistics, machine learning, theoretical computer science and related areas. The workshop brought together about 50 researchers in order to present new results, exchange ideas and explore open problems.
Water Sterilization using Ultraviolet Light-Emitting Diodes
Electrotehnica Electronica Automatica · 2024-03-15
article
In order to provide adequate living conditions, it is crucial for each building to have access to essential utilities such as water, electricity, and sewage systems. In some regions, they may experience isolation, but they can still establish connections to distribution networks and fulfill basic living requirements. In such cases, local solutions like wells and septic tanks are implemented. The water sourced from these sources contains a multitude of viruses and bacteria that can have a detrimental effect on the health of anyone who consumes it. Numerous water filtration/sterilization options are available on the market, but they come with significant installation, maintenance, and operating expenses. The use of ultraviolet C (UVC) rays generated by mercury lamps for sterilization by irradiation is becoming increasingly popular due to its cost-effectiveness in terms of installation and maintenance. The main drawbacks of this technique are the high energy consumption and the potential danger of mercury exposure. This article describes the processes for designing, making, and testing two water filtration probes that use light-emitting diodes (LEDs) to emit UVC rays. This solution lowers energy consumption, eliminates the risk of mercury contamination, and leads to a decrease in maintenance costs, as the lifespan of diodes is longer than that of mercury vapor lamps. The two probes have LEDs that emit at a wavelength of 275 nm, with a total radiant flux of 12 mW and 100 mW, respectively. Biological tests were carried out in the laboratory to assess the effects of these probes on an artificially contaminated water sample. The results obtained are satisfactory and comparable to those of sterilization devices with LED lamps.
Learning large softmax mixtures with warm start EM
arXiv (Cornell University) · 2024-09-16
preprint · Open access
Softmax mixture models (SMMs) are discrete $K$-mixtures introduced to model the probability of choosing an attribute $x_j \in \mathbb{R}^L$ from $p$ candidates, in heterogeneous populations. They have been known as mixed multinomial logits in the econometrics literature, and are gaining traction in the LLM literature, where single softmax models are routinely used in the final layer of a neural network. This paper provides a comprehensive analysis of the EM algorithm for SMMs in high dimensions. Its population-level theoretical analysis forms the basis for proving (i) local identifiability in SMMs with generic features and, further, via a stochastic argument, (ii) full identifiability in SMMs with random features, when $p$ is large enough. These are the first results in this direction for SMMs with $L > 1$. The population-level EM analysis characterizes the initialization radius for algorithmic convergence. This also guides the construction of warm starts of the sample-level EM. Under suitable initialization, the EM algorithm is shown to recover the mixture atoms of the SMM at near-parametric rate. We provide two main directions for warm start construction, both based on a new method for estimating the moments of the mixing measure underlying an SMM with random design. First, we construct a method of moments (MoM) estimator of the mixture parameters, and provide its first theoretical analysis. While MoM can enjoy parametric rates of convergence, and thus can serve as a warm start, the estimator's quality degrades exponentially in $K$. Our recommendation, when $K$ is not small, is to run the EM algorithm several times with random initializations. We again make use of the novel latent moments estimation method to estimate the $K$-dimensional subspace of the mixture atoms. Sampling from this subspace substantially reduces the number of required draws.
ANALYTICAL MODEL AND NUMERICAL FINITE ELEMENT MODEL FOR A SUBMERSIBLE SYNCHRONOUS HYDROGENERATOR
Journal of Science and Arts · 2023-03-27
article · Open access
The main objective of the paper is to demonstrate the functionality and reliability of a hydrokinetic turbine system – a submersible electric generator, suitable for very low watercourses. To achieve this goal, it is necessary to design and build a prototype of a hydrokinetic turbine, coupled with an electric generator with excitation made with permanent magnets. This paper presents an analytical model and a finite element numerical model for estimating the electrical parameters of the generator (voltage, current, power, etc.). The two models were compared, and the results were extrapolated, in the analytical model, for several speed values.
Experimental Study of a Small-Scale Axial Hydrokinetic Turbine with Adjustable Blade Pitch
2023-10-26 · 5 citations
article
A small-scale axial hydrokinetic turbine (HKT) with a runner having 3 blades with adjustable pitch and a diameter of 0.2 m was designed and tested to evaluate the optimum relationship between its power coefficient and its blade tip speed ratio (TSR). The design was carried out for a water velocity of 0.8 m/s and was based on the Blade Element Momentum Theory. The turbine was built by 3D printing and tested in a free surface water channel for water velocities between 0.8 and 1.1 m/s at three different blade pitch angles. The speed and torque at the turbine shaft were measured. The results of the experimental tests are encouraging and in good agreement with the literature and show that for harvesting hydrokinetic energy for power generation, fast HKTs with 3 thinner blades are more suitable than slower designs with wider blades, as the former allow a reduction in the size and cost of the electrical generator.
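The two quantities this abstract relates, tip speed ratio and power coefficient, follow from the measured shaft speed and torque through their standard textbook definitions. The sketch below is a minimal illustration, not the authors' data processing: the function names are hypothetical, the water density of 1000 kg/m^3 is an assumption, and the shaft speed and torque in the usage lines are made-up values. Only the 0.2 m diameter and 0.8 m/s velocity come from the abstract.

```python
import math

WATER_DENSITY = 1000.0  # kg/m^3, assumed value for fresh water


def tip_speed_ratio(omega_rad_s, rotor_diameter_m, water_velocity_m_s):
    """TSR = blade tip speed / free-stream water velocity."""
    return omega_rad_s * (rotor_diameter_m / 2.0) / water_velocity_m_s


def power_coefficient(torque_nm, omega_rad_s, rotor_diameter_m,
                      water_velocity_m_s, density=WATER_DENSITY):
    """Cp = shaft power / kinetic power of the flow through the swept area."""
    swept_area = math.pi * (rotor_diameter_m / 2.0) ** 2
    shaft_power = torque_nm * omega_rad_s
    available_power = 0.5 * density * swept_area * water_velocity_m_s ** 3
    return shaft_power / available_power


# Diameter and velocity from the abstract; shaft speed and torque are hypothetical.
tsr = tip_speed_ratio(omega_rad_s=32.0, rotor_diameter_m=0.2, water_velocity_m_s=0.8)
cp = power_coefficient(torque_nm=0.1, omega_rad_s=32.0,
                       rotor_diameter_m=0.2, water_velocity_m_s=0.8)
```

Evaluating both functions over a sweep of measured shaft speeds at a fixed pitch angle would trace out one power coefficient versus TSR curve of the kind the study compares.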
ENERGY HARVESTING USING THERMOELECTRIC GENERATORS - CASE STUDY
International Multidisciplinary Scientific GeoConference SGEM ... · 2023-10-01 · 2 citations
article · Senior author
Nowadays, there is increased interest in making use of energy that often goes unused. This practice is called energy harvesting. Thermoelectric generators are often used to "harvest" thermal energy and transform it into electrical energy. The preliminary study presented in this paper aims to use thermoelectric elements to capture the thermal energy released outdoors by air conditioning equipment. The results of experimental research showed a mean temperature difference (Δt) of 50 °C between the hot and cold sources. The energy that can be generated with this temperature gradient, according to the technical data of a specified thermoelectric module, is around 1 W, and it can be raised by installing more modules.
Recent grants
Statistical Foundations of Model-Based Variable Clustering
NSF · $250k · 2017–2020
Curve aggregation and classification
NSF · $161k · 2004–2007
Matrix estimation under rank constraints for complete and incomplete noisy data
NSF · $222k · 2010–2012
Learning from Hidden Signatures in High-Dimensional Models
NSF · $300k · 2020–2023
Matrix estimation under rank constraints for complete and incomplete noisy data
NSF · $220k · 2011–2014
Frequent coauthors
- 88 shared
Marten Wegkamp
- 38 shared
Adrian Barbu
Florida State University
- 26 shared
Alexandre B. Tsybakov
- 25 shared
Adrian Nedelcu
- 21 shared
Xin Bing
- 19 shared
Corina Alice Băbuţanu
Institutul National de Cercetare-Dezvoltare pentru Inginerie Electrica ICPE-CA
- 19 shared
Gabriel Dan Ciocan
Université Laval
- 17 shared
Paul Dancă
Education
PhD, Hydraulics, Hydraulic Machinery and Environmental Engineering
University Politehnica of Bucharest
Awards & honors
- IMS Medallion Award