Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Top matches · Balanced preset
  • Dr. Sarah Chen · Stanford · Interpretability · NLP · score 91
  • Dr. Marcus Holloway · MIT · Robotics · RL · score 84
  • Dr. Aisha Okonkwo · CMU · Fairness · HCI · score 82
Nova · Professor Researcher · re-ranking top 20…

Ashok Cutkosky

Assistant Professor, Electrical & Computer Engineering · Affiliated Faculty, Computer Science · Verified

Boston University · Computer Science

Active 2009–2025

h-index: 12
Citations: 2.1k
Papers: 95 (59 in the last 5 years)

About

Ashok Cutkosky is an assistant professor in the Electrical and Computer Engineering (ECE) department at Boston University. Before joining Boston University, he was a research scientist at Google. He earned his PhD in computer science from Stanford University in 2018 under the supervision of Kwabena Boahen, and an AB in mathematics from Harvard University in 2013; he also holds a master's degree in medicine. His research focuses on optimization algorithms for machine learning, with recent work on non-convex optimization and adaptive online learning. He has published extensively in these areas and has taught courses on machine learning and optimization at Boston University.

Research topics

  • Computer Science
  • Statistics
  • Algorithm
  • Artificial Intelligence
  • Mathematics
  • Machine Learning
  • Mathematical analysis
  • Theoretical computer science
  • Physics
  • Mathematical optimization

Selected publications

  • Pretraining Improves Prediction of Genomic Datasets Across Species

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-08-24

    Preprint · Open access · Senior author · Corresponding

    Recent studies suggest that deep neural network models trained on thousands of human genomic datasets can accurately predict genomic features, including gene expression and chromatin accessibility. However, training these models is computation- and time-intensive, and datasets of comparable size do not exist for most other organisms. Here, we identify modifications to an existing state-of-the-art model that improve model accuracy while reducing training time and computational cost. Using this streamlined model architecture, we investigate the ability of models pretrained on human genomic datasets to transfer performance to a variety of different tasks. Models pretrained on human data but fine-tuned on genomic datasets from diverse tissues and species achieved significantly higher prediction accuracy while significantly reducing training time compared to models trained from scratch, with Pearson correlation coefficients between experimental results and predictions as high as 0.8. Further, we found that including excessive training tasks decreased model performance and that this compromised performance could be partially but not completely rescued by fine-tuning. Thus, simplifying model architecture, applying pretrained models, and carefully considering the number of training tasks may be effective and economical techniques for building new models across data types, tissues, and species.

  • Unconstrained Robust Online Convex Optimization

    ArXiv.org · 2025-06-15

    Preprint · Open access · Senior author

    This paper addresses online learning with "corrupted" feedback. Our learner is provided with potentially corrupted gradients $\tilde g_t$ instead of the "true" gradients $g_t$. We make no assumptions about how the corruptions arise: they could be the result of outliers, mislabeled data, or even malicious interference. We focus on the difficult "unconstrained" setting in which our algorithm must maintain low regret with respect to any comparison point $u \in \mathbb{R}^d$. The unconstrained setting is significantly more challenging as existing algorithms suffer extremely high regret even with very tiny amounts of corruption (which is not true in the case of a bounded domain). Our algorithms guarantee regret $\|u\|G(\sqrt{T} + k)$ when $G \ge \max_t \|g_t\|$ is known, where $k$ is a measure of the total amount of corruption. When $G$ is unknown we incur an extra additive penalty of $(\|u\|^2 + G^2)k$.

  • Adaptive bandit algorithms increase efficiency of mobile tuberculosis screening programs

    Scientific Reports · 2025-12-08

    Article · Open access

    Community-based tuberculosis screening using mobile X-ray units can effectively increase case detection rates by reducing barriers to accessing services. This study evaluated the multi-armed bandit (MAB) framework, a machine learning approach, for optimizing mobile screening locations. Using simulations, we compared two MAB algorithms, Exp3 and LinUCB, with strategies based on historical case rates and random placement. The MAB algorithms continually updated site selection based on observed screening yields, and LinUCB additionally incorporated local socioeconomic indicators associated with tuberculosis rates. Across 1,000 simulations of a three-year program with two mobile units serving 95 sites in Lima, Peru, the MAB algorithms significantly reduced the average number of screenings needed to detect one individual with tuberculosis: 112 (standard deviation [SD]: 10) for Exp3 and 79 (SD: 12) for LinUCB, versus 152 (SD: 11) for random placement and 143 (SD: 11) for historic case-rate-driven placement. LinUCB performed best, achieving a 20% increase in detection efficiency by week 16 and 50% by week 40 compared to case-rate-driven placement. Overall, both MAB algorithms improved tuberculosis screening yields, emphasizing the value of data-driven approaches for optimizing mobile screening interventions. Incorporating adaptive models into screening programs may enhance targeting efficiency and offers a promising direction for policymakers and implementers seeking to optimize resource allocation in high-burden settings.

  • Fully Unconstrained Online Learning

    2024-01-01 · 1 citation

    Article · 1st author · Corresponding

  • Fully Unconstrained Online Learning

    arXiv (Cornell University) · 2024-05-30

    Preprint · Open access · 1st author · Corresponding

    We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$. Importantly, this matches the optimal bound $G\|w_\star\|\sqrt{T}$ available with such knowledge (up to logarithmic factors), unless either $\|w_\star\|$ or $G$ is so large that even $G\|w_\star\|\sqrt{T}$ is roughly linear in $T$. Thus, it matches the optimal bound in all cases in which one can achieve sublinear regret, which arguably covers most "interesting" scenarios.

  • General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

    arXiv (Cornell University) · 2024-11-11

    Preprint · Open access · Senior author

    This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable empirical success in training neural networks. Specifically, we show that schedule-free SGD achieves optimal iteration complexity for nonsmooth, nonconvex optimization problems. Our proof begins with the development of a general framework for online-to-nonconvex conversion, which converts a given online learning algorithm into an optimization algorithm for nonconvex losses. Our general framework not only recovers existing conversions but also leads to two novel conversion schemes. Notably, one of these new conversions corresponds directly to schedule-free SGD, allowing us to establish its optimality. Additionally, our analysis provides valuable insights into the parameter choices for schedule-free SGD, addressing a theoretical gap that the convex theory cannot explain.

  • Random Scaling and Momentum for Non-smooth Non-convex Optimization

    arXiv (Cornell University) · 2024-05-16 · 1 citation

    Preprint · Open access · Senior author

    Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth. Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which classical analysis applies only if the loss is either convex or smooth. We show that a very small modification to SGDM closes this gap: simply scale the update at each time point by an exponentially distributed random scalar. The resulting algorithm achieves optimal convergence guarantees. Intriguingly, this result is not derived by a specific analysis of SGDM: instead, it falls naturally out of a more general framework for converting online convex optimization algorithms to non-convex optimization algorithms.

  • The Road Less Scheduled

    arXiv (Cornell University) · 2024-05-24 · 2 citations

    Preprint · Open access · Senior author

    Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyperparameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available at https://github.com/facebookresearch/schedule_free. Schedule-Free AdamW is the core algorithm behind our winning entry to the MLCommons 2024 AlgoPerf Algorithmic Efficiency Challenge Self-Tuning track.

  • Online Linear Regression in Dynamic Environments via Discounting

    arXiv (Cornell University) · 2024-05-29 · 1 citation

    Preprint · Open access · Senior author

    We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees even in the complete absence of prior knowledge. We present a novel analysis showing that a discounted variant of the Vovk-Azoury-Warmuth forecaster achieves dynamic regret of the form $R_{T}(\vec{u}) \le O\left(d\log(T) \vee \sqrt{d P_{T}^{\gamma}(\vec{u}) T}\right)$, where $P_{T}^{\gamma}(\vec{u})$ is a measure of variability of the comparator sequence, and show that the discount factor achieving this result can be learned on-the-fly. We show that this result is optimal by providing a matching lower bound. We also extend our results to strongly-adaptive guarantees which hold over every sub-interval $[a,b]\subseteq[1,T]$ simultaneously.

  • Adam with model exponential moving average is effective for nonconvex optimization

    2024-01-01 · 6 citations

    Article · Senior author
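
Two of the abstracts above (The Road Less Scheduled, and the online-to-nonconvex conversion paper) concern schedule-free SGD, which keeps a constant step size and replaces the learning-rate schedule with iterate averaging. The sketch below shows the basic update as it is commonly written: an SGD sequence z, its uniform running average x, and a gradient evaluation point y that interpolates between them. It is a minimal illustration assuming plain uniform averaging with no warmup; the function name, the default `lr` and `beta`, and the quadratic test objective are hypothetical, and this is not the open-source implementation linked in the abstract.

```python
from typing import Callable, List

Vector = List[float]


def schedule_free_sgd(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    lr: float = 0.1,
    beta: float = 0.9,
    steps: int = 100,
) -> Vector:
    """Sketch of basic schedule-free SGD: constant lr, no learning-rate schedule.

    z: the plain SGD sequence
    x: uniform running average of the z iterates (this is what is returned)
    y: interpolation of z and x, where the gradient is evaluated
    """
    z = list(x0)
    x = list(x0)
    for t in range(1, steps + 1):
        # Gradient is taken at y, not at x or z.
        y = [(1.0 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        # Ordinary SGD step on the base sequence z with a constant step size.
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        # Online uniform averaging: x becomes the mean of z_1, ..., z_{t+1}.
        c = 1.0 / (t + 1)
        x = [(1.0 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x


if __name__ == "__main__":
    # Hypothetical test problem: f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    print(schedule_free_sgd(lambda w: list(w), [1.0, -2.0], steps=500))
```

With `beta = 0` the gradient is taken at z, so z is plain SGD and x is its Polyak-Ruppert average; with `beta = 1` the gradient is taken at the average x itself (primal averaging). The default of 0.9 sits between the two.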

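The Random Scaling and Momentum for Non-smooth Non-convex Optimization abstract describes its change to SGD with momentum as scaling the update at each step by an exponentially distributed random scalar. A minimal sketch of that idea follows, assuming the common exponential-moving-average form of momentum and a scale s drawn from Exponential(1), so that s has mean 1. The function names, step size, and the non-smooth test loss sum_i |w_i| are illustrative choices, and the paper's exact update order and output rule may differ.

```python
import random
from typing import Callable, List

Vector = List[float]


def random_scaled_sgdm(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    lr: float = 0.01,
    beta: float = 0.9,
    steps: int = 1000,
    seed: int = 0,
) -> Vector:
    """SGD with momentum where every update is multiplied by s ~ Exponential(1)."""
    rng = random.Random(seed)
    x = list(x0)
    m = [0.0] * len(x)  # momentum: exponential moving average of (sub)gradients
    for _ in range(steps):
        g = grad(x)
        m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
        s = rng.expovariate(1.0)  # random scale with mean 1 (assumed rate 1)
        x = [xi - s * lr * mi for xi, mi in zip(x, m)]
    return x


def abs_subgradient(w: Vector) -> Vector:
    """A subgradient of the non-smooth test loss f(w) = sum_i |w_i|."""
    return [1.0 if wi > 0 else -1.0 if wi < 0 else 0.0 for wi in w]


if __name__ == "__main__":
    print(random_scaled_sgdm(abs_subgradient, [5.0, -3.0], steps=2000))
```

Because s has mean 1 and is drawn independently of the momentum, the expected step, conditional on the current momentum, equals the plain SGDM step; the randomness only changes how far along each update direction the next iterate lands.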
Education

  • Ph.D., Computer Science

    Stanford University

    2018
  • B.A., Mathematics

    Harvard

    2013