Andrew Owens

Verified

University of Michigan · Computer Science

Active 2008–2025

h-index: 30
Citations: 8.1k
Papers: 83 (45 last 5y)

About

Andrew Owens is an associate professor of computer science at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science. His research aims to create multimodal systems that learn to see, hear, and touch without human-labeled training data. Instead, these systems learn from co-occurring sensory signals, such as the correlations between the visual and audio streams of a video. His work has enabled applications that include producing soundtracks for silent videos, robotic manipulation with vision and touch, detecting AI-generated images, and generating visual illusions. Owens is a recipient of a Sloan Research Fellowship and an NSF CAREER Award. Prior to joining Cornell, he was an assistant professor at the University of Michigan and a postdoctoral scholar at the University of California, Berkeley. He received a Ph.D. in computer science from the Massachusetts Institute of Technology in 2016 and a B.A. in computer science from Cornell University in 2010.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Computer Vision

Selected publications

  • Investigating Screen Complexity in the Vehicle, How Does Design Effect Driver Attention and Task Performance?

    SSRN Electronic Journal · 2025-01-01

    preprint · Open access
  • GPS as a Control Signal for Image Generation

    2025-06-10 · 2 citations

    article · Senior author

    We show that the GPS tags contained in photo metadata provide a useful control signal for image generation. We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city. In particular, we train a diffusion model to generate images conditioned on both GPS and text. The learned model generates images that capture the distinctive appearance of different neighborhoods, parks, and landmarks. We also extract 3D models from 2D GPS-to-image models through score distillation sampling, using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint. Our evaluations suggest that our GPS-conditioned models successfully learn to generate images that vary based on location, and that GPS conditioning improves estimated 3D structure.

  • Contrastive Touch-to-Touch Pretraining

    2025-05-19 · 1 citation

    article

    Today's tactile sensors have a variety of different designs, making it challenging to develop general-purpose methods for processing touch signals. In this paper, we learn a unified representation that captures the shared information between different tactile sensors. Unlike current approaches that focus on reconstruction or task-specific supervision, we leverage contrastive learning to integrate tactile signals from two different sensors into a shared embedding space, using a dataset in which the same objects are probed with multiple sensors. We apply this approach to paired touch signals from GelSlim and Soft Bubble sensors. We show that our learned features provide strong pretraining for downstream pose estimation and classification tasks. We also show that our embedding enables models trained using one touch sensor to be deployed using another without additional training. Project details can be found at https://www.mmintlab.com/research/cttp/.

  • Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

    2025-06-10 · 3 citations

    article · Senior author

    One of the key challenges of detecting AI-generated images is spotting images that have been created by previously unseen generative models. We argue that the limited diversity of the training data is a major obstacle to addressing this problem, and we propose a new dataset that is significantly larger and more diverse than prior work. As part of creating this dataset, we systematically download thousands of text-to-image latent diffusion models and sample images from them. We also collect images from dozens of popular open source and commercial models. The resulting dataset contains 2.7M images that have been sampled from 4803 different models. These images collectively capture a wide range of scene content, generator architectures, and image processing settings. Using this dataset, we study the generalization abilities of fake image detectors. Our experiments suggest that detection performance improves as the number of models in the training set increases, even when these models have similar architectures. We also find that increasing the diversity of the models improves detection performance, and that our trained detectors generalize better than those trained on other datasets. The dataset can be found at https://jespark.net/projects/2024/community_forensics

  • Supervising Sound Localization by In-the-wild Egomotion

    2025-06-10

    article · Senior author

    We present a method for learning binaural sound localization using egomotion as a supervisory signal. Over the course of a video, the camera’s direction to a sound source will change as the camera moves. We train an audio model to predict sound directions that are consistent with visual estimates of camera motion, which we obtain using traditional methods from multi-view geometry. This provides a weak but plentiful form of supervision that we combine with traditional binaural cues. To evaluate this method, we propose a dataset of real-world audio-visual videos with egomotion. We show that our model can successfully learn from real-world data and that it performs well on sound localization tasks.

  • Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

    2024-06-16 · 13 citations

    article · Senior author

    We address the problem of synthesizing multi-view optical illusions: images that change appearance upon a transformation, such as a flip or rotation. We propose a simple, zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models. During the reverse diffusion process, we estimate the noise from different views of a noisy image, and then combine these noise estimates together and denoise the image. A theoretical analysis suggests that this method works precisely for views that can be written as orthogonal transformations, of which permutations are a subset. This leads to the idea of a visual anagram: an image that changes appearance under some rearrangement of pixels. This includes rotations and flips, but also more exotic pixel permutations such as a jigsaw rearrangement. Our approach also naturally extends to illusions with more than two views. We provide both qualitative and quantitative results demonstrating the effectiveness and flexibility of our method. Please see our project webpage for additional visualizations and results: https://dangeng.github.io/visual_anagrams/.

  • Efficient Vision-Language Pre-Training by Cluster Masking

    2024-06-16 · 4 citations

    article · Senior author

    We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed. During each iteration of training, we randomly mask clusters of visually similar image patches, as measured by their raw pixel intensities. This provides an extra learning signal, beyond the contrastive training itself, since it forces a model to predict words for masked visual structures solely from context. It also speeds up training by reducing the amount of data used in each image. We evaluate the effectiveness of our model by pre-training on a number of benchmarks, finding that it outperforms other masking strategies, such as FLIP, on the quality of the learned representation.

  • MaPP Puzzle Hunt

    Math Horizons · 2024-08-29

    article · Senior author
  • Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

    arXiv (Cornell University) · 2024-11-06

    preprint · Open access · Senior author

    One of the key challenges of detecting AI-generated images is spotting images that have been created by previously unseen generative models. We argue that the limited diversity of the training data is a major obstacle to addressing this problem, and we propose a new dataset that is significantly larger and more diverse than prior work. As part of creating this dataset, we systematically download thousands of text-to-image latent diffusion models and sample images from them. We also collect images from dozens of popular open source and commercial models. The resulting dataset contains 2.7M images that have been sampled from 4803 different models. These images collectively capture a wide range of scene content, generator architectures, and image processing settings. Using this dataset, we study the generalization abilities of fake image detectors. Our experiments suggest that detection performance improves as the number of models in the training set increases, even when these models have similar architectures. We also find that detection performance improves as the diversity of the models increases, and that our trained detectors generalize better than those trained on other datasets. The dataset can be found at https://jespark.net/projects/2024/community_forensics

  • Contrastive Touch-to-Touch Pretraining

    arXiv (Cornell University) · 2024-10-15

    preprint · Open access

Frequent coauthors

  • Alexei A. Efros

    34 shared
  • Sheng-Yu Wang

    Carnegie Mellon University

    13 shared
  • William T. Freeman

    11 shared
  • Shiry Ginosar

    10 shared
  • Przemysław Prusinkiewicz

    University of Calgary

    8 shared
  • Mikolaj Cieslak

    University of Calgary

    8 shared
  • Oliver Wang

    Adobe Systems (United States)

    8 shared
  • Richard Zhang

    8 shared

Awards & honors

  • Sloan Research Fellowship
  • NSF CAREER Award