
Bharath Hariharan
Verified · Cornell University · Computer Science
Active 2002–2026
About
Bharath Hariharan is an associate professor in Computer Science at Cornell University specializing in computer vision and machine learning. His research focuses on challenging problems that do not fit the traditional "Big Data" paradigm, emphasizing the integration of advances in machine learning with insights from computer vision, geometry, and domain-specific knowledge. His work addresses fundamental challenges such as recognition on satellite images, which is critical for environmental and earth sciences but complicated by the lack of large labeled datasets. To tackle these challenges, his group has developed one of the most accurate foundation vision-language models for satellite images and novel self-supervised representations for this domain. In addition to satellite image recognition, Hariharan's research explores 4D reconstruction and recognition, aiming to understand dynamic scenes and long-term changes in the environment. His group investigates novel tracking formulations that follow individual pixels through long-term occlusions and objects through state changes, as well as new benchmarks and architectures for multimodal video understanding. His research has been recognized with prestigious awards including an NSF CAREER award and a PAMI Young Researcher Award, reflecting his significant contributions to the fields of computer vision and machine learning.
Research topics
- Artificial Intelligence
- Computer Science
- Remote sensing
- Computer vision
- Geography
Selected publications
Swin-HViT for Accurate Early-Stage Crop Disease Diagnosis Using a Hybrid Transformer Model
Research Square · 2026-02-16
preprint · Open access
Color Bind: Exploring Color Perception in Text-to-Image Models
2026-03-06
article
Text-to-image generation has recently seen remarkable success, giving users the ability to create high-quality images from text. However, contemporary methods struggle to capture the precise semantics conveyed by complex multi-object prompts. Consequently, many works have sought to mitigate such semantic misalignments, typically via inference-time schemes that modify the attention layers of the denoising networks. However, prior work has mostly utilized coarse metrics, such as the cosine similarity between text and image CLIP embeddings, or human evaluations, which are challenging to conduct at larger scale. In this work, we perform a case study on colors, a fundamental attribute commonly associated with objects in text prompts, which offer a rich test bed for rigorous evaluation. Our analysis reveals that pretrained models struggle to generate images that faithfully reflect multiple color attributes, far more so than with single-color prompts, and that neither inference-time techniques nor existing editing methods reliably resolve these semantic misalignments. Accordingly, we introduce a dedicated image editing technique, mitigating the issue of multi-object semantic alignment for prompts containing multiple colors. We demonstrate that our approach significantly boosts performance over a wide range of metrics, considering images generated by various text-to-image diffusion-based techniques. Our code, benchmark, and evaluation protocol are publicly available on our project webpage.
3D Synthesis for Architectural Design
2025-02-26
article · Senior author
We introduce a 3D synthesis method for architectural design to allow for the efficient generation of diverse and realistic building designs. In spite of advances in 3D synthesis, current off-the-shelf 3D synthesis techniques are inappropriate for architectural design: they are trained primarily on isolated objects, have limited diversity, blend building facades with background and produce overly complex geometry that is difficult to edit or manipulate, a major issue in an iterative design process. We propose an alternative pipeline that integrates auto-generated coarse models with segment-wise texture inpainting and semantics-based editing, resulting in diverse, style-consistent, and shape-precise designs. We show through qualitative and quantitative experiments that our pipeline generates more diverse, visually appealing architectures with clean geometries without the need for any extensive training. Project page: https://itingtsai.github.io/syn_arch_2025/
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
2025-06-10
article
Visual data is used in numerous scientific workflows, ranging from remote sensing to ecology. As the amount of observation data increases, the challenge is not just to make accurate predictions but also to understand the underlying mechanisms for those predictions. Good interpretation is important in scientific workflows, as it allows for better decision-making by providing insights into the data. This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution), an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data. Additionally, we propose two improvements, a program critic and a program simplifier, that further help our method synthesize good programs. On three different real-world problems, DiSciPLE learns state-of-the-art programs on novel tasks with no prior literature. For example, we can learn programs with 35% lower error than the closest non-interpretable baseline for population density estimation.
Task Scheduling in Cloud Computing Using Fossa Optimization Algorithm
2025-06-05
article · Senior author
Cloud computing has gained popularity due to its high-performance distributed computation and its ability to be accessed from anywhere in the world through online resources. Through cloud service providers and the Internet, it gives users access to shared computing resources. The efficient functioning of task scheduling in clouds is one of the most important research topics that have to be addressed. First, traditional scheduling methods have been used to assess the effectiveness of task scheduling algorithms for cloud computing. Traditional scheduling algorithms perform poorly as a result of the growing number of activities generated. In this study, we suggest a newly created Fossa Optimization method for job scheduling in cloud computing environments. When compared to other meta-heuristic algorithms such as Ant Colony Optimization (ACO), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Lion Optimization algorithm, the Fossa Optimization method’s advantage is its high rate of convergence. In this paper, we suggest a novel task-scheduling algorithm for cloud computing that is based on the Fossa Optimization Algorithm (FOA). The Fossa Optimization Algorithm (FOA) is a meta-heuristic optimization method inspired by nature that imitates the Fossa’s hunting and movement patterns. When compared to other scheduling algorithms, it performed exceptionally well in terms of minimizing the wait time and costs, as well as resource use and ensuring that workloads are distributed in a balanced manner.
MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing
ArXiv.org · 2025-07-22
preprint · Open access · Senior author
Natural disasters cause devastating damage to communities and infrastructure every year. Effective disaster response is hampered by the difficulty of accessing affected areas during and after events. Remote sensing has allowed us to monitor natural disasters from a distance. More recently, advances in computer vision and deep learning have helped automate satellite imagery analysis. However, they remain limited by their narrow focus on specific disaster types, reliance on manual expert interpretation, and lack of datasets with sufficient temporal granularity or natural language annotations for tracking disaster progression. We present MONITRS, a novel multimodal dataset of more than 10,000 FEMA disaster events with temporal satellite imagery and natural language annotations from news articles, accompanied by geotagged locations and question-answer pairs. We demonstrate that fine-tuning existing MLLMs on our dataset yields significant performance improvements for disaster monitoring tasks, establishing a new benchmark for machine learning-assisted disaster response systems. Code can be found at: https://github.com/ShreelekhaR/MONITRS
An Overview of the Significance of Cloud Computing in the Realm of the Metaverse
Auerbach Publications eBooks · 2025-01-27 · 2 citations
book-chapter · 1st author · Corresponding
The metaverse, Blockchain technologies, and cloud computing converge to give rise to seamless connectivity and transformative digital opportunities. The term metaverse is derived from the words meta, meaning “beyond,” and universe, and can be understood as “outside the universe.” The metaverse pertains to the virtual shared space where users can interact with computer-generated environments, objects, and other users. Metaverse technology is powered by augmented reality (AR), virtual reality (VR), and cloud computing. Our understanding of the digital landscape and communication has changed dramatically in the early days of the metaverse. The importance and impact of cloud computing in this area become more apparent as the metaverse and related technologies unfold and spread. This thesis explores how the creation, sustainability, and flexibility of the metaverse are all facilitated by cloud computing. We explore the relationship between cloud computing and the metaverse and highlight important technical features, some challenges, and possible solutions. Although widely used, the basic techniques for this system are still in their infancy. Developing new strategies is critical to successfully building and deploying metaverse applications. The use of artificial intelligence (AI), particularly deep learning (DL), has shown positive results in a variety of applications of the metaverse, including manufacturing, enterprise, and science.
Multi-Objective Workflow Scheduling with QLearning and Aquila Optimization
2025-09-10
article · Senior author
Scientific workflow scheduling in cloud computing is a complex multi-objective optimization challenge involving trade-offs among energy consumption, makespan, cost, and reliability. Existing reinforcement learning methods such as MORL-WS and evolutionary algorithms like NSGA-II achieve partial improvements but suffer from slow convergence, limited adaptability, or inadequate energy savings. To address these gaps, we propose a hybrid framework that integrates Q-learning with an enhanced Aquila Optimizer (AO), termed MOQ-AO. The enhanced AO employs chaotic initialization, opposition-based learning, and Pareto-front maintenance to balance exploration and exploitation, while Q-learning dynamically refines scheduling policies using real-time system feedback. This hybridization enables adaptive workload-to-resource mapping, dynamic voltage and frequency scaling (DVFS), and improved reliability under heterogeneous cloud environments. Extensive simulations in CloudSim with scientific workflows (Montage, CyberShake, LIGO) demonstrate that MOQ-AO consistently outperforms baselines. Compared to NSGA-II, MORL-WS, and ASSA, MOQ-AO reduces energy consumption by up to 38%, shortens makespan by 37%, and improves deadline compliance to 92% with faster convergence (156 s vs. 189-217 s). These results confirm that combining metaheuristic optimization with reinforcement learning yields significant improvements in both efficiency and adaptability. The findings highlight MOQ-AO as a scalable and robust next-generation approach for scientific workflow scheduling, with potential applicability in federated, multi-cloud, and quantum-inspired computing paradigms.
ObjectCarver: Semi-Automatic Segmentation, Reconstruction and Separation of 3D Objects
2025-03-25
article · Senior author
Implicit neural fields have made remarkable progress in reconstructing 3D surfaces from multiple images; however, they encounter challenges when it comes to separating individual objects within a scene. Previous approaches to this problem require ground-truth segmentation masks and introduce floating artifacts in occluded parts of the scene. We address these challenges with ObjectCarver. ObjectCarver requires no ground-truth segmentation; all it needs is a few user clicks in a single view. ObjectCarver also introduces a new loss function that prevents floaters and avoids inappropriate carving-out due to occlusion. Finally, ObjectCarver uses a simple initialization technique that significantly speeds up the process while preserving geometric details. We demonstrate qualitatively and quantitatively on multiple datasets (including a new dataset and benchmark with complete ground-truth) that ObjectCarver produces more accurate reconstructions of each object while minimizing artifacts.
FlashDepth: Real-Time Streaming Video Depth Estimation at 2K Resolution
2025-10-19
article · Open access
A versatile video depth estimation model should (1) be accurate and consistent across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these capabilities are enabled with relatively little data and training. We evaluate our approach across multiple unseen datasets against state-of-the-art depth models, and find that ours outperforms them in terms of boundary sharpness and speed by a significant margin, while maintaining competitive accuracy. We hope our model will enable various applications that require high-resolution depth, such as video editing, and online decision-making, such as robotics. We release all code and model weights at https://github.com/Eyeline-Research/FlashDepth
Frequent coauthors
- Kilian Q. Weinberger · 37 shared
- Mark Campbell · 32 shared
- Ross Girshick · 30 shared
- Yurong You · 29 shared
- R. Moharana · Indian Institute of Technology Jodhpur · 25 shared
- Sonali Gupta · Graphic Era University · 25 shared
- Abhay Jain · Saveetha University · 25 shared
- Pankaj Jain · Indian Institute of Technology Kanpur · 25 shared
Labs
Computer vision and machine learning, particularly on problems that defy the 'Big Data' label.
Awards & honors
- PAMI Young Researcher Award
- IEEE Computer Society Bharath Hariharan Research 2022
- NSF Faculty Early Career Development Award (CAREER)