About
Vladimir Pavlovic is a professor in the Department of Computer Science at Rutgers University. His research focuses on machine learning, computer vision, and biomedical informatics. He is affiliated with the Intelligent Systems research group. He has been recognized for his contributions to artificial intelligence, and his research has been featured in media outlets such as Vice. Pavlovic was on leave at the Samsung AI Center in Cambridge, UK, from fall 2018 to fall 2020. His work has received awards such as the 2022 Piero Zamperoni Best Student Paper Award at ICPR, and he has been named both a general co-chair and a program co-chair of CVPR 2026.
Research topics
- Computer Science
- Machine Learning
- Artificial Intelligence
- Natural Language Processing
- Data Mining
Selected publications
Optimizing Java Performance with the Vector API for Data-Parallel Computations
2025-06-02
article · 1st author · Corresponding
The Java Vector API efficiently optimizes data-parallel operations using SIMD (Single Instruction, Multiple Data), making it particularly useful for applications that rely on vector-based numerical computations, such as simulations, machine learning, and data analytics. This paper analyzes the Vector API's design, supported features, and potential use in performance-critical domains such as game engines, data analytics, and scientific simulations. We analyzed the Vector API with practical test cases to measure its performance, focusing on supported data types, operational flexibility, and how older CPUs without adequate SIMD support affect its efficiency. Our preliminary results show notable performance improvements for parallelizable tasks. They also identify limitations, including reliance on hardware SIMD support and reduced efficiency when fallback mechanisms are used. The Vector API shows how Java can leverage hardware-accelerated computation, allowing software engineers to achieve significant performance gains without resorting to low-level programming languages. The findings suggest that the Vector API effectively bridges the gap between hardware-accelerated computation and Java's high-level programming model, providing a viable path for optimizing performance in specific use cases without requiring low-level programming expertise.
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
ArXiv.org · 2025-03-30
preprint · Open access · Senior author
Raven's Progressive Matrices (RPMs) are an established benchmark for examining the ability to perform high-level abstract visual reasoning (AVR). Current algorithms solve this task successfully, but humans can do more: they generalize beyond a given puzzle and create new puzzles from a set of rules, whereas machines remain limited to solving a fixed puzzle from a curated list of choices. We propose Generative Visual Puzzles (GenVP), a framework that models the entire RPM generation process, a substantially more challenging task. Our model's capabilities range from generating multiple solutions for a specific problem prompt to creating complete, new puzzles from a desired set of rules. Experiments on five different datasets indicate that GenVP achieves state-of-the-art (SOTA) performance in both puzzle-solving accuracy and out-of-distribution (OOD) generalization across 22 OOD scenarios. SOTA generative approaches struggle to solve RPMs as the feasible solution space grows. By comparison, GenVP generalizes efficiently to these challenging setups. Moreover, our model can produce a wide range of complete RPMs from a set of abstract rules by effectively capturing the relationships between abstract rules and visual object properties.
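To make "creating new puzzles from a set of rules" concrete, here is a minimal symbolic sketch in Python. It is not GenVP, which learns the generation process with contrastive hierarchical VAEs. Instead, it hard-codes three rule types (`constant`, `progression`, `distribute_three`), loosely modeled on those used in RAVEN-style datasets. These rules are applied over hypothetical panel attributes (`number`, `size`, `color`) to fill a 3x3 grid row by row. The last panel is held out as the answer, and each distractor violates exactly one attribute. All names, value domains, and rules are assumptions chosen for illustration.

```python
import random

# Hypothetical attribute domains for a toy RPM panel (illustrative, not from GenVP).
ATTRIBUTES = {
    "number": list(range(1, 10)),  # objects in the panel
    "size": list(range(6)),        # discrete size levels
    "color": list(range(10)),      # discrete color levels
}


def apply_rule(rule, domain, rng):
    """Return a 3x3 grid of values for one attribute that obeys `rule` in every row."""
    if rule == "distribute_three":
        # The same three distinct values appear in every row, cyclically shifted.
        values = rng.sample(domain, 3)
        return [[values[(c + r) % 3] for c in range(3)] for r in range(3)]
    grid = []
    for _ in range(3):
        if rule == "constant":
            v = rng.choice(domain)
            grid.append([v, v, v])
        elif rule == "progression":
            step = rng.choice([-1, 1])
            # Domains are contiguous, so checking the end point keeps all three values valid.
            start = rng.choice([d for d in domain if d + 2 * step in domain])
            grid.append([start, start + step, start + 2 * step])
        else:
            raise ValueError(f"unknown rule: {rule}")
    return grid


def generate_puzzle(rules, seed=0, n_distractors=7):
    """Build 8 context panels plus a shuffled candidate set from one rule per attribute."""
    rng = random.Random(seed)
    grids = {a: apply_rule(rules[a], ATTRIBUTES[a], rng) for a in ATTRIBUTES}
    # Panels in row-major order; the ninth panel is the held-out answer.
    panels = [{a: grids[a][r][c] for a in ATTRIBUTES} for r in range(3) for c in range(3)]
    context, answer = panels[:8], panels[8]
    candidates = [answer]
    while len(candidates) < n_distractors + 1:
        # Each distractor breaks exactly one attribute of the correct answer.
        attr = rng.choice(list(ATTRIBUTES))
        wrong = dict(answer)
        wrong[attr] = rng.choice([v for v in ATTRIBUTES[attr] if v != answer[attr]])
        if wrong not in candidates:
            candidates.append(wrong)
    rng.shuffle(candidates)
    return context, candidates, candidates.index(answer)


if __name__ == "__main__":
    rules = {"number": "progression", "size": "constant", "color": "distribute_three"}
    ctx, cands, idx = generate_puzzle(rules, seed=1)
    print(len(ctx), len(cands), cands[idx])
```

The answer is built directly from the rules, so any candidate can be checked mechanically against them. This mirrors the rule-to-property relationship that the abstract says GenVP learns rather than hard-codes.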
IEEE Transactions on Multimedia · 2025-01-01
editorial · Open access
Modelling Biometeorological Processes using Physics-Informed Neural Networks
2025-06-30
preprint · Open access
Biometeorological models have traditionally been categorized as mechanistic (deterministic) or stochastic, and more recently machine learning (ML) models have joined these categories. Mechanistic models represent system processes based on cause and effect and domain knowledge, while physics-based models, a subset of mechanistic models, explicitly incorporate physical laws. Although such models are interpretable, they often rely on empirically fixed parameters and may overlook complex environmental interactions. In this study, we investigate a hybrid modeling framework that combines physics-based modeling with Physics-Informed Neural Networks (PINNs) to improve the simulation of biosphere-atmosphere interactions. Focusing on mosquito population dynamics as a climate-sensitive system, we couple a physics-based dynamic model with a PINN. The coupling improves the representation of environmental drivers that affect larval and pupal development rates. Air temperature is traditionally the primary forcing variable in such models. However, our results show that the PINN, trained on historical meteorological and entomological data, identifies precipitation and humidity as significant additional predictors of mosquito development dynamics. This enriched model captures population peaks more accurately and improves predictive performance during critical seasonal transitions. By integrating physics-based structure with data-driven learning, the hybrid model maintains explainability while revealing hidden nonlinear dependencies among meteorological variables.
The findings demonstrate how advanced ML techniques such as PINNs can uncover meteorological sensitivities that traditional models may not capture, highlighting the importance of meteorological data in biosphere modeling. This approach enhances disease-vector modeling under varying climate conditions. It also offers a transferable framework for other environmental applications, such as crop phenology, urban microclimate analysis, and infectious disease transmission in the human population. The study underscores the value of combining physics-based models with machine learning to extract deeper insight from complex meteorological data. Acknowledgements: This research is supported by the Ministry of Science, Technological Development and Innovation of the Republic of Serbia (Grants No. 451-03-137/2025-03/ 200125 & 451-03-136/2025-03/ 200125) and COST Action CA20108 FAIR Network of micrometeorological measurements (FAIRNESS).
SSRN Electronic Journal · 2025-01-01
preprint · Open access
SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models
2025-02-26 · 1 citation
article · Open access
… Adaptation (SODA), which balances computational efficiency and representation capacity. Extensive evaluations on text-to-image diffusion models demonstrate SODA's effectiveness, offering a spectrum-aware alternative to existing fine-tuning methods.
2025-10-19
preprint · Open access · Senior author
Despite their remarkable potential, Large Vision-Language Models (LVLMs) still suffer from object hallucination, in which their generated outputs mistakenly include objects that do not actually exist. Most work addresses this issue within the language-model backbone. Our work instead shifts the focus to the image input, investigating how specific image tokens contribute to hallucinations. Our analysis reveals a striking finding: a small subset of image tokens with high attention scores is the primary driver of object hallucination. Removing these hallucinatory image tokens (only 1.5% of all image tokens) effectively mitigates the issue, and this finding holds consistently across different models and datasets. Building on this insight, we introduce EAZY, a novel, training-free method that automatically identifies and Eliminates hAllucinations by Zeroing out hallucinatorY image tokens. We use EAZY for unsupervised object hallucination detection, achieving a 15% improvement over previous methods. Additionally, EAZY is highly effective at mitigating hallucinations while preserving model utility, and it adapts seamlessly to various LVLM architectures.
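The core operation described in the abstract can be sketched in plain Python: rank image tokens by attention, then zero the small fraction with the highest scores. This is not the EAZY implementation. How attention is aggregated into one score per token (which layers, heads, or text positions) is an assumption left to the caller, and the function name `zero_high_attention_tokens` is hypothetical. The default `fraction=0.015` simply mirrors the 1.5% figure quoted above.

```python
import math


def zero_high_attention_tokens(image_tokens, attention, fraction=0.015):
    """Zero out the image-token embeddings with the highest attention scores.

    image_tokens: list of embedding vectors (lists of floats), one per image token.
    attention:    one score per image token; how it is aggregated across
                  heads/layers is an assumption left to the caller.
    fraction:     share of tokens to zero; 0.015 mirrors the 1.5% in the abstract.
    Returns the cleaned token list (inputs are not modified) and the zeroed indices.
    """
    if len(image_tokens) != len(attention):
        raise ValueError("expected one attention score per image token")
    k = max(1, math.ceil(fraction * len(image_tokens)))
    # Rank token indices by attention, highest first, and keep the top k.
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    removed = set(ranked[:k])
    cleaned = [
        [0.0] * len(tok) if i in removed else list(tok)
        for i, tok in enumerate(image_tokens)
    ]
    return cleaned, sorted(removed)
```

For example, with 576 image tokens (a 24x24 patch grid, assumed here for illustration), the default fraction zeroes nine tokens. Only the inputs are edited and no weights change, which is the sense in which such an intervention is training-free.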
CASIM: Composite Aware Semantic Injection for Text to Motion Generation
ArXiv.org · 2025-02-04
preprint · Open access
Recent advances in generative modeling and tokenization have driven significant progress in text-to-motion generation, improving the quality and realism of generated motions. However, effectively leveraging textual information for conditional motion generation remains an open challenge. Current approaches primarily rely on fixed-length text embeddings (e.g., CLIP) for global semantic injection. We observe that these approaches struggle to capture the composite nature of human motion, resulting in suboptimal motion quality and controllability. To address this limitation, we propose the Composite Aware Semantic Injection Mechanism (CASIM). It comprises a composite-aware semantic encoder and a text-motion aligner that learns the dynamic correspondence between text and motion tokens. Notably, CASIM is model- and representation-agnostic, integrating readily with both autoregressive and diffusion-based methods. Experiments on the HumanML3D and KIT benchmarks demonstrate that CASIM consistently improves motion quality, text-motion alignment, and retrieval scores across state-of-the-art methods. Qualitative analyses further highlight the advantages of our composite-aware approach over fixed-length semantic injection, enabling precise motion control from text prompts and stronger generalization to unseen text inputs.
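The abstract contrasts a single fixed-length text embedding with a learned correspondence between text and motion tokens. Generic scaled dot-product cross-attention can illustrate that contrast. The sketch below is not CASIM's aligner. It only shows that when each motion token attends over per-word text embeddings, different frames can draw on different words, whereas a mean-pooled global embedding gives every frame the same vector. The two-dimensional embeddings and the words "walk" and "wave" in the usage block are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(motion_queries, text_keys, text_values):
    """Scaled dot-product cross-attention from motion tokens to per-word text tokens.

    Returns one context vector per motion token plus the attention weights,
    so each motion token can draw on different words of the prompt.
    """
    d = len(text_keys[0])
    contexts, weights = [], []
    for q in motion_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in text_keys]
        w = softmax(scores)
        weights.append(w)
        contexts.append([sum(wj * v[i] for wj, v in zip(w, text_values))
                         for i in range(len(text_values[0]))])
    return contexts, weights


def global_embedding(text_values):
    """Fixed-length alternative: mean-pool all words into one vector shared by every motion token."""
    n = len(text_values)
    return [sum(v[i] for v in text_values) / n for i in range(len(text_values[0]))]


if __name__ == "__main__":
    # Hypothetical embeddings for the words "walk" and "wave" (used as both keys and values).
    keys = values = [[4.0, 0.0], [0.0, 4.0]]
    motion = [[1.0, 0.0], [0.0, 1.0]]  # first frame resembles walking, second waving
    ctx, w = cross_attend(motion, keys, values)
    print(w)
    print(global_embedding(values))
```

In the usage block, the first motion token puts most of its weight on the "walk" embedding and the second on "wave". By contrast, `global_embedding` returns the same averaged vector for both.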
TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction
arXiv (Cornell University) · 2024-10-14
preprint · Open access · Senior author
Accurately predicting human or vehicle trajectories, with enough diversity to capture their stochastic nature, is essential for many applications. However, many trajectory prediction models produce unreasonable samples because they focus on improving diversity or accuracy while neglecting other key requirements, such as avoiding collisions with the surrounding environment. In this work, we propose TrajDiffuse, a planning-based trajectory prediction method that uses a novel guided conditional diffusion model. We formulate trajectory prediction as a denoising inpainting task and design a map-based guidance term for the diffusion process. TrajDiffuse generates trajectory predictions that match or exceed the accuracy and diversity of SOTA methods while adhering almost perfectly to environmental constraints. We demonstrate the utility of our model through experiments on the nuScenes and PFSD datasets and provide an extensive benchmark analysis against SOTA methods.
CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation
Lecture notes in computer science · 2024-11-28 · 4 citations
book-chapter
Recent grants
RI: Small: Novel structured regression approaches to high-dimensional motion analysis
NSF · $403k · 2009–2013
Nonlinear methods for parametric grouping and modeling of motion
NSF · $300k · 2005–2010
Frequent coauthors
- Sejong Yoon, College of New Jersey · 38 shared
- Ricardo Guerrero, Samsung (United Kingdom) · 34 shared
- Yuting Wang, Liaocheng University · 32 shared
- Mubbasir Kapadia · 31 shared
- Maja Pantić · 30 shared
- Minyoung Kim, Dong-Eui University · 29 shared
- Ognjen Rudovic · 26 shared
- Thomas S. Huang · 23 shared
Education
Ph.D., Computer Science
Rutgers, The State University of New Jersey
Awards & honors
- Piero Zamperoni Best Student Paper at ICPR'22 (2022)
- Named General Co-Chair at CVPR 2026
- Named Program Co-Chair at CVPR 2026