Juan Pablo Bello

Professor of Computer Science and Engineering

New York University · Computer Science

Active 1998–2026

h-index: 45

Citations: 9.3k

Papers: 239 (69 in the last 5 years)

Funding: $7.3M



About

Juan Pablo Bello is a Professor of Music Technology, Computer Science & Engineering, Electrical & Computer Engineering, and Urban Science at New York University. He received a Bachelor of Engineering in Electronics from the Universidad Simón Bolívar in Caracas, Venezuela, in 1998, and earned a doctorate in Electronic Engineering at Queen Mary, University of London, in 2003. His expertise lies in digital signal processing, applied machine learning, and their applications in machine listening and music information retrieval. He has published more than 150 papers and articles in books, journals, and conference proceedings. Since 2016, he has served as the director of the Music and Audio Research Lab (MARL), a research center within NYU's Steinhardt School of Education, Culture and Human Development. From 2019 to 2022, he was also the director of the Center for Urban Science and Progress (CUSP), a research center at NYU's Tandon School of Engineering. His work has received support from various public and private institutions, including the NSF, DARPA, IMLS, Bosch, Adobe, Google, and iHeartRadio. He is a recipient of an NSF CAREER award and a Fulbright Scholar grant for multidisciplinary studies in France.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Speech recognition
Data Mining
Telecommunications
Real-time computing
Natural Language Processing
Computer Security
Multimedia
Acoustics
Geography
World Wide Web
Engineering
Computer network

Selected publications

Evaluating Compositional Structure in Audio Representations
ArXiv.org · 2026-03-14
Article · Open access · Senior author
We propose a benchmark for evaluating compositionality in audio representations. Audio compositionality refers to representing sound scenes in terms of constituent sources and attributes, and combining them systematically. While central to auditory perception, this property is largely absent from current evaluation protocols. Our framework adapts ideas from vision and language to audio through two tasks: A-COAT, which tests consistency under additive transformations, and A-TRE, which probes reconstructibility from attribute-level primitives. Both tasks are supported by large synthetic datasets with controlled variation in acoustic attributes, providing the first benchmark of compositional structure in audio embeddings.
Controllable Embedding Transformation for Mood-Guided Music Retrieval
2026-04-21
Article · Open access
Music representations are the backbone of modern recommendation systems, powering playlist generation, similarity search, and personalized discovery. Yet most embeddings offer little control for adjusting a single musical attribute, e.g., changing only the mood of a track while preserving its genre or instrumentation. In this work, we address the problem of controllable music retrieval through embedding-based transformation, where the objective is to retrieve songs that remain similar to a seed track but are modified along one chosen dimension. We propose a novel framework for mood-guided music embedding transformation, which learns a mapping from a seed audio embedding to a target embedding guided by mood labels, while preserving other musical attributes. Because mood cannot be directly altered in the seed audio, we introduce a sampling mechanism that retrieves proxy targets to balance diversity with similarity to the seed. We train a lightweight translation model using this sampling strategy and introduce a novel joint objective that encourages transformation and information preservation. Extensive experiments on two datasets show strong mood transformation performance while retaining genre and instrumentation far better than training-free baselines, establishing controllable embedding transformation as a promising paradigm for personalized music retrieval.
Comparative analysis of SVM and logistic regression for classifying diagnostic microRNA signatures in colorectal cancer
2025-09-20
Article · Open access
Early and accurate classification of gene signatures is critical for improving colorectal cancer (CRC) diagnosis. While previous studies have applied machine learning to microRNA datasets, few have combined feature selection and extraction methods in a unified diagnostic pipeline. This study proposes a novel integration of a Genetic Algorithm (GA) and Independent Component Analysis (ICA) for selecting and extracting relevant features from high-dimensional microRNA data. GA is used as a wrapper-based feature selection method to reduce the original 2457 features to 52, while ICA further transforms these into 12 uncorrelated components. These components then serve as inputs to Support Vector Machine (SVM) and Logistic Regression (LR) classifiers. Using the GA–ICA–SVM pipeline, we achieved an AUC of 0.8347, outperforming the LR model, which achieved an AUC of 0.7318. This approach demonstrates improved performance and efficiency in detecting CRC-related biomarkers and offers a reproducible framework for biomarker-based cancer diagnosis.
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
ArXiv.org · 2025-07-30
Preprint · Open access · Senior author
Recent advances in self-supervised learning (SSL) methods offer a range of strategies for capturing useful representations from music audio without the need for labeled data. While some techniques focus on preserving comprehensive details through reconstruction, others favor semantic structure via contrastive objectives. Few works examine the interaction between these paradigms in a unified SSL framework. In this work, we propose a multi-view SSL framework for disentangling music audio representations that combines contrastive and reconstructive objectives. The architecture is designed to promote both information fidelity and structured semantics of factors in disentangled subspaces. We perform an extensive evaluation on the design choices of contrastive strategies using music audio representations in a controlled setting. We find that while reconstruction and contrastive strategies exhibit consistent trade-offs, when combined effectively, they complement each other; this enables the disentanglement of music attributes without compromising information integrity.
Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
ArXiv.org · 2025-07-08
Preprint · Open access · Senior author
Acoustic mapping techniques have long been used in spatial audio processing for direction of arrival estimation (DoAE). Traditional beamforming methods for acoustic mapping, while interpretable, often rely on iterative solvers that can be computationally intensive and sensitive to acoustic variability. On the other hand, recent supervised deep learning approaches offer feedforward speed and robustness but require large labeled datasets and lack interpretability. Despite their strengths, both methods struggle to consistently generalize across diverse acoustic setups and array configurations, limiting their broader applicability. We introduce the Latent Acoustic Mapping (LAM) model, a self-supervised framework that bridges the interpretability of traditional methods with the adaptability and efficiency of deep learning methods. LAM generates high-resolution acoustic maps, adapts to varying acoustic conditions, and operates efficiently across different microphone arrays. We assess its robustness on DoAE using the LOCATA and STARSS benchmarks. LAM achieves comparable or superior localization performance to existing supervised methods. Additionally, we show that LAM's acoustic maps can serve as effective features for supervised models, further enhancing DoAE accuracy and underscoring its potential to advance adaptive, high-performance sound localization systems.
Balancing Information Preservation and Disentanglement in Self-Supervised Music Representation Learning
2025-10-12
Article · Senior author
Latent Multi-view Learning for Robust Environmental Sound Representations
ArXiv.org · 2025-10-02
Preprint · Open access · Senior author
Self-supervised learning (SSL) approaches, such as contrastive and generative methods, have advanced environmental sound representation learning using unlabeled data. However, how these approaches can complement each other within a unified framework remains relatively underexplored. In this work, we propose a multi-view learning framework that integrates contrastive principles into a generative pipeline to capture sound source and device information. Our method encodes compressed audio latents into view-specific and view-common subspaces, guided by two self-supervised objectives: contrastive learning for targeted information flow between subspaces, and reconstruction for overall information preservation. We evaluate our method on an urban sound sensor network dataset for sound source and sensor classification, demonstrating improved downstream performance over traditional SSL techniques. Additionally, we investigate the model's potential to disentangle environmental sound attributes within the structured latent space under varied training configurations.
Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
2025-10-12
Article · Senior author
Towards Few-Shot Training-Free Anomaly Sound Detection
2025-08-17
Article · Senior author

Recent grants

III: Medium: Spatial Sound Scene Description
NSF · $1.1M · 2020–2024
CAREER: Analyzing the Sequential Structure of Music Audio
NSF · $561k · 2009–2015
BIGDATA: Collaborative Research: IA: BirdVox: Automating Acoustic Monitoring of Migrating Bird Species
NSF · $612k · 2016–2021
CPS: Frontier: SONYC: A Cyber-Physical System for Monitoring, Analysis and Mitigation of Urban Noise Pollution
NSF · $5.1M · 2016–2022

Frequent coauthors

Justin Salamon
124 shared
Vincent Lostanlen
Centre National de la Recherche Scientifique
72 shared
Andrew Farnsworth
Cornell University
57 shared
Mark Cartwright
New York University
56 shared
Rachel Bittner
55 shared
Magdalena Fuentes
31 shared
Ho-Hsiang Wu
Robert Bosch (United States)
30 shared
Brian McFee
29 shared

Awards & honors

Frontier Award from the National Science Foundation
CAREER Award from the National Science Foundation
Fulbright Scholar Grant
