Augustin Chaintreau

Associate Professor (Verified)

Columbia University · Computer Science

Active 2000–2025

h-index: 35

Citations: 7.1k

Papers: 133 (21 in last 5 years)

Funding: $540k

About

Augustin Chaintreau is an Associate Professor in the Computer Science Department at Columbia University. His research focuses on designing algorithms and conducting mathematical analysis of networks, with the goal of balancing the benefits of leveraging personal data and social networks against a commitment to fairness and privacy. His recent work addresses transparency in personalization, fairness in personal data markets, efficiency of crowdsourced content curation, and user privacy across various domains. He has contributed to the understanding of social media behaviors, data sharing, and privacy risks, and his research has been featured in media outlets including the Washington Post, Fortune, New Scientist, and the New York Times. Chaintreau has also been active in the research community, serving as PC Co-chair for ACM SIGMETRICS, General Chair for the Data Transparency Lab Conference 2016, and area editor for IEEE Transactions on Mobile Computing. He has served on program committees for over thirty conferences and has held editorial and leadership roles within the academic community. He also teaches courses on social networks and computer networks at Columbia University and provides teaching materials for them. His work aims to advance the understanding of networked systems, social information sharing, and privacy, contributing both theoretical insights and practical solutions.

Research topics

Computer Science
Machine Learning
Data science
Artificial Intelligence
Programming language
Mathematics
Telecommunications
World Wide Web
Mathematical optimization

Selected publications

Information Loss and Disparate Effects in Network Embeddings
ArXiv.org · 2025-09-15
preprint · Open access · Senior author
An extensive line of work studies fairness interventions for network embeddings, but less is known about their baseline behavior. In this work, we ask: how do baseline embeddings (without fairness interventions) produce disparate effects at the representation level? We analyze the asymptotic behavior of low-dimensional embeddings on stochastic block model (SBM) graphs, which encode both homophily and group structure. We characterize exact conditions under which embeddings cause information loss, showing that the amount of information loss depends directly on the graph's density and assortativity. Notably, very different graphs can produce identical embeddings in the limit, and this non-invertibility disproportionately affects smaller and sparser communities. As a result, simple downstream tasks, such as link prediction, introduce higher error rates for these communities, helping explain disparities widely observed in practice.
The Cost of Balanced Training-Data Production in an Online Data Market
2025-04-22
article · Open access · 1st author · Corresponding
Many ethical issues in machine learning are connected to the training data. Online data markets are an important source of training data, facilitating both production and distribution. Recently, a trend has emerged of for-profit "ethical" participants in online data markets. This trend raises a fascinating question: Can online data markets sustainably and efficiently address ethical issues in the broader machine-learning economy? In this work, we study this question in a stylized model of an online data market. We investigate the effects of intervening in the data market to achieve balanced training-data production. The model reveals the crucial role of market conditions. In small and emerging markets, an intervention can drive the data producers out of the market, so that the cost of fairness is maximal. Yet, in large and established markets, the cost of fairness can vanish (as a fraction of overall welfare) as the market grows. Our results suggest that "ethical" online data markets can be economically feasible under favorable market conditions, and motivate more models to consider the role of data production and distribution in mediating the impacts of ethical interventions.
The Cost of Balanced Training-Data Production in an Online Data Market
ArXiv.org · 2025-01-31
preprint · Open access · 1st author · Corresponding
Many ethical issues in machine learning are connected to the training data. Online data markets are an important source of training data, facilitating both production and distribution. Recently, a trend has emerged of for-profit "ethical" participants in online data markets. This trend raises a fascinating question: Can online data markets sustainably and efficiently address ethical issues in the broader machine-learning economy? In this work, we study this question in a stylized model of an online data market. We investigate the effects of intervening in the data market to achieve balanced training-data production. The model reveals the crucial role of market conditions. In small and emerging markets, an intervention can drive the data producers out of the market, so that the cost of fairness is maximal. Yet, in large and established markets, the cost of fairness can vanish (as a fraction of overall welfare) as the market grows. Our results suggest that "ethical" online data markets can be economically feasible under favorable market conditions, and motivate more models to consider the role of data production and distribution in mediating the impacts of ethical interventions.
Network Fairness Ambivalence: When Does Social Network Capital Mitigate or Amplify Unfairness?
ACM SIGMETRICS Performance Evaluation Review · 2024-06-11
article · Senior author
What are the necessary and sufficient conditions under which multi-hop dissemination strategies decrease rather than increase inequity within social networks? Our analysis of various strategies suggests that this largely depends on a limit related to the degree of homophily in the network.
Network Fairness Ambivalence: When does social network capital mitigate or amplify unfairness?
Proceedings of the ACM on Measurement and Analysis of Computing Systems · 2024-05-21
article · Open access · Senior author
Social networks inherit societal biases present across lines of gender, race, socioeconomic status, and other factors. Networks can structurally perpetuate unequal access to information and opportunities through homophilous dynamics. While there is substantial knowledge about inequity in the diffusion of opportunities in a network where nodes seek them from their immediate neighbors, much less is known when considering beyond that first hop. In this paper, we leverage recent mathematical analysis of network fairness to prove that enabling simple multi-hop dissemination can reduce inequity towards a minority group in the network as long as homophily is sufficiently weak. Otherwise, our necessary and sufficient condition proves that multi-hop dissemination strategies amplify the bias already found when considering only direct neighbors. We empirically validate these results on four social network datasets and present an example of a key application of our findings with a scenario of individuals who leverage their personal network to seek job referrals. Our results suggest that online platforms designing algorithms to promote opportunities to multi-hop connections must carefully take into account network metrics measuring group size and homophily in order to avoid amplifying bias against marginalized groups on their platforms.
Fairness Rising from the Ranks: HITS and PageRank on Homophilic Networks
2024-05-08 · 5 citations
article · Open access · Senior author
In this paper, we investigate the conditions under which link analysis algorithms prevent minority groups from reaching high ranking slots. We find that the most common link-based algorithms using centrality metrics, such as PageRank and HITS, can reproduce and even amplify bias against minority groups in networks. Yet, their behavior differs: on the one hand, we empirically show that PageRank mirrors the degree distribution for most of the ranking positions and it can equalize representation of minorities among the top ranked nodes; on the other hand, we find that HITS amplifies pre-existing bias in homophilic networks through a novel theoretical analysis, supported by empirical results. We find the root cause of bias amplification in HITS to be the level of homophily present in the network, modeled through an evolving network model with two communities. We illustrate our theoretical analysis on both synthetic and real datasets and we present directions for future work.
Network Fairness Ambivalence: When Does Social Network Capital Mitigate or Amplify Unfairness?
2024-06-01
article · Senior author
What are the necessary and sufficient conditions under which multi-hop dissemination strategies decrease rather than increase inequity within social networks? Our analysis of various strategies suggests that this largely depends on a limit related to the degree of homophily in the network.
Longitudinal study of exposure to radio frequencies at population scale
Environment International · 2022-03-24 · 14 citations
article · Open access
Evaluating exposure to radio frequencies (RF) at population-scale is important for conducting sound epidemiological studies about possible health impact of RF radiations. Numerous studies reported population exposure to RF radiations used in wireless telecommunication technologies, but used very small population samples. In this context, the real exposure of the population at scale remains poorly understood. Here, to the best of our knowledge, we report the largest crowd-based measurement of population exposure to RF produced by cellular antennas, Wi-Fi access points, and Bluetooth devices for 254,410 unique users in 13 countries from January 2017 to December 2020. First, we present methods to assess the population exposure to RF radiations using smartphone measurements obtained using the ElectroSmart Android app. Then, we use these methods to evaluate and characterize the evolution of RF exposure. We show that total exposure has been multiplied by 2.3 in the four-year period considered, with Wi-Fi as the largest contributor. The cellular exposure levels are orders of magnitude lower than regulation limits and are not correlated to national regulation policies. The population tends to be more exposed at home; for half of the study subjects, personal Wi-Fi routers and Bluetooth devices contributed to more than 50% of their total exposure. In this work, we showcase how crowdsource-based data allow large-scale and long-term assessment of population exposure to RF radiations.
Non-Existence of Stable Social Groups in Information-Driven Networks
Theory of Computing Systems · 2022-07-06
article · Open access · 1st author · Corresponding
“I Don’t Have a Photograph, But You Can Have My Footprints.” – Revealing the Demographics of Location Data
Proceedings of the International AAAI Conference on Web and Social Media · 2021-08-03
article · Open access
High accuracy location data are routinely available to a plethora of mobile apps and web services. The availability of such data has led to a better general understanding of human mobility. However, as location data are usually not associated with demographic information, little work has been done to understand the differences in human mobility across demographics. In this study we begin to fill the void. In particular, we explore how the growing number of geotagged footprints that social network users create can reveal demographic attributes and how these footprints enable the understanding of mobility at a demographic level. Our methodology gives rise to novel opportunities in the study of mobility. We leverage publicly available geotagged photographs from a popular photosharing network to build a dataset on demographic mobility patterns. Our analysis of this dataset not only reproduces previous results on mobility behavior at various geographical levels but further extends the existing picture: it allows for the refinement of mobility modeling from entire populations to specific demographic groups. Our analysis suggests the existence of regional variations in mobility and reveals statistically significant differences in mobility between genders and ethnicities.

Recent grants

CAREER: Banalytics: Behavioral Network Analytics with Data Transparency
NSF · $540k · 2013–2019

Frequent coauthors

François Baccelli
École Normale Supérieure - PSL
23 shared
Christophe Diot
Google (United States)
23 shared
Guillaume Ducoffe
National Institute for Research & Development in Informatics
14 shared
Arthi Ramachandran
14 shared
Daniel Hsu
13 shared
Fabrizio Dell’Acqua
12 shared
Nakul Verma
12 shared
Emmanuelle Lebhar
Université Paris Cité
12 shared

Education

Ph.D., Computer Science
École Normale Supérieure - PSL
2006
DEA [M. Sc.] Probability and Applications, Mathematics Department
Université de Paris
2002
Magistère, Mathematics and Computer Science
École Normale Supérieure - PSL
2001

Awards & honors

PC Co-chair for ACM SIGMETRICS
General Chair for the Data Transparency Lab Conference 2016
Area editor for IEEE Transactions on Mobile Computing
