
Amelie Marian
· Assistant ProfessorVerifiedRutgers University · Computer Science
Active 1999–2024
About
Amélie Marian is a Professor in the Computer Science Department at Rutgers University. Her research primarily focuses on Data Management and Algorithms, with particular emphasis on Accountability and Transparency of Algorithms for Decision-Making, Explainable Rankings, Personal Information Management, Data Integration, and Data Corroboration. She is engaged in developing methodological approaches for auditing, explaining, and correcting algorithmic decisions under real-world constraints, as well as designing trust-aware digital infrastructures that promote transparency, accountability, and informed participation in democracy. Her work also includes enhancing student engagement and learning outcomes through personalized and randomized data assignments aligned with course objectives. Amélie Marian received her Ph.D. in Computer Science from Columbia University in 2005. Prior to that, she was a member of the VERSO project at INRIA-Rocquencourt from March 1999 to August 2000. She earned her B.S. and M.S. degrees from Université Paris Dauphine, France, in 1998 and 1999, respectively. Throughout her career, she has been recognized with several awards, including a Microsoft Live Labs Award in 2006, three Google Research Awards in 2008, 2010, and 2012, and an NSF CAREER award in 2009.
Research topics
- Computer Science
- Information Retrieval
- Machine Learning
- Data Mining
- Artificial Intelligence
- Data science
- Mathematics
- Human–computer interaction
- Algorithm
- Distributed computing
- World Wide Web
Selected publications
Post-Match Error Mitigation for Deferred Acceptance
arXiv (Cornell University) · 2024-09-20
preprintOpen accessReal-life applications of deferred-acceptance (DA) matching algorithms sometimes exhibit errors or changes to the matching inputs that are discovered only after the algorithm has been run and the results are announced to participants. Mitigating the effects of these errors is a different problem than the original match since the decision makers are often constrained by the offers they already sent out. We propose models for this new problem, along with mitigation strategies to go with these models. We explore three different error scenarios: resource reduction, additive errors, and subtractive errors. For each error type, we compute the expected number of students directly harmed, or helped, by the error, the number indirectly harmed or helped, and the number of students with justified envy due to the errors. Error mitigation strategies need to be selected based on the goals of the administrator, which include restoring stability, avoiding direct harm to any participant, and focusing the extra burden on the schools that made the error. We provide empirical simulations of the errors and the mitigation strategies.
Explainable Disparity Compensation for Efficient Fair Ranking
2024-05-13
articleSenior authorRanking functions that are used in decision systems often produce disparate results for different populations because of bias in the underlying data. Addressing, and compensating for, these disparate outcomes is a critical problem for fair decision-making. Recent compensatory measures have mostly focused on opaque transformations of the ranking functions to satisfy fairness guarantees or on the use of quotas or set-asides to guarantee a minimum number of positive outcomes to members of underrepresented groups. In this paper we propose easily explainable data-driven compensatory measures for ranking functions. Our measures rely on the generation of bonus points given to members of underrepresented groups to address disparity in the ranking function. The bonus points can be set in advance, and can be combined, allowing for considering the intersections of representations and giving better transparency to stakeholders. We propose efficient sampling-based algorithms to calculate the number of bonus points to minimize disparity. We validate our algorithms using real-world school admissions and recidivism datasets, and compare our results with that of existing fair ranking algorithms.
Explainable Disparity Compensation for Efficient Fair Ranking
arXiv (Cornell University) · 2023-07-25
preprintOpen accessSenior authorRanking functions that are used in decision systems often produce disparate results for different populations because of bias in the underlying data. Addressing, and compensating for, these disparate outcomes is a critical problem for fair decision-making. Recent compensatory measures have mostly focused on opaque transformations of the ranking functions to satisfy fairness guarantees or on the use of quotas or set-asides to guarantee a minimum number of positive outcomes to members of underrepresented groups. In this paper we propose easily explainable data-driven compensatory measures for ranking functions. Our measures rely on the generation of bonus points given to members of underrepresented groups to address disparity in the ranking function. The bonus points can be set in advance, and can be combined, allowing for considering the intersections of representations and giving better transparency to stakeholders. We propose efficient sampling-based algorithms to calculate the number of bonus points to minimize disparity. We validate our algorithms using real-world school admissions and recidivism datasets, and compare our results with that of existing fair ranking algorithms.
2023-03-19
articleSenior authorDigital devices are an integral part of our lives. Through these devices, people produce and save personal data, with or without their explicit awareness. This personal digital information has been exploited by companies, but users find it hard to access and search in a uniform way, due to the heterogeneity, fragmentation of data and non-uniform access interface. By integrating and organizing this information into common kinds of everyday episodes ("scripts") that people engage in, we can help users recall and explore forgotten details of their past. However, being able to recognize such episodes in the user’s personal digital information requires not only script knowledge (e.g., the steps/actions in the script), but also explicit knowledge about the digital traces potentially left behind by each of the actions. In this paper, we present "One Of Us", a web-based multiplayer game, which collects descriptions of different kinds of personal digital traces, by having players identify the digital traces that might be produced by each of the actions in a given script. We report on the results of an experimental study, which gives evidence that our game is i) enjoyable, ii) accounts for uncommon answers, iii) validates and assesses knowledge by having the players vote on other’s responses - thus not requiring a second round of quality assessment, and iv) dynamically acquires new pieces of information.
2023-06-12 · 7 citations
article1st authorCorrespondingAlgorithms are used to aid decision-making for a wide range of public policy decisions. Yet, the details of the algorithmic processes and how to interact with their systems are often inadequately communicated to stakeholders, leaving them frustrated and distrusting of the outcomes of the decisions. Transparency and accountability are critical prerequisites for building trust in the results of decisions and guaranteeing fair and equitable outcomes. Unfortunately, organizations and agencies do not have strong incentives to explain and clarify their decision processes; however, stakeholders are not powerless and can strategically combine their efforts to push for more transparency.
Determining Winners in Elections with Absent Votes
arXiv (Cornell University) · 2023-10-11
preprintOpen accessAn important question in elections is the determine whether a candidate can be a winner when some votes are absent. We study this determining winner with the absent votes (WAV) problem when the votes are top-truncated. We show that the WAV problem is NP-complete for the single transferable vote, Maximin, and Copeland, and propose a special case of positional scoring rule such that the problem can be computed in polynomial time. Our results in top-truncated rankings differ from the results in full rankings as their hardness results still hold when the number of candidates or the number of missing votes are bounded, while we show that the problem can be solved in polynomial time in either case.
Supporting Human Memory by Reconstructing Personal Episodic Narratives from Digital Traces
Proceedings of the International AAAI Conference on Web and Social Media · 2022-05-31 · 1 citations
articleOpen accessSenior authorNumerous applications capture in digital form aspects of people’s lives. The resulting data, which we call Personal Digital Traces - PDTs, can be used to help reconstruct people’s episodic memories and connect to their past personal events. This may have several applications, from helping the recall of patients with neurodegenerative diseases to gathering clues from multiple sources to identify recent contacts and places visited – a critical new application for the recent health crisis. This paper takes steps towards integrating, connecting and summarizing the heterogeneous collection of data into episodic narratives using scripts – prototypical plans for everyday activities. Specifically, we propose a matching algorithm that groups PDTs from many different sources into script instances (episodes), and we provide a technique for ranking the likelihood of candidate episodes. We report on the results of a study based on the personal data of real users, which gives evidence that our episode reconstruction 1) integrates well PDTs from different sources into coherent episodes, and 2) augments users’ memory of their past actions.
Fairness-aware Federated Matrix Factorization
2022 · 39 citations
Senior authorCorresponding- Computer Science
- Computer Science
- Algorithm
Achieving fairness over different user groups in recommender systems is an important problem. The majority of existing works achieve fairness through constrained optimization that combines the recommendation loss and the fairness constraint. To achieve fairness, the algorithm usually needs to know each user’s group affiliation feature such as gender or race. However, such involved user group feature is usually sensitive and requires protection. In this work, we seek a federated learning solution for the fair recommendation problem and identify the main challenge as an algorithmic conflict between the global fairness objective and the localized federated optimization process. On one hand, the fairness objective usually requires access to all users’ group information. On the other hand, the federated learning systems restrain the personal data in each user’s local space. As a resolution, we propose to communicate group statistics during federated optimization and use differential privacy techniques to avoid exposure of users’ group information when users require privacy protection. We illustrate the theoretical bounds of the noisy signal used in our method that aims to enforce privacy without overwhelming the aggregated statistics. Empirical results show that federated learning may naturally improve user group fairness and the proposed framework can effectively control this fairness with low communication overheads.
A Frequency-Based Learning-To-Rank Approach for Personal Digital Traces
Proceedings of the ... Annual Hawaii International Conference on System Sciences/Proceedings of the Annual Hawaii International Conference on System Sciences · 2022-01-01 · 1 citations
articleOpen accessSenior authorPersonal digital traces are constantly produced by connected devices, internet services and interactions. These digital traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, making it a challenge for users to interact with and search their own data. By adopting a multidimensional data model based on the six natural questions --- what, when, where, who, why and how --- to represent and unify heterogeneous personal digital traces, we can propose a learning-to-rank approach using the state of the art LambdaMART algorithm and frequency-based features that leverage the correlation between content (what), users (who), time (when), location (where) and data source (how) to improve the accuracy of search results. Due to the lack of publicly available personal training data, a combination of known-item query generation techniques and an unsupervised ranking model (field-based BM25) is used to build our own training sets. Experiments performed over a publicly available email collection and a personal digital data trace collection from a real user show that the frequency-based learning approach improves search accuracy when compared with traditional search tools.
Identifying Possible Winners in Ranked Choice Voting Elections with Outstanding Ballots
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing · 2022-10-14 · 3 citations
articleOpen accessSenior authorSeveral election districts in the US have recently moved to ranked-choice voting (RCV) to decide the results of local elections. RCV allows voters to rank their choices, and the results are computed in rounds, eliminating one candidate at a time. RCV ensures fairer elections and has been shown to increase elected representation of women and people of color. A main drawback of RCV is that the round-by-round process requires all the ballots to be tallied before the results of an election can be calculated. With increasingly large portions of ballots coming from absentee voters, RCV election outcomes are not always apparent on election night, and can take several weeks to be published, leading to a loss of trust in the electoral process from the public. In this paper, we present an algorithm for efficiently computing possible winners of RCV elections from partially known ballots and evaluate it on data from the recent New York City Primary elections. We show that our techniques allow to significantly narrow down the field of possible election winners, and in some case identify the winner as soon as election night despite a number of yet-unaccounted absentee ballots, providing more transparency in the electoral process.
Recent grants
Frequent coauthors
- 10 shared
Varvara Kalokyri
- 9 shared
Alexander Borgida
- 9 shared
Serge Abiteboul
- 8 shared
Daniela Vianna
- 6 shared
Kostas Stefanidis
Tampere University
- 6 shared
Thu D. Nguyen
- 5 shared
Luis Gravano
- 5 shared
Divesh Srivastava
Education
Ph.D., Computer Science
Rutgers, The State University of New Jersey
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Amelie Marian
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup