Gianluca Stringhini

Assistant Professor – Electrical & Computer Engineering · Affiliated Faculty – Computer Science

Boston University · Computer Science

Active 2010–2026

h-index: 48

Citations: 8.5k

Papers: 300 (135 in the last 5 years)

Funding: $1.6M (1 active)

About

Gianluca Stringhini is an Assistant Professor in the Department of Electrical and Computer Engineering at Boston University. His research applies a data-driven approach to better understand malicious activity on the Internet. Through the collection and analysis of large-scale datasets, he develops novel and robust mitigation techniques to make the Internet a safer place. His work involves a mix of quantitative analysis, some qualitative analysis, machine learning, crime science, and systems design. Recently, he has investigated the spread of alternative news and memes on online social networks, raids organized by trolls against other Internet users, cyberbullying, ransomware, online dating scams, money laundering schemes linked to cybercrime, malware delivery networks, and online social network compromises.

Research topics

Computer Science
Sociology
Political Science
World Wide Web
Social Science
Internet privacy
Geography
Law
Computer Security
Cartography
Business
Advertising
Criminology
Biology
Public relations

Selected publications

The Cost of Convenience: Identifying, Analyzing, and Mitigating Predatory Loan Applications on Android
ArXiv.org · 2026-01-19
article · Open access · Senior author
Digital lending applications, commonly referred to as loan apps, have become a primary channel for microcredit in emerging markets. However, many of these apps demand excessive permissions and misuse sensitive user data for coercive debt-recovery practices, including harassment, blackmail, and public shaming that affect both borrowers and their contacts. This paper presents the first cross-country measurement of loan app compliance against both national regulations and Google's Financial Services Policy. We analyze 434 apps drawn from official registries and app markets from Indonesia, Kenya, Nigeria, Pakistan, and the Philippines. To operationalize policy requirements at scale, we translate policy text into testable permission checks using LLM-assisted policy-to-permission mapping and combine this with static and dynamic analyses of loan apps' code and runtime behavior. Our findings reveal pervasive non-compliance among approved apps: 141 violate national regulatory policy and 147 violate Google policy. Dynamic analysis further shows that several apps transmit sensitive data (contacts, SMS, location, media) before user signup or registration, undermining informed consent and enabling downstream harassment of borrowers and third parties. Following our disclosures, Google removed 93 flagged apps from Google Play, representing over 300M cumulative installs. We advocate for adopting our methodology as a proactive compliance-monitoring tool and offer targeted recommendations for regulators, platforms, and developers to strengthen privacy protections. Overall, our results highlight the need for coordinated enforcement and robust technical safeguards to ensure that digital lending supports financial inclusion without compromising user privacy or safety.
Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis
ArXiv.org · 2026-03-25
article · Open access
As the modern microservice architecture for cloud applications grows in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which lacks scalability in the face of the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, containing new software installations, have complex interactions with the components of an application. Consequently, this added difficulty in attributing anomalous behavior to any specific installation or rollout results in potentially slower resolution times. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium aids administrators in evaluating target metric performance in the context of dependency installation information provided by a software discovery tool, PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies, then conducts root cause analysis via causal impact on recent software installations, in order to provide site reliability engineers (SRE) relevant information about an observed anomaly. In this paper, we demonstrate that Praxium is capable of effective anomaly detection and root cause inference, and we provide an analysis on effective anomaly detection hyperparameter tuning as needed in a practical setting. Across 75 total trials using four synthetic anomalies, anomaly detection consistently performs at >0.97 macro-F1. In addition, we show that causal impact analysis reliably infers the correct root cause of anomalies, even as package installations occur at increasingly shorter intervals.
Loki: Proactively discovering online scams by mining toxic search queries
2026-01-01
article · Open access · Senior author
Online e-commerce scams, ranging from shopping scams to pet scams, globally cause millions of dollars in financial damage every year. In response, the security community has developed highly accurate detection systems able to determine if a website is fraudulent. However, finding candidate scam websites that can be passed as input to these downstream detection systems is challenging: relying on user reports is inherently reactive and slow, and proactive systems issuing search engine queries to return candidate websites suffer from low coverage and do not generalize to new scam types. In this paper, we present LOKI, a system designed to identify search engine queries likely to return a high fraction of fraudulent websites. LOKI implements a keyword scoring model grounded in Learning Under Privileged Information (LUPI) and feature distillation from Search Engine Result Pages (SERPs). We rigorously validate LOKI across 10 major scam categories and demonstrate a 20.58 times improvement in discovery over both heuristic and data-driven baselines across all categories. Leveraging a small seed set of only 1,663 known scam sites, we use the keywords identified by our method to discover 52,493 previously unreported scams in the wild. Finally, we show that LOKI generalizes to previously-unseen scam categories, highlighting its utility in surfacing emerging threats.
Revealing The Secret Power: How Algorithms Can Influence Content Visibility on Social Media
2026-01-01
article · Open access · Senior author
In recent years, the opaque design and the limited public understanding of social networks' recommendation algorithms have raised concerns about potential manipulation of information exposure. Reducing content visibility, aka shadow banning, may help limit harmful content; however, it can also be used to suppress dissenting voices. This prompts the need for greater transparency and a better understanding of this practice. In this paper, we investigate the presence of visibility alterations through a large-scale quantitative analysis of two Twitter/X datasets comprising over 40 million tweets from more than 9 million users, focused on discussions surrounding the Ukraine-Russia conflict and the 2024 US Presidential Elections. We use view counts to detect patterns of reduced or inflated visibility and examine how these correlate with user opinions, social roles, and narrative framings. Our analysis shows that the algorithm systematically penalizes tweets containing links to external resources, reducing their visibility by up to a factor of eight, regardless of the ideological stance or source reliability. Rather, content visibility may be penalized or favored depending on the specific accounts producing it, as observed when comparing tweets from the Kyiv Independent and RT.com or tweets by Donald Trump and Kamala Harris. Overall, our work highlights the importance of transparency in content moderation and recommendation systems to protect the integrity of public discourse and ensure equitable access to online platforms.
Lessons Learned from Anomaly Detection in Chameleon Cloud
2025-09-23
article
Cloud computing has become integral to modern technology infrastructure, supporting a wide range of services from e-commerce to AI applications. Chameleon is a large-scale, configurable testbed designed to enable edge-to-cloud research through full bare-metal provisioning, virtualization, and diverse hardware resources, and is built on OpenStack, a leading open source cloud platform. However, monitoring Chameleon’s heterogeneous infrastructure is challenging, particularly across OpenStack services and hardware components. Traditional threshold-based alerting methods struggle to keep up with the scale and complexity of such environments. In this work, we present an anomaly detection framework for OpenStack services in the Chameleon Cloud. We curate and publish the first dataset of resource usage metrics collected from OpenStack control plane services. We evaluate four state-of-the-art unsupervised multivariate time series models, namely TranAD, Prodigy, USAD, and OmniAnomaly, on this dataset and share key insights from deploying them. Our findings indicate that for our use case, while all models achieve high F1 scores, training with three days of healthy data effectively balances training cost and detection accuracy.
Timeliness Matters: Leveraging Reinforcement Learning on Social Media Data to Prioritize High-Risk Conversations for Promoting Youth Online Safety
Proceedings of the International AAAI Conference on Web and Social Media · 2025-06-07 · 1 citation
article · Open access
Ensuring the online safety of youth has motivated research towards the development of machine learning (ML) methods capable of accurately detecting social media risks after-the-fact. However, for these detection models to be effective, they must proactively identify high-risk scenarios (e.g., sexual solicitations, cyberbullying) to mitigate harm. This 'real-time' responsiveness is a recognized challenge within the risk detection literature. Therefore, this paper presents a novel two-level framework that first uses reinforcement learning to identify conversation stop points to prioritize messages for evaluation. Then, we optimize state-of-the-art deep learning models to accurately categorize risk priority (low, high). We apply this framework to a time-based simulation using a rich dataset of 23K private conversations with over 7 million messages donated by 194 youth (ages 13-21). We conducted an experiment comparing our new approach to a traditional conversation-level baseline. We found that the timeliness of conversations significantly improved from over 2 hours to approximately 16 minutes with only a slight reduction in accuracy (0.88 to 0.84). This study advances real-time detection approaches for social media data and provides a benchmark for future training reinforcement learning that prioritizes the timeliness of classifying high-risk conversations.
From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs
ArXiv.org · 2025-09-01
preprint · Open access · Senior author
High-quality datasets of real-world vulnerabilities and their corresponding verifiable exploits are crucial resources in software security research. Yet such resources remain scarce, as their creation demands intensive manual effort and deep security expertise. In this paper, we present CVE-GENIE, an automated, large language model (LLM)-based multi-agent framework designed to reproduce real-world vulnerabilities, provided in Common Vulnerabilities and Exposures (CVE) format, to enable creation of high-quality vulnerability datasets. Given a CVE entry as input, CVE-GENIE gathers the relevant resources of the CVE, automatically reconstructs the vulnerable environment, and (re)produces a verifiable exploit. Our systematic evaluation highlights the efficiency and robustness of CVE-GENIE's design and successfully reproduces approximately 51% (428 of 841) CVEs published in 2024-2025, complete with their verifiable exploits, at an average cost of $2.77 per CVE. Our pipeline offers a robust method to generate reproducible CVE benchmarks, valuable for diverse applications such as fuzzer evaluation, vulnerability patching, and assessing AI's security capabilities.
Mirror Mirror on the Wall, which APK Mirror Site is the Largest of Them All?
2025-10-28
article

Recent grants

Collaborative Research: SaTC: TTP: Medium: iDRAMA.cloud: A Platform for Measuring and Understanding Information Manipulation
NSF · $226k · 2023–2025
CAREER: Towards Data-Driven Methods to Counter Online Aggression
NSF · $549k · 2020–2025
SaTC: CORE: Small: Enabling the Automated Delivery of Context-Aware Notifications
NSF · $77k · 2024–2025
Collaborative Research: SaTC: CORE: Small: Detecting Accounts Involved in Influence Campaigns on Social Media
NSF · $280k · 2021–2024
Collaborative Research: SaTC: CORE: Small: Research on Concurrent Inauthentic Account and Narrative Detection
NSF · $175k · 2024–2026

Frequent coauthors

Jeremy Blackburn
157 shared
Emiliano De Cristofaro
University of California, Riverside
128 shared
Savvas Zannettou
Delft University of Technology
92 shared
Michael Sirivianos
Cyprus University of Technology
49 shared
Nicolas Kourtellis
Telefonica Research and Development
45 shared
Ilias Leontiadis
Meta (United Kingdom)
42 shared
Enrico Mariconti
31 shared
Adam Doupé
31 shared

Labs

Data Mining & Data Management · PI

Education

Ph.D., Computer Science
University of California, San Diego
2012
M.S., Computer Science
University of California, San Diego
2009
B.S., Computer Science
University of Rome 'La Sapienza'
2007

Awards & honors

Facebook Secure the Internet Grant (2018)
Google Faculty Research Award (2015)
Symantec Research Labs Fellowship (2012)
Multiple best paper awards
