Hao Peng · Assistant Professor · Verified

University of Illinois Urbana-Champaign · Computer Science

Active 2012–2022

h-index: 14
Citations: 882
Papers291 last 5y

About

Hao Peng is an Assistant Professor at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign, where he started in 2023. He holds a Ph.D. (2022) from the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a B.S. (2016) from the School of Electronics Engineering and Computer Science at Peking University. His research interests include Natural Language Processing, Computational Linguistics, Machine Learning, Large Language Models, and AI for Science. He has taught courses such as Natural Language Processing and Efficiency in NLP, and has taken part in interdisciplinary research projects, including collaborations with colleagues from the Hebrew University of Jerusalem. His recent work involves context-length generalization of large language models and the development of AI tools to advance drug research and materials science. He has received awards for his research papers and has participated in conferences and lectures.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Information Retrieval
  • Data Mining
  • Natural Language Processing
  • World Wide Web
  • Data science

Selected publications

  • A Meta-framework for Spatiotemporal Quantity Extraction from Text

    Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · 2022 · 7 citations

    • Computer Science
    • Data Mining

    News events are often associated with quantities (e.g., the number of COVID-19 patients or the number of arrests in a protest), and it is often important to extract their type, time, and location from unstructured text in order to analyze these quantity events. This paper thus formulates the NLP problem of spatiotemporal quantity extraction, and proposes the first meta-framework for solving it. This meta-framework contains a formalism that decomposes the problem into several information extraction tasks, a shareable crowdsourcing pipeline, and transformer-based baseline models. We demonstrate the meta-framework in three domains (the COVID-19 pandemic, Black Lives Matter protests, and 2020 California wildfires) to show that the formalism is general and extensible, the crowdsourcing pipeline facilitates fast and high-quality data annotation, and the baseline system can handle spatiotemporal quantity extraction well enough to be practically useful. We release all resources for future research on this topic.

  • Toward any-language zero-shot topic classification of textual documents

    Artificial Intelligence · 2019-02-13 · 18 citations

    article
  • KnowSemLM: A Knowledge Infused Semantic Language Model

    2019-01-01 · 16 citations

    article · Open access · 1st author · Corresponding

    Prior knowledge, both statistical and declarative, is essential in guiding such expectations. While existing semantic language models (SemLM) capture event co-occurrence information by modeling event sequences as semantic frames, entities, and other semantic units, this paper aims at augmenting them with causal knowledge (i.e., one event is likely to lead to another). Such knowledge is modeled at the frame and entity level, and can be obtained either statistically from text or stated declaratively. The proposed method, KnowSemLM, infuses this knowledge into a semantic LM by joint training and inference, and is shown to be effective on both the event cloze test and story/referent prediction tasks.

  • CogCompTime: A Tool for Understanding Time in Natural Language Text

    arXiv (Cornell University) · 2019-06-12 · 1 citation

    preprint · Open access

    Automatic extraction of temporal information in text is an important component of natural language understanding. It involves two basic tasks: (1) Understanding time expressions that are mentioned explicitly in text (e.g., February 27, 1998 or tomorrow), and (2) Understanding temporal information that is conveyed implicitly via relations. In this paper, we introduce CogCompTime, a system that has these two important functionalities. It incorporates the most recent progress, achieves state-of-the-art performance, and is publicly available. We believe that this demo will be useful for multiple time-aware applications and provide valuable insight for future research in temporal understanding.

  • Solving Hard Coreference Problems

    arXiv (Cornell University) · 2019-07-11 · 2 citations

    preprint · Open access · 1st author · Corresponding

    Coreference resolution is a key problem in natural language understanding that still escapes reliable solutions. One fundamental difficulty has been that of resolving instances involving pronouns since they often require deep language understanding and use of background knowledge. In this paper, we propose an algorithmic solution that involves a new representation for the knowledge required to address hard coreference problems, along with a constrained optimization framework that uses this knowledge in coreference decision making. Our representation, Predicate Schemas, is instantiated with knowledge acquired in an unsupervised way, and is compiled automatically into constraints that impact the coreference decision. We present a general coreference resolution system that significantly improves state-of-the-art performance on hard, Winograd-style, pronoun resolution cases, while still performing at the state-of-the-art level on standard coreference resolution datasets.

  • CogCompTime: A Tool for Understanding Time in Natural Language

    2018-01-01 · 66 citations

    article · Open access

    Automatic extraction of temporal information is important for natural language understanding. It involves two basic tasks: (1) Understanding time expressions that are mentioned explicitly in text (e.g., February 27, 1998 or tomorrow), and (2) Understanding temporal information that is conveyed implicitly via relations. This paper introduces CogCompTime, a system that has these two important functionalities. It incorporates the most recent progress, achieves state-of-the-art performance, and is publicly available. We believe that this demo will provide valuable insight for temporal understanding and be useful for multiple time-aware applications.

  • Improving Temporal Relation Extraction with a Globally Acquired Statistical Resource

    arXiv (Cornell University) · 2018-04-17 · 4 citations

    preprint · Open access

    Extracting temporal relations (before, after, overlapping, etc.) is a key aspect of understanding events described in natural language. We argue that this task would gain from the availability of a resource that provides prior knowledge in the form of the temporal order that events usually follow. This paper develops such a resource -- a probabilistic knowledge base acquired in the news domain -- by extracting temporal relations between events from the New York Times (NYT) articles over a 20-year span (1987--2007). We show that existing temporal extraction systems can be improved via this resource. As a byproduct, we also show that interesting statistics can be retrieved from this resource, which can potentially benefit other time-aware tasks. The proposed system and resource are both publicly available.

  • Improving Temporal Relation Extraction with a Globally Acquired Statistical Resource

    2018-01-01 · 61 citations

    preprint · Open access

    Qiang Ning, Hao Wu, Haoruo Peng, Dan Roth. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

  • Understanding stories via event sequence modeling

    Illinois Digital Environment for Access to Learning and Scholarship (University of Illinois at Urbana-Champaign) · 2018-07-03

    article · Open access · 1st author · Corresponding

    Understanding stories, i.e. sequences of events, is a crucial yet challenging natural language understanding (NLU) problem, which requires dealing with multiple aspects of semantics, including actions, entities and emotions, as well as background knowledge. In this thesis, towards the goal of building an NLU system that can model what has happened in stories and predict what would happen in the future, we contribute on three fronts: first, we investigate the optimal way to model events in text; second, we study how we can model a sequence of events with a balance of generality and specificity; third, we improve event sequence modeling by joint modeling of semantic information and incorporating background knowledge.

    Each of the above three research problems poses both conceptual and computational challenges. For event extraction, we find that Semantic Role Labeling (SRL) signals can serve as good intermediate representations for events, thus giving us the ability to reliably identify events with minimal supervision. In addition, since it is important to resolve co-referred entities for extracted events, we make improvements to an existing co-reference resolution system. To model event sequences, we start by studying within-document event co-reference (the simplest flow of events), and then extend to modeling two other, more natural event sequences along with discourse phenomena while abstracting over the specific mentions of predicates and entities. We further identify problems with the basic event sequence models, which fail to capture multiple semantic aspects and background knowledge. We then improve our system by jointly modeling frames, entities and sentiments, yielding joint representations of all these semantic aspects, while also incorporating explicit background knowledge acquired from other corpora as well as human experience.

    For all tasks, we evaluate the developed algorithms and models on benchmark datasets and achieve better performance compared to other highly competitive methods.

  • Regions, Periods, Activities

    2017-04-03 · 150 citations

    article · Open access

    With the ever-increasing urbanization process, systematically modeling people's activities in the urban space is being recognized as a crucial socioeconomic task. This task was nearly impossible years ago due to the lack of reliable data sources, yet the emergence of geo-tagged social media (GTSM) data sheds new light on it. Recently, there have been fruitful studies on discovering geographical topics from GTSM data. However, their high computational costs and strong distributional assumptions about the latent topics hinder them from fully unleashing the power of GTSM.

Frequent coauthors

  • Dan Roth

    23 shared
  • Yangqiu Song

    6 shared
  • Shyam Upadhyay

    5 shared
  • Mark Sammons

    5 shared
  • Chen-Tse Tsai

    4 shared
  • Qiang Ning

    3 shared
  • Shuchang Zhou

    3 shared
  • Hao Wu

    2 shared

Education

  • Ph.D., Computer Science

    University of Illinois at Urbana-Champaign

    2010
  • M.S., Computer Science

    University of Illinois at Urbana-Champaign

    2006
  • B.S., Computer Science

    University of Science and Technology of China

    2002

Awards & honors

  • Outstanding Paper Award at NAACL 2024