Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime

Jiangyun Sheng

Clinical Assistant Professor of General Dentistry

Boston University · Department of General Dentistry

Active 2011–2026

h-index: 12
Citations: 328
Papers: 222 last 5y
Funding

About

Jiangyun Sheng is a Clinical Assistant Professor of General Dentistry at Boston University's Henry M. Goldman School of Dental Medicine. He earned a BDS from Beijing Medical University School of Stomatology in 1987, a PhD in Oral Microbiology from Shanghai Second Medical University School of Stomatology in 1999, a CAGS (AEGD) from the University of Rochester in 2009, and a DMD from the Henry M. Goldman School of Dental Medicine in 2011. His training spans clinical general dentistry and oral microbiology.

Research topics

  • Computer Science
  • Programming language
  • Embedded system
  • Operating system
  • Software engineering

Selected publications

  • Towards Fine-Grained Document Tampering Detection: New Dataset and Benchmark

    Lecture Notes in Computer Science · 2026-01-01

    book chapter
  • Solving Inequality Proofs with Large Language Models

    ArXiv.org · 2025-06-09

    preprint · Open access

    Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, synthetic, or rigidly formal. We address this by proposing an informal yet verifiable task formulation, recasting inequality proving into two automatically checkable subtasks: bound estimation and relation prediction. Building on this, we release IneqMath, an expert-curated dataset of Olympiad-level inequalities, including a test set and training corpus enriched with step-wise solutions and theorem annotations. We also develop a novel LLM-as-judge evaluation framework, combining a final-answer judge with four step-wise judges designed to detect common reasoning flaws. A systematic evaluation of 29 leading LLMs on IneqMath reveals a surprising reality: even top models like o1 achieve less than 10% overall accuracy under step-wise scrutiny; this is a drop of up to 65.5% from their accuracy considering only final answer equivalence. This discrepancy exposes fragile deductive chains and a critical gap for current LLMs between merely finding an answer and constructing a rigorous proof. Scaling model size and increasing test-time computation yield limited gains in overall proof correctness. Instead, our findings highlight promising research directions such as theorem-guided reasoning and self-refinement. Code and data are available at https://ineqmath.github.io/.

  • Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

    ACM Transactions on Software Engineering and Methodology · 2025-10-03 · 3 citations

    article

    LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code using an infilling-style technique or directly generate patches when provided with buggy methods, aiming for plausible patches that pass all tests. However, most LLM-based APR methods rely on a single type of software information, such as issue descriptions or error stack traces, without fully leveraging a combination of diverse software artifacts. Human developers, in contrast, often use a range of information, such as debugging data, issue discussions, and error stack traces, to diagnose and fix bugs. Despite this, many LLM-based approaches do not explore which specific types of software information best assist in localizing and repairing software bugs. Addressing this gap is crucial for advancing LLM-based APR techniques. To investigate this and mimic the way human developers fix bugs, we propose DEVLoRe (short for DEVeloper LOcalization and REpair). In this framework, LLMs first use issue content (description and discussion) and error stack traces to localize buggy methods, then rely on debugging information in those methods, together with issue content and error stack traces, to localize buggy lines and generate valid patches. We evaluated the effectiveness of issue content, error stack traces, and debugging information in bug localization and automatic program repair. Our results show that while issue content and error stack traces are particularly effective in assisting LLMs with fault localization and program repair, respectively, different types of software artifacts complement each other in addressing various bugs. By incorporating these three types of artifacts and using the Defects4J v2.0 dataset for evaluation, DEVLoRe successfully localizes 49.3% of single-method bugs and generates 56.0% plausible patches. Additionally, DEVLoRe can localize 47.6% of non-single-method bugs and generates 14.5% plausible patches. Moreover, our framework streamlines the end-to-end process from buggy source code to a complete repair, achieving repair rates of 39.7% and 17.1% for single-method and non-single-method bugs, respectively, and outperforming current state-of-the-art APR methods. Furthermore, we re-implemented and evaluated our framework, demonstrating its effectiveness in resolving 9 unique issues compared to other state-of-the-art frameworks using the same or more advanced models on SWE-bench Lite. We also discussed whether a leading framework for Python code can be directly applied to Java code, or vice versa. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.

  • Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

    arXiv (Cornell University) · 2024

    • Computer Science
    • Software engineering

    LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code or directly generate patches when provided with buggy methods. However, most LLM-based APR methods rely on a single type of software information, without fully leveraging different software artifacts. Despite this, many LLM-based approaches do not explore which specific types of information best assist in APR. Addressing this gap is crucial for advancing LLM-based APR techniques. We propose DEVLoRe, which uses issue content (description and message) and error stack traces to localize buggy methods, then relies on debugging information in buggy methods, together with issue content and error stack traces, to localize buggy lines and generate plausible patches that pass all unit tests. The results show that while issue content is particularly effective in assisting LLMs with fault localization and program repair, different types of software artifacts complement each other. By incorporating different artifacts, DEVLoRe successfully locates 49.3% and 47.6% of single and non-single buggy methods and generates 56.0% and 14.5% plausible patches for the Defects4J v2.0 dataset, respectively. This outperforms current state-of-the-art APR methods. Furthermore, we re-implemented and evaluated our framework, demonstrating its effectiveness in resolving 9 unique issues compared to other state-of-the-art frameworks using the same or more advanced models on SWE-bench Lite. We also discussed whether a leading framework for Python code can be directly applied to Java code, or vice versa. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.

  • MOCHA

    2021 · 11 citations

    • Computer Science
    • Operating system

    FPGAs have been widely deployed in public clouds, e.g., Amazon Web Services (AWS) and Huawei Cloud. However, simply offloading accelerated kernels from CPU hosts to PCIe-based FPGAs does not guarantee out-of-pocket cost savings in a pay-as-you-go public cloud. Taking Genome Analysis Toolkit (GATK) applications as case studies, although adopting FPGAs reduces overall execution time, it introduces 2.56× extra cost because, by Amdahl's law, the application-level speedup is insufficient. To optimize out-of-pocket cost while keeping high speedup and throughput, we propose Mocha, a distributed runtime framework that fully utilizes accelerator resources through accelerator sharing and partial CPU-FPGA task offloading. Evaluation results on Haplotype Caller (HTC) and Mutect2 in GATK show that, compared with a straightforward CPU-FPGA integration solution, Mocha reduces application cost by 2.82× for HTC and 1.06× for Mutect2 on AWS, and by 1.22× and 1.52×, respectively, on Huawei Cloud, with less than 5.1% performance overhead.

  • Fully Integrated On-FPGA Molecular Dynamics Simulations

    arXiv (Cornell University) · 2019-05-14 · 8 citations

    preprint · Open access

    The implementation of Molecular Dynamics (MD) on FPGAs has received substantial attention. Previous work, however, has consisted of either proof-of-concept implementations of components, usually the range-limited force; full systems, but with much of the work shared by the host CPU; or prototype demonstrations, e.g., using OpenCL, that neither implement a whole system nor have competitive performance. In this paper, we present what we believe to be the first full-scale FPGA-based simulation engine, and show that its performance is competitive with a GPU (running Amber in an industrial production environment). The system features on-chip particle data storage and management, short- and long-range force evaluation, as well as bonded forces, motion update, and particle migration. Other contributions of this work include exploring numerous architectural trade-offs and analysis of various mapping schemes among particles/cells and the various on-chip compute units. The potential impact is that this system promises to be the basis for long timescale Molecular Dynamics with a commodity cluster.

  • Fully integrated FPGA molecular dynamics simulations

    2019-11-07 · 48 citations

    article · Open access

    The implementation of Molecular Dynamics (MD) on FPGAs has received substantial attention. Previous work, however, has consisted of either proof-of-concept implementations of components, usually the range-limited force; full systems, but with much of the work shared by the host CPU; or prototype demonstrations, e.g., using OpenCL, that neither implement a whole system nor have competitive performance. In this paper, we present what we believe to be the first full-scale FPGA-based simulation engine, and show that its performance is competitive with a GPU (running Amber in an industrial production environment). The system features on-chip particle data storage and management, short- and long-range force evaluation, as well as bonded forces, motion update, and particle migration. Other contributions of this work include exploring numerous architectural trade-offs and analysis of various mapping schemes among particles/cells and the various on-chip compute units. The potential impact is that this system promises to be the basis for long timescale Molecular Dynamics with a commodity cluster.

  • Molecular Dynamics Range-Limited Force Evaluation Optimized for FPGAs

    2019-07-01 · 16 citations

    article

    FPGA Molecular Dynamics was studied extensively from 2004 to 2010. Due to the limited chip resources of that era, and the inherent variety and complexity of tasks comprising Molecular Dynamics (MD) simulations, those FPGA accelerators relied on host or embedded processors to organize and pre-process input and output data. This introduced long latency for data movement between simulation iterations and, as technology advanced, drastically limited performance. Current-generation FPGAs are equipped not only with abundant on-chip resources but also with hardware support for floating-point operations; these advances provide an opportunity for creating self-contained MD simulation systems on a single device. In this paper, we demonstrate such a system based on the range-limited force, which comprises 90% of the flops in a typical MD simulation. It features online particle-pair generation, hundreds of force evaluation pipelines, motion update, and particle data migration. We integrate it into OpenMM and find that, for a representative dataset (liquid argon with 20K atoms), we can achieve a simulation throughput of 1.4 μs/day with a single FPGA, more than twice the performance of a comparable-generation GPU. The bulk of the work presented here explores the design of an independent MD range-limited force evaluation system tailored for modern FPGAs without data exchange with any off-chip devices. The primary contributions are the designs of the new features, the methods for coupling those features into an integrated system, and, especially, the analysis of the most likely mappings among particles/cells, on-chip memories (BRAMs), and on-chip compute units (pipelines).

  • High Performance Dynamic Communication on Reconfigurable Clusters

    2018-04-01 · 13 citations

    article · 1st author · Corresponding

    FPGA clusters with the FPGAs directly linked through their Multi-Gigabit Transceiver (MGT) ports have a proven advantage over other commodity architectures for communication-bound applications. We find that standard wormhole routers need some modification to be appropriate for clusters with tightly coupled FPGAs, and we create such a router. We generalize this router so that it is parameterized by routing algorithm, arbitration policy, virtual channels, and buffers, among other parameters. We have evaluated these designs with respect to a number of standard communication patterns and packet sizes. These results enable selection of the appropriate router for any resource budget. Finally, we find that the optimality of the router design varies significantly with workload. We observe that for a 512-FPGA cluster connected in an 8³ torus, compared with the router configuration with the best average performance, application-aware router configurations reduce average batch latency by 3%, improve throughput by 6% on average, and reduce area consumption by 50%.

  • High Performance Communication on Reconfigurable Clusters

    2018-08-01 · 7 citations

    article · 1st author · Corresponding

    FPGA clusters with the FPGAs directly linked through their Multi-Gigabit Transceivers (MGTs) have a proven advantage over other commodity architectures for communication-bound applications. To date, however, communication infrastructure for such clusters has generally taken one of two approaches: nearest-neighbor only, which is fast but has limited utility, and processor-based, which is general but relatively slow. What is needed is a systematic exploration of the communication microarchitecture of these systems, as has been done for HPC clusters and for Networks on Chip (NoC) on both FPGAs and ASICs. Our first contribution is finding that the properties of clusters of tightly coupled FPGAs substantially influence the router design space. We create a candidate router and generalize it so that it is parameterized by routing algorithm, arbitration policy, and virtual channels (VCs). We have created a cycle-accurate simulator validated on a four-FPGA system. We evaluate the design space with respect to a number of standard communication patterns and packet sizes. These results enable selection of the appropriate router for any resource budget. We find that the optimality of the router design varies significantly with workload. We present a framework that helps determine appropriate parameters for different applications and generates the HDL design. We observe that for a 512-FPGA cluster, compared with the router configuration with the best average performance, application-aware router selection can lead to substantial improvement in performance or reduction in area.

Frequent coauthors

  • Martin C. Herbordt

    Boston University

    17 shared
  • Chen Yang

    Xi'an Jiaotong University

    8 shared
  • Minge Jing

    4 shared
  • Vipin Sachdeva

    Roivant Sciences (United States)

    4 shared
  • Qingqing Xiong

    Boston University

    4 shared
  • Tianqi Wang

    4 shared
  • Ahmed Sanaullah

    Red Hat (United States)

    4 shared
  • Di Wu

    University of Central Florida

    3 shared

Education

  • DMD

    Henry M. Goldman School of Dental Medicine

    2011
  • CAGS, AEGD

    University of Rochester

    2009
  • Ph.D., Oral Microbiology

    Shanghai Second Medical University School of Stomatology

    1999
  • BDS

    Beijing Medical University School of Stomatology

    1987

See your match with Jiangyun Sheng

PhdFit ranks faculty by your research interests, methods, and publications, grounded in their actual work rather than templates.

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

  • 30-second signup