Poulami Das

Assistant Professor · Verified

University of Texas at Austin · Electrical and Computer Engineering

Active 2002–2025

h-index: 14
Citations: 618
Papers: 63 (35 in the last 5 years)

About

Poulami Das is an Assistant Professor and a Fellow of the Advanced Micro Devices (AMD) Chair in Computer Engineering in the Chandra Family Department of Electrical and Computer Engineering at The University of Texas at Austin. Her research focuses on software and architecture for improving the reliability of quantum computers. She is also interested in computer architecture, memory systems, and emerging technologies. Before joining the UT Austin faculty, she earned her PhD from Georgia Tech and her MS from UT Austin.

Research topics

  • Computer Science
  • Algorithm
  • Parallel computing
  • Computer engineering

Selected publications

  • Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs

    ArXiv.org · 2025-03-02

    preprint · Open access · Senior author

    Autoregressive Transformers rely on Key-Value (KV) caching to accelerate inference. However, the linear growth of the KV cache with context length leads to excessive memory consumption and bandwidth constraints. This bottleneck is particularly problematic in real-time applications -- such as chatbots and interactive assistants -- where low latency and high memory efficiency are critical. Existing methods drop distant tokens or compress states in a lossy manner, sacrificing accuracy by discarding vital context or introducing bias. We propose MorphKV, an inference-time technique that maintains a constant-sized KV cache while preserving accuracy. MorphKV balances long-range dependencies and local coherence during text generation. It eliminates early-token bias while retaining high-fidelity context by adaptively ranking tokens through correlation-aware selection. Unlike heuristic retention or lossy compression, MorphKV iteratively refines the KV cache via lightweight updates guided by attention patterns of recent tokens. This approach captures inter-token correlation with greater accuracy, crucial for tasks like content creation and code generation. Our studies on long-response tasks show 52.9% memory savings and 18.2% higher accuracy on average compared to state-of-the-art prior works, enabling efficient real-world deployment.

  • HiSpec: Hierarchical Speculative Decoding for LLMs

    ArXiv.org · 2025-10-01

    preprint · Open access · Senior author

    Speculative decoding accelerates LLM inference by using a smaller draft model to speculate tokens that a larger target model verifies. Verification is often the bottleneck (e.g., verification is 4× slower than token generation when a 3B model speculates for a 70B target model), but most prior works focus only on accelerating drafting. "Intermediate" verification reduces verification time by discarding inaccurate draft tokens early, but existing methods incur substantial training overheads in incorporating the intermediate verifier, increase the memory footprint to orchestrate the intermediate verification step, and compromise accuracy by relying on approximate heuristics. We propose Hierarchical Speculative Decoding (HiSpec), a framework for high-throughput speculative decoding that exploits early-exit (EE) models for low-overhead intermediate verification. EE models allow tokens to exit early by skipping layer traversal and are explicitly trained so that hidden states at selected layers can be interpreted, making them uniquely suited for intermediate verification without drastically increasing compute and memory overheads. To improve resource-efficiency even further, we design a methodology that enables HiSpec to re-use key-value caches and hidden states between the draft, intermediate verifier, and target models. To maintain accuracy, HiSpec periodically validates the draft tokens accepted by the intermediate verifier against the target model. Our evaluations using various representative benchmarks and models show that HiSpec improves throughput by 1.28× on average and by up to 2.01× compared to the baseline single-layer speculation without compromising accuracy.

  • decoder-bench: Benchmarking Decoders for Quantum Error Correction

    Zenodo (CERN European Organization for Nuclear Research) · 2025-10-12

    dataset · Open access

    As the field of quantum computing moves towards the realization of Quantum Error Correction (QEC), an increasing amount of attention has been paid to the implementation of decoders--classical systems analyzing the state of error on the quantum device in real-time. However, today there is a lack of tools available to characterize and compare the performance of decoders. In this work, we address this need and introduce decoder-bench, a framework for benchmarking decoders on relevant QEC code traces. decoder-bench integrates with Stim, a high performance tool for analyzing stabilizer circuits, to create traces for a variety of QEC codes and fault-tolerant subroutines. We use decoder-bench to evaluate the accuracy and latency performance of multiple decoders for circuit-level simulations of color code memory experiments, bivariate-bicycle code memory experiments, surface code memory experiments, and surface code lattice surgery experiments.

  • MoPAC: Efficiently Mitigating Rowhammer with Probabilistic Activation Counting

    2025-06-20 · 4 citations

    article · Open access

    Rowhammer has worsened over the last decade. Existing in-DRAM solutions, such as TRR, were broken with simple patterns. In response, the recent DDR5 JEDEC standards modify the DRAM array to enable Per-Row Activation Counters (PRAC) for tracking aggressor rows. They also extend the DRAM timings to support the operations required to update the PRAC counters. Unfortunately, the increased memory timings cause significant performance overheads (on average 10%) even for benign applications and even at current Rowhammer thresholds. The goal of this paper is to minimize the slowdown of PRAC while retaining the security benefits of PRAC. This paper proposes Mitigating Rowhammer with Probabilistic Activation Counts (MoPAC), which reduces the slowdown of updating the PRAC counters by performing the updates probabilistically, thereby incurring the latency overhead of counter updates for only a small subset of activations. To ensure security in the presence of probabilistic counters, MoPAC adjusts the threshold at which the row undergoes mitigation. We propose two variants of MoPAC: MoPAC-C (Memory-Controller Side) and MoPAC-D (DRAM Side). MoPAC-C relies on having two types of precharge commands: one that incurs normal latency and does not do counter updates, and the other that incurs higher latency and performs counter updates. MoPAC-C probabilistically chooses when the longer precharge must be used to perform update of the PRAC counter. MoPAC-D is a completely in-DRAM solution that probabilistically selects which activations will be selected for performing counter updates and obtains the time required for counter-updates using ALERT or REF. Our evaluations show that, for a Rowhammer threshold of 500 (10× lower than current thresholds), MoPAC-C and MoPAC-D incur an average slowdown of only 1.7% and 0.7%, much less than the 10% incurred by PRAC. MoPAC removes one of the major obstacles to the commercial adoption of PRAC.

  • Advancing software quality assurance: methods, security, and AI impact

    2025-08-19

    book-chapter · Senior author

    Software quality assurance (SQA) is crucial in ensuring that developed software meets defined requirements and user expectations. This review paper systematically analyzes literature on SQA practices from 2000-2023, drawing insights from 40 research papers sourced from IEEE Xplore, ACM Digital Library, SpringerLink, and ScienceDirect. The review explores the integration of QA practices throughout the software development lifecycle (SDLC), examining aspects such as cost of quality measurement, maturity models, risk assessment, and the role of emerging technologies like AI/ML in security assurance. Key findings emphasize the importance of continuous code review and testing across all phases, collaborative QA approaches, and economic factors influencing QA adoption. Identified gaps include the need for simulation models to predict QA effort and the validation of maturity models across industries. This analysis highlights future research directions to enhance QA frameworks in line with global software industry needs and technological advancements. By synthesizing these insights, the paper provides a reference for both practitioners and researchers aiming to strengthen SQA methodologies, ensuring the development of robust, secure software systems. The paper concludes with an outline of SDLC models, security integration, and the impact of AI in software quality practices.

  • Optimizing FTQC Programs through QEC Transpiler and Architecture Codesign

    arXiv (Cornell University) · 2024-12-19

    preprint · Open access

    Fault-tolerant quantum computing (FTQC) is essential for executing reliable quantum computations of meaningful scale. Widely adopted QEC codes for FTQC, such as the surface code and color codes, utilize Clifford+T gate sets, where T gates are generally considered as the primary bottleneck due to their high resource costs. Recent advances in T gate optimization have significantly reduced this overhead, making Clifford gate complexity an increasingly critical bottleneck that remains largely unaddressed in present FTQC compiler and architecture designs. To address this new bottleneck, this paper introduces TACO, a Transpiler-Architecture Codesign Optimization framework, to reduce Clifford cost. Specifically, we observe that, through codesign, insights rooted in the FTQC architecture can inform novel circuit-level optimizations for FTQC compilers. These optimizations, in turn, provide new opportunities to redesign and improve the underlying architecture. Evaluations show that TACO achieves an average 91.7% reduction in Clifford gates across diverse quantum circuits and significantly enhances gate parallelism compared to Pauli-based approaches. These improvements enable an efficient FTQC architecture that can achieve single-gate-per-cycle throughput using only $1.5n+4$ logical qubit tiles, considerably pushing forward upon previously proposed designs that require $2n+\sqrt{8n}+1$ tiles. These results highlight the benefits of bidirectional optimization through codesign. TACO will be open-source.

  • Shared-Custodial Password-Authenticated Deterministic Wallets

    Lecture notes in computer science · 2024-01-01 · 3 citations

    book-chapter · 1st author · Corresponding

  • Élivágar: Efficient Quantum Circuit Search for Classification

    arXiv (Cornell University) · 2024-01-17 · 1 citation

    preprint · Open access

    Designing performant and noise-robust circuits for Quantum Machine Learning (QML) is challenging -- the design space scales exponentially with circuit size, and there are few well-supported guiding principles for QML circuit design. Although recent Quantum Circuit Search (QCS) methods attempt to search for performant QML circuits that are also robust to hardware noise, they directly adopt designs from classical Neural Architecture Search (NAS) that are misaligned with the unique constraints of quantum hardware, resulting in high search overheads and severe performance bottlenecks. We present Élivágar, a novel resource-efficient, noise-guided QCS framework. Élivágar innovates in all three major aspects of QCS -- search space, search algorithm and candidate evaluation strategy -- to address the design flaws in current classically-inspired QCS methods. Élivágar achieves hardware-efficiency and avoids an expensive circuit-mapping co-search via noise- and device topology-aware candidate generation. By introducing two cheap-to-compute predictors, Clifford noise resilience and Representational capacity, Élivágar decouples the evaluation of noise robustness and performance, enabling early rejection of low-fidelity circuits and reducing circuit evaluation costs. Due to its resource-efficiency, Élivágar can further search for data embeddings, significantly improving performance. Based on a comprehensive evaluation of Élivágar on 12 real quantum devices and 9 QML applications, Élivágar achieves 5.3% higher accuracy and a 271× speedup compared to state-of-the-art QCS methods.

  • Promatch: Extending the Reach of Real-Time Quantum Error Correction with Adaptive Predecoding

    2024-04-24 · 10 citations

    article · Open access

    Fault-tolerant quantum computing relies on Quantum Error Correction (QEC), which encodes logical qubits into data and parity qubits. Error decoding is the process of translating the measured parity bits into types and locations of errors. To prevent a backlog of errors, error decoding must be performed in real-time (i.e., within 1μs on superconducting machines). Minimum Weight Perfect Matching (MWPM) is an accurate decoding algorithm for surface code, and recent research has demonstrated real-time implementations of MWPM (RT-MWPM) for a distance of up to 9. Unfortunately, beyond d=9, the number of flipped parity bits in the syndrome, referred to as the Hamming weight of the syndrome, exceeds the capabilities of existing RT-MWPM decoders. In this work, our goal is to enable larger distance RT-MWPM decoders by using adaptive predecoding that converts high Hamming weight syndromes into low Hamming weight syndromes, which are accurately decoded by the RT-MWPM decoder.

Frequent coauthors

  • Moinuddin K. Qureshi

    Georgia Institute of Technology

    32 shared
  • Ramin Ayanzadeh

    15 shared
  • Swamit Tannu

    12 shared
  • Debnath Bhattacharyya

    9 shared
  • Narges Alavisamani

    9 shared
  • Samir Kumar Bandyopadhyay

    8 shared
  • Sebastian Faust

    7 shared
  • Douglas M. Carmean

    Microsoft (United States)

    6 shared

Education

  • PhD, Electrical and Computer Engineering

    Georgia Institute of Technology

    2023

Awards & honors

  • Fellow of the Advanced Micro Devices (AMD) Chair in Computer…