Mary Lou Soffa

University of Virginia · Computer Science

Active 1977–2024

h-index: 51

Citations: 10.7k

Papers: 282 (11 in the last 5 years)

Funding: $1.2M

About

Mary Lou Soffa is a professor with a distinguished career in computer science, focusing on areas related to software testing, performance optimization, and computer architecture. Her research encompasses the development of testing frameworks for neural networks using deep generative models, as well as addressing processor over-provisioning on large-scale multi-core platforms. Throughout her career, she has supervised numerous students and post-doctoral researchers, contributing to advancements in dynamic binary parallelization, resource contention mitigation in warehouse-scale computers, and fault detection frameworks. Her work is characterized by a strong emphasis on improving the reliability, performance, and efficiency of computing systems, and she has been actively involved in mentoring the next generation of computer scientists.

Research topics

Computer Science
Artificial Intelligence
Data Mining
Machine Learning
Operating system
Embedded system
Programming language

Selected publications

CIT4DNN: Generating Diverse and Rare Inputs for Neural Networks Using Latent Space Combinatorial Testing
2024-04-12 · 12 citations
article · Open access · Senior author
Deep neural networks (DNNs) are being used in a wide range of applications, including safety-critical systems. Several DNN test generation approaches have been proposed to generate fault-revealing test inputs. However, the existing test generation approaches do not systematically cover the input data distribution to test DNNs with diverse inputs, and none of the approaches investigate the relationship between rare inputs and faults. We propose cit4dnn, an automated black-box approach to generate DNN test sets that are feature-diverse and that comprise rare inputs. cit4dnn constructs diverse test sets by applying combinatorial interaction testing to the latent space of generative models and formulates constraints over the geometry of the latent space to generate rare and fault-revealing test inputs. Evaluation on a range of datasets and models shows that cit4dnn-generated tests are more feature-diverse than the state of the art and can target rare fault-revealing test inputs more effectively than existing methods.
Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing
ACM Transactions on Software Engineering and Methodology · 2022 · 20 citations
Senior author · Corresponding
- Computer Science
- Machine Learning
Testing deep neural networks (DNNs) has garnered great interest in recent years due to their use in many applications. Black-box test adequacy measures are useful for guiding the testing process in covering the input domain. However, the absence of input specifications makes it challenging to apply black-box test adequacy measures in DNN testing. The Input Distribution Coverage (IDC) framework addresses this challenge by using a variational autoencoder to learn a low-dimensional latent representation of the input distribution, and then using that latent space as a coverage domain for testing. IDC applies combinatorial interaction testing on a partitioning of the latent space to measure test adequacy. Empirical evaluation demonstrates that IDC is cost-effective, capable of detecting feature diversity in test inputs, and more sensitive than prior work to test inputs generated using different DNN test generation methods. The findings demonstrate that IDC overcomes several limitations of white-box DNN coverage approaches by discounting coverage from unrealistic inputs and enabling the calculation of test adequacy metrics that capture the feature diversity present in the input space of DNNs.
Message from the Program Chairs
2021-02-27
article · Open access · 1st author · Corresponding
We are pleased to welcome you to CGO 2021, the first virtual CGO conference. The Program Committee was also virtual because of the worldwide infection rate of the coronavirus. On behalf of the Program Committee, we are pleased to present an exciting and stimulating program for the 2021 International Symposium on Code Generation and Optimization.
Artifact: Distribution-Aware Testing of Neural Networks Using Generative Models
2021-05-01
article · Senior author
The artifact used for the experimental evaluation of Distribution-Aware Testing of Neural Networks Using Generative Models is publicly available on GitHub and is reusable. The artifact consists of Python scripts, trained deep neural network model files, and the data required for running the experiments. It is also provided as a VirtualBox VM image for reproducing the paper's results. Users should be familiar with VirtualBox and Linux to reproduce or reuse the artifact.
Distribution-Aware Testing of Neural Networks Using Generative Models
2021-05-01 · 3 citations
preprint · Open access · Senior author
The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today given the increasing number of critical applications being deployed with DNNs. The need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, there have been a number of research efforts focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they are generating are valid, and thus invalid inputs are produced. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by the DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics. To overcome the inclusion of invalid inputs in testing, we propose a technique to incorporate the valid input space of the DNN model under test in the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.
Testing deep neural networks (keynote)
2020-11-15
article · 1st author · Corresponding
The reliability of software that has a Deep Neural Network (DNN) as a component is urgently important today given the increasing number of critical applications being deployed with DNNs. The need for reliability raises a need for rigorous testing of the safety and trustworthiness of these systems. In the last few years, there have been a number of research efforts focused on testing DNNs. However, the test generation techniques proposed so far lack a check to determine whether the test inputs they are generating are valid, and thus invalid inputs are produced. To illustrate this situation, we explored three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the test inputs generated by the DNN testing techniques and showed how invalid test inputs can falsely inflate test coverage metrics. To overcome the inclusion of invalid inputs in testing, we propose a technique to incorporate the valid input space of the DNN model under test in the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Results of our empirical studies show that our technique is effective in eliminating invalid tests and boosting the number of valid test inputs generated.
A Language for Autonomous Vehicles Testing Oracles
arXiv (Cornell University) · 2020-06-17 · 1 citation
preprint · Open access
Testing autonomous vehicles (AVs) requires complex oracles to determine whether an AV's behavior conforms with specifications and humans' expectations. Available open-source oracles are tightly embedded in the AV simulation software and are developed and implemented in an ad hoc way. We propose a domain-specific language that enables defining oracles independent of the AV solutions and the simulator. A testing analyst can encode safety, liveness, timeliness, and temporal properties in our language. To show the expressiveness of our language, we implement three different types of available oracles. We find that the same AV solutions may be ranked significantly differently across existing oracles; thus, existing oracles do not evaluate AVs in a consistent manner.
Is Rust used safely by software developers?
2020 · 57 citations
Senior author · Corresponding
- Computer Science
- Operating system
Rust, an emerging programming language with explosive growth, provides a robust type system that enables programmers to write memory-safe and data-race free code. To allow access to a machine's hardware and to support low-level performance optimizations, a second language, Unsafe Rust, is embedded in Rust. It contains support for operations that are difficult to statically check, such as C-style pointers for access to arbitrary memory locations and mutable global variables. When a program uses these features, the compiler is unable to statically guarantee the safety properties Rust promotes. In this work, we perform a large-scale empirical study to explore how software developers are using Unsafe Rust in real-world Rust libraries and applications. Our results indicate that software engineers use the keyword unsafe in less than 30% of Rust libraries, but more than half cannot be entirely statically checked by the Rust compiler because of Unsafe Rust hidden somewhere in a library's call chain. We conclude that although the use of the keyword unsafe is limited, the propagation of unsafeness offers a challenge to the claim of Rust as a memory-safe language. Furthermore, we recommend changes to the Rust compiler and to the central Rust repository's interface to help Rust software developers be aware of when their Rust code is unsafe.
ESEC/FSE 2019 - A Statistics-based Performance Testing Methodology for Cloud Applications
Figshare · 2019-01-01
article · Open access · Senior author
These are the experimental result data sets for the ESEC/FSE paper “A Statistics-based Performance Testing Methodology for Cloud Applications”, including source code and data. For details, please refer to the Install and Readme files.
A statistics-based performance testing methodology for cloud applications
2019-08-09 · 60 citations
article · Senior author
The low cost of resource ownership and flexibility have led users to increasingly port their applications to the clouds. To fully realize the cost benefits of cloud services, users usually need to reliably know the execution performance of their applications. However, due to the random performance fluctuations experienced by cloud applications, the black box nature of public clouds and the cloud usage costs, testing on clouds to acquire accurate performance results is extremely difficult. In this paper, we present a novel cloud performance testing methodology called PT4Cloud. By employing non-parametric statistical approaches of likelihood theory and the bootstrap method, PT4Cloud provides reliable stop conditions to obtain highly accurate performance distributions with confidence bands. These statistical approaches also allow users to specify intuitive accuracy goals and easily trade between accuracy and testing cost. We evaluated PT4Cloud with 33 benchmark configurations on Amazon Web Service and Chameleon clouds. When compared with performance data obtained from extensive performance tests, PT4Cloud provides testing results with 95.4% accuracy on average while reducing the number of test runs by 62%. We also propose two test execution reduction techniques for PT4Cloud, which can reduce the number of test runs by 90.1% while retaining an average accuracy of 91%. We compared our technique to three other techniques and found that our results are much more accurate.

Recent grants

Collaborative Research: CSR--AES--Debugging Dynamic Code Modifications
NSF · $110k · 2005–2007
CPA-CPL-T: Collaborative Research: REEact: A Robust Execution Environment for Fragile Multicore Systems
NSF · $566k · 2008–2013
CSR: Medium: Collaborative Research: Scaling the Implicitly Parallel Programming Model with Lifelong Thread Extraction and Dynamic Adaptation
NSF · $262k · 2010–2014
SHF: SMALL: Collaborative Research: Cloud Mentoring: Guiding Cloud Users for Cost Performance through Testing and Recommendation
NSF · $312k · 2016–2020

Frequent coauthors

Rajiv Gupta
University of California, Riverside
74 shared
Bruce R. Childers
University of Pittsburgh
30 shared
Jack W. Davidson
19 shared
Jason Mars
17 shared
David A. Berson
Intel (United Kingdom)
14 shared
Wei Wang
14 shared
Atif M. Memon
Apple (United States)
14 shared
Rastislav Bodík
Google (United States)
13 shared

Labs

Mary Lou Soffa's Lab · PI
Research in software engineering, computer architecture, and parallel computing

Awards & honors

Fellow of the Association for Computing Machinery (ACM)
Fellow of The Institute of Electrical and Electronic Enginee…
Ken Kennedy Award (2012)
Anita Borg Technical Leadership Award (2011)
ACM SIGSOFT Influential Educator Award (2014)
