Alin Dobra

Ph.D. · Associate Professor

University of Florida · Computer & Information Science & Engineering

Active 2001–2023

h-index 22

Citations 3.1k

Papers784 last 5y

Funding $600k



About

Alin Dobra is an Associate Professor in the Department of Computer & Information Science & Engineering at the University of Florida. The publications listed below center on motif counting and pattern discovery in biological and multilayer networks and on supervised dimensionality reduction for high-dimensional transcription data. His listed NSF grants funded work on approximate query processing and on a framework for large data analysis.


Research topics

Computer Science
Data Mining
Artificial Intelligence
Machine Learning
Data science
Computational biology
Biology
Computer network
Mathematics
Theoretical computer science

Selected publications

Optimal Supervised Reduction of High Dimensional Transcription Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics · 2023 · 1 citation
- Computer Science
- Data Mining
The plight of navigating high-dimensional transcription datasets remains a persistent problem. This problem is further amplified for complex disorders, such as cancer, as these disorders are often multigenic traits with multiple subsets of genes collectively affecting the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. To accomplish both tasks simultaneously for very high dimensional transcriptome for complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST accomplishes both tasks simultaneously by significantly reducing the dimensionality of the input space into a one-dimensional transformed space that provides optimal separation between the differing classes. Furthermore, CST offers a means of explainable ML, as it computes the relative importance of each feature for its contribution to class distinction, which can thus lead to deeper insights and discovery. We compare our method with existing state-of-the-art methods using both real and synthetic datasets, demonstrating that CST is the more accurate, robust, scalable, and computationally advantageous technique relative to existing methods. Code used in this paper is available on https://github.com/richiebailey74/CST.
Optimal separation of high dimensional transcriptome for complex multigenic traits
2022-07-28
article
The plight of navigating high-dimensional transcription datasets remains a persistent problem. This problem is further amplified for complex disorders, such as cancer, as these disorders are often multigenic traits with multiple subsets of genes collectively affecting the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. Almost exclusively, researchers apply techniques commonly known as dimensionality reduction to reduce the dimensions of the feature space to allow classifiers to work in more appropriately sized input spaces. As the number of dimensions is reduced, however, the ability to distinguish classes from one another reduces as well. Thus, to accomplish both tasks simultaneously for very high dimensional transcriptome for complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST accomplishes both tasks simultaneously by significantly reducing the dimensionality of the input space into a one-dimensional transformed space that provides optimal separation between the differing classes. We compare our method with existing state-of-the-art methods using both real and synthetic datasets, demonstrating that CST is the more accurate, robust, and scalable technique relative to existing methods. Code used in this paper is available on https://github.com/aisharjya/CST
Identification of co-existing embeddings of a motif in multilayer networks
2022
- Computer Science
- Theoretical computer science
Interactions among molecules, also known as biological networks, are often modeled as binary graphs, where nodes and edges represent the molecules and the interactions among those molecules, such as signal transmission, gene regulation, and protein-protein interactions. Subgraph patterns that recur in these networks, called motifs, describe conserved biological functions. Although the traditional binary graph provides a simple model to study biological interactions, it lacks the expressive power to provide a holistic view of cell behavior, as the interaction topology alters and adapts under different stress conditions as well as genetic variations. The multilayer network model captures the complexity of cell functions for such systems. Unlike the classic binary network model, the multilayer network model provides an opportunity to identify conserved functions in cells among varying conditions. In this paper, we introduce the problem of co-existing motifs in multilayer networks. These motifs describe the dual conservation of the functions of cells within a network layer (i.e., cell condition) as well as across different layers of networks. We propose a new algorithm to solve the co-existing motif identification problem efficiently and accurately. Our experiments on both synthetic and real datasets demonstrate that our method identifies all co-existing motifs at near 100% accuracy for all networks we tested on, while the accuracy of competing methods varies greatly, from 10% to 95%. Furthermore, our method runs at least an order of magnitude faster than state-of-the-art motif identification methods for binary network models.
Pattern Discovery in Multilayer Networks
IEEE/ACM Transactions on Computational Biology and Bioinformatics · 2021 · 21 citations
- Computer Science
- Data science
MOTIVATION: In bioinformatics, complex cellular modeling and behavior simulation to identify significant molecular interactions is considered a relevant problem. Traditional methods model such complex systems using a single, binary network. However, this model is inadequate to represent biological networks, as different sets of interactions can simultaneously take place for different interaction constraints (such as transcription regulation and protein interaction). Furthermore, biological systems may exhibit varying interaction topologies even for the same interaction type under different developmental stages or stress conditions. Therefore, models which consider biological systems as solitary interactions are inaccurate, as they fail to capture the complex behavior of cellular interactions within organisms. Identification and counting of recurrent motifs within a network is one of the fundamental problems in biological network analysis. Existing methods for motif counting on single network topologies are inadequate to capture patterns of molecular interactions that have significant changes in biological expression when identified across different organisms that are similar, or even time-varying networks within the same organism. That is, they fail to identify recurrent interactions, as they consider a single snapshot of a network among a set of multiple networks. Therefore, we need methods geared towards studying multiple network topologies and the pattern conservation among them.
CONTRIBUTIONS: In this paper, we consider the problem of counting the number of instances of a user-supplied motif topology in a given multilayer network. We model interactions among a set of entities (e.g., genes) describing various conditions or temporal variation as multilayer networks. Thus, each layer is a separate network that shows the connectivity of the nodes under a unique network state. Existing motif counting and identification methods are limited to single network topologies, and thus cannot be directly applied on multilayer networks. We apply our model and algorithm to study frequent patterns in cellular networks that are common in varying cellular states under different stress conditions, where the cellular network topology under each stress condition describes a unique network layer.
RESULTS: We develop a methodology and corresponding algorithm based on the proposed model for motif counting in multilayer networks. We performed experiments on both real and synthetic datasets. We modeled the synthetic datasets under a wide spectrum of parameters, such as network size, density, and motif frequency. Results on synthetic datasets demonstrate that our algorithm finds motif embeddings with very high accuracy compared to existing state-of-the-art methods such as G-tries, ESU (FANMODE), and mfinder. Furthermore, we observe that our method runs from several times to several orders of magnitude faster than existing methods. For experiments on a real dataset, we consider the Escherichia coli (E. coli) transcription regulatory network under different experimental conditions. We observe that the genes selected by our method conserve functional characteristics under various stress conditions with very low false discovery rates. Moreover, the method is scalable to real networks in terms of both network size and number of layers.
Finding Conserved Patterns in Multilayer Networks
2019-09-04 · 7 citations
article
Motivation: Traditional methods often represent complex systems as a single, static, and binary network. These models are inadequate in capturing complex cellular interactions which vary under different conditions as well as over time. Furthermore, the same set of molecules can interact in varying patterns across different interactomes. In this paper, we model cellular interactions as a set of network topologies, called multilayer networks. We consider motif counting, one of the most fundamental problems in network analysis. Existing motif counting and identification methods are limited to single network topologies, and thus they cannot be directly applied on multilayer networks. Results: In this paper, we extend the classical network motif identification problem to multilayer networks. We develop an efficient and accurate method to solve this problem. Our results on the Escherichia coli (E. coli) transcription regulatory network under different experimental conditions show that our method scales to real networks and, more importantly, can uncover conserved functional characteristics of genes participating in the network under various conditions with very low false discovery rates.
Characterizing building blocks of resource constrained biological networks
BMC Bioinformatics · 2019-06-01 · 2 citations
article · Open access
BACKGROUND: Identification of motifs (recurrent and statistically significant patterns) in biological networks is the key to understanding the design principles, and to inferring the governing mechanisms, of biological systems. This, however, is a computationally challenging task. This task is further complicated as biological interactions depend on limited resources, i.e., a reaction takes place if the reactant molecule concentrations are above a certain threshold level. This biochemical property implies that network edges can participate in a limited number of motifs simultaneously. Existing motif counting methods ignore this problem. This simplification often leads to inaccurate motif counts (over- or under-estimates), and thus, wrong biological interpretations. RESULTS: In this paper, we develop a novel motif counting algorithm, Partially Overlapping MOtif Counting (POMOC), that considers capacity levels for all interactions in counting motifs. CONCLUSIONS: Our experiments on real and synthetic networks demonstrate that the motif count using the POMOC method significantly differs from that of existing motif counting approaches, and our method extends to large-scale biological networks in practical time. Our results also show that our method makes it possible to characterize the impact of different stress factors on the cell's organization of its network. In this regard, analysis of an S. cerevisiae transcriptional regulatory network using our method shows that oxidative stress is more disruptive to the organization and abundance of motifs in this network than mutations of individual genes. Our analysis also suggests that by focusing on the edges that lead to variation in motif counts, our method can be used to find important genes, and to reveal subtle topological and functional differences of the biological networks under different cell states.
AMS Sketch
Encyclopedia of Database Systems · 2018-01-01
book-chapter · 1st author · Corresponding
Characterizing Building Blocks of Resource Constrained Biological Networks
2018-08-15
article · Open access
Decision Tree Classification
Encyclopedia of Database Systems · 2018-01-01 · 8 citations
book-chapter · 1st author · Corresponding
Characterizing Building Blocks of Resource Constrained Biological Networks
bioRxiv (Cold Spring Harbor Laboratory) · 2018-06-20 · 1 citation
preprint · Open access · Corresponding

Recent grants

CAREER: New Technologies for Approximate Query Processing
NSF · $500k · 2005–2011
III: EAGER: A Framework for Large Data Analysis
NSF · $100k · 2011–2014

Frequent coauthors

Tamer Kahveci
20 shared
Johannes Gehrke
Microsoft (United States)
13 shared
Minos Garofalakis
9 shared
Florin Rusu
9 shared
Andrei Todor
Emory University
9 shared
Rajeev Rastogi
8 shared
Christopher Jermaine
7 shared
Haitham Gabr
6 shared

Labs

Adaptive Learning & Optimization Lab · PI
