Javed Aslam

Professor, Chief of Artificial Intelligence · Verified

Northeastern University · Artificial Intelligence and Data Science

Active 1979–2023

h-index: 28
Citations: 3.3k
Papers: 118 (13 in the last 5 years)
Funding: $1.6M

About

Javed Aslam is chief of artificial intelligence and a professor at the Khoury College of Computer Sciences at Northeastern University, based in Boston. His research primarily focuses on machine learning and information retrieval, with experience spanning human computation, transportation, computer security, wireless networking, and medical informatics. In machine learning, he has developed models and algorithms for multi-label classification and learning in the presence of noisy or erroneous training data. In information retrieval, he has applied techniques from machine learning, statistics, information theory, and social choice theory to develop algorithms for automatic information organization, metasearch, and efficient search engine training and evaluation. Prior to his current role, he served as an assistant professor at Dartmouth College and was a postdoctoral researcher at Harvard University.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Information Retrieval
  • Data Mining
  • Natural Language Processing
  • Multimedia
  • Theoretical Computer Science
  • Engineering
  • Data Science
  • Programming Language
  • World Wide Web

Selected publications

  • Unbiased Identification of Broadly Appealing Content Using a Pure Exploration Infinitely Armed Bandit Strategy

    ACM Transactions on Recommender Systems · 2023-10-05 · 1 citation

    article · Open access · Senior author

    Podcasting is an increasingly popular medium for entertainment and discourse around the world, with tens of thousands of new podcasts released on a monthly basis. We consider the problem of identifying from these newly released podcasts those with the largest potential audiences so they can be considered for personalized recommendation to users. We first study and then discard a supervised approach due to the inadequacy of either content or consumption features for this task and instead propose a novel non-contextual bandit algorithm in the fixed-budget infinitely armed pure-exploration setting. We demonstrate that our algorithm is well suited to the best-arm identification task for a broad class of arm reservoir distributions, out-competing a large number of state-of-the-art algorithms. We then apply the algorithm to identifying podcasts with broad appeal in a simulated study and show that it efficiently sorts podcasts into groups by increasing appeal while avoiding the popularity bias inherent in supervised approaches. Finally, we study a setting in which users are more likely to stream more-streamed podcasts independent of their general appeal and find that our proposed algorithm is robust to this type of popularity bias.

  • Identifying New Podcasts with High General Appeal Using a Pure Exploration Infinitely-Armed Bandit Strategy

    2022 · 7 citations

    Senior author · Corresponding
    • Computer Science
    • Machine Learning

    Podcasting is an increasingly popular medium for entertainment and discourse around the world, with tens of thousands of new podcasts released on a monthly basis. We consider the problem of identifying from these newly-released podcasts those with the largest potential audiences so they can be considered for personalized recommendation to users. We first study and then discard a supervised approach due to the inadequacy of either content or consumption features for this task, and instead propose a novel non-contextual bandit algorithm in the fixed-budget infinitely-armed pure-exploration setting. We demonstrate that our algorithm is well-suited to the best-arm identification task for a broad class of arm reservoir distributions, out-competing a large number of state-of-the-art algorithms. We then apply the algorithm to identifying podcasts with broad appeal in a simulated study, and show that it efficiently sorts podcasts into groups by increasing appeal while avoiding the popularity bias inherent in supervised approaches.

  • Supervised Learning in the Presence of Noise: Application in ICD-10 Code Classification

    arXiv (Cornell University) · 2021-03-13

    preprint · Open access · Senior author

    ICD coding is the international standard for capturing and reporting health conditions and diagnosis for revenue cycle management in healthcare. Manually assigning ICD codes is prone to human error due to the large code vocabulary and the similarities between codes. Since machine learning based approaches require ground truth training data, the inconsistency among human coders is manifested as noise in labeling, which makes the training and evaluation of ICD classifiers difficult in presence of such noise. This paper investigates the characteristics of such noise in manually-assigned ICD-10 codes and furthermore, proposes a method to train robust ICD-10 classifiers in the presence of labeling noise. Our research concluded that the nature of such noise is systematic. Most of the existing methods for handling label noise assume that the noise is completely random and independent of features or labels, which is not the case for ICD data. Therefore, we develop a new method for training robust classifiers in the presence of systematic noise. We first identify ICD-10 codes that human coders tend to misuse or confuse, based on the codes' locations in the ICD-10 hierarchy, the types of the codes, and baseline classifier's prediction behaviors; we then develop a novel training strategy that accounts for such noise. We compared our method with the baseline that does not handle label noise and the baseline methods that assume random noise, and demonstrated that our proposed method outperforms all baselines when evaluated on expert validated labels.

  • From Extreme Multi-label to Multi-class: A Hierarchical Approach for Automated ICD-10 Coding Using Phrase-level Attention

    arXiv (Cornell University) · 2021 · 6 citations

    • Computer Science
    • Artificial Intelligence

    Clinical coding is the task of assigning a set of alphanumeric codes, referred to as ICD (International Classification of Diseases), to a medical event based on the context captured in a clinical narrative. The latest version of ICD, ICD-10, includes more than 70,000 codes. As this is a labor-intensive and error-prone task, automatic ICD coding of medical reports using machine learning has gained significant interest in the last decade. Existing literature has modeled this problem as a multi-label task. Nevertheless, such multi-label approach is challenging due to the extremely large label set size. Furthermore, the interpretability of the predictions is essential for the end users (e.g., healthcare providers and insurance companies). In this paper, we propose a novel approach for automatic ICD coding by reformulating the extreme multi-label problem into a simpler multi-class problem using a hierarchical solution. We made this approach viable through extensive data collection to acquire phrase-level human coder annotations to supervise our models on learning the specific relations between the input text and predicted ICD codes. Our approach employs two independently trained networks, the sentence tagger and the ICD classifier, stacked hierarchically to predict a codeset for a medical report. The sentence tagger identifies focus sentences containing a medical event or concept relevant to an ICD coding. Using a supervised attention mechanism, the ICD classifier then assigns each focus sentence with an ICD code. The proposed approach outperforms strong baselines by large margins of 23% in subset accuracy, 18% in micro-F1, and 15% in instance based F-1. With our proposed approach, interpretability is achieved not through implicitly learned attention scores but by attributing each prediction to a particular sentence and words selected by human coders.

  • Improving Query Graph Generation for Complex Question Answering over Knowledge Base

    Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing · 2021 · 14 citations

    Senior author · Corresponding
    • Computer Science
    • Information Retrieval

    Most of the existing Knowledge-based Question Answering (KBQA) methods first learn to map the given question to a query graph, and then convert the graph to an executable query to find the answer. The query graph is typically expanded progressively from the topic entity based on a sequence prediction model. In this paper, we propose a new solution to query graph generation that works in the opposite manner: we start with the entire knowledge base and gradually shrink it to the desired query graph. This approach improves both the efficiency and the accuracy of query graph generation, especially for complex multi-hop questions. Experimental results show that our method achieves state-of-the-art performance on ComplexWebQuestion (CWQ) dataset.

  • An Ensemble Approach for Automatic Structuring of Radiology Reports

    2020-01-01 · 2 citations

    preprint · Open access

    Automatic structuring of electronic medical records is of high demand for clinical workflow solutions to facilitate extraction, storage, and querying of patient care information. However, developing a scalable solution is extremely challenging, specifically for radiology reports, as most healthcare institutes use either no template or department/institute specific templates. Moreover, radiologists' reporting style varies from one to another as sentences are telegraphic and do not follow general English grammar rules. We present an ensemble method that consolidates the predictions of three models, capturing various attributes of textual information for automatic labeling of sentences with section labels. These three models are: 1) Focus Sentence model, capturing context of the target sentence; 2) Surrounding Context model, capturing the neighboring context of the target sentence; and finally, 3) Formatting/Layout model, aimed at learning report formatting cues. We utilize Bi-directional LSTMs, followed by sentence encoders, to acquire the context. Furthermore, we define several features that incorporate the structure of reports. We compare our proposed approach against multiple baselines and state-of-the-art approaches on a proprietary dataset as well as 100 manually annotated radiology notes from the MIMIC-III dataset, which we are making publicly available. Our proposed approach significantly outperforms other approaches by achieving 97.1% accuracy.

  • A Complex KBQA System using Multiple Reasoning Paths

    arXiv (Cornell University) · 2020-05-22 · 5 citations

    preprint · Open access · Senior author

    Multi-hop knowledge based question answering (KBQA) is a complex task for natural language understanding. Many KBQA approaches have been proposed in recent years, and most of them are trained based on labeled reasoning path. This hinders the system's performance as many correct reasoning paths are not labeled as ground truth, and thus they cannot be learned. In this paper, we introduce an end-to-end KBQA system which can leverage multiple reasoning paths' information and only requires labeled answer as supervision. We conduct experiments on several benchmark datasets containing both single-hop simple questions as well as multi-hop complex questions, including WebQuestionSP (WQSP), ComplexWebQuestion-1.1 (CWQ), and PathQuestion-Large (PQL), and demonstrate strong performance.

  • Learning to Calibrate and Rerank Multi-label Predictions

    Lecture Notes in Computer Science · 2020-01-01 · 5 citations

    book-chapter

  • A Supervised Topic Model Approach to Learning Effective Styles within Human-Agent Negotiation

    2020-05-05 · 2 citations

    article · Open access

    We present a method that analyzes a person's negotiation behavior to automatically detect co-occurrence of tactics and combination of tactics (i.e., negotiation styles). We first identify action features consistent with use of the common negotiation tactics based on prior research in negotiation. Next, we apply regularized linear regression over a negotiation dataset to assess how effective particular tactics are in predicting the negotiation outcome. Finally, we use a supervised variant of a topic model to derive effective negotiation styles. Results from the clusters produced by the topic models provide insights regarding the effectiveness of negotiation styles that people utilize.

  • Adapting RNN Sequence Prediction Model to Multi-label Set Prediction

    arXiv (Cornell University) · 2019-04-11 · 15 citations

    preprint · Open access · Senior author

    We present an adaptation of RNN sequence models to the problem of multi-label classification for text, where the target is a set of labels, not a sequence. Previous such RNN models define probabilities for sequences but not for sets; attempts to obtain a set probability are after-thoughts of the network design, including pre-specifying the label order, or relating the sequence probability to the set probability in ad hoc ways. Our formulation is derived from a principled notion of set probability, as the sum of probabilities of corresponding permutation sequences for the set. We provide a new training objective that maximizes this set probability, and a new prediction objective that finds the most probable set on a test document. These new objectives are theoretically appealing because they give the RNN model freedom to discover the best label order, which often is the natural one (but different among documents). We develop efficient procedures to tackle the computation difficulties involved in training and prediction. Experiments on benchmark datasets demonstrate that we outperform state-of-the-art methods for this task.
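
The set probability described in the abstract of "Adapting RNN Sequence Prediction Model to Multi-label Set Prediction" above (the sum of the probabilities of all permutation sequences for a label set) can be illustrated with a small brute-force sketch. Everything named here is an assumption made for illustration: `toy_step_prob` is a made-up stand-in for an RNN step, and the `STOP` token and `WEIGHTS` values are hypothetical. The paper's own efficient training and prediction procedures are not reproduced.

```python
from itertools import permutations

STOP = "<stop>"  # hypothetical end-of-sequence token (assumption)
WEIGHTS = {"a": 2.0, "b": 1.0, STOP: 1.0}  # made-up toy weights (assumption)


def toy_step_prob(prefix, token):
    """Toy stand-in for an RNN step: P(token | labels emitted so far).

    Labels already emitted cannot be emitted again, so the remaining
    candidates (plus STOP) are renormalized at every step.
    """
    candidates = [t for t in WEIGHTS if t not in prefix]
    return WEIGHTS[token] / sum(WEIGHTS[t] for t in candidates)


def sequence_prob(labels, step_prob):
    """Probability of emitting `labels` in this exact order, then stopping."""
    prob, prefix = 1.0, ()
    for label in labels:
        prob *= step_prob(prefix, label)
        prefix += (label,)
    return prob * step_prob(prefix, STOP)


def set_prob(label_set, step_prob):
    """Set probability: sum of sequence probabilities over all orderings."""
    return sum(
        sequence_prob(order, step_prob)
        for order in permutations(sorted(label_set))
    )


if __name__ == "__main__":
    print(set_prob({"a", "b"}, toy_step_prob))
```

With these toy weights the set {a, b} has probability 5/12: the order (a, b) contributes 1/4 and the order (b, a) contributes 1/6. Enumerating permutations grows factorially with the set size, so this brute-force form is only practical for very small label sets.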

Labs

  • Khoury College of Computer Sciences · PI

Awards & honors

  • Best Poster Paper Award (2013)