
Speech and Natural Language Processing - Question Answering

CURATED_BY: littlehelper
INITIALIZED: ABOUT 2 MONTHS_AGO
LAST_UPDATE: ABOUT 2 MONTHS_AGO

awesome computer-science


This is a mirrored zone from the seriousran/awesome-qa repository. Part of the Awesome list collection.

Contents_Index

RECENT QA MODELS (6)
RECENT LANGUAGE MODELS (11)
AAAI 2020 (1)
ACL 2019 (11)
EMNLP-IJCNLP 2019 (6)
ARXIV (8)
DATASET (2)
ANALYSIS AND PARSING FOR PRE-PROCESSING IN QA SYSTEMS (1)
SYSTEMS (3)
PUBLICATIONS (1)
CODES (6)
LECTURES (1)
SLIDES (2)
DATASET COLLECTIONS (2)
DATASETS (16)
THE DEEPQA RESEARCH TEAM IN IBM WATSON'S PUBLICATION WITHIN 5 YEARS (2)
MS RESEARCH'S PUBLICATION WITHIN 5 YEARS (3)
GOOGLE AI'S PUBLICATION WITHIN 5 YEARS (1)
LINKS (3)

Recent QA Models

6_ENTRIES

DilBert
DilBert: Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering (2020)
paper: https://arxiv.org/pdf/2010.08422.pdf
github: https://github.com/wissam-sib/dilbert
UnifiedQA
UnifiedQA: Crossing Format Boundaries With a Single QA System (2020)
Demo: https://unifiedqa.apps.allenai.org/
ProQA
ProQA: Resource-efficient method for pretraining a dense corpus index for open-domain QA and IR. (2020)
paper: https://arxiv.org/pdf/2005.00038.pdf
github: https://github.com/xwhan/ProQA
TYDI QA
TYDI QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages (2020)
paper: https://arxiv.org/ftp/arxiv/papers/2003/2003.05002.pdf
Retrospective Reader for Machine Reading Comprehension
paper: https://arxiv.org/pdf/2001.09694v2.pdf
TANDA
TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection (AAAI 2020)
paper: https://arxiv.org/pdf/1911.04118.pdf

Recent Language Models

11_ENTRIES

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, Kevin Clark, et al., ICLR, 2020.
TinyBERT: Distilling BERT for Natural Language Understanding, Xiaoqi Jiao, et al., ICLR, 2020.
MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, Wenhui Wang, et al., arXiv, 2020.
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel, et al., arXiv preprint, 2019.
ERNIE: Enhanced Language Representation with Informative Entities, Zhengyan Zhang, et al., ACL, 2019.
XLNet: Generalized Autoregressive Pretraining for Language Understanding, Zhilin Yang, et al., arXiv preprint, 2019.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, Zhenzhong Lan, et al., arXiv preprint, 2019.
RoBERTa: A Robustly Optimized BERT Pretraining Approach, Yinhan Liu, et al., arXiv preprint, 2019.
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Victor Sanh, et al., arXiv, 2019.
SpanBERT: Improving Pre-training by Representing and Predicting Spans, Mandar Joshi, et al., TACL, 2019.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, et al., NAACL 2019, 2018.

AAAI 2020

1_ENTRIES

TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection, Siddhant Garg, et al., AAAI 2020, Nov 2019.

ACL 2019

11_ENTRIES

Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answ…, Asma Ben Abacha, et al., ACL-W 2019, Aug 2019.
Towards Scalable and Reliable Capsule Networks for Challenging NLP Applications, Wei Zhao, et al., ACL 2019, Jun 2019.
Cognitive Graph for Multi-Hop Reading Comprehension at Scale, Ming Ding, et al., ACL 2019, Jun 2019.
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, Minjoon Seo, et al., ACL 2019, Jun 2019.
Unsupervised Question Answering by Cloze Translation, Patrick Lewis, et al., ACL 2019, Jun 2019.
SemEval-2019 Task 10: Math Question Answering, Mark Hopkins, et al., ACL-W 2019, Jun 2019.
Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader, Wenhan Xiong, et al., ACL 2019, May 2019.
Matching Article Pairs with Graphical Decomposition and Convolutions, Bang Liu, et al., ACL 2019, May 2019.
Episodic Memory Reader: Learning what to Remember for Question Answering from Streaming Data, Moonsu Han, et al., ACL 2019, Mar 2019.
Natural Questions: a Benchmark for Question Answering Research, Tom Kwiatkowski, et al., TACL 2019, Jan 2019.
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-s…, Daesik Kim, et al., ACL 2019, Nov 2018.

EMNLP-IJCNLP 2019

6_ENTRIES

Language Models as Knowledge Bases?, Fabio Petroni, et al., EMNLP-IJCNLP 2019, Sep 2019.
LXMERT: Learning Cross-Modality Encoder Representations from Transformers, Hao Tan, et al., EMNLP-IJCNLP 2019, Dec 2019.
Answering Complex Open-domain Questions Through Iterative Query Generation, Peng Qi, et al., EMNLP-IJCNLP 2019, Oct 2019.
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning, Bill Yuchen Lin, et al., EMNLP-IJCNLP 2019, Sep 2019.
Mixture Content Selection for Diverse Sequence Generation, Jaemin Cho, et al., EMNLP-IJCNLP 2019, Sep 2019.
A Discrete Hard EM Approach for Weakly Supervised Question Answering, Sewon Min, et al., EMNLP-IJCNLP 2019, Sep 2019.

Arxiv

8_ENTRIES

Investigating the Successes and Failures of BERT for Passage Re-Ranking, Harshith Padigela, et al., arXiv preprint, May 2019.
BERT with History Answer Embedding for Conversational Question Answering, Chen Qu, et al., arXiv preprint, May 2019.
Understanding the Behaviors of BERT in Ranking, Yifan Qiao, et al., arXiv preprint, Apr 2019.
BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis, Hu Xu, et al., arXiv preprint, Apr 2019.
End-to-End Open-Domain Question Answering with BERTserini, Wei Yang, et al., arXiv preprint, Feb 2019.
A BERT Baseline for the Natural Questions, Chris Alberti, et al., arXiv preprint, Jan 2019.
Passage Re-ranking with BERT, Rodrigo Nogueira, et al., arXiv preprint, Jan 2019.
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering, Chenguang Zhu, et al., arXiv, Dec 2018.

Dataset

2_ENTRIES

ELI5: Long Form Question Answering, Angela Fan, et al., ACL 2019, Jul 2019.
CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense, Michael Chen, et al., RepEval 2019, Jun 2019.

Analysis and Parsing for Pre-processing in QA systems

1_ENTRIES

Language Analysis

Homonyms / Polysemy Analysis
Syntactic Parsing (Dependency Parsing)
Semantic Recognition

Morphological analysis

Systems

3_ENTRIES

IBM Watson
Has state-of-the-art performance.
Facebook DrQA
Applied to the SQuAD1.0 dataset. The SQuAD2.0 dataset has been released, but DrQA has not been tested on it yet.
MIT Media Lab's Knowledge Graph
Is a freely-available semantic network, designed to help computers understand the meanings of words that people use.

Publications

1_ENTRIES

Papers
"Learning to Skim Text", Adams Wei Yu, Hongrae Lee, Quoc V. Le, 2017. : Show only what you want in Text
"Deep Joint Entity Disambiguation with Local Neural Attention", Octavian-Eugen Ganea and Thomas Hofmann, 2017.
"BI-DIRECTIONAL ATTENTION FLOW FOR MACHINE COMPREHENSION", Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi, ICLR, 2017.
["Capturing Semantic Si…

Codes

6_ENTRIES

BiDAF
Bi-Directional Attention Flow (BIDAF) network is a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
Official; Tensorflow v1.2; Paper
QANet
A Q&A architecture that does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
Google; Unofficial; Tensorflow v1.5; Paper
R-Net
An end-to-end neural network model for reading-comprehension-style question answering, which aims to answer questions from a given passage.
MS; Unofficially by HKUST; Tensorflow v1.5; Paper
R-Net-in-Keras
R-NET re-implementation in Keras.
MS; Unofficial; Keras v2.0.6; Paper
DrQA
DrQA is a system for reading comprehension applied to open-domain question answering.
Facebook; Official; Pytorch v0.4; Paper
BERT
A new language representation model which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
Google; Official implementation; Tensorflow v1.11.0; Paper
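
As a rough illustration of the "query-aware context representation" mentioned in the BiDAF entry, the sketch below computes a simplified context-to-query attention in plain Python. It is only a sketch under stated assumptions: the function names are hypothetical, a plain dot product stands in for the learned similarity function used in the paper, and the query-to-context direction is omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as a stand-in similarity (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def context_to_query(context: List[Vector], query: List[Vector]) -> List[Vector]:
    """For each context vector, return an attention-weighted sum of query vectors."""
    dim = len(query[0])
    attended = []
    for c in context:
        # Attention weights over query words for this context word.
        weights = softmax([dot(c, q) for q in query])
        attended.append(
            [sum(w * q[i] for w, q in zip(weights, query)) for i in range(dim)]
        )
    return attended
```

Each context position keeps its own attended query vector, so no single fixed-size summary of the query is formed, which is the point of "without early summarization".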

Lectures

1_ENTRIES

Question Answering - Natural Language Processing
By Dragomir Radev, Ph.D. | University of Michigan | 2016.

Slides

2_ENTRIES

Question Answering with Knowledge Bases, Web and Beyond
By Scott Wen-tau Yih & Hao Ma | Microsoft Research | 2016.
Question Answering
By Dr. Mariana Neves | Hasso Plattner Institut | 2017.

Dataset Collections

2_ENTRIES

Datasets

16_ENTRIES

AI2 Science Questions v2.1 (2017)
It consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in a 4-way multiple-choice format and may or may not include a diagram element.
Paper: http://ai2-website.s3.amazonaws.com/publications/AI2ReasoningChallenge2018.pdf
Children's Book Test
It is part of the bAbI project of Facebook AI Research, which is organized towards the goal of automatic text understanding and reasoning. The CBT is designed to measure directly how well language models can exploit wider linguistic context.
CODAH Dataset
DeepMind Q&A Dataset; CNN/Daily Mail
Hermann et al. (2015) created two awesome datasets using news articles for Q&A research. The datasets contain many documents (90k and 197k, respectively), and each document is accompanied by approximately 4 questions on average. Each question is a sentence with one missing word or phrase, which can be found in the accompanying document/context.
Paper: https://arxiv.org/abs/1506.03340
ELI5
Paper: https://arxiv.org/abs/1907.09190
GraphQuestions
On generating Characteristic-rich Question sets for QA evaluation.
LC-QuAD
It is a gold standard KBQA (Question Answering over Knowledge Base) dataset containing 5000 questions and SPARQL queries. LC-QuAD uses DBpedia v04.16 as the target KB.
MS MARCO
This is for real-world question answering.
Paper: https://arxiv.org/abs/1611.09268
MultiRC
A dataset of short paragraphs and multi-sentence questions.
Paper: http://cogcomp.org/page/publication_view/833
NarrativeQA
It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Paper: https://arxiv.org/pdf/1712.07040v1.pdf
NewsQA
A machine comprehension dataset.
Paper: https://arxiv.org/pdf/1611.09830.pdf
Question-Answer Dataset by CMU
This is a corpus of Wikipedia articles, manually-generated factoid questions from them, and manually-generated answers to these questions, for use in academic research. These data were collected by Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010.
SQuAD2.0
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
Paper: https://arxiv.org/abs/1806.03822
Story cloze test
'Story Cloze Test' is a new commonsense reasoning framework for evaluating story understanding, story generation, and script learning. This test requires a system to choose the correct ending to a four-sentence story.
Paper: https://arxiv.org/abs/1604.01696
TriviaQA
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions.
Paper: https://arxiv.org/abs/1705.03551
WikiQA
A publicly available set of question and sentence pairs for open-domain question answering.
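
The SQuAD2.0 entry above notes that systems must abstain when no answer is supported by the paragraph. One common way to do this, sketched below, is to compare a "no answer" score with the best span score and abstain when the gap exceeds a threshold. This is a minimal sketch, not the method of any particular system: `SpanCandidate`, `choose_answer`, the score values, and the default threshold are all hypothetical, and the scores are assumed to come from some QA model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanCandidate:
    text: str
    score: float  # hypothetical model score for this answer span


def choose_answer(
    candidates: List[SpanCandidate],
    null_score: float,
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the best span text, or None to abstain from answering."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    # Abstain when the no-answer score beats the best span by more than the threshold.
    if null_score - best.score > threshold:
        return None
    return best.text
```

In practice the threshold would be tuned on a development set; raising it makes the system answer more often, lowering it makes it abstain more often.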

The DeepQA Research Team in IBM Watson's publication within 5 years

2_ENTRIES

2015
"Automated Problem List Generation from Electronic Medical Records in IBM Watson", Murthy Devarakonda, Ching-Huei Tsou, IAAI, 2015.
"Decision Making in IBM Watson Question Answering", J. William Murdock, Ontology summit, 2015.
"Unsupervised Entity-Relation Analysis in IBM Watson", Aditya Kalyanpur, J William Murdock, ACS, 2015.
"Commonsense Reasoning: An Event Calculus Based Approach", E T Mueller, Morgan Kaufmann/Elsevier, 2015.
2014
"Problem-oriented patient record summary: An early report on a Watson application", M. Devarakonda, Dongyang Zhang, Ching-Huei Tsou, M. Bornea, Healthcom, 2014.
"WatsonPaths: Scenario-based Question Answering and Inference over Unstructured Information", Adam Lally, Sugato Bachi, Michael A. Barborak, David W. Buchanan, Jennifer Chu-Carroll, D…

MS Research's publication within 5 years

3_ENTRIES

2017
"Multi-level Attention Networks for Visual Question Answering", Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui, CVPR, 2017.
"A Joint Model for Question Answering and Question Generation", Tong Wang, Xingdi (Eric) Yuan, Adam Trischler, ICML, 2017.
"Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension", David Golub, Po-Sen Huang, Xiaodong He, Li Deng, EMNLP, 2017.
"Question-Answering with Grammatically-Interpretable Representations", Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng,
"Search-based Neural Structured Learning for Sequential Question Answering", Mohit Iyyer, Wen-tau Yih, Ming-Wei Chang, ACL, 2017.
2014
"An Overview of Microsoft Deep QA System on Stanford WebQuestions Benchmark", Zhenghao Wang, Shengquan Yan, Huaming Wang, and Xuedong Huang, MSR-TR, 2014.
"Semantic Parsing for Single-Relation Question Answering", Wen-tau Yih, Xiaodong He, Christopher Me…

2018
"Characterizing and Supporting Question Answering in Human-to-Human Communication", Xiao Yang, Ahmed Hassan Awadallah, Madian Khabsa, Wei Wang, Miaosen Wang, ACM SIGIR, 2018.
"FigureQA: An Annotated Figure Dataset for Visual Reasoning", Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio, ICLR, 2018.
2016
"Stacked Attention Networks for Image Question Answering", Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, CVPR, 2016.
"Question Answering with Knowledge Base, Web and Beyond", Yih, Scott Wen-tau and Ma, Hao, ACM SIGIR, 2016.
["NewsQA: A Machine Compreh…
2015
"WIKIQA: A Challenge Dataset for Open-Domain Question Answering", Yi Yang, Wen-tau Yih, and Christopher Meek, EMNLP, 2015.
"Web-based Question Answering: Revisiting AskMSR", Chen-Tse Tsai, Wen-tau Yih, and Christopher J.C. Burges, MSR-TR, 2015.
["Open Domain Question Answering via Semantic Enrichme…

Google AI's publication within 5 years

1_ENTRIES

2017
"Analyzing Language Learned by an Active Question Answering Agent", Christian Buck and Jannis Bulian and Massimiliano Ciaramita and Wojciech Gajewski and Andrea Gesmundo and Neil Houlsby and Wei Wang, NIPS, 2017.
"Learning Recurrent Span Representations for Extractive Question Answering", Kenton Lee and Shimi Salant and Tom Kwiatkowski and Ankur Parikh and Dipanjan Das and Jonathan Berant, ICLR, 2017.
Identify the same question
"Neural Paraphrase Identification of Questions with Noisy Pretraining", Gaurav Singh Tomar and Thyago Duque and Oscar Täckström and Jakob Uszkoreit and Dipanjan Das, SCLeM, 2017.
2014
"Great Question! Question Quality in Community Q&A", Sujith Ravi and Bo Pang and Vibhor Rastogi and Ravi Kumar, ICWSM, 2014.

2018
Google QA
"QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension", Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le, ICLR, 2018.
"Ask the Right Questions: Active Question Reformulation with Reinforcement Learning", Christian Buck and Jannis Bulian and Massimiliano Ciaramita and Wojciech Paweł Gajewski and Andrea Gesmundo and Neil…
