Speech and Natural Language Processing - Natural Language Generation
Datasets
- Alex Context NLG Dataset
A dataset for NLG in dialogue systems in the public transport information domain.
- Box-score data
This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.
- E2E
This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface realisation from non-aligned data.
- Neural-Wikipedian
The repository contains the code along with the required corpora that were used in order to build a system that "learns" how to generate English biographies for Semantic Web triples.
- WeatherGov
Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.
- WebNLG
The enriched version of WebNLG, a resource for evaluating common NLG tasks, including discourse ordering, lexicalization, and referring expression generation.
- WikiBio - Wikipedia biography dataset
This dataset gathers 728,321 biographies from Wikipedia. It aims at evaluating text generation algorithms.
- The Schema-Guided Dialogue Dataset
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant.
- The Wikipedia company corpus
Company descriptions collected from Wikipedia. The dataset contains semantic representations, short descriptions, and long descriptions for 51K companies in English.
- YelpNLG
YelpNLG provides resources for natural language generation of restaurant reviews.
Dialog
- Chatito
Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
- NNDIAL
NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models.
- Plato
This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.
- RNNLG
RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains.
- TGen
Statistical NLG for spoken dialogue systems.
Evaluation
- compare-mt
A tool for holistic analysis of language generation systems.
- GEM
A benchmark environment for NLG with a focus on its evaluation, both through human annotations and automated metrics.
- NLG-eval
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
- VizSeq
A Visual Analysis Toolkit for Text Generation Tasks.
Grammar
- OpenCCG
OpenCCG library for parsing and realization with CCG.
- GrammaticalFramework
A programming language for multilingual grammar applications.
- EasyCCG
CCG: All combinators, common grammar format, parsing to logical form, parameter estimation for probabilistic CCG.
- CCG Lab
All combinators, common grammar format, parsing to logical form, parameter estimation for probabilistic CCG.
- CCGweb
A Web platform for parsing and annotation.
Libraries
- Cron Expression Descriptor
A .NET library that converts cron expressions into human readable descriptions.
- Number Words
Converts a number to an approximate text expression, e.g. from '0.23' to 'less than a quarter'.
- Writebot
A Node.js library that makes it easier to use GPT-3 by using presets.
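To make the idea behind the Number Words entry concrete, here is a minimal Python sketch of turning a number into an approximate verbal expression. It is not the Number Words API: the `LANDMARKS` table, the `approximate` function, and the `tolerance` parameter are all hypothetical names chosen for illustration, and the approach (compare against the nearest "landmark" fraction) is an assumption about one plausible way to do this.

```python
from fractions import Fraction

# Hypothetical table of landmark fractions and their names (illustration only;
# not taken from the Number Words library).
LANDMARKS = [
    (Fraction(1, 4), "a quarter"),
    (Fraction(1, 3), "a third"),
    (Fraction(1, 2), "a half"),
    (Fraction(2, 3), "two thirds"),
    (Fraction(3, 4), "three quarters"),
]


def approximate(value: float, tolerance: float = 0.01) -> str:
    """Describe a value in [0, 1] relative to its nearest landmark fraction."""
    # Pick the landmark with the smallest absolute distance to the value.
    nearest, name = min(LANDMARKS, key=lambda item: abs(float(item[0]) - value))
    diff = value - float(nearest)
    if abs(diff) <= tolerance:
        return f"about {name}"
    return f"{'more' if diff > 0 else 'less'} than {name}"


print(approximate(0.23))  # less than a quarter
```

A real library would also need to handle values outside [0, 1], localisation, and a richer vocabulary; the sketch only shows the nearest-landmark idea.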
Narrative Generation
- Random Story Generator
Using Natural Language Generation (NLG) to create a random short story.
- Tracery
A story-grammar generation library for JavaScript.
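As a rough illustration of what a story grammar does, the following Python sketch recursively expands symbols into randomly chosen alternatives. It is not Tracery itself (which is a JavaScript library); the `#symbol#` notation is loosely modelled on Tracery-style grammars, and `GRAMMAR`, `SYMBOL`, and `expand` are hypothetical names introduced here.

```python
import random
import re

# Hypothetical toy grammar: each symbol maps to a list of alternative expansions.
GRAMMAR = {
    "origin": ["#hero# went to the #place#."],
    "hero": ["The knight", "A fox"],
    "place": ["forest", "castle"],
}

# Matches a symbol reference such as #hero#.
SYMBOL = re.compile(r"#(\w+)#")


def expand(text: str, grammar: dict, rng: random.Random) -> str:
    """Recursively replace every #symbol# with a randomly chosen expansion."""
    def substitute(match: re.Match) -> str:
        # Expand the chosen alternative too, since it may contain further symbols.
        return expand(rng.choice(grammar[match.group(1)]), grammar, rng)

    return SYMBOL.sub(substitute, text)


story = expand("#origin#", GRAMMAR, random.Random(0))
print(story)
```

Passing an explicit `random.Random` instance keeps the generation reproducible for a given seed, which is convenient when testing grammars.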
Neural Natural Language Generation
- aitextgen
A robust Python tool for text-based AI training and generation using GPT-2.
- graph-2-text
Graph-to-sequence model implemented in PyTorch, combining graph convolutional networks and OpenNMT-py.
- Image Caption Generator
A neural-network-based generative model for captioning images using TensorFlow.
- lightnlg
A minimalistic codebase for finetuning and interacting with NLG models using PyTorch Lightning.
- PaperRobot: Incremental Draft Generation of Scientific Ideas
PaperRobot acts as an automatic research assistant.
- PPLM
Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
- Question Generation using Hugging Face transformers
Question generation is the task of automatically generating questions from a text paragraph.
- Texar
Texar is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation.
- textgenrnn
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
- This Word Does Not Exist
This project allows people to train a variant of GPT-2 that makes up words, definitions, and examples from scratch.
- Transformers
State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
- Summary Generation From Structured Data
For converting information present in the form of structured data into natural language text.
Papers and Articles
Products
- Accelerated Text
Automatically generate multiple natural language descriptions of your data varying in wording and structure.
- RosaeNLG
An open-source library for Node.js or client-side (browser) execution, based on the Pug template engine, to generate texts in English, French, German and Italian.
- Twine
An open-source tool for telling interactive, nonlinear stories.
Realizers
- GenI
Surface realiser (part of a Natural Language Generation system) using Tree Adjoining Grammar.
- JSrealB
A JavaScript bilingual text realizer for web development.
- SimpleNLG
Java API for Natural Language Generation.
- SimpleNLG DE
German version of SimpleNLG 4.
- SimpleNLG-EnFr
SimpleNLG-EnFr 1.1 is a bilingual English/French adaptation of SimpleNLG v4.2.
Templating Languages
- calyx
A Ruby library for generating text with recursive template grammars.
- nalgene
Natural language generation language.
- StringTemplate
Java template engine (with ports for C#, Objective-C, JavaScript, Scala) for generating source code, web pages, emails, or any other formatted text output.