Speech and Natural Language Processing - NLP with Ruby
Contents
- PIPELINE GENERATION
- MULTIPURPOSE ENGINES
- ON-LINE APIS
- LANGUAGE IDENTIFICATION
- SEGMENTATION
- STEMMING
- LEMMATIZATION
- LEXICAL STATISTICS: COUNTING TYPES AND TOKENS
- FILTERING STOP WORDS
- PHRASAL LEVEL PROCESSING
- CONSTITUENCY PARSING
- SEMANTIC ANALYSIS
- PRAGMATICAL ANALYSIS
- SPELLING AND ERROR CORRECTION
- TEXT ALIGNMENT
- MACHINE TRANSLATION
- SENTIMENT ANALYSIS
- NUMBERS, DATES, AND TIME PARSING
- NAMED ENTITY RECOGNITION
- TEXT-TO-SPEECH-TO-TEXT
- DIALOG AGENTS, ASSISTANTS, AND CHATBOTS
- LINGUISTIC RESOURCES
- MACHINE LEARNING LIBRARIES
- OPTICAL CHARACTER RECOGNITION
- TEXT EXTRACTION
- FULL TEXT SEARCH, INFORMATION RETRIEVAL, INDEXING
- LANGUAGE AWARE STRING MANIPULATION
- ARTICLES, POSTS, TALKS, AND PRESENTATIONS
- PROJECTS AND CODE EXAMPLES
- BOOKS
- COMMUNITY
- NEEDS YOUR HELP!
- RELATED RESOURCES
Pipeline Generation
- composable_operations
Definition framework for operation pipelines.
- ruby-spark
Spark bindings with an easy to understand DSL.
- phobos
Simplified Ruby Client for Apache Kafka.
- parallel
Supervisor for parallel execution on multiple CPUs or in many threads.
- pwrake
Rake extensions to run local and remote tasks in parallel.
Multipurpose Engines
- stanford-core-nlp
Ruby Bindings for the Stanford CoreNLP tools.
- nlp_toolz
Wrapper over some OpenNLP classes and the original Berkeley Parser.
- ruby-spacy
Wrapper module for spaCy NLP library via PyCall.
On-line APIs
- alchemyapi_ruby
Legacy Ruby SDK for AlchemyAPI/Bluemix.
- wlapi
Ruby client library for Wortschatz Leipzig web services.
- monkeylearn-ruby
Sentiment Analysis, Topic Modelling, Language Detection, Named Entity Recognition via a Ruby based Web API client.
- google-cloud-language
Google's Natural Language service API for Ruby.
Language Identification
Language Identification is one of the first crucial steps in every NLP Pipeline.
- scylla
Language Categorization and Identification.
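To make the idea concrete, here is a minimal sketch of language identification by counting function-word hits. It is not necessarily how scylla or any other gem works; the method name `guess_language` and the tiny word lists are assumptions made for illustration only.

```ruby
# Toy language guesser: counts how many tokens appear in each language's
# (hypothetical, deliberately tiny) list of frequent function words.
def guess_language(text, profiles: {
  en: %w[the and of is to in],
  de: %w[der die und das ist nicht],
  fr: %w[le la et les est des]
})
  words = text.downcase.scan(/[[:alpha:]]+/)
  scores = profiles.transform_values do |function_words|
    words.count { |word| function_words.include?(word) }
  end
  best, score = scores.max_by { |_, hits| hits }
  # Return nil when no function word matched at all.
  score.zero? ? nil : best
end

puts guess_language("Der Hund und die Katze ist nicht hier").inspect
```

Real identifiers usually rely on character n-gram statistics trained on large corpora, which remain usable on short or noisy input where word lists fail.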
Segmentation
Tools for Tokenization, Word and Sentence Boundary Detection and Disambiguation.
- pragmatic_tokenizer
Multilingual tokenizer to split a string into tokens.
- nlp-pure
Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
- textoken
Simple and customizable text tokenization library.
- pragmatic_segmenter
Word Boundary Disambiguation with many cookies.
- punkt-segmenter
Pure Ruby implementation of the Punkt Segmenter.
- tactful_tokenizer
RegExp based tokenizer for different languages.
- scalpel
Sentence Boundary Disambiguation tool.
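As a rough illustration of what these tools do, the following sketch splits text into sentences and tokens with plain regular expressions. The method names `split_sentences` and `tokenize` are hypothetical and do not belong to any of the gems above, which handle abbreviations, URLs, and language-specific rules that this sketch ignores.

```ruby
# Naive sentence splitter: break on whitespace that follows ., ! or ?.
# Abbreviations such as "Dr." are split incorrectly by this rule.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# Naive tokenizer: a token is a run of letters/digits (optionally with
# inner apostrophes) or a single punctuation character.
def tokenize(sentence)
  sentence.scan(/[[:alnum:]]+(?:'[[:alnum:]]+)*|[^[:alnum:][:space:]]/)
end

split_sentences("Hello world. How are you? Fine!").each do |sentence|
  puts tokenize(sentence).inspect
end
```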
Stemming
Stemming is the term used in information retrieval for the process of reducing word forms to some base representation. Stemming should be distinguished from lemmatization, since stems do not necessarily have a linguistic motivation.
- ruby-stemmer
Ruby-Stemmer exposes the SnowBall API to Ruby.
- uea-stemmer
Conservative stemmer for search and indexing.
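The point that stems need not be real words can be seen in a toy suffix stripper. This is a sketch under simple assumptions, not the Snowball or UEA algorithm; the method name `naive_stem` and its suffix list are invented for illustration.

```ruby
# Toy stemmer: strip the first matching suffix, provided at least three
# characters remain. The suffix list is a hypothetical, minimal one.
def naive_stem(word, suffixes: %w[ing ies ed es s])
  stem = word.downcase
  suffix = suffixes.find do |candidate|
    stem.end_with?(candidate) && stem.length - candidate.length >= 3
  end
  suffix ? stem[0...-suffix.length] : stem
end

%w[running ponies argued cats bed].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
```

Outputs such as "runn" or "argu" are acceptable for indexing, because all that matters is that related word forms map to the same string.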
Lemmatization
Lemmatization is the process of finding the base form (lemma) of a word. Lemmas are often collected in dictionaries.
- lemmatizer
WordNet based Lemmatizer for English texts.
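In contrast to stemming, a lemmatizer returns an actual dictionary form. The sketch below shows only the lookup idea, with a hypothetical four-entry lexicon and an invented method name `lemmatize`; real lemmatizers combine large dictionaries such as WordNet with part-of-speech information.

```ruby
# Dictionary-based lemma lookup. Unknown words are returned unchanged
# (lower-cased). The default lexicon is a tiny illustrative sample.
def lemmatize(word, lexicon: {
  "went" => "go",
  "mice" => "mouse",
  "geese" => "goose",
  "running" => "run"
})
  key = word.downcase
  lexicon.fetch(key, key)
end

%w[Went mice running table].each do |word|
  puts "#{word} -> #{lemmatize(word)}"
end
```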
Lexical Statistics: Counting Types and Tokens
- wc
Facilities to count word occurrences in a text.
- word_count
Word counter for String and Hash objects.
- words_counted
Pure Ruby library counting word statistics with different custom options.
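For readers new to the terminology: tokens are all word occurrences in a text, while types are the distinct words. The following minimal sketch computes both in plain Ruby. The method name `lexical_stats` and the crude letters-only tokenization are assumptions, not the API of any gem listed here.

```ruby
# Count tokens (occurrences) and types (distinct words) in a text.
def lexical_stats(text)
  tokens = text.downcase.scan(/[[:alpha:]]+/)
  frequencies = Hash.new(0)
  tokens.each { |token| frequencies[token] += 1 }
  {
    tokens: tokens.size,
    types: frequencies.size,
    # Type-token ratio: a simple measure of lexical variety.
    type_token_ratio: tokens.empty? ? 0.0 : frequencies.size.fdiv(tokens.size),
    frequencies: frequencies
  }
end

stats = lexical_stats("The cat saw the other cat")
puts "tokens=#{stats[:tokens]} types=#{stats[:types]}"
```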
Filtering Stop Words
- stopwords-filter
Filter and Stop Word Lexicon based on the SnowBall lemmatizer.
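Stop word filtering itself is a small operation once a lexicon is available. A minimal sketch, assuming an already tokenized input and a hypothetical eight-word English stop list (the method name `remove_stop_words` is illustrative, not the stopwords-filter API):

```ruby
require "set"

# Drop tokens that occur in the stop word set (case-insensitive).
def remove_stop_words(tokens, stop_words: Set.new(%w[a an and the of to in is]))
  tokens.reject { |token| stop_words.include?(token.downcase) }
end

puts remove_stop_words(%w[The cat is in the hat]).inspect
```

A Set is used so that each membership check takes constant time, which matters once the lexicon grows to hundreds of entries.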
Phrasal Level Processing
- n_gram
N-Gram generator.
- ruby-ngram
Break words and phrases into ngrams.
- raingrams
Flexible and general-purpose ngrams library written in pure Ruby.
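An n-gram is a window of n consecutive items, which Ruby's standard `Enumerable#each_cons` already provides. The helper names `ngrams` and `char_ngrams` below are hypothetical; the gems above add counting, padding, and probability models on top of this basic operation.

```ruby
# Word-level n-grams: every window of n consecutive tokens.
def ngrams(tokens, n)
  tokens.each_cons(n).to_a
end

# Character-level n-grams of a single word.
def char_ngrams(word, n)
  word.chars.each_cons(n).map(&:join)
end

puts ngrams(%w[to be or not], 2).inspect
puts char_ngrams("ruby", 3).inspect
```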
Constituency Parsing
- stanfordparser
Ruby based wrapper for the Stanford Parser.
- rsyntaxtree
Visualization for syntactic trees in Ruby based on RMagick. [dep: ImageMagick]
Semantic Analysis
- amatch
Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').
- damerau-levenshtein
Calculates edit distance using the Damerau-Levenshtein algorithm.
- hotwater
Fast Ruby FFI string edit distance algorithms.
- levenshtein-ffi
Fast string edit distance computation, using the Damerau-Levenshtein algorithm.
- tf_idf
Term Frequency / Inverse Document Frequency in pure Ruby.
- tf-idf-similarity
Calculate the similarity between texts using TF/IDF.
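Several of the libraries above compute edit distances. As a reference point, here is a minimal pure Ruby sketch of plain Levenshtein distance (insertions, deletions, substitutions) using the standard two-row dynamic programming scheme. It does not handle transpositions, so it is not the Damerau-Levenshtein variant, and the method name `levenshtein` is chosen for illustration only.

```ruby
# Levenshtein distance between two strings, keeping only two rows
# of the dynamic programming table in memory.
def levenshtein(source, target)
  target_chars = target.chars
  # Distance from the empty prefix of source to each prefix of target.
  previous = (0..target_chars.length).to_a

  source.chars.each_with_index do |source_char, i|
    current = [i + 1]
    target_chars.each_with_index do |target_char, j|
      substitution_cost = source_char == target_char ? 0 : 1
      current << [
        previous[j + 1] + 1,              # deletion
        current[j] + 1,                   # insertion
        previous[j] + substitution_cost   # substitution or match
      ].min
    end
    previous = current
  end

  previous.last
end

puts levenshtein("kitten", "sitting") # prints 3
```

The FFI based gems implement the same recurrence in C, which is considerably faster on long strings.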
Pragmatical Analysis
- SentimentLib
Simple extensible sentiment analysis gem.
Spelling and Error Correction
- hunspell-i18n
Ruby bindings to the standard Hunspell Spell Checker.
- ffi-hunspell
FFI based Ruby bindings for Hunspell.
Text Alignment
- alignment
Alignment routines for bilingual texts (Gale-Church implementation).
Machine Translation
- google-api-client
Google API Ruby Client.
- microsoft_translator
Ruby client for the Microsoft Translator API.
- termit
Google Translate with speech synthesis in your terminal.
- zipf
Implementation of BLEU and other base algorithms.
Sentiment Analysis
Numbers, Dates, and Time Parsing
- chronic
Pure Ruby natural language date parser.
- chronic_between
Simple Ruby natural language parser for date and time ranges.
- chronic_duration
Pure Ruby parser for elapsed time.
- kronic
Methods for parsing and formatting human readable dates.
- nickel
Extracts date, time, and message information from naturally worded text.
- tickle
Parser for recurring and repeating events.
- numerizer
Ruby parser for English number expressions.
Named Entity Recognition
Text-to-Speech-to-Text
- espeak-ruby
Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.
- tts
Text-to-Speech conversion using the Google translate service.
- att_speech
Ruby wrapper over the AT&T Speech API for speech to text.
- pocketsphinx-ruby
Pocketsphinx bindings.
Dialog Agents, Assistants, and Chatbots
- chatterbot
Straightforward Ruby-based Twitter Bot Framework, using OAuth to authenticate.
Linguistic Resources
- rwordnet
Pure Ruby self contained API library for the Princeton WordNet®.
- wordnet
Performance tuned bindings for the Princeton WordNet®.
Machine Learning Libraries
Machine Learning Algorithms in pure Ruby or written in other programming languages with appropriate bindings for Ruby.
For a more up-to-date list, please see the Awesome ML with Ruby list.
- rb-libsvm
Support Vector Machines with Ruby.
- weka
JRuby bindings for Weka, different ML algorithms implemented through Weka.
- decisiontree
Decision Tree ID3 Algorithm in pure Ruby [post].
- rtimbl
Memory based learners from the Timbl framework.
- classifier-reborn
General classifier module to allow Bayesian and other types of classifications.
- liblinear-ruby-swig
Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).
- linnaeus
Redis-backed Bayesian classifier.
- maxent_string_classifier
JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.
- naive_bayes
Simple Naive Bayes classifier.
- nbayes
Full-featured, Ruby implementation of Naive Bayes.
- omnicat
Generalized rack framework for text classifications.
- omnicat-bayes
Naive Bayes text classification implementation as an OmniCat classifier strategy.
- ruby-fann
Ruby bindings to the Fast Artificial Neural Network Library (FANN).
- rblearn
Feature Extraction and Crossvalidation library.
Optical Character Recognition
- tesseract-ocr
FFI based wrapper over the Tesseract OCR Engine.
Text Extraction
- yomu
Library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit.
Full Text Search, Information Retrieval, Indexing
- rsolr
Ruby and Rails client library for Apache Solr.
- sunspot
Rails centric client for Apache Solr.
- thinking-sphinx
Active Record plugin for using Sphinx in (not only) Rails based projects.
- elasticsearch
Ruby client and API for Elasticsearch.
- elasticsearch-rails
Ruby and Rails integrations for Elasticsearch.
- google-api-client
Ruby API library for Google services.
Language Aware String Manipulation
Libraries for language aware string manipulation, i.e. search, pattern matching, case conversion, transcoding, regular expressions which need information about the underlying language.
- fuzzy_match
Fuzzy string comparison with Distance measures and Regular Expression.
- fuzzy-string-match
Fuzzy string matching library for Ruby.
- active_support
The Ruby on Rails ActiveSupport gem has various string extensions that can handle case.
- fuzzy_tools
Toolset for fuzzy searches in Ruby tuned for accuracy.
- u
U extends Ruby’s Unicode support.
- unicode
Unicode normalization library.
- CommonRegexRuby
Find a lot of kinds of common information in a string.
- regexp-examples
Generate strings that match a given regular expression.
- verbal_expressions
Make difficult regular expressions easy.
- translit_kit
Transliterate Hebrew & Yiddish text into Latin characters.
- re2
High-speed regular expression library for text mining and text extraction.
- regex_sample
Sample string generation from a given regular expression.
- iuliia
Transliteration from Cyrillic to Latin in many possible ways (defined by the reference implementation).
Articles, Posts, Talks, and Presentations
- 2019
Extracting Text From Images Using Ruby by aonemd [post | code]
- 2018
Natural Language Processing and Tweet Sentiment Analysis by Cassandra Corrales [post]
- 2015
N-gram Analysis for Fun and Profit by Jesus Castello [tutorial]
Machine Learning made simple with Ruby by Lorenzo Masini [tutorial]
Using Ruby Machine Learning to Find Paris Hilton Quotes by Rick Carlino [[tutorial](http://web.archive.org/web/20160414072324/http://datamelo…
- 2017
The Google NLP API Meets Ruby by Aja Hammerly [post]
Syntax Isn't Everything: NLP For Rubyists by Aja Hammerly [slides]
Scientific Computing on JRuby by Prasun Anand [[slides](https://www.slideshare.net/PrasunAnand2/fosdem2017-scientific-comp…
- 2016
Quickly Create a Telegram Bot in Ruby by Ardian Haxha [tutorial]
Deep Learning: An Introduction for Ruby Developers by Geoffrey Litt [slides]
How I made a pure-Ruby word2vec program more than 3x faster by Kei Sawada [[slides](https:…
- 2014
Natural Language Parsing with Ruby by Glauco Custódio [tutorial]
Demystifying Data Science: Analyzing Conference Talks with Rails and Ngrams by Todd Schneider [video | code]
Natural Language Processing with Ruby by [Konstantin Tennhard](https://…
- 2013
How to parse 'go' - Natural Language Processing in Ruby by Tom Cartwright [slides | video]
Natural Language Processing in Ruby by Brandon Black [slides | [video](http://confreaks.tv/videos/railsconf20…
- 2006
Speak My Language: Natural Language Processing With Ruby by Michael Granger [slides | write-up | write-up]
Projects and Code Examples
- Going the Distance
Implementations of various distance algorithms with example calculations.
- Named entity recognition with Stanford NER and Ruby
NER Examples in Ruby and Java with some explanations.
- Words Counted
Examples of customizable word statistics powered by words_counted.
- RSyntaxTree
Web based demonstration of the syntactic tree visualization.
Books
- Miller, Rob. Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Programmers, 2015. [link]
- Watson, Mark. Practical Semantic Web and Linked Data Applications. Lulu, 2010. [link]
Community
Needs your Help!
All projects in this section are really important for the community but need more attention. If you have spare time and dedication, please spend some hours on the code here.
- ferret
Information Retrieval in C and Ruby.
- summarize
Ruby native wrapper for Open Text Summarizer.