Description
Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual.
Treat alternatives and similar gems
Based on the "Natural Language Processing" category.
Ruby Natural Language Processing Resources
A collection of links to Ruby Natural Language Processing (NLP) libraries, tools and software.
Pragmatic Segmenter
Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
README
New in v2.0.5: OpenNLP integration and Yomu support
Features
- Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
- Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
- Lexical resources (WordNet interface, several POS taggers for English).
- Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
- Word inflectors, including stemmers, conjugators, declensors, and number inflection.
- Serialization of annotated entities to YAML, XML or to MongoDB.
- Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
- Linguistic resources, including language detection and tag alignments for several treebanks.
- Machine learning (decision tree, multilayer perceptron, LIBLINEAR, LIBSVM).
- Text retrieval with indexation and full-text search (Ferret).
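
To make the keyword (TF*IDF) extraction item above more concrete, here is a minimal plain-Ruby sketch of the general technique. It does not call Treat and is not taken from its source; the method names `tokenize`, `tf_idf` and `keywords`, the naive regex tokenizer and the sample documents are all assumptions made for illustration. It needs Ruby 2.7 or later for `Array#tally`.

```ruby
# Plain-Ruby TF*IDF sketch. Hypothetical helper names; this is NOT Treat's API.

# Split text into lowercase word tokens (naive; real tokenizers handle far more).
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Score every term of every document with tf * idf, where
#   tf  = occurrences of the term / number of tokens in the document
#   idf = log(number of documents / number of documents containing the term)
# Returns one { term => score } hash per input document.
def tf_idf(documents)
  tokenized = documents.map { |doc| tokenize(doc) }

  # Document frequency: in how many documents does each term occur?
  doc_freq = Hash.new(0)
  tokenized.each { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }

  tokenized.map do |tokens|
    tokens.tally.to_h do |term, count|
      tf = count.to_f / tokens.size
      idf = Math.log(documents.size.to_f / doc_freq[term])
      [term, tf * idf]
    end
  end
end

# Return the n highest-scoring terms from one score hash.
# Ties are broken alphabetically so the result is deterministic.
def keywords(scores, n = 3)
  scores.sort_by { |term, score| [-score, term] }.first(n).map(&:first)
end

# Sample documents, invented for this example.
docs = [
  "The parser builds a syntax tree for the sentence.",
  "The tagger assigns a part of speech to each word in the sentence.",
  "The stemmer reduces each word to a stem."
]

scores = tf_idf(docs)
puts keywords(scores[0]).inspect
```

Terms that occur in every document, such as "the", get a score of zero because their inverse document frequency is log(1). With a corpus this small, many terms tie, so a real extractor would typically add stop-word filtering and a larger reference collection.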
Contributing
I am actively seeking developers who can help maintain and expand this project. You can find a list of ideas for contributing to the project here.
Authors
Lead developer: @louismullie [Twitter]
Contributors:
- @bdigital
- @automatedtendencies
- @LeFnord
- @darkphantum
- @whistlerbrk
- @smileart
- @erol
License
This software is released under the GPL License and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.