131 Open Source Linguistics Software Projects
Free and open source linguistics code projects including engines, APIs, generators, and tools.
Xiamx Awesome Sentiment Analysis 862 ⭐
😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤
Pynlpl 444 ⭐
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Prosodic 192 ⭐
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
Colibri Core 115 ⭐
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.
Flat 95 ⭐
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Textannotationgraphs 81 ⭐
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
Folia 54 ⭐
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl. This repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
Language Statistics 50 ⭐
A visual color bar of the programming languages in your directory, with percentages and labels
Mlconjug 53 ⭐
A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.
Eliza Rs 46 ⭐
A Rust implementation of ELIZA - a natural language processing program developed by Joseph Weizenbaum in 1966.
Lingvo Ner Ru 38 ⭐
Named entity recognition (NER) in Russian texts
Event Embedding Multitask 22 ⭐
*SEM 2018: Learning Distributed Event Representations with a Multi-Task Approach
Verbecc 33 ⭐
Complete Conjugation of any Verb using Machine Learning for French, Spanish, Portuguese, Italian and Romanian
Phonet 22 ⭐
Keras-based Python framework to compute phonological posterior probabilities from audio files
Praaline 21 ⭐
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora
Korpling Pepper 16 ⭐
A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.
Uncertainty 15 ⭐
A Python implementation of the uncertainty classifier, based on the work of Veronika Vincze.
Pragmatic Guide To Geoparsing Evaluation 25 ⭐
Full resources supporting the publication "A Pragmatic Guide to Geoparsing Evaluation."
Verbecc Svc 16 ⭐
Dockerized Python microservice with REST API for verb conjugation in French, Spanish and Portuguese
Global Signbank 15 ⭐
An online sign dictionary and sign database management system for research purposes. Originally developed by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
Corenlp Jmwe 15 ⭐
Stanford CoreNLP annotator implementing jMWE for detecting Multi-Word Expressions / collocations
Latininflector 15 ⭐
PHP code that analyzes Latin and Greek words' parts of speech, tenses, genders, moods, etc.
Margaret Python Datamuse 13 ⭐
(Deprecated - please use https://github.com/gmarmstrong/python-datamuse) Python wrapper for the Datamuse API
Greek Name Klitiki 16 ⭐
A NodeJS package that transforms Greek names to the vocative grammatical form (klitiki).