27 Open Source Corpus Linguistics Software Projects
Free and open source corpus linguistics code projects including engines, APIs, generators, and tools.
Wordless 426 ⭐
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Nlp_bahasa_resources 209 ⭐
A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia
German Nlp 164 ⭐
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
Goclassy 77 ⭐
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
Lennes Spect 39 ⭐
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
Praaline 18 ⭐
Praaline is an open-source system to manage, annotate, visualise and analyse spoken language corpora
Lyrics Corpora 17 ⭐
An unofficial Python API that lets users build a corpus of lyrics from their favorite artists and the Billboard charts
Biomedical_corpora 17 ⭐
A table of biomedical corpora available for named entity recognition (some are also suitable for association detection). Published as part of the paper: Dieter Galea, Ivan Laponogov, Kirill Veselkov; Exploiting and assessing multi-source data for supervised biomedical named entity recognition, Bioinformatics, bty152, https://doi.org/10.1093/bioinformatics/bty152. To add other corpora (or your own), please submit a pull request and the maintainer will happily approve it.