86 Open Source Data Processing Software Projects
Free and open source data processing code projects including engines, APIs, generators, and tools.
Dali 2862 ⭐
A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications
Miller 2533 ⭐
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Texar 2027 ⭐
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow
Bash Oneliner 1201 ⭐
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Dataflowjavasdk 852 ⭐
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Data Science On Gcp 779 ⭐
Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Texar Pytorch 611 ⭐
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation
Xidel 301 ⭐
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
RAPIdtables 287 ⭐
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
Nonechucks 285 ⭐
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Pxi 230 ⭐
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Data Processing Agreements 104 ⭐
Collection of Data Processing Agreement (DPA) and GDPR compliance resources
Cotk 99 ⭐
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Machine Learning For Solar Energy Prediction 77 ⭐
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
Cbrain 49 ⭐
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
Vortex Exoplanet Vip 44 ⭐
VIP is a Python package/library for angular, reference star, and spectral differential imaging for exoplanet/disk detection through high-contrast imaging.
Lrs3 For Speech Separation 45 ⭐
Multi-modal speech separation task data generation script on LRS3 data set.
Data Science Using Python University Course Module 41 ⭐
“Data science” is about as broad a term as they come. It may be easiest to describe it by listing its more concrete components, starting with data exploration and analysis. Included here: Pandas, NumPy, SciPy, and a helping hand from Python's Standard Library.
2019 Electronic Design Competition 45 ⭐
[Electronic Design Contest] 2019 National Undergraduate Electronic Design Contest (Problem F): paper sheet counting device (based on STM32F407, FDC2214, and USART HMI)
Atomgraph Processor 35 ⭐
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
Itertable 32 ⭐
⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLS, XML, and JSON.
Skytrax Data Warehouse 33 ⭐
A full data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
Zenaton Python 25 ⭐
🐍 Python library to run and orchestrate background jobs with Zenaton Workflow Engine
Machine Learning Data Pipeline 20 ⭐
Pipeline module for parallel real-time data processing, for machine learning model development and production use.
Ibm Cloud Functions Data Processing Message Hub 19 ⭐
Create a serverless, event-driven application with Apache OpenWhisk on IBM Cloud Functions that executes code in response to messages or handles streams of data records from Apache Kafka or IBM Message Hub.
Speech Recognition 19 ⭐
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Brainnet Ml Toolbox 17 ⭐
Python Machine Learning Toolbox for Brain Network Classification. Includes source code from the top 20 teams in the Kaggle competition.
Ds Project Maker 15 ⭐
Template repository for initializing data science projects. Designed for student project work on the Make School Data Science track.
Data Processing And Visualization 15 ⭐
This document forms the basis of several workshops/talks that get into everyday programming with R, but also includes mirrored code in Python as Jupyter notebooks.
Skillship Internship Project 1 Prediction Of A Patient S No_show Appointments 14 ⭐
Skillship Foundation internship project.
Automated Data Preprocessing 16 ⭐
A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.
Meds Processor 13 ⭐
Learn C# and .NET Core by building a scraper, downloader and parser for Croatia's Health Insurance Fund primary and supplementary drugs list.
Rpi 11 ⭐
RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with Python 2.7 and Redis.
Computing With Data 11 ⭐
Code samples for my book "Computing with Data: An Introduction to the Data Industry"
Ibm Cloud Functions Message Hub Trigger 10 ⭐
IBM Cloud Functions building block - Message Hub Trigger - This project provides a starting point for handling events from Message Hub with IBM Cloud Functions powered by Apache OpenWhisk.