42 Open Source Data Pipeline Software Projects
Free and open source data pipeline code projects including engines, APIs, generators, and tools.
Kedro 3028 ⭐
A Python library that implements software engineering best practices for data and ML pipelines.
Data Science On Gcp 779 ⭐
Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017).
Nonechucks 285 ⭐
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Scalable Data Science Platform 158 ⭐
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Aws Pdf Textract Pipeline 91 ⭐
Data pipeline for crawling PDFs from the web and transforming their contents into structured data using AWS Textract. Built with AWS CDK and TypeScript.
Ob_bulkstash 88 ⭐
Bulk Stash is a Docker rclone service to sync or copy files between different storage services. For example, you can copy files from one remote storage service to another, such as from Amazon S3 to Google Cloud Storage, or from your laptop to a remote storage service.
Serverless Data Pipeline Sam 64 ⭐
Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena
Feagen 33 ⭐
(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.
Stairs 35 ⭐
A framework that helps you run parallel or distributed calculations using data pipelines.
Mldotnet Real Time Data Streaming Workshop 32 ⭐
A Machine Learning and Real-Time Data Analytics Workshop
Network Pipeline 25 ⭐
Network traffic data pipeline for real-time predictions and building datasets for deep neural networks
Saisoku 22 ⭐
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Machine Learning Data Pipeline 20 ⭐
Pipeline module for parallel, real-time data processing, for both development and production use of machine learning models.
Jobanalytics_and_search 18 ⭐
The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Richflow 13 ⭐
Dataquest_eng 13 ⭐
Here's how to get DataQuest's Data Engineering Track missions' content to work on your localhost. Using data from my Valenbisi ARIMA modeling project, I document my steps using PostgreSQL, Postico, and the Command Line to get our DataQuest exercises running out of a Jupyter Notebook.
Automating Your Data Pipeline With Apache Airflow 15 ⭐
Automating Your Data Pipeline with Apache Airflow
Aws Data Pipeline Developer Guide 11 ⭐
The open source version of the AWS Data Pipeline documentation. To provide feedback & requests for changes, submit issues in this repository, or make proposed changes & submit a pull request.
Rpi 11 ⭐
RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with Python 2.7 and Redis.