Open Source Libs
314 Open Source Etl Software Projects
Free and open source ETL code projects including engines, APIs, generators, and tools.
Benthos
3931 ⭐
Fancy stream processing made operationally mundane
Linq2db
2241 ⭐
Linq to database provider.
Riko
1579 ⭐
A Python stream processing engine modeled after Yahoo! Pipes
Mara Pipelines
1854 ⭐
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Kiba
1630 ⭐
Data processing & ETL framework for Ruby
Compose Transporter
1222 ⭐
Sync data between persistence engines, like ETL only not stodgy
Aws Data Wrangler
2479 ⭐
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Awesome Business Intelligence
1434 ⭐
Actively curated list of awesome BI tools. PRs welcome!
Dataspherestudio
1856 ⭐
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Ethereum Etl
1487 ⭐
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Panther Labs Panther
905 ⭐
[DEPRECATED] Detect threats with log data and improve cloud security posture
Monstache
905 ⭐
A Go daemon that syncs MongoDB to Elasticsearch in real time. You know, for search.
React Csv
873 ⭐
React components to build CSV files on the fly, based on an array or object literal of data
Singer Io Getting Started
921 ⭐
This repository is a getting started guide to Singer.
Baby Names Analysis
558 ⭐
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Ananas Desktop
560 ⭐
A hackable data integration & analysis tool that lets non-technical users edit data processing jobs and visualise data on demand.
Pyspark Example Project
827 ⭐
Example project implementing best practices for PySpark ETL jobs and applications.
Koop
527 ⭐
🔮 Transform, query, and download geospatial data on the web.
Bigslice
481 ⭐
A serverless cluster computing system for the Go programming language
Smartcode
492 ⭐
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Etlalchemy
500 ⭐
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Pglogical
554 ⭐
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Go Streams
830 ⭐
A lightweight stream processing library for Go
Datacleaner
445 ⭐
The premier open source Data Quality solution
Appbaseio Abc
411 ⭐
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Choetl
485 ⭐
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Wedatasphere
472 ⭐
WeDataSphere is a financial grade, one-stop big data platform suite.
Smooks
334 ⭐
Extensible data integration Java framework for building XML and non-XML fragment-based applications
Metorikku
462 ⭐
A simplified, lightweight ETL Framework based on Apache Spark
Datavec
282 ⭐
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Webkettle
418 ⭐
A professional browser/server (B/S) tool, built on the web version of Kettle, for distributed scheduling, management, and ETL development
Data Making Guidelines
258 ⭐
📘 Making Data, the DataMade Way
Dataform
464 ⭐
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Ropensci Elastic
234 ⭐
R client for the Elasticsearch HTTP API
Example Airflow Dags
282 ⭐
Example DAGs using hooks and operators from Airflow Plugins
Storagetapper
271 ⭐
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Aws Etl Orchestrator
282 ⭐
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Bulk Writer
215 ⭐
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Mongo Es
187 ⭐
A MongoDB to Elasticsearch connector
Icij Extract
202 ⭐
A cross-platform command line tool for parallelised content extraction and analysis.
Jumpmind Metl
190 ⭐
Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Categoricaldata Cql
227 ⭐
Categorical Query Language IDE
Grafter
180 ⭐
Linked Data & RDF Manufacturing Tools in Clojure
Nextdoor Bender
176 ⭐
Bender - Serverless ETL Framework
Etlbox
221 ⭐
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Eland
323 ⭐
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Mara Example Project 2
165 ⭐
An example mini data warehouse for python project stats, template for new projects
Metl
157 ⭐
mito ETL tool
Etl_unicorn
157 ⭐
Data visualization, data mining, and data processing ETL
Hydrograph
146 ⭐
A visual ETL development and debugging tool for big data
Open Semantic Etl
194 ⭐
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Eel Sdk
145 ⭐
Big Data Toolkit for the JVM
Bitcoin Etl
250 ⭐
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Xe Crawler
122 ⭐
Xenomorph Crawler, a concise, declarative, and observable distributed crawler (Node / Go / Java / Rust) for the web, RDB, and OS; it can also act as a monitor (with Prometheus) or ETL for infrastructure 💫 Multi-language executor, distributed crawler
Marklogic Data Hub
118 ⭐
The MarkLogic Data Hub: documentation ==>
Openkettlewebui
134 ⭐
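A Kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware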
Transformalize
136 ⭐
Configurable Extract, Transform, and Load
Etl.net
175 ⭐
Mass processing data with a complete ETL for .net developers
Toaco Carry
114 ⭐
Python ETL (Extract-Transform-Load) tool / Data migration tool
Kettle Web
187 ⭐
Calls Kettle from Java code, built on Spring Boot
Kafka Connect
118 ⭐
Equivalent to kafka-connect 🔧 for Node.js ✨🐢🚀✨
Open Data Etl Utility Kit
95 ⭐
Use Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utility-kit.readthedocs.io/en/stable
Dataxserver
128 ⭐
Provides remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX)
Csv2db
115 ⭐
The CSV to database command line loader
Nbi
95 ⭐
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. Its main goal is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, with the possibility of porting it easily to other testing frameworks.
Linkedpipes Etl
106 ⭐
LinkedPipes ETL is an RDF based, lightweight ETL tool
Aws Ecs Airflow
130 ⭐
Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Od
117 ⭐
Czech open data
Luigi Warehouse
84 ⭐
A luigi powered analytics / warehouse stack
Hale
104 ⭐
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Butterfree
196 ⭐
A tool for building feature stores.
Thain
79 ⭐
Thain is a distributed flow schedule platform.
Ohara
67 ⭐
Ohara - Easy deployment of streaming applications
Csvplus
67 ⭐
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Udacity Data Engineering
117 ⭐
Udacity Data Engineering Nano Degree (DEND)
Stetl
67 ⭐
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Data Load And Copy Using Python
80 ⭐
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Etl_with_python
79 ⭐
ETL with Python - Taught at DWH course 2017 (TAU)
Nextract
87 ⭐
Nextract is an Extract, Transform, Load (ETL) platform built on top of Node.js streams
Target Postgres
85 ⭐
A Singer.io Target for Postgres
Django Calaccess Raw Data
59 ⭐
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Discreetly
64 ⭐
ETLy is an add-on dashboard service on top of Apache Airflow.
Lsc
73 ⭐
LSC engine
Setl
126 ⭐
A simple Spark-powered ETL framework that just works 🍺
Etw2JSON
65 ⭐
Tool and library to convert ETW logs to JSON files
Dswarm
55 ⭐
an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Openrefine Batch
70 ⭐
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a Python client that communicates with the OpenRefine API.
Bellboy
72 ⭐
Highly performant JavaScript data stream ETL engine.
Sqlbucket
61 ⭐
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Uptasticsearch
47 ⭐
An Elasticsearch client tailored to data science workflows.
Skaetl
59 ⭐
Open Source ETL designed for and dedicated to Log processing and transformation
Dbtvault
173 ⭐
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Mbd Bidw
45 ⭐
Business Intelligence and Data Warehousing
Ruby For Pentaho Kettle
42 ⭐
Ruby scripting for pentaho-kettle
Bentools Etl
57 ⭐
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Datasphere Integration
45 ⭐
A data-centric integration platform
Etl Light
41 ⭐
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Yaetl
51 ⭐
Yet Another ETL in PHP
Kafka Connect File Pulse
174 ⭐
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
Architect_big_data_solutions_with_spark
41 ⭐
code, labs and lectures for the course