314 Open Source ETL Software Projects
Free and open source ETL code projects including engines, APIs, generators, and tools.
Mara Pipelines 1854 ⭐
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Aws Data Wrangler 2479 ⭐
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Dataspherestudio 1856 ⭐
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Ethereum Etl 1487 ⭐
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Panther Labs Panther 905 ⭐
[DEPRECATED] Detect threats with log data and improve cloud security posture
React Csv 873 ⭐
React components to build CSV files on the fly based on an array or object literal of data
Baby Names Analysis 558 ⭐
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Ananas Desktop 560 ⭐
A hackable data integration & analysis tool to enable non-technical users to edit data processing jobs and visualise data on demand.
Pyspark Example Project 827 ⭐
Example project implementing best practices for PySpark ETL jobs and applications.
Pglogical 554 ⭐
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Choetl 485 ⭐
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Smooks 334 ⭐
Extensible data integration Java framework for building XML and non-XML fragment-based applications
Dataform 464 ⭐
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Storagetapper 271 ⭐
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Aws Etl Orchestrator 282 ⭐
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Bulk Writer 215 ⭐
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Icij Extract 202 ⭐
A cross-platform command line tool for parallelised content extraction and analysis.
Jumpmind Metl 190 ⭐
Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Etlbox 221 ⭐
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Eland 323 ⭐
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Mara Example Project 2 165 ⭐
An example mini data warehouse for python project stats, template for new projects
Open Semantic Etl 194 ⭐
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elasticsearch index & linked data graph database
Bitcoin Etl 250 ⭐
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Xe Crawler 122 ⭐
Xenomorph Crawler, a concise, declarative and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, OS; it can also act as a monitor (with Prometheus) or ETL for infrastructure. Multi-language executor, distributed crawler.
Kafka Connect 118 ⭐
Equivalent to kafka-connect for Node.js
Open Data Etl Utility Kit 95 ⭐
Use Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utility-kit.readthedocs.io/en/stable
Dataxserver 128 ⭐
Provides remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX)
Nbi 95 ⭐
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, with the possibility of porting it easily to other testing frameworks.
Csvplus 67 ⭐
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stetl 67 ⭐
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Data Load And Copy Using Python 80 ⭐
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Django Calaccess Raw Data 59 ⭐
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Dswarm 55 ⭐
An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Openrefine Batch 70 ⭐
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.
Sqlbucket 61 ⭐
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Dbtvault 173 ⭐
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Bentools Etl 57 ⭐
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependencies.
Kafka Connect File Pulse 174 ⭐
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka