40 Open Source Data Integration Software Projects
Free and open source data integration code projects including engines, APIs, generators, and tools.
Mara Pipelines 1553 ⭐
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Awesome Single Cell 1498 ⭐
Community-curated list of software packages and data resources for single-cell analysis, including RNA-seq, ATAC-seq, etc.
Immunogenomics Harmony 187 ⭐
Fast, sensitive and accurate integration of single-cell data with Harmony
Mara Example Project 2 153 ⭐
An example mini data warehouse for Python project stats, and a template for new projects
Olehmberg Winter 77 ⭐
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Commoncoreontologies 55 ⭐
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
Harmonypy 38 ⭐
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Schemamapper 21 ⭐
A .NET class library that allows you to import data from different sources into a unified destination
Doctoral Thesis 19 ⭐
📖 Generation and Applications of Knowledge Graphs in Systems and Networks Biology
Integrate 19 ⭐
Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
R Learning Journey 18 ⭐
Some of the projects I made when I started learning R for data science at university
Gellish 18 ⭐
Development of the Gellish Communicator reference application and tools for universal data exchange and data integration supporting Formal English and other Gellish formalized natural languages.
Data Warehouse Concepts Design And Data Integration 14 ⭐
Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (Coursera): notes, assignments, quizzes, and research papers
Marklogic Community Pipes 12 ⭐
Pipes for MarkLogic Data Hub is a visual programming tool for MarkLogic Data Hub. It integrates with MarkLogic's Data Hub and produces custom code steps using a no-code UI environment.
Schema Matching 12 ⭐
Match schema attributes of relational databases by value similarity. As a study assignment, this isn't well documented, but you can contact me with questions, and I may add docs if I sense enough interest.
Assignpop 12 ⭐
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.
Datax 12 ⭐
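A general-purpose data collection tool derived from Alibaba DataX, with improvements and enhanced functionality; supports cassandra, clickhouse, dbf, hive, mysql, oracle, prestosql, postgresql, sqlserver, text, and other data sources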
Robustsinglecell 10 ⭐
Robust single cell clustering and comparison of population compositions across tissues and experimental models via similarity analysis.