54 Open Source Data Cleaning Software Projects
Free and open source data cleaning code projects including engines, APIs, generators, and tools.
Miller 2533 ⭐
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Data Forge Ts 899 ⭐
Nonechucks 285 ⭐
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Voicebook 204 ⭐
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Refinr 85 ⭐
Cluster and merge similar char values: an R implementation of OpenRefine clustering algorithms
Data Analysis Using Python 75 ⭐
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
Bumblebee 75 ⭐
🚕 A spreadsheet-like data preparation web app that works over Optimus (pandas, dask, cuDF, dask-cuDF and PySpark)
Akanz1 Klib 73 ⭐
Easy to use Python library of customized functions for cleaning and analyzing data.
Jim Schwoebel Allie 44 ⭐
🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Gratefuldata 38 ⭐
Grateful Data isn't programming code, but an online tutorial about data acquisition, cleaning and enriching, using publicly accessible data on the band the Grateful Dead as examples. Read the Wiki to find out how to use the sample data.
Drugs Recommendation Using Reviews 35 ⭐
Analyzes drug descriptions, conditions, and reviews, then uses deep learning models to recommend drugs for each patient's health condition.
Skytrax Data Warehouse 33 ⭐
A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
Fifa 2019 Analysis 23 ⭐
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and related aspects using data analysis and data visualization.
Multimodal Sentiment Analysis 20 ⭐
Research on improving text sentiment analysis with facial features extracted from video, using machine learning.
R Learning Journey 18 ⭐
Some of the projects I made when starting to learn R for data science at university.
Exemplary Ml Pipeline 17 ⭐
Exemplary, annotated machine learning pipeline for any tabular data problem.
Vulcan 13 ⭐
A high level deep learning framework for quickly prototyping networks with added tools in data visualisation, model interpretability and performance metrics
Churn Modelling Dataset 13 ⭐
Predicting which customers are going to churn from the organization by examining some of the important attributes and applying machine learning and deep learning.
Titanic Survival In Depth Analysis 12 ⭐
Uses the pandas, Matplotlib, and Seaborn libraries to analyze, visualize, and explore data on the people travelling on the Titanic, and scikit-learn modelling algorithms to predict their probability of survival.
Udacity Bertelsmann Data Science Challenge Scholarship 2018 11 ⭐
This is a repo for my Bertelsmann Data Science Scholarship Challenge: notes, exercises, quizzes.
World Food Production 10 ⭐
Comparing the top food and feed producers around the globe and seeking interesting answers, solutions, patterns, hints, and warnings through data analysis and data visualization using machine learning.