Open Source Libs
Data Wrangling
73 Open Source Data Wrangling Software Projects
Free and open source data wrangling code projects including engines, APIs, generators, and tools.
Openrefine
8589 ★
OpenRefine is a free, open source power tool for working with messy data and improving it
Hypertools
1687 ★
A Python toolbox for gaining geometric insights into high-dimensional data
Ironmussa Optimus
1173 ★
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Data Forge Ts
1087 ★
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Data Science Best Resources
1629 ★
Carefully curated resource links for data science in one place
Moderndive_book
583 ★
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Prose
525 ★
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
Cracking The Data Science Interview
1595 ★
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
Sqawk
281 ★
Like Awk but with SQL and table joins
Data Cleaning 101
255 ★
Data Cleaning Libraries with Python
Datatest
253 ★
Tools for test driven data-wrangling and data validation.
R Ecology Lesson
241 ★
Data Analysis and Visualization in R for Ecologists
Web Database Analytics
202 ★
Web scraping and related analytics using Python tools
Data Forge JS
140 ★
JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Sjmisc
149 ★
Data transformation and utility functions for R
Qsacnpj
240 ★
Package that processes and organizes data from the Cadastro Nacional da Pessoa Jurídica (CNPJ), Brazil's national registry of legal entities
R Novice Gapminder
140 ★
R for Reproducible Scientific Analysis
R Novice Inflammation
128 ★
Programming with R
Python Ecology Lesson
134 ★
Data Analysis and Visualization in Python for Ecologists
Python Novice Gapminder
123 ★
Plotting and Programming in Python
Data Analysis Using Python
114 ★
Exploratory data analysis using Python of a used car database taken from [source name illegible]
R Raster Vector Geospatial
93 ★
Introduction to Geospatial Raster and Vector Data with R
Uc R.github.io
79 ★
Main repository for R programming courses @ University of Cincinnati, courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.
Tidycells
77 ★
Automatic transformation of untidy spreadsheet-like data into tidy form
R Socialsci
82 ★
R for Social Scientists
Weekly_r_quiz
55 ★
Data wrangling & visualization quizzes for R users
Sql Novice Survey
57 ★
Databases and SQL
Dtcleaner
37 ★
DTCleaner: data cleaning using multi-target decision trees.
Data Wrangling With Python
71 ★
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
The Data Visualization Workshop
59 ★
A New, Interactive Approach to Learning Data Visualization
Wrangling Genomics
44 ★
Data Wrangling and Processing for Genomics
Sql Ecology Lesson
36 ★
Data Management with SQL for Ecologists
Data Analyst Nanodegree
41 ★
Kai Sheng Teh - Udacity Data Analyst Nanodegree
Kiwis
36 ★
A Pandas-inspired data wrangling toolkit in JavaScript
Seifip Udacity Data Analyst Nanodegree
32 ★
Project work for the Udacity Data Analyst Nanodegree
Ubodin Mimir
26 ★
Data-ish exploration through SQL+Uncertainty
R Intro Geospatial
38 ★
Introduction to R for Geospatial Data
Online Courses
37 ★
Free online R courses
Pyrefine
25 ★
Execute OpenRefine JSON scripts without OpenRefine (or Java)
Foofah
25 ★
Foofah: programming-by-example data transformation program synthesizer
Mandliya Ml
28 ★
A 60 days+ streak of daily learning of ML/DL/Maths concepts through projects
Funique
19 ★
A faster unique() function
Data Science 101
19 ★
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Datalind Udacity Data Analyst Nanodegree
24 ★
Repository for the projects needed to complete the Data Analyst Nanodegree.
Qualmap
14 ★
R package for working with semi-structured qualitative GIS data
Pranavsuri Data Analyst Nanodegree
13 ★
This repo consists of the projects that I completed as part of Udacity's Data Analyst Nanodegree curriculum.
Stata Economics
16 ★
Economics Lesson with Stata
Data Munging Python
11 ★
Data Wrangling using Pandas
Python Socialsci
21 ★
Data Analysis and Visualization with Python for Social Scientists
Xplore
21 ★
A Python package built for data scientists, analysts, and AI/ML engineers for exploring the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.
Chapter 2
16 ★
Code examples for Chapter 2 of Data Wrangling with JavaScript
Fuchsia Programming Scrape
10 ★
When you need those jobs hypersonic, scrape
Advanced Data Wrangling In R
14 ★
Advanced-data-wrangling-in-R, Workshop
Prosto
53 ★
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Udacity Nanodegree Projects
16 ★
Udacity nanodegree projects: DLND, DRLND, DAND
Dasel
1783 ★
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Hands On Data Analysis With Pandas
211 ★
Materials for following along with Hands-On Data Analysis with Pandas.
Hands On Data Analysis With Pandas 2nd Edition
134 ★
Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
R Fundamentals
101 ★
D-Lab's 12 hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.
Pandas Workshop
101 ★
A 3-hour introductory workshop on pandas with notebooks and exercises for following along.
Springboard Datasciencetrack Student
73 ★
Springboard Program: Data Science Career Track - NLP
Datacamp_ _track_ _data_scientist_with_r_ _course_03_ _introduction_to_the_tidyverse
15 ★
Repository of DataCamp's "Introduction to the Tidyverse" course.
Genomics R Intro
13 ★
Intro to R and RStudio for Genomics
Lc Sql
11 ★
Library Carpentry: SQL
Data Cleaning
11 ★
Data Cleaning with Python
Cognito
10 ★
Cognito - Simplifies AutoML Data Preprocessing.
Datawrangler
11 ★
Quick and dirty data mining made easier in Sublime Text
Whyqd
15 ★
data wrangling simplicity, complete audit transparency, and at speed
Chapter 3
10 ★
Code examples for Chapter 3 of Data Wrangling with JavaScript
Covid19 Italy Integrated Surveillance Data
21 ★
COVID-19 integrated surveillance data provided by the Italian Institute of Health and processed via UnrollingAverages.jl to deconvolve the weekly moving averages.
Jqnatividad Qsv
48 ★
CSVs sliced, diced & analyzed.
Dlab Berkeley R Data Wrangling
13 ★
D-Lab's 6 hour introduction to data wrangling with R. Learn how to manipulate dataframes using the tidyverse in R.
Crysda
11 ★
Crystal library for Data Analysis, Wrangling, Munging