646 Open Source Data Analysis Software Projects
Free and open source data analysis code projects including engines, APIs, generators, and tools.
Pandas Dev Pandas 26903 ⭐
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Metabase 22423 ⭐
The simplest, fastest way to get business intelligence and analytics to everyone in your company
Goaccess 12213 ⭐
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
Cyberchef 10339 ⭐
The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
Openrefine 7651 ⭐
OpenRefine is a free, open source power tool for working with messy data and improving it
Data Analysis And Machine Learning Projects 4836 ⭐
Repository of teaching materials, code, and data for my data analysis and machine learning projects.
Imbalanced Learn 4773 ⭐
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Spiderclub Weibospider 4465 ⭐
A distributed crawler for Weibo, built with Celery and Requests.
Knowledge Repo 4475 ⭐
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Gonum Gonum 4248 ⭐
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
Sqlpad 3396 ⭐
Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC
Aksnzhy Xlearn 2769 ⭐
High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and CLI interfaces.
Octosql 2262 ⭐
OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Akshare 2334 ⭐
AkShare is an elegant and simple financial data interface library for Python, built for human beings! An open source financial data interface library.
Ai Learn 2361 ⭐
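An artificial intelligence learning roadmap that collects nearly 200 hands-on case studies and projects, with free accompanying course materials, taking learners from zero background to job-oriented practice. Covers Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, and other popular areas. Tags: pytorch, tensorflow, machine-learning, deep-learning, data-analysis, data-mining, mathematics, data-science, artificial-intelligence, python, tensorflow2, caffe, keras, algorithm, numpy, pandas, matplotlib, seaborn, nlp, cv.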
Pandas Datareader 1689 ⭐
Extract data from a wide range of Internet sources into a pandas DataFrame.
Aachartkit Swift 1629 ⭐
📈📊📱📺💻 An elegant, modern, declarative data visualization chart framework for iOS, iPadOS and macOS. Extremely powerful; supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall and polar chart types. A polished and powerful cross-platform charting framework that supports dozens of chart types, including column, bar, line, spline, filled line, filled spline, bubble, pie, donut, scatter, radar, and mixed charts.
Awesome Ts Anomaly Detection 1409 ⭐
List of tools & datasets for anomaly detection on time-series data.
Spark Py Notebooks 1275 ⭐
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Root 1254 ⭐
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
Patmartin Dex 1204 ⭐
Dex: The Data Explorer, a data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and publishing web visualizations.
100 Pandas Puzzles 1227 ⭐
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
Hyperlearn 1185 ⭐
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Data Selfie 1010 ⭐
Data Selfie - a browser extension to track yourself on Facebook and analyze your data.
Sweetviz 1020 ⭐
Visualize and compare datasets, target values and associations, with one line of code.
Ironmussa Optimus 939 ⭐
Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Data Forge Ts 899 ⭐
Dataflowjavasdk 852 ⭐
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Data Science On Gcp 779 ⭐
Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan (O'Reilly, 2017)
Dataframe 692 ⭐
C++ DataFrame for statistical, financial, and ML analysis, written in modern C++ using native types and contiguous memory storage, with no pointers involved
Awesome Python Data Science 633 ⭐
Probably the best curated list of data science software in Python.
Cookbook 2nd Code 494 ⭐
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Iclr2020 Openreviewdata 398 ⭐
Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Scitools Iris 400 ⭐
A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
Jupyter_pivottable.js 395 ⭐
Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
The Elements Of Statistical Learning Python Notebooks 337 ⭐
A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book