962 Open Source Data Analysis Software Projects
Free and open source data analysis code projects including engines, APIs, generators, and tools.
Pandas Dev Pandas 32482 ⭐
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Metabase 27292 ⭐
The simplest, fastest way to get business intelligence and analytics to everyone in your company
Goaccess 14222 ⭐
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
Cyberchef 15135 ⭐
The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
Openrefine 8589 ⭐
OpenRefine is a free, open source power tool for working with messy data and improving it
Data Analysis And Machine Learning Projects 5208 ⭐
Repository of teaching materials, code, and data for my data analysis and machine learning projects.
Imbalanced Learn 5674 ⭐
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Spiderclub Weibospider 4670 ⭐
A distributed crawler for Weibo, built with Celery and Requests.
Knowledge Repo 4997 ⭐
A next-generation curated knowledge sharing platform for data scientists and other technical professionals.
Gonum Gonum 5467 ⭐
Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
Sqlpad 4214 ⭐
Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC
Aksnzhy Xlearn 2978 ⭐
High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.
Octosql 2660 ⭐
OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Akshare 4494 ⭐
AKShare is an elegant and simple open source financial data interface library for Python, built for human beings!
Ai Learn 4521 ⭐
An artificial intelligence learning roadmap that collects nearly 200 hands-on case studies and projects, with free accompanying teaching materials, for beginners starting from zero and for job-oriented practice. Covers popular areas including Python, mathematics, algorithms, machine learning, data analysis, data mining, deep learning, computer vision, natural language processing, PyTorch, TensorFlow, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.
Pandas Datareader 2216 ⭐
Extract data from a wide range of Internet sources into a pandas DataFrame.
Aachartkit Swift 1984 ⭐
📈📊📱💻🖥️ An elegant, modern, declarative data visualization chart framework for iOS, iPadOS and macOS. Extremely powerful, it supports line, spline, area, areaspline, column, bar, pie, donut, scatter, radar, angular gauge, arearange, areasplinerange, columnrange, bubble, box plot, error bar, funnel, waterfall, polar, and mixed chart types: dozens of infographic chart types in a polished, cross-platform framework.
Awesome Ts Anomaly Detection 2079 ⭐
List of tools & datasets for anomaly detection on time-series data.
Spark Py Notebooks 1424 ⭐
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Root 1648 ⭐
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
Patmartin Dex 1272 ⭐
Dex: The Data Explorer -- a data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and publishing web visualizations.
100 Pandas Puzzles 1621 ⭐
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
Hyperlearn 1233 ⭐
Waiting hours for a future prediction is unacceptable. Hyperlearn makes AI and ML algorithms 50% faster and use 90% less memory, and it doesn't require new hardware! ML algorithms such as PCA, linear regression, and NMF are all faster.
Data Selfie 1017 ⭐
Data Selfie - a browser extension to track yourself on Facebook and analyze your data.
Sweetviz 1897 ⭐
Visualize and compare datasets, target values and associations, with one line of code.
Ironmussa Optimus 1173 ⭐
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Data Forge Ts 1087 ⭐
Dataflowjavasdk 859 ⭐
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Data Science On Gcp 994 ⭐
Source code accompanying the book Data Science on the Google Cloud Platform, by Valliappa Lakshmanan, O'Reilly 2017
Dataframe 1360 ⭐
C++ DataFrame for statistical, financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
Awesome Python Data Science 1188 ⭐
Probably the best curated list of data science software in Python.
Vectorbt 1602 ⭐
Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research.
Cookbook 2nd Code 613 ⭐
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Iclr2020 Openreviewdata 447 ⭐
Script that crawls metadata from the ICLR OpenReview webpage. Includes tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Scitools Iris 469 ⭐
A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
Jupyter_pivottable.js 455 ⭐
Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
The Elements Of Statistical Learning Python Notebooks 528 ⭐
A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book