400 Open Source Data Mining Software Projects
Free and open source data mining code projects including engines, APIs, generators, and tools.
Ml From Scratch 18174 ⭐
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Awesome Datascience 14325 ⭐
An awesome Data Science repository for learning data science and applying it to real-world problems.
Microsoft Lightgbm 11659 ⭐
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Python Machine Learning Book 10914 ⭐
The "Python Machine Learning (1st edition)" book code repository and info resource
Jaidedai Easyocr 8463 ⭐
Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai
Awesome Production Machine Learning 6703 ⭐
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Catboost 5446 ⭐
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Mlxtend 3155 ⭐
A library of extension and helper modules for Python's data analysis and machine learning libraries.
Ai Learn 2361 ⭐
An artificial intelligence learning roadmap that collects nearly 200 hands-on case studies and projects, with free accompanying course materials, for beginners starting from zero and for job-oriented practice. Covers popular areas including Python, mathematics, machine learning, data analysis, data mining, deep learning, computer vision, natural language processing, PyTorch, TensorFlow, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.
Pdftabextract 1848 ⭐
A set of tools for extracting tables from PDF files, intended to help with data mining on (OCR-processed) scanned documents.
Awesome Machine Learning Interpretability 1865 ⭐
A curated list of awesome machine learning interpretability resources.
Awesome Ts Anomaly Detection 1409 ⭐
List of tools & datasets for anomaly detection on time-series data.
Patmartin Dex 1204 ⭐
Dex: The Data Explorer. A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Tsv Utils 1176 ⭐
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Papers Literature Ml Dl Rl Ai 1066 ⭐
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Dataflowjavasdk 852 ⭐
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Clevercsv 852 ⭐
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Interpretable_machine_learning_with_python 487 ⭐
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
Cookbook 2nd Code 494 ⭐
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Book Socialmediaminingpython 439 ⭐
Companion code for the book "Mastering Social Media Mining with Python"
Feature Engineering And Feature Selection 434 ⭐
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
Artificial Adversary 335 ⭐
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Text_mining_resources 328 ⭐
Resources for learning about Text Mining and Natural Language Processing
Knowage Server 256 ⭐
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Graph Adversarial Learning Literature 253 ⭐
A curated list of adversarial attacks and defenses papers on graph-structured data.
Statistical Learning 213 ⭐
Lecture slides and R sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Scriptsmith Reaper 211 ⭐
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Automlpipeline.jl 206 ⭐
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Game Datasets 209 ⭐
A curated list of awesome game datasets and tools for artificial intelligence in games
Smartproxy Smartproxy 197 ⭐
HTTP(S) Rotating Residential proxies - Code examples & General information
Qminer 196 ⭐
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Prefixspan Py 193 ⭐
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, the closed sequential pattern mining algorithm BIDE, and the generator sequential pattern mining algorithm FEAT.
Urs 185 ⭐
Universal Reddit Scraper - Scrape Subreddits, Redditors, and submission comments. A command-line tool written in Python (PRAW).
Graph Fraud Detection Papers 192 ⭐
A curated list of fraud detection papers using graph information or graph neural networks
Estadistica Con R 163 ⭐
Personal notes on statistics, machine learning, and the R programming language
Pyss3 166 ⭐
A Python package implementing a new model for text classification with visualization tools for Explainable AI