903 Open Source Dataset Software Projects
Free and open source dataset code projects including engines, APIs, generators, and tools.
Awesome Project Ideas 5183 ⭐
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
Label Studio 3525 ⭐
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Browser Compat Data 3052 ⭐
This repository contains compatibility data for Web technologies as displayed on MDN
Covid Chestxray Dataset 2365 ⭐
We are building an open database of COVID-19 cases with chest X-ray or CT images.
Semantic Segmentation Suite 2216 ⭐
Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!
Awesome JSON Datasets 2107 ⭐
A curated list of awesome JSON datasets that don't require authentication.
Pandas Datareader 1689 ⭐
Extract data from a wide range of Internet sources into a pandas DataFrame.
Unsplash Datasets 1495 ⭐
🎁 2,000,000+ Unsplash images made available for research and machine learning
Iso 3166 Countries With Regional Codes 1292 ⭐
ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets
Cluebenchmark Clue 1312 ⭐
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Pomber Covid19 1137 ⭐
JSON time-series of coronavirus cases (confirmed, deaths and recovered) per country - updated daily
Raccoon_dataset 1127 ⭐
The dataset is used to train my own raccoon detector and I blogged about it on Medium
Ccpd 1087 ⭐
[ECCV 2018] CCPD: a diverse and well-annotated dataset for license plate detection and recognition
Animegan 1023 ⭐
A simple PyTorch implementation of Generative Adversarial Networks, focusing on anime face drawing.
Universal Data Tool 1175 ⭐
Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
Facerank 827 ⭐
FaceRank - Rank faces with a CNN model based on TensorFlow (a Keras version has been added). QQ group: 167122861. Technical support: http://tensorflow123.com
Datastream.io 791 ⭐
An open-source framework for real-time anomaly detection using Python, ElasticSearch and Kibana
Clusterdata 667 ⭐
Cluster data collected from production clusters at Alibaba for cluster management research
Chatito 628 ⭐
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Wilayah Administratif Indonesia 591 ⭐
Data on the provinces, cities/regencies, districts, and villages (kelurahan/desa) of Indonesia
Total Text Dataset 539 ⭐
The Total Text Dataset consists of 1555 images with more than three different text orientations, including horizontal, multi-oriented, and curved text.
Tensorflow_object_tracking_video 485 ⭐
Object tracking in TensorFlow (localization, detection, classification), developed to participate in the ImageNet VID competition
Hate Speech And Offensive Language 482 ⭐
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
Seq2seqchatbots 444 ⭐
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
MongoDB JSON Files 427 ⭐
📦 A curated list of JSON / BSON datasets from the web to practice with or use in MongoDB
Vpgnet 364 ⭐
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (ICCV 2017)
Lidar Bonnetal 375 ⭐
Semantic and Instance Segmentation of LiDAR point clouds for autonomous driving
Comma2k19 340 ⭐
A driving dataset for the development and validation of fused pose estimators and mapping algorithms
Cmu Multimodalsdk 323 ⭐
CMU MultimodalSDK is a machine learning platform for developing advanced multimodal models and for easily accessing and processing multimodal datasets.
Dsprites Dataset 313 ⭐
Dataset to assess the disentanglement properties of unsupervised learning methods
Cryptocmd 263 ⭐
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Meglass 261 ⭐
An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Knowage Server 256 ⭐
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Voice_datasets 315 ⭐
🔊 A comprehensive list of open-source datasets for voice and sound computing (40+ datasets).
Awesome Segmentation Saliency Dataset 281 ⭐
A collection of datasets for segmentation / saliency detection. Pull requests are welcome. 😄