1173 Open Source Dataset Software Projects
Free and open source dataset code projects including engines, APIs, generators, and tools.
Awesome Project Ideas 6148 ⭐
Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas
Label Studio 7399 ⭐
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Browser Compat Data 3758 ⭐
This repository contains compatibility data for Web technologies as displayed on MDN
Covid Chestxray Dataset 2764 ⭐
We are building an open database of COVID-19 cases with chest X-ray or CT images.
Semantic Segmentation Suite 2403 ⭐
Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!
Awesome JSON Datasets 2434 ⭐
A curated list of awesome JSON datasets that don't require authentication.
Pandas Datareader 2202 ⭐
Extract data from a wide range of Internet sources into a pandas DataFrame.
Unsplash Datasets 1817 ⭐
🎁 3,000,000+ Unsplash images made available for research and machine learning
Iso 3166 Countries With Regional Codes 1565 ⭐
ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets
Cluebenchmark Clue 2468 ⭐
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained models, corpus, and leaderboard
Pomber Covid19 1218 ⭐
JSON time-series of coronavirus cases (confirmed, deaths and recovered) per country - updated daily
Raccoon_dataset 1217 ⭐
This dataset was used to train my own raccoon detector, and I blogged about it on Medium
Ccpd 1560 ⭐
[ECCV 2018] CCPD: a diverse and well-annotated dataset for license plate detection and recognition
Animegan 1178 ⭐
A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing.
Universal Data Tool 1560 ⭐
Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
Wikisql 1084 ⭐
A large annotated semantic parsing corpus for developing natural language interfaces.
Facerank 846 ⭐
FaceRank - Rank faces with a CNN model based on TensorFlow (Keras version added). QQ group: 167122861. Technical support: http://tensorflow123.com
Datastream.io 855 ⭐
An open-source framework for real-time anomaly detection using Python, Elasticsearch, and Kibana
Clusterdata 879 ⭐
Cluster data collected from production clusters at Alibaba for cluster management research
Chatito 732 ⭐
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Wilayah Administratif Indonesia 762 ⭐
Data on provinces, cities/regencies, districts, and urban villages/villages in Indonesia
Total Text Dataset 627 ⭐
Total Text Dataset. It consists of 1,555 images covering three different text orientations (horizontal, multi-oriented, and curved), which makes it one of a kind.
Tensorflow_object_tracking_video 496 ⭐
Object tracking in TensorFlow (localization, detection, classification), developed to participate in the ImageNet VID competition
Hate Speech And Offensive Language 608 ⭐
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
Seq2seqchatbots 473 ⭐
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
MongoDB JSON Files 528 ⭐
A curated list of JSON / BSON datasets from the web to practice with or use in MongoDB
Vpgnet 427 ⭐
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (ICCV 2017)
Lidar Bonnetal 609 ⭐
Semantic and Instance Segmentation of LiDAR point clouds for autonomous driving
Comma2k19 458 ⭐
A driving dataset for the development and validation of fused pose estimators and mapping algorithms
Cmu Multimodalsdk 523 ⭐
CMU MultimodalSDK is a machine learning platform for developing advanced multimodal models and for easily accessing and processing multimodal datasets.
Dsprites Dataset 383 ⭐
Dataset to assess the disentanglement properties of unsupervised learning methods
Cryptocmd 366 ⭐
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Meglass 300 ⭐
An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Knowage Server 316 ⭐
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Voice_datasets 779 ⭐
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Awesome Segmentation Saliency Dataset 367 ⭐
A collection of datasets for segmentation / saliency detection. Pull requests are welcome.