559 Open Source Big Data Software Projects
Free and open source big data code projects including engines, APIs, generators, and tools.
Rakam API 788 ⭐
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
H2o 3 5701 ⭐
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Delta Io Delta 3983 ⭐
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Data Science Career 692 ⭐
Repository of career resources for Data Science, Machine Learning, Big Data, and Business Analytics
Apache Couchdb 5200 ⭐
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Automated Deep Learning without ANY human intervention. 1st solution for AutoDL.
Datumbox Framework 1076 ⭐
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Data Science Ipython Notebooks 22374 ⭐
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Spark Py Notebooks 1424 ⭐
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Spark Movie Lens 769 ⭐
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Listenbrainz Server 466 ⭐
Cogcomp Nlp 432 ⭐
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stream Framework 4584 ⭐
Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
Open source platform for X.509 certificate based service authentication and fine grained access control in dynamic infrastructures. Athenz supports provisioning and configuration (centralized authorization) use cases as well as serving/runtime (decentralized authorization) use cases.
Vue Virtual Scroll List 3262 ⭐
⚡️ A Vue component that supports lists with large amounts of data, with high and efficient rendering performance.
Nodefluent Kafka Streams 715 ⭐
Equivalent to kafka-streams 🐙 for Node.js ✨🐢🚀✨