202 Open Source Bigdata Software Projects
Free and open source bigdata code projects including engines, APIs, generators, and tools.
Tdengine 13921 ⭐
An open-source big data platform designed and optimized for the Internet of Things (IoT).
Awesome Bigdata 9344 ⭐
A curated list of awesome big data frameworks, resources and other awesomeness.
Vaex 5163 ⭐
Out-of-Core DataFrames for Python and ML: visualize and explore big tabular data at a billion rows per second 🚀
Poli 1714 ⭐
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Dotnet Spark 1508 ⭐
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Griddb 1308 ⭐
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Spark Py Notebooks 1275 ⭐
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Ironmussa Optimus 939 ⭐
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Spark Movie Lens 732 ⭐
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Kube Batch 724 ⭐
A batch scheduler for Kubernetes, built for high-performance workloads, e.g. AI/ML, BigData, HPC
Bigdata Interview 700 ⭐
Coding Now 655 ⭐
Rdkmaster Jigsaw 342 ⭐
Jigsaw (七巧板) provides a set of web components based on Angular 5/8/9+. Its main purpose is to help application developers build complex, highly interactive, and user-friendly web pages. Jigsaw supports the development of all applications of ZTE's Big Data Product.
Feedirss API 333 ⭐
RSS as RESTful. This service allows you to transform an RSS feed into an awesome API.
Datawave 326 ⭐
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Mvillarrealb Docker Spark Cluster 246 ⭐
A simple Spark standalone cluster for your testing environment purposes
Datafaker 249 ⭐
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into various data sources.
Every Single Day I Tldr 234 ⭐
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Big Data Rosetta Code 236 ⭐
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Aws Etl Orchestrator 221 ⭐
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Hadoop Attack Library 214 ⭐
A collection of pentest tools and resources targeting Hadoop environments
Sparkrdma 204 ⭐
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Athenacli 132 ⭐
AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.
Azure Event Hubs Spark 130 ⭐
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Hadoopcryptoledger 123 ⭐
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Spark R Notebooks 110 ⭐
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Kotlin Spark API 130 ⭐
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Tennis Crystal Ball 97 ⭐
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Memverge Splash 94 ⭐
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Tensorbase 111 ⭐
TensorBase is building a modern big data warehouse with performance at its core.
Covid19 Market Waiting Times 94 ⭐
A project to help people stand in line at the market as little as possible
Clustering4ever 89 ⭐
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Ignite Book Code Samples 86 ⭐
All code samples, scripts and more in-depth examples for the book High Performance In-Memory Computing with Apache Ignite. Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.
Big Data Engineering Coursera Yandex 69 ⭐
Big Data for Data Engineers Coursera Specialization from Yandex
Meetups Archivos 61 ⭐
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Knowledge and Research Incubators.
The Apache Ignite Book 45 ⭐
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Covers Apache Ignite 2.6 or above.