173 Open Source Feature Engineering Software Projects
Free and open source feature engineering code projects including engines, APIs, generators, and tools.
A neural text-processing Python library for context-based feature extraction on sequence-tagging data.
Detecting drug-drug interactions (DDI) has become a vital part of public health safety. This project implements an NLP-based approach to extracting such relations between entities.
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Loan Prediction Analytics Vidhya 23 ⭐
The solution to the Loan Prediction Practice Problem on Analytics Vidhya (https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/)
Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.
Predicting Transportation Modes of GPS Trajectories 34 ⭐
Understanding transportation mode from GPS (Global Positioning System) traces is an essential topic in the data mobility domain. In this paper, a framework is proposed to predict transportation modes. The framework follows a sequence of five steps: (i) data preparation, where GPS points are grouped into trajectory samples; (ii) point feature generation; (iii) trajectory feature extraction; (iv) noise removal; and (v) normalization. We show that extracting the new point features (the bearing rate and the rate of change of the bearing rate), together with global and local trajectory features such as medians and percentiles, enables many classifiers to achieve high accuracy (96.5%) and F1 (96.3%) scores. We also show that the noise removal step affects the performance of all the models tested. Finally, empirical comparisons against state-of-the-art transportation mode prediction strategies show that our framework is competitive and outperforms most of them.
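The point and trajectory features named above can be sketched in Python as follows. This is a minimal sketch, not the project's code: it assumes points arrive as time-ordered `(timestamp_seconds, lat, lon)` tuples, takes the bearing rate as the absolute turn between consecutive segments per second, and summarizes each trajectory with a median and a 90th percentile. The names `bearing`, `percentile`, and `trajectory_features` are illustrative.

```python
import math

def bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def trajectory_features(points):
    """points: time-ordered (timestamp_seconds, lat, lon) tuples; needs at least 4 points."""
    bearings, dts = [], []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(points, points[1:]):
        bearings.append(bearing(la1, lo1, la2, lo2))
        dts.append(t2 - t1)
    # Bearing rate (assumed definition): absolute turn between consecutive segments,
    # using the shortest signed angle, divided by the duration of the later segment
    rates = [abs((b2 - b1 + 180.0) % 360.0 - 180.0) / dt
             for b1, b2, dt in zip(bearings, bearings[1:], dts[1:])]
    # Rate of change of the bearing rate, computed the same way
    rate_changes = [abs(r2 - r1) / dt
                    for r1, r2, dt in zip(rates, rates[1:], dts[2:])]
    # Trajectory-level summaries of the point features
    features = {}
    for name, values in (("bearing_rate", rates), ("bearing_rate_change", rate_changes)):
        features[name + "_median"] = percentile(values, 50)
        features[name + "_p90"] = percentile(values, 90)
    return features

# East, then north, then east again, one fix every 10 seconds
pts = [(0, 0.0, 0.0), (10, 0.0, 0.001), (20, 0.001, 0.001), (30, 0.001, 0.002)]
print(trajectory_features(pts))
```

The noise-removal and normalization steps from the description are deliberately left out; the sketch only covers feature generation.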
Pubmed Best Match 31 ⭐
A machine-learning pipeline, based on LambdaMART, currently used in PubMed for relevance (Best Match) searches.
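LambdaMART trains gradient-boosted trees with LambdaRank gradients, which are scaled by how much swapping two documents would change a ranking metric such as NDCG. The pipeline's actual metric configuration is not stated here, so the following is only a sketch of NDCG@k, assuming the common exponential gain `2**rel - 1` and a `log2` position discount; `dcg_at_k` and `ndcg_at_k` are illustrative names.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Rank three documents by a (hypothetical) model score, then score the ranking
# against graded relevance labels.
scores = [0.2, 0.9, 0.5]
labels = [2, 0, 1]
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ndcg_at_k([labels[i] for i in order], k=3))
```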
Feature Selection 580 ⭐
A feature selector driven by a user-selected algorithm, loss function, and validation method.
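A selector that takes the algorithm, loss function, and validation scheme as inputs can be sketched as greedy forward selection with k-fold cross-validation. This is a hypothetical illustration, not the package's API: `forward_select`, `ols_fit_predict`, and the 5% stopping threshold `min_rel_improvement` are assumptions, and ordinary least squares with mean squared error stands in for whichever model and loss a user would choose.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle row indices once and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cv_loss(X, y, cols, fit_predict, loss, folds):
    """Mean validation loss of the model trained on the given columns."""
    scores = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(X[np.ix_(train, cols)], y[train], X[np.ix_(val, cols)])
        scores.append(loss(y[val], pred))
    return float(np.mean(scores))

def forward_select(X, y, fit_predict, loss, k=5, min_rel_improvement=0.05):
    """Greedily add the feature that most reduces CV loss; stop when the
    relative improvement drops below min_rel_improvement."""
    folds = kfold_indices(len(y), k)
    selected, best = [], float("inf")
    remaining = list(range(X.shape[1]))
    while remaining:
        trials = {c: cv_loss(X, y, selected + [c], fit_predict, loss, folds) for c in remaining}
        cand = min(trials, key=trials.get)
        if selected and trials[cand] > best * (1 - min_rel_improvement):
            break
        selected.append(cand)
        remaining.remove(cand)
        best = trials[cand]
    return selected, best

def ols_fit_predict(X_tr, y_tr, X_te):
    """Example model: ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

def mse(y_true, y_pred):
    """Example loss: mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)
selected, score = forward_select(X, y, ols_fit_predict, mse)
print(selected, score)
```

Swapping in another model or loss only requires a different `fit_predict(X_train, y_train, X_val)` or `loss(y_true, y_pred)` callable; the selection loop itself is model-agnostic.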
Open Solution Home Credit 403 ⭐
Open solution to the Home Credit Default Risk challenge 🏡
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries.
Titanic Survival In Depth Analysis 12 ⭐
Uses the Pandas, Matplotlib, and Seaborn libraries to analyze, visualize, and explore data on the people travelling on the Titanic, and scikit-learn modeling algorithms to predict their probability of survival.
Diamonds In Depth Analysis 17 ⭐
Given a dataset of diamonds with features such as cut, carat, and clarity, I used Pandas, NumPy, Matplotlib, and Seaborn to analyze the data and estimate diamond prices from these features. Using scikit-learn, I implemented algorithms to improve the R² score.
A high-level deep learning framework for quickly prototyping networks, with added tools for data visualisation, model interpretability, and performance metrics.
Awesome Feature Engineering 497 ⭐
A curated list of resources dedicated to Feature Engineering Techniques for Machine Learning
Home Credit Default Risk 73 ⭐
Default risk prediction for the Home Credit competition: a fast, scalable, and maintainable SQL-based feature engineering pipeline.
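The idea of pushing feature engineering into SQL can be illustrated with a single aggregation query. The sketch below uses Python's built-in `sqlite3` module; the `applications` and `bureau` tables and their columns are simplified, hypothetical stand-ins loosely modeled on the competition's applicant and credit-bureau data, not the repository's actual schema or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables: one row per applicant, many bureau loans per applicant
conn.executescript("""
CREATE TABLE applications (sk_id_curr INTEGER PRIMARY KEY, amt_income REAL, amt_credit REAL);
CREATE TABLE bureau (sk_id_curr INTEGER, amt_credit_sum REAL, credit_active TEXT);
INSERT INTO applications VALUES (1, 100000, 200000), (2, 50000, 25000);
INSERT INTO bureau VALUES (1, 10000, 'Active'), (1, 30000, 'Closed');
""")

# Each feature is one column expression; LEFT JOIN keeps applicants with no bureau history
FEATURES_SQL = """
SELECT a.sk_id_curr,
       a.amt_credit / a.amt_income AS credit_income_ratio,
       COUNT(b.sk_id_curr) AS bureau_loan_count,
       COALESCE(SUM(CASE WHEN b.credit_active = 'Active' THEN 1 ELSE 0 END), 0) AS bureau_active_count,
       COALESCE(AVG(b.amt_credit_sum), 0) AS bureau_mean_credit
FROM applications AS a
LEFT JOIN bureau AS b ON b.sk_id_curr = a.sk_id_curr
GROUP BY a.sk_id_curr, a.amt_income, a.amt_credit
ORDER BY a.sk_id_curr
"""
rows = conn.execute(FEATURES_SQL).fetchall()
for row in rows:
    print(row)
```

Because each feature is a column expression, adding one means editing the query rather than rewriting Python code, and applicants without bureau records get aggregates of 0 through `COALESCE`.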
Disentangled Attribution Curves 20 ⭐
Using and reproducing DAC from the paper "Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees".
Rl_sutton Barto_solutions 18 ⭐
Solutions and figures for problems from "Reinforcement Learning: An Introduction" by Sutton and Barto.
R package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, recommenders, and EDA.
Deep Learning Machine Learning Stock 485 ⭐
Deep Learning and Machine Learning stocks represent a promising long-term or short-term opportunity for investors and traders.
Mljar Supervised 1756 ⭐
Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
A Chinese character feature extractor (featurizer) that extracts features of Chinese characters (pronunciation features and glyph features) for use as deep learning features.
Feature Engineering And Feature Selection 676 ⭐
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
Dominance Analysis 105 ⭐
This package performs dominance analysis, or Shapley value regression, to find the relative importance of predictors in a given dataset. It can be used for key driver analysis or marginal resource allocation models.
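Shapley value regression splits a model's R² among the predictors by averaging each predictor's marginal R² gain over all subsets S of the other predictors, with weight |S|!(p - |S| - 1)!/p!. The NumPy sketch below is an illustration under stated assumptions, not this package's interface: `r_squared` and `shapley_r2` are hypothetical names, every subset is fit with ordinary least squares plus an intercept, and the loop visits all 2^(p-1) subsets per predictor, so it is only practical for a handful of predictors.

```python
from itertools import combinations
from math import factorial

import numpy as np

def r_squared(X, y, cols):
    """R² of an OLS fit with intercept on the given columns (0 for the empty set)."""
    if len(cols) == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def shapley_r2(X, y):
    """Shapley decomposition of the full-model R² across the columns of X."""
    p = X.shape[1]
    cache = {}

    def value(cols):
        # Memoize R² per subset, since every subset is reused for several predictors
        key = tuple(sorted(cols))
        if key not in cache:
            cache[key] = r_squared(X, y, key)
        return cache[key]

    shares = np.zeros(p)
    for j in range(p):
        others = [c for c in range(p) if c != j]
        for size in range(p):
            # Shapley weight for subsets of this size that exclude predictor j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                shares[j] += weight * (value(subset + (j,)) - value(subset))
    return shares

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=300)
shares = shapley_r2(X, y)
print(shares.round(3), round(float(shares.sum()), 3))
```

By construction the shares sum to the full model's R², which is what makes them usable as relative importances in key driver analysis.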
Predict Household Poverty 22 ⭐
Predict the poverty of households in Costa Rica using automated feature engineering.
Fifa 2019 Analysis 25 ⭐
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and other related aspects using data analysis and data visualization.
World Food Production 12 ⭐
Compares the top food and feed producers around the globe and looks for interesting answers, solutions, patterns, hints, and warnings through data analysis and data visualization using machine learning.
Black Friday Regression Analysis 10 ⭐
Predicting prices of products to be sold on Black Friday in the US using regression analysis, feature engineering, feature selection, feature extraction, data analysis, and data visualization.
Drugs Recommendation Using Reviews 41 ⭐
Analyzes drug descriptions, conditions, and reviews, then uses deep learning models to recommend drugs for each of a patient's health conditions.
Image classification for Android using an artificial neural network, built with NumPy and Kivy.
Exemplary Ml Pipeline 21 ⭐
Exemplary, annotated machine learning pipeline for any tabular data problem.
Feature Selection Techniques 37 ⭐
Python source code for the feature selection 👨‍🔬 series on Medium. 📰
Feature Engineering For Fraud Detection 27 ⭐
Implementation of the feature engineering from "Feature engineering strategies for credit card fraud".
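A common strategy in credit card fraud feature engineering is to aggregate each card's recent transaction history. The sketch below is a hypothetical illustration, not the repository's code: it assumes transactions are `(card_id, timestamp_hours, amount)` tuples sorted by time, and the 24-hour window and the feature names `count_window` and `sum_window` are arbitrary choices.

```python
from collections import defaultdict, deque

def window_features(transactions, window_hours=24.0):
    """For each transaction, count and sum the same card's earlier transactions
    within the preceding window_hours; the current transaction is excluded."""
    history = defaultdict(deque)  # card_id -> (timestamp, amount) pairs still inside the window
    features = []
    for card, ts, amount in transactions:
        window = history[card]
        # Drop this card's transactions that are older than the window start
        while window and window[0][0] < ts - window_hours:
            window.popleft()
        features.append({
            "count_window": len(window),
            "sum_window": float(sum(a for _, a in window)),
        })
        window.append((ts, amount))
    return features

txs = [("A", 0.0, 10.0), ("A", 5.0, 20.0), ("B", 6.0, 7.0), ("A", 30.0, 5.0)]
feats = window_features(txs)
for tx, f in zip(txs, feats):
    print(tx, f)
```

Excluding the current transaction from its own window keeps the value being scored out of its features; the same loop can be run with several window lengths to build a family of aggregates.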
Automated deep learning without ANY human intervention. First-place solution for AutoDL.
Bike Sharing Demand Kaggle 33 ⭐
Top-5th-percentile solution to the Kaggle knowledge problem "Bike Sharing Demand".
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
The Building Data Genome Project 137 ⭐
A collection of data from non-residential buildings for performance analysis and algorithm benchmarking.
Sgx Full Orderbook Tick Data Trading Strategy 913 ⭐
Solutions for high-frequency trading (HFT) strategies using data science approaches (machine learning) on full order book tick data.
Clj Example Nlp Ml 13 ⭐
Example Project for Natural Language Processing and Machine Learning Libraries
(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.
This repository contains code for natural language processing in Python. All the code accompanies my book "Python Natural Language Processing".
Xiaoganghan Awesome Feature Engineering 48 ⭐
A curated list of feature engineering techniques for image and text machine learning
Kaggle Quora Question Pairs 720 ⭐
Kaggle: Quora Question Pairs, 4th place out of 3396 (https://www.kaggle.com/c/quora-question-pairs)
Quora Paraphrase Question Identification 20 ⭐
Paraphrase question identification using Feature Fusion Network (FFN).
Marcnuth Genetics 15 ⭐
A genetic algorithm in Python that can be used for sampling, feature selection, model selection, etc. in machine learning.
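Feature selection with a genetic algorithm encodes each candidate feature subset as a 0/1 mask and evolves a population of masks. The sketch below is hypothetical and independent of this library's API: `genetic_feature_selection`, tournament selection, one-point crossover, bit-flip mutation, and the toy fitness function are all illustrative assumptions. In practice the fitness would be a cross-validated score, possibly minus a penalty on the number of selected features.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=20, generations=50,
                              mutation_rate=None, seed=0):
    """Evolve 0/1 feature masks; fitness(mask) -> float, higher is better."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features  # on average one flipped bit per child
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random masks becomes a parent
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness: features 1 and 3 help, every other selected feature costs 0.5
GOOD = {1, 3}

def toy_fitness(mask):
    return sum(1.0 if i in GOOD else -0.5 for i, g in enumerate(mask) if g)

best = genetic_feature_selection(6, toy_fitness)
print(best, toy_fitness(best))
```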
Cortana Intelligence Customer360 22 ⭐
This repository contains instructions and code to deploy a customer 360 profile solution on Azure stack using the Cortana Intelligence Suite.