Categories
Data Analysis Resources Machine Learning scikit-learn

Countvectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI. In [2]: messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')] In [3]: print(len(messages)) 5574 In [5]: for num, message in enumerate(messages[:10]): print(num, message) print('\n') 0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]
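As a rough illustration of what the post's title refers to, here is a minimal sketch of scikit-learn's CountVectorizer turning a few short messages into word counts. The three messages are hypothetical stand-ins, not lines from the UCI SMS Spam Collection, and the default tokenizer settings are assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in messages; the post itself reads the UCI file instead.
messages = [
    "free prize call now",
    "are we meeting for lunch",
    "call me when you are free",
]

# Learn the vocabulary and build a sparse matrix with one row per message
# and one column per distinct token.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(messages)

# vocabulary_ maps each token to its column index in the count matrix.
vocab = vectorizer.vocabulary_
print(counts.shape)
print(sorted(vocab))
print(counts.toarray()[0])
```

Each row of `counts` is a bag-of-words vector, which is the usual input to a downstream spam classifier.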

Categories
Competition Notes Machine Learning

A Quick Fix for Overcoming a Future Without Jobs

“What to do about mass unemployment? This is going to be a massive social challenge. There will be fewer and fewer jobs that a robot cannot do better [than a human]. These are not things that I wish will happen. These are simply things that I think probably will happen.” — Elon Musk The rapid growth […]

Categories
Competition Notes Machine Learning

Lessons Learnt from AIB Data Hack

Yesterday I attended AIB Data Hack. It was my first one-day data hackathon. This notebook contains some of the lessons learnt from AIB Data Hack while working on a complicated, large dataset with little time. I have taken part in a few Kaggle competitions. However, I have to say the experience in a single day or […]

Categories
Competition Notes Machine Learning scikit-learn

Modeling Women’s Health Risk Assessment

Women’s Health Risk Assessment is a multi-class classification competition for finding an optimized machine learning solution that allows a young woman (age 15-30 years old) to be accurately categorized for her particular health risk. Based on the category a patient falls within, healthcare providers can offer appropriate education and training programs to help reduce […]

Categories
Data Analysis Resources Machine Learning

The Comprehensive Guide for Feature Engineering

Feature Engineering is the art/science of representing data in the best way possible. I wrote this comprehensive guide to Feature Engineering for myself, but I figured that it might be of interest to some of the blog readers too. Comments on what is written below are most welcome! Good Feature Engineering involves an elegant blend […]

Categories
Data Analysis Resources Machine Learning scikit-learn

Installing XGBoost for Windows – walk-through

I have the following specification on my computer: Windows 10, 64-bit, Python 3.5 and Anaconda3. I tried many times to install XGBoost but somehow it never worked for me. Today I decided to make it happen and am sharing this post to help anyone else who is struggling with installing XGBoost for Windows. XGBoost is short for […]

Categories
Kaggle Machine Learning scikit-learn

Evaluating Algorithms using MNIST

This post evaluates algorithms using MNIST. In [1]: import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt %matplotlib inline import warnings warnings.filterwarnings('ignore') In [2]: # importing the train dataset train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\DigitRecognizer\train.csv') train.head(10) Out[2]: label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 … pixel774 pixel775 pixel776 pixel777 […]
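The excerpt stops before any models are compared, so the following is only a sketch of one common way to evaluate several algorithms on digit data, not the post's own method. It assumes scikit-learn's small built-in 8x8 digits set as a stand-in for the Kaggle train.csv, and the two classifiers are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: sklearn's 8x8 digits set stands in for the Kaggle MNIST train.csv.
X, y = load_digits(return_X_y=True)

# Illustrative candidates only; the post may compare different models.
models = {
    "knn": KNeighborsClassifier(n_neighbors=3),
    "logreg": LogisticRegression(max_iter=2000),
}

# Mean 3-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=3).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Using the same cross-validation splits for every model keeps the comparison fair, since each algorithm is scored on identical held-out folds.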

Categories
Kaggle Machine Learning Predictive Analysis

Submission for Kaggle’s Titanic Competition

The following is my submission for Kaggle’s Titanic Competition. In [361]: import pandas as pd import numpy as np In [362]: df_train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\Titanic\train.csv') In [363]: df_train.head(2) Out[363]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 1 2 […]