Machine Learning, Predictive Analysis, scikit-learn

Predicting NBA winners with Decision Trees and Random Forests in Scikit-learn

In this post, we predict NBA winners with Decision Trees and Random Forests in scikit-learn. The National Basketball Association (NBA) is the major men’s professional basketball league in North America and is widely considered the premier men’s professional basketball league in the world. It has 30 teams (29 in the United States and 1 in Canada). The data…
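
The approach named above can be sketched as follows. This is a minimal sketch, not the post's actual code: because the game data is not shown here, it uses scikit-learn's `make_classification` to generate a synthetic stand-in for per-game features (hypothetical; label 1 stands for a home-team win) and compares a `DecisionTreeClassifier` with a `RandomForestClassifier` using 5-fold cross-validated accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_models(random_state=14):
    """Return mean 5-fold CV accuracy for a single tree and a random forest."""
    # Synthetic stand-in for real per-game features (hypothetical data);
    # y == 1 plays the role of "home team won".
    X, y = make_classification(
        n_samples=500, n_features=6, n_informative=3, random_state=random_state
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    # cross_val_score fits each model on 4 folds and scores it on the 5th.
    return {
        name: cross_val_score(model, X, y, scoring="accuracy", cv=5).mean()
        for name, model in models.items()
    }


scores = compare_models()
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

A random forest averages many trees, each trained on a bootstrap sample with random feature subsets, which usually reduces the variance of a single deep decision tree. With real NBA data, engineered game features would replace the synthetic matrix.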

Data Analysis Resources, Machine Learning, scikit-learn

CountVectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information, please visit this link. The dataset is the SMS Spam Collection from the UCI Machine Learning Repository.

In [2]:
# Read the file, one SMS per line
with open('smsspamcollection/SMSSpamCollection') as f:
    messages = [line.rstrip() for line in f]

In [3]:
print(len(messages))

5574

In [5]:
for num, message in enumerate(messages[:10]):
    print(num, message)
    print('\n')

0 ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet… Cine there got amore…
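
A minimal sketch of the next step, turning messages into bag-of-words counts with `CountVectorizer`, assuming each line follows the `<label>\t<message>` layout of the SMSSpamCollection file. The three sample lines below are made up for illustration rather than taken from the dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample lines in the "<label>\t<message>" format
raw_lines = [
    "ham\tSee you at the draw",
    "spam\tFree entry in a free draw",
    "spam\tFree tickets",
]

# Split each line once on the tab: label first, message text second
labels, texts = zip(*(line.rstrip().split("\t", 1) for line in raw_lines))

# By default CountVectorizer lowercases text and ignores 1-character tokens
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(texts)  # sparse matrix: one row per message

print(bow.shape)  # (3, 9)
print(sorted(vectorizer.vocabulary_))
```

Each column counts one vocabulary word, so the second message gets a count of 2 in the `free` column; this sparse matrix is what a classifier would be trained on.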

Kaggle, Predictive Analysis, scikit-learn

Feature Selection for Determining House Prices?

Home values are influenced by many factors, which fall into two major groups: environmental information, such as location, local economy, school district and air quality; and characteristics of the property itself, such as lot size, house size and age, number of rooms, heating/AC systems and garage. When people consider buying homes, usually the location has…
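
As a minimal sketch of univariate feature selection (one technique among several; the post itself may use a different one), the code below scores each feature with scikit-learn's `SelectKBest` and `f_regression` and keeps the top two. The data is synthetic and the column names (`lot_size`, `house_size`, `random_noise`) are hypothetical: price is built only from the first two, so the irrelevant column should be dropped.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n = 200

# Synthetic, hypothetical features; random_noise has no effect on price
X = pd.DataFrame({
    "lot_size": rng.normal(size=n),
    "house_size": rng.normal(size=n),
    "random_noise": rng.normal(size=n),
})
price = 3 * X["house_size"] + 2 * X["lot_size"] + 0.1 * rng.normal(size=n)

# f_regression computes an F-statistic per feature from its correlation with the target
selector = SelectKBest(score_func=f_regression, k=2).fit(X, price)
selected = list(X.columns[selector.get_support()])

print(dict(zip(X.columns, selector.scores_.round(1))))
print(selected)
```

Univariate scores look at one feature at a time, so they can miss features that only matter in combination; model-based importances are a common complement.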

Competition Notes, Machine Learning, scikit-learn

Modeling Women’s Health Risk Assessment

Women’s Health Risk Assessment is a multi-class classification competition to find an optimized machine learning solution that accurately categorizes a young woman (aged 15 to 30) according to her particular health risk. Based on the category a patient falls within, healthcare providers can offer appropriate education and training programs to help reduce the patient’s reproductive health risks.…
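
Since the competition is a multi-class problem, a minimal sketch of the setup might look like the following. It uses synthetic data from `make_classification` with three hypothetical risk categories in place of the competition data, and `LogisticRegression` as an example classifier, not as the model used in these competition notes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 600 patients, 10 features, 3 hypothetical risk categories
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Stratify so each category keeps the same proportion in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)  # handles more than two classes directly
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)  # rows: true category, columns: predicted
print(f"accuracy: {acc:.3f}")
print(cm)
```

The confusion matrix shows which categories get mistaken for each other, which matters more than overall accuracy when the categories drive different interventions.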

Data Analysis Resources, Machine Learning, scikit-learn

Installing XGBoost for Windows – walk-through

My computer has the following specification: Windows 10, 64-bit, Python 3.5 and Anaconda3. I tried many times to install XGBoost, but it never worked for me. Today I decided to make it happen, and I am sharing this post to help anyone else who is struggling to install XGBoost on Windows. XGBoost is short for “Extreme Gradient Boosting”. XGBoost is an…
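
After installing, a quick way to confirm the setup is to check the interpreter details listed above and whether `xgboost` actually imports. The helper below is a hypothetical sketch (the name `environment_report` is made up for illustration); it reports the problem instead of crashing if the package is missing or its compiled library fails to load.

```python
import importlib.util
import platform
import sys


def environment_report():
    """Summarise OS, bitness, Python version and the xgboost import status."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "bits": platform.architecture()[0],  # e.g. "64bit"
        "python": sys.version.split()[0],
    }
    if importlib.util.find_spec("xgboost") is None:
        report["xgboost"] = "not installed"
    else:
        try:
            import xgboost
            report["xgboost"] = xgboost.__version__
        except Exception as exc:  # e.g. the compiled library cannot be loaded
            report["xgboost"] = f"import failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Distinguishing "not installed" from "import failed" helps tell a missing package apart from a broken installation.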

Kaggle, Machine Learning, scikit-learn

Evaluating Algorithms using Kaggle’s Digit Recognizer Data

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [2]:
# importing the train dataset
train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\DigitRecognizer\train.csv')
train.head(10)

Out[2]:
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 … pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 1 0 0 0 0…
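
The algorithm comparison can be sketched as below. Because Kaggle's `train.csv` is not bundled here, this sketch substitutes scikit-learn's built-in `load_digits` data (8x8 images, so 64 pixel columns instead of 784) and arranges it in the same layout of a `label` column followed by `pixel0`, `pixel1`, … before separating features from labels. The three models are illustrative choices, not necessarily those evaluated in the post.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()

# Mimic the Kaggle layout: a 'label' column followed by pixel columns
train = pd.DataFrame(
    digits.data, columns=[f"pixel{i}" for i in range(digits.data.shape[1])]
)
train.insert(0, "label", digits.target)

# Separate the target from the pixel features
X = train.drop(columns="label")
y = train["label"]

models = {
    "logistic_regression": LogisticRegression(max_iter=2000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 3 cross-validation folds for each model
results = {name: cross_val_score(model, X, y, cv=3).mean() for name, model in models.items()}
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

With the real Kaggle file, the same `drop(columns="label")` and `train["label"]` split applies; only the number of pixel columns changes.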