Categories
Machine Learning Predictive Analysis scikit-learn

Predicting NBA winners with Decision Trees and Random Forests in Scikit-learn

In this post, we will predict NBA winners with Decision Trees and Random Forests in Scikit-learn. The National Basketball Association (NBA) is the major men’s professional basketball league in North America and is widely considered to be the premier men’s professional basketball league in the world. It has 30 teams (29 in the United States and […]

Categories
Machine Learning Predictive Analysis scikit-learn

Modeling Bitcoin’s Market Capitalization

Bitcoin has been in the news quite a bit lately with its price soaring. It was named the top-performing currency in four of the last five years, and its price has the potential to hit over $100,000 in 10 years, which would mark a 3,483 percent rise from its recent record high. In this post, we are […]

Categories
Data Analysis Resources Machine Learning scikit-learn

Countvectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI. In [2]: messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')] In [3]: print(len(messages)) 5574 In [5]: for num, message in enumerate(messages[:10]): print(num, message) print('\n') 0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]
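As a minimal sketch of what a count vectorizer does with messages like these, the snippet below turns a few short texts into a document-term matrix of word counts. The three sample messages are hypothetical stand-ins, not rows from the SMS dataset, and the vectorizer uses scikit-learn's default settings (lowercasing, tokens of two or more characters), which may differ from the settings in the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample messages (not taken from the SMS Spam Collection).
sample_messages = [
    "Free prize, call now",
    "Are we meeting for lunch",
    "Call me now",
]

# Default settings: text is lowercased and split into tokens of 2+ characters.
vectorizer = CountVectorizer()

# Learn the vocabulary and build a sparse matrix of shape (documents, terms).
counts = vectorizer.fit_transform(sample_messages)

# vocabulary_ maps each term to its column index in the matrix.
vocab = vectorizer.vocabulary_
terms_in_column_order = sorted(vocab, key=vocab.get)

print(terms_in_column_order)
print(counts.toarray())
```

Each row of `counts` corresponds to one message and each column to one vocabulary term, so a classifier can be trained on the matrix directly.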

Categories
Kaggle Predictive Analysis scikit-learn

Features Selection for determining House Prices

Home values are influenced by many factors. Broadly, there are two major aspects: environmental information, including location, local economy, school district, air quality, etc.; and the characteristics of the property itself, such as lot size, house size and age, the number of rooms, heating / AC systems, garage, and so on. When people consider buying […]

Categories
Competition Notes Machine Learning scikit-learn

Modeling Women’s Health Risk Assessment

Women’s Health Risk Assessment is a multi-class classification competition for finding an optimized machine learning solution that allows a young woman (age 15-30 years old) to be accurately categorized for her particular health risk. Based on the category a patient falls within, healthcare providers can offer appropriate education and training programs to help reduce […]

Categories
Data Analysis Resources Machine Learning scikit-learn

Installing XGBoost for Windows – walk-through

My computer has the following specification: Windows 10 (64-bit), Python 3.5, and Anaconda3. I tried many times to install XGBoost, but somehow it never worked for me. Today I decided to make it happen and am sharing this post to help anyone else who is struggling with installing XGBoost for Windows. XGBoost is short for […]

Categories
Kaggle Machine Learning scikit-learn

Evaluating Algorithms using MNIST

This post evaluates algorithms using MNIST. In [1]: import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt %matplotlib inline import warnings warnings.filterwarnings('ignore') In [2]: # importing the train dataset train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\DigitRecognizer\train.csv') train.head(10) Out[2]: label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 … pixel774 pixel775 pixel776 pixel777 […]

Categories
Machine Learning scikit-learn

Principal Component Analysis in scikit-learn

Principal Component Analysis (PCA) is an orthogonal linear transformation that turns a set of possibly correlated variables into a new set of variables that are as uncorrelated as possible. The new variables lie in a new coordinate system such that the greatest variance is obtained by projecting the data onto the first coordinate, the second […]
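The two properties described above (decorrelated outputs, with variance concentrated in the first coordinate) can be checked on a small synthetic example. This is only a sketch under stated assumptions: the data are randomly generated for illustration, and the variable names are not from the original post.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic, strongly correlated 2-D data (an assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)
X = np.column_stack([x, y])

# Keep both components so the full variance split is visible.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# Fraction of total variance captured by each component, in decreasing order.
ratios = pca.explained_variance_ratio_

# Correlation between the two transformed variables (should be ~0).
corr = np.corrcoef(Z, rowvar=False)[0, 1]

print("explained variance ratios:", ratios)
print("correlation between components:", corr)
```

Because the original variables are nearly collinear, most of the variance should land on the first component, while the correlation between the transformed variables is zero up to floating-point error.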