Kaggle, Predictive Analysis, scikit-learn

Feature Selection for Determining House Prices?

Home values are influenced by many factors, which fall into two broad groups: environmental information, such as location, local economy, school district and air quality; and characteristics of the property itself, such as lot size, house size and age, the number of rooms, heating/AC systems and garage. When people consider buying a home, the location usually has…
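The excerpt is cut off before it names a selection method. As a hedged illustration only, a univariate filter such as scikit-learn's SelectKBest with f_regression scores each candidate feature against the price and keeps the top k. The feature names and synthetic data below are hypothetical, and this is not necessarily the approach the post itself takes.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical toy data: 200 houses, 5 candidate features.
# By construction only "house_size" (column 0) and "lot_size" (column 2) drive the price.
rng = np.random.default_rng(0)
feature_names = ["house_size", "noise_1", "lot_size", "noise_2", "noise_3"]
X = rng.normal(size=(200, 5))
price = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Score each feature with a univariate F-test against the target and keep the best 2.
selector = SelectKBest(score_func=f_regression, k=2)
selector.fit(X, price)

# get_support() is a boolean mask in the original column order.
selected = [name for name, keep in zip(feature_names, selector.get_support()) if keep]
print(selected)
```

A univariate filter is cheap and easy to read, but it judges each feature in isolation, so it can miss features that only matter in combination.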


Data Analysis Resources, Kaggle

Visualisation of House Prices

Visualisation is the presentation of data in a pictorial or graphical format. It lets decision makers see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualisation of house prices uses the dataset from Kaggle's House Prices competition. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the…
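The post's own plotting code is not shown in this excerpt. As a dependency-light sketch (an assumption, not the author's method), the distribution of sale prices can be summarised as a text histogram built with numpy.histogram. The helper text_histogram and the prices are hypothetical, not taken from the Kaggle data.

```python
import numpy as np

def text_histogram(values, bins=5, width=40):
    """Return one text line per bin: '<low>-<high> | ####... (count)', bars scaled to `width` chars."""
    counts, edges = np.histogram(values, bins=bins)
    peak = counts.max()
    lines = []
    for count, low, high in zip(counts, edges[:-1], edges[1:]):
        # The tallest bin gets `width` characters; the others are scaled proportionally.
        bar = "#" * int(round(width * count / peak)) if peak else ""
        lines.append(f"{low:>9,.0f}-{high:<9,.0f} | {bar} ({count})")
    return lines

# Hypothetical sale prices in USD, right-skewed like most house-price data.
prices = [120000, 135000, 150000, 160000, 175000, 190000,
          210000, 230000, 260000, 300000, 350000, 620000]
for line in text_histogram(prices, bins=5):
    print(line)
```

Even in this crude form the long right tail is visible, which is the kind of pattern a graphical histogram makes obvious at a glance.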


Data Analysis Resources, Kaggle

4 different ways to predict survival on Titanic – part 4

Continued from part 3.

4. Way to predict survival on Titanic

These notes are taken from this link.

import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric import smoothers_lowess
from pandas import Series, DataFrame
from patsy import dmatrices
from sklearn import datasets, svm

…
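Part 4 imports KDEUnivariate from statsmodels, which fits a kernel density estimate and uses a Gaussian kernel by default. To show what such an estimate computes without depending on statsmodels, here is a NumPy sketch of a Gaussian KDE with a fixed, hand-picked bandwidth (statsmodels picks the bandwidth with a rule of thumb by default). The helper gaussian_kde_1d, the ages and the 5-year bandwidth are all hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distance between every grid point and every sample: shape (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and divide by the bandwidth so the estimate integrates to 1.
    return kernels.mean(axis=1) / bandwidth

# Hypothetical passenger ages; the bandwidth of 5 years is an arbitrary choice.
ages = [2, 4, 18, 22, 24, 26, 28, 30, 35, 38, 45, 54, 62]
grid = np.linspace(-20, 100, 1201)
density = gaussian_kde_1d(ages, grid, bandwidth=5.0)
print(grid[density.argmax()])
```

A smaller bandwidth gives a spikier curve that follows individual passengers; a larger one smooths the age distribution into a single broad hump.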


Data Analysis Resources, Kaggle

4 different ways to predict survival on Titanic – part 3

Continued from part 2.

3. Way to predict survival on Titanic

These notes are from this link.

I – Exploratory data analysis

We tweak the style of this notebook a little to center the plots.

from IPython.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")

# Import the libraries
# …
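A common first step in exploring the Titanic data is comparing survival rates across groups. A minimal sketch, assuming a tiny hypothetical DataFrame that uses the competition's Survived, Sex and Pclass column names: because Survived is coded 0/1, the mean of each group is its survival rate.

```python
import pandas as pd

# Hypothetical miniature passenger table (not real Titanic rows).
passengers = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 2],
})

# Survival rate per group = mean of the 0/1 Survived column.
by_sex = passengers.groupby("Sex")["Survived"].mean()
by_class = passengers.groupby("Pclass")["Survived"].mean()
print(by_sex)
print(by_class)
```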


Data Analysis Resources, Kaggle

4 different ways to predict survival on Titanic – part 1

These are my notes from various blogs on different ways to predict survival on the Titanic using the Python stack. I am interested in comparing how different people have approached the Kaggle competition, so I am going to compare and contrast their analyses to find the similarities and differences in their approaches. This notebook will show basic examples of: Data…
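Part 2 cross-validates models on data called train_data_munged and train_data_scaled, but the steps that produce them are cut off in these excerpts. One plausible preparation, sketched under stated assumptions (median-filled Age, Sex mapped to 0/1, then standardisation with StandardScaler), uses the hypothetical names raw, munge, munged and scaled; it is not the author's code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical rows with Titanic column names; None marks a missing Age (stored as NaN).
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.93, 13.00],
})

def munge(df):
    """Return a numeric copy: median-filled Age and Sex encoded as 0 (male) / 1 (female)."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out

munged = munge(raw)
features = ["Pclass", "Sex", "Age", "Fare"]
# Standardise each feature to zero mean and unit variance, which matters for
# distance-based models such as k-nearest neighbours.
scaled = StandardScaler().fit_transform(munged[features])
print(scaled)
```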


Data Analysis Resources, Kaggle, Predictive Analysis

4 different ways to predict survival on Titanic – part 2

Continued from part 1.

Classification

KNeighborsClassifier

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumes cv, train_data_scaled and train_data_munged from earlier in the notebook;
# with sklearn.model_selection, cv must be an int or a model_selection splitter.
alg_ngbh = KNeighborsClassifier(n_neighbors=3)
scores = cross_val_score(alg_ngbh, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
print("Accuracy (k-neighbors): {}/{}".format(scores.mean(), scores.std()))

Output: Accuracy (k-neighbors): 0.7957351290684623/0.011110544261068086

SGDClassifier

from sklearn.linear_model import SGDClassifier

alg_sgd = SGDClassifier(random_state=1)
scores = cross_val_score(alg_sgd, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
print("Accuracy (sgd): {}/{}".format(scores.mean(), scores.std()))

Output: Accuracy (sgd): 0.7239057239057239/0.015306601231185043

SVC

from sklearn.svm import SVC
alg_svm = SVC(C=1.0)…
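The excerpt compares classifiers by the mean and standard deviation of their cross-validated accuracy. Here is a self-contained sketch of the same comparison using the sklearn.model_selection API, on synthetic data from make_classification (a stand-in for the Titanic features, sized to the 891 rows of the Kaggle training set). Unlike the excerpt, which passes pre-scaled data (train_data_scaled), this version puts StandardScaler inside a pipeline so each fold's scaler is fit only on that fold's training split. Its accuracies are for the synthetic data and will not match the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic features: 891 rows, 6 numeric columns.
X, y = make_classification(n_samples=891, n_features=6, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

models = {
    "k-neighbors": KNeighborsClassifier(n_neighbors=3),
    "sgd": SGDClassifier(random_state=1),
    "svm": SVC(C=1.0),
}

results = {}
for name, model in models.items():
    # Scaling inside the pipeline avoids leaking test-fold statistics into training.
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, X, y, cv=cv)
    results[name] = (scores.mean(), scores.std())
    print("Accuracy ({}): {}/{}".format(name, scores.mean(), scores.std()))
```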


Kaggle, Machine Learning, scikit-learn

Evaluating Algorithms using Kaggle’s Digit Recognizer Data

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

# importing the train dataset
train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\DigitRecognizer\train.csv')
train.head(10)

Output (truncated):

   label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  …  pixel774  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  pixel781  pixel782  pixel783
0      1       0       0       0       0…
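Each row of the Digit Recognizer train.csv holds a label followed by 784 pixel columns (pixel0 to pixel783), which Kaggle describes as a 28-by-28 image stored row by row. A minimal sketch of turning one row back into an image array, using a hypothetical one-row DataFrame whose pixel values are simply their own indices so the layout is easy to check (real pixel values range from 0 to 255).

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame shaped like train.csv: a label plus pixel0 ... pixel783.
columns = ["label"] + [f"pixel{i}" for i in range(784)]
row = pd.DataFrame([[7] + list(range(784))], columns=columns)

label = int(row.loc[0, "label"])
pixels = row.drop(columns="label").to_numpy()[0]
# 784 = 28 * 28: row-major reshape puts pixel i at row i // 28, column i % 28.
image = pixels.reshape(28, 28)
print(label, image.shape)
```

With matplotlib already imported as in the notebook, plt.imshow(image, cmap="gray") would display such an array as a picture.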


Kaggle, Machine Learning, Predictive Analysis

Submission for Kaggle’s Titanic Competition

Following is my submission for Kaggle's Titanic Competition.

import pandas as pd
import numpy as np

df_train = pd.read_csv(r'C:\Users\piush\Desktop\Dataset\Titanic\train.csv')
df_train.head(2)

Output (truncated):

   PassengerId  Survived  Pclass  Name                     Sex   Age   SibSp  Parch  Ticket     Fare    Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris  male  22.0  1      0      A/5 21171  7.2500  NaN    S
1  2            1         1       Cumings, Mrs. John…
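The excerpt stops before the submission step. Kaggle's Titanic competition expects a CSV with a header row and two columns, PassengerId and Survived. A minimal sketch, assuming hypothetical predictions for three test passengers (the test set's ids start at 892), written to an in-memory buffer so the result can be inspected.

```python
import io
import pandas as pd

# Hypothetical test-set ids and 0/1 predictions; in practice these come from the fitted model.
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": predictions})

# index=False drops the DataFrame index so only the two expected columns are written.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce the file to upload, call submission.to_csv("submission.csv", index=False) instead; the file name itself is arbitrary.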
