Data Analysis Resources, Kaggle

Visualisation of House Prices

Visualisation is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualisation of house prices is for the Kaggle dataset. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges predicting the…

Continue Reading

Data Analysis Resources

Computational Statistics

Statistics and math are the two things a data scientist must be good at.

Effect Size

This notebook is a copy of the statistical inference material from PyCon 2016.

In [1]:
from __future__ import print_function, division
import numpy
import scipy.stats
import matplotlib.pyplot as pyplot
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
# seed the random number generator so we…
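The excerpt stops before any effect size is computed. As a minimal sketch, assuming Cohen's d (difference in means divided by the pooled standard deviation) is the measure in question, which the truncated text does not confirm, it could look like this. The function name `cohen_effect_size` and the two small samples are hypothetical.

```python
import numpy


def cohen_effect_size(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    group1 = numpy.asarray(group1, dtype=float)
    group2 = numpy.asarray(group2, dtype=float)
    n1, n2 = len(group1), len(group2)
    diff = group1.mean() - group2.mean()
    # Pooled variance: each group's (population) variance weighted by its size.
    pooled_var = (n1 * group1.var() + n2 * group2.var()) / (n1 + n2)
    return diff / numpy.sqrt(pooled_var)


# Hypothetical samples, chosen only to keep the arithmetic easy to follow.
sample_a = [2, 4, 6]
sample_b = [1, 3, 5]
d = cohen_effect_size(sample_a, sample_b)
print("Effect size (Cohen's d): {:.3f}".format(d))
```

Because the result is expressed in units of standard deviation, it can be compared across variables measured on different scales.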

Continue Reading

Data Analysis Resources, Kaggle

4 different ways to predict survival on Titanic – part 4

Continued from part 3.

4. Way to predict survival on Titanic

These notes are taken from this link.

In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric import smoothers_lowess
from pandas import Series, DataFrame
from patsy import dmatrices
from sklearn import datasets, svm

In [3]:…

Continue Reading

Data Analysis Resources, Kaggle

4 different ways to predict survival on Titanic – part 3

Continued from part 2.

3. Way to predict survival on Titanic

These notes are from this link.

I – Exploratory data analysis

We tweak the style of this notebook a little bit to have centered plots.

In [1]:
from IPython.core.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")

Out[1]:

In [2]:
# Import the libraries
#…

Continue Reading

Data Analysis Resources, Kaggle

4 different ways to predict survival on Titanic – part 1

These are my notes from various blogs on different ways to predict survival on Titanic using the Python stack. I am interested in comparing how different people have approached the Kaggle competition, so I am going to compare and contrast their analyses to find the similarities and differences in how they predict survival. This notebook will show basic examples of: Data…

Continue Reading

Data Analysis Resources, Kaggle, Predictive Analysis

4 different ways to predict survival on Titanic – part 2

Continued from part 1.

Classification

KNeighborsClassifier

In [16]:
from sklearn.neighbors import KNeighborsClassifier
alg_ngbh = KNeighborsClassifier(n_neighbors=3)
scores = cross_validation.cross_val_score(alg_ngbh, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
print("Accuracy (k-neighbors): {}/{}".format(scores.mean(), scores.std()))

Accuracy (k-neighbors): 0.7957351290684623/0.011110544261068086

SGDClassifier

In [17]:
from sklearn.linear_model import SGDClassifier
alg_sgd = SGDClassifier(random_state=1)
scores = cross_validation.cross_val_score(alg_sgd, train_data_scaled, train_data_munged["Survived"], cv=cv, n_jobs=-1)
print("Accuracy (sgd): {}/{}".format(scores.mean(), scores.std()))

Accuracy (sgd): 0.7239057239057239/0.015306601231185043

SVC

In [18]:
from sklearn.svm import SVC
alg_svm = SVC(C=1.0)…
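The cells above call `cross_validation.cross_val_score`, presumably from the old `sklearn.cross_validation` module, which was removed in scikit-learn 0.20. As a rough sketch of the same comparison using `sklearn.model_selection` instead, the block below assumes synthetic data from `make_classification` in place of `train_data_scaled` and the `Survived` column, and a hypothetical 3-fold `cv`, since none of these are defined in this excerpt. The scores it prints therefore will not match the Titanic figures quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the scaled Titanic features and the "Survived" labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Hypothetical fold setup; the original `cv` is defined outside the excerpt.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

models = {
    "k-neighbors": KNeighborsClassifier(n_neighbors=3),
    "sgd": SGDClassifier(random_state=1),
    "svm": SVC(C=1.0),
}

# Mean and standard deviation of the per-fold accuracy for each model.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_scaled, y, cv=cv)
    results[name] = (scores.mean(), scores.std())
    print("Accuracy ({}): {}/{}".format(name, scores.mean(), scores.std()))
```

Keeping the models in a dictionary makes it easy to add another classifier to the comparison without repeating the scoring code.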

Continue Reading