Exploratory Data Analysis for Hostelworld Challenge

Posted in Competition Notes, Data Analysis Resources, Personal Stories

I recently took part in a challenge by Hostelworld. The task is to build a recommendation engine for users. Recommendations can save travellers valuable time, improve their hostel experience, and increase user retention. The challenge uses user information, reviews, and hostel details. This is a link for Exploratory […]

Visualisation of House Prices

Posted in Data Analysis Resources, Kaggle (1 comment)

Visualisation is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualisation of house prices is for the Kaggle dataset. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, […]

Computational Statistics

Posted in Data Analysis Resources

Statistics and maths are two things a data scientist must be good at.

Effect Size

This notebook is a copy of a statistical inference notebook from PyCon 2016.

In [1]:
    from __future__ import print_function, division
    import numpy
    import scipy.stats
    import matplotlib.pyplot as pyplot
    from ipywidgets import interact, interactive, fixed
    import ipywidgets as widgets
    # seed the […]

The Comprehensive Guide for Feature Engineering

Posted in Data Analysis Resources, Machine Learning (2 comments)

Feature engineering is the art and science of representing data in the best way possible. I wrote this comprehensive guide to feature engineering for myself, but I figured it might be of interest to some of the blog's readers too. Comments on what is written below are most welcome! Good Feature Engineering involves an elegant blend […]

4 different ways to predict survival on Titanic – part 4

Posted in Data Analysis Resources, Kaggle

Continued from part 3.

4. Way to predict survival on Titanic

These notes are taken from this link.

In [2]:
    import matplotlib.pyplot as plt
    %matplotlib inline
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.nonparametric.kde import KDEUnivariate
    from statsmodels.nonparametric import smoothers_lowess
    from pandas import Series, DataFrame
    from patsy import dmatrices
    from […]

4 different ways to predict survival on Titanic – part 3

Posted in Data Analysis Resources, Kaggle

Continued from part 2.

3. Way to predict survival on Titanic

These notes are from this link.

I – Exploratory data analysis

We tweak the style of this notebook a little bit to have centered plots.

In [1]:
    from IPython.core.display import HTML
    HTML("""
    <style>
    .output_png {
        display: table-cell;
        text-align: center;
        vertical-align: middle;
    }
    </style>
    """)

Out[1]: […]