## Exploratory Data Analysis for Hostelworld Challenge

I recently took part in a challenge by Hostelworld. The challenge proposed by Hostelworld is to build a recommendation engine for users. Recommendations can save Recommendations can save travellers valuable time, improve their hostel experience, and increase user retention. This challenge will use user information, reviews, and hostel details. This is a link for Exploratory […]

## Time-Series Predictive Analysis of DAX 30

In this blog post we’ll examine some common techniques used in time-series analysis of DAX 30 by applying them to a data set containing daily closing values from 1990 up to present day. The DAX (Deutscher Aktienindex (German stock index)) is a blue chip stock market index consisting of the 30 major German companies trading […]

## Visualisation of House Prices

Visualisation is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualisation of house prices is for the Kaggle dataset. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, […]

## Computational Statistics

Statistics and Math are the two things which a data scientist must be good at. Effect Size This notebook is a copy of statistics inference from Pycon 2016 In [1]: from __future__ import print_function, division import numpy import scipy.stats import matplotlib.pyplot as pyplot from ipywidgets import interact, interactive, fixed import ipywidgets as widgets # seed the […]

## How to win a Data Science competition ?

These are mainly notes for myself to win a Data Science competition , but I figured that they might be of interest to some of the blog readers too. Comments on what is written below are most welcome! Typically, steps 1-5 would happen once per competition or problem, while steps 6-9 would be repeated in a […]

## The Comprehensive Guide for Feature Engineering

Feature Engineering is the art/science of representing data is the best way possible. This is the comprehensive guide for Feature Engineering for myself  but I figured that they might be of interest to some of the blog readers too. Comments on what is written below are most welcome! Good Feature Engineering involves an elegant blend […]

## 4 different ways to predict survival on Titanic – part 4

continued from part 3 4. Way to predict survival on Titianic These notes are taken from this link In [2]: import matplotlib.pyplot as plt %matplotlib inline import numpy as np import pandas as pd import statsmodels.api as sm from statsmodels.nonparametric.kde import KDEUnivariate from statsmodels.nonparametric import smoothers_lowess from pandas import Series, DataFrame from patsy import dmatrices from […]

## 4 different ways to predict survival on Titanic – part 3

continued from part 2 3. Way to predict survival on Titianic These notes are from this link I – Exploratory data analysis We tweak the style of this notebook a little bit to have centered plots. In [1]: from IPython.core.display import HTML HTML(“”” <style> .output_png { display: table-cell; text-align: center; vertical-align: middle; } </style> “””) Out[1]: […]