It is important to learn how to deal with categorical variables, because they often hide a lot of interesting information in a data set. A categorical variable identifies the group to which an observation belongs. You could categorise persons according to their race or ethnicity, cities according to their geographic […]
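As a rough illustration of the idea, here is a minimal pandas sketch of grouping by a categorical variable. The `hostels` table, its column names, and all values are hypothetical and invented for this example; they do not come from the post.

```python
import pandas as pd

# Hypothetical toy data: names and values are illustrative only.
hostels = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "region": ["North", "North", "South", "South", "North", "South"],
    "price": [30.0, 22.0, 18.0, 16.0, 34.0, 20.0],
})

# Store the grouping column with pandas' dedicated categorical dtype.
hostels["region"] = hostels["region"].astype("category")

# How many observations fall into each group?
counts = hostels["region"].value_counts()

# Per-group summary of a numeric column.
mean_price = hostels.groupby("region", observed=True)["price"].mean()

# One-hot encode the category so that a model can consume it.
dummies = pd.get_dummies(hostels["region"], prefix="region")

print(counts)
print(mean_price)
print(dummies.head())
```

Counting, summarising per group, and one-hot encoding are only three common ways to surface what a categorical column contains; which one is useful depends on the data set.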

# Tag: data visualization

I recently took part in a challenge by Hostelworld. The challenge is to build a recommendation engine for users. Recommendations can save travellers valuable time, improve their hostel experience, and increase user retention. The challenge uses user information, reviews, and hostel details. This is a link for Exploratory […]

Visualisation is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualisation of house prices is for the Kaggle dataset. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, […]

Continued from part 3. 4. Way to predict survival on Titanic. These notes are taken from this link. In [2]: import matplotlib.pyplot as plt %matplotlib inline import numpy as np import pandas as pd import statsmodels.api as sm from statsmodels.nonparametric.kde import KDEUnivariate from statsmodels.nonparametric import smoothers_lowess from pandas import Series, DataFrame from patsy import dmatrices from […]

These are my notes from various blogs on different ways to predict survival on the Titanic using the Python stack. This is continued from part 2. 3. Way to predict survival on Titanic. These notes are from this link. I – Exploratory data analysis. We tweak the style of this notebook a little bit to have centered plots. […]

These are my notes from various blogs on different ways to predict survival on the Titanic using the Python stack. I am interested in comparing how different people have attempted the Kaggle competition. I am going to compare and contrast different analyses to find similarities and differences in approaches to predicting survival on the Titanic. This Notebook will […]

This post is exploratory data analysis with pandas – 2. To be effective, exploratory data analysis with pandas should be fast and graphic. This is continued from part 1. In [10]: densityplot = iris_df.plot(kind='density') In [11]: single_distribution = iris_df['petal width (cm)'].plot(kind='hist', alpha=0.5) Scatterplots: Scatterplots can be used to effectively understand whether the variables are in a nonlinear […]
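The histogram and scatterplot calls mentioned in this excerpt can be sketched as follows. This is only a sketch: the `iris_like` frame and its values are hypothetical stand-ins for the post's `iris_df`, and the non-interactive `Agg` backend is chosen here so that the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption of this sketch)
import pandas as pd

# Hypothetical stand-in for iris_df: made-up values, same column naming style.
iris_like = pd.DataFrame({
    "petal length (cm)": [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal width (cm)": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
})

# Histogram of a single column, semi-transparent as in the excerpt.
hist_ax = iris_like["petal width (cm)"].plot(kind="hist", alpha=0.5)

# Scatterplot of two columns to eyeball how they relate.
scatter_ax = iris_like.plot(
    kind="scatter", x="petal length (cm)", y="petal width (cm)"
)
```

Both calls return a matplotlib `Axes` object, so titles and labels can be adjusted afterwards. The density plot from the excerpt (`kind='density'`) additionally requires SciPy, which is why it is left out of this sketch.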

This post is exploratory data analysis with pandas – 1. Clear data plots that explicate the relationship between variables can lead to the creation of newer and better features that can predict more than the existing ones. Exploratory data analysis can be effective if it has the following characteristics: • It should be fast, allowing […]