Analysis of winning numbers of Irish Lotto

This post analyses the winning numbers of the Irish Lotto from the last two years. On Thursday, September 3, 2015, the National Lottery changed the game by adding two numbers to the draw, so players now choose from 47 numbers rather than 45. With this change, the odds of picking the six winning numbers went from just […]
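
As a rough check on how moving from 45 to 47 numbers affects the jackpot odds, the number of possible six-number tickets can be counted with the standard combinations formula. This is a minimal sketch that assumes a plain 6-from-N draw in which every combination is equally likely, and it ignores the bonus number; the function name `jackpot_odds` is hypothetical and not taken from the post.

```python
from math import comb  # available in Python 3.8+


def jackpot_odds(pool_size: int, picks: int = 6) -> int:
    """Return N such that the chance of matching all picks is 1 in N.

    Assumes each combination of `picks` numbers drawn from
    `pool_size` numbers is equally likely (an assumption of this sketch).
    """
    return comb(pool_size, picks)


old_odds = jackpot_odds(45)  # draw before the change
new_odds = jackpot_odds(47)  # draw after the change

print(f"45 numbers: 1 in {old_odds:,}")
print(f"47 numbers: 1 in {new_odds:,}")
print(f"The jackpot became about {new_odds / old_odds:.2f} times harder to win")
```

Under these assumptions, the number of possible tickets grows from 8,145,060 to 10,737,573.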

Jobs which are most susceptible to automation

Throughout history, technological advances have raised fears that traditional jobs will become obsolete. In this post, I look at which jobs are most susceptible to automation. Elon Musk told the National Governors Association: “There certainly will be job disruption. Because what’s going to happen is robots will be able to do everything better […]

Analysis of Residential Property Prices in Dublin

Living in Dublin, Ireland is amazingly expensive. Residential property prices in Dublin are growing. Yet we all think about buying a home while still wondering whether we might be better off continuing to rent. The data analyst in me wanted to dive deeper, to look back historically, to quantify, to visualize the trends, etc. to […]

10 groups of Machine Learning Algorithms

In this article, I group some of the popular machine learning algorithms by learning type or problem type. Each group comes with a brief description of how the algorithms work and their potential use cases. Regression. How it works: A regression uses the historical relationship between an independent and a dependent variable to predict the future values […]
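
The regression idea described above can be sketched with ordinary least squares on a single independent variable. This is a minimal illustration rather than the article's own code: the helper names `fit_line` and `predict` and the toy data are hypothetical, chosen so that the fitted line is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted historical relationship to estimate y for a new x."""
    return slope * x + intercept


# Hypothetical toy data that follows y = 2x + 1 exactly.
history_x = [1, 2, 3, 4, 5]
history_y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(history_x, history_y)
forecast = predict(6, slope, intercept)
print(slope, intercept, forecast)
```

Real data will not lie exactly on a line, so the fitted slope and intercept only summarise the average historical relationship.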

CountVectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI.

In [2]: messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')]
In [3]: print(len(messages))
5574
In [5]: for num, message in enumerate(messages[:10]):
            print(num, message)
            print('\n')
0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]

What Makes a Really Good Diamond?

The aim of this post is to assess the quality and characteristics of diamonds and gain insight into what makes a really good diamond. The data set is from ggplot2. The exploratory data analysis is done in Python, and the notebooks are available on my GitHub. This post addresses a few important questions, such as: […]

Visualise Categorical Variables in Python

It is crucial to learn how to deal with categorical variables, as they are known to hide and mask a lot of interesting information in a data set. A categorical variable identifies the group to which an observation belongs. You could categorise people according to their race or ethnicity, cities according to their geographic […]

Visualisation of House Prices

Visualization is the presentation of data in a pictorial or graphical format. It enables decision-makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. This visualization of house prices is for the Kaggle dataset. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this […]

The Comprehensive Guide for Feature Engineering

Feature engineering is the art and science of representing data in the best way possible. I wrote this comprehensive guide to feature engineering for myself, but I figured that it might be of interest to some of the blog's readers too. Comments on what is written below are most welcome! Good feature engineering involves an elegant blend […]

4 different ways to predict survival on Titanic

These are my notes from various blogs on different ways to predict survival on the Titanic using the Python stack. I am interested in comparing how different people have approached the Kaggle competition. I am going to compare and contrast the different analyses to find similarities and differences in their approaches to predicting survival on the Titanic. This Notebook […]