Analysis of winning numbers of Irish Lotto

This post analyses the winning numbers of the Irish Lotto from the last two years. The National Lottery introduced changes on Thursday, September 3, 2015, adding two numbers to the draw, so players now choose from 47 numbers rather than 45. With this change, the odds of picking the six winning numbers went from just […]
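As a rough illustration of why a larger pool lengthens the odds, the number of possible six-number tickets can be counted with a binomial coefficient. This is a minimal sketch that assumes a plain six-from-N draw and ignores bonus numbers and lower prize tiers; the function name `jackpot_combinations` is made up for this example.

```python
from math import comb


def jackpot_combinations(pool_size, picks=6):
    """Count the distinct ways to choose `picks` numbers from `pool_size`."""
    return comb(pool_size, picks)


# Assumption: six numbers drawn without replacement, order ignored.
before = jackpot_combinations(45)  # pool of 45 numbers
after = jackpot_combinations(47)   # pool of 47 numbers

print(f"45 numbers: 1 in {before:,}")
print(f"47 numbers: 1 in {after:,}")
print(f"Ratio of the two: {after / before:.2f}")
```

Each ticket matches exactly one of those combinations, so the chance of a single line winning the jackpot is one divided by the count.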

Jobs which are most susceptible to automation

Throughout history, technological advances have raised fears that traditional jobs will become obsolete. In this post, I look at which jobs are most susceptible to automation. Elon Musk told the National Governors Association: “There certainly will be job disruption. Because what’s going to happen is robots will be able to do everything better […]

Analysis of Residential Property Prices in Dublin

Living in Dublin, Ireland, is amazingly expensive. Residential property prices in Dublin keep rising. Yet we all think about buying a home while still wondering whether we might be better off continuing to rent. The data analyst in me wanted to dive deeper, to look back historically, to quantify, to visualize the trends, etc. to […]

10 groups of Machine Learning Algorithms

In this article, I group some of the popular machine learning algorithms by either learning type or problem type. Each group comes with a brief description of how the algorithms work and their potential use cases.

Regression

How it works: A regression uses the historical relationship between an independent and a dependent variable to predict the future values […]
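To make the regression description concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares in plain Python. The data points and the helper names `fit_line` and `predict` are made up for illustration; they are not taken from the article.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to estimate y for a new x."""
    return slope * x + intercept


# Hypothetical historical observations (independent, dependent).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
forecast = predict(5, slope, intercept)
print(slope, intercept, forecast)
```

The fitted slope and intercept summarise the historical relationship, and `predict` extends that line to a value of the independent variable that has not been observed yet.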

Countvectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI.

In [2]: messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')]

In [3]: print(len(messages))
5574

In [5]: for num, message in enumerate(messages[:10]):
            print(num, message)
            print('\n')
0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]
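For readers who want to see what a count vectoriser produces before reaching for scikit-learn, the idea can be sketched in plain Python. This is a stand-in, not scikit-learn's implementation: it lowercases the text, keeps tokens of two or more word characters, and sorts the vocabulary alphabetically. The two sample messages and the function names are invented for this example.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical messages, not taken from the SMS spam dataset.
sample = [
    "Free entry in a weekly competition",
    "Are you free this weekend",
]

vocabulary, matrix = bag_of_words(sample)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row has one column per vocabulary term, so messages of different lengths become vectors of the same size that a classifier can consume.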

What Makes a Really Good Diamond?

The aim of this post is to assess the quality and characteristics of diamonds and gain insights into what makes a really good diamond. The data set is from ggplot2. The exploratory data analysis is done in Python, and the notebooks are available on my GitHub. This post addresses a few important questions, such as: […]

Truth About Nutritional Information in Recipes

So much has been said about proteins, fat and calories in recent years that the single biggest challenge when trying to answer the question is how to “separate the wheat from the chaff.” Proteins and fats are what our bodies use, and they all have calorie counts. Too many and we get fat, too […]

Visualise Categorical Variables in Python

It is crucial to learn how to deal with categorical variables, as they can hide a lot of interesting information in a data set. A categorical variable identifies the group to which an observation belongs. You could categorise people according to their race or ethnicity, cities according to their geographic […]
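A first look at a categorical variable usually starts with counting how often each group occurs. The sketch below does that with the standard library and draws a plain-text bar chart; the list of regions is hypothetical, and the helper names are invented for this example.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()


def text_bar_chart(values, symbol="#"):
    """Render one line per category with a bar proportional to its count."""
    table = frequency_table(values)
    if not table:
        return ""
    width = max(len(str(category)) for category, _ in table)
    lines = [
        f"{category:<{width}} | {symbol * count} {count}"
        for category, count in table
    ]
    return "\n".join(lines)


# Hypothetical data: the region each of six cities belongs to.
regions = ["Leinster", "Munster", "Leinster", "Connacht", "Leinster", "Munster"]

chart = text_bar_chart(regions)
print(chart)
```

The same counts could be passed to a plotting library for a proper bar chart; the text version only shows the shape of the distribution.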