## Guide for Linear Regression using Python – Part 2

This blog continues the guide for linear regression using Python from this post. Independent variables should not be strongly correlated with one another. Multicollinearity is the presence of correlation among independent variables. If variables are correlated, it becomes extremely difficult for the model to determine the […]
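
As a rough illustration of spotting correlated independent variables, the sketch below computes the Pearson correlation between pairs of predictors in plain Python. The helper `pearson` and the toy columns `size_sqft`, `size_sqm` and `age_years` are hypothetical and are not taken from the original post; a coefficient close to 1 or -1 hints that two predictors carry nearly the same information.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical predictors (illustrative values only).
size_sqft = [800, 950, 1100, 1300, 1500]
size_sqm = [v * 0.0929 for v in size_sqft]  # same quantity in another unit
age_years = [30, 5, 18, 42, 11]

r_dup = pearson(size_sqft, size_sqm)   # essentially 1: the columns are redundant
r_age = pearson(size_sqft, age_years)  # much weaker relationship

print(f"size_sqft vs size_sqm:  {r_dup:.3f}")
print(f"size_sqft vs age_years: {r_age:.3f}")
```

In this toy case one of the two size columns would usually be dropped before fitting, since the model cannot separate their individual effects.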

## Guide for Linear Regression using Python – Part 1

Regression is the first algorithm we need to master if we aspire to become data scientists. It is one of the easiest algorithms to learn, yet it takes understanding and effort to master. This blog is a guide for linear regression using Python. It will focus on linear and multiple […]

## Predicting NBA winners with Decision Trees and Random Forests in Scikit-learn

In this blog, we will be predicting NBA winners with Decision Trees and Random Forests in scikit-learn. The National Basketball Association (NBA) is the major men’s professional basketball league in North America and is widely considered to be the premier men’s professional basketball league in the world. It has 30 teams (29 in the United States and […]

## Jobs which are most susceptible to automation

Posted in Business, Data Analysis Resources

Throughout history, technological advances have raised fears that traditional jobs will become obsolete. In this post, I look at which jobs are most susceptible to automation. Elon Musk told the National Governors Association: “There certainly will be job disruption. Because what’s going to happen is robots will be able to do everything better […]

## 10 groups of Machine Learning Algorithms

In this article, I group some of the popular machine learning algorithms either by learning style or by problem type. Each group has a brief description of how its algorithms work and their potential use cases. Regression. How it works: A regression uses the historical relationship between an independent and a dependent variable to predict the future values […]
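
To make the regression description concrete, here is a minimal sketch of a one-variable least-squares fit in plain Python. The function `fit_line` and the `months` and `sales` values are hypothetical and chosen only for illustration; the idea is that the line fitted to past observations is then used to predict a future value.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: the independent variable is the month number,
# the dependent variable is sales (constructed to lie on a straight line).
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)

# Use the fitted relationship to predict the next, unseen month.
forecast = slope * 6 + intercept
print(slope, intercept, forecast)
```

Real data will not sit exactly on a line, so the fitted slope and intercept are then the values that minimise the squared prediction errors.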

## Countvectorizer sklearn example

This CountVectorizer sklearn example is from PyCon Dublin 2016. For further information please visit this link. The dataset is from UCI. In [2]: `messages = [line.rstrip() for line in open('smsspamcollection/SMSSpamCollection')]` In [3]: `print(len(messages))` 5574 In [5]: `for num, message in enumerate(messages[:10]): print(num, message) print('\n')` 0 ham Go until jurong point, crazy.. Available only in bugis n great world la e […]
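
For readers who have not used it, the sketch below shows roughly what `CountVectorizer` does to a list of messages. The three `docs` strings are made-up stand-ins, not rows from the SMS Spam Collection, and the default settings are an assumption rather than what the original notebook necessarily used.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy messages (not taken from the SMS Spam Collection).
docs = [
    "free prize call now",
    "call me when you are free",
    "see you at lunch",
]

# Defaults: text is lowercased and split into word tokens of 2+ characters.
vectorizer = CountVectorizer()

# Learn the vocabulary and build a sparse document-term count matrix,
# with one row per message and one column per distinct token.
counts = vectorizer.fit_transform(docs)

# vocabulary_ maps each token to its column index in the matrix.
vocab = vectorizer.vocabulary_

print(counts.shape)
print(sorted(vocab))
print(counts[0, vocab["free"]], counts[2, vocab["free"]])
```

Each row of `counts` can then be fed to a classifier as the numeric representation of one message.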

## Lessons Learnt from AIB Data Hack

Posted in Competition Notes, Machine Learning

Yesterday I attended AIB Data Hack. It was my first one-day data hackathon. This notebook contains some of the lessons learnt from AIB Data Hack while working on a large, complicated dataset with little time. I have taken part in a few Kaggle competitions. However, I have to say the experience in a single day or […]

## Time-Series Predictive Analysis of DAX 30

In this blog post we’ll examine some common techniques used in time-series analysis by applying them to a data set containing daily closing values of the DAX 30 from 1990 up to the present day. The DAX (Deutscher Aktienindex, the German stock index) is a blue-chip stock market index consisting of the 30 major German companies trading […]