This post analyses the winning numbers of the Irish Lotto from the last two years.

The National Lottery introduced changes on Thursday, September 3, 2015, adding two numbers to the draw so that players choose from 47 numbers rather than 45. With this change, the odds of picking the six winning numbers went from just over eight million to one to about 10.7 million to one.
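
The quoted odds follow from counting the possible six-number combinations. As a quick check, here is a minimal sketch using `math.comb` from the Python standard library (the function name `jackpot_odds` is just illustrative):

```python
from math import comb

def jackpot_odds(pool_size, picks=6):
    """Number of distinct ways to choose `picks` numbers from `pool_size`."""
    return comb(pool_size, picks)

old_odds = jackpot_odds(45)  # 45 numbers, before September 2015
new_odds = jackpot_odds(47)  # 47 numbers, after the change
print("45 numbers: 1 in {:,}".format(old_odds))
print("47 numbers: 1 in {:,}".format(new_odds))
```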

The Irish Lotto takes place every Wednesday and Saturday night. It costs €2 to play, the odds of winning any prize are 1 in 29, and the minimum jackpot is €2 million. You can also take the chance to win two more jackpots in the Plus 1 and Plus 2 games for just 50c more. To play, choose 6 numbers from 1–47. To enter the Irish Lotto Plus 1 and Irish Lotto Plus 2 draws, tick the Irish Lotto + box located directly under the main game slip. The numbers you selected in the main game will also be entered in the Plus 1 and Plus 2 draws.

In each draw, six numbers and an additional bonus number are drawn from a drum containing balls numbered 1–47. Jackpots in the main game and in the Plus 1 and Plus 2 draws are won when a player matches all six main numbers.

More details can be found here. The data was scraped from the same website using Python and Beautiful Soup.

If you would like data from the last 20 years to do an analysis of your own, please visit the GitHub page.

### Scraping Data for analysis of winning numbers of Irish Lotto

```
#import the library used to query a website
import requests
#specify the url of the archive page (the 2016 archive here)
lottery = "https://www.irishlottery.com/archive-2016"
#Query the website and return the html to the variable 'page'
page = requests.get(lottery)
```

```
page.status_code
```

A status_code of 200 means that the page downloaded successfully.

```
#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup
#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page.content, 'html.parser')
```

```
# Find the 6 lotto numbers, which are in the <td> cells with class="ball"
ball_class = soup.find_all('td', class_='ball')
#get the text using list comprehension
ball = [pt.get_text() for pt in ball_class]
ball
type(ball)
```

```
#convert into a dataframe with one row per draw (six numbers per row)
import numpy as np
import pandas as pd
df_ball = pd.DataFrame(np.array(ball).reshape(-1,6),columns = list("abcdef"))
```
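
The `reshape(-1, 6)` call relies on the scraped list holding exactly six numbers per draw. As a rough illustration of what it does, here is a plain-Python sketch; the helper name `chunk_draws` and the sample values are made up for this example:

```python
def chunk_draws(numbers, per_draw=6):
    """Split a flat list of scraped numbers into one row per draw."""
    if len(numbers) % per_draw != 0:
        raise ValueError(
            "expected a multiple of {} numbers, got {}".format(per_draw, len(numbers))
        )
    return [numbers[i:i + per_draw] for i in range(0, len(numbers), per_draw)]

# made-up sample: two draws of six numbers, as strings like get_text() returns
sample = ["3", "11", "19", "27", "35", "43", "5", "8", "14", "22", "30", "47"]
rows = chunk_draws(sample)
print(rows)
```

If the page layout changes and the count is no longer a multiple of six, the helper raises an error instead of silently misaligning the draws.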

```
# Find the bonus number available at the class = "bonus-ball"
ball_bonus_class = soup.find_all('td', class_='bonus-ball')
ball_bonus = [pt.get_text() for pt in ball_bonus_class]
type(ball_bonus)
```

```
df_ball_bonus = pd.DataFrame(ball_bonus,columns = ['Bonus'])
```

```
#join the two dataframes
df_2017 = pd.concat([df_ball, df_ball_bonus], axis=1)
```

```
#find all the dates, which are in the <th> elements
date = soup.find_all('th')
date1 = [pt.get_text() for pt in date]
```

```
#convert to a dataframe
df_date = pd.DataFrame(date1,columns =['Date'])
#keep the rows after the first two, which hold the "result date" and "draw result" headers
df_date = df_date[2:]
```

```
#reset the index so that it starts from 0 again
df_date = df_date.reset_index(drop=True)
```

```
df_date.dtypes
```

```
#convert the date into date time
df_date = df_date.apply(pd.to_datetime)
```

```
#https://stackoverflow.com/questions/42693410/how-to-convert-object-to-date-in-pandas
df_date.dtypes
```

```
df1 = pd.concat([df_date, df_2017], axis=1)
```

```
#df1.to_csv('2016.csv')
```

#### Make a main file

```
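from glob import glob
#append every yearly csv file to a single main file
with open('main_csv.csv', 'a') as singleFile:
    for csv in glob('*.csv'):
        #skip the output file itself
        if csv == 'main_csv.csv':
            continue
        with open(csv, 'r') as yearFile:
            for line in yearFile:
                singleFile.write(line)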
```
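
Appending files line by line also copies each file's header row into the main file. If that is not wanted, a sketch along these lines keeps only the first header. It assumes every yearly file has the same header row, and `merge_csv_files` is a hypothetical helper, not part of the original notebook:

```python
import csv

def merge_csv_files(paths, out_path):
    """Concatenate CSV files, writing the header row only once."""
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # copy the remaining data rows unchanged
                writer.writerows(reader)
```

It could be called with the list of yearly files, for example `merge_csv_files(['2016.csv', '2017.csv'], 'main_csv.csv')`.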

### Analysis of winning numbers of Irish Lotto from 2nd September 2015 to 9th September 2017

```
df = pd.read_csv("main.csv")
```

```
df.head(2)
```

```
df.tail(2)
```

```
#describe() gives summary statistics (count, mean, std, min, quartiles, max) for each column
df.describe()
```

#### The Magic Touch

In 2009, Derren Brown hosted a live show in tandem with the UK lottery draw, where he announced to viewers that he was going to try to predict the outcome of that night's draw. Brown correctly predicted 6 numbers. His explanation, which went down like a tonne of bricks, was that he simply asked 24 people to predict 6 numbers, added up the total for each one, divided it by 24 and, voilà, somehow that led him to predict the lottery.

This analysis of the winning numbers of the Irish Lotto finds the most frequently drawn numbers.

```
# https://stackoverflow.com/questions/6987285/python-find-the-item-with-maximum-occurrences-in-a-list
# https://stackoverflow.com/questions/1518522/python-most-common-element-in-a-list
#Select the numbers only and convert to a list
df_ball = df.drop(['Date','Bonus'], axis=1)
```

```
list_ball = list(df_ball.values.T.flatten())
```

```
## The number that occurs most often in the list of numbers
from collections import Counter
most_common,num_most_common = Counter(list_ball).most_common(1)[0]
```

```
print(most_common)
print(num_most_common)
```

```
#https://stackoverflow.com/questions/3594514/how-to-find-most-common-elements-of-a-list
# find the 6 most common numbers in the list
most_common_6 = Counter(list_ball).most_common(6)
most_common_6
```

```
#count how many times each number has been drawn
number_counter = {}
for number in list_ball:
    if number in number_counter:
        number_counter[number] += 1
    else:
        number_counter[number] = 1
#numbers ordered from most to least frequently drawn
popular_numbers = sorted(number_counter, key = number_counter.get, reverse = True)
print(popular_numbers)
print(number_counter)
```

```
#plot how often each number has been drawn
import matplotlib.pyplot as plt
%matplotlib inline
lists = sorted(number_counter.items()) # sorted by key, returns a list of tuples
x, y = zip(*lists) # unpack a list of pairs into two tuples
plt.plot(x, y)
plt.ylabel("Times drawn")
plt.xlabel("Numbers")
plt.title("Occurrence of numbers")
plt.show()
```

```
#print the numbers ordered by how often they were drawn, most frequent first
dictionary_numbers = Counter(list_ball)
dictionary_numbers_sorted_keys = sorted(dictionary_numbers, key=dictionary_numbers.get, reverse=True)
for r in dictionary_numbers_sorted_keys:
    print("{}\t: \t {} times".format(r, dictionary_numbers[r]))
```

#### Picking the most commonly drawn numbers

One approach would be to choose the numbers that come up most often. At the moment, the most frequently drawn ball is number 10, which has come up 38 times. The next most frequent numbers are:

- 7: 35 times
- 27: 35 times
- 16: 34 times
- 2: 33 times
- 18: 32 times
- 42: 32 times
- 45: 32 times

Their frequency of appearance is no indication that they will be drawn together. In fact, the chance of these numbers cropping up in a winning combination is the same as that of any other set of six.

However, you can observe that certain numbers have appeared more often than others.
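
To judge whether a count such as 38 is unusual, it helps to compare it with what pure chance would give. The sketch below is a rough estimate under stated assumptions: `n_draws = 211` is a guess at the number of twice-weekly draws in the period, not a figure taken from the data, and each number is treated as appearing among the six main balls of a draw with probability 6/47.

```python
from math import sqrt

n_draws = 211          # assumed number of draws (hypothetical, not from the data)
p = 6 / 47             # chance that a given number is among the six main balls

expected = n_draws * p                  # expected times a given number is drawn
spread = sqrt(n_draws * p * (1 - p))    # binomial standard deviation of that count

print("expected count per number: {:.1f}".format(expected))
print("standard deviation: {:.1f}".format(spread))
print("38 is {:.1f} standard deviations above the expected count".format((38 - expected) / spread))
```

With 47 numbers in play, a few counts sitting a couple of standard deviations away from the expected value would not be surprising under these assumptions.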

```
#counts of unique values for the bonus ball, to find the most frequently occurring numbers
df['Bonus'].value_counts()
```

#### Hence, 35, 13, 9 and 31 are the most common winning bonus numbers.

### The lottery selects random numbers.

You can generate 10,000,000 random numbers in the range 1 to 47 and calculate their mean and standard deviation.

```
#the upper bound of randint is exclusive, so use 48 to include 47
list_random = np.random.randint(1, 48, size=10000000)
print(np.mean(list_random))
print(np.std(list_random))
```
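
The simulated values can be compared with the exact mean and standard deviation of the integers 1 to 47, each taken as equally likely. A small sketch using only the Python standard library:

```python
from statistics import mean, pstdev

numbers = list(range(1, 48))    # the integers 1 to 47 inclusive
exact_mean = mean(numbers)      # average of equally likely values
exact_std = pstdev(numbers)     # population standard deviation

print(exact_mean)
print(round(exact_std, 2))
```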

#### Mean of winning numbers from last two years

```
#mean
np.mean(list_ball)
```

```
#standard deviation
np.std(list_ball)
```

### This shows that both the randomly generated numbers and the winning numbers from the last two years have a mean of roughly 23 to 24 and a standard deviation of roughly 13 to 14.

```
np.median(list_ball)
```

### Analysis of winning numbers of the Irish Lotto

Assuming a normal probability distribution, you can use the Z value to calculate how many standard deviations a number lies from the mean.

#### Investigate the validity by calculating the probability of the number X.

You can select two of the most frequently drawn numbers, 10 and 42, and look at P(10 ≤ X ≤ 42). This probability is the area between 10 and 42 under the normal curve.

We can calculate the Z-score or standard score.

The basic z score formula for a sample is: z = (x – μ) / σ

```
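#z-scores using mean 23 and standard deviation 14
z_10 = (10 - 23) / 14
z_42 = (42 - 23) / 14
#z_10 is negative, so report its absolute value as the distance below the mean
print("10 is {} standard deviations below the mean m = 23".format(abs(z_10)))
print("42 is {} standard deviations above the mean m = 23".format(z_42))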
```

The area between 10 and 42 under a normal curve with mean m = 23 and standard deviation d = 14 equals the area between -0.92 and 1.35 under the standard normal curve. This equals the area between -0.92 and 0 plus the area between 0 and 1.35.

Z is the standard normal random variable. The table value used here for z is the area under the standard normal curve between 0 and z.

The normal table tells us that the area between -0.92 and 0, which by symmetry equals the area between 0 and 0.92, is 0.3212. The normal table also tells us that the area between 0 and 1.35 is 0.4115. Hence, the probability is 0.3212 + 0.4115 = 0.7327.
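
The table lookup can be cross-checked in code. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the same z-scores of -0.92 and 1.35. It only checks the arithmetic under the normal assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# area under the standard normal curve between z = -0.92 and z = 1.35
probability = standard_normal.cdf(1.35) - standard_normal.cdf(-0.92)
print(round(probability, 4))
```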

## Under the normal assumption, about 73.27 percent of all the numbers fall between 10 and 42.

Please leave any comments or queries.