Analysis of winning numbers of Irish Lotto

This post analyses the winning numbers of the Irish Lotto from the last two years.

The National Lottery changed the game from Thursday, September 3, 2015, adding two numbers to the draw so that players choose from 47 numbers rather than 45. With this change, the odds of picking the six winning numbers went from just over 8 million to one to about 10.7 million to one.
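The odds quoted above follow from counting combinations. A minimal sketch, assuming the jackpot requires matching six numbers drawn without regard to order (the variable names are illustrative):

```python
from math import comb

# Number of ways to choose 6 balls from the pool, ignoring order
combos_45 = comb(45, 6)   # pool of 45 numbers, before 3 September 2015
combos_47 = comb(47, 6)   # pool of 47 numbers, after the change

print(f"6 from 45: 1 in {combos_45:,}")
print(f"6 from 47: 1 in {combos_47:,}")
```

This gives 8,145,060 combinations for the old format and 10,737,573 for the new one, which matches the "just over eight million" and "10.7 million" figures.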

The Irish Lotto draw takes place every Wednesday and Saturday night. A line costs €2, the odds of winning any prize are 1 in 29, and the minimum jackpot is €2 million. To play, you simply choose 6 numbers from 1–47. For just 50c more, you can also take the chance to win two more jackpots in the Irish Lotto Plus 1 and Irish Lotto Plus 2 draws by ticking the Irish Lotto + box located directly under the main game slip. The same numbers you selected in the main game will be entered in the Plus 1 and Plus 2 draws.

In each draw, six numbers and an additional bonus number are drawn from a drum containing balls numbered 1–47. Jackpots in the main game and in the Plus 1 and Plus 2 draws are won when a player matches all six main numbers.

More details can be found here. The data was scraped from the same website using Python and Beautiful Soup.

If you would like to get the data from the last 20 years to do an analysis of your own, please visit the GitHub page.

Scraping Data for analysis of winning numbers of Irish Lotto

In [1]:
#import the library used to query a website
import requests
#specify the url of the archive page for one year (here the 2016 archive)
lottery = "https://www.irishlottery.com/archive-2016"
#Query the website and return the html to the variable 'page'
page = requests.get(lottery)
In [2]:
page.status_code
Out[2]:
200

A status_code of 200 means that the page downloaded successfully.

In [3]:
#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup
#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page.content, 'html.parser')
In [4]:
# Find the 6 lotto numbers, stored in the <td> cells with class "ball"
ball_class = soup.find_all('td', class_='ball')
#get the text using list comprehension
ball = [pt.get_text() for pt in ball_class]
ball
type(ball)
Out[4]:
list
In [5]:
#reshape into a dataframe with one row per draw (six numbers in ascending order)
import numpy as np
import pandas as pd
df_ball = pd.DataFrame(np.array(ball).reshape(-1,6),columns = list("abcdef"))
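The `reshape(-1,6)` call turns the flat list of scraped ball texts into one row of six numbers per draw. The same idea can be sketched in plain Python. The helper name `chunk_rows` is hypothetical, and the sample values are the two most recent draws from the combined data set:

```python
def chunk_rows(values, row_length=6):
    """Split a flat list into consecutive rows of row_length items."""
    if len(values) % row_length != 0:
        raise ValueError("number of values is not a multiple of the row length")
    return [values[i:i + row_length] for i in range(0, len(values), row_length)]

# Scraped values arrive as strings, because get_text() returns text
sample_ball = ["2", "8", "20", "25", "33", "45", "1", "4", "5", "6", "18", "47"]

# Convert each string to an integer while grouping into draws
rows = [[int(v) for v in row] for row in chunk_rows(sample_ball)]
print(rows)
```

Like `reshape`, the sketch raises an error when the number of scraped values is not a multiple of six, which is a useful signal that the page layout has changed.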
In [6]:
# Find the bonus number available at the class = "bonus-ball"
ball_bonus_class = soup.find_all('td', class_='bonus-ball')
ball_bonus = [pt.get_text() for pt in ball_bonus_class]
type(ball_bonus)
Out[6]:
list
In [7]:
df_ball_bonus = pd.DataFrame(ball_bonus,columns = ['Bonus'])
In [8]:
#join the two dataframes
df_2017 = pd.concat([df_ball, df_ball_bonus], axis=1)
In [9]:
#find all the dates, which are stored in the <th> cells
date = soup.find_all('th')
date1 = [pt.get_text() for pt in date]
In [10]:
#convert to a dataframe
df_date = pd.DataFrame(date1,columns =['Date'])
#drop the first two rows, which hold the table headings ("result date" and "draw result")
df_date = df_date[2:]
In [11]:
#reset the index from 0
df_date = df_date.reset_index()
del df_date['index']
In [12]:
df_date.dtypes
Out[12]:
Date    object
dtype: object
In [13]:
#convert the date into date time
df_date = df_date.apply(pd.to_datetime)
In [14]:
#https://stackoverflow.com/questions/42693410/how-to-convert-object-to-date-in-pandas
df_date.dtypes
Out[14]:
Date    datetime64[ns]
dtype: object
In [15]:
df1 = pd.concat([df_date, df_2017], axis=1)
In [16]:
#df1.to_csv('2016.csv')

Make a main file

In [17]:
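from glob import glob

# Append every yearly CSV in the working directory to one combined file
with open('main_csv.csv', 'a') as singleFile:
    for csv in glob('*.csv'):
        # skip the combined file itself
        if csv == 'main_csv.csv':
            continue
        # the with block closes each yearly file after it is copied
        with open(csv, 'r') as yearFile:
            for line in yearFile:
                singleFile.write(line)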
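Copying the files line by line also copies the header row of every yearly file into the combined file. A minimal sketch of a merge that keeps a single header, assuming every input file starts with the same header row (the function name `merge_csv_files` is hypothetical):

```python
import csv

def merge_csv_files(paths, out_path):
    """Write the rows of every file in paths to out_path, keeping one header."""
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)   # first row of each file
                if header is None:
                    continue                  # skip empty files
                if not header_written:
                    writer.writerow(header)   # keep the header only once
                    header_written = True
                writer.writerows(reader)      # copy the remaining data rows
```

It could be called with an explicit list of the yearly files, for example `merge_csv_files(['2016.csv', '2017.csv'], 'main_csv.csv')`, where the file names are placeholders for whatever was saved earlier.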

Analysis of winning numbers of Irish Lotto from 2nd September 2015 to 9th September 2017

In [18]:
df = pd.read_csv("main.csv")
In [19]:
df.head(2)
Out[19]:
         Date  a  b   c   d   e   f  Bonus
0  09/09/2017  2  8  20  25  33  45      7
1  06/09/2017  1  4   5   6  18  47     35
In [20]:
df.tail(2)
Out[20]:
           Date  a   b   c   d   e   f  Bonus
210  05/09/2015  7   9  17  20  26  27     40
211  02/09/2015  7  20  22  33  35  42     41
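The Date column in this file is written day first (02/09/2015 is 2nd September 2015), so any date parsing needs to be told the order explicitly. A minimal sketch using the standard library, assuming every date follows the `dd/mm/yyyy` pattern seen above (the helper name `parse_draw_date` is hypothetical):

```python
from datetime import datetime

def parse_draw_date(text):
    """Parse a day-first date string such as '06/09/2017'."""
    return datetime.strptime(text, "%d/%m/%Y").date()

# 6 September 2017, not 9 June 2017
print(parse_draw_date("06/09/2017"))
```

In pandas, the same intent can be expressed by passing `dayfirst=True` or an explicit `format` to `pd.to_datetime`.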
In [21]:
#Summary statistics (count, mean, standard deviation, quartiles) for each column
df.describe()
Out[21]:
                a           b           c           d           e           f       Bonus
count  212.000000  212.000000  212.000000  212.000000  212.000000  212.000000  212.000000
mean     6.268868   12.731132   20.363208   27.080189   34.051887   40.929245   24.108491
std      5.393376    6.873838    7.814506    7.642993    7.334405    5.778674   13.820533
min      1.000000    2.000000    5.000000    6.000000   15.000000   19.000000    1.000000
25%      2.000000    8.000000   15.000000   22.000000   29.000000   38.000000   12.750000
50%      5.000000   11.000000   19.000000   27.000000   35.000000   42.000000   23.000000
75%      8.250000   16.000000   25.250000   32.000000   40.000000   45.000000   36.250000
max     33.000000   36.000000   44.000000   45.000000   46.000000   47.000000   47.000000

The Magic Touch

In 2009 Derren Brown hosted a live show in tandem with the UK lottery draw, where he announced to viewers that he was going to try and predict the outcome of that night’s events. Brown correctly predicted 6 numbers. His explanation, which went down like a tonne of bricks, was that he quite simply asked 24 people to predict 6 numbers, then he added up the total for each one, divided it by 24 and voila, somehow that led him to predict the lottery.

The analysis below finds the most frequent winning numbers of the Irish Lotto.

In [22]:
# https://stackoverflow.com/questions/6987285/python-find-the-item-with-maximum-occurrences-in-a-list

# https://stackoverflow.com/questions/1518522/python-most-common-element-in-a-list

#Select the numbers only and convert to a list
df_ball = df.drop(['Date','Bonus'], axis=1)
In [23]:
list_ball = list(df_ball.values.T.flatten())
In [24]:
## The number that occurs most often in the list of numbers
from collections import Counter
most_common,num_most_common = Counter(list_ball).most_common(1)[0]
In [25]:
print(most_common)
print(num_most_common)
10
38
In [26]:
#https://stackoverflow.com/questions/3594514/how-to-find-most-common-elements-of-a-list
# find the 6 most common numbers in the list
most_common_6 = Counter(list_ball).most_common(6)
most_common_6
Out[26]:
[(10, 38), (7, 35), (27, 35), (16, 34), (2, 33), (18, 32)]
In [27]:
number_counter = {}
for number in list_ball:
    if number in number_counter:
        number_counter[number] += 1
    else:
        number_counter[number] = 1
 
popular_numbers = sorted(number_counter, key = number_counter.get, reverse = True)
 
print(popular_numbers)
print(number_counter)
[10, 7, 27, 16, 2, 18, 42, 45, 6, 15, 1, 17, 40, 9, 22, 28, 29, 31, 34, 5, 25, 32, 43, 8, 20, 38, 47, 4, 19, 37, 41, 11, 12, 14, 39, 44, 46, 30, 36, 3, 24, 26, 21, 23, 13, 33, 35]
{1: 30, 2: 33, 3: 22, 4: 26, 5: 28, 6: 31, 7: 35, 8: 27, 9: 29, 10: 38, 11: 25, 12: 25, 13: 18, 14: 25, 15: 31, 16: 34, 17: 30, 18: 32, 19: 26, 20: 27, 21: 20, 22: 29, 23: 20, 24: 21, 25: 28, 26: 21, 27: 35, 28: 29, 29: 29, 30: 23, 31: 29, 32: 28, 33: 18, 34: 29, 35: 17, 36: 23, 37: 26, 38: 27, 39: 25, 40: 30, 41: 26, 42: 32, 43: 28, 44: 24, 45: 32, 46: 24, 47: 27}
In [28]:
#for plot of the most popular numbers
import matplotlib.pyplot as plt
%matplotlib inline

# sort the (number, count) pairs by number so the line runs from 1 to 47,
# then unpack the list of pairs into two tuples
x, y = zip(*sorted(number_counter.items()))

plt.plot(x, y)
plt.ylabel('times')
plt.xlabel("Numbers")
plt.title("Occurrence of numbers")

plt.show()
In [30]:
#print the dictionary with sorted values
dictionary_numbers = Counter(list_ball)
dictionary_numbers_sorted_keys = sorted(dictionary_numbers, key=dictionary_numbers.get, reverse=True)
for r in dictionary_numbers_sorted_keys:
    print ("{}\t: \t {} times".format(r, dictionary_numbers[r]))
10	: 	 38 times
7	: 	 35 times
27	: 	 35 times
16	: 	 34 times
2	: 	 33 times
18	: 	 32 times
42	: 	 32 times
45	: 	 32 times
6	: 	 31 times
15	: 	 31 times
1	: 	 30 times
17	: 	 30 times
40	: 	 30 times
9	: 	 29 times
22	: 	 29 times
28	: 	 29 times
29	: 	 29 times
31	: 	 29 times
34	: 	 29 times
5	: 	 28 times
25	: 	 28 times
32	: 	 28 times
43	: 	 28 times
8	: 	 27 times
20	: 	 27 times
38	: 	 27 times
47	: 	 27 times
4	: 	 26 times
19	: 	 26 times
37	: 	 26 times
41	: 	 26 times
11	: 	 25 times
12	: 	 25 times
14	: 	 25 times
39	: 	 25 times
44	: 	 24 times
46	: 	 24 times
30	: 	 23 times
36	: 	 23 times
3	: 	 22 times
24	: 	 21 times
26	: 	 21 times
21	: 	 20 times
23	: 	 20 times
13	: 	 18 times
33	: 	 18 times
35	: 	 17 times

Picking the most commonly drawn numbers

One approach would be to choose the numbers that come up most often. At the moment the most frequently drawn ball is the number 10, which has come up 38 times. The next most frequent numbers are:

  • 7: 35 times
  • 27: 35 times
  • 16: 34 times
  • 2: 33 times
  • 18: 32 times
  • 42: 32 times
  • 45: 32 times

Their frequency of appearance is no indication that they will be drawn together. In fact, the chance of these numbers cropping up in a winning combination is the same as any other set of six.

However, you can observe that certain numbers have appeared more often than others.
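To judge how unusual these counts are, it helps to compare them with what a fair draw would produce. A minimal sketch, assuming every draw is independent and each ball is equally likely, so that the number of appearances of a given ball over 212 draws follows a binomial distribution (the variable and function names are illustrative):

```python
from math import sqrt

draws = 212    # draws in the data set (the count reported by df.describe())
pool = 47      # balls in the drum
picked = 6     # main numbers per draw

p = picked / pool                     # chance a given number is among the six
expected = draws * p                  # expected appearances per number
spread = sqrt(draws * p * (1 - p))    # binomial standard deviation

def z_score(count):
    """How many standard deviations a count lies from the expected value."""
    return (count - expected) / spread

print(f"expected appearances: {expected:.2f} +/- {spread:.2f}")
print(f"number 10 (38 times): z = {z_score(38):.2f}")
print(f"number 35 (17 times): z = {z_score(17):.2f}")
```

Under these assumptions each number is expected to appear about 27 times, and both the most frequent number (10) and the least frequent (35) sit roughly two standard deviations from that value. With 47 numbers in play, a few counts that far from the average are consistent with chance alone.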

In [31]:
#counts of unique values for the bonus to find the most frequently occurring numbers
df['Bonus'].value_counts()
Out[31]:
35    10
13     8
9      8
31     8
3      7
5      7
16     6
17     6
19     6
7      6
29     6
47     6
37     6
39     6
44     6
40     6
41     5
6      5
28     5
34     5
22     5
20     5
4      4
14     4
45     4
43     4
18     4
46     4
21     4
27     4
10     4
38     4
30     4
8      3
23     3
12     3
15     3
32     3
1      3
24     2
25     2
33     2
42     2
2      2
11     1
36     1
Name: Bonus, dtype: int64

Hence, 35, 13, 9 and 31 are the most common winning bonus numbers.

The lottery draws its numbers at random.

For comparison, you can generate 10,000,000 random numbers in the range 1 to 47 and calculate their mean and standard deviation.

In [32]:
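#np.random.randint excludes the upper bound, so randint(1,47) draws from 1 to 46;
#pass 48 as the upper bound to include 47
list_random = np.random.randint(1,47, size=10000000)
print(np.mean(list_random))
print(np.std(list_random))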
23.5031823
13.2781247574

Mean of winning numbers from last two years

In [33]:
#mean
np.mean(list_ball)
Out[33]:
23.570754716981131
In [34]:
#standard deviation
np.std(list_ball)
Out[34]:
13.737419404066852

This shows that the randomly generated numbers and the winning numbers from the last 2 years both have a mean of approximately 23 and a standard deviation of approximately 13.
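The simulation can be cross-checked against the exact values for a single ball drawn uniformly from 1 to 47. A minimal sketch using only the standard library (the names `uniform_mean` and `uniform_std` are illustrative):

```python
from statistics import mean, pstdev

# 1 to 47 inclusive; range() excludes its upper bound, hence 48
numbers = range(1, 48)

uniform_mean = mean(numbers)     # exact mean of a fair single ball
uniform_std = pstdev(numbers)    # population standard deviation

print(uniform_mean)
print(round(uniform_std, 4))
```

The exact mean is 24 and the standard deviation is about 13.56. The simulated mean of 23.5 above is slightly lower because `np.random.randint(1,47, ...)` excludes its upper bound and therefore samples from 1 to 46.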

In [35]:
np.median(list_ball)
Out[35]:
23.0

Analysis of winning numbers of the Irish Lotto

Assuming a normal probability distribution, you can use the z-score to express how many standard deviations a number lies from the mean.

Assume that the population of all lotto numbers is normally distributed with a mean equal to 23 and a standard deviation equal to 14.

To investigate this, calculate the probability that a drawn number X falls within a given range.

Take two of the frequently drawn numbers, 10 and 42, as the bounds, i.e. P(10 <= X <= 42). This probability is the area between 10 and 42 under the normal curve.

We can calculate the Z-score or standard score.

The basic z score formula for a sample is: z = (x – μ) / σ

In [36]:
z_10 = (10 - 23) / 14
z_42 = (42-23) / 14
print("10 is  {} standard deviation below the mean m = 23".format(z_10))
print("42 is {} standard deviation above the mean m = 23".format(z_42))
10 is  -0.9285714285714286 standard deviation below the mean m = 23
42 is 1.3571428571428572 standard deviation above the mean m = 23

The area between 10 and 42 under a normal curve having mean m = 23 and standard deviation d = 14 equals the area between -0.92 and 1.35 under the standard normal curve (z-scores truncated to two decimals). This equals the area between -0.92 and 0 plus the area between 0 and 1.35.

Z is the standard normal random variable. The normal table used here gives the area under the standard normal curve between 0 and z.

The normal table tells us that the area between -0.92 and 0, which by symmetry equals the area between 0 and 0.92, is 0.3212. The normal table also tells us that the area between 0 and 1.35 is 0.4115. Hence, the probability is 0.3212 + 0.4115 = 0.7327.

Under this normal assumption, about 73.27 percent of all drawn numbers lie between 10 and 42.
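The table lookup can be reproduced with Python's standard library. A minimal sketch, using the same assumed mean of 23 and standard deviation of 14 (the variable names are illustrative):

```python
from statistics import NormalDist

# Same assumptions as the text: mean 23, standard deviation 14
lotto = NormalDist(mu=23, sigma=14)

# P(10 <= X <= 42) is the difference of the cumulative distribution at the two ends
prob = lotto.cdf(42) - lotto.cdf(10)
print(round(prob, 4))
```

The result is close to 0.736. It differs slightly from the table-based figure because the table lookup uses z-scores cut to two decimal places.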

Please leave any comments or queries.
