This post analyses the winning numbers of the Irish Lotto from the last two years.

The National Lottery changed the game from Thursday, September 3, 2015, adding two numbers to the draw so that players choose from 47 numbers rather than 45. With this change, the odds of picking all six winning numbers lengthened from just over eight million to one to about 10.7 million to one.
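
The quoted odds follow from counting the ways to choose six balls from the pool. A minimal check, assuming Python 3.8+ for `math.comb` (the variable names are illustrative, not from the original notebook):

```python
from math import comb

# Number of ways to choose 6 balls from the old and the new pool
old_combinations = comb(45, 6)   # 45-ball game, before September 2015
new_combinations = comb(47, 6)   # 47-ball game, after the change

print(f"45 balls: 1 in {old_combinations:,}")
print(f"47 balls: 1 in {new_combinations:,}")
```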

The Irish Lotto draw takes place every Wednesday and Saturday night. A line costs €2, the odds of winning any prize are 1 in 29, and the minimum jackpot is €2 million. For 50c more you can also play for two further jackpots in the Plus 1 and Plus 2 games. You choose 6 numbers from 1–47. To enter the Plus 1 and Plus 2 draws, tick the Irish Lotto + box located directly under the main game slip; the numbers you selected in the main game are entered in both draws.

In each draw, six numbers and an additional bonus number are drawn from a drum containing balls numbered 1–47. Jackpots in the main game and in the Plus 1 and Plus 2 draws are won when a player matches all six main numbers.
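
The mechanics of a draw can be sketched in a few lines. This is only a simulation under the assumption that every ball is equally likely; `simulate_draw`, `count_matches` and the example ticket are hypothetical names and values introduced for illustration:

```python
import random

def simulate_draw(rng=random):
    """Draw six main numbers and one bonus number from balls 1-47, without replacement."""
    balls = rng.sample(range(1, 48), 7)
    main, bonus = sorted(balls[:6]), balls[6]
    return main, bonus

def count_matches(ticket, main):
    """Number of ticket numbers that appear among the six main numbers."""
    return len(set(ticket) & set(main))

main, bonus = simulate_draw()
ticket = [2, 8, 20, 25, 33, 45]   # example ticket, chosen arbitrarily
print(main, bonus, count_matches(ticket, main))
```

A jackpot corresponds to `count_matches` returning 6.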

More details can be found here. The data is scraped from the same website using Python and Beautiful Soup.

If you would like data from the last 20 years for an analysis of your own, please visit the GitHub page.

Scraping Data for analysis of winning numbers of Irish Lotto

In [1]:

#import the library used to query a website
import requests
#specify the url of the yearly archive page (2016 here)
lottery = "https://www.irishlottery.com/archive-2016"
#Query the website and return the html to the variable 'page'
page = requests.get(lottery)

In [2]:

page.status_code

Out[2]:

A status_code of 200 means that the page downloaded successfully.

In [3]:

#import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup
#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page.content, 'html.parser')

In [4]:

# Find the 6 lotto numbers in the <td> cells with class "ball"
ball_class = soup.find_all('td', class_='ball')
#get the text using list comprehension
ball = [pt.get_text() for pt in ball_class]
ball
type(ball)

Out[4]:

list
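
If the list comes back empty, one possible cause is that the site's markup no longer uses the class names assumed here. A rough diagnostic, using the standard library's `html.parser` instead of Beautiful Soup, is to count which classes the `<td>` cells actually carry. The `TdClassCounter` class and the sample markup below are hypothetical; the real page may look different:

```python
from collections import Counter
from html.parser import HTMLParser

class TdClassCounter(HTMLParser):
    """Count the class names that appear on <td> elements."""
    def __init__(self):
        super().__init__()
        self.class_counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            self.class_counts.update(classes)

# Hypothetical markup for illustration only; the real page may differ
sample_html = """
<table>
  <tr><th>Sat 9 Sep 2017</th>
      <td class="ball">2</td><td class="ball">8</td>
      <td class="bonus-ball">7</td></tr>
</table>
"""

parser = TdClassCounter()
parser.feed(sample_html)
print(parser.class_counts)
```

On the downloaded page you would call `parser.feed(page.text)` and look for the class names to pass to `find_all`.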

In [5]:

#reshape the flat list into rows of six numbers and convert to a dataframe
import numpy as np
import pandas as pd
df_ball = pd.DataFrame(np.array(ball).reshape(-1,6),columns = list("abcdef"))
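
Note that `get_text()` returns strings, so the reshaped table holds text rather than integers until it is converted. The same reshaping step, with the conversion made explicit, might look like this in plain Python (`rows_of_six` is a hypothetical helper; the sample values are the first two draws shown later in the post):

```python
def rows_of_six(values):
    """Split a flat list of ball strings into rows of six integers."""
    if len(values) % 6 != 0:
        raise ValueError("expected a multiple of six ball values")
    numbers = [int(v) for v in values]
    return [numbers[i:i + 6] for i in range(0, len(numbers), 6)]

# Illustrative input: two draws' worth of scraped text values
scraped = ["2", "8", "20", "25", "33", "45", "1", "4", "5", "6", "18", "47"]
print(rows_of_six(scraped))
```

The length check is there because a page change that adds or drops a cell would otherwise silently shift every later row.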

In [6]:

# Find the bonus number available at the class = "bonus-ball"
ball_bonus_class = soup.find_all('td', class_='bonus-ball')
ball_bonus = [pt.get_text() for pt in ball_bonus_class]
type(ball_bonus)

Out[6]:

list

In [7]:

df_ball_bonus = pd.DataFrame(ball_bonus,columns = ['Bonus'])

In [8]:

#join the two dataframes
df_2017 = pd.concat([df_ball, df_ball_bonus], axis=1)

In [9]:

#find all the dates, which are held in the <th> cells
date = soup.find_all('th')
date1 = [pt.get_text() for pt in date]

In [10]:

#convert to a dataframe
df_date = pd.DataFrame(date1,columns =['Date'])
#drop the first two rows, which hold the table headings (result date and draw result)
df_date = df_date[2:]

In [11]:

#reset the index so that it starts from 0 again, discarding the old index
df_date = df_date.reset_index(drop=True)

In [12]:

df_date.dtypes

Out[12]:

Date    object
dtype: object

In [13]:

#convert the date into date time
df_date = df_date.apply(pd.to_datetime)

In [14]:

#https://stackoverflow.com/questions/42693410/how-to-convert-object-to-date-in-pandas
df_date.dtypes

Out[14]:

Date    datetime64[ns]
dtype: object

In [15]:

df1 = pd.concat([df_date, df_2017], axis=1)

In [16]:

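#save the year to a CSV file; index=False avoids writing an extra unnamed index column
#df1.to_csv('2016.csv', index=False)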

Make a main file

In [17]:

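from glob import glob

#append every yearly CSV in the folder to one combined file,
#skipping the combined file itself
with open('main_csv.csv', 'a') as single_file:
    for csv_name in glob('*.csv'):
        if csv_name == 'main_csv.csv':
            continue
        #open each yearly file in a with block so that it is closed after copying
        with open(csv_name, 'r') as yearly_file:
            for line in yearly_file:
                single_file.write(line)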
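
The loop above copies every line, so each file's header row ends up in the combined file as well. One way to keep a single header is sketched below with the standard `csv` module; `merge_csv_files` is a hypothetical helper and the file names in the usage comment are assumptions:

```python
import csv

def merge_csv_files(paths, out_path):
    """Write the rows of several CSV files to out_path, keeping a single header."""
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                rows = list(csv.reader(in_file))
            if not rows:
                continue                      # skip empty files
            header, data = rows[0], rows[1:]
            if not header_written:
                writer.writerow(header)       # header from the first file only
                header_written = True
            writer.writerows(data)

# Example usage (file names are assumptions):
# from glob import glob
# yearly = sorted(p for p in glob("*.csv") if p != "main_csv.csv")
# merge_csv_files(yearly, "main_csv.csv")
```

This assumes all yearly files share the same columns in the same order.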

Analysis of winning numbers of Irish Lotto from 2nd September 2015 to 9th September 2017

In [18]:

df = pd.read_csv("main.csv")

In [19]:

df.head(2)

Out[19]:

	Date	a	b	c	d	e	f	Bonus
0	09/09/2017	2	8	20	25	33	45	7
1	06/09/2017	1	4	5	6	18	47	35

In [20]:

df.tail(2)

Out[20]:

	Date	a	b	c	d	e	f	Bonus
210	05/09/2015	7	9	17	20	26	27	40
211	02/09/2015	7	20	22	33	35	42	41
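
The dates in the file are written day first. A quick sanity check, sketched here with the standard library, is to parse them as day/month/year and confirm that they fall on the draw days, Wednesday and Saturday. The four strings are the dates shown in the head and tail of the table; the variable names are illustrative:

```python
from datetime import datetime

# Date strings as they appear in the first and last rows of the table
sample_dates = ["09/09/2017", "06/09/2017", "05/09/2015", "02/09/2015"]

for text in sample_dates:
    parsed = datetime.strptime(text, "%d/%m/%Y")   # day first, then month
    print(text, parsed.strftime("%A"))

# weekday() numbers Monday as 0, so Wednesday is 2 and Saturday is 5
weekday_numbers = {datetime.strptime(t, "%d/%m/%Y").weekday() for t in sample_dates}
print(weekday_numbers)
```

When loading the file with pandas, `pd.to_datetime(df['Date'], dayfirst=True)` applies the same day-first reading.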

In [21]:

#describe() gives summary statistics (count, mean, standard deviation, quartiles) for each column
df.describe()

Out[21]:

	a	b	c	d	e	f	Bonus
count	212.000000	212.000000	212.000000	212.000000	212.000000	212.000000	212.000000
mean	6.268868	12.731132	20.363208	27.080189	34.051887	40.929245	24.108491
std	5.393376	6.873838	7.814506	7.642993	7.334405	5.778674	13.820533
min	1.000000	2.000000	5.000000	6.000000	15.000000	19.000000	1.000000
25%	2.000000	8.000000	15.000000	22.000000	29.000000	38.000000	12.750000
50%	5.000000	11.000000	19.000000	27.000000	35.000000	42.000000	23.000000
75%	8.250000	16.000000	25.250000	32.000000	40.000000	45.000000	36.250000
max	33.000000	36.000000	44.000000	45.000000	46.000000	47.000000	47.000000

The Magic Touch

In 2009 Derren Brown hosted a live show alongside the UK lottery draw, announcing to viewers that he would try to predict that night's numbers. Brown correctly predicted all 6. His explanation, which was poorly received, was that he simply asked 24 people to predict 6 numbers each, added up the guesses for each ball, divided the totals by 24 and, voilà, somehow arrived at the winning numbers.

This analysis finds the most frequently drawn winning numbers of the Irish Lotto.

In [22]:

# https://stackoverflow.com/questions/6987285/python-find-the-item-with-maximum-occurrences-in-a-list

# https://stackoverflow.com/questions/1518522/python-most-common-element-in-a-list

#Select the numbers only and convert to a list
df_ball = df.drop(['Date','Bonus'], axis=1)

In [23]:

list_ball = list(df_ball.values.T.flatten())

In [24]:

## The most frequently occurring number in the list of numbers
from collections import Counter
most_common,num_most_common = Counter(list_ball).most_common(1)[0]

In [25]:

print(most_common)
print(num_most_common)

10
38

In [26]:

#https://stackoverflow.com/questions/3594514/how-to-find-most-common-elements-of-a-list
# find the 6 most common numbers in the list
most_common_6 = Counter(list_ball).most_common(6)
most_common_6

Out[26]:

[(10, 38), (7, 35), (27, 35), (16, 34), (2, 33), (18, 32)]

In [27]:

number_counter = {}
for number in list_ball:
    if number in number_counter:
        number_counter[number] += 1
    else:
        number_counter[number] = 1

popular_numbers = sorted(number_counter, key = number_counter.get, reverse = True)

print(popular_numbers)
print(number_counter)

[10, 7, 27, 16, 2, 18, 42, 45, 6, 15, 1, 17, 40, 9, 22, 28, 29, 31, 34, 5, 25, 32, 43, 8, 20, 38, 47, 4, 19, 37, 41, 11, 12, 14, 39, 44, 46, 30, 36, 3, 24, 26, 21, 23, 13, 33, 35]
{1: 30, 2: 33, 3: 22, 4: 26, 5: 28, 6: 31, 7: 35, 8: 27, 9: 29, 10: 38, 11: 25, 12: 25, 13: 18, 14: 25, 15: 31, 16: 34, 17: 30, 18: 32, 19: 26, 20: 27, 21: 20, 22: 29, 23: 20, 24: 21, 25: 28, 26: 21, 27: 35, 28: 29, 29: 29, 30: 23, 31: 29, 32: 28, 33: 18, 34: 29, 35: 17, 36: 23, 37: 26, 38: 27, 39: 25, 40: 30, 41: 26, 42: 32, 43: 28, 44: 24, 45: 32, 46: 24, 47: 27}

In [28]:

#for plot of the most popular numbers
import matplotlib.pyplot as plt
%matplotlib inline

#sort by ball number so the line runs from 1 to 47, then unpack the pairs into two tuples
x, y = zip(*sorted(number_counter.items()))

plt.plot(x, y)
plt.ylabel('Times drawn')
plt.xlabel("Numbers")
plt.title("Occurrence of numbers")

plt.show()

In [30]:

#print the dictionary with sorted values
dictionary_numbers = Counter(list_ball)
dictionary_numbers_sorted_keys = sorted(dictionary_numbers, key=dictionary_numbers.get, reverse=True)
for r in dictionary_numbers_sorted_keys:
    print("{}\t: \t {} times".format(r, dictionary_numbers[r]))

10	: 	 38 times
7	: 	 35 times
27	: 	 35 times
16	: 	 34 times
2	: 	 33 times
18	: 	 32 times
42	: 	 32 times
45	: 	 32 times
6	: 	 31 times
15	: 	 31 times
1	: 	 30 times
17	: 	 30 times
40	: 	 30 times
9	: 	 29 times
22	: 	 29 times
28	: 	 29 times
29	: 	 29 times
31	: 	 29 times
34	: 	 29 times
5	: 	 28 times
25	: 	 28 times
32	: 	 28 times
43	: 	 28 times
8	: 	 27 times
20	: 	 27 times
38	: 	 27 times
47	: 	 27 times
4	: 	 26 times
19	: 	 26 times
37	: 	 26 times
41	: 	 26 times
11	: 	 25 times
12	: 	 25 times
14	: 	 25 times
39	: 	 25 times
44	: 	 24 times
46	: 	 24 times
30	: 	 23 times
36	: 	 23 times
3	: 	 22 times
24	: 	 21 times
26	: 	 21 times
21	: 	 20 times
23	: 	 20 times
13	: 	 18 times
33	: 	 18 times
35	: 	 17 times

Picking the most commonly drawn numbers

One approach would be to choose the numbers that come up most often. At the moment the most frequently drawn ball is number 10, which has come up 38 times. The next most frequent numbers are:

7: 35 times
27: 35 times
16: 34 times
2: 33 times
18: 32 times
42: 32 times
45: 32 times

Their frequency of appearance is no indication that they will be drawn together. In fact, the chance of these numbers cropping up in a winning combination is the same as any other set of six.

However, over this period you can see that certain numbers have appeared more often than others.
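
How much variation would chance alone produce? If every ball is equally likely, a given number appears in any one draw with probability 6/47, so its count over 212 draws follows a binomial distribution. The sketch below, with illustrative variable names, compares the count for number 10 with that expectation:

```python
from math import sqrt

draws = 212            # draws in the dataset
pool = 47              # balls in the drum
picked = 6             # main numbers per draw

p = picked / pool                       # chance a given number is in one draw
expected = draws * p                    # expected appearances per number
spread = sqrt(draws * p * (1 - p))      # binomial standard deviation

top_count = 38                          # appearances of number 10 in the data
z = (top_count - expected) / spread

print(f"expected {expected:.1f} +/- {spread:.1f}; 38 is {z:.1f} SDs above")
```

With 47 numbers in play, seeing one or two counts a little over two standard deviations from the expected value is not unusual, so this alone is weak evidence that any ball is favoured.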

In [31]:

#count the unique values of the bonus ball to find the most frequently occurring numbers
df['Bonus'].value_counts()

Out[31]:

35    10
13     8
9      8
31     8
3      7
5      7
16     6
17     6
19     6
7      6
29     6
47     6
37     6
39     6
44     6
40     6
41     5
6      5
28     5
34     5
22     5
20     5
4      4
14     4
45     4
43     4
18     4
46     4
21     4
27     4
10     4
38     4
30     4
8      3
23     3
12     3
15     3
32     3
1      3
24     2
25     2
33     2
42     2
2      2
11     1
36     1
Name: Bonus, dtype: int64

Hence 35, 13, 9 and 31 are the most common bonus numbers.

The lottery selects random numbers.

For comparison, you can generate 10,000,000 random numbers in the range 1 to 47 and calculate their mean and standard deviation.

In [32]:

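#note: np.random.randint excludes the upper bound, so (1, 47) samples 1 to 46;
#use np.random.randint(1, 48, size=10000000) to include ball 47
list_random = np.random.randint(1,47, size=10000000)
print(np.mean(list_random))
print(np.std(list_random))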

23.5031823
13.2781247574
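
The simulated values can be compared with the exact mean and standard deviation of a uniform pick, which need no sampling. A small sketch using the standard `statistics` module (the variable names are illustrative); it also shows the effect of leaving out ball 47, which is what an upper bound of 47 in `np.random.randint` does:

```python
from statistics import mean, pstdev

balls = range(1, 48)                  # 1 to 47 inclusive

print(mean(balls))                    # exact mean of a uniform pick
print(pstdev(balls))                  # population standard deviation

# np.random.randint(1, 47) excludes 47, so it samples 1 to 46 instead
balls_without_47 = range(1, 47)
print(mean(balls_without_47), pstdev(balls_without_47))
```

The second pair of values is the one the simulation above approaches.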

Mean of winning numbers from last two years

In [33]:

#mean
np.mean(list_ball)

Out[33]:

23.570754716981131

In [34]:

#standard deviation
np.std(list_ball)

Out[34]:

13.737419404066852

This shows that both the randomly generated numbers and the winning numbers from the last two years have a mean of approximately 23 and a standard deviation of approximately 13.

In [35]:

np.median(list_ball)

Out[35]:

23.0

Analysis of winning numbers of the Irish Lotto

Assuming a normal probability distribution, you can use the z value to express how many standard deviations a number lies from the mean.

Suppose the population of all lotto numbers is normally distributed with a mean of 23 and a standard deviation of 14.

You can investigate this assumption by calculating the probability that a number X falls in a given range.

Take two of the frequently drawn numbers, 10 and 42, and consider P(10 ≤ X ≤ 42). This probability is the area between 10 and 42 under the normal curve.

We can calculate the Z-score or standard score.

The basic z score formula for a sample is: z = (x – μ) / σ

In [36]:

z_10 = (10 - 23) / 14
z_42 = (42-23) / 14
print("10 is  {} standard deviation below the mean m = 23".format(z_10))
print("42 is {} standard deviation above the mean m = 23".format(z_42))

10 is  -0.9285714285714286 standard deviation below the mean m = 23
42 is 1.3571428571428572 standard deviation above the mean m = 23

The area between 10 and 42 under a normal curve with mean m = 23 and standard deviation d = 14 equals the area between –0.92 and 1.35 under the standard normal curve. This equals the area between –0.92 and 0 plus the area between 0 and 1.35.

Z is the standard normal random variable. The table value used here for z is the area under the standard normal curve between 0 and z.

The normal table tells us that the area between –0.92 and 0, which by symmetry equals the area between 0 and 0.92, is 0.3212. It also tells us that the area between 0 and 1.35 is 0.4115. Hence, the probability is 0.3212 + 0.4115 = 0.7327.

Under this normal approximation, about 73.27 percent of all the numbers drawn would lie between 10 and 42.
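
The table lookup can be cross-checked in code. The sketch below assumes Python 3.8+ for `statistics.NormalDist` and uses the same mean of 23 and standard deviation of 14; because it does not round the z values to two decimals, its result differs slightly from the table figure. For reference it also prints the share expected if every ball from 1 to 47 is equally likely:

```python
from statistics import NormalDist

# Normal approximation used in the text: mean 23, standard deviation 14
approx = NormalDist(mu=23, sigma=14)
p_normal = approx.cdf(42) - approx.cdf(10)

# Exact share if every ball from 1 to 47 is equally likely (33 of 47 balls)
p_uniform = len(range(10, 43)) / 47

print(f"normal approximation: {p_normal:.4f}")
print(f"uniform 1-47:         {p_uniform:.4f}")
```

The gap between the two figures gives a rough sense of how well the normal assumption fits numbers that are drawn uniformly.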

Please leave any comments or queries.


9 Responses

Viet Nguyen says:

December 30, 2017 at 5:37 am

Hello Admin,
I have a problem with my code
It not similar to you
In my data, i hava 1 unexpection column that is ‘Unname’ in df_2017 such as: ,Date, a,b,c,d,e,f, Bonus.
In font of ,Date i have a column Unname, When i load df.head(2).
For example:
Unname Date a b c d e f Bonus
210 210 05/09/2015 7 9 17 20 26 27 40
211 211 02/09/2015 7 20 22 33 35 42 41
What can i do drop uname?
I am looking foward to you repone
Thanks you so much
Best regards.
  piush vaish says:
  
  January 5, 2018 at 7:16 pm
  
  del df_2017.Unname
    Viet Nguyen says:
    
    January 10, 2018 at 3:15 pm
    
    Can you help me?
    I want to predict next lotto number with python,
    Can You show me your code?
    Thanks you very much
      Master Wenom says:
      
      January 9, 2020 at 12:49 am
      
      Hello, I would like to receive the same information!
Viet Nguyen says:

January 7, 2018 at 9:56 am

Thanks for your respone
Master Wenom says:

February 10, 2020 at 11:04 pm

Hello! Do you have forecasting for the next run based on these statistics?
Killian says:

November 7, 2020 at 5:07 pm

Hi. I am trying to run this script but get_text does not get any information for class_=’ball’ would you happen to know why? It works for the date and interestingly it works if i get_text() just for “td”
In
# Find the 6 lotto numbers available at the class = “Ball”
ball_class = soup.find_all(“td”, class_=”ball”)
#get the text using list comprehension
ball = [pt.get_text() for pt in ball_class]
ball
print(ball)
type(ball)
out
[]
list
Thanks in advance
Rob says:

February 16, 2021 at 7:11 am

Not so great article, difficult to follow it.
Sam Korichi says:

July 22, 2021 at 8:49 pm

Hello,
I have a csv file of 3 years lottery results
How can I use your Web Service to predict a future results ?

Thans’s in advance
Kind regards