# Time Series Data

Time series are a big part of our everyday lives: they are used in medicine (EEG analysis), finance (stock prices) and electronics (sensor data analysis). Many machine learning models have been created to tackle these kinds of tasks; two examples are ARIMA (AutoRegressive Integrated Moving Average) models and RNNs (Recurrent Neural Networks).

# Introduction

I have recently been working on a stock market dataset from Kaggle that provides daily price and volume data for all US-based stocks. If you want to find out more, all my code is freely available on my Kaggle and GitHub profiles.

# ARIMA (AutoRegressive Integrated Moving Average)

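The acronym ARIMA stands for:

• AutoRegressive = the model uses the relationship between an observation and a number of lagged (previous) observations.
• Integrated = differencing of raw observations (e.g. subtracting the observation at the previous time step from the current one) to make the series stationary.
• Moving Average = the model uses the relationship between an observation and the residual errors of past forecasts.
• p = the number of lag observations (the autoregressive order).
• d = the degree of differencing.
• q = the size of the moving average window.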
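To make `d` concrete, here is a minimal NumPy sketch of differencing and of undoing it. The `difference` and `undifference_once` helpers and the toy prices are hypothetical, for illustration only. With `d = 1` the model works on step-to-step changes instead of raw prices, and a forecast of the next change is turned back into a price by adding it to the last observed value.

```python
import numpy as np

def difference(series, d=1):
    """Apply d rounds of first-order differencing: y'_t = y_t - y_{t-1}."""
    out = np.asarray(series, dtype=float)
    for _ in range(d):
        out = np.diff(out)  # each round shortens the series by one element
    return out

def undifference_once(first_value, diffs):
    """Invert one round of differencing, given the original first observation."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffs)))

# Hypothetical toy prices
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.1])
diffs = difference(prices, d=1)                 # changes between consecutive steps
restored = undifference_once(prices[0], diffs)  # cumulative sum rebuilds the prices
print(diffs)
print(restored)
```

Each extra round of differencing removes one more element, so `d = 2` on five prices leaves three values.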

# Analysis

For the following code exercise I used these libraries and dependencies:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot
# statsmodels 0.13 removed statsmodels.tsa.arima_model;
# the maintained ARIMA class lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
```
```python
# Load Microsoft's daily prices and replace missing values with 0
df = pd.read_csv("../input/Data/Stocks/msft.us.txt").fillna(0)
df.head()
```
```python
# Lag plot: scatter each opening price against the one 5 rows (trading days) later
plt.figure(figsize=(10, 10))
lag_plot(df['Open'], lag=5)
plt.title('Microsoft Autocorrelation plot')
```
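If the points of a lag plot hug the diagonal, the series is strongly autocorrelated, which is what the AR part of ARIMA exploits. As a numeric counterpart, pandas' `Series.autocorr(lag=...)` computes the Pearson correlation between a series and a copy of itself shifted by `lag` steps. The sketch below uses a synthetic Gaussian random walk as an assumed stand-in for a price path: its levels are highly autocorrelated, while its first differences are close to uncorrelated. A near-1 autocorrelation in the levels is a typical sign of a non-stationary series, the situation differencing (`d`) is meant to handle.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: a Gaussian random walk standing in for prices
rng = np.random.default_rng(0)
walk = pd.Series(100 + np.cumsum(rng.normal(0.0, 1.0, 1000)))

# lag_plot(walk, lag=5) would scatter walk[t] against walk[t + 5];
# autocorr(lag=5) summarises that picture as a single correlation.
level_ac = walk.autocorr(lag=5)

# The steps of a random walk are independent noise, so after
# first differencing the lag-5 correlation is close to zero.
diff_ac = walk.diff().dropna().autocorr(lag=5)

print(f"lag-5 autocorrelation of levels:      {level_ac:.3f}")
print(f"lag-5 autocorrelation of differences: {diff_ac:.3f}")
```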
```python
# Chronological 80/20 split: first 80% of rows for training, last 20% for testing
train_data, test_data = df[0:int(len(df)*0.8)], df[int(len(df)*0.8):]

plt.figure(figsize=(12, 7))
plt.title('Microsoft Prices')
plt.xlabel('Dates')
plt.ylabel('Prices')
plt.plot(df['Open'], 'blue', label='Training Data')
plt.plot(test_data['Open'], 'green', label='Testing Data')
# Label every 1300th row with its date
plt.xticks(np.arange(0, len(df), 1300), df['Date'].iloc[::1300])
plt.legend()
```
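The split above is chronological rather than random: shuffling the rows would let the model train on prices from after the test period. A minimal sketch of the same 80/20 cut as a reusable helper; `chronological_split` and the toy frame (with the same `Date`/`Open` columns as the Kaggle file) are hypothetical.

```python
import pandas as pd

def chronological_split(frame, train_frac=0.8):
    """Split rows in time order: the first train_frac go to training, the rest to testing.

    No shuffling, so every training row precedes every test row.
    """
    cut = int(len(frame) * train_frac)
    return frame.iloc[:cut], frame.iloc[cut:]

# Hypothetical toy frame mimicking the dataset's columns
toy = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "Open": [float(x) for x in range(10)],
})
train, test = chronological_split(toy)
print(len(train), len(test))                     # 8 2
print(train["Date"].max() < test["Date"].min())  # True
```

`iloc` keeps the original index, so the test rows still carry their positions in the full frame, which is what the plotting code relies on when it uses `test_data.index` as x-coordinates.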
```python
def smape_kun(y_true, y_pred):
    # Symmetric mean absolute percentage error, on a 0-200 scale
    return np.mean((np.abs(y_pred - y_true) * 200 / (np.abs(y_pred) + np.abs(y_true))))
```
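In formula form the function computes SMAPE = (100 / n) · Σ 2|ŷ_t − y_t| / (|ŷ_t| + |y_t|), a percentage between 0 and 200. The sketch below re-implements it alongside a plain MSE helper to give a worked example and to show why the two reported metrics live on different scales: MSE is in squared price units and grows with the price level, while SMAPE is unchanged when every price is multiplied by a constant. The `smape`/`mse` names and the toy values are illustrative, not from the original notebook.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE on a 0-200 scale (same formula as smape_kun)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_pred - y_true) * 200 / (np.abs(y_pred) + np.abs(y_true)))

def mse(y_true, y_pred):
    """Mean squared error, in squared units of the data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical toy values, not taken from the Microsoft data
actual = np.array([100.0, 200.0])
forecast = np.array([110.0, 180.0])

# Per-point terms: 10*200/210 ≈ 9.52 and 20*200/380 ≈ 10.53, mean ≈ 10.03
print(smape(actual, forecast))

# Symmetric in its arguments, and bounded: forecasting 0 for a nonzero value scores 200
print(smape(forecast, actual), smape([5.0], [0.0]))

# Multiplying every price by 10 multiplies MSE by 100 but leaves SMAPE unchanged
print(mse(actual, forecast), mse(10 * actual, 10 * forecast))
print(smape(10 * actual, 10 * forecast))
```

One caveat of this formula: a point where both the actual and predicted values are 0 divides by zero and yields `nan`.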
```python
train_ar = train_data['Open'].values
test_ar = test_data['Open'].values

# Walk-forward validation: refit ARIMA(5,1,0) on all data seen so far,
# forecast one step ahead, then append the true observation to the history
history = [x for x in train_ar]
predictions = list()
for t in range(len(test_ar)):
    model = ARIMA(history, order=(5, 1, 0))
    model_fit = model.fit()
    output = model_fit.forecast()  # array holding the single next-step forecast
    yhat = output[0]
    predictions.append(yhat)
    obs = test_ar[t]
    history.append(obs)

error = mean_squared_error(test_ar, predictions)
print('Testing Mean Squared Error: %.3f' % error)
error2 = smape_kun(test_ar, predictions)
print('Symmetric mean absolute percentage error: %.3f' % error2)
```
```
Testing Mean Squared Error: 0.343
Symmetric mean absolute percentage error: 40.776
```
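To see roughly what `order=(5, 1, 0)` asks for, here is a NumPy-only sketch: difference the series once (d = 1), regress each change on its 5 predecessors (p = 5, with no moving-average terms since q = 0), and add the predicted change back to the last price. It is an approximation under stated assumptions: it fits by ordinary least squares, whereas statsmodels estimates ARIMA by maximum likelihood, so forecasts will not match exactly. It omits a constant, mirroring statsmodels' default of no trend term when d ≥ 1. The hypothetical `walk_forward` helper mirrors the loop above: forecast one step, then reveal the true value.

```python
import numpy as np

def ar_diff_forecast(history, p=5):
    """One-step forecast in the spirit of ARIMA(p, 1, 0), fitted by least squares (hypothetical helper)."""
    y = np.asarray(history, dtype=float)
    dy = np.diff(y)                  # d = 1: work on step-to-step changes
    n = len(dy)
    # Column k holds the lag-k change, aligned with the targets dy[p:]
    X = np.column_stack([dy[p - k : n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, dy[p:], rcond=None)
    recent = dy[::-1][:p]            # lag 1 first, matching the column order
    next_change = coef @ recent
    return y[-1] + next_change       # undo the differencing

def walk_forward(train, test, p=5):
    """Refit on everything seen so far, forecast one step, then append the true value."""
    history = list(train)
    predictions = []
    for obs in test:
        predictions.append(ar_diff_forecast(history, p))
        history.append(obs)
    return np.array(predictions)

# Hypothetical sanity check on a straight line: every change is 2, so forecasts are exact
t = np.arange(60, dtype=float)
line = 2.0 * t + 1.0
preds = walk_forward(line[:48], line[48:])
print(np.max(np.abs(preds - line[48:])))
```

A straight line is a deliberately easy case; on real prices the changes are noisy and the lag coefficients end up small, so the forecast stays close to the last observed price.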
```python
# Full price history with the test-period predictions overlaid
plt.figure(figsize=(12, 7))
plt.plot(df['Open'], color='blue', label='Training Data')
plt.plot(test_data.index, predictions, color='green', marker='o', linestyle='dashed',
         label='Predicted Price')
plt.plot(test_data.index, test_data['Open'], color='red', label='Actual Price')
plt.title('Microsoft Prices Prediction')
plt.xlabel('Dates')
plt.ylabel('Prices')
plt.xticks(np.arange(0, len(df), 1300), df['Date'].iloc[::1300])
plt.legend()
```
```python
# Zoom in on the test period only
start = test_data.index[0]  # first test row (6386 in the original run)
plt.figure(figsize=(12, 7))
plt.plot(test_data.index, predictions, color='green', marker='o', linestyle='dashed',
         label='Predicted Price')
plt.plot(test_data.index, test_data['Open'], color='red', label='Actual Price')
plt.title('Microsoft Prices Prediction')
plt.xlabel('Dates')
plt.ylabel('Prices')
plt.xticks(np.arange(start, len(df), 300), df['Date'].iloc[start::300])
plt.legend()
```