Deploy Serverless Machine Learning Model to AWS

In this post, we will deploy a machine learning model to AWS Lambda using the Serverless Framework. The set-up of Serverless is discussed here.

Let’s create a directory

mkdir scikit-regression && cd scikit-regression

On Windows 10, we can create a virtual environment and activate it as follows:

Install Virtualenv

In your VS Code terminal, type:

pip install virtualenv

Create the virtual environment

virtualenv env

Activate virtualenv

On Windows, virtualenv (venv) creates a batch file called

env\Scripts\activate.bat

To activate the environment on Windows, run the activate script in the Scripts folder:

\path\to\env\Scripts\activate

Example:

C:\Users\'Username'\env\Scripts\activate.bat

Create a requirements.txt file

Add the scikit-learn version

scikit-learn==0.22.0

Run in your command prompt

pip install -r requirements.txt

For more information on how to set up a virtual environment, please visit here.
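
Before installing packages, it can help to confirm that the interpreter is really running inside the virtual environment. The following is a minimal sketch using only the standard library; the function name in_virtualenv is our own and is not part of virtualenv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False, the activate script has probably not been run in the current terminal.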

Train and Save Model

Linear Regression on Boston Housing Dataset

This data was originally part of the UCI Machine Learning Repository and has since been removed from it. It also ships with the scikit-learn version pinned above (load_boston was removed in scikit-learn 1.2). The data set contains 506 samples and 13 feature variables. The objective is to predict house prices from the given features.

The description of all the features is given below:

CRIM: Per capita crime rate by town

ZN: Proportion of residential land zoned for lots over 25,000 sq. ft

INDUS: Proportion of non-retail business acres per town

CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

NOX: Nitric oxide concentration (parts per 10 million)

RM: Average number of rooms per dwelling

AGE: Proportion of owner-occupied units built prior to 1940

DIS: Weighted distances to five Boston employment centers

RAD: Index of accessibility to radial highways

TAX: Full-value property tax rate per $10,000

B: 1000(Bk – 0.63)², where Bk is the proportion of [people of African American descent] by town

LSTAT: Percentage of lower status of the population

MEDV: Median value of owner-occupied homes in $1000s

We are going to use three variables, ‘LSTAT’, ‘AGE’ and ‘RM’, as features and MEDV as the target variable.

# %%

from sklearn.datasets import load_boston
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import joblib
import time
import numpy as np

RANDOM_STATE = 42


# %%
boston_dataset = load_boston()


# %%
print(boston_dataset.DESCR)


# %%
boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
boston.head()


# %%
boston.info()


# %%
boston.describe()


# %%
# Prepare the data for training
X = pd.DataFrame(np.c_[boston['LSTAT'], boston['AGE'], boston['RM']], columns = ['LSTAT','AGE','RM'])
_, y = load_boston(return_X_y=True)

# %% [markdown]
# Splitting the data into training and testing sets

# %%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=RANDOM_STATE)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)


# %%
def create_model():
    model = Pipeline([
        ('scaler', StandardScaler()),
        ('selector', SelectKBest(score_func=f_regression, k='all')),
        ('lr', LinearRegression())
    ])
    return model


# %%
model = create_model()


# %%
model.fit(X_train, y_train)


# %%
y_pred_test = model.predict(X_test)
print(mean_squared_error(y_test, y_pred_test))


# %%
model = create_model()


# %%
model.fit(X_train, y_train)


# %%
model_id = str(time.time())
model_name = 'model_' + model_id + '.joblib'
joblib.dump(model, model_name, compress=False)
print(model_name, 'saved.')
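
Because the file name contains the value of time.time() at save time, every training run produces a differently named file, and handler.py has to reference the exact name. The following is a small sketch, under the assumption that the files follow the model_<timestamp>.joblib scheme used above, of how the newest file could be located. The helper name latest_model_path is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_model_path(directory: str = ".") -> Optional[Path]:
    """Return the newest model_<timestamp>.joblib file in directory, or None."""
    candidates = []
    for path in Path(directory).glob("model_*.joblib"):
        # File names look like model_1616253959.4820366.joblib, so the text
        # between "model_" and ".joblib" is the time.time() value at save time.
        stamp = path.name[len("model_"):-len(".joblib")]
        try:
            candidates.append((float(stamp), path))
        except ValueError:
            continue  # skip files that do not follow the naming scheme
    if not candidates:
        return None
    # The largest timestamp belongs to the most recently saved model.
    return max(candidates, key=lambda item: item[0])[1]
```

Calling latest_model_path() in the project directory returns the path to copy into handler.py, or None when no model has been saved yet.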

Create Serverless Project and Handler Prototype

sls create --template aws-python3 --name boston-housing

Install a plugin for Python requirements

The command below automatically adds the plugin to your project’s package.json and to the plugins section of its serverless.yml. The plugin then bundles the Python dependencies listed in your requirements.txt or Pipfile when you run sls deploy. We also pin a particular version of the plugin.

Link for more info

sls plugin install -n serverless-python-requirements@4.2.4

handler.py

import json
import joblib

model_name = 'model_1616253959.4820366.joblib'
model = joblib.load(model_name)

def predict(event, context):
    body = {
        "message": "OK",
    }

    # API Gateway sets queryStringParameters to None when no query string is sent
    params = event.get('queryStringParameters')
    if params:
        # query string values arrive as strings, so convert them to floats
        LSTAT = float(params['LSTAT'])
        AGE = float(params['AGE'])
        RM = float(params['RM'])

        inputVector = [LSTAT, AGE, RM]
        data = [inputVector]
        predictedPrice = float(model.predict(data)[0])  # MEDV is in units of $1000s
        predictedPrice = round(predictedPrice, 2)
        body['predictedPrice'] = predictedPrice

    else:
        body['message'] = 'queryStringParameters not in event.'

    print(body['message'])

    response = {
        "statusCode": 200,
        "body": json.dumps(body),
        "headers": {
            "Content-Type": 'application/json',
            "Access-Control-Allow-Origin": "*"
        }
    }

    return response


# to test locally
def do_main():
    event = {
        'queryStringParameters': {
            'LSTAT': 7.14,
            'AGE': 28.14,
            'RM': 6.62
        }
    }

    response = predict(event, None)
    body = json.loads(response['body'])
    print('Price:', body['predictedPrice'])

    with open('event.json', 'w') as event_file:
        event_file.write(json.dumps(event))


# do_main()
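
When the function is called through API Gateway, the query string values arrive as strings, and queryStringParameters is None when the request carries no query string. The following is a sketch of how the parameters could be validated before they reach the model. The names FEATURE_NAMES and parse_features are hypothetical and do not appear in handler.py above.

```python
from typing import List, Mapping, Optional

FEATURE_NAMES = ("LSTAT", "AGE", "RM")  # order must match the training columns


def parse_features(params: Optional[Mapping[str, object]]) -> List[float]:
    """Convert query string values to floats in training order.

    Raises ValueError with a readable message when a parameter is
    missing or is not a number.
    """
    if not params:
        raise ValueError("queryStringParameters is missing or empty.")
    values = []
    for name in FEATURE_NAMES:
        if name not in params:
            raise ValueError("Missing query parameter: " + name)
        try:
            values.append(float(params[name]))
        except (TypeError, ValueError):
            raise ValueError("Query parameter " + name + " is not a number.")
    return values
```

One way to use it is to call parse_features inside predict, catch ValueError, and return the message with statusCode 400 instead of 200, so that a bad request is not reported as a success.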

Test function locally using Serverless

# invoke lambda function locally
# change serverless.yml file
service: boston-housing # NOTE: update this with your service name
functions:
  predict-price:
    handler: handler.predict
    memorySize: 512
    timeout: 30
    events:
      - http:
          path: get-price
          method: get
          request:
            parameters:
              queryStrings:
                LSTAT: true
                AGE: true
                RM: true

# event.json file we created earlier using the local test
sls invoke local --function predict-price --path event.json

You can also test the function by invoking the deployed version or by writing unit tests.
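
As an illustration of the unit test route, here is a sketch built on the standard unittest module. It is not the handler itself: importing handler.py loads the .joblib file at import time, so this example uses a hypothetical make_predict factory that mirrors the handler's response shape and a hypothetical StubModel in place of the trained pipeline.

```python
import json
import unittest


class StubModel:
    """Stands in for the trained pipeline; always predicts the same value."""

    def predict(self, rows):
        return [21.5 for _ in rows]


def make_predict(model):
    """Build a handler with the same response shape as handler.predict."""
    def predict(event, context):
        body = {"message": "OK"}
        params = event.get("queryStringParameters")
        if params:
            row = [float(params[name]) for name in ("LSTAT", "AGE", "RM")]
            body["predictedPrice"] = round(float(model.predict([row])[0]), 2)
        else:
            body["message"] = "queryStringParameters not in event."
        return {"statusCode": 200, "body": json.dumps(body)}
    return predict


class PredictHandlerTest(unittest.TestCase):
    def setUp(self):
        self.predict = make_predict(StubModel())

    def test_returns_price_for_valid_parameters(self):
        event = {"queryStringParameters": {"LSTAT": "7.14", "AGE": "28.14", "RM": "6.62"}}
        body = json.loads(self.predict(event, None)["body"])
        self.assertEqual(body["predictedPrice"], 21.5)

    def test_reports_missing_parameters(self):
        body = json.loads(self.predict({}, None)["body"])
        self.assertNotIn("predictedPrice", body)
```

Saved in a file such as test_handler.py (a name chosen here for illustration), the tests can be run with python -m unittest. In a real project the same assertions could target handler.predict directly once the model file is present.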

Deploy Model to AWS

serverless.yml



service: boston-housing # NOTE: update this with your service name


provider:
  name: aws
  runtime: python3.8
  lambdaHashingVersion: 20201221

  stage: dev
  region: us-east-1

# you can add packaging information here
package:
#  include:
#    - include-me.py
#    - include-me-dir/**
  exclude:
    - node_modules/**
    - .vscode/**
    - __pycache__/**
    - .ipynb_checkpoints/**
    - '*.ipynb'
    - env/**


functions:
  predict-price:
    handler: handler.predict
    memorySize: 512
    timeout: 30
    events:
      - http:
          path: get-price
          method: get
          request:
            parameters:
              queryStrings:
                LSTAT: true
                AGE: true
                RM: true

plugins:
  - serverless-python-requirements
custom:
  pythonRequirements:
    dockerizePip: non-linux
    slim: true

Deploy using the following commands:

set AWS_ACCESS_KEY_ID=<your-key-here>
set AWS_SECRET_ACCESS_KEY=<your-secret-key-here>
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are now available for serverless to use
sls deploy

To learn how to create AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY please visit here.
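
A quick way to check that both variables are visible to the current terminal session before running sls deploy is sketched below. The helper name missing_credentials is our own. Note that Serverless can also read credentials from an AWS profile, so an empty result here is sufficient but not strictly required.

```python
import os
from typing import List, Mapping

REQUIRED_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials(os.environ)
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("Both credential variables are set.")
```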

Invoke the deployed function

sls invoke --function predict-price --path event.json
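
Since the function is exposed through an http event, the deployed endpoint can also be called over plain HTTP. The following sketch uses only the standard library. The ENDPOINT value is a placeholder, not a real URL: replace it with the endpoint that sls deploy prints for the get-price path. The helper names build_url and get_price are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the endpoint printed by `sls deploy`.
ENDPOINT = "https://<api-id>.execute-api.us-east-1.amazonaws.com/dev/get-price"


def build_url(endpoint: str, lstat: float, age: float, rm: float) -> str:
    """Append the three features to the endpoint as a query string."""
    query = urllib.parse.urlencode({"LSTAT": lstat, "AGE": age, "RM": rm})
    return endpoint + "?" + query


def get_price(endpoint: str, lstat: float, age: float, rm: float) -> float:
    """Call the deployed function and return predictedPrice from the JSON body."""
    url = build_url(endpoint, lstat, age, rm)
    with urllib.request.urlopen(url, timeout=30) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["predictedPrice"]
```

With a real endpoint in place, get_price(ENDPOINT, 7.14, 28.14, 6.62) sends the same values as event.json and returns the prediction in units of $1000s.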

To debug

set SLS_DEBUG=true
# or view the function logs using
sls logs --function predict-price

References

https://www.udemy.com/course/deploy-serverless-machine-learning-models-to-aws-lambda/
https://github.com/serverless/examples
