This first blog post (MLOps#1) provides an overview of the food recipe recommender system. I will share blog posts as I work through the development of a machine learning project all the way to production, using MLOps to deploy the application. MLOps is an engineering discipline that combines machine learning, DevOps, and data engineering. Its aim is to deploy and maintain ML systems in production reliably and efficiently, standardizing and streamlining the continuous delivery of high-performing models. The project builds a food recipe recommender system from scratch. I will try to explain concepts as simply as possible and include examples and references wherever I can.
Data for Food Recipe Recommender System
Data is sourced from Kaggle, thanks to Li Yangshu. It consists of over 230k recipes, over 220k users, and over 1 million recipe reviews. Future posts will cover different models, deployment methods, and CI/CD pipeline automation. An automated CI/CD system lets us rapidly explore new ideas around feature engineering, model architecture, and hyperparameters: new implementations are automatically built, tested, and deployed as pipeline components to the target environment. The project will not stop at deploying a model as a prediction API; it will deploy an ML pipeline that automates the retraining and deployment of new models. Setting up a CI/CD system enables us to automatically test and deploy new pipeline implementations, which helps us cope with rapid changes in data and the business environment.
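As a toy illustration of the kind of aggregation this review data supports, here is a minimal sketch computing each recipe's average rating. The simplified schema (`user_id`, `recipe_id`, `rating`) is an assumption for illustration; the actual Kaggle files contain richer fields:

```python
from collections import defaultdict

# Toy review records; the real dataset has over 1 million reviews
# with a richer schema (this simplified shape is an assumption).
reviews = [
    {"user_id": 1, "recipe_id": 10, "rating": 5},
    {"user_id": 2, "recipe_id": 10, "rating": 3},
    {"user_id": 1, "recipe_id": 20, "rating": 4},
]

def average_ratings(reviews):
    """Aggregate ratings per recipe and return {recipe_id: mean_rating}."""
    totals = defaultdict(lambda: [0, 0])  # recipe_id -> [sum, count]
    for r in reviews:
        totals[r["recipe_id"]][0] += r["rating"]
        totals[r["recipe_id"]][1] += 1
    return {rid: s / c for rid, (s, c) in totals.items()}

print(average_ratings(reviews))  # {10: 4.0, 20: 4.0}
```

Even a simple aggregate like this can serve as a popularity baseline to compare later recommender models against.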
The pipeline consists of the following stages:
- Development and experimentation: Iteratively trying out new ML algorithms and new modeling ideas. The output of this stage is the source code of the ML pipeline steps, which is pushed to a source repository.
- Pipeline continuous integration: The source code is built and various tests are run; the outputs are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.
- Pipeline continuous delivery: The artifacts produced by the CI stage are deployed to the target environment with the new implementation of the model.
- Automated triggering: The pipeline is automatically executed in production, either on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry.
- Model continuous delivery: The trained model is deployed as a prediction service that serves predictions to clients.
- Monitoring: Collect statistics on the model performance based on live data.
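To make the stages above concrete, here is a minimal sketch of such a pipeline as plain Python functions. The stage names follow the list above, but the model, registry, and deployment logic are hypothetical stand-ins for illustration, not a real orchestration framework:

```python
def train_model(data):
    """Development/experimentation stage: fit a trivial 'model'
    (the global mean rating) as a stand-in for a real recommender."""
    ratings = [r["rating"] for r in data]
    return {"mean_rating": sum(ratings) / len(ratings)}

def run_pipeline_tests(model):
    """Pipeline CI stage: run basic checks before anything is deployed."""
    assert "mean_rating" in model
    return True

def push_to_registry(model, registry):
    """Automated triggering output: version and store the trained model."""
    version = len(registry) + 1
    registry[version] = model
    return version

def deploy_prediction_service(registry, version):
    """Model continuous delivery: expose the registered model
    as a callable prediction function."""
    model = registry[version]
    return lambda user_id, recipe_id: model["mean_rating"]

# One automated run of the pipeline, e.g. fired by a schedule or trigger.
registry = {}
data = [{"user_id": 1, "recipe_id": 10, "rating": 4},
        {"user_id": 2, "recipe_id": 10, "rating": 2}]
model = train_model(data)
run_pipeline_tests(model)
version = push_to_registry(model, registry)
predict = deploy_prediction_service(registry, version)
print(predict(1, 10))  # 3.0
```

In a production setting, each of these functions would be replaced by a proper pipeline component, and the monitoring stage would feed live performance statistics back into the triggering logic.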
In the next post, I discuss how to save a trained machine learning model.
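As a small preview of what saving a model can look like, here is a hedged sketch using Python's standard-library `pickle` (the next post may well use a different tool such as `joblib`; the `SimpleModel` class is a hypothetical stand-in):

```python
import pickle

class SimpleModel:
    """Hypothetical stand-in for a trained recommender."""
    def __init__(self, mean_rating):
        self.mean_rating = mean_rating

    def predict(self, user_id, recipe_id):
        return self.mean_rating

model = SimpleModel(mean_rating=4.2)

# Serialize the trained model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back, e.g. inside a prediction service.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(1, 10))  # 4.2
```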