Machine Learning in Production / AI Engineering (17-445/17-645/17-745/11-695)

*Formerly Software Engineering for AI-Enabled Systems (SE4AI), CMU course that covers how to build, deploy, assure, and maintain applications with machine-learned models. Covers responsible AI (safety, security, fairness, explainability, transparency) and MLOps.*

Course topics overview

In 2022, the class will be offered in both the spring and fall semesters. In 2023, it will be offered only in the spring. The class has no formal prerequisites, but expects basic programming skills and some familiarity with machine learning concepts.

See the specific offering of the course you are interested in:

Fall 2019: F2019 website and F2019 GitHub branch
Summer 2020 (with video recordings): S2020 website and S2020 GitHub branch
Fall 2020: F2020 website and F2020 GitHub branch
Spring 2021: S2021 website and S2021 GitHub branch
Spring 2022: S2022 website and S2022 GitHub branch
Fall 2022: F2022 website and F2022 GitHub branch
Spring 2023: S2023 website and S2023 on GitHub

For researchers, educators, or others interested in this topic, we share all course material, including slides and assignments, under a Creative Commons license on GitHub (https://github.com/ckaestne/seai/) and have recently completed a textbook complementing the course. We also published an article describing the rationale and design of the first iteration of the course: Teaching Software Engineering for AI-Enabled Systems. Video recordings of the Summer 2020 offering, now slightly dated, are online on the course page. We would be happy to see this course or a similar version taught at other universities. See also an annotated bibliography on the topic.

Course Description

This is a course for those who want to build applications and products with machine learning. Assuming we can learn a model to make predictions, what does it take to turn the model into a product and actually deploy it, build a business, and successfully operate and maintain it?

The course is designed to establish a working relationship between software engineers and data scientists: both contribute to building production ML systems but have different expertise and focuses. To work together, they need a mutual understanding of each other's roles, tasks, concerns, and goals. This course is aimed at software engineers who want to build robust and responsible systems meeting the specific challenges of working with ML components and at data scientists who want to facilitate getting a prototype model into production; it facilitates communication and collaboration between both roles. The course focuses on all the steps needed to turn a model into a production system.

It covers topics such as:

How to design for wrong predictions the model may make? How to assure safety and security despite possible mistakes? How to design the user interface and the entire system to operate in the real world?
How to reliably deploy and update models in production? How can we test the entire machine learning pipeline? How can MLOps tools help to automate and scale the deployment process? How can we experiment in production (A/B testing, canary releases)? How do we detect data quality issues, concept drift, and feedback loops in production?
How do we scale production ML systems? How do we design a system to process huge amounts of training data, telemetry data, and user requests? Should we use stream processing, batch processing, lambda architecture, or data lakes?
How do we test and debug production ML systems? How can we evaluate the quality of a model’s predictions in production? How can we test the entire AI-enabled system, not just the model? What lessons can we learn from software testing, automated test case generation, simulation, and continuous integration for testing production machine learning?
Which qualities matter beyond a model’s prediction accuracy? How can we identify and measure important quality requirements, including learning and inference latency, operating cost, scalability, explainability, fairness, privacy, robustness, and safety? Does the application need to be able to operate offline and how often do we need to update the models? How do we identify what’s important in an AI-enabled product in a production setting for a business? How do we resolve conflicts and tradeoffs?
What does it take to build responsible products? How to think about fairness of a production system at the model and system level? How to mitigate safety and security concerns? How can we communicate the reasons of an automated decision or explain uncertainty to users?
How do we build effective interdisciplinary teams? How can we bring data scientists, software engineers, UI designers, managers, domain experts, big data specialists, operators, legal counsel, and other roles together and develop a shared understanding and team culture?
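
To make the canary-release question above concrete, here is a minimal Python sketch of deterministic traffic splitting. It is an illustration only, not material from the course: the names `assign_variant` and `predict`, the default 5% canary fraction, and the use of a SHA-256 hash of the user ID are all assumptions chosen for the example.

```python
import hashlib


def assign_variant(user_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically map a user to 'canary' or 'stable' (hypothetical helper)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # The first 32 bits of the hash give a roughly uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "canary" if bucket < canary_fraction else "stable"


def predict(user_id, features, stable_model, canary_model, canary_fraction=0.05):
    """Route one request to the stable or canary model and report which was used."""
    variant = assign_variant(user_id, canary_fraction)
    model = canary_model if variant == "canary" else stable_model
    # Returning the variant lets telemetry compare both models afterwards.
    return variant, model(features)
```

Hashing the user ID rather than drawing a random number per request keeps each user on the same model version across requests, which makes the telemetry for the two groups easier to compare.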

Examples of ML-driven products we discuss include automated audio transcription; distributed detection of missing children on webcams and instant translation in augmented reality; cancer detection, fall detection, COVID diagnosis, and other smart medical and health services; automated slide layout in PowerPoint; semi-automated college admissions; inventory management; smart playlists and movie recommendations; ad fraud detection; delivery robots and smart driving features; and many others.

An extended group project focuses on building, deploying, evaluating, and maintaining a robust and scalable movie recommendation service under realistic “production” conditions.

Learning Outcomes

After taking this course, students should be able to, among other things:

analyze tradeoffs for designing production systems with AI components, considering various qualities beyond accuracy such as operation cost, latency, updateability, and explainability
implement production-quality systems that are robust to mistakes of AI components
design fault-tolerant and scalable data infrastructure for learning models, serving models, versioning, and experimentation
ensure quality of the entire machine learning pipeline with test automation and other quality assurance techniques, including automated checks for data quality, data drift, feedback loops, and model quality
build systems that can be tested in production and build deployment pipelines that allow careful rollouts and canary testing
consider privacy, fairness, and security when building complex AI-enabled systems
communicate effectively in teams with both software engineers and data analysts

In addition, students will gain familiarity with production-quality infrastructure tools, including stream processing with Apache Kafka, distributed data storage with SQL and NoSQL databases, deployment with Docker and Kubernetes, and test automation with Travis or Jenkins.
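
As one illustration of the automated data drift checks mentioned in the learning outcomes, the following Python sketch flags a shift in the mean of a numeric feature. It is a simplified assumption for illustration, not the course's method: the function name `mean_shift_alert`, the threshold of 3 standard errors, and the choice of a simple z-score on the mean are all hypothetical, and real checks usually compare whole distributions.

```python
from statistics import mean, stdev


def mean_shift_alert(reference, live, threshold=3.0):
    """Return True if the live mean deviates strongly from the reference mean.

    `reference` needs at least two values and `live` at least one.
    """
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # Constant reference data: any change in the mean counts as drift.
        return mean(live) != ref_mean
    # Standard error of the live mean, assuming the reference spread still holds.
    standard_error = ref_std / len(live) ** 0.5
    z_score = abs(mean(live) - ref_mean) / standard_error
    return z_score > threshold
```

A check like this could run on each batch of telemetry data and raise an alert in a monitoring system when it returns True.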

Design Rationale

Data scientists often make great progress at building models with cutting-edge techniques, but turning those models into products is challenging. For example, data scientists may work with unversioned notebooks on static data sets and focus on prediction accuracy while ignoring scalability, robustness, update latency, or operating cost.
Software engineers are trained with clear specifications and tend to focus on code, but may not be aware of the difficulties of working with data and unreliable models. They have a large toolset for decision making and quality assurance but it is not obvious how to apply those to AI-enabled systems and their challenges.
To what degree can existing SE practices be used for building intelligent systems? To what degree are new practices needed?
This course adopts a software engineering and operator perspective on building intelligent systems, focusing on how to turn a machine learning idea into a scalable and reliable product. Rather than focusing on modeling and learning itself, it assumes a working relationship with a data scientist and focuses on issues of design, implementation, operation, and assurance and how those interact with the data scientist's modeling.
The course will use software and systems engineering terminology and techniques (e.g., test coverage, architecture views, fault trees) and make explicit transfers to challenges posed by using machine learning/AI components. The course will not teach fundamentals of machine learning or AI, but will assume a basic understanding of relevant concepts (e.g., feature engineering, linear regression vs decision trees vs neural networks). It will heavily train design thinking and tradeoff analysis. It will focus primarily on practical approaches that can be used now and will feature hands-on practice with modern tools and infrastructure.

Course content

For a description of topics covered and course structure, see learning goals.

The course content evolves from semester to semester. Below is the schedule from the Fall 2022 offering. See the webpages for specific semesters above.

Date	Topic	Reading	Assignment due
Mon, Aug 29	Introduction and Motivation (md, pdf, book chapter)
Wed, Aug 31	From Models to Systems (md, pdf, book chapter)	Building Intelligent Systems, Ch. 5, 7, 8
Fri, Sep 02	Git & ML APIs
Mon, Sep 05	Labor Day, no classes
Wed, Sep 07	Model Quality (md, pdf, book chapter 1, chapter 2)	Building Intelligent Systems, Ch. 19	I1: ML Product
	Teamwork Primer (md, pdf)
Fri, Sep 09	Stream processing: Apache Kafka
Mon, Sep 12	Model Testing Beyond Accuracy (md, pdf, book chapter)	Behavioral Testing of NLP Models with CheckList
Wed, Sep 14	Goals and Measurement (md, pdf, book chapter 1, book chapter 2)	Building Intelligent Systems, Ch. 2, 4
Fri, Sep 16	Measurement and Teamwork
Mon, Sep 19	Gathering and Untangling Requirements (md, pdf, book chapter)	The World and the Machine
Wed, Sep 21	Planning for Mistakes (md, pdf, book chapter)	Building Intelligent Systems, Ch. 6, 7, 24	M1: Modeling and First Deployment
Fri, Sep 23	Requirements and Risk Analysis
Mon, Sep 26	Toward Architecture and Design (md, pdf, book chapter 1, chapter 2, chapter 3)	Building Intelligent Systems, Ch. 18 & Choosing the right ML alg.
Wed, Sep 28	Deploying a Model (md, pdf, book chapter)	Building Intelligent Systems, Ch. 13 and Machine Learning Design Patterns, Ch. 16	I2: Requirements
Fri, Sep 30	Architecture & Midterm Questions
Mon, Oct 03	Testing in Production (md, pdf, book chapter)	Building Intelligent Systems, Ch. 14, 15
Wed, Oct 05	Midterm
Fri, Oct 07	Containers: Docker (Code)
Mon, Oct 10	Infrastructure Quality and MLOps (md, pdf, book chapter 1, book chapter 2, book chapter 3, operations chapter)	The ML Test Score
Wed, Oct 12	Data Quality (md, pdf, book chapter)	Data Cascades in High-Stakes AI	I3: Architecture
Fri, Oct 14	Unit Tests and Continuous Integration (PDF, Code, Video)
Mon, Oct 17	Fall break, no classes
Wed, Oct 19	Fall break, no classes
Fri, Oct 21	Fall break, no classes
Mon, Oct 24	Scaling Data Storage and Data Processing (md, pdf, book chapter)	Big Data, Ch. 1
Wed, Oct 26	Process & Technical Debt (md, pdf, book chapter 1, chapter 2)	Hidden Technical Debt in Machine Learning Systems
Fri, Oct 28	Tartan Community Day, no classes
Mon, Oct 31	Responsible ML Engineering (md, pdf, book chapter 1, chapter 2)	Algorithmic Accountability: A Primer
Wed, Nov 02	Measuring Fairness (md, pdf, book chapter)	Improving Fairness in Machine Learning Systems	M2: Infrastructure Quality
Fri, Nov 04	Monitoring: Prometheus, Grafana
Mon, Nov 07	Building Fairer Products (md, pdf, book chapter)	A Mulching Proposal
Wed, Nov 09	Explainability & Interpretability (md, pdf, book chapter)	Black boxes not required or Stop Explaining Black Box ML Models…	I4: MLOps Tools: Aequitas, Aim, Amazon ECS, ArangoDB, Artillery, Assertible, AWS CloudWatch, AWS DocumentDB, AWS Glue, Azure Pipelines to deploy on Azure Kubernetes Service, Brooklin, ClearML, Cronitor (ML Pipelines), d6tflow, Dagster, DataPrep, deepchecks, Elasticsearch, FastAPI, Guild AI, HuggingFace, Katib, Kedro, Kubeflow, LightFM, Lightning AI, Logstash, Loki, MLflow, MongoDB Compass, MySQL, Neptune AI, Neural Network Intelligence (NNI), OpenDP, optuna, Pachyderm, Ploomber, Postman, Prefect, PyJanitor, Qlik Sense, Quilt, spaCy, Splunk, TorchServe, Using Airflow, ZenML
Fri, Nov 11	Fairness
Mon, Nov 14	Transparency & Accountability (md, pdf, book chapter)	People + AI, Ch. Explainability and Trust
Wed, Nov 16	Versioning, Provenance, and Reproducibility (md, pdf, book chapter)	Building Intelligent Systems, Ch. 21 & Goods: Organizing Google's Datasets
Fri, Nov 18	Model Explainability & Interpretability (PDF, Code, Video)
Mon, Nov 21	Debugging (Guest lecture by Sherry Tongshuang Wu)	-
Wed, Nov 23	Thanksgiving break
Fri, Nov 25	Thanksgiving break
Mon, Nov 28	Security and Privacy (md, pdf, book chapter)	Building Intelligent Systems, Ch. 25 & The Top 10 Risks of Machine Learning Security
Wed, Nov 30	Safety (md, pdf, book chapter)	Practical Solutions for Machine Learning Safety in Autonomous Vehicles	M3: Monitoring and CD
Fri, Dec 02	Threat modeling
Mon, Dec 05	Fostering Interdisciplinary Teams (md, pdf, book chapter)	Collaboration Challenges in Building ML-Enabled Systems
Wed, Dec 07	Summary and Reflection (md, pdf)		M4: Fairness, Security and Feedback Loops
Sun, Dec 18 (9:30-11:30am)	Final Project Presentations		Final report

Course Syllabus and Policies

See the web pages for the specific semester for details.

Students taking the PhD version of this class (17-745) will replace two individual assignments with a research project, resulting in a draft of a paper of at least workshop quality.

Related Courses

17-649 Artificial Intelligence for Software Engineering: This course focuses on how AI techniques can be used to build better software engineering tools and goes into more depth with regard to specific AI techniques, whereas we focus on how software engineering techniques can be used to build AI-enabled systems. Our application scenarios are typical web-based systems for end users, rather than tools for software developers.
05-318 Human-AI Interaction: Focuses on the HCI angle on designing AI-enabled products. Overlaps in some coverage on fairness, but covers in much more detail user interface design and how to involve humans in ML-supported decisions, whereas this course focuses more on architecture design, requirements engineering, and deploying systems in production. Both courses are complementary.
17-646 DevOps: Modern Deployment, 17-647 Engineering Data Intensive Scalable Systems, and similar: These courses cover techniques to build scalable, reactive, and reliable systems in depth. We will survey DevOps and big data systems in the context of designing and deploying systems, but will not explore them in as much detail as a dedicated course can. We will look at MLOps as an ML-specific variant of DevOps.
10-601 Machine Learning, 15-381 Artificial Intelligence: Representation and Problem Solving, 05-834 Applied Machine Learning, 95-865 Unstructured Data Analytics, and many others: CMU offers many courses that teach how machine learning and artificial intelligence techniques work internally or how to apply them to specific problems (including feature engineering and model evaluation), often on static data sets. We assume a basic understanding of such techniques and processes (see prerequisites) but focus on the engineering process for production ML systems.
10-613 Machine Learning, Ethics and Society, 16-735 Ethics and Robotics, 05-899 Fairness, Accountability, Transparency, & Ethics (FATE) in Sociotechnical Systems, and others: These courses dive much deeper into ethical issues and fairness in machine learning, in some cases going further into statistical notions or policy. We will cover these topics in a two-week segment among many others.