Machine Learning in Production / AI Engineering (17-445/17-645/17-745/11-695)

*Formerly Software Engineering for AI-Enabled Systems (SE4AI), CMU course that covers how to build, deploy, assure, and maintain applications with machine-learned models. Covers responsible AI (safety, security, fairness, explainability, transparency) and MLOps.*


Course topics overview

In 2022, the class will be offered in both the Spring and the Fall semesters. In 2023, it will be offered only in the Spring. The class does not have formal prerequisites, but it expects basic programming skills and some familiarity with machine learning concepts.

See the webpage for the specific offering of the course you are interested in.

For researchers, educators, or others interested in this topic, we share all course material, including slides and assignments, under a Creative Commons license on GitHub (https://github.com/ckaestne/seai/) and have recently completed a textbook complementing the course. We also published an article describing the rationale and the design of the first iteration of the course: Teaching Software Engineering for AI-Enabled Systems. Video recordings of the Summer 2020 offering, now slightly dated, are online on the course page. We would be happy to see this course or a similar version taught at other universities. See also an annotated bibliography on the topic.

Course Description

This is a course for those who want to build applications and products with machine learning. Assuming we can learn a model to make predictions, what does it take to turn the model into a product and actually deploy it, build a business, and successfully operate and maintain it?

The course is designed to establish a working relationship between software engineers and data scientists: both contribute to building production ML systems but have different expertise and focuses. To work together, they need a mutual understanding of each other's roles, tasks, concerns, and goals. This course is aimed at software engineers who want to build robust and responsible systems meeting the specific challenges of working with ML components, and at data scientists who want to facilitate getting a prototype model into production; it facilitates communication and collaboration between both roles. The course focuses on all the steps needed to turn a model into a production system.

It covers topics such as:

  • How to design for wrong predictions the model may make? How to assure safety and security despite possible mistakes? How to design the user interface and the entire system to operate in the real world?
  • How to reliably deploy and update models in production? How can we test the entire machine learning pipeline? How can MLOps tools help to automate and scale the deployment process? How can we experiment in production (A/B testing, canary releases)? How do we detect data quality issues, concept drift, and feedback loops in production?
  • How do we scale production ML systems? How do we design a system to process huge amounts of training data, telemetry data, and user requests? Should we use stream processing, batch processing, lambda architecture, or data lakes?
  • How do we test and debug production ML systems? How can we evaluate the quality of a model’s predictions in production? How can we test the entire AI-enabled system, not just the model? What lessons can we learn from software testing, automated test case generation, simulation, and continuous integration for testing production machine learning systems?
  • Which qualities matter beyond a model’s prediction accuracy? How can we identify and measure important quality requirements, including learning and inference latency, operating cost, scalability, explainability, fairness, privacy, robustness, and safety? Does the application need to be able to operate offline, and how often do we need to update the models? How do we identify what’s important in an AI-enabled product in a production setting for a business? How do we resolve conflicts and tradeoffs?
  • What does it take to build responsible products? How to think about fairness of a production system at the model and system level? How to mitigate safety and security concerns? How can we communicate the reasons for an automated decision or explain uncertainty to users?
  • How do we build effective interdisciplinary teams? How can we bring data scientists, software engineers, UI designers, managers, domain experts, big data specialists, operators, legal counsel, and other roles together and develop a shared understanding and team culture?
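To make one of these topics concrete: experimenting in production with A/B tests typically relies on deterministic bucketing, so that each user consistently sees the same variant across requests. The following is a minimal sketch only; the function name, bucket count, and traffic split are illustrative, not from the course materials:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   treatment_fraction: float = 0.1) -> str:
    """Deterministically assign a user to 'treatment' or 'control'.

    Hashing the user id together with the experiment name keeps the
    assignment stable across requests and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000  # uniform bucket in [0, 9999]
    return "treatment" if bucket < treatment_fraction * 10_000 else "control"
```

Because assignment depends only on the hash, no per-user state needs to be stored, and ramping up a canary release is just a matter of raising the treatment fraction.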

Examples of ML-driven products we discuss include automated audio transcription; distributed detection of missing children on webcams and instant translation in augmented reality; cancer detection, fall detection, COVID diagnosis, and other smart medical and health services; automated slide layout in PowerPoint; semi-automated college admissions; inventory management; smart playlists and movie recommendations; ad fraud detection; delivery robots and smart driving features; and many others.

An extended group project focuses on building, deploying, evaluating, and maintaining a robust and scalable movie recommendation service under realistic “production” conditions.

Learning Outcomes

After taking this course, students should be able to, among other things:

  • analyze tradeoffs for designing production systems with AI components, weighing various qualities beyond accuracy such as operation cost, latency, updateability, and explainability
  • implement production-quality systems that are robust to mistakes of AI components
  • design fault-tolerant and scalable data infrastructure for learning models, serving models, versioning, and experimentation
  • ensure quality of the entire machine learning pipeline with test automation and other quality assurance techniques, including automated checks for data quality, data drift, feedback loops, and model quality
  • build systems that can be tested in production and build deployment pipelines that allow careful rollouts and canary testing
  • consider privacy, fairness, and security when building complex AI-enabled systems
  • communicate effectively in teams with both software engineers and data analysts
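As an illustration of the second outcome above, making a system robust to model mistakes usually means the surrounding system, not the model, enforces the contract. A hedged sketch with hypothetical names (the label set and fallback value are invented for illustration):

```python
def robust_predict(predict_fn, features,
                   fallback="unknown",
                   valid_labels=frozenset({"spam", "ham"})):
    """Call an unreliable model and fall back to a safe default.

    The wrapper guards against two failure modes: the model call
    raising an exception, and the model returning an out-of-contract
    value. In both cases the system degrades gracefully.
    """
    try:
        label = predict_fn(features)
    except Exception:
        return fallback  # model crashed: degrade gracefully
    if label not in valid_labels:
        return fallback  # model violated its output contract
    return label
```

In a real system the fallback branch would also emit telemetry, so that the rate of model failures is visible in monitoring.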

In addition, students will gain familiarity with production-quality infrastructure tools, including stream processing with Apache Kafka, distributed data storage with SQL and NoSQL databases, deployment with Docker and Kubernetes, and test automation with Travis or Jenkins.
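To give a flavor of the test-automation side of this tooling, a data-quality check of the kind a CI job (e.g., on Travis or Jenkins) might run before training could look like the following minimal sketch; the schema and value ranges are illustrative, not taken from the course project:

```python
def check_ratings(rows):
    """Validate a batch of rating records before training.

    Each row is expected to be a dict with keys user_id, movie_id,
    and rating (a number between 1 and 5). Returns a list of error
    messages; a CI job would fail the build if any are returned,
    instead of silently training on bad data.
    """
    errors = []
    for i, row in enumerate(rows):
        if set(row) != {"user_id", "movie_id", "rating"}:
            errors.append(f"row {i}: unexpected fields {sorted(row)}")
            continue
        if not isinstance(row["rating"], (int, float)) or not 1 <= row["rating"] <= 5:
            errors.append(f"row {i}: rating out of range: {row['rating']!r}")
    return errors
```

The same idea scales up to schema-validation libraries and automated drift checks; the essential point is that data quality becomes a failing test rather than a silent degradation.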

Design Rationale

  • Data scientists often make great progress at building models with cutting edge techniques but turning those models into products is challenging. For example, data scientists may work with unversioned notebooks on static data sets and focus on prediction accuracy while ignoring scalability, robustness, update latency, or operating cost.
  • Software engineers are trained to work from clear specifications and tend to focus on code, but may not be aware of the difficulties of working with data and unreliable models. They have a large toolset for decision making and quality assurance, but it is not obvious how to apply these tools to AI-enabled systems and their challenges.
  • To what degree can existing SE practices be used for building intelligent systems? To what degree are new practices needed?
  • This course adopts a software engineering and operator perspective on building intelligent systems, focusing on how to turn a machine learning idea into a scalable and reliable product. Rather than focusing on modeling and learning itself, it assumes a working relationship with a data scientist and focuses on issues of design, implementation, operation, and assurance and how those interact with the data scientist's modeling.
  • The course will use software and systems engineering terminology and techniques (e.g., test coverage, architecture views, fault trees) and make explicit transfers to challenges posed by using machine learning/AI components. The course will not teach fundamentals of machine learning or AI, but will assume a basic understanding of relevant concepts (e.g., feature engineering, linear regression vs. decision trees vs. neural networks). It will heavily train design thinking and tradeoff analysis. It will focus primarily on practical approaches that can be used now and will feature hands-on practice with modern tools and infrastructure.

Course content

For a description of topics covered and course structure, see learning goals.

The course content evolves from semester to semester. Below is the schedule from the Fall 2022 offering. See the webpages for specific semesters above.

Date Topic Reading Assignment due
Mon, Aug 29 Introduction and Motivation (md, pdf, book chapter)
Wed, Aug 31 From Models to Systems (md, pdf, book chapter) Building Intelligent Systems, Ch. 5, 7, 8
Fri, Sep 02 Recitation Git & ML APIs
Mon, Sep 05 Break Labor day, no classes
Wed, Sep 07 Model Quality (md, pdf, book chapter 1, chapter 2) Building Intelligent Systems, Ch. 19 I1: ML Product
Teamwork Primer (md, pdf)
Fri, Sep 09 Recitation Stream processing: Apache Kafka
Mon, Sep 12 Model Testing Beyond Accuracy (md, pdf, book chapter) Behavioral Testing of NLP Models with CheckList
Wed, Sep 14 Goals and Measurement (md, pdf, book chapter 1, book chapter 2) Building Intelligent Systems, Ch. 2, 4
Fri, Sep 16 Recitation Measurement and Teamwork
Mon, Sep 19 Gathering and Untangling Requirements (md, pdf, book chapter) The World and the Machine
Wed, Sep 21 Planning for Mistakes (md, pdf, book chapter) Building Intelligent Systems, Ch. 6, 7, 24 M1: Modeling and First Deployment
Fri, Sep 23 Recitation Requirements and Risk Analysis
Mon, Sep 26 Toward Architecture and Design (md, pdf, book chapter 1, chapter 2, chapter 3) Building Intelligent Systems, Ch. 18 & Choosing the right ML alg.
Wed, Sep 28 Deploying a Model (md, pdf, book chapter) Building Intelligent Systems, Ch. 13 and Machine Learning Design Patterns, Ch. 16 I2: Requirements
Fri, Sep 30 Recitation Architecture & Midterm Questions
Mon, Oct 03 Testing in Production (md, pdf, book chapter) Building Intelligent Systems, Ch. 14, 15
Wed, Oct 05 Midterm Midterm
Fri, Oct 07 Recitation Containers: Docker (Code)
Mon, Oct 10 Infrastructure Quality and MLOps (md, pdf, book chapter 1, book chapter 2, book chapter 3, operations chapter) The ML Test Score
Wed, Oct 12 Data Quality (md, pdf, book chapter) Data Cascades in High-Stakes AI I3: Architecture
Fri, Oct 14 Recitation Unit Tests and Continuous Integration (PDF, Code, Video)
Mon, Oct 17 Break Fall break, no classes
Wed, Oct 19 Break Fall break, no classes
Fri, Oct 21 Break Fall break, no classes
Mon, Oct 24 Scaling Data Storage and Data Processing (md, pdf, book chapter) Big Data, Ch. 1
Wed, Oct 26 Process & Technical Debt (md, pdf, book chapter 1, chapter 2) Hidden Technical Debt in Machine Learning Systems
Fri, Oct 28 Break Tartan community day, no classes
Mon, Oct 31 Responsible ML Engineering (md, pdf, book chapter 1, chapter 2) Algorithmic Accountability: A Primer
Wed, Nov 02 Measuring Fairness (md, pdf, book chapter) Improving Fairness in Machine Learning Systems M2: Infrastructure Quality
Fri, Nov 04 Recitation Monitoring: Prometheus, Grafana
Mon, Nov 07 Building Fairer Products (md, pdf, book chapter) A Mulching Proposal
Wed, Nov 09 Explainability & Interpretability (md, pdf, book chapter) Black boxes not required or Stop Explaining Black Box ML Models… I4: MLOps Tools
(Tools covered in the I4 assignment: Aequitas, Aim, Amazon ECS, ArangoDB, Artillery, Assertible, AWS Cloudwatch, AWS DocumentDB, AWS Glue, Azure Pipelines to deploy on Azure Kubernetes Service, Brooklin, ClearML, Cronitor (ML Pipelines), d6tflow, Dagster, DataPrep, deepchecks, Elasticsearch, FastAPI, Guild AI, HuggingFace, Katib, Kedro, Kubeflow, LightFM, Lightning AI, Logstash, Loki, Mlflow, MongoDB Compass, MySQL, Neptune AI, Neural Network Intelligence (NNI), OpenDP, optuna, Pachyderm, Ploomber, Postman, Prefect, PyJanitor, Qlik Sense, Quilt, Spacy, Splunk, TorchServe, Using Airflow, ZenML)
Fri, Nov 11 Recitation Fairness
Mon, Nov 14 Transparency & Accountability (md, pdf, book chapter) People + AI, Ch. Explainability and Trust
Wed, Nov 16 Versioning, Provenance, and Reproducibility (md, pdf, book chapter) Building Intelligent Systems, Ch. 21 & Goods: Organizing Google's Datasets
Fri, Nov 18 Recitation Model Explainability & Interpretability (PDF, Code, Video)
Mon, Nov 21 Debugging (Guest lecture by Sherry Tongshuang Wu) -
Wed, Nov 23 Break Thanksgiving break
Fri, Nov 25 Break Thanksgiving break
Mon, Nov 28 Security and Privacy (md, pdf, book chapter) Building Intelligent Systems, Ch. 25 & The Top 10 Risks of Machine Learning Security
Wed, Nov 30 Safety (md, pdf, book chapter) Practical Solutions for Machine Learning Safety in Autonomous Vehicles M3: Monitoring and CD
Fri, Dec 02 Recitation Threat modeling
Mon, Dec 05 Fostering Interdisciplinary Teams (md, pdf, book chapter) Collaboration Challenges in Building ML-Enabled Systems
Wed, Dec 07 Summary and Reflection (md, pdf) M4: Fairness, Security and Feedback Loops
Sun, Dec 18 (9:30-11:30am) Final Project Presentations Final report

Course Syllabus and Policies

See the web pages for the specific semester for details.

Students taking the PhD version of this class (17-745) will replace two individual assignments with a research project, resulting in a draft of a paper of at least workshop quality.

Related Courses

  • 17-649 Artificial Intelligence for Software Engineering: This course focuses on how AI techniques can be used to build better software engineering tools and goes into more depth with regard to specific AI techniques, whereas we focus on how software engineering techniques can be used to build AI-enabled systems. Our application scenarios are typical web-based systems for end users, rather than tools for software developers.
  • 05-318 Human-AI Interaction: Focuses on the HCI angle of designing AI-enabled products. It overlaps somewhat in its coverage of fairness and covers user-interface design and how to involve humans in ML-supported decisions in much more detail, whereas this course focuses more on architecture design, requirements engineering, and deploying systems in production. The two courses are complementary.
  • 17-646 DevOps: Modern Deployment, 17-647 Engineering Data Intensive Scalable Systems, and similar: These courses cover techniques to build scalable, reactive, and reliable systems in depth. We will survey DevOps and big data systems in the context of designing and deploying systems, but will not explore them in as much detail as a dedicated course can. We will look at MLOps as an ML-specific variant of DevOps.
  • 10-601 Machine Learning, 15-381 Artificial Intelligence: Representation and Problem Solving, 05-834 Applied Machine Learning, 95-865 Unstructured Data Analytics, and many others: CMU offers many courses that teach how machine learning and artificial intelligence techniques work internally or how to apply them to specific problems (including feature engineering and model evaluation), often on static data sets. We assume a basic understanding of such techniques and processes (see prerequisites) but focus on the engineering process for production ML systems.
  • 10-613 Machine Learning, Ethics and Society, 16-735 Ethics and Robotics, 05-899 Fairness, Accountability, Transparency, & Ethics (FATE) in Sociotechnical Systems, and others dive much deeper into ethical issues and fairness in machine learning, in some cases going deeper into statistical notions or policy. We cover these topics in a two-week segment, among many other topics.