Finalizing enrollment on Sep 7 (start of group projects)
Learning Goals
Understand how ML components are parts of larger systems
Illustrate the challenges in engineering an ML-enabled system beyond accuracy
Explain the role of specifications and their lack in machine learning and the relationship to deductive and inductive reasoning
Summarize the respective goals and challenges of software engineers vs data scientists
Explain the concept and relevance of "T-shaped people"
Agenda Today
Preliminaries (just done)
Case Study
Syllabus
Introductions
Case Study: A Transcription Service Startup
Transcription services
Take audio or video files and produce text.
Used by academics to analyze interview text
Podcast show notes
Subtitles for videos
State of the art a few years ago: Manual transcription, often via Mechanical Turk ($1.50/min)
Recently: Many ML models for transcription (e.g., in YouTube, Alexa, Siri, Zoom)
The startup idea
PhD research on domain-specific speech recognition that can detect technical jargon
DNN trained on public PBS interviews + transfer learning on smaller manually annotated domain-specific corpus
Research has shown amazing accuracy for talks in medicine, poverty and inequality research, and talks at Ruby programming conferences; published at top conferences
Idea: Let's commercialize the software and sell to academics and conference organizers
Breakout: Likely challenges in building commercial product?
As a group, think about challenges that the team will likely focus on when turning their research into a product:
One machine-learning challenge
One engineering challenge in building the product
One challenge from operating and updating the product
One team or management challenge
One business challenge
One safety or ethics challenge
Post answer to #lecture on Slack and tag all group members
What qualities are important for a good commercial transcription product?
ML in a Production System
Data scientists + Software engineers, and Data engineers + Domain specialists + Operators + Business team + Project managers + Designers, UI Experts + Safety, security specialists + Lawyers + Social scientists + ...
Data scientist
Often fixed dataset for training and evaluation (e.g., PBS interviews)
Focused on accuracy
Prototyping, often Jupyter notebooks or similar
Expert in modeling techniques and feature engineering
Model size, updateability, and implementation stability typically do not matter
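To make "focused on accuracy" on a fixed dataset concrete, here is a minimal sketch of how a transcription model might be scored against a reference transcript. It assumes word error rate (word-level edit distance divided by reference length) as the metric, which the slides do not name; the function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and lowercasing are good enough
    for a sketch; real evaluations usually normalize text more carefully.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the model splits a piece of medical jargon in two.
reference = "the patient has tachycardia"
hypothesis = "the patient has tacky cardia"
print(word_error_rate(reference, hypothesis))
```

A single number like this is convenient for comparing models on a fixed evaluation set, but it says nothing about cost, latency, or how the model behaves on audio unlike the training data.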
Software engineer
Builds a product
Concerned about cost, performance, stability, release time
Identifies quality through customer satisfaction
Must scale solution, handle large amounts of data
Detect and handle mistakes, preferably automatically
Maintain, evolve, and extend the product over long periods
Consider requirements for security, safety, fairness
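One way "detect and handle mistakes, preferably automatically" could look in the transcription product is sketched below. It assumes the model reports a per-segment confidence score, which the slides do not state; the `Segment` type, the `split_for_review` helper, and the 0.8 threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    text: str
    confidence: float  # assumed model-reported score in [0, 1]


def split_for_review(
    segments: List[Segment], threshold: float = 0.8
) -> Tuple[List[Segment], List[Segment]]:
    """Separate segments that can ship as-is from those flagged for a human.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    accepted = [s for s in segments if s.confidence >= threshold]
    flagged = [s for s in segments if s.confidence < threshold]
    return accepted, flagged


# Hypothetical transcript: the jargon-heavy segment gets a low score.
transcript = [
    Segment("welcome to the conference", 0.97),
    Segment("today we cover metaprogramming in ruby", 0.62),
    Segment("thanks for joining", 0.91),
]
accepted, flagged = split_for_review(transcript)
print(len(accepted), "accepted;", len(flagged), "flagged for review")
```

Where to set the threshold is a product decision rather than a modeling one: a lower value ships more mistakes, a higher one raises the cost of manual review.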
Likely collaboration challenges?
What might Software Engineers and Data Scientists Focus on?