If you can hear me, open the participant panel in Zoom and check "yes"
This is not normal. We understand.
Expect:
Internet and bandwidth issues
Timezone issues?
Distractions -- parents, siblings, pets
Feeling isolated, feeling overwhelmed
Additional sources of stress
Hard time dealing with -gestures widely- everything...
Talk to us about accommodations of any kind
Simulating in-class experience
Discussions and interactions are important. We'll have regular in-class discussions and exercises
Use chat, "raise hand" feature, or just speak
If possible, keep camera on, muted by default
Set preferred name in Zoom
Synchronous "live" attendance only
Suggestion: Keep chat and participant list open, maybe with a separate window for gallery view to see faces; a second monitor is highly recommended
Contact us for accommodations!
Catastrophic Success
Personal Connection
This is hard. We know.
Talk inside and outside of class
We are always here 10 min before class and stay after class if you have questions or want to chat
We encourage collaboration in all assignments, even "individual" assignments and reading quizzes
We encourage social activities in teams
Learning Goals
Understand how ML components are parts of larger systems
Illustrate the challenges in engineering an ML-enabled system beyond accuracy
Explain the role of specifications and their lack in machine learning and the relationship to deductive and inductive reasoning
Summarize the respective goals and challenges of software engineers vs data scientists
Explain the concept and relevance of "T-shaped people"
Disclaimers
This class captures a rapidly evolving field.
We are scaling from 30 to 150 students. Expect some friction.
We are software engineers.
Agenda
Case Study: The Transcription Service Startup
Transcription services
Take audio or video files and produce text.
Used by academics to analyze interview text
Podcast show notes
Subtitles for videos
State of the art: Manual transcription, often via Mechanical Turk ($1.50/min)
The startup idea
PhD research on domain-specific speech recognition that can detect technical jargon
DNN trained on public PBS interviews + transfer learning on smaller manually annotated domain-specific corpus
Research has shown amazing accuracy for talks in medicine, poverty and inequality research, and talks at Ruby programming conferences; published at top conferences
Idea: Let's commercialize the software and sell to academics and conference organizers
Short Breakout
Likely challenges in building commercial product?
Think about challenges that the team will likely focus on when turning their research into a product:
One machine-learning challenge
One engineering challenge in building the product
One challenge from operating and updating the product
One team or management challenge
One business challenge
One safety or ethics challenge
Fill out one form per team and meet back here in 8 minutes to share suggestions
What qualities are important for a good commercial transcription product?
ML in a Production System
and Data engineers + Domain specialists + Operators + Business team + Project managers + Designers, UI Experts + Safety, security specialists + Lawyers + Social scientists + ...
Data scientist
Often fixed dataset for training and evaluation (e.g., PBS interviews)
Focused on accuracy
Prototyping, often Jupyter notebooks or similar
Expert in modeling techniques and feature engineering
Model size, updateability, and implementation stability typically do not matter
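To make "focused on accuracy" on a fixed dataset concrete, here is a minimal sketch of one common way to score a transcript: word error rate (WER), the word-level edit distance between a reference and a model transcript divided by the reference length. The slides do not say which metric the startup uses, so WER is an assumption here, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative example: splitting a jargon word counts as two errors.
wer = word_error_rate(
    "the patient has hypertension",
    "the patient has hyper tension",
)
```

A single number like this is easy to optimize in a notebook, which is part of why the data scientist's view can stop at accuracy.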
Software engineer
Builds a product
Concerned about cost, performance, stability, release time
Identifies quality through customer satisfaction
Must scale solution, handle large amounts of data
Detect and handle mistakes, preferably automatically
Maintain, evolve, and extend the product over long periods
Consider requirements for security, safety, fairness
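One way to picture "detect and handle mistakes, preferably automatically" is routing low-confidence output to a human instead of shipping it. The sketch below assumes the model reports a per-segment confidence score; that score, the `Segment` type, the `split_for_review` helper, and the 0.8 threshold are all hypothetical and do not come from the case study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    text: str
    confidence: float  # hypothetical model-reported score in [0, 1]


def split_for_review(
    segments: List[Segment], threshold: float = 0.8
) -> Tuple[List[Segment], List[Segment]]:
    """Return (accepted, flagged); flagged segments go to a human reviewer."""
    accepted = [s for s in segments if s.confidence >= threshold]
    flagged = [s for s in segments if s.confidence < threshold]
    return accepted, flagged


# Illustrative data only.
segments = [
    Segment("welcome to the keynote", 0.97),
    Segment("we monkey patched the gem", 0.55),
]
accepted, flagged = split_for_review(segments)
```

Choosing the threshold is itself a product decision: a lower value cuts review cost but lets more mistakes reach customers.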
Likely collaboration challenges?
Everybody, type one or two likely collaboration challenges in the chat but do not send them yet. Vote "yes" when done.
What might software engineers and data scientists focus on?