Christian Kaestner and Eunsuk Kang
Take audio or video files and produce text.
State of the art: manual transcription, often via Amazon Mechanical Turk ($1.50/min)
PhD research on domain-specific speech recognition that can detect technical jargon
DNN trained on public PBS interviews + transfer learning on smaller manually annotated domain-specific corpus
Research has shown amazing accuracy for talks in medicine, poverty and inequality research, and talks at Ruby programming conferences; published at top conferences
Idea: Let's commercialize the software and sell to academics and conference organizers
Reference: Garvin, David A. What Does "Product Quality" Really Mean? Sloan Management Review 25 (1984).
[Highlights challenging fragments. Can observe what users correct in place. Star rating for feedback.]
Algorithms.shortestDistance(g, "Tom", "Anne");
> ArrayIndexOutOfBoundsException
Algorithms.shortestDistance(g, "Tom", "Anne");
> -1
class Algorithms {
    /**
     * This method finds the shortest distance between two
     * vertices. It returns -1 if the two nodes are not
     * connected.
     */
    int shortestDistance(…) {…}
}
class Algorithms {
    /**
     * This method finds the shortest distance between two
     * vertices. Method is only supported
     * for connected vertices.
     */
    int shortestDistance(…) {…}
}
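A minimal sketch of how the first specification (return -1 for unconnected vertices) could be met, assuming an unweighted graph stored as an adjacency map. The slides do not show the type of g or the method body, so the class name ShortestDistanceSketch, the Map-based graph, and the breadth-first search are all illustrative assumptions, not the original implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: the graph representation is an assumption, not from the slides.
class ShortestDistanceSketch {
    /**
     * Returns the number of edges on a shortest path between two vertices,
     * or -1 if the two vertices are not connected.
     */
    static int shortestDistance(Map<String, List<String>> g, String from, String to) {
        if (from.equals(to)) {
            return 0;
        }
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        // Breadth-first search: vertices are visited in order of increasing distance.
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String n : g.getOrDefault(v, List.<String>of())) {
                if (!dist.containsKey(n)) {
                    dist.put(n, dist.get(v) + 1);
                    if (n.equals(to)) {
                        return dist.get(n);
                    }
                    queue.add(n);
                }
            }
        }
        return -1; // the specified result for unconnected vertices
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = Map.of(
            "Tom", List.of("Bob"),
            "Bob", List.of("Tom", "Anne"),
            "Anne", List.of("Bob"),
            "Eve", List.<String>of());
        System.out.println(shortestDistance(g, "Tom", "Anne")); // 2
        System.out.println(shortestDistance(g, "Tom", "Eve"));  // -1
    }
}
```

Under the second specification (only supported for connected vertices), the same code would still be correct, because any result is acceptable for inputs outside the specified domain.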
/*@ requires amount >= 0;
    ensures balance == \old(balance) - amount &&
            \result == balance;
  @*/
public int debit(int amount) {
    ...
}
(JML specification in Java, pre- and postconditions)
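The pre- and postconditions of debit can also be read as runtime checks. The following is a rough sketch in plain Java without JML tooling; the class AccountSketch, its constructor, and the method body are hypothetical, since the slide elides the implementation.

```java
// Hypothetical sketch: the contract from the JML comment, checked at runtime.
class AccountSketch {
    private int balance;

    AccountSketch(int initialBalance) {
        this.balance = initialBalance;
    }

    int debit(int amount) {
        // Precondition: requires amount >= 0
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be >= 0");
        }
        int oldBalance = balance; // plays the role of \old(balance)

        balance = balance - amount; // assumed body, not shown on the slide
        int result = balance;

        // Postcondition: balance == \old(balance) - amount && \result == balance
        if (balance != oldBalance - amount || result != balance) {
            throw new IllegalStateException("postcondition violated");
        }
        return result;
    }

    public static void main(String[] args) {
        AccountSketch account = new AccountSketch(100);
        System.out.println(account.debit(30)); // 70
    }
}
```

With a specification like this, a single input either satisfies the contract or reveals a bug, which is the notion of correctness that the later slides contrast with ML components.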
/**
 * Calls the <code>read(byte[], int, int)</code> overloaded [...]
 * @param buf The buffer to read bytes into
 * @return The value returned from <code>in.read(byte[], int, int)</code>
 * @exception IOException If an error occurs
 */
public int read(byte[] buf) throws IOException
{
    return read(buf, 0, buf.length);
}
(textual specification with JavaDoc)
Source: Ryzhyk. On the Construction of Reliable Device Drivers. PhD Thesis 2009
Math.sqrt(-5);
> 0
/**
????
*/
String transcribe(File audioFile);
/**
????
*/
List<Product> suggestedPurchases(List<Product> pastPurchases);
(Daniel Miessler, CC SA 2.0)
From deductive reasoning to inductive reasoning
From clear specifications to goals
From guarantees to best effort
What does this mean for software engineering? For correctness of AI-enabled systems? For testing?
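One way to picture the shift from specifications to goals: instead of checking each output against a contract, the component is measured against labeled examples and compared with a target. The sketch below is only an illustration under stated assumptions: the class AccuracyGoalSketch, the hard-coded stand-in for transcribe (keyed by file name rather than File), the example data, and the 0.7 target are all hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: evaluating a model against a goal instead of a specification.
class AccuracyGoalSketch {
    /** Fraction of labeled examples for which the model output equals the expected label. */
    static <I, O> double accuracy(Function<I, O> model, Map<I, O> labeled) {
        if (labeled.isEmpty()) {
            throw new IllegalArgumentException("need at least one labeled example");
        }
        long correct = labeled.entrySet().stream()
            .filter(e -> e.getValue().equals(model.apply(e.getKey())))
            .count();
        return (double) correct / labeled.size();
    }

    public static void main(String[] args) {
        // Stand-in for a learned model: a fixed lookup with one wrong answer.
        Map<String, String> modelOutputs = Map.of(
            "a.wav", "hello world",
            "b.wav", "ruby gems",
            "c.wav", "in equality",
            "d.wav", "poverty");
        Function<String, String> transcribe = file -> modelOutputs.getOrDefault(file, "");

        // Manually annotated transcripts (made-up data for illustration).
        Map<String, String> labeled = Map.of(
            "a.wav", "hello world",
            "b.wav", "ruby gems",
            "c.wav", "inequality",
            "d.wav", "poverty");

        double acc = accuracy(transcribe, labeled);
        System.out.println(acc);        // 0.75
        System.out.println(acc >= 0.7); // true: goal met, although one output is wrong
    }
}
```

A single wrong transcription does not count as a bug here; the component is judged by whether it meets the goal on average.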
While it is possible to formally specify programs and prove them correct, this is rarely done.
In practice, specifications are often textual, local, weak, vague, or ambiguous, if they exist at all. Some informal requirements and some tests might be the only specifications available.
Software engineers have long developed methods to deal with uncertainty, missing specifications, and unreliable components.
AI may raise the stakes, but the problem and solutions are not new.
"Machine learning: The high interest credit card of technical debt" -- Sculley et al. 2014
Jupyter Notebooks are a gift from God to those who work with data. They allow us to do quick experiments with Julia, Python, R, and more -- John Paul Ada
Further reading: Sculley, David, et al. Hidden technical debt in machine learning systems. Advances in Neural Information Processing Systems. 2015.
17-445/17-645, Fall 2019, 12 units
Monday/Wednesday 1:30-2:50
Christian Kaestner, Eunsuk Kang, Chu-Pan Wong
< brief introductions >
Email to se-ai@lists.andrew.cmu.edu preferred, rather than individual instructors
Announcements through Canvas
Office hours and open door policy (When our door is open and we are not currently meeting with somebody else, feel free to interrupt us for course-related issues.)
Materials on GitHub. Pull requests encouraged.
Some software engineering experience
Machine learning basics
Use the knowledge check on Canvas to identify gaps. Talk to us about strategies to fill gaps.
Empirical research is fairly clear:
Smoking section policy: Avoid laptops beyond note-taking. If you want to use laptops, sit in the back.
[1]: Faria Sana, Tina Weston, and Nicholas J. Cepeda. 2013. Laptop multitasking hinders classroom learning for both users and nearby peers. Comput. Educ. 62 (March 2013), 24-31.
[2]: Mueller, Pam A., and Daniel M. Oppenheimer. The pen is mightier than the keyboard: Advantages of longhand over laptop note taking. Psychological Science 25.6 (2014): 1159-1168.
Building Intelligent Systems: A Guide to Machine Learning Engineering
by Geoff Hulten
https://www.buildingintelligentsystems.com/
Various chapters assigned throughout the semester
Supplemented with research articles, blog posts, videos, podcasts, ...
Electronic version in the library
Series of small to medium-sized assignments:
Individual and team assignments
No capstone project
Late work in group assignments will receive feedback but no credit.
Late work in individual assignments will be accepted with a 10% penalty per day, for up to 3 days.
Talk to us (early) for concerns and exceptions.
See web page
In a nutshell: do not copy, do not lie, do not share or publicly release your solutions
In group work, be honest about contributions of team members, do not cover for others
If you feel overwhelmed or stressed, please come and talk to us (see syllabus for other support opportunities)
Survey helps us to tailor class and form teams.