Christian Kaestner
Required watching: Molham Aref. Business Systems with Machine Learning. Guest lecture, 2020.
Suggested reading: Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly. 2017.
all potentially with huge total volumes and high throughput
need strategies for storage and processing
Efficient Algorithms
Faster Machines
More Machines
Simulating biological neural networks of neurons (nodes) and synapses (connections), popularized in the 1960s and 1970s
Basic building blocks: Artificial neurons, with $n$ inputs and one output; output is activated if at least $m$ inputs are active
(assuming at least two activated inputs needed to activate output)
computing weighted sum of inputs + step function
$z = w_1 x_1 + w_2 x_2 + ... + w_n x_n = \mathbf{x}^T \mathbf{w}$
e.g., step: $\phi(z) = 0$ if $z < 0$, else $1$
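The weighted sum followed by a step function can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the weights `w` and the bias `b` are hypothetical values chosen so that the output only activates when both inputs are active.

```python
import numpy as np

def step(z):
    """Step activation: 0 for negative input, 1 otherwise."""
    return np.where(z < 0, 0, 1)

def neuron(x, w, b=0.0):
    """Weighted sum of the inputs (plus an optional bias), then the step function."""
    z = np.dot(x, w) + b
    return step(z)

# Hypothetical parameters: with these, two active inputs are needed to activate the output
w = np.array([1.0, 1.0])
b = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [int(neuron(np.array(x), w, b)) for x in inputs]
print(outputs)
```

Changing `b` to `-0.5` would instead activate the output as soon as one input is active.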
$o_1 = \phi(b_{1} + w_{1,1} x_1 + w_{1,2} x_2)$ $o_2 = \phi(b_{2} + w_{2,1} x_1 + w_{2,2} x_2)$ $o_3 = \phi(b_{3} + w_{3,1} x_1 + w_{3,2} x_2)$
$f_{\mathbf{W},\mathbf{b}}(\mathbf{X})=\phi(\mathbf{W} \cdot \mathbf{X}+\mathbf{b})$
($\mathbf{W}$ and $\mathbf{b}$ are parameters of the model)
$f_{\mathbf{W}_h,\mathbf{b}_h,\mathbf{W}_o,\mathbf{b}_o}(\mathbf{X})=\phi( \mathbf{W}_o \cdot \phi(\mathbf{W}_h \cdot \mathbf{X}+\mathbf{b}_h)+\mathbf{b}_o)$
(matrix multiplications interleaved with step function)
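As a rough sketch, the nested formula above corresponds to two matrix multiplications, each followed by the activation function. The layer sizes (2 inputs, 3 hidden neurons, 1 output) and the random parameter values are assumptions made only for illustration.

```python
import numpy as np

def step(z):
    """Step activation applied element-wise."""
    return np.where(z < 0, 0.0, 1.0)

def forward(X, W_h, b_h, W_o, b_o, phi=step):
    """Compute phi(W_o . phi(W_h . X + b_h) + b_o)."""
    hidden = phi(W_h @ X + b_h)      # hidden layer
    return phi(W_o @ hidden + b_o)   # output layer

# Hypothetical sizes and random parameters, for illustration only
rng = np.random.default_rng(0)
X = rng.random(2)
W_h, b_h = rng.normal(size=(3, 2)), rng.normal(size=3)
W_o, b_o = rng.normal(size=(1, 3)), rng.normal(size=1)

y = forward(X, W_h, b_h, W_o, b_o)
print(y.shape)
```

Passing a different `phi` swaps the activation function without touching the matrix operations.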
Intuition:
Works efficiently only for certain $\phi$, typically the logistic function $\phi(z)=1/(1+\exp(-z))$ or ReLU $\phi(z)=\max(0,z)$.
See Chapter 10 in 🕮 Géron, Aurélien. "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", 2nd Edition (2019) or any other book on deep learning
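The two activation functions named above are easy to write down in NumPy. The sketch below also includes their derivatives, on the assumption that the learning procedure is gradient-based (as in the backpropagation approach covered in the referenced chapter); the function names are illustrative.

```python
import numpy as np

def logistic(z):
    """Logistic function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """ReLU: max(0, z), element-wise."""
    return np.maximum(0.0, z)

def logistic_grad(z):
    """Derivative of the logistic function: s * (1 - s) with s = logistic(z)."""
    s = logistic(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, else 0 (taking 0 at z = 0)."""
    return (np.asarray(z) > 0).astype(float)

z = np.array([-2.0, 0.0, 2.0])
print(logistic(z), relu(z))
print(logistic_grad(z), relu_grad(z))
```

Unlike the step function, whose derivative is zero wherever it is defined, both functions provide a usable gradient over a wide range of inputs.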
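from tensorflow import keras

# Flatten 28x28 images into 784 inputs, then two hidden layers and a 10-class output
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])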
How many parameters does this model have?
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    # 784*300+300 = 235500 parameters
    keras.layers.Dense(300, activation="relu"),
    # 300*100+100 = 30100 parameters
    keras.layers.Dense(100, activation="relu"),
    # 100*10+10 = 1010 parameters
    keras.layers.Dense(10, activation="softmax")
])
Total of 266,610 parameters in this small example! (Assuming 32-bit floats, that is about 1 MB)
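The parameter count can be checked with a few lines of arithmetic: each dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` and the 4-bytes-per-parameter figure (32-bit floats) are assumptions of this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Flattened 28x28 input, followed by the three dense layers
layer_sizes = [28 * 28, 300, 100, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

# Assuming 4 bytes per parameter (32-bit floats)
size_mb = total * 4 / 1e6
print(counts, total, round(size_mb, 2))
```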
Consumption | CO2 (lbs) |
---|---|
Air travel, 1 passenger, NY↔SF | 1984 |
Human life, avg, 1 year | 11,023 |
American life, avg, 1 year | 36,156 |
Car, avg incl. fuel, 1 lifetime | 126,000 |
Training one model (GPU) | CO2 (lbs) |
---|---|
NLP pipeline (parsing, SRL) | 39 |
w/ tuning & experimentation | 78,468 |
Transformer (big) | 192 |
w/ neural architecture search | 626,155 |
Strubell, Emma, Ananya Ganesh, and Andrew McCallum. "Energy and Policy Considerations for Deep Learning in NLP." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3645-3650. 2019.
Model | Hardware | Hours | CO2 (lbs) | Cloud cost (USD) |
---|---|---|---|---|
Transformer | P100x8 | 84 | 192 | 289–981 |
ELMo | P100x3 | 336 | 262 | 433–1472 |
BERT | V100x64 | 79 | 1438 | 3751–13K |
NAS | P100x8 | 274,120 | 626,155 | 943K–3.2M |
GPT-2 | TPUv3x32 | 168 | — | 13K–43K |
Strubell, Emma, Ananya Ganesh, and Andrew McCallum. "Energy and Policy Considerations for Deep Learning in NLP." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3645-3650. 2019.
Li, Mu, et al. "Scaling distributed machine learning with the parameter server." OSDI, 2014.
Increasing interest in the systems aspects of machine learning
e.g., building large scale and robust learning infrastructure
user_id | Name | Email | dpt |
---|---|---|---|
1 | Christian | kaestner@cs. | 1 |
2 | Eunsuk | eskang@cmu. | 1 |
3 | Tom | ... | 2 |
dpt_id | Name | Address |
---|---|---|
1 | ISR | ... |
2 | CSD | ... |
select d.name from user u, dpt d where u.dpt=d.dpt_id
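To try the join, the two tables can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch under stated assumptions: the column types are guessed, the elided values stay as `'...'`, and Tom is given `user_id` 3 so that the column can serve as a primary key. The query also selects the user name so the pairing is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dpt (dpt_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                       dpt INTEGER REFERENCES dpt(dpt_id));
    INSERT INTO dpt VALUES (1, 'ISR', '...'), (2, 'CSD', '...');
    -- user_id 3 for Tom is an assumption of this sketch
    INSERT INTO user VALUES (1, 'Christian', 'kaestner@cs.', 1),
                            (2, 'Eunsuk', 'eskang@cmu.', 1),
                            (3, 'Tom', '...', 2);
""")

# Same join as above, additionally returning the user name
rows = conn.execute(
    "SELECT u.name, d.name FROM user u, dpt d "
    "WHERE u.dpt = d.dpt_id ORDER BY u.user_id"
).fetchall()
print(rows)
```

Department data is stored once and referenced by id, so renaming a department touches a single row.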
{
"id": 1,
"name": "Christian",
"email": "kaestner@cs.",
"dpt": [
{"name": "ISR", "address": "..."}
],
"other": { ... }
}
db.getCollection('users').find({"name": "Christian"})
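The idea behind such a query can be imitated with plain Python dictionaries. The `find` helper below is hypothetical and only matches top-level fields by equality; it is not the MongoDB API, just a sketch of what filtering self-contained documents looks like. The second document is assembled from the relational example for illustration.

```python
users = [
    {"id": 1, "name": "Christian", "email": "kaestner@cs.",
     "dpt": [{"name": "ISR", "address": "..."}]},
    {"id": 2, "name": "Eunsuk", "email": "eskang@cmu.",
     "dpt": [{"name": "ISR", "address": "..."}]},
]

def find(collection, query):
    """Hypothetical helper: documents whose top-level fields equal those in query."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in query.items())]

matches = find(users, {"name": "Christian"})
# The department is embedded in the document, so no join is needed to read it
dpt_names = [d["name"] for d in matches[0]["dpt"]]
print(dpt_names)
```

The department data is duplicated in every document, which makes reads simple but updates to a department more expensive.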
2020-06-25T13:44:14,601844,GET /data/m/goyas+ghosts+2006/17.mpg
2020-06-25T13:44:14,935791,GET /data/m/the+big+circus+1959/68.mpg
2020-06-25T13:44:14,557605,GET /data/m/elvis+meets+nixon+1997/17.mpg
2020-06-25T13:44:14,140291,GET /data/m/the+house+of+the+spirits+1993/53.mpg
2020-06-25T13:44:14,425781,GET /data/m/the+theory+of+everything+2014/29.mpg
2020-06-25T13:44:14,773178,GET /data/m/toy+story+2+1999/59.mpg
2020-06-25T13:44:14,901758,GET /data/m/ignition+2002/14.mpg
2020-06-25T13:44:14,911008,GET /data/m/toy+story+3+2010/46.mpg
Divide data:
Tradeoffs?
Benefits and Drawbacks?
cat /var/log/nginx/access.log |
awk '{print $7}' |
sort |
uniq -c |
sort -r -n |
head -n 5
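For comparison, roughly the same batch job can be written in Python. This sketch assumes whitespace-separated log lines in which the seventh field is the requested path, as `awk '{print $7}'` does; the three sample lines are made up for illustration.

```python
from collections import Counter

def top_paths(lines, n=5):
    """Count the 7th whitespace-separated field of each line; return the n most common."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 7:
            counts[fields[6]] += 1   # awk's $7 is index 6 here
    return counts.most_common(n)

# Made-up sample lines in a typical access-log layout
sample = [
    '10.0.0.1 - - [25/Jun/2020:13:44:14 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl"',
    '10.0.0.2 - - [25/Jun/2020:13:44:15 +0000] "GET /about.html HTTP/1.1" 200 311 "-" "curl"',
    '10.0.0.1 - - [25/Jun/2020:13:44:16 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl"',
]
top = top_paths(sample)
print(top)
```

To process a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.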
MapReduce as common framework
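A single-machine sketch of the map, shuffle, and reduce phases, counting requests per movie in log lines of the format shown earlier. In a real framework the phases run distributed across many machines; the function names here are illustrative, and the repeated first line is made up so that one count exceeds 1.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit (movie, 1) for a line like 'time,id,GET /data/m/<movie>/<part>.mpg'."""
    request = line.split(",")[2]
    movie = request.split("/")[3]
    yield movie, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce: sum the counts for one key."""
    return key, sum(values)

def map_reduce(lines):
    pairs = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(key, values) for key, values in shuffle(pairs).items())

lines = [
    "2020-06-25T13:44:14,601844,GET /data/m/goyas+ghosts+2006/17.mpg",
    "2020-06-25T13:44:14,773178,GET /data/m/toy+story+2+1999/59.mpg",
    "2020-06-25T13:44:14,601844,GET /data/m/goyas+ghosts+2006/17.mpg",  # made-up repeat
]
result = map_reduce(lines)
print(result)
```

Because `map_fn` looks at one line at a time and `reduce_fn` at one key at a time, both phases can be parallelized over partitions of the data.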
Image Source: Ville Tuulos (CC BY-SA 3.0)
Moving Computation is Cheaper than Moving Data -- Hadoop Documentation
Like shell programs: Read from stream, produce output in other stream. Loose coupling
createUser(id=5, name="Christian", dpt="SCS")
updateUser(id=5, dpt="ISR")
deleteUser(id=5)
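With an event log like the one above, the current state is derived by replaying the events in order. The sketch below encodes each event as a hypothetical `(kind, data)` tuple and keeps users in a dictionary; replaying a prefix of the log reconstructs an earlier state.

```python
def apply_event(state, event):
    """Return a new state with one event applied; the input state is not modified."""
    kind, data = event
    users = dict(state)
    fields = {k: v for k, v in data.items() if k != "id"}
    if kind == "createUser":
        users[data["id"]] = fields
    elif kind == "updateUser":
        users[data["id"]] = {**users[data["id"]], **fields}
    elif kind == "deleteUser":
        users.pop(data["id"], None)
    return users

def replay(events):
    """Derive the state by applying all events in order, starting from empty."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state

# Hypothetical tuple encoding of the three events above
log = [
    ("createUser", {"id": 5, "name": "Christian", "dpt": "SCS"}),
    ("updateUser", {"id": 5, "dpt": "ISR"}),
    ("deleteUser", {"id": 5}),
]
print(replay(log[:2]))
print(replay(log))
```

The log remains the source of truth: the full replay yields an empty state, yet the history of the deleted user is still available.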
On a shopping website, a customer may add an item to their cart and then remove it again. Although the second event cancels out the first event from the point of view of order fulfillment, it may be useful to know for analytics purposes that the customer was considering a particular item but then decided against it. Perhaps they will choose to buy it in the future, or perhaps they found a substitute. This information is recorded in an event log, but would be lost in a database that deletes items when they are removed from the cart.
Source: Greg Young. CQRS and Event Sourcing. Code on the Beach 2014 via Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly. 2017.
Source: Textractor (CC BY-SA 4.0)
Many data sources, many outputs, many copies
Which data is derived from what other data and how?
Is it reproducible? Are old versions archived?
How do you get the right data to the right place in the right format?
Plan and document data flows
Molham Aref "Business Systems with Machine Learning"
Extract, transform, load
Molham Aref "Business Systems with Machine Learning"
Ideally architectural planning upfront
Existing system: Analyze performance bottlenecks
G. Serazzi (Ed.). Performance Evaluation Modelling with JMT: Learning by Examples. Politecnico di Milano - DEI, TR 2008.09, 366 pp., June 2008
Mostly used during development phase in single components
Recommended reading: Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly. 2017.