Errata

Known corrections to the slide content. Each entry links to the affected slide.

Mislabeled Formulas / Diagrams

Lecture 04: Tuning — Slide 25

The L1 norm formula (sum of absolute values) is labeled "L2-Norm" in the blue box. This directly confuses students learning L1 vs. L2 regularization.
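
For quick reference, a minimal NumPy check contrasting the two norms (the vector here is illustrative, not from the slide):

```python
import numpy as np

w = np.array([3.0, -4.0])  # illustrative weight vector

l1 = np.sum(np.abs(w))        # L1 norm: sum of absolute values -> 7.0
l2 = np.sqrt(np.sum(w ** 2))  # L2 norm: root of sum of squares  -> 5.0
```

The computation shown in the blue box (sum of absolute values) is the L1 form, so the box should read "L1-Norm".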

Lecture 05: CNN — Slide 10

Laplacian edge detection kernel labeled as "Edge Detection (Sobel)". Commentary repeats the error.
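
For contrast, the common textbook 3×3 forms of the two kernels (the slide's exact kernel may differ slightly; these are the standard versions):

```python
import numpy as np

# Laplacian: a single isotropic second-derivative kernel
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]])

# Sobel: a *pair* of directional first-derivative kernels
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
sobel_y = sobel_x.T  # the vertical Sobel is the transpose of the horizontal one
```

A single symmetric kernel with a negative center is the telltale Laplacian shape; Sobel always appears as a horizontal/vertical pair.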

Lecture 08: Attention and Transformers — Slide 18

Query matrix diagram labeled "XΦ_k" (key subscript) instead of "XΦ_q" (query subscript). The slide's own formula and bullet text use the correct subscript.
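
The distinction matters because each projection has its own weight matrix; a shape-only sketch using the slide's Φ notation (the dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))      # 4 tokens, embedding dim 8
Phi_q = rng.normal(size=(8, 6))  # query projection weights
Phi_k = rng.normal(size=(8, 6))  # key projection weights

Q = X @ Phi_q      # the mislabeled diagram: queries come from Phi_q, not Phi_k
K = X @ Phi_k      # keys come from Phi_k
scores = Q @ K.T   # (4, 4) raw attention scores
```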

Lecture 09: Large Language Models — Slide 5

Word2Vec labeled as "masked language model" on the slide. That's BERT. The slide even correctly labels BERT two lines below, contradicting its own Word2Vec label.

Wrong Classification / Attribution

Lecture 08: Attention and Transformers — Slide 32

T5 listed as encoder-only alongside BERT. T5 is an encoder-decoder model. Commentary repeats the error.

Lecture 06: NLP and Representation Learning — Slide 21

Commentary says OpenAI models produce 768-dimensional embeddings. 768 is BERT-base (Google). OpenAI's embedding models use 1536 dimensions.

Lecture 09: Large Language Models — Slide 17

Google Gemini Ultra listed as a reasoning model alongside o1/o3-mini and DeepSeek R1. Gemini Ultra is the original large Gemini 1.0 model, not a reasoning-focused model.

Numerical / Calculation Errors

Lecture 06: NLP and Representation Learning — Slide 13

States 20^10 = 10^12. Actual value is ~1.024 × 10^13, off by an order of magnitude. Both slide and commentary have the error.
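
The correct value is easy to verify:

```python
value = 20 ** 10       # 2**10 * 10**10 = 1024 * 10**10
print(value)           # 10240000000000
print(f"{value:.3e}")  # 1.024e+13 -- about 10**13, not 10**12
```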

Lecture 09: Large Language Models — Slide 15

Slide says ~168 GB VRAM for a 70B model at 16-bit, then immediately shows the math: 70B × 2 bytes = 140 GB. The two numbers contradict each other on the same slide.
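
The slide's own arithmetic is the correct back-of-envelope figure for the weights alone (runtime overheads such as the KV cache and activations come on top, but that is not what the slide is counting):

```python
params = 70e9        # 70B parameters
bytes_per_param = 2  # 16-bit precision
weights_gb = params * bytes_per_param / 1e9
print(weights_gb)    # 140.0 -- matches the shown math, not the ~168 GB claim
```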

Lecture 14: ML Use-Cases — Slide 28

States "3 customized amenity objects" — almost certainly should be 30, consistent with all surrounding slides. Commentary repeats the error.

Systematic Commentary-Slide Misalignment

Lecture 05: CNN — Slides 13–36

24 consecutive slides have headings and commentary shifted one position ahead of the slide images. Every commentary from slide 13 onward describes the next slide's content.

Lecture 02: Training Neural Networks — Slides 1–6

Commentary shifted ~2 positions. Slides 1–3 describe Keras code that was removed from the web page; slides 4–6 commentary narrates earlier slides' content.

Swapped / Reversed Labels

Lecture 02: Training Neural Networks — Slide 28

Slide says "Slope (w₁) and intercept (w₂)" but the model defined on Slide 25 is y = w₁ + w₂x, making w₁ the intercept and w₂ the slope. Labels are reversed.
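
Evaluating the Slide 25 model at x = 0 makes the roles unambiguous:

```python
# Model as defined on Slide 25: y = w1 + w2 * x
def predict(x, w1, w2):
    return w1 + w2 * x

# At x = 0 only w1 survives, so w1 is the intercept and w2 is the slope
print(predict(0.0, w1=1.0, w2=3.0))  # 1.0
print(predict(1.0, w1=1.0, w2=3.0))  # 4.0
```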

Lecture 02: Training Neural Networks — Slide 30

Commentary says the green line is the noisy SGD (lr=0.1, mini-batch=1) and the blue line is smoother (lr=0.05, mini-batch=5). The slide legend shows the opposite: blue is noisy, green is smooth.

Lecture 12: Recommendations — Slide 14

Slide lists "Privacy" as a soft objective. Commentary replaces it with "Fairness" — a different concept entirely.

Incorrect Technical Descriptions

Lecture 01: Introduction to Neural Networks — Slide 18

Both Linear and ReLU output sections say "Appropriate for general regression problems." The ReLU description should specify non-negative regression. The commentary gets this right; the slide doesn't.

Lecture 03: Architecture — Slide 21

Commentary says "normalizing so the variance falls between -1 and 1." Variance becomes ~1 (a scalar); it's the data values that fall in [-1, 1].

Lecture 03: Architecture — Slide 30

Slide says "reduce [learning rate] by an order of magnitude" then gives examples (0.005, 0.0025, 0.001) that are reductions by factors of 2–2.5, not 10.
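
The mismatch is easy to check from the slide's own numbers:

```python
lrs = [0.005, 0.0025, 0.001]  # the schedule given on the slide
factors = [round(lrs[i] / lrs[i + 1], 3) for i in range(len(lrs) - 1)]
print(factors)  # [2.0, 2.5] -- not the 10x that "an order of magnitude" implies
```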

Lecture 12: Recommendations — Slide 29

Commentary describes Q as K × M but the slide defines Q = V_k, which is m × k. The transposed dimensions could confuse students reconciling formulas.

Lecture 12: Recommendations — Slide 33

Both slide and commentary describe F matrix entries as "how much item i increases proximity to concept d" — that describes Y (items × concepts), not F (users × items).

Inconsistent Facts Across Slides

Lecture 09: Large Language Models — Slides 2, 11, 12, 19

LLM training phases described as "three" (slides 2, 19), "four" (slide 11), and "two" (slide 12 commentary) without reconciliation.

Lecture 05: CNN — Slides 26, 27

ResNet slide cites arxiv link 1512.00567, which is the Inception v3 paper. The correct ResNet link is 1512.03385.

Lecture 06: NLP and Representation Learning — Slide 17

Commentary dates Word2Vec popularity to 2011. The Word2Vec paper was published in 2013.

Wrong Content on Slide

Lecture 01: Introduction to Neural Networks — Slide 19

Softmax chart plots 3 outputs (y1, y2, y3) but the slide's own text gives a 4-class example (Red, Green, Blue, Yellow). Internal contradiction.
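
A four-class softmax has four outputs by construction, so the chart should plot y1 through y4; a minimal check with hypothetical logits (the values are illustrative, not from the slide):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, 0.1])  # one logit per class: R, G, B, Y
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.shape)  # (4,) -- four outputs, one per class
print(probs.sum())  # ~1.0
```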

Lecture 03: Architecture — Slide 32

Commentary describes manual checkpointing ("save weights, roll back if you overshoot") but the slide shows keras.callbacks.EarlyStopping — a different strategy.

Lecture 14: ML Use-Cases — Slide 25

Slide text says "selected 40" classes but the image labels on the same slide say "30 Classes."