
Under the Hood of Large Language Models
8 chapters and 47 canonical sections synced from the Cuantum content database.
Author: Cuantum Tech.
Chapters: 8
Reading time: ~9h
Level: Professional
Language: English
Edition: 2025
Chapters & sections
8 chapters - 47 sections

01. Chapter 1: What Are LLMs? From Transformers to Titans (5 sections)
02. Chapter 2: Tokenization and Embeddings (5 sections)
03. Chapter 3: Anatomy of an LLM (5 sections)
04. Chapter 4: Training LLMs from Scratch (6 sections)
    4.1 Data Collection, Cleaning, Deduplication, and Filtering (12 min)
    4.2 Curriculum Learning, Mixture Datasets, and Synthetic Data (12 min)
    4.3 Infrastructure: Distributed Training, GPUs vs TPUs vs Accelerators (12 min)
    4.4 Cost Optimization & Sustainability in Large-Scale Training (12 min)
    Chapter 4 Summary – Training LLMs from Scratch (12 min)
    Practical Exercises – Chapter 4 (12 min)
05. Chapter 5: Beyond Text: Multimodal LLMs (5 sections)
06. Quiz (2 sections)
07. Project 1: Build a Toy Transformer from Scratch in PyTorch (8 sections)
08. Project 2: Train a Custom Domain-Specific Tokenizer (e.g., for legal or medical texts) (11 sections)
    0. Setup (12 min)
    1. Gather a Representative Mini-Corpus (12 min)
    2. Train a BPE Tokenizer (🤗 tokenizers) (12 min)
    3. Train a SentencePiece Tokenizer (Unigram or BPE) (12 min)
    4. Wrap Your Tokenizer for Transformers (12 min)
    5. Evaluate Tokenizer Quality (12 min)
    6. Add a User Vocabulary (optional but powerful) (12 min)
    7. Save, Load, and Version (12 min)
    8. Plug Into a Small Model (sanity run) (12 min)
    Learning outcomes (12 min)
    Pitfalls & Tips (12 min)