
# Fine-Tuning Foundational Models for MangaAssist

## Overview

This folder contains 18 deep-dive scenario documents covering fine-tuning, model customization, post-fine-tuning inspection techniques, and capstone intuition synthesis for the MangaAssist chatbot. Each document is written as a group discussion among five engineers and includes mathematical derivations with geometric intuition, layer-by-layer mermaid diagrams or inspection workflows showing what happens inside models during training, production-grade code, and references to foundational research papers.

## MangaAssist Scenario Companion Set

The newer MangaAssist-grounded companion documents sit beside the original deep dives and cover the non-intent fine-tuning topics: embeddings, reranking, RAFT, LoRA/QLoRA, prompt tuning, QAT, distillation, MoE, continual learning, few-shot learning, multi-task learning, sentiment, DPO/RLHF, MLOps, data curation, interpretability, and capstone decision-making.

## Reading Order

### Tier 1: Start Here (Core Fine-Tuning for MangaAssist Production)

These are the techniques already deployed or planned for MangaAssist:

| # | Document | What You Learn | Difficulty |
|---|----------|----------------|------------|
| 01 | Intent Classifier Fine-Tuning | DistilBERT fine-tuning for 10-intent routing, focal loss, class imbalance | Intermediate |
| 02 | Embedding Model Fine-Tuning | Contrastive learning adapter on Titan V2, InfoNCE loss, hard negative mining | Intermediate |
| 03 | Cross-Encoder Reranker Fine-Tuning | ms-marco-MiniLM domain adaptation, pairwise ranking loss, LambdaRank | Intermediate |
| 08 | Sentiment Classifier Fine-Tuning | Frustration detection for escalation, gradual unfreezing, transfer learning | Intermediate |
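Doc 01's focal loss fits in a few lines. A minimal single-example sketch, assuming the common `gamma = 2.0`, `alpha = 0.25` defaults from Lin et al. rather than MangaAssist's tuned values:

```python
import math

def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    p_true is the model's probability for the correct class; the
    (1 - p_t)^gamma factor down-weights easy examples so rare intents
    dominate the gradient.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# An easy example (p = 0.9) contributes far less than a hard one (p = 0.1):
easy = focal_loss(0.9)   # ~0.00026
hard = focal_loss(0.1)   # ~0.466
```

At `gamma = 0` this reduces to (alpha-weighted) cross-entropy; raising `gamma` sharpens the focus on misclassified examples.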

### Tier 2: Advanced Model Customization

These extend MangaAssist with more sophisticated fine-tuning:

| # | Document | What You Learn | Difficulty |
|---|----------|----------------|------------|
| 04 | LoRA and QLoRA LLM Customization | Low-rank adapters for Claude/open-source LLMs, 4-bit quantization + training | Advanced |
| 05 | Knowledge Distillation Pipeline | Teacher (Sonnet) → Student distillation, dark knowledge, temperature scaling | Advanced |
| 10 | RLHF and DPO Alignment | Reinforcement learning from human feedback, Direct Preference Optimization | Advanced |
| 11 | Prompt Tuning and Prefix Tuning | Soft prompts as lightweight alternative to LoRA, P-Tuning v2 | Advanced |
| 14 | Retrieval-Augmented Fine-Tuning (RAFT) | Fine-tune LLM to properly use retrieved context, reduce hallucination | Advanced |
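The LoRA update covered in Doc 04 is compact enough to sketch with NumPy. The dimensions, rank, and `alpha` below are illustrative, not values from the scenario docs:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2            # hidden size and LoRA rank (illustrative)
alpha = 16             # LoRA scaling hyperparameter

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01    # trainable down-projection
B = np.zeros((r, d))                  # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus scaled low-rank update: y = xW + (alpha/r) * xAB
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(1, d))
y = lora_forward(x)    # with B = 0 the adapter starts as an exact no-op
```

Zero-initializing `B` is the standard trick: training begins from the pretrained model's behavior, and only 2·d·r parameters per layer are ever updated.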

### Tier 3: Specialized Techniques

For specific production challenges:

| # | Document | What You Learn | Difficulty |
|---|----------|----------------|------------|
| 06 | Continual Learning and Catastrophic Forgetting | EWC, rehearsal buffers, drift detection, monthly retraining | Advanced |
| 07 | Few-Shot Learning and Rapid Adaptation | Prototypical networks, MAML, SetFit, adding new intents with <100 examples | Advanced |
| 12 | Quantization-Aware Training | INT8/INT4 quantization, GPTQ, AWQ, SmoothQuant | Advanced |
| 13 | Multi-Task Learning | Single model for intent + sentiment + entities, gradient surgery | Expert |
| 15 | Mixture of Experts Routing | MoE with genre-specific experts, gating networks, load balancing | Expert |
| 17 | Visualization and Interpretability After Fine-Tuning | BertViz, TransformerLens, LIT, Ecco, Captum, and Netron for post-fine-tuning analysis | Advanced |

### Tier 4: Infrastructure and Data

| # | Document | What You Learn | Difficulty |
|---|----------|----------------|------------|
| 09 | Training Infrastructure and MLOps | SageMaker, distributed training, MLflow, CI/CD for models | Intermediate |
| 16 | Data Curation and Synthetic Generation | Dataset preparation, synthetic data with Claude, confident learning | Intermediate |

### Tier 5: Capstone Synthesis

| # | Document | What You Learn | Difficulty |
|---|----------|----------------|------------|
| 18 | Intuition Scenario and Strategic Direction | Meta-learning across all 17 techniques, intervention ladders, CPQ instincts, and career growth signals | Advanced |

## Group Discussion Personas

Every document features debates among the following five engineers, modeled on an Amazon-style cross-functional team:

| Persona | Role | Focus Area | Typical Question |
|---------|------|------------|------------------|
| Priya | Senior ML Engineer | Model architecture, training code, loss functions, math derivations | "What does the gradient look like at layer 4 when we use focal loss?" |
| Marcus | Staff Platform Architect | Infrastructure, deployment, scaling, latency budgets | "Can we serve 10 concurrent LoRA adapters within our P95 latency SLA?" |
| Aiko | Data Scientist | Data quality, evaluation metrics, experiment design, statistical rigor | "The ablation study shows diminishing returns above rank 12." |
| Jordan | MLOps Engineer | CI/CD, monitoring, drift detection, reproducibility, model registry | "How do we detect if the retrained model regresses on old intents?" |
| Sam | Cost-Aware PM | ROI, cost-per-quality-point, build vs buy, business impact | "That's a $170K/year difference for 2.7% accuracy. What's the CPQ?" |

## Discussion Format

```markdown
### Decision Point: Should we use LoRA rank 8 or 16?

**Priya (ML Engineer):** Rank 16 gives us more expressive capacity to capture
manga-specific patterns. The SVD analysis of weight updates shows the top 16
singular values capture 94% of the variance, vs 87% for rank 8...

**Marcus (Architect):** But rank 16 doubles adapter memory from 4MB to 8MB per
adapter. With 10 concurrent adapters for different intents, that's 80MB just
for adapters...

**Aiko (Data Scientist):** The ablation study on our validation set shows:
rank 4 = 89.1%, rank 8 = 91.8%, rank 12 = 92.3%, rank 16 = 92.5%.
Diminishing returns above rank 8...

**Sam (PM):** The cost difference is $240/month (rank 8) vs $480/month (rank 16)
for GPU memory. For 0.7% accuracy gain, that's $342 per quality point...

**Jordan (MLOps):** More parameters means longer CI validation cycles. Rank 8
validates in 12 minutes, rank 16 takes 22 minutes...

> **Resolution:** Rank 8 chosen. The 91.8% accuracy meets our 90% threshold,
> the cost-per-quality-point for rank 16 ($342/point) exceeds our $300
> threshold, and faster CI cycles matter for weekly adapter updates.
```
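Sam's cost-per-quality-point figure follows directly from the numbers in the debate; a sketch with a hypothetical helper name:

```python
def cost_per_quality_point(cost_a, cost_b, acc_a, acc_b):
    """Dollars per percentage point of accuracy gained by the pricier option."""
    return (cost_b - cost_a) / (acc_b - acc_a)

# Rank 8: $240/month at 91.8%; rank 16: $480/month at 92.5%
cpq = cost_per_quality_point(240, 480, 91.8, 92.5)
# ($480 - $240) / 0.7 points ≈ $343 per quality point per month
```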

## Mathematical Prerequisites

To get the most from these documents, you should be comfortable with:

| Topic | What You Need | Used In |
|-------|---------------|---------|
| Linear Algebra | Matrix multiplication, SVD, eigenvalues, rank | Docs 04, 05, 12, 15 |
| Calculus | Partial derivatives, chain rule, gradients | All docs |
| Probability | Bayes' theorem, KL-divergence, entropy, softmax | Docs 01, 05, 06, 10 |
| Optimization | SGD, Adam, learning rate schedules, loss landscapes | All docs |
| Information Theory | Cross-entropy, mutual information, InfoNCE | Docs 01, 02, 05 |
| Statistics | Hypothesis testing, confidence intervals, Cohen's kappa | Docs 08, 16, 17 |
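As a taste of the information-theory prerequisites, here is a minimal InfoNCE sketch (the loss used in Docs 01, 02, and 05); the temperature `tau` and the toy vectors are illustrative:

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE: cross-entropy of the positive against {positive} ∪ negatives.

    Vectors are L2-normalized so dot products are cosine similarities.
    """
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q = norm(query)
    cands = norm(np.vstack([positive[None, :], negatives]))  # positive at index 0
    logits = cands @ q / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

q = np.array([1.0, 0.0, 0.0])
pos = np.array([0.9, 0.1, 0.0])               # nearly aligned with q
negs = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])            # orthogonal to q
loss_good = info_nce(q, pos, negs)            # near zero
loss_bad = info_nce(q, negs[0], np.vstack([pos, negs[1]]))  # large
```

Minimizing this pulls the query toward its positive and pushes it away from negatives, which is exactly what hard negative mining exploits.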

## Master Research Paper Reading List

### Foundational (Read These First)

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| Attention Is All You Need (Vaswani et al.) | 2017 | Transformer architecture | All docs |
| BERT (Devlin et al.) | 2018 | Bidirectional pre-training | Docs 01, 02, 08 |
| DistilBERT (Sanh et al.) | 2019 | Knowledge distillation for BERT | Docs 01, 05 |

### Fine-Tuning Techniques

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| How to Fine-Tune BERT for Text Classification (Sun et al.) | 2019 | Layer-wise LR, gradual unfreezing | Docs 01, 08 |
| ULMFiT (Howard & Ruder) | 2018 | Discriminative fine-tuning, slanted triangular LR | Doc 08 |
| LoRA (Hu et al.) | 2021 | Low-rank adaptation of large language models | Doc 04 |
| QLoRA (Dettmers et al.) | 2023 | 4-bit quantization + LoRA | Docs 04, 12 |
| Prefix-Tuning (Li & Liang) | 2021 | Learnable prefix key-value pairs | Doc 11 |
| P-Tuning v2 (Liu et al.) | 2021 | Deep prompt tuning at every layer | Doc 11 |
| Prompt Tuning (Lester et al.) | 2021 | Soft prompts scale with model size | Doc 11 |

### Contrastive Learning and Retrieval

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| SimCLR (Chen et al.) | 2020 | Contrastive learning framework | Doc 02 |
| Dense Passage Retrieval (Karpukhin et al.) | 2020 | Dual-encoder for retrieval | Doc 02 |
| Sentence-BERT (Reimers & Gurevych) | 2019 | Sentence embeddings via siamese BERT | Doc 02 |
| ColBERT (Khattab & Zaharia) | 2020 | Late interaction for efficient reranking | Doc 03 |
| MS-MARCO (Bajaj et al.) | 2016 | Large-scale passage ranking dataset | Doc 03 |

### Alignment and Preference Learning

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| InstructGPT (Ouyang et al.) | 2022 | RLHF pipeline for instruction following | Doc 10 |
| DPO (Rafailov et al.) | 2023 | Direct preference optimization without reward model | Doc 10 |
| Constitutional AI (Bai et al.) | 2022 | Self-supervision for alignment | Doc 10 |
| RLAIF (Lee et al.) | 2023 | AI feedback instead of human feedback | Doc 10 |
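The DPO objective from Rafailov et al. reduces to a one-line loss over log-probabilities, which is why it needs no separate reward model. A per-pair sketch with illustrative numbers (`beta = 0.1` is a common choice, not a value from the docs):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO: -log sigmoid(beta * implicit reward margin).

    Each response's implicit reward is the policy's log-prob advantage over
    the frozen reference model; w = chosen response, l = rejected response.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy already prefers the chosen response more than the reference does:
low = dpo_loss(logp_w=-4.0, logp_l=-9.0, ref_logp_w=-5.0, ref_logp_l=-6.0)
# Policy prefers the rejected response: the loss grows
high = dpo_loss(logp_w=-9.0, logp_l=-4.0, ref_logp_w=-5.0, ref_logp_l=-6.0)
```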

### Distillation and Compression

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| Distilling Knowledge (Hinton et al.) | 2015 | Temperature-scaled soft labels | Doc 05 |
| TinyBERT (Jiao et al.) | 2019 | Two-stage task-agnostic + task-specific distillation | Doc 05 |
| GPTQ (Frantar et al.) | 2022 | One-shot weight quantization via Hessian | Doc 12 |
| AWQ (Lin et al.) | 2023 | Activation-aware weight quantization | Doc 12 |
| SmoothQuant (Xiao et al.) | 2022 | Migrate quantization difficulty from activations to weights | Doc 12 |
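Hinton et al.'s temperature-scaled soft-label term can be sketched in plain Python; the logits and `T = 4` below are illustrative:

```python
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_kl(teacher_logits, student_logits, T=4.0):
    """Distillation term: T^2 * KL(teacher_T || student_T).

    T > 1 softens both distributions so the teacher's "dark knowledge"
    (relative probabilities of wrong classes) carries gradient signal;
    the T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss = distill_kl([5.0, 2.0, 1.0], [4.0, 2.5, 0.5], T=4.0)
zero = distill_kl([5.0, 2.0, 1.0], [5.0, 2.0, 1.0], T=4.0)  # identical → 0
```

In practice this KL term is blended with the hard-label cross-entropy; the mixing weight is a tuning choice the scenario docs cover.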

### Continual and Few-Shot Learning

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| EWC (Kirkpatrick et al.) | 2017 | Fisher Information for catastrophic forgetting prevention | Doc 06 |
| Progressive Neural Networks (Rusu et al.) | 2016 | Lateral connections for continual learning | Doc 06 |
| Prototypical Networks (Snell et al.) | 2017 | Distance-based few-shot classification | Doc 07 |
| MAML (Finn et al.) | 2017 | Model-agnostic meta-learning | Doc 07 |
| SetFit (Tunstall et al.) | 2022 | Few-shot fine-tuning via contrastive learning | Doc 07 |
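The EWC penalty from Kirkpatrick et al. is just a quadratic anchor weighted by Fisher information. A toy sketch (the parameter values and `lam` are illustrative):

```python
def ewc_penalty(theta, theta_star, fisher, lam=1000.0):
    """EWC regularizer: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.

    fisher approximates each parameter's importance to the old task, so
    moving an important parameter away from its old value theta*_i is
    penalized heavily while unimportant parameters stay free to adapt.
    """
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher, theta, theta_star)
    )

# Moving an important parameter (F = 0.9) costs far more than moving an
# unimportant one (F = 0.01) by the same amount:
important = ewc_penalty([1.1, 0.0], [1.0, 0.0], fisher=[0.9, 0.01])
unimportant = ewc_penalty([1.0, 0.1], [1.0, 0.0], fisher=[0.9, 0.01])
```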

### Multi-Task and Expert Models

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| Multi-Task Uncertainty (Kendall et al.) | 2018 | Homoscedastic uncertainty for task weighting | Doc 13 |
| GradNorm (Chen et al.) | 2018 | Gradient normalization for multi-task | Doc 13 |
| Gradient Surgery (Yu et al.) | 2020 | Project conflicting gradients | Doc 13 |
| Switch Transformers (Fedus et al.) | 2021 | Sparse MoE with simplified routing | Doc 15 |
| Mixtral (Jiang et al.) | 2024 | Production MoE with expert routing | Doc 15 |
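The gating idea behind the sparse MoE papers can be sketched as top-2 routing over expert logits, in the style Mixtral uses; the four-expert setup below is illustrative:

```python
import math

def top2_gate(logits):
    """Top-2 gating: send each token to the two highest-scoring experts,
    with mixture weights renormalized over just those two."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = ranked[:2]
    exps = [math.exp(logits[i]) for i in top]
    s = sum(exps)
    return {i: e / s for i, e in zip(top, exps)}

# Four genre-specific experts; one token's router logits:
weights = top2_gate([2.0, 0.5, 1.5, -1.0])   # experts 0 and 2 selected
```

Only the selected experts run, so compute per token stays roughly constant as experts are added; the load-balancing losses in Doc 15 exist to keep this routing from collapsing onto a few experts.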

### RAG and Retrieval-Augmented Training

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| RAFT (Zhang et al.) | 2024 | Fine-tune LLM to use retrieved context | Doc 14 |
| Self-RAG (Asai et al.) | 2023 | Self-reflective retrieval-augmented generation | Doc 14 |
| REALM (Guu et al.) | 2020 | Retrieval-augmented language model pre-training | Doc 14 |

### Data Quality and Synthesis

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| Confident Learning (Northcutt et al.) | 2021 | Label noise estimation and correction | Doc 16 |
| Self-Instruct (Wang et al.) | 2022 | Synthetic instruction generation | Doc 16 |
| Alpaca (Taori et al.) | 2023 | Instruction tuning with synthetic data | Doc 16 |

### Infrastructure

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|---------------|
| ZeRO (Rajbhandari et al.) | 2019 | Memory-efficient distributed training | Doc 09 |
| Mixed Precision Training (Micikevicius et al.) | 2017 | FP16 training with loss scaling | Doc 09 |
| Focal Loss (Lin et al.) | 2017 | Class imbalance via modulating factor | Docs 01, 08 |
| LambdaRank (Burges et al.) | 2005 | Learning to rank with NDCG gradients | Doc 03 |

## Cross-References to Other Project Folders

| This Folder's Doc | References | For |
|-------------------|------------|-----|
| Doc 01 (Intent Classifier) | 04b-architecture-lld.md | LLD-2 Intent classifier design spec |
| Doc 02 (Embeddings) | 04b-architecture-lld.md | LLD-3 RAG pipeline and vector store |
| Doc 04 (LoRA/QLoRA) | Prompt-Engineering/ | When prompting is enough vs fine-tuning |
| Doc 05 (Distillation) | Model-Inference/ | Inference pipeline constraints |
| Doc 09 (MLOps) | MLflow/ | Experiment tracking integration |
| Doc 14 (RAFT) | Prompt-Engineering/ | RAG grounding and prompt design |
| All docs | model_evaluation_framework_deep_dive.md | 7-dimensional evaluation framework |
| All docs | 10-ai-llm-design.md | Model selection and CPQ framework |
| All docs | Optimization-Tradeoffs-User-Stories/ | Cost-quality-performance trilemma |

## How to Use These Documents

  1. For interview prep: Start with the README, then read Tier 1 docs, then use the decision-point sections plus Docs 17 and 18 to rehearse architecture, training, interpretability, and strategic tradeoffs.
  2. For implementation: Read the relevant scenario doc end-to-end, then reference Doc 09 (MLOps) for deployment.
  3. For understanding the math: Each doc is self-contained. Read the "Mathematical Foundations" section, then trace through the mermaid layer diagrams.
  4. For the group discussion format: Each decision point shows how a real team would debate the tradeoff. Use these as templates for your own technical discussions, then read Doc 18 for the cross-technique synthesis.