# Fine-Tuning Foundational Models for MangaAssist
## Overview

This folder contains 18 deep-dive scenario documents covering fine-tuning, model customization, post-fine-tuning inspection techniques, and capstone intuition synthesis for the MangaAssist chatbot. Each document is written as a group discussion among five engineers and includes mathematical derivations with geometric intuition, layer-by-layer mermaid diagrams or inspection workflows showing what happens inside models during training, production-grade code, and references to foundational research papers.
## MangaAssist Scenario Companion Set

The newer MangaAssist-grounded scenario documents form a companion set that sits beside the original deep dives and covers the non-intent fine-tuning topics: embeddings, reranking, RAFT, LoRA/QLoRA, prompt tuning, QAT, distillation, MoE, continual learning, few-shot learning, multi-task learning, sentiment, DPO/RLHF, MLOps, data curation, interpretability, and capstone decision-making.
## Reading Order

### Tier 1: Start Here (Core Fine-Tuning for MangaAssist Production)

These cover the techniques already deployed or planned for MangaAssist.

### Tier 2: Advanced Model Customization

These extend MangaAssist with more sophisticated fine-tuning techniques.

### Tier 3: Specialized Techniques

For specific production challenges:
| # | Document | What You Learn | Difficulty |
|---|----------|----------------|------------|
| 06 | Continual Learning and Catastrophic Forgetting | EWC, rehearsal buffers, drift detection, monthly retraining | Advanced |
| 07 | Few-Shot Learning and Rapid Adaptation | Prototypical networks, MAML, SetFit, adding new intents with <100 examples | Advanced |
| 12 | Quantization-Aware Training | INT8/INT4 quantization, GPTQ, AWQ, SmoothQuant | Advanced |
| 13 | Multi-Task Learning | Single model for intent + sentiment + entities, gradient surgery | Expert |
| 15 | Mixture of Experts Routing | MoE with genre-specific experts, gating networks, load balancing | Expert |
| 17 | Visualization and Interpretability After Fine-Tuning | BertViz, TransformerLens, LIT, Ecco, Captum, and Netron for post-fine-tuning analysis | Advanced |
### Tier 4: Infrastructure and Data

### Tier 5: Capstone Synthesis
## Group Discussion Personas

Every document features debates among the same five engineers, modeled on an Amazon-style cross-functional team:
| Persona | Role | Focus Area | Typical Question |
|---------|------|------------|-------------------|
| Priya | Senior ML Engineer | Model architecture, training code, loss functions, math derivations | "What does the gradient look like at layer 4 when we use focal loss?" |
| Marcus | Staff Platform Architect | Infrastructure, deployment, scaling, latency budgets | "Can we serve 10 concurrent LoRA adapters within our P95 latency SLA?" |
| Aiko | Data Scientist | Data quality, evaluation metrics, experiment design, statistical rigor | "The ablation study shows diminishing returns above rank 12." |
| Jordan | MLOps Engineer | CI/CD, monitoring, drift detection, reproducibility, model registry | "How do we detect if the retrained model regresses on old intents?" |
| Sam | Cost-Aware PM | ROI, cost-per-quality-point, build vs buy, business impact | "That's a $170K/year difference for 2.7% accuracy. What's the CPQ?" |
### Decision Point: Should we use LoRA rank 8 or 16?
**Priya (ML Engineer):** Rank 16 gives us more expressive capacity to capture
manga-specific patterns. The SVD analysis of weight updates shows the top 16
singular values capture 94% of the variance, vs 87% for rank 8...
**Marcus (Architect):** But rank 16 doubles adapter memory from 4MB to 8MB per
adapter. With 10 concurrent adapters for different intents, that's 80MB just
for adapters...
**Aiko (Data Scientist):** The ablation study on our validation set shows:
rank 4 = 89.1%, rank 8 = 91.8%, rank 12 = 92.3%, rank 16 = 92.5%.
Diminishing returns above rank 8...
**Sam (PM):** The cost difference is $240/month (rank 8) vs $480/month (rank 16)
for GPU memory. For 0.7% accuracy gain, that's $342 per quality point...
**Jordan (MLOps):** More parameters means longer CI validation cycles. Rank 8
validates in 12 minutes, rank 16 takes 22 minutes...
> **Resolution:** Rank 8 chosen. The 91.8% accuracy meets our 90% threshold,
> the cost-per-quality-point for rank 16 ($342/point) exceeds our $300
> threshold, and faster CI cycles matter for weekly adapter updates.
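
Priya's 94% vs. 87% figures come from an SVD of the fine-tuning weight update. Here is a minimal sketch of that check, assuming PyTorch; the synthetic `delta_w` below is a stand-in, since in practice you would diff a fine-tuned projection matrix against its pretrained counterpart:

```python
import torch

def rank_energy(delta_w: torch.Tensor, ranks=(4, 8, 12, 16)) -> dict:
    """Fraction of the update's squared Frobenius norm ("variance")
    captured by the top-r singular values, per candidate LoRA rank."""
    s = torch.linalg.svdvals(delta_w)                   # singular values, descending
    cumulative = s.pow(2).cumsum(dim=0) / s.pow(2).sum()
    return {r: round(cumulative[r - 1].item(), 3) for r in ranks}

# Synthetic stand-in with a decaying spectrum; real usage:
# delta_w = W_finetuned - W_pretrained for one attention projection.
torch.manual_seed(0)
u, _ = torch.linalg.qr(torch.randn(768, 768))
v, _ = torch.linalg.qr(torch.randn(768, 768))
delta_w = u @ torch.diag(torch.logspace(0, -2, 768)) @ v.T
print(rank_energy(delta_w))  # {4: ..., 8: ..., 12: ..., 16: ...}
```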
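
Sam's cost-per-quality-point is simply the marginal monthly cost divided by the marginal accuracy gain. A quick sketch using only the numbers quoted in the discussion above:

```python
# Rank -> validation accuracy (%) and GPU-memory cost ($/month), as quoted above.
accuracy = {4: 89.1, 8: 91.8, 12: 92.3, 16: 92.5}
gpu_cost = {8: 240, 16: 480}

gain = accuracy[16] - accuracy[8]      # 0.7 accuracy points
extra = gpu_cost[16] - gpu_cost[8]     # $240/month extra for rank 16
cpq = extra / gain                     # cost per quality point
print(f"rank 16 over rank 8: ${cpq:.0f}/point/month")  # ~$343; Sam's $342 truncates the same ratio
```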
## Mathematical Prerequisites

To get the most from these documents, you should be comfortable with the following (a short numeric self-test follows the table):
| Topic | What You Need | Used In |
|-------|---------------|---------|
| Linear Algebra | Matrix multiplication, SVD, eigenvalues, rank | Docs 04, 05, 12, 15 |
| Calculus | Partial derivatives, chain rule, gradients | All docs |
| Probability | Bayes' theorem, KL-divergence, entropy, softmax | Docs 01, 05, 06, 10 |
| Optimization | SGD, Adam, learning rate schedules, loss landscapes | All docs |
| Information Theory | Cross-entropy, mutual information, InfoNCE | Docs 01, 02, 05 |
| Statistics | Hypothesis testing, confidence intervals, Cohen's kappa | Docs 08, 16, 17 |
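
As a quick self-check on the probability and information-theory rows, here is a minimal, dependency-free sketch of softmax, cross-entropy, and KL-divergence; all numbers are arbitrary:

```python
import math

def softmax(logits):
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 0.5, -1.0]                    # arbitrary 3-class scores
p = softmax(logits)

ce = -math.log(p[0])                         # cross-entropy vs. a one-hot target (class 0)

q = [0.7, 0.2, 0.1]                          # arbitrary reference distribution
kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))  # KL(q || p)

print(f"softmax={p}\ncross-entropy={ce:.4f}\nKL(q||p)={kl:.4f}")
```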
## Master Research Paper Reading List

### Foundational (Read These First)

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| Attention Is All You Need (Vaswani et al.) | 2017 | Transformer architecture | All docs |
| BERT (Devlin et al.) | 2018 | Bidirectional pre-training | Docs 01, 02, 08 |
| DistilBERT (Sanh et al.) | 2019 | Knowledge distillation for BERT | Docs 01, 05 |
### Fine-Tuning Techniques

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| How to Fine-Tune BERT for Text Classification (Sun et al.) | 2019 | Layer-wise LR, gradual unfreezing | Docs 01, 08 |
| ULMFiT (Howard & Ruder) | 2018 | Discriminative fine-tuning, slanted triangular LR | Doc 08 |
| LoRA (Hu et al.) | 2021 | Low-rank adaptation of large language models | Doc 04 |
| QLoRA (Dettmers et al.) | 2023 | 4-bit quantization + LoRA | Docs 04, 12 |
| Prefix-Tuning (Li & Liang) | 2021 | Learnable prefix key-value pairs | Doc 11 |
| P-Tuning v2 (Liu et al.) | 2021 | Deep prompt tuning at every layer | Doc 11 |
| Prompt Tuning (Lester et al.) | 2021 | Soft prompts scale with model size | Doc 11 |
### Contrastive Learning and Retrieval

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| SimCLR (Chen et al.) | 2020 | Contrastive learning framework | Doc 02 |
| Dense Passage Retrieval (Karpukhin et al.) | 2020 | Dual-encoder for retrieval | Doc 02 |
| Sentence-BERT (Reimers & Gurevych) | 2019 | Sentence embeddings via siamese BERT | Doc 02 |
| ColBERT (Khattab & Zaharia) | 2020 | Late interaction for efficient reranking | Doc 03 |
| MS-MARCO (Bajaj et al.) | 2016 | Large-scale passage ranking dataset | Doc 03 |
### Alignment and Preference Learning

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| InstructGPT (Ouyang et al.) | 2022 | RLHF pipeline for instruction following | Doc 10 |
| DPO (Rafailov et al.) | 2023 | Direct preference optimization without reward model | Doc 10 |
| Constitutional AI (Bai et al.) | 2022 | Self-supervision for alignment | Doc 10 |
| RLAIF (Lee et al.) | 2023 | AI feedback instead of human feedback | Doc 10 |
### Distillation and Compression

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| Distilling Knowledge (Hinton et al.) | 2015 | Temperature-scaled soft labels | Doc 05 |
| TinyBERT (Jiao et al.) | 2019 | Two-stage task-agnostic + task-specific distillation | Doc 05 |
| GPTQ (Frantar et al.) | 2022 | One-shot weight quantization via Hessian | Doc 12 |
| AWQ (Lin et al.) | 2023 | Activation-aware weight quantization | Doc 12 |
| SmoothQuant (Xiao et al.) | 2022 | Migrate quantization difficulty from activations to weights | Doc 12 |
### Continual and Few-Shot Learning

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| EWC (Kirkpatrick et al.) | 2017 | Fisher Information for catastrophic forgetting prevention | Doc 06 |
| Progressive Neural Networks (Rusu et al.) | 2016 | Lateral connections for continual learning | Doc 06 |
| Prototypical Networks (Snell et al.) | 2017 | Distance-based few-shot classification | Doc 07 |
| MAML (Finn et al.) | 2017 | Model-agnostic meta-learning | Doc 07 |
| SetFit (Tunstall et al.) | 2022 | Few-shot fine-tuning via contrastive learning | Doc 07 |
### Multi-Task and Expert Models

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| Multi-Task Uncertainty (Kendall et al.) | 2018 | Homoscedastic uncertainty for task weighting | Doc 13 |
| GradNorm (Chen et al.) | 2018 | Gradient normalization for multi-task | Doc 13 |
| Gradient Surgery (Yu et al.) | 2020 | Project conflicting gradients | Doc 13 |
| Switch Transformers (Fedus et al.) | 2021 | Sparse MoE with simplified routing | Doc 15 |
| Mixtral (Jiang et al.) | 2024 | Production MoE with expert routing | Doc 15 |
### RAG and Retrieval-Augmented Training

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| RAFT (Zhang et al.) | 2024 | Fine-tune LLM to use retrieved context | Doc 14 |
| Self-RAG (Asai et al.) | 2023 | Self-reflective retrieval-augmented generation | Doc 14 |
| REALM (Guu et al.) | 2020 | Retrieval-augmented language model pre-training | Doc 14 |
### Data Quality and Synthesis

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| Confident Learning (Northcutt et al.) | 2021 | Label noise estimation and correction | Doc 16 |
| Self-Instruct (Wang et al.) | 2022 | Synthetic instruction generation | Doc 16 |
| Alpaca (Taori et al.) | 2023 | Instruction tuning with synthetic data | Doc 16 |
### Infrastructure

| Paper | Year | Key Contribution | Referenced In |
|-------|------|------------------|----------------|
| ZeRO (Rajbhandari et al.) | 2019 | Memory-efficient distributed training | Doc 09 |
| Mixed Precision Training (Micikevicius et al.) | 2017 | FP16 training with loss scaling | Doc 09 |
| Focal Loss (Lin et al.) | 2017 | Class imbalance via modulating factor | Docs 01, 08 |
| LambdaRank (Burges et al.) | 2006 | Learning to rank with NDCG gradients | Doc 03 |
## Cross-References to Other Project Folders

| This Folder's Doc | References | For |
|-------------------|------------|-----|
| Doc 01 (Intent Classifier) | 04b-architecture-lld.md LLD-2 | Intent classifier design spec |
| Doc 02 (Embeddings) | 04b-architecture-lld.md LLD-3 | RAG pipeline and vector store |
| Doc 04 (LoRA/QLoRA) | Prompt-Engineering/ | When prompting is enough vs fine-tuning |
| Doc 05 (Distillation) | Model-Inference/ | Inference pipeline constraints |
| Doc 09 (MLOps) | MLflow/ | Experiment tracking integration |
| Doc 14 (RAFT) | Prompt-Engineering/ | RAG grounding and prompt design |
| All docs | model_evaluation_framework_deep_dive.md | 7-dimensional evaluation framework |
| All docs | 10-ai-llm-design.md | Model selection and CPQ framework |
| All docs | Optimization-Tradeoffs-User-Stories/ | Cost-quality-performance trilemma |
## How to Use These Documents
- For interview prep: Start with the README, then read Tier 1 docs, then use the decision-point sections plus Docs 17 and 18 to rehearse architecture, training, interpretability, and strategic tradeoffs.
- For implementation: Read the relevant scenario doc end-to-end, then reference Doc 09 (MLOps) for deployment.
- For understanding the math: Each doc is self-contained. Read the "Mathematical Foundations" section, then trace through the mermaid layer diagrams.
- For the group discussion format: Each decision point shows how a real team would debate the tradeoff. Use these as templates for your own technical discussions, then read Doc 18 for the cross-technique synthesis.