Tools and Libraries for Modeling Foundations in MangaAssist
The mathematical ideas in this folder only become useful when they are implemented, trained, deployed, and monitored well. This document maps the theory to the tools that make the production workflow possible.
1. Overview
The stack spans five layers:
- numerical computation
- model definition and fine-tuning
- managed inference and retrieval
- evaluation and observability
- infrastructure optimization
2. Deep Learning Frameworks
2.1 PyTorch
PyTorch is the main low-level deep learning framework in the stack.
| Capability | Example API | Mathematical role |
|---|---|---|
| Linear layers | `torch.nn.Linear` | Matrix multiplication + bias |
| Attention | `torch.nn.MultiheadAttention` | Query-key-value attention |
| Classification loss | `torch.nn.CrossEntropyLoss` | Multinomial log-loss |
| Binary loss | `torch.nn.BCEWithLogitsLoss` | Sigmoid + binary cross-entropy |
| Normalization | `torch.nn.LayerNorm` | Feature-wise normalization |
| Regularization | `torch.nn.Dropout` | Stochastic regularization |
| Optimization | `torch.optim.AdamW` | Gradient-based parameter updates |
| Tensor math | `torch.matmul`, `torch.softmax` | Core linear algebra and probability transforms |
Why it matters:
- Hugging Face Transformers builds naturally on top of PyTorch
- training and inference share the same tensor semantics
- GPU acceleration is accessible without rewriting the math
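As a concrete illustration of the math behind these APIs, here is a minimal NumPy sketch of what `torch.nn.Linear` plus `torch.nn.CrossEntropyLoss` compute: a linear map, a softmax, and a log-loss against integer labels. The shapes and data are toy values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear layer: logits = x @ W.T + b, as in torch.nn.Linear
x = rng.normal(size=(4, 16))          # batch of 4 inputs, 16 features
W = rng.normal(size=(8, 16)) * 0.1    # 8 output classes
b = np.zeros(8)
logits = x @ W.T + b

# Softmax over the class dimension (numerically stabilized)
z = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# Cross-entropy against integer labels, as in torch.nn.CrossEntropyLoss
labels = np.array([0, 3, 7, 1])
loss = -np.log(probs[np.arange(4), labels]).mean()
```

In PyTorch the same computation runs on GPU tensors with autograd attached, which is exactly why the framework is worth the abstraction.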
2.2 TensorFlow / Keras
TensorFlow was not the primary framework here, but it is useful as a conceptual comparison:
| PyTorch API | Rough TensorFlow equivalent |
|---|---|
| `torch.nn.Linear` | `tf.keras.layers.Dense` |
| `torch.nn.CrossEntropyLoss` | `tf.keras.losses.SparseCategoricalCrossentropy` |
| `torch.optim.AdamW` | `tf.keras.optimizers.AdamW` |
3. Hugging Face Libraries
3.1 Transformers
The Transformers library provides pretrained architectures, tokenizers, training utilities, and schedulers.
| Project component | Typical usage |
|---|---|
| Intent classifier | DistilBertForSequenceClassification.from_pretrained(...) |
| Tokenization | AutoTokenizer.from_pretrained(...) |
| Fine-tuning loop | Trainer |
| Scheduling | get_linear_schedule_with_warmup(...) |
| Reranker model loading | AutoModelForSequenceClassification |
```python
from transformers import (
    AutoTokenizer,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=8,
)
```
3.2 Sentence-Transformers
Sentence-Transformers is especially convenient for embedding and reranking patterns.
```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([
    ("dark fantasy manga like Berserk", "Berserk is a dark fantasy manga..."),
    ("dark fantasy manga like Berserk", "Cooking manga for beginners..."),
])
```
This is a direct operational bridge from the architecture discussion to a usable ranking model.
4. Amazon Bedrock
Bedrock provides managed inference for both generation and embedding models used in the project.
4.1 Claude 3.5 Sonnet
| Capability | Why it matters mathematically |
|---|---|
| `invoke_model()` / converse-style APIs | Executes the decoder-style generation path |
| Temperature | Scales logits before softmax |
| Max tokens | Caps autoregressive rollout length |
| Streaming | Returns tokens as generation unfolds |
| Usage metadata | Supports token-based cost analysis |
Illustrative invocation pattern:
```python
import json
import os

import boto3

client = boto3.client("bedrock-runtime")
query = "Recommend dark fantasy manga similar to Berserk."  # example user query

response = client.invoke_model(
    modelId=os.environ["BEDROCK_CLAUDE_MODEL_ID"],
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [{"role": "user", "content": query}],
        "max_tokens": 500,
        "temperature": 0.3,
    }),
)
```
Reading the model ID from an environment variable documents intent better than hard-coding a dated identifier, because teams typically pin model versions at deployment time.
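The temperature parameter above has a precise mathematical reading: it divides the logits before the softmax, sharpening or flattening the token distribution. A small NumPy sketch with toy logits:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: divide logits by T before normalizing."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
sharp = softmax(logits, temperature=0.3)  # low T: mass concentrates on the top token
flat = softmax(logits, temperature=2.0)   # high T: distribution moves toward uniform
```

A setting of 0.3, as in the invocation above, keeps generation close to greedy while still allowing some variation.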
4.2 Titan Text Embeddings V2
| Capability | Why it matters |
|---|---|
| Configurable dimensions | The model supports 256, 512, or 1,024 dimensions |
| Optional normalization | Makes cosine-style retrieval easier to reason about |
| Batch embedding | Supports efficient corpus ingestion and query evaluation |
Illustrative invocation pattern, reusing the `client` from above:

```python
response = client.invoke_model(
    modelId=os.environ["BEDROCK_EMBED_MODEL_ID"],
    body=json.dumps({
        "inputText": "dark fantasy manga similar to Berserk",
        "dimensions": 1024,
        "normalize": True,
    }),
)
```
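The `normalize` flag matters because, for unit-length vectors, cosine similarity reduces to a plain dot product. A NumPy check of this identity, with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=1024), rng.normal(size=1024)

# Unit-normalize, mimicking what embedding normalization produces
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert np.isclose(np.dot(a_hat, b_hat), cosine)
```

This is why normalized corpora let the retrieval layer score with fast inner products while still behaving like cosine similarity.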
5. Amazon OpenSearch Service
OpenSearch serves as the vector retrieval layer.
| Capability | Math connection |
|---|---|
| KNN / ANN search | Approximate nearest-neighbor search over dense vectors |
| HNSW indexing | Graph-based search with logarithmic-style behavior |
| Similarity scoring | Cosine similarity or related distance functions |
| Metadata filters | Restrict retrieval by source, type, or business rule |
Illustrative query shape:
```json
{
  "query": {
    "knn": {
      "embedding_vector": {
        "vector": [0.12, -0.34, 0.78],
        "k": 10
      }
    }
  }
}
```
In practice, vectors are much longer than the toy example above, and the exact HNSW controls exposed depend on engine and version.
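What HNSW approximates can be stated exactly in a few lines: brute-force nearest-neighbor search by cosine similarity over the whole corpus. A minimal NumPy reference implementation with toy data:

```python
import numpy as np

rng = np.random.default_rng(2)
corpus = rng.normal(size=(1000, 64))  # 1,000 document vectors
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

query = rng.normal(size=64)
query /= np.linalg.norm(query)

scores = corpus @ query               # cosine similarity (unit vectors)
k = 10
top_k = np.argsort(-scores)[:k]       # indices of the k most similar docs
```

Exact search like this is O(corpus size) per query; HNSW trades a small amount of recall for the logarithmic-style behavior noted in the table.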
6. Amazon SageMaker
SageMaker handles model training, hosting, and batch evaluation workloads.
6.1 Training
| Capability | Usage |
|---|---|
| Training jobs | Fine-tune DistilBERT and related classifiers on GPU instances |
| Hyperparameter tuning | Search learning rate, batch size, warmup, and weight decay |
| Experiments | Track loss curves and validation metrics |
6.2 Hosting
| Capability | Usage |
|---|---|
| Real-time endpoints | Host low-latency classifiers |
| Auto-scaling | Match serving capacity to request load |
| Batch transform | Evaluate large golden datasets offline |
| Multi-model patterns | Consolidate lightweight models when latency budgets allow |
SageMaker matters mathematically because it is where the tensor-heavy computations are actually executed under production constraints.
7. Scikit-learn
Scikit-learn supports the classical side of the stack.
| Component | Typical use | Math behind it |
|---|---|---|
| `LogisticRegression` | Baseline intent model | Multinomial logistic regression |
| `TfidfVectorizer` | Sparse text features | Weighted term-frequency matrix |
| `classification_report` | Per-class metrics | Precision, recall, F1 |
| `confusion_matrix` | Error analysis | Count matrix over predictions vs. truth |
| `calibration_curve` | Confidence calibration checks | Reliability of predicted probabilities |
| `PCA` | Embedding visualization | Variance-preserving projection |
| `TruncatedSVD` | Sparse dimensionality reduction | Low-rank approximation |
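A minimal baseline combining the first two rows might look like the following; the texts and labels are hypothetical toy stand-ins for real intent-labeled queries:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "recommend dark fantasy manga",
    "recommend a romance series",
    "where can I buy volume 3",
    "where to purchase the box set",
]
labels = ["recommend", "recommend", "purchase", "purchase"]

# TF-IDF features feeding a logistic-regression intent baseline
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(texts, labels)
pred = baseline.predict(["suggest a good fantasy manga"])
```

A baseline like this is worth keeping around as a sanity check: if fine-tuned DistilBERT cannot beat it, something is wrong upstream.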
8. NumPy
NumPy is the common denominator for low-level numerical work.
| Operation | Example API | Role |
|---|---|---|
| Dot product | `np.dot()` | Similarity and projection math |
| Norms | `np.linalg.norm()` | Vector normalization |
| Matrix multiply | `np.matmul()` | Batch linear algebra |
| SVD | `np.linalg.svd()` | Low-rank decomposition |
| Eigendecomposition | `np.linalg.eig()` | Spectral analysis |
| Elementwise transforms | `np.exp()`, `np.log()` | Softmax and log-loss calculations |
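As one worked example, `np.linalg.svd` yields the best low-rank approximation in the least-squares sense (the Eckart-Young theorem): the Frobenius error of a rank-k truncation equals the energy in the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 10))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-3 approximation: keep the 3 largest singular values
k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error equals the norm of the discarded singular values
err = np.linalg.norm(A - A_k)
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```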
9. Evaluation Tooling
9.1 RAGAS
RAGAS helps evaluate retrieval-grounded generation.
| Metric | Mathematical intuition |
|---|---|
| Faithfulness | NLI-style support between answer and context |
| Answer relevancy | Semantic similarity between answer and query |
| Context precision | How much retrieved context is actually useful |
| Context recall | How much useful context the retriever recovered |
9.2 BERTScore
```python
from bert_score import score

# Illustrative candidate/reference pair; real usage passes lists of many strings
candidates = ["Berserk is a dark fantasy manga by Kentaro Miura."]
references = ["Kentaro Miura's Berserk is a landmark dark fantasy series."]

P, R, F1 = score(
    candidates,
    references,
    model_type="microsoft/deberta-xlarge-mnli",
    lang="en",
)
```
BERTScore compares contextual token embeddings with cosine similarity rather than relying only on exact n-gram overlap.
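The core of that comparison can be sketched in NumPy: given token embeddings for candidate and reference, precision and recall are greedy maxima over the cosine-similarity matrix. Random vectors stand in here for real contextual embeddings, and the sketch omits BERTScore's optional idf weighting and baseline rescaling.

```python
import numpy as np

rng = np.random.default_rng(4)
cand = rng.normal(size=(5, 32))  # 5 candidate token embeddings
ref = rng.normal(size=(7, 32))   # 7 reference token embeddings

# Unit-normalize so dot products are cosine similarities
cand /= np.linalg.norm(cand, axis=1, keepdims=True)
ref /= np.linalg.norm(ref, axis=1, keepdims=True)

sim = cand @ ref.T                  # (5, 7) cosine-similarity matrix
precision = sim.max(axis=1).mean()  # each candidate token's best match
recall = sim.max(axis=0).mean()     # each reference token's best match
f1 = 2 * precision * recall / (precision + recall)
```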
9.3 Promptfoo
| Capability | Why it matters |
|---|---|
| Prompt comparison | Supports controlled A/B prompt evaluation |
| Assertions | Turns qualitative expectations into machine-checkable tests |
| Model comparison | Makes regression analysis easier across versions |
10. Monitoring and Observability
10.1 CloudWatch
CloudWatch provides service-level metrics such as:
- latency percentiles
- request counts
- custom token metrics
- endpoint health signals
These map naturally to statistical monitoring concepts such as quantiles, rates, and rolling distributions.
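The percentile math behind those latency metrics can be reproduced directly; for example, over one window of hypothetical request latencies:

```python
import numpy as np

# Hypothetical latency samples for one monitoring window, in milliseconds
latencies_ms = np.array([12, 15, 18, 22, 25, 31, 40, 55, 90, 240], dtype=float)

# p50 tracks typical experience; p95/p99 expose the tail
p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
```

The gap between p50 and p99 is usually the more actionable signal than the mean, because tail latency is what users notice.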
10.2 Prometheus and Grafana
These tools are useful for:
- histogram-based latency tracking
- rate calculations
- ratio metrics
- dashboarding and alerting over time windows
10.3 Evidently
Evidently is useful for:
- feature drift detection
- embedding drift checks
- performance regressions
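One standard statistic behind drift checks of this kind is the population stability index (PSI) between a reference and a current feature distribution. A NumPy sketch with hypothetical data (this is an illustration of the statistic, not Evidently's API):

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population stability index between two 1-D samples."""
    # Bin edges come from the reference distribution's quantiles
    edges = np.percentile(reference, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)

    # Clip avoids log(0) for empty bins
    eps = 1e-6
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return ((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)).sum()

rng = np.random.default_rng(5)
stable = psi(rng.normal(size=5000), rng.normal(size=5000))
shifted = psi(rng.normal(size=5000), rng.normal(loc=1.0, size=5000))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; the shifted sample above lands well past that threshold.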
10.4 MLflow
MLflow ties experiments to metrics, artifacts, and versioned runs so modeling changes remain inspectable after deployment.
11. Infrastructure for Fast Linear Algebra
11.1 GPUs
Transformer workloads are dominated by matrix multiplications, especially in:
- query, key, and value projections
- feedforward layers
- batched embedding and classifier inference
GPU instances reduce latency because they are built to execute dense tensor operations efficiently and in parallel.
11.2 ONNX Runtime
ONNX Runtime can speed up inference through graph-level optimizations:
- operator fusion
- improved memory planning
- hardware-specific execution providers
Illustrative export flow:
```python
import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained("./manga-intent-model")
model.eval()  # inference mode: disables dropout before tracing

# Dummy batch of token IDs: shape (batch=1, seq_len=128)
dummy_input = torch.randint(0, 30000, (1, 128))
torch.onnx.export(model, dummy_input, "intent_model.onnx")
```
The math does not change. The execution plan does.
12. Tool Stack Summary
| Layer | Primary tools | Main mathematical role |
|---|---|---|
| Core math | NumPy, PyTorch | Tensor algebra, optimization, decomposition |
| Model architecture | Transformers, Sentence-Transformers | Attention, FFNs, sequence modeling |
| Classical ML | Scikit-learn, Statsmodels | Regression, sparse features, diagnostics |
| Managed inference | Bedrock, SageMaker | Hosted generation, embedding, and classifier execution |
| Retrieval | OpenSearch | ANN search and similarity scoring |
| Evaluation | RAGAS, BERTScore, Promptfoo, MLflow | Retrieval and generation quality measurement |
| Monitoring | CloudWatch, Prometheus, Grafana, Evidently | Drift, latency, and reliability tracking |
13. End-to-End Tooling Flow
User query
-> tokenization (Transformers)
-> intent classification (DistilBERT on SageMaker / PyTorch)
-> query embedding (Titan Text Embeddings V2 on Bedrock)
-> ANN retrieval (OpenSearch)
-> reranking (MiniLM cross-encoder)
-> response generation (Claude 3.5 Sonnet on Bedrock)
-> evaluation and monitoring (RAGAS, BERTScore, CloudWatch, Evidently, MLflow)
The key idea is that the tooling is not separate from the math. Each library exists because it efficiently implements a specific mathematical pattern the system depends on.