Tools and Libraries for Modeling Foundations in MangaAssist

The mathematical ideas in this folder only become useful when they are implemented, trained, deployed, and monitored well. This document maps the theory to the tools that make the production workflow possible.

1. Overview

The stack spans five layers:

  1. numerical computation
  2. model definition and fine-tuning
  3. managed inference and retrieval
  4. evaluation and observability
  5. infrastructure optimization

2. Deep Learning Frameworks

2.1 PyTorch

PyTorch is the main low-level deep learning framework in the stack.

Capability           Example API                    Mathematical role
Linear layers        torch.nn.Linear                Matrix multiplication + bias
Attention            torch.nn.MultiheadAttention    Query-key-value attention
Classification loss  torch.nn.CrossEntropyLoss      Multinomial log-loss
Binary loss          torch.nn.BCEWithLogitsLoss     Sigmoid + binary cross-entropy
Normalization        torch.nn.LayerNorm             Feature-wise normalization
Regularization       torch.nn.Dropout               Stochastic regularization
Optimization         torch.optim.AdamW              Gradient-based parameter updates
Tensor math          torch.matmul, torch.softmax    Core linear algebra and probability transforms

Why it matters:

  • Hugging Face Transformers builds naturally on top of PyTorch
  • training and inference share the same tensor semantics
  • GPU acceleration is accessible without rewriting the math
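
As a small sketch, several of these APIs compose directly into the classification math covered in this folder:

import torch

head = torch.nn.Linear(768, 8)                 # matrix multiplication + bias
x = torch.randn(4, 768)                        # batch of 4 feature vectors
logits = head(x)
probs = torch.softmax(logits, dim=-1)          # logits -> probability distribution
loss = torch.nn.CrossEntropyLoss()(logits, torch.tensor([0, 2, 5, 7]))  # multinomial log-loss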

2.2 TensorFlow / Keras

TensorFlow is not the primary framework here, but it serves as a useful point of comparison:

PyTorch API                Rough TensorFlow equivalent
torch.nn.Linear            tf.keras.layers.Dense
torch.nn.CrossEntropyLoss  tf.keras.losses.SparseCategoricalCrossentropy
torch.optim.AdamW          tf.keras.optimizers.AdamW
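
A sketch of the same classification head on the Keras side, purely for comparison (assumes a TensorFlow recent enough to ship tf.keras.optimizers.AdamW):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(8)])  # logits head, like torch.nn.Linear
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=2e-5), loss=loss)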

3. Hugging Face Libraries

3.1 Transformers

The Transformers library provides pretrained architectures, tokenizers, training utilities, and schedulers.

Project component       Typical usage
Intent classifier       DistilBertForSequenceClassification.from_pretrained(...)
Tokenization            AutoTokenizer.from_pretrained(...)
Fine-tuning loop        Trainer
Scheduling              get_linear_schedule_with_warmup(...)
Reranker model loading  AutoModelForSequenceClassification

Illustrative loading pattern:

from transformers import (
    AutoTokenizer,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=8,  # one output logit per intent class
)
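
A minimal sketch of wiring up the imported Trainer around the model above, assuming train_ds and eval_ds are already tokenized datasets (hypothetical names):

training_args = TrainingArguments(
    output_dir="./manga-intent-model",
    learning_rate=2e-5,              # typical fine-tuning range (assumption)
    per_device_train_batch_size=16,
    num_train_epochs=3,
    warmup_ratio=0.1,                # Trainer's default schedule is linear with warmup
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()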

3.2 Sentence-Transformers

Sentence-Transformers is especially convenient for embedding and reranking patterns.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([
    ("dark fantasy manga like Berserk", "Berserk is a dark fantasy manga..."),
    ("dark fantasy manga like Berserk", "Cooking manga for beginners..."),
])
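
The returned scores are per-pair relevance values, so the final ranking is just a descending sort; for example:

ranking = scores.argsort()[::-1]  # candidate indices, most relevant passage first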

This is a direct operational bridge from the architecture discussion to a usable ranking model.


4. Amazon Bedrock

Bedrock provides managed inference for both generation and embedding models used in the project.

4.1 Claude 3.5 Sonnet

Capability                            Why it matters mathematically
invoke_model() / converse-style APIs  Executes the decoder-style generation path
Temperature                           Scales logits before softmax
Max tokens                            Caps autoregressive rollout length
Streaming                             Returns tokens as generation unfolds
Usage metadata                        Supports token-based cost analysis

Illustrative invocation pattern:

import json
import os
import boto3

client = boto3.client("bedrock-runtime")

query = "dark fantasy manga similar to Berserk"  # example user query

response = client.invoke_model(
    modelId=os.environ["BEDROCK_CLAUDE_MODEL_ID"],
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [{"role": "user", "content": query}],
        "max_tokens": 500,
        "temperature": 0.3,
    }),
)
result = json.loads(response["body"].read())  # parse the returned JSON payload

Reading the model ID from an environment variable documents the workflow better than hard-coding a dated ID, because teams typically pin model versions at deployment time.

4.2 Titan Text Embeddings V2

Capability               Why it matters
Configurable dimensions  The model supports 256, 512, or 1,024 output dimensions
Optional normalization   Makes cosine-style retrieval easier to reason about
Batch embedding          Supports efficient corpus ingestion and query evaluation

Illustrative embedding request:

response = client.invoke_model(
    modelId=os.environ["BEDROCK_EMBED_MODEL_ID"],
    body=json.dumps({
        "inputText": "dark fantasy manga similar to Berserk",
        "dimensions": 1024,
        "normalize": True,
    }),
)
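
The parsed response body carries the vector under an embedding key (a sketch matching the V2 response shape):

embedding = json.loads(response["body"].read())["embedding"]  # 1,024 floats; unit-norm when normalize=True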

5. Amazon OpenSearch Service

OpenSearch serves as the vector retrieval layer.

Capability          Math connection
KNN / ANN search    Approximate nearest-neighbor search over dense vectors
HNSW indexing       Graph-based search with logarithmic-style behavior
Similarity scoring  Cosine similarity or related distance functions
Metadata filters    Restrict retrieval by source, type, or business rule

Illustrative query shape:

{
  "query": {
    "knn": {
      "embedding_vector": {
        "vector": [0.12, -0.34, 0.78],
        "k": 10
      }
    }
  }
}

In practice, vectors are much longer than the toy example above, and the exact HNSW controls exposed depend on engine and version.
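
For reference, an index mapping along the following lines declares the vector field; as noted above, exact parameter names vary by engine and version:

{
  "mappings": {
    "properties": {
      "embedding_vector": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}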


6. Amazon SageMaker

SageMaker handles model training, hosting, and batch evaluation workloads.

6.1 Training

Capability             Usage
Training jobs          Fine-tune DistilBERT and related classifiers on GPU instances
Hyperparameter tuning  Search learning rate, batch size, warmup, and weight decay
Experiments            Track loss curves and validation metrics
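
A minimal sketch of launching such a job with the SageMaker Python SDK's Hugging Face estimator; the entry point, IAM role, S3 paths, and framework versions below are placeholders:

from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",                   # script containing the fine-tuning loop
    instance_type="ml.g5.xlarge",             # single-GPU instance (assumption)
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={"epochs": 3, "learning_rate": 2e-5},
)
estimator.fit({"train": "s3://my-bucket/manga-intent/train"})  # hypothetical S3 channel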

6.2 Hosting

Capability            Usage
Real-time endpoints   Host low-latency classifiers
Auto-scaling          Match serving capacity to request load
Batch transform       Evaluate large golden datasets offline
Multi-model patterns  Consolidate lightweight models when latency budgets allow

SageMaker matters mathematically because it is where the tensor-heavy computations are actually executed under production constraints.


7. Scikit-learn

Scikit-learn supports the classical side of the stack.

Component              Typical use                      Math behind it
LogisticRegression     Baseline intent model            Multinomial logistic regression
TfidfVectorizer        Sparse text features             Weighted term-frequency matrix
classification_report  Per-class metrics                Precision, recall, F1
confusion_matrix       Error analysis                   Count matrix over predictions vs. truth
calibration_curve      Confidence calibration checks    Reliability of predicted probabilities
PCA                    Embedding visualization          Variance-preserving projection
TruncatedSVD           Sparse dimensionality reduction  Low-rank approximation
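
A minimal baseline wiring these pieces together; train_texts, train_labels, and the test split are hypothetical:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # sparse weighted term-frequency features
    LogisticRegression(max_iter=1000),    # multinomial logistic regression
)
baseline.fit(train_texts, train_labels)
print(classification_report(test_labels, baseline.predict(test_texts)))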

8. NumPy

NumPy is the common denominator for low-level numerical work.

Operation               Example API         Role
Dot product             np.dot()            Similarity and projection math
Norms                   np.linalg.norm()    Vector normalization
Matrix multiply         np.matmul()         Batch linear algebra
SVD                     np.linalg.svd()     Low-rank decomposition
Eigendecomposition      np.linalg.eig()     Spectral analysis
Elementwise transforms  np.exp(), np.log()  Softmax and log-loss calculations
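
For instance, cosine similarity and a numerically stable softmax each take only a few lines:

import numpy as np

a = np.random.rand(1024)
b = np.random.rand(1024)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # similarity score

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()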

9. Evaluation Tooling

9.1 RAGAS

RAGAS helps evaluate retrieval-grounded generation.

Metric             Mathematical intuition
Faithfulness       NLI-style support between answer and context
Answer relevancy   Semantic similarity between answer and query
Context precision  How much retrieved context is actually useful
Context recall     How much useful context the retriever recovered
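
A sketch of the evaluation call, assuming a Hugging Face Dataset with the columns ragas expects; column names and metric imports shift somewhat across ragas versions:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

eval_data = Dataset.from_dict({
    "question": ["dark fantasy manga like Berserk"],
    "answer": ["Claymore shares Berserk's dark fantasy tone."],
    "contexts": [["Claymore is a dark fantasy manga..."]],
    "ground_truth": ["Claymore"],
})
report = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision, context_recall])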

9.2 BERTScore

from bert_score import score

# candidates and references are parallel lists of strings
P, R, F1 = score(
    candidates,
    references,
    model_type="microsoft/deberta-xlarge-mnli",
    lang="en",
)

BERTScore compares contextual token embeddings with cosine similarity rather than relying only on exact n-gram overlap.

9.3 Promptfoo

Capability         Why it matters
Prompt comparison  Supports controlled A/B prompt evaluation
Assertions         Turns qualitative expectations into machine-checkable tests
Model comparison   Makes regression analysis easier across versions

10. Monitoring and Observability

10.1 CloudWatch

CloudWatch provides service-level metrics such as:

  • latency percentiles
  • request counts
  • custom token metrics
  • endpoint health signals

These map naturally to statistical monitoring concepts such as quantiles, rates, and rolling distributions.
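
As a sketch, custom token metrics can be published with boto3's CloudWatch client; the namespace and metric name here are hypothetical:

import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MangaAssist/Inference",  # hypothetical namespace
    MetricData=[{
        "MetricName": "OutputTokens",   # hypothetical metric name
        "Value": 412,
        "Unit": "Count",
    }],
)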

10.2 Prometheus and Grafana

These tools are useful for:

  • histogram-based latency tracking
  • rate calculations
  • ratio metrics
  • dashboarding and alerting over time windows

10.3 Evidently

Evidently is useful for:

  • feature drift detection
  • embedding drift checks
  • performance regressions

10.4 MLflow

MLflow ties experiments to metrics, artifacts, and versioned runs so modeling changes remain inspectable after deployment.
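
A minimal sketch of such a run; the parameter, metric, and artifact names are illustrative:

import mlflow

with mlflow.start_run(run_name="distilbert-intent-v2"):  # hypothetical run name
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_metric("val_f1", 0.91)                    # illustrative validation score
    mlflow.log_artifact("confusion_matrix.png")          # any saved evaluation artifact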


11. Infrastructure for Fast Linear Algebra

11.1 GPUs

Transformer workloads are dominated by matrix multiplications, especially in:

  • query, key, and value projections
  • feedforward layers
  • batched embedding and classifier inference

GPU instances reduce latency because they are built to execute dense tensor operations efficiently and in parallel.
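
A minimal sketch of moving the same computation onto a GPU in PyTorch; the math is unchanged, only the execution device differs:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(768, 8).to(device)  # same layer, now resident on the accelerator
x = torch.randn(32, 768, device=device)     # batch of feature vectors
logits = layer(x)                           # identical matrix multiply, executed in parallel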

11.2 ONNX Runtime

ONNX Runtime can speed up inference through graph-level optimizations:

  • operator fusion
  • improved memory planning
  • hardware-specific execution providers

Illustrative export flow:

import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained("./manga-intent-model")
model.eval()  # inference mode for export
dummy_input = torch.randint(0, 30000, (1, 128))  # placeholder token IDs: batch of 1, length 128
torch.onnx.export(
    model,
    (dummy_input,),
    "intent_model.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
)
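
Loading and running the exported graph with ONNX Runtime (a sketch; onnxruntime must be installed separately):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("intent_model.onnx")
input_name = session.get_inputs()[0].name  # "input_ids" when named at export time
feed = {input_name: np.random.randint(0, 30000, (1, 128), dtype=np.int64)}
logits = session.run(None, feed)[0]        # same classifier outputs, new execution plan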

The math does not change. The execution plan does.


12. Tool Stack Summary

Layer               Primary tools                               Main mathematical role
Core math           NumPy, PyTorch                              Tensor algebra, optimization, decomposition
Model architecture  Transformers, Sentence-Transformers         Attention, FFNs, sequence modeling
Classical ML        Scikit-learn, Statsmodels                   Regression, sparse features, diagnostics
Managed inference   Bedrock, SageMaker                          Hosted generation, embedding, and classifier execution
Retrieval           OpenSearch                                  ANN search and similarity scoring
Evaluation          RAGAS, BERTScore, Promptfoo, MLflow         Retrieval and generation quality measurement
Monitoring          CloudWatch, Prometheus, Grafana, Evidently  Drift, latency, and reliability tracking

13. End-to-End Tooling Flow

User query
  -> tokenization (Transformers)
  -> intent classification (DistilBERT on SageMaker / PyTorch)
  -> query embedding (Titan Text Embeddings V2 on Bedrock)
  -> ANN retrieval (OpenSearch)
  -> reranking (MiniLM cross-encoder)
  -> response generation (Claude 3.5 Sonnet on Bedrock)
  -> evaluation and monitoring (RAGAS, BERTScore, CloudWatch, Evidently, MLflow)

The key idea is that the tooling is not separate from the math. Each library exists because it efficiently implements a specific mathematical pattern the system depends on.