Tools and Libraries for Modeling Foundations in MangaAssist

The mathematical ideas in this folder only become useful when they are implemented, trained, deployed, and monitored well. This document maps the theory to the tools that make the production workflow possible.

1. Overview

The stack spans five layers:

  1. numerical computation
  2. model definition and fine-tuning
  3. managed inference and retrieval
  4. evaluation and observability
  5. infrastructure optimization

2. Deep Learning Frameworks

2.1 PyTorch

PyTorch is the main low-level deep learning framework in the stack.

Capability           Example API                    Mathematical role
Linear layers        torch.nn.Linear                Matrix multiplication + bias
Attention            torch.nn.MultiheadAttention    Query-key-value attention
Classification loss  torch.nn.CrossEntropyLoss      Multinomial log-loss
Binary loss          torch.nn.BCEWithLogitsLoss     Sigmoid + binary cross-entropy
Normalization        torch.nn.LayerNorm             Feature-wise normalization
Regularization       torch.nn.Dropout               Stochastic regularization
Optimization         torch.optim.AdamW              Gradient-based parameter updates
Tensor math          torch.matmul, torch.softmax    Core linear algebra and probability transforms

Why it matters:

  • Hugging Face Transformers builds naturally on top of PyTorch
  • training and inference share the same tensor semantics
  • GPU acceleration is accessible without rewriting the math
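
As a small sketch, several of these APIs compose directly into the classification math covered in this folder:

import torch

head = torch.nn.Linear(768, 8)                 # matrix multiplication + bias
x = torch.randn(4, 768)                        # batch of 4 feature vectors
logits = head(x)
probs = torch.softmax(logits, dim=-1)          # logits -> probability distribution
loss = torch.nn.CrossEntropyLoss()(logits, torch.tensor([0, 2, 5, 7]))  # multinomial log-loss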

2.2 TensorFlow / Keras

TensorFlow is not the primary framework here, but it serves as a useful point of comparison:

PyTorch API                Rough TensorFlow equivalent
torch.nn.Linear            tf.keras.layers.Dense
torch.nn.CrossEntropyLoss  tf.keras.losses.SparseCategoricalCrossentropy
torch.optim.AdamW          tf.keras.optimizers.AdamW
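
A sketch of the same classification head on the Keras side, purely for comparison (assumes a TensorFlow recent enough to ship tf.keras.optimizers.AdamW):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(8)])  # logits head, like torch.nn.Linear
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=tf.keras.optimizers.AdamW(learning_rate=2e-5), loss=loss)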

3. Hugging Face Libraries

3.1 Transformers

The Transformers library provides pretrained architectures, tokenizers, training utilities, and schedulers.

Project component       Typical usage
Intent classifier       DistilBertForSequenceClassification.from_pretrained(...)
Tokenization            AutoTokenizer.from_pretrained(...)
Fine-tuning loop        Trainer
Scheduling              get_linear_schedule_with_warmup(...)
Reranker model loading  AutoModelForSequenceClassification

Illustrative loading pattern:

from transformers import (
    AutoTokenizer,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=8,  # one output logit per intent class
)
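
A minimal sketch of wiring up the imported Trainer around the model above, assuming train_ds and eval_ds are already tokenized datasets (hypothetical names):

training_args = TrainingArguments(
    output_dir="./manga-intent-model",
    learning_rate=2e-5,              # typical fine-tuning range (assumption)
    per_device_train_batch_size=16,
    num_train_epochs=3,
    warmup_ratio=0.1,                # Trainer's default schedule is linear with warmup
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()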

3.2 Sentence-Transformers

Sentence-Transformers is especially convenient for embedding and reranking patterns.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([
    ("dark fantasy manga like Berserk", "Berserk is a dark fantasy manga..."),
    ("dark fantasy manga like Berserk", "Cooking manga for beginners..."),
])
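
The returned scores are per-pair relevance values, so the final ranking is just a descending sort; for example:

ranking = scores.argsort()[::-1]  # candidate indices, most relevant passage first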

This is a direct operational bridge from the architecture discussion to a usable ranking model.


4. Amazon Bedrock

Bedrock provides managed inference for both generation and embedding models used in the project.

4.1 Claude 3.5 Sonnet

Capability                            Why it matters mathematically
invoke_model() / converse-style APIs  Executes the decoder-style generation path
Temperature                           Scales logits before softmax
Max tokens                            Caps autoregressive rollout length
Streaming                             Returns tokens as generation unfolds
Usage metadata                        Supports token-based cost analysis

Illustrative invocation pattern:

import json
import os
import boto3

client = boto3.client("bedrock-runtime")

query = "dark fantasy manga similar to Berserk"  # example user query

response = client.invoke_model(
    modelId=os.environ["BEDROCK_CLAUDE_MODEL_ID"],
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [{"role": "user", "content": query}],
        "max_tokens": 500,
        "temperature": 0.3,
    }),
)
result = json.loads(response["body"].read())  # parse the returned JSON payload

Reading the model ID from an environment variable documents the workflow better than hard-coding a dated ID, because teams typically pin model versions at deployment time.

4.2 Titan Text Embeddings V2

Capability               Why it matters
Configurable dimensions  The model supports 256, 512, or 1,024 output dimensions
Optional normalization   Makes cosine-style retrieval easier to reason about
Batch embedding          Supports efficient corpus ingestion and query evaluation

Illustrative embedding request:

response = client.invoke_model(
    modelId=os.environ["BEDROCK_EMBED_MODEL_ID"],
    body=json.dumps({
        "inputText": "dark fantasy manga similar to Berserk",
        "dimensions": 1024,
        "normalize": True,
    }),
)
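
The parsed response body carries the vector under an embedding key (a sketch matching the V2 response shape):

embedding = json.loads(response["body"].read())["embedding"]  # 1,024 floats; unit-norm when normalize=True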

5. Amazon OpenSearch Service

OpenSearch serves as the vector retrieval layer.

Capability          Math connection
KNN / ANN search    Approximate nearest-neighbor search over dense vectors
HNSW indexing       Graph-based search with logarithmic-style behavior
Similarity scoring  Cosine similarity or related distance functions
Metadata filters    Restrict retrieval by source, type, or business rule

Illustrative query shape:

{
  "query": {
    "knn": {
      "embedding_vector": {
        "vector": [0.12, -0.34, 0.78],
        "k": 10
      }
    }
  }
}

In practice, vectors are much longer than the toy example above, and the exact HNSW controls exposed depend on engine and version.
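
For reference, an index mapping along the following lines declares the vector field; as noted above, exact parameter names vary by engine and version:

{
  "mappings": {
    "properties": {
      "embedding_vector": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}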


6. Amazon SageMaker

SageMaker handles model training, hosting, and batch evaluation workloads.

6.1 Training

Capability             Usage
Training jobs          Fine-tune DistilBERT and related classifiers on GPU instances
Hyperparameter tuning  Search learning rate, batch size, warmup, and weight decay
Experiments            Track loss curves and validation metrics
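
A minimal sketch of launching such a job with the SageMaker Python SDK's Hugging Face estimator; the entry point, IAM role, S3 paths, and framework versions below are placeholders:

from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",                   # script containing the fine-tuning loop
    instance_type="ml.g5.xlarge",             # single-GPU instance (assumption)
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={"epochs": 3, "learning_rate": 2e-5},
)
estimator.fit({"train": "s3://my-bucket/manga-intent/train"})  # hypothetical S3 channel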

6.2 Hosting

Capability            Usage
Real-time endpoints   Host low-latency classifiers
Auto-scaling          Match serving capacity to request load
Batch transform       Evaluate large golden datasets offline
Multi-model patterns  Consolidate lightweight models when latency budgets allow

SageMaker matters mathematically because it is where the tensor-heavy computations are actually executed under production constraints.


7. Scikit-learn

Scikit-learn supports the classical side of the stack.

Component              Typical use                      Math behind it
LogisticRegression     Baseline intent model            Multinomial logistic regression
TfidfVectorizer        Sparse text features             Weighted term-frequency matrix
classification_report  Per-class metrics                Precision, recall, F1
confusion_matrix       Error analysis                   Count matrix over predictions vs. truth
calibration_curve      Confidence calibration checks    Reliability of predicted probabilities
PCA                    Embedding visualization          Variance-preserving projection
TruncatedSVD           Sparse dimensionality reduction  Low-rank approximation
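
A minimal baseline wiring these pieces together; train_texts, train_labels, and the test split are hypothetical:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # sparse weighted term-frequency features
    LogisticRegression(max_iter=1000),    # multinomial logistic regression
)
baseline.fit(train_texts, train_labels)
print(classification_report(test_labels, baseline.predict(test_texts)))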

8. NumPy

NumPy is the common denominator for low-level numerical work.

Operation               Example API         Role
Dot product             np.dot()            Similarity and projection math
Norms                   np.linalg.norm()    Vector normalization
Matrix multiply         np.matmul()         Batch linear algebra
SVD                     np.linalg.svd()     Low-rank decomposition
Eigendecomposition      np.linalg.eig()     Spectral analysis
Elementwise transforms  np.exp(), np.log()  Softmax and log-loss calculations
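
For instance, cosine similarity and a numerically stable softmax each take only a few lines:

import numpy as np

a = np.random.rand(1024)
b = np.random.rand(1024)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # similarity score

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()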

9. Evaluation Tooling

9.1 RAGAS

RAGAS helps evaluate retrieval-grounded generation.

Metric             Mathematical intuition
Faithfulness       NLI-style support between answer and context
Answer relevancy   Semantic similarity between answer and query
Context precision  How much retrieved context is actually useful
Context recall     How much useful context the retriever recovered
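
A sketch of the evaluation call, assuming a Hugging Face Dataset with the columns ragas expects; column names and metric imports shift somewhat across ragas versions:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

eval_data = Dataset.from_dict({
    "question": ["dark fantasy manga like Berserk"],
    "answer": ["Claymore shares Berserk's dark fantasy tone."],
    "contexts": [["Claymore is a dark fantasy manga..."]],
    "ground_truth": ["Claymore"],
})
report = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision, context_recall])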

9.2 BERTScore

from bert_score import score

# candidates and references are parallel lists of strings
P, R, F1 = score(
    candidates,
    references,
    model_type="microsoft/deberta-xlarge-mnli",
    lang="en",
)

BERTScore compares contextual token embeddings with cosine similarity rather than relying only on exact n-gram overlap.

9.3 Promptfoo

Capability         Why it matters
Prompt comparison  Supports controlled A/B prompt evaluation
Assertions         Turns qualitative expectations into machine-checkable tests
Model comparison   Makes regression analysis easier across versions

10. Monitoring and Observability

10.1 CloudWatch

CloudWatch provides service-level metrics such as:

  • latency percentiles
  • request counts
  • custom token metrics
  • endpoint health signals

These map naturally to statistical monitoring concepts such as quantiles, rates, and rolling distributions.
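
As a sketch, custom token metrics can be published with boto3's CloudWatch client; the namespace and metric name here are hypothetical:

import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MangaAssist/Inference",  # hypothetical namespace
    MetricData=[{
        "MetricName": "OutputTokens",   # hypothetical metric name
        "Value": 412,
        "Unit": "Count",
    }],
)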

10.2 Prometheus and Grafana

These tools are useful for:

  • histogram-based latency tracking
  • rate calculations
  • ratio metrics
  • dashboarding and alerting over time windows

10.3 Evidently

Evidently is useful for:

  • feature drift detection
  • embedding drift checks
  • performance regressions

10.4 MLflow

MLflow ties experiments to metrics, artifacts, and versioned runs so modeling changes remain inspectable after deployment.
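
A minimal sketch of such a run; the parameter, metric, and artifact names are illustrative:

import mlflow

with mlflow.start_run(run_name="distilbert-intent-v2"):  # hypothetical run name
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_metric("val_f1", 0.91)                    # illustrative validation score
    mlflow.log_artifact("confusion_matrix.png")          # any saved evaluation artifact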


11. Infrastructure for Fast Linear Algebra

11.1 GPUs

Transformer workloads are dominated by matrix multiplications, especially in:

  • query, key, and value projections
  • feedforward layers
  • batched embedding and classifier inference

GPU instances reduce latency because they are built to execute dense tensor operations efficiently and in parallel.
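
A minimal sketch of moving the same computation onto a GPU in PyTorch; the math is unchanged, only the execution device differs:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(768, 8).to(device)  # same layer, now resident on the accelerator
x = torch.randn(32, 768, device=device)     # batch of feature vectors
logits = layer(x)                           # identical matrix multiply, executed in parallel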

11.2 ONNX Runtime

ONNX Runtime can speed up inference through graph-level optimizations:

  • operator fusion
  • improved memory planning
  • hardware-specific execution providers

Illustrative export flow:

import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained("./manga-intent-model")
model.eval()  # inference mode for export
dummy_input = torch.randint(0, 30000, (1, 128))  # placeholder token IDs: batch of 1, length 128
torch.onnx.export(
    model,
    (dummy_input,),
    "intent_model.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
)
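
Loading and running the exported graph with ONNX Runtime (a sketch; onnxruntime must be installed separately):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("intent_model.onnx")
input_name = session.get_inputs()[0].name  # "input_ids" when named at export time
feed = {input_name: np.random.randint(0, 30000, (1, 128), dtype=np.int64)}
logits = session.run(None, feed)[0]        # same classifier outputs, new execution plan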

The math does not change. The execution plan does.


12. Tool Stack Summary

Layer               Primary tools                               Main mathematical role
Core math           NumPy, PyTorch                              Tensor algebra, optimization, decomposition
Model architecture  Transformers, Sentence-Transformers         Attention, FFNs, sequence modeling
Classical ML        Scikit-learn, Statsmodels                   Regression, sparse features, diagnostics
Managed inference   Bedrock, SageMaker                          Hosted generation, embedding, and classifier execution
Retrieval           OpenSearch                                  ANN search and similarity scoring
Evaluation          RAGAS, BERTScore, Promptfoo, MLflow         Retrieval and generation quality measurement
Monitoring          CloudWatch, Prometheus, Grafana, Evidently  Drift, latency, and reliability tracking

13. End-to-End Tooling Flow

User query
  -> tokenization (Transformers)
  -> intent classification (DistilBERT on SageMaker / PyTorch)
  -> query embedding (Titan Text Embeddings V2 on Bedrock)
  -> ANN retrieval (OpenSearch)
  -> reranking (MiniLM cross-encoder)
  -> response generation (Claude 3.5 Sonnet on Bedrock)
  -> evaluation and monitoring (RAGAS, BERTScore, CloudWatch, Evidently, MLflow)

The key idea is that the tooling is not separate from the math. Each library exists because it efficiently implements a specific mathematical pattern the system depends on.