MangaAssist Interview Pack - Hard With Hints
Level: Hard
How to use: answers should cover tradeoffs, not just mechanics. If a hint feels thin, expand it with explicit failure-handling decisions.
Failure and Degradation Map
graph TD
A[User Request] --> B{Intent Type}
B --> C[Structured API Path]
B --> D[RAG Path]
B --> E[Full LLM Path]
C --> F{Dependency healthy?}
D --> G{Retriever healthy?}
E --> H{LLM healthy?}
F -->|No| I[Fallback response]
G -->|No| J[Reduced grounding or safe fallback]
H -->|No| K[Unavailable message or escalation]
F -->|Yes| L[Guardrails]
G -->|Yes| L
H -->|Yes| L
L --> M{Safe and complete?}
M -->|Yes| N[Return answer]
M -->|No| O[Regenerate / redact / escalate]
Interview Questions With Hints
Staff Engineer
- The system targets low latency but also uses multiple downstream services. Where would you parallelize, and where would you keep the flow sequential?
Hint: Parallelize independent fetches like recommendation, catalog, and some retrieval tasks; keep guardrails and final formatting after generation.
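A minimal sketch of this split, assuming Python with asyncio and hypothetical stub services (all names here are illustrative): the independent fetches run concurrently, while generation and guardrails stay sequential because guardrails must see the final text.

```python
import asyncio

# Hypothetical stubs for the independent downstream fetches.
async def fetch_recommendations(user_id: str) -> list:
    await asyncio.sleep(0.01)  # simulated network latency
    return ["series-a", "series-b"]

async def fetch_catalog(query: str) -> dict:
    await asyncio.sleep(0.01)
    return {"query": query, "hits": 3}

async def retrieve_chunks(query: str) -> list:
    await asyncio.sleep(0.01)
    return ["chunk-1"]

async def handle_request(user_id: str, query: str) -> dict:
    # Independent fetches run concurrently; total wait is roughly
    # the slowest call, not the sum of all three.
    recs, catalog, chunks = await asyncio.gather(
        fetch_recommendations(user_id),
        fetch_catalog(query),
        retrieve_chunks(query),
    )
    # Generation, then guardrails, stay sequential: the guardrail pass
    # needs the complete draft, so it cannot overlap with generation.
    draft = f"answer using {len(chunks)} chunks and {len(recs)} recs"
    safe = draft  # a real guardrail check would run here, after generation
    return {"answer": safe, "catalog": catalog}
```

With three 10 ms stubs, the parallel section costs about one round trip instead of three.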
- What would you do if conversation-memory reads become slow enough to threaten the response SLA?
Hint: Degrade gracefully into stateless or reduced-context mode, add cache support, and protect the main response path.
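One way to sketch the graceful degradation, assuming an async memory store (the names and the 50 ms budget are illustrative): give the read a hard budget and fall back to stateless mode when it is exceeded, so a slow dependency never blocks the main response path.

```python
import asyncio

# Hypothetical slow conversation-memory read.
async def load_memory(session_id: str) -> list:
    await asyncio.sleep(1.0)  # simulates a read that blows the budget
    return [{"role": "user", "content": "earlier turn"}]

async def load_context_with_budget(session_id: str, budget_s: float = 0.05) -> dict:
    try:
        turns = await asyncio.wait_for(load_memory(session_id), timeout=budget_s)
        return {"turns": turns, "degraded": False}
    except asyncio.TimeoutError:
        # Stateless fallback: answer without history rather than miss the SLA.
        return {"turns": [], "degraded": True}
```

In practice the fallback would also emit a metric so degraded turns are visible, and a cache layer in front of the store would make the timeout path rare.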
- How would you design timeouts and retries for recommendation, catalog, and order-service calls so the chatbot does not stall?
Hint: Per-service budgets, small retry count with backoff and jitter, circuit breakers, and different fallback behavior by dependency tier.
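A compact sketch of the retry-plus-breaker pattern, under illustrative thresholds (three consecutive failures open the breaker; retries use exponential backoff with full jitter). None of this is a specific library API, just the mechanics:

```python
import random
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; half-opens after cooldown."""
    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0):
        self.failures = 0
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at = None  # half-open: let one probe call through
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_retries(fn, breaker, attempts: int = 2, base_delay_s: float = 0.05):
    """Small retry count so the total budget stays bounded per service."""
    if not breaker.allow():
        raise RuntimeError("circuit open")
    for attempt in range(attempts + 1):
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == attempts or not breaker.allow():
                raise
            # Exponential backoff with full jitter to avoid retry storms.
            time.sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

The fallback-by-tier part lives in the caller: a failed recommendation call can be dropped silently, while a failed order-service call needs an explicit "try again later" response.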
- Why is summarizing older turns better than simply truncating conversation history in this system?
Hint: Summarization preserves intent and preferences while staying inside token budgets.
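The shape of the answer can be shown in a few lines, assuming a pluggable summarizer (the placeholder summary string stands in for an actual LLM summarization call): recent turns stay verbatim, older ones collapse into one summary turn instead of being dropped.

```python
def compact_history(turns: list, keep_recent: int = 4, summarize=None) -> list:
    """Keep the most recent turns verbatim; fold older turns into a summary.

    `summarize` is a hypothetical callable (e.g. an LLM call); without it,
    a placeholder summary line is used so the sketch stays self-contained.
    """
    if len(turns) <= keep_recent:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize(older) if summarize else f"summary of {len(older)} earlier turns"
    # The summary travels as a single synthetic turn, preserving stated
    # preferences and intent at a fraction of the token cost.
    return [{"role": "system", "content": summary}] + recent
```

Truncation at the same token budget would silently discard exactly the turns where the user stated their preferences.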
Security Engineer
- Walk through how you would defend against a prompt-injection attempt that asks the model to ignore the system prompt and reveal internal rules.
Hint: Separate system/user input, input scanning, hardened prompts, output checks, logging, and safe refusal paths.
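Two of those layers can be sketched concretely: role separation (user text never concatenated into the system role) and a pattern-based input scan. The patterns here are illustrative and easy to evade; in a real system this is only one layer alongside output checks and hardened prompts.

```python
import re

# Illustrative signatures of common override attempts; a real deployment
# would pair this with semantic classifiers, not rely on regexes alone.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous|the) (instructions|system prompt)", re.I),
    re.compile(r"reveal .*(system prompt|internal rules)", re.I),
]

def scan_input(text: str) -> bool:
    return any(p.search(text) for p in INJECTION_PATTERNS)

def build_messages(system_prompt: str, user_text: str) -> list:
    # User text stays in the user role; it is never merged into the
    # system prompt, which limits what an injected instruction can claim.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

def handle(user_text: str, system_prompt: str = "You are a manga shopping assistant."):
    if scan_input(user_text):
        # Safe refusal path; the attempt would also be logged for review.
        return {"refused": True, "reply": "I can't help with that request."}
    return {"refused": False, "messages": build_messages(system_prompt, user_text)}
```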
- Where should PII scrubbing happen, and what goes wrong if it only happens in the analytics pipeline?
Hint: Scrub before model logging and analytics. Otherwise sensitive data has already leaked into systems it was never meant to reach, and deleting it after the fact is far harder than never storing it.
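A minimal sketch of scrub-before-persist, with two illustrative regexes (real PII detection needs far broader coverage, e.g. names and addresses): the scrub happens in the logging call itself, so nothing downstream ever sees raw text.

```python
import re

# Illustrative patterns only; production PII detection is much broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def log_turn(raw_text: str, sink: list) -> None:
    # Scrubbing happens at the write boundary, before anything persists,
    # so logs, analytics, and model-training exports all inherit clean text.
    sink.append(scrub(raw_text))
```

If scrubbing only ran in the analytics pipeline, raw emails and phone numbers would already sit in application logs and any model-call traces.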
ML Engineer
- How would you detect that the intent classifier is drifting and misrouting messages more often over time?
Hint: Watch confidence shifts, labeled evaluation sets, escalation spikes, feedback drops, and routing anomalies.
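One of those signals, confidence shift, can be monitored with a rolling window compared against a baseline from the labeled evaluation set. The window size and tolerance below are illustrative; a production monitor would also track per-intent rates and escalation volume.

```python
from collections import deque

class DriftMonitor:
    """Flags drift when rolling mean confidence departs from a baseline."""
    def __init__(self, baseline_mean: float, window: int = 500, tolerance: float = 0.1):
        self.baseline_mean = baseline_mean
        self.window = deque(maxlen=window)
        self.tolerance = tolerance

    def observe(self, confidence: float) -> None:
        self.window.append(confidence)

    def drifting(self) -> bool:
        # Wait for a full window so one noisy burst does not trigger an alert.
        if len(self.window) < self.window.maxlen:
            return False
        mean = sum(self.window) / len(self.window)
        return abs(mean - self.baseline_mean) > self.tolerance
```

Mean shift alone can miss distribution changes that preserve the mean, which is why the hint pairs it with labeled evaluation sets and downstream signals like escalation spikes.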
- If retrieved chunks conflict with each other, how should the system respond without sounding uncertain or misleading?
Hint: Prefer freshness/versioning, constrain the model, and treat conflicting sources as a KB quality problem to surface for review.
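The freshness-wins policy plus conflict surfacing can be sketched as a small resolver, assuming each chunk carries a topic key and an `updated_at` timestamp (both hypothetical fields for illustration):

```python
def resolve_conflicts(chunks: list) -> tuple:
    """Keep the freshest chunk per topic; report conflicting topics for KB review."""
    by_key = {}
    conflicts = []
    for chunk in chunks:
        key = chunk["topic"]
        prev = by_key.get(key)
        if prev is None:
            by_key[key] = chunk
        elif prev["text"] != chunk["text"]:
            # Same topic, different content: a KB quality problem to surface,
            # resolved at answer time by preferring the newer version.
            conflicts.append(key)
            if chunk["updated_at"] > prev["updated_at"]:
                by_key[key] = chunk
    return list(by_key.values()), conflicts
```

The model then only sees the resolved set, so it can answer confidently, while the conflicts list feeds a review queue instead of leaking hedged language into the response.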
SRE
- If Bedrock starts timing out for 5 percent of calls during a traffic spike, what immediate mitigations would you apply first?
Hint: Reduce expensive paths, enable fallbacks, examine quotas and traffic shaping, and preserve core experiences.
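The mitigation ladder can be made concrete as a policy keyed on the observed timeout rate. The thresholds below are illustrative, not tuned values, and a real rollout would step through them via feature flags:

```python
def mitigation_plan(timeout_rate: float) -> dict:
    """Pick degradations in order of user impact as the LLM timeout rate rises."""
    plan = {
        "model": "primary",
        "rag": True,
        "canned_fallbacks": False,
        "shed_low_priority": False,
    }
    if timeout_rate >= 0.02:
        plan["model"] = "fast-tier"       # route most traffic to a cheaper, faster model
    if timeout_rate >= 0.05:
        plan["rag"] = False               # skip retrieval to shorten prompts and calls
        plan["canned_fallbacks"] = True   # serve templated answers for common FAQs
    if timeout_rate >= 0.10:
        plan["shed_low_priority"] = True  # protect core order/support flows first
    return plan
```

In parallel, quota headroom and traffic shaping on the provider side get checked, since the spike may be a per-account throttle rather than provider-wide load.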
Principal Engineer
- If you had to cut p99 first-token latency in half without materially hurting answer quality, what three changes would you try first and why?
Hint: Model tiering, more aggressive fast paths, better parallelism/caching, and cold-start reduction are strong first moves.
Low-Level Recall
stateDiagram-v2
[*] --> Receive
Receive --> LoadContext
LoadContext --> ClassifyIntent
ClassifyIntent --> Route
Route --> FetchData
FetchData --> Generate
Generate --> GuardrailChecks
GuardrailChecks --> SaveTurn
SaveTurn --> ReturnResponse
GuardrailChecks --> Escalate: unsafe or unresolved
Escalate --> [*]
ReturnResponse --> [*]
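The state diagram above maps directly to a transition table, which is a useful recall exercise. A minimal sketch, with the guardrail branch as the only conditional transition:

```python
from enum import Enum

class S(Enum):
    RECEIVE = "Receive"
    LOAD_CONTEXT = "LoadContext"
    CLASSIFY = "ClassifyIntent"
    ROUTE = "Route"
    FETCH = "FetchData"
    GENERATE = "Generate"
    GUARDRAILS = "GuardrailChecks"
    SAVE = "SaveTurn"
    RETURN = "ReturnResponse"
    ESCALATE = "Escalate"
    DONE = "Done"

# Happy-path transitions; the guardrail failure branch is handled in run_turn.
HAPPY_PATH = {
    S.RECEIVE: S.LOAD_CONTEXT,
    S.LOAD_CONTEXT: S.CLASSIFY,
    S.CLASSIFY: S.ROUTE,
    S.ROUTE: S.FETCH,
    S.FETCH: S.GENERATE,
    S.GENERATE: S.GUARDRAILS,
    S.GUARDRAILS: S.SAVE,
    S.SAVE: S.RETURN,
    S.RETURN: S.DONE,
    S.ESCALATE: S.DONE,
}

def run_turn(guardrails_pass: bool) -> list:
    """Walk one turn through the state machine, returning the visited states."""
    state, trace = S.RECEIVE, []
    while state != S.DONE:
        trace.append(state)
        if state == S.GUARDRAILS and not guardrails_pass:
            state = S.ESCALATE  # unsafe or unresolved: skip SaveTurn entirely
        else:
            state = HAPPY_PATH[state]
    return trace
```

Note that in this sketch the escalation branch skips SaveTurn; whether an escalated turn should still be persisted is itself a design decision worth raising in the interview.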