MangaAssist Interview Pack - Hard With Hints
Level: Hard
How to use: answers should cover tradeoffs, not just mechanics. If a hint feels thin, expand it with explicit failure-handling decisions.
Failure and Degradation Map
graph TD
A[User Request] --> B{Intent Type}
B --> C[Structured API Path]
B --> D[RAG Path]
B --> E[Full LLM Path]
C --> F{Dependency healthy?}
D --> G{Retriever healthy?}
E --> H{LLM healthy?}
F -->|No| I[Fallback response]
G -->|No| J[Reduced grounding or safe fallback]
H -->|No| K[Unavailable message or escalation]
F -->|Yes| L[Guardrails]
G -->|Yes| L
H -->|Yes| L
L --> M{Safe and complete?}
M -->|Yes| N[Return answer]
M -->|No| O[Regenerate / redact / escalate]
Interview Questions With Hints
Staff Engineer
- The system targets low latency but also uses multiple downstream services. Where would you parallelize, and where would you keep the flow sequential?
Hint: Parallelize independent fetches like recommendation, catalog, and some retrieval tasks; keep guardrails and final formatting after generation.
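A minimal sketch of this split, assuming Python with asyncio and hypothetical stub services (all names here are illustrative): the independent fetches run concurrently, while generation and guardrails stay sequential because guardrails must see the final text.

```python
import asyncio

# Hypothetical stubs for the independent downstream fetches.
async def fetch_recommendations(user_id: str) -> list:
    await asyncio.sleep(0.01)  # simulated network latency
    return ["series-a", "series-b"]

async def fetch_catalog(query: str) -> dict:
    await asyncio.sleep(0.01)
    return {"query": query, "hits": 3}

async def retrieve_chunks(query: str) -> list:
    await asyncio.sleep(0.01)
    return ["chunk-1"]

async def handle_request(user_id: str, query: str) -> dict:
    # Independent fetches run concurrently; total wait is roughly
    # the slowest call, not the sum of all three.
    recs, catalog, chunks = await asyncio.gather(
        fetch_recommendations(user_id),
        fetch_catalog(query),
        retrieve_chunks(query),
    )
    # Generation, then guardrails, stay sequential: the guardrail pass
    # needs the complete draft, so it cannot overlap with generation.
    draft = f"answer using {len(chunks)} chunks and {len(recs)} recs"
    safe = draft  # a real guardrail check would run here, after generation
    return {"answer": safe, "catalog": catalog}
```

With three 10 ms stubs, the parallel section costs about one round trip instead of three.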
- What would you do if conversation-memory reads become slow enough to threaten the response SLA?
Hint: Degrade gracefully into stateless or reduced-context mode, add cache support, and protect the main response path.
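One way to sketch the graceful degradation, assuming an async memory store (the names and the 50 ms budget are illustrative): give the read a hard budget and fall back to stateless mode when it is exceeded, so a slow dependency never blocks the main response path.

```python
import asyncio

# Hypothetical slow conversation-memory read.
async def load_memory(session_id: str) -> list:
    await asyncio.sleep(1.0)  # simulates a read that blows the budget
    return [{"role": "user", "content": "earlier turn"}]

async def load_context_with_budget(session_id: str, budget_s: float = 0.05) -> dict:
    try:
        turns = await asyncio.wait_for(load_memory(session_id), timeout=budget_s)
        return {"turns": turns, "degraded": False}
    except asyncio.TimeoutError:
        # Stateless fallback: answer without history rather than miss the SLA.
        return {"turns": [], "degraded": True}
```

In practice the fallback would also emit a metric so degraded turns are visible, and a cache layer in front of the store would make the timeout path rare.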
- How would you design timeouts and retries for recommendation, catalog, and order-service calls so the chatbot does not stall?
Hint: Per-service budgets, small retry count with backoff and jitter, circuit breakers, and different fallback behavior by dependency tier.
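A compact sketch of the retry-plus-breaker pattern, under illustrative thresholds (three consecutive failures open the breaker; retries use exponential backoff with full jitter). None of this is a specific library API, just the mechanics:

```python
import random
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; half-opens after cooldown."""
    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0):
        self.failures = 0
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at = None  # half-open: let one probe call through
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_retries(fn, breaker, attempts: int = 2, base_delay_s: float = 0.05):
    """Small retry count so the total budget stays bounded per service."""
    if not breaker.allow():
        raise RuntimeError("circuit open")
    for attempt in range(attempts + 1):
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == attempts or not breaker.allow():
                raise
            # Exponential backoff with full jitter to avoid retry storms.
            time.sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

The fallback-by-tier part lives in the caller: a failed recommendation call can be dropped silently, while a failed order-service call needs an explicit "try again later" response.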
- Why is summarizing older turns better than simply truncating conversation history in this system?
Hint: Summarization preserves intent and preferences while staying inside token budgets.
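The shape of the answer can be shown in a few lines, assuming a pluggable summarizer (the placeholder summary string stands in for an actual LLM summarization call): recent turns stay verbatim, older ones collapse into one summary turn instead of being dropped.

```python
def compact_history(turns: list, keep_recent: int = 4, summarize=None) -> list:
    """Keep the most recent turns verbatim; fold older turns into a summary.

    `summarize` is a hypothetical callable (e.g. an LLM call); without it,
    a placeholder summary line is used so the sketch stays self-contained.
    """
    if len(turns) <= keep_recent:
        return list(turns)
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize(older) if summarize else f"summary of {len(older)} earlier turns"
    # The summary travels as a single synthetic turn, preserving stated
    # preferences and intent at a fraction of the token cost.
    return [{"role": "system", "content": summary}] + recent
```

Truncation at the same token budget would silently discard exactly the turns where the user stated their preferences.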
Security Engineer
- Walk through how you would defend against a prompt-injection attempt that asks the model to ignore the system prompt and reveal internal rules.
Hint: Separate system/user input, input scanning, hardened prompts, output checks, logging, and safe refusal paths.
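Two of those layers can be sketched concretely: role separation (user text never concatenated into the system role) and a pattern-based input scan. The patterns here are illustrative and easy to evade; in a real system this is only one layer alongside output checks and hardened prompts.

```python
import re

# Illustrative signatures of common override attempts; a real deployment
# would pair this with semantic classifiers, not rely on regexes alone.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous|the) (instructions|system prompt)", re.I),
    re.compile(r"reveal .*(system prompt|internal rules)", re.I),
]

def scan_input(text: str) -> bool:
    return any(p.search(text) for p in INJECTION_PATTERNS)

def build_messages(system_prompt: str, user_text: str) -> list:
    # User text stays in the user role; it is never merged into the
    # system prompt, which limits what an injected instruction can claim.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

def handle(user_text: str, system_prompt: str = "You are a manga shopping assistant."):
    if scan_input(user_text):
        # Safe refusal path; the attempt would also be logged for review.
        return {"refused": True, "reply": "I can't help with that request."}
    return {"refused": False, "messages": build_messages(system_prompt, user_text)}
```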
- Where should PII scrubbing happen, and what goes wrong if it only happens in the analytics pipeline?
Hint: Scrub before model logging and analytics. Otherwise sensitive data has already leaked into systems it was never meant to reach, and deleting it after the fact is far harder than never storing it.
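A minimal sketch of scrub-before-persist, with two illustrative regexes (real PII detection needs far broader coverage, e.g. names and addresses): the scrub happens in the logging call itself, so nothing downstream ever sees raw text.

```python
import re

# Illustrative patterns only; production PII detection is much broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def log_turn(raw_text: str, sink: list) -> None:
    # Scrubbing happens at the write boundary, before anything persists,
    # so logs, analytics, and model-training exports all inherit clean text.
    sink.append(scrub(raw_text))
```

If scrubbing only ran in the analytics pipeline, raw emails and phone numbers would already sit in application logs and any model-call traces.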
ML Engineer
- How would you detect that the intent classifier is drifting and misrouting messages more often over time?
Hint: Watch confidence shifts, labeled evaluation sets, escalation spikes, feedback drops, and routing anomalies.
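One of those signals, confidence shift, can be monitored with a rolling window compared against a baseline from the labeled evaluation set. The window size and tolerance below are illustrative; a production monitor would also track per-intent rates and escalation volume.

```python
from collections import deque

class DriftMonitor:
    """Flags drift when rolling mean confidence departs from a baseline."""
    def __init__(self, baseline_mean: float, window: int = 500, tolerance: float = 0.1):
        self.baseline_mean = baseline_mean
        self.window = deque(maxlen=window)
        self.tolerance = tolerance

    def observe(self, confidence: float) -> None:
        self.window.append(confidence)

    def drifting(self) -> bool:
        # Wait for a full window so one noisy burst does not trigger an alert.
        if len(self.window) < self.window.maxlen:
            return False
        mean = sum(self.window) / len(self.window)
        return abs(mean - self.baseline_mean) > self.tolerance
```

Mean shift alone can miss distribution changes that preserve the mean, which is why the hint pairs it with labeled evaluation sets and downstream signals like escalation spikes.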
- If retrieved chunks conflict with each other, how should the system respond without sounding uncertain or misleading?
Hint: Prefer freshness/versioning, constrain the model, and treat conflicting sources as a KB quality problem to surface for review.
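The freshness-wins policy plus conflict surfacing can be sketched as a small resolver, assuming each chunk carries a topic key and an `updated_at` timestamp (both hypothetical fields for illustration):

```python
def resolve_conflicts(chunks: list) -> tuple:
    """Keep the freshest chunk per topic; report conflicting topics for KB review."""
    by_key = {}
    conflicts = []
    for chunk in chunks:
        key = chunk["topic"]
        prev = by_key.get(key)
        if prev is None:
            by_key[key] = chunk
        elif prev["text"] != chunk["text"]:
            # Same topic, different content: a KB quality problem to surface,
            # resolved at answer time by preferring the newer version.
            conflicts.append(key)
            if chunk["updated_at"] > prev["updated_at"]:
                by_key[key] = chunk
    return list(by_key.values()), conflicts
```

The model then only sees the resolved set, so it can answer confidently, while the conflicts list feeds a review queue instead of leaking hedged language into the response.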
SRE
- If Bedrock starts timing out for 5 percent of calls during a traffic spike, what immediate mitigations would you apply first?
Hint: Reduce expensive paths, enable fallbacks, examine quotas and traffic shaping, and preserve core experiences.
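The mitigation ladder can be made concrete as a policy keyed on the observed timeout rate. The thresholds below are illustrative, not tuned values, and a real rollout would step through them via feature flags:

```python
def mitigation_plan(timeout_rate: float) -> dict:
    """Pick degradations in order of user impact as the LLM timeout rate rises."""
    plan = {
        "model": "primary",
        "rag": True,
        "canned_fallbacks": False,
        "shed_low_priority": False,
    }
    if timeout_rate >= 0.02:
        plan["model"] = "fast-tier"       # route most traffic to a cheaper, faster model
    if timeout_rate >= 0.05:
        plan["rag"] = False               # skip retrieval to shorten prompts and calls
        plan["canned_fallbacks"] = True   # serve templated answers for common FAQs
    if timeout_rate >= 0.10:
        plan["shed_low_priority"] = True  # protect core order/support flows first
    return plan
```

In parallel, quota headroom and traffic shaping on the provider side get checked, since the spike may be a per-account throttle rather than provider-wide load.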
Principal Engineer
- If you had to cut p99 first-token latency in half without materially hurting answer quality, what three changes would you try first and why?
Hint: Model tiering, more aggressive fast paths, better parallelism/caching, and cold-start reduction are strong first moves.
Low-Level Recall
stateDiagram-v2
[*] --> Receive
Receive --> LoadContext
LoadContext --> ClassifyIntent
ClassifyIntent --> Route
Route --> FetchData
FetchData --> Generate
Generate --> GuardrailChecks
GuardrailChecks --> SaveTurn
SaveTurn --> ReturnResponse
GuardrailChecks --> Escalate: unsafe or unresolved
Escalate --> [*]
ReturnResponse --> [*]
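The state diagram above maps directly to a transition table, which is a useful recall exercise. A minimal sketch, with the guardrail branch as the only conditional transition:

```python
from enum import Enum

class S(Enum):
    RECEIVE = "Receive"
    LOAD_CONTEXT = "LoadContext"
    CLASSIFY = "ClassifyIntent"
    ROUTE = "Route"
    FETCH = "FetchData"
    GENERATE = "Generate"
    GUARDRAILS = "GuardrailChecks"
    SAVE = "SaveTurn"
    RETURN = "ReturnResponse"
    ESCALATE = "Escalate"
    DONE = "Done"

# Happy-path transitions; the guardrail failure branch is handled in run_turn.
HAPPY_PATH = {
    S.RECEIVE: S.LOAD_CONTEXT,
    S.LOAD_CONTEXT: S.CLASSIFY,
    S.CLASSIFY: S.ROUTE,
    S.ROUTE: S.FETCH,
    S.FETCH: S.GENERATE,
    S.GENERATE: S.GUARDRAILS,
    S.GUARDRAILS: S.SAVE,
    S.SAVE: S.RETURN,
    S.RETURN: S.DONE,
    S.ESCALATE: S.DONE,
}

def run_turn(guardrails_pass: bool) -> list:
    """Walk one turn through the state machine, returning the visited states."""
    state, trace = S.RECEIVE, []
    while state != S.DONE:
        trace.append(state)
        if state == S.GUARDRAILS and not guardrails_pass:
            state = S.ESCALATE  # unsafe or unresolved: skip SaveTurn entirely
        else:
            state = HAPPY_PATH[state]
    return trace
```

Note that in this sketch the escalation branch skips SaveTurn; whether an escalated turn should still be persisted is itself a design decision worth raising in the interview.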