Analytics, Observability, And Feedback Loops
Covers Q24, Q37, Q40, Q47, Q49.
What The Interviewer Is Testing
- Whether you can instrument the system well enough to operate it.
- Whether you understand schema evolution and prompt experimentation.
- Whether you can connect traces, metrics, logs, and business KPIs.
Deep Dive
Event Design Principles
- Emit structured events with stable identifiers such as session_id, response_id, and prompt_version (sketched below).
- Scrub PII before emission.
- Treat analytics schemas as versioned contracts.
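A minimal sketch of these principles, assuming a JSON event bus and a regex-only scrubber (a real scrubber covers far more PII classes); the ChatEvent fields and the emit target are illustrative:

```python
import json
import re
from dataclasses import asdict, dataclass

SCHEMA_VERSION = "1.2.0"  # the analytics schema is a versioned contract

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub_pii(text: str) -> str:
    # Illustration only: redact emails before emission. Real scrubbers also
    # handle names, phone numbers, addresses, and account numbers.
    return EMAIL_RE.sub("<redacted>", text)

@dataclass
class ChatEvent:
    schema_version: str
    session_id: str
    response_id: str
    prompt_version: str
    user_text: str

def emit(event: ChatEvent) -> None:
    event.user_text = scrub_pii(event.user_text)
    print(json.dumps(asdict(event)))  # stand-in for the real event bus

emit(ChatEvent(SCHEMA_VERSION, "s-42", "r-981", "v7", "reach me at a@b.com"))
```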
Observability Stack
You should be able to describe three layers clearly; a combined sketch follows the list:
- traces for request path and dependency timing
- metrics for SLOs, error rates, and saturation
- logs for structured debugging context
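A compact way to show all three layers on one request path, assuming the OpenTelemetry Python API (`opentelemetry-api`) and the standard `logging` module; instrument and attribute names are illustrative:

```python
import logging
import time

from opentelemetry import metrics, trace

tracer = trace.get_tracer("chat-service")
meter = metrics.get_meter("chat-service")
log = logging.getLogger("chat-service")

# Metrics layer: SLO-facing instruments (latency histogram, error counter).
request_latency_ms = meter.create_histogram("chat.request.latency_ms")
request_errors = meter.create_counter("chat.request.errors")

def handle_request(session_id: str) -> None:
    start = time.monotonic()
    # Trace layer: one root span per request; dependencies get child spans.
    with tracer.start_as_current_span("chat.request") as span:
        span.set_attribute("session_id", session_id)
        try:
            ...  # request path goes here
        except Exception:
            request_errors.add(1)
            # Log layer: structured debugging context, correlated by session.
            log.exception("request failed", extra={"session_id": session_id})
            raise
        finally:
            request_latency_ms.record((time.monotonic() - start) * 1000)
```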
What To Trace
- request entry
- session load
- intent classification
- service fan-out
- retrieval and reranking
- LLM invocation
- guardrail stages
- persistence and analytics emission (see the span sketch below)
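One way to lay those stages out as nested spans, again with the OpenTelemetry API; the helper functions are hypothetical stubs standing in for real services:

```python
from opentelemetry import trace

tracer = trace.get_tracer("chat-service")

# Hypothetical stage implementations, stubbed so the sketch runs.
def load_session(sid): return {}
def classify_intent(query, session): return "billing"
def retrieve_and_rerank(query, intent): return []
def call_llm(query, docs): return "draft answer"
def apply_guardrails(draft): return draft
def persist_and_emit(sid, answer): pass

def handle(session_id: str, query: str) -> str:
    # The root span is the request entry; each stage gets a child span so
    # per-stage timing is visible instead of one opaque total.
    with tracer.start_as_current_span("chat.request"):
        with tracer.start_as_current_span("session.load"):
            session = load_session(session_id)
        with tracer.start_as_current_span("intent.classify"):
            intent = classify_intent(query, session)
        # Service fan-out would add sibling child spans here, one per call.
        with tracer.start_as_current_span("retrieval.rerank"):
            docs = retrieve_and_rerank(query, intent)
        with tracer.start_as_current_span("llm.invoke"):
            draft = call_llm(query, docs)
        with tracer.start_as_current_span("guardrails.check"):
            answer = apply_guardrails(draft)
        with tracer.start_as_current_span("persist.emit"):
            persist_and_emit(session_id, answer)
        return answer
```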
Prompt A/B Testing
Prompt versions should be configuration, not code constants. Strong answers mention:
- consistent assignment by customer or session key (sketched after this list)
- prompt version logging on every response
- outcome metrics such as resolution, satisfaction, conversion, and escalation
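A minimal sketch of sticky assignment, assuming a two-arm experiment; hashing the experiment name with the customer key makes assignment deterministic per customer and independent across experiments (all names are illustrative):

```python
import hashlib

PROMPT_VERSIONS = ("prompt-v1", "prompt-v2")  # illustrative version IDs

def assign_prompt_version(customer_id: str, experiment: str) -> str:
    # Sticky, roughly uniform assignment without storing per-customer state.
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return PROMPT_VERSIONS[0] if bucket < 50 else PROMPT_VERSIONS[1]

version = assign_prompt_version("cust-123", "tone-experiment")
# Log `version` on the response and tag every analytics event with it, so
# resolution, satisfaction, conversion, and escalation can be split by arm.
```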
Schema Evolution
New event fields should be additive and backward-compatible. The strongest answers mention a schema registry, nullable additions, consumer compatibility, and a version log.
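A sketch of what "additive and backward-compatible" means in practice, assuming JSON events; the field names are illustrative:

```python
# v1 shape: old producers keep emitting exactly this.
v1_event = {"schema_version": "1.0.0", "session_id": "s-1", "response_id": "r-1"}

# v2 adds a nullable field; nothing is renamed, retyped, or removed.
v2_event = {**v1_event, "schema_version": "1.1.0", "user_satisfaction_score": None}

def consume(event: dict) -> None:
    # Old consumers ignore unknown keys; new consumers read the new field
    # with a default, so v1 events remain valid input.
    score = event.get("user_satisfaction_score")  # None when absent
    if score is not None:
        ...  # downstream aggregation

consume(v1_event)
consume(v2_event)
```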
Strong Answer Pattern
- "If I cannot trace it, I cannot tune it."
- "Prompt experiments need consistent assignment and analytics tagging."
- "Dashboards should connect technical latency to user outcomes."
Scenario 1: Add user_satisfaction_score
Primary Prompt
The product team wants a new user_satisfaction_score field in analytics events. How do you add it safely?
Follow-Up 1
How do you keep old consumers from breaking?
Follow-Up 2
What change would be needed in Redshift?
Follow-Up 3
Would you backfill historical data? Under what conditions?
Strong Answer Markers
- Uses versioned additive schema changes.
- Keeps new fields optional at first.
- Mentions warehouse evolution and optional backfill (sketched below).
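A sketch of the warehouse side, assuming the events land in a Redshift table named chat_events with a survey table to join against (both names hypothetical); the column is added as nullable so existing rows and old producers stay valid:

```python
# Additive, nullable column: existing rows read as NULL, no table rewrite.
ALTER_CHAT_EVENTS = """
ALTER TABLE chat_events
ADD COLUMN user_satisfaction_score SMALLINT;
"""

# Backfill only if a trustworthy historical source exists (e.g., survey
# exports keyed by response_id); otherwise leave history NULL and note the
# cutover date in the schema version log.
BACKFILL_FROM_SURVEYS = """
UPDATE chat_events
SET user_satisfaction_score = s.score
FROM satisfaction_surveys s
WHERE chat_events.response_id = s.response_id;
"""
```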
Scenario 2: No One Knows Where The Latency Went
Primary Prompt
The team only logs total response time, and now latency is high. What instrumentation is missing?
Follow-Up 1
What span boundaries would you add first?
Follow-Up 2
Which metrics belong on an operations dashboard versus an executive dashboard?
Follow-Up 3
What alarm thresholds would you set initially?
Strong Answer Markers
- Adds span-level tracing across every major stage.
- Distinguishes business dashboards from engineering dashboards.
- Uses p95 and p99, not only averages (illustrated below).
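A small illustration of why averages mislead, using made-up per-stage samples; NumPy's percentile is one of many ways to compute the tails:

```python
import numpy as np

# Illustrative LLM-invocation latencies in milliseconds; two tail outliers.
llm_ms = np.array([420, 510, 480, 2900, 450, 470, 3100, 440])

p50, p95, p99 = np.percentile(llm_ms, [50, 95, 99])
print(f"mean={llm_ms.mean():.0f}ms  p50={p50:.0f}ms  "
      f"p95={p95:.0f}ms  p99={p99:.0f}ms")
# The mean (~1096ms) sits far from p50 (~475ms): alarm on p95/p99 per stage,
# not on the average, and set initial thresholds from observed baselines.
```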
Scenario 3: Prompt Version B Improves Satisfaction But Hurts Escalation Rate
Primary Prompt
Prompt version B raises thumbs-up rates but also increases escalations. How do you reason about that conflict?
Follow-Up 1
What additional metrics do you inspect before deciding?
Follow-Up 2
Could the prompt be overconfident while sounding better?
Follow-Up 3
What is your rollout decision if the metrics remain mixed?
Strong Answer Markers
- Treats multi-metric outcomes as normal.
- Investigates segment-level behavior and root causes.
- Uses business-priority weighting instead of picking the nicest-looking metric (see the segment sketch below).
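A sketch of the segment-level cut, assuming per-response outcomes already joined to prompt version and customer segment (toy data, pandas for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "prompt_version": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "segment": ["new", "new", "returning", "returning",
                "new", "new", "returning", "returning"],
    "thumbs_up": [0, 1, 1, 0, 1, 1, 1, 1],
    "escalated": [0, 0, 0, 1, 1, 1, 0, 0],
})

# Mixed top-line metrics often decompose cleanly by segment: here B lifts
# satisfaction everywhere but drives escalations only for new users.
rates = df.groupby(["prompt_version", "segment"])[["thumbs_up", "escalated"]].mean()
print(rates)
```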
Red Flags
- Logging raw PII to "fix it later."
- Treating dashboards as nothing more than ad hoc SQL queries.
- Running prompt experiments without version IDs in analytics.
- Relying on average latency as the primary health signal.
Two-Minute Whiteboard Version
Draw four outputs from the same request:
- Trace spans.
- Metrics.
- Structured logs.
- Analytics events tied to prompt and response IDs.