Platform Capacity Targets

  • 25M Requests / Year: annual throughput target
  • 6,500 Requests / Min Peak: ~108 req/sec
  • ≤ 15s Submit p95: end-to-end latency
  • ≤ 2s Status p95: cached/ready cases
  • 30-50M Historical Records: supported dataset size
  • ~1,625 Concurrent Active: in-flight requests at peak
  • 99.9% Availability SLA: core intake & status
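
The concurrency target follows directly from the other figures via Little's law (L = λ × W). A minimal sketch of that arithmetic, using only the numbers above:

```python
# Little's law: in-flight requests = arrival rate x time in system.
peak_rpm = 6_500               # peak requests per minute (from targets)
peak_rps = peak_rpm / 60       # ~108 req/sec
submit_p95_s = 15              # submit end-to-end latency budget (p95)

concurrent_active = peak_rps * submit_p95_s
print(round(peak_rps))         # ~108
print(round(concurrent_active))  # 1625 -> the ~1,625 concurrent-active target
```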

LLM Call Profile

Each request type has a different LLM and tool call footprint, driving infrastructure sizing.

Flow             LLM Calls / Request   Tool Calls / Request      Avg Latency
Submit Request   ~6                    ~18 (many parallelized)   12-15 sec
Request Status   ~3                    ~5                        2-3 sec
  • ~105M LLM Calls / Year
  • ~10 Peak LLM Calls/sec
  • ~1,000 Avg Tokens / Call
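
The annual call volume is consistent with the per-request profile under one particular traffic mix. The 10M/15M submit/status split below is an assumption chosen to reproduce the stated total, not a figure from the source:

```python
# Assumed traffic mix (not from the source): 10M submit + 15M status
# requests per year, using the per-request LLM call counts above.
submit_reqs, status_reqs = 10_000_000, 15_000_000
llm_calls = submit_reqs * 6 + status_reqs * 3
print(llm_calls)  # 105000000 -> the ~105M LLM calls/year figure

avg_tokens_per_call = 1_000
print(llm_calls * avg_tokens_per_call)  # ~105B tokens/year
```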

Scaling Levers

Queue-Depth Autoscaling

Pod/consumer counts scale on queue-depth custom metrics via the Kubernetes Horizontal Pod Autoscaler (HPA).
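
A sketch of what that HPA could look like. The metric name and target value are placeholders; the real names depend on the metrics adapter (e.g. Prometheus Adapter or KEDA) actually deployed:

```yaml
# Queue-depth-driven autoscaling sketch (autoscaling/v2 API).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: submit-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: submit-consumer
  minReplicas: 4
  maxReplicas: 64
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth        # placeholder custom metric name
        target:
          type: AverageValue
          averageValue: "50"       # assumed target: messages per consumer pod
```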

API Response Caching

Repetitive validation responses cached in Redis to reduce redundant external API calls.
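
The pattern is plain cache-aside with a TTL. In the real system the store would be Redis (e.g. a SETEX with an expiry); a dict stands in below so the sketch is self-contained, and the TTL value is an assumption:

```python
import time

# Cache-aside sketch of the validation-response cache (dict in place of Redis).
_cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300  # assumed freshness window for validation responses

def cached_validate(key: str, call_external_api) -> str:
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                    # cache hit: skip the external call
    value = call_external_api(key)       # cache miss: one real API call
    _cache[key] = (now, value)
    return value

calls = []
def fake_api(key):                       # stand-in for the external validator
    calls.append(key)
    return f"valid:{key}"

cached_validate("doc-123", fake_api)
cached_validate("doc-123", fake_api)     # second lookup served from cache
print(len(calls))                        # 1 external call, not 2
```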

GPU Model Batching

vLLM continuous batching provides a 2-4x effective throughput multiplier under concurrent load.
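
A back-of-envelope replica count under that multiplier. Only the ~10 peak LLM calls/sec and the 2-4x batching range come from the figures above; the unbatched per-replica rate is an assumption for illustration:

```python
import math

peak_llm_calls_per_sec = 10          # from the LLM call profile
unbatched_rate_per_replica = 2.0     # assumed calls/sec per GPU replica

for multiplier in (2, 4):            # continuous-batching multiplier range
    effective = unbatched_rate_per_replica * multiplier
    replicas = math.ceil(peak_llm_calls_per_sec / effective)
    print(multiplier, replicas)      # 2x -> 3 replicas, 4x -> 2 replicas
```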

Scheduled Retries

Non-terminal dependency states deferred to retry queues with exponential backoff.
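
A minimal sketch of such a retry schedule: exponential backoff with full jitter and a cap. The base and cap values are assumptions, not figures from the plan:

```python
import random

BASE_S, CAP_S = 30, 3600  # assumed: 30s initial delay, capped at 1 hour

def next_delay(attempt: int) -> float:
    """Jittered delay before retry `attempt` (1-based)."""
    return random.uniform(0, min(CAP_S, BASE_S * 2 ** (attempt - 1)))

# Upper bound of the schedule: 30s, 60s, 120s, ... capped at 3600s.
print([min(CAP_S, BASE_S * 2 ** (a - 1)) for a in range(1, 9)])
# [30, 60, 120, 240, 480, 960, 1920, 3600]
```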

Milestones

Seven-phase delivery from foundation setup through production readiness.

Phase 1 — Foundation Setup

Networking, Kubernetes cluster, messaging infrastructure, and data stores provisioned.

Phase 2 — Rule Compiler & YAML Contract

SKILL.md parsing, YAML compilation, schema validation, and version management finalized.

Phase 3 — Submit Flow MVP

Core submit request flow with eligibility, form fill, documents, booking, and payment integrations. Baseline telemetry.

Phase 4 — Status Flow & Loop-back

Poll/push status updates, loop-back handling for incomplete requests, and status observability.

Phase 5 — Observability Hardening

Full trace audit, monitoring dashboards, alerting integration, and compliance reporting.

Phase 6 — Performance Validation

Load/performance testing at 6,500 req/min target throughput. GPU scaling validation.

Phase 7 — Production Readiness

Governance sign-off, security audit completion, runbook handover, and go-live.

Risks & Mitigations

Key risks identified during planning with corresponding mitigation strategies.

  • External dependency/API instability: circuit breaker per API + exponential retries + deferred queue for non-terminal states.
  • GPU capacity under peak LLM load: GPU autoscaling, continuous batching via vLLM, model routing between replicas, and KV-cache optimization.
  • Context window overflow on long interactions: summarizer agent with configurable token thresholds + immutable trace store for full context recovery.
  • Rule misconfiguration: automated rule tests + strict YAML schema validation + business team approval workflow before environment promotion.
  • Stale reference data: freshness SLAs per data source + periodic reconciliation jobs + alerting on stale-data thresholds.

Phase Roadmap

Phase 1 Foundation & MVP

  • Infrastructure and networking setup
  • Rule compiler and YAML finalization
  • Submit Request flow with core integrations
  • Baseline telemetry and tracing
  • Status flow with poll/push handling

Phase 2 Hardening & Scale

  • Full observability and audit hardening
  • Performance validation at target throughput
  • Security audit and compliance sign-off
  • Production readiness and governance approval
  • Operational handover and KT