On-Prem Tech Stack (LLD)

Each layer of the stack is independently scalable and uses open-source components where possible.

  • System pool — Ingress, telemetry agents
  • Orchestrator pool — Orchestrator + Summarizer pods
  • Specialist pool — Eligibility, Form, Document, Booking, Payment, Rule Compiler
  • Tool executor pool — Tool middleware + API client pods
  • GPU pool — vLLM/TGI model serving

Each pool scales horizontally on CPU and memory utilization and on a custom queue-depth metric.
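As a rough illustration of queue-depth scaling, the sketch below applies the replica formula used by the Kubernetes Horizontal Pod Autoscaler, ceil(currentReplicas × currentMetric / targetMetric), with a tolerance band. The function name, the replica bounds, and the per-pod queue-depth target are assumptions for illustration, not values from this design.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """HPA-style rule: ceil(current_replicas * current_value / target_value)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # Inside the tolerance band the replica count is left alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        wanted = current_replicas
    else:
        wanted = math.ceil(current_replicas * ratio)
    # Clamp to the (assumed) pool limits.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 4 pods that each see an average queue depth of 30 against a target of 10 would scale to 12 pods under this rule.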

  • Fast model (8-13B) — Orchestrator, Summarizer loops
  • Capable model (70B) — Specialist agent reasoning
  • Self-hosted via vLLM/TGI on GPU nodes
  • Local model routing and failover between replicas
  • KV-cache optimization and prompt caching

Messaging

  • Intake queue/topic
  • Agent-specific work queues/topics
  • Tool execution queue/topic
  • Scheduled retry queue
  • Status update topics/subscriptions
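The model routing and failover bullet in the serving list above can be sketched as a small router that maps each caller role to a model tier and tries the tier's replicas in round-robin order. The role names, tier names, and the `ModelRouter` class are hypothetical. Each replica is modelled as a plain callable, standing in for an HTTP client to a vLLM or TGI endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical mapping from caller role to the model tiers named in the bullets.
ROLE_TO_TIER = {"orchestrator": "fast", "summarizer": "fast", "specialist": "capable"}


class AllReplicasFailed(RuntimeError):
    """Raised when no replica in the tier could serve the request."""


@dataclass
class ModelRouter:
    replicas: Dict[str, List[Callable[[str], str]]]
    _next: Dict[str, int] = field(default_factory=dict)

    def complete(self, role: str, prompt: str) -> str:
        tier = ROLE_TO_TIER[role]
        pool = self.replicas[tier]
        start = self._next.get(tier, 0)
        for offset in range(len(pool)):
            idx = (start + offset) % len(pool)
            try:
                result = pool[idx](prompt)
            except ConnectionError:
                continue  # fail over to the next replica in the tier
            # Advance the round-robin pointer past the replica that answered.
            self._next[tier] = (idx + 1) % len(pool)
            return result
        raise AllReplicasFailed(tier)
```

A production router would likely add health checks and timeouts. This sketch only shows the failover order.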

Relational

PostgreSQL HA — primary transactional storage for eChannels request tables.

Document Store

MongoDB/Cassandra/Scylla — UDB domain objects, session snapshots, reasoning traces.

Object Storage

MinIO/Ceph — document attachments, SKILL.md sources, compiled YAML.

Cache

Redis cluster — rule cache, session cache, API response cache, rate-limit counters.
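To illustrate the rate-limit counters, here is a minimal fixed-window limiter. It keeps counters in a local dict as a stand-in for Redis, where the same pattern is usually built from an atomic increment plus a key expiry. The class name, the limit, and the window size are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters: Dict[Tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        bucket = int(self.clock() // self.window)
        # Drop counters from earlier windows (Redis would do this via key expiry).
        self._counters = {k: v for k, v in self._counters.items() if k[1] >= bucket}
        key = (client_id, bucket)
        count = self._counters.get(key, 0) + 1
        self._counters[key] = count
        return count <= self.limit
```

A fixed window is the simplest variant. It admits short bursts at window boundaries, which a sliding-window or token-bucket limiter would smooth out.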

  • Self-hosted gateway (Kong/Tyk/APISIX) for traffic governance and multi-backend routing
  • Vault/KMS for secrets and certificates
  • K8s service accounts + RBAC + mTLS for east-west auth
  • Network segmentation with internal firewall policies

Monitoring & Telemetry Architecture

All platform components emit traces, metrics, and logs through OpenTelemetry collectors into Prometheus (metrics), Loki (logs), and optionally Tempo (traces), unified in Grafana dashboards.

Metrics

Agent throughput, API latency, error rates, queue depth, cache hit ratio, token budgets.

Logs

Structured reasoning traces, tool call logs, auth failures, circuit breaker events.
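One way to emit structured tool call logs is a JSON formatter on the standard `logging` module, so that each line can be parsed field by field downstream. The field names (`tool`, `status`, `latency_ms`) and the `fields` convention for passing them are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields arrive via `extra={"fields": {...}}` on the log call.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage would look like `make_logger("tool_calls").info("tool call finished", extra={"fields": {"tool": "booking_api", "status": 200, "latency_ms": 85}})`, where `booking_api` is a made-up tool name.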

Dashboards

Unified views across all signals with alerting integration for email, SMS, and ITSM webhooks.

Monitored Stages

  • Intake: Ingress latency, API rate, auth failures
  • Orchestration: Loop duration, retries, summarization, tokens
  • Agents: Per-agent throughput, error rates, queue wait
  • Tool/API: API latency, status codes, circuit breaker state
  • Data Layer: DB latency, cache hit ratio, storage growth
  • Status: Event lag, notification success, loop-back frequency

Security, Privacy & Compliance

Encryption & Access

Encryption in transit and at rest for all data stores
Least-privilege IAM and service identity controls
Secret rotation through centralized vault
K8s RBAC + mTLS for service-to-service auth

Audit & Retention

Full audit trail for every critical request action
Immutable, queryable logs for regulatory review
Data retention policies by artifact type
7-year retention norm (UAE standard)
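The design does not say how log immutability is achieved. One common technique for making an audit trail tamper-evident is a hash chain, in which each entry stores the hash of the previous one. The sketch below assumes that technique. The entry layout and function names are illustrative only.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: List[Dict], event: Dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: List[Dict]) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Write-once object storage or database-level append-only controls would be alternative or complementary ways to meet the same requirement.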

Failure Handling Patterns

Circuit Breakers

Per external API — isolates failures to prevent cascade across the agent pipeline.
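A minimal sketch of a per-API circuit breaker follows, assuming a consecutive-failure threshold and a fixed reset timeout. Both parameters and the class name are illustrative. One instance would be held per external API so that a failing backend is skipped without affecting calls to the others.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"  # allow one probe call through
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; call skipped")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed probe, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```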

Exponential Retries

Transient failures retried with backoff. Scheduled retry queue for deferred operations.
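The backoff schedule can be sketched as a capped geometric series with jitter. The base delay, growth factor, cap, and the `TransientError` type below are assumptions. The design itself only states that transient failures are retried with backoff.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0, attempts: int = 5) -> List[float]:
    """Delay before each retry: base * factor**n, capped at `cap`."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


def retry(fn: Callable[[], T], attempts: int = 5, base: float = 0.5,
          factor: float = 2.0, cap: float = 30.0,
          sleep: Callable[[float], None] = time.sleep,
          rng: Callable[[], float] = random.random) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(base, factor, cap, attempts)
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts; the caller may defer to the retry queue
            # Full jitter: sleep a random fraction of the scheduled delay.
            sleep(delays[attempt] * rng())
    raise AssertionError("unreachable")
```

Jitter spreads retries from many workers over time so they do not hit a recovering API in lockstep.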

Dead-Letter Queues

Non-recoverable messages routed to DLQ for manual inspection and replay.
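The dead-letter flow can be illustrated with in-memory queues. Messages that fail permanently, or that exhaust an assumed attempt limit, move to the DLQ with the error attached, and a replay step puts them back on the work queue. The message shape, `PermanentError`, and `max_attempts` are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict


class PermanentError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


def process(queue: Deque[Dict], dlq: Deque[Dict],
            handler: Callable[[str], None], max_attempts: int = 3) -> None:
    while queue:
        msg = queue.popleft()
        try:
            handler(msg["body"])
        except PermanentError as exc:
            dlq.append({**msg, "error": str(exc)})
        except Exception as exc:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= max_attempts:
                dlq.append({**msg, "error": str(exc)})
            else:
                queue.append(msg)  # requeue for another attempt


def replay(dlq: Deque[Dict], queue: Deque[Dict]) -> int:
    """Move every dead-lettered message back to the work queue after inspection."""
    moved = 0
    while dlq:
        msg = dlq.popleft()
        queue.append({"body": msg["body"], "attempts": 0})
        moved += 1
    return moved
```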

Graceful Fallback

Cached state used for read paths where policy permits. Manual review queue for unresolved cases.
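A sketch of the read-path fallback, under the assumption that the policy decision is reduced to a single `allow_stale` flag. The function name and the returned source labels are illustrative.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def read_with_fallback(key: str, fetch: Callable[[str], Any], cache: Dict[str, Any],
                       manual_review: List[str],
                       allow_stale: bool) -> Tuple[Optional[Any], str]:
    """Return (value, source), where source is 'live', 'cached', or 'manual_review'."""
    try:
        value = fetch(key)
    except Exception:
        # The live read failed: serve cached state only where policy permits it.
        if allow_stale and key in cache:
            return cache[key], "cached"
        manual_review.append(key)
        return None, "manual_review"
    cache[key] = value  # refresh the cache on every successful read
    return value, "live"
```

Returning the source alongside the value lets the caller tell a user or downstream agent that the data may be stale.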

Environments & Rollout

  • DEV: Development
  • SIT: System Integration
  • UAT: User Acceptance
  • PROD: Production

Rollout Strategy

  • Controlled canary for orchestrator changes
  • Blue/green rollout for tool middleware updates
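A controlled canary can be approximated by hashing a stable request attribute into 100 buckets and sending the lowest buckets to the new version. The sketch below assumes routing on a request ID and a percentage knob. In practice the gateway or service mesh would usually own this decision.

```python
import hashlib


def route_version(request_id: str, canary_percent: int) -> str:
    """Deterministically send roughly `canary_percent`% of requests to the canary."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Because the bucket depends only on the request ID, a request that reaches the canary at 10% still reaches it when the percentage is raised, which keeps behaviour consistent during a gradual rollout.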

Configuration Separation

  • Rule packs versioned per environment
  • API endpoint and quota profiles per environment
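Per-environment separation could be expressed as immutable profiles selected by environment name. Every value below (version strings, URLs, quotas) is a placeholder invented for illustration, not a setting from this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvProfile:
    rule_pack_version: str    # rule packs are versioned per environment
    api_base_url: str         # API endpoint profile
    requests_per_minute: int  # quota profile


# Placeholder values only; real profiles would come from the config store.
PROFILES = {
    "DEV": EnvProfile("dev-snapshot", "https://api.dev.example.internal", 60),
    "SIT": EnvProfile("sit-candidate", "https://api.sit.example.internal", 120),
    "UAT": EnvProfile("uat-candidate", "https://api.uat.example.internal", 300),
    "PROD": EnvProfile("prod-release", "https://api.example.internal", 1000),
}


def load_profile(env: str) -> EnvProfile:
    """Fail fast on an unknown environment instead of silently using a default."""
    try:
        return PROFILES[env.upper()]
    except KeyError:
        raise ValueError("unknown environment: " + repr(env)) from None
```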