System Overview (HLD)

User channels connect through an edge load balancer and an API gateway to the on-prem Kubernetes cluster, which runs the orchestrator, specialist agents, and tool executors. Request data flows through GPU model serving, Redis, the document and object stores, and PostgreSQL, while an event bus propagates status updates.

Internal Runtime (LLD)

The Kubernetes cluster is segmented into dedicated node pools: system (ingress, telemetry), orchestrator, specialist agents, tool executors, and GPU. Each pool scales independently based on load.

Orchestrator ReAct Reasoning Loop

Each request iterates through a REASON → ACT → OBSERVE cycle. In each iteration the orchestrator decides the next action, delegates to a specialist or calls a tool, and captures the output. If the context exceeds the token threshold, the orchestrator compresses it before the next iteration.
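The cycle can be sketched in a few lines of Python. This is a minimal illustration, not the orchestrator's actual code: the names Step, Context, compress, and run_request, the word-count stand-in for a tokenizer, and the default threshold of 200 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    phase: str    # "REASON", "ACT", or "OBSERVE"
    content: str


@dataclass
class Context:
    steps: List[Step] = field(default_factory=list)

    def token_count(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words.
        return sum(len(s.content.split()) for s in self.steps)


def compress(ctx: Context, keep_last: int = 3) -> None:
    """Replace all but the last `keep_last` steps with one summary placeholder."""
    if len(ctx.steps) <= keep_last:
        return
    dropped = len(ctx.steps) - keep_last
    summary = Step("OBSERVE", f"[summary of {dropped} earlier steps]")
    ctx.steps = [summary] + ctx.steps[-keep_last:]


def run_request(
    request: str,
    decide: Callable[[str, Context], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
    token_threshold: int = 200,  # hypothetical value
) -> Context:
    ctx = Context()
    for _ in range(max_iterations):
        # REASON: choose the next action (a tool or specialist name, or "finish").
        action, argument = decide(request, ctx)
        ctx.steps.append(Step("REASON", f"next action: {action}"))
        if action == "finish":
            break
        # ACT: invoke the chosen tool or specialist.
        ctx.steps.append(Step("ACT", f"{action}({argument})"))
        output = tools[action](argument)
        # OBSERVE: capture the output for the next reasoning step.
        ctx.steps.append(Step("OBSERVE", output))
        # Compress only when the context has grown past the threshold.
        if ctx.token_count() > token_threshold:
            compress(ctx)
    return ctx


# Usage: a toy policy that searches once, then finishes.
def demo_decide(request: str, ctx: Context) -> Tuple[str, str]:
    if any(s.phase == "OBSERVE" for s in ctx.steps):
        return ("finish", "")
    return ("search", request)


demo_tools = {"search": lambda q: f"results for {q}"}
demo_ctx = run_request("gpu quota", demo_decide, demo_tools)
```

In a real deployment the decide callable would be a model call and the tools mapping would dispatch to specialist agents or tool executors; both are plain functions here so the control flow stays visible.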

Traceability Model

For every step of a request, the trace records: step ID, timestamp, phase, action invoked and its parameters, tool outputs and result codes, a reasoning summary, and the token budget with compression metadata. This enables replay, root-cause debugging, and audit inspection.
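One possible shape for such a record is sketched below. The field names follow the list above, but the class name TraceStep, the helper new_step, the exact types, and the JSON encoding are assumptions for illustration rather than the system's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class TraceStep:
    step_id: int
    timestamp: str                 # ISO 8601, UTC
    phase: str                     # "REASON", "ACT", or "OBSERVE"
    action: Optional[str]          # tool or specialist invoked, if any
    parameters: Dict[str, Any]     # parameters passed to the action
    tool_output: Optional[str]
    result_code: Optional[int]
    reasoning_summary: str
    token_budget: int              # tokens allowed for the request
    tokens_used: int               # tokens consumed so far
    compressed: bool = False       # whether context was compressed at this step

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable for diffing and audit.
        return json.dumps(asdict(self), sort_keys=True)


def new_step(step_id: int, phase: str, reasoning_summary: str,
             token_budget: int, tokens_used: int,
             action: Optional[str] = None,
             parameters: Optional[Dict[str, Any]] = None,
             tool_output: Optional[str] = None,
             result_code: Optional[int] = None,
             compressed: bool = False) -> TraceStep:
    """Build a trace step stamped with the current UTC time."""
    return TraceStep(
        step_id=step_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        phase=phase,
        action=action,
        parameters=parameters or {},
        tool_output=tool_output,
        result_code=result_code,
        reasoning_summary=reasoning_summary,
        token_budget=token_budget,
        tokens_used=tokens_used,
        compressed=compressed,
    )


# Usage: record one ACT step and restore it from its serialized form.
example_step = new_step(1, "ACT", "look up GPU quota", 4000, 120,
                        action="search", parameters={"q": "gpu"},
                        tool_output="ok", result_code=0)
restored_step = TraceStep(**json.loads(example_step.to_json()))
```

Because each record serializes to a flat JSON object and restores to an equal value, a stored sequence of steps can be replayed in order or filtered by phase during debugging.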

On-Prem Network Zones

The deployment spans four security zones: DMZ (edge/gateway/bastion), Application (K8s cluster and messaging), Data (databases, cache, object storage, vault), and Observability (monitoring stack).
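The zone layout can be captured as a simple lookup table, for example to tag inventory entries or validate deployment manifests. The component identifiers below are hypothetical labels derived from the description above, and zone_of is an illustrative helper, not part of any existing tooling.

```python
from typing import Dict, Set

# Zone membership as described in the text; component names are hypothetical labels.
ZONES: Dict[str, Set[str]] = {
    "DMZ": {"edge", "gateway", "bastion"},
    "Application": {"k8s-cluster", "messaging"},
    "Data": {"databases", "cache", "object-storage", "vault"},
    "Observability": {"monitoring-stack"},
}


def zone_of(component: str) -> str:
    """Return the security zone that contains the given component."""
    for zone, members in ZONES.items():
        if component in members:
            return zone
    raise KeyError(f"unknown component: {component}")


def same_zone(a: str, b: str) -> bool:
    """True when both components sit in the same security zone."""
    return zone_of(a) == zone_of(b)
```

For instance, zone_of("vault") returns "Data", and same_zone("gateway", "vault") is False, which marks that pair as a cross-zone connection.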