Flyte Production Infrastructure Deployment

A production Flyte deployment centers on a unified gRPC/HTTP service, the Flyte Binary, which consolidates multiple backend services and background workers into a single deployment. The Flyte Binary stores metadata in a PostgreSQL database and large data artifacts in an object storage backend (S3, GCS, or Azure).

Key components include:

Flyte Binary: A multi-functional container that serves as the API gateway (handling Runs, Tasks, Projects, etc.) and runs background processes like the Trigger Scheduler, Abort Reconciler, and the Executor. The Executor acts as a Kubernetes controller, managing the lifecycle of TaskAction custom resources and orchestrating worker pods.
Flyte Console: A separate web-based user interface that communicates with the Flyte Binary API.
Worker Pods: Dynamically created by the Executor to run user-defined tasks. These pods utilize Flyte Co-Pilot as both an init container (for downloading metadata) and a sidecar (for monitoring and uploading results).
Ingress Controller: Manages external access to both the console (HTTP) and the binary service (gRPC and HTTP).
External Services: OIDC providers for authentication and optional external Flyte Connectors for specialized task execution.
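
To make the worker pod layout concrete, the following minimal sketch builds a Kubernetes Pod manifest with a downloader init container, the user task container, and an uploader sidecar that share a scratch volume. The container names, image names, volume name, and mount path are hypothetical placeholders and are not taken from Flyte's actual pod templates; only the Kubernetes field names (initContainers, containers, volumes, emptyDir) are standard.

```python
import json

# Hypothetical volume name; Flyte's real pod templates may differ.
SHARED_VOLUME = "flyte-data"


def build_worker_pod(task_name: str, task_image: str, copilot_image: str) -> dict:
    """Return a Pod manifest with a downloader init container and an uploader sidecar."""
    # Hypothetical mount path shared by all three containers.
    mount = {"name": SHARED_VOLUME, "mountPath": "/var/flyte"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{task_name}-worker"},
        "spec": {
            "restartPolicy": "Never",
            "volumes": [{"name": SHARED_VOLUME, "emptyDir": {}}],
            # Init containers run to completion before the main containers start,
            # so inputs are in place when the task begins.
            "initContainers": [
                {"name": "copilot-downloader", "image": copilot_image, "volumeMounts": [mount]}
            ],
            # The sidecar runs next to the task and can read its outputs
            # from the shared volume.
            "containers": [
                {"name": "task", "image": task_image, "volumeMounts": [mount]},
                {"name": "copilot-uploader", "image": copilot_image, "volumeMounts": [mount]},
            ],
        },
    }


if __name__ == "__main__":
    pod = build_worker_pod("demo", "example/task:latest", "example/copilot:latest")
    print(json.dumps(pod, indent=2))
```

The shared emptyDir volume is the design point here: it lets the downloader, the task, and the uploader exchange files without the task container needing any object storage credentials of its own.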

Task execution relies on Co-Pilot for data movement (see Data Movement with Copilot), and resource management within the Kubernetes cluster follows a reconciler pattern (see The Reconciler Architecture).
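
The reconciler idea can be sketched as a single level-triggered pass that compares the observed pod state with the resource's recorded phase and decides the next step. This is an illustrative simulation only: the Phase enum, the TaskAction fields, and the returned step names are assumptions made for the example, not the Executor's actual types. The pod status strings are the standard Kubernetes pod phases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Hypothetical lifecycle phases for a TaskAction resource."""
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskAction:
    """Hypothetical stand-in for a TaskAction custom resource."""
    name: str
    phase: Phase = Phase.QUEUED


def reconcile(action: TaskAction, pod_status: Optional[str]) -> str:
    """Run one reconcile pass and return the step taken.

    pod_status is the observed Kubernetes pod phase, or None if no pod exists.
    """
    if action.phase in (Phase.SUCCEEDED, Phase.FAILED):
        return "noop"  # terminal phases are never revisited
    if pod_status is None:
        action.phase = Phase.QUEUED
        return "create_pod"
    if pod_status == "Succeeded":
        action.phase = Phase.SUCCEEDED
        return "record_success"
    if pod_status == "Failed":
        action.phase = Phase.FAILED
        return "record_failure"
    # Pending, Running, or Unknown: keep waiting for the next event.
    action.phase = Phase.RUNNING
    return "wait"


if __name__ == "__main__":
    action = TaskAction("demo")
    for observed in (None, "Pending", "Running", "Succeeded", "Succeeded"):
        print(observed, "->", reconcile(action, observed), action.phase.name)
```

Because each pass depends only on the current observed state rather than on a history of events, the same function can be re-run safely after a crash or a missed notification.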

Key Architectural Findings:

The 'flyte-binary' is a unified service that bundles API handlers (Runs, Tasks, Projects, DataProxy, Events, Cache, Actions, App, Secret, Auth) and background workers (Abort Reconciler, Trigger Scheduler, Executor, Garbage Collector).
The Executor is a Kubernetes controller that reconciles 'TaskAction' custom resources and manages worker pod lifecycles.
Flyte Co-Pilot is deployed as both an init container (downloader) and a sidecar (uploader) within worker pods to handle metadata and data transfer.
The system uses PostgreSQL for metadata storage and supports multiple object storage providers (S3, GCS, Azure) via the 'stow' library.
Traffic is routed via an Ingress controller, with separate paths for the web console and gRPC/HTTP API services.
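
As a rough illustration of path-based routing at the ingress layer, the sketch below maps request paths to backends using longest-prefix matching on whole path segments. Every path prefix and backend name in the table is a hypothetical example (including the gRPC service name); the only general fact relied on is that gRPC requests use paths of the form /package.Service/Method.

```python
from typing import Optional

# Hypothetical routing table: path prefix -> backend name.
# None of these prefixes are taken from Flyte's actual ingress rules.
ROUTES = {
    "/console": "flyte-console",
    "/api": "flyte-binary",
    "/example.flyte.RunService": "flyte-binary",  # gRPC: /package.Service/Method
}


def prefix_matches(prefix: str, path: str) -> bool:
    """Match on whole path segments, so '/api' matches '/api/v1' but not '/apix'."""
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


def route(path: str) -> Optional[str]:
    """Return the backend for the longest matching prefix, or None if nothing matches."""
    candidates = [p for p in ROUTES if prefix_matches(p, path)]
    if not candidates:
        return None
    return ROUTES[max(candidates, key=len)]


if __name__ == "__main__":
    for p in ("/console/projects", "/api/v1/runs", "/example.flyte.RunService/GetRun", "/apix"):
        print(p, "->", route(p))
```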
