Flyte Internal Component Architecture

The Flyte internal component architecture follows a service-oriented design, often running in a unified process managed by the cache_service.manager. The system is divided into a control plane for workflow management, an execution plane for task reconciliation, and a data plane for handling large datasets and logs.

Key Components

  • cache_service.manager: The central entry point that initializes and orchestrates all internal services, including database migrations, storage connections, and Kubernetes clients.
  • runs.service: The core workflow engine responsible for managing the lifecycle of runs, tasks, and triggers. It persists metadata in the Postgres database (configured through PostgresConfig) and offloads large input data to the DataStore.
  • ActionsService: Acts as a bridge between the Runs Service and Kubernetes. It translates action requests into TaskAction Custom Resources (CRDs) and watches for status updates to report back to the core engine.
  • executor.pkg.controller: A Kubernetes controller that reconciles TaskAction CRDs. It uses Task Plugins & Machinery to execute various task types (e.g., K8s Pods, Spark jobs) and reports execution events.
  • dataproxy: Provides a unified interface for data upload/download and log streaming, abstracting the underlying storage and Kubernetes log APIs.
  • cache_service: A dedicated service for task result caching, allowing the Executor to skip redundant computations by checking for previously successful executions.
  • Task Plugins & Machinery: A library of execution logic used by the Executor to interact with external compute providers and orchestrate task-specific resources.
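
The interaction between the Executor, the task plugins, and the cache service described above can be sketched roughly as follows. This is a minimal illustration, not Flyte's implementation: the `TaskAction` dataclass, `InMemoryCache`, `cache_key`, and `reconcile` are hypothetical names, and deriving the cache key from the task type and inputs alone is an assumption made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class TaskAction:
    """Hypothetical stand-in for a TaskAction custom resource."""
    name: str
    task_type: str
    inputs: dict
    phase: str = "Queued"
    output: Optional[object] = None


def cache_key(action: TaskAction) -> str:
    # Assumption: the key depends only on the task type and its inputs.
    payload = json.dumps(
        {"type": action.task_type, "inputs": action.inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class InMemoryCache:
    """Toy replacement for the cache service."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}

    def lookup(self, key: str) -> Tuple[bool, Optional[object]]:
        # Return (hit, value) so that a cached None is still a hit.
        if key in self._results:
            return True, self._results[key]
        return False, None

    def store(self, key: str, value: object) -> None:
        self._results[key] = value


# A plugin is modelled as a function from inputs to an output.
Plugin = Callable[[dict], object]


def reconcile(action: TaskAction, plugins: Dict[str, Plugin], cache: InMemoryCache) -> str:
    """Handle one TaskAction: reuse a cached result or run the matching plugin."""
    key = cache_key(action)
    hit, value = cache.lookup(key)
    if hit:
        # A previous successful execution exists, so skip the computation.
        action.output, action.phase = value, "Succeeded"
        return "cache_hit"

    plugin = plugins.get(action.task_type)
    if plugin is None:
        action.phase = "Failed"
        return "no_plugin"

    action.output = plugin(action.inputs)
    action.phase = "Succeeded"
    cache.store(key, action.output)  # only successful results are cached
    return "executed"
```

Calling `reconcile` twice with equal inputs runs the plugin once; the second call is served from the cache. A real controller would also handle retries, in-flight phases, and cache invalidation, none of which is modelled here.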

Data Flow

  1. A user initiates a run through the Runs Service.
  2. The Runs Service persists the run metadata and calls the Actions Service to enqueue the root action.
  3. The Actions Service creates a TaskAction CRD in Kubernetes.
  4. The Executor detects the new CRD and uses the Task Plugins & Machinery library (flyteplugins) to launch the actual task (e.g., a Pod).
  5. As the task progresses, the Executor reports events to the Events Service, which proxies them back to the Runs Service to update the workflow state.
  6. The Actions Service also watches the CRD status and provides an independent update path for action-level transitions.
  7. The DataProxy facilitates data movement between the user, the execution environment, and the DataStore.
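
Steps 1 through 6 can be walked through with a small in-memory simulation. This is only a sketch under stated assumptions: every class below (`FakeCluster`, `ActionsService`, `RunsService`, `EventsService`, `Executor`) is a hypothetical stand-in, the Kubernetes API is replaced by a dictionary, a task is a plain Python callable, and the phase names are invented for illustration. The DataProxy (step 7) is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeCluster:
    """Stand-in for the Kubernetes API: TaskAction objects keyed by name."""
    task_actions: Dict[str, dict] = field(default_factory=dict)


class ActionsService:
    def __init__(self, cluster: FakeCluster) -> None:
        self.cluster = cluster

    def enqueue(self, name: str, task: Callable[[], object]) -> None:
        # Step 3: create the TaskAction object in the cluster.
        self.cluster.task_actions[name] = {"task": task, "phase": "Queued"}

    def observed_phase(self, name: str) -> str:
        # Step 6: independent view of the action, read from the object status.
        return self.cluster.task_actions[name]["phase"]


class RunsService:
    def __init__(self, actions: ActionsService) -> None:
        self.actions = actions
        self.history: Dict[str, List[str]] = {}

    def create_run(self, name: str, task: Callable[[], object]) -> None:
        # Steps 1-2: persist run metadata, then enqueue the root action.
        self.history[name] = ["Queued"]
        self.actions.enqueue(name, task)

    def record_event(self, name: str, phase: str) -> None:
        self.history[name].append(phase)


class EventsService:
    def __init__(self, runs: RunsService) -> None:
        self.runs = runs

    def report(self, name: str, phase: str) -> None:
        # Step 5: proxy executor events back to the Runs Service.
        self.runs.record_event(name, phase)


class Executor:
    def __init__(self, cluster: FakeCluster, events: EventsService) -> None:
        self.cluster = cluster
        self.events = events

    def _set_phase(self, name: str, obj: dict, phase: str) -> None:
        obj["phase"] = phase
        self.events.report(name, phase)

    def reconcile_once(self) -> None:
        # Step 4: pick up queued TaskActions and launch their tasks.
        for name, obj in list(self.cluster.task_actions.items()):
            if obj["phase"] != "Queued":
                continue
            self._set_phase(name, obj, "Running")
            try:
                obj["result"] = obj["task"]()
                self._set_phase(name, obj, "Succeeded")
            except Exception:
                self._set_phase(name, obj, "Failed")
```

Wiring it together looks like this:

```python
cluster = FakeCluster()
actions = ActionsService(cluster)
runs = RunsService(actions)
executor = Executor(cluster, EventsService(runs))

runs.create_run("run-1", lambda: 42)
executor.reconcile_once()
print(runs.history["run-1"], actions.observed_phase("run-1"))
```

The Executor only knows about the cluster and the events sink, and the Runs Service only knows about the Actions Service, which mirrors the decoupling described in the steps above.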

Key Architectural Findings

  • The 'manager' component serves as a unified orchestrator that wires together all services, including database and storage initialization.
  • 'runs.service' manages high-level workflow state and uses 'actions.service' as a gRPC-to-K8s bridge.
  • 'executor.pkg.controller' is a standard K8s controller that implements the 'TaskAction' reconciliation loop.
  • 'flyteplugins' provides the actual execution logic for different task types, decoupled from the controller machinery.
  • 'dataproxy' and 'cache_service' provide specialized data and performance optimizations, interacting with both the control and execution planes.
  • The architecture uses an 'Events Service' proxy to decouple the Executor's event reporting from the Runs Service's internal update logic.
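
To make the first finding more concrete, the following sketch shows one way a single-process manager could wire up dependencies and services in order. It is an illustration only: the `Manager` class, its `register` and `start_all` methods, and the step names are all hypothetical, and rolling back already-started steps on failure is an assumption rather than documented behavior.

```python
from typing import Callable, List, Optional, Tuple

Hook = Callable[[], None]


class Manager:
    """Hypothetical single-process orchestrator that starts steps in order."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Hook, Optional[Hook]]] = []
        self.started: List[str] = []

    def register(self, name: str, start: Hook, stop: Optional[Hook] = None) -> None:
        # Steps run in registration order, so dependencies are registered first.
        self._steps.append((name, start, stop))

    def start_all(self) -> None:
        stops: List[Hook] = []
        try:
            for name, start, stop in self._steps:
                start()
                self.started.append(name)
                if stop is not None:
                    stops.append(stop)
        except Exception:
            # Assumption: on failure, undo the started steps in reverse order.
            for stop in reversed(stops):
                stop()
            raise


def noop() -> None:
    pass


manager = Manager()
# Infrastructure first, then the services that depend on it (names illustrative).
for step in ["db-migrations", "storage", "k8s-client", "runs", "actions", "executor"]:
    manager.register(step, noop)
manager.start_all()
```

Registering infrastructure before services keeps the ordering explicit in one place, which is the main benefit of a unified entry point over each service initializing its own dependencies.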