Flyte

Reliably orchestrate ML pipelines, models, and agents at scale — in pure Python.

Overview

Flyte is a cloud-native, open-source orchestrator designed to manage complex machine learning and data processing workflows. Built on Kubernetes, it provides a robust platform for executing distributed tasks with strong guarantees on reliability, scalability, and reproducibility. Flyte allows engineers and data scientists to define their workflows in pure Python while the platform handles the heavy lifting of resource management, dependency tracking, and execution state.

Flyte tracks every execution as a first-class object, which gives visibility into the lifecycle of each task. Whether you are training a large-scale model, running a data transformation pipeline, or serving a real-time agent, Flyte is designed to keep your applications resilient to failures and easy to observe. Its extensible architecture supports a wide range of backends, from Kubernetes-native jobs to external web APIs and cloud services.

Key Concepts

  • Task Plugins & Machinery: The extensibility layer that allows Flyte to execute diverse workloads, including Kubernetes Pods, AWS services, and custom Web APIs.
  • Task Execution Engine: The high-performance reconciler that manages the state transitions of tasks, ensuring they move reliably from submission to completion.
  • Data Proxy & Storage: A unified abstraction for data movement that handles input/output persistence and provides signed URLs for secure, performant data access.
  • Result Caching & Memoization: A caching system that identifies redundant executions based on input signatures and reuses previous results to save time and resources.
  • Security & Secret Management: A secure framework for injecting sensitive credentials into execution environments using providers like HashiCorp Vault, AWS Secrets Manager, and GCP Secret Manager.
  • Scheduling & Automation: The system for triggering workflows based on time-based schedules (cron) or external events, enabling fully automated data pipelines.
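
To make the idea of caching on input signatures more concrete, here is a minimal, standard-library-only sketch. It is not Flyte's implementation or API: the names `input_signature`, `run_cached`, and the in-memory `_CACHE` dictionary are hypothetical, and the example assumes that task inputs are JSON-serializable. It only illustrates the general technique of hashing a task's identity, a version string, and its inputs into a key, then reusing a stored result when that key has been seen before.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory store for illustration only; a real system
# would persist results in durable storage.
_CACHE: Dict[str, Any] = {}


def input_signature(task_name: str, cache_version: str, inputs: Dict[str, Any]) -> str:
    """Derive a deterministic key from the task name, a version, and its inputs."""
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,  # key order must not change the signature
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(fn: Callable[..., Any], cache_version: str, **inputs: Any) -> Any:
    """Run fn(**inputs), reusing a stored result when the signature matches."""
    key = input_signature(fn.__name__, cache_version, inputs)
    if key in _CACHE:
        return _CACHE[key]  # cache hit: skip the redundant execution
    result = fn(**inputs)
    _CACHE[key] = result
    return result


def square(x: int) -> int:
    return x * x


first = run_cached(square, "v1", x=3)   # executes square
second = run_cached(square, "v1", x=3)  # served from the cache
third = run_cached(square, "v2", x=3)   # new version string, so it executes again
```

Including a version string in the signature is one common way to invalidate old results when the task's logic changes while its inputs stay the same.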

Common Use Cases

  • ML Training Pipelines: Orchestrate complex multi-step training workflows with automatic retries, resource isolation, and result tracking.
  • Model Serving & Agents: Deploy and serve models using integrations like FastAPI, managing the lifecycle of the serving infrastructure alongside the training code.
  • Distributed Data Processing: Run large-scale data transformations using Kubernetes-native plugins for Spark, Ray, Dask, and MPI.
  • Automated ETL: Schedule recurring data ingestion and processing tasks with built-in monitoring and alerting.
  • Resource-Intensive Simulations: Execute high-performance computing tasks that require specific hardware accelerators (GPUs/TPUs) and complex environment configurations.
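
The automatic retries mentioned under ML Training Pipelines can be pictured with a small sketch. This is a generic retry loop written with the standard library only, not Flyte's retry mechanism: the function name `run_with_retries` and its `retries` and `base_delay` parameters are hypothetical, and the exponential backoff is an assumption chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(fn: Callable[[], T], retries: int = 3, base_delay: float = 0.0) -> T:
    """Call fn, retrying up to `retries` additional times if it raises.

    The delay before retry number k (0-based) is base_delay * 2**k seconds.
    The last exception is re-raised once the retry budget is exhausted.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage: a hypothetical step that fails twice before succeeding.
attempts = {"count": 0}


def flaky_step() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


outcome = run_with_retries(flaky_step, retries=3)
```

An orchestrator typically applies this kind of policy per task, so a transient failure in one step does not force the whole pipeline to restart.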

Getting Started

To get started with Flyte, follow the Getting Started guide to set up your local environment. When you are ready to deploy your first application, Tutorial: Deploying Your First App provides a step-by-step walkthrough. For details on the platform's internals, see the Architecture: Control and Data Planes section.