Skip to main content

TaskAction Resource Model

The TaskAction resource is the primary unit of work in the Flyte executor runtime. Implemented as a Kubernetes Custom Resource Definition (CRD), it serves as the bridge between the Flyte control plane (which defines what needs to be done) and the executor controller (which manages the actual execution via plugins).

When a workflow reaches a task node, the Flyte Actions service creates a TaskAction resource. The executor's controller then reconciles this resource, invoking specialized plugins (like Pod, Spark, or Ray) to carry out the work.

Defining the Desired State

The TaskActionSpec class in executor/api/v1/taskaction_types.go defines the configuration required to execute a task. It encapsulates the task's identity, its data dependencies, and the logic to be executed.

Task Identity and Context

Every TaskAction is anchored to a specific project, domain, and run. These fields are mandatory and are used for resource isolation and tracking:

  • Project / Domain / RunName: Identify the execution context.
  • ActionName: The unique identifier for this specific task within the run.
  • TaskType: Determines which plugin handles the task (e.g., "container", "spark", "ray").

Execution Logic

The core logic of the task is stored in the TaskTemplate field as a proto-serialized core.TaskTemplate. This allows the executor to remain agnostic of the specific task details until reconciliation time, where the plugin deserializes it to configure the underlying Kubernetes resources (like a Pod or a CRD for Spark).

Data Flow

Flyte uses URIs to manage data movement between tasks:

  • InputURI: The location where the task's input data is stored.
  • RunOutputBase: The base path where the task should write its results.

Example Resource Creation

In actions/k8s/client.go, the ActionsClient instantiates a TaskAction with labels that facilitate querying and lifecycle management:

taskAction := &executorv1.TaskAction{
ObjectMeta: metav1.ObjectMeta{
Name: buildTaskActionName(actionID),
Namespace: flyteNamespace,
Labels: map[string]string{
"flyte.org/project": actionID.Run.Project,
"flyte.org/domain": actionID.Run.Domain,
"flyte.org/run": actionID.Run.Name,
"flyte.org/action": actionID.Name,
"flyte.org/action-type": "task",
},
},
Spec: executorv1.TaskActionSpec{
RunName: actionID.Run.Name,
Project: actionID.Run.Project,
Domain: actionID.Run.Domain,
ActionName: actionID.Name,
TaskType: "container",
InputURI: inputURI,
RunOutputBase: outputBase,
TaskTemplate: serializedTemplate,
},
}

Tracking Execution Progress

The TaskActionStatus class tracks the observed state of the task. It uses a combination of high-level phases, Kubernetes conditions, and detailed plugin state.

Phases and History

The PluginPhase field provides a human-readable string representing the current stage of execution (e.g., Queued, Initializing, Executing, Succeeded, Failed).

To provide a full audit trail, Flyte maintains a PhaseHistory in the status. This is an append-only log of PhaseTransition objects, each recording the phase name and the timestamp of the transition:

// From executor/api/v1/taskaction_types.go
type PhaseTransition struct {
Phase string `json:"phase"`
OccurredAt metav1.Time `json:"occurredAt"`
Message string `json:"message,omitempty"`
}

Kubernetes Conditions

Following Kubernetes API conventions, TaskActionStatus includes a Conditions array. Standard types include:

  • Available: The resource is fully functional.
  • Progressing: The task is currently being created or updated.
  • Degraded: The task failed to reach its desired state.

Plugin State Persistence

Because reconciliation is an iterative process, plugins often need to persist internal state between rounds (e.g., a job ID or a cursor). Flyte provides two fields for this:

  • PluginState: A Gob-encoded byte array for plugin-specific data.
  • StateJSON: A JSON-serialized NodeStatus sent to the State Service.

Error Handling and Reliability

Flyte distinguishes between user-level task failures and infrastructure-level system failures to ensure robust execution.

Structured Errors

When a task fails, the plugin populates the ErrorState field. This captures the specific error code (like OOMKilled) and the error kind (USER vs SYSTEM), allowing Flyte to report granular failure reasons back to the user.

// From executor/api/v1/taskaction_types.go
type ErrorState struct {
Code string `json:"code,omitempty"`
Kind string `json:"kind,omitempty"`
Message string `json:"message,omitempty"`
}

System Failure Tracking

The SystemFailures counter tracks transient infrastructure issues, such as Kubernetes API timeouts or admission webhook denials. If this count exceeds a configured threshold (typically 3), the TaskAction is transitioned to a permanent failure state. This prevents a single broken node or transient network issue from causing infinite retry loops.

Lifecycle and Cleanup

The lifecycle of a TaskAction is managed through several Kubernetes mechanisms:

  1. Naming Convention: Root actions are named <run-id>-a0, while child actions follow the pattern <run-id>-<action-id>.
  2. Ownership: Child TaskAction resources often have an OwnerReference pointing to their parent. This ensures that when a parent task is deleted, Kubernetes automatically cleans up all associated child actions.
  3. Finalizers: The controller attaches the flyte.org/plugin-finalizer to the resource. This ensures that even if a TaskAction is deleted, the controller has an opportunity to invoke the plugin's Abort or Finalize methods to clean up external resources (like cloud-provider Spark clusters) before the CRD is removed from etcd.
  4. Termination Labels: Upon completion, the controller stamps the resource with the flyte.org/termination-status label, which can be used by garbage collection processes to identify resources ready for removal.