Configuring Storage Backends

Flyte uses a unified storage interface to interact with different backends like AWS S3, Google Cloud Storage, Azure Blob Storage, Redis, and local filesystems. You configure these backends using the Config struct in flytestdlib/storage.

Configuring S3 or Minio

To set up S3 or a Minio-compatible backend, use the StowConfig within the main Config. While Flyte supports a legacy ConnectionConfig, using StowConfig is the recommended approach as it provides more flexibility.

The following example shows how to configure a Minio backend in YAML:

storage:
  type: minio
  container: "flyte-data"
  enable-multicontainer: false
  stow:
    kind: "s3"
    config:
      auth_type: "accesskey"
      access_key_id: "minio-access-key"
      secret_key: "minio-secret-key"
      region: "us-east-1"
      endpoint: "http://localhost:9000"
      disable_ssl: "true"

In Go, you can initialize the DataStore programmatically:

import (
	"github.com/flyteorg/flyte/v2/flytestdlib/storage"
	"github.com/flyteorg/flyte/v2/flytestdlib/promutils"
)

cfg := &storage.Config{
	Type:          storage.TypeMinio,
	InitContainer: "flyte-data",
	Stow: storage.StowConfig{
		Kind: "s3",
		Config: map[string]string{
			"auth_type":     "accesskey",
			"access_key_id": "minio-access-key",
			"secret_key":    "minio-secret-key",
			"region":        "us-east-1",
			"endpoint":      "http://localhost:9000",
			"disable_ssl":   "true",
		},
	},
}

dataStore, err := storage.NewDataStore(cfg, promutils.NewTestScope())

Configuring Local Filesystem

For local development, you can use the local storage type. This maps a local directory to a Flyte container.

cfg := &storage.Config{
	Type:          storage.TypeLocal,
	InitContainer: "testdata",
	Stow: storage.StowConfig{
		Kind: "local",
		Config: map[string]string{
			"path": "./tmp/storage",
		},
	},
}

dataStore, err := storage.NewDataStore(cfg, promutils.NewTestScope())

When using local storage, a DataReference like file://testdata/config.yaml will resolve to ./tmp/storage/config.yaml.

Configuring Redis and Scheme Routing

Flyte supports Redis for high-performance metadata storage. If you configure RedisConfig alongside a blob store (like S3), Flyte automatically uses a schemeRoutingStore. This allows you to use redis:// URIs for metadata while keeping large artifacts in s3://.

storage:
  type: s3
  container: "flyte-data"
  stow:
    kind: "s3"
    config:
      region: "us-east-1"
  redis:
    addr: "localhost:6379"
    db: 0

When this configuration is detected in RefreshConfig, Flyte wraps the default store and the Redis store:

// flytestdlib/storage/rawstores.go
if cfg.Type != TypeRedis && len(cfg.Redis.Addr) > 0 {
	redisStore, err := NewRedisRawStore(ctx, cfg, ds.metrics)
	if err != nil {
		return err
	}

	rawStore = newSchemeRoutingStore(rawStore, redisStore, ds.metrics.copyMetrics)
}

The schemeRoutingStore dispatches operations based on the URI scheme:

  • redis://<addr>/<key> goes to the Redis backend.
  • All other schemes (e.g., s3://, file://) go to the default backend.
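The dispatch rule above can be illustrated with a minimal standalone sketch. It is not Flyte's schemeRoutingStore; the function backendFor and the labels "redis" and "default" are hypothetical names chosen for illustration, and the only behavior modeled is the scheme check described in the list.

```go
package main

import (
	"fmt"
	"net/url"
)

// backendFor reports which backend a reference would be routed to under the
// rule described above. The returned labels are illustrative, not Flyte names.
func backendFor(ref string) (string, error) {
	u, err := url.Parse(ref)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", ref, err)
	}
	if u.Scheme == "redis" {
		return "redis", nil
	}
	// Every other scheme (s3://, file://, ...) falls through to the default store.
	return "default", nil
}

func main() {
	refs := []string{
		"redis://localhost:6379/metadata/key",
		"s3://flyte-data/outputs/data.pb",
		"file://testdata/config.yaml",
	}
	for _, ref := range refs {
		backend, err := backendFor(ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s backend\n", ref, backend)
	}
}
```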

Global Storage Settings

The Config struct provides several parameters that control how Flyte interacts with storage regardless of the backend.

Multi-Container Support

By default, Flyte only allows access to the bucket defined in InitContainer. To allow access to any bucket specified in a DataReference, set MultiContainerEnabled to true.

cfg := &storage.Config{
	Type:                  storage.TypeS3,
	InitContainer:         "default-bucket",
	MultiContainerEnabled: true, // Allows s3://other-bucket/data.pb
}
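To make the access rule concrete, here is a small sketch of the check it implies. The helper containerAllowed is hypothetical and is not part of flytestdlib; it assumes that the bucket is the host portion of an s3://bucket/key reference.

```go
package main

import (
	"fmt"
	"net/url"
)

// containerAllowed is a hypothetical helper that mirrors the rule above:
// with multi-container support disabled, only the initial container is accepted.
func containerAllowed(ref, initContainer string, multiContainerEnabled bool) (bool, error) {
	u, err := url.Parse(ref)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", ref, err)
	}
	if multiContainerEnabled {
		return true, nil
	}
	// Assumption: for s3://bucket/key references the bucket is the URL host.
	return u.Host == initContainer, nil
}

func main() {
	ref := "s3://other-bucket/data.pb"
	for _, multi := range []bool{false, true} {
		ok, err := containerAllowed(ref, "default-bucket", multi)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("multi-container=%v allowed=%v\n", multi, ok)
	}
}
```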

Download Limits

To prevent excessive memory usage, you can set limits on download sizes using LimitsConfig.

cfg := &storage.Config{
	Limits: storage.LimitsConfig{
		MaxDownloadMBs: 10, // Rejects downloads larger than 10MB
	},
}
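The following sketch shows one way such a limit could be evaluated against an object's reported size. The helpers limitInBytes and checkDownloadSize are hypothetical, and the conversion of 1 MB to 1024 * 1024 bytes is an assumption; the actual check inside Flyte may differ.

```go
package main

import (
	"errors"
	"fmt"
)

var errDownloadTooLarge = errors.New("download exceeds the configured limit")

// limitInBytes converts a limit in megabytes to bytes.
// Assumption: one MB is treated as 1024 * 1024 bytes.
func limitInBytes(maxDownloadMBs int64) int64 {
	return maxDownloadMBs * 1024 * 1024
}

// checkDownloadSize is a hypothetical pre-download guard: it rejects objects
// whose reported size is larger than the configured limit.
func checkDownloadSize(sizeBytes, maxDownloadMBs int64) error {
	if sizeBytes > limitInBytes(maxDownloadMBs) {
		return errDownloadTooLarge
	}
	return nil
}

func main() {
	const maxDownloadMBs = 10
	for _, size := range []int64{4 * 1024 * 1024, 32 * 1024 * 1024} {
		if err := checkDownloadSize(size, maxDownloadMBs); err != nil {
			fmt.Printf("%d bytes: rejected (%v)\n", size, err)
			continue
		}
		fmt.Printf("%d bytes: accepted\n", size)
	}
}
```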

Troubleshooting

Missing InitContainer

The InitContainer field is mandatory even if MultiContainerEnabled is set to true. If it is missing, NewDataStore returns the following error: "initContainer is required even with 'enable-multicontainer'".
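If you assemble the configuration programmatically, a pre-flight check can surface this problem before the store is created. The sketch below is only illustrative: storageSettings is a hypothetical stand-in for the two relevant Config fields, and validate is not a Flyte function.

```go
package main

import (
	"errors"
	"fmt"
)

// storageSettings is a hypothetical stand-in for the two storage.Config
// fields discussed above; it is not the real Flyte type.
type storageSettings struct {
	InitContainer         string
	MultiContainerEnabled bool
}

// validate rejects settings without an initial container, regardless of
// whether multi-container support is enabled.
func validate(s storageSettings) error {
	if s.InitContainer == "" {
		return errors.New("InitContainer must be set, even when MultiContainerEnabled is true")
	}
	return nil
}

func main() {
	candidates := []storageSettings{
		{MultiContainerEnabled: true},
		{InitContainer: "default-bucket", MultiContainerEnabled: true},
	}
	for _, c := range candidates {
		if err := validate(c); err != nil {
			fmt.Printf("%+v: invalid: %v\n", c, err)
			continue
		}
		fmt.Printf("%+v: ok\n", c)
	}
}
```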

Redis Value Limits

Redis storage has a hard limit of 512MiB per value. Additionally, the Redis backend does not support signed URLs. If your application requires signed URLs for data access, ensure those references use a blob store scheme (like s3://) rather than redis://.
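Both constraints can be checked on the caller's side before choosing a scheme. This is a minimal sketch under the limits stated above; checkRedisValueSize and canSignURL are hypothetical helpers, not functions from flytestdlib.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxRedisValueBytes is the 512MiB per-value limit mentioned above.
const maxRedisValueBytes = 512 * 1024 * 1024

var errValueTooLarge = errors.New("value exceeds the 512MiB Redis limit")

// checkRedisValueSize reports an error for payloads that cannot fit in one Redis value.
func checkRedisValueSize(sizeBytes int64) error {
	if sizeBytes > maxRedisValueBytes {
		return errValueTooLarge
	}
	return nil
}

// canSignURL reports whether a reference uses a scheme for which a signed URL
// could be requested; redis:// references are excluded.
func canSignURL(ref string) bool {
	return !strings.HasPrefix(ref, "redis://")
}

func main() {
	fmt.Println("1KiB value:", checkRedisValueSize(1024))
	fmt.Println("1GiB value:", checkRedisValueSize(1024*1024*1024))
	for _, ref := range []string{"redis://localhost:6379/key", "s3://flyte-data/data.pb"} {
		fmt.Printf("%s signable=%v\n", ref, canSignURL(ref))
	}
}
```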

Local Storage Paths

When using the local backend, ensure that the directory set as path in the StowConfig exists before the store is created. Each DataReference must include the InitContainer name as the first segment after the scheme (e.g., file://my-container/path/to/file).
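Both requirements can be handled with a few lines of standard-library code. The sketch below is an illustration only: ensureDir and splitLocalRef are hypothetical helpers, and the example creates its directory under a temporary location rather than ./tmp/storage.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"path/filepath"
	"strings"
)

// ensureDir creates the storage directory (and any parents) if it is missing.
func ensureDir(path string) error {
	return os.MkdirAll(path, 0o755)
}

// splitLocalRef splits a file://<container>/<key> reference into its
// container name and key, and rejects references without a container.
func splitLocalRef(ref string) (string, string, error) {
	u, err := url.Parse(ref)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", ref, err)
	}
	if u.Scheme != "file" || u.Host == "" {
		return "", "", fmt.Errorf("%q is not of the form file://<container>/<key>", ref)
	}
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	// A temporary root keeps the example from touching the working directory.
	root, err := os.MkdirTemp("", "flyte-storage-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(root)

	storagePath := filepath.Join(root, "storage")
	if err := ensureDir(storagePath); err != nil {
		fmt.Println("error:", err)
		return
	}

	container, key, err := splitLocalRef("file://my-container/path/to/file")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("container=%s key=%s\n", container, key)
}
```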