Platform Observability & Utilities

Flyte's flytestdlib provides standard library components for configuration, logging, metrics, and system health. The utilities are context-aware and support dynamic updates, so distributed components such as the executor, manager, and plugins expose observability data in the same way.

Hierarchical Configuration Management

The configuration system in Flyte is built around the config.Section interface, which allows for a tree-like organization of configuration settings. This design enables different components to register their own typed configuration structs while sharing a common loading and management mechanism.

Registering Configuration Sections

Components register their configuration during initialization using MustRegisterSection. This function requires a unique key and a pointer to a struct that holds the configuration values. A pointer is required because the configuration manager unmarshals values directly into the struct.

// Example of registering a configuration section
type MyComponentConfig struct {
	Enabled bool `json:"enabled"`
	Port    int  `json:"port"`
}

// Default values; the configuration manager overwrites them on load.
var myConfig = &MyComponentConfig{
	Enabled: true,
	Port:    8080,
}

func init() {
	config.MustRegisterSection("my_component", myConfig)
}

The config.Section interface (defined in flytestdlib/config/section.go) also supports change notifications. By using MustRegisterSectionWithUpdates, you can provide a callback that is invoked whenever the configuration is reloaded and changed.
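
The change-notification flow can be pictured with a small standard-library model. This is only a sketch: the registry type and its Register and Reload methods are hypothetical names made up for illustration, not flytestdlib's implementation, and the exact callback signature of MustRegisterSectionWithUpdates should be checked in flytestdlib/config. The model assumes the callback fires only when a reload actually changes the decoded value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// MyComponentConfig mirrors the struct registered earlier in this section.
type MyComponentConfig struct {
	Enabled bool `json:"enabled"`
	Port    int  `json:"port"`
}

// section pairs a config pointer with an optional change callback.
// Hypothetical type, for illustration only.
type section struct {
	value     *MyComponentConfig
	onUpdated func(newValue *MyComponentConfig)
}

// registry is a hypothetical stand-in for the configuration manager.
type registry struct {
	sections map[string]*section
}

func newRegistry() *registry {
	return &registry{sections: map[string]*section{}}
}

// Register stores the section and panics on a duplicate key, which is the
// behavior a "Must" prefix conventionally signals in Go.
func (r *registry) Register(key string, cfg *MyComponentConfig, cb func(*MyComponentConfig)) {
	if _, exists := r.sections[key]; exists {
		panic("section already registered: " + key)
	}
	r.sections[key] = &section{value: cfg, onUpdated: cb}
}

// Reload decodes raw JSON over a copy of the section and fires the callback
// only when the decoded value differs from the current one.
func (r *registry) Reload(key string, raw []byte) error {
	s, ok := r.sections[key]
	if !ok {
		return fmt.Errorf("unknown section %q", key)
	}
	next := *s.value // copy, so a failed decode leaves the live value intact
	if err := json.Unmarshal(raw, &next); err != nil {
		return err
	}
	if reflect.DeepEqual(next, *s.value) {
		return nil
	}
	*s.value = next
	if s.onUpdated != nil {
		s.onUpdated(s.value)
	}
	return nil
}

func main() {
	cfg := &MyComponentConfig{Enabled: true, Port: 8080}
	r := newRegistry()
	r.Register("my_component", cfg, func(c *MyComponentConfig) {
		fmt.Printf("config changed: port=%d\n", c.Port)
	})
	_ = r.Reload("my_component", []byte(`{"port": 9090}`)) // callback runs
	_ = r.Reload("my_component", []byte(`{"port": 9090}`)) // unchanged, no callback
}
```

Because the registry writes through the registered pointer, code that holds myConfig sees the new values without re-reading anything, which is the reason registration takes a pointer.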

PFlag Integration

To support command-line overrides, configuration structs can implement the PFlagProvider interface. This allows Flyte to automatically map command-line flags to configuration fields using the spf13/pflag library.

// flytestdlib/config/section.go
type PFlagProvider interface {
	GetPFlagSet(prefix string) *pflag.FlagSet
}
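
To show the shape of such a provider without pulling in spf13/pflag, the sketch below uses the standard library's flag package as a stand-in. GetFlagSet is a hypothetical analogue of GetPFlagSet, and binding flags directly to struct fields is an assumption made for brevity; Flyte's actual providers may wire values differently.

```go
package main

import (
	"flag"
	"fmt"
)

// MyComponentConfig mirrors the struct registered earlier in this section.
type MyComponentConfig struct {
	Enabled bool
	Port    int
}

// GetFlagSet is a hypothetical standard-library analogue of GetPFlagSet. Each
// flag name is the prefix plus the field name, and each default is the
// struct's current value.
func (c *MyComponentConfig) GetFlagSet(prefix string) *flag.FlagSet {
	fs := flag.NewFlagSet("MyComponentConfig", flag.ContinueOnError)
	fs.BoolVar(&c.Enabled, prefix+"enabled", c.Enabled, "enable the component")
	fs.IntVar(&c.Port, prefix+"port", c.Port, "port to listen on")
	return fs
}

func main() {
	cfg := &MyComponentConfig{Enabled: true, Port: 8080}
	fs := cfg.GetFlagSet("my_component.")
	if err := fs.Parse([]string{"-my_component.port=9090"}); err != nil {
		panic(err)
	}
	fmt.Println(cfg.Enabled, cfg.Port) // true 9090
}
```

The prefix keeps flag names unique when several sections expose a field with the same name, such as port.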

Context-Aware Logging

Flyte uses a structured logging wrapper around logrus located in the logger package. A key feature of this implementation is its context-awareness: it automatically extracts fields from the Go context.Context to include in log entries, facilitating distributed tracing and request tracking.

Structured Logging with Context

The contextutils.GetLogFields(ctx) function retrieves fields previously injected into the context, typically with the other contextutils helpers. When you call logging functions like logger.Infof(ctx, ...), these fields are automatically appended to the structured log output.

// flytestdlib/logger/logger.go
func getLogger(ctx context.Context) logrus.FieldLogger {
	cfg := GetConfig()
	// ...
	entry := logrus.WithFields(logrus.Fields(contextutils.GetLogFields(ctx)))
	if cfg.IncludeSourceCode {
		entry = entry.WithField(sourceCodeKey, getSourceLocation())
	}
	return entry
}

Dynamic Configuration

The logger's behavior is controlled by a registered configuration section. Updating the logger config section at runtime (e.g., via a config file reload) automatically triggers onConfigUpdated, which reconfigures the global logrus instance's level and formatter (supporting json, text, and gcp formats).

Namespaced Metrics

Metrics in Flyte are managed through promutils.Scope, which provides a namespaced registry for Prometheus metrics. This prevents naming collisions between different components and simplifies metric organization.

Metric Scopes

A Scope (defined in flytestdlib/promutils/scope.go) acts as a prefix for all metrics created through it. Scopes can be nested using NewSubScope, creating a hierarchy that reflects the system's architecture.

// Creating a namespaced scope in executor/setup.go
storageScope := promutils.NewScope("executor").NewSubScope("storage")
dataStore, err := storage.NewDataStore(storageConfig, storageScope)

Metric names are automatically sanitized by SanitizeMetricName, which replaces characters like - with _ to comply with Prometheus naming conventions.
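
A rough model of scope prefixing and name sanitization follows. It is a sketch under stated assumptions: the ":" delimiter between scope levels, the MetricName helper, and the exact set of characters that sanitize preserves are choices made for this example and may not match promutils.

```go
package main

import (
	"fmt"
	"strings"
)

// delimiter separates scope levels. Assumed value, chosen because ':' is a
// legal character in Prometheus metric names.
const delimiter = ":"

// sanitize replaces every character outside [a-zA-Z0-9_:] with '_'.
func sanitize(name string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_', r == ':':
			return r
		}
		return '_'
	}, name)
}

// Scope is a hypothetical stand-in that only tracks the accumulated prefix.
type Scope struct {
	prefix string
}

// NewScope creates a root scope.
func NewScope(name string) Scope {
	return Scope{prefix: sanitize(name) + delimiter}
}

// NewSubScope nests a new level under the receiver.
func (s Scope) NewSubScope(name string) Scope {
	return Scope{prefix: s.prefix + sanitize(name) + delimiter}
}

// MetricName returns the fully qualified name a metric would be registered under.
func (s Scope) MetricName(name string) string {
	return s.prefix + sanitize(name)
}

func main() {
	storage := NewScope("executor").NewSubScope("data-store")
	fmt.Println(storage.MetricName("read-failures"))
	// executor:data_store:read_failures
}
```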

Timing with StopWatch

For measuring durations, Flyte provides the StopWatch utility. Unlike raw Prometheus timers, a StopWatch scales each measured duration to a unit chosen at creation time and passed as a time.Duration (for example, time.Millisecond or time.Second) before recording it.

// Using a StopWatch to time an operation
stopWatch := scope.MustNewStopWatch("request_duration", "Duration of requests", time.Millisecond)
timer := stopWatch.Start()
// ... perform operation ...
timer.Stop()

Labeled Metrics

The promutils/labeled package provides metrics that automatically pull label values from the context. For example, a labeled.Counter will look for specific keys in the context and use their values as Prometheus labels when Inc(ctx) is called.

// flytestdlib/promutils/labeled/counter.go
func (c Counter) Inc(ctx context.Context) {
	counter, err := c.GetMetricWith(contextutils.Values(ctx, c.labels...))
	if err != nil {
		panic(err.Error())
	}
	counter.Inc()
}

Observability Server

The profutils package provides a standardized way to expose system health and observability data via an HTTP server. This server is typically started alongside the main application logic to provide standard endpoints for monitoring tools.

The StartProfilingServerWithDefaultHandlers function (in flytestdlib/profutils/server.go) sets up the following endpoints:

  • /metrics: Exposes the Prometheus metrics registry.
  • /healthcheck: Returns a simple 200 OK for L7 load balancer health checks.
  • /version: Returns a JSON object containing build information (version, build hash, and timestamp).
  • /config: Dumps the current state of all registered configuration sections as JSON.

// Starting the observability server
err := profutils.StartProfilingServerWithDefaultHandlers(ctx, 1024, nil)

This unified approach ensures that every Flyte component provides a consistent interface for operators to inspect its health, configuration, and performance.