GetLogs
Returns the logs for a Kubeflow job by initializing a log plugin and retrieving logs for master, worker, parameter server, chief, and evaluator replicas based on the task type and replica counts.
def GetLogs(
pluginContext: k8s.PluginContext,
taskType: string,
objectMeta: meta_v1.ObjectMeta,
taskTemplate: *core.TaskTemplate,
hasMaster: bool,
workersCount: int32,
psReplicasCount: int32,
chiefReplicasCount: int32,
evaluatorReplicasCount: int32,
primaryContainerName: string
) -> []*core.TaskLog, error
Logs are collected replica by replica and returned as a single slice. The task type and the replica counts together decide which replica kinds (master, worker, parameter server, chief, evaluator) are queried.
Parameters
| Name | Type | Description |
|---|---|---|
| pluginContext | k8s.PluginContext | The context for the Kubernetes plugin, providing access to task execution metadata. |
| taskType | string | The type of the task (e.g., PytorchTaskType, MPITaskType), which determines which specific logs to retrieve. |
| objectMeta | meta_v1.ObjectMeta | Kubernetes object metadata, containing the name and namespace of the Kubeflow job to identify the pods. |
| taskTemplate | *core.TaskTemplate | The task template associated with the job, used for initializing the log plugin. |
| hasMaster | bool | Whether the job has a master replica; relevant for PyTorch tasks. |
| workersCount | int32 | The number of worker replicas in the job, used to iterate and retrieve logs for each worker. |
| psReplicasCount | int32 | The number of parameter server replicas in the job, used to retrieve logs for each parameter server. |
| chiefReplicasCount | int32 | The number of chief replicas in the job (typically 0 or 1), used to retrieve logs for the chief replica. |
| evaluatorReplicasCount | int32 | The number of evaluator replicas in the job (typically 0 or 1), used to retrieve logs for the evaluator replica. |
| primaryContainerName | string | The name of the primary container within the pods from which to fetch logs. |
Returns
| Type | Description |
|---|---|
| []*core.TaskLog, error | A slice of TaskLog objects containing the aggregated logs, or an error if log retrieval fails for any component. |