nccl

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Oct 27, 2024 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package nccl monitors NCCL status. The component is optional and is enabled if the host has NVIDIA GPUs.

Constants

const (
	// Repeated occurrences of this message may indicate GPU communication issues,
	// which can be caused by fabric manager problems. Example dmesg line:
	// [Thu Oct 10 03:06:53 2024] pt_main_thread[2536443]: segfault at 7f797fe00000 ip 00007f7c7ac69996 sp 00007f7c12fd7c30 error 4 in libnccl.so.2[7f7c7ac00000+d3d3000]
	EventNameNCCLSegfaultInLibncclFromDmesg = "nccl_segfault_in_libnccl_from_dmesg"

	EventKeyNCCLSegfaultInLibncclFromDmesgUnixSeconds = "unix_seconds"
	EventKeyNCCLSegfaultInLibncclFromDmesgLogLine     = "log_line"
)

Variables

This section is empty.

Functions

func New

Types

type Config

type Config struct {
	Query query_config.Config `json:"query"`
}

func ParseConfig

func ParseConfig(b any, db *sql.DB) (*Config, error)

func (Config) Validate

func (cfg Config) Validate() error
