nccl

package
v0.4.3
Warning

This package is not in the latest version of its module.

Published: Feb 12, 2025 | License: Apache-2.0 | Imports: 12 | Imported by: 0

Documentation

Overview

Package nccl monitors NCCL status. The component is optional and is enabled if the host has NVIDIA GPUs.

Constants

const (
	// EventNameNCCLSegfaultInLibncclFromDmesg is the name of the event for a
	// segfault in libnccl reported in dmesg.
	// Repeated messages may indicate GPU communication issues, which may be
	// caused by fabric manager issues, e.g.,
	// [Thu Oct 10 03:06:53 2024] pt_main_thread[2536443]: segfault at 7f797fe00000 ip 00007f7c7ac69996 sp 00007f7c12fd7c30 error 4 in libnccl.so.2[7f7c7ac00000+d3d3000]
	EventNameNCCLSegfaultInLibncclFromDmesg = "nccl_segfault_in_libnccl_from_dmesg"

	// Event detail keys for the timestamp (Unix seconds) and the dmesg log line.
	EventKeyNCCLSegfaultInLibncclFromDmesgUnixSeconds = "unix_seconds"
	EventKeyNCCLSegfaultInLibncclFromDmesgLogLine     = "log_line"
)
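
The following is a minimal sketch of how these constants might be used when scanning dmesg output. It is not the package's implementation: the `event` type, the `eventFromDmesgLine` helper, and both regular expressions are hypothetical names introduced for illustration, and the sketch assumes dmesg prints human-readable timestamps in the format shown in the comment above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// Constant values copied from the package documentation.
const (
	EventNameNCCLSegfaultInLibncclFromDmesg = "nccl_segfault_in_libnccl_from_dmesg"

	EventKeyNCCLSegfaultInLibncclFromDmesgUnixSeconds = "unix_seconds"
	EventKeyNCCLSegfaultInLibncclFromDmesgLogLine     = "log_line"
)

// event is a hypothetical container; the real package may model events differently.
type event struct {
	Name      string
	ExtraInfo map[string]string
}

// segfaultInLibnccl is an assumed pattern for kernel segfault lines that
// name libnccl.so as the faulting object. It is not the package's own matcher.
var segfaultInLibnccl = regexp.MustCompile(`segfault at [0-9a-f]+ .* in libnccl\.so`)

// dmesgTimestamp captures the bracketed timestamp at the start of the line.
var dmesgTimestamp = regexp.MustCompile(`^\[([^\]]+)\]`)

// eventFromDmesgLine returns an event and true if the line looks like a
// segfault in libnccl, and false otherwise.
func eventFromDmesgLine(line string) (event, bool) {
	if !segfaultInLibnccl.MatchString(line) {
		return event{}, false
	}

	// The timestamp is parsed as UTC; 0 is used if it is missing or malformed.
	var unixSeconds int64
	if m := dmesgTimestamp.FindStringSubmatch(line); m != nil {
		if t, err := time.Parse(time.ANSIC, m[1]); err == nil {
			unixSeconds = t.Unix()
		}
	}

	return event{
		Name: EventNameNCCLSegfaultInLibncclFromDmesg,
		ExtraInfo: map[string]string{
			EventKeyNCCLSegfaultInLibncclFromDmesgUnixSeconds: strconv.FormatInt(unixSeconds, 10),
			EventKeyNCCLSegfaultInLibncclFromDmesgLogLine:     line,
		},
	}, true
}

func main() {
	line := "[Thu Oct 10 03:06:53 2024] pt_main_thread[2536443]: segfault at 7f797fe00000 ip 00007f7c7ac69996 sp 00007f7c12fd7c30 error 4 in libnccl.so.2[7f7c7ac00000+d3d3000]"
	if ev, ok := eventFromDmesgLine(line); ok {
		fmt.Println(ev.Name, ev.ExtraInfo[EventKeyNCCLSegfaultInLibncclFromDmesgUnixSeconds])
	}
}
```

Since repeated messages are what may indicate a communication issue, a caller would likely count matching events over a time window rather than act on a single line.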

Variables

This section is empty.

Functions

Types

This section is empty.
