sxid

package
v0.3.1 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 2, 2024 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package sxid provides details on NVIDIA NVSwitch SXid errors.

Index

Constants

const (
	// e.g.,
	// [111111111.111] nvidia-nvswitch3: SXid (PCI:0000:05:00.0): 12028, Non-fatal, Link 32 egress non-posted PRIV error (First)
	// [131453.740743] nvidia-nvswitch0: SXid (PCI:0000:00:00.0): 20034, Fatal, Link 30 LTSSM Fault Up
	//
	// ref.
	// "D.4 Non-Fatal NVSwitch SXid Errors"
	// https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf
	RegexNVSwitchSXidDmesg = `SXid.*?: (\d+),`
)

Variables

var CompiledRegexNVSwitchSXidDmesg = regexp.MustCompile(RegexNVSwitchSXidDmesg)

Functions

func ExtractNVSwitchSXid

func ExtractNVSwitchSXid(line string) int

ExtractNVSwitchSXid extracts the NVIDIA NVSwitch SXid error code from a dmesg log line. It returns 0 if no error code is found. ref. https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf
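Example (sketch)

The following is a minimal, self-contained sketch of the extraction behavior described above, not the package's actual implementation. It reuses the pattern from RegexNVSwitchSXidDmesg and the two sample dmesg lines from the constant's comment. The names sxidRegex and extractSXid are hypothetical stand-ins, and the handling of a failed integer conversion is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sxidRegex compiles the same pattern as RegexNVSwitchSXidDmesg.
var sxidRegex = regexp.MustCompile(`SXid.*?: (\d+),`)

// extractSXid is a hypothetical stand-in for ExtractNVSwitchSXid.
// It returns the captured error code, or 0 when the line has no SXid.
func extractSXid(line string) int {
	m := sxidRegex.FindStringSubmatch(line)
	if m == nil {
		return 0
	}
	id, err := strconv.Atoi(m[1])
	if err != nil {
		// Assumption: an unparsable code is treated like a missing one.
		return 0
	}
	return id
}

func main() {
	lines := []string{
		"[111111111.111] nvidia-nvswitch3: SXid (PCI:0000:05:00.0): 12028, Non-fatal, Link 32 egress non-posted PRIV error (First)",
		"[131453.740743] nvidia-nvswitch0: SXid (PCI:0000:00:00.0): 20034, Fatal, Link 30 LTSSM Fault Up",
		"[1.000000] usb 1-1: new high-speed USB device",
	}
	for _, l := range lines {
		fmt.Println(extractSXid(l))
	}
}
```

Because 0 doubles as the "not found" value, callers would typically check for a non-zero result before looking up details.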

Types

type Detail

type Detail struct {
	DocumentVersion string `json:"documentation_version"`

	SXid        int    `json:"sxid"`
	Name        string `json:"name"`
	Description string `json:"description"`

	// SuggestedActionsByGPUd is the suggested actions by GPUd.
	SuggestedActionsByGPUd *common.SuggestedActions `json:"suggested_actions_by_gpud,omitempty"`
	// CriticalErrorMarkedByGPUd is true if the GPUd marks this SXid as a critical error.
	// You may use this field to decide whether to alert or not.
	CriticalErrorMarkedByGPUd bool `json:"critical_error_marked_by_gpud"`

	PotentialFatal bool   `json:"potential_fatal"`
	AlwaysFatal    bool   `json:"always_fatal"`
	Impact         string `json:"impact"`
	Recovery       string `json:"recovery"`
	OtherImpact    string `json:"other_impact"`
}

Detail defines the static information about an SXid error. ref. https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf
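Example (sketch)

To illustrate how the JSON tags on Detail shape its encoded form, here is a small sketch using a local stand-in type. The type detail and the function roundTrip are hypothetical, all field values are placeholders rather than real SXid documentation, and SuggestedActionsByGPUd is omitted because common.SuggestedActions is defined in another package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detail is a local stand-in that mirrors the scalar fields and JSON tags
// of Detail. It is not the package's type.
type detail struct {
	DocumentVersion string `json:"documentation_version"`

	SXid        int    `json:"sxid"`
	Name        string `json:"name"`
	Description string `json:"description"`

	CriticalErrorMarkedByGPUd bool `json:"critical_error_marked_by_gpud"`

	PotentialFatal bool   `json:"potential_fatal"`
	AlwaysFatal    bool   `json:"always_fatal"`
	Impact         string `json:"impact"`
	Recovery       string `json:"recovery"`
	OtherImpact    string `json:"other_impact"`
}

// roundTrip encodes d to JSON and decodes the result back into a detail.
func roundTrip(d detail) (detail, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return detail{}, err
	}
	var out detail
	if err := json.Unmarshal(b, &out); err != nil {
		return detail{}, err
	}
	return out, nil
}

func main() {
	// Placeholder values only.
	d := detail{
		DocumentVersion: "placeholder version",
		SXid:            12028,
		Name:            "placeholder name",
		Description:     "placeholder description",
	}
	b, err := json.MarshalIndent(d, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(b))
}
```

Note that the DocumentVersion field is encoded under the key documentation_version, so consumers of the JSON should use that key rather than the Go field name.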

func GetDetail

func GetDetail(id int) (*Detail, bool)

GetDetail returns the Detail for the given SXid and true if the SXid is found. Otherwise, it returns false.
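Example (sketch)

The (value, bool) return shape is the usual Go "comma ok" lookup. The sketch below shows how a caller might branch on it. The table details, the type detail, and the function getDetail are hypothetical stand-ins with placeholder data, and returning a nil pointer on a miss is an assumption, not confirmed behavior of GetDetail.

```go
package main

import "fmt"

// detail is a trimmed, hypothetical stand-in for Detail.
type detail struct {
	SXid int
	Name string
}

// details is a hypothetical lookup table with a single placeholder entry.
var details = map[int]detail{
	12028: {SXid: 12028, Name: "placeholder entry"},
}

// getDetail mimics the signature of GetDetail: it reports whether the
// SXid is known and, if so, returns a pointer to a copy of its entry.
func getDetail(id int) (*detail, bool) {
	d, ok := details[id]
	if !ok {
		return nil, false // assumption: nil on a miss
	}
	return &d, true
}

func main() {
	for _, id := range []int{12028, 99999} {
		d, ok := getDetail(id)
		if !ok {
			fmt.Printf("SXid %d: no detail available\n", id)
			continue
		}
		fmt.Printf("SXid %d: %s\n", d.SXid, d.Name)
	}
}
```

Checking the boolean before dereferencing the pointer keeps the caller safe regardless of what the pointer holds on a miss.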

func (Detail) JSON added in v0.1.8

func (d Detail) JSON() ([]byte, error)

type DmesgError

type DmesgError struct {
	Detail  *Detail        `json:"detail"`
	LogItem query_log.Item `json:"log_item"`
}

func ParseDmesgErrorJSON

func ParseDmesgErrorJSON(data []byte) (*DmesgError, error)

func ParseDmesgErrorYAML

func ParseDmesgErrorYAML(data []byte) (*DmesgError, error)

func ParseDmesgLogLine

func ParseDmesgLogLine(time metav1.Time, line string) (DmesgError, error)

func (*DmesgError) JSON

func (de *DmesgError) JSON() ([]byte, error)

func (*DmesgError) YAML

func (de *DmesgError) YAML() ([]byte, error)
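
Example (sketch)

ParseDmesgErrorJSON and (*DmesgError).JSON form a decode/encode pair. The sketch below shows one way such a pair could be written with encoding/json. It is not the package's implementation: dmesgError, detail, and parseDmesgErrorJSON are hypothetical stand-ins, the input document is a placeholder, and log_item is held as raw JSON because the fields of query_log.Item are not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detail is a trimmed, hypothetical stand-in for Detail.
type detail struct {
	SXid int    `json:"sxid"`
	Name string `json:"name"`
}

// dmesgError mirrors the JSON tags of DmesgError. LogItem stays opaque
// (raw JSON) since the shape of query_log.Item is not known here.
type dmesgError struct {
	Detail  *detail         `json:"detail"`
	LogItem json.RawMessage `json:"log_item"`
}

// parseDmesgErrorJSON decodes data into a dmesgError.
func parseDmesgErrorJSON(data []byte) (*dmesgError, error) {
	de := new(dmesgError)
	if err := json.Unmarshal(data, de); err != nil {
		return nil, err
	}
	return de, nil
}

// JSON encodes the dmesgError back to JSON.
func (de *dmesgError) JSON() ([]byte, error) {
	return json.Marshal(de)
}

func main() {
	// Placeholder document; the "line" key is illustrative only.
	in := []byte(`{"detail":{"sxid":20034,"name":"placeholder"},"log_item":{"line":"placeholder"}}`)

	de, err := parseDmesgErrorJSON(in)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println("parsed SXid:", de.Detail.SXid)

	out, err := de.JSON()
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Since Detail is a pointer, code that reads de.Detail after decoding arbitrary input should check it for nil first.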
