gobpfld

v0.3.0
Published: Jun 15, 2021 License: MIT Imports: 21 Imported by: 2

README

GoBPFLD

GoBPFLD is a pure Go eBPF loader/userspace library and an alternative to gobpf, which requires CGO to work. The goal of GoBPFLD is to provide a library for eBPF development that is comparable to libbpf (the C library) but does not need CGO, which improves the development experience.

WARNING GoBPFLD is currently not (yet) feature complete, and may lack critical features for some eBPF program types since the main focus for now is on XDP programs.

WARNING GoBPFLD has only been tested on X86_64 machines. Because the library interacts with the kernel via syscalls, architecture-dependent bugs are likely to arise. For now it is not recommended to trust this library on any architecture other than X86_64.

Requirements

eBPF is a Linux-specific feature (ignoring userspace eBPF) which was introduced in kernel 3.18. This means that for this library to work, the executable must be run on a Linux machine with kernel 3.18 or above.

This library detects (at runtime) which version of the Linux kernel is being used. Higher level features will attempt to fall back so they still offer as much functionality as possible. This library attempts to catch the usage of unsupported features and return verbose, human-readable errors. If this fails the kernel will still return an error, which is less verbose. Some features may require a newer kernel version. You can find a great overview of features per kernel version here.
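As a rough illustration of version-based feature gating, the sketch below parses a kernel release string and compares it against a minimum version. It is not the library's own detection code: `parseKernelRelease`, `kernelVersion` and `AtLeast` are hypothetical names, and the release string is assumed to have the form found in `/proc/sys/kernel/osrelease` (for example `5.4.0-74-generic`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelVersion holds the numeric major and minor parts of a kernel release.
type kernelVersion struct {
	Major, Minor int
}

// parseKernelRelease extracts major and minor from a release string such as
// "5.8.0-53-generic". Hypothetical helper, not part of gobpfld.
func parseKernelRelease(release string) (kernelVersion, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return kernelVersion{}, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return kernelVersion{}, fmt.Errorf("parse major version: %w", err)
	}
	// The minor part may carry a suffix (e.g. "18-rc1"); keep leading digits only.
	minorDigits := parts[1]
	for i, r := range minorDigits {
		if r < '0' || r > '9' {
			minorDigits = minorDigits[:i]
			break
		}
	}
	minor, err := strconv.Atoi(minorDigits)
	if err != nil {
		return kernelVersion{}, fmt.Errorf("parse minor version: %w", err)
	}
	return kernelVersion{Major: major, Minor: minor}, nil
}

// AtLeast reports whether v is equal to or newer than major.minor.
func (v kernelVersion) AtLeast(major, minor int) bool {
	return v.Major > major || (v.Major == major && v.Minor >= minor)
}

func main() {
	v, err := parseKernelRelease("5.4.0-74-generic")
	if err != nil {
		panic(err)
	}
	fmt.Println("eBPF available (>= 3.18):", v.AtLeast(3, 18))
	fmt.Println("CAP_BPF available (>= 5.8):", v.AtLeast(5, 8))
}
```

A program could use a check like this to decide up front whether to enable a feature, warn, or return an error before the kernel rejects the syscall.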

Security

Programs which interact with the kernel via the bpf syscall need to have extra capabilities if they are not running as the root user. In particular, they need the CAP_BPF or CAP_SYS_ADMIN capability. CAP_BPF is available since kernel 5.8 and is preferred since it grants the least privileges needed. If you run on a kernel version lower than 5.8, CAP_SYS_ADMIN is the only option you have to grant non-root users access to the bpf syscall. In this case it might be a better option to run your program as root and switch to a non-root user when not using the bpf syscall. This can be accomplished by using the seteuid syscall via the syscall.Seteuid function.

There are a number of eBPF related vulnerabilities known so far: CVE-2016-2383, CVE-2016-4557, CVE-2021-20268. The kernel has the ability to JIT eBPF programs, which translates the eBPF instructions into actual machine code to be executed. Not only that, but the code executes in the kernel with all associated privileges. To ensure that eBPF programs don't access memory outside the eBPF VM, the kernel's verifier attempts to detect illegal code; if the verifier fails to do so, we have a security issue. Programs using this library must therefore be sure that the eBPF programs they load don't contain unsanitized user input. Even normal features of eBPF such as packet manipulation or dropping may be considered security issues in some cases. More info about eBPF JIT and eBPF hardening can be found in the cilium reference guide.

Use cases

GoBPFLD is a loader/library to make eBPF tool development in Go smoother. It is not a standalone tool like bpftool but rather more like libbpf or gobpf.

Features

  • Pure Go - no CGO, missing libraries, forced dynamic linking, etc.
  • Load pre-compiled eBPF programs from ELF files (see LoadProgramFromELF)
  • Decode eBPF bytecode (see BPFProgram.DecodeToReader and ebpf.Decode)
  • Encode eBPF instructions into bytecode (see ebpf.Encode)
  • Loading eBPF maps into the kernel (see BPFMap.Load)
  • Loading eBPF programs into the kernel (see BPFProgram.Load)
  • Interacting with eBPF maps (lookup, set, delete, batch-lookup, batch-set, and batch-delete)
  • Map iterators (see GenericMap.Iterator and MapIterator)
  • Attaching eBPF programs to network interfaces as XDP programs (see BPFProgram.XDPLinkAttach)
  • Attaching eBPF programs to sockets (see BPFProgram.SocketAttach and BPFProgram.SocketAttachControlFunc)
  • XSK/AF_XDP socket support (see NewXSKSocket and XSKSocket)
  • Go wrappers around all bpf syscall commands (see bpfsys package)
  • eBPF clang style assembly parser/assembler (see eBPF package)
  • XDP program testing (see BPFProgram.XDPTestProgram)

Examples

The cmd/examples directory contains example programs which demonstrate how to use this library and its capabilities.

TODO/Roadmap/Scope limits

As mentioned earlier the first milestone/focus area of this project has been on implementing basic eBPF and XDP related features, and thus is missing a lot of stuff. This is a list of features to be added later, just to keep track.

Must have

Features/tasks in this list are commonly used/requested because they are used in common use cases / scenarios.

  • Data relocation from ELF files (static global variables)
  • Attach to sockets
  • Attach to kprobes
  • Attach to tc (traffic control)
  • Attach to tracepoints
  • Attach to perf events
  • Tailcall support
  • Map pinning and unpinning
  • Bulk map ops
  • Program pinning and unpinning
  • BPF2BPF function calls
  • Map iterator construct (looping over maps is very common)
  • (partially implemented) Linux kernel version detection (so programs can programmatically decide which features they can use, then error, warn or be backwards compatible)

Should have

Features/tasks in this list are not critical for most users but still important for a significant portion.

  • Map in map support (useful but not widely used)
  • XSK/AF_XDP support (useful for kernel bypass and packet capture)
  • XSK multiple sockets per netdev,queue pair (currently only one socket per pair is supported)
  • (partially implemented) Program testing (Being able to unit test an XDP program would be great)
  • Ringbuffer map support
  • Map access via memory mapping https://lwn.net/Articles/805043/ (could improve performance)
  • Support for LWT programs (Light weight tunnel)
  • BTF support (So we have more type info to work with)
  • ARM64 support / testing (ARM is on the rise)
  • ARM32 support / testing (ARM is on the rise)
  • Library testing framework (unit tests but for BPF functions, in a VM for consistent results (multi-kernel tests?))

Could have

Features/tasks in this list are cool to have but secondary to the primary goal of this project.

  • Built-in XSK kernel program (like libbpf) (only useful for people interested in full kernel bypass without additional logic in XDP/eBPF)
  • RISC-V support / testing (RISC-V has promise, would be cool, but not yet widely used)
  • x86_32 support / testing (32 bit is not very popular anymore, but maybe still useful for IOT or raspberry pi like machines)
  • Userspace VM (It would be cool to be able to run eBPF in Go, for testing or as a plugin mechanism like Lua and WASM. But not an important feature related to eBPF loading)
  • Userspace map caching (Depending on the map flags and eBPF program, maps can be cached in the userspace without requesting value via syscalls (userspace -> kernel only maps))

Won't have

Features/tasks in this list are out of the scope of the project. We have to draw the line somewhere to avoid feature creep.

  • cBPF support (cBPF is not even supported by Linux anymore, just converted to eBPF, which you can also do with tools for any existing program)

Documentation

Constants

const BPFSysPath = "/sys/fs/bpf/"

BPFSysPath is the path to the bpf FS used to pin objects to

Variables

var (
	// ErrProgramNotLoaded is returned when attempting to attach a non-loaded program
	ErrProgramNotLoaded = errors.New("the program is not yet loaded and thus can't be attached")
	// ErrProgramNotXDPType is returned when attempting to attach a non-XDP program to a netdev
	ErrProgramNotXDPType = errors.New("the program is not loaded as an XDP program and thus can't be " +
		"attached as such")
	// ErrNetlinkAlreadyHasXDPProgram is returned when attempting to attach a program to a
	// netdev that already has an XDP program attached
	ErrNetlinkAlreadyHasXDPProgram = errors.New("the netlink already has an XDP program attached")
)
var BPFMapDefSize = int(unsafe.Sizeof(BPFMapDef{}))

BPFMapDefSize is the size of BPFMapDef in bytes

var ErrObjNameToLarge = errors.New("object name to large")

ErrObjNameToLarge is returned when a given string or byte slice is too large. The kernel limits names to 15 usable bytes plus a null-termination char.
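The size limit can be illustrated with a small sketch that copies a name into a fixed 16-byte array, leaving room for the terminator. `toObjName`, `objNameLen` and `errNameTooLarge` are hypothetical names for illustration; the library's ObjName type may be implemented differently.

```go
package main

import (
	"errors"
	"fmt"
)

// objNameLen mirrors the kernel limit: 15 usable bytes plus a terminating NUL.
const objNameLen = 16

var errNameTooLarge = errors.New("object name too large")

// toObjName copies name into a fixed-size, NUL-terminated array.
// Hypothetical helper, not part of gobpfld.
func toObjName(name string) ([objNameLen]byte, error) {
	var out [objNameLen]byte
	if len(name) > objNameLen-1 {
		return out, errNameTooLarge
	}
	// The array is zero-initialized, so the byte after the name is already NUL.
	copy(out[:], name)
	return out, nil
}

func main() {
	if _, err := toObjName("xdp_stats_map"); err == nil {
		fmt.Println("13 byte name accepted")
	}
	if _, err := toObjName("a_very_long_map_name"); err != nil {
		fmt.Println("20 byte name rejected:", err)
	}
}
```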

var ErrProgramNotSocketFilterType = errors.New("the program is not loaded as an socket filter program and " +
	"thus can't be attached as such")

ErrProgramNotSocketFilterType is returned when attempting to attach a non-socket filter program to a socket.

Functions

func CStrBytesToString

func CStrBytesToString(b []byte) string

CStrBytesToString converts bytes to string assuming it is a C string

func CStrToString

func CStrToString(cstr string) string

CStrToString trims the string at the first null byte which is used in C to indicate the end of the string

func GetNetDevQueueCount

func GetNetDevQueueCount(netdev string) (int, error)

GetNetDevQueueCount uses the /sys/class/net/<dev>/queues/ directory to figure out how many queues a network device has. Knowing the number of queues is critical when binding XSK sockets to a network device.
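A minimal sketch of this kind of lookup is shown below. It is not the library's implementation: `rxQueueCount` and `countWithPrefix` are hypothetical helpers, and counting only the `rx-N` entries is an assumption made for the example (the directory holds both `rx-N` and `tx-N` entries).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// countWithPrefix counts the names that start with prefix.
func countWithPrefix(names []string, prefix string) int {
	n := 0
	for _, name := range names {
		if strings.HasPrefix(name, prefix) {
			n++
		}
	}
	return n
}

// rxQueueCount lists /sys/class/net/<netdev>/queues/ and counts the rx-N
// entries. Hypothetical helper, not part of gobpfld.
func rxQueueCount(netdev string) (int, error) {
	entries, err := os.ReadDir(filepath.Join("/sys/class/net", netdev, "queues"))
	if err != nil {
		return 0, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return countWithPrefix(names, "rx-"), nil
}

func main() {
	// "lo" is the loopback device, which is present on most Linux systems.
	n, err := rxQueueCount("lo")
	if err != nil {
		fmt.Println("could not read queues:", err)
		return
	}
	fmt.Println("rx queues on lo:", n)
}
```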

func MapIterForEach

func MapIterForEach(iter MapIterator, key, value interface{}, callback func(key, value interface{}) error) error

MapIterForEach fully loops over the given iterator, calling the callback for each entry. This offers less control but requires less external setup.

MapIterForEach accepts non-pointer values for key and value in which case they will only be used for type information. If callback returns an error the iterator will stop iterating and return the error from callback. Callback is always invoked with pointer types, even if non-pointer types were supplied to key and value.

func PinFD

func PinFD(relativePath string, fd bpfsys.BPFfd) error

PinFD pins an eBPF object(map, program, link) identified by the given `fd` to the given `relativePath` relative to the `BPFSysPath` on the BPF FS.

This function is exposed so custom program or map implementations can use it outside of this library. However, it is recommended to use the BPFProgram.Pin and AbstractMap.Pin functions if gobpfld types are used.
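To make the "relative to BPFSysPath" wording concrete, the sketch below resolves a relative pin path against the BPF FS root. `pinPath` is a hypothetical helper, and the check that rejects paths leaving the BPF FS is an assumption for illustration; this page does not say whether the library performs such a check.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// bpfSysPath has the same value as the BPFSysPath constant documented above.
const bpfSysPath = "/sys/fs/bpf/"

// pinPath joins relativePath onto the BPF FS root and rejects results that
// would end up outside of it (for example via ".."). Hypothetical helper.
func pinPath(relativePath string) (string, error) {
	full := path.Join(bpfSysPath, relativePath)
	if !strings.HasPrefix(full, bpfSysPath) {
		return "", errors.New("pin path escapes the BPF filesystem")
	}
	return full, nil
}

func main() {
	p, err := pinPath("myapp/stats_map")
	fmt.Println(p, err)

	_, err = pinPath("../../etc/passwd")
	fmt.Println(err)
}
```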

func StringToCStrBytes

func StringToCStrBytes(str string) []byte

StringToCStrBytes turns the string into a null terminated byte slice
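The behaviour described for the three C-string helpers (CStrBytesToString, CStrToString and StringToCStrBytes) can be approximated with the standard library alone. The lower-case functions below are stand-ins written for this example, not the library's source.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// cStrBytesToString returns the bytes before the first NUL as a string.
func cStrBytesToString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		b = b[:i]
	}
	return string(b)
}

// cStrToString trims the string at the first NUL byte, if there is one.
func cStrToString(s string) string {
	if i := strings.IndexByte(s, 0); i >= 0 {
		return s[:i]
	}
	return s
}

// stringToCStrBytes returns the string's bytes followed by a terminating NUL.
func stringToCStrBytes(s string) []byte {
	return append([]byte(s), 0)
}

func main() {
	raw := stringToCStrBytes("xdp_prog")
	fmt.Println(len(raw))                          // 9: eight characters plus the terminator
	fmt.Println(cStrBytesToString(raw))            // xdp_prog
	fmt.Printf("%q\n", cStrToString("abc\x00def")) // "abc"
}
```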

func UnpinFD

func UnpinFD(relativePath string, deletePin bool) (bpfsys.BPFfd, error)

UnpinFD gets the fd of an eBPF object(map, program, link) which is pinned at the given `relativePath` relative to the `BPFSysPath` on the BPF FS. If `deletePin` is true, this function will remove the pin from the BPF FS after successfully getting it.

This function is exposed so custom program or map implementations can use it outside of this library. However, it is recommended to use the BPFProgram.Unpin and AbstractMap.Unpin functions if gobpfld types are used.

Types

type AbstractMap

type AbstractMap struct {
	Name ObjName

	Loaded bool
	Fd     bpfsys.BPFfd

	Definition BPFMapDef
}

AbstractMap is a base struct which implements BPFMap. However, it lacks any features for interacting with the map; these need to be implemented by a specific map type, which can embed this type to reduce code duplication. This type is exported so users of the library can also embed this struct in application-specific implementations.

func (*AbstractMap) GetDefinition

func (m *AbstractMap) GetDefinition() BPFMapDef

func (*AbstractMap) GetFD

func (m *AbstractMap) GetFD() bpfsys.BPFfd

func (*AbstractMap) GetName

func (m *AbstractMap) GetName() ObjName

func (*AbstractMap) IsLoaded

func (m *AbstractMap) IsLoaded() bool

func (*AbstractMap) Load

func (m *AbstractMap) Load() error

Load validates and loads the userspace map definition into the kernel.

func (*AbstractMap) Pin

func (m *AbstractMap) Pin(relativePath string) error

Pin pins the map to a location in the bpf filesystem, since the file system now also holds a reference to the map the original creator of the map can terminate without triggering the map to be closed as well. A map can be unpinned from the bpf FS by another process thus transferring it or persisting it across multiple runs of the same program.

func (*AbstractMap) Unload

func (m *AbstractMap) Unload() error

Unload closes the file descriptor associated with the map. This will cause the map to unload from the kernel if it is not still in use by an eBPF program, the bpf FS, or a userspace program still holding an fd to the map.

func (*AbstractMap) Unpin

func (m *AbstractMap) Unpin(relativePath string, deletePin bool) error

Unpin captures the file descriptor of the map at the given 'relativePath' from the kernel. The definition in this map must match the definition of the pinned map, otherwise this function will return an error since mismatched definitions might cause seemingly unrelated bugs in other functions. If 'deletePin' is true the bpf FS pin will be removed after successfully loading the map, thus transferring ownership of the map in a scenario where the map is not shared between multiple programs. Otherwise the pin will keep existing which will cause the map to not be deleted when this program exits.

type BPFELF

type BPFELF struct {
	// Programs contained within the ELF
	Programs map[string]*BPFProgram
	// Maps defined in the ELF
	Maps map[string]BPFMap
	// contains filtered or unexported fields
}

BPFELF is the result of parsing an eBPF ELF file. It can contain multiple programs and maps.

func LoadProgramFromELF

func LoadProgramFromELF(r io.ReaderAt, settings ELFParseSettings) (BPFELF, error)

type BPFGenericMap

type BPFGenericMap struct {
	AbstractMap
}

BPFGenericMap is a runtime reflection implementation for generic BPFTypes. Because it uses reflection for type information it is slower than any application specific map. For high speed access a custom BPFMap implementation is recommended.

func (*BPFGenericMap) Delete

func (m *BPFGenericMap) Delete(key interface{}) error

func (*BPFGenericMap) DeleteBatch

func (m *BPFGenericMap) DeleteBatch(
	keys interface{},
	maxBatchSize uint32,
) (
	count int,
	err error,
)

func (*BPFGenericMap) Get

func (m *BPFGenericMap) Get(key interface{}, value interface{}) error

func (*BPFGenericMap) GetAndDelete

func (m *BPFGenericMap) GetAndDelete(key interface{}, value interface{}) error

func (*BPFGenericMap) GetAndDeleteBatch

func (m *BPFGenericMap) GetAndDeleteBatch(
	keys interface{},
	values interface{},
	maxBatchSize uint32,
) (
	count int,
	err error,
)

func (*BPFGenericMap) GetBatch

func (m *BPFGenericMap) GetBatch(
	keys interface{},
	values interface{},
	maxBatchSize uint32,
) (
	count int,
	full bool,
	err error,
)

GetBatch fills the keys and values array/slice with the keys and values inside the map up to a maximum of maxBatchSize entries. The keys and values array/slice must have at least a length of maxBatchSize. The key and value of an entry have the same index, so for example the value for keys[2] is in values[2]. Count is the number of entries returned, full is true if all entries were returned.

This function is intended for small maps which can be read into userspace all at once since GetBatch can only read from the beginning of the map. If the map is too large to read all at once an iterator should be used instead of the Get or GetBatch function.

func (*BPFGenericMap) Iterator

func (m *BPFGenericMap) Iterator() MapIterator

func (*BPFGenericMap) Set

func (m *BPFGenericMap) Set(key interface{}, value interface{}, flags bpfsys.BPFAttrMapElemFlags) error

func (*BPFGenericMap) SetBatch

func (m *BPFGenericMap) SetBatch(
	keys interface{},
	values interface{},
	flags bpfsys.BPFAttrMapElemFlags,
	maxBatchSize uint32,
) (
	count int,
	err error,
)

type BPFMap

type BPFMap interface {
	GetName() ObjName
	GetFD() bpfsys.BPFfd
	IsLoaded() bool
	GetDefinition() BPFMapDef

	Load() error
}

func MapFromID

func MapFromID(id uint32) (BPFMap, error)

MapFromID creates a BPFMap object from a map that is already loaded into the kernel.

type BPFMapDef

type BPFMapDef struct {
	Type       bpftypes.BPFMapType
	KeySize    uint32
	ValueSize  uint32
	MaxEntries uint32
	Flags      bpftypes.BPFMapFlags
}

func (BPFMapDef) Equal

func (def BPFMapDef) Equal(other BPFMapDef) bool

Equal checks if two map definitions are functionally identical

func (BPFMapDef) Validate

func (def BPFMapDef) Validate() error

Validate checks if the map definition is valid. The kernel also does these checks, but if the kernel finds an error it doesn't return a nice error message. This gives a better user experience.
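The idea of validating in userspace before making the syscall can be sketched as follows. The types and rules are illustrative assumptions and not the checks this library performs; they cover only hash and array maps, for which the kernel rejects zero sizes and requires a 4 byte key on arrays.

```go
package main

import (
	"errors"
	"fmt"
)

type mapType uint32

const (
	mapTypeHash  mapType = 1 // BPF_MAP_TYPE_HASH
	mapTypeArray mapType = 2 // BPF_MAP_TYPE_ARRAY
)

// mapDef is a trimmed-down stand-in for BPFMapDef.
type mapDef struct {
	Type       mapType
	KeySize    uint32
	ValueSize  uint32
	MaxEntries uint32
}

// validate returns a descriptive error for a few definitions that the kernel
// would otherwise reject with a bare error code. Illustrative rules only.
func (d mapDef) validate() error {
	switch d.Type {
	case mapTypeHash, mapTypeArray:
		if d.MaxEntries == 0 {
			return errors.New("max entries must be greater than zero")
		}
		if d.ValueSize == 0 {
			return errors.New("value size must be greater than zero")
		}
	}
	switch d.Type {
	case mapTypeArray:
		if d.KeySize != 4 {
			return fmt.Errorf("array maps need a 4 byte key (the index), got %d", d.KeySize)
		}
	case mapTypeHash:
		if d.KeySize == 0 {
			return errors.New("hash maps need a key size greater than zero")
		}
	}
	return nil
}

func main() {
	bad := mapDef{Type: mapTypeArray, KeySize: 8, ValueSize: 8, MaxEntries: 16}
	fmt.Println(bad.validate())

	good := mapDef{Type: mapTypeHash, KeySize: 4, ValueSize: 8, MaxEntries: 1024}
	fmt.Println(good.validate())
}
```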

type BPFProgInfo

type BPFProgInfo struct {
	Type            bpftypes.BPFProgType
	ID              uint32
	Tag             [bpftypes.BPF_TAG_SIZE]byte
	JitedProgInsns  []ebpf.RawInstruction
	XlatedProgInsns []ebpf.RawInstruction
	LoadTime        time.Time
	CreatedByUID    uint32
	MapIDs          []uint32
	Name            ObjName
	IfIndex         uint32
	Flags           bpftypes.BPFProgInfoFlags
	NetNSDev        uint64
	NetNSIno        uint64
	JitedKsyms      []uint64
	JitedFuncLens   []uint32
	BTFID           uint32
	FuncInfo        []bpftypes.BPFFuncInfo
	LineInfo        []bpftypes.BPFLineInfo
	JitedLineInfo   []bpftypes.BPFLineInfo
	ProgTags        [][bpftypes.BPF_TAG_SIZE]byte
	RunTimeNs       uint64
	RunCnt          uint64
	RecursionMisses uint64
}

BPFProgInfo is an easier to use version of bpftypes.BPFProgInfo. The main difference is that this struct contains the actual data from the kernel, not just pointers to it.

func GetLoadedPrograms

func GetLoadedPrograms() ([]BPFProgInfo, error)

GetLoadedPrograms returns a slice of info objects about all loaded bpf programs

func GetProgramInfo

func GetProgramInfo(fd bpfsys.BPFfd) (*BPFProgInfo, error)

type BPFProgram

type BPFProgram struct {
	// Name of the program
	Name    ObjName
	License string
	// The actual instructions of the program
	Instructions []ebpf.RawInstruction
	// Locations where map fds need to be inserted into the
	// program before loading
	MapFDLocations map[string][]uint64
	Maps           map[string]BPFMap

	// A list of network interface ids the program is linked to
	AttachedNetlinkIDs []int
	AttachedSocketFDs  []int
	// contains filtered or unexported fields
}

func NewBPFProgram

func NewBPFProgram() *BPFProgram

func (*BPFProgram) DecodeToReader

func (p *BPFProgram) DecodeToReader(w io.Writer) error

DecodeToReader decodes the eBPF program and writes the human readable format to the provided w. The output that is generated is inspired by the llvm-objdump -S output format of eBPF programs

func (*BPFProgram) Fd

func (p *BPFProgram) Fd() (bpfsys.BPFfd, error)

func (*BPFProgram) Load

func (p *BPFProgram) Load(settings BPFProgramLoadSettings) (log string, err error)

func (*BPFProgram) Pin

func (p *BPFProgram) Pin(relativePath string) error

Pin pins the program to a location in the bpf filesystem, since the file system now also holds a reference to the program, the original creator of the program can terminate without triggering the program to be closed as well. A program can be unpinned from the bpf FS by another process thus transferring it or persisting it across multiple runs of the same program.

func (*BPFProgram) SocketAttach

func (p *BPFProgram) SocketAttach(fd uintptr) error

SocketAttach attempts to attach a filter program to the network socket indicated by the given file descriptor. This function can be used if network file descriptors are managed outside of the net package or when using the net.TCPListener.File function to get a duplicate file descriptor.

func (*BPFProgram) SocketAttachControlFunc

func (p *BPFProgram) SocketAttachControlFunc(network, address string, c syscall.RawConn) error

SocketAttachControlFunc attaches a "socket filter" program to a network socket. This function is meant to be used as function pointer in net.Dialer.Control or net.ListenConfig.Control.
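The hook mechanism this function plugs into can be shown with the standard library alone. In the sketch below, `fdRecorder.control` has the signature that net.ListenConfig.Control expects and merely records the socket's file descriptor; a BPFProgram's SocketAttachControlFunc method would be assigned to the same field instead. `fdRecorder` and `listenWithHook` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// fdRecorder stands in for a BPFProgram: its control method has the signature
// required by net.ListenConfig.Control and net.Dialer.Control.
type fdRecorder struct {
	called bool
	fd     uintptr
}

func (r *fdRecorder) control(network, address string, c syscall.RawConn) error {
	// RawConn.Control exposes the raw file descriptor. This is the point at
	// which a socket filter program would be attached to the socket.
	return c.Control(func(fd uintptr) {
		r.called = true
		r.fd = fd
	})
}

// listenWithHook opens a TCP listener with the hook installed and closes it
// again. The hook runs after the socket is created but before it is bound.
func listenWithHook(addr string) (*fdRecorder, error) {
	rec := &fdRecorder{}
	lc := net.ListenConfig{Control: rec.control}
	ln, err := lc.Listen(context.Background(), "tcp4", addr)
	if err != nil {
		return rec, err
	}
	return rec, ln.Close()
}

func main() {
	rec, err := listenWithHook("127.0.0.1:0")
	fmt.Println("hook called:", rec.called, "err:", err)
}
```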

func (*BPFProgram) SocketDettach

func (p *BPFProgram) SocketDettach(settings BPFProgramSocketFilterDetachSettings) error

SocketDettach detaches the program from one or all sockets.

func (*BPFProgram) Unpin

func (p *BPFProgram) Unpin(relativePath string, deletePin bool) error

Unpin captures the file descriptor of the program at the given 'relativePath' from the kernel. If 'deletePin' is true the bpf FS pin will be removed after successfully loading the program, thus transferring ownership of the program in a scenario where the program is not shared between multiple userspace programs. Otherwise the pin will keep existing which will cause the program to not be deleted when this program exits.

func (*BPFProgram) XDPLinkAttach

func (p *BPFProgram) XDPLinkAttach(settings BPFProgramXDPLinkAttachSettings) error

XDPLinkAttach attaches an already loaded eBPF XDP program to a network device. If attaching fails due to the XDP mode, we will automatically attempt to fall back to a slower but better supported XDP mode.

func (*BPFProgram) XDPLinkDetach

func (p *BPFProgram) XDPLinkDetach(settings BPFProgramXDPLinkDetachSettings) error

XDPLinkDetach detaches an XDP program from one or all network interfaces it is attached to.

func (*BPFProgram) XDPTestProgram added in v0.3.0

func (p *BPFProgram) XDPTestProgram(settings TestXDPProgSettings) (*TestXDPProgResult, error)

XDPTestProgram executes a loaded XDP program on supplied data. This feature can be used to test the functionality of an XDP program without having to generate actual traffic on an interface. It is also useful for benchmarking XDP programs, which is otherwise impractical.

type BPFProgramLoadSettings

type BPFProgramLoadSettings struct {
	// The type of eBPF program, this determines how the program will be verified and to which
	// attach point it can attach.
	ProgramType bpftypes.BPFProgType
	// A hint to the verifier about where you are going to attach the program.
	// This value can be left default for most program types, but must be set for some program types.
	// This value may restrict where the program may be attached
	ExpectedAttachType bpftypes.BPFAttachType
	// The index of the network interface to which the program will be attached.
	// This is only required for XDP offloading in hardware mode.
	// In hardware mode the kernel needs to know how to convert eBPF into code that can run on the
	// hardware, so at load time it needs to know which devices will be used.
	IfIndex          uint32
	VerifierLogLevel bpftypes.BPFLogLevel
	VerifierLogSize  int
}

type BPFProgramSocketFilterDetachSettings

type BPFProgramSocketFilterDetachSettings struct {
	// the file descriptor of the network socket from which the program should be detached
	Fd int
	// If true, the program will be detached from all sockets
	All bool
}

type BPFProgramXDPLinkAttachSettings

type BPFProgramXDPLinkAttachSettings struct {
	// Name of the network interface to which to attach the XDP program
	InterfaceName string
	// If true, this program will replace any existing program.
	// If false, attempting to attach a program while one is still loaded will cause an error
	Replace bool
	XDPMode XDPMode
	// If true, we will return an error when we can't attach the program in the specified mode
	// If false, we will automatically fall back to a less specific XDPMode if the current mode fails.
	DisableFallback bool
}

type BPFProgramXDPLinkDetachSettings

type BPFProgramXDPLinkDetachSettings struct {
	// Name of the network interface from which the program should detach
	InterfaceName string
	// If true, the program will be detached from all network interfaces
	All bool
}

type BatchLookupIterator

type BatchLookupIterator struct {
	// The map over which to iterate
	BPFMap BPFMap
	// Size of the buffer, bigger buffers are more cpu efficient but take up more memory
	BufSize int
	// contains filtered or unexported fields
}

func (*BatchLookupIterator) Init

func (bli *BatchLookupIterator) Init(key, value interface{}) error

func (*BatchLookupIterator) Next

func (bli *BatchLookupIterator) Next() (updated bool, err error)

Next gets the key and value at the current location and writes them to the pointers given to the iterator during initialization. It then advances the internal pointer to the next key and value. If the iterator can't get the key and value at the current location since we are done iterating or an error was encountered 'updated' is false.

type ELFParseSettings

type ELFParseSettings struct {
	// If true, names which are too large will be truncated, this can cause unexpected behavior
	// Otherwise an error will be generated.
	TruncateNames bool
}

type ELFRelocEntry

type ELFRelocEntry struct {
	elf.Rel64

	Symbol *elf.Symbol
	Type   ELF_R_BPF
}

func (*ELFRelocEntry) AbsoluteOffset

func (e *ELFRelocEntry) AbsoluteOffset() (uint64, error)

type ELFRelocTable

type ELFRelocTable []ELFRelocEntry

type ELF_R_BPF

type ELF_R_BPF int

ELF_R_BPF describes the ELF relocation types for BPF. https://github.com/llvm/llvm-project/blob/74d9a76ad3f55c16982ceaa8b6b4a6b7744109b1/llvm/include/llvm/BinaryFormat/ELFRelocs/BPF.def

const (
	// R_BPF_NONE is an invalid relocation type
	R_BPF_NONE ELF_R_BPF = 0
	// R_BPF_64_64 indicates that 64 bits should be relocated
	R_BPF_64_64 ELF_R_BPF = 1
	// R_BPF_64_32 indicates that 32 bits should be relocated
	R_BPF_64_32 ELF_R_BPF = 10
)

type FrameLeaser

type FrameLeaser interface {
	ReadLease() (*XSKLease, error)
	WriteLease() (*XSKLease, error)
}

type FrameReader

type FrameReader interface {
	ReadFrame(p []byte) (n int, err error)
}

A FrameReader can read whole or partial ethernet frames. Every time ReadFrame is called, p will be filled with up to len(p) bytes from a single frame. These bytes include both the header and body of the ethernet frame. If p is too small to fit the whole frame, the remaining bytes of the frame are discarded. The next call to ReadFrame will start at the next frame.

n will be set to the number of bytes read from the frame. err is non-nil if any error has occurred during the process. If n is 0 and err is nil, nothing was read for an expected reason like a timeout or external interrupt.
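The contract is easiest to see with a toy implementation. `queueReader` below is an in-memory stand-in written for this example (it is not an XSK socket): each call consumes exactly one queued frame, truncating it if the buffer is too small.

```go
package main

import "fmt"

// queueReader hands out one queued frame per ReadFrame call.
type queueReader struct {
	frames [][]byte
}

// ReadFrame copies at most len(p) bytes of the next frame into p. Whatever
// does not fit is discarded, so the next call starts at the next frame.
func (q *queueReader) ReadFrame(p []byte) (n int, err error) {
	if len(q.frames) == 0 {
		// Nothing to read: n == 0 and err == nil.
		return 0, nil
	}
	frame := q.frames[0]
	q.frames = q.frames[1:]
	return copy(p, frame), nil
}

func main() {
	r := &queueReader{frames: [][]byte{
		{1, 2, 3, 4, 5, 6}, // larger than the buffer below
		{7, 8, 9},
	}}
	buf := make([]byte, 4)
	for i := 0; i < 3; i++ {
		n, err := r.ReadFrame(buf)
		fmt.Println(n, buf[:n], err)
	}
}
```

Callers that need whole frames should therefore size p for the largest frame they expect to receive.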

type FrameWriter

type FrameWriter interface {
	WriteFrame(p []byte) (n int, err error)
}

type MapIterator

type MapIterator interface {
	// Init should be called with a key and value pointer to variables which will be used on subsequent calls to
	// Next to set values. The key and value pointers must be compatible with the map.
	// The value of key should not be modified between the first call to Next and discarding of the iterator since
	// it is reused. Doing so may cause skipped entries, duplicate entries, or errors upon calling Next.
	Init(key, value interface{}) error
	// Next assigns the next value to the key and value last passed via the Init func.
	// True is returned if key and value were updated.
	// If updated is false and err is nil, all values from the iterator were read.
	// On error an iterator should also be considered empty and can be discarded.
	Next() (updated bool, err error)
	Next() (updated bool, err error)
}

A MapIterator describes an iterator which can iterate over all keys and values of a map without keeping all contents in userspace memory at the same time. Since maps can be constantly updated by an eBPF program the results are not guaranteed; expect to read duplicate values or not get all keys. This depends greatly on the frequency of change of the map, the type of map (arrays are not affected, hashes are) and the speed of iteration. It is recommended to quickly iterate over maps and not to change them during iteration to reduce these effects.

type ObjName

type ObjName struct {
	// contains filtered or unexported fields
}

func MustNewObjName

func MustNewObjName(initialName string) ObjName

func NewObjName

func NewObjName(initialName string) (*ObjName, error)

func (*ObjName) GetCstr

func (on *ObjName) GetCstr() [bpftypes.BPF_OBJ_NAME_LEN]byte

func (*ObjName) SetBytes

func (on *ObjName) SetBytes(strBytes []byte) error

func (*ObjName) SetString

func (on *ObjName) SetString(str string) error

func (*ObjName) String

func (on *ObjName) String() string

type ProgArrayMap

type ProgArrayMap struct {
	AbstractMap
}

ProgArrayMap is a specialized map type used for tail calls https://docs.cilium.io/en/stable/bpf/#tail-calls

func (*ProgArrayMap) Get

func (m *ProgArrayMap) Get(key int) (int, error)

Get performs a lookup in the program array map based on the key and returns the integer value stored for that entry

func (*ProgArrayMap) Set

func (m *ProgArrayMap) Set(key int32, value *BPFProgram) error

type SingleLookupIterator

type SingleLookupIterator struct {
	// The map over which to iterate
	BPFMap BPFMap
	// contains filtered or unexported fields
}

SingleLookupIterator uses the MapGetNextKey and MapLookupElem commands to iterate over a map. This is very widely supported but not the fastest option.

func (*SingleLookupIterator) Init

func (sli *SingleLookupIterator) Init(key, value interface{}) error

func (*SingleLookupIterator) Next

func (sli *SingleLookupIterator) Next() (updated bool, err error)

Next gets the key and value at the current location and writes them to the pointers given to the iterator during initialization. It then advances the internal pointer to the next key and value. If the iterator can't get the key and value at the current location since we are done iterating or an error was encountered 'updated' is false.

type TestXDPProgResult added in v0.3.0

type TestXDPProgResult struct {
	// The return value of the program
	ReturnValue int32
	// The average duration of a single run in nanoseconds
	Duration uint32
	// The modified data (as it would be received by the network stack)
	Data []byte
}

TestXDPProgResult is the result of XDPTestProgram

type TestXDPProgSettings added in v0.3.0

type TestXDPProgSettings struct {
	// How often should the test be repeated? For benchmarking purposes
	Repeat uint32
	// The input data, in this case the ethernet frame to check
	Data []byte
}

TestXDPProgSettings are the settings passed to XDPTestProgram

type XDPMode

type XDPMode int
const (
	// XDPModeHW indicates that the XDP program should be loaded in hardware mode.
	// This requires support from the NIC and driver but is the fastest mode available.
	XDPModeHW XDPMode = iota
	// XDPModeDRV indicates that the XDP program should be loaded in driver mode.
	// This requires driver support but is faster than SKB mode because it runs at the driver level.
	XDPModeDRV
	// XDPModeSKB indicates that the XDP program should be loaded in driver-independent mode.
	// This works for every network driver but is the slowest option, if other loading methods fail this is the fallback
	XDPModeSKB
)

type XSKIterator

type XSKIterator struct {
	// contains filtered or unexported fields
}

func (*XSKIterator) Init

func (xi *XSKIterator) Init(key, value interface{}) error

func (*XSKIterator) Next

func (xi *XSKIterator) Next() (updated bool, err error)

Next gets the key and value at the current location and writes them to the pointers given to the iterator during initialization. It then advances the internal pointer to the next key and value. If the iterator can't get the key and value at the current location since we are done iterating or an error was encountered 'updated' is false.

type XSKLease

type XSKLease struct {
	Data []byte
	// The amount of bytes which are prefixed at the start which don't contain frame data.
	// This headroom can be used to add an extra header(encapsulation) without having to
	// copy or move the existing packet data.
	Headroom int
	// contains filtered or unexported fields
}

XSKLease is used to "lease" a piece of buffer memory from the socket and return it after the user is done using it. This allows us to implement true zero copy packet access. After an XSKLease is released or written, the underlying array of Data will be repurposed. To avoid strange bugs, users must not use Data or sub-slices of Data after the lease has been released.

func (*XSKLease) Release

func (xl *XSKLease) Release() error

Release releases the leased memory so the kernel can fill it with new data.

func (*XSKLease) Write

func (xl *XSKLease) Write() error

Write writes a lease to the network interface. The length of the packet is the length of the 'Data' slice minus 'Headroom'. Make sure to resize Data to the size of the data to be transmitted. The headroom should always be included (never resize the start of the slice). 'Headroom' should be used to indicate at which byte the packet data starts. After Write has been called the lease is released, and the Data slice or its sub-slices should not be used anymore.
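
The length rule above (packet length = len(Data) - Headroom) and the "only resize the end" rule can be illustrated without a real socket. The sketch below uses a local lease type that mirrors only the two exported fields of XSKLease. The helpers packetLen, setPacketLen, and prepend are hypothetical, and prepend assumes that lowering Headroom moves the start of the packet forward into the headroom, which is an interpretation of the field comments rather than confirmed behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// lease mirrors only the two exported fields of XSKLease documented above.
type lease struct {
	Data     []byte
	Headroom int
}

// packetLen applies the documented rule: len(Data) minus Headroom.
func packetLen(l lease) int { return len(l.Data) - l.Headroom }

// setPacketLen resizes the end of Data so the packet is n bytes long.
// The start of the slice is never moved, so the headroom stays included.
func setPacketLen(l lease, n int) (lease, error) {
	if n < 0 || l.Headroom+n > cap(l.Data) {
		return l, errors.New("packet length does not fit in the frame")
	}
	l.Data = l.Data[:l.Headroom+n]
	return l, nil
}

// prepend writes hdr into the headroom directly in front of the packet.
// Assumption: shrinking Headroom is how the new packet start is signalled.
func prepend(l lease, hdr []byte) (lease, error) {
	if len(hdr) > l.Headroom {
		return l, errors.New("not enough headroom for header")
	}
	l.Headroom -= len(hdr)
	copy(l.Data[l.Headroom:], hdr)
	return l, nil
}

func main() {
	l := lease{Data: make([]byte, 16), Headroom: 4}
	fmt.Println(packetLen(l)) // 12

	l, _ = setPacketLen(l, 10)
	fmt.Println(len(l.Data), packetLen(l)) // 14 10

	l, _ = prepend(l, []byte{0xAA, 0xBB})
	fmt.Println(l.Headroom, packetLen(l)) // 2 12
}
```

Writing the extra header into the headroom avoids copying or moving the existing packet bytes, which is the purpose the Headroom field comment describes.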

type XSKMap

type XSKMap struct {
	AbstractMap
	// contains filtered or unexported fields
}

XSKMap is a specialized map type designed to work in conjunction with XSKSockets.

func (*XSKMap) Delete

func (m *XSKMap) Delete(key uint32) error

func (*XSKMap) Get

func (m *XSKMap) Get(key uint32) (*XSKSocket, error)

Get performs a lookup in the xskmap based on the key and returns the socket stored under that key.

func (*XSKMap) Iterator

func (m *XSKMap) Iterator() MapIterator

func (*XSKMap) Load

func (m *XSKMap) Load() error

func (*XSKMap) Set

func (m *XSKMap) Set(key uint32, value *XSKSocket) error

type XSKMultiSocket

type XSKMultiSocket struct {
	// contains filtered or unexported fields
}

XSKMultiSocket is a collection of XSKSockets. The multi socket balances reads and writes between all XSKSockets. This is useful for multi queue netdevices since a XSKSocket can only read or write from one rx/tx queue pair at a time. A multi socket allows you to bundle all of these sockets so you get a socket for the whole netdevice.

An alternative use for the multi socket is to add sockets from multiple netdevices.

TODO look into using epoll for multi sockets. Using poll for single sockets still makes sense since there is always 1 fd, but for multi sockets we can have many more. For high-end NICs with ~40 rx/tx queues (Mellanox for example) it makes sense to start using epoll since it is supposed to scale better. This should be made configurable when adding support, in case FreeBSD or another Unix-like OS adds XSK support, since epoll is non-POSIX.

TODO dynamic socket adding/removing. Should not be too hard, the main edge case to solve is dealing with pending/blocking syscalls for read/write. But presumably epoll can allow us to dynamically add/remove fds without interrupting the reads/writes. Otherwise adding/removing sockets will have to request both the rmu and wmu.

func NewXSKMultiSocket

func NewXSKMultiSocket(xskSockets ...*XSKSocket) (*XSKMultiSocket, error)

func (*XSKMultiSocket) Close

func (xms *XSKMultiSocket) Close() error

func (*XSKMultiSocket) ReadFrame

func (xms *XSKMultiSocket) ReadFrame(p []byte) (n int, err error)

func (*XSKMultiSocket) ReadLease

func (xms *XSKMultiSocket) ReadLease() (lease *XSKLease, err error)

ReadLease reads a frame from the socket and returns its memory in a XSKLease. After reading the contents of the frame it can be released or written, both will allow the memory to be reused. Calling Write on the lease will cause the contents of Data to be written back to the network interface. The contents of Data can be modified before calling Write, thus allowing a program to implement zero-copy/zero-allocation encapsulation or request/response protocols.

func (*XSKMultiSocket) SetReadTimeout

func (xms *XSKMultiSocket) SetReadTimeout(ms int) error

SetReadTimeout sets the timeout for ReadFrame and ReadLease calls. If ms == 0 (default), we will never block/wait and return no data if there isn't any ready. If ms == -1, we will block forever until we can read. If ms > 0, we will wait for x milliseconds for an opportunity to read or return no data.

func (*XSKMultiSocket) SetWriteTimeout

func (xms *XSKMultiSocket) SetWriteTimeout(ms int) error

SetWriteTimeout sets the timeout for WriteFrame and XSKLease.Write calls. If ms == 0 (default), we will never block/wait and error if we can't write at once. If ms == -1, we will block forever until we can write. If ms > 0, we will wait for x milliseconds for an opportunity to write or error afterwards.

func (*XSKMultiSocket) WriteFrame

func (xms *XSKMultiSocket) WriteFrame(p []byte) (n int, err error)

func (*XSKMultiSocket) WriteLease

func (xms *XSKMultiSocket) WriteLease() (lease *XSKLease, err error)

WriteLease creates a XSKLease which points to a piece of preallocated memory. This memory can be used to build packets for writing. Unlike XSKLeases obtained from ReadLease, write leases have no Headroom. The Data slice of the lease is the full length of the usable frame, and this length should not be exceeded. Any memory held by the lease can't be reused until released or written.

This function blocks until a frame for transmission is available and is not subject to the write timeout.

type XSKSettings

type XSKSettings struct {
	// Size of the umem frames/packet buffers (2048 or 4096)
	FrameSize int
	// Amount of frames/packets which can be used, must be a power of 2
	FrameCount int
	// The index of the network device on which XSK will be used
	NetDevIfIndex int
	// The id of the Queue on which this XSK will be used
	QueueID int
	// How much unused space should be left at the start of each buffer.
	// This can be used, for example, to encapsulate a packet without having to move or copy memory
	Headroom int
	// Is Tx disabled for this socket?
	DisableTx bool
	// Is Rx disabled for this socket?
	DisableRx bool
	// If true, XDP_USE_NEED_WAKEUP is not used. Should be on by default
	// unless there is a reason it doesn't work (like on older kernels)
	DisableNeedWakeup bool
	// If true, zero copy mode is forced. By default zero copy mode is attempted and, if not available
	// in the driver, the socket will automatically fall back to copy mode.
	ForceZeroCopy bool
	// If true, copy mode is always used and zero copy mode never attempted.
	ForceCopy bool
	// The minimum time between two checks of the completion queue. A lower value allows for more transmitted
	// packets per second at the cost of higher CPU usage, even when not transmitting.
	// By default this value is 10ms which seems a sane value, it means that there is a theoretical max TX rate of
	// (1000/10) * (tx ring size) which is 100 * 2048 = 204,800 packets per second when DisableRx = false
	// or 100 * 4096 = 409,600 when DisableRx = true at the default FrameCount of 4096.
	// Setting this value to 0 will cause one goroutine to busy poll (use 100% CPU) per socket.
	CQConsumeInterval *time.Duration
}

type XSKSocket

type XSKSocket struct {
	// contains filtered or unexported fields
}

A XSKSocket can bind to one queue on one netdev.

func NewXSKSocket

func NewXSKSocket(settings XSKSettings) (_ *XSKSocket, err error)

func (*XSKSocket) Close

func (xs *XSKSocket) Close() error

func (*XSKSocket) Fd

func (xs *XSKSocket) Fd() int

Fd returns the file descriptor of the socket.

func (*XSKSocket) ReadFrame

func (xs *XSKSocket) ReadFrame(p []byte) (n int, err error)

ReadFrame implements FrameReader. However, this has to be implemented with a memory copy, which is not ideal for efficiency. For zero copy packet access, ReadLease should be used.

func (*XSKSocket) ReadLease

func (xs *XSKSocket) ReadLease() (lease *XSKLease, err error)

ReadLease reads a frame from the socket and returns its memory in a XSKLease. After reading the contents of the frame it can be released or written, both will allow the memory to be reused. Calling Write on the lease will cause the contents of Data to be written back to the network interface. The contents of Data can be modified before calling Write, thus allowing a program to implement zero-copy/zero-allocation encapsulation or request/response protocols.

func (*XSKSocket) SetReadTimeout

func (xs *XSKSocket) SetReadTimeout(ms int) error

SetReadTimeout sets the timeout for ReadFrame and ReadLease calls. If ms == 0 (default), we will never block/wait and return no data if there isn't any ready. If ms == -1, we will block forever until we can read. If ms > 0, we will wait for x milliseconds for an opportunity to read or return no data.

func (*XSKSocket) SetWriteTimeout

func (xs *XSKSocket) SetWriteTimeout(ms int) error

SetWriteTimeout sets the timeout for WriteFrame and XSKLease.Write calls. If ms == 0 (default), we will never block/wait and error if we can't write at once. If ms == -1, we will block forever until we can write. If ms > 0, we will wait for x milliseconds for an opportunity to write or error afterwards.

func (*XSKSocket) WriteFrame

func (xs *XSKSocket) WriteFrame(p []byte) (n int, err error)

WriteFrame implements FrameWriter. The interface requires us to copy p into umem which is not optimal for speed. For maximum performance use WriteLease instead.

func (*XSKSocket) WriteLease

func (xs *XSKSocket) WriteLease() (lease *XSKLease, err error)

WriteLease creates a XSKLease which points to a piece of preallocated memory. This memory can be used to build packets for writing. Unlike XSKLeases obtained from ReadLease, write leases have no Headroom. The Data slice of the lease is the full length of the usable frame, and this length should not be exceeded. Any memory held by the lease can't be reused until released or written.

This function blocks until a frame for transmission is available and is not subject to the write timeout.

Directories

Path Synopsis
Package bpfsys contains low level functions related to syscalls and kernel interactions.
cmd
Package ebpf contains all types and constants to decode, encode, and generate eBPF bytecode in go.
Package kernelsupport is used to query which eBPF features are supported by different versions of the Linux kernel.
