ring0

package
v0.0.0-...-d22dedf Latest
Warning

This package is not in the latest version of its module.

Published: Jan 17, 2025 License: Apache-2.0, MIT Imports: 9 Imported by: 4

Documentation

Overview

Package ring0 provides basic operating system-level stubs.

Constants

const (
	// KernelFlagsSet should always be set in the kernel.
	KernelFlagsSet = _RFLAGS_RESERVED

	// UserFlagsSet are always set in userspace.
	//
	// _RFLAGS_IOPL is a set of two bits and it shows the I/O privilege
	// level. The Current Privilege Level (CPL) of the task must be less
	// than or equal to the IOPL in order for the task or program to access
	// I/O ports.
	//
	// Here, _RFLAGS_IOPL0 is used only to determine whether the task is
	// running in the kernel or userspace mode. In the user mode, the CPL is
	// always 3 and it doesn't matter what IOPL is set if it is below CPL.
	//
	// We need to have one bit which will be always different in user and
	// kernel modes. And we have to remember that even though we have
	// KernelFlagsClear, we still can see some of these flags in the kernel
	// mode. This can happen when the Go runtime switches to a goroutine
	// which has been saved in the host mode. On restore, the popf
	// instruction is used to restore flags and this means that all flags
	// that the goroutine had in the host mode will be restored in the
	// kernel mode.
	//
	// _RFLAGS_IOPL0 is never set in host and kernel modes and we always set
	// it in the user mode. So if this flag is set, the task is running in
	// the user mode and if it isn't set, the task is running in the kernel
	// mode.
	UserFlagsSet = _RFLAGS_RESERVED | _RFLAGS_IF | _RFLAGS_IOPL0

	// KernelFlagsClear should always be clear in the kernel.
	KernelFlagsClear = _RFLAGS_STEP | _RFLAGS_IF | _RFLAGS_IOPL | _RFLAGS_AC | _RFLAGS_NT

	// UserFlagsClear are always cleared in userspace.
	UserFlagsClear = _RFLAGS_NT | _RFLAGS_IOPL1
)
const (
	SegmentDescriptorAccess     SegmentDescriptorFlags = 1 << 8  // Access bit (always set).
	SegmentDescriptorWrite                             = 1 << 9  // Write permission.
	SegmentDescriptorExpandDown                        = 1 << 10 // Grows down, not used.
	SegmentDescriptorExecute                           = 1 << 11 // Execute permission.
	SegmentDescriptorSystem                            = 1 << 12 // Zero => system, 1 => user code/data.
	SegmentDescriptorPresent                           = 1 << 15 // Present.
	SegmentDescriptorAVL                               = 1 << 20 // Available.
	SegmentDescriptorLong                              = 1 << 21 // Long mode.
	SegmentDescriptorDB                                = 1 << 22 // 16 or 32-bit.
	SegmentDescriptorG                                 = 1 << 23 // Granularity: page or byte.
)

SegmentDescriptorFlag declarations.
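
As a rough illustration of how these bit flags compose, the sketch below mirrors a few of the constants under local, lowercase names and ORs them together. The particular combination chosen for a 64-bit code segment is an assumption made for the example, not necessarily the exact set ring0 uses for its standard segments.

```go
package main

import "fmt"

// segmentDescriptorFlags is a local stand-in for ring0.SegmentDescriptorFlags.
type segmentDescriptorFlags uint32

// Bit positions copied from the declarations above; names are local to this sketch.
const (
	flagAccess  segmentDescriptorFlags = 1 << 8
	flagWrite   segmentDescriptorFlags = 1 << 9
	flagExecute segmentDescriptorFlags = 1 << 11
	flagSystem  segmentDescriptorFlags = 1 << 12
	flagPresent segmentDescriptorFlags = 1 << 15
	flagLong    segmentDescriptorFlags = 1 << 21
	flagG       segmentDescriptorFlags = 1 << 23
)

// has reports whether every bit in mask is set in f.
func (f segmentDescriptorFlags) has(mask segmentDescriptorFlags) bool {
	return f&mask == mask
}

// code64Flags returns an illustrative (assumed) flag set for a 64-bit code segment.
func code64Flags() segmentDescriptorFlags {
	return flagAccess | flagWrite | flagExecute | flagSystem |
		flagPresent | flagLong | flagG
}

func main() {
	f := code64Flags()
	fmt.Printf("flags=%#x long=%t execute=%t\n",
		uint32(f), f.has(flagLong), f.has(flagExecute))
}
```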

Variables

var (
	// VirtualAddressBits is the number of bits available in the virtual
	// address space.
	//
	// Initialized by ring0.Init.
	VirtualAddressBits uintptr

	// PhysicalAddressBits is the number of bits available in the physical
	// address space.
	//
	// Initialized by ring0.Init.
	PhysicalAddressBits uintptr

	// UserspaceSize is the total size of userspace.
	//
	// Initialized by ring0.Init.
	UserspaceSize uintptr

	// MaximumUserAddress is the largest possible user address.
	//
	// Initialized by ring0.Init.
	MaximumUserAddress uintptr

	// KernelStartAddress is the starting kernel address.
	//
	// Initialized by ring0.Init.
	KernelStartAddress uintptr
)

Functions

func AddrOfStart

func AddrOfStart() uintptr

AddrOfStart returns the address of the CPU entrypoint.

The following start conditions must be satisfied:

  • AX should contain the CPU pointer.
  • c.GDT() should be loaded as the GDT.
  • c.IDT() should be loaded as the IDT.
  • c.CR0() should be the current CR0 value.
  • c.CR3() should be set to the kernel PageTables.
  • c.CR4() should be the current CR4 value.
  • c.EFER() should be the current EFER value.

The CPU state will be set to c.Registers().

In Go 1.17+, Go references to assembly functions resolve to an ABIInternal wrapper function rather than the function itself. We must reference from assembly to get the ABI0 (i.e., primary) address.

func Halt

func Halt()

Halt halts execution.

func HaltAndWriteFSBase

func HaltAndWriteFSBase(regs *arch.Registers)

HaltAndWriteFSBase halts execution. On resume, it sets FS_BASE from the value in regs.

func Init

func Init(fs cpuid.FeatureSet)

Init sets function pointers based on architectural features.

This must be called prior to using ring0. It may be called with the auto-detected feature set using InitDefault. It may also be called at another time with a different FeatureSet.

func InitDefault

func InitDefault()

InitDefault initializes ring0 with the auto-detected host feature set.

func IsCanonical

func IsCanonical(addr uint64) bool

IsCanonical indicates whether addr is canonical per the amd64 spec.
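
For readers unfamiliar with canonical addresses, here is a minimal sketch of the check, assuming 48 implemented virtual address bits (so bits 63 through 47 must all be equal). The function name isCanonical48 is hypothetical and the body is not taken from ring0.

```go
package main

import "fmt"

// isCanonical48 reports whether addr is canonical under the assumption of
// 48 implemented virtual address bits: the address must lie either in the
// lower half (sign bit 47 clear) or the upper half (bits 63..47 all set).
func isCanonical48(addr uint64) bool {
	return addr <= 0x00007fffffffffff || addr >= 0xffff800000000000
}

func main() {
	for _, a := range []uint64{
		0x0000000000001000, // low user address
		0x00007fffffffffff, // top of the lower half
		0x0000800000000000, // first non-canonical address
		0xffff800000000000, // bottom of the upper half
	} {
		fmt.Printf("%#x canonical=%t\n", a, isCanonical48(a))
	}
}
```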

func IsKernelFlags

func IsKernelFlags(rflags uint64) bool

IsKernelFlags returns true if rflags corresponds to the kernel mode.
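
The constants section explains that the low IOPL bit is always set in user mode and never set in host or kernel mode. A sketch of a check built on that rule follows. The bit positions are the architectural x86-64 RFLAGS positions; the names and both helper functions are hypothetical, and whether ring0 implements IsKernelFlags exactly this way is an assumption.

```go
package main

import "fmt"

// Architectural RFLAGS bits, under names local to this sketch.
const (
	rflagsReserved uint64 = 1 << 1  // always-one reserved bit
	rflagsIF       uint64 = 1 << 9  // interrupt enable
	rflagsIOPL0    uint64 = 1 << 12 // low bit of the I/O privilege level
	rflagsIOPL1    uint64 = 1 << 13 // high bit of the I/O privilege level
	rflagsNT       uint64 = 1 << 14 // nested task
)

// Mirrors of UserFlagsSet and UserFlagsClear as documented above.
const (
	userFlagsSet   = rflagsReserved | rflagsIF | rflagsIOPL0
	userFlagsClear = rflagsNT | rflagsIOPL1
)

// sanitizeUserFlags forces the always-set bits on and the always-clear bits off.
func sanitizeUserFlags(rflags uint64) uint64 {
	return (rflags | userFlagsSet) &^ userFlagsClear
}

// looksLikeKernelFlags applies the documented rule: IOPL0 clear means
// kernel (or host) mode, IOPL0 set means user mode.
func looksLikeKernelFlags(rflags uint64) bool {
	return rflags&rflagsIOPL0 == 0
}

func main() {
	user := sanitizeUserFlags(0)
	fmt.Printf("user=%#x kernel?=%t\n", user, looksLikeKernelFlags(user))
	fmt.Printf("kernel?=%t\n", looksLikeKernelFlags(rflagsReserved))
}
```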

func ReadCR2

func ReadCR2() uintptr

ReadCR2 reads the current CR2 value.

func SetCPUIDFaulting

func SetCPUIDFaulting(on bool) bool

SetCPUIDFaulting sets CPUID faulting per the boolean value.

True is returned if faulting could be set.

Types

type CPU

type CPU struct {

	// CPUArchState is architecture-specific state.
	CPUArchState
	// contains filtered or unexported fields
}

CPU is the per-CPU struct.

func (*CPU) CR0

func (c *CPU) CR0() uint64

CR0 returns the CPU's CR0 value.

func (*CPU) CR4

func (c *CPU) CR4() uint64

CR4 returns the CPU's CR4 value.

func (*CPU) ClearErrorCode

func (c *CPU) ClearErrorCode()

ClearErrorCode resets the error code.

func (*CPU) EFER

func (c *CPU) EFER() uint64

EFER returns the CPU's EFER value.

func (*CPU) ErrorCode

func (c *CPU) ErrorCode() (value uintptr, user bool)

ErrorCode returns the last error code.

The returned boolean indicates whether the error code corresponds to the last user error or not. If it does not, then fault information must be ignored. This is generally the result of a kernel fault while servicing a user fault.
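
A caller would typically check the boolean before interpreting the value. The sketch below shows that pattern for a page fault, using the architectural x86 page-fault error code bits. decodePageFault and the pageFault type are hypothetical helpers written for this example and are not part of ring0.

```go
package main

import "fmt"

// Architectural x86 page-fault error code bits.
const (
	pfPresent     uintptr = 1 << 0 // fault on a present page (protection violation)
	pfWrite       uintptr = 1 << 1 // the access was a write
	pfUser        uintptr = 1 << 2 // the access originated in user mode
	pfInstruction uintptr = 1 << 4 // the access was an instruction fetch
)

// pageFault is a decoded view of a page-fault error code.
type pageFault struct {
	Present, Write, User, Instruction bool
}

// decodePageFault takes the two results of an ErrorCode-style call. When
// user is false the code does not describe the last user fault, so the
// zero value and false are returned and the code is ignored.
func decodePageFault(code uintptr, user bool) (pageFault, bool) {
	if !user {
		return pageFault{}, false
	}
	return pageFault{
		Present:     code&pfPresent != 0,
		Write:       code&pfWrite != 0,
		User:        code&pfUser != 0,
		Instruction: code&pfInstruction != 0,
	}, true
}

func main() {
	pf, ok := decodePageFault(0x6, true)
	fmt.Printf("%+v ok=%t\n", pf, ok)
}
```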

func (*CPU) FaultAddr

func (c *CPU) FaultAddr() uintptr

FaultAddr returns the last fault address.

func (*CPU) FloatingPointState

func (c *CPU) FloatingPointState() *fpu.State

FloatingPointState returns the kernel floating point state.

This is explicitly safe to call during KernelException and KernelSyscall.

func (*CPU) GDT

func (c *CPU) GDT() (uint64, uint16)

GDT returns the CPU's GDT base and limit.

func (*CPU) IDT

func (c *CPU) IDT() (uint64, uint16)

IDT returns the CPU's IDT base and limit.

func (*CPU) Init

func (c *CPU) Init(k *Kernel, cpuID int, hooks Hooks)

Init initializes a new CPU.

Init allows embedding in other objects.

func (*CPU) Registers

func (c *CPU) Registers() *arch.Registers

Registers returns a modifiable copy of the kernel registers.

This is explicitly safe to call during KernelException and KernelSyscall.

func (*CPU) StackTop

func (c *CPU) StackTop() uint64

StackTop returns the kernel's stack address.

func (*CPU) SwitchToUser

func (c *CPU) SwitchToUser(switchOpts SwitchOpts) (vector Vector)

SwitchToUser performs either a sysret or an iret.

The return value is the vector that interrupted execution.

This function will not split the stack. Callers will probably want to call runtime.entersyscall (and pair with a call to runtime.exitsyscall) prior to calling this function.

When this is done, this region is quite sensitive to things like system calls. After calling entersyscall, any memory used must have been allocated and no function calls without go:nosplit are permitted. Any calls made here are protected appropriately (e.g. IsCanonical and CR3).

Also note that this function transitively depends on the compiler generating code that uses IP-relative addressing instead of absolute addresses. That's the case for amd64, but may not be the case for other architectures.

Precondition: the Rip, Rsp, Fs and Gs registers must be canonical.

+checkescape:all

func (*CPU) TSS

func (c *CPU) TSS() (uint64, uint16, *SegmentDescriptor)

TSS returns the CPU's TSS base, limit and value.

func (*CPU) Vector

func (c *CPU) Vector() uintptr

Vector returns the vector of the last exception.

type CPUArchState

type CPUArchState struct {
	// contains filtered or unexported fields
}

CPUArchState contains CPU-specific arch state.

type Gate64

type Gate64 struct {
	// contains filtered or unexported fields
}

Gate64 is a 64-bit task, trap, or interrupt gate.

type Hooks

type Hooks interface {
	// KernelSyscall is called for kernel system calls.
	//
	// Return from this call will restore registers and return to the kernel: the
	// registers must be modified directly.
	//
	// If this function is not provided, a kernel system call results in halt.
	//
	// This must be go:nosplit, as this will be on the interrupt stack.
	// Closures are permitted, as the pointer to the closure frame is not
	// passed on the stack.
	KernelSyscall()

	// KernelException handles an exception during kernel execution.
	//
	// Return from this call will restore registers and return to the kernel: the
	// registers must be modified directly.
	//
	// If this function is not provided, a kernel exception results in halt.
	//
	// This must be go:nosplit, as this will be on the interrupt stack.
	// Closures are permitted, as the pointer to the closure frame is not
	// passed on the stack.
	KernelException(Vector)
}

Hooks are hooks for kernel functions.
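
To make the shape of an implementation concrete, here is a minimal sketch with both methods marked go:nosplit, as the interface comments require. The interface and the Vector type are redeclared locally under lowercase names so the example is self-contained; recordingHooks is a hypothetical type, and a real implementation would be passed to (*CPU).Init.

```go
package main

import "fmt"

// vector and hooks are local stand-ins for ring0.Vector and ring0.Hooks.
type vector uintptr

type hooks interface {
	KernelSyscall()
	KernelException(vector)
}

// recordingHooks only counts events. It touches no heap memory and makes
// no calls, which keeps it compatible with running on an interrupt stack.
type recordingHooks struct {
	syscalls   int
	exceptions int
	lastVector vector
}

//go:nosplit
func (h *recordingHooks) KernelSyscall() {
	h.syscalls++
}

//go:nosplit
func (h *recordingHooks) KernelException(v vector) {
	h.exceptions++
	h.lastVector = v
}

// Compile-time check that recordingHooks satisfies the interface.
var _ hooks = (*recordingHooks)(nil)

func main() {
	h := &recordingHooks{}
	var k hooks = h
	k.KernelSyscall()
	k.KernelException(14) // 14 is the page-fault vector
	fmt.Println(h.syscalls, h.exceptions, h.lastVector)
}
```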

type Kernel

type Kernel struct {
	// PageTables are the kernel pagetables; this must be provided.
	PageTables *pagetables.PageTables

	KernelArchState
}

Kernel is a global kernel object.

This contains global state, shared by multiple CPUs.

func (*Kernel) EntryRegions

func (k *Kernel) EntryRegions() map[uintptr]uintptr

EntryRegions returns the set of kernel entry regions (must be mapped).

func (*Kernel) Init

func (k *Kernel) Init(maxCPUs int)

Init initializes a new kernel.

type KernelArchState

type KernelArchState struct {
	// contains filtered or unexported fields
}

KernelArchState contains architecture-specific state.

type SegmentDescriptor

type SegmentDescriptor struct {
	// contains filtered or unexported fields
}

SegmentDescriptor is a segment descriptor.

var (
	UserCodeSegment32 SegmentDescriptor
	UserDataSegment   SegmentDescriptor
	UserCodeSegment64 SegmentDescriptor
	KernelCodeSegment SegmentDescriptor
	KernelDataSegment SegmentDescriptor
)

Standard segments.

func (*SegmentDescriptor) Base

func (d *SegmentDescriptor) Base() uint32

Base returns the descriptor's base linear address.

func (*SegmentDescriptor) DPL

func (d *SegmentDescriptor) DPL() int

DPL returns the descriptor privilege level.

func (*SegmentDescriptor) Flags

func (d *SegmentDescriptor) Flags() SegmentDescriptorFlags

Flags returns descriptor flags.

func (*SegmentDescriptor) Limit

func (d *SegmentDescriptor) Limit() uint32

Limit returns the descriptor size.
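
The accessors above read fields that the x86 descriptor format scatters across two 32-bit words. The sketch below decodes base, limit and DPL from that layout. The descriptor type and its field names are hypothetical, and scaling the limit by the granularity bit is an assumption about what "descriptor size" means here.

```go
package main

import "fmt"

// descriptor holds the low and high 32-bit words of an x86 segment
// descriptor. It is a local stand-in for ring0.SegmentDescriptor.
type descriptor struct{ lo, hi uint32 }

// granularityBit matches SegmentDescriptorG (1 << 23) in the high word.
const granularityBit = 1 << 23

// base reassembles the 32-bit base from bits 16..31 of the low word and
// bits 0..7 and 24..31 of the high word.
func (d descriptor) base() uint32 {
	return d.lo>>16 | (d.hi&0xff)<<16 | d.hi&0xff000000
}

// limit reassembles the 20-bit limit and, when the granularity bit is
// set, scales it from 4 KiB pages to bytes.
func (d descriptor) limit() uint32 {
	l := d.lo&0xffff | d.hi&0xf0000
	if d.hi&granularityBit != 0 {
		l = l<<12 | 0xfff
	}
	return l
}

// dpl extracts the descriptor privilege level from bits 13..14 of the high word.
func (d descriptor) dpl() int {
	return int(d.hi>>13) & 3
}

func main() {
	// A flat, page-granular, ring 3 data segment.
	flat := descriptor{lo: 0x0000ffff, hi: 0x00cff300}
	fmt.Printf("base=%#x limit=%#x dpl=%d\n", flat.base(), flat.limit(), flat.dpl())
}
```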

type SegmentDescriptorFlags

type SegmentDescriptorFlags uint32

SegmentDescriptorFlags are typed flags within a descriptor.

type Selector

type Selector uint16

Selector is a segment Selector.

const (
	Kcode   Selector = segKcode << 3
	Kdata   Selector = segKdata << 3
	Ucode32 Selector = (segUcode32 << 3) | 3
	Udata   Selector = (segUdata << 3) | 3
	Ucode64 Selector = (segUcode64 << 3) | 3
	Tss     Selector = segTss << 3
)

Selectors.
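
Each selector is a GDT index shifted left by three, with the requested privilege level (RPL) in the low two bits; user selectors carry RPL 3. The sketch below shows the encoding and decoding. The index values are assumptions chosen for illustration, since the seg* constants are unexported, and the index and rpl methods are hypothetical.

```go
package main

import "fmt"

// selector is a local stand-in for ring0.Selector.
type selector uint16

// Assumed GDT indices, for illustration only.
const (
	idxKcode   = 1
	idxUcode64 = 5
)

const (
	kcode   selector = idxKcode << 3         // RPL 0
	ucode64 selector = (idxUcode64 << 3) | 3 // RPL 3
)

// index returns the descriptor table index (bits 3..15).
func (s selector) index() int { return int(s >> 3) }

// rpl returns the requested privilege level (bits 0..1).
func (s selector) rpl() int { return int(s & 3) }

func main() {
	for _, s := range []selector{kcode, ucode64} {
		fmt.Printf("selector=%#x index=%d rpl=%d\n", uint16(s), s.index(), s.rpl())
	}
}
```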

type SwitchArchOpts

type SwitchArchOpts struct {
	// UserPCID is the application PCID to be used on switch,
	// assuming that PCIDs are supported.
	//
	// Per pagetables_x86.go, a zero PCID implies a flush.
	UserPCID uint16

	// KernelPCID is the kernel PCID to be used on return,
	// assuming that PCIDs are supported.
	//
	// Per pagetables_x86.go, a zero PCID implies a flush.
	KernelPCID uint16
}

SwitchArchOpts are embedded in SwitchOpts.

type SwitchOpts

type SwitchOpts struct {
	// Registers are the user register state.
	Registers *arch.Registers

	// FloatingPointState is a byte pointer where floating point state is
	// saved and restored.
	FloatingPointState *fpu.State

	// PageTables are the application page tables.
	PageTables *pagetables.PageTables

	// Flush indicates that a TLB flush should be forced on switch.
	Flush bool

	// FullRestore indicates that an iret-based restore should be used.
	FullRestore bool

	// SwitchArchOpts are architecture-specific options.
	SwitchArchOpts
}

SwitchOpts are passed to SwitchToUser.

type TaskState64

type TaskState64 struct {
	// contains filtered or unexported fields
}

TaskState64 is a 64-bit task state structure.

type Vector

type Vector uintptr

Vector is an exception vector.

const (
	DivideByZero Vector = iota
	Debug
	NMI
	Breakpoint
	Overflow
	BoundRangeExceeded
	InvalidOpcode
	DeviceNotAvailable
	DoubleFault
	CoprocessorSegmentOverrun
	InvalidTSS
	SegmentNotPresent
	StackSegmentFault
	GeneralProtectionFault
	PageFault
	_ // vector 15 is reserved
	X87FloatingPointException
	AlignmentCheck
	MachineCheck
	SIMDFloatingPointException
	VirtualizationException
	SecurityException = 0x1e
	SyscallInt80      = 0x80
)

Exception vectors.
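
The iota-based numbering is meant to line up with the architectural x86 vector numbers. A sketch of the same enumeration under local names is shown below, assuming a blank identifier occupies the reserved vector 15 so that the x87 floating-point exception lands on 16.

```go
package main

import "fmt"

// vector is a local stand-in for ring0.Vector.
type vector uintptr

const (
	divideByZero vector = iota
	debug
	nmi
	breakpoint
	overflow
	boundRangeExceeded
	invalidOpcode
	deviceNotAvailable
	doubleFault
	coprocessorSegmentOverrun
	invalidTSS
	segmentNotPresent
	stackSegmentFault
	generalProtectionFault
	pageFault
	_ // vector 15 is reserved by the architecture
	x87FloatingPointException
	alignmentCheck
	machineCheck
	simdFloatingPointException
	virtualizationException
	securityException vector = 0x1e
	syscallInt80      vector = 0x80
)

func main() {
	fmt.Println(pageFault, x87FloatingPointException, virtualizationException)
}
```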

const (
	Syscall Vector = _NR_INTERRUPTS
)

System call vectors.

Directories

Path	Synopsis
pagetables	Package pagetables provides a generic implementation of pagetables.
