pgalloc

package
Version: v0.0.0-...-e27d27a
Warning

This package is not in the latest version of its module.

Published: Nov 27, 2024 | License: Apache-2.0, MIT | Imports: 31 | Imported by: 39

Documentation

Overview

Package pgalloc contains the page allocator subsystem, which provides allocatable memory that may be mapped into application address spaces.

Constants

const (
	// CtxMemoryFile is a Context.Value key for a MemoryFile.
	CtxMemoryFile contextID = iota

	// CtxMemoryCgroupID is the memory cgroup id which the task belongs to.
	CtxMemoryCgroupID

	// CtxMemoryFileMap is a Context.Value key for mapping
	// MemoryFileOpts.RestoreID to *MemoryFile. This is used for save/restore.
	CtxMemoryFileMap
)

Variables

This section is empty.

Functions

func IMAWorkAroundForMemFile

func IMAWorkAroundForMemFile(fd uintptr)

IMAWorkAroundForMemFile works around IMA by immediately creating a temporary PROT_EXEC mapping, while the backing file is still small. IMA will ignore any future mappings.

The Linux kernel contains an optional feature called "Integrity Measurement Architecture" (IMA). If IMA is enabled, it will checksum binaries the first time they are mapped PROT_EXEC. This is bad news for executable pages mapped from our backing file, which can grow to terabytes in (sparse) size. If IMA attempts to checksum a file that large, it will allocate all of the sparse pages and quickly exhaust all memory.

func MemoryCgroupIDFromContext

func MemoryCgroupIDFromContext(ctx context.Context) uint32

MemoryCgroupIDFromContext returns the memory cgroup id of the ctx, or zero if the ctx does not belong to any memory cgroup.
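The Ctx* constants follow the usual Go pattern of keying Context.Value with an unexported type (contextID). A minimal sketch of how a lookup in the style of MemoryCgroupIDFromContext can fall back to zero when no value is stored is shown here; memoryCgroupIDFromContext, withMemoryCgroupID, and exampleLookups are hypothetical names, and this is not presented as the package's actual implementation.

```go
package main

import (
	"context"
	"fmt"
)

// contextID is an unexported key type, as in pgalloc. Because no other
// package can construct a contextID, these keys cannot collide with keys
// defined elsewhere.
type contextID int

// ctxMemoryCgroupID is a hypothetical stand-in for pgalloc.CtxMemoryCgroupID.
const ctxMemoryCgroupID contextID = iota

// memoryCgroupIDFromContext returns the memory cgroup ID carried by ctx, or
// zero if ctx carries none.
func memoryCgroupIDFromContext(ctx context.Context) uint32 {
	if v := ctx.Value(ctxMemoryCgroupID); v != nil {
		return v.(uint32)
	}
	return 0
}

// withMemoryCgroupID returns a child of parent that carries id.
func withMemoryCgroupID(parent context.Context, id uint32) context.Context {
	return context.WithValue(parent, ctxMemoryCgroupID, id)
}

// exampleLookups returns the ID observed without and with a stored value.
func exampleLookups() (without, with uint32) {
	ctx := context.Background()
	return memoryCgroupIDFromContext(ctx),
		memoryCgroupIDFromContext(withMemoryCgroupID(ctx, 7))
}

func main() {
	without, with := exampleLookups()
	fmt.Println(without, with) // prints "0 7"
}
```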

func MemoryFileMapFromContext

func MemoryFileMapFromContext(ctx context.Context) map[string]*MemoryFile

MemoryFileMapFromContext returns the memory file map used by ctx, or nil if no such map exists.

Types

type AllocOpts

type AllocOpts struct {
	// Kind is the allocation's memory accounting type.
	Kind usage.MemoryKind

	// MemCgID is the memory cgroup ID and the zero value indicates that
	// the memory will not be accounted to any cgroup.
	MemCgID uint32

	// Mode controls the commitment status of returned pages.
	Mode AllocationMode

	// If Huge is true, the allocation should be hugepage-backed if possible.
	Huge bool

	// Dir indicates the direction in which offsets are allocated.
	Dir Direction

	// If ReaderFunc is provided, the allocated memory is filled by calling it
	// repeatedly until either length bytes are read or a non-nil error is
	// returned. Allocate then returns the allocated memory, truncated down to
	// the nearest page boundary. If fewer than length bytes were filled
	// because ReaderFunc returned an error, Allocate returns the partially
	// filled range together with that error.
	ReaderFunc safemem.ReaderFunc
}

AllocOpts are options used in MemoryFile.Allocate.
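A minimal sketch of the ReaderFunc fill semantics described in AllocOpts, assuming 4 KiB pages and simplifying safemem.ReaderFunc to a function over a plain []byte (the real type operates on a safemem.BlockSeq). fillFromReader, readerFunc, and sourceReader are hypothetical, and the io.ErrNoProgress guard against a reader that never advances is an addition of this sketch, not documented pgalloc behavior.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

const pageSize = 4096 // assumption: 4 KiB pages (must be a power of two)

// readerFunc is a simplified, hypothetical stand-in for safemem.ReaderFunc.
type readerFunc func(dst []byte) (int, error)

// fillFromReader calls read repeatedly until buf is full or read returns a
// non-nil error. It returns the number of filled bytes rounded down to a
// page boundary, plus any error returned by read.
func fillFromReader(buf []byte, read readerFunc) (int, error) {
	done := 0
	for done < len(buf) {
		n, err := read(buf[done:])
		done += n
		if err != nil {
			return done &^ (pageSize - 1), err
		}
		if n == 0 {
			// Sketch-only guard: avoid spinning on a reader that never advances.
			return done &^ (pageSize - 1), io.ErrNoProgress
		}
	}
	return done &^ (pageSize - 1), nil
}

var errShort = errors.New("source exhausted")

// sourceReader returns a hypothetical reader that yields n bytes of 0xAB in
// chunks of at most 1000 bytes, then fails with errShort.
func sourceReader(n int) readerFunc {
	remaining := n
	return func(dst []byte) (int, error) {
		if remaining == 0 {
			return 0, errShort
		}
		k := len(dst)
		if k > remaining {
			k = remaining
		}
		if k > 1000 {
			k = 1000 // small chunks exercise the retry loop
		}
		for i := 0; i < k; i++ {
			dst[i] = 0xAB
		}
		remaining -= k
		return k, nil
	}
}

func main() {
	n, err := fillFromReader(make([]byte, 2*pageSize), sourceReader(5000))
	fmt.Println(n, err) // prints "4096 source exhausted"
}
```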

type AllocationMode

type AllocationMode int

AllocationMode is the type of AllocOpts.Mode.

const (
	// AllocateUncommitted indicates that MemoryFile.Allocate() must return
	// uncommitted pages.
	AllocateUncommitted AllocationMode = iota

	// AllocateCallerIndirectCommit indicates that the caller of
	// MemoryFile.Allocate() intends to commit all allocated pages, without
	// using our page tables. Thus, Allocate() may return committed or
	// uncommitted pages.
	AllocateCallerIndirectCommit

	// AllocateAndCommit indicates that MemoryFile.Allocate() must return
	// committed pages.
	AllocateAndCommit

	// AllocateAndWritePopulate indicates that the caller of
	// MemoryFile.Allocate() intends to commit all allocated pages, using our
	// page tables. Thus, Allocate() may return committed or uncommitted pages,
	// and should pre-populate page table entries permitting writing for
	// mappings of those pages returned by MapInternal().
	AllocateAndWritePopulate
)

type DelayedEvictionType

type DelayedEvictionType uint8

DelayedEvictionType is the type of MemoryFileOpts.DelayedEviction.

const (
	// DelayedEvictionDefault has unspecified behavior.
	DelayedEvictionDefault DelayedEvictionType = iota

	// DelayedEvictionDisabled requires that evictable allocations are evicted
	// as soon as possible.
	DelayedEvictionDisabled

	// DelayedEvictionEnabled requests that the MemoryFile delay eviction of
	// evictable allocations until doing so is considered necessary to avoid
	// performance degradation due to host memory pressure, or OOM kills.
	//
	// As of this writing, the behavior of DelayedEvictionEnabled depends on
	// whether or not MemoryFileOpts.UseHostMemcgPressure is enabled:
	//
	//	- If UseHostMemcgPressure is true, evictions are delayed until memory
	//	pressure is indicated.
	//
	//	- Otherwise, evictions are only delayed until the releaser goroutine is
	//	out of work (pages to release).
	DelayedEvictionEnabled

	// DelayedEvictionManual requires that evictable allocations are only
	// evicted when MemoryFile.StartEvictions() is called. This is extremely
	// dangerous outside of tests.
	DelayedEvictionManual
)
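The policy described by these constants can be summarized as a decision function. This is a hedged sketch: evictionSignals and shouldEvictNow are hypothetical, the boolean inputs stand in for state that pgalloc tracks in its own way, and DelayedEvictionDefault is deliberately reported as unmodeled because its behavior is unspecified. Treating an explicit StartEvictions() call as a trigger under DelayedEvictionEnabled is an assumption drawn from the StartEvictions documentation.

```go
package main

import "fmt"

// DelayedEvictionType mirrors the constants documented above.
type DelayedEvictionType uint8

const (
	DelayedEvictionDefault DelayedEvictionType = iota
	DelayedEvictionDisabled
	DelayedEvictionEnabled
	DelayedEvictionManual
)

// evictionSignals is a hypothetical snapshot of the conditions that the
// constants' documentation refers to.
type evictionSignals struct {
	useHostMemcgPressure bool // MemoryFileOpts.UseHostMemcgPressure
	memoryPressure       bool // host memcg has signaled pressure
	releaserIdle         bool // releaser goroutine is out of pages to release
	startEvictions       bool // MemoryFile.StartEvictions() has been called
}

// shouldEvictNow reports whether evictable allocations should be evicted
// now. ok is false for modes this sketch does not model, including
// DelayedEvictionDefault.
func shouldEvictNow(mode DelayedEvictionType, s evictionSignals) (evict, ok bool) {
	switch mode {
	case DelayedEvictionDisabled:
		return true, true // evict as soon as possible
	case DelayedEvictionEnabled:
		if s.startEvictions {
			return true, true // assumption: explicit request always wins
		}
		if s.useHostMemcgPressure {
			return s.memoryPressure, true
		}
		return s.releaserIdle, true
	case DelayedEvictionManual:
		return s.startEvictions, true // only on explicit request
	default:
		return false, false
	}
}

func main() {
	evict, ok := shouldEvictNow(DelayedEvictionEnabled, evictionSignals{releaserIdle: true})
	fmt.Println(evict, ok) // prints "true true"
}
```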

type Direction

type Direction uint8

Direction is the type of AllocOpts.Dir.

const (
	// BottomUp allocates offsets in increasing order.
	BottomUp Direction = iota
	// TopDown allocates offsets in decreasing order.
	TopDown
)

func (Direction) String

func (d Direction) String() string

String implements fmt.Stringer.
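To illustrate what the two directions mean, the following sketch picks where an allocation would start inside a single free range [start, end). pickOffset is hypothetical (pgalloc's real allocator searches its own set of free ranges), and the names returned by String here are an assumption, not necessarily the strings the package uses.

```go
package main

import "fmt"

// Direction mirrors pgalloc's Direction type.
type Direction uint8

const (
	BottomUp Direction = iota
	TopDown
)

// String implements fmt.Stringer. The returned names are assumptions.
func (d Direction) String() string {
	switch d {
	case BottomUp:
		return "BottomUp"
	case TopDown:
		return "TopDown"
	default:
		return fmt.Sprintf("Direction(%d)", uint8(d))
	}
}

// pickOffset returns the start offset of a length-byte allocation carved
// from the free range [start, end), or ok == false if it does not fit.
// BottomUp takes the lowest offsets; TopDown takes the highest.
func pickOffset(start, end, length uint64, dir Direction) (off uint64, ok bool) {
	if end < start || end-start < length {
		return 0, false
	}
	if dir == TopDown {
		return end - length, true
	}
	return start, true
}

func main() {
	for _, d := range []Direction{BottomUp, TopDown} {
		off, _ := pickOffset(0x1000, 0x5000, 0x2000, d)
		fmt.Printf("%v: %#x\n", d, off)
	}
}
```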

type EvictableMemoryUser

type EvictableMemoryUser interface {
	// Evict requests that the EvictableMemoryUser deallocate memory used by
	// er, which was registered as evictable by a previous call to
	// MemoryFile.MarkEvictable.
	//
	// Evict is not required to deallocate memory. In particular, since pgalloc
	// must call Evict without holding locks to avoid circular lock ordering,
	// it is possible that the passed range has already been marked as
	// unevictable by a racing call to MemoryFile.MarkUnevictable.
	// Implementations of EvictableMemoryUser must detect such races and handle
	// them by making Evict have no effect on unevictable ranges.
	//
	// After a call to Evict, the MemoryFile will consider the evicted range
	// unevictable (i.e. it will not call Evict on the same range again) until
	// informed otherwise by a subsequent call to MarkEvictable.
	Evict(ctx context.Context, er EvictableRange)
}

An EvictableMemoryUser represents a user of MemoryFile-allocated memory that may be asked to deallocate that memory in the presence of memory pressure.
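Because Evict may race with MarkUnevictable, an implementation has to check under its own lock that a range is still evictable before freeing anything. A minimal sketch under stated assumptions: EvictableRange is replaced by a hypothetical struct with Start and End fields, ranges are matched exactly (a real implementation would also need to handle partial overlap), and cacheUser, markEvictable, and markUnevictable are hypothetical names. In real code the latter two would also call MemoryFile.MarkEvictable and MemoryFile.MarkUnevictable.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// EvictableRange is a hypothetical stand-in: the half-open range [Start, End).
type EvictableRange struct{ Start, End uint64 }

// cacheUser is a hypothetical EvictableMemoryUser that tracks its evictable
// ranges under its own mutex.
type cacheUser struct {
	mu        sync.Mutex
	evictable map[EvictableRange]bool
	evicted   []EvictableRange // stands in for actually freeing memory
}

func newCacheUser() *cacheUser {
	return &cacheUser{evictable: make(map[EvictableRange]bool)}
}

// markEvictable records er as evictable. Real code would also call
// MemoryFile.MarkEvictable(c, er).
func (c *cacheUser) markEvictable(er EvictableRange) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.evictable[er] = true
}

// markUnevictable records er as no longer evictable. Real code would also
// call MemoryFile.MarkUnevictable(c, er).
func (c *cacheUser) markUnevictable(er EvictableRange) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.evictable, er)
}

// Evict has the EvictableMemoryUser.Evict signature. It ignores ranges that
// are no longer evictable, which is how a racing markUnevictable is handled.
func (c *cacheUser) Evict(ctx context.Context, er EvictableRange) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.evictable[er] {
		return // raced with markUnevictable: no effect
	}
	// After Evict, the MemoryFile treats er as unevictable until it is
	// marked evictable again, so stop tracking it here too.
	delete(c.evictable, er)
	c.evicted = append(c.evicted, er)
}

func main() {
	c := newCacheUser()
	r := EvictableRange{Start: 0, End: 4096}
	c.markEvictable(r)
	c.markUnevictable(r)
	c.Evict(context.Background(), r) // no effect: r is unevictable
	c.markEvictable(r)
	c.Evict(context.Background(), r)
	fmt.Println(len(c.evicted)) // prints "1"
}
```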

type LoadOpts

type LoadOpts struct {
	// If PagesFile is not nil, then page contents will be read from PagesFile,
	// starting at PagesFileOffset, rather than from r. If LoadFrom returns a
	// nil error, it increments PagesFileOffset by the number of bytes that
	// will be read out of PagesFile. PagesFile may be read even after LoadFrom
	// returns; OnAsyncPageLoadStart will be called before reading from
	// PagesFile begins, and OnAsyncPageLoadDone will be called after all reads
	// are complete. Callers must ensure that PagesFile remains valid until
	// OnAsyncPageLoadDone is called.
	PagesFile            *fd.FD
	PagesFileOffset      uint64
	OnAsyncPageLoadStart func()
	OnAsyncPageLoadDone  func(error)
}

LoadOpts provides options to MemoryFile.LoadFrom().

type MemoryFile

type MemoryFile struct {
	memmap.NoBufferedIOFallback
	// contains filtered or unexported fields
}

MemoryFile is a memmap.File whose pages may be allocated to arbitrary users.

func MemoryFileFromContext

func MemoryFileFromContext(ctx context.Context) *MemoryFile

MemoryFileFromContext returns the MemoryFile used by ctx, or nil if no such MemoryFile exists.

func NewMemoryFile

func NewMemoryFile(file *os.File, opts MemoryFileOpts) (*MemoryFile, error)

NewMemoryFile creates a MemoryFile backed by the given file. If NewMemoryFile succeeds, ownership of file is transferred to the returned MemoryFile.

func (*MemoryFile) Allocate

func (f *MemoryFile) Allocate(length uint64, opts AllocOpts) (memmap.FileRange, error)

Allocate returns a range of initially-zeroed pages of the given length, with a single reference on each page held by the caller. When the last reference on an allocated page is released, ownership of the page is returned to the MemoryFile, allowing it to be returned by a future call to Allocate.

Preconditions:

  • length > 0.
  • length must be page-aligned.
  • If opts.Huge == true, length must be hugepage-aligned.
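pgalloc documents these length requirements as caller preconditions. The sketch below spells them out as an explicit check; it assumes 4 KiB base pages and 2 MiB huge pages (typical on x86-64, but not universal), and checkAllocLength is a hypothetical helper, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	pageSize     = 4 << 10 // assumption: 4 KiB base pages
	hugePageSize = 2 << 20 // assumption: 2 MiB huge pages
)

var errBadLength = errors.New("invalid allocation length")

// checkAllocLength reports whether length satisfies the Allocate
// preconditions for an allocation with AllocOpts.Huge == huge.
func checkAllocLength(length uint64, huge bool) error {
	switch {
	case length == 0:
		return fmt.Errorf("%w: length must be > 0", errBadLength)
	case length%pageSize != 0:
		return fmt.Errorf("%w: %#x is not page-aligned", errBadLength, length)
	case huge && length%hugePageSize != 0:
		return fmt.Errorf("%w: %#x is not hugepage-aligned", errBadLength, length)
	}
	return nil
}

func main() {
	// 3 MiB is page-aligned but not a multiple of 2 MiB.
	fmt.Println(checkAllocLength(3<<20, true))
}
```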

func (*MemoryFile) AwaitLoadAll

func (f *MemoryFile) AwaitLoadAll() error

AwaitLoadAll blocks until async page loading has completed. If async page loading is not in progress, AwaitLoadAll returns immediately.

func (*MemoryFile) DataFD

func (f *MemoryFile) DataFD(fr memmap.FileRange) (int, error)

DataFD implements memmap.File.DataFD.

func (*MemoryFile) DecRef

func (f *MemoryFile) DecRef(fr memmap.FileRange)

DecRef implements memmap.File.DecRef.

func (*MemoryFile) Decommit

func (f *MemoryFile) Decommit(fr memmap.FileRange)

Decommit uncommits the given pages, causing them to become zeroed.

Preconditions:

  • fr.Start and fr.End must be page-aligned.
  • fr.Length() > 0.
  • At least one reference must be held on all pages in fr.

func (*MemoryFile) Destroy

func (f *MemoryFile) Destroy()

Destroy releases all resources used by f.

Preconditions: All pages allocated by f have been freed.

Postconditions: None of f's methods may be called after Destroy.

func (*MemoryFile) FD

func (f *MemoryFile) FD() int

FD implements memmap.File.FD.

func (*MemoryFile) File

func (f *MemoryFile) File() *os.File

File returns the backing file.

func (*MemoryFile) HasUniqueRef

func (f *MemoryFile) HasUniqueRef(fr memmap.FileRange) bool

HasUniqueRef returns true if all pages in the given range have exactly one reference. A return value of false is inherently racy, but if the caller holds a reference on the given range and is preventing other goroutines from copying it, then a return value of true is not racy.

Preconditions: At least one reference must be held on all pages in fr.
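Allocate, IncRef, DecRef, and HasUniqueRef together describe per-page reference counting. As a simplified model only (pgalloc keeps references in its own internal data structures, not in a map), the hypothetical refTable below keeps one count per page offset, assuming 4 KiB pages.

```go
package main

import "fmt"

const pageSize = 4096 // assumption: 4 KiB pages

// refTable is a hypothetical per-page reference count table keyed by page
// offset.
type refTable struct {
	refs map[uint64]int
	free []uint64 // pages whose last reference was dropped
}

func newRefTable() *refTable {
	return &refTable{refs: make(map[uint64]int)}
}

// incRef adds one reference to every page in [start, end).
func (t *refTable) incRef(start, end uint64) {
	for off := start; off < end; off += pageSize {
		t.refs[off]++
	}
}

// decRef drops one reference from every page in [start, end). A page whose
// count reaches zero goes back on the free list, mirroring how releasing
// the last reference returns the page to the MemoryFile.
func (t *refTable) decRef(start, end uint64) {
	for off := start; off < end; off += pageSize {
		if t.refs[off] == 0 {
			panic(fmt.Sprintf("decRef of unreferenced page %#x", off))
		}
		t.refs[off]--
		if t.refs[off] == 0 {
			delete(t.refs, off)
			t.free = append(t.free, off)
		}
	}
}

// hasUniqueRef reports whether every page in [start, end) has exactly one
// reference.
func (t *refTable) hasUniqueRef(start, end uint64) bool {
	for off := start; off < end; off += pageSize {
		if t.refs[off] != 1 {
			return false
		}
	}
	return true
}

func main() {
	t := newRefTable()
	t.incRef(0, 2*pageSize) // as if Allocate returned two pages
	t.incRef(0, pageSize)   // a second holder of the first page
	fmt.Println(t.hasUniqueRef(0, 2*pageSize)) // prints "false"
	t.decRef(0, pageSize)
	fmt.Println(t.hasUniqueRef(0, 2*pageSize)) // prints "true"
}
```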

func (*MemoryFile) HugepagesEnabled

func (f *MemoryFile) HugepagesEnabled() bool

HugepagesEnabled returns true if the MemoryFile expects to back allocations for which AllocOpts.Huge == true with huge pages.

func (*MemoryFile) IncRef

func (f *MemoryFile) IncRef(fr memmap.FileRange, memCgID uint32)

IncRef implements memmap.File.IncRef.

func (*MemoryFile) IsAsyncLoading

func (f *MemoryFile) IsAsyncLoading() bool

IsAsyncLoading returns true if async page loading is in progress or has failed permanently.

func (*MemoryFile) IsDiskBacked

func (f *MemoryFile) IsDiskBacked() bool

IsDiskBacked returns true if f is backed by a file on disk.

func (*MemoryFile) IsSavable

func (f *MemoryFile) IsSavable() bool

IsSavable returns true if f is savable.

func (*MemoryFile) LoadFrom

func (f *MemoryFile) LoadFrom(ctx context.Context, r io.Reader, opts *LoadOpts) error

LoadFrom loads MemoryFile state from the given stream.

func (*MemoryFile) MapInternal

func (f *MemoryFile) MapInternal(fr memmap.FileRange, at hostarch.AccessType) (safemem.BlockSeq, error)

MapInternal implements memmap.File.MapInternal.

func (*MemoryFile) MarkAllUnevictable

func (f *MemoryFile) MarkAllUnevictable(user EvictableMemoryUser)

MarkAllUnevictable informs f that user no longer considers any offsets to be evictable. It otherwise has the same semantics as MarkUnevictable.

func (*MemoryFile) MarkEvictable

func (f *MemoryFile) MarkEvictable(user EvictableMemoryUser, er EvictableRange)

MarkEvictable allows f to request memory deallocation by calling user.Evict(er) in the future.

Redundantly marking an already-evictable range as evictable has no effect.

func (*MemoryFile) MarkSavable

func (f *MemoryFile) MarkSavable()

MarkSavable marks f as savable.

func (*MemoryFile) MarkUnevictable

func (f *MemoryFile) MarkUnevictable(user EvictableMemoryUser, er EvictableRange)

MarkUnevictable informs f that user no longer considers er to be evictable, so the MemoryFile should no longer call user.Evict(er). Note that, per EvictableMemoryUser.Evict's documentation, user.Evict(er) may still be called even after MarkUnevictable returns due to race conditions, and implementations of EvictableMemoryUser must handle this possibility.

Redundantly marking an already-unevictable range as unevictable has no effect.

func (*MemoryFile) RestoreID

func (f *MemoryFile) RestoreID() string

RestoreID returns the restore ID for f.

func (*MemoryFile) SaveTo

func (f *MemoryFile) SaveTo(ctx context.Context, w io.Writer, pw io.Writer, opts SaveOpts) error

SaveTo writes f's state to the given stream.

func (*MemoryFile) ShouldCacheEvictable

func (f *MemoryFile) ShouldCacheEvictable() bool

ShouldCacheEvictable returns true if f is meaningfully delaying evictions of evictable memory, such that it may be advantageous to cache data in evictable memory. The value returned by ShouldCacheEvictable may change between calls.

func (*MemoryFile) StartEvictions

func (f *MemoryFile) StartEvictions()

StartEvictions requests that f evict all evictable allocations. It does not wait for eviction to complete; for this, see MemoryFile.WaitForEvictions.

func (*MemoryFile) String

func (f *MemoryFile) String() string

String implements fmt.Stringer.String.

func (*MemoryFile) TotalSize

func (f *MemoryFile) TotalSize() uint64

TotalSize returns the current size of the backing file in bytes, which is an upper bound on the amount of memory that can currently be allocated from the MemoryFile. The value returned by TotalSize is permitted to change.

func (*MemoryFile) TotalUsage

func (f *MemoryFile) TotalUsage() (uint64, error)

TotalUsage returns an aggregate usage for all memory statistics except Mapped (which is external to MemoryFile). This is generally much cheaper than UpdateUsage, but will not provide a fine-grained breakdown.

func (*MemoryFile) UpdateUsage

func (f *MemoryFile) UpdateUsage(memCgIDs map[uint32]struct{}) error

UpdateUsage ensures that the memory usage statistics in usage.MemoryAccounting are up to date. If memCgIDs is nil, all pages are scanned; otherwise, only pages belonging to the memory cgroup IDs in memCgIDs are scanned, and only their memory usage is updated.
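The nil-versus-non-nil distinction for memCgIDs can be illustrated with a small filter over hypothetical per-page records. usageByCgroup and page are hypothetical; note that in this sketch a non-nil but empty map selects no pages at all, which is one reading of the documented semantics rather than confirmed package behavior.

```go
package main

import "fmt"

// page is a hypothetical record of one allocated page.
type page struct {
	memCgID uint32 // 0 means not accounted to any cgroup
	bytes   uint64
}

// usageByCgroup totals bytes per memory cgroup ID. If memCgIDs is nil, every
// page is scanned; otherwise only pages whose ID is in memCgIDs are scanned.
func usageByCgroup(pages []page, memCgIDs map[uint32]struct{}) map[uint32]uint64 {
	out := make(map[uint32]uint64)
	for _, p := range pages {
		if memCgIDs != nil {
			if _, ok := memCgIDs[p.memCgID]; !ok {
				continue
			}
		}
		out[p.memCgID] += p.bytes
	}
	return out
}

func main() {
	pages := []page{{1, 4096}, {2, 4096}, {1, 8192}, {0, 4096}}
	fmt.Println(usageByCgroup(pages, nil))                        // prints "map[0:4096 1:12288 2:4096]"
	fmt.Println(usageByCgroup(pages, map[uint32]struct{}{2: {}})) // prints "map[2:4096]"
}
```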

func (*MemoryFile) WaitForEvictions

func (f *MemoryFile) WaitForEvictions()

WaitForEvictions blocks until f is no longer evicting any evictable allocations.
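StartEvictions and WaitForEvictions split eviction into a non-blocking request and a blocking wait. A common Go shape for that split is a sync.WaitGroup, sketched below; evictor and its methods are hypothetical, and the sketch assumes startEvictions and waitForEvictions are called sequentially, since WaitGroup requires that an Add starting from a zero counter happen before the corresponding Wait.

```go
package main

import (
	"fmt"
	"sync"
)

// evictor is a hypothetical model of the start/wait split.
type evictor struct {
	mu      sync.Mutex
	pending []func() // eviction work not yet started
	wg      sync.WaitGroup
}

// addEvictable queues eviction work for a later startEvictions call.
func (e *evictor) addEvictable(evict func()) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending = append(e.pending, evict)
}

// startEvictions launches all queued eviction work and returns without
// waiting for it to finish.
func (e *evictor) startEvictions() {
	e.mu.Lock()
	work := e.pending
	e.pending = nil
	e.mu.Unlock()
	for _, evict := range work {
		e.wg.Add(1)
		go func(f func()) {
			defer e.wg.Done()
			f()
		}(evict)
	}
}

// waitForEvictions blocks until every launched eviction has finished.
func (e *evictor) waitForEvictions() {
	e.wg.Wait()
}

func main() {
	var e evictor
	done := make([]bool, 3)
	for i := range done {
		i := i
		e.addEvictable(func() { done[i] = true })
	}
	e.startEvictions()
	e.waitForEvictions()
	fmt.Println(done) // prints "[true true true]"
}
```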

type MemoryFileOpts

type MemoryFileOpts struct {
	// DelayedEviction controls the extent to which the MemoryFile may delay
	// eviction of evictable allocations.
	DelayedEviction DelayedEvictionType

	// If UseHostMemcgPressure is true, use host memory cgroup pressure level
	// notifications to determine when eviction is necessary. This option has
	// no effect unless DelayedEviction is DelayedEvictionEnabled.
	UseHostMemcgPressure bool

	// DecommitOnDestroy indicates whether the entire host file should be
	// decommitted on destruction. This is appropriate for host filesystem based
	// files that need to be explicitly cleaned up to release disk space.
	DecommitOnDestroy bool

	// If DisableIMAWorkAround is true, NewMemoryFile will not call
	// IMAWorkAroundForMemFile().
	DisableIMAWorkAround bool

	// DiskBackedFile indicates that the MemoryFile is backed by a file on disk.
	DiskBackedFile bool

	// RestoreID is an opaque string used to reassociate the MemoryFile with its
	// replacement during restore.
	RestoreID string

	// If ExpectHugepages is true, MemoryFile will expect that the host will
	// attempt to back AllocOpts.Huge == true allocations with huge pages. If
	// ExpectHugepages is false, MemoryFile will expect that the host will back
	// all allocations with small pages.
	ExpectHugepages bool

	// If AdviseHugepage is true, MemoryFile will request that the host back
	// AllocOpts.Huge == true allocations with huge pages using MADV_HUGEPAGE.
	AdviseHugepage bool

	// If AdviseNoHugepage is true, MemoryFile will request that the host back
	// AllocOpts.Huge == false allocations with small pages using
	// MADV_NOHUGEPAGE.
	AdviseNoHugepage bool

	// If DisableMemoryAccounting is true, memory usage observed by the
	// MemoryFile will not be reported in usage.MemoryAccounting.
	DisableMemoryAccounting bool
}

MemoryFileOpts provides options to NewMemoryFile.

type SaveOpts

type SaveOpts struct {
	// If ExcludeCommittedZeroPages is true, SaveTo() will scan both committed
	// and possibly-committed pages to find zero pages, whose contents are
	// saved implicitly rather than explicitly to reduce checkpoint size. If
	// ExcludeCommittedZeroPages is false, SaveTo() will scan only
	// possibly-committed pages to find zero pages.
	//
	// Enabling ExcludeCommittedZeroPages will usually increase the time taken
	// by SaveTo() (due to the larger number of pages that must be scanned),
	// but may instead improve SaveTo() and LoadFrom() time, and checkpoint
	// size, if the application has many committed zero pages.
	ExcludeCommittedZeroPages bool
}

SaveOpts provides options to MemoryFile.SaveTo().
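The zero-page scan that SaveOpts.ExcludeCommittedZeroPages controls can be sketched as a page-by-page comparison against a zero buffer. This assumes 4 KiB pages and a plain []byte view of memory; zeroPageOffsets is hypothetical and says nothing about how SaveTo actually encodes implicitly saved pages.

```go
package main

import (
	"bytes"
	"fmt"
)

const pageSize = 4096 // assumption: 4 KiB pages

var zeroPage = make([]byte, pageSize)

// zeroPageOffsets returns the offsets of pages in mem whose contents are
// entirely zero. Such pages could be recorded by offset alone instead of
// writing pageSize bytes each. A trailing partial page is ignored.
func zeroPageOffsets(mem []byte) []int {
	var zeros []int
	for off := 0; off+pageSize <= len(mem); off += pageSize {
		if bytes.Equal(mem[off:off+pageSize], zeroPage) {
			zeros = append(zeros, off)
		}
	}
	return zeros
}

func main() {
	mem := make([]byte, 3*pageSize)
	mem[pageSize+10] = 1              // dirty the middle page
	fmt.Println(zeroPageOffsets(mem)) // prints "[0 8192]"
}
```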
