mm

package
v0.0.0-...-ba09d25 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 29, 2021 License: Apache-2.0, MIT Imports: 23 Imported by: 0

README

This package provides an emulation of Linux semantics for application virtual memory mappings.

For completeness, this document also describes aspects of the memory management subsystem defined outside this package.

Background

We begin by describing semantics for virtual memory in Linux.

A virtual address space is defined as a collection of mappings from virtual addresses to physical memory. However, userspace applications do not configure mappings to physical memory directly. Instead, applications configure memory mappings from virtual addresses to offsets into a file using the mmap system call.[^mmap-anon] For example, a call to:

mmap(
    /* addr = */ 0x400000,
    /* length = */ 0x1000,
    PROT_READ | PROT_WRITE,
    MAP_SHARED,
    /* fd = */ 3,
    /* offset = */ 0);

creates a mapping of length 0x1000 bytes, starting at virtual address (VA) 0x400000, to offset 0 in the file represented by file descriptor (FD) 3. Within the Linux kernel, virtual memory mappings are represented by virtual memory areas (VMAs). Supposing that FD 3 represents file /tmp/foo, the state of the virtual memory subsystem after the mmap call may be depicted as:

VMA:     VA:0x400000 -> /tmp/foo:0x0

Establishing a virtual memory area does not necessarily establish a mapping to a physical address, because Linux has not necessarily provisioned physical memory to store the file's contents. Thus, if the application attempts to read the contents of VA 0x400000, it may incur a page fault, a CPU exception that forces the kernel to create such a mapping to service the read.

For a file, doing so consists of several logical phases:

  1. The kernel allocates physical memory to store the contents of the required part of the file, and copies file contents to the allocated memory. Supposing that the kernel chooses the physical memory at physical address (PA) 0x2fb000, the resulting state of the system is:

    VMA:     VA:0x400000 -> /tmp/foo:0x0
    Filemap:                /tmp/foo:0x0 -> PA:0x2fb000
    

    (In Linux the state of the mapping from file offset to physical memory is stored in struct address_space, but to avoid confusion with other notions of address space we will refer to this system as filemap, named after Linux kernel source file mm/filemap.c.)

  2. The kernel stores the effective mapping from virtual to physical address in a page table entry (PTE) in the application's page tables, which are used by the CPU's virtual memory hardware to perform address translation. The resulting state of the system is:

    VMA:     VA:0x400000 -> /tmp/foo:0x0
    Filemap:                /tmp/foo:0x0 -> PA:0x2fb000
    PTE:     VA:0x400000 -----------------> PA:0x2fb000
    

    The PTE is required for the application to actually use the contents of the mapped file as virtual memory. However, the PTE is derived from the VMA and filemap state, both of which are independently mutable, such that mutations to either will affect the PTE. For example:

    • The application may remove the VMA using the munmap system call. This breaks the mapping from VA:0x400000 to /tmp/foo:0x0, and consequently the mapping from VA:0x400000 to PA:0x2fb000. However, it does not necessarily break the mapping from /tmp/foo:0x0 to PA:0x2fb000, so a future mapping of the same file offset may reuse this physical memory.

    • The application may invalidate the file's contents by passing a length of 0 to the ftruncate system call. This breaks the mapping from /tmp/foo:0x0 to PA:0x2fb000, and consequently the mapping from VA:0x400000 to PA:0x2fb000. However, it does not break the mapping from VA:0x400000 to /tmp/foo:0x0, so future changes to the file's contents may again be made visible at VA:0x400000 after another page fault results in the allocation of a new physical address.

    Note that, in order to correctly break the mapping from VA:0x400000 to PA:0x2fb000 in the latter case, filemap must also store a reverse mapping from /tmp/foo:0x0 to VA:0x400000 so that it can locate and remove the PTE.
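The way the PTE depends on both the VMA and the filemap can be modeled in a few lines of Go. This is only a toy sketch: the type and function names (`space`, `fault`, `munmap`, `truncate`) are hypothetical and are not taken from Linux or the sentry, and the scan inside `truncate` merely stands in for a real reverse mapping.

```go
package main

import "fmt"

// fileOff identifies a page-sized offset within a file.
type fileOff struct {
	file string
	off  uint64
}

// space is a toy model of the three layers described above.
type space struct {
	vmas    map[uint64]fileOff // VA -> file offset
	filemap map[fileOff]uint64 // file offset -> PA
	ptes    map[uint64]uint64  // VA -> PA, derived from the two maps above
	nextPA  uint64             // next "physical" page to hand out
}

func newSpace() *space {
	return &space{
		vmas:    map[uint64]fileOff{},
		filemap: map[fileOff]uint64{},
		ptes:    map[uint64]uint64{},
		nextPA:  0x2fb000,
	}
}

// fault resolves va through the VMA and the filemap, then installs a PTE.
func (s *space) fault(va uint64) (uint64, bool) {
	fo, ok := s.vmas[va]
	if !ok {
		return 0, false // no VMA covers va
	}
	pa, ok := s.filemap[fo]
	if !ok {
		pa = s.nextPA // "allocate" physical memory for the file page
		s.nextPA += 0x1000
		s.filemap[fo] = pa
	}
	s.ptes[va] = pa
	return pa, true
}

// munmap removes the VMA and the PTE derived from it; filemap is untouched.
func (s *space) munmap(va uint64) {
	delete(s.vmas, va)
	delete(s.ptes, va)
}

// truncate drops the filemap entry and every PTE derived from it. The scan
// over vmas plays the role of the reverse mapping kept by filemap.
func (s *space) truncate(fo fileOff) {
	delete(s.filemap, fo)
	for va, mapped := range s.vmas {
		if mapped == fo {
			delete(s.ptes, va)
		}
	}
}

func main() {
	s := newSpace()
	foo := fileOff{"/tmp/foo", 0}
	s.vmas[0x400000] = foo
	pa, _ := s.fault(0x400000)
	fmt.Printf("PTE: VA:0x400000 -> PA:%#x\n", pa)
	s.truncate(foo)
	_, present := s.ptes[0x400000]
	fmt.Println("PTE present after truncate:", present)
}
```

In this model the VMA survives `truncate`, so a later call to `fault` simply allocates a fresh page, which mirrors the second bullet above.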

[^mmap-anon]: Memory mappings to non-files are discussed in later sections.

Private Mappings

The preceding example considered VMAs created using the MAP_SHARED flag, which means that PTEs derived from the mapping should always use physical memory that represents the current state of the mapped file.[^mmap-dev-zero] Applications can alternatively pass the MAP_PRIVATE flag to create a private mapping. Private mappings are copy-on-write.

Suppose that the application instead created a private mapping in the previous example. In Linux, the state of the system after a read page fault would be:

VMA:     VA:0x400000 -> /tmp/foo:0x0 (private)
Filemap:                /tmp/foo:0x0 -> PA:0x2fb000
PTE:     VA:0x400000 -----------------> PA:0x2fb000 (read-only)

Now suppose the application attempts to write to VA:0x400000. For a shared mapping, the write would be propagated to PA:0x2fb000, and the kernel would be responsible for ensuring that the write is later propagated to the mapped file. For a private mapping, the write incurs another page fault since the PTE is marked read-only. In response, the kernel allocates physical memory to store the mapping's private copy of the file's contents, copies file contents to the allocated memory, and changes the PTE to map to the private copy. Supposing that the kernel chooses the physical memory at physical address (PA) 0x5ea000, the resulting state of the system is:

VMA:     VA:0x400000 -> /tmp/foo:0x0 (private)
Filemap:                /tmp/foo:0x0 -> PA:0x2fb000
PTE:     VA:0x400000 -----------------> PA:0x5ea000

Note that the filemap mapping from /tmp/foo:0x0 to PA:0x2fb000 may still exist, but is now irrelevant to this mapping.
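The copy-on-write step can be illustrated with a small Go model. The names below (`page`, `privateMapping`, `write`) are hypothetical, and the `writable` flag is an assumed stand-in for the read-only bit in the PTE; this is a sketch of the idea rather than of any kernel's implementation.

```go
package main

import "fmt"

// page is a toy stand-in for one page of physical memory.
type page []byte

// privateMapping models one page of a MAP_PRIVATE mapping.
type privateMapping struct {
	filePage page // page owned by filemap and shared with the file
	mapped   page // page the PTE currently points to
	writable bool // false until copy-on-write has happened
}

// newPrivateMapping models the state after a read fault: the PTE points at
// the filemap page and is read-only.
func newPrivateMapping(filePage page) *privateMapping {
	return &privateMapping{filePage: filePage, mapped: filePage}
}

func (m *privateMapping) read(i int) byte { return m.mapped[i] }

// write simulates a store. The first store takes the copy-on-write path:
// allocate a private page, copy the file contents, and repoint the mapping.
func (m *privateMapping) write(i int, b byte) {
	if !m.writable {
		private := make(page, len(m.filePage))
		copy(private, m.filePage)
		m.mapped = private
		m.writable = true
	}
	m.mapped[i] = b
}

func main() {
	file := page("hello")
	m := newPrivateMapping(file)
	m.write(0, 'j')
	fmt.Println(string(m.mapped), string(file))
}
```

After the write, the mapping sees its private copy while the page held by filemap is unchanged.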

[^mmap-dev-zero]: Modulo files with special mmap semantics such as /dev/zero.

Anonymous Mappings

Instead of passing a file to the mmap system call, applications can request an anonymous mapping by passing the MAP_ANONYMOUS flag. Semantically, an anonymous mapping is essentially a mapping to an ephemeral file initially filled with zero bytes. In practice, this is how shared anonymous mappings are implemented. Private anonymous mappings, however, do not result in the creation of an ephemeral file: since there would be no way to modify the contents of the underlying file through a private mapping, all private anonymous mappings use a single shared page filled with zero bytes until copy-on-write occurs.

Virtual Memory in the Sentry

The sentry implements application virtual memory atop a host kernel, introducing an additional level of indirection to the above.

Consider the same scenario as in the previous section. Since the sentry handles application system calls, the effect of an application mmap system call is to create a VMA in the sentry (as opposed to the host kernel):

Sentry VMA:     VA:0x400000 -> /tmp/foo:0x0

When the application first incurs a page fault on this address, the host kernel delivers information about the page fault to the sentry in a platform-dependent manner, and the sentry handles the fault:

  1. The sentry allocates memory to store the contents of the required part of the file, and copies file contents to the allocated memory. However, since the sentry is implemented atop a host kernel, it does not configure mappings to physical memory directly. Instead, mappable "memory" in the sentry is represented by a host file descriptor and offset, since (as noted in "Background") this is the memory mapping primitive provided by the host kernel. In general, memory is allocated from a temporary host file using the pgalloc package. Supposing that the sentry allocates offset 0x3000 from host file "memory-file", the resulting state is:

    Sentry VMA:     VA:0x400000 -> /tmp/foo:0x0
    Sentry filemap:                /tmp/foo:0x0 -> host:memory-file:0x3000
    
  2. The sentry stores the effective mapping from virtual address to host file in a host VMA by invoking the mmap system call:

    Sentry VMA:     VA:0x400000 -> /tmp/foo:0x0
    Sentry filemap:                /tmp/foo:0x0 -> host:memory-file:0x3000
      Host VMA:     VA:0x400000 -----------------> host:memory-file:0x3000
    
  3. The sentry returns control to the application, which immediately incurs the page fault again.[^mmap-populate] However, since a host VMA now exists for the faulting virtual address, the host kernel now handles the page fault as described in "Background":

    Sentry VMA:     VA:0x400000 -> /tmp/foo:0x0
    Sentry filemap:                /tmp/foo:0x0 -> host:memory-file:0x3000
      Host VMA:     VA:0x400000 -----------------> host:memory-file:0x3000
      Host filemap:                                host:memory-file:0x3000 -> PA:0x2fb000
      Host PTE:     VA:0x400000 --------------------------------------------> PA:0x2fb000
    

Thus, from an implementation standpoint, host VMAs serve the same purpose in the sentry that PTEs do in Linux. As in Linux, sentry VMA and filemap state is independently mutable, and the desired state of host VMAs is derived from that state.
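The idea that "memory" is a host file descriptor plus an offset can be tried directly from Go on a Unix-like host. The sketch below is not sentry code: it assumes a host where `syscall.Mmap` is available, uses an ordinary temporary file in place of the pgalloc memory file, and `mapAndWrite` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mapAndWrite maps one page of a temporary "memory file" at the given page
// index, stores msg through the mapping, and reads it back through the file.
// msg must fit within one page.
func mapAndWrite(pageIndex int, msg string) (string, error) {
	pageSize := os.Getpagesize()
	f, err := os.CreateTemp("", "memory-file")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	off := int64(pageIndex * pageSize)
	if err := f.Truncate(off + int64(pageSize)); err != nil {
		return "", err
	}

	// The (fd, offset) pair identifies the memory being mapped.
	mem, err := syscall.Mmap(int(f.Fd()), off, pageSize,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(mem)

	// Touching mem is what makes the host kernel fault the page in.
	copy(mem, msg)

	buf := make([]byte, len(msg))
	if _, err := f.ReadAt(buf, off); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := mapAndWrite(3, "sentry")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Page index 3 corresponds to offset 0x3000 on hosts with 4 KiB pages, matching the example above; the offset is computed from the host page size so that it stays page-aligned elsewhere.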

[^mmap-populate]: The sentry could force the host kernel to establish PTEs when it creates the host VMA by passing the MAP_POPULATE flag to the mmap system call, but usually does not. This is because, to reduce the number of page faults that require handling by the sentry and (correspondingly) the number of host mmap system calls, the sentry usually creates host VMAs that are much larger than the single faulting page.

Private Mappings

The sentry implements private mappings consistently with Linux. Before copy-on-write, the private mapping example given in the Background results in:

Sentry VMA:     VA:0x400000 -> /tmp/foo:0x0 (private)
Sentry filemap:                /tmp/foo:0x0 -> host:memory-file:0x3000
  Host VMA:     VA:0x400000 -----------------> host:memory-file:0x3000 (read-only)
  Host filemap:                                host:memory-file:0x3000 -> PA:0x2fb000
  Host PTE:     VA:0x400000 --------------------------------------------> PA:0x2fb000 (read-only)

When the application attempts to write to this address, the host kernel delivers information about the resulting page fault to the sentry. Analogous to Linux, the sentry allocates memory to store the mapping's private copy of the file's contents, copies file contents to the allocated memory, and changes the host VMA to map to the private copy. Supposing that the sentry chooses the offset 0x4000 in host file memory-file to store the private copy, the state of the system after copy-on-write is:

Sentry VMA:     VA:0x400000 -> /tmp/foo:0x0 (private)
Sentry filemap:                /tmp/foo:0x0 -> host:memory-file:0x3000
  Host VMA:     VA:0x400000 -----------------> host:memory-file:0x4000
  Host filemap:                                host:memory-file:0x4000 -> PA:0x5ea000
  Host PTE:     VA:0x400000 --------------------------------------------> PA:0x5ea000

However, this highlights an important difference between Linux and the sentry. In Linux, page tables are concrete (architecture-dependent) data structures owned by the kernel. Conversely, the sentry has the ability to create and destroy host VMAs using host system calls, but it does not have direct access to their state. Thus, as written, if the application invokes the munmap system call to remove the sentry VMA, it is non-trivial for the sentry to determine that it should deallocate host:memory-file:0x4000. This implies that the sentry must retain information about the host VMAs that it has created.

Anonymous Mappings

The sentry implements anonymous mappings consistently with Linux, except that there is no shared zero page.

Implementation Constructs

In Linux:

  • A virtual address space is represented by struct mm_struct.

  • VMAs are represented by struct vm_area_struct, stored in struct mm_struct::mmap.

  • Mappings from file offsets to physical memory are stored in struct address_space.

  • Reverse mappings from file offsets to virtual mappings are stored in struct address_space::i_mmap.

  • Physical memory pages are represented by a pointer to struct page or an index called a page frame number (PFN), represented by pfn_t.

  • PTEs are represented by architecture-dependent type pte_t, stored in a table hierarchy rooted at struct mm_struct::pgd.

In the sentry:

Documentation

Overview

Package mm provides a memory management subsystem. See README.md for a detailed overview.

Lock order:

fs locks, except for memmap.Mappable locks

mm.MemoryManager.metadataMu
  mm.MemoryManager.mappingMu
    Locks taken by memmap.Mappable methods other than Translate
      mm.MemoryManager.activeMu
        Locks taken by memmap.Mappable.Translate
          mm.privateRefs.mu
            platform.AddressSpace locks
              memmap.File locks
      mm.aioManager.mu
        mm.AIOContext.mu
      kernel.TaskSet.mu

Only mm.MemoryManager.Fork is permitted to lock mm.MemoryManager.activeMu in multiple mm.MemoryManagers, as it does so in a well-defined order (forked child first).
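As a reminder of what a lock order means in practice, the sketch below nests two locks in a fixed outer-to-inner order so that concurrent callers cannot deadlock against each other. The `toyManager` type is hypothetical, and the use of plain `sync.Mutex` is an assumption made for brevity; it does not reflect the lock types used by this package.

```go
package main

import (
	"fmt"
	"sync"
)

// toyManager is a hypothetical stand-in holding two of the locks listed
// above.
type toyManager struct {
	mappingMu sync.Mutex
	activeMu  sync.Mutex
	vmas      int // guarded by mappingMu
	active    int // guarded by activeMu
}

// mapAndActivate always takes mappingMu before activeMu, matching the
// documented order. Taking them in the opposite order elsewhere could
// deadlock.
func (m *toyManager) mapAndActivate() (vmas, active int) {
	m.mappingMu.Lock()
	defer m.mappingMu.Unlock()
	m.vmas++

	m.activeMu.Lock()
	defer m.activeMu.Unlock()
	m.active++
	return m.vmas, m.active
}

func main() {
	var m toyManager
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.mapAndActivate()
		}()
	}
	wg.Wait()
	fmt.Println(m.vmas, m.active)
}
```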

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Unpin

func Unpin(prs []PinnedRange)

Unpin releases the reference held by prs.

Types

type AIOContext

type AIOContext struct {
	// contains filtered or unexported fields
}

AIOContext is a single asynchronous I/O context.

+stateify savable

func (*AIOContext) CancelPendingRequest

func (ctx *AIOContext) CancelPendingRequest()

CancelPendingRequest forgets about a request that hasn't yet completed.

func (*AIOContext) Dead

func (ctx *AIOContext) Dead() bool

Dead returns true if the context has been destroyed.

func (*AIOContext) Drain

func (ctx *AIOContext) Drain()

Drain drops all completed requests. Pending requests remain untouched.

func (*AIOContext) FinishRequest

func (ctx *AIOContext) FinishRequest(data interface{})

FinishRequest finishes a pending request. It queues up the data and notifies listeners.

func (*AIOContext) PopRequest

func (ctx *AIOContext) PopRequest() (interface{}, bool)

PopRequest pops a completed request if one is available. It never blocks, and it returns false if no request is available.

func (*AIOContext) Prepare

func (ctx *AIOContext) Prepare() error

Prepare reserves space for a new request, returning nil if available. Returns EAGAIN if the context is busy and EINVAL if the context is dead.

func (*AIOContext) WaitChannel

func (ctx *AIOContext) WaitChannel() chan struct{}

WaitChannel returns a channel that is notified when an AIO request is completed. Returns nil if the context is destroyed and there are no more outstanding requests.

type Dumpability

type Dumpability int

Dumpability describes if and how core dumps should be created.

const (
	// NotDumpable indicates that core dumps should never be created.
	NotDumpable Dumpability = iota

	// UserDumpable indicates that core dumps should be created, owned by
	// the current user.
	UserDumpable

	// RootDumpable indicates that core dumps should be created, owned by
	// root.
	RootDumpable
)

type MLockAllOpts

type MLockAllOpts struct {
	// If Current is true, change the memory-locking behavior of all mappings
	// to Mode. If Future is true, upgrade the memory-locking behavior of all
	// future mappings to Mode. At least one of Current or Future must be true.
	Current bool
	Future  bool
	Mode    memmap.MLockMode
}

MLockAllOpts holds options to MLockAll.

type MRemapMoveMode

type MRemapMoveMode int

MRemapMoveMode controls MRemap's moving behavior.

const (
	// MRemapNoMove prevents MRemap from moving the remapped mapping.
	MRemapNoMove MRemapMoveMode = iota

	// MRemapMayMove allows MRemap to move the remapped mapping.
	MRemapMayMove

	// MRemapMustMove requires MRemap to move the remapped mapping to
	// MRemapOpts.NewAddr, replacing any existing mappings in the remapped
	// range.
	MRemapMustMove
)

type MRemapOpts

type MRemapOpts struct {
	// Move controls whether MRemap moves the remapped mapping to a new address.
	Move MRemapMoveMode

	// NewAddr is the new address for the remapping. NewAddr is ignored unless
	// Move is MRemapMustMove.
	NewAddr hostarch.Addr
}

MRemapOpts specifies options to MRemap.

type MSyncOpts

type MSyncOpts struct {
	// Sync has the semantics of MS_SYNC.
	Sync bool

	// Invalidate has the semantics of MS_INVALIDATE.
	Invalidate bool
}

MSyncOpts holds options to MSync.

type MemoryManager

type MemoryManager struct {
	// contains filtered or unexported fields
}

MemoryManager implements a virtual address space.

+stateify savable

func NewMemoryManager

func NewMemoryManager(p platform.Platform, mfp pgalloc.MemoryFileProvider, sleepForActivation bool) *MemoryManager

NewMemoryManager returns a new MemoryManager with no mappings and 1 user.

func (*MemoryManager) Activate

func (mm *MemoryManager) Activate(ctx context.Context) error

Activate ensures this MemoryManager has a platform.AddressSpace.

The caller must not hold any locks when calling Activate.

When this MemoryManager is no longer needed by a task, it should call Deactivate to release the reference.

func (*MemoryManager) AddressSpace

func (mm *MemoryManager) AddressSpace() platform.AddressSpace

AddressSpace returns the platform.AddressSpace bound to mm.

Preconditions: The caller must have called mm.Activate().

func (*MemoryManager) ArgvEnd

func (mm *MemoryManager) ArgvEnd() hostarch.Addr

ArgvEnd returns the end of the application argument vector.

There is no guarantee that this value is sensible w.r.t. ArgvStart.

func (*MemoryManager) ArgvStart

func (mm *MemoryManager) ArgvStart() hostarch.Addr

ArgvStart returns the start of the application argument vector.

There is no guarantee that this value is sensible w.r.t. ArgvEnd.

func (*MemoryManager) Auxv

func (mm *MemoryManager) Auxv() arch.Auxv

Auxv returns the current map of auxiliary vectors.

func (*MemoryManager) Brk

func (mm *MemoryManager) Brk(ctx context.Context, addr hostarch.Addr) (hostarch.Addr, error)

Brk implements the semantics of Linux's brk(2), except that it returns an error on failure.

func (*MemoryManager) BrkSetup

func (mm *MemoryManager) BrkSetup(ctx context.Context, addr hostarch.Addr)

BrkSetup sets mm's brk address to addr and its brk size to 0.

func (*MemoryManager) CheckIORange

func (mm *MemoryManager) CheckIORange(addr hostarch.Addr, length int64) (hostarch.AddrRange, bool)

CheckIORange is similar to hostarch.Addr.ToRange, but applies bounds checks consistent with Linux's arch/x86/include/asm/uaccess.h:access_ok().

Preconditions: length >= 0.
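A bounds check of this kind typically rejects ranges that wrap around the address space or extend past the highest application address. The sketch below shows the general shape using plain integers; `checkIORange` and `maxUserAddr` are hypothetical, and the limit value is an assumption chosen for illustration rather than the one used by this package.

```go
package main

import "fmt"

// maxUserAddr is an assumed upper bound of the application address space.
const maxUserAddr uint64 = 0x7ffffffff000

// checkIORange returns [addr, addr+length) if the range lies entirely within
// application addresses.
func checkIORange(addr uint64, length int64) (start, end uint64, ok bool) {
	if length < 0 {
		return 0, 0, false // the documented method treats this as a precondition
	}
	end = addr + uint64(length)
	if end < addr {
		return 0, 0, false // the range wrapped around the address space
	}
	if end > maxUserAddr {
		return 0, 0, false // the range extends past application addresses
	}
	return addr, end, true
}

func main() {
	fmt.Println(checkIORange(0x400000, 0x1000))
	fmt.Println(checkIORange(^uint64(0)-1, 16))
}
```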

func (*MemoryManager) CompareAndSwapUint32

func (mm *MemoryManager) CompareAndSwapUint32(ctx context.Context, addr hostarch.Addr, old, new uint32, opts usermem.IOOpts) (uint32, error)

CompareAndSwapUint32 implements usermem.IO.CompareAndSwapUint32.

func (*MemoryManager) CopyIn

func (mm *MemoryManager) CopyIn(ctx context.Context, addr hostarch.Addr, dst []byte, opts usermem.IOOpts) (int, error)

CopyIn implements usermem.IO.CopyIn.

func (*MemoryManager) CopyInTo

func (mm *MemoryManager) CopyInTo(ctx context.Context, ars hostarch.AddrRangeSeq, dst safemem.Writer, opts usermem.IOOpts) (int64, error)

CopyInTo implements usermem.IO.CopyInTo.

func (*MemoryManager) CopyOut

func (mm *MemoryManager) CopyOut(ctx context.Context, addr hostarch.Addr, src []byte, opts usermem.IOOpts) (int, error)

CopyOut implements usermem.IO.CopyOut.

func (*MemoryManager) CopyOutFrom

func (mm *MemoryManager) CopyOutFrom(ctx context.Context, ars hostarch.AddrRangeSeq, src safemem.Reader, opts usermem.IOOpts) (int64, error)

CopyOutFrom implements usermem.IO.CopyOutFrom.

func (*MemoryManager) Deactivate

func (mm *MemoryManager) Deactivate()

Deactivate releases a reference to the MemoryManager.

func (*MemoryManager) DebugString

func (mm *MemoryManager) DebugString(ctx context.Context) string

DebugString returns a string containing information about mm for debugging.

func (*MemoryManager) DecUsers

func (mm *MemoryManager) DecUsers(ctx context.Context)

DecUsers decrements mm's user count. If the user count reaches 0, all mappings in mm are unmapped.

func (*MemoryManager) Decommit

func (mm *MemoryManager) Decommit(addr hostarch.Addr, length uint64) error

Decommit implements the semantics of Linux's madvise(MADV_DONTNEED).

func (*MemoryManager) DestroyAIOContext

func (mm *MemoryManager) DestroyAIOContext(ctx context.Context, id uint64) *AIOContext

DestroyAIOContext destroys an asynchronous I/O context. It returns the destroyed context, or nil if the context does not exist.

func (*MemoryManager) DetachShm

func (mm *MemoryManager) DetachShm(ctx context.Context, addr hostarch.Addr) error

DetachShm unmaps a sysv shared memory segment.

func (*MemoryManager) Dumpability

func (mm *MemoryManager) Dumpability() Dumpability

Dumpability returns the dumpability.

func (*MemoryManager) EnableMembarrierPrivate

func (mm *MemoryManager) EnableMembarrierPrivate()

EnableMembarrierPrivate causes future calls to IsMembarrierPrivateEnabled to return true.

func (*MemoryManager) EnableMembarrierRSeq

func (mm *MemoryManager) EnableMembarrierRSeq()

EnableMembarrierRSeq causes future calls to IsMembarrierRSeqEnabled to return true.

func (*MemoryManager) EnvvEnd

func (mm *MemoryManager) EnvvEnd() hostarch.Addr

EnvvEnd returns the end of the application environment vector.

There is no guarantee that this value is sensible w.r.t. EnvvStart.

func (*MemoryManager) EnvvStart

func (mm *MemoryManager) EnvvStart() hostarch.Addr

EnvvStart returns the start of the application environment vector.

There is no guarantee that this value is sensible w.r.t. EnvvEnd.

func (*MemoryManager) Executable

func (mm *MemoryManager) Executable() fsbridge.File

Executable returns the executable, if available.

An additional reference will be taken in the case of a non-nil executable, which must be released by the caller.

func (*MemoryManager) Fork

func (mm *MemoryManager) Fork(ctx context.Context) (*MemoryManager, error)

Fork creates a copy of mm with 1 user, as for Linux syscalls fork() or clone() (without CLONE_VM).

func (*MemoryManager) GetSharedFutexKey

func (mm *MemoryManager) GetSharedFutexKey(ctx context.Context, addr hostarch.Addr) (futex.Key, error)

GetSharedFutexKey is used by kernel.Task.GetSharedKey.

func (*MemoryManager) HandleUserFault

func (mm *MemoryManager) HandleUserFault(ctx context.Context, addr hostarch.Addr, at hostarch.AccessType, sp hostarch.Addr) error

HandleUserFault handles an application page fault. sp is the faulting application thread's stack pointer.

Preconditions: mm.as != nil.

func (*MemoryManager) IncUsers

func (mm *MemoryManager) IncUsers() bool

IncUsers increments mm's user count and returns true. If the user count is already 0, IncUsers does nothing and returns false.

func (*MemoryManager) Invalidate

func (mm *MemoryManager) Invalidate(ar hostarch.AddrRange, opts memmap.InvalidateOpts)

Invalidate implements memmap.MappingSpace.Invalidate.

func (*MemoryManager) InvalidateUnsavable

func (mm *MemoryManager) InvalidateUnsavable(ctx context.Context) error

InvalidateUnsavable invokes memmap.Mappable.InvalidateUnsavable on all Mappables mapped by mm.

func (*MemoryManager) IsMembarrierPrivateEnabled

func (mm *MemoryManager) IsMembarrierPrivateEnabled() bool

IsMembarrierPrivateEnabled returns true if mm.EnableMembarrierPrivate() has previously been called.

func (*MemoryManager) IsMembarrierRSeqEnabled

func (mm *MemoryManager) IsMembarrierRSeqEnabled() bool

IsMembarrierRSeqEnabled returns true if mm.EnableMembarrierRSeq() has previously been called.

func (*MemoryManager) LoadUint32

func (mm *MemoryManager) LoadUint32(ctx context.Context, addr hostarch.Addr, opts usermem.IOOpts) (uint32, error)

LoadUint32 implements usermem.IO.LoadUint32.

func (*MemoryManager) LookupAIOContext

func (mm *MemoryManager) LookupAIOContext(ctx context.Context, id uint64) (*AIOContext, bool)

LookupAIOContext looks up the given context. It returns false if the context does not exist.

func (*MemoryManager) MLock

func (mm *MemoryManager) MLock(ctx context.Context, addr hostarch.Addr, length uint64, mode memmap.MLockMode) error

MLock implements the semantics of Linux's mlock()/mlock2()/munlock(), depending on mode.

func (*MemoryManager) MLockAll

func (mm *MemoryManager) MLockAll(ctx context.Context, opts MLockAllOpts) error

MLockAll implements the semantics of Linux's mlockall()/munlockall(), depending on opts.

func (*MemoryManager) MMap

func (mm *MemoryManager) MMap(ctx context.Context, opts memmap.MMapOpts) (hostarch.Addr, error)

MMap establishes a memory mapping.

func (*MemoryManager) MProtect

func (mm *MemoryManager) MProtect(addr hostarch.Addr, length uint64, realPerms hostarch.AccessType, growsDown bool) error

MProtect implements the semantics of Linux's mprotect(2).

func (*MemoryManager) MRemap

func (mm *MemoryManager) MRemap(ctx context.Context, oldAddr hostarch.Addr, oldSize uint64, newSize uint64, opts MRemapOpts) (hostarch.Addr, error)

MRemap implements the semantics of Linux's mremap(2).

func (*MemoryManager) MSync

func (mm *MemoryManager) MSync(ctx context.Context, addr hostarch.Addr, length uint64, opts MSyncOpts) error

MSync implements the semantics of Linux's msync().

func (*MemoryManager) MUnmap

func (mm *MemoryManager) MUnmap(ctx context.Context, addr hostarch.Addr, length uint64) error

MUnmap implements the semantics of Linux's munmap(2).

func (*MemoryManager) MapStack

func (mm *MemoryManager) MapStack(ctx context.Context) (hostarch.AddrRange, error)

MapStack allocates the initial process stack.

func (*MemoryManager) MaxResidentSetSize

func (mm *MemoryManager) MaxResidentSetSize() uint64

MaxResidentSetSize returns the value advertised as mm's max RSS in bytes.

func (*MemoryManager) NeedsUpdate

func (mm *MemoryManager) NeedsUpdate(generation int64) bool

NeedsUpdate implements seqfile.SeqSource.NeedsUpdate.

func (*MemoryManager) NewAIOContext

func (mm *MemoryManager) NewAIOContext(ctx context.Context, events uint32) (uint64, error)

NewAIOContext creates a new context for asynchronous I/O.

NewAIOContext is analogous to Linux's fs/aio.c:ioctx_alloc().

func (*MemoryManager) NumaPolicy

func (mm *MemoryManager) NumaPolicy(addr hostarch.Addr) (linux.NumaPolicy, uint64, error)

NumaPolicy implements the semantics of Linux's get_mempolicy(MPOL_F_ADDR).

func (*MemoryManager) Pin

func (mm *MemoryManager) Pin(ctx context.Context, ar hostarch.AddrRange, at hostarch.AccessType, ignorePermissions bool) ([]PinnedRange, error)

Pin returns the memmap.File ranges currently mapped by addresses in ar in mm, acquiring a reference on the returned ranges which the caller must release by calling Unpin. If not all addresses are mapped, Pin returns a non-nil error. Note that Pin may return both a non-empty slice of PinnedRanges and a non-nil error.

Pin does not prevent mapped ranges from changing, making it unsuitable for most I/O. It should only be used in contexts that would use get_user_pages() in the Linux kernel.

Preconditions:

  • ar.Length() != 0.

  • ar must be page-aligned.

func (*MemoryManager) ReadMapsDataInto

func (mm *MemoryManager) ReadMapsDataInto(ctx context.Context, buf *bytes.Buffer)

ReadMapsDataInto is called by fsimpl/proc.mapsData.Generate to implement /proc/[pid]/maps.

func (*MemoryManager) ReadMapsSeqFileData

func (mm *MemoryManager) ReadMapsSeqFileData(ctx context.Context, handle seqfile.SeqHandle) ([]seqfile.SeqData, int64)

ReadMapsSeqFileData is called by fs/proc.mapsData.ReadSeqFileData to implement /proc/[pid]/maps.

func (*MemoryManager) ReadSmapsDataInto

func (mm *MemoryManager) ReadSmapsDataInto(ctx context.Context, buf *bytes.Buffer)

ReadSmapsDataInto is called by fsimpl/proc.smapsData.Generate to implement /proc/[pid]/smaps.

func (*MemoryManager) ReadSmapsSeqFileData

func (mm *MemoryManager) ReadSmapsSeqFileData(ctx context.Context, handle seqfile.SeqHandle) ([]seqfile.SeqData, int64)

ReadSmapsSeqFileData is called by fs/proc.smapsData.ReadSeqFileData to implement /proc/[pid]/smaps.

func (*MemoryManager) ResidentSetSize

func (mm *MemoryManager) ResidentSetSize() uint64

ResidentSetSize returns the value advertised as mm's RSS in bytes.

func (*MemoryManager) SetArgvEnd

func (mm *MemoryManager) SetArgvEnd(a hostarch.Addr)

SetArgvEnd sets the end of the application argument vector.

func (*MemoryManager) SetArgvStart

func (mm *MemoryManager) SetArgvStart(a hostarch.Addr)

SetArgvStart sets the start of the application argument vector.

func (*MemoryManager) SetAuxv

func (mm *MemoryManager) SetAuxv(auxv arch.Auxv)

SetAuxv sets the entire map of auxiliary vectors.

func (*MemoryManager) SetDontFork

func (mm *MemoryManager) SetDontFork(addr hostarch.Addr, length uint64, dontfork bool) error

SetDontFork implements the semantics of madvise MADV_DONTFORK.

func (*MemoryManager) SetDumpability

func (mm *MemoryManager) SetDumpability(d Dumpability)

SetDumpability sets the dumpability.

func (*MemoryManager) SetEnvvEnd

func (mm *MemoryManager) SetEnvvEnd(a hostarch.Addr)

SetEnvvEnd sets the end of the application environment vector.

func (*MemoryManager) SetEnvvStart

func (mm *MemoryManager) SetEnvvStart(a hostarch.Addr)

SetEnvvStart sets the start of the application environment vector.

func (*MemoryManager) SetExecutable

func (mm *MemoryManager) SetExecutable(ctx context.Context, file fsbridge.File)

SetExecutable sets the executable.

This takes a reference on file.

func (*MemoryManager) SetMmapLayout

func (mm *MemoryManager) SetMmapLayout(ac arch.Context, r *limits.LimitSet) (arch.MmapLayout, error)

SetMmapLayout initializes mm's layout from the given arch.Context.

Preconditions: mm contains no mappings and is not used concurrently.

func (*MemoryManager) SetNumaPolicy

func (mm *MemoryManager) SetNumaPolicy(addr hostarch.Addr, length uint64, policy linux.NumaPolicy, nodemask uint64) error

SetNumaPolicy implements the semantics of Linux's mbind().

func (*MemoryManager) SetVDSOSigReturn

func (mm *MemoryManager) SetVDSOSigReturn(addr uint64)

SetVDSOSigReturn sets the address of vdso_sigreturn.

func (*MemoryManager) String

func (mm *MemoryManager) String() string

String implements fmt.Stringer.String.

func (*MemoryManager) SwapUint32

func (mm *MemoryManager) SwapUint32(ctx context.Context, addr hostarch.Addr, new uint32, opts usermem.IOOpts) (uint32, error)

SwapUint32 implements usermem.IO.SwapUint32.

func (*MemoryManager) VDSOSigReturn

func (mm *MemoryManager) VDSOSigReturn() uint64

VDSOSigReturn returns the address of vdso_sigreturn.

func (*MemoryManager) VirtualDataSize

func (mm *MemoryManager) VirtualDataSize() uint64

VirtualDataSize returns the size of private data segments in mm.

func (*MemoryManager) VirtualMemorySize

func (mm *MemoryManager) VirtualMemorySize() uint64

VirtualMemorySize returns the combined length in bytes of all mappings in mm.

func (*MemoryManager) VirtualMemorySizeRange

func (mm *MemoryManager) VirtualMemorySizeRange(ar hostarch.AddrRange) uint64

VirtualMemorySizeRange returns the combined length in bytes of all mappings in ar in mm.
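Computing the combined length of mappings within a range amounts to summing the intersections of each mapping with that range. The following sketch uses a hypothetical `addrRange` type with plain integers in place of `hostarch.AddrRange`, and a slice in place of whatever structure the package uses to store mappings.

```go
package main

import "fmt"

// addrRange is a half-open range [start, end) of addresses.
type addrRange struct{ start, end uint64 }

func (r addrRange) length() uint64 { return r.end - r.start }

// intersect returns the overlap of r and o, or an empty range if they are
// disjoint.
func (r addrRange) intersect(o addrRange) addrRange {
	if o.start > r.start {
		r.start = o.start
	}
	if o.end < r.end {
		r.end = o.end
	}
	if r.end < r.start {
		r.end = r.start // disjoint: clamp to an empty range
	}
	return r
}

// sizeInRange sums the bytes of every mapping that fall inside ar.
func sizeInRange(mappings []addrRange, ar addrRange) uint64 {
	var total uint64
	for _, m := range mappings {
		total += m.intersect(ar).length()
	}
	return total
}

func main() {
	mappings := []addrRange{{0x1000, 0x3000}, {0x5000, 0x6000}}
	fmt.Printf("%#x\n", sizeInRange(mappings, addrRange{0x2000, 0x5800}))
}
```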

func (*MemoryManager) ZeroOut

func (mm *MemoryManager) ZeroOut(ctx context.Context, addr hostarch.Addr, toZero int64, opts usermem.IOOpts) (int64, error)

ZeroOut implements usermem.IO.ZeroOut.

type PinnedRange

type PinnedRange struct {
	// Source is the corresponding range of addresses.
	Source hostarch.AddrRange

	// File is the mapped file.
	File memmap.File

	// Offset is the offset into File at which this PinnedRange begins.
	Offset uint64
}

PinnedRanges are returned by MemoryManager.Pin.

func (PinnedRange) FileRange

func (pr PinnedRange) FileRange() memmap.FileRange

FileRange returns the memmap.File offsets mapped by pr.

type SpecialMappable

type SpecialMappable struct {
	SpecialMappableRefs
	// contains filtered or unexported fields
}

SpecialMappable implements memmap.MappingIdentity and memmap.Mappable with semantics similar to Linux's mm/mmap.c:_install_special_mapping(), except that SpecialMappable takes ownership of the memory that it represents (_install_special_mapping() does not).

+stateify savable

func NewSharedAnonMappable

func NewSharedAnonMappable(length uint64, mfp pgalloc.MemoryFileProvider) (*SpecialMappable, error)

NewSharedAnonMappable returns a SpecialMappable that implements the semantics of mmap(MAP_SHARED|MAP_ANONYMOUS) and mappings of /dev/zero.

TODO(gvisor.dev/issue/1624): Linux uses an ephemeral file created by mm/shmem.c:shmem_zero_setup(), and VFS2 does something analogous. VFS1 uses a SpecialMappable instead, incorrectly getting device and inode IDs of zero and causing memory for shared anonymous mappings to be allocated up-front instead of on first touch; this is to avoid exacerbating the fs.MountSource leak (b/143656263). Delete this function along with VFS1.

func NewSpecialMappable

func NewSpecialMappable(name string, mfp pgalloc.MemoryFileProvider, fr memmap.FileRange) *SpecialMappable

NewSpecialMappable returns a SpecialMappable that owns fr, which represents offsets in mfp.MemoryFile() that contain the SpecialMappable's data. The SpecialMappable will use the given name in /proc/[pid]/maps.

Preconditions: fr.Length() != 0.

func (*SpecialMappable) AddMapping

AddMapping implements memmap.Mappable.AddMapping.

func (*SpecialMappable) CopyMapping

CopyMapping implements memmap.Mappable.CopyMapping.

func (*SpecialMappable) DecRef

func (m *SpecialMappable) DecRef(ctx context.Context)

DecRef implements refs.RefCounter.DecRef.

func (*SpecialMappable) DeviceID

func (m *SpecialMappable) DeviceID() uint64

DeviceID implements memmap.MappingIdentity.DeviceID.

func (*SpecialMappable) FileRange

func (m *SpecialMappable) FileRange() memmap.FileRange

FileRange returns the offsets into MemoryFileProvider().MemoryFile() that store the SpecialMappable's contents.

func (*SpecialMappable) InodeID

func (m *SpecialMappable) InodeID() uint64

InodeID implements memmap.MappingIdentity.InodeID.

func (*SpecialMappable) InvalidateUnsavable

func (m *SpecialMappable) InvalidateUnsavable(ctx context.Context) error

InvalidateUnsavable implements memmap.Mappable.InvalidateUnsavable.

func (*SpecialMappable) Length

func (m *SpecialMappable) Length() uint64

Length returns the length of the SpecialMappable.

func (*SpecialMappable) MappedName

func (m *SpecialMappable) MappedName(ctx context.Context) string

MappedName implements memmap.MappingIdentity.MappedName.

func (*SpecialMappable) MemoryFileProvider

func (m *SpecialMappable) MemoryFileProvider() pgalloc.MemoryFileProvider

MemoryFileProvider returns the MemoryFileProvider whose MemoryFile stores the SpecialMappable's contents.

func (*SpecialMappable) Msync

Msync implements memmap.MappingIdentity.Msync.

func (*SpecialMappable) RemoveMapping

RemoveMapping implements memmap.Mappable.RemoveMapping.

func (*SpecialMappable) Translate

func (m *SpecialMappable) Translate(ctx context.Context, required, optional memmap.MappableRange, at hostarch.AccessType) ([]memmap.Translation, error)

Translate implements memmap.Mappable.Translate.
