Documentation ¶
Index ¶
- Constants
- Variables
- func Clearenv()
- func Close(fd int) (err error)
- func CloseOnExec(fd int)
- func Environ() []string
- func Exec(argv0 string, argv []string, envv []string) (err error)
- func Exit(code int)
- func Faccessat(dirfd int, path string, mode uint32, flags int) (err error)
- func ForkExec(argv0 string, argv []string, attr *ProcAttr) (pid int, err error)
- func Getcwd(buf []byte) (n int, err error)
- func Getenv(key string) (value string, found bool)
- func Getpagesize() int
- func Getpid() (pid int)
- func Getrlimit(which int, lim *Rlimit) (err error)
- func Getwd() (string, error)
- func Kill(pid int, signum Signal) (err error)
- func Lstat(path string, stat *Stat_t) (err error)
- func Open(path string, mode int, perm uint32) (fd int, err error)
- func Pipe(p []int) (err error)
- func Read(fd int, p []byte) (n int, err error)
- func Seek(fd int, offset int64, whence int) (newoffset int64, err error)
- func SetNonblock(fd int, nonblocking bool) (err error)
- func Setenv(key, value string) error
- func Setrlimit(resource int, rlim *Rlimit) error
- func StartProcess(argv0 string, argv []string, attr *ProcAttr) (pid int, handle uintptr, err error)
- func Stat(path string, stat *Stat_t) (err error)
- func Unsetenv(key string) error
- type Credential
- type Errno
- type ProcAttr
- type Rlimit
- type Signal
- type Stat_t
- type SysProcAttr
- type SysProcIDMap
- type Timespec
- type Timeval
Constants ¶
const (
	CLONE_VM             = 0x00000100 // set if VM shared between processes
	CLONE_FS             = 0x00000200 // set if fs info shared between processes
	CLONE_FILES          = 0x00000400 // set if open files shared between processes
	CLONE_SIGHAND        = 0x00000800 // set if signal handlers and blocked signals shared
	CLONE_PIDFD          = 0x00001000 // set if a pidfd should be placed in parent
	CLONE_PTRACE         = 0x00002000 // set if we want to let tracing continue on the child too
	CLONE_VFORK          = 0x00004000 // set if the parent wants the child to wake it up on mm_release
	CLONE_PARENT         = 0x00008000 // set if we want to have the same parent as the cloner
	CLONE_THREAD         = 0x00010000 // Same thread group?
	CLONE_NEWNS          = 0x00020000 // New mount namespace group
	CLONE_SYSVSEM        = 0x00040000 // share system V SEM_UNDO semantics
	CLONE_SETTLS         = 0x00080000 // create a new TLS for the child
	CLONE_PARENT_SETTID  = 0x00100000 // set the TID in the parent
	CLONE_CHILD_CLEARTID = 0x00200000 // clear the TID in the child
	CLONE_DETACHED       = 0x00400000 // Unused, ignored
	CLONE_UNTRACED       = 0x00800000 // set if the tracing process can't force CLONE_PTRACE on this clone
	CLONE_CHILD_SETTID   = 0x01000000 // set the TID in the child
	CLONE_NEWCGROUP      = 0x02000000 // New cgroup namespace
	CLONE_NEWUTS         = 0x04000000 // New utsname namespace
	CLONE_NEWIPC         = 0x08000000 // New ipc namespace
	CLONE_NEWUSER        = 0x10000000 // New user namespace
	CLONE_NEWPID         = 0x20000000 // New pid namespace
	CLONE_NEWNET         = 0x40000000 // New network namespace
	CLONE_IO             = 0x80000000 // Clone io context

	CLONE_CLEAR_SIGHAND = 0x100000000 // Clear any signal handler and reset to SIG_DFL.
	CLONE_INTO_CGROUP   = 0x200000000 // Clone into a specific cgroup given the right permissions.

	CLONE_NEWTIME = 0x00000080 // New time namespace
)
Linux unshare/clone/clone2/clone3 flags, architecture-independent, copied from linux/sched.h.
Variables ¶
var (
	Stdin  = 0
	Stdout = 1
	Stderr = 2
)
var ForkLock sync.RWMutex
ForkLock is used to synchronize creation of new file descriptors with fork.
We want the child in a fork/exec sequence to inherit only the file descriptors we intend. To do that, we mark all file descriptors close-on-exec and then, in the child, explicitly unmark the ones we want the exec'ed program to keep. Unix doesn't make this easy: there is, in general, no way to allocate a new file descriptor close-on-exec. Instead you have to allocate the descriptor and then mark it close-on-exec. If a fork happens between those two events, the child's exec will inherit an unwanted file descriptor.
This lock solves that race: the combined operation of creating a new file descriptor and marking it close-on-exec is done while holding ForkLock for reading, and the fork itself is done while holding ForkLock for writing. At least, that's the idea. There are some complications.
Some system calls that create new file descriptors can block for arbitrarily long times: open on a hung NFS server or named pipe, accept on a socket, and so on. We can't reasonably grab the lock across those operations.
It is worse to inherit some file descriptors than others. If a non-malicious child accidentally inherits an open ordinary file, that's not a big deal. On the other hand, if a long-lived child accidentally inherits the write end of a pipe, then the reader of that pipe will not see EOF until that child exits, potentially causing the parent program to hang. This is a common problem in threaded C programs that use popen.
Luckily, the file descriptors that are most important not to inherit are not the ones that can take an arbitrarily long time to create: pipe returns instantly, and the net package uses non-blocking I/O to accept on a listening socket. The rules for which file descriptor-creating operations use the ForkLock are as follows:
- Pipe. Use pipe2 if available. Otherwise, does not block, so use ForkLock.
- Socket. Use SOCK_CLOEXEC if available. Otherwise, does not block, so use ForkLock.
- Open. Use O_CLOEXEC if available. Otherwise, may block, so live with the race.
- Dup. Use F_DUPFD_CLOEXEC or dup3 if available. Otherwise, does not block, so use ForkLock.
Functions ¶
func CloseOnExec ¶ added in v0.9.1
func CloseOnExec(fd int)
func Getpagesize ¶ added in v0.9.1
func Getpagesize() int
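func SetNonblock ¶ added in v0.9.1
func SetNonblock(fd int, nonblocking bool) (err error)
func StartProcess ¶ added in v0.9.1
func StartProcess(argv0 string, argv []string, attr *ProcAttr) (pid int, handle uintptr, err error)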
StartProcess wraps ForkExec for package os.
Types ¶
type Credential ¶ added in v0.9.1
type Credential struct {
	Uid         uint32   // User ID.
	Gid         uint32   // Group ID.
	Groups      []uint32 // Supplementary group IDs.
	NoSetGroups bool     // If true, don't set supplementary groups
}
Credential holds user and group identities to be assumed by a child process started by StartProcess.
type ProcAttr ¶ added in v0.9.1
type ProcAttr struct {
	Dir   string    // Current working directory.
	Env   []string  // Environment.
	Files []uintptr // File descriptors.
	Sys   *SysProcAttr
}
ProcAttr holds attributes that will be applied to a new process started by StartProcess.
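A minimal sketch of how a ProcAttr might be filled in and passed to ForkExec, assuming the standard library `syscall` package on a Unix system as a stand-in for this package. The helper `runShell`, the use of `/bin/sh`, and the call to `Wait4` (which is not listed in this package's index) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// runShell starts "/bin/sh -c script" with ForkExec, waits for it to finish,
// and returns its exit status.
func runShell(script string) (int, error) {
	attr := &syscall.ProcAttr{
		Dir: "/",                              // child's working directory
		Env: []string{"PATH=/usr/bin:/bin"},   // child's environment
		Files: []uintptr{                      // child's fds 0, 1 and 2
			uintptr(syscall.Stdin),
			uintptr(syscall.Stdout),
			uintptr(syscall.Stderr),
		},
	}

	pid, err := syscall.ForkExec("/bin/sh", []string{"sh", "-c", script}, attr)
	if err != nil {
		return 0, err
	}

	// Wait for the child, retrying if the wait is interrupted by a signal.
	var ws syscall.WaitStatus
	for {
		_, err = syscall.Wait4(pid, &ws, 0, nil)
		if err != syscall.EINTR {
			break
		}
	}
	if err != nil {
		return 0, err
	}
	return ws.ExitStatus(), nil
}

func main() {
	code, err := runShell("exit 3")
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Println("exit status:", code)
}
```

The index of a descriptor in Files is the descriptor number the child sees, so the three entries above become the child's standard input, output, and error.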
type Signal ¶ added in v0.8.10
type Signal int
A Signal is a number describing a process signal. It implements the os.Signal interface.
type SysProcAttr ¶ added in v0.9.1
type SysProcAttr struct {
	Chroot     string      // Chroot.
	Credential *Credential // Credential.
	// Ptrace tells the child to call ptrace(PTRACE_TRACEME).
	// Call runtime.LockOSThread before starting a process with this set,
	// and don't call UnlockOSThread until done with PtraceSyscall calls.
	Ptrace bool
	Setsid bool // Create session.
	// Setpgid sets the process group ID of the child to Pgid,
	// or, if Pgid == 0, to the new child's process ID.
	Setpgid bool
	// Setctty sets the controlling terminal of the child to
	// file descriptor Ctty. Ctty must be a descriptor number
	// in the child process: an index into ProcAttr.Files.
	// This is only meaningful if Setsid is true.
	Setctty bool
	Noctty  bool // Detach fd 0 from controlling terminal
	Ctty    int  // Controlling TTY fd
	// Foreground places the child process group in the foreground.
	// This implies Setpgid. The Ctty field must be set to
	// the descriptor of the controlling TTY.
	// Unlike Setctty, in this case Ctty must be a descriptor
	// number in the parent process.
	Foreground bool
	Pgid       int // Child's process group ID if Setpgid.
	// Pdeathsig, if non-zero, is a signal that the kernel will send to
	// the child process when the creating thread dies. Note that the signal
	// is sent on thread termination, which may happen before process termination.
	// There are more details at https://go.dev/issue/27505.
	Pdeathsig   Signal
	Cloneflags  uintptr        // Flags for clone calls (Linux only)
	UidMappings []SysProcIDMap // User ID mappings for user namespaces.
	GidMappings []SysProcIDMap // Group ID mappings for user namespaces.
	// GidMappingsEnableSetgroups enables the setgroups syscall.
	// If false, the setgroups syscall will be disabled for the child process.
	// This parameter is a no-op if GidMappings == nil. Otherwise, for unprivileged
	// users this should be set to false for the mappings to work.
	GidMappingsEnableSetgroups bool
	AmbientCaps                []uintptr // Ambient capabilities (Linux only)
	UseCgroupFD                bool      // Whether to make use of the CgroupFD field.
	CgroupFD                   int       // File descriptor of a cgroup to put the new process into.
}
type SysProcIDMap ¶ added in v0.9.1
type SysProcIDMap struct {
	ContainerID int // Container ID.
	HostID      int // Host ID.
	Size        int // Size.
}
SysProcIDMap holds Container ID to Host ID mappings used for User Namespaces in Linux. See user_namespaces(7).