Documentation ¶
Overview ¶
Package framework contains the code related to DXF, the distributed execution framework.
The goal of DXF is to implement unified scheduling and distributed execution of tasks, and to provide unified resource management for both the cluster as a whole and for individual tasks, which better meets users' expectations of resource usage.
DXF runs on all nodes of the TiDB cluster: one node is the owner, responsible for scheduling resources and tasks, while all other nodes are followers, responsible for executing tasks.
The components that are unique to the owner node: ¶
- scheduler manager: manages the schedulers of tasks; it is also responsible for post-cleanup of tasks, such as moving a finished task to the history table and cleaning up intermediate data files generated by global sort.
- task schedulers: responsible for managing the lifecycle and scheduling of tasks.
- node manager: manages nodes that can be used to run tasks.
- balancer: responsible for balancing the load of subtasks on all nodes.
The components that are running on all nodes: ¶
- task executor manager: manages all task executors.
- slot manager: manages the slots or resources that can be used to run tasks.
- task executor: responsible for executing the task.
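As a rough illustration of what the slot manager above does, here is a minimal sketch of per-node slot accounting. The type and method names are invented for this example and are not the actual DXF API:

```go
package main

import "fmt"

// slotManager tracks how many of a node's slots are free.
// Illustrative only; the real DXF slot manager is more involved.
type slotManager struct {
	capacity  int // total slots = number of cores on the node
	available int
}

// alloc reserves n slots for a subtask, failing if not enough are free.
func (m *slotManager) alloc(n int) bool {
	if n > m.available {
		return false
	}
	m.available -= n
	return true
}

// free returns n slots to the pool when a subtask finishes.
func (m *slotManager) free(n int) { m.available += n }

func main() {
	m := &slotManager{capacity: 16, available: 16} // a 16-core node
	fmt.Println(m.alloc(10), m.alloc(10))          // second alloc fails: only 6 slots left
}
```

Reserving before running and releasing afterwards is what lets the node admit subtasks from multiple tasks without overcommitting its cores.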
Resource abstraction ¶
To fully utilize the resources and avoid resource overuse, we abstract the resources in the TiDB cluster as slots. A slot is the minimum granularity of node resources. For each node, slot count = number of cores, and each slot represents:
- one core on the node
- 1/number-of-cores of the total memory of the node
- 1/number-of-cores of the total disk space of the node. Note: disk space is not considered during scheduling right now.
For example, if a node has 16 cores, 128GB memory, and 1TB disk space, then one slot represents: 1 core, 8GB memory, and 64GB disk space.
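The per-slot arithmetic above can be written out directly; this small helper is illustrative, not part of DXF:

```go
package main

import "fmt"

// perSlot splits a node's resources evenly across its slots.
// Slot count equals the number of cores, so each slot carries one
// core plus a proportional share of memory and disk.
func perSlot(cores int, memGB, diskGB float64) (memPerSlot, diskPerSlot float64) {
	return memGB / float64(cores), diskGB / float64(cores)
}

func main() {
	// The example from the text: 16 cores, 128GB memory, 1TB (1024GB) disk.
	mem, disk := perSlot(16, 128, 1024)
	fmt.Printf("one slot = 1 core, %gGB memory, %gGB disk\n", mem, disk) // 8GB memory, 64GB disk
}
```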
The maximum number of slots that can be used by a task is determined by its concurrency and its target scope.
To better describe the resources that a task can use, we define a stripe as a slot group consisting of one slot on each node of the same target scope. As subtasks of the same task are not allowed to run concurrently on a single node, the maximum resource that a task can use is task-concurrency stripes.
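Under the stripe definition above, a task's cluster-wide slot ceiling follows by simple arithmetic. A sketch, with invented names:

```go
package main

import "fmt"

// maxSlots returns the cluster-wide ceiling on slots a task can use:
// at most `concurrency` stripes, where one stripe is one slot on each
// node of the target scope. Per node, the task occupies at most
// `concurrency` slots, since its subtasks never run concurrently on
// the same node.
func maxSlots(concurrency, nodesInScope int) int {
	return concurrency * nodesInScope
}

func main() {
	// A task with concurrency 8 on a 3-node scope can use at most
	// 8 stripes = 24 slots cluster-wide.
	fmt.Println(maxSlots(8, 3)) // 24
}
```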
Service scope ¶
To isolate resources and prevent DXF tasks from interfering with online transactions, each node in the cluster has a service scope; the default scope is empty.
A DXF task can only run on the nodes with the same scope as the target scope of the task.
For historical reasons, there is a special service scope 'background'. When scheduling a task with an empty target scope, the task runs on nodes of the 'background' scope if such nodes exist; otherwise, it runs on nodes of the same (empty) scope.
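The scope-matching rule, including the 'background' fallback, can be sketched as follows; this is an illustration of the rule described above, not the actual DXF node-selection code:

```go
package main

import (
	"fmt"
	"sort"
)

// eligibleNodes returns the nodes a task may run on: nodes whose
// service scope matches the task's target scope, with empty target
// scope falling back to 'background' nodes when any exist.
func eligibleNodes(nodeScopes map[string]string, targetScope string) []string {
	scope := targetScope
	if targetScope == "" {
		for _, s := range nodeScopes {
			if s == "background" {
				scope = "background" // prefer 'background' nodes if any exist
				break
			}
		}
	}
	var out []string
	for node, s := range nodeScopes {
		if s == scope {
			out = append(out, node)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	nodes := map[string]string{"tidb-0": "", "tidb-1": "background", "tidb-2": "background"}
	// An empty target scope falls back to the 'background' nodes.
	fmt.Println(eligibleNodes(nodes, "")) // [tidb-1 tidb-2]
}
```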
Task abstraction ¶
A task is abstracted as multiple steps that run in sequence; each step contains multiple subtasks that run in parallel, for example:
task
├── step1
│   ├── subtask1
│   ├── subtask2
│   └── subtask3
└── step2
    ├── subtask1
    ├── subtask2
    └── subtask3
For the steps of a specific task type, see step.go for more detail.
Task order ¶
As the resources are limited, we need to schedule tasks in a certain order.
In DXF, we manage to run tasks of higher ranking first, and then run tasks of lower ranking. A task of higher ranking might preempt the resources of a task of lower ranking.
Note: we use the word rank instead of priority because priority is only one of the fields that determine the order of tasks. Task rank is defined by:
priority asc, create_time asc, id asc.
Task state machine ¶
Note: if a task fails during running, it ends in the `reverted` state. The `failed` state means the framework cannot run the task at all, e.g. an invalid task type or a fatal scheduler init error.
normal execution state transition:
┌──────┐
│failed│
└──────┘
   ▲
┌──┴────┐     ┌───────┐     ┌────────┐
│pending├────►│running├────►│succeed │
└──┬────┘     └──┬┬───┘     └────────┘
   │             ││         ┌─────────┐     ┌────────┐
   │             │└────────►│reverting├────►│reverted│
   │             ▼          └─────────┘     └────────┘
   │          ┌──────────┐       ▲
   └─────────►│cancelling├───────┘
              └──────────┘
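The normal-execution edges of the diagram can be encoded as a lookup table; a sketch that omits the pause/resume and modifying transitions covered below:

```go
package main

import "fmt"

// validTaskTransitions encodes the normal-execution edges of the task
// state machine (pause/resume and modifying transitions omitted).
var validTaskTransitions = map[string][]string{
	"pending":    {"running", "cancelling", "failed"},
	"running":    {"succeed", "reverting", "cancelling"},
	"cancelling": {"reverting"},
	"reverting":  {"reverted"},
	// succeed, reverted, and failed are terminal.
}

// canTransit reports whether a task may move from one state to another.
func canTransit(from, to string) bool {
	for _, s := range validTaskTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransit("pending", "running")) // true
	fmt.Println(canTransit("succeed", "running")) // false: succeed is terminal
}
```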
pause/resume state transition: as we don't know the state of the task before `paused`, the state after `resuming` is always `running`.
┌───────┐
│pending├──┐
└───────┘  │     ┌───────┐       ┌──────┐
           ├────►│pausing├──────►│paused│
┌───────┐  │     └───────┘       └───┬──┘
│running├──┘                         │
└───▲───┘        ┌────────┐          │
    └────────────┤resuming│◄─────────┘
                 └────────┘
modifying state transition:
┌───────┐
│pending├──┐
└───────┘  │
┌───────┐  │     ┌─────────┐
│running├──┼────►│modifying├────► original state
└───────┘  │     └─────────┘
┌───────┐  │
│paused ├──┘
└───────┘
Subtask state machine ¶
NOTE: `running` -> `pending` happens only when a node is taken as dead and its running subtask is balanced to another node; because the subtask is idempotent, this lets the subtask be scheduled to another node again. It is NOT a normal state transition.
                 ┌──────────────┐
                 │          ┌───┴──┐
                 │ ┌───────►│paused│
                 ▼ │        └──────┘
┌───────┐    ┌───┴───┐    ┌───────┐
│pending├───►│running├───►│succeed│
└───────┘    └┬──┬───┘    └───────┘
    ▲         │  │        ┌──────┐
    └─────────┘  ├───────►│failed│
                 │        └──────┘
                 │        ┌────────┐
                 └───────►│canceled│
                          └────────┘
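The subtask edges, including the abnormal rebalance edge, can be encoded the same way; an illustrative sketch:

```go
package main

import "fmt"

// validSubtaskTransitions encodes the subtask state machine.
// "running" -> "pending" is the abnormal edge: it is taken only when a
// node is considered dead and its (idempotent) subtask is rebalanced
// to another node.
var validSubtaskTransitions = map[string][]string{
	"pending": {"running"},
	"running": {"succeed", "failed", "canceled", "paused", "pending"},
	"paused":  {"running"},
	// succeed, failed, and canceled are terminal.
}

// canTransit reports whether a subtask may move from one state to another.
func canTransit(from, to string) bool {
	for _, s := range validSubtaskTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransit("running", "pending")) // true, but only on rebalance
	fmt.Println(canTransit("failed", "running"))  // false: failed is terminal
}
```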
Directories ¶
Path | Synopsis |
---|---|
| Package mock is a generated GoMock package. |
execute | Package mockexecute is a generated GoMock package. |
mock | Package mock is a generated GoMock package. |