GO-2024-3284: PingCAP TiDB nil pointer dereference in github.com/pingcap/tidb

timer/

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/pingcap/tidb

README ¶

Timer framework usage

Timer framework usage

Introduction

Timer framework is an internal framework for TiDB to run tasks periodically. You can use it if you have below requirements:

Run a task periodically and the scheduling policy should fit some specified semantics. For example, run a task every 5 minutes, or run a task every day at 00:00:00.
Need to run the task in a distributed environment. For example, you have 3 TiDB servers, and you want to run the task on only one of them. And you also want to do some recovery work when the TiDB process restarts.
Need to read the persisted schedule runtime information of the timer to do some work somewhere like monitoring.
Your trigger action needs to meet some message delivery semantics such as at least once, at most once, or exactly once.

What timer framework is not suitable for you:

Some work that the frequency is too high. For example, you want to run a task every 1 second. Though you can set the scheduling policy with an interval of "1s", it is not efficient because every time it triggers, the framework will make some calls to the store. In this case, you should use a timer of Golang in your own code.
The timer framework only helps you to schedule a "trigger" action but not to schedule some heavy jobs across the nodes. That means the trigger action should be light-weighted and can be finished in a short time. If you want to schedule some heavy jobs, you should use some other distributed job scheduler like disttask in another package.

Usage

Timer client operations

Create a timer client

A client must be created from a store. The store is used to persist the runtime information of the timer. Currently, we use TiDB system table to store the information in the production environment. But if you want to do some simple tests in your own test case, you can use a memory store, for example:

package main

import (
	"fmt"
	"github.com/pingcap/tidb/timer/api"
)

func main() {
	store := api.NewMemoryTimerStore()
	client := api.NewDefaultTimerClient(store)
	fmt.Println(client.GetDefaultNamespace())
}

The method client.GetDefaultNamespace() returns the current namespace of the client. It is used to distinguish different tenants in the future, but now it is always default.

You can check the method tablestore.NewTableTimerStore to see how to create a store from a TiDB system table.

Create a timer with the client

If you want to do something every 1 hour, you can create a timer like this:

timer, err := client.CreateTimer(ctx, api.TimerSpec{
    Key:             "/your/timer/key",
    SchedPolicyType: api.SchedEventInterval,
    SchedPolicyExpr: "1h",
    HookClass:       "your.hook.class.name",
    Watermark:       time.Now(),
    Data:            []byte("yourdata"),
    Enable:          true,
})

if err != nil {
    // handle err
}

fmt.Printf("created timer id: %s\n", timer.ID)

The specified Key field is unique under a namespace. If you want to create a timer with the same key, you must delete the old one with the same key first if it exists.

To custom the timer schedule policy, you can specify the values of fields SchedPolicyType and SchedPolicyExpr. Currently, we support 2 types of schedule policy:

INTERVAL, triggers with a fixed interval, for example, 1h means every 1 hour.
CRON, triggers with a cron expression, for example, 0 0 0 * * * means every day at 00:00:00.

The field HookClass is optional. It tells the framework which hook class to use to trigger the action. If you do not specify it, the framework will set the timer's event status to TRIGGER without calling any custom hook class code.

The field Watermark is optional. If it is set, the timer will not trigger until there is an event not triggered between the watermark and now. If it is not set, the timer will try triggering the timer immediately after it is created.

The field Enable is optional. But if you want to keep the timer to schedule the new events, you must set it to true. If you set this value to false, the timer will stop scheduling any new event until you set it to true again.

The Data field is also optional, you can set any binary data you want. When the timer triggers, the framework will pass the data to the hook class.

When a timer is created successfully, the field ID will be assigned automatically. You can use this ID to do some operations on the timer.

Query timers with the client

You can use the client to query timer(s), for example:

// query timers with the specified key prefix
timers, err := client.GetTimers(context.TODO(), api.WithKeyPrefix("/your/timer/"))
if err != nil {
	// handle err ...
}

// query timer with the specified key
timer, err := client.GetTimerByKey(ctx, "/your/timer/key")
if err != nil {
    // handle err ...
}

// query timer with the specified id
timer, err := client.GetTimerByID(ctx, "123")
if err != nil {
    // handle err ...
}

Update a timer

You can update a timer's metadata with the method UpdateTimer, for example:

// update the schedule policy
if err := client.UpdateTimer(ctx, timer.ID, api.WithSetSchedExpr(api.SchedEventCron, "0 0 * * * *")); err != nil {
    // handle error
}

// update the watermark
if err := client.UpdateTimer(ctx, timer.ID, api.WithSetWatermark(time.Now())); err != nil {
    // handle error
}

Delete a timer

You can delete a timer with the method DeleteTimer, for example:

exist, err := client.DeleteTimer(ctx, timer.ID)
if err != nil {
    // handle error
}

An extra value exist is returned. If the timer exists, it will be deleted and exist will be true. Otherwise, exist will be false.

Set up a runtime to schedule timers

If you only create a timer through a client, the timer will not run by default. You must set up a timer runtime to schedule the timers. You can create a new runtime.TimerGroupRuntime and then start it, for example:

package main

import "github.com/pingcap/tidb/timer/api"

func main() {
    ctx := ... // some go context
    store := ... // Create a timer store.
    rt := runtime.NewTimerRuntimeBuilder("myGroup", store).
        RegisterHookFactory("your.hook.class.name", MyTimerHookFactory).
        SetCond(&api.TimerCond{
            Key: api.NewOptionalVal("/"), KeyPrefix: true,
        }).
        Build()
	
    rt.Start()
    defer rt.Stop()

    <-ctx.Done()
}

We use a builder to build a new timer. You can specify a group name when creating it. The group name is only used by monitoring now. RegisterHookFactory registers your custom hook class which will be called when the runtime trys to trigger a event. SetCond sets a condition and the runtime will only schedule the timers which match the condition. If you do not set the condition, the runtime will schedule all the timers.

If you start more than one timer runtimes with overlapped conditions, the hook functions have a probability to be called more than once. So it's better to use some distributed lock to ensure that only one runtime is running.

In the future, we are going to provide some preserved runtimes in TiDB system, so that you can use them directly without creating a new one.

Custom your hook class

You can implement your own hook class by implementing the interface api.HookClass, for example:

func MyTimerHookFactory(hookClass string, cli api.TimerClient) api.Hook {
    return &MyTimerHook{
        cli: cli,
    }
}

type MyTimerHook struct {
	cli api.TimerClient
}

func (h *MyTimerHook) Start() {
	// You should do some init works here.
}

func (h *MyTimerHook) Stop() {
	// You should do some clear works here.
}

func (h *MyTimerHook) OnPreSchedEvent(ctx context.Context, event api.TimerShedEvent) (api.PreSchedEventResult, error) {
	fmt.Printf("OnPreSchedEvent: %s\n", event.EventID())
	return api.PreSchedEventResult{}, nil
}

func (h *MyTimerHook) OnSchedEvent(ctx context.Context, event api.TimerShedEvent) error {
    fmt.Printf("OnSchedEvent: %s, event start: %s\n", event.EventID(), event.Timer().EventStart)
	return h.cli.CloseTimerEvent(ctx, event.Timer().ID, event.EventID())
}

The function MyTimerHookFactory is used as a parameter of RegisterHookFactory in the previous section. It is used to create a new hook instance when the runtime needs it. A parameter cli with type api.TimerClient will be provided to help to construct the hook, and the hook can use the client to do some further timer operations.

All the timers with the same hook class will share the same hook instance. When the runtime finds a timer that is ready to trigger, it will first find whether the corresponding hook is created. If not, it will call the hook factory to create a new hook instance. Then it will call the hook's Start method to do some preparation work. The Start method will only be called once. After that, for each trigger action, the runtime will call the hook's OnPreSchedEvent method to do some pre-trigger works.

OnPreSchedEvent returns a PreSchedEventResult object to tell the runtime what to do next. In most cases, we just need to return an empty PreSchedEventResult to continue the trigger action. But sometimes you may want to delay the schedule for some reason such as some schedule window limits required by business. You can delay the schedule like this:

func shouldDelaySchedule() bool {
	// your function to check whether to delay a timer's schedule
}

func (h *MyTimerHook) OnPreSchedEvent(ctx context.Context, event api.TimerShedEvent) (api.PreSchedEventResult, error) {
    if shouldDelaySchedule() {
        return api.PreSchedEventResult{
            Delay: time.Minute,
        }, nil
    }   
    return api.PreSchedEventResult{}, nil
}

In the above example, the delay time is 1 minute. After 1 minute, OnPreSchedEvent will be called again, and the trigger action will not continue until the method OnPreSchedEvent returns no delay.

If you want to carry some extra data to the trigger action, you can set the PreSchedEventResult's EventData field. The data will be passed to the trigger action. For example:

func (h *MyTimerHook) OnPreSchedEvent(ctx context.Context, event api.TimerShedEvent) (api.PreSchedEventResult, error) {
    if shouldDelaySchedule() {
        return api.PreSchedEventResult{
            EventData: []byte("mycustomeventdata"),
        }, nil
    }   
    return api.PreSchedEventResult{}, nil
}

The EventData will be persisted before the action is actually triggered. So you can use it to do some recovery work if the TiDB process restarts.

After PreSchedEventResult indicates to continue the trigger action, the event meta including the EventData will then be persisted to the store. After that, if you query this timer with the client, you can see the EventStatus changes to TRIGGER from IDLE, and the EventID is also not empty.

Then the hook method OnSchedEvent will be called. You can do some real trigger action here. For example, you can send a message to a message queue or submit some job to an outer job system. If your job is not very heavy, you can even do it directly in this method. Please notice that the method OnSchedEvent will only be called once if it returns no error. Only some cases may cause the method OnSchedEvent to be called again:

The method OnSchedEvent returns an error.
The runtime restarts and finds a timer's EventStatus is TRIGGER.

You should close the timer event by calling the client's method CloseTimerEvent to reset the timer's state to IDLE, or the timer will not be able to schedule new events. By default, closing a timer will also reset the Watermark field to the value of EventStart, but you can specify the next watermark manually. For example:

func (h *MyTimerHook) OnSchedEvent(ctx context.Context, event api.TimerShedEvent) error {
    // do some things ...
	
	// close the timer event so that the next event can be scheduled
    return h.cli.CloseTimerEvent(
        ctx, 
        event.Timer().ID, 
        event.EventID(),  
        api.WithSetWatermark(time.Now()), // use the current time as the next watermark instead of the event start time
    )
}

FAQs

Q: What is the difference between the attributes ID and Key of the timer, they are all unique. A: The key difference is that the Key is specified by the user but the ID is generated by the framework. The ID is the real identifier of a timer because it is "absolutely" unique because any two timers cannot have the same ID even if one timer is deleted. But it is allowed to create a timer with a Key that is the same as another timer which deleted before, and it is also allowed to create timers with the same Key in different namespaces. So it is better to use ID to log events from a certain timer. But Key is also useful because it can be meaningful. Key can be used to do some semantic works like querying or ensuring the idempotent creation of timers for certain uses.
Q: What is the behavior of the timer framework if the TiDB restarts, especially if it shuts down for a long time? A: When the TiDB restarts (which means the timer runtime restarts), the runtime will reload all timers from the store and check their state. If a timer's EventStatus is TRIGGER, the runtime will try to trigger the event again by calling the hook's method OnSchedEvent (OnPreSchedEvent will not be called). If the EventStatus is IDLE, the runtime will try to trigger the event if the timer is time to trigger. Notice that when a runtime shuts down for a long time, there may be many time points to trigger between Watermark and the current time, but the hook will still be triggered only once. For example, one timer has a cron expression 0 * * * * * indicating that it should trigger every one hour. If the TiDB shuts down, leave this timer with a watermark 2022-01-01 09:58:00 and then restart at 2022-01-01 12:01:00, and it tries to trigger the timer. However, there are 3 time points which are not triggered. The hook will still be invoked once with the EventStart value 2022-01-01 12:01:00 and Watmerkark value 2022-01-01 09:58:00. It is up to the hook class to determine how to handle this case. For example, it can do the trigger action 3 times in one OnSchedEvent call or just do it once and then close the timer with the new watermark 2022-01-01 10:00:00. For the second case, the timer will be triggered again after a very short time because the next expected trigger time 2022-01-01 11:00:00 is still before the current time.
Q: Is it thread-safe for the hook call such as Start, Stop, OnPreSchedEvent, OnSchedEvent? A: Yes, it is thread-safe. They are called in one goroutine now, there is no data race without a lock for variables that are only read or written in these methods. But if you create a goroutine manually and access these variables in the background, you should also use a lock to protect them. In the future, the trigger method such as OnPreSchedEvent and OnSchedEvent may be called in concurrency of different timers, but it will also be guaranteed that these methods for the same timer will be called in sequence and guarantees the happens-before semantics.
Q: Will the hook calls from different timers be blocked by each other if some hook calls take a long time? A: Yes, it may happen if two timers share the same hook and one timer's call is slow. The reason is timers that share one hook will be triggered in one goroutine (or maybe a pool with some fixed goroutines in the future). To resolve this issue, you can just do the trigger action background and return a non-error value immediately in the foreground call. Notice that if you do this, you should retry yourself when the trigger action fails.

Directories ¶

Path	Synopsis
api
metrics
runtime
tablestore

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL