Documentation ¶
Overview ¶
Package discovery provides a way to discover all tablets e.g. within a specific shard and monitor their current health.
Use the HealthCheck object to query for tablets and their health.
For an example how to use the HealthCheck object, see worker/topo_utils.go.
Tablets have to be manually added to the HealthCheck using AddTablet(). Alternatively, use a Watcher implementation which will constantly watch a source (e.g. the topology) and add and remove tablets as they are added or removed from the source. For a Watcher example have a look at NewShardReplicationWatcher().
Each HealthCheck has a HealthCheckStatsListener that will receive notification of when tablets go up and down. TabletStatsCache is one implementation, that caches the known tablets and the healthy ones per keyspace/shard/tabletType.
Internally, the HealthCheck module is connected to each tablet and has a streaming RPC (StreamHealth) open to receive periodic health infos.
Index ¶
- Constants
- func IsReplicationLagHigh(tabletStats *TabletStats) bool
- func TabletToMapKey(tablet *topodatapb.Tablet) string
- func TrivialStatsUpdate(o, n *TabletStats) bool
- type FakeHealthCheck
- func (fhc *FakeHealthCheck) AddTablet(tablet *topodatapb.Tablet, name string)
- func (fhc *FakeHealthCheck) AddTestTablet(cell, host string, port int32, keyspace, shard string, ...) *sandboxconn.SandboxConn
- func (fhc *FakeHealthCheck) CacheStatus() TabletsCacheStatusList
- func (fhc *FakeHealthCheck) Close() error
- func (fhc *FakeHealthCheck) GetAllTablets() map[string]*topodatapb.Tablet
- func (fhc *FakeHealthCheck) GetConnection(key string) queryservice.QueryService
- func (fhc *FakeHealthCheck) RegisterStats()
- func (fhc *FakeHealthCheck) RemoveTablet(tablet *topodatapb.Tablet)
- func (fhc *FakeHealthCheck) Reset()
- func (fhc *FakeHealthCheck) SetListener(listener HealthCheckStatsListener, sendDownEvents bool)
- func (fhc *FakeHealthCheck) WaitForInitialStatsUpdates()
- type FilterByShard
- type HealthCheck
- type HealthCheckImpl
- func (hc *HealthCheckImpl) AddTablet(tablet *topodatapb.Tablet, name string)
- func (hc *HealthCheckImpl) CacheStatus() TabletsCacheStatusList
- func (hc *HealthCheckImpl) Close() error
- func (hc *HealthCheckImpl) GetConnection(key string) queryservice.QueryService
- func (hc *HealthCheckImpl) RegisterStats()
- func (hc *HealthCheckImpl) RemoveTablet(tablet *topodatapb.Tablet)
- func (hc *HealthCheckImpl) SetListener(listener HealthCheckStatsListener, sendDownEvents bool)
- func (hc *HealthCheckImpl) WaitForInitialStatsUpdates()
- type HealthCheckStatsListener
- type TabletRecorder
- type TabletStats
- type TabletStatsCache
- func (tc *TabletStatsCache) GetHealthyTabletStats(keyspace, shard string, tabletType topodatapb.TabletType) []TabletStats
- func (tc *TabletStatsCache) GetTabletStats(keyspace, shard string, tabletType topodatapb.TabletType) []TabletStats
- func (tc *TabletStatsCache) ResetForTesting()
- func (tc *TabletStatsCache) StatsUpdate(ts *TabletStats)
- func (tc *TabletStatsCache) WaitForAllServingTablets(ctx context.Context, ts topo.SrvTopoServer, cell string, ...) error
- func (tc *TabletStatsCache) WaitForTablets(ctx context.Context, cell, keyspace, shard string, ...) error
- type TabletStatsList
- type TabletsCacheStatus
- type TabletsCacheStatusList
- type TopologyWatcher
- func NewCellTabletsWatcher(topoServer topo.Server, tr TabletRecorder, cell string, ...) *TopologyWatcher
- func NewShardReplicationWatcher(topoServer topo.Server, tr TabletRecorder, cell, keyspace, shard string, ...) *TopologyWatcher
- func NewTopologyWatcher(topoServer topo.Server, tr TabletRecorder, cell string, ...) *TopologyWatcher
Constants ¶
const ( DefaultHealthCheckConnTimeout = 1 * time.Minute DefaultHealthCheckRetryDelay = 5 * time.Second DefaultHealthCheckTimeout = 1 * time.Minute )
See the documentation for NewHealthCheck below for an explanation of these parameters.
const ( // DefaultTopoReadConcurrency can be used as default value for the topoReadConcurrency parameter of a TopologyWatcher. DefaultTopoReadConcurrency int = 5 // DefaultTopologyWatcherRefreshInterval can be used as the default value for // the refresh interval of a topology watcher. DefaultTopologyWatcherRefreshInterval = 1 * time.Minute // HealthCheckTemplate is the HTML code to display a TabletsCacheStatusList HealthCheckTemplate = `` /* 649-byte string literal not displayed */ )
Variables ¶
This section is empty.
Functions ¶
func IsReplicationLagHigh ¶
func IsReplicationLagHigh(tabletStats *TabletStats) bool
IsReplicationLagHigh verifies that the given TabletStats refers to a tablet with high replication lag, i.e. higher than the configured discovery_low_replication_lag flag.
func TabletToMapKey ¶
func TabletToMapKey(tablet *topodatapb.Tablet) string
TabletToMapKey creates a key to the map from tablet's host and ports. It should only be used in discovery and related module.
func TrivialStatsUpdate ¶
func TrivialStatsUpdate(o, n *TabletStats) bool
TrivialStatsUpdate returns true iff the old and new TabletStats haven't changed enough to warrant re-calling FilterByReplicationLag.
Types ¶
type FakeHealthCheck ¶
type FakeHealthCheck struct {
// contains filtered or unexported fields
}
FakeHealthCheck implements discovery.HealthCheck.
func NewFakeHealthCheck ¶
func NewFakeHealthCheck() *FakeHealthCheck
NewFakeHealthCheck returns the fake healthcheck object.
func (*FakeHealthCheck) AddTablet ¶
func (fhc *FakeHealthCheck) AddTablet(tablet *topodatapb.Tablet, name string)
AddTablet adds the tablet and calls the listener.
func (*FakeHealthCheck) AddTestTablet ¶
func (fhc *FakeHealthCheck) AddTestTablet(cell, host string, port int32, keyspace, shard string, tabletType topodatapb.TabletType, serving bool, reparentTS int64, err error) *sandboxconn.SandboxConn
AddTestTablet inserts a fake entry into FakeHealthCheck. The Tablet can be talked to using the provided connection. The Listener is called, as if AddTablet had been called.
func (*FakeHealthCheck) CacheStatus ¶
func (fhc *FakeHealthCheck) CacheStatus() TabletsCacheStatusList
CacheStatus is not implemented.
func (*FakeHealthCheck) GetAllTablets ¶
func (fhc *FakeHealthCheck) GetAllTablets() map[string]*topodatapb.Tablet
GetAllTablets returns all the tablets we have.
func (*FakeHealthCheck) GetConnection ¶
func (fhc *FakeHealthCheck) GetConnection(key string) queryservice.QueryService
GetConnection returns the TabletConn of the given tablet.
func (*FakeHealthCheck) RegisterStats ¶
func (fhc *FakeHealthCheck) RegisterStats()
RegisterStats is not implemented.
func (*FakeHealthCheck) RemoveTablet ¶
func (fhc *FakeHealthCheck) RemoveTablet(tablet *topodatapb.Tablet)
RemoveTablet removes the tablet.
func (*FakeHealthCheck) Reset ¶
func (fhc *FakeHealthCheck) Reset()
Reset cleans up the internal state.
func (*FakeHealthCheck) SetListener ¶
func (fhc *FakeHealthCheck) SetListener(listener HealthCheckStatsListener, sendDownEvents bool)
SetListener is not implemented.
func (*FakeHealthCheck) WaitForInitialStatsUpdates ¶
func (fhc *FakeHealthCheck) WaitForInitialStatsUpdates()
WaitForInitialStatsUpdates is not implemented.
type FilterByShard ¶
type FilterByShard struct {
// contains filtered or unexported fields
}
FilterByShard is a TabletRecorder filter that filters tablets by keyspace/shard.
func NewFilterByShard ¶
func NewFilterByShard(tr TabletRecorder, filters []string) (*FilterByShard, error)
NewFilterByShard creates a new FilterByShard on top of an existing TabletRecorder. Each filter is a keyspace|shard entry, where shard can either be a shard name, or a keyrange. All tablets that match at least one keyspace|shard tuple will be forwarded to the underlying TabletRecorder.
func (*FilterByShard) AddTablet ¶
func (fbs *FilterByShard) AddTablet(tablet *topodatapb.Tablet, name string)
AddTablet is part of the TabletRecorder interface.
func (*FilterByShard) RemoveTablet ¶
func (fbs *FilterByShard) RemoveTablet(tablet *topodatapb.Tablet)
RemoveTablet is part of the TabletRecorder interface.
type HealthCheck ¶
type HealthCheck interface { // TabletRecorder interface adds AddTablet and RemoveTablet methods. // AddTablet adds the tablet, and starts health check on it. // RemoveTablet removes the tablet, and stops its StreamHealth RPC. TabletRecorder // RegisterStats registers the connection counts stats. // It can only be called on one Healthcheck object per process. RegisterStats() // SetListener sets the listener for healthcheck // updates. sendDownEvents is used when a tablet changes type // (from replica to master for instance). If the listener // wants two events (Up=false on old type, Up=True on new // type), sendDownEvents should be set. Otherwise, the // healthcheck will only send one event (Up=true on new type). // // Note that the default implementation requires to set the // listener before any tablets are added to the healthcheck. SetListener(listener HealthCheckStatsListener, sendDownEvents bool) // WaitForInitialStatsUpdates waits until all tablets added via // AddTablet() call were propagated to the listener via corresponding // StatsUpdate() calls. Note that code path from AddTablet() to // corresponding StatsUpdate() is asynchronous but not cancelable, thus // this function is also non-cancelable and can't return error. Also // note that all AddTablet() calls should happen before calling this // method. WaitForInitialStatsUpdates won't wait for StatsUpdate() calls // corresponding to AddTablet() calls made during its execution. WaitForInitialStatsUpdates() // GetConnection returns the TabletConn of the given tablet. GetConnection(key string) queryservice.QueryService // CacheStatus returns a displayable version of the cache. CacheStatus() TabletsCacheStatusList // Close stops the healthcheck. Close() error }
HealthCheck defines the interface of health checking module. The goal of this object is to maintain a StreamHealth RPC to a lot of tablets. Tablets are added / removed by calling the AddTablet / RemoveTablet methods (other discovery module objects can for instance watch the topology and call these).
Updates to the health of all registered tablet can be watched by registering a listener. To get the underlying "TabletConn" object which is used for each tablet, use the "GetConnection()" method below and pass in the Key string which is also sent to the listener in each update (as it is part of TabletStats).
func NewDefaultHealthCheck ¶
func NewDefaultHealthCheck() HealthCheck
NewDefaultHealthCheck creates a new HealthCheck object with a default configuration.
func NewHealthCheck ¶
func NewHealthCheck(connTimeout, retryDelay, healthCheckTimeout time.Duration) HealthCheck
NewHealthCheck creates a new HealthCheck object. Parameters: connTimeout.
The duration to wait until a health-check streaming connection is up. 0 means it should establish the connection in the background and return immediately.
retryDelay.
The duration to wait before retrying to connect (e.g. after a failed connection attempt).
healthCheckTimeout.
The duration for which we consider a health check response to be 'fresh'. If we don't get a health check response from a tablet for more than this duration, we consider the tablet not healthy.
type HealthCheckImpl ¶
type HealthCheckImpl struct {
// contains filtered or unexported fields
}
HealthCheckImpl performs health checking and notifies downstream components about any changes.
func (*HealthCheckImpl) AddTablet ¶
func (hc *HealthCheckImpl) AddTablet(tablet *topodatapb.Tablet, name string)
AddTablet adds the tablet, and starts health check. It does not block on making connection. name is an optional tag for the tablet, e.g. an alternative address.
func (*HealthCheckImpl) CacheStatus ¶
func (hc *HealthCheckImpl) CacheStatus() TabletsCacheStatusList
CacheStatus returns a displayable version of the cache.
func (*HealthCheckImpl) Close ¶
func (hc *HealthCheckImpl) Close() error
Close stops the healthcheck. After Close() returned, it's guaranteed that the listener isn't currently executing and won't be called again.
func (*HealthCheckImpl) GetConnection ¶
func (hc *HealthCheckImpl) GetConnection(key string) queryservice.QueryService
GetConnection returns the TabletConn of the given tablet.
func (*HealthCheckImpl) RegisterStats ¶
func (hc *HealthCheckImpl) RegisterStats()
RegisterStats registers the connection counts stats
func (*HealthCheckImpl) RemoveTablet ¶
func (hc *HealthCheckImpl) RemoveTablet(tablet *topodatapb.Tablet)
RemoveTablet removes the tablet, and stops the health check. It does not block.
func (*HealthCheckImpl) SetListener ¶
func (hc *HealthCheckImpl) SetListener(listener HealthCheckStatsListener, sendDownEvents bool)
SetListener sets the listener for healthcheck updates. It must be called after NewHealthCheck and before any tablets are added (either through AddTablet or through a Watcher).
func (*HealthCheckImpl) WaitForInitialStatsUpdates ¶
func (hc *HealthCheckImpl) WaitForInitialStatsUpdates()
WaitForInitialStatsUpdates waits until all tablets added via AddTablet() call were propagated to downstream via corresponding StatsUpdate() calls.
type HealthCheckStatsListener ¶
type HealthCheckStatsListener interface { // StatsUpdate is called when: // - a new tablet is known to the HealthCheck, and its first // streaming healthcheck is returned. (then ts.Up is true). // - a tablet is removed from the list of tablets we watch // (then ts.Up is false). // - a tablet dynamically changes its type. When registering the // listener, if sendDownEvents is true, two events are generated // (ts.Up false on the old type, ts.Up true on the new type). // If it is false, only one event is sent (ts.Up true on the new // type). StatsUpdate(*TabletStats) }
HealthCheckStatsListener is the listener to receive health check stats update.
type TabletRecorder ¶
type TabletRecorder interface { // AddTablet adds the tablet. // Name is an alternate name, like an address. AddTablet(tablet *topodatapb.Tablet, name string) // RemoveTablet removes the tablet. RemoveTablet(tablet *topodatapb.Tablet) }
TabletRecorder is the part of the HealthCheck interface that can add or remove tablets. We define it as a sub-interface here so we can add filters on tablets if needed.
type TabletStats ¶
type TabletStats struct { // Key uniquely identifies that serving tablet. It is computed // from the Tablet's record Hostname and PortMap. If a tablet // is restarted on different ports, its Key will be different. // Key is computed using the TabletToMapKey method below. // key can be used in GetConnection(). Key string // Tablet is the tablet object that was sent to HealthCheck.AddTablet. Tablet *topodatapb.Tablet // Name is an optional tag (e.g. alternative address) for the // tablet. It is supposed to represent the tablet as a task, // not as a process. For instance, it can be a // cell+keyspace+shard+tabletType+taskIndex value. Name string // Target is the current target as returned by the streaming // StreamHealth RPC. Target *querypb.Target // Up describes whether the tablet is added or removed. Up bool // Serving describes if the tablet can be serving traffic. Serving bool // TabletExternallyReparentedTimestamp is the last timestamp // that this tablet was either elected the master, or received // a TabletExternallyReparented event. It is set to 0 if the // tablet doesn't think it's a master. TabletExternallyReparentedTimestamp int64 // Stats is the current health status, as received by the // StreamHealth RPC (replication lag, ...). Stats *querypb.RealtimeStats // LastError is the error we last saw when trying to get the // tablet's healthcheck. LastError error }
TabletStats is returned when getting the set of tablets.
func FilterByReplicationLag ¶
func FilterByReplicationLag(tabletStatsList []*TabletStats) []*TabletStats
FilterByReplicationLag filters the list of TabletStats by TabletStats.Stats.SecondsBehindMaster. The algorithm (TabletStats that is non-serving or has error is ignored): - Return the list if there is 0 or 1 tablet. - Return the list if all tablets have <=30s lag. - Filter by replication lag: for each tablet, if the mean value without it is more than 0.7 of the mean value across all tablets, it is valid. - Make sure we return at least two tablets (if there is one with <2h replication lag). - If one tablet is removed, run above steps again in case there are two tablets with high replication lag. (It should cover most cases.) For example, lags of (5s, 10s, 15s, 120s) return the first three; lags of (30m, 35m, 40m, 45m) return all.
func RemoveUnhealthyTablets ¶
func RemoveUnhealthyTablets(tabletStatsList []TabletStats) []TabletStats
RemoveUnhealthyTablets filters all unhealthy tablets out. NOTE: Non-serving tablets are considered healthy.
func (*TabletStats) DeepEqual ¶
func (e *TabletStats) DeepEqual(f *TabletStats) bool
DeepEqual compares two TabletStats. Since we include protos, we need to use proto.Equal on these.
func (*TabletStats) String ¶
func (e *TabletStats) String() string
String is defined because we want to print a []*TabletStats array nicely.
type TabletStatsCache ¶
type TabletStatsCache struct {
// contains filtered or unexported fields
}
TabletStatsCache is a HealthCheckStatsListener that keeps both the current list of available TabletStats, and a serving list: - for master tablets, only the current master is kept. - for non-master tablets, we filter the list using FilterByReplicationLag. It keeps entries for all tablets in the cell it's configured to serve for, and for the master independently of which cell it's in. Note the healthy tablet computation is done when we receive a tablet update only, not at serving time. Also note the cache may not have the last entry received by the tablet. For instance, if a tablet was healthy, and is still healthy, we do not keep its new update.
func NewTabletStatsCache ¶
func NewTabletStatsCache(hc HealthCheck, cell string) *TabletStatsCache
NewTabletStatsCache creates a TabletStatsCache, and registers it as HealthCheckStatsListener of the provided healthcheck. Note we do the registration in this code to guarantee we call SetListener with sendDownEvents=true, as we need these events to maintain the integrity of our cache.
func NewTabletStatsCacheDoNotSetListener ¶
func NewTabletStatsCacheDoNotSetListener(cell string) *TabletStatsCache
NewTabletStatsCacheDoNotSetListener is identical to NewTabletStatsCache but does not automatically set the returned object as listener for "hc". Instead, it's up to the caller to ensure that TabletStatsCache.StatsUpdate() gets called properly. This is useful for chaining multiple listeners. When the caller sets its own listener on "hc", they must make sure that they set the parameter "sendDownEvents" to "true" or this cache won't properly remove tablets whose tablet type changes.
func (*TabletStatsCache) GetHealthyTabletStats ¶
func (tc *TabletStatsCache) GetHealthyTabletStats(keyspace, shard string, tabletType topodatapb.TabletType) []TabletStats
GetHealthyTabletStats returns only the healthy targets. The returned array is owned by the caller. For TabletType_MASTER, this will only return at most one entry, the most recent tablet of type master.
func (*TabletStatsCache) GetTabletStats ¶
func (tc *TabletStatsCache) GetTabletStats(keyspace, shard string, tabletType topodatapb.TabletType) []TabletStats
GetTabletStats returns the full list of available targets. The returned array is owned by the caller.
func (*TabletStatsCache) ResetForTesting ¶
func (tc *TabletStatsCache) ResetForTesting()
ResetForTesting is for use in tests only.
func (*TabletStatsCache) StatsUpdate ¶
func (tc *TabletStatsCache) StatsUpdate(ts *TabletStats)
StatsUpdate is part of the HealthCheckStatsListener interface.
func (*TabletStatsCache) WaitForAllServingTablets ¶
func (tc *TabletStatsCache) WaitForAllServingTablets(ctx context.Context, ts topo.SrvTopoServer, cell string, types []topodatapb.TabletType) error
WaitForAllServingTablets waits for at least one healthy serving tablet in the given cell for all keyspaces / shards before returning. It will return ctx.Err() if the context is canceled.
func (*TabletStatsCache) WaitForTablets ¶
func (tc *TabletStatsCache) WaitForTablets(ctx context.Context, cell, keyspace, shard string, types []topodatapb.TabletType) error
WaitForTablets waits for at least one tablet in the given cell / keyspace / shard before returning. The tablets do not have to be healthy. It will return ctx.Err() if the context is canceled.
type TabletStatsList ¶
type TabletStatsList []*TabletStats
TabletStatsList is used for sorting.
func (TabletStatsList) Less ¶
func (tsl TabletStatsList) Less(i, j int) bool
Less is part of sort.Interface
func (TabletStatsList) Swap ¶
func (tsl TabletStatsList) Swap(i, j int)
Swap is part of sort.Interface
type TabletsCacheStatus ¶
type TabletsCacheStatus struct { Cell string Target *querypb.Target TabletsStats TabletStatsList }
TabletsCacheStatus is the current tablets for a cell/target.
func (*TabletsCacheStatus) StatusAsHTML ¶
func (tcs *TabletsCacheStatus) StatusAsHTML() template.HTML
StatusAsHTML returns an HTML version of the status.
type TabletsCacheStatusList ¶
type TabletsCacheStatusList []*TabletsCacheStatus
TabletsCacheStatusList is used for sorting.
func (TabletsCacheStatusList) Len ¶
func (tcsl TabletsCacheStatusList) Len() int
Len is part of sort.Interface.
func (TabletsCacheStatusList) Less ¶
func (tcsl TabletsCacheStatusList) Less(i, j int) bool
Less is part of sort.Interface
func (TabletsCacheStatusList) Swap ¶
func (tcsl TabletsCacheStatusList) Swap(i, j int)
Swap is part of sort.Interface
type TopologyWatcher ¶
type TopologyWatcher struct {
// contains filtered or unexported fields
}
TopologyWatcher polls tablet from a configurable set of tablets periodically. When tablets are added / removed, it calls the TabletRecorder AddTablet / RemoveTablet interface appropriately.
func NewCellTabletsWatcher ¶
func NewCellTabletsWatcher(topoServer topo.Server, tr TabletRecorder, cell string, refreshInterval time.Duration, topoReadConcurrency int) *TopologyWatcher
NewCellTabletsWatcher returns a TopologyWatcher that monitors all the tablets in a cell, and starts refreshing.
func NewShardReplicationWatcher ¶
func NewShardReplicationWatcher(topoServer topo.Server, tr TabletRecorder, cell, keyspace, shard string, refreshInterval time.Duration, topoReadConcurrency int) *TopologyWatcher
NewShardReplicationWatcher returns a TopologyWatcher that monitors the tablets in a cell/keyspace/shard, and starts refreshing.
func NewTopologyWatcher ¶
func NewTopologyWatcher(topoServer topo.Server, tr TabletRecorder, cell string, refreshInterval time.Duration, topoReadConcurrency int, getTablets func(tw *TopologyWatcher) ([]*topodatapb.TabletAlias, error)) *TopologyWatcher
NewTopologyWatcher returns a TopologyWatcher that monitors all the tablets in a cell, and starts refreshing.
func (*TopologyWatcher) Stop ¶
func (tw *TopologyWatcher) Stop()
Stop stops the watcher. It does not clean up the tablets added to TabletRecorder.
func (*TopologyWatcher) WaitForInitialTopology ¶
func (tw *TopologyWatcher) WaitForInitialTopology() error
WaitForInitialTopology waits until the watcher reads all of the topology data for the first time and transfers the information to TabletRecorder via its AddTablet() method.