filter

package
v1.0.0
Published: Aug 23, 2023 License: Apache-2.0 Imports: 8 Imported by: 0

README

Table Filter

A table filter is an interface that decides, given its name, whether a table or schema should be accepted for some process.

This package defines the format that lets users specify filter criteria on the command line or in config files. It is used by all tools in the TiDB ecosystem.

Examples

package main

import (
    "fmt"

    "github.com/wuhuizuo/tidb6/util/table-filter"
    "github.com/spf13/pflag"
)

func main() {
    args := pflag.StringArrayP("filter", "f", []string{"*.*"}, "table filter")
    pflag.Parse()

    f, err := filter.Parse(*args)
    if err != nil {
        panic(err)
    }
    f = filter.CaseInsensitive(f)

    tables := []filter.Table{
        {Schema: "employees", Name: "employees"},
        {Schema: "employees", Name: "departments"},
        {Schema: "employees", Name: "dept_manager"},
        {Schema: "employees", Name: "dept_emp"},
        {Schema: "employees", Name: "titles"},
        {Schema: "employees", Name: "salaries"},
        {Schema: "AdventureWorks.Person", Name: "Person"},
        {Schema: "AdventureWorks.Person", Name: "Password"},
        {Schema: "AdventureWorks.Sales", Name: "SalesOrderDetail"},
        {Schema: "AdventureWorks.Sales", Name: "SalesOrderHeader"},
        {Schema: "AdventureWorks.Production", Name: "WorkOrder"},
        {Schema: "AdventureWorks.Production", Name: "WorkOrderRouting"},
        {Schema: "AdventureWorks.Production", Name: "ProductPhoto"},
        {Schema: "AdventureWorks.Production", Name: "TransactionHistory"},
        {Schema: "AdventureWorks.Production", Name: "TransactionHistoryArchive"},
    }

    for _, table := range tables {
        fmt.Printf("%5v: %v\n", f.MatchTable(table.Schema, table.Name), table)
    }
}

Try running it with ./main -f 'employees.*' -f '*.WorkOrder' and see the result.

Syntax

Allowlist

The input to the filter.Parse() function is a list of table filter rules. Each rule specifies the fully qualified name of a table to be accepted.

db1.tbl1
db2.tbl2
db3.tbl3

A plain name must consist only of valid identifier characters [0-9a-zA-Z$_\U00000080-\U0010ffff]+. All other ASCII characters are reserved. Some punctuation characters have special meanings, described below.
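
The character class above can be checked with a regular expression. The following is a minimal sketch using only the standard library; isPlainName and plainName are hypothetical names introduced for illustration and are not part of this package.

```go
package main

import (
	"fmt"
	"regexp"
)

// plainName mirrors the identifier character class quoted in the text.
// It is an illustrative check, not the package's own parser.
var plainName = regexp.MustCompile(`^[0-9a-zA-Z$_\x{80}-\x{10ffff}]+$`)

// isPlainName reports whether s is non-empty and consists only of
// identifier characters.
func isPlainName(s string) bool {
	return plainName.MatchString(s)
}

func main() {
	for _, s := range []string{"tbl_1", "caf\u00e9", "db.tbl", "a b"} {
		fmt.Printf("%+q: %v\n", s, isPlainName(s))
	}
}
```

Names containing a reserved character, such as the dot in db.tbl, fail this check and need escaping or quoting.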

Wildcards

Each part of the name can contain wildcard symbols, as in fnmatch(3):

  • * — matches zero or more characters
  • ? — matches one character
  • [a-z] — matches one character between “a” and “z” inclusive
  • [!a-z] — matches one character except “a” to “z”.

db[0-9].tbl[0-9][0-9]
data.*
*.backup_*
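
To make the wildcard semantics concrete, here is a rough sketch that translates one wildcard pattern into a Go regular expression. It is an assumption-laden illustration, not how this package is implemented; matchWildcard and wildcardToRegexp are hypothetical helpers, and escapes or quoting inside the pattern are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp converts *, ?, [a-z] and [!a-z] into an anchored
// regular expression. Every other character is matched literally.
func wildcardToRegexp(pattern string) string {
	var b strings.Builder
	b.WriteString("(?s)^")
	runes := []rune(pattern)
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case '*':
			b.WriteString(".*") // zero or more code points
		case '?':
			b.WriteString(".") // exactly one code point
		case '[':
			b.WriteString("[")
			j := i + 1
			if j < len(runes) && runes[j] == '!' {
				b.WriteString("^") // [!a-z] negates the class
				j++
			}
			for j < len(runes) && runes[j] != ']' {
				b.WriteRune(runes[j])
				j++
			}
			b.WriteString("]")
			i = j // continue after the closing bracket
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return b.String()
}

// matchWildcard reports whether name matches pattern; a pattern that
// does not compile is treated as matching nothing.
func matchWildcard(pattern, name string) bool {
	ok, err := regexp.MatchString(wildcardToRegexp(pattern), name)
	return err == nil && ok
}

func main() {
	fmt.Println(matchWildcard("db[0-9]", "db7"))
	fmt.Println(matchWildcard("db[0-9]", "db42"))
	fmt.Println(matchWildcard("backup_*", "backup_2023"))
}
```

Because Go regular expressions operate on code points, ? in this sketch consumes exactly one code point rather than one byte.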

“Character” here means a Unicode code point, so e.g.

  • U+00E9 (é) is 1 character.
  • U+0065 U+0301 (é) are 2 characters.
  • U+1F926 U+1F3FF U+200D U+2640 U+FE0F (🤦🏿‍♀️) are 5 characters.
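
The counts above can be reproduced with the standard library, which counts code points (runes) rather than bytes or visible glyphs. This is a small illustration only; codePoints is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints returns the number of Unicode code points in s.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	samples := []string{
		"\u00e9",  // precomposed e with acute accent
		"e\u0301", // e followed by a combining acute accent
		"\U0001F926\U0001F3FF\u200D\u2640\uFE0F", // emoji sequence
	}
	for _, s := range samples {
		fmt.Printf("%+q has %d code point(s)\n", s, codePoints(s))
	}
}
```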

File import

Put an @ at the beginning of the string to specify a file name; filter.Parse() reads every line of that file as a filter rule.

For example, if a file config/filter.txt has the content:

employees.*
*.WorkOrder

the following two invocations would be equivalent:

./main -f '@config/filter.txt'
./main -f 'employees.*' -f '*.WorkOrder'

A filter file cannot further import another file.
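
The equivalence described above can be pictured as a pre-processing step that expands @file arguments into their lines. The sketch below is an assumption about the behavior, not the package's code; expandRules is a hypothetical function, and the file reader is injected so the example runs without touching the disk.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// expandRules replaces each "@file" argument with the non-blank lines of
// that file. readFile is supplied by the caller; a real program could
// back it with os.ReadFile.
func expandRules(args []string, readFile func(name string) (string, error)) ([]string, error) {
	var rules []string
	for _, arg := range args {
		if !strings.HasPrefix(arg, "@") {
			rules = append(rules, arg)
			continue
		}
		content, err := readFile(strings.TrimPrefix(arg, "@"))
		if err != nil {
			return nil, err
		}
		for _, line := range strings.Split(content, "\n") {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			if strings.HasPrefix(line, "@") {
				return nil, errors.New("a filter file cannot import another file")
			}
			rules = append(rules, line)
		}
	}
	return rules, nil
}

func main() {
	// An in-memory stand-in for the file system.
	files := map[string]string{"config/filter.txt": "employees.*\n*.WorkOrder\n"}
	read := func(name string) (string, error) {
		content, ok := files[name]
		if !ok {
			return "", fmt.Errorf("no such file: %s", name)
		}
		return content, nil
	}
	rules, err := expandRules([]string{"@config/filter.txt"}, read)
	fmt.Println(rules, err)
}
```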

Comments and blank lines

Leading and trailing whitespace on every line is trimmed.

Blank lines (empty strings) are ignored.

A leading # marks a comment, and the line is ignored. A # that is not at the start of a line may be considered a syntax error.
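
These three conventions amount to a simple line filter. The following sketch shows one way to express them; ruleLines is a hypothetical helper, and it deliberately leaves a # in the middle of a line untouched, since that case is for the rule parser to reject.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleLines trims each line, then drops blank lines and lines that start
// with '#'. Whatever remains is returned in its original order.
func ruleLines(text string) []string {
	var out []string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		out = append(out, line)
	}
	return out
}

func main() {
	text := "  # tables to keep\n\n  employees.*  \n*.WorkOrder\n"
	fmt.Printf("%q\n", ruleLines(text))
}
```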

Blocklist

An ! at the beginning of the line means the pattern after it is used to exclude tables from being processed. This effectively turns the filter into a blocklist.

*.*
#^ note: must add the *.* to include all tables first
!*.Password
!employees.salaries

Escape character

Precede any special character with a \ to turn it into an identifier character.

AdventureWorks\.*.*

For simplicity and future compatibility, the following sequences are prohibited:

  • \ at the end of the line after trimming whitespace (use “[ ]” to match a literal whitespace character at the end).
  • \ followed by any ASCII alphanumeric character ([0-9a-zA-Z]). In particular, C-like escape sequences such as \0, \r, \n and \t are currently meaningless.
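
The two prohibitions can be sketched as a validation pass over a single rule. This is a simplified illustration under the assumption that the rule contains no quoted identifiers or regular expressions; checkEscapes and isASCIIAlnum are hypothetical helpers, not functions of this package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isASCIIAlnum reports whether r is in [0-9a-zA-Z].
func isASCIIAlnum(r rune) bool {
	return (r >= '0' && r <= '9') || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}

// checkEscapes rejects a backslash at the end of the trimmed rule and a
// backslash followed by an ASCII alphanumeric character.
func checkEscapes(rule string) error {
	runes := []rune(strings.TrimSpace(rule))
	for i := 0; i < len(runes); i++ {
		if runes[i] != '\\' {
			continue
		}
		if i == len(runes)-1 {
			return errors.New("backslash at end of line")
		}
		if next := runes[i+1]; isASCIIAlnum(next) {
			return fmt.Errorf("reserved escape sequence \\%c", next)
		}
		i++ // skip the escaped character
	}
	return nil
}

func main() {
	for _, rule := range []string{`AdventureWorks\.*.*`, `tbl\n.*`, `db.tbl\ `} {
		fmt.Printf("%q: %v\n", rule, checkEscapes(rule))
	}
}
```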

Quoted identifier

Besides \, special characters can also be escaped by quoting using " or `.

"AdventureWorks.Person".Person
`AdventureWorks.Person`.Password

A quoted identifier cannot span multiple lines.

It is invalid to partially quote an identifier.

"this is "invalid*.*

Regular expression

Use / to delimit regular expressions:

/^db\d{2,}$/./^tbl\d{2,}$/

These regular expressions use the Go dialect. The pattern is matched if the identifier contains a substring matching the regular expression. For instance, /b/ matches db01.

(Note: every / in the regex must be escaped as \/, including inside []. You cannot place an unescaped / between \Q\E.)
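
The substring behavior is the same as that of Go's regexp package when no anchors are used. The short sketch below uses the two expressions from this section directly with the standard library; it does not go through this package's parser, and the variable names are chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// substring has no anchors, so it matches anywhere in the identifier.
	substring = regexp.MustCompile(`b`)
	// anchored must match the whole identifier because of ^ and $.
	anchored = regexp.MustCompile(`^db\d{2,}$`)
)

func main() {
	for _, name := range []string{"db01", "db1", "mydb01"} {
		fmt.Printf("%-7s /b/=%v /^db\\d{2,}$/=%v\n",
			name, substring.MatchString(name), anchored.MatchString(name))
	}
}
```

Adding ^ and $ is therefore the way to require that the whole schema or table name matches.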

Algorithm

Default behavior

When a table name matches none of the rules in the filter list, the table is ignored by default.

To build a blocklist, an explicit *.* must be used as the first rule; otherwise all tables will be excluded.

# every table will be filtered out
./main -f '!*.Password'

# only the "Password" table is filtered out, the rest are included.
./main -f '*.*' -f '!*.Password'

Precedence

In a filter list, if a table name matches multiple patterns, the last match decides the outcome. For instance, given

# rule 1
employees.*
# rule 2
!*.dep*
# rule 3
*.departments

We get:

Table name             Rule 1  Rule 2  Rule 3  Outcome
irrelevant.table       -       -       -       Default (reject)
employees.employees    match   -       -       Rule 1 (accept)
employees.dept_emp     match   match   -       Rule 2 (reject)
employees.departments  match   match   match   Rule 3 (accept)
else.departments       -       match   match   Rule 3 (accept)
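
The outcomes above follow from scanning the rules from last to first and stopping at the first match, with rejection as the default. The sketch below models that with the standard library's path.Match for the wildcards; the rule type, parseRules and matchTable are hypothetical and only handle simple schema.table patterns, unlike the real parser.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rule is one parsed line: two wildcard patterns and whether a match
// accepts (allowlist rule) or rejects (rule starting with '!').
type rule struct {
	schema, table string
	accept        bool
}

// parseRules splits each "schema.table" line at the first dot.
func parseRules(lines []string) []rule {
	var rules []rule
	for _, line := range lines {
		accept := !strings.HasPrefix(line, "!")
		parts := strings.SplitN(strings.TrimPrefix(line, "!"), ".", 2)
		if len(parts) != 2 {
			continue // not a schema.table pattern; ignored in this sketch
		}
		rules = append(rules, rule{schema: parts[0], table: parts[1], accept: accept})
	}
	return rules
}

// matchTable walks the rules backwards so the last matching rule wins.
// A table that matches no rule is rejected.
func matchTable(rules []rule, schema, table string) bool {
	for i := len(rules) - 1; i >= 0; i-- {
		okSchema, _ := path.Match(rules[i].schema, schema)
		okTable, _ := path.Match(rules[i].table, table)
		if okSchema && okTable {
			return rules[i].accept
		}
	}
	return false
}

func main() {
	rules := parseRules([]string{"employees.*", "!*.dep*", "*.departments"})
	names := [][2]string{
		{"irrelevant", "table"}, {"employees", "employees"},
		{"employees", "dept_emp"}, {"employees", "departments"},
		{"else", "departments"},
	}
	for _, n := range names {
		fmt.Printf("%s.%s: %v\n", n[0], n[1], matchTable(rules, n[0], n[1]))
	}
}
```

The same function shows why a lone '!*.Password' rule rejects everything: no rule ever accepts, so every table falls through to the default.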

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ColumnFilter

type ColumnFilter interface {
	// MatchColumn checks if a column can be processed after applying the columnFilter.
	MatchColumn(column string) bool
}

ColumnFilter is an interface that checks whether a column should be included for processing.

func ParseColumnFilter

func ParseColumnFilter(args []string) (ColumnFilter, error)

ParseColumnFilter parses a columnFilter from a list of serialized columnFilter rules. Column names are not case-sensitive on any platform, nor are column aliases, so the parsed columnFilter is case-insensitive.

type Filter

type Filter interface {
	// MatchTable checks if a table can be processed after applying the tableFilter.
	MatchTable(schema string, table string) bool
	// MatchSchema checks if a schema can be processed after applying the tableFilter.
	MatchSchema(schema string) bool
	// contains filtered or unexported methods
}

Filter is an interface that checks whether a table should be included for processing.

func All

func All() Filter

All creates a tableFilter which matches everything.

func CaseInsensitive

func CaseInsensitive(f Filter) Filter

CaseInsensitive returns a new tableFilter which is the case-insensitive version of the input tableFilter.

func NewSchemasFilter

func NewSchemasFilter(schemas ...string) Filter

NewSchemasFilter creates a tableFilter which only accepts a list of schemas.

func NewTablesFilter

func NewTablesFilter(tables ...Table) Filter

NewTablesFilter creates a tableFilter which only accepts a list of tables.

func Parse

func Parse(args []string) (Filter, error)

Parse parses a tableFilter from a list of serialized tableFilter rules. The parsed tableFilter is case-sensitive by default.

func ParseMySQLReplicationRules

func ParseMySQLReplicationRules(rules *MySQLReplicationRules) (Filter, error)

ParseMySQLReplicationRules constructs up to 2 filters from the MySQLReplicationRules. Tables have to pass *both* filters to be processed.

type MySQLReplicationRules

type MySQLReplicationRules struct {
	// DoTables is an allowlist of tables.
	DoTables []*Table `json:"do-tables" toml:"do-tables" yaml:"do-tables"`
	// DoDBs is an allowlist of schemas.
	DoDBs []string `json:"do-dbs" toml:"do-dbs" yaml:"do-dbs"`

	// IgnoreTables is a blocklist of tables.
	IgnoreTables []*Table `json:"ignore-tables" toml:"ignore-tables" yaml:"ignore-tables"`
	// IgnoreDBs is a blocklist of schemas.
	IgnoreDBs []string `json:"ignore-dbs" toml:"ignore-dbs" yaml:"ignore-dbs"`
}

MySQLReplicationRules is a set of rules based on MySQL's replication tableFilter.
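
The struct tags indicate how these rules look in a config file. The sketch below decodes a JSON fragment into local copies of the two structs declared on this page, keeping only the json tags so that it runs with the standard library alone; decodeRules is a hypothetical helper, and real code would use the package's own types instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Table is a local copy of the package's Table type (json tags only).
type Table struct {
	Schema string `json:"db-name"`
	Name   string `json:"tbl-name"`
}

// MySQLReplicationRules is a local copy of the package's struct.
type MySQLReplicationRules struct {
	DoTables     []*Table `json:"do-tables"`
	DoDBs        []string `json:"do-dbs"`
	IgnoreTables []*Table `json:"ignore-tables"`
	IgnoreDBs    []string `json:"ignore-dbs"`
}

// decodeRules parses a JSON document into the rules struct.
func decodeRules(data []byte) (*MySQLReplicationRules, error) {
	var rules MySQLReplicationRules
	if err := json.Unmarshal(data, &rules); err != nil {
		return nil, err
	}
	return &rules, nil
}

func main() {
	data := []byte(`{
		"do-dbs": ["employees"],
		"ignore-tables": [{"db-name": "employees", "tbl-name": "salaries"}]
	}`)
	rules, err := decodeRules(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(rules.DoDBs, *rules.IgnoreTables[0])
}
```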

func (*MySQLReplicationRules) ToLower

func (r *MySQLReplicationRules) ToLower()

ToLower converts all entries to lowercase.

Deprecated: use `filter.CaseInsensitive` instead.

type Table

type Table struct {
	// Schema is the name of the schema (database) containing this table.
	Schema string `toml:"db-name" json:"db-name" yaml:"db-name"`
	// Name is the unqualified table name.
	Name string `toml:"tbl-name" json:"tbl-name" yaml:"tbl-name"`
}

Table represents a qualified table name.

func (*Table) Clone

func (t *Table) Clone() *Table

Clone returns a copy of the filter.Table.

func (*Table) String

func (t *Table) String() string

String implements the fmt.Stringer interface.
