Documentation ¶
Index ¶
- type ArgOpts
- type FDSet
- func (s *FDSet) AddConstants(cons intset.FastIntSet)
- func (s *FDSet) AddEquivalence(from, to intset.FastIntSet)
- func (s *FDSet) AddFrom(fds *FDSet)
- func (s *FDSet) AddLaxFunctionalDependency(from, to intset.FastIntSet)
- func (s *FDSet) AddNCFunctionalDependency(from, to, nc intset.FastIntSet, strict, equiv bool)
- func (s *FDSet) AddStrictFunctionalDependency(from, to intset.FastIntSet)
- func (s FDSet) AllCols() intset.FastIntSet
- func (s *FDSet) ClosureOfLax(colSet intset.FastIntSet) intset.FastIntSet
- func (s *FDSet) ClosureOfStrict(colSet intset.FastIntSet) intset.FastIntSet
- func (s *FDSet) ConstantCols() intset.FastIntSet
- func (s *FDSet) EquivalenceCols() (eqs []*intset.FastIntSet)
- func (s FDSet) FindPrimaryKey() (*intset.FastIntSet, bool)
- func (s *FDSet) InClosure(setA, setB intset.FastIntSet) bool
- func (s *FDSet) IsHashCodeRegistered(hashCode string) (int, bool)
- func (s *FDSet) MakeCartesianProduct(rhs *FDSet)
- func (s *FDSet) MakeNotNull(notNullCols intset.FastIntSet)
- func (s *FDSet) MakeNullable(nullableCols intset.FastIntSet)
- func (s *FDSet) MakeOuterJoin(innerFDs, filterFDs *FDSet, outerCols, innerCols intset.FastIntSet, ...)
- func (s *FDSet) MaxOneRow(cols intset.FastIntSet)
- func (s *FDSet) ProjectCols(cols intset.FastIntSet)
- func (s *FDSet) ReduceCols(colSet intset.FastIntSet) intset.FastIntSet
- func (s *FDSet) RegisterUniqueID(hashCode string, uniqueID int)
- func (s *FDSet) String() string
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type FDSet ¶
type FDSet struct { // NotNullCols is used to record the columns with not-null attributes applied. // eg: {1} ~~> {2,3}, when {2,3} not null is applied, it actually does nothing. // but we should record {2,3} as not-null down for the convenience of transferring // Lax FD: {1} ~~> {2,3} to strict FD: {1} --> {2,3} with {1} as not-null next time. NotNullCols intset.FastIntSet // HashCodeToUniqueID map the expression's hashcode to a statement allocated unique // ID quite like the unique ID bounded with column. It's mainly used to add the expr // to the fdSet as an extended column. <NOT CONCURRENT SAFE FOR NOW> HashCodeToUniqueID map[string]int // GroupByCols is used to record columns / expressions that under the group by phrase. GroupByCols intset.FastIntSet HasAggBuilt bool // contains filtered or unexported fields }
FDSet is the main portal of functional dependency, it stores the relationship between (extended table / physical table)'s columns. For more theory about this design, ref the head comments in the funcdep/doc.go.
func (*FDSet) AddConstants ¶
func (s *FDSet) AddConstants(cons intset.FastIntSet)
AddConstants adds a strict FD to the source which indicates that each of the given column have the same constant value for all rows, or same null value for all rows if it's nullable.
{} --> {a}
there are several usage for this kind of FD, for example, constant FD column can exist in the select fields not matter whatever the group by column is; distinct predicate can be eliminated for these columns as well.
when adding columns, be cautious that 1: constant can be propagated in the equivalence/strict closure, turning strict FD as a constant one. 2: constant can simplify the strict FD both in determinants side and dependencies side. 3: constant can simplify the lax FD in the dependencies side.
func (*FDSet) AddEquivalence ¶
func (s *FDSet) AddEquivalence(from, to intset.FastIntSet)
AddEquivalence take two column id as parameters, establish a strict equivalence between this two column which may be enclosed a superset of equivalence closure in the fdSet.
For the equivalence storage, for simplicity, we only keep the superset relationship of equivalence since equivalence has the reflexivity.
eg: a==b, b==c, a==e
in our fdSet, the equivalence will be stored like: {a, b, c, e} == {a, b, c,e} According and characteristic and reflexivity, each element in the equivalence enclosure can derive whatever in the same enclosure.
func (*FDSet) AddFrom ¶
AddFrom merges two FD sets by adding each FD from the given set to this set. Since two different tables may have some column ID overlap, we better use column unique ID to build the FDSet instead.
func (*FDSet) AddLaxFunctionalDependency ¶
func (s *FDSet) AddLaxFunctionalDependency(from, to intset.FastIntSet)
AddLaxFunctionalDependency is to add `LAX` functional dependency to the fdGraph.
func (*FDSet) AddNCFunctionalDependency ¶
func (s *FDSet) AddNCFunctionalDependency(from, to, nc intset.FastIntSet, strict, equiv bool)
AddNCFunctionalDependency is to add conditional functional dependency to the fdGraph.
func (*FDSet) AddStrictFunctionalDependency ¶
func (s *FDSet) AddStrictFunctionalDependency(from, to intset.FastIntSet)
AddStrictFunctionalDependency is to add `STRICT` functional dependency to the fdGraph.
func (FDSet) AllCols ¶
func (s FDSet) AllCols() intset.FastIntSet
AllCols returns all columns in the current set.
func (*FDSet) ClosureOfLax ¶
func (s *FDSet) ClosureOfLax(colSet intset.FastIntSet) intset.FastIntSet
ClosureOfLax is exported for outer usage.
func (*FDSet) ClosureOfStrict ¶
func (s *FDSet) ClosureOfStrict(colSet intset.FastIntSet) intset.FastIntSet
ClosureOfStrict is exported for outer usage.
func (*FDSet) ConstantCols ¶
func (s *FDSet) ConstantCols() intset.FastIntSet
ConstantCols returns the set of columns that will always have the same value for all rows in table.
func (*FDSet) EquivalenceCols ¶
func (s *FDSet) EquivalenceCols() (eqs []*intset.FastIntSet)
EquivalenceCols returns the set of columns that are constrained to equal to each other.
func (FDSet) FindPrimaryKey ¶
func (s FDSet) FindPrimaryKey() (*intset.FastIntSet, bool)
FindPrimaryKey checks whether there's a key in the current set which implies key -> all cols.
func (*FDSet) InClosure ¶
func (s *FDSet) InClosure(setA, setB intset.FastIntSet) bool
InClosure is used to judge whether fd: setA -> setB can be inferred from closure s. It's a short-circuit version of the `closureOf`.
func (*FDSet) IsHashCodeRegistered ¶
IsHashCodeRegistered checks whether the given hashcode has been registered in the current set.
func (*FDSet) MakeCartesianProduct ¶
MakeCartesianProduct records fdSet after the impact of Cartesian Product of (T1 x T2) is made. 1: left FD is reserved. 2: right FD is reserved. Actually, for two independent table, FDs won't affect (implies or something) each other, appending them together is adequate. But for constant FDs, according to our definition, we should merge them as a larger superset pointing themselves.
func (*FDSet) MakeNotNull ¶
func (s *FDSet) MakeNotNull(notNullCols intset.FastIntSet)
MakeNotNull modify the FD set based the listed column with NOT NULL flags. Most of the case is used in the derived process after predicate evaluation, which can upgrade lax FDs to strict ones.
func (*FDSet) MakeNullable ¶
func (s *FDSet) MakeNullable(nullableCols intset.FastIntSet)
MakeNullable make the fd's NotNullCols to be cleaned, after the both side fds are handled it can be nullable.
func (*FDSet) MakeOuterJoin ¶
func (s *FDSet) MakeOuterJoin(innerFDs, filterFDs *FDSet, outerCols, innerCols intset.FastIntSet, opt *ArgOpts)
MakeOuterJoin generates the records the fdSet of the outer join.
We always take the left side as the row-supplying side and the right side as the null-supplying side. (swap it if not) As we know, the outer join would generate null extended rows compared with the inner join. So we cannot do the same thing directly with the inner join. This function deals with the special cases of the outer join.
Knowledge: 1: the filter condition related to the lhs column won't filter predicate-allowed rows and refuse null rows (left rows always remain) 2: the filter condition related to the rhs column won't filter NULL rows, although the filter has the not-null attribute. (null-appending happened after that)
Notification: 1: the origin FD from the left side (rows-supplying) over the result of outer join filtered are preserved because
it may be duplicated by multi-matching, but actually, they are the same left rows (don't violate FD definition).
2: the origin FD from the right side (nulls-supplying) over the result of outer join filtered may not be valid anymore.
<1> strict FD may be wakened as a lax one. But if at least one non-NULL column is part of the determinant, the strict FD can be preserved. a b | c d e ------+---------------- 1 1 | 1 NULL 1 1 2 | NULL NULL NULL 2 1 | NULL NULL NULL left join with (a,b) * (c,d,e) on (a=c and b=1), if there is a strict FD {d} -> {e} on the rhs. After supplied with null values, {d} -> {e} are degraded to a lax one {d} ~~> {e} as you see. the origin and supplied null value for d column determine different dependency. NULL -> 1 and NULL -> NULL which breaks strict FD definition. If the determinant contains at least a not null column for example c here, FD like {c,d} -> {e} can survive after the left join. Because you can not find two same key, one from the origin rows and the other one from the supplied rows. for lax FD, the supplied rows of null values don't affect lax FD itself. So we can keep it. <2> The FDSet should remove constant FD since null values may be substituted for some unmatched left rows. NULL is not a constant anymore. <3> equivalence FD should be removed since substituted null values are not equal to the other substituted null value.
3: the newly added FD from filters should take some consideration as below:
<1> strict/lax FD: join key filter conditions can not produce new strict/lax FD yet (knowledge: 1&2). <2> constant FD from the join conditions is only used for checking other FD. We cannot keep itself. a b | c d -------+--------- 1 1 | 1 1 1 2 | NULL NULL left join with (a,b) * (c,d) on (a=c and d=1), some rhs rows will be substituted with null values, and FD on rhs {d=1} are lost.
// a b | c d // -------+--------- // 1 1 | 1 1 // 1 2 | NULL NULL
left join with (a,b) * (c,d) on (a=c and b=1), it only gives the pass to the first matching, lhs other rows are still kept and appended with null values. So the FD on rhs {b=1} are not applicable to lhs rows. above all: constant FD are lost <3.1> equivalence FD: when the left join conditions only contain equivalence FD (EFD for short below) across left and right cols and no other `LEFT` condition on the (left-side cols except the cols in EFD's from) to filter the left join results. We can maintain the strict FD from EFD's `from` side to EFD's `to` side over the left join result.
// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 1 2 | NULL NULL NULL // 2 1 | NULL NULL NULL
Neg eg: left join with (a,b) * (c,d,e) on (a=c and b=1), other b=1 will filter the result, causing the left row (1, 2) miss matched with right row (1, null 1) by a=c, consequently leading the left row appended as (1,2,null,null,null), which will break the FD: {a} -> {c} for key a=1 with different c=1/null.
// a b | c d e // ------+---------------- // 1 1 | NULL NULL NULL // 2 1 | NULL NULL NULL
Pos eg: if the filter is on EFD's `from` cols, it's ok. Let's say: (a,b) * (c,d,e) on (a=c and a=2), a=2 only won't leading same key a with matched c and mismatched NULL, neg case result is changed as above, so strict FD {a} -> {c} can exist.
// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 1 2 | NULL NULL NULL // 2 1 | NULL NULL NULL
Neg eg: left join with (a,b) * (c,d,e) on (a=c and b=c), two EFD here, where b=c can also refuse some rows joined by a=c, consequently applying it with NULL as (1 2 | NULL NULL NULL), leading the same key a has different value 1/NULL. But macroscopically, we can combine {a,b} together as the strict FD's from side, so new FD {a,b} -> {c} is secured. For case of (a=c and b=ce), the FD is {a, b} -> {c, e} conclusion: without this kind of limited left conditions to judge the join match, we can say: FD {a} -> {c} exists. <3.2> equivalence FD: when the determinant and dependencies from an equivalence FD of join condition are each covering a strict FD of the left / right side. After joining, we can extend the left side strict FD's dependencies to all cols.
// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 2 2 | NULL NULL NULL // 3 1 | NULL NULL NULL
left join with (a,b) * (c,d,e) on (a=c and b=1). Supposing that left `a` are strict Key and right `c` are strict Key too. Key means the strict FD can determine all cols from that table. case 1: left join matched one left row match one / multi rows from right side, since `c` is strict determine all cols from right table, so {a} == {b} --> {all cols in right}, according to the transitive rule of strict FD, we get {a} --> {all cols in right} case 2: left join miss match miss matched rows from left side are unique originally, even appended with NULL value from right side, they are still strictly determine themselves and even the all rows after left join. conclusion combined: If there is an equivalence covering both strict Key from the right and left, we can create a new strict FD: {columns of the left side of the join in the equivalence} -> {all columns after join}. <3.3> equivalence FD: let's see equivalence FD as double-directed strict FD from join equal conditions, and we only keep the rhs ~~> lhs.
// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 1 2 | NULL NULL NULL // 2 1 | NULL NULL NULL
left join with (a,b) * (c,d,e) on (a=c and b=1). From the join equivalence condition can derive a new FD {ac} == {ac}. while since there are some supplied null value in the c column, we don't guarantee {ac} == {ac} yet, so do {a} -> {c} because two same determinant key {1} can point to different dependency {1} & {NULL}. But in return, FD like {c} -> {a} are degraded to the corresponding lax one.
4: the new formed FD {left primary key, right primary key} -> {all columns} are preserved in spite of the null-supplied rows. 5: There's no join key and no filters from the outer side. The join case is a cartesian product. In this case,
the strict equivalence classes still exist. - If the right side has no row, we will supply null-extended rows, then the value of any column is NULL, and the equivalence class exists. - If the right side has rows, no row is filtered out after the filters since no row of the outer side is filtered out. Hence, the equivalence class remains.
func (*FDSet) MaxOneRow ¶
func (s *FDSet) MaxOneRow(cols intset.FastIntSet)
MaxOneRow will regard every column in the fdSet as a constant. Since constant is stronger that strict FD, it will take over all existed strict/lax FD, only keeping the equivalence. Because equivalence is stronger than constant.
f: {a}--> {b,c}, {abc} == {abc} cols: {a,c} result: {} --> {a,c}, {a,c} == {a,c}
func (*FDSet) ProjectCols ¶
func (s *FDSet) ProjectCols(cols intset.FastIntSet)
ProjectCols projects FDSet to the target columns Formula: Strict decomposition FD4A: If X −→ Y Z then X −→ Y and X −→ Z. Lax decomposition FD4B: If X ~→ Y Z and I(R) is Y-definite then X ~→ Z.
func (*FDSet) ReduceCols ¶
func (s *FDSet) ReduceCols(colSet intset.FastIntSet) intset.FastIntSet
ReduceCols is used to minimize the determinants in one fd input. function dependency = determinants -> dependencies given: AB -> XY, once B can be inferred from current closure when inserting, take A -> XY instead.
func (*FDSet) RegisterUniqueID ¶
RegisterUniqueID is used to record the map relationship between expr and allocated uniqueID.