funcdep

package
v1.1.0-beta.0...-a59252d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 3, 2025 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ArgOpts

type ArgOpts struct {
	SkipFDRule331   bool
	OnlyInnerFilter bool
	InnerIsFalse    bool
}

ArgOpts contains some arg used for FD maintenance.

type FDSet

type FDSet struct {

	// NotNullCols is used to record the columns with not-null attributes applied.
	// eg: {1} ~~> {2,3}, when {2,3} not null is applied, it actually does nothing.
	// but we should record {2,3} as not-null down for the convenience of transferring
	// Lax FD: {1} ~~> {2,3} to strict FD: {1} --> {2,3} with {1} as not-null next time.
	NotNullCols intset.FastIntSet
	// HashCodeToUniqueID map the expression's hashcode to a statement allocated unique
	// ID quite like the unique ID bounded with column. It's mainly used to add the expr
	// to the fdSet as an extended column. <NOT CONCURRENT SAFE FOR NOW>
	HashCodeToUniqueID map[string]int
	// GroupByCols is used to record columns / expressions that under the group by phrase.
	GroupByCols intset.FastIntSet
	HasAggBuilt bool
	// contains filtered or unexported fields
}

FDSet is the main portal of functional dependency, it stores the relationship between (extended table / physical table)'s columns. For more theory about this design, ref the head comments in the funcdep/doc.go.

func (*FDSet) AddConstants

func (s *FDSet) AddConstants(cons intset.FastIntSet)

AddConstants adds a strict FD to the source which indicates that each of the given column have the same constant value for all rows, or same null value for all rows if it's nullable.

{} --> {a}

there are several usage for this kind of FD, for example, constant FD column can exist in the select fields not matter whatever the group by column is; distinct predicate can be eliminated for these columns as well.

when adding columns, be cautious that 1: constant can be propagated in the equivalence/strict closure, turning strict FD as a constant one. 2: constant can simplify the strict FD both in determinants side and dependencies side. 3: constant can simplify the lax FD in the dependencies side.

func (*FDSet) AddEquivalence

func (s *FDSet) AddEquivalence(from, to intset.FastIntSet)

AddEquivalence take two column id as parameters, establish a strict equivalence between this two column which may be enclosed a superset of equivalence closure in the fdSet.

For the equivalence storage, for simplicity, we only keep the superset relationship of equivalence since equivalence has the reflexivity.

eg: a==b, b==c, a==e

in our fdSet, the equivalence will be stored like: {a, b, c, e} == {a, b, c,e} According and characteristic and reflexivity, each element in the equivalence enclosure can derive whatever in the same enclosure.

func (*FDSet) AddFrom

func (s *FDSet) AddFrom(fds *FDSet)

AddFrom merges two FD sets by adding each FD from the given set to this set. Since two different tables may have some column ID overlap, we better use column unique ID to build the FDSet instead.

func (*FDSet) AddLaxFunctionalDependency

func (s *FDSet) AddLaxFunctionalDependency(from, to intset.FastIntSet)

AddLaxFunctionalDependency is to add `LAX` functional dependency to the fdGraph.

func (*FDSet) AddNCFunctionalDependency

func (s *FDSet) AddNCFunctionalDependency(from, to, nc intset.FastIntSet, strict, equiv bool)

AddNCFunctionalDependency is to add conditional functional dependency to the fdGraph.

func (*FDSet) AddStrictFunctionalDependency

func (s *FDSet) AddStrictFunctionalDependency(from, to intset.FastIntSet)

AddStrictFunctionalDependency is to add `STRICT` functional dependency to the fdGraph.

func (FDSet) AllCols

func (s FDSet) AllCols() intset.FastIntSet

AllCols returns all columns in the current set.

func (*FDSet) ClosureOfLax

func (s *FDSet) ClosureOfLax(colSet intset.FastIntSet) intset.FastIntSet

ClosureOfLax is exported for outer usage.

func (*FDSet) ClosureOfStrict

func (s *FDSet) ClosureOfStrict(colSet intset.FastIntSet) intset.FastIntSet

ClosureOfStrict is exported for outer usage.

func (*FDSet) ConstantCols

func (s *FDSet) ConstantCols() intset.FastIntSet

ConstantCols returns the set of columns that will always have the same value for all rows in table.

func (*FDSet) EquivalenceCols

func (s *FDSet) EquivalenceCols() (eqs []*intset.FastIntSet)

EquivalenceCols returns the set of columns that are constrained to equal to each other.

func (FDSet) FindPrimaryKey

func (s FDSet) FindPrimaryKey() (*intset.FastIntSet, bool)

FindPrimaryKey checks whether there's a key in the current set which implies key -> all cols.

func (*FDSet) InClosure

func (s *FDSet) InClosure(setA, setB intset.FastIntSet) bool

InClosure is used to judge whether fd: setA -> setB can be inferred from closure s. It's a short-circuit version of the `closureOf`.

func (*FDSet) IsHashCodeRegistered

func (s *FDSet) IsHashCodeRegistered(hashCode string) (int, bool)

IsHashCodeRegistered checks whether the given hashcode has been registered in the current set.

func (*FDSet) MakeCartesianProduct

func (s *FDSet) MakeCartesianProduct(rhs *FDSet)

MakeCartesianProduct records fdSet after the impact of Cartesian Product of (T1 x T2) is made. 1: left FD is reserved. 2: right FD is reserved. Actually, for two independent table, FDs won't affect (implies or something) each other, appending them together is adequate. But for constant FDs, according to our definition, we should merge them as a larger superset pointing themselves.

func (*FDSet) MakeNotNull

func (s *FDSet) MakeNotNull(notNullCols intset.FastIntSet)

MakeNotNull modify the FD set based the listed column with NOT NULL flags. Most of the case is used in the derived process after predicate evaluation, which can upgrade lax FDs to strict ones.

func (*FDSet) MakeNullable

func (s *FDSet) MakeNullable(nullableCols intset.FastIntSet)

MakeNullable make the fd's NotNullCols to be cleaned, after the both side fds are handled it can be nullable.

func (*FDSet) MakeOuterJoin

func (s *FDSet) MakeOuterJoin(innerFDs, filterFDs *FDSet, outerCols, innerCols intset.FastIntSet, opt *ArgOpts)

MakeOuterJoin generates the records the fdSet of the outer join.

We always take the left side as the row-supplying side and the right side as the null-supplying side. (swap it if not) As we know, the outer join would generate null extended rows compared with the inner join. So we cannot do the same thing directly with the inner join. This function deals with the special cases of the outer join.

Knowledge: 1: the filter condition related to the lhs column won't filter predicate-allowed rows and refuse null rows (left rows always remain) 2: the filter condition related to the rhs column won't filter NULL rows, although the filter has the not-null attribute. (null-appending happened after that)

Notification: 1: the origin FD from the left side (rows-supplying) over the result of outer join filtered are preserved because

it may be duplicated by multi-matching, but actually, they are the same left rows (don't violate FD definition).

2: the origin FD from the right side (nulls-supplying) over the result of outer join filtered may not be valid anymore.

		<1> strict FD may be wakened as a lax one. But if at least one non-NULL column is part of the determinant, the
			strict FD can be preserved.
	        a  b  |  c     d     e
			------+----------------
		 	1  1  |  1    NULL   1
		    1  2  | NULL  NULL  NULL
		    2  1  | NULL  NULL  NULL
			left join with (a,b) * (c,d,e) on (a=c and b=1), if there is a strict FD {d} -> {e} on the rhs. After supplied
			with null values, {d} -> {e} are degraded to a lax one {d} ~~> {e} as you see. the origin and supplied null value
			for d column determine different dependency.  NULL -> 1 and NULL -> NULL which breaks strict FD definition.

			If the determinant contains at least a not null column for example c here, FD like {c,d} -> {e} can survive
			after the left join. Because you can not find two same key, one from the origin rows and the other one from the
			supplied rows.

			for lax FD, the supplied rows of null values don't affect lax FD itself. So we can keep it.

		<2> The FDSet should remove constant FD since null values may be substituted for some unmatched left rows. NULL is not a
			constant anymore.

     <3> equivalence FD should be removed since substituted null values are not equal to the other substituted null value.

3: the newly added FD from filters should take some consideration as below:

	 	<1> strict/lax FD: join key filter conditions can not produce new strict/lax FD yet (knowledge: 1&2).

		<2> constant FD from the join conditions is only used for checking other FD. We cannot keep itself.

   a   b  |  c     d
   -------+---------
   1   1  |  1     1
   1   2  | NULL NULL

         left join with (a,b) * (c,d) on (a=c and d=1), some rhs rows will be substituted with null values, and FD on rhs
         {d=1} are lost.

// a b | c d // -------+--------- // 1 1 | 1 1 // 1 2 | NULL NULL

         left join with (a,b) * (c,d) on (a=c and b=1), it only gives the pass to the first matching, lhs other rows are still
         kept and appended with null values. So the FD on rhs {b=1} are not applicable to lhs rows.

         above all: constant FD are lost

		<3.1> equivalence FD: when the left join conditions only contain equivalence FD (EFD for short below) across left and right
			cols and no other `LEFT` condition on the (left-side cols except the cols in EFD's from) to filter the left join results. We can maintain the strict
			FD from EFD's `from` side to EFD's `to` side over the left join result.

// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 1 2 | NULL NULL NULL // 2 1 | NULL NULL NULL

Neg eg: left join with (a,b) * (c,d,e) on (a=c and b=1), other b=1 will filter the result, causing the left row (1, 2)
miss matched with right row (1, null 1) by a=c, consequently leading the left row appended as (1,2,null,null,null), which
will break the FD: {a} -> {c} for key a=1 with different c=1/null.

// a b | c d e // ------+---------------- // 1 1 | NULL NULL NULL // 2 1 | NULL NULL NULL

Pos eg: if the filter is on EFD's `from` cols, it's ok. Let's say: (a,b) * (c,d,e) on (a=c and a=2), a=2 only won't leading
same key a with matched c and mismatched NULL, neg case result is changed as above, so strict FD {a} -> {c} can exist.

// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 1 2 | NULL NULL NULL // 2 1 | NULL NULL NULL

	Neg eg: left join with (a,b) * (c,d,e) on (a=c and b=c), two EFD here, where b=c can also refuse some rows joined by a=c,
	consequently applying it with NULL as (1  2  | NULL  NULL  NULL), leading the same key a has different value 1/NULL. But
	macroscopically, we can combine {a,b} together as the strict FD's from side, so new FD {a,b} -> {c} is secured. For case
	of (a=c and b=ce), the FD is {a, b} -> {c, e}

	conclusion: without this kind of limited left conditions to judge the join match, we can say: FD {a} -> {c} exists.

<3.2> equivalence FD: when the determinant and dependencies from an equivalence FD of join condition are each covering a strict
	FD of the left / right side. After joining, we can extend the left side strict FD's dependencies to all cols.

// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 2 2 | NULL NULL NULL // 3 1 | NULL NULL NULL

			left join with (a,b) * (c,d,e) on (a=c and b=1). Supposing that left `a` are strict Key and right `c` are strict Key too.
			Key means the strict FD can determine all cols from that table.
			case 1: left join matched
				one left row match one / multi rows from right side, since `c` is strict determine all cols from right table, so
             {a} == {b} --> {all cols in right}, according to the transitive rule of strict FD, we get {a} --> {all cols in right}
			case 2: left join miss match
				miss matched rows from left side are unique originally, even appended with NULL value from right side, they are still
				strictly determine themselves and even the all rows after left join.
			conclusion combined:
				If there is an equivalence covering both strict Key from the right and left, we can create a new strict FD: {columns of the left side of the join in the equivalence} -> {all columns after join}.

		<3.3> equivalence FD: let's see equivalence FD as double-directed strict FD from join equal conditions, and we  only keep the
			rhs ~~> lhs.

// a b | c d e // ------+---------------- // 1 1 | 1 NULL 1 // 1 2 | NULL NULL NULL // 2 1 | NULL NULL NULL

left join with (a,b) * (c,d,e) on (a=c and b=1). From the join equivalence condition can derive a new FD {ac} == {ac}.
while since there are some supplied null value in the c column, we don't guarantee {ac} == {ac} yet, so do {a} -> {c}
because two same determinant key {1} can point to different dependency {1} & {NULL}. But in return, FD like {c} -> {a}
are degraded to the corresponding lax one.

4: the new formed FD {left primary key, right primary key} -> {all columns} are preserved in spite of the null-supplied rows. 5: There's no join key and no filters from the outer side. The join case is a cartesian product. In this case,

the strict equivalence classes still exist.
  - If the right side has no row, we will supply null-extended rows, then the value of any column is NULL, and the equivalence class exists.
  - If the right side has rows, no row is filtered out after the filters since no row of the outer side is filtered out. Hence, the equivalence class remains.

func (*FDSet) MaxOneRow

func (s *FDSet) MaxOneRow(cols intset.FastIntSet)

MaxOneRow will regard every column in the fdSet as a constant. Since constant is stronger that strict FD, it will take over all existed strict/lax FD, only keeping the equivalence. Because equivalence is stronger than constant.

f:      {a}--> {b,c}, {abc} == {abc}
cols:   {a,c}
result: {} --> {a,c}, {a,c} == {a,c}

func (*FDSet) ProjectCols

func (s *FDSet) ProjectCols(cols intset.FastIntSet)

ProjectCols projects FDSet to the target columns Formula: Strict decomposition FD4A: If X −→ Y Z then X −→ Y and X −→ Z. Lax decomposition FD4B: If X ~→ Y Z and I(R) is Y-definite then X ~→ Z.

func (*FDSet) ReduceCols

func (s *FDSet) ReduceCols(colSet intset.FastIntSet) intset.FastIntSet

ReduceCols is used to minimize the determinants in one fd input. function dependency = determinants -> dependencies given: AB -> XY, once B can be inferred from current closure when inserting, take A -> XY instead.

func (*FDSet) RegisterUniqueID

func (s *FDSet) RegisterUniqueID(hashCode string, uniqueID int)

RegisterUniqueID is used to record the map relationship between expr and allocated uniqueID.

func (*FDSet) String

func (s *FDSet) String() string

String returns format string of this FDSet.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL