knnc

package
v0.0.0-...-8d852e1 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 20, 2022 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

knnc is a package for doing KNN (k nearest neighbour) searching with high concurrency.

Constants

This section is empty.

Variables

This section is empty.

Functions

func FilterStage

func FilterStage(args FilterStageArgs) (<-chan ScoreItem, bool)

FilterStage is a concurrent stage where elements from a chan are either dropped or passed along to the output chan, depending on args.FilterFunc. See docs of FilterStageArgs for more details. Also note that this is a front-end for syncx.Stage, so docs for that are relevant as well. Returns false if args.Ok() returns false.
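The pattern described above can be pictured with the following self-contained sketch. It does not import knnc or syncx: scoreItem, filterStage and countBelow are hypothetical stand-ins, and the syncx.StageArgsPartial part of the real args is left out because its fields are not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// scoreItem is a hypothetical stand-in for knnc.ScoreItem.
type scoreItem struct {
	Score float64
	Set   bool
}

// filterStage starts nWorkers goroutines that read from in and forward only
// the items for which keep returns true. The output chan is closed once in
// has been drained. The bool is false if the arguments are rejected, which
// mirrors the (chan, bool) return style of the documented func.
func filterStage(nWorkers int, in <-chan scoreItem, keep func(scoreItem) bool) (<-chan scoreItem, bool) {
	if nWorkers <= 0 || in == nil || keep == nil {
		return nil, false
	}
	out := make(chan scoreItem)
	var wg sync.WaitGroup
	wg.Add(nWorkers)
	for i := 0; i < nWorkers; i++ {
		go func() {
			defer wg.Done()
			for item := range in {
				if keep(item) {
					out <- item
				}
			}
		}()
	}
	// Close the output only after every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out, true
}

// countBelow feeds scores through the stage and counts the survivors.
func countBelow(scores []float64, limit float64) int {
	in := make(chan scoreItem)
	go func() {
		defer close(in)
		for _, s := range scores {
			in <- scoreItem{Score: s, Set: true}
		}
	}()
	out, ok := filterStage(3, in, func(s scoreItem) bool { return s.Score < limit })
	if !ok {
		return -1
	}
	n := 0
	for range out {
		n++
	}
	return n
}

func main() {
	fmt.Println(countBelow([]float64{0.1, 0.9, 0.4, 0.7}, 0.5)) // 2
}
```

With several workers the order of the output is not guaranteed, which is why the sketch only counts items.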

func MapStage

func MapStage(args MapStageArgs) (<-chan ScoreItem, bool)

MapStage is a concurrent stage where elements from a chan are read, transformed with a mapping function, then potentially outputted to the returned chan. See docs of MapStageArgs for more details. Also note that this is a front-end for syncx.Stage, so docs for that are relevant as well. Returns false if args.Ok() returns false.

func MergeStage

func MergeStage(args MergeStageArgs) (<-chan ScoreItems, bool)

MergeStage reads ScoreItem elements from the input chan, merges them into an ordered slice (using ScoreItems.BubbleInsert), then sends the items through the output chan at a given interval. See MergeStageArgs for more detail. Also note that this is a front-end for syncx.Stage (NWorkers=1), so docs for that are relevant as well. Returns false if args.Ok() returns false.
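A minimal sketch of the merge-and-send behaviour follows, under stated assumptions. It is not the package's code: scoreItem, insertSorted, mergeStage and runMerge are hypothetical names, the ordered insert uses sort.Search instead of ScoreItems.BubbleInsert, and flushing a partial batch when the input closes is an assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"sort"
)

// scoreItem is a hypothetical stand-in for knnc.ScoreItem.
type scoreItem struct {
	Score float64
}

// insertSorted places x into the already sorted items and keeps at most k.
func insertSorted(items []scoreItem, x scoreItem, k int, ascending bool) []scoreItem {
	i := sort.Search(len(items), func(i int) bool {
		if ascending {
			return items[i].Score > x.Score
		}
		return items[i].Score < x.Score
	})
	items = append(items, scoreItem{})
	copy(items[i+1:], items[i:])
	items[i] = x
	if len(items) > k {
		items = items[:k]
	}
	return items
}

// mergeStage uses a single worker. It sends the ordered slice on every
// sendInterval-th insert and then resets it, so no sent element is repeated.
func mergeStage(in <-chan scoreItem, k int, ascending bool, sendInterval int) (<-chan []scoreItem, bool) {
	if in == nil || k <= 0 || sendInterval < 1 {
		return nil, false
	}
	out := make(chan []scoreItem)
	go func() {
		defer close(out)
		var best []scoreItem
		n := 0
		for item := range in {
			best = insertSorted(best, item, k, ascending)
			n++
			if n%sendInterval == 0 {
				out <- best
				best = nil
			}
		}
		if len(best) > 0 {
			out <- best // assumption: flush the remainder at the end
		}
	}()
	return out, true
}

// runMerge feeds scores through mergeStage and collects each sent batch.
func runMerge(scores []float64, k int, ascending bool, sendInterval int) [][]float64 {
	in := make(chan scoreItem)
	go func() {
		defer close(in)
		for _, s := range scores {
			in <- scoreItem{Score: s}
		}
	}()
	out, ok := mergeStage(in, k, ascending, sendInterval)
	if !ok {
		return nil
	}
	var batches [][]float64
	for batch := range out {
		row := make([]float64, len(batch))
		for i, it := range batch {
			row[i] = it.Score
		}
		batches = append(batches, row)
	}
	return batches
}

func main() {
	fmt.Println(runMerge([]float64{5, 1, 4, 2, 3}, 2, true, 2)) // [[1 5] [2 4] [3]]
}
```

Because the slice is reset after each send, a consumer that wants a global top K has to merge the received batches itself; a SendInterval at least as large as the input yields a single batch.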

Types

type ActiveGoroutinesTicker

type ActiveGoroutinesTicker struct {
	// contains filtered or unexported fields
}

ActiveGoroutinesTicker simply wraps an int and a RWMutex and helps track the number of currently running goroutines. Usage is simple: increment the ticker by calling the AddAwait() method, then invoke the returned callback to decrement the ticker. Check the current ticker with the CurrentN() method.

func (*ActiveGoroutinesTicker) AddAwait

func (a *ActiveGoroutinesTicker) AddAwait() func()

AddAwait increments the internal ticker and returns a func that decrements it in a concurrency-safe way.

func (*ActiveGoroutinesTicker) BlockUntilBelowN

func (a *ActiveGoroutinesTicker) BlockUntilBelowN(n int)

BlockUntilBelowN is a convenience method; it blocks until the internal ticker is below the specified 'n'.

func (*ActiveGoroutinesTicker) CurrentN

func (a *ActiveGoroutinesTicker) CurrentN() int

CurrentN returns the value of the internal ticker.
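The three methods above could be put together roughly as in the sketch below. The fields of the real type are unexported, so goroutineTicker is a hypothetical re-implementation: the sync.Once guard and the polling loop in BlockUntilBelowN are assumptions made for the example, not documented behaviour.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// goroutineTicker is a hypothetical stand-in for ActiveGoroutinesTicker.
type goroutineTicker struct {
	mu sync.RWMutex
	n  int
}

// AddAwait increments the counter and returns a func that decrements it.
// The sync.Once makes calling the returned func twice harmless (assumption).
func (t *goroutineTicker) AddAwait() func() {
	t.mu.Lock()
	t.n++
	t.mu.Unlock()
	var once sync.Once
	return func() {
		once.Do(func() {
			t.mu.Lock()
			t.n--
			t.mu.Unlock()
		})
	}
}

// CurrentN returns the current counter value.
func (t *goroutineTicker) CurrentN() int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.n
}

// BlockUntilBelowN polls until the counter is below n.
func (t *goroutineTicker) BlockUntilBelowN(n int) {
	for t.CurrentN() >= n {
		time.Sleep(time.Millisecond)
	}
}

// demo starts n goroutines that wait for a release signal, then reports the
// counter while they are running and after they have all finished.
func demo(n int) (running, remaining int) {
	t := &goroutineTicker{}
	release := make(chan struct{})
	for i := 0; i < n; i++ {
		done := t.AddAwait()
		go func() {
			defer done()
			<-release
		}()
	}
	running = t.CurrentN()
	close(release)
	t.BlockUntilBelowN(1)
	remaining = t.CurrentN()
	return running, remaining
}

func main() {
	running, remaining := demo(4)
	fmt.Println(running, remaining) // 4 0
}
```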

type Distancer

type Distancer = mathx.Distancer

Distancer is an alias for mathx.Distancer.

type DistancerContainer

type DistancerContainer interface {
	// See docs for mathx.Distancer and/or the surrounding interface (
	// DistancerContainer). The concrete returned type here should be
	// thread-safe, or nil when it is no longer needed -- this will mark
	// it as deletable.
	Distancer() mathx.Distancer
}

DistancerContainer is any type that can produce a mathx.Distancer, which can do distance calculations (related to KNN). The intent, in the context of this pkg, is to do KNN queries on these distancers, and give them back as results.

type FilterStageArgs

type FilterStageArgs struct {
	// NWorkers specifies the number of workers to use in this stage.
	NWorkers int
	// In is the input chan for this concurrent stage.
	In <-chan ScoreItem
	// FilterFunc specifies what each worker should do in this concurrent stage.
	// Specifically, elements from "In" (field of this struct) are read, given
	// to this FilterFunc, then passed along to the output chan if the filter func
	// returns true.
	FilterFunc func(ScoreItem) bool

	syncx.StageArgsPartial
}

FilterStageArgs is intended as args for the FilterStage func.

func (*FilterStageArgs) Ok

func (args *FilterStageArgs) Ok() bool

Ok is used for validation and returns true if all of the following are true:

  • args.NWorkers > 0
  • args.In != nil
  • args.FilterFunc != nil
  • args.StageArgsPartial.Ok() returns true.

type MapStageArgs

type MapStageArgs struct {
	// NWorkers specifies the number of workers to use in this stage.
	NWorkers int
	// In is the input chan for this concurrent stage.
	In <-chan ScanItem
	// MapFunc specifies what each worker should do in this concurrent stage.
	// Specifically, elements from "In" (field of this struct) are read, given
	// to this MapFunc, then the returned results are written to the output chan
	// _if_ the bool is true.
	MapFunc func(Distancer) (ScoreItem, bool)

	syncx.StageArgsPartial
}

MapStageArgs are intended as args for the MapStage func.

func (*MapStageArgs) Ok

func (args *MapStageArgs) Ok() bool

Ok is used for validation and returns true if all of the following are true:

  • args.NWorkers > 0
  • args.In != nil
  • args.MapFunc != nil
  • args.StageArgsPartial.Ok() returns true.

type MergeStageArgs

type MergeStageArgs struct {
	// In is the input chan for this 'stage'.
	In <-chan ScoreItem
	// K is the number of elements of the ordered slice that is returned.
	K int
	// Ascending specifies whether the resulting scores (from the 'In' chan)
	// should be ordered in ascending (or descending) order.
	Ascending bool
	// SendInterval specifies how often the result slice should be sent to the
	// output chan, 1=on each insert, 2=on every second insert, etc. Do note that
	// one send will send the slice, then reset the internal slice such that no
	// sent elements are duplicated.
	SendInterval int

	syncx.StageArgsPartial
}

MergeStageArgs is intended as args for the MergeStage func.

func (*MergeStageArgs) Ok

func (args *MergeStageArgs) Ok() bool

Ok is used for validation and returns true if all of the following are true:

  • args.In != nil
  • args.K > 0
  • args.SendInterval >= 1
  • args.StageArgsPartial.Ok() returns true.

type NewSearchSpacesArgs

type NewSearchSpacesArgs struct {
	// SearchSpacesMaxCap represents the max data capacity in each SearchSpace
	// instance. This is 'inherited' each time those instances are instantiated.
	SearchSpacesMaxCap int
	// SearchSpacesMaxN represents the maximum amount of allowed SearchSpace
	// instances kept in SearchSpaces (plural).
	SearchSpacesMaxN int
	// MaintenanceTaskInterval is a _suggestion_ of how often the internal task
	// loop is run. See SearchSpaces.StartMaintenance method for more info.
	MaintenanceTaskInterval time.Duration
}

NewSearchSpacesArgs is intended as an argument to the NewSearchSpaces func.

func (*NewSearchSpacesArgs) Ok

func (args *NewSearchSpacesArgs) Ok() bool

Ok validates NewSearchSpacesArgs. Returns true iff:

(1) args.SearchSpacesMaxCap > 0
(2) args.SearchSpacesMaxN > 0
(3) args.MaintenanceTaskInterval > 0

type ScanChan

type ScanChan <-chan ScanItem

ScanChan is the return of SearchSpace.Scan. It is a chan of ScanItem.

type ScanItem

type ScanItem struct {
	Distancer mathx.Distancer
}

ScanItem is a single/atomic item output from a SearchSpace.Scan.

type ScoreItem

type ScoreItem struct {
	Distancer Distancer
	// Score is the 'distance' between a query vec and a neighbor candidate.
	Score float64
	// Set is false if this instance is in a default unset state.
	Set bool
}

type ScoreItems

type ScoreItems []ScoreItem

ScoreItems is a []ScoreItem, defined as its own type so that methods can be attached to it.

func (ScoreItems) BubbleInsert

func (items ScoreItems) BubbleInsert(insertee ScoreItem, ascending bool)

BubbleInsert bubbles the insertee up or down into the slice, based on the 'ascending' arg and the Score of every ScoreItem involved, including the ones already in the slice this method is attached to. It assumes that all elements in the slice are already sorted in the order specified by the ascending arg; otherwise it won't work as expected, so be sure to insert every ScoreItem into the slice with this method.

func (ScoreItems) Trim

func (items ScoreItems) Trim() ScoreItems

Trim removes zero-value elements from the slice.
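BubbleInsert and Trim can be pictured together with the sketch below. It is an illustration under assumptions, not the package's implementation: scoreItem, scoreItems, bubbleInsert, trim and topK are hypothetical names, the slice is assumed to have a fixed length with unset slots at the end, and an element counts as zero-value here when its Set flag is false.

```go
package main

import "fmt"

// scoreItem is a hypothetical stand-in for knnc.ScoreItem.
type scoreItem struct {
	Score float64
	Set   bool
}

type scoreItems []scoreItem

// bubbleInsert walks the slice and swaps the insertee into the first slot
// that is unset or holds a worse score, then keeps carrying the displaced
// item along. Whatever is still carried at the end falls off the slice.
func (items scoreItems) bubbleInsert(insertee scoreItem, ascending bool) {
	for i := range items {
		if !insertee.Set {
			return // nothing left to place
		}
		cur := items[i]
		better := !cur.Set ||
			(ascending && insertee.Score < cur.Score) ||
			(!ascending && insertee.Score > cur.Score)
		if better {
			items[i], insertee = insertee, cur
		}
	}
}

// trim returns a new slice without the unset elements.
func (items scoreItems) trim() scoreItems {
	out := make(scoreItems, 0, len(items))
	for _, it := range items {
		if it.Set {
			out = append(out, it)
		}
	}
	return out
}

// topK inserts all scores into a slice of length k and returns the result.
func topK(scores []float64, k int, ascending bool) []float64 {
	items := make(scoreItems, k)
	for _, s := range scores {
		items.bubbleInsert(scoreItem{Score: s, Set: true}, ascending)
	}
	trimmed := items.trim()
	out := make([]float64, len(trimmed))
	for i, it := range trimmed {
		out[i] = it.Score
	}
	return out
}

func main() {
	fmt.Println(topK([]float64{0.5, 0.2, 0.9, 0.1}, 3, true)) // [0.1 0.2 0.5]
	fmt.Println(topK([]float64{0.5, 0.2}, 3, false))          // [0.5 0.2]
}
```

The second call shows why a trim step is useful: with fewer candidates than slots, the tail of the slice stays unset.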

type SearchSpace

type SearchSpace struct {
	// contains filtered or unexported fields
}

SearchSpace is a keeper and scanner of DistancerContainer(s). It is intended to be one of many and will as such have sizing limitations (max capacity) so it's easier to keep in main memory.

func NewSearchSpace

func NewSearchSpace(maxCap int) (*SearchSpace, bool)

NewSearchSpace is a factory func for the SearchSpace T. The only requirement is a maximum capacity; the returned bool is false if that is < 1.

func (*SearchSpace) AddSearchable

func (ss *SearchSpace) AddSearchable(dc DistancerContainer) bool

AddSearchable is the only way of adding data to this search space (see the Clean() and Clear() methods, which are the only ways to delete data). There are a few rules for adding data here:

  • All vectors must be of equal length. To be specific, dc.Distancer().Dim() must return the same integer for all of them.
  • The rule above does not apply if SearchSpace.Len() == 0.
  • SearchSpace.Len() will never be greater than SearchSpace.Cap(). So if SearchSpace.Len() >= SearchSpace.Cap(), then this call will abort.
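The rules above can be sketched as follows. The types are hypothetical stand-ins (distancer for mathx.Distancer, space for SearchSpace), the sketch stores distancers directly instead of DistancerContainer values, and returning false on a rejected add is an assumption based on the bool in the signature.

```go
package main

import "fmt"

// distancer is a hypothetical stand-in for mathx.Distancer; only Dim is used.
type distancer interface{ Dim() int }

type vec []float64

func (v vec) Dim() int { return len(v) }

// space is a hypothetical stand-in for SearchSpace.
type space struct {
	items  []distancer
	dim    int
	maxCap int
}

// add applies the capacity rule and the equal-dimension rule. The dimension
// is (re)set whenever the space is empty.
func (s *space) add(d distancer) bool {
	if d == nil || len(s.items) >= s.maxCap {
		return false
	}
	if len(s.items) == 0 {
		s.dim = d.Dim()
	} else if d.Dim() != s.dim {
		return false
	}
	s.items = append(s.items, d)
	return true
}

// addAll tries to add every vector and reports the outcome of each attempt.
func addAll(maxCap int, vecs ...vec) []bool {
	s := &space{maxCap: maxCap}
	results := make([]bool, len(vecs))
	for i, v := range vecs {
		results[i] = s.add(v)
	}
	return results
}

func main() {
	// Second vec has the wrong dimension, fourth one exceeds the capacity.
	fmt.Println(addAll(2, vec{1, 2}, vec{1, 2, 3}, vec{3, 4}, vec{5, 6})) // [true false true false]
}
```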

func (*SearchSpace) Cap

func (ss *SearchSpace) Cap() int

Cap gives the current capacity of the search space.

func (*SearchSpace) Clean

func (ss *SearchSpace) Clean()

Clean is a controlled way of deleting data in this search space. DistancerContainer kept in this type can either give a valid mathx.Distancer or a nil -- the latter is interpreted as a mark for deletion and will be removed when calling this Clean() method.
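The mark-and-sweep idea behind Clean can be illustrated with the sketch below. All names are hypothetical (distancer, container, entry, clean); the only behaviour taken from the docs is that a container returning a nil distancer is treated as deletable.

```go
package main

import "fmt"

// distancer and container are hypothetical stand-ins for mathx.Distancer
// and DistancerContainer.
type distancer interface{ Dim() int }

type container interface{ Distancer() distancer }

type vec []float64

func (v vec) Dim() int { return len(v) }

// entry returns a nil distancer once it has been marked as stale.
type entry struct {
	v     vec
	stale bool
}

func (e *entry) Distancer() distancer {
	if e.stale {
		return nil
	}
	return e.v
}

// clean filters items in place and keeps only containers that still
// produce a non-nil distancer.
func clean(items []container) []container {
	kept := items[:0]
	for _, c := range items {
		if c != nil && c.Distancer() != nil {
			kept = append(kept, c)
		}
	}
	// Drop references in the unused tail so they can be garbage collected.
	for i := len(kept); i < len(items); i++ {
		items[i] = nil
	}
	return kept
}

func main() {
	a := &entry{v: vec{1, 2}}
	b := &entry{v: vec{3, 4}}
	c := &entry{v: vec{5, 6}}
	items := []container{a, b, c}
	b.stale = true // mark for deletion
	items = clean(items)
	fmt.Println(len(items)) // 2
}
```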

func (*SearchSpace) Clear

func (ss *SearchSpace) Clear() []DistancerContainer

Clear will reset the inner data slice and return the old slice.

func (*SearchSpace) Dim

func (ss *SearchSpace) Dim() int

Dim returns the dimension of all internal data (if any). Note that the dim can/will be overridden when SearchSpace.Len() = 0. This is handled automatically when adding new data with SearchSpace.AddSearchable(...).

func (*SearchSpace) Len

func (ss *SearchSpace) Len() int

Len gives the current len of the search space.

func (*SearchSpace) Scan

func (ss *SearchSpace) Scan(args SearchSpaceScanArgs) (ScanChan, bool)

Scan starts a scanner worker which scans the SearchSpace (i.e. it is non-blocking). Returns (ScanChan, true) if args.Ok() == true, else (nil, false). See SearchSpaceScanArgs and syncx.StageArgsPartial for more details. Note that the scanner uses a 'read mutex', so it will not block multiple concurrent scans.
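A rough sketch of a non-blocking scan under a read lock is shown below. It is built on assumptions: space and scan are hypothetical names, string items replace ScanItem, the extent is interpreted as "the first extent*len items", and the accepted range (> 0.0 and <= 1.0) follows SearchSpaceScanArgs.Ok. How the real scanner selects the scanned portion is not documented here.

```go
package main

import (
	"fmt"
	"sync"
)

// space is a hypothetical stand-in for SearchSpace.
type space struct {
	mu    sync.RWMutex
	items []string
}

// scan returns immediately; a worker goroutine sends the scanned items and
// closes the chan when done. It only takes a read lock, so several scans
// can run at the same time.
func (s *space) scan(extent float64) (<-chan string, bool) {
	if extent <= 0 || extent > 1 {
		return nil, false
	}
	out := make(chan string)
	go func() {
		defer close(out)
		s.mu.RLock()
		defer s.mu.RUnlock()
		n := int(extent * float64(len(s.items)))
		for _, it := range s.items[:n] {
			out <- it
		}
	}()
	return out, true
}

// scanCount drains one scan and returns the number of received items,
// or -1 if the extent was rejected.
func scanCount(s *space, extent float64) int {
	ch, ok := s.scan(extent)
	if !ok {
		return -1
	}
	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	s := &space{items: []string{"a", "b", "c", "d"}}
	fmt.Println(scanCount(s, 1.0), scanCount(s, 0.5), scanCount(s, 1.5)) // 4 2 -1
}
```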

type SearchSpaceScanArgs

type SearchSpaceScanArgs struct {
	// Extent refers to the search extent. 1=scan whole searchspace, 0.5=half.
	// Must be >= 0.0 and <= 1.0.
	Extent float64
	syncx.StageArgsPartial
}

SearchSpaceScanArgs is intended for SearchSpace.Scan().

func (*SearchSpaceScanArgs) Ok

func (args *SearchSpaceScanArgs) Ok() bool

Ok validates SearchSpaceScanArgs. Returns true iff:

  • args.Extent > 0.0 and <= 1.0.
  • args.StageArgsPartial.Ok() is true.

type SearchSpaces

type SearchSpaces struct {
	// contains filtered or unexported fields
}

SearchSpaces is intended for managing a collection of SearchSpace (singular) instances and their data, specifically in the context of cleaning and scanning.

func NewSearchSpaces

func NewSearchSpaces(args NewSearchSpacesArgs) (*SearchSpaces, bool)

NewSearchSpaces is a factory func for SearchSpaces T. Returns (nil, false) if args.Ok() == false.

func (*SearchSpaces) AddSearchable

func (ss *SearchSpaces) AddSearchable(dc DistancerContainer) bool

AddSearchable is the only way of adding data to this instance; specifically to internal SearchSpace instances. Data is not added in any of the following cases:

  • dc is nil.
  • dc.Distancer() returns nil.
  • The dimension of dc (dc.Distancer().Dim()) differs from that of the existing data; this rule does not apply if SearchSpaces.Len() == 0 and a new SearchSpace is created.
  • No internal SearchSpace (singular) can add dc because of its capacity, _and_ a new SearchSpace instance can't be created due to the capacity limit of this SearchSpaces instance.

func (*SearchSpaces) Cap

func (ss *SearchSpaces) Cap() int

Cap returns the capacity of the internal slice of SearchSpace instances.

func (*SearchSpaces) CheckMaintenance

func (ss *SearchSpaces) CheckMaintenance() bool

CheckMaintenance returns true if the maintenance task loop is active.

func (*SearchSpaces) Clean

func (ss *SearchSpaces) Clean()

Clean is a controlled way of deleting data in this instance. It calls the method with the same name on all internal SearchSpace (singular) instances and deletes the ones which get completely emptied (len of 0).

func (*SearchSpaces) Clear

func (ss *SearchSpaces) Clear() []*SearchSpace

Clear will reset the internal SearchSpace slice and return the old one.

func (*SearchSpaces) Dim

func (ss *SearchSpaces) Dim() int

Dim returns the dimension of all internal data of all internal SearchSpace instances. Note that the dim can/will be overridden if SearchSpaces.Len() returns 0 on the first int (i.e. no SearchSpace instances). This is handled automatically in SearchSpaces.AddSearchable(...).

func (*SearchSpaces) Len

func (ss *SearchSpaces) Len() (int, int)

Len returns a tuple where [0] = number of internal SearchSpace instances, and [1] = sum of all their Len method returns (i.e num of all data).

func (*SearchSpaces) Scan

func (ss *SearchSpaces) Scan(args SearchSpacesScanArgs) (ScanChan, bool)

func (*SearchSpaces) StartMaintenance

func (ss *SearchSpaces) StartMaintenance()

StartMaintenance starts a task loop where internal data is cleaned and stale data is removed. Specifically, each step will run at approximately the interval specified when creating this instance (NewSearchSpacesArgs.MaintenanceTaskInterval). Each step will call the Clean() method on a _single_ SearchSpace instance, after which the instance will be removed if it does not have any data in it. Note that only one maintenance task loop can run at a time, so calling this method twice in a row (without calling ss.StopMaintenance) will only spawn one worker.
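The single-loop guarantee described above can be sketched with a ticker-driven goroutine and a stop channel. This is an assumption about one possible shape, not the package's code: maintainer, Start, Stop and Active are hypothetical names, and Start returns a bool (unlike StartMaintenance) only so the example can show that a second call is ignored.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// maintainer is a hypothetical stand-in for the maintenance part of
// SearchSpaces.
type maintainer struct {
	mu   sync.Mutex
	stop chan struct{} // non-nil while the loop is running
}

// Start launches the task loop unless one is already running.
func (m *maintainer) Start(interval time.Duration, step func()) bool {
	if interval <= 0 || step == nil {
		return false
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.stop != nil {
		return false // a loop is already active
	}
	stop := make(chan struct{})
	m.stop = stop
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-stop:
				return
			case <-t.C:
				step() // e.g. clean a single search space
			}
		}
	}()
	return true
}

// Stop ends the loop if it is running.
func (m *maintainer) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.stop != nil {
		close(m.stop)
		m.stop = nil
	}
}

// Active reports whether the loop is running.
func (m *maintainer) Active() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.stop != nil
}

// startStopDemo starts the loop twice, lets it tick briefly, then stops it.
func startStopDemo() (first, second, during, after bool) {
	var steps int64
	m := &maintainer{}
	step := func() { atomic.AddInt64(&steps, 1) }
	first = m.Start(time.Millisecond, step)
	second = m.Start(time.Millisecond, step)
	during = m.Active()
	time.Sleep(5 * time.Millisecond)
	m.Stop()
	after = m.Active()
	return first, second, during, after
}

func main() {
	first, second, during, after := startStopDemo()
	fmt.Println(first, second, during, after) // true false true false
}
```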

func (*SearchSpaces) StopMaintenance

func (ss *SearchSpaces) StopMaintenance()

StopMaintenance stops the internal maintenance task loop (if running).

type SearchSpacesScanArgs

type SearchSpacesScanArgs struct {
	NWorkers int
	SearchSpaceScanArgs
}

SearchSpacesScanArgs is intended for SearchSpaces.Scan(). Note that some of these fields will get passed to each internal SearchSpace (singular) when their 'Scan()' method is called. Those shared and 'inherited' fields are args.Extent and args.BaseStageArgs.BaseWorkerArgs, as those are required for SearchSpaceScanArgs (again, singular).

func (*SearchSpacesScanArgs) Ok

func (args *SearchSpacesScanArgs) Ok() bool

Ok validates SearchSpacesScanArgs. Returns true iff:

  • args.Extent >= 0.0 and <= 1.0.
  • args.SearchSpaceScanArgs.Ok() is true.
