Documentation ¶
Index ¶
- Constants
- Variables
- func CalcSignatureSize(numElements uint64, numHashes int, falsePositiveRate float64) uint64
- func Combinations(set []uint64, n int) (subsets [][]uint64)
- func Combinations2(set []uint64) [][2]uint64
- func Execute()
- func IntSlice2StringSlice(vals []int) []string
- func MeanStdev(values []float64) (float64, float64)
- func NewSearchResultParser(file string, poolStrings *sync.Pool, poolMatches *sync.Pool, scoreField int, ...) (chan *SearchResult, error)
- type IndexQuery
- type Match
- type MatchResult
- type MatchResult2
- type MatchResult3
- type Matches
- type Meta
- type Name2Idx
- type Options
- type ProfileNode
- type Query
- type QueryResult
- type SearchOptions
- type SearchResult
- type SortByJacc
- type SortByQCov
- type SortByTCov
- type Target
- type Targets
- type Uint64Slice
- type UnikFileInfo
- type UnikFileInfoGroup
- type UnikFileInfoGroups
- type UnikFileInfos
- type UnikFileInfosByName
- type UnikIndex
- type UnikIndexDB
- type UnikIndexDBInfo
- type UnikIndexDBSearchEngine
Constants ¶
const PosPopCountBufSize = 64 // 64 is the cache line size for most 64-bit machines.
PosPopCountBufSize defines the size of the byte-slice buffer fed to pospopcount (github.com/clausecker/pospop).
Theoretically, a size > 240 is better, but in this scenario we first need to transpose the signature matrix, which is the performance bottleneck. The column size of the matrix is fixed, so we must control the row size to balance the time spent on matrix transposition against the time spent on pospopcount.
64 is the best value for my machine (AMD Ryzen 2700X).
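To make the term concrete, here is a naive sketch of a positional population count over a byte buffer: for each bit position 0..7 it counts how many bytes have that bit set. The function name `posPopCount8` is hypothetical, and this scalar loop only illustrates the operation; it is not the SIMD implementation from github.com/clausecker/pospop.

```go
package main

import "fmt"

// posPopCount8 adds, for each bit position 0..7, the number of bytes in buf
// that have that bit set. Hypothetical scalar reference, for illustration only.
func posPopCount8(counts *[8]int, buf []byte) {
	for _, b := range buf {
		for i := 0; i < 8; i++ {
			counts[i] += int((b >> uint(i)) & 1)
		}
	}
}

func main() {
	var counts [8]int
	// Three bytes: bit 0 is set in all of them, bit 1 in one, bit 7 in one.
	posPopCount8(&counts, []byte{0b00000001, 0b00000011, 0b10000001})
	fmt.Println(counts)
}
```

The buffer length is the quantity that PosPopCountBufSize controls: a longer buffer amortizes call overhead but requires more rows to be transposed first.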
const UnikIndexDBVersion uint8 = 4
UnikIndexDBVersion is the version of the database format.
Variables ¶
var BufferSize = 65536 // os.Getpagesize()
BufferSize is the size of the buffer.
var ErrVersionMismatch = errors.New("kmcp/index: version mismatch")
ErrVersionMismatch indicates a version mismatch.
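Sentinel errors like this one are usually tested with errors.Is. The sketch below uses a local stand-in `errVersionMismatch` and two hypothetical helpers (`checkVersion`, `isVersionMismatch`); whether kmcp wraps the error in this way is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errVersionMismatch is a local stand-in for ErrVersionMismatch.
var errVersionMismatch = errors.New("kmcp/index: version mismatch")

// checkVersion is a hypothetical helper that wraps the sentinel with context.
func checkVersion(got, want uint8) error {
	if got != want {
		return fmt.Errorf("got v%d, want v%d: %w", got, want, errVersionMismatch)
	}
	return nil
}

// isVersionMismatch reports whether err is, or wraps, the sentinel error.
func isVersionMismatch(err error) bool {
	return errors.Is(err, errVersionMismatch)
}

func main() {
	if err := checkVersion(3, 4); isVersionMismatch(err) {
		fmt.Println("please rebuild the database:", err)
	}
}
```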
var RootCmd = &cobra.Command{
    Use:   "kmcp",
    Short: "K-mer-based Metagenomic Classification and Profilling",
    Long: fmt.Sprintf(`
Program: kmcp (K-mer-based Metagenomic Classification and Profiling)
Version: v%s
Documents: https://bioinf.shenwei.me/kmcp
Source code: https://github.com/shenwei356/kmcp

KMCP is a tool for metagenomic classification and profiling.

KMCP can also be used for:

1. Fast sequence search against large scales of genomic datasets as BIGSI and COBS do.
2. Fast assembly/genome similarity estimation as Mash and sourmash do, by utilizing Minimizer, FracMinHash (Scaled MinHash), or Closed Syncmers.
`, VERSION),
}
RootCmd represents the base command when called without any subcommands.
var VERSION = "0.8.0"
VERSION is the version
Functions ¶
func CalcSignatureSize ¶
CalcSignatureSize is adapted from https://github.com/bingmann/cobs/blob/master/cobs/util/calc_signature_size.cpp, but the result can optionally be rounded up to 2^n.
The equivalent computation in Python:
    import math

    def roundup(x):
        x -= 1
        x |= x >> 1
        x |= x >> 2
        x |= x >> 4
        x |= x >> 8
        x |= x >> 16
        x |= x >> 32
        return (x | x >> 64) + 1

    f = lambda ne, nh, fpr: math.ceil(-nh / (math.log(1 - math.pow(fpr, 1 / nh))) * ne)
    roundup(f(300000, 1, 0.25))
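The Python snippet in the doc comment can be transcribed to Go roughly as follows. This is a sketch of the formula only: the names `signatureSize` and `roundUpPow2` are hypothetical, and the actual CalcSignatureSize may differ in detail.

```go
package main

import (
	"fmt"
	"math"
)

// signatureSize mirrors the Python lambda f:
// ceil(-nh / ln(1 - fpr^(1/nh)) * ne).
func signatureSize(numElements uint64, numHashes int, fpr float64) uint64 {
	nh := float64(numHashes)
	ratio := -nh / math.Log(1-math.Pow(fpr, 1/nh))
	return uint64(math.Ceil(ratio * float64(numElements)))
}

// roundUpPow2 rounds x (x >= 1) up to the next power of two by smearing the
// highest set bit of x-1 into all lower positions and then adding one.
func roundUpPow2(x uint64) uint64 {
	x--
	x |= x >> 1
	x |= x >> 2
	x |= x >> 4
	x |= x >> 8
	x |= x >> 16
	x |= x >> 32
	return x + 1
}

func main() {
	size := signatureSize(300000, 1, 0.25)
	fmt.Println(size, roundUpPow2(size))
}
```

Rounding up to a power of two trades some extra space for a lower effective false positive rate, since the same elements are spread over more bits.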
func Combinations ¶ added in v0.4.0
Combinations is modified from https://github.com/mxschmitt/golang-combinations/blob/master/combinations.go. It is too slow for big n.
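One way to write such a function is bitmask enumeration: walk over every non-empty subset mask and keep those with exactly n bits set. The sketch below assumes that approach (the name `subsetsOfSize` is hypothetical) and shows why the cost grows quickly: the loop visits 2^len(set) masks regardless of n.

```go
package main

import (
	"fmt"
	"math/bits"
)

// subsetsOfSize returns all subsets of set that contain exactly n elements,
// found by enumerating every bitmask over the set. Illustrative sketch only.
func subsetsOfSize(set []uint64, n int) [][]uint64 {
	var out [][]uint64
	limit := uint(1) << uint(len(set))
	for mask := uint(1); mask < limit; mask++ {
		if bits.OnesCount(mask) != n {
			continue // wrong subset size
		}
		subset := make([]uint64, 0, n)
		for i, v := range set {
			if (mask>>uint(i))&1 == 1 {
				subset = append(subset, v)
			}
		}
		out = append(out, subset)
	}
	return out
}

func main() {
	fmt.Println(subsetsOfSize([]uint64{1, 2, 3}, 2))
}
```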
func Combinations2 ¶ added in v0.4.0
Combinations2 returns all 2-element combinations of set. Note: set should not contain duplicates.
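For the fixed case of pairs, two nested loops are enough. The following is a minimal sketch under that assumption; `pairs` is a hypothetical name, not the package's implementation.

```go
package main

import "fmt"

// pairs returns every unordered pair of distinct positions in set.
// If set contains duplicate values, duplicate pairs are produced.
func pairs(set []uint64) [][2]uint64 {
	n := len(set)
	if n < 2 {
		return nil
	}
	out := make([][2]uint64, 0, n*(n-1)/2)
	for i := 0; i < n-1; i++ {
		for j := i + 1; j < n; j++ {
			out = append(out, [2]uint64{set[i], set[j]})
		}
	}
	return out
}

func main() {
	fmt.Println(pairs([]uint64{1, 2, 3}))
}
```

A set of size n yields n(n-1)/2 pairs, which is why the result slice can be preallocated.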
func Execute ¶
func Execute()
Execute adds all child commands to the root command and sets flags appropriately. This is called by main.main(). It only needs to happen once to the rootCmd.
func IntSlice2StringSlice ¶ added in v0.5.0
Types ¶
type IndexQuery ¶
type IndexQuery struct {
    // Kmers []uint64
    Hashes  *[][]uint64 // related to database
    Hashes1 *[]uint64
    Ch      chan *[]*Match // result channel
}
IndexQuery is a query sent to multiple indices of a database.
type Match ¶
type Match struct {
    Target       []string // target name
    TargetIdx    []uint32
    GenomeSize   []uint64
    NumKmers     int     // matched k-mers
    FPR          float64
    QCov         float64 // |A∩B|/|A|, coverage of query, i.e., Containment Index
    TCov         float64 // |A∩B|/|B|, coverage of target
    JaccardIndex float64 // |A∩B|/|A∪B|, i.e., JaccardIndex
}
Match is the struct of matching detail.
type MatchResult ¶ added in v0.4.0
type MatchResult2 ¶ added in v0.7.0
type MatchResult3 ¶ added in v0.7.0
type Matches ¶
type Matches []*Match
Matches is a list of Match pointers, used for sorting.
type Meta ¶
type Meta struct {
    SeqID      string `json:"id"`   // sequence ID
    FragIdx    uint32 `json:"idx"`  // sequence location index
    GenomeSize uint64 `json:"gn-s"` // genome length
    Ks         []int  `json:"ks"`   // ks

    Syncmer  bool `json:"sm"` // syncmer
    SyncmerS int  `json:"sm-s"`

    Minimizer  bool `json:"mm"` // minimizer
    MinimizerW int  `json:"mm-w"`

    SplitSeq     bool `json:"sp"` // split sequence
    SplitSize    int  `json:"sp-s"`
    SplitNum     int  `json:"sp-n"`
    SplitOverlap int  `json:"sp-o"`
}
Meta contains some meta information.
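The struct tags suggest that Meta is serialized as compact JSON with short keys. The sketch below reproduces only the first four fields in a local type `meta` to show what the tags do with encoding/json; how and where kmcp actually stores this metadata is an assumption not confirmed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// meta is a local, partial copy of Meta (first four fields only).
type meta struct {
	SeqID      string `json:"id"`
	FragIdx    uint32 `json:"idx"`
	GenomeSize uint64 `json:"gn-s"`
	Ks         []int  `json:"ks"`
}

// encodeMeta marshals m using the short keys given in the struct tags.
func encodeMeta(m meta) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	s, err := encodeMeta(meta{SeqID: "chr1", FragIdx: 0, GenomeSize: 4600000, Ks: []int{21, 31}})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(s)
}
```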
type Options ¶
type Options struct {
    NumCPUs  int
    Verbose  bool
    LogFile  string
    Log2File bool

    Compress         bool
    CompressionLevel int
}
Options contains the global flags.
type ProfileNode ¶ added in v0.4.0
type Query ¶
type Query struct {
    Idx  uint64 // ID for keeping output in order
    ID   []byte
    Seq  *seq.Seq
    Seq2 *seq.Seq
    Ch   chan *QueryResult // result channel
}
Query stands for a query sequence.
type QueryResult ¶
type QueryResult struct {
    QueryIdx uint64 // ID for keeping output in order
    QueryID  []byte
    QueryLen int
    DBId     int     // ID of database, for getting the database name with little space
    FPR      float64 // fpr, p is related to database
    K        int
    NumKmers int       // number of k-mers
    Matches  *[]*Match // all matches
}
QueryResult is the search result of a query sequence.
type SearchOptions ¶
type SearchOptions struct {
    LoadWholeFile bool
    UseMMap       bool
    Threads       int
    Verbose       bool

    DeduplicateThreshold int // deduplicate k-mers only when the number of k-mers > this threshold

    KeepUnmatched bool
    TopN          int
    TopNScores    int
    SortBy        string
    DoNotSort     bool

    MinQLen      int
    MinMatched   int
    MinQueryCov  float64
    MinTargetCov float64

    LoadDefaultNameMap bool
    NameMap            map[string]string

    TrySingleEnd bool // when no target is found for paired-end reads, retry searching with single ends
}
SearchOptions defines options for searching.
type SearchResult ¶ added in v0.6.0
type SortByJacc ¶ added in v0.3.0
type SortByJacc struct{ Matches }
SortByJacc is used to sort matches by Jaccard index.
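The `struct{ Matches }` declaration is the usual Go embedding idiom for offering several sort orders over one slice type: the slice type supplies Len and Swap, and each wrapper adds its own Less. The sketch below uses local stand-in types (`match`, `matches`, `byJacc`); that Matches provides Len and Swap, and that the order is descending, are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

type match struct {
	name    string
	jaccard float64
}

// matches supplies Len and Swap once for every sort order.
type matches []*match

func (ms matches) Len() int      { return len(ms) }
func (ms matches) Swap(i, j int) { ms[i], ms[j] = ms[j], ms[i] }

// byJacc embeds matches and only adds the comparison.
type byJacc struct{ matches }

// Less orders by Jaccard index, highest first (assumed direction).
func (s byJacc) Less(i, j int) bool {
	return s.matches[i].jaccard > s.matches[j].jaccard
}

// sortByJaccard sorts ms in place.
func sortByJaccard(ms matches) { sort.Sort(byJacc{ms}) }

func main() {
	ms := matches{{"a", 0.2}, {"b", 0.9}, {"c", 0.5}}
	sortByJaccard(ms)
	for _, m := range ms {
		fmt.Println(m.name, m.jaccard)
	}
}
```

SortByQCov and SortByTCov in the index presumably follow the same pattern with a different Less.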
type Target ¶ added in v0.4.0
type Target struct { Name string GenomeSize uint64 // Counting matches in all chunks // some reads match multiple sites in the same genome, // the count should be divided by number of sites. Match []float64 // sum of read (query) length QLen []float64 // unique match UniqMatch []float64 // unique match with high confidence UniqMatchHic []float64 SumMatch float64 // depth SumUniqMatch float64 SumUniqMatchHic float64 FragsProp float64 // coverage Coverage float64 Qlens float64 RelDepth []float64 RelDepthStd float64 // RefName string // Taxonomy information Taxid uint32 Rank string TaxonName string LineageNames []string LineageTaxids []string CompleteLineageNames []string CompleteLineageTaxids []uint32 Percentage float64 // relative abundance Stats *stats.Quantiler // for computing percentil of qcov of unique matches StatsA *stats.Quantiler // for computing percentil of qcov of all matches Score float64 }
func (*Target) AddTaxonomy ¶ added in v0.4.0
type Uint64Slice ¶ added in v0.4.0
type Uint64Slice []uint64
func (Uint64Slice) Len ¶ added in v0.4.0
func (s Uint64Slice) Len() int
func (Uint64Slice) Less ¶ added in v0.4.0
func (s Uint64Slice) Less(i, j int) bool
func (*Uint64Slice) Pop ¶ added in v0.6.0
func (s *Uint64Slice) Pop() interface{}
func (*Uint64Slice) Push ¶ added in v0.6.0
func (s *Uint64Slice) Push(x interface{})
func (Uint64Slice) Swap ¶ added in v0.4.0
func (s Uint64Slice) Swap(i, j int)
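Together, Len, Less, Swap, Push, and Pop satisfy heap.Interface from container/heap, so a type with this method set can serve as a priority queue. The sketch below uses a local type `uint64Heap` with the same method set; the ascending Less and the method bodies are assumptions, since only the signatures are documented.

```go
package main

import (
	"container/heap"
	"fmt"
)

// uint64Heap is a local stand-in with the same method set as Uint64Slice.
type uint64Heap []uint64

func (s uint64Heap) Len() int           { return len(s) }
func (s uint64Heap) Less(i, j int) bool { return s[i] < s[j] } // min-heap (assumed)
func (s uint64Heap) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

// Push and Pop use pointer receivers because they change the slice length.
func (s *uint64Heap) Push(x interface{}) { *s = append(*s, x.(uint64)) }
func (s *uint64Heap) Pop() interface{} {
	old := *s
	n := len(old)
	x := old[n-1]
	*s = old[:n-1]
	return x
}

// drainSorted heapifies a copy of vals and pops every element in order.
func drainSorted(vals []uint64) []uint64 {
	h := make(uint64Heap, len(vals))
	copy(h, vals)
	heap.Init(&h)
	out := make([]uint64, 0, len(vals))
	for h.Len() > 0 {
		out = append(out, heap.Pop(&h).(uint64))
	}
	return out
}

func main() {
	fmt.Println(drainSorted([]uint64{42, 7, 19, 3}))
}
```

The mixed receivers in the documented signatures (value receivers for Len, Less, Swap; pointer receivers for Push, Pop) match this idiom.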
type UnikFileInfo ¶
type UnikFileInfo struct {
    Path       string
    Name       string
    GenomeSize uint64
    Index      uint32
    Indexes    uint32
    Kmers      uint64
}
UnikFileInfo stores basic information about a .unik file.
func (UnikFileInfo) String ¶
func (i UnikFileInfo) String() string
type UnikFileInfoGroup ¶
type UnikFileInfoGroup struct {
    Infos []UnikFileInfo
    Kmers uint64
}
UnikFileInfoGroup represents a slice of UnikFileInfos.
func (UnikFileInfoGroup) String ¶
func (i UnikFileInfoGroup) String() string
type UnikFileInfoGroups ¶
type UnikFileInfoGroups []UnikFileInfoGroup
UnikFileInfoGroups is just a slice of UnikFileInfoGroup.
func (UnikFileInfoGroups) Len ¶
func (l UnikFileInfoGroups) Len() int
func (UnikFileInfoGroups) Swap ¶
func (l UnikFileInfoGroups) Swap(i int, j int)
type UnikFileInfos ¶
type UnikFileInfos []UnikFileInfo
UnikFileInfos is a list of UnikFileInfo.
func (UnikFileInfos) Len ¶
func (l UnikFileInfos) Len() int
func (UnikFileInfos) Swap ¶
func (l UnikFileInfos) Swap(i int, j int)
type UnikFileInfosByName ¶
type UnikFileInfosByName []UnikFileInfo
UnikFileInfosByName is used to sort infos by name and index.
func (UnikFileInfosByName) Len ¶
func (l UnikFileInfosByName) Len() int
func (UnikFileInfosByName) Swap ¶
func (l UnikFileInfosByName) Swap(i int, j int)
type UnikIndex ¶
type UnikIndex struct {
    Options SearchOptions

    InCh chan *IndexQuery

    Path   string
    Header index.Header

    ExtraWorkers int // when #threads > 1.5 * #index files
    // contains filtered or unexported fields
}
UnikIndex defines a unik index struct.
func NewUnixIndex ¶
func NewUnixIndex(file string, opt SearchOptions, fpr float64, nextraWorkers int) (*UnikIndex, error)
NewUnixIndex creates an index from a file.
type UnikIndexDB ¶
type UnikIndexDB struct {
    Options SearchOptions
    DBId    int // ID of the current database

    InCh chan *Query

    Info   UnikIndexDBInfo
    Header index.Header

    Indices      []*UnikIndex
    ExtraWorkers int
    // contains filtered or unexported fields
}
UnikIndexDB is a database of multiple .unik indices.
func NewUnikIndexDB ¶
func NewUnikIndexDB(path string, opt SearchOptions, dbID int) (*UnikIndexDB, error)
NewUnikIndexDB opens and reads from a database directory.
func (*UnikIndexDB) CompatibleWith ¶
func (db *UnikIndexDB) CompatibleWith(db2 *UnikIndexDB) bool
CompatibleWith applies loose restrictions, which enables searching across databases built with different parameters.
func (*UnikIndexDB) String ¶
func (db *UnikIndexDB) String() string
type UnikIndexDBInfo ¶
type UnikIndexDBInfo struct {
    Version      uint8  `yaml:"version"`
    IndexVersion uint8  `yaml:"unikiVersion"`
    Alias        string `yaml:"alias"`
    K            int    `yaml:"k"`
    Ks           []int  `yaml:"ks"`
    Hashed       bool   `yaml:"hashed"`
    Canonical    bool   `yaml:"canonical"`

    Scaled     bool   `yaml:"scaled"`
    Scale      uint32 `yaml:"scale"`
    Minimizer  bool   `yaml:"minimizer"`
    MinimizerW uint32 `yaml:"minimizer-w"`
    Syncmer    bool   `yaml:"syncmer"`
    SyncmerS   uint32 `yaml:"syncmer-s"`

    SplitSeq     bool `yaml:"split-seq"`
    SplitSize    int  `yaml:"split-size"`
    SplitNum     int  `yaml:"split-num"`
    SplitOverlap int  `yaml:"split-overlap"`

    CompactSize bool     `yaml:"compact-size"`
    NumHashes   int      `yaml:"hashes"`
    FPR         float64  `yaml:"fpr"`
    NumNames    int      `yaml:"numNameGroups"`
    BlockSize   int      `yaml:"blocksize"`
    Kmers       uint64   `yaml:"totalKmers"`
    Files       []string `yaml:"files"`

    NameMapping  map[string]string `yaml:"name-mapping,omitempty"`
    MappingNames bool              `yaml:"mapping-names,omitempty"`
    // contains filtered or unexported fields
}
UnikIndexDBInfo is the metadata of a database.
func NewUnikIndexDBInfo ¶
func NewUnikIndexDBInfo(files []string) UnikIndexDBInfo
NewUnikIndexDBInfo creates UnikIndexDBInfo from index files, but you have to manually assign other values.
func UnikIndexDBInfoFromFile ¶
func UnikIndexDBInfoFromFile(file string) (UnikIndexDBInfo, error)
UnikIndexDBInfoFromFile creates UnikIndexDBInfo from a file.
func (UnikIndexDBInfo) Check ¶
func (i UnikIndexDBInfo) Check() error
Check checks whether all index files exist.
func (UnikIndexDBInfo) CompatibleWith ¶
func (i UnikIndexDBInfo) CompatibleWith(j UnikIndexDBInfo) bool
CompatibleWith checks whether two databases have the same parameters.
func (UnikIndexDBInfo) String ¶
func (i UnikIndexDBInfo) String() string
type UnikIndexDBSearchEngine ¶
type UnikIndexDBSearchEngine struct {
    Options SearchOptions

    DBs     []*UnikIndexDB
    DBNames []string

    InCh  chan *Query // queries
    OutCh chan *QueryResult
    // contains filtered or unexported fields
}
UnikIndexDBSearchEngine searches sequences against multiple databases.
func NewUnikIndexDBSearchEngine ¶
func NewUnikIndexDBSearchEngine(opt SearchOptions, dbPaths ...string) (*UnikIndexDBSearchEngine, error)
NewUnikIndexDBSearchEngine returns a search engine built on the databases at the given paths.
func (*UnikIndexDBSearchEngine) Close ¶
func (sg *UnikIndexDBSearchEngine) Close() error
Close closes the search engine.
Source Files ¶
- autocomplete.go
- compute.go
- cov2simi.go
- filter.go
- index-info.go
- index.go
- merge-regions.go
- merge.go
- profile.go
- query-fpr.go
- root.go
- search.go
- taxonomy.go
- unik-info.go
- util-binary-file.go
- util-cli.go
- util-db-info.go
- util-db-search.go
- util-filter.go
- util-hash.go
- util-index.go
- util-io.go
- util-logging.go
- util-profile.go
- util-regions.go
- util.go
- utils.go
- version.go