Documentation ¶
Index ¶
- Constants
- Variables
- func CheckText(content []byte, maxTrigramCount int) error
- func Explode(dstDir string, f IndexFile) (map[string]string, error)
- func IndexFilePaths(p string) ([]string, error)
- func JsonMarshalRepoMetaTemp(shardPath string, repositoryMetadata interface{}) (tempPath, finalPath string, err error)
- func Merge(dstDir string, files ...IndexFile) (tmpName, dstName string, _ error)
- func PrintNgramStats(r IndexFile) error
- func ReadMetadata(inf IndexFile) ([]*Repository, *IndexMetadata, error)
- func ReadMetadataPath(p string) ([]*Repository, *IndexMetadata, error)
- func ReadMetadataPathAlive(p string) ([]*Repository, *IndexMetadata, error)
- func SetTombstone(shardPath string, repoID uint32) error
- func ShardMergingEnabled() bool
- func SortFiles(ms []FileMatch, opts *SearchOptions)
- func UnsetTombstone(shardPath string, repoID uint32) error
- type ChunkMatch
- type Document
- type DocumentSection
- type FileMatch
- type FlushReason
- type IndexBuilder
- type IndexFile
- type IndexMetadata
- type LineFragmentMatch
- type LineMatch
- type ListOptions
- type Location
- type MinimalRepoListEntry
- type Progress
- type Range
- type RepoList
- type RepoListEntry
- type RepoListField
- type RepoStats
- type ReposMap
- type Repository
- type RepositoryBranch
- type SearchOptions
- type SearchResult
- type Searcher
- type Sender
- type Stats
- type Streamer
- type Symbol
Constants ¶
const (
	RepoListFieldRepos    RepoListField = 0
	RepoListFieldMinimal                = 1
	RepoListFieldReposMap               = 2
)
const FeatureVersion = 12
FeatureVersion is increased if a feature is added that requires reindexing data without changing the format version.
2: Rank field for shards.
3: Rank documents within shards
4: Dedup file bugfix
5: Remove max line size limit
6: Include '#' into the LineFragment template
7: Record skip reasons in the index.
8: Record source path in the index.
9: Store ctags metadata & bump default max file size
10: Compound shards; more flexible TOC format.
11: Bloom filters for file names & contents
12: go-enry for identifying file languages
const IndexFormatVersion = 16
IndexFormatVersion is a version number. It is increased every time the on-disk index format is changed.
5: subrepositories.
6: remove size prefix for posting varint list.
7: move subrepos into Repository struct.
8: move repoMetaData out of indexMetadata
9: use bigendian uint64 for trigrams.
10: sections for rune offsets.
11: file ends in rune offsets.
12: 64-bit branchmasks.
13: content checksums
14: languages
15: rune based symbol sections
16: ctags metadata
const NextIndexFormatVersion = 17
17: compound shard (multi repo)
const ReadMinFeatureVersion = 8
ReadMinFeatureVersion constrains backwards compatibility by refusing to load a file with a FeatureVersion below it.
const WriteMinFeatureVersion = 10
WriteMinFeatureVersion constrains forwards compatibility by emitting files that won't load in zoekt with a FeatureVersion below it.
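A minimal sketch of how a reader might apply these two constants when deciding whether to load a shard. It assumes that a shard's IndexMetadata stores the writer's FeatureVersion in IndexFeatureVersion and the writer's WriteMinFeatureVersion in IndexMinReaderVersion; the shardMeta type and checkLoadable helper are hypothetical, not part of zoekt's API.

```go
package main

import "fmt"

// Values copied from the constants above.
const (
	FeatureVersion         = 12
	ReadMinFeatureVersion  = 8
	WriteMinFeatureVersion = 10
)

// shardMeta is a hypothetical stand-in for the relevant IndexMetadata fields.
type shardMeta struct {
	IndexFeatureVersion   int // FeatureVersion of the indexer that wrote the shard
	IndexMinReaderVersion int // assumed to hold the writer's WriteMinFeatureVersion
}

// checkLoadable reports whether a reader built at FeatureVersion may load a shard.
func checkLoadable(md shardMeta) error {
	// Backwards compatibility: refuse shards written by a too-old indexer.
	if md.IndexFeatureVersion < ReadMinFeatureVersion {
		return fmt.Errorf("shard feature version %d < ReadMinFeatureVersion %d",
			md.IndexFeatureVersion, ReadMinFeatureVersion)
	}
	// Forwards compatibility: refuse shards that demand a newer reader.
	if md.IndexMinReaderVersion > FeatureVersion {
		return fmt.Errorf("shard requires reader feature version %d, have %d",
			md.IndexMinReaderVersion, FeatureVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkLoadable(shardMeta{IndexFeatureVersion: 12, IndexMinReaderVersion: WriteMinFeatureVersion}))
	fmt.Println(checkLoadable(shardMeta{IndexFeatureVersion: 7, IndexMinReaderVersion: 7}))
}
```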
Variables ¶
var FlushReasonStrings = map[FlushReason]string{
	FlushReasonTimerExpired: "timer_expired",
	FlushReasonFinalFlush:   "final_flush",
	FlushReasonMaxSize:      "max_size_reached",
}
var Version string
Version is filled in by the linker.
Functions ¶
func Explode ¶
func Explode(dstDir string, f IndexFile) (map[string]string, error)
Explode takes an IndexFile f and creates one simple shard per repository contained in f. Explode returns a map of tmpName -> dstName. It is the responsibility of the caller to rename the temporary shard(s) and delete the input shard.
func IndexFilePaths ¶
func IndexFilePaths(p string) ([]string, error)
IndexFilePaths returns all existing paths for the IndexFile at filepath p: p itself and the ".meta" file for p. Note: if none of these files exist, it returns an empty slice and a nil error.
func JsonMarshalRepoMetaTemp ¶
func JsonMarshalRepoMetaTemp(shardPath string, repositoryMetadata interface{}) (tempPath, finalPath string, err error)
JsonMarshalRepoMetaTemp writes the json encoding of the given repository metadata to a temporary file in the same directory as the given shard path. It returns both the path of the temporary file and the path of the final file that the caller should use.
The caller is responsible for renaming the temporary file to the final file path, or removing the temporary file if it is no longer needed.
func Merge ¶
func Merge(dstDir string, files ...IndexFile) (tmpName, dstName string, _ error)
Merge merges files into a compound shard in dstDir and returns tmpName and dstName. It is the responsibility of the caller to delete the input shards and rename the temporary compound shard from tmpName to dstName.
func PrintNgramStats ¶
func PrintNgramStats(r IndexFile) error
PrintNgramStats outputs a list of the form
n_1 trigram_1
n_2 trigram_2
...
where n_i is the length of the postings list of trigram_i stored in r.
func ReadMetadata ¶
func ReadMetadata(inf IndexFile) ([]*Repository, *IndexMetadata, error)
ReadMetadata returns the metadata of an index shard without reading the index data. The IndexFile is not closed.
func ReadMetadataPath ¶
func ReadMetadataPath(p string) ([]*Repository, *IndexMetadata, error)
ReadMetadataPath returns the metadata of the index shard at p without reading the index data. ReadMetadataPath is a helper for ReadMetadata which opens the IndexFile at p.
func ReadMetadataPathAlive ¶
func ReadMetadataPathAlive(p string) ([]*Repository, *IndexMetadata, error)
ReadMetadataPathAlive is like ReadMetadataPath except that it only returns alive repositories.
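func SetTombstone ¶
func SetTombstone(shardPath string, repoID uint32) error
SetTombstone idempotently sets a tombstone for the repository with ID repoID in the .meta file of the shard at shardPath.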
func ShardMergingEnabled ¶
func ShardMergingEnabled() bool
ShardMergingEnabled returns true if SRC_ENABLE_SHARD_MERGING is set to true.
func SortFiles ¶
func SortFiles(ms []FileMatch, opts *SearchOptions)
SortFiles sorts file matches. The order depends on the match score and, if available, on the pre-computed document ranks.
Rankings derived from match scores and rank vectors are combined based on "Reciprocal Rank Fusion" (RRF).
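func UnsetTombstone ¶
func UnsetTombstone(shardPath string, repoID uint32) error
UnsetTombstone idempotently removes the tombstone for the repository with ID repoID in the .meta file of the shard at shardPath.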
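The in-memory part of SetTombstone and UnsetTombstone can be sketched as a single flag update that is safe to repeat. This sketch does not read or write the .meta file; the repo type, setTombstoneFlag, and returning an error for an unknown ID are all assumptions.

```go
package main

import "fmt"

// repo is a hypothetical stand-in for the Repository fields involved.
type repo struct {
	ID        uint32
	Tombstone bool
}

// setTombstoneFlag sets or clears the Tombstone flag for repoID. Calling it
// repeatedly with the same arguments leaves the state unchanged (idempotent).
// Returning an error for an unknown ID is an assumption of this sketch.
func setTombstoneFlag(repos []repo, repoID uint32, tombstone bool) error {
	for i := range repos {
		if repos[i].ID == repoID {
			repos[i].Tombstone = tombstone
			return nil
		}
	}
	return fmt.Errorf("repository %d not found in shard", repoID)
}

func main() {
	repos := []repo{{ID: 1}, {ID: 2}}
	_ = setTombstoneFlag(repos, 2, true)
	_ = setTombstoneFlag(repos, 2, true) // second call changes nothing
	fmt.Println(repos)                   // [{1 false} {2 true}]
}
```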
Types ¶
type ChunkMatch ¶
type ChunkMatch struct {
	// Content is a contiguous range of complete lines that fully contains Ranges.
	Content []byte
	// ContentStart is the location (inclusive) of the beginning of content
	// relative to the beginning of the file. It will always be at the
	// beginning of a line (Column will always be 1).
	ContentStart Location

	// FileName indicates whether this match is a match on the file name, in
	// which case Content will contain the file name.
	FileName bool

	// Ranges is a set of matching ranges within this chunk. Each range is relative
	// to the beginning of the file (not the beginning of Content).
	Ranges []Range

	// SymbolInfo is the symbol information associated with Ranges. If it is non-nil,
	// its length will equal that of Ranges. Any of its elements may be nil.
	SymbolInfo []*Symbol

	Score      float64
	DebugScore string
}
ChunkMatch is a set of non-overlapping matches within a contiguous range of lines in the file.
type Document ¶
type Document struct {
	Name              string
	Content           []byte
	Branches          []string
	SubRepositoryPath string
	Language          string

	// If set, something is wrong with the file contents, and this
	// is the reason it wasn't indexed.
	SkipReason string

	// Document sections for symbols. Offsets should use bytes.
	Symbols         []DocumentSection
	SymbolsMetaData []*Symbol

	// Ranks is a vector of ranks for a document as provided by a DocumentRanksFile
	// file in the git repo.
	//
	// Two documents can be ordered by comparing the components of their rank
	// vectors. Bigger entries are better, as are longer vectors.
	//
	// This field is experimental and may change at any time without warning.
	Ranks []float64
}
Document holds a document (file) to index.
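One plausible reading of the Ranks ordering rule ("bigger entries are better, as are longer vectors") is a lexicographic comparison: the first differing component decides, and if one vector is a prefix of the other, the longer one wins. compareRanks is a hypothetical helper illustrating that reading, not zoekt's implementation.

```go
package main

import "fmt"

// compareRanks returns 1 if a ranks better than b, -1 if worse, 0 if equal.
// It compares components in order (bigger is better); if one vector is a
// prefix of the other, the longer vector ranks better.
func compareRanks(a, b []float64) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			if a[i] > b[i] {
				return 1
			}
			return -1
		}
	}
	switch {
	case len(a) > len(b):
		return 1
	case len(a) < len(b):
		return -1
	}
	return 0
}

func main() {
	fmt.Println(compareRanks([]float64{0.9, 0.1}, []float64{0.8, 0.5})) // 1
	fmt.Println(compareRanks([]float64{0.5}, []float64{0.5, 0.2}))      // -1
}
```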
type DocumentSection ¶
type DocumentSection struct {
	Start, End uint32
}
type FileMatch ¶
type FileMatch struct {
	// Ranking; the higher, the better.
	Score float64 // TODO - hide this field?

	// Experimental. Ranks is a vector containing floats in the interval [0, 1]. The
	// length of the vector depends on the output from the ranking function at index
	// time.
	//
	// This field is only set if the shard contains ranking information and
	// SearchOptions.UseDocumentRanks is true.
	Ranks []float64

	// For debugging. Needs DebugScore set, but public so tests in
	// other packages can print some diagnostics.
	Debug string

	FileName string

	// Repository is the globally unique name of the repo of the
	// match
	Repository string
	Branches   []string

	// One of LineMatches or ChunkMatches will be returned depending on whether
	// the SearchOptions.ChunkMatches is set.
	LineMatches  []LineMatch
	ChunkMatches []ChunkMatch

	// RepositoryID is a Sourcegraph extension. This is the ID of Repository in
	// Sourcegraph.
	RepositoryID uint32

	// RepositoryPriority is a Sourcegraph extension. It is used by Sourcegraph to
	// order results from different repositories relative to each other.
	RepositoryPriority float64

	// Only set if requested
	Content []byte

	// Checksum of the content.
	Checksum []byte

	// Detected language of the result.
	Language string

	// SubRepositoryName is the globally unique name of the repo,
	// if it came from a subrepository
	SubRepositoryName string

	// SubRepositoryPath holds the prefix where the subrepository
	// was mounted.
	SubRepositoryPath string

	// Commit SHA1 (hex) of the (sub)repo holding the file.
	Version string
}
FileMatch contains all the matches within a file.
type FlushReason ¶
type FlushReason uint8
const (
	FlushReasonTimerExpired FlushReason = 1 << iota
	FlushReasonFinalFlush
	FlushReasonMaxSize
)
func (FlushReason) String ¶
func (fr FlushReason) String() string
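The constants above are bit flags produced by 1 << iota (1, 2, 4), and FlushReasonStrings maps each to a name. The sketch below copies the type, constants, and map locally and implements String as a map lookup; the "unknown" fallback for values not in the map is an assumption, since the actual behaviour is not documented here.

```go
package main

import "fmt"

// Local copy of the FlushReason type and constants shown above.
type FlushReason uint8

const (
	FlushReasonTimerExpired FlushReason = 1 << iota // 1
	FlushReasonFinalFlush                           // 2
	FlushReasonMaxSize                              // 4
)

var FlushReasonStrings = map[FlushReason]string{
	FlushReasonTimerExpired: "timer_expired",
	FlushReasonFinalFlush:   "final_flush",
	FlushReasonMaxSize:      "max_size_reached",
}

// String looks the reason up in FlushReasonStrings; the "unknown" fallback
// is an assumption of this sketch.
func (fr FlushReason) String() string {
	if s, ok := FlushReasonStrings[fr]; ok {
		return s
	}
	return "unknown"
}

func main() {
	fmt.Println(FlushReasonMaxSize, uint8(FlushReasonMaxSize)) // max_size_reached 4
}
```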
type IndexBuilder ¶
type IndexBuilder struct {
	// IndexTime will be used as the time if non-zero. Otherwise
	// time.Now(). This is useful for doing reproducible builds in tests.
	IndexTime time.Time

	// a sortable 20 chars long id.
	ID string
	// contains filtered or unexported fields
}
IndexBuilder builds a single index shard.
func NewIndexBuilder ¶
func NewIndexBuilder(r *Repository) (*IndexBuilder, error)
NewIndexBuilder creates a fresh IndexBuilder. The passed in Repository contains repo metadata, and may be set to nil.
func (*IndexBuilder) Add ¶
func (b *IndexBuilder) Add(doc Document) error
Add a file which only occurs in certain branches.
func (*IndexBuilder) AddFile ¶
func (b *IndexBuilder) AddFile(name string, content []byte) error
AddFile is a convenience wrapper for Add
func (*IndexBuilder) ContentSize ¶
func (b *IndexBuilder) ContentSize() uint32
ContentSize returns the number of content bytes so far ingested.
type IndexFile ¶
type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}
IndexFile is a file suitable for concurrent read access. For performance reasons, it allows a mmap'd implementation.
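Any type with these four methods satisfies IndexFile. As an illustration, here is a hypothetical in-memory implementation (memIndexFile, e.g. for tests) against a local copy of the interface; returning subslices of an immutable buffer keeps concurrent reads safe, much like reads from a read-only mmap.

```go
package main

import "fmt"

// IndexFile mirrors the interface shown above.
type IndexFile interface {
	Read(off uint32, sz uint32) ([]byte, error)
	Size() (uint32, error)
	Close()
	Name() string
}

// memIndexFile is a hypothetical in-memory IndexFile. Reads return subslices
// of the backing buffer, so they are safe for concurrent use as long as the
// buffer is never modified.
type memIndexFile struct {
	name string
	data []byte
}

func (f *memIndexFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz) // uint64 avoids uint32 overflow
	if end > uint64(len(f.data)) {
		return nil, fmt.Errorf("%s: read [%d, %d) out of bounds (size %d)", f.name, off, end, len(f.data))
	}
	return f.data[off:end], nil
}

func (f *memIndexFile) Size() (uint32, error) { return uint32(len(f.data)), nil }
func (f *memIndexFile) Close()                {}
func (f *memIndexFile) Name() string          { return f.name }

// Compile-time check that memIndexFile satisfies IndexFile.
var _ IndexFile = (*memIndexFile)(nil)

func main() {
	var f IndexFile = &memIndexFile{name: "mem", data: []byte("hello zoekt")}
	b, err := f.Read(6, 5)
	fmt.Println(string(b), err) // zoekt <nil>
}
```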
type IndexMetadata ¶
type IndexMetadata struct {
	IndexFormatVersion    int
	IndexFeatureVersion   int
	IndexMinReaderVersion int
	IndexTime             time.Time
	PlainASCII            bool
	LanguageMap           map[string]uint16
	ZoektVersion          string
	ID                    string
}
IndexMetadata holds metadata stored in the index file. It contains data generated by the core indexing library.
type LineFragmentMatch ¶
type LineFragmentMatch struct {
	// Offset within the line, in bytes.
	LineOffset int

	// Offset from file start, in bytes.
	Offset uint32

	// Number of bytes that match.
	MatchLength int

	SymbolInfo *Symbol
}
LineFragmentMatch is a segment of matching text within a line.
type LineMatch ¶
type LineMatch struct {
	// The line in which a match was found.
	Line       []byte
	LineStart  int
	LineEnd    int
	LineNumber int

	// Before and After are only set when SearchOptions.NumContextLines is > 0
	Before []byte
	After  []byte

	// If set, this was a match on the filename.
	FileName bool

	// The higher the better. Only ranks the quality of the match
	// within the file, does not take rank of file into account
	Score         float64
	DebugScore    string
	LineFragments []LineFragmentMatch
}
LineMatch holds the matches within a single line in a file.
type ListOptions ¶
type ListOptions struct {
	// Return only Minimal data per repo that Sourcegraph frontend needs.
	//
	// Deprecated: use Field
	Minimal bool

	// Field decides which field to populate in RepoList response.
	Field RepoListField
}
func (*ListOptions) GetField ¶
func (o *ListOptions) GetField() (RepoListField, error)
func (*ListOptions) String ¶
func (o *ListOptions) String() string
type MinimalRepoListEntry ¶
type MinimalRepoListEntry struct {
	HasSymbols bool
	Branches   []RepositoryBranch
}
type Progress ¶
type Progress struct {
	// Priority of the shard that was searched.
	Priority float64

	// MaxPendingPriority is the maximum priority of pending result that is being searched in parallel.
	// This is used to reorder results when the result set is known to be stable-- that is, when a result's
	// Priority is greater than the max(MaxPendingPriority) from the latest results of each backend, it can be returned to the user.
	//
	// MaxPendingPriority decreases monotonically in each SearchResult.
	MaxPendingPriority float64
}
Progress contains information about the global progress of the running search query. This is used by the frontend to reorder results and emit them when stable. Sourcegraph specific: this is used when querying multiple zoekt-webserver instances.
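The stability rule in the MaxPendingPriority comment can be expressed as a small predicate: a result may be emitted once its Priority exceeds the largest MaxPendingPriority from the latest result of every backend. canEmit and the map keyed by backend name are hypothetical bookkeeping; a caller would presumably only consult it after every backend has reported at least once.

```go
package main

import "fmt"

// canEmit reports whether a result with the given priority is stable, i.e.
// greater than the MaxPendingPriority most recently reported by every backend.
// latestMaxPending maps a (hypothetical) backend name to that value.
func canEmit(priority float64, latestMaxPending map[string]float64) bool {
	for _, pending := range latestMaxPending {
		if priority <= pending {
			return false // some backend may still produce a higher-priority result
		}
	}
	return true
}

func main() {
	pending := map[string]float64{"webserver-1": 5, "webserver-2": 3}
	fmt.Println(canEmit(6, pending), canEmit(4, pending)) // true false
}
```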
type RepoList ¶
type RepoList struct {
	// Returned when ListOptions.Field is RepoListFieldRepos.
	Repos []*RepoListEntry

	// Returned when ListOptions.Field is RepoListFieldMinimal.
	//
	// Deprecated: use ReposMap.
	Minimal map[uint32]*MinimalRepoListEntry

	// ReposMap is set when ListOptions.Field is RepoListFieldReposMap.
	ReposMap ReposMap

	Crashes int

	// Stats response to a List request.
	// This is the aggregate RepoStats of all repos matching the input query.
	Stats RepoStats
}
RepoList holds a set of Repository metadata.
type RepoListEntry ¶
type RepoListEntry struct {
	Repository    Repository
	IndexMetadata IndexMetadata
	Stats         RepoStats
}
type RepoListField ¶
type RepoListField int
type RepoStats ¶
type RepoStats struct {
	// Repos is used for aggregating the number of repositories.
	Repos int

	// Shards is the total number of search shards.
	Shards int

	// Documents holds the number of documents or files.
	Documents int

	// IndexBytes is the amount of RAM used for index overhead.
	IndexBytes int64

	// ContentBytes is the amount of RAM used for raw content.
	ContentBytes int64

	// NewLinesCount is the number of newlines "\n" that appear in the zoekt
	// indexed documents. This is not exactly the same as line count, since it
	// will not include lines not terminated by "\n" (e.g. a file with no "\n", or
	// a final line without "\n"). Note: Zoekt deduplicates documents across
	// branches, so if a path has the same contents on multiple branches, there
	// is only one document for it. As such, that document's newlines are only
	// counted once. See DefaultBranchNewLinesCount and OtherBranchesNewLinesCount
	// for counts which do not deduplicate.
	NewLinesCount uint64

	// DefaultBranchNewLinesCount is the number of newlines "\n" in the default
	// branch.
	DefaultBranchNewLinesCount uint64

	// OtherBranchesNewLinesCount is the number of newlines "\n" in all branches
	// except the default branch.
	OtherBranchesNewLinesCount uint64
}
RepoStats holds statistics about a repository or a collection of repositories.
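A worked example of why NewLinesCount differs from a line count: counting "\n" bytes ignores an unterminated final line. countNewlines is a hypothetical helper that follows the description above, not zoekt's own counting code, and it ignores the cross-branch deduplication.

```go
package main

import (
	"bytes"
	"fmt"
)

// countNewlines sums the "\n" bytes across documents, mirroring how
// NewLinesCount is described: an unterminated last line adds nothing.
func countNewlines(docs ...[]byte) uint64 {
	var n uint64
	for _, d := range docs {
		n += uint64(bytes.Count(d, []byte("\n")))
	}
	return n
}

func main() {
	docs := [][]byte{
		[]byte("a\nb\n"), // 2 lines, 2 newlines
		[]byte("a\nb"),   // 2 lines, final line unterminated: 1 newline
		[]byte("single"), // 1 line, 0 newlines
	}
	fmt.Println(countNewlines(docs...)) // 3, although the documents hold 5 lines
}
```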
type ReposMap ¶
type ReposMap map[uint32]MinimalRepoListEntry
func (*ReposMap) MarshalBinary ¶
MarshalBinary implements a specialized encoder for ReposMap.
func (*ReposMap) UnmarshalBinary ¶
UnmarshalBinary implements a specialized decoder for ReposMap.
type Repository ¶
type Repository struct {
	// Sourcegraph's repository ID
	ID uint32

	// The repository name
	Name string

	// The repository URL.
	URL string

	// The physical source where this repo came from, e.g. full
	// path to the zip filename or git repository directory. This
	// will not be exposed in the UI, but can be used to detect
	// orphaned index shards.
	Source string

	// The branches indexed in this repo.
	Branches []RepositoryBranch

	// Nil if this is not the super project.
	SubRepoMap map[string]*Repository

	// URL template to link to the commit of a branch
	CommitURLTemplate string

	// The repository URL for getting to a file. Has access to
	// {{Branch}}, {{Path}}
	FileURLTemplate string

	// The URL fragment to add to a file URL for line numbers. Has
	// access to {{LineNumber}}. The fragment should include the
	// separator, generally '#' or ';'.
	LineFragmentTemplate string

	// All zoekt.* configuration settings.
	RawConfig map[string]string

	// Importance of the repository, bigger is more important
	Rank uint16

	// IndexOptions is a hash of the options used to create the index for the
	// repo.
	IndexOptions string

	// HasSymbols is true if this repository has indexed ctags
	// output. Sourcegraph specific: This field is more appropriate for
	// IndexMetadata. However, we store it here since the Sourcegraph frontend
	// can read this structure but not IndexMetadata.
	HasSymbols bool

	// Tombstone is true if we are not allowed to search this repo.
	Tombstone bool

	// LatestCommitDate is the date of the latest commit among all indexed Branches.
	// The date might be time.Time's 0-value if the repository was last indexed
	// before this field was added.
	LatestCommitDate time.Time

	// FileTombstones is a set of file paths that should be ignored across all branches
	// in this shard.
	FileTombstones map[string]struct{} `json:",omitempty"`
	// contains filtered or unexported fields
}
Repository holds repository metadata.
func (*Repository) MergeMutable ¶
func (r *Repository) MergeMutable(x *Repository) (mutated bool, err error)
MergeMutable will merge x into r. mutated will be true if it made any changes. err is non-nil if we needed to mutate an immutable field.
Note: SubRepoMap, IndexOptions and HasSymbols fields are ignored. They are computed while indexing, so they can't be synthesized from x.
Note: We ignore RawConfig fields which are duplicated into Repository: name and id.
Note: URL, *Template fields are ignored. They are not used by Sourcegraph.
func (*Repository) UnmarshalJSON ¶
func (r *Repository) UnmarshalJSON(data []byte) error
type RepositoryBranch ¶
RepositoryBranch describes an indexed branch, which is a name combined with a version.
func (RepositoryBranch) String ¶
func (r RepositoryBranch) String() string
type SearchOptions ¶
type SearchOptions struct {
	// Return an upper-bound estimate of eligible documents in
	// stats.ShardFilesConsidered.
	EstimateDocCount bool

	// Return the whole file.
	Whole bool

	// Maximum number of matches: skip processing an index
	// shard once we have found this many non-overlapping matches.
	ShardMaxMatchCount int

	// Maximum number of matches: stop looking for more matches
	// once we have this many matches across shards.
	TotalMaxMatchCount int

	// Maximum number of matches: skip processing documents for a repository in
	// a shard once we have found ShardRepoMaxMatchCount.
	//
	// A compound shard may contain multiple repositories. This will most often
	// be set to 1 to find all repositories containing a result.
	ShardRepoMaxMatchCount int

	// Deprecated: this field is not read anymore.
	ShardMaxImportantMatch int

	// Deprecated: this field is not read anymore.
	TotalMaxImportantMatch int

	// Abort the search after this much time has passed.
	MaxWallTime time.Duration

	// FlushWallTime if non-zero will stop streaming behaviour at first and
	// instead will collate and sort results. At FlushWallTime the results will
	// be sent and then the behaviour will revert to the normal streaming.
	FlushWallTime time.Duration

	// Trim the number of results after collating and sorting the
	// results.
	MaxDocDisplayCount int

	// If set to a number greater than zero, then up to this many
	// context lines will be added before and after each matched line.
	// Note that the included context lines might contain matches and
	// it's up to the consumer of the result to remove those lines.
	NumContextLines int

	// If true, ChunkMatches will be returned in each FileMatch rather than LineMatches
	// EXPERIMENTAL: the behavior of this flag may be changed in future versions.
	ChunkMatches bool

	// EXPERIMENTAL. If true, document ranks are used as additional input for
	// sorting matches.
	UseDocumentRanks bool

	// RanksDampingFactor determines the contribution of document ranks to the
	// final ranking based on RRF. A value in (0,1] reduces the contribution,
	// while a value in (-inf,0) increases it.
	RanksDampingFactor float64

	// Trace turns on opentracing for this request if true and if the Jaeger address was provided as
	// a command-line flag
	Trace bool

	// If set, the search results will contain debug information for scoring.
	DebugScore bool

	// SpanContext is the opentracing span context, if it exists, from the zoekt client
	SpanContext map[string]string
}
func (*SearchOptions) SetDefaults ¶
func (o *SearchOptions) SetDefaults()
func (*SearchOptions) String ¶
func (s *SearchOptions) String() string
type SearchResult ¶
type SearchResult struct {
	Stats
	Progress

	Files []FileMatch

	// RepoURLs holds a repo => template string map.
	RepoURLs map[string]string

	// LineFragments holds a repo => template string map, for
	// the line number fragment.
	LineFragments map[string]string
}
SearchResult contains search matches and extra data.
func (*SearchResult) SizeBytes ¶
func (sr *SearchResult) SizeBytes() (sz uint64)
SizeBytes is a best-effort estimate of the size of SearchResult in memory. The estimate does not take alignment into account. The result is a lower bound on the actual size in memory.
type Searcher ¶
type Searcher interface {
	Search(ctx context.Context, q query.Q, opts *SearchOptions) (*SearchResult, error)

	// List lists repositories. The query `q` can only contain
	// query.Repo atoms.
	List(ctx context.Context, q query.Q, opts *ListOptions) (*RepoList, error)
	Close()

	// Describe the searcher for debug messages.
	String() string
}
func NewSearcher ¶
NewSearcher creates a Searcher for a single index file. Search results coming from this searcher are valid only for the lifetime of the Searcher itself, i.e., []byte members should be copied into fresh buffers if the result is to survive closing the shard.
type Sender ¶
type Sender interface {
	Send(*SearchResult)
}
Sender is the interface that wraps the basic Send method.
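Two common ways to satisfy a single-method interface like Sender are a function adapter (in the style of http.HandlerFunc) and a collecting struct. Both SenderFunc and collector are hypothetical helpers; SearchResult is reduced to a local stand-in whose Files is a []string instead of []FileMatch, and the mutex only matters if Send is called from several goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// Local stand-ins for the types shown above (Files simplified to []string).
type SearchResult struct{ Files []string }

type Sender interface{ Send(*SearchResult) }

// SenderFunc adapts a plain function to the Sender interface.
type SenderFunc func(*SearchResult)

func (f SenderFunc) Send(r *SearchResult) { f(r) }

// collector is a Sender that accumulates file names from every result.
type collector struct {
	mu    sync.Mutex
	files []string
}

func (c *collector) Send(r *SearchResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.files = append(c.files, r.Files...)
}

func main() {
	c := &collector{}
	var s Sender = c
	s.Send(&SearchResult{Files: []string{"a.go"}})
	s.Send(&SearchResult{Files: []string{"b.go"}})
	fmt.Println(c.files) // [a.go b.go]

	SenderFunc(func(r *SearchResult) { fmt.Println(len(r.Files)) }).Send(&SearchResult{}) // 0
}
```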
type Stats ¶
type Stats struct {
	// Amount of I/O for reading contents.
	ContentBytesLoaded int64

	// Amount of I/O for reading from index.
	IndexBytesLoaded int64

	// Number of search shards that had a crash.
	Crashes int

	// Wall clock time for this search
	Duration time.Duration

	// Number of files containing a match.
	FileCount int

	// Number of files in shards that we considered.
	ShardFilesConsidered int

	// Files that we evaluated. Equivalent to files for which all
	// atom matches (including negations) evaluated to true.
	FilesConsidered int

	// Files for which we loaded file content to verify substring matches
	FilesLoaded int

	// Candidate files whose contents weren't examined because we
	// gathered enough matches.
	FilesSkipped int

	// Shards that we scanned to find matches.
	ShardsScanned int

	// Shards that we did not process because a query was canceled.
	ShardsSkipped int

	// Shards that we did not process because the query was rejected by the
	// ngram filter indicating it had no matches.
	ShardsSkippedFilter int

	// Number of non-overlapping matches
	MatchCount int

	// Number of candidate matches as a result of searching ngrams.
	NgramMatches int

	// Wall clock time for queued search.
	Wait time.Duration

	// Number of times regexp was called on files that we evaluated.
	RegexpsConsidered int

	// FlushReason explains why results were flushed.
	FlushReason FlushReason
}
Stats contains statistics about the search.
Directories ¶
Path | Synopsis |
---|---|
build | package build implements a more convenient interface for building zoekt indices. |
zoekt-archive-index | Command zoekt-archive-index indexes an archive. |
zoekt-git-clone | This binary fetches all repos of a user or organization and clones them. |
zoekt-mirror-bitbucket-server | This binary fetches all repos of a project, and of a specific type, in case these are specified, and clones them. |
zoekt-mirror-github | This binary fetches all repos of a user or organization and clones them. |
zoekt-mirror-gitiles | This binary fetches all repos of a Gitiles host. |
zoekt-mirror-gitlab | This binary fetches all repos for a user from gitlab. |
zoekt-repo-index | zoekt-repo-index indexes a repo-based repository. |
zoekt-sourcegraph-indexserver | Command zoekt-sourcegraph-indexserver periodically reindexes enabled repositories on sourcegraph |
zoekt-test | zoekt-test compares the search engine results with raw substring search |
gitindex | Package gitindex provides functions for indexing Git repositories. |
ignore | package ignore provides helpers to support ignore-files similar to .gitignore |
internal | |
otlpenv | Package otlpenv exports getters to read OpenTelemetry protocol configuration options based on the official spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#configuration-options |
rpc | Package rpc provides a zoekt.Searcher over RPC. |
stream | Package stream provides a client and a server to consume search results as stream. |
trace | Package trace provides a tracing API that in turn invokes both the `golang.org/x/net/trace` API and creates an opentracing span if appropriate. |