Documentation ¶
Overview ¶
Package tilelibrary implements tile libraries in Go. It assumes that the tile information provided is already imputed, or that no-calls are acceptable in SGLF files. The library checks tiles for completeness, but does not modify them to be complete or imputed before writing them to files. It provides functions to merge, lift over, import, export, and modify libraries, and should be used in conjunction with the structures package.
Note: Do not add tiles to libraries made from SGLFv2 files, since it's not clear how the tile information would be included.
Note: Adding tiles to a library at any point requires sorting that library before writing it to a file.
Index ¶
- Variables
- func ReadMapping(filepath string) (mapping [][][]int, sourceID, destinationID [md5.Size]byte, err error)
- func WriteMapping(filename string, mapping LiftoverMapping) error
- type KnownVariants
- type Library
- func (l *Library) AddDirectories(directories []string, gzipped bool) error
- func (l *Library) AddLibraryFastJ(directory string) error
- func (l *Library) AddLibrarySGLFv2() error
- func (l *Library) AddPathFromDirectories(directories []string, genomePath int, gzipped bool) error
- func (l *Library) AddTile(genomePath, step int, new *structures.TileVariant, bases string) error
- func (l *Library) Annotate(path, step int, hash structures.VariantHash, annotation string) bool
- func (l *Library) AssignID()
- func (l *Library) Equals(l2 *Library) bool
- func (l *Library) FindFrequency(path, step int, toFind *structures.TileVariant) int
- func (l *Library) HashEquals(l2 *Library) bool
- func (l *Library) MergeLibraries(libraryToMerge *Library, textFile string) (*Library, error)
- func (l *Library) MergeLibrariesWithoutCreation(libraryToMerge *Library) (*Library, error)
- func (l *Library) SortLibrary()
- func (l *Library) TileExists(path, step int, toCheck *structures.TileVariant) (int, bool)
- func (l *Library) WriteLibraryToSGLF(directoryToWriteTo string) error
- func (l *Library) WriteLibraryToSGLFv2(directoryToWriteTo string) error
- type LiftoverMapping
Variables ¶
var ErrBadLiftover = errors.New("not a valid mapping file")
ErrBadLiftover is an error for when a file is not a liftover mapping.
var ErrBadSource = errors.New("source library is not part of the destination library")
ErrBadSource is an error for when the origin of a liftover mapping is not a subset of the destination library.
var ErrCannotAddTile = errors.New("library was built from sglfv2 files--cannot add new tile")
ErrCannotAddTile is an error for attempts to add tiles to libraries built from SGLFv2 files, since it's not clear how to add tiles to them properly.
var ErrInconsistentHash = errors.New("bases and hash do not match")
ErrInconsistentHash is an error for when the hash of a set of bases does not match the TileVariant hash.
var ErrIncorrectSourceText = errors.New("library doesn't have the right intermediate file(s) as its Text field")
ErrIncorrectSourceText is an error for when the source file/directory doesn't match the function it's used in.
var ErrInvalidReferenceLibrary = errors.New("reference library field is not a library pointer")
ErrInvalidReferenceLibrary is an error for when the ReferenceLibrary field of a TileVariant is not a pointer to a Library.
var ErrTileContradiction = errors.New("a tile that was added is not found in the library")
ErrTileContradiction is an error for when a tile that is known to be in the library is not found (it is usually used right after adding that tile to the library).
Functions ¶
func ReadMapping ¶
func ReadMapping(filepath string) (mapping [][][]int, sourceID, destinationID [md5.Size]byte, err error)
ReadMapping gets the information from a mapping given its filepath. It also returns the hashes for the source and destination libraries, in that order.
func WriteMapping ¶
func WriteMapping(filename string, mapping LiftoverMapping) error
WriteMapping writes a LiftoverMapping to a specified file. The format is path/step/source1,destination1;source2,destination2;... The current suffix for mappings is .sglfmapping; make sure all filenames end with this suffix. Returns any error encountered, or nil if there's no error.
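The line layout described above can be illustrated with a small round trip. This is only a sketch: the helper names formatMappingLine and parseMappingLine are hypothetical, and it assumes that numbers are written in decimal, that each path/step pair occupies one line, and that pairs appear in source order. The real .sglfmapping file may differ (for example, it also has to carry the source and destination library IDs somewhere).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatMappingLine renders one path/step entry in the
// path/step/source1,destination1;source2,destination2;... layout.
// Assumption: all numbers are written in decimal.
func formatMappingLine(path, step int, stepMapping []int) string {
	pairs := make([]string, len(stepMapping))
	for source, destination := range stepMapping {
		pairs[source] = strconv.Itoa(source) + "," + strconv.Itoa(destination)
	}
	return fmt.Sprintf("%d/%d/%s", path, step, strings.Join(pairs, ";"))
}

// parseMappingLine reverses formatMappingLine.
// Assumption: pairs are listed in source order, so only the destination is stored.
func parseMappingLine(line string) (int, int, []int, error) {
	fields := strings.SplitN(line, "/", 3)
	if len(fields) != 3 {
		return 0, 0, nil, fmt.Errorf("malformed mapping line %q", line)
	}
	path, err := strconv.Atoi(fields[0])
	if err != nil {
		return 0, 0, nil, err
	}
	step, err := strconv.Atoi(fields[1])
	if err != nil {
		return 0, 0, nil, err
	}
	var stepMapping []int
	if fields[2] == "" {
		return path, step, stepMapping, nil
	}
	for _, pair := range strings.Split(fields[2], ";") {
		parts := strings.Split(pair, ",")
		if len(parts) != 2 {
			return 0, 0, nil, fmt.Errorf("malformed pair %q", pair)
		}
		destination, convErr := strconv.Atoi(parts[1])
		if convErr != nil {
			return 0, 0, nil, convErr
		}
		stepMapping = append(stepMapping, destination)
	}
	return path, step, stepMapping, nil
}

func main() {
	// Source variants 0, 1, 2 map to destination variants 1, 0, 2.
	line := formatMappingLine(2, 5, []int{1, 0, 2})
	fmt.Println(line) // 2/5/0,1;1,0;2,2
}
```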
Types ¶
type KnownVariants ¶
type KnownVariants struct {
	List   [](*structures.TileVariant) // List to keep track of relative tile ordering (implicitly assigns tile variant numbers by index after sorting)
	Counts []int                       // Counts of each variant so far
}
KnownVariants is a struct to hold the known variants in a specific step. KnownVariants will also keep track of the count of each tile--the variant at List[i] has been seen Counts[i] times.
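The pairing of List and Counts can be sketched as follows. The variant type here is a hypothetical stand-in for structures.TileVariant that carries only a hash, and the find, observe, and frequency methods are illustrative names, not part of the package; they mimic the kind of lookup that TileExists and FindFrequency are described as performing.

```go
package main

import "fmt"

// variant is a hypothetical stand-in for structures.TileVariant.
// Only the hash is modelled, since tiles are identified by hash here.
type variant struct {
	Hash [16]byte
}

// knownVariants mirrors the List/Counts pairing: the variant at
// List[i] has been seen Counts[i] times.
type knownVariants struct {
	List   []*variant
	Counts []int
}

// find returns the index of v and true if it is known, or 0 and false otherwise.
func (k *knownVariants) find(v *variant) (int, bool) {
	for i, known := range k.List {
		if known.Hash == v.Hash {
			return i, true
		}
	}
	return 0, false
}

// observe records one sighting of v, appending it if it is new.
func (k *knownVariants) observe(v *variant) {
	if i, ok := k.find(v); ok {
		k.Counts[i]++
		return
	}
	k.List = append(k.List, v)
	k.Counts = append(k.Counts, 1)
}

// frequency returns how often v has been seen; unknown variants have frequency 0.
func (k *knownVariants) frequency(v *variant) int {
	if i, ok := k.find(v); ok {
		return k.Counts[i]
	}
	return 0
}

func main() {
	a := &variant{Hash: [16]byte{1}}
	b := &variant{Hash: [16]byte{2}}
	var k knownVariants
	k.observe(a)
	k.observe(b)
	k.observe(a)
	fmt.Println(k.frequency(a), k.frequency(b)) // 2 1
}
```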
type Library ¶
type Library struct {
	Paths      []concurrentPath // The paths of the library.
	ID         [md5.Size]byte   // The ID of a library.
	Components [][md5.Size]byte // The IDs of the libraries that this library is composed of (empty if this library was not made from others).
	// contains filtered or unexported fields
}
Library is a struct to represent a library of tile variants from a set of genomes.
A Library is separated into paths, in the Paths field, represented as a slice of concurrentPaths. This makes Libraries safe for concurrent use in terms of modification of tiles.
Libraries have IDs for easy discussion and reference. Currently, the IDs are calculated with the MD5 hash algorithm. It hashes all of the location and tile information (except annotations) in order: by path, then by step, then by variant in increasing variant number. Upon reaching a new path or step, the uint32 form of the path or step, separated into 4 bytes, is added to the hash. Then, for each variant, information is added to the list of bytes to be hashed in the following order:
- Variant number, in uint32 form, separated into bytes.
- Total count of this variant.
- Tile length, in terms of steps.
- The hash of the tile variant, as bytes.
Hashing everything, including location information, ensures that two libraries with the same tiles and counts, but with some tiles in different locations, are not considered the same library.
The Components of a library determine specifically which libraries are allowed to liftover to that library (since without being part of the components, it's impossible to know easily if you can make a liftover mapping)
In terms of usage, create a new library using New, which will set up the Paths of the Library for you, and will set the reference text file and any component libraries.
Notes: files and directories of tile libraries will not be automatically deleted. If files or directories must be deleted, you must add this functionality (e.g. by using os.Remove). In addition, the ID is not automatically updated when using AddTile, since AssignID is not quick. The caller must use AssignID on the library to update its ID after adding all tiles.
func CompileDirectoriesToLibrary ¶
func CompileDirectoriesToLibrary(directories []string, libraryTextFile string, gzipped bool) (*Library, error)
CompileDirectoriesToLibrary creates a new Library based on the directories given, sorts it, and gives it its ID (so this library is ready for use). Returns the library pointer and an error, if any (nil if no error was encountered).
func New ¶
New sets up the basic structure for a library and returns a pointer to a new library. For consistency, it's best to use an absolute path for the text file. Relative paths will still work, but they are not recommended.
func SequentialCompileDirectoriesToLibrary ¶
func SequentialCompileDirectoriesToLibrary(directories []string, libraryTextFile string) (*Library, error)
SequentialCompileDirectoriesToLibrary creates a new Library based on the directories given, sorts it, and gives it its ID (so this library is ready for use). It adds each directory sequentially, so each genome is processed one at a time, rather than processing one path of all genomes at once. Returns the library pointer and an error, if any (nil if no error was encountered).
func (*Library) AddDirectories ¶
AddDirectories adds information from a list of directories for genomes into a library, but parses by path. Will return any error encountered.
func (*Library) AddLibraryFastJ ¶
AddLibraryFastJ adds a directory of gzipped FastJ files to a specific library. Will return any error encountered.
func (*Library) AddLibrarySGLFv2 ¶
AddLibrarySGLFv2 adds a directory of SGLFv2 files to a library. Library should be initialized with this directory as the Text field, so that text files of bases and directories aren't mixed together. Returns any error encountered, or nil if there's no error.
func (*Library) AddPathFromDirectories ¶
AddPathFromDirectories parses the same path for all genomes, represented by a list of directories, and puts the information in a Library. Will return any error encountered.
func (*Library) AddTile ¶
func (l *Library) AddTile(genomePath, step int, new *structures.TileVariant, bases string) error
AddTile is a function to add a tile (without sorting). Safe to use without checking existence of the tile beforehand (since the function will do that for you). Will return any error encountered. Note: AddTile will write any new tiles to disk in an intermediate file.
func (*Library) Annotate ¶
func (l *Library) Annotate(path, step int, hash structures.VariantHash, annotation string) bool
Annotate is a method to annotate (or re-annotate) a Tile at a specific path and step. If no match is found, the user is notified through the returned boolean.
func (*Library) AssignID ¶
func (l *Library) AssignID()
AssignID assigns a library its ID. Current method takes the path, step, variant number, variant count, variant hash, and variant length into account when making the ID.
func (*Library) Equals ¶
Equals checks for equality between two libraries. It does not check similarity in text or components, and tiles are checked by hash. HashEquals will generally be a faster way of checking equality; Equals is best used when you need to be completely sure about library equality (or inequality).
func (*Library) FindFrequency ¶
func (l *Library) FindFrequency(path, step int, toFind *structures.TileVariant) int
FindFrequency is a function to find the frequency of a specific tile at a specific path and step. A tile that is not found at a specific location has a frequency of 0.
func (*Library) HashEquals ¶
HashEquals is a simpler way of checking library equality, since two libraries with the same ID are almost certainly equal. It's faster to use than Equals, given that the IDs have been calculated already.
func (*Library) MergeLibraries ¶
MergeLibraries is a function to merge the library given with the base library. This version creates a new library. Returns the library pointer and an error, if any (nil if no error was encountered).
func (*Library) MergeLibrariesWithoutCreation ¶
MergeLibrariesWithoutCreation merges libraries without creating a new one, using the "mainLibrary" instead. Returns the library pointer and an error, if any (nil if no error was encountered).
func (*Library) SortLibrary ¶
func (l *Library) SortLibrary()
SortLibrary is a function to sort the library once all initial genomes are done being added. This function should only be used once after initial setup of the library, after all tiles have been added, since it sorts everything. The sort function compares tile counts and hashes, so the order in which tiles are added doesn't matter.
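The order-independence claim can be illustrated with a comparator that depends only on counts and hashes. This sketch is not the package's sort: the entry type is hypothetical (it merges List and Counts into one slice for brevity), and the direction of the ordering (descending count, ties broken by ascending hash bytes) is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// entry is a hypothetical pairing of a variant hash with its count.
type entry struct {
	Hash  [16]byte
	Count int
}

// sortEntries orders entries by descending count, breaking ties by
// ascending hash bytes. Assumption: the real ordering may differ.
func sortEntries(entries []entry) {
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Count != entries[j].Count {
			return entries[i].Count > entries[j].Count
		}
		return bytes.Compare(entries[i].Hash[:], entries[j].Hash[:]) < 0
	})
}

func main() {
	a := entry{Hash: [16]byte{0xaa}, Count: 3}
	b := entry{Hash: [16]byte{0x01}, Count: 1}
	c := entry{Hash: [16]byte{0x02}, Count: 1}

	// The same three variants, added in two different orders.
	first := []entry{c, a, b}
	second := []entry{b, c, a}
	sortEntries(first)
	sortEntries(second)

	same := first[0] == second[0] && first[1] == second[1] && first[2] == second[2]
	fmt.Println(same) // true
}
```

Since the comparator never looks at insertion position and hashes are distinct, any two insertion orders produce the same variant numbering after sorting.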
func (*Library) TileExists ¶
func (l *Library) TileExists(path, step int, toCheck *structures.TileVariant) (int, bool)
TileExists is a function to check if a specific tile exists at a specific path and step in a library. Returns the index of the variant and the boolean true, if found--otherwise, returns 0 and false, meaning not found. It creates more room for new steps and variants, if necessary.
func (*Library) WriteLibraryToSGLF ¶
WriteLibraryToSGLF writes the contents of a library as SGLF files in a specified directory. Will return any error encountered.
func (*Library) WriteLibraryToSGLFv2 ¶
WriteLibraryToSGLFv2 writes the contents of a library as SGLFv2 files in a specified directory. Will return any error encountered.
type LiftoverMapping ¶
type LiftoverMapping struct {
	Mapping            [][][]int // The actual mapping between the two libraries
	SourceLibrary      *Library  // The source library to map from.
	DestinationLibrary *Library  // The destination library to map to.
}
LiftoverMapping is a representation of a liftover from one library to another, essentially a translation of variant numbers from the source to the destination. If a = LiftoverMapping.Mapping[b][c][d], then variant d at path b, step c in the source library maps to variant a at path b, step c in the destination library.
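The indexing convention can be sketched with a plain [][][]int. The liftover helper below is hypothetical (the package may expose no such function), and the bounds checks are an addition for safety; the mapping values are made-up illustration data.

```go
package main

import "fmt"

// liftover translates a source variant number at (path, step) into the
// destination variant number, following a = mapping[path][step][variant].
// It returns false if any index is out of range.
func liftover(mapping [][][]int, path, step, variant int) (int, bool) {
	if path < 0 || path >= len(mapping) {
		return 0, false
	}
	if step < 0 || step >= len(mapping[path]) {
		return 0, false
	}
	if variant < 0 || variant >= len(mapping[path][step]) {
		return 0, false
	}
	return mapping[path][step][variant], true
}

func main() {
	mapping := [][][]int{
		{{0, 2, 1}},    // path 0: step 0
		{{1, 0}, {0}},  // path 1: steps 0 and 1
	}
	// Variant 1 at path 0, step 0 in the source is variant 2 in the destination.
	a, ok := liftover(mapping, 0, 0, 1)
	fmt.Println(a, ok) // 2 true
}
```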
func CreateMapping ¶
func CreateMapping(source, destination *Library) (LiftoverMapping, error)
CreateMapping creates a liftover mapping from the source library to the destination library. Returns the mapping and an error, if any (nil if no error was encountered).