fastago

command module

v0.1.0 (latest) | Published: Oct 15, 2021 | License: Apache-2.0 | Imports: 1 | Imported by: 0

Repository

github.com/lucblassel/fastago

README

Fastago

Presentation

Fastago is a simple tool for basic operations on fasta-formatted files.

It is inspired by goalign. The main difference is that this tool streams its input and executes operations one sequence at a time, so the whole file never needs to be loaded into memory.
This tool is not meant to work on alignments. It is simply a collection of useful quality-of-life functions for working with fasta files.
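
To illustrate the streaming idea, here is a minimal Go sketch of per-sequence processing. It is not fastago's actual code: the `Record` type and the `forEachRecord` and `countRecords` functions are hypothetical names chosen for this example, and the 16 MiB line limit is an arbitrary assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical type holding one fasta entry.
type Record struct {
	Name string // header line without the leading ">"
	Seq  string // sequence with line breaks removed
}

// forEachRecord reads fasta data from r and calls fn once per sequence.
// Only the record currently being read is held in memory.
func forEachRecord(r io.Reader, fn func(Record) error) error {
	scanner := bufio.NewScanner(r)
	// Raise the per-line limit from the 64 KiB default to 16 MiB (assumed value).
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)

	var (
		name    string
		seq     strings.Builder
		started bool
	)
	// emit hands the record read so far to fn, if there is one.
	emit := func() error {
		if !started {
			return nil
		}
		return fn(Record{Name: name, Seq: seq.String()})
	}

	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, ">"):
			// A new header closes the previous record.
			if err := emit(); err != nil {
				return err
			}
			name = strings.TrimPrefix(line, ">")
			seq.Reset()
			started = true
		case started:
			seq.WriteString(line)
		}
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	return emit() // last record has no following header
}

// countRecords counts the sequences in a fasta-formatted string.
func countRecords(fasta string) (int, error) {
	n := 0
	err := forEachRecord(strings.NewReader(fasta), func(Record) error {
		n++
		return nil
	})
	return n, err
}

func main() {
	input := ">Seq_1\nacgt\nac\n>Seq_2\nGG\n"
	err := forEachRecord(strings.NewReader(input), func(rec Record) error {
		fmt.Printf(">%s\n%s\n", rec.Name, strings.ToUpper(rec.Seq))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Because the callback sees one record at a time, memory use depends on the longest sequence rather than on the size of the file.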

Examples

# Transform all characters to uppercase and save to "upper.fasta"
$> fastago transform upper --input input.fasta --output upper.fasta

# Count sequences in file
$> fastago stats count --input input.fasta 

# Read from stdin
$> cat input.fasta | fastago stats length  

# Chain operations
$> cat input.fasta | fastago transform lower | fastago stats count

# Read compressed file
$> fastago stats count --input input.fasta.gz

# Pipe in compressed data
$> cat input.fasta.xz | fastago stats count --compression xz

# Extract "Seq_1" and "Seq_2" sequences from input file
$> fastago subset --input input.fasta Seq_1 Seq_2

Installation

Binaries

Grab a binary from the releases section.

Build from source

Make sure you have Go 1.16 installed. Then clone this repository and build with:
go build, or go build -o <binaryName> if you want the binary to have a name other than fastago.

Commands

addid 🏳 : add a prefix or a suffix to sequence names
rename 🏳 : rename sequences with either a regex or a map file
stats : get statistics and information on the sequences
- count : count sequences in file
- length 🏳 : get length of sequences in file (can also output the average/min/max)
- freqs 🏳 : get average frequencies of bases in file (can also output frequencies in each sequence)
subset 🏳 : subset the files, keeping only specified sequences. Works with regex, a file of names or positional arguments
transform : apply transformation functions to sequences
- upper : transform sequence bases to uppercase
- lower : transform sequence bases to lowercase
- replace base1 base2 : replace base1 with base2 in each sequence
help : show usage message
version : get current version of fastago
completion : generate autocompletion script for bash, zsh, fish or powershell (thank you cobra 🙏)

General flags

-i or --input: specify the input file. The default is stdin.
-o or --output: specify the output file. The default is stdout.
-c or --compression: specify the compression method of the input file. If the input on stdin is compressed, this flag must be specified for the tool to work. If this flag is not used and -i is, fastago will try to guess the compression from the file extension. Supported compression schemes: gzip (.gz), bzip2 (.bz2), xz (.xz).
-w or --linewidth: specify the width at which a sequence will wrap in the output. The default is 80 characters.
-h or --help : display a help message.
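
The extension-based guess and the line wrapping described above can be sketched in Go as follows. This is an illustration under assumptions, not fastago's implementation: `guessCompression` and `wrapSequence` are hypothetical helpers, and the scheme names `gzip` and `bzip2` are assumed (only `xz` appears as a `--compression` value in the examples).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// guessCompression maps a file name to a compression scheme based on its
// extension. An empty string means "assume uncompressed".
func guessCompression(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".gz":
		return "gzip" // assumed scheme name
	case ".bz2":
		return "bzip2" // assumed scheme name
	case ".xz":
		return "xz"
	default:
		return ""
	}
}

// wrapSequence splits seq into lines of at most width characters.
// A non-positive width disables wrapping (an assumption of this sketch).
func wrapSequence(seq string, width int) []string {
	if width <= 0 || len(seq) <= width {
		return []string{seq}
	}
	lines := make([]string, 0, (len(seq)+width-1)/width)
	for start := 0; start < len(seq); start += width {
		end := start + width
		if end > len(seq) {
			end = len(seq)
		}
		lines = append(lines, seq[start:end])
	}
	return lines
}

func main() {
	fmt.Println(guessCompression("input.fasta.gz"))
	for _, line := range wrapSequence("ACGTACGTAC", 4) {
		fmt.Println(line)
	}
}
```

Data arriving on stdin has no file name to inspect, which is why the compression flag has to be given explicitly in that case.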

Per command flags

addid

You must specify at least one of the following flags to run this command:

-p or --prefix to add your identifier to the beginning of each sequence name
-s or --suffix to add your identifier to the end of each sequence name

rename

There are 2 ways to rename sequences:

The -m or --map flag allows you to specify a file that maps old names to new names. Each line of this file must contain the name of the sequence you want to change and its new name, separated by a tab character.
The -r or --regex flag allows you to specify a regular expression that will match a substring in each sequence name. This match will be replaced by the value specified with the -p or --replace flag. If you provide a regular expression you must also provide a replacement string. See the Go regular expression syntax documentation for more information.
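
Both renaming modes can be sketched with Go's standard library. The helpers `parseMap`, `renameWithMap` and `renameWithRegex` are hypothetical names for this example. Leaving unmapped names unchanged and replacing every match (rather than only the first) are assumptions, not documented fastago behaviour.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parseMap parses lines of the form "old<TAB>new" into a lookup table.
func parseMap(content string) (map[string]string, error) {
	mapping := make(map[string]string)
	for i, line := range strings.Split(content, "\n") {
		line = strings.TrimRight(line, "\r")
		if line == "" {
			continue
		}
		fields := strings.Split(line, "\t")
		if len(fields) != 2 {
			return nil, fmt.Errorf("line %d: expected 2 tab-separated fields, got %d", i+1, len(fields))
		}
		mapping[fields[0]] = fields[1]
	}
	return mapping, nil
}

// renameWithMap returns the mapped name, or the original name when it is
// absent from the mapping (assumed behaviour).
func renameWithMap(name string, mapping map[string]string) string {
	if newName, ok := mapping[name]; ok {
		return newName
	}
	return name
}

// renameWithRegex replaces every match of pattern in name with replacement.
// Note that regexp expands "$1"-style references in the replacement.
func renameWithRegex(name, pattern, replacement string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return re.ReplaceAllString(name, replacement), nil
}

func main() {
	mapping, err := parseMap("Seq_1\tsample_A\nSeq_2\tsample_B\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(renameWithMap("Seq_1", mapping))

	renamed, err := renameWithRegex("Seq_1", "^Seq_", "sample-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(renamed)
}
```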

stats

length

With the -m or --mode flag you can choose which information you want to display:

-m each : will display the length of each sequence after its name on a single line
-m average or -m mean : will display the average length of all sequences in the file
-m min or -m minimum : will display the length of the shortest sequence in the file
-m max or -m maximum : will display the length of the longest sequence in the file

The default value for this flag is each.
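
The summary modes fit the streaming model because each one only needs a running accumulator. The sketch below shows one possible way to do this in Go. The `lengthStats` type and its methods are hypothetical and not taken from fastago's source.

```go
package main

import "fmt"

// lengthStats accumulates summary statistics one sequence at a time,
// so no list of lengths has to be kept in memory.
type lengthStats struct {
	count int
	total int
	min   int
	max   int
}

// add records the length of one more sequence.
func (s *lengthStats) add(length int) {
	if s.count == 0 || length < s.min {
		s.min = length
	}
	if length > s.max {
		s.max = length
	}
	s.count++
	s.total += length
}

// report returns the value for a summary mode, accepting the same
// aliases as the -m flag described above.
func (s *lengthStats) report(mode string) (float64, error) {
	switch mode {
	case "average", "mean":
		if s.count == 0 {
			return 0, nil
		}
		return float64(s.total) / float64(s.count), nil
	case "min", "minimum":
		return float64(s.min), nil
	case "max", "maximum":
		return float64(s.max), nil
	default:
		return 0, fmt.Errorf("unknown mode %q", mode)
	}
}

func main() {
	var stats lengthStats
	for _, length := range []int{4, 10, 7} {
		stats.add(length)
	}
	for _, mode := range []string{"mean", "min", "max"} {
		value, err := stats.report(mode)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s\t%g\n", mode, value)
	}
}
```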

subset

There are 2 ways to subset your fasta file:

You can use the -n or --names flag to specify a file of names to keep (one per line)
You can use the -r or --regex flag to specify a regular expression that matches the sequences you want to keep.

If you specify the -x or --exclude flag, the selected sequences are excluded instead of kept.
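
The selection logic amounts to a per-sequence predicate that the exclude option inverts. Below is a minimal Go sketch. `newFilter` is a hypothetical helper, and treating a name list and a regular expression given together as a union is an assumption of this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// newFilter builds a predicate that reports whether a sequence should be
// written to the output. An empty pattern disables regex matching.
func newFilter(names []string, pattern string, exclude bool) (func(string) bool, error) {
	wanted := make(map[string]bool, len(names))
	for _, n := range names {
		wanted[n] = true
	}

	var re *regexp.Regexp
	if pattern != "" {
		var err error
		re, err = regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
	}

	return func(name string) bool {
		selected := wanted[name] || (re != nil && re.MatchString(name))
		// With exclude set, selected sequences are dropped instead of kept.
		return selected != exclude
	}, nil
}

func main() {
	keep, err := newFilter([]string{"Seq_1", "Seq_2"}, "", false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range []string{"Seq_1", "Seq_2", "Seq_3"} {
		fmt.Printf("%s\t%t\n", name, keep(name))
	}
}
```

A predicate like this can be evaluated as each sequence header is read, so subsetting needs no second pass over the file.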

freqs

With the -m or --mode flag you can choose which information you want to display:

-m each : will display the frequencies in each sequence after its name on a single line

Not specifying the -m flag will output the frequencies averaged over all sequences.
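
As a rough illustration of the two outputs, the Go sketch below computes per-sequence frequencies and their average. `baseFreqs` and `averageFreqs` are hypothetical helpers. "Averaged over all sequences" is interpreted here as the unweighted mean of the per-sequence frequencies, which is an assumption: fastago may instead pool all bases, which gives different results when sequence lengths differ.

```go
package main

import "fmt"

// baseFreqs returns the relative frequency of each character in seq.
func baseFreqs(seq string) map[byte]float64 {
	freqs := make(map[byte]float64)
	if len(seq) == 0 {
		return freqs
	}
	for i := 0; i < len(seq); i++ {
		freqs[seq[i]]++
	}
	for base := range freqs {
		freqs[base] /= float64(len(seq))
	}
	return freqs
}

// averageFreqs returns the unweighted mean of the per-sequence frequencies.
// It takes a slice for brevity; a streaming version would keep only the
// running sums and the sequence count.
func averageFreqs(seqs []string) map[byte]float64 {
	avg := make(map[byte]float64)
	if len(seqs) == 0 {
		return avg
	}
	for _, seq := range seqs {
		for base, freq := range baseFreqs(seq) {
			avg[base] += freq
		}
	}
	for base := range avg {
		avg[base] /= float64(len(seqs))
	}
	return avg
}

func main() {
	seqs := []string{"AA", "AC"}
	for _, seq := range seqs {
		fmt.Println(seq, baseFreqs(seq))
	}
	fmt.Println("average", averageFreqs(seqs))
}
```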

Contributing

If you wish to contribute to this project, check out our contribution guidelines.

Documentation

Overview

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Source Files

main.go

Directories

Path	Synopsis
cmd
pkg
seqs	Package seqs allows to read a fasta formatted input stream as a succession of whole sequences.
stream	Package stream allows to read a fasta formatted input stream as lines.
version
