Documentation ¶
Overview ¶
Package uniprot provides an XML parser for Uniprot data dumps.
Uniprot is a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. It is the best(1) protein database out there.
Uniprot database dumps are available as gzipped FASTA files or gzipped XML files. The XML files have significantly more information than the FASTA files, and this parser specifically works on the gzipped XML files from Uniprot.
Uniprot provides an XML schema of their data dumps(3), which is useful for autogeneration of Golang structs. xsdgen was used to automatically generate xml.go from uniprot.xsd.
Each protein in Uniprot is known as an "Entry" (as defined in xml.go).
The function Parse stream-reads Uniprot into an Entry channel, from which you can use the entries however you want. Read simplifies reading gzipped files from a disk into an Entry channel.
Example (Basic) ¶
This example shows how to open a Uniprot data dump file and read its entries. For the Trembl data dump, use the channel directly rather than collecting the entries into a slice.
package main

import (
    "fmt"

    "github.com/bebop/poly/io/uniprot"
)

func main() {
    entries, _, _ := uniprot.Read("data/uniprot_sprot_mini.xml.gz")

    var entry uniprot.Entry
    for singleEntry := range entries {
        entry = singleEntry
    }
    fmt.Println(entry.Accession[0])
}
Output: O55723
Index ¶
- func Parse(decoder Decoder, entries chan<- Entry, errors chan<- error)
- func Read(path string) (chan Entry, chan error, error)
- type Absorption
- type AlternativeName
- type Anon6
- type CitationType
- type CofactorType
- type CommentType
- type Component
- type Conflict
- type ConsortiumType
- type Dataset
- type Date
- type DbReferenceType
- type Decoder
- type Direction
- type Disease
- type Domain
- type Entry
- type EventType
- type EvidenceType
- type EvidencedStringType
- type FeatureType
- type Fragment
- type GeneLocationType
- type GeneNameType
- type GeneType
- type ImportedFromType
- type IntListType
- type InteractantType
- type IsoformType
- type KeywordType
- type Kinetics
- type Lineage
- type Link
- type LocationType
- type Name
- type NameListType
- type OrganismNameType
- type OrganismType
- type PersonType
- type PhDependence
- type PhysiologicalReactionType
- type Plasmid
- type PositionType
- type PropertyType
- type ProteinExistenceType
- type ProteinType
- type ReactionType
- type RecommendedName
- type RedoxPotential
- type ReferenceType
- type Resource
- type Sequence
- type SequenceType
- type SourceDataType
- type SourceType
- type Status
- type StatusType
- type Strain
- type SubcellularLocationType
- type SubmittedName
- type TemperatureDependence
- type Tissue
- type Transposon
- type Type
- type Uniprot
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Parse ¶
Parse parses Uniprot entries into a channel.
Example ¶
xmlFile, _ := os.Open("data/uniprot_sprot_mini.xml.gz")
unzippedBytes, _ := gzip.NewReader(xmlFile)

entries := make(chan Entry, 100) // if you don't have a buffered channel, nothing will be read in loops on the channel.
decoderErrors := make(chan error, 100)
decoder := xml.NewDecoder(unzippedBytes)
go Parse(decoder, entries, decoderErrors)

var entry Entry
for singleEntry := range entries {
    entry = singleEntry
}
fmt.Println(entry.Accession[0])
Output: O55723
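The stream-reading pattern that Parse describes can be sketched without the package itself. The following is a minimal illustration, not the package's implementation: `miniEntry`, `streamEntries` and `collectAccessions` are hypothetical names, and the sketch assumes that each protein sits in an `<entry>` element with `<accession>` children. It decodes one element at a time, so only a single entry is held in memory at once.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// miniEntry is a hypothetical, cut-down stand-in for Entry.
type miniEntry struct {
	Accession []string `xml:"accession"`
}

// streamEntries walks the token stream and decodes each <entry> element as it
// is encountered. It stops at the first error and closes both channels on return.
func streamEntries(r io.Reader, entries chan<- miniEntry, errs chan<- error) {
	defer close(entries)
	defer close(errs) // deferred calls run last-in first-out, so errs closes first
	decoder := xml.NewDecoder(r)
	for {
		token, err := decoder.Token()
		if err == io.EOF {
			return
		}
		if err != nil {
			errs <- err
			return
		}
		start, ok := token.(xml.StartElement)
		if !ok || start.Name.Local != "entry" {
			continue
		}
		var e miniEntry
		if err := decoder.DecodeElement(&e, &start); err != nil {
			errs <- err
			return
		}
		entries <- e
	}
}

// collectAccessions runs streamEntries in a goroutine and gathers the first
// accession of every entry, returning the decode error, if any.
func collectAccessions(doc string) ([]string, error) {
	entries := make(chan miniEntry)
	errs := make(chan error, 1) // room for the single error streamEntries may send
	go streamEntries(strings.NewReader(doc), entries, errs)

	var accessions []string
	for e := range entries {
		if len(e.Accession) > 0 {
			accessions = append(accessions, e.Accession[0])
		}
	}
	return accessions, <-errs
}

func main() {
	doc := `<uniprot><entry><accession>P1</accession><accession>P1b</accession></entry>` +
		`<entry><accession>P2</accession></entry></uniprot>`
	fmt.Println(collectAccessions(doc)) // prints: [P1 P2] <nil>
}
```

In this sketch the entries channel is unbuffered and the consumer still receives every entry, because the producer runs in its own goroutine and blocks until each entry is read.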
func Read ¶
Read reads a gzipped Uniprot XML dump. Failing to open the XML dump gives a single error, while errors encountered while decoding the XML dump are added to the errors channel.
Example ¶
entries, _, _ := Read("data/uniprot_sprot_mini.xml.gz")

var entry Entry
for singleEntry := range entries {
    entry = singleEntry
}
fmt.Println(entry.Accession[0])
Output: O55723
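The examples above discard both the open error and the errors channel. The sketch below illustrates the two error paths that Read's description distinguishes. It is a stand-in rather than the package's code: `readAccessions` and `roundTrip` are hypothetical helpers, and the input is assumed to be gzipped XML with `<accession>` elements. A failure to open or unzip the file comes back as the third return value, while decode failures arrive on the errors channel.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/xml"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// readAccessions is a hypothetical function shaped like Read. Open failures
// are returned directly; decode failures are sent on the error channel.
func readAccessions(path string) (chan string, chan error, error) {
	file, err := os.Open(path)
	if err != nil {
		return nil, nil, err
	}
	unzipped, err := gzip.NewReader(file)
	if err != nil {
		file.Close()
		return nil, nil, err
	}
	accessions := make(chan string, 100)
	decodeErrors := make(chan error, 100)
	go func() {
		// Deferred calls run last-in first-out: the file is closed first,
		// then accessions, then decodeErrors.
		defer close(decodeErrors)
		defer close(accessions)
		defer file.Close()
		decoder := xml.NewDecoder(unzipped)
		for {
			token, err := decoder.Token()
			if err == io.EOF {
				return
			}
			if err != nil {
				decodeErrors <- err
				return
			}
			start, ok := token.(xml.StartElement)
			if !ok || start.Name.Local != "accession" {
				continue
			}
			var accession string
			if err := decoder.DecodeElement(&accession, &start); err != nil {
				decodeErrors <- err
				return
			}
			accessions <- accession
		}
	}()
	return accessions, decodeErrors, nil
}

// roundTrip gzips content into a temporary file, reads it back through
// readAccessions and drains both channels.
func roundTrip(content string) ([]string, []error, error) {
	dir, err := os.MkdirTemp("", "uniprot-sketch")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)

	var buf bytes.Buffer
	zipper := gzip.NewWriter(&buf)
	if _, err := zipper.Write([]byte(content)); err != nil {
		return nil, nil, err
	}
	if err := zipper.Close(); err != nil {
		return nil, nil, err
	}
	path := filepath.Join(dir, "mini.xml.gz")
	if err := os.WriteFile(path, buf.Bytes(), 0o600); err != nil {
		return nil, nil, err
	}

	accessions, decodeErrors, err := readAccessions(path)
	if err != nil {
		return nil, nil, err
	}
	var got []string
	for accession := range accessions {
		got = append(got, accession)
	}
	var failures []error
	for decodeErr := range decodeErrors {
		failures = append(failures, decodeErr)
	}
	return got, failures, nil
}

func main() {
	if _, _, err := readAccessions("no-such-file.xml.gz"); err != nil {
		fmt.Println("open error returned directly")
	}
	got, failures, err := roundTrip("<uniprot><entry><accession>P1</accession></entry></uniprot>")
	fmt.Println(got, len(failures), err)
}
```

Draining the entries channel before the errors channel works here because the goroutine closes both when it finishes, so neither `range` loop blocks forever.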
Types ¶
type Absorption ¶
type Absorption struct {
    Max EvidencedStringType `xml:"http://uniprot.org/uniprot max,omitempty"`
    Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text,omitempty"`
}
type AlternativeName ¶
type AlternativeName struct {
    FullName EvidencedStringType `xml:"http://uniprot.org/uniprot fullName,omitempty"`
    ShortName []EvidencedStringType `xml:"http://uniprot.org/uniprot shortName,omitempty"`
    EcNumber []EvidencedStringType `xml:"http://uniprot.org/uniprot ecNumber,omitempty"`
}
type CitationType ¶
type CitationType struct {
    Title string `xml:"http://uniprot.org/uniprot title,omitempty"`
    EditorList NameListType `xml:"http://uniprot.org/uniprot editorList,omitempty"`
    AuthorList NameListType `xml:"http://uniprot.org/uniprot authorList,omitempty"`
    Locator string `xml:"http://uniprot.org/uniprot locator,omitempty"`
    DbReference []DbReferenceType `xml:"http://uniprot.org/uniprot dbReference,omitempty"`
    Type Type `xml:"type,attr"`
    Date Date `xml:"date,attr,omitempty"`
    Name string `xml:"name,attr,omitempty"`
    Volume string `xml:"volume,attr,omitempty"`
    First string `xml:"first,attr,omitempty"`
    Last string `xml:"last,attr,omitempty"`
    Publisher string `xml:"publisher,attr,omitempty"`
    City string `xml:"city,attr,omitempty"`
    Db string `xml:"db,attr,omitempty"`
    Number string `xml:"number,attr,omitempty"`
    Institute string `xml:"institute,attr,omitempty"`
    Country string `xml:"country,attr,omitempty"`
}
Describes different types of citations. Equivalent to the flat file RX-, RG-, RA-, RT- and RL-lines.
type CofactorType ¶
type CofactorType struct {
    Name string `xml:"http://uniprot.org/uniprot name"`
    DbReference DbReferenceType `xml:"http://uniprot.org/uniprot dbReference"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes a cofactor.
type CommentType ¶
type CommentType struct {
    Molecule string `xml:"http://uniprot.org/uniprot molecule,omitempty"`
    Absorption Absorption `xml:"http://uniprot.org/uniprot absorption,omitempty"`
    Kinetics Kinetics `xml:"http://uniprot.org/uniprot kinetics,omitempty"`
    PhDependence PhDependence `xml:"http://uniprot.org/uniprot phDependence,omitempty"`
    RedoxPotential RedoxPotential `xml:"http://uniprot.org/uniprot redoxPotential,omitempty"`
    TemperatureDependence TemperatureDependence `xml:"http://uniprot.org/uniprot temperatureDependence,omitempty"`
    Reaction ReactionType `xml:"http://uniprot.org/uniprot reaction,omitempty"`
    PhysiologicalReaction []PhysiologicalReactionType `xml:"http://uniprot.org/uniprot physiologicalReaction,omitempty"`
    Cofactor []CofactorType `xml:"http://uniprot.org/uniprot cofactor,omitempty"`
    SubcellularLocation []SubcellularLocationType `xml:"http://uniprot.org/uniprot subcellularLocation,omitempty"`
    Conflict Conflict `xml:"http://uniprot.org/uniprot conflict,omitempty"`
    Link []Link `xml:"http://uniprot.org/uniprot link,omitempty"`
    Event []EventType `xml:"http://uniprot.org/uniprot event,omitempty"`
    Isoform []IsoformType `xml:"http://uniprot.org/uniprot isoform,omitempty"`
    Interactant []InteractantType `xml:"http://uniprot.org/uniprot interactant,omitempty"`
    OrganismsDiffer bool `xml:"http://uniprot.org/uniprot organismsDiffer,omitempty"`
    Experiments int `xml:"http://uniprot.org/uniprot experiments,omitempty"`
    Disease Disease `xml:"http://uniprot.org/uniprot disease,omitempty"`
    Location []LocationType `xml:"http://uniprot.org/uniprot location,omitempty"`
    Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text,omitempty"`
    Type Type `xml:"type,attr"`
    LocationType string `xml:"locationType,attr,omitempty"`
    Name string `xml:"name,attr,omitempty"`
    Mass float32 `xml:"mass,attr,omitempty"`
    Error string `xml:"error,attr,omitempty"`
    Method string `xml:"method,attr,omitempty"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes different types of general annotations. Equivalent to the flat file CC-line.
func (*CommentType) UnmarshalXML ¶
func (t *CommentType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
type Component ¶
type Component struct {
    RecommendedName RecommendedName `xml:"http://uniprot.org/uniprot recommendedName,omitempty"`
    AlternativeName []AlternativeName `xml:"http://uniprot.org/uniprot alternativeName,omitempty"`
    SubmittedName []SubmittedName `xml:"http://uniprot.org/uniprot submittedName,omitempty"`
    AllergenName EvidencedStringType `xml:"http://uniprot.org/uniprot allergenName,omitempty"`
    BiotechName EvidencedStringType `xml:"http://uniprot.org/uniprot biotechName,omitempty"`
    CdAntigenName []EvidencedStringType `xml:"http://uniprot.org/uniprot cdAntigenName,omitempty"`
    InnName []EvidencedStringType `xml:"http://uniprot.org/uniprot innName,omitempty"`
}
type ConsortiumType ¶
type ConsortiumType struct {
Name string `xml:"name,attr"`
}
Describes the authors of a citation when these are represented by a consortium. Equivalent to the flat file RG-line.
type DbReferenceType ¶
type DbReferenceType struct {
    Molecule string `xml:"http://uniprot.org/uniprot molecule,omitempty"`
    Property []PropertyType `xml:"http://uniprot.org/uniprot property,omitempty"`
    Type string `xml:"type,attr"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes a database cross-reference. Equivalent to the flat file DR-line.
type Decoder ¶
type Decoder interface {
    DecodeElement(v interface{}, start *xml.StartElement) error
    Token() (xml.Token, error)
}
Decoder decodes XML elements.
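Because Decoder is an interface with just these two methods, the standard library's `*xml.Decoder` can be passed wherever it is expected, and a test double can stand in for it. The sketch below redeclares the interface locally so that it compiles on its own; `failingDecoder`, `countStartElements` and `countInString` are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Decoder is redeclared here with the same two methods as the interface above.
type Decoder interface {
	DecodeElement(v interface{}, start *xml.StartElement) error
	Token() (xml.Token, error)
}

// Compile-time check that the standard library decoder satisfies the interface.
var _ Decoder = (*xml.Decoder)(nil)

// failingDecoder is a hypothetical test double whose calls always fail.
type failingDecoder struct{}

func (failingDecoder) DecodeElement(v interface{}, start *xml.StartElement) error {
	return errors.New("decode failed")
}

func (failingDecoder) Token() (xml.Token, error) {
	return nil, errors.New("token failed")
}

// countStartElements reads tokens until EOF and counts the start elements.
func countStartElements(d Decoder) (int, error) {
	count := 0
	for {
		token, err := d.Token()
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
		if _, ok := token.(xml.StartElement); ok {
			count++
		}
	}
}

// countInString wraps a string in an *xml.Decoder and counts its start elements.
func countInString(doc string) (int, error) {
	return countStartElements(xml.NewDecoder(strings.NewReader(doc)))
}

func main() {
	fmt.Println(countInString("<a><b/><c></c></a>")) // prints: 3 <nil>
	_, err := countStartElements(failingDecoder{})
	fmt.Println(err) // prints: token failed
}
```

A double like `failingDecoder` makes it possible to exercise error paths without crafting malformed XML files.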
type Disease ¶
type Disease struct {
    Name string `xml:"http://uniprot.org/uniprot name"`
    Acronym string `xml:"http://uniprot.org/uniprot acronym"`
    Description string `xml:"http://uniprot.org/uniprot description"`
    DbReference DbReferenceType `xml:"http://uniprot.org/uniprot dbReference"`
}
type Domain ¶
type Domain struct {
    RecommendedName RecommendedName `xml:"http://uniprot.org/uniprot recommendedName,omitempty"`
    AlternativeName []AlternativeName `xml:"http://uniprot.org/uniprot alternativeName,omitempty"`
    SubmittedName []SubmittedName `xml:"http://uniprot.org/uniprot submittedName,omitempty"`
    AllergenName EvidencedStringType `xml:"http://uniprot.org/uniprot allergenName,omitempty"`
    BiotechName EvidencedStringType `xml:"http://uniprot.org/uniprot biotechName,omitempty"`
    CdAntigenName []EvidencedStringType `xml:"http://uniprot.org/uniprot cdAntigenName,omitempty"`
    InnName []EvidencedStringType `xml:"http://uniprot.org/uniprot innName,omitempty"`
}
type Entry ¶
type Entry struct {
    Accession []string `xml:"http://uniprot.org/uniprot accession"`
    Name []string `xml:"http://uniprot.org/uniprot name"`
    Protein ProteinType `xml:"http://uniprot.org/uniprot protein"`
    Gene []GeneType `xml:"http://uniprot.org/uniprot gene,omitempty"`
    Organism OrganismType `xml:"http://uniprot.org/uniprot organism"`
    OrganismHost []OrganismType `xml:"http://uniprot.org/uniprot organismHost,omitempty"`
    GeneLocation []GeneLocationType `xml:"http://uniprot.org/uniprot geneLocation,omitempty"`
    Reference []ReferenceType `xml:"http://uniprot.org/uniprot reference"`
    Comment []CommentType `xml:"http://uniprot.org/uniprot comment,omitempty"`
    DbReference []DbReferenceType `xml:"http://uniprot.org/uniprot dbReference,omitempty"`
    ProteinExistence ProteinExistenceType `xml:"http://uniprot.org/uniprot proteinExistence"`
    Keyword []KeywordType `xml:"http://uniprot.org/uniprot keyword,omitempty"`
    Feature []FeatureType `xml:"http://uniprot.org/uniprot feature,omitempty"`
    Evidence []EvidenceType `xml:"http://uniprot.org/uniprot evidence,omitempty"`
    Sequence SequenceType `xml:"http://uniprot.org/uniprot sequence"`
    Dataset Dataset `xml:"dataset,attr"`
    Created time.Time `xml:"created,attr"`
    Modified time.Time `xml:"modified,attr"`
    Version int `xml:"version,attr"`
}
func (*Entry) UnmarshalXML ¶
type EventType ¶
type EventType struct {
Type Type `xml:"type,attr"`
}
Describes the type of events that cause alternative products.
type EvidenceType ¶
type EvidenceType struct {
    Source SourceType `xml:"http://uniprot.org/uniprot source,omitempty"`
    ImportedFrom ImportedFromType `xml:"http://uniprot.org/uniprot importedFrom,omitempty"`
    Type string `xml:"type,attr"`
    Key int `xml:"key,attr"`
}
Describes the evidence for an annotation. No flat file equivalent.
type EvidencedStringType ¶
type EvidencedStringType struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type FeatureType ¶
type FeatureType struct {
    Original string `xml:"http://uniprot.org/uniprot original,omitempty"`
    Variation []string `xml:"http://uniprot.org/uniprot variation,omitempty"`
    Location LocationType `xml:"http://uniprot.org/uniprot location"`
    Type Type `xml:"type,attr"`
    Description string `xml:"description,attr,omitempty"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes different types of sequence annotations. Equivalent to the flat file FT-line.
type GeneLocationType ¶
type GeneLocationType struct {
    Name []StatusType `xml:"http://uniprot.org/uniprot name,omitempty"`
    Type Type `xml:"type,attr"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes non-nuclear gene locations (organelles and plasmids). Equivalent to the flat file OG-line.
type GeneNameType ¶
type GeneNameType struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
    Type Type `xml:"type,attr"`
}
Describes different types of gene designations. Equivalent to the flat file GN-line.
type GeneType ¶
type GeneType struct {
Name []GeneNameType `xml:"http://uniprot.org/uniprot name"`
}
Describes a gene. Equivalent to the flat file GN-line.
type ImportedFromType ¶
type ImportedFromType struct {
DbReference DbReferenceType `xml:"http://uniprot.org/uniprot dbReference"`
}
Describes the source of the evidence, when it is not assigned by UniProt, but imported from an external database.
type IntListType ¶
type IntListType []int
func (*IntListType) UnmarshalText ¶
func (x *IntListType) UnmarshalText(text []byte) error
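An UnmarshalText method on a slice type lets `encoding/xml` turn a single attribute value into a list. The sketch below shows one way such a method could look. It is an assumption rather than the generated code: it presumes that the `evidence` attribute holds whitespace-separated integers, and `intList`, `evidenced` and `parseEvidenced` are hypothetical stand-ins for IntListType and EvidencedStringType.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// intList is a hypothetical stand-in for IntListType.
type intList []int

// UnmarshalText parses whitespace-separated integers such as "1 2 30" and
// appends them to the list. It returns the first conversion error, if any.
func (x *intList) UnmarshalText(text []byte) error {
	for _, field := range strings.Fields(string(text)) {
		n, err := strconv.Atoi(field)
		if err != nil {
			return err
		}
		*x = append(*x, n)
	}
	return nil
}

// evidenced mirrors the shape of EvidencedStringType: character data plus an
// evidence attribute.
type evidenced struct {
	Value    string  `xml:",chardata"`
	Evidence intList `xml:"evidence,attr,omitempty"`
}

// parseEvidenced unmarshals a single element into an evidenced value.
func parseEvidenced(doc string) (evidenced, error) {
	var e evidenced
	err := xml.Unmarshal([]byte(doc), &e)
	return e, err
}

func main() {
	e, err := parseEvidenced(`<fullName evidence="1 2 30">Kinase</fullName>`)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(e.Value, e.Evidence) // prints: Kinase [1 2 30]
}
```

Because the method has a pointer receiver, `encoding/xml` calls it through the address of the struct field, which is why the field can be declared as a plain `intList` value.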
type InteractantType ¶
type InteractantType struct {
    Id string `xml:"http://uniprot.org/uniprot id"`
    Label string `xml:"http://uniprot.org/uniprot label,omitempty"`
    DbReference DbReferenceType `xml:"http://uniprot.org/uniprot dbReference,omitempty"`
    IntactId string `xml:"intactId,attr"`
}
type IsoformType ¶
type IsoformType struct {
    Id []string `xml:"http://uniprot.org/uniprot id"`
    Name []Name `xml:"http://uniprot.org/uniprot name"`
    Sequence Anon6 `xml:"http://uniprot.org/uniprot sequence"`
    Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text,omitempty"`
}
Describes isoforms in 'alternative products' annotations.
type KeywordType ¶
type KeywordType struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type Kinetics ¶
type Kinetics struct {
    KM []EvidencedStringType `xml:"http://uniprot.org/uniprot KM,omitempty"`
    Vmax []EvidencedStringType `xml:"http://uniprot.org/uniprot Vmax,omitempty"`
    Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text,omitempty"`
}
type LocationType ¶
type LocationType struct {
    Begin PositionType `xml:"http://uniprot.org/uniprot begin,omitempty"`
    End PositionType `xml:"http://uniprot.org/uniprot end,omitempty"`
    Position PositionType `xml:"http://uniprot.org/uniprot position,omitempty"`
    Sequence string `xml:"sequence,attr,omitempty"`
}
Describes a sequence location as either a range with a begin and end or as a position. The 'sequence' attribute is only used when the location is not on the canonical sequence displayed in the current entry.
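Since a location is either a begin/end range or a single position, code that consumes it usually has to handle both shapes. The sketch below shows one possible helper. It uses hypothetical, trimmed mirrors of PositionType and LocationType (`position`, `location`) and assumes that residue positions are 1-based, so that a zero value means the element was absent. Positions with an unknown status are not modelled here.

```go
package main

import "fmt"

// position and location are hypothetical, trimmed mirrors of PositionType
// and LocationType, keeping only the numeric position.
type position struct {
	Position uint64
}

type location struct {
	Begin    position
	End      position
	Position position
}

// span returns the first and last residue a location covers. A single
// position yields first == last. ok is false when neither a position nor a
// complete begin/end pair is present (assumption: zero means "absent").
func span(loc location) (first, last uint64, ok bool) {
	switch {
	case loc.Position.Position != 0:
		return loc.Position.Position, loc.Position.Position, true
	case loc.Begin.Position != 0 && loc.End.Position != 0:
		return loc.Begin.Position, loc.End.Position, true
	default:
		return 0, 0, false
	}
}

func main() {
	ranged := location{Begin: position{Position: 5}, End: position{Position: 9}}
	single := location{Position: position{Position: 7}}
	fmt.Println(span(ranged)) // prints: 5 9 true
	fmt.Println(span(single)) // prints: 7 7 true
}
```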
type Name ¶
type Name struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type NameListType ¶
type NameListType struct {
    Consortium ConsortiumType `xml:"http://uniprot.org/uniprot consortium,omitempty"`
    Person PersonType `xml:"http://uniprot.org/uniprot person,omitempty"`
}
type OrganismNameType ¶
Describes different types of source organism names.
type OrganismType ¶
type OrganismType struct {
    Name []OrganismNameType `xml:"http://uniprot.org/uniprot name"`
    DbReference []DbReferenceType `xml:"http://uniprot.org/uniprot dbReference"`
    Lineage Lineage `xml:"http://uniprot.org/uniprot lineage,omitempty"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes the source organism.
type PersonType ¶
type PersonType struct {
Name string `xml:"name,attr"`
}
type PhDependence ¶
type PhDependence struct {
Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text"`
}
type PhysiologicalReactionType ¶
type PhysiologicalReactionType struct {
    DbReference DbReferenceType `xml:"http://uniprot.org/uniprot dbReference"`
    Direction Direction `xml:"direction,attr"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes a physiological reaction.
type Plasmid ¶
type Plasmid struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type PositionType ¶
type PositionType struct {
    Position uint64 `xml:"position,attr,omitempty"`
    Status Status `xml:"status,attr,omitempty"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
func (*PositionType) UnmarshalXML ¶
func (t *PositionType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
type PropertyType ¶
type ProteinExistenceType ¶
type ProteinExistenceType struct {
Type Type `xml:"type,attr"`
}
Describes the evidence for the protein's existence. Equivalent to the flat file PE-line.
type ProteinType ¶
type ProteinType struct {
    RecommendedName RecommendedName `xml:"http://uniprot.org/uniprot recommendedName,omitempty"`
    AlternativeName []AlternativeName `xml:"http://uniprot.org/uniprot alternativeName,omitempty"`
    SubmittedName []SubmittedName `xml:"http://uniprot.org/uniprot submittedName,omitempty"`
    AllergenName EvidencedStringType `xml:"http://uniprot.org/uniprot allergenName,omitempty"`
    BiotechName EvidencedStringType `xml:"http://uniprot.org/uniprot biotechName,omitempty"`
    CdAntigenName []EvidencedStringType `xml:"http://uniprot.org/uniprot cdAntigenName,omitempty"`
    InnName []EvidencedStringType `xml:"http://uniprot.org/uniprot innName,omitempty"`
    Domain []Domain `xml:"http://uniprot.org/uniprot domain,omitempty"`
    Component []Component `xml:"http://uniprot.org/uniprot component,omitempty"`
}
Describes the names for the protein and parts thereof. Equivalent to the flat file DE-line.
type ReactionType ¶
type ReactionType struct {
    Text string `xml:"http://uniprot.org/uniprot text"`
    DbReference []DbReferenceType `xml:"http://uniprot.org/uniprot dbReference"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
Describes a chemical reaction.
type RecommendedName ¶
type RecommendedName struct {
    FullName EvidencedStringType `xml:"http://uniprot.org/uniprot fullName"`
    ShortName []EvidencedStringType `xml:"http://uniprot.org/uniprot shortName,omitempty"`
    EcNumber []EvidencedStringType `xml:"http://uniprot.org/uniprot ecNumber,omitempty"`
}
type RedoxPotential ¶
type RedoxPotential struct {
Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text"`
}
type ReferenceType ¶
type ReferenceType struct {
    Citation CitationType `xml:"http://uniprot.org/uniprot citation"`
    Scope []string `xml:"http://uniprot.org/uniprot scope"`
    Source SourceDataType `xml:"http://uniprot.org/uniprot source,omitempty"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
    Key string `xml:"key,attr"`
}
Describes a citation and a summary of its content. Equivalent to the flat file RN-, RP-, RC-, RX-, RG-, RA-, RT- and RL-lines.
type SequenceType ¶
type SequenceType struct {
    Value string `xml:",chardata"`
    Length int `xml:"length,attr"`
    Mass int `xml:"mass,attr"`
    Checksum string `xml:"checksum,attr"`
    Modified time.Time `xml:"modified,attr"`
    Version int `xml:"version,attr"`
    Precursor bool `xml:"precursor,attr,omitempty"`
    Fragment Fragment `xml:"fragment,attr,omitempty"`
}
func (*SequenceType) UnmarshalXML ¶
func (t *SequenceType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
type SourceDataType ¶
type SourceDataType struct {
    Strain Strain `xml:"http://uniprot.org/uniprot strain,omitempty"`
    Plasmid Plasmid `xml:"http://uniprot.org/uniprot plasmid,omitempty"`
    Transposon Transposon `xml:"http://uniprot.org/uniprot transposon,omitempty"`
    Tissue Tissue `xml:"http://uniprot.org/uniprot tissue,omitempty"`
}
Describes the source of the sequence according to the citation. Equivalent to the flat file RC-line.
type SourceType ¶
type SourceType struct {
DbReference DbReferenceType `xml:"http://uniprot.org/uniprot dbReference,omitempty"`
}
Describes the source of the data using a database cross-reference (or a 'ref' attribute when the source cannot be found in a public data source, such as PubMed, and is cited only within the UniProtKB entry).
type StatusType ¶
type StatusType struct {
    Value string `xml:",chardata"`
    Status Status `xml:"status,attr,omitempty"`
}
Indicates whether the name of a plasmid is known or unknown.
type Strain ¶
type Strain struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type SubcellularLocationType ¶
type SubcellularLocationType struct {
    Location []EvidencedStringType `xml:"http://uniprot.org/uniprot location"`
    Topology []EvidencedStringType `xml:"http://uniprot.org/uniprot topology,omitempty"`
    Orientation []EvidencedStringType `xml:"http://uniprot.org/uniprot orientation,omitempty"`
}
Describes the subcellular location and optionally the topology and orientation of a molecule.
type SubmittedName ¶
type SubmittedName struct {
    FullName EvidencedStringType `xml:"http://uniprot.org/uniprot fullName"`
    EcNumber []EvidencedStringType `xml:"http://uniprot.org/uniprot ecNumber,omitempty"`
}
type TemperatureDependence ¶
type TemperatureDependence struct {
Text []EvidencedStringType `xml:"http://uniprot.org/uniprot text"`
}
type Tissue ¶
type Tissue struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type Transposon ¶
type Transposon struct {
    Value string `xml:",chardata"`
    Evidence IntListType `xml:"evidence,attr,omitempty"`
}