Overview ¶
Package uniprot provides an XML parser for Uniprot data dumps.
Uniprot is a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. It is the best(1) protein database out there.
Uniprot database dumps are available as gzipped FASTA files or gzipped XML files. The XML files have significantly more information than the FASTA files, and this parser specifically works on the gzipped XML files from Uniprot.
Uniprot provides an XML schema of their data dumps(3), which is useful for autogeneration of Golang structs. xsdgen was used to automatically generate xml.go from uniprot.xsd.
Each protein in Uniprot is known as an "Entry" (as defined in xml.go).
The function Parse stream-reads Uniprot into an Entry channel, from which you can use the entries however you want. Read simplifies reading gzipped files from a disk into an Entry channel.
Example (Basic) ¶
This example shows how to open a Uniprot data dump file and range over its entries. For the much larger Trembl data dump, consume entries directly from the channel rather than collecting them into a slice.
package main

import (
	"fmt"

	""
)

func main() {
	entries, _, _ := uniprot.Read("data/uniprot_sprot_mini.xml.gz")

	var entry uniprot.Entry
	for singleEntry := range entries {
		entry = singleEntry
	}
	fmt.Println(entry.Accession[0])
}
Output: O55723
Index ¶
- func Parse(decoder Decoder, entries chan<- Entry, errors chan<- error)
- func Read(path string) (chan Entry, chan error, error)
- type Absorption
- type AlternativeName
- type Anon6
- type CitationType
- type CofactorType
- type CommentType
- type Component
- type Conflict
- type ConsortiumType
- type Dataset
- type Date
- type DbReferenceType
- type Decoder
- type Direction
- type Disease
- type Domain
- type Entry
- type EventType
- type EvidenceType
- type EvidencedStringType
- type FeatureType
- type Fragment
- type GeneLocationType
- type GeneNameType
- type GeneType
- type ImportedFromType
- type IntListType
- type InteractantType
- type IsoformType
- type KeywordType
- type Kinetics
- type Lineage
- type Link
- type LocationType
- type Name
- type NameListType
- type OrganismNameType
- type OrganismType
- type PersonType
- type PhDependence
- type PhysiologicalReactionType
- type Plasmid
- type PositionType
- type PropertyType
- type ProteinExistenceType
- type ProteinType
- type ReactionType
- type RecommendedName
- type RedoxPotential
- type ReferenceType
- type Resource
- type Sequence
- type SequenceType
- type SourceDataType
- type SourceType
- type Status
- type StatusType
- type Strain
- type SubcellularLocationType
- type SubmittedName
- type TemperatureDependence
- type Tissue
- type Transposon
- type Type
- type Uniprot
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Parse ¶
Parse parses Uniprot entries into a channel.
Example ¶
xmlFile, _ := os.Open("data/uniprot_sprot_mini.xml.gz")
unzippedBytes, _ := gzip.NewReader(xmlFile)

entries := make(chan Entry, 100) // if you don't have a buffered channel, nothing will be read in loops on the channel.
decoderErrors := make(chan error, 100)
decoder := xml.NewDecoder(unzippedBytes)
go Parse(decoder, entries, decoderErrors)

var entry Entry
for singleEntry := range entries {
	entry = singleEntry
}
fmt.Println(entry.Accession[0])
Output: O55723
func Read ¶
Read reads a gzipped Uniprot XML dump. Failing to open the XML dump gives a single error, while errors encountered while decoding the XML dump are added to the errors channel.
Example ¶
entries, _, _ := Read("data/uniprot_sprot_mini.xml.gz")

var entry Entry
for singleEntry := range entries {
	entry = singleEntry
}
fmt.Println(entry.Accession[0])
Output: O55723
Types ¶
type Absorption ¶
type Absorption struct {
	Max  EvidencedStringType   `xml:" max,omitempty"`
	Text []EvidencedStringType `xml:" text,omitempty"`
}
type AlternativeName ¶
type AlternativeName struct {
	FullName  EvidencedStringType   `xml:" fullName,omitempty"`
	ShortName []EvidencedStringType `xml:" shortName,omitempty"`
	EcNumber  []EvidencedStringType `xml:" ecNumber,omitempty"`
}
type CitationType ¶
type CitationType struct {
	Title       string            `xml:" title,omitempty"`
	EditorList  NameListType      `xml:" editorList,omitempty"`
	AuthorList  NameListType      `xml:" authorList,omitempty"`
	Locator     string            `xml:" locator,omitempty"`
	DbReference []DbReferenceType `xml:" dbReference,omitempty"`
	Type        Type              `xml:"type,attr"`
	Date        Date              `xml:"date,attr,omitempty"`
	Name        string            `xml:"name,attr,omitempty"`
	Volume      string            `xml:"volume,attr,omitempty"`
	First       string            `xml:"first,attr,omitempty"`
	Last        string            `xml:"last,attr,omitempty"`
	Publisher   string            `xml:"publisher,attr,omitempty"`
	City        string            `xml:"city,attr,omitempty"`
	Db          string            `xml:"db,attr,omitempty"`
	Number      string            `xml:"number,attr,omitempty"`
	Institute   string            `xml:"institute,attr,omitempty"`
	Country     string            `xml:"country,attr,omitempty"`
}
Describes different types of citations. Equivalent to the flat file RX-, RG-, RA-, RT- and RL-lines.
type CofactorType ¶
type CofactorType struct {
	Name        string          `xml:" name"`
	DbReference DbReferenceType `xml:" dbReference"`
	Evidence    IntListType     `xml:"evidence,attr,omitempty"`
}
Describes a cofactor.
type CommentType ¶
type CommentType struct {
	Molecule              string                      `xml:" molecule,omitempty"`
	Absorption            Absorption                  `xml:" absorption,omitempty"`
	Kinetics              Kinetics                    `xml:" kinetics,omitempty"`
	PhDependence          PhDependence                `xml:" phDependence,omitempty"`
	RedoxPotential        RedoxPotential              `xml:" redoxPotential,omitempty"`
	TemperatureDependence TemperatureDependence       `xml:" temperatureDependence,omitempty"`
	Reaction              ReactionType                `xml:" reaction,omitempty"`
	PhysiologicalReaction []PhysiologicalReactionType `xml:" physiologicalReaction,omitempty"`
	Cofactor              []CofactorType              `xml:" cofactor,omitempty"`
	SubcellularLocation   []SubcellularLocationType   `xml:" subcellularLocation,omitempty"`
	Conflict              Conflict                    `xml:" conflict,omitempty"`
	Link                  []Link                      `xml:" link,omitempty"`
	Event                 []EventType                 `xml:" event,omitempty"`
	Isoform               []IsoformType               `xml:" isoform,omitempty"`
	Interactant           []InteractantType           `xml:" interactant,omitempty"`
	OrganismsDiffer       bool                        `xml:" organismsDiffer,omitempty"`
	Experiments           int                         `xml:" experiments,omitempty"`
	Disease               Disease                     `xml:" disease,omitempty"`
	Location              []LocationType              `xml:" location,omitempty"`
	Text                  []EvidencedStringType       `xml:" text,omitempty"`
	Type                  Type                        `xml:"type,attr"`
	LocationType          string                      `xml:"locationType,attr,omitempty"`
	Name                  string                      `xml:"name,attr,omitempty"`
	Mass                  float32                     `xml:"mass,attr,omitempty"`
	Error                 string                      `xml:"error,attr,omitempty"`
	Method                string                      `xml:"method,attr,omitempty"`
	Evidence              IntListType                 `xml:"evidence,attr,omitempty"`
}
Describes different types of general annotations. Equivalent to the flat file CC-line.
func (*CommentType) UnmarshalXML ¶
func (t *CommentType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
type Component ¶
type Component struct {
	RecommendedName RecommendedName       `xml:" recommendedName,omitempty"`
	AlternativeName []AlternativeName     `xml:" alternativeName,omitempty"`
	SubmittedName   []SubmittedName       `xml:" submittedName,omitempty"`
	AllergenName    EvidencedStringType   `xml:" allergenName,omitempty"`
	BiotechName     EvidencedStringType   `xml:" biotechName,omitempty"`
	CdAntigenName   []EvidencedStringType `xml:" cdAntigenName,omitempty"`
	InnName         []EvidencedStringType `xml:" innName,omitempty"`
}
type ConsortiumType ¶
type ConsortiumType struct {
	Name string `xml:"name,attr"`
}
Describes the authors of a citation when these are represented by a consortium. Equivalent to the flat file RG-line.
type DbReferenceType ¶
type DbReferenceType struct {
	Molecule string         `xml:" molecule,omitempty"`
	Property []PropertyType `xml:" property,omitempty"`
	Type     string         `xml:"type,attr"`
	Evidence IntListType    `xml:"evidence,attr,omitempty"`
}
Describes a database cross-reference. Equivalent to the flat file DR-line.
type Decoder ¶
type Decoder interface {
	DecodeElement(v interface{}, start *xml.StartElement) error
	Token() (xml.Token, error)
}
Decoder decodes XML elements.
type Disease ¶
type Disease struct {
	Name        string          `xml:" name"`
	Acronym     string          `xml:" acronym"`
	Description string          `xml:" description"`
	DbReference DbReferenceType `xml:" dbReference"`
}
type Domain ¶
type Domain struct {
	RecommendedName RecommendedName       `xml:" recommendedName,omitempty"`
	AlternativeName []AlternativeName     `xml:" alternativeName,omitempty"`
	SubmittedName   []SubmittedName       `xml:" submittedName,omitempty"`
	AllergenName    EvidencedStringType   `xml:" allergenName,omitempty"`
	BiotechName     EvidencedStringType   `xml:" biotechName,omitempty"`
	CdAntigenName   []EvidencedStringType `xml:" cdAntigenName,omitempty"`
	InnName         []EvidencedStringType `xml:" innName,omitempty"`
}
type Entry ¶
type Entry struct {
	Accession        []string             `xml:" accession"`
	Name             []string             `xml:" name"`
	Protein          ProteinType          `xml:" protein"`
	Gene             []GeneType           `xml:" gene,omitempty"`
	Organism         OrganismType         `xml:" organism"`
	OrganismHost     []OrganismType       `xml:" organismHost,omitempty"`
	GeneLocation     []GeneLocationType   `xml:" geneLocation,omitempty"`
	Reference        []ReferenceType      `xml:" reference"`
	Comment          []CommentType        `xml:" comment,omitempty"`
	DbReference      []DbReferenceType    `xml:" dbReference,omitempty"`
	ProteinExistence ProteinExistenceType `xml:" proteinExistence"`
	Keyword          []KeywordType        `xml:" keyword,omitempty"`
	Feature          []FeatureType        `xml:" feature,omitempty"`
	Evidence         []EvidenceType       `xml:" evidence,omitempty"`
	Sequence         SequenceType         `xml:" sequence"`
	Dataset          Dataset              `xml:"dataset,attr"`
	Created          time.Time            `xml:"created,attr"`
	Modified         time.Time            `xml:"modified,attr"`
	Version          int                  `xml:"version,attr"`
}
func (*Entry) UnmarshalXML ¶
type EventType ¶
type EventType struct {
	Type Type `xml:"type,attr"`
}
Describes the type of events that cause alternative products.
type EvidenceType ¶
type EvidenceType struct {
	Source       SourceType       `xml:" source,omitempty"`
	ImportedFrom ImportedFromType `xml:" importedFrom,omitempty"`
	Type         string           `xml:"type,attr"`
	Key          int              `xml:"key,attr"`
}
Describes the evidence for an annotation. No flat file equivalent.
type EvidencedStringType ¶
type EvidencedStringType struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type FeatureType ¶
type FeatureType struct {
	Original    string       `xml:" original,omitempty"`
	Variation   []string     `xml:" variation,omitempty"`
	Location    LocationType `xml:" location"`
	Type        Type         `xml:"type,attr"`
	Description string       `xml:"description,attr,omitempty"`
	Evidence    IntListType  `xml:"evidence,attr,omitempty"`
}
Describes different types of sequence annotations. Equivalent to the flat file FT-line.
type GeneLocationType ¶
type GeneLocationType struct {
	Name     []StatusType `xml:" name,omitempty"`
	Type     Type         `xml:"type,attr"`
	Evidence IntListType  `xml:"evidence,attr,omitempty"`
}
Describes non-nuclear gene locations (organelles and plasmids). Equivalent to the flat file OG-line.
type GeneNameType ¶
type GeneNameType struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
	Type     Type        `xml:"type,attr"`
}
Describes different types of gene designations. Equivalent to the flat file GN-line.
type GeneType ¶
type GeneType struct {
	Name []GeneNameType `xml:" name"`
}
Describes a gene. Equivalent to the flat file GN-line.
type ImportedFromType ¶
type ImportedFromType struct {
	DbReference DbReferenceType `xml:" dbReference"`
}
Describes the source of the evidence, when it is not assigned by UniProt, but imported from an external database.
type IntListType ¶
type IntListType []int
func (*IntListType) UnmarshalText ¶
func (x *IntListType) UnmarshalText(text []byte) error
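A type with an `UnmarshalText` method lets `encoding/xml` turn a single attribute value into a list. The sketch below is one plausible way such a method could work, assuming the attribute holds whitespace-separated integers (for example `evidence="1 2 3"`). That format, the sample element, and the names `intList`, `evidencedString` and `parseEvidenced` are assumptions made for illustration; the package's actual implementation is not shown on this page.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// intList is a hypothetical stand-in for IntListType.
type intList []int

// UnmarshalText parses whitespace-separated integers (an assumed format).
func (x *intList) UnmarshalText(text []byte) error {
	fields := strings.Fields(string(text))
	out := make(intList, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return err
		}
		out = append(out, n)
	}
	*x = out
	return nil
}

// evidencedString is a hypothetical stand-in for EvidencedStringType.
type evidencedString struct {
	Value    string  `xml:",chardata"`
	Evidence intList `xml:"evidence,attr,omitempty"`
}

// parseEvidenced (hypothetical) decodes a single element into the struct.
func parseEvidenced(doc string) (evidencedString, error) {
	var s evidencedString
	err := xml.Unmarshal([]byte(doc), &s)
	return s, err
}

func main() {
	// Made-up sample element.
	s, err := parseEvidenced(`<fullName evidence="1 2 3">Major capsid protein</fullName>`)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(s.Value, s.Evidence)
}
```

`encoding/xml` checks for `encoding.TextUnmarshaler` on an attribute's field before it applies its default slice handling, which is why the whole attribute value reaches `UnmarshalText` in one call.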
type InteractantType ¶
type InteractantType struct {
	Id          string          `xml:" id"`
	Label       string          `xml:" label,omitempty"`
	DbReference DbReferenceType `xml:" dbReference,omitempty"`
	IntactId    string          `xml:"intactId,attr"`
}
type IsoformType ¶
type IsoformType struct {
	Id       []string              `xml:" id"`
	Name     []Name                `xml:" name"`
	Sequence Anon6                 `xml:" sequence"`
	Text     []EvidencedStringType `xml:" text,omitempty"`
}
Describes isoforms in 'alternative products' annotations.
type KeywordType ¶
type KeywordType struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type Kinetics ¶
type Kinetics struct {
	KM   []EvidencedStringType `xml:" KM,omitempty"`
	Vmax []EvidencedStringType `xml:" Vmax,omitempty"`
	Text []EvidencedStringType `xml:" text,omitempty"`
}
type LocationType ¶
type LocationType struct {
	Begin    PositionType `xml:" begin,omitempty"`
	End      PositionType `xml:" end,omitempty"`
	Position PositionType `xml:" position,omitempty"`
	Sequence string       `xml:"sequence,attr,omitempty"`
}
Describes a sequence location as either a range with a begin and end or as a position. The 'sequence' attribute is only used when the location is not on the canonical sequence displayed in the current entry.
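The range-or-position distinction can be sketched with simplified local types. This is an illustration only: `position`, `location` and `describe` are hypothetical, the sample elements are made up, and pointer fields are used here (unlike the value fields in LocationType) purely so that a missing child element is easy to detect.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// position is a hypothetical, simplified stand-in for PositionType.
type position struct {
	Position uint64 `xml:"position,attr"`
}

// location is a hypothetical, simplified stand-in for LocationType.
// A nil pointer means the child element was absent.
type location struct {
	Begin    *position `xml:"begin"`
	End      *position `xml:"end"`
	Position *position `xml:"position"`
}

// describe (hypothetical) reports whether a location element is a single
// position or a begin/end range.
func describe(doc string) (string, error) {
	var loc location
	if err := xml.Unmarshal([]byte(doc), &loc); err != nil {
		return "", err
	}
	switch {
	case loc.Position != nil:
		return fmt.Sprintf("position %d", loc.Position.Position), nil
	case loc.Begin != nil && loc.End != nil:
		return fmt.Sprintf("range %d-%d", loc.Begin.Position, loc.End.Position), nil
	default:
		return "", errors.New("location has neither a position nor a begin/end pair")
	}
}

func main() {
	for _, doc := range []string{
		`<location><begin position="1"/><end position="128"/></location>`,
		`<location><position position="42"/></location>`,
	} {
		s, err := describe(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```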
type Name ¶
type Name struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type NameListType ¶
type NameListType struct {
	Consortium ConsortiumType `xml:" consortium,omitempty"`
	Person     PersonType     `xml:" person,omitempty"`
}
type OrganismNameType ¶
Describes different types of source organism names.
type OrganismType ¶
type OrganismType struct {
	Name        []OrganismNameType `xml:" name"`
	DbReference []DbReferenceType  `xml:" dbReference"`
	Lineage     Lineage            `xml:" lineage,omitempty"`
	Evidence    IntListType        `xml:"evidence,attr,omitempty"`
}
Describes the source organism.
type PersonType ¶
type PersonType struct {
	Name string `xml:"name,attr"`
}
type PhDependence ¶
type PhDependence struct {
	Text []EvidencedStringType `xml:" text"`
}
type PhysiologicalReactionType ¶
type PhysiologicalReactionType struct {
	DbReference DbReferenceType `xml:" dbReference"`
	Direction   Direction       `xml:"direction,attr"`
	Evidence    IntListType     `xml:"evidence,attr,omitempty"`
}
Describes a physiological reaction.
type Plasmid ¶
type Plasmid struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type PositionType ¶
type PositionType struct {
	Position uint64      `xml:"position,attr,omitempty"`
	Status   Status      `xml:"status,attr,omitempty"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
func (*PositionType) UnmarshalXML ¶
func (t *PositionType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
type PropertyType ¶
type ProteinExistenceType ¶
type ProteinExistenceType struct {
	Type Type `xml:"type,attr"`
}
Describes the evidence for the protein's existence. Equivalent to the flat file PE-line.
type ProteinType ¶
type ProteinType struct {
	RecommendedName RecommendedName       `xml:" recommendedName,omitempty"`
	AlternativeName []AlternativeName     `xml:" alternativeName,omitempty"`
	SubmittedName   []SubmittedName       `xml:" submittedName,omitempty"`
	AllergenName    EvidencedStringType   `xml:" allergenName,omitempty"`
	BiotechName     EvidencedStringType   `xml:" biotechName,omitempty"`
	CdAntigenName   []EvidencedStringType `xml:" cdAntigenName,omitempty"`
	InnName         []EvidencedStringType `xml:" innName,omitempty"`
	Domain          []Domain              `xml:" domain,omitempty"`
	Component       []Component           `xml:" component,omitempty"`
}
Describes the names for the protein and parts thereof. Equivalent to the flat file DE-line.
type ReactionType ¶
type ReactionType struct {
	Text        string            `xml:" text"`
	DbReference []DbReferenceType `xml:" dbReference"`
	Evidence    IntListType       `xml:"evidence,attr,omitempty"`
}
Describes a chemical reaction.
type RecommendedName ¶
type RecommendedName struct {
	FullName  EvidencedStringType   `xml:" fullName"`
	ShortName []EvidencedStringType `xml:" shortName,omitempty"`
	EcNumber  []EvidencedStringType `xml:" ecNumber,omitempty"`
}
type RedoxPotential ¶
type RedoxPotential struct {
	Text []EvidencedStringType `xml:" text"`
}
type ReferenceType ¶
type ReferenceType struct {
	Citation CitationType   `xml:" citation"`
	Scope    []string       `xml:" scope"`
	Source   SourceDataType `xml:" source,omitempty"`
	Evidence IntListType    `xml:"evidence,attr,omitempty"`
	Key      string         `xml:"key,attr"`
}
Describes a citation and a summary of its content. Equivalent to the flat file RN-, RP-, RC-, RX-, RG-, RA-, RT- and RL-lines.
type SequenceType ¶
type SequenceType struct {
	Value     string    `xml:",chardata"`
	Length    int       `xml:"length,attr"`
	Mass      int       `xml:"mass,attr"`
	Checksum  string    `xml:"checksum,attr"`
	Modified  time.Time `xml:"modified,attr"`
	Version   int       `xml:"version,attr"`
	Precursor bool      `xml:"precursor,attr,omitempty"`
	Fragment  Fragment  `xml:"fragment,attr,omitempty"`
}
func (*SequenceType) UnmarshalXML ¶
func (t *SequenceType) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
type SourceDataType ¶
type SourceDataType struct {
	Strain     Strain     `xml:" strain,omitempty"`
	Plasmid    Plasmid    `xml:" plasmid,omitempty"`
	Transposon Transposon `xml:" transposon,omitempty"`
	Tissue     Tissue     `xml:" tissue,omitempty"`
}
Describes the source of the sequence according to the citation. Equivalent to the flat file RC-line.
type SourceType ¶
type SourceType struct {
	DbReference DbReferenceType `xml:" dbReference,omitempty"`
}
Describes the source of the data using a database cross-reference (or a 'ref' attribute when the source cannot be found in a public data source, such as PubMed, and is cited only within the UniProtKB entry).
type StatusType ¶
type StatusType struct {
	Value  string `xml:",chardata"`
	Status Status `xml:"status,attr,omitempty"`
}
Indicates whether the name of a plasmid is known or unknown.
type Strain ¶
type Strain struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type SubcellularLocationType ¶
type SubcellularLocationType struct {
	Location    []EvidencedStringType `xml:" location"`
	Topology    []EvidencedStringType `xml:" topology,omitempty"`
	Orientation []EvidencedStringType `xml:" orientation,omitempty"`
}
Describes the subcellular location and optionally the topology and orientation of a molecule.
type SubmittedName ¶
type SubmittedName struct {
	FullName EvidencedStringType   `xml:" fullName"`
	EcNumber []EvidencedStringType `xml:" ecNumber,omitempty"`
}
type TemperatureDependence ¶
type TemperatureDependence struct {
	Text []EvidencedStringType `xml:" text"`
}
type Tissue ¶
type Tissue struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}
type Transposon ¶
type Transposon struct {
	Value    string      `xml:",chardata"`
	Evidence IntListType `xml:"evidence,attr,omitempty"`
}