Documentation ¶
Overview ¶
Package elowl builds a bridge between OWL ontology files and the internal representation of EL++. It contains parsers for OWL files and builders that generate EL++ formulae from a set of RDF / OWL triples.
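The following sketch shows the intended flow from a Turtle file to an EL++ TBox. It assumes that *ASTParser provides the Parse method of the TurtleParser interface, that *DefaultOWLBuilder satisfies both OWLBuilder and TripleQueryHandler, and that the import path, file name, base IRI and worker count are placeholders:

package main

import (
	"log"
	"os"

	"github.com/user/elowl" // hypothetical import path
)

func main() {
	f, err := os.Open("ontology.ttl")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Wire the default pipeline: regex tokenizer -> AST builder -> concurrent AST-to-triples converter.
	parser := elowl.NewASTParser(
		elowl.NewRegexTokenizer(),
		elowl.NewDefaultASTBuilder(),
		elowl.NewDefaultASTConverter(4), // 4 statement workers, an arbitrary choice
	)

	// The builder collects all triples produced by the parser.
	builder := elowl.NewDefaultOWLBuilder()
	if err := parser.Parse(f, "http://example.org/", builder); err != nil {
		log.Fatal(err)
	}

	// Convert the collected triples into the EL++ TBox representation.
	converter := elowl.NewTBoxConverter()
	if err := converter.ConvertToTBox(builder); err != nil {
		log.Fatal(err)
	}
	log.Println("classes:", len(converter.Classes))
}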
Parsing Input ¶
Currently there is a parser for Turtle files (see https://www.w3.org/TR/turtle/). However, the whole grammar is not covered yet. For example, language tags are not supported, so the following document cannot be parsed:
<#spiderman> rel:enemyOf <#green-goblin> ; a foaf:Person ; foaf:name "Spiderman", "Человек-паук"@ru .
There is also no support for escape sequences such as \n in strings. Only string data is supported at the moment, but the other datatypes could easily be added.
There are several possible approaches to parsing these files, so parsing is defined in terms of interfaces (the parser interface requires an io.Reader). The default implementation uses a hand-written tokenizer, translates the sequence of tokens into an abstract syntax tree (AST) and from there creates a set of RDF triples. Tokenization, the transformation to an AST and the transformation from an AST to a set of triples are each defined in their own interface, so it is easy to plug new approaches into the current model.
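As an illustration of that pluggability, a different tokenization strategy only needs to implement the TurtleTokenizer interface and can then be handed to NewASTParser instead of the regex-based default. A minimal, purely illustrative stub (assuming the io and elowl imports; not part of the package):

type noopTokenizer struct {
	r io.Reader
}

func (t *noopTokenizer) Init(r io.Reader) error {
	t.r = r
	return nil
}

func (t *noopTokenizer) NextToken() (*elowl.TokenMatch, error) {
	// A real implementation would read from t.r and produce tokens;
	// this stub immediately reports the end of the input.
	return &elowl.TokenMatch{Token: elowl.EOF, Seq: ""}, nil
}

// parser := elowl.NewASTParser(&noopTokenizer{}, elowl.NewDefaultASTBuilder(), elowl.NewDefaultASTConverter(4))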
However, there are some things I'm not very happy with at the moment (though the performance seems ok):
* The tokenizer reads the whole file into memory before producing tokens. Turtle files lend themselves to line-based processing because a statement usually ends at the end of a line, so it should be fine to read the input line by line. However, there are multiline strings (''' and """) that may span more than one line; it should not be too hard to handle this special case though. The tokenizer simply stores a list of regexes, and the first one that matches yields the next token. I think it would be a good idea to create a matcher interface that reads from the given io.Reader (see the sketch after this list). Most matchers would be simple regular expressions, but for multiline strings a combination of regexes and other methods could be used. The parser itself should then provide a method that returns the next line of input, taking care to either return the rest of a partially consumed line or read the next line from the input.
* No concurrency. The tokenizer and AST builder don't make use of goroutines right now, which is not very nice. The AST-to-triples converter, however, already processes several statements concurrently.
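A possible shape for the matcher interface mentioned above could look like the following. This is only a design sketch under the assumptions described there (it uses the standard regexp package); nothing like it exists in the package yet:

// Matcher tries to recognise a token at the start of the current line.
// If it needs more input (e.g. for multiline strings) it may request
// further lines via nextLine.
type Matcher interface {
	Match(line string, nextLine func() (string, error)) (match *TokenMatch, rest string, ok bool, err error)
}

// regexMatcher covers the common case of a single regular expression.
type regexMatcher struct {
	token TurtleToken
	re    *regexp.Regexp
}

func (m *regexMatcher) Match(line string, nextLine func() (string, error)) (*TokenMatch, string, bool, error) {
	loc := m.re.FindStringIndex(line)
	if loc == nil || loc[0] != 0 {
		// No match at the current position; leave the line untouched.
		return nil, line, false, nil
	}
	// Return the matched token and the unconsumed rest of the line.
	return &TokenMatch{Token: m.token, Seq: line[:loc[1]]}, line[loc[1]:], true, nil
}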
Index ¶
- Constants
- Variables
- func EqualsRDFObject(object, other RDFObjectType) bool
- func GetObjectString(o RDFObjectType) (string, error)
- func IsBlankNode(s string) bool
- type ASTArc
- type ASTBuilder
- type ASTConverter
- type ASTNode
- type ASTParser
- type ASTTypeInfo
- type DefaultASTBuilder
- type DefaultASTConverter
- type DefaultOWLBuilder
- type OWLBuilder
- type OneRegexTokenizer
- type OntologyLib
- type RDFObjectType
- type RDFTriple
- type RegexTokenizer
- type SynchTokenizer
- type TBoxConverter
- type TokenMatch
- type TripleMap
- type TripleQueryHandler
- type TurtleAST
- type TurtleNonterminal
- type TurtleParser
- type TurtleToken
- type TurtleTokenizer
Constants ¶
const OntologyDir = ".ontologies"
const RDFClass = "http://www.w3.org/2002/07/owl#Class"
const RDFComment = "http://www.w3.org/2000/01/rdf-schema#comment"
const RDFDatatypeProperty = "http://www.w3.org/2002/07/owl/#DatatypeProperty"
const RDFDomain = "http://www.w3.org/2000/01/rdf-schema#domain"
const RDFFirst = "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
const RDFInverseOf = "http://www.w3.org/2002/07/owl#inverseOf"
const RDFLabel = "http://www.w3.org/2000/01/rdf-schema#label"
const RDFList = "http://www.w3.org/1999/02/22-rdf-syntax-ns#List"
const RDFNil = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"
const RDFObjectProperty = "http://www.w3.org/2002/07/owl#ObjectProperty"
const RDFOnProperty = "http://www.w3.org/2002/07/owl#onProperty"
const RDFRange = "http://www.w3.org/2000/01/rdf-schema#range"
const RDFRest = "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
const RDFRestriction = "http://www.w3.org/2002/07/owl#Restriction"
const RDFSomeValuesFrom = "http://www.w3.org/2002/07/owl#someValuesFrom"
const RDFSubPropertyOf = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
const RDFSubclass = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
const RDFTransitiveProperty = "http://www.w3.org/2002/07/owl/#TransitiveProperty"
const RDFType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
const RDFVersionInfo = "http://www.w3.org/2002/07/owl#versionInfo"
Variables ¶
var ErrNoToken = errors.New("Expected token, but none was found.")
A special error used to signal that we expected to read a token but no further token was found. This way you can check whether an error occurred because a rule didn't match or simply because the stream was empty.
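A typical check might look like this, where tokenizer is any TurtleTokenizer from this package (sketch only; plain == works as well since ErrNoToken is a sentinel error):

match, err := tokenizer.NextToken()
if errors.Is(err, elowl.ErrNoToken) {
	// The stream is simply empty; no rule failed.
} else if err != nil {
	// A real error, for example an I/O problem or a rule that didn't match.
	log.Fatal(err)
}
_ = match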
Functions ¶
func EqualsRDFObject ¶
func EqualsRDFObject(object, other RDFObjectType) bool
func GetObjectString ¶
func GetObjectString(o RDFObjectType) (string, error)
func IsBlankNode ¶
func IsBlankNode(s string) bool
Types ¶
type ASTArc ¶
type ASTArc struct {
	ArcType *ASTTypeInfo
	Dest    int
}
func NewASTArc ¶
func NewASTArc(typeInfo *ASTTypeInfo, dest int) *ASTArc
type ASTBuilder ¶
type ASTBuilder interface {
BuildAST(t TurtleTokenizer, r io.Reader) (*TurtleAST, error)
}
type ASTConverter ¶
type ASTConverter interface {
Convert(ast *TurtleAST, defaultBase string, builder OWLBuilder) error
}
type ASTNode ¶
func NewASTNode ¶
type ASTParser ¶
type ASTParser struct {
// contains filtered or unexported fields
}
func DefaultTurtleParser ¶
func NewASTParser ¶
func NewASTParser(t TurtleTokenizer, astBuilder ASTBuilder, converter ASTConverter) *ASTParser
type ASTTypeInfo ¶
type ASTTypeInfo struct {
// contains filtered or unexported fields
}
func NewTypeInfoFromNameFromToken ¶
func NewTypeInfoFromNameFromToken(t TurtleToken) *ASTTypeInfo
func NewTypeInfoFromNonterm ¶
func NewTypeInfoFromNonterm(n TurtleNonterminal) *ASTTypeInfo
func (*ASTTypeInfo) GetNonterm ¶
func (info *ASTTypeInfo) GetNonterm() (TurtleNonterminal, bool)
func (*ASTTypeInfo) GetToken ¶
func (info *ASTTypeInfo) GetToken() (TurtleToken, bool)
func (*ASTTypeInfo) IsNonterm ¶
func (info *ASTTypeInfo) IsNonterm(n TurtleNonterminal) bool
func (*ASTTypeInfo) IsToken ¶
func (info *ASTTypeInfo) IsToken(t TurtleToken) bool
func (*ASTTypeInfo) String ¶
func (info *ASTTypeInfo) String() string
type DefaultASTBuilder ¶
type DefaultASTBuilder struct {
// contains filtered or unexported fields
}
func NewDefaultASTBuilder ¶
func NewDefaultASTBuilder() *DefaultASTBuilder
func (*DefaultASTBuilder) Bla ¶
func (builder *DefaultASTBuilder) Bla(r io.Reader)
func (*DefaultASTBuilder) BuildAST ¶
func (builder *DefaultASTBuilder) BuildAST(t TurtleTokenizer, r io.Reader) (*TurtleAST, error)
type DefaultASTConverter ¶
type DefaultASTConverter struct {
// contains filtered or unexported fields
}
func NewDefaultASTConverter ¶
func NewDefaultASTConverter(numStatementWorkers int) *DefaultASTConverter
func (*DefaultASTConverter) Convert ¶
func (converter *DefaultASTConverter) Convert(ast *TurtleAST, defaultBase string, builder OWLBuilder) error
type DefaultOWLBuilder ¶
func NewDefaultOWLBuilder ¶
func NewDefaultOWLBuilder() *DefaultOWLBuilder
func (*DefaultOWLBuilder) AnswerQuery ¶
func (handler *DefaultOWLBuilder) AnswerQuery(subject, predicate *string, object RDFObjectType, f func(t *RDFTriple) error) error
func (*DefaultOWLBuilder) GetBlankNode ¶
func (handler *DefaultOWLBuilder) GetBlankNode() string
func (*DefaultOWLBuilder) HandleTriple ¶
func (handler *DefaultOWLBuilder) HandleTriple(t *RDFTriple) error
type OWLBuilder ¶
type OneRegexTokenizer ¶
type OneRegexTokenizer struct {
// contains filtered or unexported fields
}
func NewOneRegexTokenizer ¶
func NewOneRegexTokenizer() *OneRegexTokenizer
func (*OneRegexTokenizer) Match ¶
func (t *OneRegexTokenizer) Match(str string)
func (*OneRegexTokenizer) NextToken ¶
func (t *OneRegexTokenizer) NextToken() (*TokenMatch, error)
type OntologyLib ¶
func NewOntologyLib ¶
func NewOntologyLib(db *sql.DB, baseDir string) *OntologyLib
func (*OntologyLib) InitDatabase ¶
func (lib *OntologyLib) InitDatabase(driver string) error
func (*OntologyLib) InitLocal ¶
func (lib *OntologyLib) InitLocal() error
func (*OntologyLib) RetrieveFromUrl ¶
func (lib *OntologyLib) RetrieveFromUrl(url *url.URL) ([]byte, error)
type RDFObjectType ¶
type RDFObjectType interface{}
type RDFTriple ¶
type RDFTriple struct {
Subject, Predicate string
Object RDFObjectType
}
func NewRDFTriple ¶
func NewRDFTriple(subject, predicate string, object RDFObjectType) *RDFTriple
type RegexTokenizer ¶
type RegexTokenizer struct {
// contains filtered or unexported fields
}
Tokenizes a reader by trying a list of regexes; the first one that matches yields the next token. Implements the TurtleTokenizer interface.
func NewRegexTokenizer ¶
func NewRegexTokenizer() *RegexTokenizer
func (*RegexTokenizer) NextToken ¶
func (t *RegexTokenizer) NextToken() (*TokenMatch, error)
type SynchTokenizer ¶
type SynchTokenizer struct {
*RegexTokenizer
}
func (*SynchTokenizer) NextToken ¶
func (t *SynchTokenizer) NextToken() (*TokenMatch, error)
type TBoxConverter ¶
type TBoxConverter struct {
Classes, BlankClasses []string
ClassID map[string]int
Relations []*elconc.BinaryObjectRelation
RelationNames []string
RelationID map[string]int
SubProperties []*elconc.SubProp
}
func NewTBoxConverter ¶
func NewTBoxConverter() *TBoxConverter
func (*TBoxConverter) ConvertToTBox ¶
func (converter *TBoxConverter) ConvertToTBox(handler TripleQueryHandler) error
type TokenMatch ¶
type TokenMatch struct {
	Token TurtleToken
	Seq   string
}
A match defines the token type of the match and the string sequence that was matched.
func (*TokenMatch) CleanUp ¶
func (match *TokenMatch) CleanUp()
type TripleMap ¶
type TripleMap map[string]map[string][]RDFObjectType
func NewTripleMap ¶
func NewTripleMap() TripleMap
func (TripleMap) AddElement ¶
func (tm TripleMap) AddElement(key1, key2 string, value RDFObjectType)
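A small usage sketch (the IRIs are made up for illustration):

tm := elowl.NewTripleMap()
tm.AddElement(
	"http://example.org/Spiderman",   // subject (illustrative)
	"http://xmlns.com/foaf/0.1/name", // predicate (illustrative)
	"Spiderman",                      // object; any RDFObjectType value
)
// Values are grouped by subject and then predicate:
// tm["http://example.org/Spiderman"]["http://xmlns.com/foaf/0.1/name"] now contains "Spiderman".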
type TripleQueryHandler ¶
type TripleQueryHandler interface {
	AnswerQuery(subject, predicate *string, object RDFObjectType, f func(t *RDFTriple) error) error
}
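For example, *DefaultOWLBuilder implements this interface, so triples it has handled can be queried afterwards. The sketch below assumes builder is a populated *DefaultOWLBuilder and that a nil subject or object acts as a wildcard; the wildcard semantics are an assumption, not documented behaviour:

pred := elowl.RDFSubclass
err := builder.AnswerQuery(nil, &pred, nil, func(t *elowl.RDFTriple) error {
	// Called once for every matching triple.
	fmt.Println(t.Subject, "is a subclass of", t.Object)
	return nil
})
if err != nil {
	log.Fatal(err)
}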
type TurtleAST ¶
type TurtleAST struct {
	Nodes  []*ASTNode
	Tokens []*TokenMatch
}
func NewTurtleAST ¶
func NewTurtleAST(tokens []*TokenMatch) *TurtleAST
type TurtleNonterminal ¶
type TurtleNonterminal int
A type for the nonterminals as defined in the Turtle grammar. Nearly the same as in the formal specification, though we don't support everything just yet.
const (
	TurtleDoc TurtleNonterminal = iota
	Statement
	NonterminalPrefixID
	NonterminalBase
	Directive
	Subject
	IRI
	BlankNode
	Collection
	PrefixedName
	Triples
	Object
	BlankNodePropertyList
	Literal
	String
	ObjectList
	PredicateObjectList
	Verb
	Predicate
	RDFLiteral
)
func (TurtleNonterminal) String ¶
func (n TurtleNonterminal) String() string
Human readable version.
type TurtleParser ¶
type TurtleParser interface {
Parse(r io.Reader, defaultBase string, builder OWLBuilder) error
}
An interface for everything that parses triples from a given reader and adds those triples to the builder for further processing.
type TurtleToken ¶
type TurtleToken int
Defines a type for Turtle grammar tokens. The list is nearly as defined in the Turtle documentation. Some small changes have been made; for example, @base and @prefix are defined as tokens.
const (
	ErrorToken TurtleToken = iota
	EOF
	WS
	Comment
	IRIRef
	PNameNS
	BlankNodeLabel
	StringLiteralQuote
	Annon
	PNameLN
	Point
	OpenBrace
	CloseBrace
	OpenBracket
	CloseBracket
	OpenCurlyBrace
	CloseCurlyBrace
	Comma
	Semicolon
	Averb
	PrefixDirective
	BaseDirective
)
func TokenFromRegexCapture ¶
func TokenFromRegexCapture(str string) TurtleToken
func (TurtleToken) RegexCapture ¶
func (t TurtleToken) RegexCapture() string
func (TurtleToken) String ¶
func (t TurtleToken) String() string
A human readable version of the token name.
type TurtleTokenizer ¶
type TurtleTokenizer interface {
	Init(r io.Reader) error
	NextToken() (*TokenMatch, error)
}
An interface for types that produce a sequence of tokens. Before you use a tokenizer you *must* call its Init method. A tokenizer should also be able to handle subsequent calls with different readers, i.e. you can reuse it to tokenize another file; however, before calling NextToken you must always call Init first. After that, each call to NextToken returns the next token. Note one important thing about the NextToken method: an error != nil should only be returned if there was an error while reading the input. If a syntax error occurred, return error = nil together with the token ErrorToken.
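A consumption loop following these rules might look like the sketch below. It assumes that *RegexTokenizer satisfies TurtleTokenizer (as its doc comment states) and that r is an io.Reader with Turtle content:

var tok elowl.TurtleTokenizer = elowl.NewRegexTokenizer()
if err := tok.Init(r); err != nil {
	log.Fatal(err)
}
for {
	m, err := tok.NextToken()
	if err != nil {
		// Read error, for example an underlying I/O failure.
		log.Fatal(err)
	}
	if m.Token == elowl.ErrorToken {
		// Syntax error: err is nil, the token itself signals the problem.
		log.Fatal("syntax error near ", m.Seq)
	}
	if m.Token == elowl.EOF {
		break
	}
	// process m ...
}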