Documentation ¶
Overview ¶
Package filter implements flexible ISIL attachments with expression trees[1], serialized as JSON. The top-level key is the label that is to be given to a record. Here, this label is an ISIL. Each ISIL can specify a tree of filters. Intermediate nodes can be "or", "and" or "not" filters; leaf nodes contain filters that are matched against records (like "collection", "source" or "issn").
The only method a filter needs to implement is Apply. If the filter takes configuration options, it needs to implement UnmarshalJSON as well. Each filter can define arbitrary options; for example, a HoldingsFilter can load KBART data from a single file or a list of URLs.
[1] https://en.wikipedia.org/wiki/Binary_expression_tree#Boolean_expressions
The simplest filter is one that says *yes* to all records:
{"DE-X": {"any": {}}}
On the command line:
$ span-tag -c '{"DE-X": {"any": {}}}' < input.ldj > output.ldj
A slightly more complex example: here, the ISIL "DE-14" is attached to a record if either of two alternatives holds, each consisting of a conjunction. The first says: IF "the record is from source id 55" AND "the record can be validated against one of the holding files given by their URLs", THEN "attach DE-14". The second says: IF "the record is from source id 49" AND "it validates against any one of the holding files given by their URLs" AND "the record belongs to any one of the given collections", THEN "attach DE-14".
{
  "DE-14": {
    "or": [
      {
        "and": [
          {"source": ["55"]},
          {
            "holdings": {
              "urls": [
                "http://www.jstor.org/kbart/collections/asii",
                "http://www.jstor.org/kbart/collections/as"
              ]
            }
          }
        ]
      },
      {
        "and": [
          {"source": ["49"]},
          {
            "holdings": {
              "urls": [
                "https://example.com/KBART_DE14",
                "https://example.com/KBART_FREEJOURNALS"
              ]
            }
          },
          {
            "collection": [
              "Turkish Family Physicans Association (CrossRef)",
              "Helminthological Society (CrossRef)",
              "International Association of Physical Chemists (IAPC) (CrossRef)",
              "The Society for Antibacterial and Antifungal Agents, Japan (CrossRef)",
              "Fundacao CECIERJ (CrossRef)"
            ]
          }
        ]
      }
    ]
  }
}
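To make the evaluation order concrete, here is a minimal, self-contained sketch of how such a tree could be evaluated. The Record type and the sourceFilter, collectionFilter, andFilter and orFilter types are simplified stand-ins invented for this illustration; they are not the package's implementation, which operates on finc.IntermediateSchema. The holdings checks are left out, so the rule built by de14 only approximates the configuration above.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	SourceID   string
	Collection string
}

// Filter has the same shape as the package's Filter interface,
// but works on the stand-in Record.
type Filter interface {
	Apply(r Record) bool
}

// sourceFilter matches records whose source id is in the list.
type sourceFilter []string

func (f sourceFilter) Apply(r Record) bool {
	for _, id := range f {
		if r.SourceID == id {
			return true
		}
	}
	return false
}

// collectionFilter matches records that belong to any listed collection.
type collectionFilter []string

func (f collectionFilter) Apply(r Record) bool {
	for _, name := range f {
		if r.Collection == name {
			return true
		}
	}
	return false
}

// andFilter matches only if all children match; it stops at the first miss.
type andFilter []Filter

func (f andFilter) Apply(r Record) bool {
	for _, child := range f {
		if !child.Apply(r) {
			return false
		}
	}
	return true
}

// orFilter matches if at least one child matches; it stops at the first hit.
type orFilter []Filter

func (f orFilter) Apply(r Record) bool {
	for _, child := range f {
		if child.Apply(r) {
			return true
		}
	}
	return false
}

// de14 approximates the DE-14 rule, without the holdings checks.
func de14() Filter {
	return orFilter{
		andFilter{sourceFilter{"55"}},
		andFilter{
			sourceFilter{"49"},
			collectionFilter{"Helminthological Society (CrossRef)"},
		},
	}
}

func main() {
	f := de14()
	fmt.Println(f.Apply(Record{SourceID: "55"}))                                                    // true
	fmt.Println(f.Apply(Record{SourceID: "49", Collection: "Helminthological Society (CrossRef)"})) // true
	fmt.Println(f.Apply(Record{SourceID: "49", Collection: "Other"}))                               // false
}
```

Because both composite filters stop as soon as the result is known, cheap leaf filters are best listed before expensive ones.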
It is relatively easy to add a new filter. Imagine we want to build a filter that only allows records that have the word "awesome" in their title.
We first define a new type:
type AwesomeFilter struct{}
We then implement the Apply method:
func (f *AwesomeFilter) Apply(is finc.IntermediateSchema) bool {
    return strings.Contains(strings.ToLower(is.ArticleTitle), "awesome")
}
That is all for the filter itself. To use it in the configuration file, we still need to register it. The "unmarshalFilter" function (filter.go) acts as a dispatcher:
func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) {
    switch name {
    // Add more filters here.
    case "any":
        return &AnyFilter{}, nil
    case "doi":
        ...
    // Register awesome filter. No configuration options, so no need to unmarshal.
    case "awesome":
        return &AwesomeFilter{}, nil
    ...
We can then use the filter in the JSON configuration:
{"DE-X": {"awesome": {}}}
Further reading: http://theory.stanford.edu/~sergei/papers/sigmod10-index.pdf
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AndFilter ¶ added in v0.1.75
type AndFilter struct {
// contains filtered or unexported fields
}
AndFilter returns true only if all filters return true.
func (*AndFilter) Apply ¶ added in v0.1.75
func (f *AndFilter) Apply(is finc.IntermediateSchema) bool
Apply returns false if any of the filters returns false. Short circuited.
func (*AndFilter) UnmarshalJSON ¶ added in v0.1.75
UnmarshalJSON turns a config fragment into an and filter.
type AnyFilter ¶ added in v0.1.75
type AnyFilter struct {
Any struct{} `json:"any"`
}
AnyFilter validates any record.
type CollectionFilter ¶
type CollectionFilter struct {
// contains filtered or unexported fields
}
CollectionFilter returns true, if the record belongs to any one of the collections.
func (*CollectionFilter) Apply ¶
func (f *CollectionFilter) Apply(is finc.IntermediateSchema) bool
Apply applies the filter.
func (*CollectionFilter) UnmarshalJSON ¶ added in v0.1.75
func (f *CollectionFilter) UnmarshalJSON(p []byte) error
UnmarshalJSON turns a config fragment into a collection filter.
type DOIFilter ¶
type DOIFilter struct {
// contains filtered or unexported fields
}
DOIFilter allows records with a given DOI. Can be used in conjunction with "not" to create blacklists.
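As an illustration of the blacklist idea, the following sketch wraps a DOI set in an inverting filter. All names here (Record, doiFilter, notFilter, newBlacklist) are hypothetical and exist only in this example; the package's own DOIFilter and NotFilter are configured via JSON and keep their fields unexported.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	DOI string
}

// Filter has the same shape as the package's Filter interface.
type Filter interface {
	Apply(r Record) bool
}

// doiFilter matches records whose DOI is in the set.
type doiFilter map[string]struct{}

func (f doiFilter) Apply(r Record) bool {
	_, ok := f[r.DOI]
	return ok
}

// notFilter inverts the wrapped filter.
type notFilter struct {
	inner Filter
}

func (f notFilter) Apply(r Record) bool { return !f.inner.Apply(r) }

// newBlacklist builds a filter that rejects exactly the given DOIs.
func newBlacklist(dois ...string) Filter {
	set := make(doiFilter, len(dois))
	for _, doi := range dois {
		set[doi] = struct{}{}
	}
	return notFilter{inner: set}
}

func main() {
	bl := newBlacklist("10.1000/blocked")
	fmt.Println(bl.Apply(Record{DOI: "10.1000/blocked"})) // false
	fmt.Println(bl.Apply(Record{DOI: "10.1000/other"}))   // true
}
```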
func (*DOIFilter) Apply ¶
func (f *DOIFilter) Apply(is finc.IntermediateSchema) bool
Apply applies the filter.
func (*DOIFilter) UnmarshalJSON ¶ added in v0.1.75
UnmarshalJSON turns a config fragment into a filter.
type Filter ¶
type Filter interface {
Apply(finc.IntermediateSchema) bool
}
Filter returns go or no for a given record.
type HoldingsFilter ¶ added in v0.1.75
type HoldingsFilter struct {
// contains filtered or unexported fields
}
HoldingsFilter uses the new licensing package.
func (*HoldingsFilter) Apply ¶ added in v0.1.75
func (f *HoldingsFilter) Apply(is finc.IntermediateSchema) bool
Apply returns true if there is a valid holding for a given record. This will take multiple attributes like date, volume, issue and embargo into account. This function is very specific: it works only with the intermediate format and it uses specific information from that format to decide on attachment.
func (*HoldingsFilter) UnmarshalJSON ¶ added in v0.1.75
func (f *HoldingsFilter) UnmarshalJSON(p []byte) error
UnmarshalJSON deserializes this filter.
type ISSNFilter ¶ added in v0.1.75
type ISSNFilter struct {
// contains filtered or unexported fields
}
ISSNFilter allows records with a certain ISSN.
func (*ISSNFilter) Apply ¶ added in v0.1.75
func (f *ISSNFilter) Apply(is finc.IntermediateSchema) bool
Apply applies ISSN filter on intermediate schema, no distinction between ISSN and EISSN.
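A minimal sketch of what "no distinction" can look like in code, assuming a stand-in Record type with separate ISSN and EISSN lists (the field names are an assumption of this example): both lists are checked against the same set.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	ISSN  []string
	EISSN []string
}

// issnFilter holds the allowed ISSNs as a set.
type issnFilter map[string]struct{}

// Apply reports whether any print or electronic ISSN of the record
// is contained in the set.
func (f issnFilter) Apply(r Record) bool {
	for _, list := range [][]string{r.ISSN, r.EISSN} {
		for _, issn := range list {
			if _, ok := f[issn]; ok {
				return true
			}
		}
	}
	return false
}

func main() {
	f := issnFilter{"1234-5678": {}}
	fmt.Println(f.Apply(Record{EISSN: []string{"1234-5678"}})) // true
	fmt.Println(f.Apply(Record{ISSN: []string{"0000-0000"}}))  // false
}
```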
func (*ISSNFilter) UnmarshalJSON ¶ added in v0.1.75
func (f *ISSNFilter) UnmarshalJSON(p []byte) error
UnmarshalJSON turns a config fragment into a filter.
type NotFilter ¶ added in v0.1.75
type NotFilter struct {
// contains filtered or unexported fields
}
NotFilter inverts another filter.
func (*NotFilter) Apply ¶ added in v0.1.75
func (f *NotFilter) Apply(is finc.IntermediateSchema) bool
Apply inverts another filter.
func (*NotFilter) UnmarshalJSON ¶ added in v0.1.75
UnmarshalJSON turns a config fragment into a not filter.
type OrFilter ¶ added in v0.1.75
type OrFilter struct {
// contains filtered or unexported fields
}
OrFilter returns true if at least one filter matches.
func (*OrFilter) Apply ¶ added in v0.1.75
func (f *OrFilter) Apply(is finc.IntermediateSchema) bool
Apply returns true if any of the filters returns true. Short circuited.
func (*OrFilter) UnmarshalJSON ¶ added in v0.1.75
UnmarshalJSON turns a config fragment into an or filter.
type PackageFilter ¶ added in v0.1.59
type PackageFilter struct {
// contains filtered or unexported fields
}
PackageFilter allows all records that belong to one of the given package names.
func (*PackageFilter) Apply ¶ added in v0.1.59
func (f *PackageFilter) Apply(is finc.IntermediateSchema) bool
Apply filters packages.
func (*PackageFilter) UnmarshalJSON ¶ added in v0.1.75
func (f *PackageFilter) UnmarshalJSON(p []byte) error
UnmarshalJSON turns a config fragment into a filter.
type SourceFilter ¶
type SourceFilter struct {
// contains filtered or unexported fields
}
SourceFilter allows all records with the given source id or ids.
func (*SourceFilter) Apply ¶
func (f *SourceFilter) Apply(is finc.IntermediateSchema) bool
Apply applies the filter.
func (*SourceFilter) UnmarshalJSON ¶ added in v0.1.75
func (f *SourceFilter) UnmarshalJSON(p []byte) error
UnmarshalJSON turns a config fragment into a filter.
type Tagger ¶ added in v0.1.75
type Tagger struct {
// contains filtered or unexported fields
}
Tagger takes a list of tags (ISILs) and annotates an intermediate schema record according to a number of filters, defined per label. The tagger can be loaded directly from JSON.
func (*Tagger) Tag ¶ added in v0.1.75
func (t *Tagger) Tag(is finc.IntermediateSchema) finc.IntermediateSchema
Tag takes an intermediate schema record and returns a labeled version of that record.
func (*Tagger) UnmarshalJSON ¶ added in v0.1.75
UnmarshalJSON unmarshals a complete filter config from serialized JSON.
type Tree ¶ added in v0.1.130
type Tree struct {
// contains filtered or unexported fields
}
Tree allows polymorphic filters.
func (*Tree) Apply ¶ added in v0.1.130
func (f *Tree) Apply(is finc.IntermediateSchema) bool
Apply applies the root filter.
func (*Tree) UnmarshalJSON ¶ added in v0.1.130
UnmarshalJSON gathers the top level filter name and unmarshals the associated filter.
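One common way to gather a top-level name in Go is to decode the fragment into a map of json.RawMessage values and dispatch on the single key. The sketch below shows that technique with hypothetical names (Record, parseTree and three small filters); it is not the package's implementation and supports only "any", "source" and "not".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	SourceID string
}

// Filter has the same shape as the package's Filter interface.
type Filter interface {
	Apply(r Record) bool
}

type anyFilter struct{}

func (anyFilter) Apply(Record) bool { return true }

type sourceFilter []string

func (f sourceFilter) Apply(r Record) bool {
	for _, id := range f {
		if r.SourceID == id {
			return true
		}
	}
	return false
}

type notFilter struct {
	inner Filter
}

func (f notFilter) Apply(r Record) bool { return !f.inner.Apply(r) }

// parseTree expects a JSON object with exactly one key, the filter name,
// and dispatches on it. The "not" case recurses into the nested fragment.
func parseTree(p []byte) (Filter, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(p, &m); err != nil {
		return nil, err
	}
	if len(m) != 1 {
		return nil, errors.New("expected exactly one top-level filter name")
	}
	for name, raw := range m {
		switch name {
		case "any":
			return anyFilter{}, nil
		case "source":
			var ids []string
			if err := json.Unmarshal(raw, &ids); err != nil {
				return nil, err
			}
			return sourceFilter(ids), nil
		case "not":
			inner, err := parseTree(raw)
			if err != nil {
				return nil, err
			}
			return notFilter{inner}, nil
		default:
			return nil, fmt.Errorf("unknown filter: %s", name)
		}
	}
	// Not reached: m holds exactly one key and every case returns.
	return nil, errors.New("no filter found")
}

func main() {
	f, err := parseTree([]byte(`{"not": {"source": ["49"]}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Apply(Record{SourceID: "55"})) // true
	fmt.Println(f.Apply(Record{SourceID: "49"})) // false
}
```

Deferring the decoding of the value with json.RawMessage means each filter can define its own option format, which fits the description of filters with arbitrary options in the overview.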