filter

package
v0.1.157

Warning: This package is not in the latest version of its module.
Published: Jun 12, 2017 License: GPL-3.0 Imports: 11 Imported by: 0

Documentation

Overview

Package filter implements flexible ISIL attachments with expression trees[1], serialized as JSON. The top-level key is the label to be given to a record; here, this label is an ISIL. Each ISIL can specify a tree of filters. Intermediate nodes can be "or", "and" or "not" filters; leaf nodes contain filters that are matched against records (like "collection", "source" or "issn").

The only method a filter needs to implement is Apply. If the filter takes configuration options, it needs to implement UnmarshalJSON as well. Each filter can define arbitrary options; for example, a HoldingsFilter can load KBART data from a single file or from a list of URLs.

[1] https://en.wikipedia.org/wiki/Binary_expression_tree#Boolean_expressions
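A minimal sketch of that contract, assuming a hypothetical Record type (with only a SourceID field) in place of finc.IntermediateSchema and a hypothetical lower-case sourceFilter; it is not the package's own SourceFilter. The filter reads its options, a JSON list of source ids, in UnmarshalJSON, and Apply only consults that parsed state.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a hypothetical stand-in for finc.IntermediateSchema,
// reduced to the one field this sketch needs.
type Record struct {
	SourceID string
}

// Filter mirrors the package's interface, with Record in place of
// finc.IntermediateSchema.
type Filter interface {
	Apply(Record) bool
}

// sourceFilter is a hypothetical leaf filter with options: it accepts
// records whose source id is one of the configured ids.
type sourceFilter struct {
	ids map[string]bool
}

// Compile-time check that *sourceFilter satisfies Filter.
var _ Filter = (*sourceFilter)(nil)

// UnmarshalJSON reads the options from a fragment like ["55", "49"].
func (f *sourceFilter) UnmarshalJSON(p []byte) error {
	var ids []string
	if err := json.Unmarshal(p, &ids); err != nil {
		return err
	}
	f.ids = make(map[string]bool, len(ids))
	for _, id := range ids {
		f.ids[id] = true
	}
	return nil
}

// Apply reports whether the record's source id was configured.
func (f *sourceFilter) Apply(r Record) bool {
	return f.ids[r.SourceID]
}

// parseSourceFilter (hypothetical helper) decodes a config fragment;
// json.Unmarshal calls the UnmarshalJSON method above.
func parseSourceFilter(fragment string) (*sourceFilter, error) {
	f := &sourceFilter{}
	if err := json.Unmarshal([]byte(fragment), f); err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	f, err := parseSourceFilter(`["55", "49"]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Apply(Record{SourceID: "55"}), f.Apply(Record{SourceID: "28"}))
	// prints: true false
}
```

Doing the parsing once in UnmarshalJSON keeps Apply cheap, which matters when a filter runs against every record of a large input.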

The simplest filter is one that says *yes* to all records:

{"DE-X": {"any": {}}}

On the command line:

$ span-tag -c '{"DE-X": {"any": {}}}' < input.ldj > output.ldj

Another, slightly more complex example: here, the ISIL "DE-14" is attached to a record if the following conditions are met. There are two alternatives, each consisting of a conjunction. The first says: IF "the record is from source id 55" AND IF "the record can be validated against one of the holding files given by their URLs", THEN "attach DE-14". The second says: IF "the record is from source id 49" AND "it validates against any one of the holding files given by their URLs" AND "the record belongs to any one of the given collections", THEN "attach DE-14".

{
  "DE-14": {
    "or": [
      {
        "and": [
          {
            "source": [
              "55"
            ]
          },
          {
            "holdings": {
              "urls": [
                "http://www.jstor.org/kbart/collections/asii",
                "http://www.jstor.org/kbart/collections/as"
              ]
            }
          }
        ]
      },
      {
        "and": [
          {
            "source": [
              "49"
            ]
          },
          {
            "holdings": {
              "urls": [
                "https://example.com/KBART_DE14",
                "https://example.com/KBART_FREEJOURNALS"
              ]
            }
          },
          {
            "collection": [
              "Turkish Family Physicans Association (CrossRef)",
              "Helminthological Society (CrossRef)",
              "International Association of Physical Chemists (IAPC) (CrossRef)",
              "The Society for Antibacterial and Antifungal Agents, Japan (CrossRef)",
              "Fundacao CECIERJ (CrossRef)"
            ]
          }
        ]
      }
    ]
  }
}
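The same tree can be mirrored directly in code. This is a sketch under a few assumptions: a hypothetical Record type with SourceID and MegaCollections fields stands in for finc.IntermediateSchema, the And/Or/FilterFunc types are illustrative rather than the package's AndFilter and OrFilter, and the holdings check is a stub that always passes, since KBART matching is out of scope here. And and Or short-circuit, as documented for AndFilter and OrFilter.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	SourceID        string
	MegaCollections []string
}

// Filter mirrors the package's interface, with Record in place of
// finc.IntermediateSchema.
type Filter interface {
	Apply(Record) bool
}

// FilterFunc adapts a plain function to the Filter interface.
type FilterFunc func(Record) bool

func (fn FilterFunc) Apply(r Record) bool { return fn(r) }

// And is true only if every child is true; it stops at the first false.
type And []Filter

func (a And) Apply(r Record) bool {
	for _, f := range a {
		if !f.Apply(r) {
			return false
		}
	}
	return true
}

// Or is true if any child is true; it stops at the first true.
type Or []Filter

func (o Or) Apply(r Record) bool {
	for _, f := range o {
		if f.Apply(r) {
			return true
		}
	}
	return false
}

// source matches records with the given source id.
func source(id string) Filter {
	return FilterFunc(func(r Record) bool { return r.SourceID == id })
}

// collection matches records that belong to any of the given collections.
func collection(names ...string) Filter {
	return FilterFunc(func(r Record) bool {
		for _, c := range r.MegaCollections {
			for _, n := range names {
				if c == n {
					return true
				}
			}
		}
		return false
	})
}

// holdings is a stub that always passes; this sketch does not model KBART.
var holdings = FilterFunc(func(Record) bool { return true })

// de14 mirrors the shape of the "DE-14" configuration above.
var de14 Filter = Or{
	And{source("55"), holdings},
	And{source("49"), holdings, collection(
		"Helminthological Society (CrossRef)",
		"Fundacao CECIERJ (CrossRef)",
	)},
}

func main() {
	fmt.Println(de14.Apply(Record{SourceID: "55"})) // prints: true
	fmt.Println(de14.Apply(Record{SourceID: "49"})) // prints: false
	fmt.Println(de14.Apply(Record{
		SourceID:        "49",
		MegaCollections: []string{"Fundacao CECIERJ (CrossRef)"},
	})) // prints: true
}
```

Because evaluation short-circuits, cheap checks such as the source id are best listed before expensive ones such as holdings lookups.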

It is relatively easy to add a new filter. Imagine we want to build a filter that only allows records that have the word "awesome" in their title.

We first define a new type:

type AwesomeFilter struct{}

We then implement the Apply method:

func (f *AwesomeFilter) Apply(is finc.IntermediateSchema) bool {
    return strings.Contains(strings.ToLower(is.ArticleTitle), "awesome")
}

That is all for the filter itself. We still need to register it, so that we can use it in the configuration file. The unmarshalFilter function (in filter.go) acts as a dispatcher:

func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) {
    switch name {
    // Add more filters here.
    case "any":
        return &AnyFilter{}, nil
    case "doi":
        ...

    // Register awesome filter. No configuration options, so no need to unmarshal.
    case "awesome":
        return &AwesomeFilter{}, nil

    ...

We can then use the filter in the JSON configuration:

{"DE-X": {"awesome": {}}}
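A self-contained sketch of how the inner part of that configuration, {"awesome": {}}, could reach the dispatcher. It assumes a hypothetical Record type in place of finc.IntermediateSchema; parseFilter is a hypothetical helper that loosely imitates what the Tree type is documented to do (gather the single top-level filter name and unmarshal the associated filter). Unknown names and objects with more than one key are rejected.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	ArticleTitle string
}

// Filter mirrors the package's interface, with Record in place of
// finc.IntermediateSchema.
type Filter interface {
	Apply(Record) bool
}

// AnyFilter accepts every record.
type AnyFilter struct{}

func (f *AnyFilter) Apply(Record) bool { return true }

// AwesomeFilter accepts records whose title contains "awesome", ignoring case.
type AwesomeFilter struct{}

func (f *AwesomeFilter) Apply(r Record) bool {
	return strings.Contains(strings.ToLower(r.ArticleTitle), "awesome")
}

// unmarshalFilter maps a filter name to a Filter value; raw carries the
// options, which neither filter here needs.
func unmarshalFilter(name string, raw json.RawMessage) (Filter, error) {
	switch name {
	case "any":
		return &AnyFilter{}, nil
	case "awesome":
		return &AwesomeFilter{}, nil
	default:
		return nil, fmt.Errorf("unknown filter: %q", name)
	}
}

// parseFilter decodes an object with exactly one key, the filter name,
// and hands the key and its raw value to the dispatcher.
func parseFilter(p []byte) (Filter, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(p, &m); err != nil {
		return nil, err
	}
	if len(m) != 1 {
		return nil, fmt.Errorf("want exactly one filter name, got %d", len(m))
	}
	for name, raw := range m {
		return unmarshalFilter(name, raw)
	}
	panic("unreachable: len(m) == 1")
}

func main() {
	f, err := parseFilter([]byte(`{"awesome": {}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Apply(Record{ArticleTitle: "Awesome Go"})) // prints: true
	_, err = parseFilter([]byte(`{"shiny": {}}`))
	fmt.Println(err) // prints: unknown filter: "shiny"
}
```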

Further reading: http://theory.stanford.edu/~sergei/papers/sigmod10-index.pdf

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AndFilter added in v0.1.75

type AndFilter struct {
	// contains filtered or unexported fields
}

AndFilter returns true only if all filters return true.

func (*AndFilter) Apply added in v0.1.75

func (f *AndFilter) Apply(is finc.IntermediateSchema) bool

Apply returns false if any of the filters returns false. Evaluation is short-circuited: it stops at the first filter that returns false.

func (*AndFilter) UnmarshalJSON added in v0.1.75

func (f *AndFilter) UnmarshalJSON(p []byte) (err error)

UnmarshalJSON turns a config fragment into an and filter.

type AnyFilter added in v0.1.75

type AnyFilter struct {
	Any struct{} `json:"any"`
}

AnyFilter validates any record.

func (*AnyFilter) Apply added in v0.1.75

Apply always returns true.

type CollectionFilter

type CollectionFilter struct {
	// contains filtered or unexported fields
}

CollectionFilter returns true if the record belongs to any one of the collections.

func (*CollectionFilter) Apply

Apply returns true if the record belongs to any one of the configured collections.

func (*CollectionFilter) UnmarshalJSON added in v0.1.75

func (f *CollectionFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON turns a config fragment into a collection filter.

type DOIFilter

type DOIFilter struct {
	// contains filtered or unexported fields
}

DOIFilter allows records with a given DOI. It can be used in conjunction with "not" to create blacklists.

func (*DOIFilter) Apply

func (f *DOIFilter) Apply(is finc.IntermediateSchema) bool

Apply applies the filter.

func (*DOIFilter) UnmarshalJSON added in v0.1.75

func (f *DOIFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON turns a config fragment into a filter.
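A sketch of the blacklist pattern mentioned above: a DOI allow-list wrapped in an inverting filter. The Record type, doiFilter, notFilter and newDOIFilter are hypothetical stand-ins (not the package's DOIFilter and NotFilter), the filters are composed by hand rather than loaded from JSON, and the DOIs are illustrative values only.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for finc.IntermediateSchema.
type Record struct {
	DOI string
}

// Filter mirrors the package's interface, with Record in place of
// finc.IntermediateSchema.
type Filter interface {
	Apply(Record) bool
}

// doiFilter accepts records whose DOI is in the configured set.
type doiFilter struct {
	dois map[string]bool
}

func newDOIFilter(dois ...string) *doiFilter {
	f := &doiFilter{dois: make(map[string]bool, len(dois))}
	for _, d := range dois {
		f.dois[d] = true
	}
	return f
}

func (f *doiFilter) Apply(r Record) bool { return f.dois[r.DOI] }

// notFilter inverts its child, turning the DOI allow-list into a blacklist.
type notFilter struct {
	child Filter
}

func (f *notFilter) Apply(r Record) bool { return !f.child.Apply(r) }

func main() {
	blacklist := &notFilter{child: newDOIFilter("10.1000/blocked")}
	fmt.Println(blacklist.Apply(Record{DOI: "10.1000/blocked"})) // prints: false
	fmt.Println(blacklist.Apply(Record{DOI: "10.1000/other"}))   // prints: true
}
```

In a tag configuration, such a blacklist would typically sit inside an "and" next to the filters that grant the label, so that listed DOIs are excluded even when everything else matches.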

type Filter

type Filter interface {
	Apply(finc.IntermediateSchema) bool
}

Filter decides go or no-go for a given record: Apply returns true if the record passes.

type HoldingsFilter added in v0.1.75

type HoldingsFilter struct {
	// contains filtered or unexported fields
}

HoldingsFilter uses the new licensing package.

func (*HoldingsFilter) Apply added in v0.1.75

Apply returns true if there is a valid holding for a given record. This takes multiple attributes like date, volume, issue and embargo into account. This function is very specific: it works only with the intermediate format and uses specific information from that format to decide on attachment.

func (*HoldingsFilter) UnmarshalJSON added in v0.1.75

func (f *HoldingsFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON deserializes this filter.

type ISSNFilter added in v0.1.75

type ISSNFilter struct {
	// contains filtered or unexported fields
}

ISSNFilter allows records with a certain ISSN.

func (*ISSNFilter) Apply added in v0.1.75

func (f *ISSNFilter) Apply(is finc.IntermediateSchema) bool

Apply applies the ISSN filter to an intermediate schema record; it makes no distinction between ISSN and EISSN.

func (*ISSNFilter) UnmarshalJSON added in v0.1.75

func (f *ISSNFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON turns a config fragment into a filter.
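To illustrate "no distinction between ISSN and EISSN", here is a sketch in which a match in either list is enough. The Record type with ISSN and EISSN slices, issnFilter and newISSNFilter are hypothetical; the sketch compares exact strings, and whether the package normalizes ISSNs (hyphens, case of a trailing X) is not stated here. The ISSN values are placeholders.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for finc.IntermediateSchema with
// separate print (ISSN) and electronic (EISSN) lists.
type Record struct {
	ISSN  []string
	EISSN []string
}

// issnFilter accepts a record if any of its ISSNs or EISSNs is configured.
type issnFilter struct {
	values map[string]bool
}

func newISSNFilter(issns ...string) *issnFilter {
	f := &issnFilter{values: make(map[string]bool, len(issns))}
	for _, v := range issns {
		f.values[v] = true
	}
	return f
}

// Apply checks both lists the same way; a match in either is enough.
func (f *issnFilter) Apply(r Record) bool {
	for _, list := range [][]string{r.ISSN, r.EISSN} {
		for _, v := range list {
			if f.values[v] {
				return true
			}
		}
	}
	return false
}

func main() {
	f := newISSNFilter("1234-5678")
	fmt.Println(f.Apply(Record{ISSN: []string{"1234-5678"}}))  // prints: true
	fmt.Println(f.Apply(Record{EISSN: []string{"1234-5678"}})) // prints: true
	fmt.Println(f.Apply(Record{ISSN: []string{"8765-4321"}}))  // prints: false
}
```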

type NotFilter added in v0.1.75

type NotFilter struct {
	// contains filtered or unexported fields
}

NotFilter inverts another filter.

func (*NotFilter) Apply added in v0.1.75

func (f *NotFilter) Apply(is finc.IntermediateSchema) bool

Apply inverts another filter.

func (*NotFilter) UnmarshalJSON added in v0.1.75

func (f *NotFilter) UnmarshalJSON(p []byte) (err error)

UnmarshalJSON turns a config fragment into a not filter.

type OrFilter added in v0.1.75

type OrFilter struct {
	// contains filtered or unexported fields
}

OrFilter returns true if at least one filter matches.

func (*OrFilter) Apply added in v0.1.75

func (f *OrFilter) Apply(is finc.IntermediateSchema) bool

Apply returns true if any of the filters returns true. Evaluation is short-circuited: it stops at the first filter that returns true.

func (*OrFilter) UnmarshalJSON added in v0.1.75

func (f *OrFilter) UnmarshalJSON(p []byte) (err error)

UnmarshalJSON turns a config fragment into an or filter.

type PackageFilter added in v0.1.59

type PackageFilter struct {
	// contains filtered or unexported fields
}

PackageFilter allows all records that belong to one of the given package names.

func (*PackageFilter) Apply added in v0.1.59

Apply returns true if the record belongs to one of the given packages.

func (*PackageFilter) UnmarshalJSON added in v0.1.75

func (f *PackageFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON turns a config fragment into a filter.

type SourceFilter

type SourceFilter struct {
	// contains filtered or unexported fields
}

SourceFilter allows all records with the given source id or ids.

func (*SourceFilter) Apply

Apply returns true if the record has one of the given source ids.

func (*SourceFilter) UnmarshalJSON added in v0.1.75

func (f *SourceFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON turns a config fragment into a filter.

type SubjectFilter added in v0.1.130

type SubjectFilter struct {
	// contains filtered or unexported fields
}

SubjectFilter returns true if the record has an exact string match to one of the given subjects.

func (*SubjectFilter) Apply added in v0.1.130

Apply returns true if one of the record's subjects exactly matches a configured subject.

func (*SubjectFilter) UnmarshalJSON added in v0.1.130

func (f *SubjectFilter) UnmarshalJSON(p []byte) error

UnmarshalJSON turns a config fragment into a subject filter.

type Tagger added in v0.1.75

type Tagger struct {
	// contains filtered or unexported fields
}

Tagger takes a list of tags (ISILs) and annotates an intermediate schema record according to a number of filters, defined per label. The tagger can be loaded directly from JSON.

func (*Tagger) Tag added in v0.1.75

Tag takes an intermediate schema record and returns a labeled version of that record.

func (*Tagger) UnmarshalJSON added in v0.1.75

func (t *Tagger) UnmarshalJSON(p []byte) error

UnmarshalJSON unmarshals a complete filter config from serialized JSON.
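A sketch of the Tagger idea: one filter per label, and every label whose filter passes is attached to the record. It assumes a hypothetical Record type with a Labels field to hold the attached ISILs (the documentation above only says Tag returns a labeled version of the record), builds the filters by hand instead of loading them via UnmarshalJSON, and sorts the new labels so the result does not depend on Go's map iteration order; that ordering is a choice of this sketch, not necessarily the package's.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical stand-in for finc.IntermediateSchema; Labels
// is an assumed field that collects the attached ISILs.
type Record struct {
	SourceID string
	Labels   []string
}

// Filter mirrors the package's interface, with Record in place of
// finc.IntermediateSchema.
type Filter interface {
	Apply(Record) bool
}

// FilterFunc adapts a plain function to the Filter interface.
type FilterFunc func(Record) bool

func (fn FilterFunc) Apply(r Record) bool { return fn(r) }

// Tagger maps each label (ISIL) to the filter that decides whether a
// record gets that label.
type Tagger struct {
	filters map[string]Filter
}

// Tag returns a copy of the record with every matching label appended.
// The caller's Labels slice is copied, never modified in place.
func (t *Tagger) Tag(r Record) Record {
	labels := make([]string, 0, len(t.filters))
	for label, f := range t.filters {
		if f.Apply(r) {
			labels = append(labels, label)
		}
	}
	sort.Strings(labels)
	r.Labels = append(append([]string(nil), r.Labels...), labels...)
	return r
}

func main() {
	tagger := &Tagger{filters: map[string]Filter{
		"DE-X":  FilterFunc(func(Record) bool { return true }),
		"DE-14": FilterFunc(func(r Record) bool { return r.SourceID == "55" }),
	}}
	fmt.Println(tagger.Tag(Record{SourceID: "55"}).Labels) // prints: [DE-14 DE-X]
	fmt.Println(tagger.Tag(Record{SourceID: "49"}).Labels) // prints: [DE-X]
}
```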

type Tree added in v0.1.130

type Tree struct {
	// contains filtered or unexported fields
}

Tree allows polymorphic filters.

func (*Tree) Apply added in v0.1.130

func (f *Tree) Apply(is finc.IntermediateSchema) bool

Apply applies the root filter.

func (*Tree) UnmarshalJSON added in v0.1.130

func (f *Tree) UnmarshalJSON(p []byte) error

UnmarshalJSON gathers the top level filter name and unmarshals the associated filter.
