islint

package module
v0.1.4 Latest
Published: Dec 7, 2015 License: GPL-3.0 Imports: 9 Imported by: 0

README

islint

Intermediate Schema linter. What is linting?

Documentation on godoc.org.

Linux 64-bit toy: islint

Usage

$ islint -ls
CurrencyInTitle
EndPageBeforeStartPage
EtAlAuthorName
ExcessivePunctuation
InvalidCollection
InvalidEndPage
InvalidStartPage
InvalidURL
KeyTooLong
NAInAuthorName
NoPublisher
PublicationDateTooEarly
PublicationDateTooLate
RepeatedSlash
RepeatedSubtitle
ShortAuthorName
SuspiciousPageCount
WhitespaceAuthor

$ islint < file.is
2015/12/03 14:45:55 1000000
2015/12/03 14:45:55 1000000 total, 911306 ok, 88694 or 9.733% with issues
2015/12/03 14:45:55 map[SuspiciousPageCount:5 ExcessivePuctuation:5
                        CurrencyInTitle:1007 PublicationDateTooEarly:52361
                        RepeatedSubtitle:18294 EndPageBeforeStartPage:390
                        InvalidStartPage:231 InvalidCollection:16782]
2015/12/03 14:46:47 2000000
2015/12/03 14:46:47 2000000 total, 1786939 ok, 213061 or 11.923% with issues
2015/12/03 14:46:47 map[CurrencyInTitle:5668 InvalidStartPage:685
                        SuspiciousPageCount:5 PublicationDateTooEarly:146781
                        RepeatedSubtitle:34849 EndPageBeforeStartPage:5146
                        InvalidCollection:20572 ExcessivePuctuation:381
                        InvalidEndPage:7]
2015/12/03 14:47:37 3000000
2015/12/03 14:47:37 3000000 total, 2651675 ok, 348325 or 13.136% with issues
2015/12/03 14:47:37 map[PublicationDateTooEarly:195313 RepeatedSubtitle:118735
                        EndPageBeforeStartPage:5712 InvalidCollection:21339
                        ExcessivePuctuation:388 InvalidEndPage:7
                        CurrencyInTitle:7511 InvalidStartPage:731
                        SuspiciousPageCount:5]
...
2015/12/03 16:01:27 88521109 total, 83013026 ok, 5508083 or 6.635% with issues
2015/12/03 16:01:27 map[CurrencyInTitle:330554 InvalidStartPage:90924
                        SuspiciousPageCount:63 InvalidURL:37
                        PublicationDateTooLate:4 PublicationDateTooEarly:582577
                        RepeatedSubtitle:2402953 EndPageBeforeStartPage:81716
                        InvalidCollection:2060252 ExcessivePuctuation:3169
                        InvalidEndPage:3856]

Documentation

Index

Constants

This section is empty.

Variables

var (
	// EarliestDate is the earliest publication date we accept.
	EarliestDate = time.Date(1458, 1, 1, 0, 0, 0, 0, time.UTC)
	// LatestDate represents the latest publication date we accept.
	LatestDate = time.Now().AddDate(5, 0, 0)

	// AllowedCollections is the set of allowed collection names, loaded from the bundled assets.
	AllowedCollections = assetutil.MustLoadStringSet("assets/collections/collections.tsv",
		"assets/collections/crossref.tsv")
)

Functions

func AllowedCollectionNames

func AllowedCollectionNames(is finc.IntermediateSchema) error

AllowedCollectionNames checks the collection name against a fixed list of allowed names stored under assets, refs. #6496.

func FeasibleAuthor added in v0.1.2

func FeasibleAuthor(is finc.IntermediateSchema) error

FeasibleAuthor checks for a few suspicious author patterns, refs. #4892, #4940.

func HasPublisher added in v0.1.1

func HasPublisher(is finc.IntermediateSchema) error

HasPublisher tests whether a publisher is given.

func KeyLength

func KeyLength(is finc.IntermediateSchema) error

KeyLength checks the length of the record id. memcachedb limits this to 250 bytes.

func NoCurrencyInTitle

func NoCurrencyInTitle(is finc.IntermediateSchema) error

NoCurrencyInTitle flags titles that contain a price, e.g. http://goo.gl/HACBcW: Cartier, Marie. Baby, You Are My Religion: Women, Gay Bars, and Theology Before Stonewall. Gender, Theology and Spirituality. Durham, UK: Acumen, 2013. xii+256 pp. $90.00 (cloth); $29.95 (paper).
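
One way to detect such titles is a regular expression that looks for a currency symbol followed by digits. The pattern and the helper hasCurrency below are assumptions made for illustration; the expression islint actually uses is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// currencyPattern matches a currency symbol followed by an amount, such as
// "$90.00". The set of symbols is an assumption for this sketch.
var currencyPattern = regexp.MustCompile(`[$€£]\s?\d+([.,]\d{2})?`)

// hasCurrency reports whether title contains something that looks like a price.
func hasCurrency(title string) bool {
	return currencyPattern.MatchString(title)
}

func main() {
	fmt.Println(hasCurrency("xii+256 pp. $90.00 (cloth); $29.95 (paper)."))
	fmt.Println(hasCurrency("Baby, You Are My Religion"))
}
```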

func NoExcessivePunctuation added in v0.1.1

func NoExcessivePunctuation(is finc.IntermediateSchema) error

NoExcessivePunctuation should detect things like this title: CrossRef????????????? https://goo.gl/AD0V1o

func NoRepeatedSlash added in v0.1.4

func NoRepeatedSlash(is finc.IntermediateSchema) error

NoRepeatedSlash checks a DOI for repeated slashes, refs. #6312.
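
A repeated slash can be found with a plain substring search. This is a minimal sketch on a bare DOI string; the helper name hasRepeatedSlash is hypothetical, and the real function receives a finc.IntermediateSchema.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRepeatedSlash reports whether doi contains two consecutive slashes.
func hasRepeatedSlash(doi string) bool {
	return strings.Contains(doi, "//")
}

func main() {
	fmt.Println(hasRepeatedSlash("10.1000//example"))
	fmt.Println(hasRepeatedSlash("10.1000/example"))
}
```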

func PlausibleDate

func PlausibleDate(is finc.IntermediateSchema) error

PlausibleDate checks for suspicious dates, refs. #5686.

func PlausiblePageCount

func PlausiblePageCount(is finc.IntermediateSchema) error

PlausiblePageCount checks whether the start and end page look plausible.
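
The Kind values InvalidStartPage, InvalidEndPage, EndPageBeforeStartPage and SuspiciousPageCount suggest a sequence of checks along the following lines. This is a sketch under assumptions: pages are given as decimal strings, the threshold maxPageCount is made up, and checkPages and the error variables are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// maxPageCount is a made-up threshold; the value islint uses is not
// documented on this page.
const maxPageCount = 20000

// Hypothetical stand-ins for the corresponding Kind values.
var (
	errInvalidStartPage       = errors.New("invalid start page")
	errInvalidEndPage         = errors.New("invalid end page")
	errEndPageBeforeStartPage = errors.New("end page before start page")
	errSuspiciousPageCount    = errors.New("suspicious page count")
)

// checkPages parses both pages and compares them.
func checkPages(start, end string) error {
	s, err := strconv.Atoi(start)
	if err != nil {
		return errInvalidStartPage
	}
	e, err := strconv.Atoi(end)
	if err != nil {
		return errInvalidEndPage
	}
	if e < s {
		return errEndPageBeforeStartPage
	}
	if e-s > maxPageCount {
		return errSuspiciousPageCount
	}
	return nil
}

func main() {
	fmt.Println(checkPages("10", "25"))
	fmt.Println(checkPages("25", "10"))
	fmt.Println(checkPages("x", "10"))
}
```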

func SubtitleRepetition

func SubtitleRepetition(is finc.IntermediateSchema) error

SubtitleRepetition flags records with a repeated subtitle (RepeatedSubtitle), refs. #6553.

func ValidURL

func ValidURL(is finc.IntermediateSchema) error

ValidURL checks whether a URL string is parseable.
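
Parseability can be tested with url.Parse from the standard library, which is fairly lenient and only rejects malformed input. This is a minimal sketch on a bare string; checkURL is a hypothetical name, and whether islint uses url.Parse is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkURL returns an error if s cannot be parsed as a URL.
func checkURL(s string) error {
	if _, err := url.Parse(s); err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(checkURL("http://example.com/article/1"))
	fmt.Println(checkURL("http://[::1")) // unterminated IPv6 literal
}
```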

Types

type Issue added in v0.1.1

type Issue struct {
	Kind    Kind
	Record  finc.IntermediateSchema
	Message string
}

Issue contains information about a quality issue in an intermediate schema record.

func (Issue) Error added in v0.1.1

func (e Issue) Error() string

Error formats the error.

func (Issue) TSV added in v0.1.1

func (e Issue) TSV() string

TSV returns a tab-separated representation.
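
To illustrate how a struct like Issue can serve both as an error value and as a tabular row, here is a sketch with simplified stand-in types. The record type with a single RecordID field, the column order and the message format are all assumptions; the actual output of Error and TSV is not documented on this page.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kind and record are simplified stand-ins for Kind and
// finc.IntermediateSchema.
type kind uint16

type record struct {
	RecordID string
}

// issue mirrors the shape of Issue.
type issue struct {
	Kind    kind
	Record  record
	Message string
}

// Error makes issue satisfy the error interface. The format is made up.
func (e issue) Error() string {
	return fmt.Sprintf("%s: kind %d: %s", e.Record.RecordID, e.Kind, e.Message)
}

// TSV joins id, kind and message with tabs. The column order is made up.
func (e issue) TSV() string {
	return strings.Join([]string{
		e.Record.RecordID,
		strconv.Itoa(int(e.Kind)),
		e.Message,
	}, "\t")
}

func main() {
	var err error = issue{Kind: 4, Record: record{RecordID: "id-1"}, Message: "bad url"}
	fmt.Println(err)
	fmt.Println(err.(issue).TSV())
}
```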

type Kind

type Kind uint16
const (
	KeyTooLong Kind = iota
	InvalidStartPage
	InvalidEndPage
	EndPageBeforeStartPage
	InvalidURL
	SuspiciousPageCount
	PublicationDateTooEarly
	PublicationDateTooLate
	InvalidCollection
	RepeatedSubtitle
	CurrencyInTitle
	ExcessivePunctuation
	NoPublisher
	ShortAuthorName
	EtAlAuthorName
	NAInAuthorName
	WhitespaceAuthor
	RepeatedSlash
)
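
The names printed by islint -ls match these constants. A common way to obtain such names from an iota enumeration is a String method backed by a name table, sketched below with the first three values only. The table and the method are illustrative; how islint maps Kind values to names is not shown on this page.

```go
package main

import "fmt"

// kind is a stand-in for Kind, limited to three values.
type kind uint16

const (
	keyTooLong kind = iota // 0
	invalidStartPage       // 1
	invalidEndPage         // 2
)

// kindNames is indexed by the numeric value of kind.
var kindNames = [...]string{"KeyTooLong", "InvalidStartPage", "InvalidEndPage"}

// String returns the name of k, or a numeric fallback for unknown values.
func (k kind) String() string {
	if int(k) < len(kindNames) {
		return kindNames[k]
	}
	return fmt.Sprintf("Kind(%d)", k)
}

func main() {
	fmt.Println(keyTooLong, invalidStartPage, invalidEndPage)
}
```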

type TestSuite

type TestSuite []Tester

TestSuite is a group of tests.

type Tester added in v0.1.1

type Tester interface {
	TestRecord(finc.IntermediateSchema) error
}

Tester is an intermediate schema record checker.

type TesterFunc added in v0.1.1

type TesterFunc func(finc.IntermediateSchema) error

TesterFunc makes a function satisfy the Tester interface.

func (TesterFunc) TestRecord added in v0.1.1

func (f TesterFunc) TestRecord(is finc.IntermediateSchema) error

TestRecord delegates the test to the underlying function.
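
Tester, TesterFunc and TestSuite follow the same adapter pattern as http.Handler and http.HandlerFunc: a plain function is converted into an interface value and collected in a slice. The sketch below rebuilds the three types with a simplified record type in place of finc.IntermediateSchema. The run method is a hypothetical helper; no such method is documented for TestSuite.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a simplified stand-in for finc.IntermediateSchema.
type record struct {
	RecordID  string
	Publisher string
}

type tester interface {
	TestRecord(record) error
}

// testerFunc adapts a plain function to the tester interface.
type testerFunc func(record) error

func (f testerFunc) TestRecord(r record) error { return f(r) }

type testSuite []tester

// run applies every tester to r and collects the non-nil errors.
// Hypothetical helper for this sketch.
func (s testSuite) run(r record) []error {
	var errs []error
	for _, t := range s {
		if err := t.TestRecord(r); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func hasID(r record) error {
	if r.RecordID == "" {
		return errors.New("missing record id")
	}
	return nil
}

func hasPublisher(r record) error {
	if r.Publisher == "" {
		return errors.New("no publisher")
	}
	return nil
}

func main() {
	suite := testSuite{testerFunc(hasID), testerFunc(hasPublisher)}
	for _, err := range suite.run(record{RecordID: "id-1"}) {
		fmt.Println(err)
	}
}
```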

Directories

Path Synopsis
cmd
