schema

package
v2.5.1+incompatible
Warning

This package is not in the latest version of its module.

Published: Jun 21, 2018 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package schema houses simple data types for titles, issues, batches, etc. Types which live here are meant to cover the general case rather than hold all possible information for all possible use cases.

Constants

const (
	// WSNil should only be used to indicate a workflow step is irrelevant or else unset
	WSNil                    WorkflowStep = ""
	WSSFTP                                = "SFTPUpload"
	WSScan                                = "ScanUpload"
	WSAwaitingProcessing                  = "AwaitingProcessing"
	WSAwaitingPageReview                  = "AwaitingPageReview"
	WSReadyForMetadataEntry               = "ReadyForMetadataEntry"
	WSAwaitingMetadataReview              = "AwaitingMetadataReview"
	WSReadyForMETSXML                     = "ReadyForMETSXML"
	WSReadyForBatching                    = "ReadyForBatching"
	WSInProduction                        = "InProduction"
)

All possible statuses an issue could have
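As the block is rendered here, only WSNil carries the explicit WorkflowStep type; the remaining constants read as untyped string constants. A minimal sketch of what that means for callers, using a two-constant excerpt redeclared locally (nothing here is imported from the package):

```go
package main

import "fmt"

// WorkflowStep mirrors the documented type declaration.
type WorkflowStep string

// Excerpt of the constant block, copied as rendered in this documentation.
const (
	WSNil  WorkflowStep = ""
	WSSFTP              = "SFTPUpload"
)

func main() {
	// An untyped string constant is assignable to WorkflowStep directly.
	var step WorkflowStep = WSSFTP

	// With := the constant takes its default type, which is plain string.
	plain := WSSFTP

	fmt.Printf("%T %T\n", step, plain) // main.WorkflowStep string
	fmt.Println(step == WSNil)         // false
}
```

In practice this only matters when a value is inferred with `:=`; assigning to an Issue's WorkflowStep field works either way.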

Variables

This section is empty.

Functions

func CondensedDate

func CondensedDate(rawDate string) string

CondensedDate returns the date in a consistent format for use in issue key TSV output

func IssueDateEdition

func IssueDateEdition(rawDate string, edition int) string

IssueDateEdition returns the combination of condensed date (no hyphens) and two-digit edition number for use in issue keys and other places we need the "local" unique string
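As a rough illustration of the format described above, the sketch below condenses a date and appends a two-digit edition. It assumes rawDate arrives as YYYY-MM-DD; the helpers condensed and dateEdition are hypothetical stand-ins, not the package's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// condensed is a hypothetical stand-in for CondensedDate. It assumes the raw
// date looks like "YYYY-MM-DD" and simply drops the hyphens.
func condensed(rawDate string) string {
	return strings.ReplaceAll(rawDate, "-", "")
}

// dateEdition is a hypothetical stand-in for IssueDateEdition: the condensed
// date followed by the edition number, zero-padded to two digits.
func dateEdition(rawDate string, edition int) string {
	return fmt.Sprintf("%s%02d", condensed(rawDate), edition)
}

func main() {
	fmt.Println(dateEdition("2018-06-21", 1)) // 2018062101
}
```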

func IssueKey

func IssueKey(lccn, rawDate string, edition int) string

IssueKey centralizes the generation of our unique "key" for an issue using the lccn + date + edition
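A minimal sketch of how such a key might be assembled. The "/" separator between LCCN and date, the YYYY-MM-DD input format, and the sample LCCN are assumptions made for illustration; issueKey is a hypothetical helper rather than the package function.

```go
package main

import (
	"fmt"
	"strings"
)

// issueKey is a hypothetical stand-in for IssueKey. The "/" separator and the
// "YYYY-MM-DD" input format are assumptions made for this sketch.
func issueKey(lccn, rawDate string, edition int) string {
	date := strings.ReplaceAll(rawDate, "-", "")
	return fmt.Sprintf("%s/%s%02d", lccn, date, edition)
}

func main() {
	// Two editions published on the same day get distinct keys.
	fmt.Println(issueKey("sn12345678", "2018-06-21", 1)) // sn12345678/2018062101
	fmt.Println(issueKey("sn12345678", "2018-06-21", 2)) // sn12345678/2018062102
}
```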

func TrimCommonPrefixes

func TrimCommonPrefixes(s string) string

TrimCommonPrefixes strips "The", "A", and "An" from the string if they're at the beginning, and removes leading spaces
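One way to picture this behavior is the sketch below. It assumes matching is case-sensitive and that an article only counts when followed by a space; trimPrefixes is a hypothetical helper, and the real function may differ on both points.

```go
package main

import (
	"fmt"
	"strings"
)

// trimPrefixes is a hypothetical stand-in for TrimCommonPrefixes. It removes
// leading spaces, then strips one leading article if present.
func trimPrefixes(s string) string {
	s = strings.TrimLeft(s, " ")
	for _, article := range []string{"The ", "An ", "A "} {
		if strings.HasPrefix(s, article) {
			return strings.TrimLeft(s[len(article):], " ")
		}
	}
	return s
}

func main() {
	fmt.Println(trimPrefixes("The Daily News")) // Daily News
	fmt.Println(trimPrefixes("  A Journal"))    // Journal
	fmt.Println(trimPrefixes("Another Paper"))  // Another Paper
}
```

Requiring the trailing space keeps a name such as "Another Paper" intact, since it merely begins with the letters of an article.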

Types

type Batch

type Batch struct {
	// MARCOrgCode tells us the organization responsible for the images in the batch
	MARCOrgCode string

	// A batch's keyword is normally short, such as "horsetail", but our in-house
	// batches have much longer keywords to ensure uniqueness
	Keyword string

	// Usually 1, but I've seen "_ver02" batches occasionally
	Version int

	// Issues links the issues which are part of this batch
	Issues IssueList

	// Location is where this batch can be found, either a URL or filesystem path
	Location string

	Errors apperr.List
}

Batch represents high-level batch information

func ParseBatchname

func ParseBatchname(fullname string) (*Batch, error)

ParseBatchname creates a Batch by splitting up the full name string
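The sketch below shows one plausible way to split a batch name. The layout batch_<org>_<keyword>_ver<NN> is an assumption inferred from the Batch field comments (keyword "horsetail", "_ver02"); the org code "abc" is made up, and batch, parseBatchname, and fullname are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// batch holds only the fields this sketch needs; it is not the package's Batch.
type batch struct {
	MARCOrgCode string
	Keyword     string
	Version     int
}

// parseBatchname assumes names look like "batch_<org>_<keyword>_ver<NN>".
func parseBatchname(fullname string) (*batch, error) {
	parts := strings.Split(fullname, "_")
	if len(parts) != 4 || parts[0] != "batch" {
		return nil, fmt.Errorf("batch name %q: want batch_<org>_<keyword>_ver<NN>", fullname)
	}
	if !strings.HasPrefix(parts[3], "ver") {
		return nil, fmt.Errorf("batch name %q: missing version suffix", fullname)
	}
	version, err := strconv.Atoi(strings.TrimPrefix(parts[3], "ver"))
	if err != nil {
		return nil, fmt.Errorf("batch name %q: bad version: %v", fullname, err)
	}
	return &batch{MARCOrgCode: parts[1], Keyword: parts[2], Version: version}, nil
}

// fullname rebuilds the name from its parts, under the same assumed layout.
func (b *batch) fullname() string {
	return fmt.Sprintf("batch_%s_%s_ver%02d", b.MARCOrgCode, b.Keyword, b.Version)
}

func main() {
	b, err := parseBatchname("batch_abc_horsetail_ver02")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.MARCOrgCode, b.Keyword, b.Version) // abc horsetail 2
	fmt.Println(b.fullname())                        // batch_abc_horsetail_ver02
}
```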

func (*Batch) AddError

func (b *Batch) AddError(err apperr.Error)

AddError attaches err to this batch

func (*Batch) AddIssue

func (b *Batch) AddIssue(i *Issue)

AddIssue adds the issue to this batch's list, and sets the issue's batch

func (*Batch) Fullname

func (b *Batch) Fullname() string

Fullname is the full batch name

func (*Batch) TSV

func (b *Batch) TSV() string

TSV returns a string uniquely identifying this batch by location as well as name, and an issue count to offer some verification or reporting

type DuplicateIssueError

type DuplicateIssueError struct {
	*IssueError
	Location string
	Name     string
	IsLive   bool
}

DuplicateIssueError implements apperr.Error for duped issue situations, and holds onto extra information for figuring out how to handle the dupe

type File

type File struct {
	*fileutil.File
	Location string
	Issue    *Issue
	Errors   apperr.List
}

File just gives fileutil.File a location and issue pointer

func (*File) AddError

func (f *File) AddError(err apperr.Error)

AddError puts err on this file and reports to its issue that one of its children has an error

type Issue

type Issue struct {
	MARCOrgCode string
	Title       *Title
	RawDate     string // This is the date as seen on the filesystem when the issue was uploaded
	Edition     int
	Batch       *Batch
	Files       []*File
	Errors      apperr.List

	// Location is where this issue can be found, either a URL or filesystem path
	Location string

	WorkflowStep WorkflowStep
	// contains filtered or unexported fields
}

Issue is an extremely basic encapsulation of an issue's high-level data

func (*Issue) CheckDupes

func (i *Issue) CheckDupes(lookup *Lookup)

CheckDupes centralizes the logic for seeing if an issue has a duplicate in a given lookup, adding a duplication error if there is a dupe and that dupe is considered to be more "canonical" than this issue. For example, if there's an issue in the metadata entry stage and another in the SFTP upload, the upload is considered the dupe, not the one in metadata entry.

func (*Issue) DateEdition

func (i *Issue) DateEdition() string

DateEdition returns the combination of condensed date (no hyphens) and two-digit edition number for use in issue keys and other places we need the "local" unique string

func (*Issue) ErrDuped

func (i *Issue) ErrDuped(dupe *Issue)

ErrDuped flags this issue with a DuplicateIssueError

func (*Issue) ErrFolderContents

func (i *Issue) ErrFolderContents(extra string)

ErrFolderContents tells us the issue's files on disk are invalid in some way

func (*Issue) ErrInvalidFolderName

func (i *Issue) ErrInvalidFolderName(extra string)

ErrInvalidFolderName adds an Error for invalid folder name formats

func (*Issue) ErrNoFiles

func (i *Issue) ErrNoFiles()

ErrNoFiles adds an error stating the issue folder is empty

func (*Issue) ErrReadFailure

func (i *Issue) ErrReadFailure(err error)

ErrReadFailure indicates the issue's folder wasn't able to be read

func (*Issue) ErrTooNew

func (i *Issue) ErrTooNew(hours int)

ErrTooNew adds an error for issues which are too new to be processed. hours should be set to the minimum number of hours an issue should be untouched before being considered "safe".

func (*Issue) FindFiles

func (i *Issue) FindFiles()

FindFiles clears the issue's file list and then reads everything in the issue directory, appending it to the now-empty list. This will silently fail when the issue's location is invalid, not readable, or isn't an absolute path beginning with "/". This is only meant for issues already discovered on the filesystem.

func (*Issue) IsLive

func (i *Issue) IsLive() bool

IsLive returns true if the issue both has a batch *and* the batch appears to be on the live site

func (*Issue) Key

func (i *Issue) Key() string

Key returns the unique string that represents this issue

func (*Issue) LastModified

func (i *Issue) LastModified() time.Time

LastModified tells us when *any* change happened in an issue's folder. This will return a meaningless value on live issues.

func (*Issue) TSV

func (i *Issue) TSV() string

TSV gives us something which can be used to uniquely identify all aspects of this issue's data for reporting and/or data verification

func (*Issue) WorkflowIdentification

func (i *Issue) WorkflowIdentification() string

WorkflowIdentification returns a human-readable explanation of where an issue currently is in the workflow - currently used for adding to "likely duplicate of ..."

type IssueError

type IssueError struct {
	Err  string
	Msg  string
	Prop bool
}

IssueError implements apperr.Error and forms the base for all issue errors

func (*IssueError) Error

func (e *IssueError) Error() string

func (*IssueError) Message

func (e *IssueError) Message() string

Message returns the long, human-friendly error message

func (*IssueError) Propagate

func (e *IssueError) Propagate() bool

Propagate returns whether the error should flag the object's parent as also having an error
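The propagation flag can be pictured with the sketch below, in which a file records an error and marks its parent issue only when the error asks for it. The appError interface, the childHasError field, and every type here are hypothetical; the apperr package's real shape is not shown in this documentation.

```go
package main

import "fmt"

// appError is a hypothetical, reduced version of an error that can propagate.
type appError interface {
	Error() string
	Propagate() bool
}

// simpleErr carries a message and a propagation flag.
type simpleErr struct {
	msg  string
	prop bool
}

func (e *simpleErr) Error() string   { return e.msg }
func (e *simpleErr) Propagate() bool { return e.prop }

// issue is the parent; childHasError is an assumed bookkeeping field.
type issue struct {
	childHasError bool
}

// file is the child and keeps a pointer back to its issue.
type file struct {
	issue  *issue
	errors []appError
}

// addError stores err on the file and flags the parent issue only when the
// error says it should propagate.
func (f *file) addError(err appError) {
	f.errors = append(f.errors, err)
	if err.Propagate() && f.issue != nil {
		f.issue.childHasError = true
	}
}

func main() {
	i := &issue{}
	f := &file{issue: i}

	f.addError(&simpleErr{msg: "cosmetic warning", prop: false})
	fmt.Println(i.childHasError) // false

	f.addError(&simpleErr{msg: "unreadable file", prop: true})
	fmt.Println(i.childHasError) // true
}
```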

type IssueList

type IssueList []*Issue

IssueList groups a bunch of issues together

func (IssueList) SortByKey

func (list IssueList) SortByKey()

SortByKey modifies the IssueList in place so they're sorted alphabetically by issue key. In cases where the keys are the same, the TSV is used to ensure sorting is still consistent, if not ideal.
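A sketch of the tie-breaking sort described above, using sort.Slice from the standard library. The issue type and its key and tsv fields are hypothetical placeholders for the values that Key() and TSV() would return.

```go
package main

import (
	"fmt"
	"sort"
)

// issue stands in for the real type; key and tsv are precomputed strings.
type issue struct {
	key string
	tsv string
}

// sortByKey orders the slice in place by key, falling back to tsv on ties so
// the result does not depend on the input order.
func sortByKey(list []*issue) {
	sort.Slice(list, func(a, b int) bool {
		if list[a].key != list[b].key {
			return list[a].key < list[b].key
		}
		return list[a].tsv < list[b].tsv
	})
}

func main() {
	list := []*issue{
		{key: "sn2/2018062101", tsv: "x"},
		{key: "sn1/2018062101", tsv: "scans"},
		{key: "sn1/2018062101", tsv: "live"},
	}
	sortByKey(list)
	for _, i := range list {
		fmt.Println(i.key, i.tsv)
	}
	// sn1/2018062101 live
	// sn1/2018062101 scans
	// sn2/2018062101 x
}
```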

type IssueMap

type IssueMap map[string]IssueList

IssueMap links a textual issue key to one or more Issue objects

type Key

type Key struct {
	Source string
	LCCN   string
	Year   int
	Month  int
	Day    int
	Ed     int
}

Key defines the precise issue (or subset of issues) we want to find. Note that the structure here is very specific to this issue finder, so we don't expect (or even want) reuse.

func ParseSearchKey

func ParseSearchKey(ik string) (*Key, error)

ParseSearchKey attempts to read the given string, returning an error if the string isn't a valid search key, otherwise returning a populated Key

func (Key) String

func (k Key) String() string

String returns the textual representation of this search key for use in lookups

type Lookup

type Lookup struct {
	sync.RWMutex

	// Issue lets us find issues by key; we should usually have only one
	// issue per key, but the live site could have something that's still sitting
	// in the "ready for ingest" area, or the page backup area.
	Issue IssueMap

	// IssueNoEdition is a lookup containing all issues for a given partial
	// key, where the partial key contains everything except an Issue edition
	IssueNoEdition IssueMap

	// IssueNoDay looks up issues without day number or edition
	IssueNoDay IssueMap

	// IssueNoMonth looks up issues without month, day number, or edition
	IssueNoMonth IssueMap

	// IssueNoYear looks up issues without any date information
	IssueNoYear IssueMap
}

Lookup aggregates issue lists to create very granularly searchable data

func NewLookup

func NewLookup() *Lookup

NewLookup sets up an issue key lookup for use

func (*Lookup) Issues

func (l *Lookup) Issues(k *Key) IssueList

Issues returns the list of issues which match the given search key

func (*Lookup) Populate

func (l *Lookup) Populate(issues IssueList)

Populate stores the given list of issues in the various maps
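To make the partial-key idea concrete, the sketch below files each issue under progressively shorter keys. The key layout (LCCN, a "/", then zero-padded date parts) and all names are assumptions; the real Lookup may build its keys differently.

```go
package main

import (
	"fmt"
	"sync"
)

// issue holds only the fields this sketch needs; it is not the package's Issue.
type issue struct {
	LCCN             string
	Year, Month, Day int
	Edition          int
}

// issueMap mirrors the documented IssueMap shape: key -> list of issues.
type issueMap map[string][]*issue

// lookup is a hypothetical, trimmed-down Lookup guarded by an RWMutex.
type lookup struct {
	sync.RWMutex
	full, noEdition, noDay, noMonth, noYear issueMap
}

func newLookup() *lookup {
	return &lookup{
		full: issueMap{}, noEdition: issueMap{}, noDay: issueMap{},
		noMonth: issueMap{}, noYear: issueMap{},
	}
}

// populate indexes every issue under its full key and each partial key.
func (l *lookup) populate(issues []*issue) {
	l.Lock()
	defer l.Unlock()
	for _, i := range issues {
		year := fmt.Sprintf("%s/%04d", i.LCCN, i.Year)
		month := fmt.Sprintf("%s%02d", year, i.Month)
		day := fmt.Sprintf("%s%02d", month, i.Day)
		full := fmt.Sprintf("%s%02d", day, i.Edition)

		l.noYear[i.LCCN] = append(l.noYear[i.LCCN], i)
		l.noMonth[year] = append(l.noMonth[year], i)
		l.noDay[month] = append(l.noDay[month], i)
		l.noEdition[day] = append(l.noEdition[day], i)
		l.full[full] = append(l.full[full], i)
	}
}

func main() {
	l := newLookup()
	l.populate([]*issue{
		{LCCN: "sn12345678", Year: 2018, Month: 6, Day: 21, Edition: 1},
		{LCCN: "sn12345678", Year: 2018, Month: 6, Day: 21, Edition: 2},
		{LCCN: "sn12345678", Year: 2018, Month: 7, Day: 4, Edition: 1},
	})
	fmt.Println(len(l.full["sn12345678/2018062101"]))    // 1
	fmt.Println(len(l.noEdition["sn12345678/20180621"])) // 2
	fmt.Println(len(l.noMonth["sn12345678/2018"]))       // 3
}
```

Storing the same pointer in several maps trades memory for constant-time lookups at every level of key precision.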

type Title

type Title struct {
	LCCN               string
	Name               string
	PlaceOfPublication string
	Errors             apperr.List

	// Issues contains the list of issues associated with a single title; though
	// this can be derived by iterating over all the issues, it's useful to store
	// them here, too
	Issues IssueList

	// Location is where the title was found on disk or web; not actual Title metadata
	Location string
	// contains filtered or unexported fields
}

Title is a publisher's information, unique per LCCN

func (*Title) AddError

func (t *Title) AddError(err apperr.Error)

AddError attaches err to this title

func (*Title) AddIssue

func (t *Title) AddIssue(i *Issue) *Issue

AddIssue adds the issue to this title's list, and sets the issue's title

func (*Title) GenericTitle

func (t *Title) GenericTitle() *Title

GenericTitle returns a title with the same generic information, but none of the data which is tied to a specific title on the filesystem or website: location and issue list

func (*Title) TSV

func (t *Title) TSV() string

TSV returns a string representing this title uniquely by including its location and a count of issues. The issue count won't help us deserialize, but the purpose is just for data verification and simple reporting.

type TitleList

type TitleList []*Title

TitleList is a simple slice of titles for easier built-in sorting and identifying a unique list of all titles

func (TitleList) SortByName

func (list TitleList) SortByName()

SortByName sorts the titles by their name, using location and lccn when names are the same

func (TitleList) Unique

func (list TitleList) Unique() TitleList

Unique returns a new list containing generic versions of each unique LCCN
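A sketch of the deduplication described above: keep the first title seen for each LCCN and return a copy without filesystem-specific data. The title type, the generic helper, and the choice to keep the first occurrence are assumptions for illustration.

```go
package main

import "fmt"

// title holds only the fields this sketch needs; it is not the package's Title.
type title struct {
	LCCN     string
	Name     string
	Location string
}

// generic returns a copy that keeps the metadata but drops the location.
func (t *title) generic() *title {
	return &title{LCCN: t.LCCN, Name: t.Name}
}

// unique returns one generic title per LCCN, in first-seen order.
func unique(list []*title) []*title {
	seen := make(map[string]bool)
	var out []*title
	for _, t := range list {
		if seen[t.LCCN] {
			continue
		}
		seen[t.LCCN] = true
		out = append(out, t.generic())
	}
	return out
}

func main() {
	list := []*title{
		{LCCN: "sn1", Name: "Daily News", Location: "/sftp/sn1"},
		{LCCN: "sn1", Name: "Daily News", Location: "/scans/sn1"},
		{LCCN: "sn2", Name: "Weekly Star", Location: "/sftp/sn2"},
	}
	for _, t := range unique(list) {
		fmt.Printf("%s %q %q\n", t.LCCN, t.Name, t.Location)
	}
	// sn1 "Daily News" ""
	// sn2 "Weekly Star" ""
}
```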

type WorkflowStep

type WorkflowStep string

WorkflowStep describes the location within the workflow any issue can exist - this is basically a more comprehensive list than what's in the database in order to capture every possible location: live batches, sftped issues awaiting processing, etc.
