irdmtools

v0.0.16
Published: Jun 7, 2023 License: BSD-3-Clause Imports: 21 Imported by: 0

README

Institutional Repository Data Management Tools

This is a proof of concept set of tools and Go packages for working with institutional repositories. The initial target is Invenio RDM.

The proof of concept is being developed around RDM's web services (e.g. REST API and OAI-PMH), PostgreSQL database and external metadata services (e.g. CrossRef, DataCite).

Caltech Library is testing the prototype with continuous content migration, aggregation and metadata analysis.

Tools

rdmutil

This tool is for interacting with an Invenio RDM repository.

  • get_all_ids uses the OAI-PMH service to harvest all the current record ids in an Invenio RDM instance (very slow due to rate limits)
  • get_modified_ids uses the OAI-PMH service with the "from" and "until" attributes to get a list of modified record ids (very slow due to rate limits)
  • get_record retrieves a specific RDM record based on the id (quick, uses the RDM REST API)
  • query can retrieve a selection of records from the RDM REST API, it is limited to 10K total returned records by RDM/Elasticsearch's configuration
  • harvest reads a JSON array of record ids from a file and harvests the RDM records into a dataset v2 collection

rdmutil configuration is read either from the environment or a JSON formatted configuration file. See the man page for details.

eprint2rdm

This tool migrates content from an EPrints repository via the EPrint REST API. It retrieves an EPrint XML representation of the EPrint record and transforms it into a JSON encoded simplified record nearly compatible with Invenio RDM.

Requirements

  • An Invenio RDM deployment
  • To build the software and documentation
    • git
    • Go >= 1.20.1
    • Make (e.g. GNU Make)
    • Pandoc >= 3
  • For harvesting content

Installation

This codebase is speculative. It is likely to change as issues are identified. To install, you need to download the source code and compile it. Here are the steps I take to install irdmtools.

git clone git@github.com:caltechlibrary/irdmtools
cd irdmtools
make
make test
make install

Documentation

Overview

irdmtools is a package for working with institutional repositories and data management systems. Current implementation targets Invenio-RDM.

@author R. S. Doiel, <rsdoiel@caltech.edu> @author Tom Morrell, <tmorrell@caltech.edu>

Copyright (c) 2023, Caltech All rights not granted herein are expressly reserved by Caltech.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Index

Constants

const (
	// Version number of release
	Version = "0.0.16"

	// ReleaseDate, the date version.go was generated
	ReleaseDate = "2023-06-07"

	// ReleaseHash, the Git hash when version.go was generated
	ReleaseHash = "c3e2ab5"

	LicenseText = `` /* 1430-byte string literal not displayed */

)

Variables

This section is empty.

Functions

func AddAdditionalTitles added in v0.0.3

func AddAdditionalTitles(rec *simplified.Record, title *simplified.TitleDetail) error

func AddBookTitle

func AddBookTitle(rec *simplified.Record, bookTitle string) error

func AddDate added in v0.0.5

func AddDate(rec *simplified.Record, dt *simplified.DateType) error

func AddFunder

func AddFunder(rec *simplified.Record, funder *simplified.Funder) error

func AddKeyword

func AddKeyword(rec *simplified.Record, keyword string) error

func AddPublicationDate

func AddPublicationDate(rec *simplified.Record, dt string, publicationType string) error

func AddRelatedIdentifiers added in v0.0.3

func AddRelatedIdentifiers(rec *simplified.Record, identifiers []*simplified.Identifier) error

func AddRights added in v0.0.5

func AddRights(rec *simplified.Record, rights []*simplified.Right) error

func AddSubject

func AddSubject(rec *simplified.Record, subject string) error

func AddSubjects added in v0.0.5

func AddSubjects(rec *simplified.Record, subjects []*simplified.Subject) error

func CheckWaitInterval added in v0.0.4

func CheckWaitInterval(iTime time.Time, wait time.Duration) (time.Time, bool)

CheckWaitInterval checks to see if an interval of time has been met or exceeded. It returns the remaining time interval (possibly reset) and a boolean. The boolean is true when the time interval has been met or exceeded, false otherwise.

```
tot := len(records) // calculate the total number of items to process
t0 := time.Now()
iTime := time.Now()
reportProgress := false

for i := range records {
    // ... process stuff ...
    if iTime, reportProgress = CheckWaitInterval(iTime, 30*time.Second); reportProgress {
        log.Printf("%s", ProgressETR(t0, i, tot))
    }
}
```
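A minimal sketch of how such an interval check could behave, assuming the timer is reset to the current time whenever the interval has elapsed. The function `checkWaitInterval`, its explicit `now` parameter and the helper `demoIntervals` are hypothetical stand-ins so the logic can be run deterministically; this is not the package's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// checkWaitInterval is a hypothetical stand-in for CheckWaitInterval.
// It reports whether at least wait has passed between iTime and now.
// When the interval has elapsed it returns now as the new starting
// point, otherwise it returns iTime unchanged.
func checkWaitInterval(iTime, now time.Time, wait time.Duration) (time.Time, bool) {
	if now.Sub(iTime) >= wait {
		return now, true
	}
	return iTime, false
}

// demoIntervals runs the check at fixed offsets from a start time
// using a 30 second interval and collects the boolean results.
func demoIntervals() []bool {
	start := time.Date(2023, 6, 7, 12, 0, 0, 0, time.UTC)
	iTime := start
	reports := []bool{}
	offsets := []time.Duration{10 * time.Second, 30 * time.Second, 45 * time.Second, 61 * time.Second}
	for _, offset := range offsets {
		var report bool
		iTime, report = checkWaitInterval(iTime, start.Add(offset), 30*time.Second)
		reports = append(reports, report)
	}
	return reports
}

func main() {
	fmt.Println(demoIntervals())
}
```

Passing `now` explicitly keeps the sketch testable; a real caller would likely use `time.Now()` instead.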

func CrosswalkCrossRefWork

func CrosswalkCrossRefWork(cfg *Config, work *crossrefapi.Works, resourceTypeMap map[string]string, contributorTypeMap map[string]string) (*simplified.Record, error)

CrosswalkCrossRefWork takes a Works object from the CrossRef API and maps the fields into a simplified Record struct, returning a new struct or an error.

func CrosswalkEPrintToRecord

func CrosswalkEPrintToRecord(eprint *eprinttools.EPrint, rec *simplified.Record, resourceTypes map[string]string, contributorTypes map[string]string) error

CrosswalkEPrintToRecord implements a crosswalk from an EPrint 3.x EPrint XML record (as a struct) to an Invenio RDM record (as a struct).

func FmtHelp added in v0.0.11

func FmtHelp(src string, appName string, version string, releaseDate string, releaseHash string) string

FmtHelp lets you process a text block with simple curly brace markup.

func GetEPrint added in v0.0.3

func GetEPrint(baseURL string, eprintID int, timeout time.Duration, retryCount int) (*eprinttools.EPrints, error)

GetEPrint fetches a single EPrint record via the EPrint REST API.

func GetKeys added in v0.0.3

func GetKeys(baseURL string, timeout time.Duration, retryCount int) ([]int, error)

GetKeys returns a list of eprint record ids from the EPrints REST API

func GetModifiedRecordIds

func GetModifiedRecordIds(cfg *Config, start string, end string) ([]string, error)

GetModifiedRecordIds takes a configuration object, contacts an RDM instance and returns a list of ids created, deleted or updated in the time range specified. If a problem is encountered it returns an error.

The configuration object must have the InvenioAPI and InvenioToken attributes set.

NOTE: This method relies on OAI-PMH, which is a rate limited process, so results can take quite some time.

func GetRawRecord added in v0.0.5

func GetRawRecord(cfg *Config, id string) (map[string]interface{}, error)

GetRawRecord takes a configuration object and record id, contacts an RDM instance and returns a map[string]interface{} record

```
cfg := NewConfig()
if err := cfg.LoadConfig("config.json"); err != nil {
	// ... handle error ...
}
id := "qez01-2309a"
mapRecord, err := GetRawRecord(cfg, id)
if err != nil {
	// ... handle error ...
}
```

func GetRecord

func GetRecord(cfg *Config, id string) (*simplified.Record, error)

GetRecord takes a configuration object and record id, contacts an RDM instance and returns a simplified record and an error value.

The configuration object must have the InvenioAPI and InvenioToken attributes set.

```
cfg := NewConfig()
if err := cfg.LoadConfig("config.json"); err != nil {
	// ... handle error ...
}
id := "qez01-2309a"
record, err := GetRecord(cfg, id)
if err != nil {
	// ... handle error ...
}
```

func GetRecordIds

func GetRecordIds(cfg *Config) ([]string, error)

GetRecordIds takes a configuration object, contacts an RDM instance and returns a list of ids and an error.

The configuration object must have the InvenioAPI and InvenioToken attributes set.

NOTE: This method relies on OAI-PMH, which is a rate limited process, so results can take quite some time.

func Harvest

func Harvest(cfg *Config, fName string, debug bool) error

func LoadTypesMap added in v0.0.3

func LoadTypesMap(fName string, mapTypes map[string]string) error


func ProgressETR added in v0.0.4

func ProgressETR(t0 time.Time, i int, tot int) string

ProgressETR returns a string with the percentage processed and estimated time remaining. It requires a counter of records processed, the total count of records and a time zero value.

```
tot := len(records) // calculate the total number of items to process
t0 := time.Now()
iTime := time.Now()
reportProgress := false

for i := range records {
    // ... process stuff ...
    if iTime, reportProgress = CheckWaitInterval(iTime, 30*time.Second); reportProgress {
        log.Printf("%s", ProgressETR(t0, i, tot))
    }
}
```
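One way to arrive at an estimated time remaining is sketched below. It assumes items are processed at a roughly constant rate, so the remaining time is the elapsed time per item multiplied by the number of items left. The names `estimateRemaining` and `progressETR`, the elapsed-time parameter and the output format are illustrative assumptions, not the package's own code.

```go
package main

import (
	"fmt"
	"time"
)

// estimateRemaining is a hypothetical helper. Assuming a constant
// processing rate, it returns elapsed/i * (tot - i). It returns zero
// when no estimate can be made (nothing processed yet, or done).
func estimateRemaining(elapsed time.Duration, i, tot int) time.Duration {
	if i <= 0 || tot <= i {
		return 0
	}
	perItem := elapsed / time.Duration(i)
	return perItem * time.Duration(tot-i)
}

// progressETR formats the percentage processed and the estimate.
// The output format here is an assumption made for this sketch.
func progressETR(elapsed time.Duration, i, tot int) string {
	if tot <= 0 {
		return "0.00% done, ETR unknown"
	}
	pct := float64(i) / float64(tot) * 100
	return fmt.Sprintf("%.2f%% done, ETR %v", pct, estimateRemaining(elapsed, i, tot))
}

func main() {
	// 250 of 1000 items processed after two minutes.
	fmt.Println(progressETR(2*time.Minute, 250, 1000))
}
```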

func ProgressIPS added in v0.0.4

func ProgressIPS(t0 time.Time, i int, timeUnit time.Duration) string

ProgressIPS returns a string with the elapsed time and increments per second. Takes a time zero, a counter and time unit. Returns a string with count, running time and increments per time unit.

```
t0 := time.Now()
iTime := time.Now()
reportProgress := false

for i := range records {
    // ... process stuff ...
    if iTime, reportProgress = CheckWaitInterval(iTime, 30*time.Second); reportProgress || i == 0 {
        log.Printf("%s", ProgressIPS(t0, i, time.Second))
    }
}

```

func Query

func Query(cfg *Config, q string, sort string) ([]map[string]interface{}, error)

Query takes a query string and returns the paged object results as a slice of `map[string]interface{}`

```
records, err := Query(cfg, "Geological History in Southern California", "newest")
if err != nil {
    // ... handle error ...
}

for _, rec := range records {
    // ... process results ...
}

```

func QueryCrossRefWork

func QueryCrossRefWork(cfg *Config, doi string, mailTo string, dotInitials bool, downloadDocument bool) (*crossrefapi.Works, error)

func QueryDataCite

func QueryDataCite(cfg *Config, doi string, mailTo string, dotInitials bool, downloadDocument bool, debug bool) (map[string]interface{}, error)

func SampleConfig

func SampleConfig(configFName string) ([]byte, error)

SampleConfig displays a minimal configuration for the rdmutil cli. The minimal values in the configuration are "invenio_api" url and "invenio_token" holding the access token.

```

src, err := SampleConfig("irdmtools.json")
if err != nil {
    // ... handle error ...
}
fmt.Printf("%s\n", src)

```

func SetArticleNumber

func SetArticleNumber(rec *simplified.Record, articleNo string) error

func SetContributors added in v0.0.3

func SetContributors(rec *simplified.Record, creators []*simplified.Creator) error

func SetCreators added in v0.0.3

func SetCreators(rec *simplified.Record, creators []*simplified.Creator) error

func SetDOI

func SetDOI(rec *simplified.Record, doi string) error

Wraps the simplified package with crosswalks

func SetDescription

func SetDescription(rec *simplified.Record, description string) error

func SetEdition

func SetEdition(rec *simplified.Record, edition string) error

func SetFullTextStatus

func SetFullTextStatus(rec *simplified.Record, status bool) error

func SetFunding

func SetFunding(rec *simplified.Record, funding []*simplified.Funder) error

func SetIssue added in v0.0.16

func SetIssue(rec *simplified.Record, issue string) error

func SetMonographType

func SetMonographType(rec *simplified.Record, monographType string) error

func SetPageRange

func SetPageRange(rec *simplified.Record, pageRange string) error

func SetPresentationType added in v0.0.3

func SetPresentationType(rec *simplified.Record, presentationType string) error

func SetProject

func SetProject(rec *simplified.Record, project string) error

func SetPublication

func SetPublication(rec *simplified.Record, publication string) error

func SetPublicationDate added in v0.0.5

func SetPublicationDate(rec *simplified.Record, pubDate string) error

func SetPublisher

func SetPublisher(rec *simplified.Record, publisher string) error

func SetPublisherLocation

func SetPublisherLocation(rec *simplified.Record, publisherLocation string) error

func SetReferred

func SetReferred(rec *simplified.Record, referred bool) error

func SetResourceType

func SetResourceType(rec *simplified.Record, resourceType string, resourceTypeMap map[string]string) error

func SetSeries

func SetSeries(rec *simplified.Record, series string) error

func SetTitle added in v0.0.3

func SetTitle(rec *simplified.Record, title string) error

func SetVolume

func SetVolume(rec *simplified.Record, volume string) error

Types

type Config

type Config struct {
	// If Debug is set true then methods with access to the Config object
	// can use this flag to implement additional logging to standard error
	Debug bool `json:"-"`
	// Repository Name, e.g. CaltechAUTHORS, CaltechTHESIS, CaltechDATA
	RepoName string `json:"repo_name,omitempty"`
	// InvenioAPI holds the URL to the InvenioAPI
	InvenioAPI string `json:"rdm_url,omitempty"`
	// InvenioToken holds the token string to access the API
	InvenioToken string `json:"rdmtok,omitempty"`
	// Invenio DSN holds the data source name for the Postgres database storing the invenio records
	InvenioDSN string `json:"rdm_dsn,omitempty"`
	// InvenioStorage holds the URI to the default storage of Invenio RDM objects, e.g. local file system or S3 bucket
	InvenioStorage string `json:"rdm_storage,omitempty"`
	// CName holds the dataset collection name used when harvesting content
	CName string `json:"c_name,omitempty"`
	// MailTo holds an email address to use when an email (e.g. CrossRef API access) is needed
	MailTo string `json:"mailto,omitempty"`
	// contains filtered or unexported fields
}

Config holds the common configuration used by all irdmtools

func NewConfig added in v0.0.5

func NewConfig() *Config

NewConfig generates an empty configuration struct.

func (*Config) LoadConfig

func (cfg *Config) LoadConfig(configFName string) error

LoadConfig reads the configuration file and initializes the attributes in the Config struct. It returns an error if a problem was encountered. NOTE: It does NOT merge the settings in the environment.

```

cfg := NewConfig()
if err := cfg.LoadConfig("irdmtools.json"); err != nil {
   // ... handle error ...
}
fmt.Printf("Invenio RDM API URL: %q\n", cfg.InvenioAPI)
fmt.Printf("Invenio RDM token: %q\n", cfg.InvenioToken)
fmt.Printf("Dataset Collection: %q\n", cfg.CName)
fmt.Printf("MailTo: %q\n", cfg.MailTo)

```
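The JSON keys a configuration file would need follow from the struct tags on Config. The sketch below re-declares a subset of those fields in a local `fileConfig` type (a hypothetical name) and decodes a sample document with `encoding/json`. The sample values are made up, and how LoadConfig actually reads the file is not shown in this documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileConfig mirrors a subset of the Config struct's JSON tags.
type fileConfig struct {
	RepoName     string `json:"repo_name,omitempty"`
	InvenioAPI   string `json:"rdm_url,omitempty"`
	InvenioToken string `json:"rdmtok,omitempty"`
	CName        string `json:"c_name,omitempty"`
	MailTo       string `json:"mailto,omitempty"`
}

// parseConfig decodes a JSON document into a fileConfig value.
func parseConfig(src []byte) (*fileConfig, error) {
	cfg := new(fileConfig)
	if err := json.Unmarshal(src, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// Sample values are placeholders, not real endpoints or tokens.
	src := []byte(`{
  "rdm_url": "https://rdm.example.edu",
  "rdmtok": "example-token",
  "c_name": "records.ds",
  "mailto": "jane.doe@example.edu"
}`)
	cfg, err := parseConfig(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("API: %q, collection: %q\n", cfg.InvenioAPI, cfg.CName)
}
```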

func (*Config) LoadEnv

func (cfg *Config) LoadEnv(prefix string) error

LoadEnv checks the environment for configuration values and sets those that were not previously set. It will apply a prefix to the expected environment variable names if one is provided.

```

cfg := new(Config)
if err := cfg.LoadEnv("TEST_"); err != nil {
      // ... error handle ...
}

```
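The prefix behavior described above could look roughly like the following sketch. The variable names `RDM_URL` and `RDMTOK`, the `envConfig` type and the `loadEnv` function are assumptions made for illustration; the real names are documented in the man pages. The lookup function is passed in so the logic can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
)

// envConfig is a hypothetical two-field configuration.
type envConfig struct {
	InvenioAPI   string
	InvenioToken string
}

// loadEnv fills fields that are still empty from the environment.
// The prefix is prepended to each (assumed) variable name, so a prefix
// of "TEST_" looks up TEST_RDM_URL and TEST_RDMTOK.
func loadEnv(cfg *envConfig, prefix string, lookup func(string) (string, bool)) {
	if v, ok := lookup(prefix + "RDM_URL"); ok && cfg.InvenioAPI == "" {
		cfg.InvenioAPI = v
	}
	if v, ok := lookup(prefix + "RDMTOK"); ok && cfg.InvenioToken == "" {
		cfg.InvenioToken = v
	}
}

func main() {
	os.Setenv("TEST_RDM_URL", "https://rdm.example.edu")
	os.Setenv("TEST_RDMTOK", "from-env")
	// The token is already set, so only the URL comes from the environment.
	cfg := &envConfig{InvenioToken: "from-file"}
	loadEnv(cfg, "TEST_", os.LookupEnv)
	fmt.Printf("%+v\n", *cfg)
}
```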

type Doi2Rdm

type Doi2Rdm struct {
	Cfg *Config
}

Doi2Rdm holds the configuration for doi2rdm cli.

func (*Doi2Rdm) Configure

func (app *Doi2Rdm) Configure(configFName string, envPrefix string, debug bool) error

Configure reads the configuration file and environment, initializing the Cfg attribute of a Doi2Rdm object. It returns an error if a problem was encountered.

```

app := new(irdmtools.Doi2Rdm)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
   // ... handle error ...
}
fmt.Printf("Invenio RDM API URL: %q\n", app.Cfg.InvenioAPI)
fmt.Printf("Invenio RDM token: %q\n", app.Cfg.InvenioToken)

```

func (*Doi2Rdm) Run

func (app *Doi2Rdm) Run(in io.Reader, out io.Writer, eout io.Writer, options map[string]string, doi string) error

Run implements the doi2rdm cli behaviors. With the exception of the "setup" action you should call `app.Configure()` before executing Run.

```

app := new(irdmtools.Doi2Rdm)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
   // ... handle error ...
}
// The option names are not documented here; an empty map is a placeholder.
options := map[string]string{}
doi := "10.5555/12345678" // placeholder DOI
if err := app.Run(os.Stdin, os.Stdout, os.Stderr, options, doi); err != nil {
    // ... handle error ...
}

```

type EPrint2Rdm

type EPrint2Rdm struct {
	Cfg *Config
}

EPrint2Rdm holds the configuration for eprint2rdm cli.

func (*EPrint2Rdm) Run

func (app *EPrint2Rdm) Run(in io.Reader, out io.Writer, eout io.Writer, username string, password string, host string, eprintId string, resourceTypesFName string, contributorTypesFName string, allIds bool, idList string, cName string, debug bool) error

Run implements the eprint2rdm cli behaviors.

```

	app := new(irdmtools.EPrint2Rdm)
	eprintUsername := os.Getenv("EPRINT_USERNAME")
	eprintPassword := os.Getenv("EPRINT_PASSWORD")
	eprintHost := "eprints.example.edu"
	eprintId := "11822"
	// Run takes the names of the type map files rather than loaded maps.
	resourceTypesFName := "resource-types.csv"
	contributorTypesFName := "contributor-types.csv"
	// Convert a single record: no id list, no dataset collection, no debug output.
	allIds, idList, cName, debug := false, "", "", false
	err := app.Run(os.Stdin, os.Stdout, os.Stderr,
		eprintUsername, eprintPassword,
		eprintHost, eprintId,
		resourceTypesFName, contributorTypesFName,
		allIds, idList, cName, debug)
	if err != nil {
		// ... handle error ...
	}

```

type EPrintKeysPage added in v0.0.2

type EPrintKeysPage struct {
	XMLName xml.Name `xml:"html"`
	Anchors []string `xml:"body>ul>li>a"`
}

EPrintKeysPage holds the structure of the HTML page with the EPrint IDs embedded from the EPrint REST API.
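A sketch of how a keys page could be turned into a list of integer ids with `encoding/xml`, using the struct shown above. It assumes the page is well-formed XML and that each anchor's text is an eprint id optionally followed by a slash; both points are assumptions, as is the helper name `parseKeys`.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// EPrintKeysPage is copied from the type shown above.
type EPrintKeysPage struct {
	XMLName xml.Name `xml:"html"`
	Anchors []string `xml:"body>ul>li>a"`
}

// parseKeys is a hypothetical helper. It decodes the page and converts
// each anchor's text into an int, skipping anchors that are not numeric.
func parseKeys(src []byte) ([]int, error) {
	page := new(EPrintKeysPage)
	if err := xml.Unmarshal(src, page); err != nil {
		return nil, err
	}
	ids := []int{}
	for _, anchor := range page.Anchors {
		// Assumption: anchor text looks like "123/" or "123".
		text := strings.TrimSuffix(strings.TrimSpace(anchor), "/")
		id, err := strconv.Atoi(text)
		if err != nil {
			continue
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	src := []byte(`<html><body><ul>
<li><a href='1/'>1/</a></li>
<li><a href='23/'>23/</a></li>
</ul></body></html>`)
	ids, err := parseKeys(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```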

type Hits

type Hits struct {
	Hits  []map[string]interface{} `json:"hits,omitempty"`
	Total int                      `json:"total,omitempty"`
}
type Links struct {
	Self string `json:"self,omitempty"`
	Next string `json:"next,omitempty"`
	Prev string `json:"prev,omitempty"`
}

type OAIHeader

type OAIHeader struct {
	Status     string   `xml:"status,attr,omitempty" json:"status,omitempty"`
	Identifier string   `xml:"identifier,omitempty" json:"identifier,omitempty"`
	DateStamp  string   `xml:"datestamp,omitempty" json:"datestamp,omitempty"`
	SetSpec    []string `xml:"setSpec,omitempty" json:"set_spec,omitempty"`
}

OAIHeader holds the response items for

type OAIListIdentifiers

type OAIListIdentifiers struct {
	Headers         []OAIHeader `xml:"header,omitempty" json:"header,omitempty"`
	ResumptionToken string      `xml:"resumptionToken,omitempty" json:"resumption_token,omitempty"`
}

type OAIListIdentifiersResponse

type OAIListIdentifiersResponse struct {
	XMLName         xml.Name            `xml:"OAI-PMH" json:"-"`
	XMLNS           string              `xml:"xmlns,attr,omitempty" json:"xmlns,omitempty"`
	ResponseDate    string              `xml:"responseDate,omitempty" json:"response_date,omitempty"`
	Request         string              `xml:"request,omitempty" json:"request,omitempty"`
	RequestAttr     map[string]string   `xml:"request,attr,omitempty" json:"request_attr,omitempty"`
	ListIdentifiers *OAIListIdentifiers `xml:"ListIdentifiers,omitempty" json:"list_identifiers,omitempty"`
}

OAIListIdentifiersResponse
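The sketch below shows how an OAI-PMH ListIdentifiers response could be decoded into structs shaped like the ones above. The local `oaiResponse` type is a trimmed copy (the namespace and request fields are left out), the sample XML is made up, and taking the last colon-separated part of the identifier as the record id is an assumption about the identifier format.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type OAIHeader struct {
	Status     string   `xml:"status,attr,omitempty"`
	Identifier string   `xml:"identifier,omitempty"`
	DateStamp  string   `xml:"datestamp,omitempty"`
	SetSpec    []string `xml:"setSpec,omitempty"`
}

type OAIListIdentifiers struct {
	Headers         []OAIHeader `xml:"header,omitempty"`
	ResumptionToken string      `xml:"resumptionToken,omitempty"`
}

// oaiResponse is a trimmed, hypothetical copy of OAIListIdentifiersResponse.
type oaiResponse struct {
	XMLName         xml.Name            `xml:"OAI-PMH"`
	ResponseDate    string              `xml:"responseDate,omitempty"`
	ListIdentifiers *OAIListIdentifiers `xml:"ListIdentifiers,omitempty"`
}

// sampleOAI is an invented response used only for this sketch.
const sampleOAI = `<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<responseDate>2023-06-07T00:00:00Z</responseDate>
<request verb="ListIdentifiers">https://rdm.example.edu/oai2d</request>
<ListIdentifiers>
<header><identifier>oai:rdm.example.edu:aaaaa-11111</identifier><datestamp>2023-06-01T00:00:00Z</datestamp><setSpec>user-test</setSpec></header>
<header status="deleted"><identifier>oai:rdm.example.edu:bbbbb-22222</identifier><datestamp>2023-06-02T00:00:00Z</datestamp></header>
<resumptionToken>token-123</resumptionToken>
</ListIdentifiers>
</OAI-PMH>`

// recordIDs returns the record ids and the resumption token of one page.
func recordIDs(src []byte) ([]string, string, error) {
	resp := new(oaiResponse)
	if err := xml.Unmarshal(src, resp); err != nil {
		return nil, "", err
	}
	if resp.ListIdentifiers == nil {
		return nil, "", nil
	}
	ids := []string{}
	for _, h := range resp.ListIdentifiers.Headers {
		// Assumption: the record id is the last colon-separated part.
		parts := strings.Split(h.Identifier, ":")
		ids = append(ids, parts[len(parts)-1])
	}
	return ids, strings.TrimSpace(resp.ListIdentifiers.ResumptionToken), nil
}

func main() {
	ids, token, err := recordIDs([]byte(sampleOAI))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids, token)
}
```

A non-empty resumption token means more pages are available; a harvester would repeat the request with that token until it comes back empty.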

type QueryResponse

type QueryResponse struct {
	//
	Hits   *Hits  `json:"hits,omitepmty"`
	Links  *Links `json:"links,omitempty"`
	SortBy string `json:"sortBy,omitempty"`
}

QueryResponse holds the response to /api/records?q=...
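A paged result like this is typically consumed by following the `next` link until it is absent. The sketch below does that against a throwaway test server from `net/http/httptest`. The server, its two canned pages, and the helpers `fetchAll` and `demoQuery` are assumptions for illustration only; authentication and rate limiting are left out. The struct definitions are re-declared locally.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type Hits struct {
	Hits  []map[string]interface{} `json:"hits,omitempty"`
	Total int                      `json:"total,omitempty"`
}

type Links struct {
	Self string `json:"self,omitempty"`
	Next string `json:"next,omitempty"`
	Prev string `json:"prev,omitempty"`
}

type QueryResponse struct {
	Hits   *Hits  `json:"hits,omitempty"`
	Links  *Links `json:"links,omitempty"`
	SortBy string `json:"sortBy,omitempty"`
}

// fetchAll follows links.next until it is empty, collecting all hits.
func fetchAll(client *http.Client, startURL string) ([]map[string]interface{}, error) {
	records := []map[string]interface{}{}
	for u := startURL; u != ""; {
		resp, err := client.Get(u)
		if err != nil {
			return nil, err
		}
		page := new(QueryResponse)
		err = json.NewDecoder(resp.Body).Decode(page)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		if page.Hits != nil {
			records = append(records, page.Hits.Hits...)
		}
		u = ""
		if page.Links != nil {
			u = page.Links.Next
		}
	}
	return records, nil
}

// demoQuery serves two canned pages and returns the ids it harvested.
func demoQuery() ([]string, error) {
	var srv *httptest.Server
	srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("page") == "2" {
			fmt.Fprint(w, `{"hits":{"hits":[{"id":"ccccc-33333"}],"total":3},"links":{}}`)
			return
		}
		fmt.Fprintf(w, `{"hits":{"hits":[{"id":"aaaaa-11111"},{"id":"bbbbb-22222"}],"total":3},"links":{"next":"%s/api/records?q=test&page=2"}}`, srv.URL)
	}))
	defer srv.Close()
	records, err := fetchAll(srv.Client(), srv.URL+"/api/records?q=test")
	if err != nil {
		return nil, err
	}
	ids := []string{}
	for _, rec := range records {
		if id, ok := rec["id"].(string); ok {
			ids = append(ids, id)
		}
	}
	return ids, nil
}

func main() {
	ids, err := demoQuery()
	fmt.Println(ids, err)
}
```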

type RateLimit

type RateLimit struct {
	// Limit maps to X-RateLimit-Limit
	Limit int `json:"limit,omitempty"`
	// OldLimit holds the last value of rate limit before change.
	OldLimit int `json:"-"`
	// Remaining maps to X-RateLimit-Remaining
	Remaining int `json:"remaining,omitempty"`
	// Reset maps to X-RateLimit-Reset
	Reset int `json:"reset,omitempty"`
}

RateLimit holds the values used to play nice with OAI-PMH or REST API. It normally is extracted from the response header.

func (*RateLimit) Fprintf

func (rl *RateLimit) Fprintf(out io.Writer)

func (*RateLimit) FromHeader

func (rl *RateLimit) FromHeader(header http.Header)

FromHeader takes an http.Header (e.g. http.Response.Header) and updates a rate limit struct.

```
rl := new(RateLimit)
rl.FromHeader(header)
```
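A sketch of what reading the rate limit headers could look like, using the header names given in the RateLimit field comments. The `rateLimit` type, its `fromHeader` method and the `headerInt` helper are hypothetical stand-ins; treating a missing or malformed header as zero is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// rateLimit is a hypothetical stand-in for the RateLimit struct.
type rateLimit struct {
	Limit     int
	Remaining int
	Reset     int
}

// headerInt returns the named header as an int, or zero when the
// header is missing or not a number (an assumption of this sketch).
func headerInt(header http.Header, name string) int {
	n, err := strconv.Atoi(header.Get(name))
	if err != nil {
		return 0
	}
	return n
}

// fromHeader copies the X-RateLimit-* values into the struct.
func (rl *rateLimit) fromHeader(header http.Header) {
	rl.Limit = headerInt(header, "X-RateLimit-Limit")
	rl.Remaining = headerInt(header, "X-RateLimit-Remaining")
	rl.Reset = headerInt(header, "X-RateLimit-Reset")
}

// demoRateLimit builds a sample header and parses it.
func demoRateLimit() rateLimit {
	header := http.Header{}
	header.Set("X-RateLimit-Limit", "5000")
	header.Set("X-RateLimit-Remaining", "4999")
	header.Set("X-RateLimit-Reset", "1686182400")
	rl := new(rateLimit)
	rl.fromHeader(header)
	return *rl
}

func main() {
	fmt.Printf("%+v\n", demoRateLimit())
}
```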

func (*RateLimit) FromResponse

func (rl *RateLimit) FromResponse(resp *http.Response)

FromResponse takes an http.Response struct and extracts the header values realated to rate limits (e.g. X-RateLite-Limit)

```

rl := new(RateLimit)
rl.FromResponse(response)

```

func (*RateLimit) ResetString

func (rl *RateLimit) ResetString() string

func (*RateLimit) String

func (rl *RateLimit) String() string

func (*RateLimit) Throttle

func (rl *RateLimit) Throttle(i int, tot int)

Throttle looks at the rate limit structure and implements an appropriate sleep time based on rate limits.

```

i, tot := 0, 1000 // the i-th iteration and the total number of records
rl := new(RateLimit)
// Set the rate limit from the most recent HTTP response
rl.FromResponse(response)
rl.Throttle(i, tot)

```
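The source does not spell out how the sleep time is chosen. One plausible approach, sketched below under stated assumptions, is to spread the remaining requests evenly over the time left until the limit resets. The `waitFor` function is hypothetical, and the sketch assumes the reset value is a Unix timestamp in seconds.

```go
package main

import (
	"fmt"
	"time"
)

// waitFor returns how long to pause before the next request.
// resetEpoch and nowEpoch are Unix timestamps in seconds (an assumption).
//   - If the reset time has already passed, no wait is needed.
//   - If no requests remain, wait for the full time until reset.
//   - Otherwise divide the time until reset by the remaining requests.
func waitFor(remaining int, resetEpoch int64, nowEpoch int64) time.Duration {
	untilReset := time.Duration(resetEpoch-nowEpoch) * time.Second
	if untilReset <= 0 {
		return 0
	}
	if remaining <= 0 {
		return untilReset
	}
	return untilReset / time.Duration(remaining)
}

func main() {
	now := time.Now().Unix()
	// Ten requests left and a reset one minute from now.
	wait := waitFor(10, now+60, now)
	fmt.Printf("would sleep %s before the next request\n", wait)
}
```

Passing the current time in as a parameter keeps the calculation deterministic, which makes it easier to test than a version that calls `time.Now()` internally.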

func (*RateLimit) TimeToReset

func (rl *RateLimit) TimeToReset() (time.Duration, time.Time)

func (*RateLimit) TimeToWait

func (rl *RateLimit) TimeToWait(unit time.Duration) time.Duration

TimeToWait returns the time to wait (as a time.Duration) in order to avoid an HTTP 429 (Too Many Requests) status code.

```

rl := new(RateLimit)
rl.FromHeader(response.Header)
timeToWait := rl.TimeToWait(time.Second)
time.Sleep(timeToWait)

```

type RdmUtil

type RdmUtil struct {
	Cfg *Config
}

RdmUtil holds the configuration for the rdmutil CLI.

func (*RdmUtil) Configure

func (app *RdmUtil) Configure(configFName string, envPrefix string, debug bool) error

Configure reads the configuration file and environment, initializing the Cfg attribute of an RdmUtil object. It returns an error if a problem was encountered.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
fmt.Printf("Invenio RDM API URL: %q\n", app.Cfg.InvenioAPI)
fmt.Printf("Invenio RDM token: %q\n", app.Cfg.InvenioToken)

```

func (*RdmUtil) GetModifiedIds

func (app *RdmUtil) GetModifiedIds(start string, end string) ([]byte, error)

GetModifiedIds returns a byte slice for a JSON encoded list of record ids modified (created, updated, deleted) in the given time range. If a problem occurs an error is returned.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
src, err := app.GetModifiedIds("2020-01-01", "2020-12-31")
if err != nil {
	// ... handle error ...
}
fmt.Printf("%s\n", src)

```

func (*RdmUtil) GetRawRecord added in v0.0.5

func (app *RdmUtil) GetRawRecord(id string) ([]byte, error)

GetRawRecord returns a byte slice for a JSON encoded record as a `map[string]interface{}` retrieved from the RDM API.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
recordId := "woie-x0121"
src, err := app.GetRawRecord(recordId)
if err != nil {
	// ... handle error ...
}
}
fmt.Printf("%s\n", src)

```

func (*RdmUtil) GetRecord

func (app *RdmUtil) GetRecord(id string) ([]byte, error)

GetRecord returns a byte slice for a JSON encoded record or an error.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
recordId := "woie-x0121"
src, err := app.GetRecord(recordId)
if err != nil {
	// ... handle error ...
}
}
fmt.Printf("%s\n", src)

```

func (*RdmUtil) GetRecordIds

func (app *RdmUtil) GetRecordIds() ([]byte, error)

GetRecordIds returns a byte slice for a JSON encoded list of record ids or an error.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
src, err := app.GetRecordIds()
if err != nil {
	// ... handle error ...
}
fmt.Printf("%s\n", src)

```

func (*RdmUtil) Harvest

func (app *RdmUtil) Harvest(fName string) error

Harvest takes a JSON file containing a list of record ids and harvests them into a dataset v2 collection. The dataset collection must exist and be configured in either the environment or the configuration file.
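For illustration, the input file is a JSON array of record ids. The sketch below shows how such a file might be read and iterated. The `parseIDs` and `readIDs` helpers are hypothetical, and the step that fetches and stores each record is left as a comment because it depends on the RDM API and the dataset collection.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// parseIDs decodes a JSON array of record ids, e.g. ["woie-x0121", "wx0w-2231"].
func parseIDs(src []byte) ([]string, error) {
	ids := []string{}
	if err := json.Unmarshal(src, &ids); err != nil {
		return nil, err
	}
	return ids, nil
}

// readIDs reads the named file and decodes it as a JSON array of record ids.
func readIDs(fName string) ([]string, error) {
	src, err := os.ReadFile(fName)
	if err != nil {
		return nil, err
	}
	return parseIDs(src)
}

func main() {
	// In practice the ids would come from a file, e.g. readIDs("record_ids.json").
	ids, err := parseIDs([]byte(`["woie-x0121", "wx0w-2231"]`))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i, id := range ids {
		// A real harvester would fetch the record and store it in the collection here.
		fmt.Printf("(%d/%d) would harvest %s\n", i+1, len(ids), id)
	}
}
```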

func (*RdmUtil) Query

func (app *RdmUtil) Query(q string, sort string) ([]byte, error)

Query returns a byte slice for a JSON encoded list of record summaries or an error.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
src, err := app.Query("My favorite book", "newest")
if err != nil {
	// ... handle error ...
}
fmt.Printf("%s\n", src)

```

func (*RdmUtil) Run

func (app *RdmUtil) Run(in io.Reader, out io.Writer, eout io.Writer, action string, params []string) error

Run implements the rdmutil cli behaviors. With the exception of the "setup" action you should call `app.Configure()` before executing Run.

```

app := new(irdmtools.RdmUtil)
if err := app.Configure("irdmtools.json", "TEST_", false); err != nil {
	// ... handle error ...
}
recordId := "wx0w-2231"
// Run writes its results to the provided output writer and returns only an error.
err := app.Run(os.Stdin, os.Stdout, os.Stderr,
	"get_record", []string{recordId})
if err != nil {
	// ... handle error ...
}

```

Directories

Path Synopsis
cmd
doi2rdm
doi2rdm is a command line program for harvesting DOI metadata from CrossRef and DataCite, returning a JSON document suitable for import into Invenio RDM.
eprint2rdm
eprint2rdm is a command line program for harvesting an EPrint metadata record and returning an Invenio RDM style record.
rdmutil
rdmutil is a command line program for working with Invenio RDM.
