pkpindex

command

v0.3.16 | Published: Nov 6, 2024 | License: GPL-3.0 | Imports: 15 | Imported by: 0

Repository

github.com/miku/metha

README

PKP Journal info

NOTE: As of 2022-01-01 the PKP Index is retired and no longer maintained.

https://pkp.sfu.ca/2021/10/05/pkp-index-retiring-as-of-january-1-2022/

Extract basic journal info from the PKP Index (https://index.pkp.sfu.ca/).
As of 2020-02-23, the index listed 5024 entries.
TODO: check whether there is a database dump or a real API.

$ make
$ ./pkpindex

Output is JSON lines (the OAI endpoint is guessed):

{
  "name": "Scholarly and Research Communication",
  "homepage": "http://src-online.ca/index.php/src",
  "oai": "http://src-online.ca/index.php/src/oai"
}
{
  "name": "Stream: Culture/Politics/Technology",
  "homepage": "http://journals.sfu.ca/stream/index.php/stream",
  "oai": "http://journals.sfu.ca/stream/index.php/stream/oai"
}
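In the two records above, the guessed `oai` value is simply the homepage with `/oai` appended. A minimal Go sketch of that guess follows. The `Journal` type and the `guessOAI` helper are hypothetical names used for illustration, not taken from pkpindex.go, and the resulting endpoint is not verified to exist.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Journal mirrors the JSON fields shown above; the type name is hypothetical.
type Journal struct {
	Name     string `json:"name"`
	Homepage string `json:"homepage"`
	OAI      string `json:"oai"`
}

// guessOAI appends "/oai" to the homepage, avoiding a double slash.
// This is only a guess; nothing checks that the endpoint responds.
func guessOAI(homepage string) string {
	return strings.TrimRight(homepage, "/") + "/oai"
}

func main() {
	j := Journal{
		Name:     "Scholarly and Research Communication",
		Homepage: "http://src-online.ca/index.php/src",
	}
	j.OAI = guessOAI(j.Homepage)
	b, err := json.Marshal(j)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // one JSON document per line
}
```

A guessed endpoint could be confirmed later by sending an OAI-PMH `Identify` request to it, which this sketch does not attempt.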

Additional ideas:

Check whether a journal site is part of a bigger installation (move one path element up and pattern match).
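The "move one path element up" step could be sketched as below. This is an assumption about how the idea might be implemented, not code from the tool; `parentURL` is a hypothetical helper, and the pattern matching against the parent page is left out.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// parentURL returns rawurl with its last path element removed. ok is false
// if the URL cannot be parsed or has no path element left to remove.
func parentURL(rawurl string) (parent string, ok bool) {
	u, err := url.Parse(rawurl)
	if err != nil {
		return "", false
	}
	p := path.Clean("/" + u.Path)
	if p == "/" {
		return "", false // already at the site root
	}
	u.Path = path.Dir(p)
	return u.String(), true
}

func main() {
	// Homepage taken from the sample output above.
	parent, ok := parentURL("http://journals.sfu.ca/stream/index.php/stream")
	fmt.Println(parent, ok)
}
```

The returned candidate would still need to be fetched and inspected to decide whether it really hosts several journals.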

Documentation

Overview

Small utility to get journal info from https://index.pkp.sfu.ca, which currently includes 1264043 records indexed from 4960 publications.

https://pkp.sfu.ca/2015/10/23/introducing-the-pkp-index/

Notes.

The index page does not return a 404 for an invalid page number, so the maximum page needs to be set manually for now. Pagination seems to require more, maybe cookies.

Pagination is broken: a direct link, even with a custom UA and a cookie, always ends up at the first page; probably a bit too much JS.

Fetch each journal info page, e.g. https://index.pkp.sfu.ca/index.php/browse/archiveInfo/5421. Non-existent pages redirect to the homepage, not via HTTP 3XX but via a "Refresh" header (http://www.otsukare.info/2015/03/26/refresh-http-header).
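Since Go's HTTP client only follows 3XX redirects, a "Refresh" redirect has to be detected by inspecting the response header. A minimal sketch, assuming the header value has the common `<seconds>; url=<target>` shape (the exact value sent by the index is not recorded here); `parseRefresh` and `isMissingPage` are hypothetical names:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseRefresh extracts the target from a Refresh header value such as
// "0; url=https://example.org/". ok is false if no url= part is present.
func parseRefresh(value string) (target string, ok bool) {
	parts := strings.SplitN(value, ";", 2)
	if len(parts) != 2 {
		return "", false
	}
	rest := strings.TrimSpace(parts[1])
	if len(rest) > 4 && strings.EqualFold(rest[:4], "url=") {
		return strings.TrimSpace(rest[4:]), true
	}
	return "", false
}

// isMissingPage treats any response carrying a Refresh header as a
// non-existent journal page. This is an assumption based on the note above.
func isMissingPage(h http.Header) bool {
	return h.Get("Refresh") != ""
}

func main() {
	// Illustrative header value, not captured from the real site.
	h := http.Header{}
	h.Set("Refresh", "0; url=https://index.pkp.sfu.ca/")
	target, ok := parseRefresh(h.Get("Refresh"))
	fmt.Println(isMissingPage(h), target, ok)
}
```

In a crawl loop, `isMissingPage(resp.Header)` would be checked after each request and the page skipped when it returns true.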

Certainly, a site with character.

<div id="content"> <h3>Revista de Psicologia del Deporte</h3> <p class="archiveLinks"><a href="https://index.pkp.sfu.ca/index.php/browse/index/37">Browse Records</a>  |  <a href="http://rpd-online.com" target="_blank">Journal Website</a>  |  <a href="http://rpd-online.com/issue/current" target="_blank">Current Issue</a>  |  <a href="http://rpd-online.com/issue/archive" target="_blank">All Issues</a></p>

Let's use pup (https://github.com/ericchiang/pup):

cat page-000281.html | pup 'h3 text{}' # Journal of Modern Materials
cat page-000281.html | pup 'p.archiveLinks > a:nth-child(2) attr{href}' # https://journals.aijr.in/index.php/jmm/index
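The same two fields could also be pulled out in Go. The sketch below uses regular expressions tailored to the fragment shown above, purely to stay within the standard library; it is not how pkpindex.go is known to work, and a real HTML parser would be more robust. Unlike the pup selector, which picks the second link by position, it picks the link labeled "Journal Website". `extract` is a hypothetical helper, and HTML entities are not decoded.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Patterns fitted to the archiveInfo fragment; not a general HTML parser.
	nameRe     = regexp.MustCompile(`<h3>([^<]*)</h3>`)
	homepageRe = regexp.MustCompile(`<a href="([^"]+)"[^>]*>Journal Website</a>`)
)

// extract returns the journal name and homepage found in page.
// ok is false if either field is missing.
func extract(page string) (name, homepage string, ok bool) {
	n := nameRe.FindStringSubmatch(page)
	h := homepageRe.FindStringSubmatch(page)
	if n == nil || h == nil {
		return "", "", false
	}
	return strings.TrimSpace(n[1]), h[1], true
}

func main() {
	page := `<div id="content"> <h3>Revista de Psicologia del Deporte</h3> ` +
		`<p class="archiveLinks"><a href="https://index.pkp.sfu.ca/index.php/browse/index/37">Browse Records</a>  |  ` +
		`<a href="http://rpd-online.com" target="_blank">Journal Website</a></p>`
	name, homepage, ok := extract(page)
	fmt.Println(name, homepage, ok)
}
```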

Source Files

pkpindex.go
