spn

v1.0.2
Published: Nov 18, 2024 License: AGPL-3.0 Imports: 10 Imported by: 0

README

gospn

Save Page Now client in Go

Set environment variable GOSPN_DEBUG to 1 to enable debug output.

Usage

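c, err := spn.Init("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
if err != nil {
    log.Fatal(err) // do not discard the Init error
}
defer c.Close()

options := spn.CaptureOptions{
    SkipFirstArchive:    true,
    IfNotArchivedWithin: "3d",
    // ... more options ...
}
url := "https://example.com"
captureResp, err := c.Capture(url, options)
if err != nil {
    log.Printf("capture failed: %v", err)
    return
}
fmt.Println(captureResp.JobID)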

Some possible capture responses:

{"url":"https://example.com/","job_id":"spn2-0123456789abcdef0123456789abcdef12345678"}
{"message": "Cannot resolve host nxdomain.fake.tld.", "status": "error", "status_ext": "error:invalid-host-resolution"}
{"url":"https://example.com/","job_id":null,"message":"The same snapshot had been made 3 minutes ago. You can make new capture of this URL after 2 hours."}
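The three responses above decode into the same struct shape, so a caller has to look at the fields to tell them apart. The sketch below is one possible way to do that. It redeclares the documented CaptureResponse fields locally so that it runs on its own, and `classify` is a hypothetical helper, not part of this package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// captureResponse mirrors the documented CaptureResponse fields.
// It is redeclared here only so the sketch is self-contained.
type captureResponse struct {
	URL       string `json:"url"`
	JobID     string `json:"job_id"`
	Status    string `json:"status"`
	StatusExt string `json:"status_ext"`
	Message   string `json:"message"`
}

// classify is a hypothetical helper that labels a raw response body.
func classify(raw string) (string, error) {
	var r captureResponse
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		return "", err
	}
	switch {
	case r.Status == "error":
		return "error: " + r.StatusExt, nil
	case r.JobID == "":
		// A JSON null job_id leaves the Go string empty.
		return "skipped: " + r.Message, nil
	default:
		return "queued: " + r.JobID, nil
	}
}

func main() {
	samples := []string{
		`{"url":"https://example.com/","job_id":"spn2-0123456789abcdef0123456789abcdef12345678"}`,
		`{"message": "Cannot resolve host nxdomain.fake.tld.", "status": "error", "status_ext": "error:invalid-host-resolution"}`,
		`{"url":"https://example.com/","job_id":null,"message":"The same snapshot had been made 3 minutes ago. You can make new capture of this URL after 2 hours."}`,
	}
	for _, s := range samples {
		label, err := classify(s)
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Println(label)
	}
}
```

Because `job_id` is a plain string in the struct, a null job ID and a missing one both end up as the empty string, so checking `JobID == ""` covers both cases.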

Note: Capture() returns as soon as the request has been sent to the Save Page Now API. The actual capture may take a while to complete. Use the GetCaptureStatus() method to check the status of the capture job.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CaptureOptions

type CaptureOptions struct {
	// Capture a web page with errors (HTTP status=4xx or 5xx). By default SPN2 captures only status=200 URLs.
	CaptureAll bool `spn:"capture_all"`
	// Capture web page outlinks automatically. This also applies to PDF, JSON, RSS and MRSS feeds.
	CaptureOutlinks int `spn:"capture_outlinks"`
	// Capture full page screenshot in PNG format. This is also stored in the Wayback Machine as a different capture.
	CaptureScreenshot bool `spn:"capture_screenshot"`
	// The capture becomes available in the Wayback Machine after ~12 hours instead of immediately. This option helps reduce the load on our systems. All API responses remain exactly the same when using this option.
	DelayWBAvailability bool `spn:"delay_wb_availability"`
	// Force the use of a simple HTTP GET request to capture the target URL. By default SPN2 does an HTTP HEAD on the target URL to decide whether to use a headless browser or a simple HTTP GET request. force_get overrides this behavior.
	ForceGet bool `spn:"force_get"`
	// Skip checking whether this is the first capture of the URL if you don’t need this information. This will make captures run faster.
	SkipFirstArchive bool `spn:"skip_first_archive"`
	// if_not_archived_within=<timedelta>
	//
	// Capture web page only if the latest existing capture at the Archive is older than the <timedelta> limit. Its format can be any datetime expression like “3d 5h 20m” or just a number of seconds, e.g. “120”. If there is a capture within the defined timedelta, SPN2 returns that as a recent capture. The default system <timedelta> is 45 min.
	//
	// if_not_archived_within=<timedelta1>,<timedelta2>
	//
	// When using 2 comma separated <timedelta> values, the first one applies to the main capture and the second one applies to outlinks.
	IfNotArchivedWithin string `spn:"if_not_archived_within"`
	// Return the timestamp of the last capture for all outlinks.
	OutlinksAvailability bool `spn:"outlinks_availability"`
	// Send an email report of the captured URLs to the user’s email.
	EmailResult bool `spn:"email_result"`
	// Run JS code for <N> seconds after page load to trigger target page functionality like image loading on mouse over, scroll down to load more content, etc. The default system <N> is 5 sec.
	//
	// More details on the JS code we execute:
	// https://github.com/internetarchive/brozzler/blob/master/brozzler/behaviors.yaml
	//
	// WARNING: The max <N> value that applies is 30 sec.
	//
	// NOTE: If the target page doesn’t have any JS you need to run, you can use js_behavior_timeout=0 to speed up the capture.
	JsBehaviorTimeout string `spn:"js_behavior_timeout"` // It's hard to determine if int 0 is user input or default value, so we use string instead
	// Use extra HTTP Cookie value when capturing the target page.
	CaptureCookie string `spn:"capture_cookie"`
	// Use custom HTTP User-Agent value when capturing the target page.
	UseUserAgent string `spn:"use_user_agent"`

	// target_username=<XXX>
	// target_password=<YYY>
	//
	// Use your own username and password in the target page’s login forms.
	TargetUsername string `spn:"target_username"`
	// target_username=<XXX>
	// target_password=<YYY>
	//
	// Use your own username and password in the target page’s login forms.
	TargetPassword string `spn:"target_password"`
}

func (CaptureOptions) Encode

func (opts CaptureOptions) Encode() url.Values

Encode converts CaptureOptions to url.Values.
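The `spn:"..."` struct tags suggest that the form keys are derived from the tags. The sketch below shows one way such an encoder could work using reflection. It is not the package's actual implementation: `demoOptions` and `encodeTagged` are hypothetical, and the choices to send true booleans as "1" and to skip zero values are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"reflect"
	"strconv"
)

// demoOptions is a hypothetical stand-in for a few CaptureOptions fields.
type demoOptions struct {
	CaptureAll          bool   `spn:"capture_all"`
	CaptureOutlinks     int    `spn:"capture_outlinks"`
	IfNotArchivedWithin string `spn:"if_not_archived_within"`
	JsBehaviorTimeout   string `spn:"js_behavior_timeout"`
}

// encodeTagged walks the struct fields and emits only non-zero values,
// keyed by the "spn" tag. Assumption: true booleans are sent as "1".
func encodeTagged(v any) url.Values {
	out := url.Values{}
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		key := rt.Field(i).Tag.Get("spn")
		if key == "" {
			continue
		}
		f := rv.Field(i)
		switch f.Kind() {
		case reflect.Bool:
			if f.Bool() {
				out.Set(key, "1")
			}
		case reflect.Int:
			if f.Int() != 0 {
				out.Set(key, strconv.FormatInt(f.Int(), 10))
			}
		case reflect.String:
			if f.String() != "" {
				out.Set(key, f.String())
			}
		}
	}
	return out
}

func main() {
	opts := demoOptions{CaptureAll: true, IfNotArchivedWithin: "3d", JsBehaviorTimeout: "0"}
	// url.Values.Encode sorts keys alphabetically.
	fmt.Println(encodeTagged(opts).Encode())
	// prints "capture_all=1&if_not_archived_within=3d&js_behavior_timeout=0"
}
```

The example also illustrates why JsBehaviorTimeout is a string: the value "0" survives a skip-if-zero encoder, whereas an int 0 would be indistinguishable from an unset field.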

type CaptureResponse

type CaptureResponse struct {
	URL       string `json:"url"`
	JobID     string `json:"job_id"`
	Status    string `json:"status"`
	StatusExt string `json:"status_ext"`
	Message   string `json:"message"`
}

CaptureResponse represents the JSON response SPN returns when a capture is executed.

type CaptureStatus

type CaptureStatus struct {
	Timestamp   string   `json:"timestamp"`
	DurationSec float64  `json:"duration_sec"`
	OriginalURL string   `json:"original_url"`
	Status      string   `json:"status"`
	StatusExt   string   `json:"status_ext"`
	JobID       string   `json:"job_id"`
	Outlinks    []string `json:"outlinks"`
	Resources   []string `json:"resources"`
	Exception   string   `json:"exception"`
	Message     string   `json:"message"`
}

CaptureStatus represents the data returned by the /save/status/{job_id} endpoint.

type Connector

type Connector struct {
	AccessKey  string
	SecretKey  string
	HTTPClient *http.Client
	// contains filtered or unexported fields
}

Connector represents the data needed to execute SPN requests.

func Init

func Init(accessKey, secretKey string) (Connector, error)

Init initializes the SPN connector that can be used to trigger archiving of a URL.

func (Connector) Capture

func (c Connector) Capture(URL string, options CaptureOptions) (captureResponse CaptureResponse, err error)

Capture executes a capture via https://web.archive.org/save and returns the response. Options for the capture can be specified when calling the method.

func (*Connector) Close

func (c *Connector) Close()

func (Connector) GetAvailableCaptureSlot

func (c Connector) GetAvailableCaptureSlot() (err error)

GetAvailableCaptureSlot waits until a capture slot is available.

func (Connector) GetCaptureStatus

func (c Connector) GetCaptureStatus(jobID string) (captureStatus CaptureStatus, err error)

GetCaptureStatus retrieves information about an SPN job.

func (Connector) GetUserStatus

func (c Connector) GetUserStatus() (userStatus UserStatus, err error)

GetUserStatus retrieves the user status for a given SPN account.

type UserStatus

type UserStatus struct {
	DailyCaptures      int `json:"daily_captures"`
	DailyCapturesLimit int `json:"daily_captures_limit"`
	Available          int `json:"available"`
	Processing         int `json:"processing"`
}

UserStatus represents the data returned by the /save/status/user endpoint.
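A caller might inspect these fields before submitting a capture. The sketch below decodes a made-up payload into a local copy of the struct and applies a simple check. `canCapture` is a hypothetical helper, the JSON values are illustrative only, and reading `available` as the number of free capture slots is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// userStatus mirrors the documented UserStatus fields.
type userStatus struct {
	DailyCaptures      int `json:"daily_captures"`
	DailyCapturesLimit int `json:"daily_captures_limit"`
	Available          int `json:"available"`
	Processing         int `json:"processing"`
}

// canCapture is a hypothetical check: a slot is free (assumed meaning of
// Available) and the daily quota has not been used up.
func canCapture(s userStatus) bool {
	return s.Available > 0 && s.DailyCaptures < s.DailyCapturesLimit
}

func main() {
	// Illustrative payload, not a recorded API response.
	raw := `{"daily_captures":10,"daily_captures_limit":100,"available":3,"processing":2}`
	var s userStatus
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("can capture:", canCapture(s))
}
```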

func (*UserStatus) Update

func (to *UserStatus) Update(from UserStatus)
