scraper

package
v0.0.0-...-54a1e58
Published: Dec 24, 2023 License: MIT Imports: 18 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CreateNewLiveVodsPriorityQueue

func CreateNewLiveVodsPriorityQueue() *liveVodsPriorityQueue

CreateNewLiveVodsPriorityQueue returns a priority queue of live VODs that makes it easy to fetch the VOD with the oldest LastUpdated time.
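One common way to get "oldest first" behavior in Go is a min-heap built on container/heap. The sketch below is an assumption about the general approach, not the package's actual implementation; the vod type and the names oldestFirst, pushVod and popOldest are hypothetical and exist only for illustration.

```go
package main

import (
	"container/heap"
	"fmt"
)

// vod is a hypothetical stand-in for LiveVod with only the fields needed here.
type vod struct {
	StreamId        string
	LastUpdatedUnix int64
}

// oldestFirst implements heap.Interface so that the VOD with the smallest
// LastUpdatedUnix is always at the root of the heap.
type oldestFirst []*vod

func (q oldestFirst) Len() int           { return len(q) }
func (q oldestFirst) Less(i, j int) bool { return q[i].LastUpdatedUnix < q[j].LastUpdatedUnix }
func (q oldestFirst) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *oldestFirst) Push(x any)        { *q = append(*q, x.(*vod)) }
func (q *oldestFirst) Pop() any {
	old := *q
	n := len(old)
	item := old[n-1]
	old[n-1] = nil // drop the reference so it can be garbage collected
	*q = old[:n-1]
	return item
}

// pushVod adds a VOD while keeping the heap ordering intact.
func pushVod(q *oldestFirst, streamId string, lastUpdatedUnix int64) {
	heap.Push(q, &vod{StreamId: streamId, LastUpdatedUnix: lastUpdatedUnix})
}

// popOldest removes and returns the VOD with the oldest LastUpdatedUnix.
func popOldest(q *oldestFirst) *vod {
	return heap.Pop(q).(*vod)
}

func main() {
	q := &oldestFirst{}
	pushVod(q, "b", 300)
	pushVod(q, "a", 100)
	pushVod(q, "c", 200)
	for q.Len() > 0 {
		v := popOldest(q)
		fmt.Println(v.StreamId, v.LastUpdatedUnix)
	}
}
```

The same shape would work for a queue ordered by last interaction time by changing only the Less method.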

func CreateNewOldVodQueue

func CreateNewOldVodQueue() *oldVodsPriorityQueue

func CreateNewWaitVodsPriorityQueue

func CreateNewWaitVodsPriorityQueue() *waitVodsPriorityQueue

CreateNewWaitVodsPriorityQueue returns a priority queue of wait VODs that makes it easy to fetch the VOD with the oldest LastInteraction time.

func RunScraper

func RunScraper(ctx context.Context, databaseUrl string, evictionRatio float64, params RunScraperParams) error

databaseUrl is the Postgres database to connect to. evictionRatio should be at least 1. Live VODs are selected if they were last updated at most evictionRatio * (liveVodEvictionThreshold + waitVodEvictionThreshold) before the newest live VOD. params are the parameters for the Twitch GraphQL scraper.
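As a worked example of the selection window described above, the following sketch computes evictionRatio * (liveVodEvictionThreshold + waitVodEvictionThreshold) and checks whether a VOD falls inside it. The function names selectionWindow and isSelected, and the example thresholds, are hypothetical; how RunScraper actually performs the selection (for example in SQL) is not shown in this documentation.

```go
package main

import (
	"fmt"
	"time"
)

// Example thresholds, chosen only for illustration.
const (
	exampleLiveThreshold = 10 * time.Minute
	exampleWaitThreshold = 20 * time.Minute
)

// selectionWindow returns evictionRatio * (liveThreshold + waitThreshold).
func selectionWindow(evictionRatio float64, liveThreshold, waitThreshold time.Duration) time.Duration {
	return time.Duration(evictionRatio * float64(liveThreshold+waitThreshold))
}

// isSelected reports whether a VOD last updated at updatedUnix lies within
// window of the newest live VOD's update time (both in Unix seconds).
func isSelected(updatedUnix, newestUnix int64, window time.Duration) bool {
	return newestUnix-updatedUnix <= int64(window/time.Second)
}

func main() {
	window := selectionWindow(1.5, exampleLiveThreshold, exampleWaitThreshold)
	fmt.Println("window:", window)

	const newest = int64(10_000)
	fmt.Println("updated 2000s before newest:", isSelected(8_000, newest, window))
	fmt.Println("updated 3000s before newest:", isSelected(7_000, newest, window))
}
```

With a ratio of 1.5 and thresholds of 10 and 20 minutes, the window is 45 minutes, so a VOD updated 2000 seconds before the newest one is kept and one updated 3000 seconds before it is not.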

func RunScraperForever

func RunScraperForever(ctx context.Context, scraperDuration time.Duration, databaseUrl string, evictionRatio float64, params RunScraperParams)

func ScrapeTwitchLiveVodsWithGqlApi

func ScrapeTwitchLiveVodsWithGqlApi(ctx context.Context, params ScrapeTwitchLiveVodsWithGqlApiParams) error

This function scares me. It scrapes the Twitch GraphQL API and fetches .m3u8 files for streams that finish. If a Twitch GraphQL API request fails, it does not exit; it resets the cursor and starts over. Results are stored in a database with concurrent updates, so use a queries struct that is safe for concurrent use. If any database query or modification returns an error, the function finishes and cleans up all resources.
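The error-handling policy described above (reset the cursor on a failed API request, stop on a database error) can be sketched as a simple loop. This is a simplified illustration under stated assumptions, not the function's real control flow: fetchPage, storePage, scrape and demo are hypothetical names, and the real function also fetches .m3u8 files, runs concurrently, and rate-limits its requests.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fetchPage and storePage are hypothetical stand-ins for one paginated API
// request and one database write.
type fetchPage func(ctx context.Context, cursor string) (items []string, next string, err error)
type storePage func(ctx context.Context, items []string) error

var errDB = errors.New("database unavailable")

// scrape pages through results until ctx is cancelled or a store call fails.
func scrape(ctx context.Context, fetch fetchPage, store storePage) error {
	cursor := ""
	for {
		if err := ctx.Err(); err != nil {
			return err
		}
		items, next, err := fetch(ctx, cursor)
		if err != nil {
			cursor = "" // request failed: reset the cursor and start over
			continue
		}
		if err := store(ctx, items); err != nil {
			return fmt.Errorf("store: %w", err) // database errors end the scrape
		}
		cursor = next
	}
}

// demo runs scrape against fakes: the second fetch fails and the third
// store call returns a database error.
func demo() (fetchCalls int, cursorsSeen []string, err error) {
	fetch := func(ctx context.Context, cursor string) ([]string, string, error) {
		fetchCalls++
		cursorsSeen = append(cursorsSeen, cursor)
		if fetchCalls == 2 {
			return nil, "", errors.New("request failed")
		}
		return []string{"stream"}, fmt.Sprintf("page%d", fetchCalls), nil
	}
	stores := 0
	store := func(ctx context.Context, items []string) error {
		stores++
		if stores == 3 {
			return errDB
		}
		return nil
	}
	err = scrape(context.Background(), fetch, store)
	return fetchCalls, cursorsSeen, err
}

func main() {
	calls, cursors, err := demo()
	fmt.Println("fetch calls:", calls)
	fmt.Printf("cursors seen: %q\n", cursors)
	fmt.Println("error:", err)
}
```

In the demo, the cursor passed to the third fetch is empty again because the second fetch failed, and the loop only returns once the fake database reports an error.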

func SetBoxArtWidthHeight

func SetBoxArtWidthHeight(boxArtUrl string, width int, height int) string

func SetProfileImageWidth

func SetProfileImageWidth(profileImageUrl string, width int) string

Types

type LiveVod

type LiveVod struct {
	StreamerId           string
	StreamId             string
	StartTimeUnix        int64
	StreamerLoginAtStart string
	GameIdAtStart        string
	MaxViews             int
	LastUpdatedUnix      int64 // time twitchgql request completed
	LastInteractionUnix  int64 // last time interacted with (e.g. time twitchgql request completed or time sql fetch completed)
}

func (*LiveVod) GetVideoData

func (vod *LiveVod) GetVideoData() *vods.VideoData

type RunScraperParams

type RunScraperParams struct {
	// In any interval of this length, the Twitch Helix API is called at most twice, and on average once.
	TwitchHelixFetcherDelay time.Duration
	// Time limit for .m3u8 and Twitch Helix requests. If this is exceeded in the TwitchGQL loop, the for-loop continues. TODO: I should fix this.
	RequestTimeLimit time.Duration
	// If a VOD in the queue of live VODs has a last updated time older than this, it is moved out of the live VODs queue.
	LiveVodEvictionThreshold time.Duration
	// If a VOD in the queue of wait VODs has a last interaction time older than this, it is moved out of the wait VODs queue.
	WaitVodEvictionThreshold time.Duration
	// The queue of old VODs for fetching .m3u8 will never exceed this size. The VODs with the lowest view counts are evicted.
	MaxOldVodsQueueSize int
	// This is the number of goroutines fetching the .m3u8 files and compressing them.
	NumHlsFetchers int
	// In any interval of this length, at most two .m3u8 files are processed, and on average one.
	HlsFetcherDelay time.Duration
	// If this amount of time passes since the last time the cursor was reset, the cursor will be reset.
	CursorResetThreshold time.Duration
	// This is the libdeflate compression level. Levels range from 1 (fastest, least compression) to 12 (slowest, most compression).
	// It seems best when it's 1. The level of compression is good enough and it is fastest.
	LibdeflateCompressionLevel int
	// The queue of live VODs includes a VOD iff it has at least this number of viewers.
	MinViewerCountToObserve int
	// The queue of old VODs includes a VOD iff it has at least this number of viewers.
	// If a stream is observed to have stopped and then restarted, the stream is still recorded.
	MinViewerCountToRecord int
	// Num streams per request (must be between 1 and 30 inclusive)
	NumStreamsPerRequest int
	// VODs older than the current time minus this duration are deleted.
	OldVodsDelete time.Duration
	// Twitch Helix client ID.
	ClientId string
	// Twitch Helix client secret.
	ClientSecret string
}
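Several of the parameters above carry documented ranges: NumStreamsPerRequest must be between 1 and 30 inclusive, the libdeflate level runs from 1 to 12, and the evictionRatio argument of RunScraper should be at least 1. A caller might check these before starting the scraper. The sketch below is a hypothetical helper, not part of this package; scraperLimits and validate are names invented for illustration, and the package itself may or may not perform such checks.

```go
package main

import "fmt"

// scraperLimits is a hypothetical subset of the values passed to RunScraper,
// containing only the ones with documented ranges.
type scraperLimits struct {
	EvictionRatio              float64
	LibdeflateCompressionLevel int
	NumStreamsPerRequest       int
}

// validate returns the first range violation it finds, or nil if all
// values are within their documented ranges.
func (l scraperLimits) validate() error {
	if l.EvictionRatio < 1 {
		return fmt.Errorf("evictionRatio %v: should be at least 1", l.EvictionRatio)
	}
	if l.LibdeflateCompressionLevel < 1 || l.LibdeflateCompressionLevel > 12 {
		return fmt.Errorf("LibdeflateCompressionLevel %d: must be between 1 and 12", l.LibdeflateCompressionLevel)
	}
	if l.NumStreamsPerRequest < 1 || l.NumStreamsPerRequest > 30 {
		return fmt.Errorf("NumStreamsPerRequest %d: must be between 1 and 30", l.NumStreamsPerRequest)
	}
	return nil
}

func main() {
	ok := scraperLimits{EvictionRatio: 1, LibdeflateCompressionLevel: 1, NumStreamsPerRequest: 30}
	fmt.Println("valid config:", ok.validate())

	bad := scraperLimits{EvictionRatio: 1, LibdeflateCompressionLevel: 1, NumStreamsPerRequest: 31}
	fmt.Println("invalid config:", bad.validate())
}
```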

type ScrapeTwitchLiveVodsWithGqlApiParams

type ScrapeTwitchLiveVodsWithGqlApiParams struct {
	RunScraperParams
	// initial live vod queue fetched from database
	InitialWaitVodQueue *waitVodsPriorityQueue
	// sqlc queries instance
	Queries *sqlvods.Queries
}

type VodDataPoint

type VodDataPoint struct {
	ResponseReturnedTimeUnix int64
	Node                     *helix.Stream
}

type VodResult

type VodResult struct {
	Vod                *LiveVod
	HlsBytes           []byte
	HlsBytesFound      bool
	RequestInitiated   time.Time
	HlsDomain          sql.NullString
	Public             sql.NullBool
	ProfileImageUrl    sql.NullString
	BoxArtUrl          sql.NullString
	HlsDurationSeconds sql.NullFloat64
}
