Documentation ¶
Index ¶
- Constants
- type Configuration
- type GabClient
- func (c *GabClient) GetGroup(id int) (*ct.Group, error)
- func (c *GabClient) GetGroupPosts(id int) ([]*ct.GroupPost, error)
- func (c *GabClient) GroupRange(start, stop int) ([]*ct.Group, error)
- func (c *GabClient) GroupRangeC(resc chan<- *ct.CrawlResult, done chan<- struct{}, start, stop int)
- func (c *GabClient) GroupsUntilError(start int) ([]*ct.Group, int, error)
- func (c *GabClient) GroupsUntilErrorC(resc chan<- *ct.CrawlResult, done chan<- struct{}, cancel <-chan struct{}, ...)
- func (c *GabClient) Login(login *Login) error
- type HTTPClient
- type HTTPClientRequest
- type Login
- type State
Constants ¶
const UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Configuration ¶
type Configuration struct {
	// Login credentials.
	Login *Login

	// CredentialsFile indicates the path to a YAML-formatted file containing
	// user credentials. At present, only the authorization token is supported.
	// Default: "credentials.yaml".
	CredentialsFile string

	// ExportFormat may be one of "json" (default) or "csv" for outputting
	// crawled data. JSON-formatted files will always contain data encapsulated
	// in a JSON array. CSV-formatted files will always start with column names.
	ExportFormat string

	// DoNotSaveState disables crawler state saving. Presently, only the
	// authentication token is saved.
	DoNotSaveState bool

	// OutputFile determines the path and file name to which crawled data should
	// be stored. If OutputFile exists, it will be renamed with a timestamp
	// suffix.
	OutputFile string

	// StateFile is the location of the crawler state previously saved if
	// DoNotSaveState is false. Default: ".state.yaml".
	StateFile string

	// Prompter is a function that requests information from the user, if
	// necessary, and returns a *Login struct on success. Depending on the front
	// end, this may request information via the terminal or via a dialog. This
	// function is not used if our local Login instance is populated.
	Prompter func() (*Login, error)
}
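The relationship between Login and Prompter described in the field comments might be sketched as follows. This is only an illustration: the types are local stand-ins, the Token field is an assumption (the fields of Login are not shown in this documentation), and resolveLogin is a hypothetical helper rather than part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// Login is a local stand-in. The real struct's fields are not documented
// here, so Token is an assumption.
type Login struct {
	Token string
}

// Configuration mirrors only the two fields this sketch needs.
type Configuration struct {
	Login    *Login
	Prompter func() (*Login, error)
}

// resolveLogin is a hypothetical helper: it prefers a populated Login and
// only falls back to Prompter when no credentials are present.
func resolveLogin(cfg *Configuration) (*Login, error) {
	if cfg.Login != nil && cfg.Login.Token != "" {
		return cfg.Login, nil
	}
	if cfg.Prompter == nil {
		return nil, errors.New("no credentials and no prompter configured")
	}
	return cfg.Prompter()
}

func main() {
	cfg := &Configuration{
		// A front end would normally read from the terminal or a dialog.
		Prompter: func() (*Login, error) {
			return &Login{Token: "token-from-prompt"}, nil
		},
	}
	login, err := resolveLogin(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(login.Token)
}
```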
type GabClient ¶
type GabClient struct {
	Token     string
	Client    *HTTPClient
	State     State
	BaseURL   string
	UserAgent string
	Username  string
	DerivedID int
	MaxErrors int
}
func NewGabClient ¶
func (*GabClient) GetGroupPosts ¶
GetGroupPosts retrieves the 20 most recent posts from the specified group. At present, very little introspection is done on these posts: only the post ID and creation time are retrieved. No user information is saved because we are currently interested only in approximating the group's recent activity.
func (*GabClient) GroupRange ¶
GroupRange returns all groups between `start` and `stop`, inclusive. If an error is encountered that is not of type ErrHTTPStatus, this will bail and return the error. Otherwise, this method will continue collecting groups until the maximum boundary is reached. Returns ErrNonsense if start > stop.
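The error-handling rule described above (skip HTTP status errors, stop on anything else) might look roughly like the following. This is a sketch under stated assumptions, not the package's implementation: Group, ErrHTTPStatus, and ErrNonsense are local stand-ins, collectRange and the fetch callback are hypothetical, and returning no partial results on a fatal error is a guess.

```go
package main

import (
	"errors"
	"fmt"
)

// Group is a local stand-in for ct.Group.
type Group struct{ ID int }

// ErrHTTPStatus is a local stand-in; the real type's fields are assumptions.
type ErrHTTPStatus struct{ Code int }

func (e *ErrHTTPStatus) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d", e.Code)
}

// ErrNonsense is a local stand-in for the package's error of the same name.
var ErrNonsense = errors.New("start must not exceed stop")

// collectRange (hypothetical) walks start..stop inclusive. HTTP status
// errors are skipped; any other error aborts the scan.
func collectRange(fetch func(id int) (*Group, error), start, stop int) ([]*Group, error) {
	if start > stop {
		return nil, ErrNonsense
	}
	var groups []*Group
	for id := start; id <= stop; id++ {
		g, err := fetch(id)
		if err != nil {
			var statusErr *ErrHTTPStatus
			if errors.As(err, &statusErr) {
				continue // e.g. a 404 for a missing group: keep going
			}
			return nil, err // assumption: no partial results on a fatal error
		}
		groups = append(groups, g)
	}
	return groups, nil
}

func main() {
	// Pretend group 3 does not exist.
	fetch := func(id int) (*Group, error) {
		if id == 3 {
			return nil, &ErrHTTPStatus{Code: 404}
		}
		return &Group{ID: id}, nil
	}
	groups, err := collectRange(fetch, 1, 5)
	fmt.Println(len(groups), err) // prints "4 <nil>"
}
```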
func (*GabClient) GroupRangeC ¶
func (c *GabClient) GroupRangeC(resc chan<- *ct.CrawlResult, done chan<- struct{}, start, stop int)
GroupRangeC returns all groups between `start` and `stop`, inclusive. If an error is encountered that is not of type ErrHTTPStatus, this will bail and return the error. Otherwise, this method will continue collecting groups until the maximum boundary is reached. Returns ErrNonsense if start > stop.
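A caller of a channel-based method such as this one needs to read from `resc` until `done` fires. A minimal sketch of that consumer loop is shown here; CrawlResult is a local stand-in whose fields are assumptions, and produce is a hypothetical substitute for the real method so that the example runs on its own.

```go
package main

import "fmt"

// CrawlResult is a local stand-in for ct.CrawlResult; its fields are
// assumptions made for this sketch.
type CrawlResult struct {
	GroupID int
	Err     error
}

// produce (hypothetical) stands in for a method with the same channel
// signature: it sends one result per ID, then signals completion.
func produce(resc chan<- *CrawlResult, done chan<- struct{}, start, stop int) {
	for id := start; id <= stop; id++ {
		resc <- &CrawlResult{GroupID: id}
	}
	done <- struct{}{}
}

// drain reads results until completion is signalled on done.
func drain(resc <-chan *CrawlResult, done <-chan struct{}) []int {
	var ids []int
	for {
		select {
		case r := <-resc:
			ids = append(ids, r.GroupID)
		case <-done:
			return ids
		}
	}
}

func main() {
	// Unbuffered channels: every result is received before done can be sent.
	resc := make(chan *CrawlResult)
	done := make(chan struct{})
	go produce(resc, done, 1, 3)
	fmt.Println(drain(resc, done)) // prints "[1 2 3]"
}
```

The channels are deliberately unbuffered in this sketch. With a buffered `resc`, `select` could observe `done` while results are still queued, so a consumer would need to empty `resc` after `done` fires.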
func (*GabClient) GroupsUntilError ¶
GroupsUntilError collects group information until MaxErrorsOnFetch is reached. It returns a slice of ct.Group if successful (otherwise this slice is empty), the last successful ID (or 0 if a failure occurred), and the last error encountered during the scan. Be aware that, while most calls to this function should return an error on failure (typically a 404, though the error value needs some inspection to confirm this), the ONLY way to guarantee that you have reached the end of the groups list is to test the length of the returned group slice.
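The advice about testing the slice length can be illustrated with a resumable scan loop. Everything in this sketch is an assumption made for illustration: groupsUntilError is a hypothetical re-implementation, the fetch callback replaces real HTTP requests, and counting consecutive (rather than total) failures is a guess at the intended behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Group is a local stand-in for ct.Group.
type Group struct{ ID int }

var errNotFound = errors.New("404 not found")

// groupsUntilError (hypothetical) scans upward from start and stops after
// maxErrors consecutive failures. It returns the groups found, the last
// successful ID (0 if none), and the last error seen.
func groupsUntilError(fetch func(int) (*Group, error), start, maxErrors int) ([]*Group, int, error) {
	var (
		groups  []*Group
		lastID  int
		lastErr error
		failed  int
	)
	for id := start; failed < maxErrors; id++ {
		g, err := fetch(id)
		if err != nil {
			failed++
			lastErr = err
			continue
		}
		failed = 0
		lastID = id
		groups = append(groups, g)
	}
	return groups, lastID, lastErr
}

func main() {
	// Pretend only groups 1 through 4 exist.
	fetch := func(id int) (*Group, error) {
		if id > 4 {
			return nil, errNotFound
		}
		return &Group{ID: id}, nil
	}

	next, total := 1, 0
	for {
		groups, lastID, err := groupsUntilError(fetch, next, 3)
		if len(groups) == 0 {
			// An empty slice, not the error alone, marks the end of the list.
			fmt.Println("end of list; last error:", err)
			break
		}
		total += len(groups)
		next = lastID + 1
	}
	fmt.Println("collected", total)
}
```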
func (*GabClient) GroupsUntilErrorC ¶
func (c *GabClient) GroupsUntilErrorC(resc chan<- *ct.CrawlResult, done chan<- struct{}, cancel <-chan struct{}, start int)
GroupsUntilErrorC collects group information from the starting group id `start` and passes it along to the result channel `resc`. Completion is signalled by sending an empty struct via `done`. Errors are embedded in `resc` via its Result member. This call will continue until ct.MaxErrorsOnFetch is reached, where errors are defined as anything >= a 400 response code.
Note that it is up to the caller to handle error conditions beyond the automatic bail out (and to resume after such failures). In particular, Gab has introduced a 429 Throttled response for clients that submit too many requests within a relatively narrow window. Callers must therefore monitor for this response code and throttle the client accordingly.
Future versions of this implementation should probably pay careful attention to the response headers returned by the server so as to respond properly to any indication that requests need to be throttled.
type HTTPClient ¶
type HTTPClient struct {
	BaseURL        string
	Headers        http.Header
	SessionCookies *cookiejar.Jar
	// contains filtered or unexported fields
}
func NewHTTPClient ¶
func NewHTTPClient(baseURL string) *HTTPClient
func (*HTTPClient) Get ¶
func (c *HTTPClient) Get(path string) (*HTTPClientRequest, error)
func (*HTTPClient) Post ¶
func (c *HTTPClient) Post(path string) (*HTTPClientRequest, error)
type HTTPClientRequest ¶
type HTTPClientRequest struct {
	Query   url.Values
	Form    url.Values
	Headers http.Header
	Request *http.Request
	// contains filtered or unexported fields
}
func (*HTTPClientRequest) Commit ¶
func (cr *HTTPClientRequest) Commit() (*clientResponse, error)