Documentation ¶
Index ¶
- type APICompatError
- type FeedCursor
- type FeedFilter
- type FeedIterResult
- type FeedPage
- type FeedPageReader
- type GalleryDownloadResult
- type GenericFeedCursor
- type MediaDownloadError
- type SearchFeedCursor
- type Tweet
- type TweetEmbeddedCard
- type TweetEmbeddedGallery
- type TweetEmbeddedQuote
- type TweetEmbeddedVideo
- type TwitterHTTP
- type TwitterSession
- type URLError
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type APICompatError ¶
type APICompatError struct {
// contains filtered or unexported fields
}
APICompatError is returned when extracting data from a scraped page fails. This is most likely the result of Twitter changing its internal interfaces, or of a bug in the parser.
func (*APICompatError) Error ¶
func (e *APICompatError) Error() string
func (*APICompatError) TwitterID ¶
func (e *APICompatError) TwitterID() *uint64
TwitterID returns the numeric Twitter ID associated with the error.
type FeedCursor ¶
type FeedCursor interface {
	RetrievePage() (FeedPageReader, error)
	Seek(string) bool
}
FeedCursor is an interface for navigating a paginated Twitter feed.
type FeedFilter ¶
type FeedFilter int
FeedFilter selects which feed is scraped: the regular feed or the media-only feed.
const (
	// FeedTypeRegular is a regular Twitter feed (contains all tweets).
	FeedTypeRegular FeedFilter = 0
	// FeedTypeMedia is a media-only feed (contains only image/video/postcard
	// tweets).
	FeedTypeMedia FeedFilter = 1
)
type FeedIterResult ¶
FeedIterResult is a single value received from the channel returned by TwitterSession.FeedIter(), representing one tweet retrieved from the feed.
type FeedPage ¶
type FeedPage struct {
// contains filtered or unexported fields
}
FeedPage stores a single page from Twitter feed.
Tweets and additional page data can be retrieved through the FeedPageReader interface, which this type implements.
func NewFeedPage ¶
func NewFeedPage(structuredJSON interface{}) *FeedPage
NewFeedPage creates a page parser.
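NewFeedPage takes its structuredJSON argument as interface{}. Assuming this means the generic value tree that encoding/json produces when decoding into an interface{} (an assumption; the parameter is not documented further), a caller could prepare it as sketched below. decodeStructured is a hypothetical helper and the payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeStructured turns raw JSON bytes into the generic interface{} tree
// (map[string]interface{}, []interface{}, string, float64, bool, nil)
// produced by encoding/json.
func decodeStructured(raw []byte) (interface{}, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	// Made-up payload; a real Twitter response has its own structure.
	v, err := decodeStructured([]byte(`{"example":"value"}`))
	if err != nil {
		panic(err)
	}
	m := v.(map[string]interface{})
	fmt.Println(m["example"]) // value
	// page := NewFeedPage(v) // hypothetical call into this package
}
```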
func (*FeedPage) GetMinPosition ¶
GetMinPosition returns this page's position within the feed.
type FeedPageReader ¶
FeedPageReader defines how to access a paginated feed's tweets and each page's position within the feed. Twitter uses pagination to split feeds that contain a large number of tweets into pages.
type GalleryDownloadResult ¶
type GalleryDownloadResult struct {
	FileExt string
	Body    io.ReadCloser
	Error   error
}
GalleryDownloadResult holds the result of downloading one image; values of this type are delivered on the channel returned by TweetEmbeddedGallery.Download().
type GenericFeedCursor ¶
type GenericFeedCursor struct {
// contains filtered or unexported fields
}
GenericFeedCursor is used for traversing any paginated feed that is not a search feed.
This cursor can navigate only a limited number of pages. The limit is imposed by Twitter; if it is important to retrieve every possible tweet, use SearchFeedCursor instead.
func NewGenericFeedCursor ¶
func NewGenericFeedCursor(
	username string,
	ttype FeedFilter,
	resumeAt ...string,
) *GenericFeedCursor
NewGenericFeedCursor creates a generic feed cursor for traversing a single user's Twitter feed.
func (*GenericFeedCursor) RetrievePage ¶
func (t *GenericFeedCursor) RetrievePage() (FeedPageReader, error)
RetrievePage downloads the page at the current cursor position.
It does not advance the cursor.
func (*GenericFeedCursor) Seek ¶
func (t *GenericFeedCursor) Seek(position string) bool
Seek positions the cursor at the given position within the feed.
type MediaDownloadError ¶
type MediaDownloadError struct {
// contains filtered or unexported fields
}
MediaDownloadError is returned when downloading media embedded in a tweet fails.
func (*MediaDownloadError) Cause ¶
func (t *MediaDownloadError) Cause() error
Cause returns the underlying cause of the error.
func (*MediaDownloadError) Error ¶
func (t *MediaDownloadError) Error() string
func (*MediaDownloadError) URL ¶
func (t *MediaDownloadError) URL() string
URL returns a URL associated with the error.
type SearchFeedCursor ¶
type SearchFeedCursor struct {
// contains filtered or unexported fields
}
SearchFeedCursor is used for traversing search feeds.
func NewSearchFeedCursor ¶
func NewSearchFeedCursor(query string, resumeAt ...string) *SearchFeedCursor
NewSearchFeedCursor creates a cursor for traversing the search results returned for the given query.
func (*SearchFeedCursor) RetrievePage ¶
func (t *SearchFeedCursor) RetrievePage() (FeedPageReader, error)
RetrievePage downloads the page at the current cursor position.
It does not advance the cursor.
func (*SearchFeedCursor) Seek ¶
func (t *SearchFeedCursor) Seek(position string) bool
Seek positions the cursor at the given position within the feed.
type Tweet ¶
type Tweet struct {
	ID        uint64      `json:"id,string"`
	Timestamp time.Time   `json:"timestamp"`
	Text      string      `json:"text"`
	Extra     interface{} `json:"embed"`
}
Tweet represents a single tweet.
type TweetEmbeddedCard ¶
type TweetEmbeddedCard struct {
CardURL string
}
TweetEmbeddedCard represents a postcard embedded within a tweet.
func (*TweetEmbeddedCard) MarshalJSON ¶
func (t *TweetEmbeddedCard) MarshalJSON() ([]byte, error)
MarshalJSON returns TweetEmbeddedCard encoded as a JSON bytestring.
type TweetEmbeddedGallery ¶
type TweetEmbeddedGallery struct {
ImageURLs []string
}
TweetEmbeddedGallery represents multiple images embedded within a tweet.
func (*TweetEmbeddedGallery) Download ¶
func (t *TweetEmbeddedGallery) Download() <-chan GalleryDownloadResult
Download starts a sequential download of all images in the gallery.
The returned channel yields one GalleryDownloadResult per image, carrying the image's file extension and entire body, or the error that occurred.
func (*TweetEmbeddedGallery) MarshalJSON ¶
func (t *TweetEmbeddedGallery) MarshalJSON() ([]byte, error)
MarshalJSON returns TweetEmbeddedGallery encoded as a JSON bytestring.
type TweetEmbeddedQuote ¶
type TweetEmbeddedQuote struct {
QuoteURL string
}
TweetEmbeddedQuote represents a quote of another tweet embedded within a tweet.
func (*TweetEmbeddedQuote) MarshalJSON ¶
func (t *TweetEmbeddedQuote) MarshalJSON() ([]byte, error)
MarshalJSON returns TweetEmbeddedQuote encoded as a JSON bytestring.
type TweetEmbeddedVideo ¶
type TweetEmbeddedVideo struct {
VideoURL string
}
TweetEmbeddedVideo represents a video embedded within a tweet.
func (*TweetEmbeddedVideo) MarshalJSON ¶
func (t *TweetEmbeddedVideo) MarshalJSON() ([]byte, error)
MarshalJSON returns TweetEmbeddedVideo encoded as a JSON bytestring.
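Tweet.Extra is an interface{}, so a caller can inspect the embedded content with a type switch. The sketch assumes Extra holds a pointer to one of the TweetEmbedded* types, or nil when nothing is embedded. That is a plausible reading, because their MarshalJSON methods have pointer receivers, but the package does not document it. The types are redeclared locally and describeEmbed is a hypothetical helper.

```go
package main

import "fmt"

// Local copies of the documented embedded types.
type TweetEmbeddedCard struct{ CardURL string }
type TweetEmbeddedGallery struct{ ImageURLs []string }
type TweetEmbeddedQuote struct{ QuoteURL string }
type TweetEmbeddedVideo struct{ VideoURL string }

// describeEmbed summarizes a Tweet.Extra value, assuming it holds a pointer
// to one of the embedded types or nil.
func describeEmbed(extra interface{}) string {
	switch e := extra.(type) {
	case nil:
		return "no embed"
	case *TweetEmbeddedCard:
		return "card: " + e.CardURL
	case *TweetEmbeddedGallery:
		return fmt.Sprintf("gallery: %d images", len(e.ImageURLs))
	case *TweetEmbeddedQuote:
		return "quote: " + e.QuoteURL
	case *TweetEmbeddedVideo:
		return "video: " + e.VideoURL
	default:
		return fmt.Sprintf("unknown embed %T", e)
	}
}

func main() {
	fmt.Println(describeEmbed(&TweetEmbeddedGallery{ImageURLs: []string{"a", "b"}})) // gallery: 2 images
	fmt.Println(describeEmbed(nil))                                                   // no embed
}
```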
type TwitterHTTP ¶
type TwitterHTTP struct {
// contains filtered or unexported fields
}
TwitterHTTP holds session parameters that can be shared across multiple TwitterSession values.
func NewTwitterHTTP ¶
func NewTwitterHTTP() *TwitterHTTP
NewTwitterHTTP creates new session parameters.
type TwitterSession ¶
type TwitterSession struct {
// contains filtered or unexported fields
}
TwitterSession represents a single scraping session.
func NewTwitterSession ¶
func NewTwitterSession(cursor FeedCursor) *TwitterSession
NewTwitterSession creates a new TwitterSession based on the given cursor.
func (*TwitterSession) FeedIter ¶
func (t *TwitterSession) FeedIter(singlePage ...bool) <-chan (FeedIterResult)
FeedIter returns a channel from which all available feed tweets can be read.
Using FeedIter() is the recommended way to scrape tweet data.
Depending on the cursor used, the iterator may not retrieve every available tweet. Twitter puts a hard limit on the maximum number of tweets in a feed. So far, the only known way to retrieve an entire Twitter feed is to iterate over it using a search query with a sliding time range until no more tweets are returned.