Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type APIResult ¶
type APIResult struct {
	// Indicates if we actually found IP addresses to probe
	Attempted bool

	// The ID response object from the Kubo API
	ID *api.IDResponse

	// The Kubo routing table. Doesn't contain multi addresses. Don't use this to continue crawling.
	RoutingTable *api.RoutingTableResponse
}
type Crawler ¶
type Crawler struct {
// contains filtered or unexported fields
}
Crawler encapsulates a libp2p host that crawls the network.
func NewCrawler ¶
NewCrawler initializes a new crawler based on the given configuration.
func (*Crawler) StartCrawling ¶
func (c *Crawler) StartCrawling(ctx context.Context, crawlQueue *queue.FIFO[peer.AddrInfo], resultsQueue *queue.FIFO[Result])
StartCrawling enters an endless loop that consumes crawl jobs from the crawl queue and publishes their results on the results queue until it is told to stop or the crawl queue is closed.
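A minimal usage sketch. The queue constructor and the NewCrawler signature are assumptions for illustration; neither is shown on this page:

// Sketch only: queue.NewFIFO and the NewCrawler signature are assumed,
// not confirmed by this documentation.
crawlQueue := queue.NewFIFO[peer.AddrInfo]()
resultsQueue := queue.NewFIFO[Result]()

c, err := NewCrawler(cfg) // cfg is a hypothetical configuration value
if err != nil {
	log.Fatal(err)
}

// StartCrawling blocks until ctx is cancelled or the crawl queue is
// closed, so it typically runs in its own goroutine.
go c.StartCrawling(ctx, crawlQueue, resultsQueue)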
type P2PResult ¶
type P2PResult struct {
	RoutingTable *RoutingTable

	// The agent version of the crawled peer
	Agent string

	// The protocols the peer supports
	Protocols []string

	// Any error that has occurred when connecting to the peer
	ConnectError error

	// The above error transferred to a known error
	ConnectErrorStr string

	// Any error that has occurred during fetching neighbor information
	CrawlError error

	// The above error transferred to a known error
	CrawlErrorStr string

	// When was the connection attempt made
	ConnectStartTime time.Time

	// As it can take some time to handle the result we track the timestamp explicitly
	ConnectEndTime time.Time
}
type Persister ¶
type Persister struct {
// contains filtered or unexported fields
}
Persister handles the insert/upsert/update operations for a particular crawl result.
func NewPersister ¶
NewPersister initializes a new persister based on the given configuration.
func (*Persister) StartPersisting ¶
func (p *Persister) StartPersisting(ctx context.Context, persistQueue *queue.FIFO[Result], resultsQueue *queue.FIFO[*db.InsertVisitResult])
StartPersisting enters an endless loop that consumes persist jobs from the persist queue, publishing the resulting *db.InsertVisitResult values on the results queue, until it is told to stop or the persist queue is closed.
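A usage sketch mirroring StartCrawling above; the queue constructor and the NewPersister signature are again assumptions:

persistQueue := queue.NewFIFO[Result]()           // hypothetical constructor
dbQueue := queue.NewFIFO[*db.InsertVisitResult]() // hypothetical constructor

p, err := NewPersister(cfg) // cfg is a hypothetical configuration value
if err != nil {
	log.Fatal(err)
}

// Blocks until ctx is cancelled or the persist queue is closed.
go p.StartPersisting(ctx, persistQueue, dbQueue)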
type Result ¶
type Result struct {
	// The crawler that generated this result
	CrawlerID string

	// The crawled peer
	Peer peer.AddrInfo

	// The neighbors of the crawled peer
	RoutingTable *RoutingTable

	// Indicates whether the above routing table information was queried through the API.
	// The API routing table does not include MultiAddresses, so we won't use them for further crawls.
	RoutingTableFromAPI bool

	// The agent version of the crawled peer
	Agent string

	// The protocols the peer supports
	Protocols []string

	// Any error that has occurred when connecting to the peer
	ConnectError error

	// The above error transferred to a known error
	ConnectErrorStr string

	// Any error that has occurred during fetching neighbor information
	CrawlError error

	// The above error transferred to a known error
	CrawlErrorStr string

	// When was the crawl started
	CrawlStartTime time.Time

	// When did this crawl end
	CrawlEndTime time.Time

	// When was the connection attempt made
	ConnectStartTime time.Time

	// As it can take some time to handle the result we track the timestamp explicitly
	ConnectEndTime time.Time

	// Whether Kubo's RPC API is exposed
	IsExposed null.Bool
}
Result captures data that is gathered from crawling a single peer.
func (*Result) ConnectDuration ¶
ConnectDuration returns the time it took to connect to the peer. This includes dialing and the identify protocol.
func (*Result) CrawlDuration ¶
CrawlDuration returns the time it took to crawl the peer (connecting + fetching neighbors).
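Both durations follow directly from the timestamp fields on Result; a plausible sketch (the package's actual implementation is not shown here):

func (r *Result) ConnectDuration() time.Duration {
	// Time from the start of the connection attempt until handling finished.
	return r.ConnectEndTime.Sub(r.ConnectStartTime)
}

func (r *Result) CrawlDuration() time.Duration {
	// Time from the start of the crawl until neighbors were fetched.
	return r.CrawlEndTime.Sub(r.CrawlStartTime)
}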
type RoutingTable ¶
type RoutingTable struct {
	// PeerID is the peer whose neighbors (routing table entries) are in the array below.
	PeerID peer.ID

	// The peers that are in the routing table of the above peer
	Neighbors []peer.AddrInfo

	// First error that has occurred during crawling that peer
	Error error

	// ErrorBits tracks, in little-endian bit order, at which common prefix
	// lengths (CPLs) errors occurred during neighbor fetches:
	//   0000 0000 0000 0000 - no error
	//   0000 0000 0000 0001 - an error occurred at CPL 0
	//   1000 0000 0000 0001 - errors occurred at CPL 0 and CPL 15
	ErrorBits uint16
}
RoutingTable captures the routing table information and crawl error of a particular peer.
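For illustration, setting and testing a bit in such a mask could look like this (a standalone sketch, not the package's internal code):

// setErrorBit marks that an error occurred while fetching neighbors at
// the given common prefix length (CPL). Bit i corresponds to CPL i.
func setErrorBit(bits uint16, cpl int) uint16 {
	return bits | 1<<uint(cpl)
}

// hasErrorAt reports whether an error occurred at the given CPL.
func hasErrorAt(bits uint16, cpl int) bool {
	return bits&(1<<uint(cpl)) != 0
}

For example, setErrorBit(setErrorBit(0, 0), 15) yields 1000 0000 0000 0001, matching the third example in the field comment.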
func (*RoutingTable) PeerIDs ¶
func (rt *RoutingTable) PeerIDs() []peer.ID
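PeerIDs presumably projects the Neighbors slice down to the contained peer IDs; a minimal sketch of that projection:

func (rt *RoutingTable) PeerIDs() []peer.ID {
	ids := make([]peer.ID, 0, len(rt.Neighbors))
	for _, neighbor := range rt.Neighbors {
		ids = append(ids, neighbor.ID)
	}
	return ids
}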
type Scheduler ¶
type Scheduler struct {
// contains filtered or unexported fields
}
The Scheduler handles the scheduling and managing of:

a) crawlers - They consume a queue of peer address information, visit the peers, and publish their results on a separate results queue. This results queue is consumed by the scheduler and processed further.

b) persisters - They consume a separate persist queue. Essentially all results published on the crawl results queue get passed on to the persisters. However, the scheduler inspects the crawl results and builds up aggregate information for the whole crawl; letting the persisters consume the results queue directly would not allow that.
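Putting the queue types from the signatures above together, the data flow looks roughly like this (a conceptual sketch of the topology, not the Scheduler's actual code; the variable names are hypothetical):

// crawlQueue   *queue.FIFO[peer.AddrInfo]         - fed by the scheduler, consumed by crawlers
// resultsQueue *queue.FIFO[Result]                - fed by crawlers, consumed by the scheduler
// persistQueue *queue.FIFO[Result]                - fed by the scheduler, consumed by persisters
// dbQueue      *queue.FIFO[*db.InsertVisitResult] - fed by persisters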
func NewScheduler ¶
NewScheduler initializes a new libp2p host and scheduler instance.
func (*Scheduler) CrawlNetwork ¶
CrawlNetwork starts the configured number of crawlers and fills the crawl queue with bootstrap nodes to start with. These bootstrap nodes will be enriched by nodes we have seen in the past from the database. It also starts the persisters.
func (*Scheduler) TotalErrors ¶
TotalErrors counts the total number of errors, which is equivalent to the number of undialable peers during this crawl.