dht-crawler

command module

v0.0.0-...-361014b Latest Latest Go to latest Published: Feb 17, 2022 License: MIT Imports: 10 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/MildC/dht-crawler

Links

Open Source Insights

README ¶

A fork from https://github.com/shiyanhui/dht

See the video on the Youtube.

中文版README

Introduction

DHT implements the bittorrent DHT protocol in Go. Now it includes:

It contains two modes, the standard mode and the crawling mode. The standard mode follows the BEPs, and you can use it as a standard dht server. The crawling mode aims to crawl as more metadata info as possiple. It doesn't follow the standard BEPs protocol. With the crawling mode, you can build another BTDigg.

Installation

go get github.com/MildC/dht-crawler/dht

Example

Below is a simple spider. You can move here to see more samples.

import (
    "fmt"
    "github.com/shiyanhui/dht"
)

func main() {
    downloader := dht.NewWire(65535)
    go func() {
        // once we got the request result
        for resp := range downloader.Response() {
            fmt.Println(resp.InfoHash, resp.MetadataInfo)
        }
    }()
    go downloader.Run()

    config := dht.NewCrawlConfig()
    config.OnAnnouncePeer = func(infoHash, ip string, port int) {
        // request to download the metadata info
        downloader.Request([]byte(infoHash), ip, port)
    }
    d := dht.New(config)

    d.Run()
}

Note

The default crawl mode configure costs about 300M RAM. Set MaxNodes and BlackListMaxSize to fit yourself.
Now it cant't run in LAN because of NAT.

TODO

NAT Traversal.
Implements the full BEP-3.
Optimization.

FAQ

Why it is slow compared to other spiders ?

Well, maybe there are several reasons.

DHT aims to implements the standard BitTorrent DHT protocol, not born for crawling the DHT network.
NAT Traversal issue. You run the crawler in a local network.
It will block ip which looks like bad and a good ip may be mis-judged.

License

MIT, read more here

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
dht Package dht implements the bittorrent dht protocol.	Package dht implements the bittorrent dht protocol.
sample
elasticsearch
getpeers
torrent

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL