image_fetcher

package module
v0.0.0-...-1cbef6d

Warning: this package is not in the latest version of its module.
Published: Jul 20, 2024 License: Apache-2.0 Imports: 11 Imported by: 0

README

Web Crawler

A simple web crawler written in Go that downloads images from web pages, with a configurable delay between requests and a crawl depth limit.

Features

  • Crawl Web Pages: Recursively crawls web pages starting from a given URL.
  • Download Images: Downloads images found on the web pages.
  • Configurable Request Delay: Allows setting a delay between requests to avoid overloading the target server.
  • Depth Limit: Controls the depth of the crawl to avoid excessively deep traversals.

Installation

  1. Clone the repository:

    git clone https://github.com/flew1x/image_fetcher
    cd image_fetcher

Usage

Run the application from the command line with the required flags:

  • -f: The starting URL for the crawl.
  • -p: The directory path where downloaded images will be saved.
  • -e or --depth: The maximum depth of the crawl.
  • -t or --timeout: The delay between requests, specified as a Go duration (e.g., 10s, 1m).

Example

To run the web crawler with a 2-minute delay between requests, crawling from http://example.com, saving images to ./images, and with a maximum depth of 3:

go run main.go -f http://example.com -p ./images -e 3 -t 2m

You can also build the app with make build.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func NewCrawler

func NewCrawler(p Parser, d Downloader, path string, delay time.Duration) *Crawler

func (*Crawler) Crawl

func (c *Crawler) Crawl(startURL string, maxDepth int) error

Crawl performs a breadth-first crawl of the web starting from the given URL.
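
For orientation, here is a minimal sketch of driving the crawler from Go code. The aliased import path is an assumption based on the repository URL; adjust it to the actual package location:

package main

import (
	"log"
	"time"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

func main() {
	// Wire the concrete parser and downloader into a crawler that saves
	// images under ./images and waits 2s between requests.
	c := fetcher.NewCrawler(fetcher.NewHTMLParser(), fetcher.NewHTTPDownloader(), "./images", 2*time.Second)

	// Breadth-first crawl from the seed URL, at most 3 levels deep.
	if err := c.Crawl("http://example.com", 3); err != nil {
		log.Fatalf("crawl failed: %v", err)
	}
}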

type Downloader

type Downloader interface {
	Download(src string, baseURL string, path string) error
}
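
Because Downloader is a one-method interface, a stand-in implementation can be swapped in, for example to dry-run a crawl without touching disk. DryRunDownloader below is a hypothetical illustration (import path assumed as above):

package main

import (
	"log"
	"time"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

// DryRunDownloader is a hypothetical Downloader that only logs what
// would be fetched, useful for testing crawl behavior without disk I/O.
type DryRunDownloader struct{}

func (d *DryRunDownloader) Download(src, baseURL, path string) error {
	log.Printf("would download %s (relative to %s) into %s", src, baseURL, path)
	return nil
}

func main() {
	c := fetcher.NewCrawler(fetcher.NewHTMLParser(), &DryRunDownloader{}, "./images", time.Second)
	_ = c.Crawl("http://example.com", 1)
}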

type HTMLParser

type HTMLParser struct{}

func NewHTMLParser

func NewHTMLParser() *HTMLParser

func (*HTMLParser) Parse

func (p *HTMLParser) Parse(urlStr string) (imageSources []string, links []string, err error)

Parse parses the HTML content at the given URL and extracts the image sources (src attributes of all <img> tags) along with the links found on the page.
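
A short sketch of calling Parse directly (import path assumed as above):

package main

import (
	"log"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

func main() {
	p := fetcher.NewHTMLParser()

	// Parse returns image sources and outgoing links as separate slices.
	images, links, err := p.Parse("http://example.com")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("found %d images and %d links", len(images), len(links))
}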

type HTTPDownloader

type HTTPDownloader struct{}

func NewHTTPDownloader

func NewHTTPDownloader() *HTTPDownloader

func (*HTTPDownloader) Download

func (d *HTTPDownloader) Download(source string, baseURL string, path string) error

Download downloads an image from the specified source URL.

Parameters:

  • source: the URL of the image to download.
  • baseURL: the base URL for resolving relative URLs.
  • path: the directory path to save the downloaded image.

Returns:

  • error: an error if there was an issue during the download process.
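
A sketch of a direct call, resolving a hypothetical relative source against a base URL (import path assumed as above):

package main

import (
	"log"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

func main() {
	d := fetcher.NewHTTPDownloader()

	// The relative source is resolved against baseURL before fetching;
	// the image is written into ./images.
	if err := d.Download("/static/logo.png", "http://example.com", "./images"); err != nil {
		log.Fatal(err)
	}
}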

type Parser

type Parser interface {
	Parse(url string) (imageSources []string, links []string, err error)
}
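
As with Downloader, Parser can be stubbed for tests. StubParser below is a hypothetical fragment returning canned results so Crawler logic can be exercised without network access:

// StubParser is a hypothetical Parser that returns canned results.
type StubParser struct{}

func (p *StubParser) Parse(url string) (imageSources []string, links []string, err error) {
	return []string{"/img/a.png"}, []string{"http://example.com/next"}, nil
}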

Directories

cmd
