image_fetcher

package module
v0.0.0-...-1cbef6d

Warning: this package is not in the latest version of its module.
Published: Jul 20, 2024 License: Apache-2.0 Imports: 11 Imported by: 0

README

Web Crawler

A simple web crawler written in Go that downloads images from web pages, with a configurable delay between requests and a crawl depth limit.

Features

  • Crawl Web Pages: Recursively crawls web pages starting from a given URL.
  • Download Images: Downloads images found on the web pages.
  • Configurable Request Delay: Allows setting a delay between requests to avoid overloading the target server.
  • Depth Limit: Controls the depth of the crawl to avoid excessively deep traversals.

Installation

  1. Clone the repository:

    git clone https://github.com/flew1x/image_fetcher
    cd image_fetcher

Usage

Run the application from the command line with the required flags:

  • -f: The starting URL for the crawl.
  • -p: The directory path where downloaded images will be saved.
  • -e or --depth: The maximum depth of the crawl.
  • -t or --timeout: The delay between requests, specified as a Go duration (e.g., 10s, 1m).

Example

To run the web crawler with a 2-minute delay between requests, crawling from http://example.com, saving images to ./images, and with a maximum depth of 3:

go run main.go -f http://example.com -p ./images -e 3 -t 2m

You can also build the app with make build.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func NewCrawler

func NewCrawler(p Parser, d Downloader, path string, delay time.Duration) *Crawler

func (*Crawler) Crawl

func (c *Crawler) Crawl(startURL string, maxDepth int) error

Crawl performs a breadth-first crawl of the web starting from the given URL.
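
For orientation, here is a minimal sketch of driving the crawler from Go code. The aliased import path is an assumption based on the repository URL; adjust it to the actual package location:

package main

import (
	"log"
	"time"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

func main() {
	// Wire the concrete parser and downloader into a crawler that saves
	// images under ./images and waits 2s between requests.
	c := fetcher.NewCrawler(fetcher.NewHTMLParser(), fetcher.NewHTTPDownloader(), "./images", 2*time.Second)

	// Breadth-first crawl from the seed URL, at most 3 levels deep.
	if err := c.Crawl("http://example.com", 3); err != nil {
		log.Fatalf("crawl failed: %v", err)
	}
}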

type Downloader

type Downloader interface {
	Download(src string, baseURL string, path string) error
}
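
Because Downloader is a one-method interface, a stand-in implementation can be swapped in, for example to dry-run a crawl without touching disk. DryRunDownloader below is a hypothetical illustration (import path assumed as above):

package main

import (
	"log"
	"time"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

// DryRunDownloader is a hypothetical Downloader that only logs what
// would be fetched, useful for testing crawl behavior without disk I/O.
type DryRunDownloader struct{}

func (d *DryRunDownloader) Download(src, baseURL, path string) error {
	log.Printf("would download %s (relative to %s) into %s", src, baseURL, path)
	return nil
}

func main() {
	c := fetcher.NewCrawler(fetcher.NewHTMLParser(), &DryRunDownloader{}, "./images", time.Second)
	_ = c.Crawl("http://example.com", 1)
}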

type HTMLParser

type HTMLParser struct{}

func NewHTMLParser

func NewHTMLParser() *HTMLParser

func (*HTMLParser) Parse

func (p *HTMLParser) Parse(urlStr string) (imageSources []string, links []string, err error)

Parse parses the HTML content at the given URL and extracts the image sources (src attributes of all <img> tags) along with the links found on the page.
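
A short sketch of calling Parse directly (import path assumed as above):

package main

import (
	"log"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

func main() {
	p := fetcher.NewHTMLParser()

	// Parse returns image sources and outgoing links as separate slices.
	images, links, err := p.Parse("http://example.com")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("found %d images and %d links", len(images), len(links))
}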

type HTTPDownloader

type HTTPDownloader struct{}

func NewHTTPDownloader

func NewHTTPDownloader() *HTTPDownloader

func (*HTTPDownloader) Download

func (d *HTTPDownloader) Download(source string, baseURL string, path string) error

Download downloads an image from the specified source URL.

Parameters:

  • source: the URL of the image to download.
  • baseURL: the base URL for resolving relative URLs.
  • path: the directory path to save the downloaded image.

Returns:

  • error: an error if there was an issue during the download process.
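
A sketch of a direct call, resolving a hypothetical relative source against a base URL (import path assumed as above):

package main

import (
	"log"

	fetcher "github.com/flew1x/image_fetcher" // assumed import path
)

func main() {
	d := fetcher.NewHTTPDownloader()

	// The relative source is resolved against baseURL before fetching;
	// the image is written into ./images.
	if err := d.Download("/static/logo.png", "http://example.com", "./images"); err != nil {
		log.Fatal(err)
	}
}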

type Parser

type Parser interface {
	Parse(url string) (imageSources []string, links []string, err error)
}
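
As with Downloader, Parser can be stubbed for tests. StubParser below is a hypothetical fragment returning canned results so Crawler logic can be exercised without network access:

// StubParser is a hypothetical Parser that returns canned results.
type StubParser struct{}

func (p *StubParser) Parse(url string) (imageSources []string, links []string, err error) {
	return []string{"/img/a.png"}, []string{"http://example.com/next"}, nil
}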

Directories

cmd
