# article-crawler

## Description
This package is a simple web crawler written in Go that extracts text content from a given URL by recursively traversing the HTML document tree and selecting certain HTML tags. The tags selected for extraction are `p`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `ul`, `ol`, `pre`, and `blockquote`.
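For intuition, that extraction step can be sketched with the standard `golang.org/x/net/html` parser. This is a minimal illustration of a recursive, tag-based walk over the tag list above, not the package's actual implementation; the function and variable names here are hypothetical:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"

	"golang.org/x/net/html"
)

// targetTags mirrors the tag list above.
var targetTags = map[string]bool{
	"p": true, "h1": true, "h2": true, "h3": true,
	"h4": true, "h5": true, "h6": true,
	"ul": true, "ol": true, "pre": true, "blockquote": true,
}

// collectText walks the node tree depth-first and appends the text
// of any element whose tag is in targetTags.
func collectText(n *html.Node, out *strings.Builder) {
	if n.Type == html.ElementNode && targetTags[n.Data] {
		out.WriteString(nodeText(n))
		out.WriteString("\n")
		return // stop here so nested target tags are not emitted twice
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, out)
	}
}

// nodeText concatenates all text nodes beneath n.
func nodeText(n *html.Node) string {
	if n.Type == html.TextNode {
		return n.Data
	}
	var sb strings.Builder
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		sb.WriteString(nodeText(c))
	}
	return sb.String()
}

func main() {
	resp, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	doc, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	var sb strings.Builder
	collectText(doc, &sb)
	fmt.Println(sb.String())
}
```

Returning early at a matched element means each target's text is collected once, even when target tags nest (for example, a `pre` inside a `blockquote`).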
## Installation
To use this package, you will need to have Go installed on your system. Once you have Go installed, you can add the package to your project using the following command:

```sh
go get github.com/STRockefeller/article-crawler
```
## Usage
To use the crawler, call the `Crawl` function with the URL you want to crawl as its argument. The function returns a string containing the extracted text content.
```go
package main

import (
	"fmt"

	crawler "github.com/STRockefeller/article-crawler"
)

func main() {
	// Crawl fetches the page and returns the extracted article text.
	url := "https://example.com"
	text := crawler.Crawl(url)
	fmt.Println(text)
}
```
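Note that because the repository name contains a hyphen, which is not a valid character in a Go package identifier, the example above imports the package under the `crawler` alias. The alias is an assumption for illustration; check the `package` clause in the repository for the name the package actually declares.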
## Contributing
Contributions to this package are welcome. If you find a bug or have a feature request, please open an issue or submit a pull request.
## License
This package is licensed under the MIT license. See the LICENSE file for more information.