crawlerdetect

package
v0.1.8
Published: Jun 12, 2024 License: MIT Imports: 7 Imported by: 0

Documentation

Constants

const (
	// Baidu is the largest search engine in China, providing various services beyond search such as maps, news, images, etc.
	Baidu = "baidu"

	// Bing is a web search engine owned and operated by Microsoft. It offers web, video, image, and map search, among other features.
	Bing = "bing"

	// Google is globally recognized and is the most used search engine. It handles over three billion searches each day and offers services beyond search like Gmail, Google Docs, etc.
	Google = "google"

	// SoGou is another Chinese search engine, owned by Sohu, Inc. It's the default search engine of Tencent's QQ soso.com, sogou.com, and Firefox in China.
	SoGou = "sogou"
)

These constants identify the four supported search engines.

Variables

This section is empty.

Functions

This section is empty.

Types

type Strategy

type Strategy interface {
	// CheckCrawler is a method that takes an IP address as input and returns a boolean
	// and an error. The boolean indicates whether the given IP address belongs to a crawler
	// or bot, and the error provides details if something went wrong during the check.
	// The implementation of this method should contain the logic for determining crawler activity.
	CheckCrawler(ip string) (bool, error)
}

Strategy is an interface defining the methods that all crawler check strategies must implement. Different search engines may have their own implementation of the Strategy interface to accommodate their specific methods for detecting crawlers.
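
To make the contract concrete, here is a minimal sketch of a custom type that satisfies Strategy. The Strategy interface is redeclared locally so the example compiles on its own. staticListStrategy and newStaticListStrategy are hypothetical names invented for illustration and are not part of this package.

```go
package main

import (
	"fmt"
	"net"
)

// Strategy mirrors the interface documented above so that this sketch
// compiles without importing the package.
type Strategy interface {
	CheckCrawler(ip string) (bool, error)
}

// staticListStrategy is a hypothetical implementation that treats a fixed
// set of IP addresses as crawlers. It is not part of the package.
type staticListStrategy struct {
	known map[string]struct{}
}

// newStaticListStrategy builds the lookup set from the given addresses.
func newStaticListStrategy(ips ...string) *staticListStrategy {
	known := make(map[string]struct{}, len(ips))
	for _, ip := range ips {
		known[ip] = struct{}{}
	}
	return &staticListStrategy{known: known}
}

// CheckCrawler returns an error for input that is not a valid IP address,
// and otherwise reports whether the address is in the fixed set.
func (s *staticListStrategy) CheckCrawler(ip string) (bool, error) {
	if net.ParseIP(ip) == nil {
		return false, fmt.Errorf("invalid IP address %q", ip)
	}
	_, ok := s.known[ip]
	return ok, nil
}

func main() {
	// Assigning to a Strategy variable proves the type satisfies the interface.
	var s Strategy = newStaticListStrategy("192.0.2.10")
	for _, ip := range []string{"192.0.2.10", "192.0.2.11", "not-an-ip"} {
		ok, err := s.CheckCrawler(ip)
		fmt.Println(ip, ok, err)
	}
}
```

The boolean and the error are kept separate on purpose: a false result with a nil error means "checked, not a crawler", while a non-nil error means the check itself could not be completed.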

func InitCrawlerDetector

func InitCrawlerDetector(crawler string) Strategy

InitCrawlerDetector looks up the Strategy for the specified crawler string in strategyMap, a predefined package-level map of strategies. The strategies are expected to have been initialized and stored in the map beforehand, so this function only performs the lookup.

Parameters:

  • crawler: A string that identifies the crawler whose Strategy instance needs to be retrieved. It acts as a key to the strategyMap.

Returns:

  • Strategy: A Strategy instance associated with the provided crawler string. If the crawler string does not exist in the map, the function returns a nil value.

Usage Notes:

  • The strategyMap is a global variable where the key is a string representing the crawler's name, and the value is an instance of a Strategy implementation specific to that crawler.
  • The provided crawler string should match one of the keys in the strategyMap for the function to return a valid Strategy instance.
  • If the crawler string is not found in the strategyMap or is misspelled, the function will return a nil value, which the caller must check for before proceeding to use the returned Strategy instance.

Example:

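  • To retrieve the Strategy associated with Google, pass the Google constant, whose value is the lowercase string "google". The capitalized string "Google" would not match if the map is keyed by the constants above. For example: googleStrategy := crawlerdetect.InitCrawlerDetector(crawlerdetect.Google)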
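
The nil check that the usage notes call for can be sketched as follows. Because strategyMap is unexported and the module's import path is not shown on this page, the sketch uses local stand-ins: stubStrategy, initCrawlerDetector, detectorFor, and the map contents are all hypothetical, and the lookup only imitates the documented behavior of returning nil for an unknown key.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy mirrors the documented interface.
type Strategy interface {
	CheckCrawler(ip string) (bool, error)
}

// Google has the same value as the documented constant.
const Google = "google"

// stubStrategy is a hypothetical placeholder for a real strategy.
type stubStrategy struct{}

func (stubStrategy) CheckCrawler(ip string) (bool, error) { return false, nil }

// strategyMap is a stand-in for the package's unexported map; keying it by
// the constant values is an assumption of this sketch.
var strategyMap = map[string]Strategy{Google: stubStrategy{}}

// initCrawlerDetector imitates the documented lookup: a missing key yields
// the zero value of the interface type, which is nil.
func initCrawlerDetector(crawler string) Strategy {
	return strategyMap[crawler]
}

var errUnknownCrawler = errors.New("unknown crawler")

// detectorFor wraps the lookup and turns a nil result into an error, so
// callers cannot invoke CheckCrawler on a nil Strategy by accident.
func detectorFor(crawler string) (Strategy, error) {
	s := initCrawlerDetector(crawler)
	if s == nil {
		return nil, fmt.Errorf("%w: %q", errUnknownCrawler, crawler)
	}
	return s, nil
}

func main() {
	for _, name := range []string{Google, "Google"} {
		if _, err := detectorFor(name); err != nil {
			fmt.Println("lookup failed:", err)
			continue
		}
		fmt.Println("found strategy for", name)
	}
}
```

Calling a method on a nil interface value panics at run time, so converting the nil result into an error at the lookup site keeps the failure close to its cause.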

type UniversalStrategy

type UniversalStrategy struct {
	// Hosts contains the hostnames or IP addresses used to identify crawlers.
	Hosts []string
}

UniversalStrategy is a structure that holds information relevant to a generic approach for checking crawlers across different search engines.

func InitUniversalStrategy

func InitUniversalStrategy(hosts []string) *UniversalStrategy

InitUniversalStrategy initializes a UniversalStrategy instance. It takes a slice of hosts, which are the hostnames or IP addresses used to identify crawlers, and returns a pointer to a new UniversalStrategy that stores this slice.

Parameters:

  • hosts: A slice of strings holding the hostnames or IP addresses used for crawler identification.

The function constructs a UniversalStrategy struct and sets its exported Hosts field to the input slice.

func (*UniversalStrategy) CheckCrawler

func (s *UniversalStrategy) CheckCrawler(ip string) (bool, error)

CheckCrawler reports whether a given IP address belongs to a known crawler, typically one operated by a search engine. It performs a reverse DNS lookup on the IP to obtain hostnames, matches these against the UniversalStrategy's list of known crawler hosts, and then confirms any match with a forward lookup.

Parameters:

  • ip: A string representing the IP address to be checked.

Returns:

  • bool: True if the IP address is identified as a crawler, false otherwise.
  • error: Any error encountered during the execution of the IP lookup or subsequent operations.

The method performs the following steps:

  1. It executes a reverse DNS lookup on the given IP address to retrieve associated hostnames.
  2. If an error occurs during the lookup, it returns false along with the error.
  3. If no hostnames are found, it means the IP cannot be linked to any crawler and returns false.
  4. If hostnames are found, it attempts to match them against the known crawler hosts in the UniversalStrategy's list, using an unexported matchHost method that is not documented here.
  5. If there's no match, it returns false.
  6. If a match is found, it then performs a forward DNS lookup for the matched hostname to verify the IP address.
  7. If the forward lookup yields an error, it returns false and the error.
  8. Finally, it checks if the list of IPs from the forward lookup of the hostname contains the original IP address. If so, it confirms the IP address belongs to a known crawler and returns true; otherwise, it returns false.
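
The steps above describe forward-confirmed reverse DNS. The following is a minimal sketch of that technique, not the package's actual code: verifier, newVerifier, and the lookup fields are hypothetical names, and the suffix-based matchHost is an assumption, since the real matching rule is not documented here. The resolvers default to net.LookupAddr and net.LookupHost from the standard library and are replaced with stubs in main so the example runs without network access.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// verifier is a hypothetical stand-in for UniversalStrategy. The lookup
// fields default to the standard resolvers and can be swapped out in tests.
type verifier struct {
	hosts      []string
	lookupAddr func(ip string) ([]string, error)   // reverse lookup
	lookupHost func(host string) ([]string, error) // forward lookup
}

func newVerifier(hosts []string) *verifier {
	return &verifier{hosts: hosts, lookupAddr: net.LookupAddr, lookupHost: net.LookupHost}
}

// matchHost returns the first name that equals, or is a subdomain of, one of
// the configured hosts. Suffix matching is an assumption of this sketch.
func (v *verifier) matchHost(names []string) (string, bool) {
	for _, name := range names {
		// Names from a reverse lookup usually end with a trailing dot.
		n := strings.ToLower(strings.TrimSuffix(name, "."))
		for _, h := range v.hosts {
			h = strings.ToLower(h)
			if n == h || strings.HasSuffix(n, "."+h) {
				return n, true
			}
		}
	}
	return "", false
}

func (v *verifier) CheckCrawler(ip string) (bool, error) {
	names, err := v.lookupAddr(ip) // steps 1-2: reverse lookup
	if err != nil {
		return false, err
	}
	host, ok := v.matchHost(names) // steps 3-5: an empty list never matches
	if !ok {
		return false, nil
	}
	addrs, err := v.lookupHost(host) // steps 6-7: forward lookup
	if err != nil {
		return false, err
	}
	want := net.ParseIP(ip)
	for _, a := range addrs { // step 8: the original IP must be among the results
		if got := net.ParseIP(a); got != nil && got.Equal(want) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	v := newVerifier([]string{"googlebot.com"})
	// Stub resolvers keep the example deterministic; remove these two
	// assignments to query real DNS instead.
	v.lookupAddr = func(ip string) ([]string, error) {
		return []string{"crawl-192-0-2-10.googlebot.com."}, nil
	}
	v.lookupHost = func(host string) ([]string, error) {
		return []string{"192.0.2.10"}, nil
	}
	for _, ip := range []string{"192.0.2.10", "192.0.2.99"} {
		ok, err := v.CheckCrawler(ip)
		fmt.Println(ip, ok, err)
	}
}
```

In this run the stubbed reverse lookup claims a googlebot.com name for both addresses, but only 192.0.2.10 survives the forward check. That is the point of step 8: whoever controls the reverse DNS zone for an address can publish any hostname there, so the hostname is trusted only if it resolves back to the same address.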
