webrisk

package module
v0.0.0-...-0d3a5d5 Latest
Published: Jun 4, 2024 License: Apache-2.0 Imports: 31 Imported by: 7

README

Web Risk Client App | Container & Go

Web Risk is the enterprise version of Google's Safe Browsing API that protects 5 Billion devices globally from dangerous URLs including phishing, malware, unwanted software, and social engineering.

This client implements the Web Risk Update API, which allows URLs to be checked against blocklists via a privacy-preserving, low-latency API. It works out of the box with either Docker or Go.

This README provides a quickstart guide to running a client either with Docker or as Go binaries. It also serves as a reference implementation of the API. The GoDoc and API documentation in the .go source files provide more details on fine-tuning the parameters if desired.

Supported clients:

  • wrserver runs a thin HTTP server that can check URLs via a POST request or via a redirection endpoint that diverts bad URLs to a warning page. This is the client wrapped by Docker.
  • wrlookup is a command line service that takes URLs from STDIN and outputs results to STDOUT. It can accept multiple URLs at a time on separate lines.

Supported blocklists:

  • MALWARE
  • UNWANTED_SOFTWARE
  • SOCIAL_ENGINEERING
  • SOCIAL_ENGINEERING_EXTENDED_COVERAGE

The client is originally forked from the Safebrowsing Go Client.

Enable Web Risk

To begin using Web Risk, you will need a GCP Account and a project to work in.

  1. Enable the Web Risk API.

  2. Create an API Key.

  3. Enable Billing for your account and make sure it's linked to your project.

Install Docker and/or Go

To use the Container App, you will need Docker. To compile binaries from source or run tests, install Go.

We have included a Dockerfile to accelerate and simplify onboarding. This container wraps the wrserver binary detailed below.

Clone and Build Container

Building the container is straightforward.

First, clone this repo into a local directory.

git clone https://github.com/google/webrisk && cd webrisk

Build the container. This will run all tests before compiling wrserver into a distroless container.

docker build --tag wr-container .

Run Container

We supply the APIKEY as an environment variable to the container at runtime so that the API key is not revealed as part of the Dockerfile or in docker ps. This example also provides a port binding.

docker run -e APIKEY=XXXXXXXXXXXXXXXXXXXXXXX -p 8080:8080 wr-container

wrserver defaults to port 8080, but you can bind any port on the host machine. See the Docker documentation for details.

See Using wrserver below for how to query URLs or use the redirection endpoint.

Go Binary Quickstart | wrlookup example

The Go Client can be compiled and run directly without Docker. In this example we will use that to run the wrlookup binary that takes URLs from STDIN and outputs to STDOUT.

Before compiling from source you should install Go and have some familiarity with Go development. See here for a good place to get started.

Clone Source & Install Dependencies

To download and install this branch from the source, run the following commands.

First clone this repo into a local directory and switch to the webrisk directory.

git clone https://github.com/google/webrisk && cd webrisk

Next, install dependencies.

go install .

Build and Execute wrlookup

After installing dependencies, you can build and run wrlookup.

go build -o wrlookup cmd/wrlookup/main.go

Run the binary and supply an API key.

./wrlookup -apikey=XXXXXXXXXXXXXXXXXXXXXXX

You should see some output similar to below as wrlookup starts up.

webrisk: 2023/01/27 19:36:46 database.go:110: no database file specified
webrisk: 2023/01/27 19:36:53 database.go:384: database is now healthy
webrisk: 2023/01/27 19:36:53 webrisk_client.go:492: Next update in 30m29s

wrlookup will take any URLs from STDIN. Test your configuration with a sample:

http://testsafebrowsing.appspot.com/s/social_engineering_extended_coverage.html #input
Unsafe URL: [SOCIAL_ENGINEERING_EXTENDED_COVERAGE] # output

Using wrserver

wrserver runs a WebRisk API lookup proxy that allows users to check URLs via a simple JSON API. This local API will use the API key supplied by the Docker container or the command line that runs the binary.

First start the wrserver by either running the container or binary.

To run in Docker:

docker run -e APIKEY=XXXXXXXXXXXXXXXXXXXXXXX -p 8080:8080 <container_name>

To run from a CLI, compile wrserver the same way as wrlookup above, then run:

./wrserver -apikey=XXXXXXXXXXXXXXXXXXXXXXX

With the default settings this will start a local server at 0.0.0.0:8080.

The server has a lightweight implementation of a Web Risk Lookup API-like endpoint at v1/uris:search. To use the local endpoint to check a URL, send a POST request to 0.0.0.0:8080/v1/uris:search with a JSON body similar to the following.

{
  "uri":"http://testsafebrowsing.appspot.com/s/social_engineering_extended_coverage.html"
}

A sample cURL command:

curl -H 'Content-Type: application/json' \
	-d '{"uri":"http://testsafebrowsing.appspot.com/s/social_engineering_extended_coverage.html"}' \
	-X POST '0.0.0.0:8080/v1/uris:search'
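The same request can be issued from Go with only the standard library. This is a minimal sketch, assuming wrserver is listening on localhost:8080. The response schema is not described in this README, so the body is printed as-is rather than decoded. The names searchBody and encodeSearchBody are hypothetical helpers, not part of the webrisk package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// searchBody mirrors the JSON request body shown above.
type searchBody struct {
	URI string `json:"uri"`
}

// encodeSearchBody is a hypothetical helper that serializes the request body.
func encodeSearchBody(uri string) (string, error) {
	b, err := json.Marshal(searchBody{URI: uri})
	return string(b), err
}

func main() {
	body, err := encodeSearchBody("http://testsafebrowsing.appspot.com/s/social_engineering_extended_coverage.html")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}

	// Assumption: wrserver runs locally on its default port.
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post("http://localhost:8080/v1/uris:search", "application/json", strings.NewReader(body))
	if err != nil {
		fmt.Println("request failed (is wrserver running?):", err)
		return
	}
	defer resp.Body.Close()

	// Print the raw response; its exact shape is not documented here.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```

Marshaling a struct instead of concatenating strings keeps the body valid JSON even if the URL contains quotes or backslashes.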

See Sample URLs below to test the different blocklists.

wrserver also serves a URL redirector listening on /r?url=... which will show an interstitial for anything marked unsafe.

If the URL is safe, the client is automatically redirected to the target. Otherwise an interstitial warning page is shown as recommended by Web Risk.

Try some sample URLs:

http://0.0.0.0:8080/r?url=https://testsafebrowsing.appspot.com/s/social_engineering_extended_coverage.html
http://0.0.0.0:8080/r?url=https://testsafebrowsing.appspot.com/s/malware.html
http://0.0.0.0:8080/r?url=https://www.google.com/
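When the target URL carries its own query string, it is safer to percent-encode it before embedding it in the url parameter, otherwise characters such as & may be read as part of the redirector's query. A minimal sketch, assuming wrserver decodes url as a standard query value; redirectURL is a hypothetical helper and the base address is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// redirectURL is a hypothetical helper that builds a wrserver redirector
// link for target, percent-encoding target so that its own query string
// survives intact.
func redirectURL(base, target string) string {
	q := url.Values{}
	q.Set("url", target)
	return base + "/r?" + q.Encode()
}

func main() {
	// Assumption: wrserver is reachable at localhost:8080.
	fmt.Println(redirectURL("http://localhost:8080", "https://www.google.com/search?q=a&b=c"))
}
```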

Differences from Web Risk Lookup API

There are two significant differences between this local endpoint and the public v1/uris:search endpoint:

  • The public endpoint accepts GET requests instead of POST requests.
  • The local wrserver endpoint uses the privacy-preserving, lower-latency Update API, making it better suited for higher-demand use cases.

Sample URLs

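For testing the blocklists, you can use the following URLs, which also appear in the examples above:

  • http://testsafebrowsing.appspot.com/s/social_engineering_extended_coverage.html
  • http://testsafebrowsing.appspot.com/s/malware.html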

Troubleshooting

4XX Errors

If you start the client without proper credentials or project set up, you will see an error similar to what is shown below on startup:

webrisk: 2023/01/27 19:36:13 database.go:217: ListUpdate failure (1): webrisk: unexpected server response code: 400

For 400 errors, this usually means the API key is incorrect or was not supplied correctly.

For 403 errors, this could mean the Web Risk API is not enabled for your project or your project does not have Billing enabled.

Configuration

Both wrserver (used by docker run) and wrlookup support several command line flags.

  • apikey (required) -- Used to authenticate requests with the Web Risk API. The API itself must also be enabled on the same project and be linked to a Billing account.

  • threatTypes (optional) -- A comma-separated list of blocklists to load and check URLs against. Available options include MALWARE, UNWANTED_SOFTWARE, SOCIAL_ENGINEERING, and SOCIAL_ENGINEERING_EXTENDED_COVERAGE. This argument also accepts ALL, which is the default behavior.

  • maxDiffEntries (optional) -- An int32 value that will set the max number of hash prefixes returned in a single diff request. This can be used in resource-bound environments to control bandwidth usage. The default value of 0 will result in this limit being ignored. Otherwise, this must be set to a positive integer which must be a power of 2 between 2 ^ 10 and 2 ^ 20.

  • maxDatabaseEntries (optional) -- An int32 value that sets the upper bound on the number of hash prefixes returned from the API and stored locally. This can be used to limit the number of hash prefixes to be searched against. The default value of 0 will result in this limit being ignored. Otherwise, this must be set to a positive integer which must be a power of 2 between 2 ^ 10 and 2 ^ 20. Note: Setting this limit will decrease blocklist coverage.
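The constraint shared by maxDiffEntries and maxDatabaseEntries can be checked before starting the client. The sketch below is a hypothetical pre-flight check written for illustration; it is not the package's own validation code, and validEntryLimit is an invented name.

```go
package main

import "fmt"

// validEntryLimit reports whether n is acceptable for maxDiffEntries or
// maxDatabaseEntries as described above: 0 (limit ignored), or a power of
// two in the range [2^10, 2^20].
func validEntryLimit(n int32) bool {
	if n == 0 {
		return true
	}
	const lo, hi = 1 << 10, 1 << 20
	// n&(n-1) clears the lowest set bit; the result is 0 only for powers of two.
	return n >= lo && n <= hi && n&(n-1) == 0
}

func main() {
	for _, n := range []int32{0, 1024, 1500, 1 << 20, 1 << 21} {
		fmt.Printf("%d -> %v\n", n, validEntryLimit(n))
	}
}
```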

About the Social Engineering Extended Coverage List

This is a newer blocklist that includes a greater range of risky URLs that are not included in the Safebrowsing blocklists shipped to most browsers. The extended coverage list offers significantly more coverage, but may have a higher number of false positives. For more details, see here.

WebRisk System Test

To perform an end-to-end test on the package with the WebRisk backend, run the following command after exporting your API key as $APIKEY:

go test github.com/google/webrisk -v -run TestWebriskClient

Documentation

Overview

Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package webrisk implements a client for the Web Risk API v4.

At a high-level, the implementation does the following:

            hash(query)
                 |
            _____V_____
           |           | No
           | Database  |-----+
           |___________|     |
                 |           |
                 | Maybe?    |
            _____V_____      |
       Yes |           | No  V
     +-----|   Cache   |---->+
     |     |___________|     |
     |           |           |
     |           | Maybe?    |
     |      _____V_____      |
     V Yes |           | No  V
     +<----|    API    |---->+
     |     |___________|     |
     V                       V
(Yes, unsafe)            (No, safe)

Essentially the query is presented to three major components: The database, the cache, and the API. Each of these may satisfy the query immediately, or may say that it does not know and that the query should be satisfied by the next component. The goal of the database and cache is to satisfy as many queries as possible to avoid using the API.

Starting with a user query, a hash of the query is performed to preserve privacy regarding the exact nature of the query. For example, if the query was for a URL, then this would be the SHA256 hash of the URL in question.
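The hashing step can be illustrated in isolation. This sketch shows only SHA-256 over a single string and truncation to a short prefix; it does not reproduce whatever URL canonicalization the client performs before hashing. The helper name hashPrefix and the 4-byte prefix length are assumptions chosen for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashPrefix returns the hex encoding of the first n bytes of SHA-256(s).
// n is clamped to the range [0, 32].
func hashPrefix(s string, n int) string {
	sum := sha256.Sum256([]byte(s))
	if n < 0 {
		n = 0
	}
	if n > len(sum) {
		n = len(sum)
	}
	return hex.EncodeToString(sum[:n])
}

func main() {
	// Illustrative input only; real lookups hash canonicalized URL expressions.
	fmt.Println(hashPrefix("example.com/", 4))
	fmt.Println(hashPrefix("example.com/", 32))
}
```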

Given a query hash, we first check the local database (which is periodically synced with the global Web Risk API servers). This database will either tell us that the query is definitely safe, or that it does not have enough information.

If we are unsure about the query, we check the local cache, which can be used to satisfy queries immediately if the same query had been made recently. The cache will tell us that the query is either safe, unsafe, or unknown (because it is not in the cache or the entry has expired).

If we are still unsure about the query, then we finally query the API server, which is guaranteed to return to us an authoritative answer, assuming no networking failures.
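The flow described above can be sketched as a chain of components, each of which either answers or defers to the next. This is an illustration of the control flow only, not the package's internal implementation; the verdict and tier types and the toy database, cache, and api functions are all hypothetical.

```go
package main

import "fmt"

// verdict is the answer a component can give for a query hash.
type verdict int

const (
	unknown verdict = iota // component cannot decide; ask the next one
	safe
	unsafe
)

func (v verdict) String() string {
	return [...]string{"unknown", "safe", "unsafe"}[v]
}

// tier answers a query hash, or returns unknown to defer to the next tier.
type tier func(hash string) verdict

// lookup consults each tier in order and returns the first definite verdict
// together with the index of the tier that produced it (-1 if none did).
func lookup(hash string, tiers ...tier) (verdict, int) {
	for i, t := range tiers {
		if v := t(hash); v != unknown {
			return v, i
		}
	}
	return unknown, -1
}

// Toy stand-ins for the three components.

// database can only say "definitely safe" or "not sure".
func database(hash string) verdict {
	if hash == "h-cached" || hash == "h-remote" {
		return unknown
	}
	return safe
}

// cache remembers one recent unsafe result.
func cache(hash string) verdict {
	if hash == "h-cached" {
		return unsafe
	}
	return unknown
}

// api always gives a definite answer.
func api(hash string) verdict {
	if hash == "h-remote" {
		return unsafe
	}
	return safe
}

func main() {
	for _, h := range []string{"h-clean", "h-cached", "h-remote"} {
		v, i := lookup(h, database, cache, api)
		fmt.Printf("%s: %v (answered by tier %d)\n", h, v, i)
	}
}
```

Ordering the tiers from cheapest to most expensive means the API is reached only when both local components decline to answer.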

Constants

const (
	// DefaultServerURL is the default URL for the Web Risk API.
	DefaultServerURL = "webrisk.googleapis.com"

	// DefaultUpdatePeriod is the default period for how often UpdateClient will
	// reload its blocklist database.
	DefaultUpdatePeriod = 30 * time.Minute

	// DefaultID is the client ID sent with each API call.
	DefaultID = "WebRiskContainer"
	// DefaultVersion is the Version sent with each API call.
	DefaultVersion = "1.0.0"

	// DefaultRequestTimeout is the default amount of time a single
	// api request can take.
	DefaultRequestTimeout = time.Minute
)
const (
	ThreatTypeUnspecified               = ThreatType(pb.ThreatType_THREAT_TYPE_UNSPECIFIED)
	ThreatTypeMalware                   = ThreatType(pb.ThreatType_MALWARE)
	ThreatTypeSocialEngineering         = ThreatType(pb.ThreatType_SOCIAL_ENGINEERING)
	ThreatTypeUnwantedSoftware          = ThreatType(pb.ThreatType_UNWANTED_SOFTWARE)
	ThreatTypeSocialEngineeringExtended = ThreatType(pb.ThreatType_SOCIAL_ENGINEERING_EXTENDED_COVERAGE)
)

List of ThreatType constants.

Variables

DefaultThreatLists is the default list of threat lists that UpdateClient will maintain. If you modify this variable, you must refresh all saved database files.

Functions

func ValidURL

func ValidURL(url string) bool

ValidURL parses the given string and returns true if it is a Web Risk compatible URL.

In general, clients can (and should) just call LookupURLs, which performs the same checks internally. This method can be useful when checking a batch of URLs, as the first parse failure will cause LookupURLs to stop processing the request and return an error.
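One way to use this for batches is to split the input into acceptable and rejected URLs first, so that a single malformed entry does not abort the whole lookup. The sketch below takes the validator as a parameter so it compiles without the webrisk module; in real code the argument would be webrisk.ValidURL. The partition helper and the standInValid placeholder are hypothetical, and standInValid does not apply the same rules as ValidURL.

```go
package main

import (
	"fmt"
	"net/url"
)

// partition splits urls into those accepted by valid and those rejected.
// With the real package, valid would be webrisk.ValidURL.
func partition(urls []string, valid func(string) bool) (ok, bad []string) {
	for _, u := range urls {
		if valid(u) {
			ok = append(ok, u)
		} else {
			bad = append(bad, u)
		}
	}
	return ok, bad
}

// standInValid is a placeholder so this example runs on its own.
// It is NOT equivalent to webrisk.ValidURL.
func standInValid(s string) bool {
	u, err := url.Parse(s)
	return err == nil && u.Host != ""
}

func main() {
	ok, bad := partition([]string{
		"http://testsafebrowsing.appspot.com/s/malware.html",
		"://missing-scheme",
	}, standInValid)
	fmt.Println("to look up:", ok)
	fmt.Println("rejected:", bad)
}
```

Only the ok slice would then be passed to LookupURLs.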

Types

type Config

type Config struct {
	// ServerURL is the URL for the Web Risk API server.
	// If empty, it defaults to DefaultServerURL.
	ServerURL string

	// ProxyURL is the URL of the proxy to use for all requests.
	// If empty, the underlying library uses $HTTP_PROXY environment variable.
	ProxyURL string

	// APIKey is the key used to authenticate with the Web Risk API
	// service. This field is required.
	APIKey string

	// ID and Version are client metadata associated with each API request to
	// identify the specific implementation of the client.
	// They are similar in usage to the "User-Agent" in an HTTP request.
	// If empty, these default to DefaultID and DefaultVersion, respectively.
	ID      string
	Version string

	// DBPath is a path to a persistent database file.
	// If empty, UpdateClient operates in a non-persistent manner.
	// This means that blocklist results will not be cached beyond the lifetime
	// of the UpdateClient object.
	DBPath string

	// UpdatePeriod determines how often we update the internal list database.
	// If zero value, it defaults to DefaultUpdatePeriod.
	UpdatePeriod time.Duration

	// ThreatListArg is an optional string that will be parsed into ThreatLists.
	// It is expected that names will be an exact match and comma-separated.
	// For Example: 'MALWARE,SOCIAL_ENGINEERING'.
	// Will also accept 'ALL' and load all threat types.
	// If empty, ThreatLists will be loaded instead.
	ThreatListArg string

	// MaxDiffEntries sets the maximum entries to request in a single call to ComputeThreatListDiff.
	// This can be used in resource-constrained environments to limit the number of entries fetched
	// at once. The default behavior (0) is to ignore this limit.
	// If set, this should be a power of 2 between 2 ** 10 and 2 ** 20.
	MaxDiffEntries int32

	// MaxDatabaseEntries sets the maximum entries that the client will accept & store in the local
	// database. This can be used to limit the size of the blocklist at the trade-off of decreased
	// coverage. The default behavior (0) is to ignore this limit.
	// If set, this should be a power of 2 between 2 ** 10 and 2 ** 20.
	MaxDatabaseEntries int32

	// ThreatLists determines which threat lists that UpdateClient should
	// subscribe to. The threats reported by LookupURLs will only be ones that
	// are specified by this list.
	// If empty, it defaults to DefaultThreatLists.
	ThreatLists []ThreatType

	// RequestTimeout determines the timeout value for the http client.
	RequestTimeout time.Duration

	// Logger is an io.Writer that allows UpdateClient to write debug information
	// intended for human consumption.
	// If empty, no logs will be written.
	Logger io.Writer
	// contains filtered or unexported fields
}

Config sets up the UpdateClient object.

type Stats

type Stats struct {
	QueriesByDatabase int64         // Number of queries satisfied by the database alone
	QueriesByCache    int64         // Number of queries satisfied by the cache alone
	QueriesByAPI      int64         // Number of queries satisfied by an API call
	QueriesFail       int64         // Number of queries that could not be satisfied
	DatabaseUpdateLag time.Duration // Duration since last *missed* update. 0 if next update is in the future.
}

Stats records statistics regarding UpdateClient's operation.
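Since the database and cache exist to keep queries away from the API, one number worth deriving from these counters is the share of queries answered locally. The sketch below uses a local stand-in struct that copies the counter fields shown above so it compiles on its own; localShare is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// stats is a local stand-in with the same counter fields as webrisk.Stats.
type stats struct {
	QueriesByDatabase int64
	QueriesByCache    int64
	QueriesByAPI      int64
	QueriesFail       int64
}

// localShare returns the fraction of all queries that were satisfied by the
// database or cache alone. It returns 0 when no queries have been made.
func localShare(s stats) float64 {
	total := s.QueriesByDatabase + s.QueriesByCache + s.QueriesByAPI + s.QueriesFail
	if total == 0 {
		return 0
	}
	return float64(s.QueriesByDatabase+s.QueriesByCache) / float64(total)
}

func main() {
	// Made-up counter values for illustration.
	s := stats{QueriesByDatabase: 900, QueriesByCache: 60, QueriesByAPI: 35, QueriesFail: 5}
	fmt.Printf("answered locally: %.1f%%\n", 100*localShare(s))
}
```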

type ThreatType

type ThreatType uint16

ThreatType is an enumeration type for threat classes. Examples of threat classes are malware, social engineering, etc.

func (ThreatType) String

func (tt ThreatType) String() string

type URLThreat

type URLThreat struct {
	Pattern string
	ThreatType
}

A URLThreat is a specialized ThreatType for the URL threat entry type.

type UpdateClient

type UpdateClient struct {
	// contains filtered or unexported fields
}

UpdateClient is a client implementation of API v4.

It provides a set of lookup methods that allow the user to query whether certain entries are considered a threat. The implementation manages all of the local database and caching that would normally be needed to interact with the API server.

func NewUpdateClient

func NewUpdateClient(conf Config) (*UpdateClient, error)

NewUpdateClient creates a new UpdateClient.

The conf struct allows the user to configure many aspects of the UpdateClient's operation.

func (*UpdateClient) Close

func (wr *UpdateClient) Close() error

Close cleans up all resources. This method must not be called concurrently with other lookup methods.

func (*UpdateClient) LookupURLs

func (wr *UpdateClient) LookupURLs(urls []string) (threats [][]URLThreat, err error)

LookupURLs looks up the provided URLs. It returns a list of threats, one for every URL requested, and an error if any occurred. It is safe to call this method concurrently.

The outer dimension is across all URLs requested, and will always have the same length as urls regardless of whether an error occurs or not. The inner dimension is across every fragment that a given URL produces. For some URL at index i, one can check for a hit on any blocklist by checking if len(threats[i]) > 0. The ThreatEntryType field in the inner ThreatType will be set to ThreatEntryType_URL as this is a URL lookup.

If an error occurs, the caller should treat the threats list returned as a best-effort response to the query. The results may be stale or be partial.
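The shape of the result can be walked as follows. To keep the example compilable without the webrisk module, it uses a local stand-in type that mirrors the documented URLThreat fields, with ThreatType simplified to a string; unsafeIndexes is a hypothetical helper, and the Pattern value is illustrative only.

```go
package main

import "fmt"

// urlThreat is a local stand-in mirroring the documented URLThreat shape.
// In the real package, ThreatType is a webrisk.ThreatType, not a string.
type urlThreat struct {
	Pattern    string
	ThreatType string
}

// unsafeIndexes returns every index i for which threats[i] is non-empty,
// that is, every input URL that hit at least one blocklist.
func unsafeIndexes(threats [][]urlThreat) []int {
	var hits []int
	for i, t := range threats {
		if len(t) > 0 {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	urls := []string{
		"https://www.google.com/",
		"http://testsafebrowsing.appspot.com/s/malware.html",
	}
	// In real code this would come from client.LookupURLs(urls). Even when
	// that call returns an error, the slice has len(urls) entries and should
	// be treated as a best-effort answer.
	threats := [][]urlThreat{
		nil,
		{{Pattern: "testsafebrowsing.appspot.com/s/malware.html", ThreatType: "MALWARE"}},
	}
	for _, i := range unsafeIndexes(threats) {
		fmt.Printf("unsafe: %s %v\n", urls[i], threats[i])
	}
}
```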

func (*UpdateClient) LookupURLsContext

func (wr *UpdateClient) LookupURLsContext(ctx context.Context, urls []string) (threats [][]URLThreat, err error)

LookupURLsContext looks up the provided URLs. The request will be canceled if the provided Context is canceled, or if Config.RequestTimeout has elapsed. It is safe to call this method concurrently.

See LookupURLs for details on the returned results.

func (*UpdateClient) Status

func (wr *UpdateClient) Status() (Stats, error)

Status reports the status of UpdateClient. It returns some statistics regarding the operation, and an error representing the status of its internal state. Most errors are transient and will recover themselves after some period.

func (*UpdateClient) WaitUntilReady

func (wr *UpdateClient) WaitUntilReady(ctx context.Context) error

WaitUntilReady blocks until the database is not in an error state. Returns nil when the database is ready. Returns an error if the provided context is canceled or if the UpdateClient instance is Closed.
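A common pattern is to bound the startup wait with a deadline so a misconfigured client fails fast instead of blocking forever. The sketch below depends only on the documented method signature via a small interface; readier, waitReady, and fakeClient are hypothetical names, and the fake returns ctx.Err() on cancellation, which may differ from the error the real client returns.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// readier is the subset of *webrisk.UpdateClient used by this example.
type readier interface {
	WaitUntilReady(ctx context.Context) error
}

// waitReady blocks until c reports ready or timeout elapses.
func waitReady(c readier, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return c.WaitUntilReady(ctx)
}

// fakeClient is a stand-in that becomes ready after delay.
type fakeClient struct{ delay time.Duration }

func (f fakeClient) WaitUntilReady(ctx context.Context) error {
	select {
	case <-time.After(f.delay):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// Ready well within the deadline.
	fmt.Println("fast client error:", waitReady(fakeClient{delay: 10 * time.Millisecond}, time.Second))

	// Deadline expires before the client becomes ready.
	err := waitReady(fakeClient{delay: time.Second}, 10*time.Millisecond)
	fmt.Println("slow client timed out:", errors.Is(err, context.DeadlineExceeded))
}
```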

Directories

Path Synopsis
cmd
wrserver/statik
Package statik contains static assets.
internal
