links-r-us

An example project built by following Achilleas Anagnostopoulos's "Hands-On Software Engineering with Golang". It is an adaptation of the original repository, updated to use newer packages and reorganized to suit my liking.

Links 'R' Us is a web crawler and search engine that uses PageRank to order its results. It provides a frontend for submitting new links to index or search terms to look up. Note that the crawler only supports non-code, text-based content (i.e., HTML and TXT).

Links 'R' Us provides both in-memory and external data store/text indexer implementations, the external ones backed by CockroachDB and Elasticsearch, respectively. It can be extended to use any data store or text indexer implementation, provided it satisfies the required interfaces.
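To give a feel for the plug-in pattern, below is a minimal, purely hypothetical sketch of a custom backend. The type and method names are illustrative stand-ins, not the project's actual API; see the repository's graph and index packages for the real interfaces.

// Hypothetical sketch only: names below are illustrative stand-ins for
// the project's actual link-graph interface.
package example

import "github.com/google/uuid"

// Link is a simplified stand-in for a node in the link graph.
type Link struct {
	ID  uuid.UUID
	URL string
}

// LinkGraph is an illustrative subset of what a link-graph interface
// might require of a backend.
type LinkGraph interface {
	UpsertLink(link *Link) error
	FindLink(id uuid.UUID) (*Link, error)
}

// memoryGraph is a toy in-memory backend; any type satisfying the
// interface could be wired into the crawler and PageRank calculator.
type memoryGraph struct {
	links map[uuid.UUID]*Link
}

// Compile-time check that memoryGraph satisfies LinkGraph.
var _ LinkGraph = (*memoryGraph)(nil)

func (g *memoryGraph) UpsertLink(link *Link) error {
	if g.links == nil {
		g.links = make(map[uuid.UUID]*Link)
	}
	g.links[link.ID] = link
	return nil
}

func (g *memoryGraph) FindLink(id uuid.UUID) (*Link, error) {
	if l, ok := g.links[id]; ok {
		return l, nil
	}
	return nil, nil // a real implementation would return a "not found" error
}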

Requirements

This project makes use of the following programs. None of them need to be installed, since the monolithic version can run entirely in-memory. The particular versions I used are listed in parentheses:

The following are needed to run the distributed version of the project:

Currently, the distributed version does not work. The Makefiles, instructions, and manifests for running it have been provided; however, they may need to be edited to work properly with this layout.

Usage

There are two versions of the program: a monolithic version that compiles to a single binary and can be run on your machine, and a microservice-based version that is meant to be deployed on a Kubernetes cluster. As of now, the Makefile targets for deploying this cluster do not work.

Monolith
Usage of monolith:
  -crawler-num-workers int
        The number of workers to use for crawling web-pages (defaults to number of CPUs) (default 8)
  -crawler-reindex-threshold duration
        The minimum amount of time before re-indexing an already-crawled link (default 168h0m0s)
  -crawler-update-interval duration
        The time between subsequent crawler runs (default 5m0s)
  -frontend-listen-addr string
        The address to listen for incoming front-end requests (default ":8080")
  -frontend-max-summary-length int
        The maximum length of the summary for each matched document in characters (default 256)
  -frontend-results-per-page int
        The number of entries for each search result page (default 10)
  -link-graph-uri string
        The URI for connecting to the link-graph (supported URIs: in-memory://, postgresql://user@host:26257/linkgraph?sslmode=disable) (default "in-memory://")
  -pagerank-num-workers int
        The number of workers to use for calculating PageRank scores (defaults to number of CPUs) (default 8)
  -pagerank-update-interval duration
        The time between subsequent PageRank score updates (default 1h0m0s)
  -partition-detection-mode string
        The partition detection mode to use. Supported values are 'dns=HEADLESS_SERVICE_NAME' (k8s) and 'single' (local dev mode) (default "single")
  -text-indexer-uri string
        The URI for connecting to the text indexer (supported URIs: in-memory://, es://node1:9200,...,nodeN:9200) (default "in-memory://")

Default values have been provided for all flags, so the monolithic version works right out of the box:

go run ./cmd/monolith
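Any of the flags listed above can be overridden on the command line. For example, to serve the frontend on a different port and crawl more frequently (the values here are arbitrary):

go run ./cmd/monolith -frontend-listen-addr ':9090' -crawler-update-interval 1m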

If you've installed CockroachDB, you can use the following make targets (in order) to get it set up:

make cdb/start
make cdb/setup
make cdb/migrations/up

If you've made changes to the CockroachDB DSN string, you should update the .envrc file with its correct value. When supplying a value for the -link-graph-uri flag, you should use postgres:// for the URI scheme, not cockroachdb://.
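For example, a hypothetical invocation against a local CockroachDB node, following the URI format from the flag's description (the root@localhost portion is an assumption about your local setup):

go run ./cmd/monolith -link-graph-uri 'postgresql://root@localhost:26257/linkgraph?sslmode=disable'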

If you've installed Elasticsearch, you can use the following target to get it set up:

make es/start

If you are not on Windows, you should update the target to use /bin/elasticsearch instead of /bin/elasticsearch.bat.

Note that this target requires the use of an ES_HOME environment variable set to the location of your Elasticsearch installation.
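Putting it together, a hypothetical local setup might look like this (the installation path and the single-node es:// address are assumptions):

export ES_HOME=$HOME/elasticsearch
make es/start
go run ./cmd/monolith -text-indexer-uri 'es://localhost:9200'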

Microservices

Makefiles for deploying both the monolithic and the microservice-based versions to a Kubernetes cluster can be found under the k8s/ directory. Each version's directory provides its own README instructions, but in either case the steps are more or less the same.

For an example of what the output should look like, see these two videos: https://youtu.be/Oevqe9kKkKc https://youtu.be/w01k604kQ8o

Start the Minikube cluster using the following target:

make bootstrap-minikube

Add your cluster's Docker registry to the top-level object of your Docker daemon's configuration:

{
  "insecure-registries": [
    "<minikube-ip>:5000"
  ]
}
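You can obtain the <minikube-ip> value with:

minikube ip

On Linux the daemon configuration typically lives at /etc/docker/daemon.json; on Docker Desktop it is edited via the Docker Engine settings panel. Restart the Docker daemon after changing it.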

Build and push all required Docker images to your registry:

make dockerize-and-push
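To confirm the images reached the registry, you can query the Docker registry's standard catalog endpoint (assuming the registry is exposed on port 5000, as configured above):

curl http://$(minikube ip):5000/v2/_catalog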

Once your containers have been built, you can deploy the application:

make deploy
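You can watch the pods come up while the deployment settles (the linksrus namespace name is taken from the cleanup step below):

kubectl -n linksrus get pods -w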

If successful, the frontend of the application will be available at http://<minikube-ip>.
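A quick sanity check from the command line (assuming curl is available):

curl -s http://$(minikube ip)/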

There are two options for cleaning up the application:

kubectl delete namespace linksrus

This deletes the application's namespace but retains the CockroachDB and Elasticsearch databases, which is useful if you want to switch between the monolithic and distributed versions of the application.

make purge

This is a more thorough cleanup than deleting the namespace alone: it removes the namespace as well as any CockroachDB and Elasticsearch data.

To fully clean everything up, run the following commands:

docker system prune
minikube delete

You may need to wait a bit before running minikube delete, since docker system prune writes to logs that minikube delete relies on.

Examples

Once the application is running, visiting http://<minikube-ip> or http://localhost:<port> (depending on how you deployed it) brings up the following homepage:

home.png

Submit a link to be crawled in the next pass:

submit-site.png

If you're running the application for the first time, you may not have anything indexed yet:

search-fail.png

Wait a bit for the crawler to index all of its links, then run your search query again:

search-success.png
