2018-grpc-takehome

Technical test: Go developer

The following test is to be implemented in Go and while you can take as much time as you need,
it's not expected that you spend more than 3 or 4 hours on it.

This exercise was a very good learning experience for me, as I'd not used protobuf or gRPC, and we had not used channels and semaphores much at all at WhiteHat. I spent quite a bit of time learning how to use them in this context, and if nothing else, I certainly learned a lot of new idioms and techniques.

The test consists of implementing a "Web Crawler as a gRPC service". The application consists of a command line client and a local service which runs the actual web crawling. The communication between client and server should be defined as a gRPC service (*). For each URL, the Web Crawler creates a "site tree", which is a tree of links with the root of the tree being the root URL. The crawler should only follow links on the domain of the provided URL and not follow external links. Bonus points for making it as fast as possible.

Notes on the implementation below.

gRPC Web Crawl Server

    server [-tls --tls_cert_file=<cert> --tls_key_file=<key>]
           [--port=10000] [-debug] [-mock]

If TLS is to be used, all three of the TLS items (`tls`, `tls_cert_file`, `tls_key_file`) must be supplied. Port defaults to 10000 unless otherwise specified. (2024 followup note: this was before Let's Encrypt was easy to use, so I was doing this all by hand. I'd certainly use it now.)

If debug is supplied, the server's debug logging is enabled. (Setting the TESTING environment variable to a non-null value also enables debugging.)

If mock is supplied, the server uses the mock URL fetcher for all URL operations. This can be useful if you need to verify operation of the crawler without actually crawling any sites.
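
A couple of example invocations (server.crt and server.key are placeholder file names):

    ./server --port=10000 -debug -mock
    ./server -tls --tls_cert_file=server.crt --tls_key_file=server.key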

The application consists of a command line client and a local service which does the actual web crawling. Client and server communicate via gRPC[1].
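
Roughly, the client side of that connection looks like the sketch below. The pb package, service, and message names are hypothetical stand-ins for the generated code in api; only the grpc-go calls themselves are real.

    // Hypothetical sketch of the client end of the gRPC connection.
    package main

    import (
        "context"
        "log"

        "google.golang.org/grpc"

        pb "example.com/crawl/api" // placeholder import path for the generated stubs
    )

    func main() {
        // The server listens on port 10000 by default; when the -tls flags are
        // used, transport credentials would be passed here instead of WithInsecure.
        conn, err := grpc.Dial("localhost:10000", grpc.WithInsecure())
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()

        client := pb.NewCrawlerClient(conn) // hypothetical generated constructor
        _, err = client.StartCrawl(context.Background(), &pb.StartRequest{Url: "https://www.example.com"})
        if err != nil {
            log.Fatalf("start crawl: %v", err)
        }
    }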

The client supplies one or more URLs to the crawler, which creates a "site tree" -- a tree of links with the root of the tree being the supplied root URL.

The crawler will not follow links off the domain of the root URL, but will record those offsite links. Links that can't be followed or parsed will also be recorded.

gRPC CLI Client

The client provides the following operations:

  • crawl start www.example.com
    Starts crawling at www.example.com, only following links on example.com.
  • crawl stop www.example.com
    Stops crawling of example.com.
  • crawl status www.example.com
    Shows the crawl status for the supplied URL.
  • crawl show
    Displays the crawled URLs as a tree structure.

The client uses the Cobra CLI library, giving it a subcommand-style interface similar to Docker's or Kubernetes'.
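
A rough sketch of how a subcommand like crawl start can be wired up with Cobra; the command names follow this README, but the structure is illustrative rather than a copy of what's in cmd.

    package main

    import (
        "fmt"
        "os"

        "github.com/spf13/cobra"
    )

    func main() {
        rootCmd := &cobra.Command{Use: "crawl"}

        startCmd := &cobra.Command{
            Use:   "start <url>",
            Short: "Start crawling the supplied URL",
            Args:  cobra.ExactArgs(1),
            RunE: func(cmd *cobra.Command, args []string) error {
                // The real client sends a gRPC request to the server here.
                fmt.Printf("starting crawl of %s\n", args[0])
                return nil
            },
        }

        rootCmd.AddCommand(startCmd)
        if err := rootCmd.Execute(); err != nil {
            os.Exit(1)
        }
    }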

Building it

All of the external dependencies are in the dep configuration; install dep and run dep ensure to install them. make will build the CLI client and the server.
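
That is:

    dep ensure   # fetch the dependencies into vendor/
    make         # build the crawl client and the server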

Testing it

The tests use ginkgo, so you will need to install it:

go get -u github.com/onsi/ginkgo/ginkgo  # installs the ginkgo CLI
go get -u github.com/onsi/gomega/...     # fetches the matcher library

The ginkgo binary will be installed in your $GOPATH/bin; if that's not in your PATH, you can run the tests with $GOPATH/bin/ginkgo -r.

Running it

make run will build the client and server, and also kill any old server and start a new one for you.

./crawl start <url>    # Starts up a new crawl
./crawl stop <url>     # Pauses a crawl
./crawl status <url>   # Shows status of the URL crawl.
./crawl show <url>     # Displays a tree representation of the crawled URLs.

External dependencies of note

  • gRPC and protocol buffers for the client/server API
  • Cobra for the CLI
  • Colly for fetching and parsing pages
  • gotree for recording and printing the site tree
  • Ginkgo and Gomega for the tests
  • dep for dependency management

Notes on the implementation

The basic architecture of the crawler is based on the one in the Go Tour; I've switched up the recursion limit check to be an off-domain check. Integrating gotree to record the structure made the process of creating the site tree very simple.
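
A rough sketch of the crawl step in that style (simplified; the real code also runs fetches concurrently and records each URL into the gotree site tree):

    package crawler

    import "net/url"

    // Fetcher is a simplified stand-in for the interface discussed below; the
    // actual method set in the repo may differ.
    type Fetcher interface {
        Fetch(pageURL string) (links []string, err error)
    }

    // crawl visits pageURL and then recurses into every link that stays on
    // rootHost. Off-domain and unparseable links are not followed (the real
    // code records them before giving up on them).
    func crawl(pageURL, rootHost string, f Fetcher, visited map[string]bool) {
        if visited[pageURL] {
            return
        }
        visited[pageURL] = true

        u, err := url.Parse(pageURL)
        if err != nil || u.Host != rootHost {
            return
        }

        links, err := f.Fetch(pageURL)
        if err != nil {
            return
        }
        for _, link := range links {
            crawl(link, rootHost, f, visited)
        }
    }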

Having the Fetcher be an interface made it much easier to work through the implementation process, and made it very easy to swap implementations at runtime for the server.
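
For example, a canned-data Fetcher in the spirit of the -mock flag might look like this (the actual mock fetcher in the repo may differ):

    package crawler

    import "fmt"

    // mockFetcher serves link lists out of a map, so the crawler can be
    // exercised without touching the network.
    type mockFetcher struct {
        pages map[string][]string // page URL -> links found on that page
    }

    func (m mockFetcher) Fetch(pageURL string) ([]string, error) {
        links, ok := m.pages[pageURL]
        if !ok {
            return nil, fmt.Errorf("mock fetcher: no such page %q", pageURL)
        }
        return links, nil
    }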

Colly made the actual web fetch and parse very simple, and I plan to use it again in later projects.
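
A network-backed Fetcher built on Colly can be as small as the sketch below: visit a single page, collect every anchor's href, and let the crawl loop decide what to follow. The real implementation differs in its details.

    package crawler

    import "github.com/gocolly/colly"

    // collyFetcher fetches one page and returns the links found on it.
    type collyFetcher struct{}

    func (collyFetcher) Fetch(pageURL string) ([]string, error) {
        var links []string
        c := colly.NewCollector()
        c.OnHTML("a[href]", func(e *colly.HTMLElement) {
            // Resolve relative hrefs against the page URL before recording them.
            links = append(links, e.Request.AbsoluteURL(e.Attr("href")))
        })
        if err := c.Visit(pageURL); err != nil {
            return nil, err
        }
        return links, nil
    }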

Notes to readers in 2024

This test is six years old at this point, but is not-terrible enough that I wanted to post it as an example of a non-trivial use of gRPC, channels, and the like. I've actually found it of some use for verifying static sites.

I was ghosted post submission.

There was no NDA, so I feel free to post the code at this point. I was a relatively inexperienced Go programmer at the time -- our microservices at WhiteHat were all basically HTTP servers (not even HTTPS!) that ran SQL queries and either updated the database or returned JSON -- so it was a massive effort to learn all the new technologies in a few days, which I did in good faith; at the time, I was mystified and not a little disappointed to hear nothing back in response. Now, in 2024, I see it was just a foreshadowing of recruiting processes to come.

For other software engineers, remember that the interview process tells you about the employer as well as telling them about you: uncompensated take-home tests tell you that their attitude is that your time isn't valuable; that your free time is theirs to put claims on.

Getting ghosted also tells you that you are not respected, even to the point of providing the minimum "thanks, appreciate your effort here, but we've decided not to move forward".

I would note that I had at the time been evaluating the company's product to see if it was going to work for me as a Dropbox alternative. I decided to not "move forward" too, deleted it, and haven't looked at it since.

Oh, and they spelled my name wrong when creating the original repo. My resume was right there.
