robotstxt-datastore

v0.1.2 · Published: Jun 29, 2020 · License: BSD-3-Clause

robotstxt-datastore is a gRPC service that stores robots.txt data in a Postgres database.

Deploy

You can use the following docker-compose snippet to deploy a Postgres database together with the gRPC server.

version: "3"
services:

  # the postgres database container
  postgres:
    image: postgres:10-alpine
    container_name: robotstxt_datastore_postgres
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: robotstxt
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - robotstxt_datastore_net
    volumes:
      - ./postgres-data:/var/lib/postgresql/data

  # the grpc server container
  server:
    image: paulvollmer/robotstxt-datastore:v0.1.1
    container_name: robotstxt_datastore_server
    restart: always
    depends_on:
      - postgres
    environment:
      DATABASE_HOST: robotstxt_datastore_postgres
      DATABASE_PORT: 5432
      DATABASE_USER: postgres
      DATABASE_PASSWORD: password
      DATABASE_NAME: robotstxt
      # SENTRY_DSN: http://dsn@localhost:9100/2
    ports:
      - "5000:5000"
    networks:
      - robotstxt_datastore_net

networks:
  robotstxt_datastore_net:

Environment Variables

The table below lists all environment variables available to configure the gRPC server.

env var                  description                                                  default value
SERVER_ADDR              the gRPC server address                                      :5000
DATABASE_HOST            the database host                                            localhost
DATABASE_PORT            the database port                                            5432
DATABASE_USER            the database user                                            postgres
DATABASE_PASSWORD        the database password                                        password
DATABASE_NAME            the database name                                            robotstxt
DATABASE_SSLMODE         the database SSL mode                                        disable
REFRESH_AFTER            the delay after which a robots.txt is reloaded (in seconds)  864000 (10 days)
DEFAULT_REQUEST_SCHEME   the default scheme used to send requests                     https
DEFAULT_LIMIT            the default list limit                                       100
USERAGENT                the User-Agent used to send requests                         robotstxtbot
SENTRY_DSN               a Sentry DSN URL

Development

Clone the repository

git clone git@github.com:paulvollmer/robotstxt-datastore.git
gRPC Server

If you change the protobuf sources, you need to recompile them by running

cd proto
make

If you change the ent/schema, you need to regenerate the database code

cd server
make ent

Run the tests and build the gRPC server

cd server
make test
make build
gRPC Client

With the client you can check, list, or get robots.txt data straight from your terminal by sending requests to the gRPC server.

cd client
go build
./client check google.com

To refresh a robots.txt you can use the -r flag:

./client check -r google.com
