dp-search-data-extractor

v0.13.0
Published: Apr 29, 2022 License: MIT Imports: 7 Imported by: 0


Service to retrieve published data to be used to update a search index. It calls the /publisheddata endpoint on Zebedee.

This service listens to the "content-published" Kafka topic for events of type contentPublishedEvent, e.g.

    "data": {
      "contentPublishedEvent": {
        "CollectionID": "",
        "DataType": "",
        "URI": "businessindustryandtrade"
      }
    }
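The event shape above can be sketched as Go types. This is a minimal sketch, not the service's own code: it assumes the example fragment is wrapped in an enclosing JSON object, and the names `ContentPublished`, `envelope`, and `parseEvent` are hypothetical. The encoding actually used on the Kafka topic may differ from plain JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContentPublished mirrors the three fields shown in the example event.
type ContentPublished struct {
	CollectionID string `json:"CollectionID"`
	DataType     string `json:"DataType"`
	URI          string `json:"URI"`
}

// envelope is a hypothetical wrapper matching the
// "data" -> "contentPublishedEvent" nesting of the example.
type envelope struct {
	Data struct {
		ContentPublishedEvent ContentPublished `json:"contentPublishedEvent"`
	} `json:"data"`
}

// parseEvent decodes a JSON document shaped like the example event
// and returns the inner contentPublishedEvent.
func parseEvent(raw []byte) (ContentPublished, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return ContentPublished{}, fmt.Errorf("decoding event: %w", err)
	}
	return env.Data.ContentPublishedEvent, nil
}

func main() {
	raw := []byte(`{"data": {"contentPublishedEvent": {"CollectionID": "", "DataType": "", "URI": "businessindustryandtrade"}}}`)

	event, err := parseEvent(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(event.URI) // businessindustryandtrade
}
```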

This service takes the URI from the consumed event and calls the /publisheddata endpoint on Zebedee, passing the URI as the uri query parameter, e.g. http://localhost:8082/publisheddata?uri=businessindustryandtrade. It then logs whether the data was retrieved from Zebedee successfully.
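To illustrate how the request URL is formed, here is a minimal sketch using only the Go standard library. The helper name `publishedDataURL` is hypothetical and is not taken from the service's code; the base URL is the documented default for ZEBEDEE_URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// publishedDataURL builds the /publisheddata request URL for a given
// Zebedee base URL and content URI. The URI is sent as the "uri"
// query parameter and is escaped as needed.
func publishedDataURL(zebedeeURL, uri string) (string, error) {
	u, err := url.Parse(zebedeeURL)
	if err != nil {
		return "", fmt.Errorf("parsing zebedee URL: %w", err)
	}
	// Append the endpoint to any existing base path.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/publisheddata"

	q := url.Values{}
	q.Set("uri", uri)
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	target, err := publishedDataURL("http://localhost:8082", "businessindustryandtrade")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target) // http://localhost:8082/publisheddata?uri=businessindustryandtrade
}
```

Building the query with url.Values rather than string concatenation means a URI containing slashes or other reserved characters is escaped correctly.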

healthcheck
    Run make debug, then
    check http://localhost:25800/health

TODO: the retrieved content is currently written to a file; in future it will be passed on to another service via a Kafka producer that has not yet been written.

Getting started
  • Run make debug

The service runs in the background consuming messages from Kafka. An example event can be created using the helper script, make produce.

Dependencies
  • golang 1.16.x
  • Running instance of zebedee
  • Requires running…
  • No further dependencies other than those defined in go.mod
Configuration
| Environment variable | Default | Description |
| --- | --- | --- |
| BIND_ADDR | localhost:25800 | The host and port to bind to |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout (time.Duration format) |
| HEALTHCHECK_INTERVAL | 30s | Time between self-healthchecks (time.Duration format) |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | Time to wait until an unhealthy dependent propagates its state to make this app unhealthy (time.Duration format) |
| KAFKA_ADDR | "localhost:9092" | The address of Kafka (accepts list) |
| KAFKA_OFFSET_OLDEST | true | Start processing Kafka messages in order from the oldest in the queue |
| KAFKA_VERSION | 1.0.2 | The version of Kafka |
| KAFKA_NUM_WORKERS | 1 | The maximum number of parallel Kafka consumers |
| KAFKA_SEC_PROTO | unset (only TLS) | If set to TLS, Kafka connections will use TLS |
| KAFKA_SEC_CLIENT_KEY | unset | PEM [2] for the client key (optional, used for client auth) [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM [2] for the client certificate (optional, used for client auth) [1] |
| KAFKA_SEC_CA_CERTS | unset | PEM [2] of CA cert chain if using private CA for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignore server certificate issues if set to true [1] |
| KAFKA_CONTENT_PUBLISHED_GROUP | dp-search-data-extractor | The consumer group this application uses to consume content-published messages |
| KAFKA_CONTENT_PUBLISHED_TOPIC | content-published | The name of the topic to consume messages from |
| ZEBEDEE_URL | http://localhost:8082 | The URL for Zebedee |
| KEYWORDS_LIMITS | -1 | The maximum number of keywords allowed; -1 means no limit |
| DATASET_API_URL | http://localhost:22000 | The URL for the DatasetAPI |
| SERVICE_AUTH_TOKEN | "" | The user auth token for the DatasetAPI |

Notes:

  1. For more info, see the kafka TLS examples documentation
Healthcheck

The /health endpoint returns the current status of the service. Dependent services are health checked on an interval defined by the HEALTHCHECK_INTERVAL environment variable.

On a development machine a request to the health check endpoint can be made by:

curl localhost:25800/health

Contributing

See CONTRIBUTING for details.

License

Copyright © 2021, Office for National Statistics (https://www.ons.gov.uk)

Released under MIT license, see LICENSE for details.
