go-sqs-deduplication

module
v0.1.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 9, 2024 License: MIT

README

go-sqs-deduplication

What

The batch deduplicator works by taking as many messages off an AWS SQS queue as possible (up to maxInflight) and deleting duplicates based on a unique identifier. This cycle repeats until all of the messages on the queue have been processed, or the number of unique messages exceeds maxInflight. Finally the visibility of unique messages is reset so they show back up on the queue.

Pulling, deleting, and reseting messages happens concurrently by numWorkers workers.

The deduplicator expects SQS messages in following format:

{
    "data": {
        "uuid": "abc123"
     }
}

Where uuid is the unique identifier used to deduplicate messages.

The code can be extended to work with other queues or message formats.

Usage

Run once:

go run cmd/dedup.go -queueURL=someURL -numWorkers=100  -profileName=someProfile

Run forever:

$ go run cmd/dedup.go -queueURL=someURL -numWorkers=50 -runForever -secondsToSleepBetweenRuns=600

Help:

  -maxInflight int
    	Maximum number of inflight messages allowed by queue (default 100000)
  -numWorkers int
    	Number of concurrent workers to use (default 20)
  -profileName string
    	AWS profile to use
  -queueURL string
    	SQS URL (required)
  -runForever
    	Runs in a loop with secondsToSleepBetweenRuns
  -secondsToSleepBetweenRuns int
    	Time to sleep between runs if running forever (default 60)

Run tests:

$ go test ./...

Directories

Path Synopsis
internal
sqs

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL