kantoku

v0.0.0-...-75be81d Published: Sep 11, 2024 License: MIT Imports: 4 Imported by: 0

Repository

github.com/ischenkx/kantoku

README ¶

Kantoku

A platform for distributed task execution

Todo

Back-end

Core

Basic
- Task Dependencies
- Services
  - Common service structure for different types of processors
    - We need a single package that would take care of running and gracefully shutting down
      all types of processors
  - Cross-session service identification
    - Each service instance must have its own persistent ID that can be used as a consumer group label for queues.
  - Service discovery
    - We need to collect information about services (their presence, id, other properties)
- Clean Code/Architecture
  - Decompose transport and business logic (reading events must be separated from their processing)
Task Restarts

A restart seems to be an easy concept, but in the world of kantoku a few problems arise:
- Should the task be recreated, or is sending a new task_ready event enough? Should output resources be recreated, and if so, how should dependents of the failed task be handled?
- Restarting a task from several places in parallel can resolve its output resources twice, so the action must be synchronized. One option is to delegate the restart logic to the scheduler (not a complete solution yet: multiple scheduler instances would also have to be synchronized). Another is to set a flag in the restarted task's info, e.g. restarted: true (the tricky part here is that the task storage must guarantee atomic writes).
- Restart context id and restart parent id: restarts represent a simple tree.
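
The flag-based option can be sketched as an atomic "claim" on the task. The example below uses an in-memory store guarded by a mutex as a stand-in for a task storage with atomic writes; `Store`, `TaskInfo`, `ClaimRestart` and `RaceRestarts` are hypothetical names, and a real storage would need its own compare-and-set primitive.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskInfo holds only the fields relevant to restarts; all names are illustrative.
type TaskInfo struct {
	Restarted       bool
	RestartParentID string // the task this one is a restart of, if any
	ContextID       string
}

// Store is an in-memory stand-in for a task storage with atomic writes.
type Store struct {
	mu    sync.Mutex
	tasks map[string]*TaskInfo
}

func NewStore() *Store { return &Store{tasks: map[string]*TaskInfo{}} }

func (s *Store) Put(id string, info TaskInfo) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasks[id] = &info
}

// ClaimRestart flips Restarted from false to true and reports whether this
// caller was the one that flipped it. Only the winner may go on to restart.
func (s *Store) ClaimRestart(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.tasks[id]
	if !ok || t.Restarted {
		return false
	}
	t.Restarted = true
	return true
}

// RaceRestarts lets n goroutines try to restart the same task and returns
// how many of them won the claim.
func RaceRestarts(s *Store, id string, n int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		winners int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.ClaimRestart(id) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}

func main() {
	store := NewStore()
	store.Put("task-1", TaskInfo{ContextID: "ctx-1"})
	fmt.Println("winners:", RaceRestarts(store, "task-1", 8))
}
```

However many callers race, only one claim succeeds, which is the property the restart logic needs in order to avoid double output resolution.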
Investigate alternatives for records
- Consider using raw mql requests (can be used with Postgres via FerretDB)
- ~~Consider using postgres + jsonb~~
Implement auto tasks parsing from source code
- Use "source" column in the specifications table to identify which tasks are expected to be deleted
- Either parse tasks from source code structures with an embedded kantoku.Task or register tasks manually in some package-wise router (in this case, code generation for task files is preferable)
Pipelines

Pipelines are DAGs of tasks. They are fairly simple to implement, but they require good visualization along with support for partial restarts (a task restart without output reallocation).
- Preprocessors: pure functions written in some scripting language that are used for a slight data rectification in order to match a dependent task input's type.
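
Since a pipeline is a DAG, an execution order can be derived with a topological sort. The sketch below uses Kahn's algorithm over a map from each task to the tasks it depends on; `TopoOrder` and the task names are made up for the example and say nothing about how kantoku would store pipelines.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// TopoOrder returns the tasks in an order where every task comes after its
// dependencies. deps maps a task to the tasks it depends on.
func TopoOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for task, ds := range deps {
		if _, ok := indegree[task]; !ok {
			indegree[task] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[task]++
			dependents[d] = append(dependents[d], task)
		}
	}

	// Start with tasks that have no dependencies; sort for a stable result.
	var ready []string
	for task, n := range indegree {
		if n == 0 {
			ready = append(ready, task)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		task := ready[0]
		ready = ready[1:]
		order = append(order, task)
		for _, next := range dependents[task] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
		sort.Strings(ready)
	}
	if len(order) != len(indegree) {
		return nil, errors.New("pipeline contains a cycle")
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"parse": {"fetch"},
		"stats": {"fetch"},
		"store": {"parse", "stats"},
	}
	order, err := TopoOrder(deps)
	fmt.Println(order, err)
}
```

The same traversal also reports cycles, which a pipeline definition would presumably have to reject up front.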
Processor Nodes

Nodes that would read the events and process them with multiple handlers (status update, garbage collector, etc.)
WebSocket API
- Use Centrifugo for notifications about updates in a context/task/namespace
CLI (ktk)
- Design a ktk cli utility similar to kubectl
API
- Make it follow REST principles (stop using POST for all requests)
- Consider using jsonrpc
Spawners

A spawner is a new entity. It is a program that spawns new tasks when some event happens. Might be a kafka message, a cron event or a telegram message.
Batching

APIs and services should support batched execution. It would reduce the number of network requests to databases and the overhead of many small requests.
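
To make the saving concrete, here is a small sketch that loads resources in chunks instead of one by one and counts the round trips. `Chunk`, `LoadAll` and `fakeDB` are hypothetical helpers written for this example; the fake database only simulates a storage that accepts several IDs per request.

```go
package main

import "fmt"

// Chunk splits ids into consecutive batches of at most size elements.
func Chunk(ids []string, size int) [][]string {
	if size <= 0 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

// fakeDB stands in for a storage that supports multi-ID lookups
// and counts how many requests it received.
type fakeDB struct{ roundTrips int }

func (db *fakeDB) LoadMany(ids []string) map[string]string {
	db.roundTrips++
	out := make(map[string]string, len(ids))
	for _, id := range ids {
		out[id] = "resource:" + id
	}
	return out
}

// LoadAll fetches every id using one request per batch.
func LoadAll(db *fakeDB, ids []string, batchSize int) map[string]string {
	out := make(map[string]string, len(ids))
	for _, batch := range Chunk(ids, batchSize) {
		for id, value := range db.LoadMany(batch) {
			out[id] = value
		}
	}
	return out
}

func main() {
	db := &fakeDB{}
	ids := []string{"a", "b", "c", "d", "e"}
	res := LoadAll(db, ids, 2)
	fmt.Println(len(res), "resources in", db.roundTrips, "round trips")
}
```

With five IDs and a batch size of two, the fake storage sees three requests instead of five.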
Garbage collector

Garbage collection is necessary to limit the amount of used resources
Scheduler
- Dependencies
  - Make the resource dependency resolver instant by using a resource initialization notifier.
  - What other resolvers can be useful?
  - Should dependencies fail or have any other statuses besides PENDING and READY?
  - Investigate approaches for a better dependencies engine
- Make it scalable
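
A notifier-driven resolver could look roughly like the sketch below: tasks register the resources they wait for, and each initialization notification immediately reports which tasks became READY, with no polling. `Resolver`, `Register` and `NotifyInitialized` are hypothetical names, the state is kept in memory, and only the PENDING and READY statuses mentioned above are modeled.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type Status string

const (
	Pending Status = "PENDING"
	Ready   Status = "READY"
)

// Resolver tracks which resources each task is still waiting for.
type Resolver struct {
	mu          sync.Mutex
	initialized map[string]bool
	pending     map[string]map[string]bool // task -> resources it still waits for
	waiters     map[string][]string        // resource -> tasks waiting for it
}

func NewResolver() *Resolver {
	return &Resolver{
		initialized: map[string]bool{},
		pending:     map[string]map[string]bool{},
		waiters:     map[string][]string{},
	}
}

// Register records the resources a task depends on and returns its status.
func (r *Resolver) Register(task string, resources ...string) Status {
	r.mu.Lock()
	defer r.mu.Unlock()
	missing := map[string]bool{}
	for _, res := range resources {
		if !r.initialized[res] && !missing[res] {
			missing[res] = true
			r.waiters[res] = append(r.waiters[res], task)
		}
	}
	if len(missing) == 0 {
		return Ready
	}
	r.pending[task] = missing
	return Pending
}

// NotifyInitialized is meant to be called when a resource gets its value.
// It returns the tasks that became READY because of this notification.
func (r *Resolver) NotifyInitialized(resource string) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.initialized[resource] = true
	var ready []string
	for _, task := range r.waiters[resource] {
		missing, ok := r.pending[task]
		if !ok {
			continue
		}
		delete(missing, resource)
		if len(missing) == 0 {
			delete(r.pending, task)
			ready = append(ready, task)
		}
	}
	delete(r.waiters, resource)
	sort.Strings(ready)
	return ready
}

func main() {
	r := NewResolver()
	fmt.Println(r.Register("t1", "a", "b"))
	fmt.Println(r.NotifyInitialized("a"))
	fmt.Println(r.NotifyInitialized("b"))
	fmt.Println(r.Register("t2", "a"))
}
```

Making this scalable across several scheduler instances is a separate problem that the sketch does not address.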
Executor

The executor interface is pretty simple, and the current implementation is sufficient for basic usage. In the future (meaning that this task shouldn't be taken until more fundamental problems are solved) it'd be nice to have a more complex and environment-aware version. Some points on what'd be cool to see:
- Select the running machine based on some set of labels in task's info
- Shard tasks by their execution context / resource usage. This would improve usage of a local resource cache
- Implement Leader/Follower decomposition.
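
The label-based selection from the first point can be illustrated with a simple matcher: a machine qualifies if it carries every label the task asks for. `Machine`, `Matches` and `Select` are hypothetical names, and picking the first match is just the simplest possible policy.

```go
package main

import "fmt"

// Machine is a hypothetical executor host described by a set of labels.
type Machine struct {
	Name   string
	Labels map[string]string
}

// Matches reports whether have contains every key/value pair from want.
func Matches(have, want map[string]string) bool {
	for key, value := range want {
		got, ok := have[key]
		if !ok || got != value {
			return false
		}
	}
	return true
}

// Select returns the first machine whose labels satisfy the task's labels.
func Select(machines []Machine, want map[string]string) (Machine, bool) {
	for _, m := range machines {
		if Matches(m.Labels, want) {
			return m, true
		}
	}
	return Machine{}, false
}

func main() {
	machines := []Machine{
		{Name: "cpu-1", Labels: map[string]string{"arch": "amd64"}},
		{Name: "gpu-1", Labels: map[string]string{"arch": "amd64", "gpu": "true"}},
	}
	m, ok := Select(machines, map[string]string{"gpu": "true"})
	fmt.Println(m.Name, ok)
}
```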

Infra

Save logs to some persistent storage (OpenSearch/Elastic/Loki)
Use Prometheus and Grafana for metrics
When the environment is not set up (e.g. Docker isn't running), services start to flood the components (Consul, Postgres, etc.) with connection attempts, effectively DDoSing them. We need timeouts.

Tests

Add tests for all services
Add tests for kantoku (end2end)
Write unit tests for common stuff

Docs

Create a JS emulator of the whole system and show it using reactflow

Web UI

Task List

Fix failed and ok statuses
Add running status (a.k.a received)
Make ID a hyperlink to task (and remove the eye action)
Make one-sided filter on updated_at
Add updated_at column and make it sortable

Task Show

Show the task's status (ok, failed, running...)
Show type (aka spec) and info about it
List dependencies
Show updated_at
List parameters (aka inputs) and outputs as a form
Show all info as json editor, make its style fit the antd's theme
Show specifications in "Specifications" tab
Show graphs on frontend
- Use context_id and parent_id to connect tasks

Links

Jira

Miro

Documentation ¶

Index ¶

func Connect(url string, options ...oas.ClientOption) (*kantokuHttp.Client, error)
type Kantoku

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Connect ¶

func Connect(url string, options ...oas.ClientOption) (*kantokuHttp.Client, error)

Types ¶

type Kantoku ¶

type Kantoku = core.AbstractSystem

Source Files ¶

bindings.go

Directories ¶

Path	Synopsis
cmd
stand/executor
stand/load_resource
stand/sample_project/net/http
stand/sample_project/recursive
stand/sample_project/scraper
stand/sample_project/test
stand/services/discovery
stand/services/http_api
stand/services/processor
stand/services/processor_v2
stand/services/scheduler
stand/services/status
stand/spec_loader
stand/utils
pkg
common/data/codec
common/data/storage
common/data/uid
common/dependency
common/dependency/inmem
common/dependency/postgres/batched
common/dependency/postgres/batched/model
common/logging/prefixed
common/service
common/transport/broker
common/transport/broker/watermill
core
core/database/event_broker
core/database/resource_db
core/database/task_db
core/services/executor
core/services/scheduler/dependencies
core/services/scheduler/dependencies/manager
core/services/scheduler/dependencies/manager/resolvers/resource_resolver
core/services/scheduler/dependencies/manager/resolvers/resource_resolver_v2
core/services/scheduler/dependencies/manager/task2group
core/services/scheduler/dummy
core/services/status
core/taskopts
lib/builder
lib/builder/errx
lib/discovery
lib/discovery/consul
lib/gateway/api/kantokuhttp
lib/gateway/api/kantokuhttp/oas	Package oas provides primitives to interact with the openapi HTTP API.
lib/gateway/cli
lib/resources
lib/tasks/exe
lib/tasks/fn
lib/tasks/fn/future
lib/tasks/restarter
lib/tasks/specification
lib/tasks/specification/typing
