tsuki

package module
v0.0.0-...-a7fd79a
Published: Oct 13, 2020 License: BSD-3-Clause Imports: 14 Imported by: 0

README


A simple distributed file system (with a single point of failure, SPoF).

Contribution

  • @artembakhanov: nameserver creation, writing report, debugging, Docker image creation;
  • @kuredoro: fileserver creation, writing report, testing;
  • @TheSharpOwl: client creation, writing report.

Protocols are designed by @artembakhanov and @kuredoro.

How to run

Locally

To run a nameserver or a fileserver locally, run

$ go run cmd/tsukinsd/*

or

$ go run cmd/tsukifsd/*

respectively.

The client provides a friendly CLI (help messages included). To build and run it:

$ go build -o tsuki cmd/tsuki/*
$ ./tsuki

In a container

The images are published in the artembakhanov/project2 repository on Docker Hub.

To run the servers in containers, first set up a virtual network so that they can communicate securely. This can be either a network provided by your VPS or Docker's overlay network.

  1. Start all the fileservers (a docker-compose example is below). The fileservers should have static IP addresses and open ports (our recommendation: 7000 for client-storage communication; 7001 should be hidden from the public internet).
  2. Then run one nameserver on a different host (or VM) with different ports (we use 7070 for client-nameserver communication and 7071 for nameserver-fileserver communication). Before starting the nameserver, specify the parameters of the DFS you want in the config.poml file. For instance, to set the number of replicas to 3, write replicas=3. A default config file is provided.
  3. The DFS will start working, and any client can invoke the init procedure to start working with the system.

Now let us look at running the servers more concretely. To run the nameserver (once ports and addresses are settled), create a docker-compose file as follows:

version: "3"

services:
  nameserver:
    image: artembakhanov/project2:tsukinsd
    ports:
    # others can be specified
      - 7070:7070
      - 7071:7071
    volumes:
      - ./data:.

Run docker-compose up on the nameserver host, and it will start working if everything is specified correctly. A similar compose file runs a fileserver:

version: "3"

services:
  fileserver:
    image: artembakhanov/project2:tsukifsd
    # others can be specified
    ports:
      - 7000:7000
      - 7001:7001
    volumes:
      - ./data:.

Architecture

The architecture is similar to other popular distributed file systems. There is one server (node), called the nameserver (NS), which every client contacts in the first place. This server decides whether to give the client approval to upload a file or to download a requested one. The diagram below shows how the communication channels work.

The other servers are fileservers (FS); they are not aware of what is going on in the system as a whole. Their primary goal is to stay alive at all times (if possible) and to store chunks of data. They receive commands from the nameserver, which they must obey and complete as soon as possible. They also send periodic messages called heartbeats to the nameserver, so that in case of a failure the nameserver knows which server failed and can take the necessary actions.

The diagram above shows all the possible channels of communication in the client-NS-FS structure. Note that all messages between the NS and the FSs are hidden from the public network.

Nameserver

Let us now turn to the nameserver architecture. The main goal of the nameserver is to hold all the information about the current state of the file system.

Filetree

The file system is logical and stored in RAM as a hierarchical structure (a tree). An example is shown below:

Tree node

The tree contains two types of nodes: file nodes, which cannot have children, and directory nodes, which have children but no data (no chunks). In our code, the tree is organized as a hashmap from the full path (we call it the path address) to the node itself. Each node holds references to all of its children and to its parent, so knowing the address of one node we can traverse the tree easily.

Let us look at the implementation of the node itself.

Each tree node knows the general information about the file: its address, size, creation date, etc. It also holds a small piece of information provided by the tree: its children and its parent. Finally, it records whether it was removed, since we use lazy removal: a removed node is marked as removed and deleted from the hashmap, but the node itself stays in the tree structure and is cleaned up eventually. This is a convenient solution for expensive operations such as directory removal: the nameserver simply marks the directory as dead, returns success to the client, and only afterwards deals with the mess left behind.
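
A minimal sketch of such a node and the path-address hashmap, assuming illustrative field names rather than the actual nameserver types:

package sketch

import "time"

// Illustrative only: the real nameserver node type may differ.
type treeNode struct {
    addr     string               // full path address, e.g. "/docs/report.txt"
    isFile   bool                 // file node vs. directory node
    size     int64
    created  time.Time
    removed  bool                 // lazy-removal flag
    parent   *treeNode
    children map[string]*treeNode // child name -> child node
    chunks   []string             // chunk IDs; only meaningful for file nodes
}

// The tree itself is a hashmap from path address to node, so any node can be
// found in O(1) and then traversed through its parent/children references.
type fileTree struct {
    nodes map[string]*treeNode
}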

The chunk structure is very simple.

A chunk always has a unique ID: a UUID is used for identification, so no two chunks can share the same ID. A chunk also records which file it belongs to. The remaining information (in red on the diagram) describes the status of the chunk. The status may be PENDING (the file has just been created but the client has not uploaded this chunk yet), OK (the client can download the chunk), DOWN (no server can provide this chunk; the file is DEAD) or OBSOLETE (the file to which this chunk belongs has been removed). One of the tasks of the NS is to keep the number of OK replicas equal to the configured number of replicas.
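
For illustration, the chunk bookkeeping could look roughly like this (names and fields are ours, not the package's):

package sketch

// Illustrative bookkeeping for a chunk on the nameserver side.
type chunkStatus int

const (
    chunkPending  chunkStatus = iota // created, but the client has not uploaded it yet
    chunkOK                          // at least one replica can be served to clients
    chunkDown                        // no fileserver can provide it; the file is dead
    chunkObsolete                    // the owning file has been removed
)

type chunkInfo struct {
    id       string // UUID; globally unique
    fileAddr string // path address of the file this chunk belongs to
    status   chunkStatus
    replicas []string // fileservers currently holding an OK copy
}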

Public communication service

Provides a simple REST API to clients (/upload, /touch, /rmfile, etc.).

Private communication service and heartbeat manager

Provides a REST API to fileservers.

Also, there is a simple heartbeat mechanism with two different timeouts: soft (11 seconds) and hard (61 seconds). The soft timeout decides whether the FS may still be handed out to clients, and the hard timeout decides when the nameserver should start relocating chunks to maintain the replica count. Each fileserver should send one heartbeat message every 3 seconds.
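
As a sketch, with the timeout values taken from the text above and everything else illustrative:

package sketch

import "time"

const (
    heartbeatInterval = 3 * time.Second
    softTimeout       = 11 * time.Second // FS is no longer offered to clients
    hardTimeout       = 61 * time.Second // NS starts re-replicating its chunks
)

type fsState int

const (
    fsAlive     fsState = iota
    fsSuspected // soft timeout exceeded
    fsDead      // hard timeout exceeded
)

// classify decides how the nameserver treats a fileserver based on the time
// since its last heartbeat.
func classify(lastHeartbeat, now time.Time) fsState {
    switch idle := now.Sub(lastHeartbeat); {
    case idle > hardTimeout:
        return fsDead
    case idle > softTimeout:
        return fsSuspected
    default:
        return fsAlive
    }
}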

Fileserver

The fileserver architecture is a bit simpler than the nameserver's. The fileserver's goal is to store chunks of data and to obey all nameserver commands.

All chunks are stored in a flat directory, since the hierarchy is maintained on the nameserver. A separate service maintains chunk and token states.

Communication protocols

Key highlights
  • Support for concurrent reads and writes
  • Support for multiplexed downloads and uploads
  • Load balancing
  • Client authentication
  • Support for different replica counts
  • Rejecting writes in case of memory deficiency [TODO]

Overview

The communication protocols are built on top of HTTP/1.1 and represent a REST API. The full description of the API can be found on the repository's wiki.

The main motivations for the design of these protocols are security and scalability.

The protocols incorporate a token-based authentication system. It controls, on a per-client basis, which actions may be performed on which chunks.

Another interesting decision is that chunks are immutable. The removal of chunks, a purge request, respects current users of the chunks: before actually deleting the data, the fileserver waits until all tokens associated with those chunks expire. The chunks to be purged are marked obsolete, and token emission for them is halted. This scheme permits safe removal of files even under concurrent access by multiple clients.

Using this removal primitive, file updates may be implemented in three phases: purge the old chunks, upload new ones, and replace the chunks associated with the file in the nameserver's database, as sketched below.
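
A hedged sketch of that three-phase flow against a hypothetical client interface (none of these methods exist in the tsuki API):

package sketch

// updateClient is hypothetical; it only illustrates the three phases.
type updateClient interface {
    PurgeChunks(path string) error                            // 1. mark the old chunks obsolete
    UploadChunks(path string, data []byte) ([]string, error)  // 2. upload the new data as new chunks
    ReplaceChunks(path string, chunkIDs []string) error       // 3. swap the chunk list in the NS database
}

func updateFile(c updateClient, path string, data []byte) error {
    if err := c.PurgeChunks(path); err != nil {
        return err
    }
    ids, err := c.UploadChunks(path, data)
    if err != nil {
        return err
    }
    return c.ReplaceChunks(path, ids)
}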

There is no separate interface for replication between fileservers. To replicate a chunk, the source FS sends it to the client port of the destination FS, effectively reusing the logic written for the client. Prior to this, the destination FS receives an expect request for that particular chunk from the nameserver. The orchestration is fully contained within the nameserver: it produces a sequence of messages addressed to different fileservers, waits for confirmations of replicas, and decides what to do next. The replication process is sped up by epidemic propagation.
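
The handshake can be sketched with the request constructors documented in the API index below; the action string, token, chunk ID, and address are placeholders, and the HTTP plumbing that would actually send the requests is omitted:

// Hedged sketch of the NS-driven replication handshake. All literal values
// are placeholders, and this assumes the code is compiled alongside the
// package so the documented constructors are in scope.
func sketchReplication() {
    // 1. NS -> destination FS: expect a write of chunk-42 under a fresh token.
    expect := NewExpectRequest("write", "replication-token", "chunk-42")
    // 2. NS -> source FS: replicate chunk-42 to the destination's client port.
    replicate := NewReplicateRequest("http://fs2:7000", "replication-token", "chunk-42")
    // 3. Source FS -> destination FS: a regular client-style upload of the chunk.
    upload := NewPostChunkRequest("chunk-42", "<chunk bytes>", "replication-token")
    _, _, _ = expect, replicate, upload // the requests would be sent over HTTP here
}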

To cope with slow network channels on the DFS side, the client can download and upload chunks from and to multiple servers simultaneously. The number of servers is generally equal to the number of replicas (when enough servers are available). If a client does not use multiplexed data loading, the servers to request are selected in round-robin fashion, which serves as a load-balancing mechanism.
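
A minimal sketch of such a round-robin selector (illustrative only, not the actual nameserver code):

package sketch

import "sync"

// roundRobin hands out fileserver addresses one by one, wrapping around.
type roundRobin struct {
    mu      sync.Mutex
    servers []string
    next    int
}

func (r *roundRobin) pick() string {
    r.mu.Lock()
    defer r.mu.Unlock()
    if len(r.servers) == 0 {
        return ""
    }
    s := r.servers[r.next%len(r.servers)]
    r.next++
    return s
}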

The nameserver populates the pool of trusted servers (PTS) by probing fileservers. Until probed, a fileserver can be accessed and modified by anyone: authentication is disabled. Once probed, the fileserver recognizes the leader, permanently remembers it, and blocks any further data-sensitive requests coming from unknown addresses.
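
Roughly, such a gate could look like the sketch below. It relies only on the IsNS method documented in the API index; whether IsNS expects a bare host or a host:port pair, and the middleware wiring itself, are assumptions:

package sketch

import (
    "net"
    "net/http"
)

// nsChecker is the narrow slice of the package's NSConnector interface that
// this sketch needs.
type nsChecker interface {
    IsNS(addr string) bool
}

// nsOnly rejects NS-only requests that do not come from the remembered
// nameserver address.
func nsOnly(conn nsChecker, next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        host, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil || !conn.IsNS(host) {
            http.Error(w, "unknown sender", http.StatusForbidden)
            return
        }
        next(w, r)
    }
}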

Finally, we've introduced a notion of soft timeouts on heartbeats from the fileservers. Upon reaching this period of inactivity, the nameserver marks the fileserver as potentially dead and prevents its address from appearing in responses to clients' requests for data. This improves the chances that clients will be able to download and upload data reliably.

What can be improved
  • Protection against write-write concurrency. The update method described here is not consistent when multiple writes are concurrent.

  • Replication canceling (in case of immediate overwrite). If the above problem is solved and the order of writes becomes consistent, an optimization can be employed: cancel the replication and purge the chunks of the outdated copy.

  • High network consumption. Replication is done on a chunk-by-chunk basis: chunks are replicated one by one as they are received. This negatively affects network performance. A possible solution is to buffer replication requests on the FS side or, alternatively, to employ batch replication requests that replicate multiple chunks to a single target at once (already implemented in the FS, but not used).

  • Authentication engine: users, passwords, permissions. Because authentication permeates the protocol design, our code needed to support the general case of this idea. This made the code flexible enough to allow a relatively easy implementation of "users", "ownership", and "permissions" on files, much as in physical file systems.

  • Data compression. Data may be compressed via DEFLATE or any other relatively fast compression algorithm to save network bandwidth.

  • Stateful nameserver. In the current design, it is assumed that the nameserver never fails. But this is exactly the area that can be improved: logging and snapshotting of the NS state, and subsequent resurrection from them after a failure, is a possible solution to this problem.

  • Data-preserving failures. For simplicity, we assume that if an FS fails, all the data it stored is lost. Obviously, if the FS host was merely restarted, no data was lost, and the NS should be aware of that and efficiently utilize the chunks that survived.

  • Timeouts and requests for new fileservers. If an FS fails in the middle of a data transfer, the other party should be able to request a new server from the NS to restart or continue the transaction (for writes and reads, respectively).

Documentation

Index

Constants

const (
	ErrChunkExists   = ChunkError("chunk already exists")
	ErrChunkNotFound = ChunkError("chunk does not exists")
)

const (
	ExpectActionNothing = ExpectAction(0)
	ExpectActionRead    = ExpectAction(1)
	ExpectActionWrite   = ExpectAction(2)
)

const NSPORT = ":7071"

Variables

This section is empty.

Functions

func AssertChunkContents

func AssertChunkContents(t *testing.T, chunks ChunkDB, id, want string)

func AssertChunkDoesntExists

func AssertChunkDoesntExists(t *testing.T, chunks ChunkDB, id string)

func AssertReceivedChunkCalls

func AssertReceivedChunkCalls(t *testing.T, nsConn *SpyNSConnector, ids ...string)

func AssertResponseBody

func AssertResponseBody(t *testing.T, got, want string)

func AssertStatus

func AssertStatus(t *testing.T, got, want int)

func NewCancelTokenRequest

func NewCancelTokenRequest(token string) *http.Request

func NewExpectRequest

func NewExpectRequest(action, token string, chunks ...string) *http.Request

func NewGetChunkRequest

func NewGetChunkRequest(id, token string) *http.Request

func NewPostChunkRequest

func NewPostChunkRequest(id, content, token string) *http.Request

func NewProbeRequest

func NewProbeRequest(remoteAddr string) *http.Request

func NewPurgeRequest

func NewPurgeRequest(chunks ...string) *http.Request

func NewReplicateRequest

func NewReplicateRequest(addr, token string, chunks ...string) *http.Request

Types

type ChunkDB

type ChunkDB interface {
	Get(id string) (io.Reader, func(), error)
	Create(id string) (io.Writer, func(), error)
	Exists(id string) bool

	// Should be concurrency safe
	Remove(id string) error

	BytesAvailable() int
}
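
A hedged usage sketch that wires the documented test helpers together, assuming it is compiled alongside the package (the module import path is truncated above), that the InMemoryChunkStorage index maps chunk ID to content, and that a zero-value SpyNSConnector is usable; extra imports needed: fmt, net/http/httptest:

// Sketch only: check the package tests for the authoritative behavior.
func sketchServeChunk() {
	store := NewInMemoryChunkStorage(map[string]string{"abc": "hello"})
	fs := NewFileServer(store, &SpyNSConnector{})

	// Mirror an NS "expect" call: token "t1" may read chunk "abc".
	if err := fs.Expect("t1", ExpectActionRead, "abc"); err != nil {
		fmt.Println("expect failed:", err)
		return
	}

	rec := httptest.NewRecorder()
	fs.ServeClient(rec, NewGetChunkRequest("abc", "t1"))
	fmt.Println(rec.Code, rec.Body.String()) // expected: 200 and the chunk content (assumption)
}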

type ChunkError

type ChunkError string

func (ChunkError) Error

func (c ChunkError) Error() string

type ConfigurableSleeper

type ConfigurableSleeper struct {
	Duration  time.Duration
	SleepFunc func(time.Duration)
}

func (*ConfigurableSleeper) Sleep

func (s *ConfigurableSleeper) Sleep()

type ExpectAction

type ExpectAction int

type ExpectationDB

type ExpectationDB struct {
	// contains filtered or unexported fields
}

func NewExpectationDB

func NewExpectationDB() *ExpectationDB

func (*ExpectationDB) Get

func (e *ExpectationDB) Get(token string) *TokenExpectation

func (*ExpectationDB) MakeObsolete

func (e *ExpectationDB) MakeObsolete(chunks ...string) (toPurge []string)

func (*ExpectationDB) Remove

func (e *ExpectationDB) Remove(token string) []string

func (*ExpectationDB) Set

func (e *ExpectationDB) Set(token string, exp *TokenExpectation)

type FSProbeInfo

type FSProbeInfo struct {
	Available int
}

type FileServer

type FileServer struct {
	// contains filtered or unexported fields
}

func NewFileServer

func NewFileServer(store ChunkDB, nsConn NSConnector) (s *FileServer)

func (*FileServer) CancelTokenHandler

func (s *FileServer) CancelTokenHandler(w http.ResponseWriter, r *http.Request)

func (*FileServer) Expect

func (s *FileServer) Expect(token string, action ExpectAction, chunks ...string) error

func (*FileServer) ExpectHandler

func (s *FileServer) ExpectHandler(w http.ResponseWriter, r *http.Request)

func (*FileServer) GenerateProbeInfo

func (s *FileServer) GenerateProbeInfo() *FSProbeInfo

func (*FileServer) GetTokenExpectationForChunk

func (s *FileServer) GetTokenExpectationForChunk(token, id string) ExpectAction

func (*FileServer) ProbeHandler

func (s *FileServer) ProbeHandler(w http.ResponseWriter, r *http.Request)

func (*FileServer) PurgeHandler

func (s *FileServer) PurgeHandler(w http.ResponseWriter, r *http.Request)

func (*FileServer) ReceiveChunk

func (s *FileServer) ReceiveChunk(w http.ResponseWriter, r *http.Request, id, token string)

func (*FileServer) ReplicateHandler

func (s *FileServer) ReplicateHandler(w http.ResponseWriter, r *http.Request)

func (*FileServer) SendChunk

func (s *FileServer) SendChunk(w http.ResponseWriter, r *http.Request, id, token string)

func (*FileServer) ServeClient

func (cs *FileServer) ServeClient(w http.ResponseWriter, r *http.Request)

func (*FileServer) ServeNS

func (s *FileServer) ServeNS(w http.ResponseWriter, r *http.Request)

type FileSystemChunkStorage

type FileSystemChunkStorage struct {
	Dir string
	// contains filtered or unexported fields
}

func NewFileSystemChunkStorage

func NewFileSystemChunkStorage(dir string) (*FileSystemChunkStorage, error)

func (*FileSystemChunkStorage) BytesAvailable

func (s *FileSystemChunkStorage) BytesAvailable() int

func (*FileSystemChunkStorage) Create

func (s *FileSystemChunkStorage) Create(id string) (io.Writer, func(), error)

func (*FileSystemChunkStorage) Exists

func (s *FileSystemChunkStorage) Exists(id string) bool

func (*FileSystemChunkStorage) Get

func (s *FileSystemChunkStorage) Get(id string) (io.Reader, func(), error)

func (*FileSystemChunkStorage) Remove

func (s *FileSystemChunkStorage) Remove(id string) error

type HTTPNSConnector

type HTTPNSConnector struct {
	Addr string
	// contains filtered or unexported fields
}

func (*HTTPNSConnector) GetNSAddr

func (c *HTTPNSConnector) GetNSAddr() string

func (*HTTPNSConnector) IsNS

func (c *HTTPNSConnector) IsNS(addr string) bool

func (*HTTPNSConnector) Poll

func (c *HTTPNSConnector) Poll()

func (*HTTPNSConnector) ReceivedChunk

func (c *HTTPNSConnector) ReceivedChunk(id string)

func (*HTTPNSConnector) SetNSAddr

func (c *HTTPNSConnector) SetNSAddr(addr string)

type HTTPPoller

type HTTPPoller struct {
	// contains filtered or unexported fields
}

HTTPPoller will poll the specified URL. The URL must start with http://.

func (*HTTPPoller) Poll

func (p *HTTPPoller) Poll()

type Heart

type Heart struct {
	Poller  Poller
	Sleeper Sleeper
}

func NewHeart

func NewHeart(poller Poller, sleepFor time.Duration) *Heart

func (*Heart) Contract

func (h *Heart) Contract()

func (*Heart) Poll

func (h *Heart) Poll(count int)

Poll will make count consecutive polls with calls to the sleeper in between. Set count to -1 to poll indefinitely.
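
A hedged usage sketch of Heart, assuming it is compiled alongside the package; the nameserver URL and the /heartbeat path are placeholders (the package's own HTTPPoller plays this role in practice), and the extra imports are net/http and time:

// sketchPoller is a stand-in Poller that fires a GET at the given URL.
type sketchPoller struct{ url string }

func (p sketchPoller) Poll() {
	if resp, err := http.Get(p.url); err == nil {
		resp.Body.Close()
	}
}

func runHeartbeat() {
	// Poll every 3 seconds, indefinitely.
	h := NewHeart(sketchPoller{url: "http://nameserver:7071/heartbeat"}, 3*time.Second)
	h.Poll(-1)
}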

type InMemoryChunkStorage

type InMemoryChunkStorage struct {
	Index map[string]string
	Mu    sync.RWMutex
	// contains filtered or unexported fields
}

func NewInMemoryChunkStorage

func NewInMemoryChunkStorage(index map[string]string) *InMemoryChunkStorage

func (*InMemoryChunkStorage) BytesAvailable

func (s *InMemoryChunkStorage) BytesAvailable() int

func (*InMemoryChunkStorage) Create

func (s *InMemoryChunkStorage) Create(id string) (io.Writer, func(), error)

func (*InMemoryChunkStorage) Exists

func (s *InMemoryChunkStorage) Exists(id string) (exists bool)

func (*InMemoryChunkStorage) Get

func (s *InMemoryChunkStorage) Get(id string) (io.Reader, func(), error)

func (*InMemoryChunkStorage) Remove

func (s *InMemoryChunkStorage) Remove(id string) error

type NSConnector

type NSConnector interface {
	ReceivedChunk(id string)

	SetNSAddr(addr string)
	GetNSAddr() string
	IsNS(addr string) bool

	Poller
}

type Poller

type Poller interface {
	Poll()
}

type Sleeper

type Sleeper interface {
	Sleep()
}

type SpyNSConnector

type SpyNSConnector struct {
	Addr       string
	PulseCount int
	// contains filtered or unexported fields
}

func (*SpyNSConnector) GetNSAddr

func (c *SpyNSConnector) GetNSAddr() string

func (*SpyNSConnector) IsNS

func (c *SpyNSConnector) IsNS(addr string) bool

func (*SpyNSConnector) Poll

func (c *SpyNSConnector) Poll()

func (*SpyNSConnector) ReceivedChunk

func (c *SpyNSConnector) ReceivedChunk(id string)

func (*SpyNSConnector) Reset

func (c *SpyNSConnector) Reset()

func (*SpyNSConnector) SetNSAddr

func (c *SpyNSConnector) SetNSAddr(addr string)

type SpyPoller

type SpyPoller struct {
	CallCount int
}

func (*SpyPoller) Poll

func (p *SpyPoller) Poll()

type SpySleeper

type SpySleeper struct {
	CallCount int
}

func (*SpySleeper) Sleep

func (s *SpySleeper) Sleep()

type SpySleeperTime

type SpySleeperTime struct {
	DurationSlept time.Duration
}

func (*SpySleeperTime) Sleep

func (s *SpySleeperTime) Sleep(duration time.Duration)

type TokenExpectation

type TokenExpectation struct {
	// contains filtered or unexported fields
}

token -> chunkId -> ExpectAction. Putting the token first makes lookups more cache-friendly, since there are far fewer tokens than chunks.

type TokenExpectations map[string]map[string]ExpectAction

Directories

Path Synopsis
cmd
