consistent-hashing

command

v0.0.0-...-24a15c0 Latest Latest Go to latest Published: Feb 12, 2024 License: MIT Imports: 4 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/srinathln7/system-design

Links

Open Source Insights

README ¶

Consistent Hashing with Hash Ring

This Go program implements a simple consistent hashing algorithm using a hash ring. Consistent hashing is an algorithm often used in distributed systems to efficiently distribute data among a changing set of nodes. It is neither a service by itself nor it does the "actual" data transfer. It solves only the data ownership problem and not the data distribution problem. Here is "Why" consistent hashing is needed.

Need for Even Data Distribution: In large-scale distributed systems, data is distributed across multiple servers for horizontal scaling. To ensure predictable performance, it's crucial to evenly distribute data among these servers.
Simple Hashing: Initially, a common method for distributing data evenly is simple hashing. Each object's key is hashed using a hashing function, and then the modulo operation is performed on the hash result to determine which server the object belongs to.
Drawbacks of Simple Hashing: While simple hashing works well when the cluster size is fixed and data distribution is even, it becomes problematic when servers are added or removed. In such cases, most of the keys need to be redistributed, leading to performance issues.
Introduction to Consistent Hashing: Consistent hashing mitigates the issues of traditional hashing by ensuring that most objects remain assigned to the same server even as the number of servers changes.
Core Concept of Consistent Hashing:
- Both servers and objects are hashed using the same hashing function, mapping them to the same range of values.
- A hash ring is constructed by connecting both ends of the hash space, forming a ring structure.
- Servers are placed on the hash ring, and objects are mapped onto the ring using their hashes.
- To locate the server for a particular object, one traverses clockwise from the object's position on the ring until a server is found.
Benefits of Consistent Hashing:
- Adding or removing servers requires redistributing only a fraction of the keys, minimizing data movement.
- Ensures predictable performance even in dynamic environments where servers come and go frequently.
Addressing Uneven Data Distribution: While consistent hashing ensures even distribution in theory, in practice, segments of the ring may still be unevenly distributed. Virtual nodes are introduced to address this issue, where each server appears at multiple locations on the ring, representing virtual nodes.
Real-world Applications: Consistent hashing is used in various distributed systems, including NoSQL databases like Amazon DynamoDB and Apache Cassandra, content delivery networks like Akamai, and load balancers like Google Load Balancer.

Consistent hashing is a fundamental technique in distributed systems design, enabling scalability and fault tolerance while maintaining performance and efficiency.

Implementation

HashRing Struct

The HashRing struct contains the following attributes:

mu: A read-write mutex to ensure thread safety.
nodes: A sorted slice of node IDs representing the hash ring.
hashMap: A map that associates node IDs with their corresponding node names.

Functions

`newHashRing() *HashRing`

Creates and returns a new HashRing with initialized attributes.

`hashKey(key string) int`

Hashes a given key using the FNV-1a algorithm and returns the resulting hash as an integer.

`(hr *HashRing) AddNodeToRing(node string)`

Adds a new node to the hash ring. It computes the hash of the node and ensures it doesn't already exist. Nodes are added to both the nodes slice and the hashMap.

`(hr *HashRing) DeleteNodeFromRing(node string)`

Deletes a node from the hash ring. It finds the node in the nodes slice, removes it, and updates the hashMap accordingly.

`(hr *HashRing) GetNodeForKey(key string) int`

Finds the node responsible for a given key by hashing the key and searching for the corresponding node ID in the nodes slice. This code defines the GetNodeForKey method for the HashRing struct. The purpose of this method is to determine which node in the hash ring is responsible for a given key using consistent hashing.

Here's a breakdown of the function:

hr.mu.RLock(): It acquires a read lock on the hash ring's mutex, ensuring that no other goroutine can modify the hash ring while this operation is in progress.
defer hr.mu.RUnlock(): This ensures that the read lock is released when the function completes, regardless of whether it returns normally or with an error.
keyID := hashKey(key): It calculates the hash of the input key using the hashKey function, which employs the FNV-1a algorithm.
index := sort.Search(len(hr.nodes), func(i int) bool {...}): It performs a binary search on the sorted hr.nodes slice to find the index where the key's hash value would fit in the sorted order. The provided anonymous function returns true when the desired position is found.
If the index is equal to the length of hr.nodes, it means the key's hash value is greater than or equal to all nodes' hash values in the ring. In this case, the search "circles back" to the beginning of the ring by setting the index to 0.
return hr.nodes[index]: Finally, the function returns the node ID associated with the determined index in the hr.nodes slice. This represents the node responsible for the given key in the hash ring.

In summary, the GetNodeForKey method uses consistent hashing principles to find the node responsible for a specific key in the hash ring, providing a balanced distribution of keys among the available nodes.