IPFS Recovery
The project started as a submission
to the HackFS hackathon and became a Top-10 Finalist 🏆.
Also check out our presentation.
Building a way for content to persist permanently despite any damage to data and the network by
bringing data recovery algorithms into the IPFS protocol.
Background
The IPFS project, at its core, aims to upgrade the Internet in multiple ways. One of its goals is to
make the Web permanent. This characteristic is very promising, but it is not yet state of the art and
requires more R&D. Computer science, on the other hand, has accumulated decades of research
on making data persistent against many data-loss factors, mostly within
centralized systems. On the way to permanence, those techniques can be applied to the IPFS protocol to get the most out of them,
and newer innovations may replace them once enough experience has been gathered. Further work in this direction can
also preserve integrity in doomsday scenarios where large portions of the network go down, so that content can
still be recovered. Although there are multiple discussions in the IPFS ecosystem about data preservation mechanisms
such as erasure coding, none of them has actually been implemented.
The IPFS Recovery project brings data recovery algorithms into IPFS with the above aim. It does so by creating new IPLD
data structures capable of self-recovery when some of the nodes are lost and can't be found on the network due to
node churn, network issues, or physical storage damage.
Implementation
Recovery currently targets the main IPFS implementation in Go and follows all its development guidelines and
best practices. The Go Recovery implementation is a fully modular library with clean API boundaries that aims to
provide convenient use and a solid abstraction for all current and future implementations.
Algorithms
Reed-Solomon
For the initial version, the project started with industry-standard Reed-Solomon
coding.
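To illustrate the idea behind erasure coding, here is a minimal sketch in Go. It deliberately uses a single XOR parity shard rather than Reed-Solomon itself, which tolerates several lost shards by working over a finite field. The names `encodeParity` and `recoverShard` are made up for this illustration and are not part of the Recovery API.

```go
package main

import "fmt"

// encodeParity returns one parity shard: the byte-wise XOR of all data shards.
// All shards are assumed to have the same length.
func encodeParity(shards [][]byte) []byte {
	parity := make([]byte, len(shards[0]))
	for _, s := range shards {
		for i, b := range s {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverShard rebuilds the single missing shard (marked as nil) by XOR-ing
// the parity shard with every surviving data shard.
func recoverShard(shards [][]byte, parity []byte) []byte {
	out := make([]byte, len(parity))
	copy(out, parity)
	for _, s := range shards {
		if s == nil {
			continue // this is the lost shard
		}
		for i, b := range s {
			out[i] ^= b
		}
	}
	return out
}

func main() {
	shards := [][]byte{[]byte("ipfs"), []byte("data"), []byte("lost")}
	parity := encodeParity(shards)

	shards[1] = nil // simulate a block that can no longer be found
	fmt.Printf("%s\n", recoverShard(shards, parity)) // prints "data"
}
```

A single XOR parity survives only one loss per group. Reed-Solomon generalizes this so that any k of n shards are enough to rebuild the original data.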
Alpha Entanglements
As a next step, the novel Alpha Entanglements scheme has been chosen. It provides
better performance and a higher recovery ratio compared with Reed-Solomon. In particular, entanglements are
interesting because they make it possible to create self-healing networks.
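The core idea of entanglement can be sketched with its simplest form, a single strand in which every parity is the XOR of a data block and the previous parity. This is only an assumption-laden illustration: full alpha entanglements weave several strands into a lattice, and the helper names `entangle` and `repair` are hypothetical rather than taken from the Recovery code.

```go
package main

import "fmt"

// xor returns the byte-wise XOR of two equal-length blocks.
func xor(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] ^ b[i]
	}
	return out
}

// entangle builds a single-strand chain: parities[i] = data[i] XOR parities[i-1].
// An all-zero block stands in for the parity preceding the first one.
func entangle(data [][]byte) [][]byte {
	prev := make([]byte, len(data[0]))
	parities := make([][]byte, len(data))
	for i, d := range data {
		parities[i] = xor(d, prev)
		prev = parities[i]
	}
	return parities
}

// repair rebuilds data block i from the two parities adjacent to it on the strand.
func repair(parities [][]byte, i int) []byte {
	prev := make([]byte, len(parities[i]))
	if i > 0 {
		prev = parities[i-1]
	}
	return xor(parities[i], prev)
}

func main() {
	data := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc")}
	parities := entangle(data)

	// Pretend data[1] is lost and rebuild it from its neighbouring parities.
	fmt.Printf("%s\n", repair(parities, 1)) // prints "bbbb"
}
```

Because each block depends only on its neighbours in the chain, a repair touches two parities regardless of how much content was encoded, which is what makes local, self-healing repair attractive.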
IPFS Fork
As Recovery follows IPFS ecosystem modularity best practices,
the fork integrates it in just a few small changes.
First, it wraps DAG sessions
with a custom NodeGetter that can recover nodes on the fly if content is requested but not found.
Furthermore, the fork extends the IPFS CLI with a recovery
command group. Currently,
it is only capable of encoding DAGs with Reed-Solomon recoverability using encode
, but later the CLI will be extended with
full-featured management for Recovery, such as re-encoding, manual recovery, and algorithm choices.
Testground Plans
The IPFS ecosystem recently launched a new project, Testground, aimed at testing p2p systems at large scale to simulate real-world behavior.
Using it for benchmarking and testing Recovery is a must,
as Recovery aims to improve the IPFS protocol.
Future Work
- Upgrade to a more complex Alpha Entanglement parity lattice to reach better performance.
- IPLD specs formalization through active discussions and feedback processing.
- Implementations for the latest Go IPLD version, and for JS as well.
- Extensive Testground simulation to gather real-world resiliency benchmarks and to examine various other erasure codes.
Tryout
- Build forked IPFS
- Encode ANY IPFS content:
ipfs recovery encode <path>
- List all the blocks the encoded content consists of:
ipfs refs <enc_cid> -r
- Remove some random blocks from the given list yourself:
ipfs block rm ...<cid>
- Be amazed after seeing that it is still possible to get your content back:
ipfs get <enc_cid>
Contributors
License
MIT © Hlib Kanunnikov