Storing data in large, centralized data centers comes with performance, availability and scalability issues, as well as high capital or operational expenses. Centralized data is also an open invitation to sophisticated cyberattacks. For these reasons, companies are looking for ways to decentralize data storage. Blockchain storage is one way to do that.
Blockchain storage is still a relatively young technology, but its popularity is growing. Potential enterprise use cases have started to emerge in an effort to increase data storage security and reliability. Understanding how this technology works is a critical first step to determining if it’s the right approach for your organization.
How blockchain storage works
Blockchain is a distributed ledger technology for recording transactions between two or more parties. Until recently, the technology had been used primarily to support cryptocurrencies, such as bitcoin, but it’s now gaining ground in other areas.
The blockchain ledger serves as a decentralized database that maintains details about each transaction. The transactions are added to the ledger in chronological order and stored as a series of blocks. Each block references the preceding block to form an interconnected chain.
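To make the chaining idea concrete, here is a minimal sketch of a ledger in which each block stores the hash of the block before it. It is an illustration only: the field names (`index`, `transactions`, `prev_hash`), the use of SHA-256 and the all-zero hash for the first block are assumptions made for this example, not the format of any particular blockchain.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(block):
    """Hash the block's contents, excluding its own stored hash."""
    payload = {k: block[k] for k in ("index", "timestamp", "transactions", "prev_hash")}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, transactions):
    """Add a block that references the hash of the preceding block."""
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_PREV_HASH,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
    return True


chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(chain))  # True

# Changing an earlier transaction breaks the stored hash of that block.
chain[0]["transactions"][0]["amount"] = 500
print(is_valid(chain))  # False
```

Because each block's hash covers the previous block's hash, altering one block would require recomputing every block after it, which is what makes changes easy to detect.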
The ledger is distributed across multiple nodes, with each node maintaining a complete copy. Blockchain automatically synchronizes and validates the transactions across all nodes. The ledger is transparent to and verifiable by all participating members, eliminating the need for a central authority or third-party verification service.
Because of its distributed nature, blockchain is being touted as a natural fit for peer-to-peer (P2P), decentralized storage. In this scenario, blockchain provides the structure necessary to create a logical storage pool of geographically dispersed storage resources that serve as the blockchain nodes.
The following figure provides an overview of how blockchain storage works.
A blockchain-based storage system prepares the data for storage and then distributes it across a decentralized infrastructure, a process that can be broken into the six steps that follow:
- Create data shards. The storage system breaks the data into smaller, manageable segments that can be distributed across multiple nodes, a process called sharding. The exact approach depends on the type of data and the application doing the sharding. Sharding a relational database is different from sharding a NoSQL database or sharding files on a file share.
- Encrypt each shard. The storage system then encrypts each data shard on the local system. The content owner has complete control over this process. The goal is to ensure that no one other than the content owner can view or access the data in a shard, wherever the data is located and whether that data is at rest or in motion.
- Generate a hash for each shard. The blockchain storage system generates a unique hash (a fixed-length output string produced by a one-way hash function) based on the shard’s data or encryption keys. The hash is added to both the ledger and shard metadata to link transactions to the stored shards. The exact approach to generating hashes varies from one system to the next.
- Replicate each shard. The storage system replicates each shard so there are enough redundant copies to ensure availability and performance and to protect against degradation and data loss. The content owner chooses how many copies to make of each shard and where those copies are located. As part of this process, the content owner should set a threshold for the minimum number of copies to maintain to guard against data loss.
- Distribute the replicated shards. A P2P network distributes the replicated shards to geographically dispersed storage nodes, either regionally or globally. Multiple organizations or individuals, sometimes referred to as farmers, own the storage nodes, leasing extra storage space in exchange for some type of compensation, usually cryptocurrency. No one entity owns all the storage resources or controls the storage infrastructure. Only content owners have full access to all their data, no matter where the shards are stored.
- Record transactions to the ledger. The storage system records all transactions in the blockchain ledger and syncs that information across all nodes. The ledger stores details relevant to the transaction, such as the shard location, shard hash and leasing costs. Because the ledger is based on blockchain technology, it’s transparent, verifiable, traceable and tamper-resistant.
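The steps above can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: shards are fixed-size byte chunks, hashes are SHA-256 digests of the shard contents, nodes are in-memory dictionaries, and the ledger is a plain list. The function names are hypothetical. Encryption (step two) is only marked with a comment because the Python standard library does not include a general-purpose cipher; a real system would encrypt each shard before hashing and distributing it.

```python
import hashlib
import random


def create_shards(data, shard_size):
    """Step 1: split the data into fixed-size chunks."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def shard_hash(shard):
    """Step 3: derive a fixed-length hash from the shard contents."""
    return hashlib.sha256(shard).hexdigest()


def store(data, nodes, shard_size=8, copies=3, seed=0):
    """Shard, hash, replicate and distribute data, recording each placement."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    rng = random.Random(seed)             # seeded so the example is repeatable
    storage = {node: {} for node in nodes}
    ledger = []
    for shard in create_shards(data, shard_size):
        # Step 2 (assumed, not shown): encrypt the shard with the owner's key.
        digest = shard_hash(shard)
        locations = rng.sample(nodes, copies)   # Steps 4-5: pick replica nodes
        for node in locations:
            storage[node][digest] = shard
        # Step 6: record the shard hash and its locations.
        ledger.append({"shard_hash": digest, "locations": locations})
    return storage, ledger


def retrieve(storage, ledger):
    """Reassemble the data, accepting only shards whose hash matches the ledger."""
    parts = []
    for entry in ledger:
        for node in entry["locations"]:
            shard = storage[node].get(entry["shard_hash"])
            if shard is not None and shard_hash(shard) == entry["shard_hash"]:
                parts.append(shard)
                break
        else:
            raise LookupError("no valid copy of shard " + entry["shard_hash"])
    return b"".join(parts)


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
data = b"quarterly-report-contents-2024"
storage, ledger = store(data, nodes, shard_size=8, copies=3)
print(len(ledger))                        # 4 (30 bytes in chunks of 8)
print(retrieve(storage, ledger) == data)  # True
```

The `retrieve` function is included only to show how the hash in the ledger links a transaction to a stored shard: any copy that does not match its recorded hash is skipped in favor of another replica.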
Although step six is listed last, blockchain integration is an ongoing process, with the exact approach dependent on the storage system. For example, a system might initially record the transaction in the blockchain ledger when the storage process first begins. It would then update the transaction with information, such as the unique hash or node-specific details, as that information becomes available. Finally, after the participating nodes have verified the transaction, the system marks the transaction as final within the ledger and locks it to prevent changes.
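One way to picture this lifecycle is as a small state machine. The sketch below is hypothetical: the class name, the `required` confirmation count and the two states are assumptions chosen for illustration, and real systems rely on their own consensus mechanisms to decide when a transaction is final.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    FINAL = "final"


@dataclass
class StorageTransaction:
    tx_id: str
    details: dict = field(default_factory=dict)
    confirmations: set = field(default_factory=set)
    status: Status = Status.PENDING

    def update(self, **details):
        """Add details such as the shard hash while the transaction is open."""
        if self.status is Status.FINAL:
            raise PermissionError("transaction is final and cannot be changed")
        self.details.update(details)

    def confirm(self, node_id, required):
        """Record a node's verification; lock the transaction once enough agree."""
        if self.status is Status.FINAL:
            return
        self.confirmations.add(node_id)
        if len(self.confirmations) >= required:
            self.status = Status.FINAL


tx = StorageTransaction("tx-001")                    # recorded when storage begins
tx.update(shard_hash="example-hash")                 # updated as details arrive
tx.update(locations=["node-a", "node-b"])
tx.confirm("node-a", required=2)
tx.confirm("node-b", required=2)
print(tx.status.name)  # FINAL
```

After the transaction is final, any further call to `update` raises an error, which mirrors the locking behavior described above.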
The six steps described here are meant as a way to conceptualize the blockchain storage process. The exact approach will depend on how the specific storage system is implemented for a given use case and how that data storage is managed.