Introduction
In today's world, most IT companies use centralized databases such as MySQL, Oracle Database, MongoDB, etc. Centralized databases are fast, reliable, secure, easy to design, maintain and administer, and relatively easy to change, reorganize, mirror, or analyze. They are usually heavily secured and kept behind closed doors, with only a few people having access to them. For most use cases, these properties more than satisfy the requirements. However, there are no pros without cons.
A centralized database, hidden from the public eye, can easily be tampered with: the data stored in it is mutable and can be altered by anyone who has access to it. There have been various cases of data tampering, such as manipulated results at forensic labs, falsified product data, tampered data on materials used in optical disks, and many more. This presents a great risk to the customers, consumers, or other people whose data is stored in those databases. On top of that, such databases can also store information about users that should not be held at all, the so-called user metadata, without the users knowing. Besides data tampering, there are also transparency issues, since everything happens behind closed doors. We have seen numerous scandals involving companies like Facebook and Google, and even some of the largest banks have faced enormous fines for money laundering. From the user's perspective, it all comes down to trust. A user is forced to trust the entity to handle their private data safely and not to exploit it.
There are many more pros and cons that we haven't mentioned here, but these should be enough to give a general idea of where we are heading. With that in mind, we can now venture into the topic of distributed databases.
A distributed database is a database whose data is stored in multiple physical locations, often geographically spread across the world. There are several types of distributed databases, such as cloud distributed databases, but for the sake of simplicity we will focus on the ones used in the distributed ledger technology (DLT) space. As the name suggests, these ledgers (in this context, a ledger is simply a database of records) are used by a set of technologies and applications whose most basic requirements are decentralization and distribution. While the term distributed refers to the geographical spread of the data, the term decentralized refers to the absence of a central administrator.
A distributed ledger is a consensus of replicated, shared, and synchronized digital data, geographically spread across the world. Such a network has the following features:
Composed of individual nodes (devices)
Nodes are connected to each other to form the distributed network (and ledger)
Each node stores its own copy of the ledger (database)
Nodes use a consensus algorithm to agree on every update to the ledger
The system formed by the nodes is trustless: there is no single entity the nodes need to trust
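The features listed above can be illustrated with a deliberately simplified sketch (Python is used here only for brevity). Everything in it is hypothetical: the `Node` and `Network` classes, the positive-amount validation rule, and the simple-majority vote stand in for a real consensus algorithm, which in practice must also cope with network delays, conflicting proposals, and malicious nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical node that keeps its own full copy of the ledger."""
    name: str
    faulty: bool = False  # a faulty node approves every proposal
    ledger: list = field(default_factory=list)

    def vote(self, tx: dict) -> bool:
        if self.faulty:
            return True
        # Toy validation rule (an assumption): amounts must be positive.
        return tx.get("amount", 0) > 0


class Network:
    """A toy network in which every node holds an identical ledger copy."""

    def __init__(self, names, faulty=()):
        self.nodes = [Node(n, faulty=(n in faulty)) for n in names]

    def propose(self, tx: dict) -> bool:
        # Every node votes on the proposed update.
        votes = sum(node.vote(tx) for node in self.nodes)
        # Simple majority: strictly more than half of the nodes must approve.
        accepted = votes > len(self.nodes) // 2
        if accepted:
            # Each node stores its own copy of the accepted update.
            for node in self.nodes:
                node.ledger.append(dict(tx))
        return accepted

    def in_sync(self) -> bool:
        # True if all nodes hold identical ledgers.
        first = self.nodes[0].ledger
        return all(node.ledger == first for node in self.nodes)


net = Network(["A", "B", "C", "D", "E"], faulty={"E"})
print(net.propose({"from": "alice", "to": "bob", "amount": 10}))  # True
print(net.propose({"from": "bob", "to": "carol", "amount": -5}))  # False
print(len(net.nodes[0].ledger), net.in_sync())                    # 1 True
```

With one faulty node out of five, the invalid transfer collects only a single vote and is rejected, and every node ends up with the same one-entry ledger.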
It is important to note that DLT is an abstraction with many different implementations, such as blockchains and directed acyclic graphs (DAGs).
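To make the difference between these two kinds of implementations more concrete, here is a minimal sketch that models each ledger entry only by the earlier entries it references. The entry names and helper functions are hypothetical, and this is a generic illustration of the two structures, not IOTA's actual data model. It uses `graphlib`, which is in the Python standard library from version 3.9.

```python
from graphlib import CycleError, TopologicalSorter

# Each entry maps to the earlier entries it references (its parents).
# In a blockchain, every block references exactly one predecessor.
chain = {"b1": [], "b2": ["b1"], "b3": ["b2"]}

# In a DAG, an entry may reference several earlier entries at once,
# so new entries can be attached in parallel rather than strictly in line.
dag = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}


def is_acyclic(graph):
    try:
        # Iterating static_order() raises CycleError if references loop back.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def is_chain(graph):
    # A chain has one genesis entry, at most one parent per entry,
    # no entry referenced by two children (no forks), and no cycles.
    roots = [n for n, parents in graph.items() if not parents]
    single_parent = all(len(p) <= 1 for p in graph.values())
    referenced = [p for parents in graph.values() for p in parents]
    no_forks = len(referenced) == len(set(referenced))
    return len(roots) == 1 and single_parent and no_forks and is_acyclic(graph)


print(is_chain(chain), is_acyclic(chain))  # True True
print(is_chain(dag), is_acyclic(dag))      # False True
```

Both structures are acyclic; the DAG simply relaxes the "one parent, one child" rule that makes a blockchain a single line.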
The above image shows the difference between centralized and distributed networks. In a centralized network, the database is hosted on the central node in the middle of the graph, whereas in a distributed network each node holds an identical copy of the database. The image already makes it clear that if the central node fails, the entire network goes offline. In the distributed network, by contrast, any part of the graph can fail and the network stays online and available as long as enough nodes keep running for the remaining ones to reach consensus.
DLTs have quite a few advantages over centralized databases. To name a few, they are decentralized, immutable, persistent, trustless, dynamic, open source, pseudonymous, cryptographically secure, censorship resistant, and robust; they preserve data integrity; and they have no central authority, no middlemen, and no single point of failure.
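Immutability and integrity in blockchain-style ledgers rest largely on hash linking: each record includes the hash of the record before it, so changing a past record breaks the chain of hashes from that point on. The sketch below shows this tamper evidence using SHA-256 from Python's standard library. The record format and the helper names are assumptions for illustration; real systems add consensus, digital signatures, and replication on top of it.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # hypothetical predecessor hash for the first record


def record_hash(data, prev_hash):
    # Hash the record contents together with the previous record's hash.
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(items):
    chain, prev = [], GENESIS_PREV
    for data in items:
        h = record_hash(data, prev)
        chain.append({"data": data, "prev": prev, "hash": h})
        prev = h
    return chain


def verify(chain):
    # Recompute every hash and check every link to the previous record.
    prev = GENESIS_PREV
    for rec in chain:
        if rec["prev"] != prev or record_hash(rec["data"], prev) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True


chain = build_chain(["alice->bob:10", "bob->carol:4", "carol->dave:1"])
print(verify(chain))  # True

chain[1]["data"] = "bob->carol:400"  # tamper with a past record
print(verify(chain))  # False
```

Recomputing the tampered record's hash does not help an attacker either: the next record still stores the old hash in its `prev` field, so verification fails there instead. In a distributed ledger, every node can run such a check against its own copy.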
Having all of these properties comes at a cost, though. Since all nodes in a network need to hold exactly the same copy of the database, every new transaction has to propagate through the entire network. Because the network is global, this takes time and is not instantaneous. As a result, such networks run into data propagation and scalability issues. This is arguably the biggest problem every DLT project is currently trying to solve while still preserving its distributed nature. For example, the Bitcoin network can handle only around 4 transactions per second. As long as demand stays below that limit, the network runs perfectly fine. When there are many more transactions, however, the network starts to struggle, and sending a transaction becomes expensive because users compete for the limited capacity by paying higher fees.
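The propagation cost described above can be estimated with a breadth-first search over the peer graph. This sketch assumes, purely for illustration, that every node forwards a new transaction to all of its direct peers once per round and that the peers form a simple ring; the function names are hypothetical, and real peer-to-peer networks are far better connected, so the numbers only show the trend.

```python
from collections import deque


def propagation_rounds(peers, origin):
    """Rounds needed for a transaction broadcast from `origin` to reach
    every node, assuming each node forwards it to all of its direct peers
    once per round (a simplifying assumption)."""
    rounds = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in peers[node]:
            if peer not in rounds:
                rounds[peer] = rounds[node] + 1
                queue.append(peer)
    if len(rounds) < len(peers):
        raise ValueError("some nodes are unreachable from the origin")
    return max(rounds.values())


def ring(n):
    # Hypothetical topology: node i is connected to its two neighbours.
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


print(propagation_rounds(ring(10), 0))    # 5
print(propagation_rounds(ring(1000), 0))  # 500
```

In this topology, a hundredfold larger ring needs a hundred times as many rounds, and each round costs real network latency, which is why propagation becomes a bottleneck as the network grows.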
It is also worth mentioning that the drawbacks described above are problems of specific implementations rather than of DLT itself. Newer generations of DLTs have already reached the point where they solve these issues by design and are now tackling smaller, temporary problems. One such project is IOTA.
IOTA was the very first DLT project to go beyond blockchain, with its own architecture based on a directed acyclic graph (DAG). In the next few chapters, we will describe what IOTA is and how it works, point the reader to further resources, and show how to use IOTA's basic functionality with C#.