Under the Hood of the Blockchain

Since Satoshi Nakamoto published the Bitcoin white paper in 2008, cryptocurrencies and Blockchain (the technology that underpins them) have attracted growing interest. The technology appeals to many sectors and organizations, including finance, political movements, technology, business, security agencies and governments. To some, Blockchain seems able to change the economic system and civil society, accelerate advances in other technologies, and create new organizations whose members relate in ways not seen before.

My interest in Blockchain is purely technical, though, and it is what I will discuss in this post.

What is Blockchain?

Before discussing Blockchain, let’s clarify a few terms.

Distributed Ledger Technology (DLT) is an umbrella term for databases that several participants share. These databases are spread across several computers, called nodes, which don’t belong to a single tenant but to different participants. Additionally, DLT databases record all transactions (transfers of value or any other event) in a Ledger, a cryptographically protected record replicated across the nodes.

So a blockchain is a type of DLT, i.e. a database shared by several tenants and spread across their computers. In the particular case of blockchains, the Ledger is stored in blocks (batches of transactions). Moreover, a consensus mechanism dictates how the participants can add data to the Ledger and how modifications are recorded, since transactions can never be deleted. Another critical characteristic is that all participants keep a copy of the Ledger in their nodes.
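
As a rough illustration of the two ideas above (transactions batched into blocks, and corrections appended rather than deleted), here is a minimal Python sketch. The `Transaction` and `Ledger` classes, the block size of two and the account names are all hypothetical; a real blockchain adds hashing, signatures and consensus on top of this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    sender: str
    receiver: str
    amount: int


@dataclass
class Ledger:
    block_size: int = 2  # transactions per block (illustrative value)
    blocks: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def add(self, tx: Transaction) -> None:
        """Queue a transaction and seal a block once the batch is full."""
        self.pending.append(tx)
        if len(self.pending) == self.block_size:
            self.blocks.append(tuple(self.pending))  # sealed blocks are immutable tuples
            self.pending = []

    def balance(self, account: str) -> int:
        """Replay every sealed transaction to derive the current balance."""
        total = 0
        for block in self.blocks:
            for tx in block:
                if tx.receiver == account:
                    total += tx.amount
                if tx.sender == account:
                    total -= tx.amount
        return total


ledger = Ledger()
ledger.add(Transaction("alice", "bob", 50))
ledger.add(Transaction("bob", "carol", 20))
# The ledger has no delete operation: a mistake is corrected
# by appending a compensating transaction.
ledger.add(Transaction("carol", "bob", 20))
ledger.add(Transaction("bob", "alice", 5))
```

The current state is never stored directly; it is derived by replaying the full history, which is why the history must stay intact.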

DLT vs Blockchain

The Blockchain was designed to fulfil the following characteristics: decentralization, anonymity, security, scalability, provenance, immutability and finality. Thus it allows all participants to share the data without risking issues in data quality or theft. We will discuss below the Blockchain components that support the overall objective of the Blockchain.

There are many blockchains, e.g. Bitcoin, Ethereum, Dogecoin, etc. They support different purposes; they sustain diverse cryptocurrencies and tokens, and use divergent consensus mechanisms. Consequently, the blockchains are organized into several categories, such as:

  • Public blockchains, when everybody can participate, e.g. cryptocurrencies.
  • Private, when they belong to a restricted number of participants, such as corporations that create a blockchain to track a supply chain.
  • Permissionless, when anyone can join.
  • Permissioned, when the participants have to request access.

From Blockchain 1.0 to Blockchain 4.0

The first Blockchain was Bitcoin, which aimed to store and exchange value. I.e., Bitcoin is a cryptocurrency.

Shortly after Bitcoin, other cryptocurrencies appeared in the market, and all of them belong to the first generation of Blockchain.

Eventually, Vitalik Buterin took the concept of Blockchain beyond the exchange of value and created Ethereum, which introduced Smart Contracts and refined Proof of Work, the consensus mechanism used until then. Ethereum marks the advent of the second Blockchain generation.

In Ethereum, transactions are not necessarily monetary; they can record any other event, such as the steps in a supply chain. Smart Contracts are self-executing code that runs the business logic in Ethereum. Additionally, Ethereum refined the consensus mechanism to improve scalability and reduce its energy consumption. On a separate note, Ethereum changed its consensus mechanism this week, moving from Proof of Work to Proof of Stake, to make it even more efficient.

Allowing dApps (Decentralized Applications) to run on the Blockchain was a natural next step in its evolution. Examples include NFTs (Non-Fungible Tokens), which establish ownership of unique goods such as artwork. This step marked the beginning of the third generation of Blockchain.

In parallel, some people think internet activity depends too much on the services a few corporations offer. They believe Blockchain can decentralize online applications and make them more independent of those large corporations, so they developed the concept of Web3. An example of a Web3 application is Presearch, an Ethereum-based decentralized search engine focused on security and privacy.

Finally, the fourth Blockchain generation aims to make Blockchain an enterprise-level technology that can integrate with different platforms and use Artificial Intelligence.

From Blockchain 1.0 to Blockchain 4.0
Fundamental purposes and features of every Blockchain generation, highlighting key technological concepts related to Blockchain 3.0.

Components

The diagram below shows, at a high level, the main components of all (or most) Blockchains. Individual Blockchains differ depending on their design and objectives. Remember that the first-generation Blockchains don’t have Smart Contracts, Distributed Virtual Machines (Ethereum Virtual Machine, EVM, for Ethereum) or oracles (external data sources for the Smart Contracts).

Blockchain components - High-level internal architecture
High-level blockchain internal architecture. It shows the most frequent parts of a blockchain and excludes the components specific to every particular blockchain.

Several publications describe in detail the different elements of a Blockchain. For example, Unblockchain by Henrique Centieiro (2021) explains many consensus mechanisms and lists all Bitcoin header fields. This blog post about the Ethereum fields is quite comprehensive. So I won’t discuss all these fields.

However, as a Data Architect, I find the fields for Data Lineage (traceability) and immutability fascinating. These fields are the core of the Blockchain principles, along with the consensus mechanisms and the digital signatures.

  • The Previous Hash chains the current block to its preceding block in the Blockchain. This field holds the SHA-256 hash of the previous block, i.e. the output of a cryptographic function that transforms any input, of any length, into a fixed-length 256-bit (32-byte) value. The same input always yields the same output.
  • The Merkle Root Hash is the root of a Merkle tree, a data structure that efficiently summarizes all the transactions in the corresponding block.
  • The Timestamp records when the block was added to the chain (mined).
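
To make these three fields more concrete, here is a minimal Python sketch. It is a simplification under stated assumptions: transactions are plain strings, the header is serialized as JSON, and a single SHA-256 pass is used, whereas Bitcoin hashes a binary header with double SHA-256. The function names `merkle_root`, `make_header` and `header_hash` are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions: list) -> str:
    """Hash the transactions pairwise, level by level, until one root remains."""
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]


def make_header(previous_hash: str, transactions: list, timestamp: int) -> dict:
    return {
        "previous_hash": previous_hash,
        "merkle_root": merkle_root(transactions),
        "timestamp": timestamp,
    }


def header_hash(header: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic
    return sha256_hex(json.dumps(header, sort_keys=True).encode())


genesis = make_header("0" * 64, ["alice->bob:50"], 1)
second = make_header(header_hash(genesis), ["bob->carol:20", "carol->dave:5"], 2)

# Changing one transaction in the first block changes its Merkle root and thus
# its hash, so the link stored in the second block no longer matches.
tampered = make_header("0" * 64, ["alice->bob:500"], 1)
print(header_hash(tampered) == second["previous_hash"])  # False
```

This is why the two hash fields carry the lineage: every block commits to its own transactions through the Merkle root and to the whole prior history through the Previous Hash.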

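By the way, the nonce and difficulty fields serve the consensus mechanism: in Proof of Work, miners vary the nonce until the block’s hash meets the difficulty target, which allows them to add the block.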
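
The role of the nonce and the difficulty in Proof of Work can be sketched as follows. This is a toy version: here the difficulty is the number of leading zero hex digits required in the hash, whereas real blockchains compare the hash against a numeric target. The `mine` and `is_valid` functions and the header string are hypothetical.

```python
import hashlib


def mine(header: str, difficulty: int):
    """Try successive nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def is_valid(header: str, nonce: int, difficulty: int) -> bool:
    """Verification needs a single hash, while mining needs many attempts."""
    digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Difficulty 3 takes about 16**3 = 4096 attempts on average.
nonce, digest = mine("block-42", 3)
print(nonce, digest)
```

Each extra zero digit multiplies the expected work by sixteen, which is how the difficulty field tunes how hard it is to add a block.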

Use Cases

Apart from cryptocurrencies, other Blockchain use cases are:

  • Trace a supply chain (e.g., diamonds or food),
  • Letters of credit in banks,
  • Carbon credit trading (to incentivize the reduction of greenhouse gas emissions),
  • Insurance automation,
  • Collectable assets (NFT), and many others.

In his book Unblockchain, Henrique Centieiro discusses when not to use Blockchains. It is not a technology for all use cases, and we must carefully evaluate our use case against the characteristics of the technology, such as the immutability of the information we store.

Blockchain is not particularly fast at reading or writing data because a node validates the transactions of every block, all the way back to the first one. As an example, Centieiro explains how to deploy an Ethereum node in his book. When he prepared the exercise, the Ethereum Ledger held roughly 600 GB, and copying all of it to the node could take up to several days (depending on where you install your node), since the node had to validate all blocks.
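
The validation cost described above can be sketched in Python. This is a minimal sketch, assuming each block is a dictionary holding a payload, its own hash and the previous hash; the `block_hash`, `build_chain` and `validate_chain` names are hypothetical, and real nodes also verify signatures, balances and consensus rules for every block.

```python
import hashlib

GENESIS_PREVIOUS = "0" * 64  # placeholder previous hash for the first block


def block_hash(previous_hash: str, payload: str) -> str:
    return hashlib.sha256(f"{previous_hash}|{payload}".encode()).hexdigest()


def build_chain(payloads: list) -> list:
    chain = []
    previous = GENESIS_PREVIOUS
    for payload in payloads:
        current = block_hash(previous, payload)
        chain.append({"previous_hash": previous, "payload": payload, "hash": current})
        previous = current
    return chain


def validate_chain(chain: list) -> bool:
    """Walk from the first block to the last, rechecking every hash and link."""
    expected_previous = GENESIS_PREVIOUS
    for block in chain:
        if block["previous_hash"] != expected_previous:
            return False
        if block_hash(block["previous_hash"], block["payload"]) != block["hash"]:
            return False
        expected_previous = block["hash"]
    return True


chain = build_chain(["alice->bob:50", "bob->carol:20", "carol->dave:5"])

# Copy the chain and alter one payload without recomputing its hash.
tampered = [dict(block) for block in chain]
tampered[1]["payload"] = "bob->carol:2000"

print(validate_chain(chain))     # True
print(validate_chain(tampered))  # False
```

The loop touches every block once, so the work grows with the length of the chain, which is consistent with a new node needing a long time to synchronize.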

As shown above, Blockchain won’t be the right solution for a Data Warehouse, a Data Lake or Lakehouse, or any other functionality that requires large volumes of data. It won’t be the right fit either for near-real-time transactions or high concurrency.

Caveats of Current Blockchain Designs

It is well-known that choosing the consensus algorithm implies a trade-off between security, scalability and decentralization. When you want to design a Blockchain, you must evaluate your use case and requirements against the different algorithms and their caveats.

However, the most relevant issue for me about this technology is its integration with other applications and ecosystems. Blockchain technology is meticulous in keeping Data Lineage, Data Quality and Security in a system where several actors with divergent interests participate. To do so, Blockchain leverages very ingenious mechanisms. Nevertheless, we lose traceability and accountability when the data leaves the Blockchain. Many users rely on intermediaries to participate in a Blockchain (currency dealers, art galleries that hold NFTs, etc.). While the intermediaries keep their own logs of the transactions, the end user can’t validate their trades and assets with the same certainty as within the Blockchain.

As an example of the above, Moxie Marlinspike, a cryptographer and creator of the open-source encrypted messaging app Signal, ran a test in January. He created a satirical NFT and published it on OpenSea. When OpenSea removed the NFT from its website, the NFT also disappeared from Marlinspike’s wallet.

Unfortunately, the poor integration between public Blockchains and their clients also shows up in less playful scenarios, such as the bankruptcy of Mt. Gox, a Bitcoin exchange, or the theft of assets at MyBitcoin.com.

To Know More

There are many online courses, documentation and forums if you want to mine in a public Blockchain or learn the technology for any other aspect. Your preferred search engine is your friend.

If you are interested in understanding cryptocurrencies, my favourite introduction to the topic is a Spanish graphic novel called Mr Meta. I’m afraid it hasn’t been translated into other languages. It is clear, uses excellent examples, and is quite amusing.

Finally, Digital Gold by Nathaniel Popper (2016) tells the history of Bitcoin and explains its impact on finances, society and politics.
