My Blockchain Primer

Introduction

Distributed Ledger Technology (DLT), blockchain, bitcoin, etc. are buzzwords these days. I am also intrigued by blockchain technology and felt that I should pen down my findings and views too. As it is often said, if one cannot explain something, it means he or she does not understand the topic. Hence, here goes my amateur understanding of blockchain technology and it might help you if you have just started out on reading up on blockchain.

I will add new information to this article from time to time (when I am free).

Bitcoin? Blockchain? Somewhat similar yet different?

Bitcoin and Ethereum are both cryptocurrencies that are based on blockchain technology. (src: https://ciberia.com.br/wp-content/uploads/2017/06/a0b968dea6f20718b728506b02b95a62.jpg)

Between the terms bitcoin and blockchain, most would be more familiar with bitcoin, a cryptocurrency whose value has appreciated quite a fair bit over the last few months or years. For the uninitiated, the underlying technology behind cryptocurrencies is blockchain.

What is Blockchain?

How Blockchain works


Blockchain is a globally verifiable, tamper-resistant and operationally resilient ledger shared among the participants. As the ledger (or data set) is globally verifiable by the participants, the need for a trusted third party diminishes. The term trusted third party might sound unfamiliar to you at this juncture but trust me, you actually know it and most of the current processes have this element of trust in place.

Trusted Third Party and Blockchain

What is Trusted Third Party?

A trusted third party is an entity which facilitates interactions between two parties who both trust it; the third party reviews all critical transaction communications between the two parties.

Think of online shopping, payments, identities, etc. What gave you the confidence to purchase things off a particular e-commerce platform, make payments to another party or trust the person you are communicating with?

Thinking deeper, it is probably due to accreditation by some reliable entities that you have trust or confidence in. If you trust the reliable entity and that entity trusts the party you are dealing with, it kind of forms a transitive trust relationship between you and the other party.

Would you prefer to pay the seller via direct bank transfer or through PayPal? Would you trust the identity of a person furnished by a Government registrar or by some social media site? There you go, trusted third parties. (I do hope you have chosen PayPal and the Government registrar).

Compromised Trusted Third Parties?

Have you ever thought about what happens if trusted third parties are compromised, for example by having some information manipulated or changed without detection? Consider Sandra Bullock’s character in the movie ‘The Net’: her identity in the Government registrar was maliciously changed and several criminal charges were added to her record, making her a fugitive in her own country.

No Trusted Third Party in Blockchain?

In blockchain, there is no trusted third party. Take bitcoin for example: every transaction is validated before it is stored on the globally verifiable ledger. Once transactions are stored on the ledger, it is incredibly difficult to maliciously change them (until quantum computing comes along). Essentially, the bitcoin network contains globally verifiable immutable records that the majority of the participants can vouch for. Hence, the need for a trusted third party diminishes.

Why immutable? Let’s look at the blockchain technology itself.

The ABCs of Blockchain

Back to blockchain – Blocks and chains?

Have you come across Russian dolls? They are a set of dolls where, (almost) every time you open one up, you find a slightly smaller one inside.

Russian dolls – A set of wooden dolls of decreasing size placed inside one another. (src: http://www.bluemaize.net/arts-crafts-sewing/russian-wooden-nesting-dolls)

Now imagine that you wrote a note on the innermost doll. To read or amend the note, one has to open up all the bigger dolls to reach it.

Blockchain is similar in the sense that blocks (of information) are chained together to form a single chain of blocks. To amend the information in a particular block, all the blocks that come after it will be affected. Hence, blockchain. How aptly named.

How are the blocks chained together?

Blockchain – a long chain of linked blocks of transaction(s). (src: http://www.ybrikman.com/assets/img/blog/bitcoin/bitcoin-block-chain-verified.png)

So how are they chained together? The header information of the previous block (more precisely, its hash) is used as one of the inputs when calculating the header hash of the current block (i.e. the proof-of-work in the above diagram), and the header hash of the current block is used in turn for the next block.

Merkle Tree and Merkle Root?

Part of a block's header information is the value of the Merkle root, the root of the hash tree (or Merkle tree) built over the block's contents. A Merkle tree allows efficient verification of the contents of large data structures. The hash values of the individual contents (e.g. transactions) form the leaf nodes of the Merkle tree. They are paired (e.g. hash 1 with 2, hash 3 with 4) and hashed together to form a single hash. This pairing and hashing is repeated until a single hash value is derived, and that is the root of the Merkle tree: the Merkle root. In bitcoin, the link to the previous block is carried by a separate header field holding the previous block's header hash, not by the Merkle tree itself.
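The pairing-and-hashing step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses a single SHA-256 per node and duplicates the last hash when a level has an odd number of entries (bitcoin itself uses double SHA-256), and the function names are made up for this example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: list) -> bytes:
    """Pair and hash repeatedly until a single hash is left."""
    if not items:
        raise ValueError("need at least one item")
    level = [sha256(item) for item in items]  # leaf nodes
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: on an odd-sized level, pair the last hash with itself.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root = merkle_root([b"tx1", b"tx2", b"tx3", b"tx4"])
print(root.hex())
```

Changing any one of the four items, even by a single byte, gives a completely different root.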

A hash tree or Merkle tree is a tree in which every leaf node is labelled with a data block and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes. Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains. (src: http://chimera.labs.oreilly.com/books/1234000001802/ch07.html#merkle_trees)

To prove the existence of a particular item (e.g. HK), one just needs the other value in its pair (i.e. HL) and the sibling hash at each level on the way up to the root (i.e. HIJ, HMNOP, HABCDEFGH). Hashing these together step by step, one should arrive at the same Merkle root value (HABCDEFGHIJKLMNOP).

Hashes are essentially fingerprints of data. One of the requirements of a good cryptographic hash function is collision resistance. That is, it should be computationally infeasible to find two different pieces of data that have the same hash value.
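To make the proof idea concrete, here is a small sketch on a four-leaf tree (A, B, C, D) rather than the sixteen-leaf one in the figure. The proof format, a list of (sibling hash, is-the-sibling-on-the-left) pairs, is my own choice for illustration and not a standard wire format.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_proof(leaf_hash: bytes, proof: list, root: bytes) -> bool:
    """Recompute the root from a leaf hash and its sibling hashes.

    `proof` is a list of (sibling_hash, sibling_is_left) pairs, ordered
    from the leaf level up to just below the root.
    """
    current = leaf_hash
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            current = sha256(sibling + current)
        else:
            current = sha256(current + sibling)
    return current == root


# Four-leaf tree built by hand: root = H(H(A + B) + H(C + D)).
h_a, h_b, h_c, h_d = (sha256(x) for x in (b"A", b"B", b"C", b"D"))
h_ab, h_cd = sha256(h_a + h_b), sha256(h_c + h_d)
root = sha256(h_ab + h_cd)

# To prove that C is included, only H(D) and H(AB) are needed.
proof_for_c = [(h_d, False), (h_ab, True)]
print(verify_proof(h_c, proof_for_c, root))           # True
print(verify_proof(sha256(b"X"), proof_for_c, root))  # False
```

The verifier never sees A, B or D themselves, only two hashes, which is why the proof stays small even for large trees.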

The first block of the blockchain network is called the genesis block (or block #0).

Imagine changing a particular transaction stored in one of the blocks. You would need to recalculate that block and the entire chain of blocks following it, and still have the resulting header hashes match the values held by the rest of the participants in the blockchain network. Hence, modifying these cryptographically protected transactions has a prohibitively high cost.

In cryptocurrencies such as bitcoin, there is a deliberate difficulty or handicap set in place when calculating the block's header hash. The header hash has to be below (or smaller than) the target. To achieve this, the miner includes a nonce in the block header to vary the value of the header hash. The miner goes through an exhaustive trial-and-error process to find a nonce that produces a header hash meeting the requirement. Decreasing the value of the nonce does not necessarily mean a decrease in the value of the hash.
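A toy chain shows why a change ripples forward. This is a simplified sketch: each "block" is just a dict with a piece of data, and the hashing scheme (SHA-256 over the previous hash plus the data) is an assumption for illustration, not how any real network serialises its blocks.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for block #0


def block_hash(prev_hash: str, data: str) -> str:
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()


def build_chain(records: list) -> list:
    chain, prev = [], GENESIS_PREV
    for data in records:
        h = block_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": h})
        prev = h
    return chain


def first_invalid_block(chain: list):
    """Return the index of the first block that fails verification, or None."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["prev_hash"] != prev or block["hash"] != block_hash(prev, block["data"]):
            return i
        prev = block["hash"]
    return None


chain = build_chain(["A pays B 5", "B pays C 2", "C pays D 1"])
print(first_invalid_block(chain))  # None

chain[1]["data"] = "B pays C 200"  # tamper with the middle block
print(first_invalid_block(chain))  # 1
```

Even if the attacker recomputes the hash of block 1 to cover the edit, block 2 still points at the old hash, so the check then fails at block 2 instead. Every later block has to be redone as well.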
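The trial-and-error search looks roughly like the sketch below. In bitcoin the value compared against the target is the hash of the block header, which contains both the Merkle root and the nonce. Here the header is reduced to a made-up byte string, a single SHA-256 is used, and the target is deliberately easy so the loop finishes quickly. All of these are simplifying assumptions.

```python
import hashlib


def mine(header_without_nonce: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces until the header hash, read as an integer, is below target."""
    for nonce in range(max_nonce):
        candidate = header_without_nonce + nonce.to_bytes(4, "big")
        digest = hashlib.sha256(candidate).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no nonce in range worked


# A target requiring 16 leading zero bits: roughly 1 in 65,536 hashes qualifies.
target = 1 << (256 - 16)
result = mine(b"prev_hash|merkle_root|timestamp|", target)
print(result)
```

Lowering the target makes qualifying hashes rarer, which is how the difficulty is tuned. There is no shortcut from one nonce to the next, since each attempt gives an unrelated hash.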

Blockchain – Permissionless or permissioned?

The blockchain network can be operated as a public network or a private network. In a public or permissionless blockchain, anybody can participate in various roles (e.g. validator, user). On the other hand, only allowed parties can take part in these activities in a private or permissioned blockchain network.

Which setup is better? Honestly, it depends on the type of problem you are trying to solve. Regardless of which setup you choose, I feel that operational resilience should always be considered. Can your blockchain still deliver its intended benefits if there are malicious users or participants? Blockchain is not the silver bullet to all problems you are facing now.

For payment networks whose users are global and can perform fund transfers among themselves, it makes a whole lot of sense to use a public blockchain with the necessary smart contracts in place to perform all the necessary consensus checks before fund movement occurs.

On the other hand, to protect the integrity of Intellectual Property (IP) works of an R&D firm, one might opt for a private (or permissioned) blockchain network to ensure the consistency and integrity of the intellectual property during its life-cycle in information systems.

Are all my private data now public information?

What about Data Privacy in Blockchain?

Everybody gets a copy of my data so does it mean that my private data is now publicly available?

Well, yes and no. Remember that I mentioned the ledger being globally verifiable? The good thing is that your data need not be stored in whole and in the clear on the blockchain.

The blockchain network could just store the hash of your data and it would prove the integrity and existence of the original piece of information stored in your own private system. In this setup, the hash is public while your data are still private.
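As a rough illustration of the "hash is public, data stays private" setup, the sketch below uses a plain Python list as a stand-in for the public ledger. The names `public_ledger`, `fingerprint` and `is_unaltered` are made up for this example.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    return hashlib.sha256(document).hexdigest()


# Hypothetical stand-in for the public ledger: it only ever sees hashes.
public_ledger = []

# The record itself stays in your own private system.
private_record = b"Patient 42: allergic to penicillin"
public_ledger.append(fingerprint(private_record))


def is_unaltered(document: bytes, ledger: list) -> bool:
    """True if the document's hash was previously recorded on the ledger."""
    return fingerprint(document) in ledger


print(is_unaltered(private_record, public_ledger))                     # True
print(is_unaltered(b"Patient 42: no known allergies", public_ledger))  # False
```

One caveat: a bare hash of short, guessable data can be brute-forced by trying likely inputs, so a real system would probably mix in a random salt that is kept alongside the private record.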

Of course, there are other techniques like data encryption if you choose to keep the data on the blockchain network.

Blockchain platforms

What are the available Blockchain platforms?

When I say platform, it is really referring to the technical platform providing blockchain capabilities. They support the various blockchain-based services or solutions we see today, e.g. cryptocurrency (e.g. bitcoin, Ethereum), supply chain (e.g. OTDocs).

Some of the popular blockchain platforms that I know of are Hyperledger Fabric and Ethereum.

I was fiddling with Hyperledger Fabric (both 0.6 and 1.0) and I would say that it takes a while to get the hang of the components (e.g. orderer, endorser, chaincode, peer, certificate authority). Mixing in stuff like a Practical Byzantine Fault Tolerance (PBFT) blockchain setup is enough to make one go sleepless for a couple of nights.

The great thing about Hyperledger Fabric 0.6 is that there are quite a fair number of articles on setting up a Hyperledger Fabric 0.6 network with PBFT on Docker. Tough luck for Hyperledger Fabric 1.0. Luckily I managed to get Hyperledger Fabric 1.0 with PBFT up on my Windows Docker.

Formal verification of Blockchain platforms?

To date, I have yet to see any party doing a formal verification of Hyperledger Fabric. I am not sure if any has been done for other blockchain platforms though.

Consensus mechanisms

Proof-of-Work and Proof-of-Stake?


In earlier sections, we mentioned proof-of-work, in which miners perform expensive computer calculations to solve a block (and get the block rewards). Proof-of-work is just one of the distributed consensus mechanisms.

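Proof-of-stake is another distributed consensus mechanism. In proof-of-stake, a deterministic algorithm chooses who gets to validate the next block, typically taking into account how much stake (e.g. currency) each participant holds, instead of having participants race to solve a computational puzzle. In some designs there are no block rewards and the validators earn the transaction fees instead.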

Comparing proof-of-work and proof-of-stake, one glaring difference is that less computational power is required in a proof-of-stake system. It also means that it is (financially) cheaper to attack a proof-of-stake system as attackers do not face technological or economic disincentives. Malicious actors can more easily perform a 51% attack on a proof-of-stake system.

Hence, the algorithm used in a proof-of-stake system needs to be as bulletproof as possible to cut out malicious actors.

How fast is Blockchain, or its transaction rate?

Transactions per second of various implementations


For information systems, we are often concerned about the transaction rate, or how many business transactions the system can handle in a given amount of time. Typically, we express it as the number of transactions per second (TPS).

Based on online findings, bitcoin does 3 TPS and Ethereum supports (on average) 5 TPS. While Hyperledger Fabric’s current performance goal is 100,000 TPS, it has only managed to hit 700 TPS according to some users. To put these figures into perspective, the VISA network processes an estimated 45,000 transactions per second at its peak.

What is ‘Confirmation’ in blockchain?

In blockchain, transactions first go into an unconfirmed transaction pool. This pool contains non-validated transactions that will eventually be picked up by a miner or endorsing peer and processed into a block (of transactions). The TPS figures above indicate how quickly transactions are validated and included in a block.

If the block containing your transaction is the top-most block, it is considered to have 1 confirmation. If there is another newly minted block after it, it is considered to have 2 confirmations. The number of confirmations increases as new blocks are minted (after the block containing your transaction).

As I have highlighted in earlier sections, the difficulty of manipulating data in a block increases as more blocks are minted after it. Hence, there can be rules in place to enforce that transactions are considered valid only when there are 6 confirmations (i.e. 5 blocks after the block containing your transaction). This greatly increases the tamper resistance of your transaction.
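The counting rule is simple enough to write down directly. The sketch below assumes blocks are identified by their height (position in the chain), and the threshold of 6 is just the example figure used above.

```python
def confirmations(tx_block_height: int, tip_height: int) -> int:
    """Confirmations for a transaction included in block `tx_block_height`,
    given that the top-most block is at `tip_height`."""
    if tx_block_height > tip_height:
        return 0  # the block is not part of the chain as we see it
    return tip_height - tx_block_height + 1


def is_settled(tx_block_height: int, tip_height: int, required: int = 6) -> bool:
    return confirmations(tx_block_height, tip_height) >= required


print(confirmations(100, 100))  # 1 (transaction is in the top-most block)
print(confirmations(100, 105))  # 6 (5 blocks minted after it)
print(is_settled(100, 104))     # False
```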

So why does my cryptocurrency transaction take so long?

For cryptocurrencies, aside from the TPS and confirmation rules (i.e. number of confirmations required), the fees that you pay the miners or endorsing peers for processing your transaction also affect how readily your transaction will be picked up from the unconfirmed transactions pool. Essentially, the higher the fee you pay, the more miners are incentivized to process your transactions (before others).
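One way to picture this is a miner filling a block of limited size with the best-paying transactions first. The sketch below ranks by fee per byte and fills greedily. That ranking rule, the tuple layout and the numbers are assumptions for illustration, and real miners may use more elaborate policies.

```python
def pick_transactions(mempool: list, max_block_size: int) -> list:
    """Greedy selection by fee per byte; mempool items are (txid, fee, size)."""
    by_fee_rate = sorted(mempool, key=lambda tx: tx[1] / tx[2], reverse=True)
    chosen, used = [], 0
    for txid, fee, size in by_fee_rate:
        if used + size <= max_block_size:
            chosen.append(txid)
            used += size
    return chosen


# (txid, fee, size in bytes): hypothetical unconfirmed transactions
mempool = [("a", 500, 250), ("b", 5000, 250), ("c", 1500, 500), ("d", 100, 250)]
print(pick_transactions(mempool, max_block_size=750))  # ['b', 'c']
```

Transactions "a" and "d" pay the least per byte, so they wait in the pool for a later block.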

So how do I speed up my own blockchain network?

The transaction speed depends on the hashing speed of your miners or endorsing peers, which is a function of memory, CPU or GPU, configuration, etc. As we have learnt, the performance of a blockchain network is not solely about transaction speed. Confirmation rules also play a part, as they affect the time taken for your transaction to be considered valid and recognized by the network.

At the end of the day, it is a fine balance between cyber security, operational resiliency and business objectives (of using blockchain). You can’t have your cake and eat it too.

Usage #1 – Protecting integrity of infrastructures with Blockchain

The CIA of information

The basics of cyber security always circle around the Confidentiality, Integrity and Availability (CIA) of information. A data breach is a compromise of confidentiality; data manipulation is a compromise of integrity; and a denial of service attack is a compromise of availability.

Cost of compromising Confidentiality versus Integrity

Asset | Confidentiality Breach | Integrity Breach
Your car's telematics | Your driving pattern is exposed | Your car reports that braking system is fine when it has failed
Your pacemaker | Your heartbeat pattern is made known publicly | It could be maliciously shutdown, causing death
Your medical records | Your allergies are known to public | Your allergies are erased causing allergens to be administered during medical care
Your transport fleet | Your competitors know how many of your platforms are serviceable | Your competitors can affect your decision in assigning platform for jobs
Your destination GPS location | Your destination is known to public | You can be redirected to another location

A hash has pretty much been the tamper-evident seal on a digital asset. When you download a file, you check it against its published MD5 to validate its integrity.
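Checking a download against its published hash takes only a few lines with Python's standard `hashlib`. This is a minimal sketch: the function names are my own, and I default to SHA-256 because MD5 is no longer considered collision resistant, although passing `"md5"` works for sites that still publish MD5 sums.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest with the value published by the provider."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A call would look like `matches_published("installer.iso", "<hash copied from the download page>")`, where both arguments are placeholders.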

So how can blockchain help?

When protecting sensitive records, the danger is often the fact that they can be altered, deleted, maliciously changed, affected by hackers, malware, etc. For this, blockchain can prove the integrity of the record during its life-cycle in information systems.

While a hash serves as a tamper-evident seal on digital assets, the storage of the hash is not tamper resistant. Malicious actors can manipulate both the digital asset and its hash, and the change will go undetected.

In an earlier section, we mentioned storing the hash on the blockchain, which provides tamper resistance for the hash value.

Guardtime’s KSI

Guardtime’s Keyless Signature Infrastructure (KSI) technology leverages a linking-based time-stamping system to provide both proof of time and integrity of digital assets. KSI only uses hash-function cryptography, allowing verification to rely only on the security of hash functions and the availability of the public ledger (i.e. blockchain).

A user interacts with the KSI system by submitting a hash-value of the data to be signed into the KSI infrastructure and is then returned a signature which provides cryptographic proof of the time of signature, integrity of the signed data, as well as attribution of origin i.e. which entity generated the signature.

One point to note is that records themselves are not stored in KSI. What is being stored is just a series of hash values that show every time a file or data is updated.

Do I need Guardtime’s KSI to achieve it?

Well, not really. One can build a linking-based time-stamping system for hashes on blockchain. It is just a matter of weighing the capital expenditures (CAPEX) and operating expenses (OPEX) of building your own system against those of a turnkey solution.
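To give a feel for what "linking-based" means, here is a toy append-only log in which every entry's link hash covers the previous entry's link, the time and the submitted data hash. To be clear, this is my own simplified sketch and not how Guardtime's KSI is implemented. The class and field names are invented, and a real system would publish the links to a shared ledger rather than keep them in memory.

```python
import hashlib
import time


class TimestampLog:
    """Append-only log where each entry's link hash covers the previous link."""

    FIRST_LINK = "0" * 64  # placeholder link for the very first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _link(prev_link: str, ts: float, data_hash: str) -> str:
        return hashlib.sha256(f"{prev_link}|{ts}|{data_hash}".encode("utf-8")).hexdigest()

    def stamp(self, data_hash: str, now: float = None) -> dict:
        """Record a data hash together with the time it was submitted."""
        prev_link = self.entries[-1]["link"] if self.entries else self.FIRST_LINK
        ts = time.time() if now is None else now
        entry = {"time": ts, "data_hash": data_hash, "prev_link": prev_link,
                 "link": self._link(prev_link, ts, data_hash)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Check that no entry's time, hash or order has been altered."""
        prev_link = self.FIRST_LINK
        for e in self.entries:
            expected = self._link(prev_link, e["time"], e["data_hash"])
            if e["prev_link"] != prev_link or e["link"] != expected:
                return False
            prev_link = e["link"]
        return True


log = TimestampLog()
log.stamp(hashlib.sha256(b"report v1").hexdigest())
log.stamp(hashlib.sha256(b"report v2").hexdigest())
print(log.verify())  # True
```

Backdating an entry or swapping its data hash changes its link, which no longer matches what the following entries point to.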

For a proof-of-concept, perhaps it will be good to build your own to see if it fits your needs.

Some queries I have currently regarding this

One thing that has been bothering me about such systems is what goes into validating the data before it gets persisted on the blockchain. In cryptocurrencies, there are validations like ensuring there is no double-spending, or no spending when your wallet is empty. But what goes into KSI?

 
