Data Processing in a Blockchain Network
From Slide Number 70-74
Let's take a look at the third component of the blockchain database: how data is sent
and received over the network. Say you use a client application, like a Bitcoin wallet, to
generate some transaction data. If you were using a traditional database to record it, this
process would be easy. You just send the data to the web address of the database server, and it
is quickly received and processed.
If you're using a relatively centralized blockchain with a limited number of nodes, like Ripple,
this process is also easy. You just broadcast the data to all the nodes simultaneously, and
they'll receive and process it in a relatively short time frame. Things get a bit more
complicated if you're using a distributed blockchain like Bitcoin. Instead of a few central
parties, you have a network of tens of thousands of nodes or more, and because of network
latency, trying to deliver the transaction to every node directly would take a really
long time. Therefore, in a distributed blockchain, some compromise has to be made
to make this process more time efficient. The setup will look familiar if you have used
peer-to-peer file sharing apps like BitTorrent.
Say you have a network of nodes carrying the blockchain.
Let's use Bitcoin as the example, where the nodes in the network are also called miners.
Suppose you want to send some bitcoin that you own to another person, which means
getting the transaction record onto the Bitcoin blockchain. You use your
wallet app to generate the transaction data, which, as we will see shortly, is just a programming
script. But instead of sending it to the entire network, you broadcast the
transaction data to a few miners that are closest to you in terms of network latency, and those
miners propagate the data across the network using what's called a gossip protocol. As the
name suggests, the protocol works much like gossip spreading through a network of
friends: you wouldn't shout the gossip aloud to everyone, but instead you'd whisper it to the
people sitting next to you. The same thing happens here: each miner passes the transaction on to
the miners that are closest to it in terms of network latency, and this process goes on
and on until the entire network has received the data.
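To make the gossip idea concrete, here is a minimal Python simulation under simplifying assumptions: time moves in discrete rounds, each node forwards a transaction once to all of its peers, and "closest in latency" is approximated by a hypothetical ring topology (each node peers with its two neighbours) plus a few random long-range links. The names `build_network` and `gossip` are illustrative, not part of any Bitcoin software.

```python
import random

def build_network(n_nodes, extra_links, seed=0):
    """Hypothetical topology: each node peers with its two ring neighbours
    (a stand-in for 'closest in latency') plus a few random long-range peers.
    The ring guarantees that every node is reachable."""
    rng = random.Random(seed)
    peers = {i: {(i - 1) % n_nodes, (i + 1) % n_nodes} for i in range(n_nodes)}
    for node in range(n_nodes):
        for _ in range(extra_links):
            other = rng.randrange(n_nodes)
            if other != node:
                peers[node].add(other)
                peers[other].add(node)
    return peers

def gossip(peers, first_contacts):
    """Spread a transaction round by round: each node forwards it once,
    only to its own peers. Returns {node: round in which it first heard it}."""
    heard = {node: 1 for node in first_contacts}  # the wallet contacts a few nodes directly
    frontier = list(first_contacts)
    round_no = 1
    while frontier:
        round_no += 1
        next_frontier = []
        for node in frontier:
            for peer in peers[node]:
                if peer not in heard:  # a node ignores gossip it has already heard
                    heard[peer] = round_no
                    next_frontier.append(peer)
        frontier = next_frontier
    return heard

network = build_network(n_nodes=1000, extra_links=2)
heard = gossip(network, first_contacts=[0, 1, 2])
print(len(heard), "nodes reached after", max(heard.values()), "rounds")
```

The random long-range links are what keep the number of rounds small: on the ring alone, the transaction would need hundreds of hops to cross 1,000 nodes.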
From Slide Number 75-77
Graph theory suggests that this form of information propagation is more efficient than
broadcasting the information to everybody at once, since no single node has to contact the
whole network. In practice, though, it has two limitations.
1. The obvious one is that, despite being more efficient, it still takes some time for the
information to get through. If the network is large, or the data is large in
size, it could take minutes or even hours. Therefore, most blockchain variants
implement some sort of time cutoff. For instance, in Bitcoin, a new block is produced
roughly every 10 minutes. If your data doesn't make it into the current block, you'll have to wait
for it to get into the next block, about 10 minutes later.
This process, however, introduces a second problem.
2. Sometimes the 10-minute cutoff is not enough time for the data to propagate
across the entire network, so at the end of each interval, some nodes will have
received it and some won't. Those nodes will instead hold a different set of transaction
information, received from other wallet apps closest to them. This conflict has to be
resolved, and this is the reason for the fourth blockchain component, the consensus
mechanism.
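The second limitation can be reproduced with an even smaller toy model. Assume, hypothetically, 20 nodes on a ring, two wallets submitting different transactions from opposite sides of it, and a block cutoff after 3 gossip hops. At the cutoff, different nodes hold different pending sets. The names `spread`, `N_NODES` and `CUTOFF` are illustrative.

```python
def spread(n_nodes, origin, rounds):
    """Nodes a transaction reaches on a ring network within `rounds` gossip hops."""
    reached = {origin}
    for _ in range(rounds):
        reached |= {(node + step) % n_nodes for node in reached for step in (-1, 1)}
    return reached

N_NODES, CUTOFF = 20, 3  # hypothetical: 20 nodes, block cut after 3 gossip hops
arrivals = {
    "tx_A": spread(N_NODES, origin=0, rounds=CUTOFF),   # wallet near node 0
    "tx_B": spread(N_NODES, origin=10, rounds=CUTOFF),  # wallet near node 10
}

# Each node's pending set: the transactions that reached it before the cutoff.
pending = {node: frozenset(tx for tx, nodes in arrivals.items() if node in nodes)
           for node in range(N_NODES)}
views = set(pending.values())
print(sorted(sorted(view) for view in views))  # [[], ['tx_A'], ['tx_B']]
```

Raising `CUTOFF` to 10 hops lets both transactions reach all 20 nodes. That is the trade-off the cutoff forces: shorter intervals leave more disagreement for the consensus mechanism to reconcile.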
From Slide Number 78-81
Now that we know how data is propagated across the network, let's take a closer look at
the data itself and see how it is generated and received. Again, as I said before,
most blockchains use a scripting language, so the data broadcast and stored on the
blockchain is essentially programming code plus the associated inputs and outputs.
This is an example of the Bitcoin transaction data that any wallet could generate. The input
has a hash pointer containing the hash of a previous transaction, proving that you have
received the bitcoin. Next is a script that uses your private key to sign the
transaction. As the output, you put the amount you want to send and the receiver's
"shipping address", which is just a hash of their public key. And finally, there are a
couple of scripts for the miners to execute to check the validity of the transaction, including
checking the signature to make sure that it matches your public key, and also checking the
transaction amount to make sure that you're not spending more than you have. And that's
essentially how your data is processed in a distributed blockchain like Bitcoin.
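The wallet-side and miner-side steps above can be sketched in Python as a toy model, not Bitcoin's real format. Bitcoin expresses these checks in its script language, signs with ECDSA or Schnorr over secp256k1, and hashes public keys with SHA-256 followed by RIPEMD-160. This sketch swaps in plain Python checks, Lamport one-time signatures and plain SHA-256 so that it runs on the standard library alone. The names `keygen`, `address`, `tx_message`, `validate` and the dictionary layout are hypothetical.

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Lamport one-time signatures: a stdlib-only stand-in for Bitcoin's ECDSA/Schnorr.
# Each key pair is meant to sign only once; this toy does not enforce that.
def keygen():
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(h(a), h(b)) for a, b in private]
    return private, public

def message_bits(message: bytes):
    digest = h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(private, message: bytes):
    # Reveal one secret from each pair, chosen by the matching bit of the message hash.
    return [pair[bit] for pair, bit in zip(private, message_bits(message))]

def verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        h(s) == pair[bit] for s, pair, bit in zip(signature, public, message_bits(message)))

def address(public) -> bytes:
    # The "shipping address": a hash of the owner's public key.
    return h(b"".join(a + b for a, b in public))

def tx_message(prev_txid: bytes, outputs) -> bytes:
    # Hypothetical serialization of what the sender signs: the coin spent and where it goes.
    return prev_txid + b"".join(amt.to_bytes(8, "big") + addr for amt, addr in outputs)

def validate(tx, utxos) -> bool:
    """Checks a miner runs before relaying; utxos maps txid -> (amount, address)."""
    if tx["prev_txid"] not in utxos:                   # must spend a coin you received
        return False
    if any(amt <= 0 for amt, _ in tx["outputs"]):      # amounts must be positive
        return False
    prev_amount, prev_address = utxos[tx["prev_txid"]]
    if address(tx["public_key"]) != prev_address:      # key must match the address paid
        return False
    if not verify(tx["public_key"], tx_message(tx["prev_txid"], tx["outputs"]), tx["signature"]):
        return False
    return sum(amt for amt, _ in tx["outputs"]) <= prev_amount  # no spending more than you have

alice_priv, alice_pub = keygen()
bob_priv, bob_pub = keygen()
prev = h(b"earlier transaction paying Alice")
utxos = {prev: (50, address(alice_pub))}  # Alice previously received 50 units

outputs = [(30, address(bob_pub)), (20, address(alice_pub))]  # 30 to Bob, 20 back as change
tx = {"prev_txid": prev, "public_key": alice_pub, "outputs": outputs,
      "signature": sign(alice_priv, tx_message(prev, outputs))}
print(validate(tx, utxos))  # True
```

Changing any output after signing, spending more than 50, or signing with a key whose hash is not the recorded address all make `validate` return `False`.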
The transaction data is generated as a script using the client app and signed using your
private key. When the miners receive the script, they execute it to check that the transaction
has a valid signature and amount, then propagate it to other miners. Once it
officially makes it onto the blockchain, after consensus is reached, the receiver can then
repeat the process and send the coin they just received elsewhere. The final piece of the
blockchain data processing puzzle is the notion of blocks.
From Slide Number 82-86
Blocks are there simply to serve as a "batch processing" mechanism, to
improve efficiency and enforce a time cutoff for data to be broadcast across the
network. In this respect, the blockchain network is more like the ACH network that we talked
about before than the credit card network. Sure, you could treat each transaction as a block and
process them in real time, just like a credit card transaction, but it's usually more efficient to
do at least some batch processing, like an ACH transaction. Different blockchains use
different parameters for block size and time interval, and there's a lot of flexibility there,
from around 10 minutes in Bitcoin to between 10 and 20 seconds for Ethereum. During this
interval, the nodes receive data as usual, but instead of sending each transaction through in real
time, they group them into a pending block. And again, because of network
latency or even attack attempts, this pending block could differ from node to node, since each
pending block contains the set of transactions that particular node has received so
far. At the end of each interval, the nodes reconcile their pending blocks using
some consensus algorithm, so that in the end only one block makes it onto the chain and is
downloaded by all the nodes. Then the process repeats. Within each block, the data is
usually organized by the nodes themselves, using, for instance, Merkle trees. The
important part here is that, because of the decentralized nature of the network, there is no hard-and-fast
rule requiring the nodes to organize the data strictly on a first-come, first-served basis.
In fact, the nodes have complete discretion over whether to accept the data and how to
organize it within the block. In many cases, you might have to pay a fee to get your data accepted
and broadcast, and, as we'll see in the module on cryptocurrencies, this can serve as an
important incentive for nodes to participate in the network.
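A node's discretion over what goes into its pending block can be sketched as follows. This is a simplified, hypothetical policy: capacity is counted in transactions rather than bytes, and the node simply prefers higher fees, whereas real Bitcoin nodes rank by fee rate and account for dependencies between transactions. `PendingTx` and `build_pending_block` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PendingTx:
    txid: str
    fee: int        # what the sender offers the node for including the transaction
    arrival: float  # seconds since the current interval started

def build_pending_block(mempool, capacity, interval_s):
    """Hypothetical node policy: keep transactions that arrived before the cutoff,
    then fill the block highest-fee first (not first come, first served)."""
    eligible = [tx for tx in mempool if tx.arrival < interval_s]
    chosen = sorted(eligible, key=lambda tx: tx.fee, reverse=True)[:capacity]
    return [tx.txid for tx in chosen]

mempool = [PendingTx("a", fee=1, arrival=5), PendingTx("b", fee=9, arrival=30),
           PendingTx("c", fee=4, arrival=12), PendingTx("d", fee=50, arrival=700)]
print(build_pending_block(mempool, capacity=2, interval_s=600))  # ['b', 'c']
```

Transaction `d` offers the largest fee but arrived after the 600-second cutoff, so it waits for the next block; a first-come, first-served node would have picked `a` and `c` instead.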
Network latency
Network latency refers to the delay experienced when data travels across a network,
essentially the time it takes for a request to be sent and for the response to be received. It's
often referred to as "lag" and is measured in milliseconds. Low latency means faster
response times, while high latency leads to noticeable delays.
Miners
In a blockchain network, miners are a specific type of node that use computational power to
secure the network and add new blocks of transactions. In proof-of-work blockchains such as
Bitcoin, they compete to solve a computationally expensive puzzle: finding a block whose hash
falls below a target value. The winner is rewarded with cryptocurrency (newly issued coins plus
transaction fees). This process is known as mining, and miners are essential for validating
transactions and maintaining the integrity of the blockchain.
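A toy version of the proof-of-work puzzle, assuming a simplified block format. Bitcoin actually double-hashes an 80-byte block header with SHA-256 and compares it with a target encoded in that header; this sketch single-hashes arbitrary bytes plus an 8-byte nonce and expresses difficulty as a number of leading zero bits. `block_hash`, `mine` and `check` are illustrative names.

```python
import hashlib

def block_hash(block_data: bytes, nonce: int) -> int:
    """SHA-256 of the block contents plus a nonce, read as a 256-bit integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(block_data: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Try nonces in order until the hash falls below the target; None if none found."""
    target = 1 << (256 - difficulty_bits)  # more difficulty bits -> smaller target -> more tries
    for nonce in range(max_nonce):
        if block_hash(block_data, nonce) < target:
            return nonce
    return None

def check(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verifying a solution takes a single hash, however long mining took."""
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))

data = b"pending block: tx_A, tx_B"
nonce = mine(data, difficulty_bits=16)  # about 65,000 tries on average
print(nonce, check(data, nonce, 16))
```

The asymmetry is the point: finding the nonce is expensive, while every other node can confirm it with one hash.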
Gossip Protocol
In blockchain technology, the gossip protocol is a decentralized communication
mechanism that facilitates the efficient and reliable exchange of information between nodes
in a network. It's a core component in maintaining the robustness and scalability of
blockchain systems by enabling rapid dissemination of data like transactions and blocks.
Consensus mechanism
A consensus mechanism in blockchain is a protocol that ensures all network participants
agree on the state of the blockchain ledger and the validity of transactions. It's a self-
regulatory system that synchronizes the network, preventing issues like double-spending
and ensuring data integrity. By reaching consensus, the blockchain maintains a single,
agreed-upon version of its history: every node keeps its own copy, but all the copies match.
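For Bitcoin-style proof of work, reconciling competing blocks comes down to a fork-choice rule: each node adopts the valid chain with the most cumulative work, often loosely called the longest chain. The sketch below assumes every candidate chain has already been validated, gives each block a hypothetical `work` value, and assumes candidates are listed in the order the node saw them; other consensus mechanisms, such as proof of stake, use different rules.

```python
def cumulative_work(chain):
    return sum(block["work"] for block in chain)

def choose_chain(candidates):
    """Return the candidate with the most cumulative work. On a tie, max() keeps the
    first maximum, i.e. the chain seen first (candidates assumed in arrival order)."""
    return max(candidates, key=cumulative_work)

genesis = {"id": "G", "work": 1}
fork_a = [genesis, {"id": "A1", "work": 1}]  # block built from one node's pending set
fork_b = [genesis, {"id": "B1", "work": 1}]  # competing block from another node
print([b["id"] for b in choose_chain([fork_a, fork_b])])  # ['G', 'A1'] (tie: first seen)

fork_b = fork_b + [{"id": "B2", "work": 1}]  # the next block extends fork B
print([b["id"] for b in choose_chain([fork_a, fork_b])])  # ['G', 'B1', 'B2']
```

Once fork B pulls ahead, nodes that had adopted A1 switch to it, and transactions only in A1 return to their pending pools for a later block.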
Block-based batching
In a blockchain, block-based batching means that nodes group incoming transactions into a
pending block over a fixed time interval, then reconcile and confirm the whole block as a
single batch, rather than processing each transaction in real time. As with the ACH network,
this trades some latency for efficiency, and it also enforces a regular time cutoff for data
to be broadcast across the network.
Merkle Tree
In blockchain technology, a Merkle tree is a binary hash tree that efficiently summarizes and
verifies data, particularly the transactions within a block. It allows for efficient verification of
data integrity and inclusion without needing to download the entire blockchain.
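A compact sketch of a Merkle tree and an inclusion proof. It follows Bitcoin's conventions of double SHA-256 and duplicating the last hash on any level with an odd count, but simplifies elsewhere: it hashes raw placeholder transaction bytes and ignores Bitcoin's byte-order details. The function names are illustrative, and at least one leaf is assumed.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves):
    level = [dsha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # odd count: duplicate the last hash
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each flagged True if the sibling sits on the right."""
    level = [dsha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1         # partner in the same pair
        proof.append((level[sibling], sibling > index))
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(leaf, proof, root):
    node = dsha256(leaf)
    for sibling, sibling_is_right in proof:
        node = dsha256(node + sibling) if sibling_is_right else dsha256(sibling + node)
    return node == root

txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(txs)
proof = merkle_proof(txs, 2)
print(len(proof), verify_proof(b"tx2", proof, root))  # 3 True
```

The proof for one of 5 transactions holds only 3 sibling hashes; in general its length grows with log2 of the number of transactions, which is why a light client can check inclusion against the root without the rest of the block.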