Blockchain Module 1
Study Material
Course Name: Block Chain Applications
Module – No.1
By
1.1 Introduction
Over the past few decades, many applications have emerged on the internet that solve real-time
problems in a collaborative and decentralized manner, and a number of them are now popular
worldwide. The concept of digital currency has existed since the 1980s, but it took more than two
decades to make a decentralized solution possible. Early digital currencies relied on a central
authority to store and maintain the transaction records; B-Money, Bitgold and RPOW are cited as
examples of this centralized approach. Later, distributed solutions for storing currency transactions
were developed to eliminate the need for central authorities such as banks. However, these currencies
carried a risk of double spending: it was possible to make two different transactions with the same
coins, which is not possible in a centralized approach. Moreover, reaching agreement on distributed
information and maintaining a consistent state in a distributed environment runs into the Byzantine
Generals Problem. Quorum systems were therefore developed, in which malicious users and faulty
information are tolerated by the system. But voting in quorum systems, where a piece of information
is accepted if a majority of the users vote for it, is vulnerable to Sybil attacks, in which one
entity creates many identities to gain extra votes. Quorum systems also gave rise to temporary
inconsistencies.
In 2008, Satoshi Nakamoto proposed the Bitcoin design, which overcomes the difficulties mentioned
above. Bitcoin quickly became widespread because it combined the contributions of previous research
works. Bitcoin uses Proof-of-Work to restrict the number of votes per entity, thereby making a
decentralized approach practical. The nodes of the decentralized bitcoin network that publish blocks
are called miners. A miner collects transactions of the network into blocks. A collection of such
blocks linked via a cryptographic mechanism is called a blockchain. Blockchain technology is a
distributed, decentralized, peer-to-peer network that stores the transactions of the network without
any third party. This technology came into the limelight with the introduction of the Bitcoin
network. It allows bitcoin users to transfer their rights to information to another bitcoin user
publicly over the network. Blockchain allows the nodes to verify and manage the network. The
cryptographic hashing mechanism used in blockchain makes the data in blocks tamper-evident and
secure. In the bitcoin blockchain the users are pseudonymous, which means the transactions are
publicly available but the users' identities are not. Some of the key characteristics are mentioned
below.
Ledger: Blockchain uses an open, append-only ledger to record the transaction history. Unlike in
traditional databases, the data in this ledger cannot be modified.
Secure: Blocks in a blockchain are cryptographically linked, so the data cannot be tampered with
unnoticed, which assures the security of information on the blockchain.
Shared: The public ledger is shared among the users of the network, which assures transparency
among the users.
Distributed: The blockchain is distributed among the users of the network, which makes it resilient
against attacks. Increasing the number of nodes in the network increases the security of the
information on the blockchain.
Permissioned blockchain | Permissionless blockchain
May be open source or closed source platforms | Open source platforms available for anyone to download
Possible to restrict read and write access to users | Anyone in the network can read or write transactions over the blockchain
Uses consensus methods, but need not keep or maintain expensive resources | Users need to adopt a consensus mechanism in order to prevent malicious users from publishing blocks
Users' identities are known and authorised | Users' identities are pseudonymous
At the top level, widely known cryptographic concepts such as hashing, asymmetric key cryptography
and digital signatures, combined with principles of record keeping, are used in blockchain
technology. Some of the key components, such as addresses, blocks, transactions, hashing and digital
signatures, are explained below in detail.
Most blockchain implementations use the SHA-256 variant of the Secure Hash Algorithm (SHA), which
produces an output of 32 bytes (256 bits), usually displayed as 64 hexadecimal characters. Other
than SHA-256, Keccak and RIPEMD-160 are hashing algorithms used in blockchain networks. Hashing is
used for various operations in blockchain, such as address creation and securing the block data and
block header.
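As a minimal sketch of the properties described above, the snippet below uses Python's standard
hashlib module to compute SHA-256 digests. The helper name sha256_hex and the sample messages are
illustrative assumptions, not part of any particular blockchain implementation.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Two hypothetical messages that differ in a single character.
digest_a = sha256_hex(b"pay Alice 5 coins")
digest_b = sha256_hex(b"pay Alice 6 coins")

# The digest is 32 bytes long and is displayed as 64 hexadecimal characters.
print(len(bytes.fromhex(digest_a)), len(digest_a))

# A small change in the input yields a completely different digest.
print(digest_a)
print(digest_b)
```

Because any change to the input changes the digest, a stored digest can later be used to check that
the data has not been altered.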
Another key component is the nonce, which is used in the proof-of-work consensus mechanism. A
nonce is a number that is combined with the block data to produce a different hash output. The
proof-of-work consensus model works by repeatedly adjusting the nonce value until the hash meets a
specific condition, while the rest of the data stays the same. The structure of a block is given in
Fig. 1.1.
Figure 1.1 shows the block structure, divided into a block header and a body. The body of a block
consists of all the transactions of that block, whereas the block header holds the hash of the
block, the hash of the previous block, the nonce value, a timestamp denoting when the block was
published, and the Merkle root [7], which is the consolidated hash value of all the hashes of the
transactions.
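The Merkle root can be illustrated with a short sketch. It assumes the Bitcoin-style construction
(pairs of hashes combined with double SHA-256, the last hash duplicated when a level has an odd
count); other blockchains may build the tree differently. The function names and the sample
transactions are hypothetical.

```python
import hashlib
from typing import List


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash assumed here for combining tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: List[bytes]) -> bytes:
    """Fold a list of transaction hashes into a single root hash."""
    if not tx_hashes:
        raise ValueError("at least one transaction hash is required")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: duplicate the last one so every node has a partner.
            level.append(level[-1])
        # Hash each adjacent pair to form the next level up.
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Five made-up transactions, each identified by its hash.
tx_hashes = [sha256d(f"tx-{i}".encode()) for i in range(5)]
root = merkle_root(tx_hashes)
print(root.hex())
```

Changing any single transaction changes its hash and therefore the root, so the header commits to
every transaction in the body with one 32-byte value.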
1.1.3 Transactions
A transaction in blockchain is an interaction between two entities. In the case of cryptocurrencies,
the transfer of bitcoin or any other cryptocurrency from one user to another is called a
transaction, whereas in a business scenario the transfer of ownership of, or activities involving,
digital assets is considered a transaction. Every block contains zero or more transactions. The data
included in a transaction generally are the transaction input, output, sender's address, sender's
public key, and a digital signature. Although mainly used for the transfer of digital assets,
transactions may also be used for data transfer. In a basic scenario, someone might simply choose to
publish data permanently on the blockchain. Alternatively, a transaction can be used to transfer and
process data and then store the result on the blockchain, as in smart contract systems. More than
the data and its transmission, the validity and authenticity of the transaction are vital. Validity
assures that the transaction adheres to the protocols of the blockchain implementation, and
authenticity implies that the sender of the transaction has access to the digital assets being
transmitted. The sender digitally signs the transaction using their private key, and the signature
can be verified by anyone using the sender's public key.
Public key cryptography, also called asymmetric key cryptography, is used in blockchain for various
operations. It uses two keys, a public key and a private key, which are mathematically related to
each other. The public key can be viewed by anyone, whereas the private key is kept secret. However,
it is not computationally feasible to derive the private key from the openly available public key.
At the same time, data encrypted with a private key can be decrypted with the corresponding public
key, and vice versa. This cryptographic mechanism assures users of the authenticity and integrity of
data while maintaining transparency. However, encryption and decryption in asymmetric key
cryptography are considered slow to compute. By contrast, symmetric key encryption, where a single
key is used for encryption and decryption, is fast to compute; however, the users must trust each
other in order to share the key. Therefore, to simplify the process, the data is encrypted using the
symmetric key technique and the symmetric key in turn is encrypted using the asymmetric key
technique. In this way the slow asymmetric operation is applied only to the short key rather than to
the whole data.
Department of Information Science & Engineering
Addresses are usually short alphanumeric strings used as the sender's and receiver's endpoints in a
transaction. A hash function is used to derive an address from the user's public key. Different
blockchain implementations use different ways to derive addresses. The users of a blockchain network
have to store their private keys in a secure place. Instead of storing them manually, software is
used to store them. The software used to store private keys is called a wallet. Apart from private
keys, wallets can store the user's addresses and public keys as well. Wallets are also used to
calculate the number of digital assets owned by a particular user.
1.1.6 Blocks
The transactions made by blockchain users are submitted to the network via software such as web
services, mobile applications and so on. Once a transaction is submitted, the software sends it to a
particular node or to a set of nodes. This does not mean that the transaction has been added to the
blockchain. The transaction waits in the queue of a publishing node and is added to the blockchain
only when that node publishes a block. A block includes a block header, which holds the metadata of
the block, and a block body, which holds all the valid transactions. The metadata of the block
varies with the blockchain implementation. A general structure of a block is shown in Fig. 1.1.
Blocks are chained together through the hash of the previous block and so form a blockchain. As a
result, any change to the data in any block produces a different hash, which no longer matches the
value recorded in the subsequent block. Hence it is easy to identify whether a block has been
tampered with.
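The chaining described above can be sketched in a few lines. This is a toy model under stated
assumptions: the Block fields, the JSON serialization and the all-zero previous hash of the first
block are illustrative choices, not the format of any real blockchain.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    index: int
    prev_hash: str           # hash of the preceding block
    transactions: List[str]  # simplified block body

    def hash(self) -> str:
        """Hash the block contents, including the link to the previous block."""
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash, "transactions": self.transactions},
            sort_keys=True,
        ).encode()
        return hashlib.sha256(payload).hexdigest()


def build_chain(batches: List[List[str]]) -> List[Block]:
    """Create one block per batch, each storing the hash of the block before it."""
    chain: List[Block] = []
    prev_hash = "0" * 64  # assumed placeholder for the first block
    for index, txs in enumerate(batches):
        block = Block(index, prev_hash, list(txs))
        chain.append(block)
        prev_hash = block.hash()
    return chain


def first_broken_link(chain: List[Block]) -> int:
    """Return the index of the first block whose stored prev_hash is stale, or -1 if intact."""
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].hash():
            return i
    return -1


chain = build_chain([["a->b:5"], ["b->c:2"], ["c->a:1"]])
print(first_broken_link(chain))        # -1: every link matches

chain[0].transactions[0] = "a->b:500"  # tamper with the first block
print(first_broken_link(chain))        # 1: block 1 no longer points at block 0's hash
```

To hide the change, an attacker would have to recompute the stored hash in every later block as
well, which is what consensus mechanisms make expensive.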
A key question in blockchain technology is determining which user gets to publish the next block. A
network node that publishes a block is rewarded with cryptocurrency, so many nodes may compete to
publish blocks at the same time. This problem is solved using consensus mechanisms, which allow a
group of users who do not trust each other to work together. Several consensus models are in use,
such as Proof of Work (PoW), Proof of Stake (PoS), Proof of Authority (PoA) and Proof of Elapsed
Time (PoET).
A consensus mechanism is a decision-making process in which the network users agree on and support a
decision for the betterment of the network. In the proof-of-work consensus model, in order to add a
block to the blockchain a miner (node) has to solve a cryptographic puzzle. Solving the puzzle
requires a large amount of computation. Once the puzzle is solved, the solution is broadcast to the
network for verification. Once verification succeeds, the block is added to the blockchain.
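A toy version of the puzzle is sketched below. It assumes a simplified rule in which the SHA-256
hash of the block data plus a nonce must start with a given number of zero hex digits; real
networks compare the hash against a numeric target. The function names and the difficulty value
are illustrative.

```python
import hashlib


def block_hash(data: bytes, nonce: int) -> str:
    """Hash the block data together with a candidate nonce."""
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).hexdigest()


def mine(data: bytes, difficulty: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not block_hash(data, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(data: bytes, nonce: int, difficulty: int) -> bool:
    """Verification needs a single hash, however long the search took."""
    return block_hash(data, nonce).startswith("0" * difficulty)


data = b"block with three transactions"
nonce = mine(data, difficulty=4)  # roughly 16**4 attempts are expected on average
print(nonce, block_hash(data, nonce))
print(verify(data, nonce, 4))
```

The asymmetry is the point of the design: finding the nonce takes many hash attempts, while any
node can check a claimed solution with one.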
In 1994, Nick Szabo proposed the smart contract, an automated transaction procedure that implements
the terms of an agreement or contract. A blockchain can perform transactions in a pre-agreed fashion
where the participating entities agree upon the contractual terms. The main objective of a smart
contract is to execute the terms and conditions of a contract automatically, thereby minimizing the
need for
intermediaries. Smart contracts are collections of data and code, or programmed applications, that
are deployed via digitally signed transactions over the blockchain network. Smart contracts are
executed by the nodes of the network and the results are stored on the blockchain. Regardless of the
number of nodes executing a smart contract, the result of the execution must always be the same.
Various operations can be performed using a smart contract, such as computations, providing access,
storing information and even reverting financial transactions. Note that not all blockchain models
can execute smart contracts. The Bitcoin blockchain does not support general smart contracts but
uses a scripting language to offer limited programmability, whereas Ethereum and Hyperledger run
smart contracts built on them. Programming languages used to write smart contracts include Serpent
and Solidity, of which Solidity is the most widespread.
Over the years, blockchain has evolved so rapidly that it now provides many more solutions than just
the decentralization of cryptocurrency. While the Bitcoin blockchain is considered a first
generation blockchain, Ethereum and smart contracts form the second generation, and Decentralized
Apps (DApps) form the third generation of blockchain models. The Bitcoin blockchain enables
financial transactions in a decentralized way and eliminates the need for trusted third parties. The
transactions are based on public key cryptography and digital signatures. The nodes that validate
the transactions use a PoW mechanism based on Hashcash and the SHA-256 hashing algorithm. Though it
is often claimed that the users of the bitcoin blockchain remain anonymous, it is possible to trace
back transactions and find the identity of the users. Hence the users are pseudonymous. Users are
rewarded with incentives, i.e. bitcoins, for publishing blocks. However, scalability is a major
drawback of the bitcoin blockchain. Moreover, it is not well suited to general purpose applications
due to its limitations. Thus in 2013 Ethereum, a general purpose blockchain platform, was proposed.
Ethereum addresses most of the scripting and transaction limitations of the bitcoin blockchain.
Ethereum therefore led to the development of smart contracts, small programmed applications that are
stored and executed over the blockchain network. Smart contracts enable automatic execution of
conditions while validating the transactions. This reduces the cost involved in verification, fraud
prevention and more, and ensures transparency. Though they provide many advantages, Ethereum smart
contracts do have some limitations, such as complex programming languages for writing smart
contracts and the difficulty of modifying or ending smart contracts once deployed. Moreover, with
growing economic demand, Ethereum could not support huge volumes of transactions. Blockchain is
therefore increasingly heading towards a decentralized web, incorporating systems for data
collection, smart contracts, communication networks and open standards. This paved the way for DApps
(Decentralized Applications), whose back end runs on a blockchain network and whose front end can be
a user interface written in any programming language. A DApp is open source and uses a decentralized
consensus mechanism. With their growing popularity, DApps are being integrated with many industrial
applications, thereby enabling cross-chain communication.
Due to its salient features, blockchain is applied not only in decentralized cryptocurrencies but
well beyond them. Blockchain can change business transaction models and the protocols for managing
assets, e-voting, renting a car, watching a movie and many more. Its applications extend to major
sectors such as FinTech, healthcare, governance, supply chain, manufacturing, insurance, education,
IoT, big data systems and machine learning.
The areas mentioned above are some of the applications that blockchain is revolutionizing, but it is
not limited to those domains. There are many more areas where researchers are trying to fit in
blockchain in order to utilize its full potential.
Even though blockchain technology has great potential, certain challenges limit its application on a
wider scale. A few major challenges are as follows.
1.4.1 Scalability
As the number of transactions keeps increasing, the size and volume of the blockchain also grow day
by day. Every node has to collect all the transactions and validate them on the blockchain. Besides
this, because of the restriction on block size and the time taken to publish a block, the Bitcoin
blockchain can process only about 7 transactions per second. This may not meet the requirement of
processing a large amount of data in real time. Moreover, since the block size is small, miners tend
to prefer validating transactions with higher fees, so transactions with smaller fees get delayed.
Developments aimed at resolving these issues include storage optimization and redesigning the
blockchain.
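The figure of about 7 transactions per second can be reproduced with back-of-the-envelope
arithmetic. The inputs below are assumptions chosen for illustration: a 1 MB block size limit, an
average transaction size of 250 bytes and one block every 10 minutes. Real values vary.

```python
def max_transactions_per_second(
    block_size_bytes: int, avg_tx_size_bytes: int, block_interval_seconds: int
) -> float:
    """Upper bound on throughput when every block is completely full."""
    tx_per_block = block_size_bytes // avg_tx_size_bytes
    return tx_per_block / block_interval_seconds


# Assumed parameters: 1 MB blocks, 250-byte transactions, 600 s between blocks.
tps = max_transactions_per_second(1_000_000, 250, 600)
print(f"{tps:.2f} transactions per second")  # 6.67 transactions per second
```

Under these assumptions, raising throughput means larger blocks, smaller transactions or shorter
block intervals, each of which has its own cost.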
Blockchain is prone to attacks such as selfish mining. Selfish mining is a strategy in which an
over-ambitious miner keeps mined blocks secret instead of publishing them. The blocks are revealed
to the public only when certain conditions are satisfied. If the secretly mined private chain is
longer than the current public chain when it is revealed, all other miners will accept it. As a
result, honest miners will have wasted their resources on a chain that is going to be abandoned. In
this way selfish miners may be rewarded with higher incentives. Likewise, blockchain is susceptible
to many other attacks, such as Sybil attacks, double spending and 51% attacks.
Nevertheless, blockchain has been transforming both industry and academia with its distinct
properties such as decentralization, anonymity, integrity and transparency. The applications of
blockchain have gone beyond cryptocurrencies and transactions. The decentralized nature of
blockchain on top of the existing internet is particularly interesting in terms of data redundancy
and survivability. Among the available solutions, blockchain is a strong fit for problems where
trust is the key concern. Even though blockchain has not yet reached maturity, it continues to be
adopted in applications across different domains globally.
Bitcoin is a network of interconnected computing nodes, each of which runs the bitcoin software and
stores the blockchain. While a collection of transactions is known as a block, a blockchain is a
collection of blocks. All the nodes that run the blockchain have the same collection of blocks and
transactions and transparently see new blocks being added with new transactions. To rewrite the
ledger dishonestly, malicious miners would need to control 51% or more of the hash rate of the
bitcoin network (known as the 51% attack). Such an attack remains largely theoretical because
bitcoin currently has more than 10,000 computing nodes, a number that is growing consistently,
making such attacks improbable [1].
Blockchain is progressing and reshaping the information technology industry with superior security,
efficiency, and flexibility. There are various use cases of blockchain technology, chief among them
cryptocurrencies such as Bitcoin, Ethereum, Litecoin and Ripple [2]. Bitcoin is a cryptocurrency
popularly used for peer-to-peer, decentralized payments where the transactions are performed without
an intermediary. Transactions performed using Bitcoin are verified by intermediate nodes in the
network. These transactions are then registered in the globally accessible ledger called the Bitcoin
blockchain. The novel concept of Bitcoin was originally presented by the pseudonymous Satoshi
Nakamoto in 2008 and was released as open-source software in 2009. Since then it has emerged as the
most favoured cryptocurrency among all its competitors, adding billions of dollars to the economy
within a few years. Bitcoin employs a P2P network that does not use any external bodies, such as
banks or other electronic financial service providers, to supervise the validation or approval of
transactions. Bitcoin has progressively drawn the attention of the public and keeps advancing as
more and more customers join the payment system, which has been described as a revolutionary, fast,
tax-free, and convenient digital currency [3].
Being a cryptocurrency based on account entries, bitcoin is described as the surplus remaining in a
bitcoin account. Bitcoin accounts are described by elliptic curve cryptographic key pairs. Bitcoin
employs the Elliptic Curve Digital Signature Algorithm (ECDSA) to make sure that funds are spent
only by the legitimate user. Bitcoin uses ECDSA with the curve specification secp256k1, in which
private keys are 256 bits in size [4]. Secp256k1 refers to the parameters of the elliptic curve
employed in asymmetric key cryptography for the bitcoin blockchain and is described in the Standards
for Efficient Cryptography (SEC). ECDSA involves the following concepts [5]:
(a) Private key: a randomly generated secret number known only to the entity that generated it. In
the bitcoin blockchain, users that possess the private key can spend funds using the blockchain. The
private key is a single unsigned 256-bit integer.
(b) Public key: a number computed from the private key that does not need to be kept secret. It is
used to ascertain whether a signature is authentic, i.e. generated with the proper key, while the
private key stays secret. In the bitcoin blockchain, public keys can be either uncompressed (65
bytes: the prefix 0x04 followed by two 256-bit coordinates) or compressed (33 bytes: the prefix 0x02
or 0x03 followed by one 256-bit coordinate).
(c) Signature: a number generated mathematically from a hash of the data to be signed plus a
private key.
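The relationship between a private key and the two public key encodings can be sketched in pure
Python. The curve constants are the published secp256k1 parameters; the function names are
illustrative, and the code is neither constant-time nor suitable for handling real funds.

```python
import secrets

# secp256k1 domain parameters: field prime, group order, and generator point G.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(p1, p2):
    """Add two points on y^2 = x^3 + 7 (mod P); None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its negation
    if p1 == p2:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P     # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P    # chord line
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k: int, point):
    """Compute k * point with the double-and-add method."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def public_key(private_key: int):
    """Return (uncompressed, compressed) encodings of the public key."""
    if not 1 <= private_key < N:
        raise ValueError("private key must be in the range [1, N - 1]")
    x, y = scalar_mult(private_key, G)
    uncompressed = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")
    prefix = b"\x02" if y % 2 == 0 else b"\x03"  # prefix records the parity of y
    return uncompressed, prefix + x.to_bytes(32, "big")


private_key = secrets.randbelow(N - 1) + 1  # random 256-bit secret in [1, N - 1]
uncompressed, compressed = public_key(private_key)
print(len(uncompressed), len(compressed))   # 65 33
```

The compressed form can drop the y coordinate because the curve equation leaves only two candidate
values of y for a given x, and the prefix byte says which one to use.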
A bitcoin block is a container that gathers transactions, arranged linearly over time, for inclusion
in the globally distributed ledger, the blockchain. Transaction data is persistently stored in files
known as blocks. A block is a data structure comparable to an individual page of a record book or of
a bank's transaction ledger. A bitcoin block has a header and a list of transactions. The
transaction list takes up most of the block's size. The fields of a block, with their descriptions
and sizes, are listed in Table 2.1.
Block’s Header
The header of a block consists of three sets of information [4]. They are:
(a) a reference to the immediately preceding block, connecting it to the current block;
(b) a set of metadata relating to the mining competition, i.e., difficulty, timestamp, and nonce;
(c) the Merkle root, a data structure that summarizes all the transactions in the block.
The cryptographic hash is the main identifier of a block and is also considered its digital
fingerprint. The block header is hashed twice with the SHA-256 hashing algorithm to compute it. The
result is the 32-byte block hash, or more precisely the block header hash, because it is calculated
from the block header alone. For instance, the hash of the very first block header created for the
bitcoin blockchain is 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f. The hash
value of the header identifies the bitcoin block unambiguously; it is the unique identifier of the
block. The hash of the header can be derived independently by any node by hashing the header of the
block. Every node calculates the hash of a new block as it is received. It is then stored in a
separate table as metadata of that block, which facilitates indexing and speedy retrieval of blocks
from disk. The hash is neither included in the data structure of the block, nor transmitted over the
network along with the block, nor stored as part of the block in persistent storage.
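The double hashing of the header can be tried out on the first bitcoin block. The six field values
below are the widely published header fields of that block and do not come from the text above; the
80-byte little-endian layout follows the Bitcoin protocol, and the function name is illustrative.

```python
import hashlib
import struct


def header_hash(version: int, prev_hash_hex: str, merkle_root_hex: str,
                timestamp: int, bits: int, nonce: int) -> str:
    """Serialize an 80-byte block header and return its double SHA-256 hash as hex."""
    header = (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]    # hashes are stored byte-reversed
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )
    if len(header) != 80:
        raise ValueError("a block header must be exactly 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()                   # displayed byte-reversed as well


# Published header fields of the first bitcoin block.
genesis_hash = header_hash(
    version=1,
    prev_hash_hex="00" * 32,  # there is no previous block
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,     # 3 January 2009
    bits=0x1D00FFFF,          # compact encoding of the difficulty target
    nonce=2083236893,
)
print(genesis_hash)
```

Since the hash depends only on the 80 header bytes, any node can recompute it locally, which is why
it does not need to be transmitted or stored inside the block.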
The most crucial part of the bitcoin system is the transaction. Transactions are data structures
that encode the transfer of funds from the source of the funds, known as the input, to a
destination, known as an output. Every transaction needs to be created, validated, propagated, and
added to the public ledger of transactions, the bitcoin blockchain. The fields of a transaction are
shown in Table 2.2.
One of the elementary components of a bitcoin transaction is the Unspent Transaction Output (UTXO).
UTXOs are indivisible chunks of bitcoin locked to a specific owner and recognized as units of
currency by the entire network. The bitcoin network keeps track of all spendable UTXOs. Whenever a
user receives bitcoin, the amount is saved in the blockchain in the form of a UTXO, so a user's
bitcoin may be spread as UTXOs across a large number of transactions. The concept of a bitcoin
balance is derived by the wallet application: the wallet scans the blockchain and aggregates all the
UTXOs belonging to the user to calculate the user's balance. The value of a UTXO can be any multiple
of satoshis. A bitcoin is divisible to 8 decimal places, just as a dollar is divisible to 2 decimal
places. Once a UTXO is created, it cannot be divided. Therefore, if it is larger than the required
value, it must be consumed completely, and change must be generated in the transaction. That is, if
there is a UTXO of 30 bitcoin and only 2 bitcoin need to be spent, then the transaction must consume
the entire 30 bitcoin UTXO and produce the following two outputs: (a) a payment of 2 bitcoin to the
desired recipient and (b) a payment of 28 bitcoin as change back to the wallet, which is then
available for future transactions.
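The UTXO model can be sketched as follows. The UTXO class, the owner labels and the largest-first
selection rule are illustrative assumptions; as in the example above, transaction fees are ignored.

```python
from dataclasses import dataclass
from typing import List

SATOSHIS_PER_BTC = 100_000_000


@dataclass(frozen=True)
class UTXO:
    txid: str   # transaction that created this output
    index: int  # position of the output within that transaction
    owner: str  # hypothetical owner label standing in for a locking script
    value: int  # amount in satoshis


def balance(utxos: List[UTXO], owner: str) -> int:
    """A balance is just the sum of the UTXOs an owner can spend."""
    return sum(u.value for u in utxos if u.owner == owner)


def spend(utxos: List[UTXO], sender: str, recipient: str, amount: int, new_txid: str) -> List[UTXO]:
    """Consume whole UTXOs of the sender and return the updated UTXO set."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    selected: List[UTXO] = []
    total = 0
    # Assumed selection rule: take the sender's largest UTXOs first.
    for u in sorted((u for u in utxos if u.owner == sender), key=lambda u: u.value, reverse=True):
        if total >= amount:
            break
        selected.append(u)
        total += u.value
    if total < amount:
        raise ValueError("insufficient funds")
    outputs = [UTXO(new_txid, 0, recipient, amount)]
    if total > amount:
        # A UTXO cannot be split, so the remainder returns to the sender as change.
        outputs.append(UTXO(new_txid, 1, sender, total - amount))
    return [u for u in utxos if u not in selected] + outputs


utxo_set = [UTXO("tx0", 0, "alice", 30 * SATOSHIS_PER_BTC)]
utxo_set = spend(utxo_set, "alice", "bob", 2 * SATOSHIS_PER_BTC, "tx1")
print(balance(utxo_set, "bob") / SATOSHIS_PER_BTC)    # 2.0
print(balance(utxo_set, "alice") / SATOSHIS_PER_BTC)  # 28.0
```

Amounts are kept as integer satoshis so that no rounding error can creep into balances.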
UTXOs that are consumed by a particular transaction are known as transaction inputs, and the UTXOs
that are created by the transaction are called transaction outputs. In this manner, chunks of
bitcoin value move forward from one owner to the next in a series of transactions that consume and
create UTXOs. A transaction uses the signature of the current owner to unlock a UTXO and consume
it. Transactions create new UTXOs and lock them to the next owner's bitcoin address.
Transaction Input
Transaction inputs are pointers to UTXOs. Each input refers to a UTXO by the transaction hash and
the sequence number under which that UTXO is recorded in the bitcoin blockchain. The input also
includes an unlocking script for spending the UTXO. This script must meet the spending conditions
that the UTXO has set; typically it is a signature that proves possession of the bitcoin address in
the locking script. The wallet of the user selects from the pool of available UTXOs and creates the
transaction. For instance, if the payment to be made is 0.020 bitcoin, the wallet app may select
UTXOs of 0.010 bitcoin and add them up to cover the payment. After selecting the UTXOs, the wallet
produces the unlocking scripts, which make the UTXOs eligible for spending by satisfying the locking
script conditions. The unlocking scripts contain the signatures for every UTXO. The wallet then adds
the unlocking scripts and the UTXO references as inputs to the transaction.
Bitcoin Mining
In blockchain, mining means appending a new block at the end of the chain. In the bitcoin network,
the mining process adds new bitcoins to the money supply. Mining nodes are specialized nodes on the
bitcoin network. Such nodes listen for new blocks propagated on the bitcoin network. Mining also
helps to safeguard the bitcoin network against dishonest transactions and prevents transactions from
spending the same bitcoin more than once, which is commonly known as double spending. In turn, the
miners are rewarded for providing processing power to the bitcoin network. They play a vital role in
validating new transactions and recording them on the distributed ledger. Roughly every 10 minutes a
new block is mined that contains the transactions that occurred since the last block was mined,
i.e., the most recent transactions. These transactions are incorporated into the block and added to
the blockchain as confirmed transactions, allowing the owners of the bitcoin to spend whatever they
have received in those transactions [4].
The mining nodes compete to solve a difficult mathematical puzzle based on a cryptographic hash
algorithm. In turn, they earn two types of rewards: (a) the new coins generated with each mined
block and (b) the fees from all the transactions validated and recorded in the block. The solution
to such a mathematical puzzle is called Proof-of-Work (PoW). The competition to solve the PoW
algorithm forms the basis of the security model of bitcoin. The process of mining provides the
monetary supply for Bitcoin, similar to the way central banks issue new money by printing currency
notes. The number of bitcoins that a miner can add per block halves roughly every four years, that
is, every 210,000 blocks. Initially, 50 bitcoins could be added per block in January 2009; this had
declined to 6.25 bitcoin per block by May 11, 2020 [6]. In this manner, the reward of the miner
decreases exponentially, and by approximately 2140 all the bitcoin, i.e. 20.99999998 million, will
have been issued and no new bitcoins will be
issued.
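The issuance schedule can be checked with integer arithmetic. The sketch assumes the rule described
above (a 50 bitcoin starting reward that halves every 210,000 blocks) and rounds each halved reward
down to a whole satoshi; the function names are illustrative.

```python
SATOSHIS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000                  # blocks between halvings
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC     # reward of the first blocks, in satoshis


def block_subsidy(height: int) -> int:
    """New coins (in satoshis) created by the block at the given height."""
    halvings = height // HALVING_INTERVAL
    return INITIAL_SUBSIDY >> halvings      # each halving shifts the reward right by one bit


def total_supply() -> int:
    """Sum the subsidy over every halving period until it reaches zero."""
    total = 0
    halvings = 0
    while INITIAL_SUBSIDY >> halvings > 0:
        total += (INITIAL_SUBSIDY >> halvings) * HALVING_INTERVAL
        halvings += 1
    return total


print(block_subsidy(0) / SATOSHIS_PER_BTC)        # 50.0
print(block_subsidy(630_000) / SATOSHIS_PER_BTC)  # 6.25
print(total_supply() / SATOSHIS_PER_BTC)          # just under 21 million
```

The total falls slightly short of 21 million because the halved rewards are rounded down to whole
satoshis.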
Every transaction may include a transaction fee. The fee is the excess of the transaction's inputs
over its outputs. The miner that wins the PoW challenge receives it as a reward. As time passes, the
block reward earned by the miner decreases while the number of transactions per block increases, so
a larger proportion of miners' earnings will come from transaction fees. The mining process
underpins network-wide consensus in a decentralized environment and safeguards the bitcoin network
from attacks.
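The fee rule can be written down directly. The sample amounts and function names below are made up
for illustration; values are in satoshis.

```python
from typing import List, Tuple


def transaction_fee(input_values: List[int], output_values: List[int]) -> int:
    """The fee is whatever the inputs carry beyond what the outputs pay out."""
    fee = sum(input_values) - sum(output_values)
    if fee < 0:
        raise ValueError("outputs exceed inputs")
    return fee


def miner_revenue(subsidy: int, transactions: List[Tuple[List[int], List[int]]]) -> int:
    """Block reward plus the fees of every transaction included in the block."""
    return subsidy + sum(transaction_fee(ins, outs) for ins, outs in transactions)


# Two hypothetical transactions as (input values, output values).
block_transactions = [
    ([3_000_000_000], [200_000_000, 2_799_990_000]),  # leaves a fee of 10,000
    ([1_000_000, 1_000_000], [1_995_000]),            # leaves a fee of 5,000
]
print(miner_revenue(625_000_000, block_transactions))  # 625015000
```

As the subsidy term shrinks with each halving, the fee term makes up a growing share of the total.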
Traditional payment systems depend on a trust model with a centralized authority that provides
clearinghouse services by verifying and clearing transactions. The bitcoin blockchain, on the other
hand, has no central authority: the blockchain is assembled independently by the nodes in the
network, and each full node holds an entire replica of the public ledger that it can trust as the
authoritative record. Decentralized consensus in bitcoin emerges from the interaction of four
processes that occur independently on the nodes in the network:
(a) Every transaction is verified independently by every full node, based on an extensive list of
criteria.
(b) The mining nodes independently aggregate the transactions into new blocks, coupled with
demonstrated computation through the PoW algorithm.
(c) Every node independently verifies new blocks and assembles them into the blockchain.
(d) Every node independently selects the chain with the greatest cumulative computation
demonstrated through PoW.
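Step (d) can be sketched as a comparison of cumulative work. The sketch assumes that the work of a
block is estimated from its target as 2^256 / (target + 1), so a lower (harder) target counts for
more; the chain names and target values are made up.

```python
from typing import Dict, List


def block_work(target: int) -> int:
    """Expected number of hash attempts needed to find a hash at or below the target."""
    return 2**256 // (target + 1)


def chain_work(targets: List[int]) -> int:
    """Cumulative work of a chain, given the target of each of its blocks."""
    return sum(block_work(t) for t in targets)


def select_best_chain(chains: Dict[str, List[int]]) -> str:
    """Pick the chain with the most cumulative work, not simply the most blocks."""
    return max(chains, key=lambda name: chain_work(chains[name]))


EASY_TARGET = 2**240  # hypothetical easier difficulty
HARD_TARGET = 2**236  # hypothetical difficulty that is 16 times harder

candidate_chains = {
    "A": [EASY_TARGET, EASY_TARGET, EASY_TARGET],  # three easy blocks
    "B": [HARD_TARGET, HARD_TARGET],               # two hard blocks
}
print(select_best_chain(candidate_chains))         # B
```

Chain B wins with fewer blocks because each of its blocks represents far more computation, which is
why the rule is stated in terms of cumulative work rather than length.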
Wallet software generates transactions by collecting unspent transaction outputs, supplying the
relevant unlocking scripts, and creating new outputs assigned to the new owners. The transaction is
then forwarded to adjacent nodes for network-wide propagation. Every node verifies the transaction
and forwards valid transactions to its adjacent nodes. This verification ensures that only valid
transactions are propagated across the network and that invalid transactions are discarded at the
first node that encounters them.
Anonymity and privacy are two crucial concepts that cannot be easily differentiated. While anonymity
means hiding the owner's identity, privacy means hiding the background [7]. In real-life scenarios,
users' privacy is more desirable than anonymity, because personal data must be protected for it to
be used properly. For example, a personal email address may be known to many, but the restricted
content can be accessed only by the account owner using a password. Hence, privacy is necessary for
almost all systems and applications [6, 8]. Anonymity, in contrast, is the property that criminals
look for, since it makes it impossible to hold them accountable for the crimes they have committed
[9]. There are, however, application areas other than criminal activities where anonymity is
required; the best-suited example is the ballot system. Being untraceable and unidentifiable is the
key objective of anonymity [10]. True anonymity cannot be ensured, as many applications that claim
to be anonymous have flaws through which identity information leaks. Mixing services [11], commonly
known as mixing networks or mixnets, are employed to prevent messages from being traced through a
network. Such mixing services may be unreliable and add overhead in terms of computation and
communication [12]. Anonymization employing onion routing [13] is extensively used to hide personal
information by addressing the problem of IP tracking. Even Tor [14], the most prominent and
successful anonymity network, has flaws [7, 15].
Fundamentally, privacy and anonymity are analyzed with the aim of deanonymizing users and
extracting information, an effort that weakens the privacy of the users. The outcomes of such an
analysis are the potential goals that an analyst may achieve. They are as follows:
(a) Bitcoin Addresses Discovery: All possible bitcoin addresses of an entity are discovered, starting from the name of
the person or the company.
(b) Identity Discovery: Starting from a bitcoin address, all potentially distinguishing information, for instance the
name of the company or the person, is obtained.
(c) Mapping of IP Address with Bitcoin Address: Bitcoin addresses are mapped to the possible IP addresses from which
the transactions were generated.
(d) Bitcoin Address Linking: Bitcoin users are advised to use a new bitcoin address every time they receive a new
payment [18]. For this reason, each user has multiple bitcoin addresses. In this outcome, the addresses belonging
to the same user are linked.
(e) Mapping of Geo-locations with Bitcoin Address: The geographical location of the user is obtained using the
bitcoin address.
One outcome may lead to another. For example, once a bitcoin address belonging to a user has been
discovered, it can be linked to the other bitcoin addresses of that user. In a similar manner, a
bitcoin address can be mapped so that it becomes easier to obtain the identity or the geographical
position of the user who possesses that address.
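Address linking, as in outcome (d), can be illustrated with a short sketch. The text above does not name a specific linking rule, so the example assumes one commonly described heuristic: addresses that appear together as inputs of the same transaction are treated as belonging to one user. The function name `link_addresses` and the sample addresses are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def link_addresses(transactions: Iterable[List[str]]) -> List[Set[str]]:
    """Group addresses that were used together as inputs (assumed heuristic).

    Each transaction is given as the list of its input addresses.
    A union-find structure merges addresses into clusters.
    """
    parent: Dict[str, str] = {}

    def find(addr: str) -> str:
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)  # register single-input transactions as well
        for other in input_addresses[1:]:
            union(first, other)

    clusters: Dict[str, Set[str]] = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


sample = [["A1", "A2"], ["A2", "A3"], ["B1"], ["B1", "B2"]]
clusters = link_addresses(sample)
print(sorted(sorted(c) for c in clusters))
# prints [['A1', 'A2', 'A3'], ['B1', 'B2']]
```

Once one address in a cluster is tied to an identity, an IP address or a location, the same information applies to every address in that cluster, which is the transition between outcomes described above.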
There are various methods to achieve these outcomes. Some studies apply these methods, while
numerous others merely mention them without using them. The methods, together with the studies
that have mentioned or applied them, are as follows:
(a) Transacting: A bitcoin address can be learned by transacting with other users, for example to
purchase goods. For such a transaction, the buyer must know the seller’s bitcoin address, so a
seller who wants to receive payment must provide his/her bitcoin address to the buyer. It is
therefore easy to learn the bitcoin address of any entity or person in the sales business simply
by acting as a buyer. Transacting methods require active participation in the network. Reid and
Harrigan [19] stated that transacting methods include active participation in the network and
operating money laundering services. Meiklejohn et al. [20] named the transacting method a
re-identification attack. In re-identification attacks, accounts are opened and purchases are
made from infamous Bitcoin merchants and service providers such as Mt. Gox and Silk Road.
(b) Utilizing the Off-network Knowledge: All publicly available off-network data sources can be
used to discover the bitcoin addresses belonging to some user entities, or conversely. Reid and
Harrigan [19] utilized donation websites that reveal IP and key information. In this process,
entities related to the theft of 25,000 BTC were identified by employing off-network information.
Ortega [21] collected around 4,000 bitcoin addresses from a well-known wired forum where the
bitcoin addresses and the real-world locations can be
Bitcoin is considered a monetary asset that is traded on various exchanges, much like a stock
market. Researchers have investigated the factors that affect the price of bitcoin and cause its
fluctuations using diverse analytical and empirical approaches; the studies in [25–27] are
examples. With the advances in artificial intelligence, several machine learning (ML) and deep
learning (DL) based models for bitcoin price prediction have been proposed [27–33]. Chen et al.
[34] developed a latent source model, which Shah et al. [35] implemented to forecast the bitcoin
price. Shah’s model earned a remarkable return of 89% in 50 days, reported together with its
Sharpe ratio, which evaluates the performance of an investment after adjusting for risk. The
testing period selected for the study also showed an improvement of 33% using the buy-and-hold
strategy. Several independent attempts to reproduce the study were unsuccessful. Georgoula et al.
[36] implemented sentiment analysis using a Support Vector Machine (SVM) and investigated the
determinants of the price of bitcoin. Matta et al. [37] inspected the association among the price
of Bitcoin, Google Trends views for bitcoin, and tweets. They found a weak to moderate correlation
between the price of bitcoin and both positive tweets on Twitter and Google Trends views, and
concluded that these factors can be used as predictors. A limitation of the study is that the
sample covered only 60 days and sentiment was treated as a variable. In another study, Matta et
al. [38] applied a similar technique to predict the trading volume rather than the price of
bitcoin and concluded that Google Trends views were strongly correlated with the Bitcoin price.
The sample covered just under one year, and a data source was used for the implementation. Some
researchers have applied wavelets and found similar results [39]. Kristoufek applied wavelet
coherence analysis to the bitcoin price and concluded that search engine views, network hash rate,
and mining difficulty are positively correlated with the bitcoin price. Greaves et al. [40]
examined bitcoin price prediction using an Artificial Neural Network (ANN) and SVM and
claimed an accuracy of 55%. They found that blockchain data offers limited predictive power, since
the price is governed by exchanges whose behavior lies outside the scope of the blockchain.
Similarly, Madan et al. [41] implemented ML techniques such as random forest, SVM, and binomial
GLM on blockchain data and forecasted the bitcoin price with an accuracy of more than 97%, with
the limitation that the results were not cross-validated. As a result, the models may have
overfitted the data, and there is no guarantee that they will generalize. McNally et al. [27]
presented two prediction models, built on long short-term memory (LSTM) and a recurrent neural
network (RNN), and compared them with an autoregressive integrated moving average (ARIMA) model
[41], a widely used time-series forecasting model. They developed a classification model that
uses the history of the bitcoin price to predict whether the price will go up or down. The
authors of [27] demonstrated that the ARIMA-based model does not match the models based on RNN
and LSTM. Saad and Mohaisen [28] used price information together with information from the
bitcoin blockchain, such as mining difficulty, total count of wallets, hash rate, and unique
addresses, and used the highly correlated attributes to build forecasting models. They also
studied various models based on random forests, linear regression, neural networks, and gradient
boosting. Jang and Lee [32] considered, in addition to the blockchain information, macroeconomic
attributes such as the exchange rates between major fiat currencies, NASDAQ, S&P 500, and Euro
Stoxx 50. In their follow-up research, Jang et al. [29] put forward an LSTM model with a rolling
window and showed that it outperformed forecasting models based on SVM, linear regression, LSTM,
and neural networks. Likewise, Shintate and Pichl [33] showed that their proposed deep
learning-based random sampling model outperformed LSTM-based models.
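The evaluation measures that recur in the studies above (Sharpe ratio, correlation between a signal and the price, and up/down prediction accuracy over a rolling window) can be sketched in a few lines. This is a minimal illustration, not a reproduction of any cited model: the function names, the naive rolling-mean predictor, and all numbers are made up for the example and are not market data.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def simple_returns(prices: Sequence[float]) -> List[float]:
    """Period-over-period returns: (p[t+1] - p[t]) / p[t]."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between search-engine views and price."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def directional_accuracy(prices: Sequence[float], window: int = 3) -> float:
    """Share of correct up/down calls of a naive rolling-window rule.

    The rule predicts "up" when the current price is above the mean of the
    previous `window` prices. It is a placeholder for a trained model.
    """
    hits = total = 0
    for t in range(window, len(prices) - 1):
        predicted_up = prices[t] > mean(prices[t - window:t])
        actual_up = prices[t + 1] > prices[t]
        hits += predicted_up == actual_up
        total += 1
    return hits / total


# Made-up illustrative series, not real prices or search volumes.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 113.0]
views = [20.0, 22.0, 21.0, 26.0, 27.0, 25.0, 30.0, 33.0]

print(round(sharpe_ratio(simple_returns(prices)), 3))
print(round(pearson(views, prices), 3))
print(round(directional_accuracy(prices), 3))
```

An accuracy close to 50% on up/down prediction, such as the 55% reported above, is only slightly better than guessing, which is why results are usually reported alongside risk-adjusted measures and validated on held-out data.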
Network infrastructure has existed for many decades, and so have the malicious users, referred to
as malignant [42], of network systems. Such malignant users carry out fraudulent transactions in
network systems that handle financial transactions. The main objective is to stop such malicious
users from carrying out illicit acts [42] in the network so that financial and transactional
activities run properly. It is crucial to detect suspicious conduct in the bitcoin network because
fraud grows extremely fast. An attempt by a client to take part in two or more transactions that
spend the same bitcoin or the same set of bitcoins constitutes a double-spending attack. This is
possible because of the propagation delay in broadcasting pending payments across the bitcoin
network, which results in nodes receiving non-validated transactions at different times [43].
Many solutions and studies have been presented recently to address anomaly detection. These
attempts cover a broad range of techniques, including ML methods. For instance, Smith et al. [44]
used clustering methods to capture malicious acts in the network and to separate licit users from
malignant users.
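The double-spending condition described above, two or more pending transactions that spend the same coin, can be checked with a short sketch. The representation is an assumption made for the example: each pending transaction is reduced to the list of outputs it spends, and the name `find_double_spends` is hypothetical.

```python
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]  # (transaction id, output index) of the coin being spent


def find_double_spends(
    pending: Dict[str, List[Outpoint]]
) -> Dict[Outpoint, List[str]]:
    """Return every outpoint that is spent by more than one pending transaction."""
    spenders: Dict[Outpoint, List[str]] = {}
    for txid, inputs in pending.items():
        for outpoint in inputs:
            spenders.setdefault(outpoint, []).append(txid)
    return {op: txids for op, txids in spenders.items() if len(txids) > 1}


pending = {
    "tx1": [("coinA", 0)],
    "tx2": [("coinB", 1)],
    "tx3": [("coinA", 0), ("coinC", 0)],
}
print(find_double_spends(pending))  # prints {('coinA', 0): ['tx1', 'tx3']}
```

Because of propagation delay, one node may see `tx1` first while another sees `tx3` first, so each node can only flag the conflict once both transactions have reached it.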
In the past, several studies, such as [45, 46], have used ML techniques to address security
threats. Pham et al. [42] investigated the Bitcoin network to detect suspicious users and
transactions, using unsupervised learning methods including Mahalanobis distance, unsupervised
support vector machines, and k-means clustering on graphs generated from the Bitcoin network.
Pham et al. [45] also used an improved machine learning based method to detect anomalies in the
bitcoin system by analysing the most dubious clients and their transactions, where anomalous
behaviour is considered a proxy for suspicious activity. Monamo et al. [47] used kd-trees and
trimmed k-means to detect fraud over the bitcoin blockchain network. In another study, Monamo et
al. [46] explored the application of trimmed k-means
in the identification of fraudulent activities in Bitcoin transactions and claimed to detect more
fraudulent transactions than comparable studies on the same dataset. Zambre et al. [48]
identified potential rogue users in the Bitcoin network on the basis of real reported robberies
using k-means clustering. Bartoletti et al. [49] proposed automated detection of Ponzi schemes on
bitcoin, a classic fraud in disguise, based on supervised learning algorithms. Zhdanova et al.
[50] developed a strategy for detecting fraud chains in Mobile Money Transfer using machine
learning based micro-structuring techniques. Harlev et al. [51] presented the first approach to
reducing the anonymity of Bitcoin by using supervised ML to predict the type of unidentified
entities, while Yin et al. [52] analysed the Bitcoin ecosystem and presented the first estimate
of the share of cybercriminal entities by applying supervised ML to 854 observations classified
into 12 classes, 5 of which were found to be related to cybercriminal acts, and to around 100,000
unclassified observations. Hirshman et al. [53] applied unsupervised ML algorithms to explore
anonymity in bitcoin transactions by clustering the dataset. Liu et al. [54] presented an ML
based approach to capturing double-spending attacks in bitcoin transactions, consisting of
different immune-based blockchain nodes that include an identification component. Bogner et al.
[55] adopted machine learning for graphical threat detection and gave human operators an
intuitive way to understand the blockchain by gathering the features of the system into groups of
attributes that are depicted graphically. Remy et al. [56] tracked the activity of clients in the
bitcoin ecosystem using community identification on low-intensity network signals with machine
learning network analysis techniques. Kurtulmus et al. [57] proposed a by-product protocol that
combines the globally distributed nature of smart contracts with ML based problem solving to
crowd-source funds for research and to create new marketplaces without the need for a mediator.
Shaukat et al. [58] presented an ML based solution, built on an exhaustive investigation of a
ransomware dataset, that provides a layered defence mechanism against cryptographic ransomware in
Bitcoin and other cryptocurrencies. Baqer et al. [59] performed an empirical analysis in which a
clustering-based stress test is used to detect spam transactions in the Bitcoin cryptocurrency
network. Holub et al. [60] proposed an NLP and ML based scheme for identifying phishing-ring DNS
styles, where the identification strategy relies on observations of newly launched and/or
registered domains. Ermilov et al. [61] introduced a solution that combines off-chain knowledge
with knowledge for bitcoin address separation and categorisation, detecting and filtering errors
in users’ input data and thereby avoiding an unreliable Bitcoin usage model. Dey et al. [62]
provided a methodology based on intelligent software agents that monitor stakeholders’ activities
in the Bitcoin ecosystem to detect anomalous behaviour, using a supervised machine learning
algorithm along with algorithmic game theory. Portnoff et al. [63] designed machine learning
based classifiers that distinguish advertisements posted by the same author from those posted by
different authors, along with a linking technique that uses leakage from the Bitcoin system to
link sex advertisements to Bitcoin transactions and public wallets.
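One of the unsupervised techniques named above, Mahalanobis distance, can be sketched for two features per user. This is a minimal illustration and not the method of any cited study: the two features, the user labels and all values are made up, and the function `mahalanobis_distances` is a hypothetical helper restricted to the two-dimensional case so that the covariance matrix can be inverted by hand.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # e.g. (transactions per day, distinct counterparties)


def mahalanobis_distances(points: List[Point]) -> List[float]:
    """Distance of each point from the sample mean, scaled by the covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse([[sxx, sxy], [sxy, syy]]) * [dx dy]^T
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


# Made-up feature values for six hypothetical users; u6 behaves very differently.
users = ["u1", "u2", "u3", "u4", "u5", "u6"]
features = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 2), (10, 10)]

distances = mahalanobis_distances(features)
ranked = sorted(zip(users, distances), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # prints u6
```

Ranking users by distance rather than applying a fixed cut-off is a deliberate choice here: with few observations, a single extreme point inflates the covariance estimate and limits how large its own distance can become.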
2.6 Conclusion
This chapter introduced Bitcoin and ECDSA, the cryptographic mechanism used in it. It then
described the structure of a Bitcoin block, whose fields are the block size, block header,
transaction counter, and list of transactions, followed by the structure of a Bitcoin transaction,
whose fields are the transaction version, the counts of inputs and outputs, the transaction inputs
and outputs, and the locktime. Adding new bitcoins to the money supply is an important task of the
mining process, so the chapter also described mining, in which the mining nodes participate and
compete to work on a difficult-to-solve, cryptographic hash algorithm based
mathematical puzzle and earn the transaction fees for all the transactions they have validated as
a reward. A transaction is verified against the criteria defined in a checklist, which includes
the data structure and syntax of the transaction, the lists of inputs and outputs, the size limit
of the transaction, etc. Machine learning and deep learning are important tools for solving
classification and prediction problems and can be used specifically for forecasting the price of
Bitcoin. For this prediction task, LSTM outperforms other models such as Deep Neural Network,
Deep Residual Network, and SVM. Anonymity and privacy are two sides of the same coin and are
crucial for transacting over the Bitcoin network. Lastly, this chapter listed the most common
security threats and anomalous behaviors in the bitcoin network, together with solutions that
employ ML techniques.
In the future, the deanonymization of bitcoin may be taken a step further to prevent illicit acts
such as robbery and ransomware. ML and DL techniques can also be used to estimate the price of
bitcoin and to classify possible threats on the bitcoin network.