
Cryptography & Hashing Explained

Encoding transforms data into a proper format for a system to consume and is reversible without a key. Encryption transforms data to keep it secret and is reversible with a password or key. Hashing maps data to a fixed size output in a one-way process. Cryptographic hashing combines encryption and hashing to provide digital signatures for verifying document approval. Common applications of hashing include hash tables for fast searching, verifying file integrity by comparing hashes, and protecting passwords by storing hashed values.


From Hashing to Bitcoin

Encoding, Encryption and Hashing


Encoding, encryption and hashing are easily confused. All three of them transform data into another format. Encoding
and encryption are reversible (encoding vs decoding, encryption vs decryption), while hashing is irreversible. The
purpose of encoding is to transform data into a proper format for a system to consume (or to access), such as ASCII
and base64; the scheme is publicly available and no key is needed to decode. The purpose of encryption is to transform
data in order to keep it secret from others, so that it can only be consumed by the targeted recipients, who can reverse
the transformation with a password (key). Hashing is a mapping (known as hashing function f) that maps any input
(usually a string or a serialized data structure) into a fixed size string (or a fixed size integer, or a fixed length byte
stream), while fulfilling the following properties : (1) the same inputs give the same outputs, and different inputs should
give different outputs as far as possible (the output space is finite, so collisions cannot be ruled out completely),
(2) reversing the transformation is computationally infeasible (i.e. we cannot find the input given the output, or f^{-1} is
unknown) and (3) a minor change in input results in a drastic change in output (known as the avalanche effect). There
is no obvious pattern in the mapping; a hashing function is like a pseudo random redistribution of the inputs, but of
course it is not stochastic (it is deterministic). Hashing an object means digesting the object with a hashing function
to get a hash value, i.e. the serialized object is the input of the hashing function, while the hash value is its output.

                        reversible   key involved   purposes
encoding                yes          no             format conversion
encryption              yes          yes            keeping secret
hashing                 no           yes / no       (1) hash table, allow O(1) searching
                                                    (2) verifying file integrity
                                                    (3) protecting password
cryptographic hashing   no           yes            digital signature

Finally, we have cryptographic hashing, which in this document means a combination of encryption and hashing; it is
usually used as a digital signature. Please do not confuse encryption with digital signature : a digital signature on a
document cannot make the document confidential, it can only be used as an endorsement of the document, i.e. declaring
that the document is approved by the signer. Given the credit of the signer, we can trust the document.

Encryption
Cryptography involves encryption (from plaintext to ciphertext) and decryption (from ciphertext to plaintext). A key is
needed; here the key means a password, which is a different concept from the key in hashing. Cryptography can be roughly
divided into symmetric and asymmetric, the latter is known as public key cryptography. For symmetric cryptography,
there is a single secret key, available only to the parties who share the secret, and the same key is used for both
encryption and decryption. For asymmetric cryptography, there are two keys : a public key and a private key. You can
encrypt with either key and decrypt with the other key (i.e. you cannot encrypt and decrypt with the same key). Consider
a manager who holds the private key and publishes the public key to all his colleagues. All documents sent by colleagues
should be encrypted with the public key, thus only the manager is able to decrypt them using his private key. The other
way round, when documents sent by the manager are encrypted with his private key, everyone is able to decrypt them
using the public key. Doing so seems to be meaningless, but later we will see that this can be used together with
hashing to generate a digital signature. This application is not regarded as cryptography, it is known as cryptographic
hashing, please read the later sections.

              cryptography      cryptographic hashing
public key    for encryption    for signature verification
private key   for decryption    for signing document

Hashing
A hashing function maps keys into integers (or buckets). The hash function should distribute the keys as uniformly as
possible over the buckets (i.e. the output space), so that the output space is evenly used (in this context, key is the
input to the hash function, it is a different concept from the key in encryption). Suppose hash function h(k) has bucket
size M :

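$$h(k) \in \{1, 2, 3, \ldots, M\}$$

$$\sum_{k \mid h(k)=1} \mathrm{prob}(k) = \sum_{k \mid h(k)=2} \mathrm{prob}(k) = \sum_{k \mid h(k)=3} \mathrm{prob}(k) = \ldots$$

i.e.

$$\sum_{k \mid h(k)=m} \mathrm{prob}(k) = 1/M \quad \forall m \in [1, M]$$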

Some examples of hash functions :

(1) $h(k) = k \bmod M$

(2) $h(k) = \lfloor (kc - \lfloor kc \rfloor) M \rfloor = \lfloor (kc \bmod 1) M \rfloor$, known as Knuth multiplicative hash, c is irrational

The first example works poorly when M is a power of 2 : if we allow $M = 2^n$, the hash function is a filter that simply
selects the n lower bits of k as the hash output. A prime number that is not too close to a power of 2 is a good choice
of M. The second example multiplies M with a fraction, which lies within [0,1), to create a floating point number that
lies within [0,M), which is then rounded down. For efficient implementation, we can pick M to be a power of 2, and Knuth
suggests that the optimal value of c is $(\sqrt{5}-1)/2$ = 0.6180339887... (is there any proof ?). Let's take a look at
the four different applications of hashing.
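
The two example hash functions can be sketched in C++ as below. This is only an illustration under some assumptions : the function names hash_division and hash_multiplicative are made up here, keys are unsigned 64-bit integers, buckets are numbered from 0 to M-1 (rather than 1 to M as in the text), and the multiplicative version uses double precision, so it is only meaningful for keys that are not too large.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Example (1) : h(k) = k mod M.
std::uint64_t hash_division(std::uint64_t k, std::uint64_t M)
{
    assert(M > 0);
    return k % M;
}

// Example (2) : h(k) = floor((kc mod 1) * M), with c = (sqrt(5) - 1) / 2.
std::uint64_t hash_multiplicative(std::uint64_t k, std::uint64_t M)
{
    assert(M > 0);
    const double c = (std::sqrt(5.0) - 1.0) / 2.0;       // 0.6180339887...
    const double kc = static_cast<double>(k) * c;
    const double fraction = kc - std::floor(kc);         // lies within [0,1)
    return static_cast<std::uint64_t>(std::floor(fraction * static_cast<double>(M)));
}
```

With M = 8, hash_division(13, 8) simply returns the three lower bits of 13, which illustrates why a power of 2 is a poor choice for the first example.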
Application 1 - Hash table
A linear search in an unordered linear data structure, such as std::vector and std::list, has an efficiency of O(N), while
a search in an ordered data structure, such as std::set and std::map, has a better efficiency of O(logN), which can be
further improved to O(1) on average in another data structure, the hash table. A hash table is an array of M buckets,
with index m ∈ [1,M]. When an object is inserted into a hash table, it is hashed (with the hash function) to get a bucket
index (hash value), and the object is then put into that bucket. Ideally, when a good hash function is used, different
objects have different hash values, however there is no guarantee. When multiple objects share the same hash value, a
collision occurs. To resolve hash collisions, we can store a list of objects instead of one single object in each bucket,
this method is called separate chaining. However, if collisions happen too frequently, the buckets grow long and the
searching efficiency degrades towards O(N). Examples of hash tables include std::unordered_set and std::unordered_map.
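
Separate chaining can be sketched as follows. This is a minimal illustration rather than how std::unordered_map is actually implemented : the class name ChainedHashTable is hypothetical, keys are strings, values are ints, the bucket count is fixed, and std::hash is assumed as the hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedHashTable
{
public:
    explicit ChainedHashTable(std::size_t bucket_count) : buckets_(bucket_count)
    {
        assert(bucket_count > 0);
    }

    // Insert a new key, or overwrite the value of an existing key.
    void insert(const std::string& key, int value)
    {
        std::list<Entry>& bucket = buckets_[index_of(key)];
        for (Entry& entry : bucket)
        {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket.push_back(Entry(key, value));
    }

    // Return true and fill value_out if the key is found.
    bool find(const std::string& key, int& value_out) const
    {
        const std::list<Entry>& bucket = buckets_[index_of(key)];
        for (const Entry& entry : bucket)
        {
            if (entry.first == key) { value_out = entry.second; return true; }
        }
        return false;
    }

private:
    typedef std::pair<std::string, int> Entry;

    // Hash the key, then wrap the hash value into the bucket range.
    std::size_t index_of(const std::string& key) const
    {
        return std::hash<std::string>()(key) % buckets_.size();
    }

    std::vector<std::list<Entry> > buckets_;   // each bucket is a chain of entries
};
```

Constructing the table with a single bucket forces every key to collide, which makes it easy to see that lookups still work, only slower, because each search walks the whole chain.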
Application 2 - File integrity
Since different objects have different hash values (for an ideal hashing function), and a minor change in the object
results in a drastic change in hash value (avalanche effect), hashing becomes a useful tool for verifying file integrity.
When a file is hashed, it has a (practically) unique hash value; when the file is corrupted, its hash value changes. If
you download a file from a site, you can check its integrity as long as the site publishes the hash value together with
the download link. The program md5sum is provided in linux to perform the well known MD5 hashing. For example, in linux :
$ cat file1
This is a very small file with a few characters.
$ cat file2
this is a very small file with a few characters.
$ md5sum file1 file2
75cdbfeb70a06d42210938da88c42991 file1
6fbe37f1eea0f802bd792ea885cd03e2 file2

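Please note that the size of hash value is fixed (i.e. the number of buckets is constant) : an MD5 hash value has 32
hexadecimal digits of 4 bits each, so the number of buckets is $2^{32 \times 4} = 2^{128}$ in this example.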

Application 3 - Protecting password
Some people like to use the same password for multiple websites, so it is a bad idea for websites to store their users'
passwords inside web servers in raw format; instead they should hash the passwords before storing them. Passwords
transferred over the network are in raw format, while passwords stored in servers are hashed, thus a password read from
the network should first be hashed before it is compared with the hashed password in the server. Assuming that different
servers use different hashing functions (in practice, the same function with a different random salt), hackers cannot
break into a user's other accounts if the hashed passwords are stolen from one of the web servers (but what happens if
hackers steal passwords directly from the network, rather than from the servers?).
Cryptographic hashing
Digital signature is a combination of public key cryptography and hashing (please note that you cannot accomplish it
with symmetric cryptography). The document to be signed is first hashed, so that it is transformed into an output with
fixed size, called the message digest (note : the document can be very large, while the message digest has a fixed size).
The message digest is then encrypted using the private key to generate a signature (i.e. with the message digest as
plaintext, and the signature as ciphertext), and the signature is then appended to the raw document to form a signed
document. Please note : (1) the holder of the private key is the only one who can make the signature and (2) the document
is not encrypted, everyone can read the document, they just don't know whether the document is reliable, unless the
document is endorsed by some authority. Here is the signing algorithm :

[Figure : signing algorithm. raw document -> hashing -> message digest -> encrypt with private key -> signature ->
append to the raw document -> signed document (raw document + signature)]

How can we verify whether a document is signed from the signed document and public key only? Firstly, the signed
document is partitioned into a signature and a raw document, the signature is then decrypted with public key, which
is then compared with the message digest generated by hashing the raw document. If they are equivalent, then we can
claim that the document is signed. Here is the signature verification algorithm :

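[Figure : signature verification algorithm. The signed document is split into raw document and signature. raw document
-> hashing -> message digest; signature -> decrypt with public key -> decrypted digest. If they are equivalent, then
the document is signed.]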
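
The signing and verification steps above can be sketched with a toy example. Everything here is an assumption for illustration only : a tiny textbook RSA key (n = 61 x 53 = 3233, public exponent 17, private exponent 2753) stands in for a real key pair, and a 64-bit FNV-1a hash reduced modulo n stands in for a real hash function such as SHA256. It is far too small to be secure.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy RSA key : n = 61 * 53, and 17 * 2753 = 1 mod (60 * 52).
const std::uint64_t kModulus = 3233;
const std::uint64_t kPublicExponent = 17;
const std::uint64_t kPrivateExponent = 2753;

// Modular exponentiation by repeated squaring.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exponent, std::uint64_t modulus)
{
    assert(modulus > 0);
    std::uint64_t result = 1 % modulus;
    base %= modulus;
    while (exponent > 0)
    {
        if (exponent & 1) result = result * base % modulus;
        base = base * base % modulus;
        exponent >>= 1;
    }
    return result;
}

// Stand-in message digest : FNV-1a, reduced so that it fits the toy modulus.
std::uint64_t toy_digest(const std::string& document)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char ch : document) { h ^= ch; h *= 1099511628211ULL; }
    return h % kModulus;
}

// Signing : hash the document, then encrypt the digest with the private key.
std::uint64_t sign(const std::string& document)
{
    return mod_pow(toy_digest(document), kPrivateExponent, kModulus);
}

// Verification : decrypt the signature with the public key, then compare
// with the digest obtained by hashing the raw document.
bool verify(const std::string& document, std::uint64_t signature)
{
    return mod_pow(signature, kPublicExponent, kModulus) == toy_digest(document);
}
```

Note that verify only needs the document, the signature and the public exponent, while sign needs the private exponent, which matches the roles of the two keys described above.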

Practical public key cryptography and hashing include :


algorithm type            algorithm   application
hashing                   MD5         linux command md5sum
hashing                   SHA256      bitcoin blockchain construction (also called bitcoin mining)
public key cryptography   ECDSA       bitcoin transaction's digital signature
public key cryptography   RSA         HKEX's Orion open gateway (uses openssl library)

Elliptic curve digital signature algorithm (ECDSA)


Now let's take a look at how the elliptic curve digital signature algorithm generates a pair of public key and private
key. It involves two mathematical concepts : (1) elliptic curve and (2) finite field arithmetic, the latter requires
number theory. Please note that this section is just a simple introduction to ECDSA which skips the complicated number
theory, thus the mathematical treatment in this section is not rigorous. First of all, an elliptic curve is defined as :

$$y^2 = x^3 + ax + b$$

For bitcoin, we have a=0 and b=7, which looks like this :

[Figure : plot of the elliptic curve $y^2 = x^3 + 7$]

It has several useful properties : (1) it is symmetric about the x axis (the proof is easy), (2) any non vertical straight
line y=mx+c intersecting the elliptic curve at two non tangent points will always intersect the curve at a third point
and (3) any non vertical straight line y=mx+c tangent to the elliptic curve at one point will intersect the curve at
precisely one other point (how can we prove properties 2 and 3 ?). Let's consider the following system of equations :
$$y^2 = x^3 + ax + b \quad \text{which is the elliptic curve}$$

$$y = mx + c \quad \text{which is the non vertical line}$$

$$(mx + c)^2 = x^3 + ax + b$$

$$0 = x^3 - m^2 x^2 + (a - 2mc)x + (b - c^2) \quad \text{(equation 1)}$$

Therefore properties 2 and 3 can be combined as this statement : cubic equation 1, which has real coefficients, either has

1 real root and 2 complex conjugate roots, or
3 real roots, among which two may be the same.

It cannot have 0 real roots, because complex roots come in conjugate pairs. When the line already meets the curve at two
points (property 2), or is tangent at one point, which counts as a double root (property 3), two roots are real, hence
the third root must be real too, and it gives the remaining intersection point.
With properties 2 and 3, we can define point addition (LHS figure) and point doubling (RHS figure). Point addition P+Q =
R of two points P and Q lying on the elliptic curve is defined as the reflection through the x-axis of the third
intersecting point between the curve and the straight line joining P and Q, while point doubling P+P = R of a point P
lying on the elliptic curve is defined as the reflection through the x-axis of the other intersecting point between the
curve and the tangent at P.

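With point addition and point doubling, we can define point multiplication as :

$$R = NP = (N-1)P + P = (N-2)P + P + P = \ldots = \underbrace{P + P + \ldots + P}_{N}$$

or

$$R = 2^{x_1}P + 2^{x_2}P + \ldots + 2^{x_K}P \quad \text{where } N = \sum_{k=1}^{K} 2^{x_k} = 2^{x_1} + 2^{x_2} + \ldots + 2^{x_K}, \text{ suppose } x_1 < x_2 < \ldots < x_K$$

$$= 2^{x_1}\left(P + 2^{x_2 - x_1}P + \ldots + 2^{x_K - x_1}P\right)$$

$$= 2^{x_1}\left(P + 2^{x_2 - x_1}\left(P + \ldots + 2^{x_K - x_2}P\right)\right) \quad \text{this procedure can be repeated}$$

(equation 2)

Here $2^{x}P$ means doubling P for x times.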
Let's find the intersection $R' = (r_x, -r_y)$ and its reflection $R = (r_x, r_y)$ given points $P = (p_x, p_y)$ and
$Q = (q_x, q_y)$ for point addition.

$$-r_y = m(r_x - p_x) + p_y \quad \text{where } m = (q_y - p_y)/(q_x - p_x), \text{ as } R' \text{ must lie on the line } PQ$$

Suppose the line joining PQ is y=mx+c, its intersection with the elliptic curve can be obtained by solving equation 1.

$$x^3 - m^2 x^2 + (a - 2mc)x + (b - c^2) = (x - p_x)(x - q_x)(x - r_x)$$

$$x^3 - m^2 x^2 + (a - 2mc)x + (b - c^2) = x^3 - (p_x + q_x + r_x)x^2 + (p_x q_x + q_x r_x + r_x p_x)x - p_x q_x r_x$$

$$m^2 = p_x + q_x + r_x \quad \text{by comparing the quadratic term}$$

Thus we have the reflection of the intersection :

$$r_x = m^2 - (p_x + q_x)$$

$$r_y = m(p_x - r_x) - p_y \quad \text{where } m = (q_y - p_y)/(q_x - p_x) \quad \text{(equation 3a)}$$
Let's find the intersection $R' = (r_x, -r_y)$ and its reflection $R = (r_x, r_y)$ given point $P = (p_x, p_y)$ for point
doubling. All the above is still valid, except for the value of m, which we need to find by taking the derivative of the
elliptic curve.

$$d(y^2) = d(x^3 + ax + b)$$

$$2y \, dy = 3x^2 \, dx + a \, dx$$

$$dy/dx = (3x^2 + a)/(2y)$$

Thus we have the reflection of the intersection :

$$r_x = m^2 - 2p_x$$

$$r_y = m(p_x - r_x) - p_y \quad \text{where } m = (3p_x^2 + a)/(2p_y) \quad \text{(equation 3b)}$$
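
Equations 3a and 3b translate almost directly into code. The sketch below works on the continuous curve with floating point coordinates; the names Point, point_add, point_double and on_curve are made up for this illustration, and the special cases (a vertical line through P and Q, or a tangent at a point with y = 0) are simply excluded by assertions.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x; double y; };

// Point addition P+Q according to equation 3a (requires p.x != q.x).
Point point_add(const Point& p, const Point& q)
{
    assert(p.x != q.x);
    const double m = (q.y - p.y) / (q.x - p.x);   // slope of line PQ
    const double rx = m * m - (p.x + q.x);
    const double ry = m * (p.x - rx) - p.y;       // already reflected through the x-axis
    return Point{rx, ry};
}

// Point doubling P+P according to equation 3b (requires p.y != 0).
Point point_double(const Point& p, double a)
{
    assert(p.y != 0.0);
    const double m = (3.0 * p.x * p.x + a) / (2.0 * p.y);   // slope of tangent at P
    const double rx = m * m - 2.0 * p.x;
    const double ry = m * (p.x - rx) - p.y;
    return Point{rx, ry};
}

// Check whether a point satisfies y^2 = x^3 + ax + b within a tolerance.
bool on_curve(const Point& p, double a, double b)
{
    return std::fabs(p.y * p.y - (p.x * p.x * p.x + a * p.x + b)) < 1e-9;
}
```

For example, with a=0 and b=7, the points (1, sqrt(8)) and (2, sqrt(15)) lie on the curve, and both their sum and the double of the first point should satisfy on_curve again.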
Now, let's introduce the finite field. In the context of ECDSA, a finite field can be regarded as a predefined set of
non-negative integers within which every calculation must fall (here calculation includes addition, subtraction,
multiplication and division). However, the elliptic curve is a continuous curve in $\mathbb{R}^2$, so how can we transform
a floating point coordinate pair into an integer pair that lies within a range ($0 \le x < M$ and $0 \le y < M$)? It
involves rational numbers and the mod operation (any number lying outside the range can be wrapped around by the mod
operation).
More about fields : a set of numbers forms a field if addition, subtraction, multiplication and division (except division
by zero) among numbers in the set return a number in the same set. For example, rational numbers form a field, because
all operations on rational numbers result in rational numbers (you can prove it easily by declaring rational number
r = p/q, where p and q are integers), similarly real numbers form a field, complex numbers form a field, yet integers do
not form a field, as an integer divided by an integer may not be an integer. Please also note the coordinates of P, Q and
R on the elliptic curve : if P and Q have rational coordinates, the coordinates of R are rational too (I am not sure
whether this is related to the finite field in ECDSA). Now, ignoring the details of finite field, this is the elliptic
curve in the finite field with modulo 67 :

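[Figure : the elliptic curve in the finite field with modulo 67. LHS : the points on the curve. RHS : points P and Q,
the line PQ wrapping around, the intersection R and its reflection]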

Please note the following. (1) As the elliptic curve is symmetric in the continuous field, it must be symmetric in the
finite field, but the axis of symmetry shifts to y = 67/2, since the reflection of y is -y mod 67 = (67-y) mod 67, for
example, the reflection of 34 is -34 mod 67, which is 33. (2) When we plot an infinitely long straight line in the finite
field, it wraps around when it reaches either x=67 or y=67, please see how the line PQ wraps around in the RHS figure.
(3) The points in the LHS figure (together with a point at infinity) form a finite group, as operations on the points
(i.e. point addition, point doubling and point multiplication) return a point that belongs to the same set. (4) A point
lying on the elliptic curve can be determined by its x coordinate, up to reflection, as its y coordinate (and its
reflection's y coordinate) can be found by : $y = \pm\sqrt{x^3+ax+b}$. Hence intersection R can be easily found : extend
the straight line PQ (wrap around if necessary) until it reaches $x = m^2-(p_x+q_x)$ according to equation 3a, which is
x=47 in the above example.
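
Point addition, point doubling and point multiplication in the finite field with modulo 67 can be sketched as below. This is a toy illustration under assumptions : the curve is $y^2 = x^3 + 7$, division is implemented as multiplication by the modular inverse (found with Fermat's little theorem, which works because 67 is prime), the point at infinity is represented by a flag, and the sample points P=(2,22) and Q=(6,25) are chosen here simply because they satisfy the curve equation modulo 67. Point multiplication uses repeated doubling as in equation 2.

```cpp
#include <cassert>
#include <cstdint>

const std::int64_t kPrime = 67;        // prime modulus
const std::int64_t kA = 0, kB = 7;     // curve y^2 = x^3 + 7

struct Point { std::int64_t x; std::int64_t y; bool infinity; };

// Wrap any integer into the range [0, kPrime).
std::int64_t mod(std::int64_t v) { return ((v % kPrime) + kPrime) % kPrime; }

// Modular inverse by Fermat's little theorem : v^(p-2) mod p.
std::int64_t mod_inverse(std::int64_t v)
{
    v = mod(v);
    assert(v != 0);
    std::int64_t result = 1, base = v, exponent = kPrime - 2;
    while (exponent > 0)
    {
        if (exponent & 1) result = result * base % kPrime;
        base = base * base % kPrime;
        exponent >>= 1;
    }
    return result;
}

// Point addition (equation 3a) and point doubling (equation 3b), modulo kPrime.
Point add(const Point& p, const Point& q)
{
    if (p.infinity) return q;
    if (q.infinity) return p;
    std::int64_t m;
    if (p.x == q.x)
    {
        if (mod(p.y + q.y) == 0) return Point{0, 0, true};     // P + (-P)
        m = mod((3 * p.x * p.x + kA) * mod_inverse(2 * p.y));  // tangent slope
    }
    else
    {
        m = mod((q.y - p.y) * mod_inverse(q.x - p.x));         // slope of line PQ
    }
    const std::int64_t rx = mod(m * m - p.x - q.x);
    const std::int64_t ry = mod(m * (p.x - rx) - p.y);
    return Point{rx, ry, false};
}

// Point multiplication nP by repeated doubling (equation 2).
Point multiply(std::uint64_t n, Point p)
{
    Point result = {0, 0, true};
    while (n > 0)
    {
        if (n & 1) result = add(result, p);
        p = add(p, p);
        n >>= 1;
    }
    return result;
}

bool on_curve(const Point& p)
{
    return p.infinity || mod(p.y * p.y - (p.x * p.x * p.x + kA * p.x + kB)) == 0;
}
```

With these sample points, add(P, Q) gives (47, 28), whose x coordinate agrees with x=47 quoted above, and multiply(3, P) plays the role of a public key derived from the (very weak) private key 3.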
The ECDSA protocol is uniquely defined by the following set of parameters :

elliptic curve parameters a and b
prime modulus M
base point P
order N

Public key cryptography then involves point multiplication nP, where P lies on the elliptic curve with parameters a and b
while n ∈ [1,N]. For bitcoin, all the parameters are enormous numbers, which makes brute force reverse engineering
infeasible. Bitcoin uses elliptic curve $y^2 = x^3 + 7$, while
prime modulus = 2^256 - 2^32 - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1
              = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
                FFFFFFFF FFFFFFFF FFFFFFFE FFFFFC2F
base point    = 04 79BE667E F9DCBBAC 55A06295 CE870B07
                029BFCDB 2DCE28D9 59F2815B 16F81798
                483ADA77 26A3C465 5DA4FBFC 0E1108A8
                FD17B448 A6855419 9C47D08F FB10D4B8
order         = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE
                BAAEDCE6 AF48A03B BFD25E8C D0364141

Note : Base point should be a coordinate pair; here the x and y coordinates are concatenated and converted to a byte
stream (the leading byte 04 marks an uncompressed point, followed by the 32-byte x coordinate and the 32-byte y
coordinate).

Now let's see how we can generate a private public key pair. The private key is just a random number chosen between 1
and N, then the public key is derived from point multiplication : public_key = private_key × base_point, which can be
implemented by equation 2 for better efficiency. Thus given the private key, we can generate the public key, but not the
other way round, this is a one way trip. Since a point on the elliptic curve in a finite field can be determined by its x
coordinate, the public key can be compressed by storing the x coordinate only (of course you also need to record which
side it lies on : original side vs reflection). Now, I am going to skip the signing procedure and the signature
verification procedure.
For more details about finite field arithmetic, please refer to the book Cryptography and Security in Computing by
InTech, particularly chapter 6. Among the public key cryptography algorithms, RSA is a good algorithm to start with (it
involves the following concepts only : prime number, greatest common divisor, congruence and Euler's phi function), for
more details about RSA, please refer to the website Number Theory and the RSA Public Key Cryptosystem. For more details
about ECDSA, please refer to the website Maths behind bitcoin by Eric Rykwalder.

Payment
Payment is a separate process from trading, because payment can be very slow, as it involves a lot of procedures to
ensure a safe transfer of money (risk management), while trading can be very fast, like in high frequency trading (hence
there exists a settlement step which handles payment separately). VISA handles 2000 transactions per second (tps) on
average, with peak capacity at 56000 tps, while Paypal handles 115 tps on average. Nowadays bitcoin handles 7 tps on
average. Therefore scalability is an issue for bitcoin.
Money serves three purposes : (1) payment, (2) storage of value and (3) accounting (like calculating GDP). Payment
means money transfer, while money supply M1 includes currency, deposit and credit. Traditionally, payment is done in
a centralized way, which means there exists a financial institution as an intermediary. Suppose A gives B a cheque
with a unique serial number, B requires a centralized financial institution's help to ensure two things before he can
accept the payment : (1) A does have the ownership of the cheque (i.e. A has the right to spend it) and (2) A has not
spent the money before (spending it again is known as double spending). Both problems can be solved easily either by
(1) going through an intermediary or (2) using physical currency (the central bank centralizes legal tender printing).
How about a decentralized world?

Introduction to bitcoin network


Owning a bitcoin does not mean owning an encrypted bitcoin file, and a transaction does not mean passing that file
around. Instead, owning a bitcoin means you have the right to spend it (or transfer it to someone) by broadcasting a
transaction message in the bitcoin network, which will create a transaction record in a distributed ledger, known as
the blockchain, after the bitcoin network's verification (or more precisely, after the nodes in the bitcoin network reach
consensus). The blockchain is a publicly available ledger (like an accounting book), it is a record of all transactions
in the entire bitcoin history. The blockchain records bitcoin transactions only, it does not record the bitcoin balance
of each account (it is the user's responsibility to work out his own balance), and of course, we can work out the
balance of all accounts given the entire blockchain. In the bitcoin protocol, the blockchain is a chain (or a list) of
blocks, or to be more precise, a very linear tree of blocks, while a block is a group of transaction records. Thus
blockchain, block and transaction record are the three most fundamental concepts in the bitcoin protocol.
Each participant is a node in the bitcoin network. There are three types of nodes : (1) miners, who help to manage the
ledger while earning new bitcoins and transaction fees in return, (2) monitors, who provide monitoring services over the
bitcoin network, such as blockchain.info, which publishes a lot of realtime statistics such as turnover, total market
capitalization and block information, and finally (3) users, who use bitcoin for payment. Users need to run a software
(or mobile app) known as the wallet, for generating transaction messages, making signatures and checking whether a
transaction is confirmed by the bitcoin network. The bitcoin network is governed solely by the bitcoin protocol, which
is simply a set of rules and message definitions. There is no centralized bitcoin server, then how does the bitcoin
protocol verify bitcoin ownership and address the double spending problem? The short answer is, the bitcoin protocol
verifies bitcoin ownership using ECDSA and addresses the double spending problem by the blockchain, which is constructed
through voting by miners with their computation power, involving numerous SHA256 hashing operations. Here are some
major miners.

[Figure : major miners, by blockchain.info, 1st Aug 2015]

Transaction
Transaction is the core of the bitcoin protocol. Suppose that you want to transfer an amount of bitcoins to someone :
first you generate a transaction message (including your signature) and broadcast it to the bitcoin network, then each
node in the network verifies this transaction (i.e. whether you have the right to spend the bitcoins). A transaction
message records (1) the single source (or multiple sources) from which you got the bitcoins, which is known as input,
(2) the amount of bitcoins and the destination, which is known as output, and (3) your own ECDSA signature together
with the corresponding public key (ECDSA signature means : appending input with output, which is then signed with the
ECDSA private key). [The transaction message format described here is for illustration only, it is different from the
exact protocol, for details, please refer to the bitcoin specifications]. The input source is specified by transaction
id, the output destination is specified by the address of the receiver, but wait, how can we generate transaction id and
address? Transaction id is generated by hashing the transaction (by default, we assume SHA256 is used). Thus we expect
that miners are responsible for building a std::map<transaction_id, transaction> whenever they receive a newly
broadcast transaction message from the network. This is what miners do :
void new_Tx_received(std::map<Tx_id, Tx>& pending_Tx, const Tx& tx)
{
    Tx_id tx_id = SHA256.hash(tx);   // (1) Tx stands for transaction, its id is the hash of the whole message.
    pending_Tx[tx_id] = tx;          // (2) Pending Tx are unconfirmed transactions.
}

Transaction message does not contain its own transaction id. With the above routine, all old transactions, no matter
whether they are confirmed or not, can be retrieved from the map using their id. Unlike transaction id, bitcoin address
is generated from the ECDSA public key by :
ECDSA.public_key = ECDSA.private_key * ECDSA.base_point; // Recall ECDSA
address = base58.encode(RIPEMD160.hash(SHA256.hash(ECDSA.public_key)));
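
The base58 encoding step can be sketched as below. The alphabet is the one used by bitcoin (digits and letters without 0, O, I and l); the function name base58_encode and the byte-vector interface are assumptions made for this illustration. Please note that real bitcoin addresses use Base58Check, which also adds a version byte and a checksum before encoding, a detail that the simplified formula above leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Encode a big-endian byte stream as a base58 string.
std::string base58_encode(const std::vector<unsigned char>& bytes)
{
    static const std::string alphabet =
        "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    assert(alphabet.size() == 58);

    // Each leading zero byte is encoded as one leading '1'.
    std::size_t zeros = 0;
    while (zeros < bytes.size() && bytes[zeros] == 0) ++zeros;

    // Base conversion : digits holds the base58 digits, least significant first.
    std::vector<unsigned char> digits;
    for (std::size_t i = zeros; i < bytes.size(); ++i)
    {
        unsigned int carry = bytes[i];
        for (std::size_t j = 0; j < digits.size(); ++j)
        {
            carry += static_cast<unsigned int>(digits[j]) * 256;   // number = number * 256 + byte
            digits[j] = static_cast<unsigned char>(carry % 58);
            carry /= 58;
        }
        while (carry > 0)
        {
            digits.push_back(static_cast<unsigned char>(carry % 58));
            carry /= 58;
        }
    }

    std::string result(zeros, '1');
    for (std::size_t j = digits.size(); j > 0; --j) result += alphabet[digits[j - 1]];
    return result;
}
```

Being an encoding rather than a hash, this step is reversible; the one way property of the address comes from the SHA256 and RIPEMD160 steps.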

We can generate the address from the public key, but not the other way round. Suppose miners receive a transaction
message from the network. There is a problem with the above routine : it simply accepts all transactions without
checking whether the sender has the right to spend the bitcoins he claims to own. A transaction means granting the
private key holder of the address specified in the destination (i.e. the output field in the transaction message) the
right to spend a certain amount of bitcoins, therefore miners can verify bitcoin ownership through two steps :
(1) verify the source of the bitcoins (i.e. the input field in the transaction message) and (2) verify that the sender
has the required signature.
bool verify_ownership(const std::map<Tx_id, Tx>& pending_Tx, const Tx& tx)
{
    // step 1 : verify whether the source is valid
    address = base58.encode(RIPEMD160.hash(SHA256.hash(tx.public_key)));
    auto source = pending_Tx.find(tx.input.source_Tx_id);
    if (source == pending_Tx.end()) return false;   // unknown source transaction
    if (source->second.output.destination_address != address) return false;
    // step 2 : verify whether the signature is valid
    return ECDSA::verify_signature(tx.signature, tx.public_key);
}

Furthermore, if users want to keep track of the bitcoins they receive from the network, it can be done by comparing the
destination address in each transaction message with their own. Therefore users can derive their balance from the
entire bitcoin history.
void wallet::new_Tx_received(const Tx& tx)
{
    wallet.address = base58.encode(RIPEMD160.hash(SHA256.hash(wallet.public_key)));
    if (tx.output.destination_address == wallet.address)
    {
        Tx_id tx_id = SHA256.hash(tx);
        wallet.incoming_Tx[tx_id] = tx;
        wallet.balance += tx.output.amount;
    }
}
void wallet::new_Tx_transferred(Tx& tx)
{
    tx.input.source_Tx_id = ... ;            // fill this please
    tx.output.destination_address = ... ;    // fill this please
    tx.output.amount = ... ;                 // fill this please
    wallet.balance -= tx.output.amount;
}

In general, one transaction supports multiple inputs and multiple outputs, which means we can group all the bitcoins we
received from different sources and spend the sum by distributing it to different destinations, so that the amount of
bitcoins in the inputs and outputs is conserved; in other words, it allows merging and splitting of value. One of the
outputs can be your own address for collecting change. Please note that you need to specify the output channel in the
source. For example, suppose my own address is F452EA90 :
void wallet::new_Tx_transferred(Tx& tx)
{
    tx.input[0].source_Tx_id = (D56A83B1,6);        // transaction D56A83B1, output 6
    tx.input[1].source_Tx_id = (D56A83B2,2);        // transaction D56A83B2, output 2
    tx.input[2].source_Tx_id = (D56A83B3,3);        // transaction D56A83B3, output 3
    tx.output[0].destination_address = F452EA90;    // suppose wallet.address = F452EA90
    tx.output[1].destination_address = F452EA91;
    tx.output[2].destination_address = F452EA92;
    tx.output[0].amount = 5;                        // change = (10+20+30)-(40+15)-transact_fee
    tx.output[1].amount = 40;
    tx.output[2].amount = 15;
    wallet.balance -= 60;
}
// suppose hash of this transaction is D56A83BB

All transactions form a directed acyclic graph G={V,E}, where vertex $v_n \in V$ denotes an account (with unique address
and private-public key pair) and directed edge $e_{n,m} \in E$ denotes a transaction from vertex $v_n$ to vertex $v_m$.
Here is the directed acyclic graph for the above multi-input multi-output example, please note that time propagates
along the directed edges.
[Figure : directed acyclic graph of the multi-input multi-output example]

account-A --> Tx-D56A83B1 --> my account
    output[6].address = F452EA90
    output[6].amount = 10

account-B --> Tx-D56A83B2 --> my account
    output[2].address = F452EA90
    output[2].amount = 20

account-C --> Tx-D56A83B3 --> my account
    output[3].address = F452EA90
    output[3].amount = 30

my account (that can make signature for addr F452EA90) --> Tx-D56A83BB
    input[0].source_Tx_id = (D56A83B1,6)
    input[1].source_Tx_id = (D56A83B2,2)
    input[2].source_Tx_id = (D56A83B3,3)
    output[0].address = F452EA90
    output[1].address = F452EA91
    output[2].address = F452EA92
    output[0].amount = 5
    output[1].amount = 40
    output[2].amount = 15

Tx-D56A83BB --> my account (change, addr F452EA90)
Tx-D56A83BB --> account that can make signature for addr F452EA91
Tx-D56A83BB --> account that can make signature for addr F452EA92

Please note the following. (1) Transactions D56A83B1, B2 and B3 all have multiple outputs, though they are not
plotted in the above graph, so we can imagine that the full graph is very complicated. (2) Transaction id is not included
in the transaction message, it is generated through hashing by miners and users. (3) The address of the sender is not
included in the transaction message, it is redundant, as it can be traced like step 1 in routine verify_ownership,
let's recall :
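address get_sender_address(const std::map<Tx_id, Tx>& pending_Tx, const Tx& tx)
{
    // source_Tx_id is a pair : (id of the source transaction, index of the output being spent)
    return pending_Tx.at(tx.input.source_Tx_id.first).
        output[tx.input.source_Tx_id.second].destination_address;
}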

When a new bitcoin is generated as a reward for a miner, it is also represented as a transaction, which has no input
source. The new bitcoin is called a coinbase.
void new_bitcoin_for_rewarding_miner(Tx& tx)
{
    tx.input[0].source_Tx_id = COINBASE;                 // no real source transaction
    tx.output[0].destination_address = miner_address;    // the miner's own address
    tx.output[0].amount = rewarding_amount;
}

Let's summarise what we have got at this moment. At the core of bitcoin is a distributed ledger of all transactions, from
which the current balance of each account can be derived. A transaction is simply a message that instructs the ledger to
debit the sender's address and credit the receiver's address, and the transaction must be signed with the sender's
private key. With routine verify_ownership, no one can spend bitcoins that are not owned by themselves : only the private
key holder of the destination address specified in the source transaction has the right to spend. However, it is still
possible for a user to broadcast false transactions by double spending, i.e. a user really owns some bitcoins, but he
spends them twice, in other words, he creates money. Thus bitcoin should have some mechanism to prevent double
spending, otherwise it will result in hyperinflation, and destroy the currency eventually.
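
Deriving balances from the ledger can be sketched as follows : an output counts towards the balance of its destination address as long as no later transaction uses it as an input. The structs below are simplified assumptions made for this illustration (in particular, each transaction stores its own id here, whereas in the text the id is obtained by hashing the message), and signatures are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Input  { std::string source_tx_id; std::size_t output_index; };
struct Output { std::string destination_address; long amount; };
struct Tx     { std::string id; std::vector<Input> inputs; std::vector<Output> outputs; };

// Work out the balance of every address from the whole ledger.
std::map<std::string, long> derive_balances(const std::vector<Tx>& ledger)
{
    // Pass 1 : collect every (transaction id, output index) that has been spent.
    std::set<std::pair<std::string, std::size_t> > spent;
    for (const Tx& tx : ledger)
        for (const Input& in : tx.inputs)
            spent.insert(std::make_pair(in.source_tx_id, in.output_index));

    // Pass 2 : sum up the outputs that nobody has spent yet.
    std::map<std::string, long> balances;
    for (const Tx& tx : ledger)
    {
        for (std::size_t i = 0; i < tx.outputs.size(); ++i)
        {
            assert(tx.outputs[i].amount >= 0);
            if (spent.count(std::make_pair(tx.id, i)) == 0)
                balances[tx.outputs[i].destination_address] += tx.outputs[i].amount;
        }
    }
    return balances;
}
```

Feeding in the earlier example (three incoming transactions of 10, 20 and 30 to address F452EA90, followed by transaction D56A83BB) leaves 5 at F452EA90, 40 at F452EA91 and 15 at F452EA92. A double spend would show up here as two inputs that refer to the same (transaction id, output index) pair.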

Blockchain
Let's first introduce block and blockchain, then we will see how double spending can be executed and how it can be
prevented by the blockchain. A block is a collection of transactions, there is no restriction on the number of
transactions per block (please check bitcoin's specification). While a miner keeps receiving broadcast transaction
messages, he can start building blocks in parallel. You can imagine a miner running a process with multiple threads : one
thread keeps receiving transactions and appends them into a map of pending transactions, while another thread builds
blocks from the map, the map is thus the common resource shared between these two threads (single producer single
consumer model). Later we will see that this is in fact a process with at least three threads. Before introducing the
block content, let's see what a Merkle tree is, which is also known as a hash tree.
Merkle root L = hash(L0+L1), where L = label and hash() = hash function
    L0 = hash(L00+L01)
        L00 = hash(data0)
        L01 = hash(data1)
    L1 = hash(L10+L11)
        L10 = hash(data2)
        L11 = hash(data3)

A Merkle tree is a tree in which every non-leaf node is labelled with the hash value of the concatenated labels of all
its child nodes, while every leaf node is labelled with the hash value of a piece of data. SHA256 is used as the hash
function in bitcoin. A block consists of a block header and a block body:
block header = Merkle root + hash value of previous block (parent block) + nonce
block body   = Merkle tree
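
The labelling rule of the Merkle tree can be sketched in a few lines of C++. This is only an illustration under stated
assumptions: toy_hash is a small non-cryptographic hash used as a stand-in for SHA256, the "+" separator and the
function names hash_pair and merkle_root are hypothetical, and an odd node at the end of a level is simply paired with
itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy 64-bit FNV-1a hash. ASSUMPTION: a stand-in for SHA256, not cryptographically secure.
std::uint64_t toy_hash(const std::string& data)
{
    std::uint64_t h = 14695981039346656037ULL;       // FNV offset basis
    for (unsigned char c : data)
    {
        h ^= c;
        h *= 1099511628211ULL;                       // FNV prime
    }
    return h;
}

// Label of a non-leaf node: hash of the concatenated labels of its two children.
std::uint64_t hash_pair(std::uint64_t left, std::uint64_t right)
{
    return toy_hash(std::to_string(left) + "+" + std::to_string(right));
}

// Builds the tree level by level and returns the label of the root.
// ASSUMPTION: an odd node at the end of a level is paired with itself.
std::uint64_t merkle_root(const std::vector<std::string>& data)
{
    assert(!data.empty());
    std::vector<std::uint64_t> level;
    for (const std::string& d : data) level.push_back(toy_hash(d));   // leaf labels

    while (level.size() > 1)
    {
        std::vector<std::uint64_t> parents;
        for (std::size_t i = 0; i < level.size(); i += 2)
        {
            std::uint64_t right = (i + 1 < level.size()) ? level[i + 1] : level[i];
            parents.push_back(hash_pair(level[i], right));
        }
        level = parents;
    }
    return level[0];
}
```

Changing any single piece of data changes its leaf label, and therefore every label on the path up to the root, which
is why the Merkle root alone is enough to commit the header to the whole block body.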

All blocks are linked to form a linked list (or a tree to be precise, but a rather linear one), known as the blockchain.
Each block points to its previous block (or parent block) by the hash value of the previous block. Thus if a miner wants
to look up an old block efficiently, he should build a std::map<hash_block, block>. The nonce is just a number that the
miner is free to choose. A block is considered valid if the hash value of the block header is below a certain threshold,
i.e. hash(block.header) < threshold, or equivalently, the hash value in binary or hexadecimal format starts with a
certain number of zeros, such as:
000000000000002e9067f1cf7252333f7aeb619c89d220985a70ac0e015248e0

To construct a valid block given a map of pending transactions, a miner should build the Merkle tree and search for a
nonce value that makes a valid hash value. This search is done by brute force; it takes time, and thus it is known as
mining (or proof of work). The difficulty of mining depends on the threshold, which is adjusted by the bitcoin protocol
from time to time so that the blockchain keeps growing at a nearly constant rate of roughly 1 new block per 10 minutes.
When a miner completes a block, he should then (1) broadcast the block to the network and (2) remove all transactions
that constitute the block from the map of pending transactions that he maintains (of course, he can't modify other
miners' maps of pending transactions). All miners compete to find the next valid block; the winner is rewarded with (1)
new bitcoins called coinbases and (2) the transaction fees for all transactions in the completed valid block.
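
The brute-force search for a nonce can be sketched as follows. This is a minimal illustration, not the real bitcoin
implementation: toy_hash is a non-cryptographic stand-in for SHA256, the header holds 64-bit numbers instead of 256-bit
hashes, and the names BlockHeader, hash_header, is_valid, mine and the max_attempts cap are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy 64-bit hash (FNV-1a followed by a bit-mixing step).
// ASSUMPTION: a stand-in for SHA256, not cryptographically secure.
std::uint64_t toy_hash(const std::string& data)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : data) { h ^= c; h *= 1099511628211ULL; }
    h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// block header = Merkle root + hash value of previous block + nonce
struct BlockHeader
{
    std::uint64_t merkle_root;
    std::uint64_t previous_block_hash;
    std::uint64_t nonce;
};

std::uint64_t hash_header(const BlockHeader& header)
{
    return toy_hash(std::to_string(header.merkle_root) + "|" +
                    std::to_string(header.previous_block_hash) + "|" +
                    std::to_string(header.nonce));
}

// A block is valid if hash(block.header) < threshold.
bool is_valid(const BlockHeader& header, std::uint64_t threshold)
{
    return hash_header(header) < threshold;
}

// Tries nonce = 0, 1, 2, ... until the header becomes valid.
// Returns false if no valid nonce is found within max_attempts tries.
bool mine(BlockHeader& header, std::uint64_t threshold, std::uint64_t max_attempts)
{
    for (std::uint64_t attempt = 0; attempt < max_attempts; ++attempt)
    {
        header.nonce = attempt;
        if (is_valid(header, threshold)) return true;
    }
    return false;
}
```

With threshold = 1ULL << 54 the top 10 bits of the hash must be zero, so about 1 in 1024 nonces is expected to work.
Halving the threshold doubles the expected work, which is how the difficulty can be tuned. Verifying a block found by
someone else costs a single call to is_valid.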
When a miner receives a broadcast message carrying the next block while he is still working on that block (i.e. someone
was faster than him in finding the nonce value and earns the coinbases), he should first verify that the received block
is valid by checking all hash values in the block header and block body (this is fast, as the most time-consuming
calculation is the brute-force search for the nonce value, which has already been found). If it is valid, he can insert
the received block into his blockchain, at the location specified by the block's field hash value of previous block.
Therefore, insertion does not necessarily happen at the end of the blockchain; it may happen in the middle, which
results in branches. Thus the term blockchain is a little confusing, because it is in fact a tree. After that, he can
either (1) keep working on his block until it is finished and broadcast it, in which case he introduces a branch in the
blockchain (as there are multiple broadcast blocks sharing the same parent block), or (2) abandon the block he is
working on and start working after the received block (i.e. work on a new block using the received block as the parent
block), but before that he should update his map of pending transactions by removing all transactions included in the
received block. A miner can choose between these two options based on his own logic (or even at random); the
implementation is really up to the miner, as long as he can maximise his profit.
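
Inserting a received block by its parent hash, and noticing when the insertion creates a branch, can be sketched like
this. It is a simplified illustration: blocks are reduced to two 64-bit hashes, validation of the block is assumed to
have been done already, and the names Block, Blockchain and InsertResult are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Block
{
    std::uint64_t hash;          // hash value of this block's header
    std::uint64_t parent_hash;   // hash value of previous block
};

enum class InsertResult { Duplicate, UnknownParent, ExtendsLeaf, CreatesBranch };

class Blockchain
{
public:
    explicit Blockchain(const Block& root)
    {
        blocks_[root.hash] = root;
        child_count_[root.hash] = 0;
    }

    // Inserts a block that has ALREADY been verified (assumption of this sketch).
    InsertResult insert(const Block& received)
    {
        if (blocks_.count(received.hash)) return InsertResult::Duplicate;
        if (!blocks_.count(received.parent_hash)) return InsertResult::UnknownParent;

        blocks_[received.hash] = received;
        child_count_[received.hash] = 0;

        // A parent with more than one child means the chain has branched there.
        int children = ++child_count_[received.parent_hash];
        return children == 1 ? InsertResult::ExtendsLeaf : InsertResult::CreatesBranch;
    }

private:
    std::map<std::uint64_t, Block> blocks_;       // hash -> block, for fast lookup
    std::map<std::uint64_t, int> child_count_;    // hash -> number of child blocks
};
```

A block whose parent is unknown is reported rather than inserted; what a miner does with such a block (for example,
keep it aside until the parent arrives) is left open here.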
There are still a lot of unanswered questions. (1) Do miners maintain the same map of pending transactions? (2) Do
miners maintain the same blockchain? Is there any official version of the blockchain (or ground truth)? (3) As the
blockchain is a tree, there are multiple leaf-nodes or leaf-blocks, so when a miner builds a new block, to which
previous block should it point? (4) Is there any limit on the number of pending transactions? Can a miner build a block
with no transaction? (5) Are miners looking for the same nonce? (6) As there are multiple branches, how do we know the
real transaction history? Let's address them one by one.
First of all, the bitcoin network is lossy. Some broadcast transaction messages and some broadcast completed blocks may
be dropped, so some miners may miss certain transaction messages or certain completed blocks. The bitcoin protocol
should tolerate the loss and recover the true transaction history as the blockchain grows. Thus each miner may own a
different map of pending transactions and also a different version of the blockchain. As there exists no centralized
server, no one knows the so-called ground truth of the blockchain. In the following example, LHS and RHS are slightly
different versions of the blockchain maintained by two miners; each square denotes a valid completed block received
from the network. Although there exists no officially recorded transaction history, miners do come to a consensus about
the real historical path (known as the trunk, indicated by the black solid line). It is not 100% accurate, but its
likelihood increases as both blockchains grow. Besides, we are more confident about the front end of the trunk, and
less certain about the back end of the trunk.
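[Figure: two slightly different versions of the blockchain, kept by miner 1 (LHS) and miner 2 (RHS); the trunk is the
black solid line.]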

Secondly, given a blockchain tree, a miner can build a new block using any existing block as the parent block. If the
miner chooses to point to a leaf-block, he is extending the trunk or the branch on which the leaf-block lies; if the
miner chooses to point to a non-leaf-block, he is introducing a new branch in the blockchain. Besides, there is no
restriction on the transactions that a miner puts in a new block: he can either put many transactions into the block,
hoping to earn more transaction fees, or start building the block without waiting for more pending transactions, hoping
to complete the brute-force search as soon as he can. This is up to his strategy. Statistics show that the average
number of transactions per block is around 200-300. Besides, each miner must include in the Merkle tree a transaction
that transfers the coinbase to his own address; this serves as the reward for the miner and forms the source of new
bitcoins.
Thirdly, each miner is looking for a different nonce value, for three reasons. Each miner builds the new block with
(1) a different subset of pending transactions, (2) a different parent block and (3) a coinbase transaction to a
different address. Due to the avalanche effect of hashing, any minor change in the transactions results in a drastic
change in the Merkle tree and hence a completely different nonce value. Hence all miners are searching for different
valid nonce values. Winning is thus a random event, and the chance of winning the next block is proportional to a
miner's computational power. For example, a miner having 10% of the computational power of the whole bitcoin network
will have a 10% chance of winning the next block. Therefore the chance of the same miner winning consecutively is low,
even if he is the most powerful one. This prevents hackers from manipulating the blockchain.
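
As a rough worked example of the last point, assume that each block is won independently and that a miner holds a fixed
share s of the network's computational power (both are simplifying assumptions, and the symbols s and k are introduced
here only for illustration). The chance of winning k blocks in a row is then:

```latex
P(k \text{ consecutive wins}) = s^{k},
\qquad
s = 0.1,\; k = 6 \;\Rightarrow\; P = 0.1^{6} = 10^{-6}
```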
Finally, we can see how the whole thing works. There is no centralized blockchain. Miners do not coordinate with one
another beyond broadcasting. Each miner keeps his own version of the blockchain; although the versions are different,
they overlap. The longest overlapping path is known as the trunk. The front end of the trunk is relatively stable,
while the back end of the trunk is still fuzzy as the blockchains of all miners grow. We will see that when a block lies
more than six blocks deep inside the trunk, it can be considered stable, and all transactions in that block (or prior to
that block) can be considered confirmed, thus receivers of confirmed transactions can then spend their bitcoins
(question: blocks are generated at a rate of 1 per 10 minutes, so receivers of bitcoins need to wait about an hour for
their transactions to sink 6 blocks deep in the trunk before they can spend their bitcoins, is that right?). Please note
that the trunk contains no leaf-block, except near the back end.
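
The idea of a block being "six blocks deep" can be sketched for a single miner's tree as follows. This is only an
approximation of the trunk, which is really a consensus among miners: here the longest path in one miner's own tree is
used instead. Blocks are identified by hypothetical string names, the root has an empty parent name, and the required
depth of 6 is a parameter.

```cpp
#include <cassert>
#include <map>
#include <string>

// block name -> parent block name; the root block maps to "".
using ParentOf = std::map<std::string, std::string>;

// Number of blocks between `block` and the root.
int height(const ParentOf& parent_of, const std::string& block)
{
    int h = 0;
    auto it = parent_of.find(block);
    while (it != parent_of.end() && !it->second.empty())
    {
        ++h;
        it = parent_of.find(it->second);
    }
    return h;
}

// Block at the end of the longest path (ties are broken by map order).
std::string longest_leaf(const ParentOf& parent_of)
{
    std::string best;
    int best_height = -1;
    for (const auto& entry : parent_of)
    {
        int h = height(parent_of, entry.first);
        if (h > best_height) { best_height = h; best = entry.first; }
    }
    return best;
}

// Number of blocks stacked on top of `block` along the longest path,
// or -1 if `block` does not lie on that path.
int depth_in_longest_path(const ParentOf& parent_of, const std::string& block)
{
    int depth = 0;
    std::string current = longest_leaf(parent_of);
    while (!current.empty())
    {
        if (current == block) return depth;
        auto it = parent_of.find(current);
        if (it == parent_of.end()) break;
        current = it->second;
        ++depth;
    }
    return -1;
}

bool is_confirmed(const ParentOf& parent_of, const std::string& block, int required_depth = 6)
{
    return depth_in_longest_path(parent_of, block) >= required_depth;
}
```

A block on a side branch gets depth -1 and is therefore never reported as confirmed, however old it is.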
Besides, given an entire blockchain with multiple leaf-blocks, when we traverse the tree from the root block to each
leaf-block via a different path, we need to update the pending transactions independently (for different paths). In
other words, each leaf-block should own an individual map of pending transactions (while non-leaf-blocks do not).
However, as the blockchain grows, the number of leaf-blocks increases, and miners would need to manage an increasing
number of pending transaction maps, which is infeasible. Therefore miners should stop managing pending transaction maps
for the confirmed portion of the blockchain.
Which pending transactions should a miner pick for his new block? How should he choose the parent block? He should
choose in a way that gives his completed block a higher probability of falling into the trunk (in case he is the lucky
one who wins the next block; as you now know, winning a block is purely a random event), so that he can earn both the
coinbases and the transaction fees. A block becomes a block in the trunk if it is followed by many later blocks: the
more followers it has, the higher the probability that it is in the trunk. Therefore this is voting, voting by
computational power. If other miners trust your broadcast block, they will vote by investing their computational power
in building new blocks behind yours (i.e. using your block as the parent block). Therefore, what a miner chooses to
include in his new block is whatever makes other miners vote for him: be an honest miner, pick true (verified)
transactions for the new block, and use the most trustworthy leaf-block as the parent block. This is how the bitcoin
protocol encourages miners to work on the trunk honestly in a collective way.
The miner should be implemented with at least 3 threads:

thread 1 receives broadcast transaction messages and updates the pending transaction maps for leaf-blocks,

thread 2 receives broadcast blocks, verifies them and inserts them into the blockchain, and

thread 3 picks a leaf-block, with some logic, and builds a new block after it.
How can the blockchain avoid missing transactions? Suppose Tx10 to Tx18 are pending transactions, some of which are
missing in some blocks, and different miners try to broadcast new blocks with pending transactions. This is how the
blockchain recovers the missing part. We denote the trunk in red, and parent blocks by brackets.
blk_A       Tx: 10,14
blk_B(A)    Tx: 11,13
blk_D(B)    Tx: 12,15
blk_F(E)    Tx: 17,18
blk_C(A)    Tx: 15
blk_E(C)    Tx: 16,13
blk_G(E)    Tx: 17
blk_I(G)    Tx: 18
blk_H(E)    Tx: 11,12
blk_J(H)    Tx: 17,18

If you are a miner building a new block, which block would you like to follow: block I or J? Block J of course, this is
how the missing part is recovered! Besides, the order of transactions is decided by the trunk, rather than by the actual
time when the user broadcasts the transaction.
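
The choice between block I and block J can be checked mechanically by collecting the transactions found on the path
from each leaf-block back to the root. The sketch below encodes the example above; the types Block and Chain and the
functions transactions_on_path and example_chain are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Block
{
    std::string parent;          // name of the parent block, "" for the root
    std::vector<int> tx_ids;     // transactions contained in the block
};

using Chain = std::map<std::string, Block>;

// Walks from a leaf-block back to the root and collects every transaction on the way.
std::set<int> transactions_on_path(const Chain& chain, std::string leaf)
{
    std::set<int> seen;
    while (!leaf.empty())
    {
        auto it = chain.find(leaf);
        if (it == chain.end()) break;
        seen.insert(it->second.tx_ids.begin(), it->second.tx_ids.end());
        leaf = it->second.parent;
    }
    return seen;
}

// The blocks of the example above, with parent blocks taken from the brackets.
Chain example_chain()
{
    Chain chain;
    chain["A"] = Block{"",  {10, 14}};
    chain["B"] = Block{"A", {11, 13}};
    chain["D"] = Block{"B", {12, 15}};
    chain["F"] = Block{"E", {17, 18}};
    chain["C"] = Block{"A", {15}};
    chain["E"] = Block{"C", {16, 13}};
    chain["G"] = Block{"E", {17}};
    chain["I"] = Block{"G", {18}};
    chain["H"] = Block{"E", {11, 12}};
    chain["J"] = Block{"H", {17, 18}};
    return chain;
}
```

The path ending at block J contains all nine transactions Tx10 to Tx18, while the path ending at block I contains only
seven of them (Tx11 and Tx12 are missing), which is why following block J recovers the missing part.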

How to prevent double spending?


Can a miner steal coinbases by copying an existing block in the blockchain, modifying only the coinbase transaction
output in order to transfer all coinbases to his own address, and then broadcasting the block as if it were newly
found, reusing the nonce found by others? The answer is no, because once any content of the block changes, he needs to
rework the nonce value by brute force again. Now we know that (1) a bitcoin miner cannot steal an existing block and
(2) a bitcoin user cannot steal a transaction. The remaining problem that bitcoin needs to address is double spending,
which means that a bitcoin owner broadcasts two transaction messages sharing the same input source of bitcoin. There
are three possible cases.
Case 1: a miner (carelessly or deliberately) puts these two transactions into the same block and broadcasts the block.
This block will not pass verification by other miners, hence they do not vote for this invalid block and follow another
branch instead. Case 2: two miners, each of whom sees only one of the two transactions, broadcast their new valid blocks
(each containing one of the duplicated transactions) to the network. Now suppose the two blocks share the same parent
block, thus creating branches in the blockchain; other miners will vote for one of the branches by following the one
they favour. The trunk will eventually run through only one of them as the blockchain grows. As a result, double
spending is avoided: miners pick one of the transactions through collective decision, while the other transaction is
considered unconfirmed. Suppose Tx13 and Tx14 are double spending:
blk_A       Tx: 10,12
blk_B(A)    Tx: 11,13
blk_D(B)    Tx: 15
blk_F(D)    Tx: 17,18
blk_C(A)    Tx: 14
blk_E(C)    Tx: 16,13
blk_G(D)    Tx: 17
blk_I(G)    Tx: 16,18
blk_H(E)    Tx: 11,15
blk_J(H)    Tx: 17,18

Tx13 = A sends bitcoins to B.
Tx14 = A sends bitcoins to C.

In case 2, if A double spends, the network will pick only one of the two transactions, avoiding double spending. The
chance of the bitcoins going to the hands of B or C is 50-50; as a result, A cannot control how he spends.
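
The verification that rejects a double spend, either inside one block (case 1) or along one path of the blockchain, can
be sketched as a check that no input source is used twice. This is a simplified illustration: the field
source_output_index, the type Outpoint and the function accept_block are hypothetical, and signature checks are left
out.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct TxInput
{
    std::uint64_t source_Tx_id;          // transaction that the bitcoins come from
    unsigned source_output_index;        // which output of that transaction (hypothetical field)
};

struct Tx
{
    std::uint64_t id;
    std::vector<TxInput> inputs;
};

// An input source is identified by (source transaction, output index).
using Outpoint = std::pair<std::uint64_t, unsigned>;

// Checks a block against the input sources already spent on the path leading to it.
// Returns false, leaving `spent` unchanged, if any input source is used twice;
// otherwise records the block's input sources as spent and returns true.
bool accept_block(const std::vector<Tx>& block_txs, std::set<Outpoint>& spent)
{
    std::set<Outpoint> updated = spent;
    for (const Tx& tx : block_txs)
    {
        for (const TxInput& in : tx.inputs)
        {
            Outpoint source(in.source_Tx_id, in.source_output_index);
            if (!updated.insert(source).second) return false;   // already spent
        }
    }
    spent = updated;
    return true;
}
```

Starting from an empty set, a block holding both Tx13 and Tx14 is rejected at once. On the path through block C, the set
already contains the source of Tx14 by the time a block carrying Tx13 arrives, so that block is rejected as well.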

In the example above, an honest miner will not generate block E, as he should have detected the double spending (once
in block C and once in block E); similarly, no honest miner will follow block E, as it is invalid. Thus here comes case
3: the only way a fraudulent user can double spend is to build the blocks C, E, H and J all by himself. He needs to mine
all the nonce values and broadcast the whole fake path. However, winning a block is a random event, and the chance of
the same hacker winning successive blocks with limited computational power is very low. By the time the hacker solves
his first block, the network would probably have completed the next few blocks, and he can never catch up.
This is a race between the honest chain (BDGI) and the attacker chain (CEHJ). The block at which branching starts is
treated as the reference point (block B or block C). Let the current progress of the honest miners and the attacker be
x and y respectively; then the difference in progress m = x-y can be modelled as a Bernoulli random walk.
honest chain    honest miner progress (x)    attacker chain    attacker progress (y)
B               0                            C                 0
D               1                            E                 1
G               2                            H                 2
I               3                            J                 3

This is analogous to the Gambler's ruin problem. Let the probability that the honest miners win the next block be p (m
is then incremented by 1), and the probability that the attacker wins the next block be q = 1-p (m is then decremented
by 1). If the honest miners are now m blocks ahead of the attacker, the probability that the attacker will catch up from
behind is given by equation 2 in Gambler's ruin.doc as:

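$$
\mathrm{prob}(\mathrm{unsafe} \mid m = x - y) =
\begin{cases}
(q/p)^{x-y} & \text{if } p > q \text{ and } x > y \\
1 & \text{if } p \le q \text{ or } x \le y
\end{cases}
$$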
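
The catch-up probability can be checked numerically by simulating the random walk. The sketch below is only an
illustration: the function names, the fixed seed and the cut-off give_up_lead (beyond which the attacker is assumed to
have lost for good) are assumptions made here, not part of the original analysis.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Closed form: probability that the attacker ever catches up from m blocks behind.
double catch_up_probability(double p, int m)
{
    double q = 1.0 - p;
    if (p <= q || m <= 0) return 1.0;
    return std::pow(q / p, m);
}

// Monte Carlo estimate of the same probability.
// Each step the honest miners win with probability p (lead + 1),
// otherwise the attacker wins (lead - 1).
double simulate_catch_up(double p, int m, int trials, std::uint32_t seed)
{
    std::mt19937 rng(seed);
    const double scale = 4294967296.0;      // 2^32, maps rng() into [0, 1)
    const int give_up_lead = m + 60;        // ASSUMPTION: the attacker never recovers from this far behind
    int caught_up = 0;
    for (int t = 0; t < trials; ++t)
    {
        int lead = m;
        while (lead > 0 && lead < give_up_lead)
        {
            bool honest_wins = (rng() / scale) < p;
            lead += honest_wins ? 1 : -1;
        }
        if (lead <= 0) ++caught_up;
    }
    return static_cast<double>(caught_up) / trials;
}
```

For p = 0.9 and m = 2 the closed form gives (1/9)^2 = 1/81, about 1.2%, and the simulated frequency should settle close
to that value as the number of trials grows.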

Suppose now user B has received Tx13. Our objective is to find x such that Tx13 can be confirmed and user B is safe
to spend the bitcoins. This is accomplished by solving for x such that prob(unsafe|x) is smaller than a predefined
threshold. Given no extra information, both x and y follow a Poisson distribution.

$$
x \sim \mathrm{Poisson}(E[x]), \qquad y \sim \mathrm{Poisson}(E[y])
$$

Since the expected progress is directly proportional to the successful probability, we have :

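$$
\frac{E[y]}{E[x]} = \frac{q}{p}
$$

$$
\mathrm{prob}(y \mid x) = \frac{\lambda^{y} e^{-\lambda}}{y!},
\qquad \text{where } \lambda = E[y \mid x] = E[x \mid x]\,(q/p) = x\,(q/p)
$$

$$
\begin{aligned}
\mathrm{prob}(\mathrm{unsafe} \mid x)
&= \sum_{y=0}^{\infty} \mathrm{prob}(\mathrm{unsafe} \mid x, y)\,\mathrm{prob}(y \mid x)
&& \text{(by law of total probability)} \\
&= \sum_{y=0}^{\infty} \mathrm{prob}(\mathrm{unsafe} \mid m = x - y)\,\mathrm{prob}(y \mid x) \\
&= \sum_{y=0}^{x-1} \mathrm{prob}(\mathrm{unsafe} \mid m = x - y)\,\mathrm{prob}(y \mid x)
 + \sum_{y=x}^{\infty} \mathrm{prob}(\mathrm{unsafe} \mid m = x - y)\,\mathrm{prob}(y \mid x) \\
&= \sum_{y=0}^{x-1} (q/p)^{x-y}\,\frac{\lambda^{y} e^{-\lambda}}{y!}
 + \sum_{y=x}^{\infty} \frac{\lambda^{y} e^{-\lambda}}{y!}
&& \text{(always assume } p > q\text{)} \\
&= \sum_{y=0}^{x-1} (q/p)^{x-y}\,\frac{\lambda^{y} e^{-\lambda}}{y!}
 + 1 - \sum_{y=0}^{x-1} \frac{\lambda^{y} e^{-\lambda}}{y!}
&& \text{(avoid summation to infinity)} \\
&= 1 + \sum_{y=0}^{x-1} \left( (q/p)^{x-y} - 1 \right) \frac{\lambda^{y} e^{-\lambda}}{y!}
\end{aligned}
$$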

Let's recall the law of total probability.

$$
\mathrm{prob}(x) = \sum_{A} \mathrm{prob}(x \mid A)\,\mathrm{prob}(A)
$$

$$
\mathrm{prob}(x \mid y) = \sum_{A} \mathrm{prob}(x \mid y, A)\,\mathrm{prob}(A \mid y)
$$

Here is an implementation in C++.


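#include <cmath>

// Probability that the attacker catches up after the honest chain has progressed by x blocks.
double unsafe_probability(double p, unsigned short x)
{
    double q = 1-p;
    if (p <= q) return 1;                       // the derivation assumes p > q
    double lambda = x*(q/p);                    // expected progress of the attacker
    double sum = 1;
    for(unsigned short y=0; y<x; ++y)
    {
        double poisson = std::exp(-lambda);     // becomes lambda^y * exp(-lambda) / y!
        for(unsigned short k=1; k<=y; ++k) poisson *= (lambda/k);
        sum += (std::pow(q/p, x-y)-1) * poisson;
    }
    return sum;
}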

Running the function with q=0.1, we can see that unsafe probability drops off exponentially.
x     prob(unsafe|x)
0     1.0000000
1     0.2045873
2     0.0509779
3     0.0131722
4     0.0034552
5     0.0009137
6     0.0002428   (chance of successful attack is about 0.024% for x=6)
7     0.0000647
8     0.0000173
9     0.0000046
10    0.0000012

Conclusion
We have known for decades that there are scientific proofs that it is impossible to coordinate exact information among
multiple distant nodes in a network without a central authority (this is not limited to the context of currency). In
2008, Satoshi Nakamoto published a paper with a practical solution to this impossible problem. All new transactions
are kept inside a block, which is periodically sealed and inserted into a blockchain. Every node in the network has
its own version of the blockchain. The trunk can be found when nodes reach consensus; this is voting with
computational power. This is why the blockchain is the most important invention in bitcoin.

Reference
Bitcoin : A Peer-to-Peer Electronic Cash System, Satoshi Nakamoto, 2008.
Bitcoin Mining Explained Like You're Five
Bitcoin transaction fees explained
