Master of Science of Rhodes University
After its birth in cryptocurrencies, distributed ledger (blockchain) technology rapidly grew
in popularity in other technology domains. Alternative applications of this technology
range from digitising the bank guarantees process for commercial property leases (ANZ
and IBM, 2017) to tracking the provenance of high-value physical goods (Everledger Ltd.,
2017). As a whole, distributed ledger technology has acted as a catalyst to the rise of
many innovative alternative solutions to existing problems, mostly associated with trust
and integrity.
In this research, a niche application of this technology is proposed for use in digital forensics: providing a mechanism for the transparent and irrefutable verification of digital evidence, ensuring its integrity, since established blockchains serve as an ideal mechanism against which to store and validate arbitrary data.
To my wife, Liezel, thank you for your loving assistance, understanding and relentless
support. Thank you for the motivation you provided when it was most necessary. I could
not have completed this without you.
To my mother and father, thank you for showing me the value of hard work and for
instilling in me an insatiable thirst for more knowledge.
To my sister and brother, thank you for setting a great example, for providing the moti-
vation to always strive to do better and to never stop exploring or asking questions.
To Jock, the trailblazer, thank you for everything you have done for me in the past few
years both professionally and personally.
To my supervisor, Yusuf, thank you for all the guidance you provided throughout this
process.
To the open source software community, thank you for enabling all of this and so much
more.
Finally, to all my friends and family, thank you for understanding and tolerating far too
many absences. I’m back now.
Contents

1 Introduction
   1.1 Motivation
2 Literature study
   2.1.1 Purpose
   2.1.3 Challenges
   2.1.4 Summary
   2.2 Blockchains
      2.2.1 Introduction
      2.2.2 Purpose
      2.2.6 Weaknesses
   2.4 Summary
3 Research design
   3.4.1 Design
   3.4.2 Environment
   3.5 Summary
4 Implementation
   4.1 Introduction
B Other
   B.1 Email correspondence with Peter Todd re: OTS timestamp structure
List of Figures

3.6 blockchain.info lookup of the Bitcoin block number recorded in the timestamp
5.1 CSV data loaded into Microsoft Excel for analysis
5.5 Time to complete a timestamp relative to the date and time the timestamp was created
5.6 Accuracy of a timestamp relative to the date and time the timestamp was created

List of Tables

5.1 Description of data fields extracted from the data set
5.2 Description of data fields calculated from the data in Table 5.1
Chapter 1
Introduction
1.1 Motivation
The potential applications of blockchain technology are vast and continue to diversify every day with the emergence of smart contract platforms such as Ethereum (Ethereum Foundation, 2016), payment solutions such as Ripple (https://ripple.com/) and digital currencies such as Zcash (Zerocoin Electric Coin Company, 2016). However, despite its wide adoption, blockchain technology remains relatively unexplored, even though the technology demonstrates versatility in areas that extend beyond payments and currency. It achieves this by solving a few fundamental problems associated with trust and integrity.
1.2 Research objectives
It is for this reason that there is an endeavour to innovatively apply blockchain technology to solve issues of trust and integrity in the realm of digital forensics and augment the often-challenged field of digital forensics. By providing a mechanism, process and toolset to formalise the validation of the integrity of digital evidence, mutual confidence can be fostered between digital forensics and legal processes.
By leveraging the properties of blockchain technology, a process was researched and then implemented to create an irrefutable, immutable and inherently verifiable proof of the existence of digital evidence. The proposed mechanism is accompanied by a standardised process for performing proof creation and validation in a popular digital forensics software suite.
The goal of this research is to transparently and conveniently integrate distributed ledger timestamping technology into a digital forensic workflow. To achieve this, the following high-level actions were performed:
Given the pace at which blockchain technologies are being innovated, there is a need to research and clarify how these diverse technologies can be used in the solution postulated. Building such a solution on an ephemeral base technology would undermine any long-term benefits it could potentially contribute. As an understanding of relevant technologies and their respective benefits and limitations is essential to the ongoing success of this work, a thorough compare and contrast exercise will be conducted to accurately inform progress toward the above-mentioned goals.
Candidate technologies, starting with previous work by Weilbach (2014) and further informed by newer work such as Crespo and García (2017) and Opentimestamps.org (2017), will be utilised to elicit and develop a set of requirements and possible limitations.
Based on these requirements and limitations produced through the technical review, a
chosen candidate technology for interacting with the Bitcoin blockchain will be further
researched and evaluated.
This will be followed by a documented PoC implementation of the proposed system, which will be discussed to facilitate a better understanding of its purpose; this PoC will then also serve to inform ongoing work in the field. Focus will be placed on enhanced modularity of the solution, as its adoption and subsequent success would be heavily dependent on easy integration into existing digital forensics tools and processes. The implementation of the PoC will serve as a reference implementation to encourage further development of such technologies and their use in the digital forensics domain.
Finally, the reference implementation and the underlying candidate technology will be subjected to rigorous testing to measure their effectiveness and accuracy in aiding digital evidence integrity validation.
1.3 Scope and limits

This work touches on a range of technologies and algorithms associated with applied cryptography and, thus, relies on many cryptographic primitives. The performance and efficiency of these have been tested and evaluated academically (Preneel, 1994) and through repeated real-world application (Käsper, 2012). It is unnecessary to revalidate these well-established results. Accordingly, aspects such as the following fall outside the scope of this work:

• Effects of changing block size and structure as a result of Bitcoin scaling proposals.
1.4 Document structure
What follows is an outline of the document structure covering the main themes:
In Chapter 2, a thorough literature study is performed to discuss and analyse the two concepts at the core of this research effort: digital forensics and blockchain technology. As part of this literature study, the basic principles of both concepts are covered to ensure that the reader has the necessary background knowledge. In terms of digital forensics, this work explores the current tool development practices and challenges faced, specifically relating to evidence integrity and the practices of verifying evidence integrity. In the case of blockchain technology, a discussion on current notable implementations, the limitations of the technology and the future of the technology is presented. Once digital forensics and blockchain technology have been individually covered, a section on past and current research efforts that have linked the two concepts is presented, followed by an evaluation of these works for relevance to the stated research goals.
As a result of the evolving, complex nature of the blockchain ecosystem, a detailed compare and contrast exercise between the most prevalent implementations of blockchain timestamping technologies is performed to determine which will be best suited to address the goals of this research. The review will focus on identifying beneficial aspects as well as limiting factors of all current implementations. This section concludes with an informed decision on which system to use as the basis for further development of the PoC implementation.
The design section in Chapter 3 consolidates the knowledge gathered from the literature review to clarify the goals of the research and to propose a research design in support of those goals. The design includes a detailed discussion of the candidate technologies and outlines how these will be integrated in support of the research goals. A final research design, based on a defined research question, includes a review of the timestamping protocol, the chosen method of implementation of the protocol, and a series of tests to measure aspects of the design limitations and benefits of the underlying technologies.
Chapter 4 chronicles the implementation process and presents the design artefacts originating from it. As part of this section, the progress of the implementation in relation to use cases derived from the requirements is shown. Additionally, the choice of implementation language is discussed in this section, along with challenges relating to the implementation itself, to inform future development efforts in this field.
In Chapter 6 an evaluation of the overall success of the research effort relative to its stated goals is performed, and the contribution of the work to the field of digital forensics is discussed, along with how the use of blockchain technology is essential to the effort. Finally, insights made into the problem domain and possible future work in this problem domain are presented.
Chapter 2
Literature study
The most relevant domains are digital forensics and distributed ledger technologies, the details of which are essential to understanding the problem and possible solutions to the problem.
The literature review starts with a section on the current state and challenges of digital
forensics. In the second section the current state of blockchain technology is reviewed
with an emphasis on understanding the emergence of various practical applications of
this versatile technology. The final section discusses existing literature aligned with these
two previous domains. The literature review is concluded with a section discussing the
gaps in the current literature and how these gaps can be addressed.
Digital forensics is a science as young as modern personal computing and developed organically from a growing need to investigate computer-related crime, brought on by the
emergence of cyber crime as networked computing and connectivity became more and
more popular in the 1990s (Berghel, 2003). At its advent, it had very limited application and was not widely practised; this changed around the turn of the 21st Century with the widespread adoption of networked computing. For the first time ever, it was common practice to share and distribute large volumes of information from person to person, crossing geographical boundaries, using networked computers. The popularisation of networked computing outside of academic circles meant that families, individuals or companies were creating, storing and sharing information of near-infinite complexity and variation, and each byte of information could potentially become part of an investigation, be it criminal or exploratory. The increased adoption of networked computing also led to
a notable increase in digital forensics research, followed by current and sustained growth.
The nature of digital forensics necessitates constant growth and adaptation to accurately
deal with the constantly evolving subject matter and operational technology. To date,
digital forensics, as a practice and a science, is still playing catch up with the rapid
evolution of technology in computing.
Digital forensics, also referred to as computer forensics, deals with the acquisition, storage, investigation and reporting of digital evidence in such a way as to ensure utmost admissibility of the evidence by providing verifiable assurances of its integrity. Tobergte and Curtis (2013) formally define computer forensics as: “...the discipline that combines elements of law and computer science to collect and analyze data from computer systems, networks, wireless communications, and storage devices in a way that is admissible as evidence in a court of law”.
2.1.1 Purpose
The digital forensic process, as adapted from Valjarevic and Venter (2013), can, at a high
level, be described by three basic practices:
1. Acquisition
2. Analysis
3. Presentation
The act of acquiring evidence is the first step in any digital forensic investigation and can be a non-trivial task at the best of times, as noted by Dykstra and Sherman (2012). The acquisition phase is also arguably the most critical in any investigation, as any error here will naturally propagate to the following phases and potentially affect the integrity and admissibility of the evidence as a whole; as Wilson (2011) notes, any issue that adversely affects the admissibility of digital evidence can cast doubt on entire investigations.
The acquisition phase can be subdivided into activities such as identification, collection and transportation. With digital evidence, as with physical evidence, the collection and transportation activities pose the greatest threat to the chain of custody, and to the overall integrity of the evidence. Particularly relevant with regard to digital evidence, though, is the inherent need for the evidence to be moved or replicated from its (potentially volatile) source to another system.
Dykstra and Sherman (2012) noted that completeness and accuracy are the two critical measurable attributes of the acquisition phase; they went on to explain the complex hierarchy of trust at play during a typical acquisition phase. They noted that
trust is required from the network level up to the operating system and application to
ensure evidence is free from accidental or intentional tampering. Many tools, techniques
and even frameworks have been developed solely for this purpose during the acquisition
phase, of which one, aimed at Infrastructure-as-a-Service (IaaS), is discussed in Dykstra
and Sherman (2013).
A common, and sometimes mandated, practice during the acquisition phase is the act of hashing evidence (Dykstra and Sherman, 2012). A cryptographic hash, also referred to as a digest, is a unique, fixed-length value, generated from any evidentiary artefact of variable length, that can serve to identify that piece of evidence. A cryptographic hash is the product of a one-way deterministic mathematical function through which data of arbitrary length can be passed to produce a collision-resistant fixed-length representation of that data (Witte, 2016). A key property of a hash function is that a minor change in the input will result in a significant change in the fixed-length output (Preneel, 1994). Hashes are most commonly used to determine if the evidence has been tampered with between the time the hash was generated and when the evidence is scrutinised.
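To make these properties concrete, consider the following minimal Python sketch (illustrative only; the input strings are arbitrary examples). It shows that hashing is deterministic and that a one-character change in the input produces a completely different fixed-length digest:

    import hashlib

    original = b"The quick brown fox jumps over the lazy dog"
    tampered = b"The quick brown fox jumps over the lazy cog"  # one character changed

    # Deterministic: the same input always yields the same digest.
    assert hashlib.sha256(original).hexdigest() == hashlib.sha256(original).hexdigest()

    # A minor change in the input results in a significantly different digest.
    print(hashlib.sha256(original).hexdigest())
    print(hashlib.sha256(tampered).hexdigest())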
A common use case for the hashing of evidence during initial acquisition would be when a practitioner receives a hard disk drive (HDD) containing potential evidence. They would generate a hash of the contents of the disk, duplicate the disk and then verify the integrity of the copy by comparing the hash of the copy with the hash value of the original. If the two hash values match, verifiable proof exists that the contents of the two HDDs are exactly the same. In the above scenario, producing, comparing and verifying the integrity of information is the sole responsibility of the investigator.
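A minimal sketch of this verification workflow follows; the file paths are hypothetical placeholders, and a real acquisition would typically hash the raw block device rather than an image file, but the principle is identical:

    import hashlib

    def sha256_of(path: str) -> str:
        """Hash a potentially very large file in fixed-size chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Hypothetical paths: the original acquisition image and its working copy.
    original_hash = sha256_of("evidence/original.dd")
    copy_hash = sha256_of("evidence/copy.dd")

    # Matching digests prove the contents of the two images are identical.
    print("Copy verified" if original_hash == copy_hash else "Integrity failure")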
The practice of hashing to verify the integrity of evidence is commonplace and may be performed in any scenario where the integrity of the evidence might be questioned. It can be performed on a single artefact, multiple artefacts or fragments of multiple artefacts; the choice is up to the investigator. However, the benefit of producing hashes for as much of the digital evidence as is practical is that it enables the investigator to verify the integrity of the evidence on a very granular level. Hash values and their use will be discussed in much more detail in the coming sections of this work.
As much as the acquisition phase is critical to ensure the successful start of an investiga-
tion, the analysis phase is critical to developing a clear picture of events, armed with
evidentiary artefacts and contextual links. ‘Analysis’ is a broad term that encapsulates
an increasingly large number of specialised practices including, but not limited to, data
retrieval, log correlation and exploration.
Kessler (2006) notes that analysis of digital evidence is resource-intensive and usually requires a significant amount of human intelligence over an extended period of time. Garfinkel (2010) explains that the burden of analysis is exacerbated by the fact that, in recent years, an investigation could require analysis of multiple devices as opposed to a single device, as was the norm in years gone by.
The concept of integrity and the chain of custody is as relevant in this phase as in the acquisition phase, as there is interaction with the evidence. In an ideal scenario, analysis would not be performed on the original artefacts but rather on a validated copy thereof, i.e., where an investigator receives non-volatile evidence, such as a hard disk, under controlled circumstances and as part of a defined process. In a non-ideal scenario, there would be some level of interaction with the original evidentiary artefact, i.e., where interaction with volatile evidence, like memory, is required in the field or in uncontrolled circumstances. It is during these non-ideal types of interaction that there exists the greatest chance of intentionally or accidentally modifying the evidence in question. Any such modification to the evidence that cannot be explained or reversed can fundamentally jeopardise the investigation, as the integrity of the evidence is immediately questioned.
At a high level, the presentation phase of the digital forensic process involves sharing or
presenting the results to a selected audience, and includes showcasing and explaining the
information and facts concluded from the previous phases. Depending on the nature of
the investigation, the presentation phase could also include a list of necessary actions to
remedy an incident or mitigate a vulnerability.
Valjarevic and Venter (2011) noted that during presentation, the following artefacts can be expected:

• A time-line of all recorded actions and how these actions relate to users, and

As can be seen from Kessler (2006), the presentation phase of an investigation can be, and most likely will be, subjected to intense scrutiny regarding the integrity of the processes and evidence. This is especially true if the investigation forms part of criminal or legal proceedings. It is, therefore, of paramount importance that any observations presented be irrefutably backed up by facts derived from evidence whose integrity can be proved beyond doubt.
As this research includes the development of a toolset that can be used by forensic investigators to timestamp and verify arbitrary evidence signatures against an immutable source, it would be prudent to evaluate the history of tool development in this field.
The development of tools in the digital forensic space is certainly not uncommon. As mentioned before, whole frameworks, protocols and tool suites have been developed to streamline the modern digital forensic process. An example of one such effort is seen in Valjarevic and Venter (2013), with the Harmonised Digital Forensic Investigation Readiness Process Model. Examples of popular tools include EnCase, SIFT and Volatility, among others (InfoSec Institute, 2017). These tools vary widely in functionality and application, with some targeting very specific problems and others serving as high-level frameworks for performing and managing digital investigations and the associated data. These tools also vary between Free/Open Source Software (FOSS) and Commercial Off-The-Shelf (COTS) solutions.
Digital forensics, as with almost all computer science disciplines, has had, and continues to have, a strong reliance on FOSS solutions, as FOSS continues to support a vast section of the information systems that advanced users and the public rely on every day. It is difficult to quantify how much of the information systems used daily are made possible by FOSS, but considering that software such as Linux, OpenSSL, Apache and MySQL are all FOSS based, it is clear how pervasive the use of FOSS is.
Carrier (2002) notes that digital forensics, in some basic incarnation, has existed as long
as computers have. They continue by stating that in years gone by, digital forensics was
a discipline limited to governments who used or developed proprietary tools to serve their
needs. This has changed in recent years as the commercial adoption of digital forensics
has led to the development of very competitive COTS as well as FOSS tools to aid digital
forensics professionals. The lack of formal development procedures and documentation
associated with FOSS meant that FOSS tools could be rapidly developed to satisfy the needs
of investigators as they arose, leading to the popularisation of ad-hoc tool development.
Furthermore, the collaborative nature of FOSS and the fact that any person with the will
and skill could contribute to the software, resulted in feature-rich toolsets being developed.
Both Carrier (2002) and Manson, Carlin, Ramos, Gyger, Kaufman and Treichelt (2007) noted that of the issues with FOSS tools, ease of use is among the biggest. To a certain extent this is understandable, as most FOSS tools begin as purpose-built utilities to serve a very specific need, and the developer is also the primary audience and user.
The development of digital forensic tools is more important than ever before, as existing tools become increasingly obsolete (Garfinkel, 2010) and as technologies, complex and proprietary data formats, and protections like Full Disk Encryption (FDE) are more widely adopted. Regardless of individual characteristics and adoption rates, both FOSS and COTS solutions have contributed and will continue to contribute to the advancement of the digital forensics discipline.
2.1.3 Challenges
Challenges to the practice of digital forensics are numerous and increasing. Due to the significant variance in tool functionality and build quality, a great many resources have been devoted to the validation of digital forensic tools. Validation efforts are extensive, and justifiably so, as a failure in a tool could potentially lead to the acquittal of the guilty (Gottehrer, 2016) or, conversely, the condemnation of the innocent. One such effort that attempts to perform widespread validation of open source digital forensics tools is the Computer Forensic Tool Testing (CFTT) workgroup established by the National Institute of Standards and Technology (NIST). The CFTT aims, among other things, to baseline the performance and accuracy of a wide variety of tools against a standardised methodology
(Dykstra and Sherman, 2012). By doing so they hope to accredit the tools with the
necessary level of trust to ensure utmost admissibility of the evidence they produce.
One of the issues facing the CFTT, as well as other validation frameworks, is the lack of insight into known errors and failure rates for COTS and FOSS tools. Being able to establish a known error rate for a specific tool is important for the following reason: if a key piece of evidence is based on the result of a tool with a significant known error rate (5% or more), that evidence can be deemed unreliable and not admitted to court. If, however, the evidence was produced by a tool with a non-significant error rate in a testing environment, that evidence can be seen as forensically sound, since the procedure is also known to be accurate.
This lack of transparency is part of the problem with tool validation and verification, especially with COTS. It is extremely difficult to establish a known error rate for a tool whose procedures are intentionally obscured (Carrier, 2002). It is understandable that the producers of COTS tools want to protect their intellectual property by not releasing source code or testing frameworks for the software, but this means that any users of COTS accept that exhaustive testing was performed on the software without much proof. There is, as noted by Carrier (2002), a commercial incentive for COTS providers to withhold important error metrics, which can result in users questioning the integrity of established software from a reputable vendor.
Carrier (2002) notes that there is a concerted effort to have digital forensic tools validated and verified so that their output can be used as potential evidence in legal proceedings, and when it comes to FOSS tools there are two main considerations. Firstly, FOSS tools often lack any kind of formal testing as a consequence of the circumstances and environment they are developed under. Secondly, FOSS tools are easier to create validation tests for, since the design, process and code are open for anyone to review. Carrier (2002) also notes on this point that having open design standards and documentation allows testing methodologies to be developed with more ease, and that the open nature of FOSS means that bugs and errors in the tool cannot be hidden. Carrier (2002) goes on to mention that FOSS tools can have a known error rate to justify confidence, since the defect history of the tool is in the public domain and verifiable.

One of the reasons cited by Garfinkel (2010) for the decline of the “golden age of digital forensics” is the sheer variance in data and data formats used on all manner of digital devices today. It is not uncommon to see multiple different data storage and transmission protocols in a single standalone system, let alone a highly integrated and complex system. Garfinkel (2010) notes that it is increasingly common for issues to arise with data analysis due to format incompatibilities and other
similar factors.
Not only do digital forensic practitioners have to deal with increasingly complex data formats, but they also have to deal with the increase in the volume of data stored on devices. Garfinkel (2010) notes that, due to the storage capacity of modern digital devices, it can become impractical to perform basic tasks, such as creating forensic images of devices, in a reasonable amount of time.
Garfinkel (2010), Dykstra and Sherman (2012) and Dykstra and Sherman (2013) all noted that the recent and sustained adoption of cloud computing poses a major risk to traditional digital forensics methods, as access to the data needed for an investigation does not necessarily reside with the party initiating the investigation. Part of the appeal of the cloud computing model is that it abstracts some of the technicalities of managing infrastructure or services from the end user and delegates that responsibility to the cloud provider. This means that even with a mandate from the owner of a system, it can be troublesome to obtain the necessary log files to effectively perform an investigation. Dykstra and Sherman (2013) point out that in this model, a lot of trust is placed in third parties when it comes to validating the integrity of data; they then developed a tool suite called FROST to address these issues of trust.
Apart from the technical challenges noted previously, there exist a host of challenges as well as opportunities regarding the legal aspects of digital forensics. Gottehrer (2016) notes the shift in the nature of evidence from primarily paper-based physical artefacts to digital mediums. They go on to note the implications this change might have on the legal fraternity, and specifically that legal practitioners who do not understand the nature of digital evidence and forensics do so at their own peril.
The need for legal requirements in digital forensics investigations is well established, as noted by Kuntze, Rudolph, Alva, Endicott-Popovsky, Christiansen and Kemmerich (2012), who mention that incorporating legal views into device design can assist in maintaining the probative value of evidence produced by such devices. Although they specifically refer to devices, the same argument can be made for tools and software used during the investigation process. They go on to note that such efforts to ensure admissibility of evidence should be proactive, as any reactive effort would not add as much value.
One of the best-known legal principles pertaining to the admissibility of digital evidence, as developed and implemented in the United States of America (US) court systems, is the Daubert standard. Kuntze et al. (2012) note that the Daubert standard “...is often used to determine if scientific evidence, including digital evidence, is admissible as valid evidence in court.”
Carrier (2002) notes that the Daubert standard can be used in US courts to determine the reliability of evidence presented during a trial. The Daubert standard usually applies to scientific evidence, or evidence of a technical nature that is not generally understood by judges or juries; as clarified by Carrier (2002), it stems from the U.S. Supreme Court's ruling in Daubert v. Merrell Dow Pharmaceuticals (1993). The purpose of the standard is to verify the validity of scientific evidence and, by extension, make such evidence admissible in a court. The standard aims to verify the methodology as well as the techniques employed to extract evidence and draw valid, true conclusions by asking a question on each of the following four topics:
1. Error Rate
2. Testing
3. Publication
4. Acceptance
Because of the nature and acceptance of the Daubert standard, it would be prudent to develop a digital forensic tool or process that can answer these questions easily and satisfy a court's demands for rigor, due process and validity.
2.1.4 Summary
Most efforts from acquisition to presentation are geared toward preserving the integrity of evidence and the chain of custody. Practices like hashing are a fundamental step in this preservation process and are often used as the only mechanism to prove the integrity of evidence. Although the choice of hashing algorithm might change over time, the basic process has not changed: it is still performed in an isolated context, with the practitioner or investigator being solely responsible for both creating the evidence signatures and verifying them.
Both FOSS and COTS tools play a huge role in almost all digital forensic practices and there is a growing need to develop more advanced tools, as the evolution and adoption of technology threatens current digital forensic processes.
An important aspect of tool development is the question of testing and validation, as the tools are very often used for reconstructing events and drawing conclusions from the data. Any tool output that is not verifiable and reproducible with a significant level of accuracy does not benefit an investigation.
The nature of digital forensics and its tight coupling with legal matters means that legal challenges often translate into substantial digital forensic challenges. As evidence from more and more connected devices is used in legal proceedings, the reliance on and accuracy of the data produced by these devices is more important than ever before. The use of data obtained from electronic devices to prove or disprove allegations in legal matters is now common practice, and the rigor of legal scrutiny is now cast upon digital forensic methods, tools and practices. Legal concerns like admissibility will continue to develop alongside digital forensics and become ever more important as the two fields coexist.
Integrity lies at the core of the discipline of digital forensics. Many of the processes, tools and challenges depend on some level of verifiable integrity.
2.2 Blockchains
2.2.1 Introduction
In this work, blockchain will refer to the technology as a whole and not to specific implementations of the technology. Where applicable, a specific implementation of a blockchain will
be referenced accordingly. Blockchain is simply another form of applied cryptography,
where existing cryptographic primitives like asymmetric cryptography, hashing, and Pub-
lic Key Infrastructure (PKI) are combined to form a new technology that aims to solve a
fundamental problem of trust.
What makes blockchain technology so appealing is the fact that it is distributed in nature,
both in terms of trust and processing. The distributed nature means there is no single
point of failure and a negligible possibility of undetected modification. As Lemieux (2017)
succinctly noted: “Blockchains and distributed ledger technology promises trusted and
immutable records in a wide variety of use cases involving recordkeeping, including real
estate and healthcare”. Lemieux (2017) further noted that the main appeal of this
technology is its ability to produce immutable and trusted records, foregoing the need for
a trusted third party.
Blockchain technology is most well-known for its implementation in the form of the Bitcoin
blockchain, a system that provides an immutable public ledger of transactions that can
facilitate the transparent transfer of value between two parties without the need for a
trusted third party or intermediary. Witte (2016) explains that this transfer of value can
occur due to the transparent and unchangeable public ledger that forms the core of any
blockchain system. This ledger provides both parties the assurance that the value being
transferred belongs to a certain party and has not already been spent.
2.2.2 Purpose
The primary function of a blockchain, like the one implemented in the Bitcoin network, is often unclear and misunderstood. To better understand the function of the blockchain, a simple real-world scenario involving a bank and two bank customers can be used to explain the principles at work.
Scenario: current banking transaction model
Patron A wants to transfer value, in the form of money, to Patron B. In the traditional
banking model used by billions of people every day, Patron A would send a request to the
bank to transfer an arbitrary amount of money to Patron B. As Patron A and Patron B
both have accounts with the bank, the bank can facilitate the transaction by checking its
ledger to see if Patron A has the necessary funds. If so, the bank deducts that amount
from Patron A’s account, updates the ledger to record the transaction and then adds that
2.2. BLOCKCHAINS 17
amount to Patron B’s account, again updating its ledger to record the increased balance
in the account of Patron B. As simple as this seems, this type of transaction was only
possible because both parties, Patrons A and B, trusted the bank as an intermediary to
perform the transactions. Without this trust, this simple transaction would not have been
possible as there would be no way for Patron B to be sure that Patron A has not already
spent the money she needed to transfer. Similarly, there would be no way for the bank
to ensure that Patron A does not spend that same money on a subsequent transaction.
An updated and trusted ledger, maintained by the bank, is the only mechanism by which
such a transaction can be concluded.
In the above example, the bank or intermediary can easily be replaced with the Bitcoin
blockchain, because it is, fundamentally, a completely transparent ledger of transactions
that records and displays all transfer of value in line with the key properties of blockchains
(discussed later in Section 2.2.5). In a traditional banking system, the bank or interme-
diary would be responsible for reconciling payments and balancing accounts, but this
function is now completely and transparently facilitated in an automated fashion on the
Bitcoin blockchain by willing participants. The main difference between a traditional
banking ledger and the Bitcoin blockchain is that the Bitcoin blockchain is fully auto-
mated and there is no barrier to entry for potential participants.
Blockchains, such as the Bitcoin blockchain, have effectively solved the problems of counterparty risk and settlement risk by becoming a universally trusted ledger of transactions,
controlled by a community of willing, incentivised participants, not a single entity. In
terms of the risk mentioned, counterparty risk is the risk that either party to a transac-
tion will not live up to their contractual obligation. Settlement risk, as noted by Peters and Panayi (2016), is: “the risk that one leg of the transaction may be completed but not
the other”. Through the mechanisms of immutability and transparency, a blockchain-type
system can drastically reduce the need for a centralised third party, like a bank, to carry
counterparty or settlement risk.
Cryptocurrencies, like Bitcoin, are a form of digital currency where the system for ex-
changing the currency is digital and the value of the currency is also digital. Although
the Bitcoin blockchain is the most popular and most used of all such implementations, it is
important to note that not all blockchains need manifest as cryptocurrencies. Blockchains
can be used for a multitude of different applications, some of which are discussed later in
more detail.
2.2. BLOCKCHAINS 18
Witte (2016) notes that blockchain is based on two well-established cryptographic primitives: Public Key Encryption (PKE), or asymmetric cryptography, and cryptographic hash functions. PKE is a very popular and universally used method of cryptography where a message is encrypted and decrypted with different, but related, keys. This differs from more traditional, symmetric encryption where a message is encrypted and decrypted with the same key. Cryptographic hashes, as already discussed, are the output of a deterministic function that takes input of variable length (the pre-image) and produces a unique value of fixed length (the hash) for that specific input. Hashes are computationally infeasible to reverse to determine the pre-image of a given hash.
Readers who are interested in the details of how these properties arise are referred to an authoritative book such as Schneier (1993). For the purposes of this work, it is sufficient to understand that cryptographic hashes have, among others, the following two properties:
• pre-image resistance, and
• collision resistance.
Because a hash function produces a fixed-length output regardless of the length of the input, there exists the possibility that two different input values will produce the same hash: a hash collision. Some hash functions, with a shorter-length output and, subsequently, lower entropy, are more prone to collision than others. Cryptographic hashes that are less likely to suffer from such collisions are said to be more collision resistant. Without collision resistance it would be trivial to create identical hashes for different pre-images, negating the value of cryptographic hashes as a mechanism to verify the integrity of data.
Combining these two basic concepts, PKE and hashing, Nakamoto (2008) proposed the Bitcoin blockchain, upon which all subsequent blockchain implementations to date are based. Figure 2.1 is a simplified visual representation of a blockchain-type system, where every block is dependent on the contents of the previous block.
In Figure 2.1, there is no starting, or genesis, block, but rather a sequence of blocks at some point after the genesis block. It can be seen that one input into a block is the hash of the previous block. To further improve security, this hash is combined with a nonce and some arbitrary data items before it is once again hashed and provided as input to the following block. A nonce, as explained by Rogaway (2004) in the context of cryptography, is simply a value used once for a particular message or operation. Schneier (1993) further noted that a nonce is usually a random value. By chaining blocks together like this, it is possible to verify the data within them, as any change in the data will result in a change of the hash, which will necessarily cascade down the chain, changing all subsequent block hash values.
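This chaining principle can be illustrated with a deliberately simplified Python sketch (it omits Merkle trees, Proof of Work and all real Bitcoin block fields; the payloads are arbitrary examples). Each block's hash is computed over the previous block's hash, a nonce and the block's data, so a change to any historical block changes every subsequent hash:

    import hashlib
    import os

    def block_hash(prev_hash: bytes, nonce: bytes, data: bytes) -> bytes:
        # Each block commits to the hash of the block before it.
        return hashlib.sha256(prev_hash + nonce + data).digest()

    chain = []
    prev = b"\x00" * 32  # stand-in reference for the genesis block
    for payload in [b"block one data", b"block two data", b"block three data"]:
        prev = block_hash(prev, os.urandom(8), payload)
        chain.append(prev)

    print("tip of chain:", chain[-1].hex())
    # Altering the data of block one would change its hash, which is an input
    # to block two, whose hash is an input to block three: the change cascades.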
Introduced to modern cryptography and computer science by Merkle (1980) in the early 1980s, Merkle Hash Trees (MHTs) are another cryptographic primitive that makes blockchain technology possible and practical. The initial use case for the MHT, as is clear from Merkle (1980) and the associated patent filing, was to facilitate the proof of a digital signature for the purpose of authentication. Again, the use case in blockchain technology is slightly different but rooted in the same principles.
MHTs rely heavily on hashing for their function and value. The broad purpose of an MHT is to make the validation of data more efficient, by providing a way for large amounts of data to be validated against a single hash value without having to rehash all the data. It is often used in peer-to-peer services and protocols to facilitate the validation of data.
An MHT consists of three types of elements:

1. The root, also called the Merkle Root (MR), of which there is only one per tree

2. The nodes, also referred to as Child Nodes (H), of which there must be at least two; theoretically, there is no maximum number of Child Nodes per tree

3. The leaves (L), of which there must be at least two; theoretically, there is no maximum number of leaves per tree
Figure 2.2 shows a basic example of an MHT with four leaves, six nodes and a root. For the purpose of explanation, the four leaves would be the raw data needing to be verified. This data is not included in the tree but serves as the basis of its creation. Theoretically, there can be an infinite number of leaves, but the number of leaves is usually limited to avoid long-running computation on the tree. One level up (level MR-2) there are the nodes, H1 to H4, which are hashes of the respective leaves (L1 to L4). It is essential to note that these nodes are hashes (one-way functions) of the leaves but that the actual hash algorithm is not stipulated. Each use case may call for different hash algorithms, based on the preference for speed over security, or vice versa. In the Bitcoin implementation, and other implementations where the security of the hash values (their resistance to collision) is important, hash algorithms like SHA256 are used. One level up (MR-1) are the secondary nodes, each of which consists of the hash of the concatenation (Hxy = Hx ‖ Hy) of its children on MR-2. Finally, on the very top level is the MR which, like the nodes below it, is a hash of its concatenated children. It is considered the root as it is a single hash that incorporates elements of all the leaves. In this way, a seemingly insignificant change in a single leaf will propagate up the tree and result in a changed MR. It is clear that the MR can be used to verify the integrity of all of the leaves, independently or as a whole; therein lies the power of the MHT as a mechanism for verification.
Figure 2.2 is an example of a binary MHT, i.e., a tree where each node has at most two children. MHTs can also be non-binary, where each node can have more than two children. The MHTs that are used to validate transactions in the Bitcoin blockchain are binary trees.
Binary MHTs can be asymmetric, meaning that there is an uneven number of leaves, as can be seen in Figure 2.3. Due to the nature of binary MHTs, the computation to get to the root is slightly adjusted where necessary, by duplicating the leaf or node to fill the spot of the missing node. By following this basic rule, computations on the tree are predictable and standardised.
Binary MHTs are only valuable if operations can be performed on them. The operations supported by binary MHTs include:
1. Search
2. Insert
3. Delete
4. Traverse
Calculating the MR is also sometimes referred to as ‘collapsing the tree’ as it reduces the
various leaves into a single root hash value.
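The 'collapsing' computation can be sketched in a few lines of Python. The sketch below uses SHA256 (the algorithm is interchangeable, as noted above) and applies the missing-sibling duplication rule described earlier when a level holds an odd number of hashes; the leaf values are arbitrary examples:

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(leaves: list[bytes]) -> bytes:
        # Level MR-2: hash each leaf's raw data.
        level = [h(leaf) for leaf in leaves]
        while len(level) > 1:
            if len(level) % 2 == 1:
                level.append(level[-1])  # duplicate the hash that lacks a sibling
            # Each parent is the hash of its two concatenated children.
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]  # the single remaining hash is the MR

    print("MR:", merkle_root([b"L1", b"L2", b"L3", b"L4"]).hex())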
Due to the nature of MHTs, the performance of these operations varies; this is important because quick lookups are essential to the use of MHTs in most applications, including Bitcoin. Brilliant.org (2015) gives an overview of the complexity (Big O notation) of certain operations on an MHT, as seen in Table 2.1, where the branching factor (the number of children of each node) is denoted by k for non-binary trees with n nodes; the values given are amortised estimates that average out the worst operations over time. As trees grow in size and complexity, so too does the complexity of operations on those trees. By using MHTs, a large amount of arbitrary data can be hashed into a single MR hash. To verify any leaf of the tree, only its original data, the hashes on its path and the root hash need to be known. This means that not all the leaves need to be present to be able to validate the integrity of a single leaf, thereby allowing MHTs to save on storage space and data transfer.
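This verification property can be demonstrated by extending the previous sketch: given only a leaf's raw data, the hashes on its path (with an indication of whether each sibling sits to the left or the right) and the trusted MR, the leaf can be validated without any of the other leaves being present. The helper below is a sketch under the same assumptions as before:

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def verify_leaf(leaf, path, root):
        # Recompute the path from the leaf up to the root.
        current = h(leaf)
        for side, sibling in path:
            current = h(sibling + current) if side == "left" else h(current + sibling)
        return current == root

    # For the four-leaf tree of Figure 2.2, proving L1 needs only H2 and H34.
    h1, h2, h3, h4 = h(b"L1"), h(b"L2"), h(b"L3"), h(b"L4")
    mr = h(h(h1 + h2) + h(h3 + h4))
    path_for_l1 = [("right", h2), ("right", h(h3 + h4))]
    print(verify_leaf(b"L1", path_for_l1, mr))  # True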
Blocks
Bitcoin blocks are collections of structured data that form a fundamental part of the
ledger. A block can be separated into two main components: a header and a body (also
called the payload). The head of a block contains some reference data, including the
block structure version and a reference to the previous block, while the payload of a block
contains transaction data. The exact structure of a block can be seen in Table 2.2.
nVersion, one of the more notable fields in this discussion, is an indication of the block structure version. This field is necessary as the block structure might have changed over time, and block parsing systems will need to know the version of the block to ensure compatibility.
HashPrevBlock is, as the name indicates, a hash of the header of the preceding block. The hash is a double-SHA256 (dSHA256) hash, where dSHA256(x) = SHA256(SHA256(x)), computed over the concatenated content of the previous block header:

dSHA256(nVersion ‖ HashPrevBlock ‖ HashMerkleRoot ‖ nTime ‖ nBits ‖ nNonce)
It is this process, the hash referencing the header of the previous block, that facilitates the formation of a chain. By having a HashPrevBlock in each block, apart from the very first block, a chain is formed that indicates in which order blocks were incorporated into the chain, as can be seen in Figure 2.4.
HashMerkleRoot is the root of the MHT computed over all the transactions included in
vtx[ ], using dSHA256.
nTime is the time in UNIX format of the creation of the block in question.
nBits stores the target value, denoted as T, used in Proof of Work (PoW) calculation.
nNonce is another component of the PoW puzzle, and is a simple nonce that can be
used during the PoW calculation as a source of randomness to reach the target T.
nNonce is explained in more detail in the section on PoW.
cnt vtx is the total number of transactions included in the block in the vtx[ ] array.
At the time of writing (August 2017), a Bitcoin block can be up to 1 MB (1 000 000 bytes) in size, but no larger. Blocks larger than 1 000 000 bytes are considered invalid and will not be accepted by the network. As can be seen from the size allocations in Table 2.2, the header data for a block (all non-transaction data) can be up to 80 bytes in size, leaving the vast majority (999 920 bytes) for transaction data.
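How a block header hash is computed can be sketched as follows: the six header fields are packed into the 80-byte serialisation (4 + 32 + 32 + 4 + 4 + 4 bytes, little-endian) and double-SHA256 hashed. The field values below are placeholders, not a real block:

    import hashlib
    import struct

    def dsha256(data: bytes) -> bytes:
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    # Placeholder header field values; a real block would carry real ones.
    n_version = 1
    hash_prev_block = b"\x00" * 32
    hash_merkle_root = b"\x11" * 32
    n_time = 1503000000       # UNIX timestamp
    n_bits = 0x1d00ffff       # compact encoding of the target T
    n_nonce = 0

    header = (struct.pack("<I", n_version) + hash_prev_block + hash_merkle_root
              + struct.pack("<III", n_time, n_bits, n_nonce))
    assert len(header) == 80  # the 80-byte header noted above

    # This digest becomes the HashPrevBlock of the next block.
    print(dsha256(header)[::-1].hex())  # conventionally displayed byte-reversed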
Transactions
Bitcoin transactions are collections of reference data; inputs and outputs specifying,
amongst other things, the source and destination of the transaction as well as the value
of the transaction. A regular Bitcoin transaction structure can be seen in Table 2.3
nVersion, as in the case of a block, indicates the version of the transaction structure (which may change over time).

Another notable field in the transaction structure is scriptSig, which carries the data, typically a signature, that satisfies the spending conditions attached to the output being consumed. The scripting language built into the Bitcoin protocol, called Script, is deliberately not Turing complete.

Similarly, scriptPubKey is another instance of Script that specifies the conditions under which the output of the transaction can be claimed. These fields are primarily used for embedding data, as they are completely under the control of the user and can, compared to other fields, store relatively large amounts of data, according to Okupski (2015).
Bitcoin addresses are unique alphanumeric strings of characters that are used to identify the source or destination of a Bitcoin transaction. These addresses can be 27-34 characters in length and are constructed through a deterministic function with a variety of inputs. The varied length of addresses is a result of the final address being encoded using the reduced base58 character set. Base58 is used as the encoding of choice since it removes some of the ambiguous characters from the base64 character set, like 0 (zero) and O (uppercase o), to avoid ambiguity in addresses and thus instances of erroneous transfers. At the time of writing, two types of Bitcoin addresses exist: Pay-to-PubkeyHash (P2PKH) and Pay-to-ScriptHash (P2SH).
In the case of P2PKH, the basis of the address is the hash of the public key portion of the public-private Elliptic Curve Digital Signature Algorithm (ECDSA) keypair associated with a user or their wallet, a secure digital storage mechanism for Bitcoin. The alphanumeric address is constructed by computing the SHA256 hash of the public key, followed by the RIPEMD160 hash of the result, to produce the pubKeyHash. A version byte is then prepended to pubKeyHash. Once the version byte has been prepended, a checksum is calculated over the concatenation of the version byte and pubKeyHash by performing a dSHA256 hash over it and truncating the result to the first four bytes. Finally, the checksum is appended to pubKeyHash and the result is encoded using the base58 character set, resulting in the final P2PKH address.

The process for P2SH addresses is similar but uses the value of the redemption script located in the scriptPubkey field as the initial input, rather than the hash of the public key as in P2PKH. The process from that point onwards is exactly as detailed for P2PKH and results in the final base58-encoded P2SH address.
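The P2PKH construction just described can be sketched as below; the public key is a placeholder value, the alphabet is Bitcoin's standard base58 character set, and error handling is omitted (note that the ripemd160 algorithm must be available in the local OpenSSL build):

    import hashlib

    B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    def base58_encode(data: bytes) -> str:
        n = int.from_bytes(data, "big")
        out = ""
        while n > 0:
            n, rem = divmod(n, 58)
            out = B58[rem] + out
        # Each leading zero byte is conventionally encoded as '1'.
        return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out

    def p2pkh_address(pubkey: bytes) -> str:
        # SHA256, then RIPEMD160, of the public key yields pubKeyHash.
        pubkey_hash = hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()
        versioned = b"\x00" + pubkey_hash  # version byte prepended
        # Checksum: first four bytes of dSHA256 over the versioned hash.
        checksum = hashlib.sha256(hashlib.sha256(versioned).digest()).digest()[:4]
        return base58_encode(versioned + checksum)

    print(p2pkh_address(b"\x02" + b"\x01" * 32))  # placeholder 33-byte public key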
Aside from the two Bitcoin address types discussed, the Bitcoin protocol also defines coinbase transactions. These transactions do not facilitate the transfer of value between participants in the network; they are used exclusively to pay the block reward, along with the fees of the transactions included in the block, to the miner that found the block.
Chain
Nodes that process transactions in the Bitcoin network are referred to as miners, and their function is to:

1. collect and validate transactions that have been broadcast to the network,

2. add those transactions to the block structure (as defined in Table 2.2), and then

3. solve the PoW puzzle over the contents of that block.
Once this puzzle is solved, the miner broadcasts the proof along with the block, and other
miners then proceed to verify that proof. If the proof is accepted, the block is added to the
chain as the most recent block and the miner is rewarded for the work it has completed.
Once a new block is added to the chain, the hash of its header is used by the miners in
the network to create a new block, add waiting transactions to that block structure, and
repeat the whole process.
This ongoing work results in the chain depicted in Figure 2.5. Due to the distributed nature of the system, where many nodes compete to solve the PoW puzzle, it occasionally happens that more than one miner solves the PoW for different blocks at the same time. When this happens, it results in a fork in the chain; each node will then accept the first proof it receives as the correct one and build the chain from that block. When this happens, the rejected block is called an orphaned block, depicted in Figure 2.5 as the block with dotted borders. However, transactions that were part of these orphaned blocks are not lost but are instead rebroadcast to the network for inclusion in the next block. Miners always work on the longest chain, which implies the chain on which the most computational effort was exerted. This is to ensure that there is consensus around which chain is the correct chain, and to prevent malicious nodes from altering previous blocks to create an alternative chain.
As discussed, a fork in the blockchain can occur naturally as a matter of chance, or it can be an induced fork. Induced forks occur when the majority of nodes agree to reprocess a previous block to create an alternative chain, invalidating the previous chain. They are not a common occurrence but are usually the subject of controversy in the community of Bitcoin nodes (Redman, 2017).
Bitcoin, like forms of currency backed by commodities and resources, can suffer the effects of inflation should it be overproduced. Since Bitcoin is completely digital, there needs to be a mechanism to regulate the amount of Bitcoin released into the system. If Bitcoins were trivial to create, they would have little to no value as a store of value, since any person could simply create vast amounts of the currency. To combat the effects of inflation, Bitcoin is designed to be difficult to create through controlled supply, which is enforced in two ways: by having a finite supply of Bitcoin, and by regulating the rate at which new Bitcoins can be mined. The Bitcoin generation algorithm defines at what rate currency can be created, and any currency generated by violating these rules will be rejected by the network.
The reward given to the miners that solve the PoW puzzle, measured in units of Bitcoin currency (BTC), is how new Bitcoins are introduced into the system. Rewards are variable, diminishing geometrically over time, so that the rate at which new Bitcoins are created steadily decreases even as the overall mining power of the network increases. This diminishing reward for finding a block is called the block reward halving, and occurs every 210 000 blocks. This means that after every 210 000 blocks, the reward for finding a block is reduced by 50%. In the initial Bitcoin blocks, the reward was 50BTC. At the time of writing, the reward halving has happened twice and the block reward has reduced to 12.5BTC. By strictly applying this reward halving, the creation of new Bitcoins will effectively stop after 64 halving operations, as the reward for finding a block will be less than the smallest unit of Bitcoin, a Satoshi. Estimates of when this point will be reached vary, as it is based on factors such as mining power and technological advances; some sources, however, estimate the year 2140 (Bitcoinwiki, 2014).
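The halving schedule is simple to express in code. The sketch below mirrors the rule just described (integer Satoshi arithmetic, a 210 000-block halving interval and a hard stop after 64 halvings); it is illustrative rather than a reproduction of any particular client's source:

    COIN = 100_000_000          # Satoshis per BTC
    HALVING_INTERVAL = 210_000  # blocks between reward halvings

    def block_subsidy(height: int) -> int:
        """Block reward, in Satoshis, at a given block height."""
        halvings = height // HALVING_INTERVAL
        if halvings >= 64:
            return 0  # creation of new Bitcoins has effectively stopped
        return (50 * COIN) >> halvings  # halve the initial 50 BTC reward

    print(block_subsidy(0) / COIN)        # 50.0 BTC in the initial blocks
    print(block_subsidy(450_000) / COIN)  # 12.5 BTC after two halvings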
Proof of Work

PoW is another important component of controlled supply, as it ensures that the difficulty of finding a block can be adjusted to compensate for fluctuations in the network's aggregate mining power. By adjusting the difficulty every 2016 blocks, according to rules applied deterministically by all participating miners, the network can respond to fluctuations in mining power and ensure that blocks are released, on average, every 10 minutes. The PoW puzzle implemented by Nakamoto (2008) was based on the hashcash system developed by Back (2002). As the mining power of the network increases, the difficulty of the PoW puzzle is adjusted to slow the rate of block creation. The PoW puzzle difficulty is defined by the target T, stored in the nBits field (Table 2.2) of a block header.
To solve a PoW puzzle, a miner must calculate a dSHA256 hash H over the contents of the block so that it is smaller than or equal to the target hash T such that:

H ≤ T
As discussed before, hash functions are deterministic, and an input i will always result in the same output o when passed through the same function dSHA256:

H = dSHA256(i) = o
In order for a miner to generate variable hashes to satisfy the condition H ≤ T over the
static contents of a block i, it needs to incorporate other random data n in the form of a
controllable nonce into the input of the function so that:
H = dSHA256(i ‖ n) = o_n
Since the miner cannot change the contents of i, it varies the value of n, concatenates i and n, performs the hash function, and evaluates the result against the condition H ≤ T. If this condition is met, the puzzle has been solved, and the miner broadcasts this block, accompanied by the proof, to the network for confirmation.
PoW difficulty is adjusted by requiring that the target T starts with a certain predetermined number of zeros. The more zeros required, the smaller T becomes and the more hashing operations the miner has to perform, on average, in order to find a value of H that satisfies the condition H ≤ T.
By solving this PoW, the miner proves that it has invested an approximate amount of effort at its own cost toward finding the block, and that it is a willing and conforming participant in the network.
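The nonce search described above can be demonstrated with a toy Python miner. This is a deliberately simplified sketch - real mining operates on the 80-byte block header and a far harder target - but it performs the same dSHA256-and-compare loop:

import hashlib

def dsha256(data):
    # Double SHA-256, the hash function used by Bitcoin's PoW
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(block_contents, target):
    # Vary the nonce n until H = dSHA256(i || n) satisfies H <= T
    nonce = 0
    while True:
        h = dsha256(block_contents + nonce.to_bytes(8, 'little'))
        if int.from_bytes(h, 'big') <= target:
            return nonce, h
        nonce += 1

target = 1 << 240  # a deliberately easy target; real targets are far smaller
nonce, h = mine(b'example block contents', target)
print(nonce, h.hex())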
Blockchain technology, as known and implemented today, is only useful because of a few
key properties which make the concept and practical execution possible, namely:
1. Immutability,
2. Chronology,
3. Redundancy, and
4. Transparency.
Immutability, the lack of ability to be changed, is arguably one of the most important properties of blockchain systems. Immutability is not a property on the macro level - as the chain is constantly changing and expanding when new blocks are added - but rather on a more granular level, as data and transactions that are embedded in the blocks are unchangeable. Witte (2016) highlights that this immutability is conditional and strengthens over time as a consequence of the design of the system. As newer blocks form on top of older blocks, the block depth increases and the ability to change data embedded in that block diminishes. Any entity that wishes to change some data within a block would have to change the data in that block and recompute that block and all subsequent blocks faster than all the other nodes in the network can. It would therefore be theoretically possible for multiple nodes to collude to change some data, but this type of collusion is unlikely and inherently detectable. According to Witte (2016), in the Bitcoin blockchain, the current block depth required to guarantee a permanent and unchangeable transaction is six blocks. This immutability means that the public ledger record cannot be altered to reflect a false or fabricated transaction, and can thus be trusted. The immutability of the information embedded in the blockchain means that, to any observer or participant in the public ledger, all information can be considered secure, unchangeable and a true record of data and transactions over time.
Transparency is the final of the four core blockchain properties, and is more of a functional requirement than a design consequence. Considering the prevailing application of blockchain-based systems similar to Bitcoin, it is obvious that for the system to work, all transactions need to be broadcast openly to any entity willing to listen. Apart from the broadcasting of transactions, the information embedded inside the ledger should also be open for all to see and verify. Transparency is fundamentally necessary for the system to function and it cannot be changed. In the case of the Bitcoin blockchain, no balances are stored, only transactions. So, in order to calculate the balance of a specific address, all the transactions in and out of that address need to be visible. Transparency of the technology and the information processed by the system also increases trust in the system.
Malkovský (2015) notes that not only is it beneficial to have the data in the blockchain transparent, but the transparency of the protocol also enables trust in the system.
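This transaction-only design can be illustrated with a small sketch. The ledger below is invented for demonstration, and it uses a simplified account-style view (Bitcoin proper tracks unspent transaction outputs), but the principle - deriving balances from openly visible transactions - is the same:

ledger = [
    {'sender': 'coinbase', 'receiver': 'addrA', 'amount': 50},
    {'sender': 'addrA', 'receiver': 'addrB', 'amount': 20},
    {'sender': 'addrB', 'receiver': 'addrA', 'amount': 5},
]

def balance(address):
    # Every transaction must be openly visible for the balance to be computable
    received = sum(tx['amount'] for tx in ledger if tx['receiver'] == address)
    spent = sum(tx['amount'] for tx in ledger if tx['sender'] == address)
    return received - spent

print(balance('addrA'))  # 35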
2.2.6 Weaknesses
Blockchain technology is not a universal solution to all technology problems. Even though
there is a plethora of potential applications for the technology, not all of them are good
or even practical. Below is an overview of why blockchain technology cannot solve all
problems and what some of its weaknesses are.
The first and most notable weakness in a blockchain-based system is rooted, quite ironically, in trust, the very problem it tries to solve. The weakness comes in the form of a theoretical attack, called the 51% Attack. Witte (2016) notes that the number, 51, in the name has little relevance to an actual attack, but rather serves to illustrate the nature of the problem. The 51% Attack would occur if a majority of participating nodes in a blockchain system colluded to manipulate the addition of new blocks to the chain. Because the system is designed to achieve consensus through computational power, if a majority of nodes with a significant portion of the entire network's computational power colluded, they could theoretically have a small time window in which to manipulate recent blocks to their advantage. The attack is theoretical and is mitigated through increasing block depth, i.e., as subsequent blocks are added on top of a block, it becomes more and more impenetrable. Witte (2016) notes that the 51% Attack creates a problem of speed rather than security, as users must wait for a block to be embedded deeper in the chain before fully trusting its contents.
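The relationship between block depth and security can be quantified using the attacker catch-up probability derived in the original Bitcoin paper (Nakamoto, 2008). The Python sketch below implements that published formula; the parameter values chosen here are illustrative only:

import math

def attacker_success(q, z):
    # q: attacker's share of total hashing power; z: confirmation depth
    p = 1.0 - q
    lam = z * (q / p)
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        prob -= poisson * (1 - (q / p) ** (z - k))
    return prob

# With 10% of the network's hashing power, six confirmations leave an
# attacker with well under a 0.1% chance of rewriting the block
print(attacker_success(0.10, 6))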
Speed is another potential weakness. As alluded to before, for a block to be considered safe it needs to be a few levels deep in the chain, and this processing takes time. The exact amount of time is dependent on design factors in the blockchain implementation. For example, the average block release time differs significantly between Bitcoin and Litecoin according to Volf (2016), with Bitcoin at 10 minutes and Litecoin at 2.5 minutes. Litecoin will be discussed in more detail later, but in short it is a Bitcoin spin-off with the specific aim of shorter transaction confirmation times. Litecoin aims to achieve this speed by releasing blocks more often than Bitcoin does, broadcasting the necessary block data to all participating nodes so that they can start mining the next block. Speed is not a problem that is easily addressed in blockchain-based systems, as their fundamental design necessitates a certain level of computational effort to establish distributed consensus through mechanisms such as PoW. Blockchain system speed has also been the topic of much prior research, as can be seen in Kiayias and Panagiotakos (2015), where they noted the conflicting relationship between PoW (security) and speed.
Directly related to the issue of speed is the issue of size, or rather the lack thereof. In Nakamoto's original Bitcoin blockchain design there was some consideration given to minimising the size of the blockchain, since a complete copy would need to be stored by each node in the network. It is also the case that, due to its nature, a blockchain can only ever grow and never have information removed from it. The challenge of space has meant that significant work has been done on best utilising the available space on a blockchain, notably by Okupski (2015), in which the available space on the Bitcoin blockchain for embedding arbitrary data was analysed and maximised through novel techniques. The purpose of this was to utilise the immutable nature of the Bitcoin blockchain to facilitate the development of an anti-censorship tool, embedding messages on the blockchain and making them impossible to alter or remove. The distributed nature of a blockchain-based system requires that the data exists as redundant copies, and for that to be true, there needs to be a focus on the frugal use of space in these systems. It is important to note that there is no theoretical limit on the size of the entire blockchain, but rather on the size of individual blocks. Theoretically, the blockchain has the potential to store an infinite amount of data, but that data would have to be spread over an infinite number of individual blocks of finite size. This finite size of individual blocks gives rise to an impracticality when using a blockchain simply as a mass storage mechanism.
Bitcoin, as discussed, is the first and most popular implementation of a practical blockchain-based system. Even though the Bitcoin blockchain initiated the concept, it is no longer the only blockchain system. The explosive growth and popularity of the blockchain concept gave rise to a complete ecosystem of alternative blockchain implementations. Some of these alternative digital currencies attempt to solve the shortcomings of Bitcoin, whereas others diverge completely from the digital currency use case for blockchains. To date there have been numerous implementations and usages of blockchain technology, as is obvious from titles like “101 TOP BLOCKCHAIN COMPANIES” in Rampton (2016). The following is a short overview of some of the major blockchain systems, apart from Bitcoin, that have driven the technology and its adoption to new heights.
One of the first successful blockchain implementations other than Bitcoin was an alternative to Bitcoin. Litecoin was, and remains, a very close copy of Bitcoin, both in terms of technology and purpose, with only a few subtle differences. Litecoin was created to address some of the practical issues of Bitcoin, primarily the slow transaction times. The Litecoin website at Litecoin Project (2017) describes Litecoin as: “...a peer-to-peer Internet currency that enables instant, near-zero cost payments ... Mathematics secures the network and empowers individuals to control their own finances. Litecoin features faster transaction confirmation times and improved storage efficiency than the leading math-based currency. With substantial industry support, trade volume and liquidity, Litecoin is a proven medium of commerce complementary to Bitcoin”. Litecoin, despite its faster transaction confirmation and reduced size, did not make Bitcoin obsolete and, although still in existence, has fallen in popularity due to a decreasing market cap. Despite this, Litecoin is significant as it was the first blockchain implementation that attempted, and arguably succeeded, in addressing some of the issues with Nakamoto's initial design. To be clear, Litecoin built on the basic principles Nakamoto posited, but improves speed by tweaking certain parameters, such as the time between block creation and the overall size of blocks. By releasing blocks more frequently, Litecoin is able to process more transactions per second than Bitcoin.
Another notable alternative is Zcash, a cryptocurrency that extends the Bitcoin design with strong privacy guarantees based on a form of zero-knowledge proof known as a zero-knowledge Succinct Non-interactive ARgument of Knowledge (zk-SNARK). Not only does Zcash allow shielded transactions - transactions of which the content is encrypted - but it also allows normal public payments like the Bitcoin blockchain. Zcash is a promising new addition to the cryptocurrency ecosystem and illustrates how blockchain technology might be improved upon.
Among the most notable blockchain technologies that depart from the most common use case of such technology is Ethereum. As described by Ethereum Foundation (2016):
“Ethereum is a decentralized platform that runs smart contracts: applications that run
exactly as programmed without any possibility of downtime, censorship, fraud or third
party interference”. Or, more concisely, Ethereum is a generalised platform for computing
based on the blockchain concept. Wood (2014) notes: “Ethereum is a project which
attempts to build the generalised technology; technology on which all transaction based
state machine concepts may be built. Moreover, it aims to provide to the end-developer
a tightly integrated end-to-end system for building software on a hitherto unexplored
compute paradigm in the mainstream: a trustful object messaging compute framework.”.
Whereas Bitcoin, Litecoin, and Zcash utilise blockchain technology for a very specific use case, Ethereum is abstracted and presents a platform for implementing solutions for a vast array of different use cases. One of its most notable use cases is the concept of smart contracts. These smart contracts are written and embedded into the Ethereum blockchain, ensuring the terms of the contract are immutable and the execution, given the right circumstances, is guaranteed.
Using Ethereum and its powerful, built-in, Turing-complete scripting capabilities, entities such as the Distributed Autonomous Organization (DAO) were brought into being. According to del Castillo (2016), the DAO is a distributed, leaderless organisation built on top of the Ethereum blockchain. Its purpose is to serve as a vehicle for supporting Ethereum-related projects. A participant in the DAO can be seen as a stockholder in a traditional organisation, and gets to exercise their vote on which projects should be funded. In fact, the DAO itself was built on a set of smart contracts that are embedded in the Ethereum blockchain. According to Delmolino, Arnett, Kosba, Miller, and Shi (2016), smart contracts are user-defined programs that stipulate a set of rules to govern transactions that are enforced by a network of peers.
Although the DAO remains popular, it has suffered a number of setbacks, one of which was a hack that threatened to invalidate the whole system and the principle it was based upon. As Finley (2016) notes, the DAO was the victim of an attack that exploited an error built into the smart contract governing the DAO. The attack was able to drain large sums of digital currency from the DAO. Not only was the attack very effective, it was also a great point of contention, as the purveyors of the DAO performed a fork of the blockchain in order to reclaim some of the lost funds. A blockchain fork (or forking) refers to the action of choosing a point on the blockchain, prior to some unwanted action, and then creating an alternative chain that is processed from that point onward. Forking a blockchain is a non-trivial task and involves consensus from a majority of the nodes processing the blockchain. By forking a chain, the community of nodes effectively has the power to ‘go back in time’, undoing some unwanted action and processing ahead as if that action never occurred; creating an alternative timeline. It is important to note that the act of forking the chain does not erase the record of the unwanted action, but rather works around it by creating an alternative chain. This was very contentious, since the DAO was built on the principle that the smart contract is the only law that matters, but when the smart contract was found to be flawed, that stance was effectively abandoned in order to save the DAO members from losing money. As one headline stated: “A $50 Million Hack Just Showed That the DAO Was All Too Human” (Finley, 2016). Because of the versatility of the Ethereum platform, there are many other systems and applications built on top of it, and even more are being planned.
Despite the seemingly devastating effects and monetary loss associated with the DAO hack, the fundamental security of blockchain technology was never compromised. The DAO hack took advantage of a logic implementation error in a system built on top of the Ethereum blockchain. It did, however, illustrate the immutable nature of the blockchain and how a hard fork was necessary to recover funds. Even after such a hard fork, the offending transactions were not deleted but remain in an immutable state in an abandoned fork of the Ethereum blockchain for all to see.
The understanding of blockchain technology, its strengths, weaknesses, and possible applications is slowly increasing (Marc, 2016) in technological and non-technological circles. Although it is clear that the application of blockchain technology stretches far beyond digital currency, financial applications are currently the main driver for the adoption of the technology. The financial sector worldwide has taken note of blockchain technology, with many large financial institutions, such as the Bank of England, actively investing in, researching, and developing the technology (Barrdear and Kumhof, 2016). Blockchain has also not gone unnoticed by governments around the globe (Curry, 2016).
As noted by Filippi (2013), Bitcoin, and blockchains in general, have been the subject of various legislative attempts in the EU. Filippi (2013) also notes that the development of such legislation is driven primarily by the threat cryptocurrencies pose to the financial systems and governments of the world. As a measure of the impact of the technology and its applications, legislative intervention is a very convincing metric. A recent bill, noted by Arizona State Legislature, Fifty-third Legislature, First Regular Session (2017), was passed in the state of Arizona in the United States of America, giving legally binding status to smart contracts and blockchain signatures. This is a significant step forward in the future adoption and acceptance of blockchain-based technologies.
Both digital forensics and blockchains are concepts that are fairly new to modern computer science, with the bulk of their development and innovation happening after the turn of the century. In fact, as discussed, blockchain technology only really came into being around 2008 and only recently started drawing widespread attention. The technologies behind digital forensics and blockchains are heavily rooted in modern cryptography, though they pursue very different goals. What the underlying cryptography provides, to both, is a mathematical instrument for reasonable assurances of integrity and trust.
Seeing trust and integrity as a common base for many of the functions within digital forensics and blockchain technology enables an appreciation of how these two seemingly unrelated concepts can be married. It also provides perspective and the ability to develop use cases where the properties of blockchain stand to benefit digital forensics as well as other areas of computer science.
Having discussed blockchain technology and digital forensics, exploring some of the works that demonstrate this merger of concepts will provide further understanding of the versatility of blockchain technology. Apropos to that is a discussion of previous work that, to varying degrees, explores the application of the trust and integrity provided by blockchain in digital forensics and other fields.
This work centres on using the trust, integrity, resilience and immutability provided by the Bitcoin blockchain to augment anti-censorship tools.
Okupski (2015) notes that freedom of speech is a very important and often undermined basic right, with certain corporations and governments actively blocking individuals from exercising this right. Okupski (2015) specifically references the censorship activities of the People's Republic of China when describing these injustices toward free speech.
Okupski (2015) goes on to present Bitcoin and the blockchain as a possible mechanism to circumvent this censorship and suppression of free speech. At first, it might seem odd to make this suggestion, but there are two features of Bitcoin that make it ideally suited to such applications. Firstly, the political and economic nature of Bitcoin makes it incredibly difficult for lawmakers to enforce laws upon. Bitcoin is also difficult to fit into any existing legal framework, making it difficult to enforce outright bans. Secondly, Bitcoin, and the infrastructure on which it is based, is truly global and decentralised. The pseudonymous nature of participation and the decentralisation of infrastructure would make it very difficult for oppressive regimes to enforce restrictions upon it.
Okupski (2015) also establishes that the act of embedding arbitrary data into the Bitcoin blockchain is not new and has been in practice since as early as the genesis block of the Bitcoin blockchain. Data embedded in the blockchain includes various seemingly random messages, images and quotes from political figures such as Nelson Mandela, and even a script that allows users to embed data in the Bitcoin blockchain.
After establishing that embedding data into the Bitcoin blockchain is possible, Okupski (2015) continues to describe, in detail, the Bitcoin protocol to identify elements that can be used for embedding data, thereby improving the efficiency of current techniques. They find that current methods are functional but not very efficient, and that these methods can be improved by using improvements in the evolving Bitcoin protocol. Currently, the data is embedded in the destination address field of a transaction, as this parameter is under the control of the user performing the transaction. A consequence of using this method is that the Bitcoin value associated with that transaction is lost forever (a financial cost), as there is a mathematically insignificant chance that the arbitrary data, now used as the destination address, would correspond to a valid Bitcoin address, let alone an address belonging to the sender. Additionally, as there is limited space available per transaction, multiple transactions are needed to embed data of significant size - this in turn could have a significant cost implication due to the un-spendable transaction outputs. This places two restrictions on the process of embedding data: cost and space.
To counter this, Okupski (2015) developed a new, more efficient method for embedding data, thereby significantly reducing the cost per byte of data. They found that by using the Pay-to-Script-Hash (P2SH) transaction type together with other methods, they could reduce the cost of embedding data to around 16 Satoshi (the smallest denomination of Bitcoin) per embedded byte of data. This translated to approximately $184.21 per Megabyte of embedded data, given the Bitcoin price of $1097.98 per Bitcoin at the time of writing.
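The quoted figure is easy to verify; the following one-off Python calculation, using only the prices quoted above, reproduces it:

sat_per_byte = 16
bytes_per_megabyte = 1024 * 1024
btc_price_usd = 1097.98
cost_per_megabyte = sat_per_byte * bytes_per_megabyte / 100000000 * btc_price_usd
print(round(cost_per_megabyte, 2))  # 184.21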
The result of this research is an application that has the ability to efficiently encode and embed arbitrary data into the Bitcoin blockchain. Okupski (2015) has also released the code for this application publicly in order to further the cause of anti-censorship movements as well as promote future research on the topic.
Unfortunately, although this method significantly improves the efficiency and, by extension, reduces the cost of embedding data into the Bitcoin blockchain, it can still not be considered inexpensive. It is, after all, subject to Bitcoin price fluctuations and may become cheaper or more expensive as time goes on. Even with the improved efficiency, this method does not seem like a viable solution for embedding large amounts of data in the Bitcoin blockchain.
The work done by Okupski (2015) is, however, very informative and showcases how blockchains, in this case the Bitcoin blockchain, can be used for purposes other than those for which they were initially intended, despite the cost involved. Okupski (2015) goes on to highlight that by embedding data into the blockchain, users can prevent the data's destruction and ensure its propagation to all corners of the world. It is clear that Okupski (2015) realised the advantages of the four core blockchain properties - immutability, transparency, chronology and redundancy - and used them to create a use case completely out of the scope of the initial Bitcoin design.
Even though the work in question does not directly refer to any digital forensic practices,
the nature of the problem it is trying to solve resonates with digital forensics in that data
is persisted and protected from tampering. In the use case by Okupski (2015), the data
that is persisted might be messages and important political statements, but in the case
of digital forensics the data could very well be evidence or signatures of evidence.
This work, also by the author of this paper, introduces the idea of marrying the fields of digital forensics and blockchain, and informs this dissertation, which itself addresses some of the potential applications noted in Weilbach (2014).
Weilbach (2014) starts by introducing the concept of digital forensic readiness and linking it to technologies such as Intrusion Detection Systems (IDS). He then goes on to highlight the necessity of having legally admissible evidence to facilitate prosecution and further the cause of digital forensics in general. In this work, Weilbach (2014) proposed a framework that guides the use of blockchain technology from the perspective of storing and retrieving data associated with the forensic process. Weilbach (2014) also noted the value of having an immutable store for such mission-critical information.
Weilbach (2014) focused on defining the problem and then setting out to perform a comprehensive literature study to determine the feasibility and practicality of the proposed solution. The scope of the work presented by Weilbach (2014) is arguably slightly flawed, as is apparent from the following problem statement: “The research problem therefore, is that current logging efforts in support of IDS and Digital Forensic Readiness (DFR) lack a concurrent, secure and standardised means to communicate and store mission critical evidence”. It is clear from this statement that the work is framed as improving IDS and logging capabilities only. Although not a misguided endeavour, it does somewhat limit the application of the technology to a very specific use case. The concept introduced could be more valuable if applied more generically and broadly.
Through conducting an extensive literature study, Weilbach (2014) identified that the area of study is very niche and that, at the time, there were not many academic works on the topic. This scarcity of resources has since changed with the growing popularity of blockchain and the research efforts associated with it. The related work of this very study is testament to the amount of research that has been conducted in the relevant field.
Weilbach (2014) proposed a framework and defined a basic protocol for implementing such a framework using the necessary technology. The protocol draws on related work in the form of logging standards and practices to define its own requirements. Although rudimentary, the protocol does advance the aim of having the solution standardised and more widely accepted.
Beel et al. (2016) takes the concept of digital evidence and blockchains a step further by
outlining and implementing an application for mobile phones that allows the user to store
the signature of a video on the Bitcoin blockchain.
In the following extract it is easy to recognise how the work is very relevant to the conflation of blockchain and digital evidence: “The ability to verify the integrity of video files is important for consumer and business applications alike. Especially if video files are to be used as evidence in court, the ability to prove that a file existed in a certain state at a specific time and was not altered since is crucial. This paper proposes the use of blockchain technology to secure and verify the integrity of video files”. What is notable here is the reference to the admissibility of evidence in a legal context - also highlighted by Weilbach (2014). This solution is clearly geared toward a completely different use case, but with the same purpose in mind - validating the integrity of digital evidence.
Beel et al. (2016) reinforces the importance of having evidence of which the authenticity
can be proven without a doubt. They continued by proposing a narrowly scoped solution
for securely timestamping the digital signature of a piece of video evidence in the Bitcoin
blockchain. In their work, Beel et al. (2016) also noted the versatility of blockchain
technology and how it can potentially be applied to any domain where: “...a trustless
[sic], anonymous, and tamperproof means of recordkeeping is required”.
The timestamping of evidence, its history, and its current applications are discussed in some depth to provide the reader with the necessary background to recognise the application of blockchain technology in this scenario. Beel et al. (2016) also explained the use and usefulness of timestamping from a practical and legal perspective.
Beel et al. (2016) based their work on previous work in the form of a timestamping
service called OriginStamp. This web-based service allows its users to embed hashes of
arbitrary data into the Bitcoin blockchain. The feasibility of providing such a service,
given the estimated high cost of embedding data in the Bitcoin blockchain as seen in
Okupski (2015), is questionable, but Beel et al. (2016) explained that the service is only
possible as it performs one transaction of 1 Satoshi every 24 hours, embedding all hashes
aggregated since the previous transaction.
Beel et al. (2016) proposed and developed a mobile application that has the ability to act as a Decentralised Trusted Timestamping (DTT) solution by leveraging the OriginStamp Application Programming Interface (API). Beel et al. (2016) considered that any blockchain can be used, and that their choice of the Bitcoin blockchain was merely the result of a lack of mature alternatives at the time of creating the DTT service. In the future work section of their paper, there is a discussion of alternative applications of the solution proposed by Beel et al. (2016), ranging from timestamping police body camera footage to CCTV and other aerial footage.
They also address the issue of admissibility in court by noting that, at the time of writing, no precedent had yet been established regarding the obligation of courts to recognise evidence and its validity through DTT. Although the jurisdictions of the two instances certainly differ, the recent ruling that gives legally binding status to smart contracts and blockchain signatures in the American state of Arizona, as noted by Campbell (2017), might indicate a shift in a positive direction for the acceptance of DTT practices around the world.
The alternative application of the technology, as noted in Beel et al. (2016), is recognised in this work, but it is proposed that the application of blockchain technology can be generalised even further to suit a wide array of use cases. The solution by Beel et al. (2016) is elegant and certainly uses the blockchain as intended in this work, but in a more limited scope.
Finally, the aptly titled “Securing Digital Evidence Information in Bitcoin” by Wijaya and Suwarsono (2016) is examined, as it directly addresses some of the concerns shared between the fields of digital forensics and blockchain technology.
Shortly after an introductory Bitcoin explanation, the discussion moves on to tax fraud and tax fraud investigations, where Wijaya and Suwarsono (2016) noted that digital evidence is often part of tax fraud investigations and that procedures exist for ‘borrowing’ this digital evidence from alleged perpetrators. They also noted that a letter describing this ‘borrowing’ of data was prepared and given to both the investigating authority and the subject of the investigation. On the contents of this document, Wijaya and Suwarsono (2016) noted: “The digital data itself will be represented as hash values. The appendices and the official letters are then signed by both parties: the tax investigators and the taxpayers. Both of the parties keep copies of the official letters.”. From this it is clear that a possible use case for blockchain technology is developing.
Wijaya and Suwarsono (2016) continued by highlighting that the creation and distribution of this letter relies heavily on a trust-based system, where there is currently ample opportunity for illegal modification of the data, and that, as yet, no procedure exists for resolving a dispute or discrepancy between the copies of these letters and the data.
Wijaya and Suwarsono (2016) proposed a system where evidence hashes are incorporated
into Bitcoin transactions and then embedded into the blockchain as a method of preserving
and timestamping the evidence. This method would give an observer the ability to verify
that the evidence they possess matches a hash of the original evidence. Wijaya and
Suwarsono (2016) notes that in this use case, the purpose of the embedded hash is simply
to prove that some arbitrary data existed at some point in time.
Wijaya and Suwarsono (2016) then offered a new perspective on the previously mentioned use case by proposing that the two parties both sign the data in question in the pursuit of non-repudiation. This method, unlike others observed before, relies on being able to link an identity to the embedded hash; for that to be possible, both parties need to intrinsically link their identities to the hash by signing it with their public/private key pairs.
The topic of economic feasibility is also addressed by Wijaya and Suwarsono (2016), who calculated the cost of a single such transaction to be 10 000 Satoshi. Since two separate transactions need to be made, the cost of this scheme would be 20 000 Satoshi (approximately USD 0.22 at the time of writing).
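As a quick sanity check, 20 000 Satoshi is 0.0002 BTC; at the price of $1097.98 per Bitcoin quoted in the Okupski (2015) discussion above, this works out to 0.0002 × 1097.98 ≈ USD 0.22, consistent with the figure given by Wijaya and Suwarsono (2016).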
This work yet again highlighted how blockchain technology can be applied to solve problems of trust and integrity across a range of different scenarios.
A particularly relevant alternative application has been maturing over the last few years
since Araoz and Ordano (2013) created the Proof of Existence (PoE) service. This service
is said to have been the pioneer of what is now referred to as blockchain timestamping
services (Wayne et al., 2016).
The PoE service allowed a user to prove that some data existed at a certain point in time by embedding a hash of that data into the Bitcoin blockchain. As the creators of the service noted on their website (Araoz and Ordano, 2013), the service aims to solve three problems:
• Document timestamping
Todd (2016b) explained four relevant use cases for the OpenTimestamps service (OTS), and discussed some of the benefits and challenges associated with these use cases.
The first use case Todd (2016b) explains is very closely aligned with the purpose of this work and relates to record integrity. Specifically, Todd (2016b) notes that having an immutable source of timestamped logging data in the aftermath of a malicious network intrusion can streamline the investigation significantly. With these timestamped records, logs, or backups, it can easily be confirmed whether the data has been altered, and the focus of the investigation can be narrowed to a reduced time window. Todd (2016b) does provide a caveat to that statement by saying that this specific use case for timestamping data would not be beneficial if it is not known when such an intrusion occurred.
A second use case explained by Todd (2016b) is software signing and PGP, where they noted that timestamped software signatures embedded in the Bitcoin blockchain can serve as a historical record against which possibly expired software signing proofs can be validated.
The third use case is evidence authenticity, which again aligns very closely with the purpose of this work. Todd (2016b) explained this use case by way of an example involving a website. If a website's content were hashed and timestamped at a point in time, it could be proven with certainty that that content existed on the website at that point in time. This could then be compared to a hash of archived content to validate the exact content of the website at that time. This has application in the legal realm, as it could be used in lawsuits involving copyright and content distribution.
The fourth and final use case involved ownership. Todd (2016b) specifically noted that the initial motivation for the OTS project was to prove ownership and provenance. By keeping a record of deeds, titles, or sales receipts for high-value goods on the immutable blockchain, ownership and provenance can be verified and tracked over time. Again, Todd (2016b) highlighted a potential shortcoming of this use case when they noted that a timestamped record does not prove the validity of items like sales receipts; it merely limits the scope of potential fraud by giving investigators a narrower timeline to investigate within.
Another use case, as noted by Beel et al. (2016), is the DTT service mentioned in Section 2.3.4. Beel et al. (2016) proposed a scenario where video evidence of a traffic accident or incident is submitted to the DTT service in order to verify its existence and integrity in some future process concerning insurance or legal proceedings. Importantly, Beel et al. (2016) noted that: “Currently, there is no simple, cost-effective and automated method available to consumers to prove that video footage was not tampered with after a specific point in time. If the authenticity of a video file is contested, the status quo requires testimony of witnesses, or the hiring of experts to verify that the digital file has retained its integrity”. This was reinforced shortly thereafter when referencing an incident involving forged satellite imagery and the downing of Malaysia Airlines Flight 17 (MH17) over Ukraine. In this incident, it took almost two years to determine that the footage in question had been tampered with. If DTTs were commonly used and accepted, this detection of tampering could have been near-instant.
Proof of Existence
The service is hosted as a web-based API that can be posted to from a browser or by using a Command Line Interface (CLI) tool. The service was a major step forward in the development of blockchain-based timestamping and notarisation services.
On the basis of the PoE model, similar services were developed, and today there are multiple services with a variety of different offerings and interaction models. Although the overall principle remains the same, these services vary in terms of size, cost, and performance, all of which have since been greatly improved upon. Additionally, as a result of the popularisation of such services, a protocol for interacting with blockchain-based timestamping services has been developed. In the following section, these services and other advances will be discussed in more detail. Table 2.4 gives a quick overview of such services and some distinguishing properties of each.
From this comparison table, it can be seen that only OTS is both a service and a protocol, and is open source, making it an exceedingly suitable candidate for further analysis and a possible candidate for use in this work.
OpenTimestamps
The OTS service consists of server-side and client-side components that interact to perform the timestamping of data, as well as to validate existing timestamps for which receipts have been received. The client-side component takes some arbitrary data as input, hashes it, incorporates that hash into a predefined structure and submits it to the server-side component via a remote procedure call (RPC). The server-side components then take the data, incorporate it into a Bitcoin transaction and submit that transaction to be processed into the Bitcoin blockchain. The server then sends an OTS proof back to the client, and the client can, from that point onward, use that proof to verify the timestamp and the integrity of the data by performing another RPC call.
Todd (2016b) noted that the service has three distinct advantages over other timestamping
services:
• Trust: By using the public Bitcoin blockchain it eliminates the need to use third
parties or authorities to notarise data.
• Cost: OpenTimestamps scales by being able to create timestamps for vast amounts
of data using a single low-value Bitcoin transaction.
Based on the above characteristics, the OTS offering appears very appealing in the current research domain.
Todd (2016b) explains that in the OTS system, the Bitcoin blockchain acts as a notary, as it affords its users the ability to create and verify both the integrity of a document and the approximate date at which it must have existed. OTS allows any participant to submit the hash of an arbitrary piece of data to be embedded in a transaction in the Bitcoin blockchain, and to timestamp that document hash on the blockchain by using the nTime block header field. The accuracy of such a timestamp is estimated by Todd (2016b) to be within two to three hours of the submission date and time. Since the nTime field is tightly coupled with the other block header fields containing the hash of the document, there is an inherent link between the data and the time, allowing any observer to verify that some arbitrary data existed at a specific time in the past.
Todd (2016b) noted that OTS also uses what they term ‘commitment operations’. A commitment operation can be any function that alters the function input to produce a deterministic output. A simple concatenation function such as a ‖ b = ab is an example of a commitment operation. In OTS, the verification of an OTS timestamp is the execution of the sequence of commitment operations and the comparison of the output to the value stored on the Bitcoin blockchain. OTS timestamps can therefore be said to be trees of operations, with the root being the message, the edges (also known as nodes) being the commitments, and the leaves being the attestations. This terminology - root, node and leaves - was discussed previously in Section 2.2.3. The usage of these terms is not a coincidence, but rather a result of the heavy reliance on MHTs to support the OTS functionality, as discussed below.
Todd (2016b), like many others, recognised the issue of scalability (in terms of constrained data storage and speed) associated with the Bitcoin blockchain and, like others (Wijaya and Suwarsono, 2016), made use of various techniques to address these constraints. OTS primarily makes use of MHTs to address the problem of scalability, but also employs other novel techniques such as aggregation and calendar services.
OTS embeds data on the Bitcoin blockchain by associating it with a Bitcoin transaction; more specifically, by embedding the hash of some known data into the output script field (scriptPubkey, as noted in Table 2.3) of a transaction as a Bitcoin address. Since a transaction output has a limited amount of space available, it would be impractical to store large amounts of data in this field. Even if that data were hashed, having to create a transaction for many data sets would become expensive as a result of the fees associated with the many Bitcoin transactions necessary to accommodate hashes for each data set. Not only would the cost of these transactions be prohibitive, but large numbers of low-value transactions could have a detrimental effect on the entire Bitcoin network by clogging it and slowing it down. By using MHTs, OTS can compress large amounts of data into a single hash by adding individual hashes as leaves of an MHT. These leaves are then collapsed into the MHT root which, in turn, is embedded into a Bitcoin transaction. This aggregation occurs on OTS aggregation servers when the OTS client sends the hash of the desired data to at least two OTS aggregation servers. These aggregation servers then collect all of the different hashes from different OTS clients, use them as the leaves of an MHT and compute the MHT root. This root is in turn embedded into a single Bitcoin transaction.
Once an MHT root for a given set of leaves has been embedded in the Bitcoin blockchain, verifying any single leaf can be accomplished by simply replaying a subset of commitment operations with efficiency O(log2(n)), as noted in Table 2.1. Figure 2.6 serves as a visual example of a series of relevant commitments needed to prove the integrity and existence of the data in L2.
Note how, to verify the integrity or the timestamp associated with the data in L2, only a subset of leaves or nodes needs to be known. This means that many hashes representing large datasets can be stored within the bounds of the scriptPubkey Bitcoin transaction field by aggregating these leaves into an MHT. The root of that tree is then stored in a Bitcoin transaction, and only the commitments necessary to follow the commitment path up the tree to the root are returned. In Figure 2.6, only the blue nodes would need to be known to complete the verification. The MHT represented in Figure 2.6 can be considered a minimal example; MHTs can be much larger and more complex, resulting in increased storage efficiency as the efficiency is logarithmic (O(log2(n))).
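The logarithmic verification described above is straightforward to demonstrate. The Python sketch below is written for illustration only - real OTS proofs interleave additional append and prepend operations - but it builds a four-leaf MHT and verifies one leaf using only its sibling hashes along the path to the root:

import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    # Collapse the leaf hashes pairwise until a single root remains
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_leaf(leaf, path, root):
    # Replay the commitment path upward: O(log2(n)) hash operations
    acc = leaf
    for sibling, side in path:
        acc = h(sibling + acc) if side == 'left' else h(acc + sibling)
    return acc == root

leaves = [h(bytes([i])) for i in range(4)]  # L0, L1, L2, L3
root = merkle_root(leaves)
# Verifying L2 requires only its sibling L3 and the parent of (L0, L1)
path = [(leaves[3], 'right'), (h(leaves[0] + leaves[1]), 'left')]
assert verify_leaf(leaves[2], path, root)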
OTS further makes use of calendar servers to address the issue of speed, since aggregation and embedding into the Bitcoin blockchain may take too long for time-sensitive notarisation processes. Calendar servers act as an intermediary, confirming receipt of an attestation and committing to having that attestation embedded into the Bitcoin blockchain at some point. Since calendar servers are completely under the control of the entities operating them, they do not afford the same assurances as the final proof, but rather serve as a trade-off between convenience and security. They add the convenience of having an immediate proof, but lack the immutability and security provided by the Bitcoin blockchain, which takes time. A malicious calendar server cannot steal data, since the data sent to it is merely a hash of the original data, but it can falsely claim to commit the attestation to the Bitcoin blockchain on behalf of the aggregation server and then never do so. This would mean that the attestation would be lost and would have to be recommitted before a proof can be generated and verified. The risk is minimal, and by having multiple calendar servers operational this risk can be mitigated by submitting redundant attestations.
2.4 Summary
Considering the previous research discussed above on the topic of blockchains and digital evidence, there is a clear indication that the concept is becoming accepted, and that many have identified how the key properties of blockchain technology can be used to reinforce the integrity and validation of digital evidence.
All of the related work discussed above takes advantage of these key blockchain properties by using them to solve their own problem scenarios. No comment is made on the validity of one scenario over another, but what is clear is that in all the examples mentioned there is a narrowly defined scope of application, be it video, political messages or tax documents. Conversely, there are very generic services, like OriginStamp and OTS, that allow their users to embed the hash of any document in the Bitcoin blockchain. OTS, due to its open nature, is an ideal candidate on which to base further research. By utilising an open standard like OTS, the adoption and acceptance of blockchain timestamping services in the legal context can be accelerated.
It is clear from the current state of research that there is ample opportunity to build on current implementations to develop a more generic and formalised approach to creating and validating digital evidence against an immutable public ledger. Furthermore, formalising such a system may lead to further development and accelerated acceptance of such systems in various legal jurisdictions.
Chapter 3
Research design
Given the current state of research into timestamping using decentralised trust systems, and the overlap with the practice of hashing and verifying evidence as a proof of integrity, it seems prudent to explore how the application of blockchain technology can aid or improve these practices. Following the literature review, it is also apparent that the best candidate technology for this would have to be transparent and open, to encourage vetting and adoption. OTS, as a protocol and an implementation of timestamping technology, is therefore best suited to achieve these research goals.
Apart from OTS as the candidate technology for notarisation and timestamping, the digital forensics software in which it will be implemented should also be accessible and open, to support the integration of these technologies. Given these requirements, SleuthKit Autopsy emerges as an ideal candidate for merging these technologies because of its open source nature and extensibility, discussed in Section 3.3.
The goal of this work can therefore be further crystallised: to explore the possibilities of implementing a new technology, OTS, in a widely used digital forensic tool, Autopsy, and to measure its effectiveness in aiding and maturing existing practices of evidence integrity verification.
OTS, as alluded to in the literature review section of this work, is a novel implementation of blockchain technology that facilitates automated notarisation services without reliance on a trusted central authority or third party. OTS aims to make these notarisation services accessible and transparent by using the Bitcoin blockchain to store and validate proofs.
The OTS timestamp, or proof, is at the core of the OTS protocol. It is the artefact that enables the verification of a given attestation. To understand what a timestamp does, it is necessary to first understand what a timestamp is and what an attestation is. An attestation, in the context of OTS, is a statement that some information - a logical file in the case of the current OTS design - existed in a certain state at a certain point in time. An attestation is, therefore, time-bound and content-specific. An attestation is not a proof in any form, but rather a claim, the authenticity of which is proven by an OTS timestamp.
The timestamp is a series of operations that, when replayed, provides evidence that the attestation is true for a particular source file. The source of truth for OTS is the Bitcoin blockchain, which is demonstrably immutable and chronological, as discussed in the literature review section of this paper.
An OTS proof allows any person or entity in possession of the original file, or an exact bit-by-bit replica thereof, and the timestamp generated from it, to verify two things without having to trust a third party, namely:
• That the file's content remains unmodified from the time the timestamp was created, and
• That the file existed at the approximate date and time attested to in the timestamp.
OTS utilises the immutability of the Bitcoin blockchain to remove the need for two or more parties to establish a mutual trust relationship with a third party in order to verify that any given attestation and its proof are genuine. It delegates the trust mechanism to the Bitcoin blockchain, which is inherently public along with all of its operations, to support the attestations made regarding the state of the file. By doing this, the timestamp becomes immutable and independently verifiable to any concerned party looking to verify the existence, integrity, and timestamp of a specific file.
The exact operations used to create a timestamp will be discussed in more detail in the coming sections, but for now it is useful to note that creating a timestamp for a file is called ‘stamping’ the file. The resultant timestamp, assuming the stamping process was successful, will have the original name of the stamped file with the .ots extension appended to indicate the OTS timestamp file type.
Below is an example of a complete OTS timestamp after being parsed and presented by
an OTS utility:
1 File sha256 hash: bd7299df8b4c2717650fcfc9f409beffc454e9b7f201eec89f2de4fc0b535882
2 Timestamp:
3 append 0306d4367f450e71cb225b2e922aef94
4 sha256
5 -> append 1c277205d32170fa9ac33ef24a562450
6 sha256
7 prepend 59aee87b
8 append f4f65e1d23c9f037
9 verify PendingAttestation(’https://alice.btc.calendar.opentimestamps.org’)
10 -> append 372bfd2312ba2fb5109987241a229405
11 sha256
12 prepend 6e0bb638b20b762f51f4b63676a7f60665e9b3b85fa4122e950c8d18820871a3
13 sha256
14 prepend 59aee87a
15 append cc8d9dc107815d8f
16 verify PendingAttestation(’https://finney.calendar.eternitywall.com’)
17 -> append f8f182995747b9c9ebce4cb40389cbd4
18 sha256
19 prepend 59aee87b
20 append 6305c4687d0a20c4
21 verify PendingAttestation(’https://bob.btc.calendar.opentimestamps.org’)
22 sha256
23 prepend 6fdb93f0a5e327a3acd274393961eddff9296cb0866673bf8b4dd4dad673c019
24 sha256
25 prepend 66ff25a8b732bc7b96623b8ac87cab579bc2869aee5cb9d5c0f6dc2ecee41b80
26 sha256
27 append 1f3dde8a14910392f613aaf271d46a840a21999c09c89ed1b962daf68a3578e7
28 sha256
29 append 269d331418a408ced27e3af285f4e44fb08c43283708286eb3f4932f4127ebf5
30 sha256
31 append 263b23d8aa562c2836ba5f4dbc641ac62e4018fb26c1fc31dbce0b595e8c8e0d
32 sha256
33 prepend 14ef2469176146f044e105f86385762d264443ebb98eecc56cb5984457de3972
34 sha256
35 append 3f59c91703dfc511d4977e8729d2a62e86d97303c3d0d10392f3a98cb13b6ec0
36 sha256
37 append de50842022d66983cea6637d78fea5032fe85f154632afa6a3ccd560551a5508
38 sha256
39 append 68b09f62ca7e0c5f7bf430830652dbba03078040e4926b7a6a0c2c0847c87eef
40 sha256
41 prepend 010000000118c8a478cddc58325969e1a409e7f2e4badfe57c12d25de2c8b73
42 b139e6050d10000000000fdffffff025abaf000000000001600140db84d3cb80e3fe685
43 834583d6216d0736bc12660000000000000000226a20
44 append 6e610700
45 # Bitcoin transaction id 853b24b4cb03015c0781543c03710b4ecdb7db2319e511e44a6d27977
↪→ f54895d
46 sha256
47 sha256
48 prepend d08ac122340781dc3507d97df99f2240044ea95f6a8701568d68b34c5167cb18
49 sha256
50 sha256
51 append 74e469a92c2662afa4ba63f6287806fd6af5db5f045ea5260abd4186799bc69e
52 sha256
53 sha256
54 append 66a91ddd3f81448f6c7ffd12be514fb558aa1e4b36bfe84b459111c45eff58bd
55 sha256
56 sha256
57 append ef757837405eb880bf3714316464e3520eac9503a681313d5084ad5c9bb93fd9
58 sha256
59 sha256
60 append fb9e30ce972c56810ebb62cbadb7cb593864354d7f1559665dc9baf7138e1d4a
61 sha256
62 sha256
63 append 12e216e71aa8ac191f3d4194d4010942a16f5d53377c9e8dd01e7420724b00c8
64 sha256
65 sha256
66 append 8e29aa8c4173bd7f1f0fd71c5305453ff81d7adb9fa709f0edda88f2f8ca375b
67 sha256
68 sha256
69 prepend 7ded30660ea096d34f8f10d1188124c84c01c81c198da47b2b74c305f96b9184
70 sha256
71 sha256
72 append 7a6d3c6ac2f4bd1077f5fe3f26f941cef92e1f8ab3776c196019adfeddda159d
73 sha256
74 sha256
75 prepend 2d7a16cd4a6b108ed6558fd28b195444f77c12d5d4b63275b53fb16c927ac87c
76 sha256
77 sha256
78 verify BitcoinBlockHeaderAttestation(483695)
79 # Bitcoin block merkle root 0f92e50cd5b32fa5c7c851b160daafca524aa9548a1ea7205249f
↪→ c98d5b2014f
Todd (2016b) noted that a timestamp is essentially just a collection of commitment operations that are applied to an input message in a specified sequence, and that replaying those commitment operations in order is all that is necessary to verify the timestamp.
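The replay itself requires nothing more than the ability to apply append, prepend, and sha256 operations in order. The following miniature Python sketch mimics that replay; the operation arguments are taken from the first few lines of the complete timestamp listed earlier, while the input file content is invented for demonstration:

import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

# A miniature sequence of commitment operations in the style of the listing
ops = [
    ('append', bytes.fromhex('0306d4367f450e71cb225b2e922aef94')),
    ('sha256', None),
    ('prepend', bytes.fromhex('59aee87b')),
    ('sha256', None),
]

def replay(message, operations):
    # Apply each commitment operation in order, exactly as a verifier would
    for op, arg in operations:
        if op == 'append':
            message = message + arg
        elif op == 'prepend':
            message = arg + message
        elif op == 'sha256':
            message = sha256(message)
    return message

digest = replay(sha256(b'some file contents'), ops)
# A real verification compares the final digest against the value committed
# to the Bitcoin blockchain (ultimately a block's Merkle root)
print(digest.hex())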
The basic anatomy of a timestamp can be divided into three main sections:
1. File hash,
2. Commitment operations, and
3. Attestations.
Each of these varies significantly in size and complexity, but all are equally important to the final timestamp verification.
The complete timestamp shown earlier is an interpreted textual version of the timestamp commitment operations, but timestamps are binary data blobs which are not readily legible. Timestamps are saved in raw binary format to prevent issues with interpretation, encoding, and compatibility between systems. In email correspondence with the OTS author (which can be seen in Section B.1), Todd (2017) noted that in a previous version of OTS, timestamps were not binary blobs, but instead looked similar to what is shown below (using a JavaScript Object Notation (JSON) structure):
1 "ops": [
2 {
3 "Hash": {
4 "input": "13249541def3c688e28a32da9a6f39e2c68442db",
5 "parents": [
6 [
7 0,
8 20
9 ]
10 ],
11 "algorithm": "sha256d",
12 "digest": "49bdaf64146928c7ba30e5a28704e0762a37d53236438b4cd1d831f0568
↪→ b8535"
13 }
14 },
15 ]
Todd (2017) elaborated by explaining that, contrary to what seems obvious, JSON is not the serialisation format of the timestamp above: the serialisation format is, in actual fact, a subset of the JSON standard, defining exactly which JSON elements are allowed in an OTS proof. For textual proofs, it is always necessary to parse the output of the textual parser itself, which introduces unnecessary complexity and leaves room for inaccurate interpretation. Since OTS is fundamentally security software, it is of critical importance that any interpreter completely understands the timestamp, without misinterpretation or ambiguity.
By making the timestamp a raw binary format, OTS achieves two important goals that make the overall system more secure and trustworthy.
Firstly, it adds fragility to the timestamp interpretation, ensuring that the smallest of misinterpretations results in a completely invalid timestamp that is obvious to the interpreting system. This is achieved by having a strong 1-to-1 coupling between every single bit in the serialised timestamp and a component of the mathematical structure of the timestamp. This results in a system with very little redundancy, which is ideal for consensus-critical systems where even seemingly insignificant changes between two timestamps should produce different results. Todd (2017) noted: “For security software, this brittleness is a good thing, as we want incorrect implementations to fail completely, 100% of the time, rather than potentially give inaccurate results.”
Secondly, by storing the timestamp in a raw binary format, OTS reduces the potential
size of the timestamp, since space can be saved by not having the universal and versatile schema definition and markup associated with formats such as JSON and eXtensible Markup Language (XML). This also helps with fragility, in that the schema definitions of formats like JSON and XML can be more forgiving and will tolerate some errors, depending on the implementation of the interpreter.
As is clear from the verification output, OTS was able to attest that the source file in question existed in its current state as of Tuesday, September 5 20:24:51 2017 CEST. It is important to note that the verification is clear and succinct, and leaves little room for interpretation and ambiguity, which is ideal when considering the intended use case of proving the integrity of digital evidence.
The date in the timestamp verification confirmation is accurate up to the second, and
the accuracy of this statement will be the subject of much more detailed analysis and
discussion.
Before the subtleties and intricacies of OTS are discussed, note that OTS was designed
to be widely adopted and compatible, and has therefore been implemented in a range of
languages and frameworks to ensure its continued adoption and development.
As can be seen from the GitHub repository at Opentimestamps (2017), the OTS protocol has a wide range of different implementations. These are:
• opentimestamps-client: The OTS client component to create and verify OTS proofs
in Python
• python-opentimestamps: The core OTS libraries used by both server and client
components in Python
The focus of this section will, for the sake of simplification, centre on the most recent Python implementations of OTS at the time of writing (opentimestamps-client v0.5.0, python-opentimestamps v0.1.0 and opentimestamps-server v0.1.2).
Before exploring the lifecycle of an OTS proof, it is prudent to first obtain a better
understanding of the functionality extended to the user of the opentimestamps-client. By
looking at the functions and seeing how to invoke them and what output they produce,
the reader will get a much clearer idea of how the protocol is intended to be used.
A discussion about the setup procedure and configuration of the OTS client-side components is outside the scope of this work; it is well documented in the code repository of the latest versions of the respective components. The setup process is automated and
presents a low barrier to entry for less tech-savvy users. The functions shown will be
invoked via the OTS command line interface (CLI), but it is worth noting that these same commands can be wrapped by a Graphical User Interface (GUI) utility for users who prefer such a tool. The underlying functions, however, remain the same.
To illustrate the various OTS functions, a simple test text file was created. This file is called testots.txt. The content of the file is a short sentence: “This is a test file.”
1 user@host:~/otsdemo$ cat testots.txt
2 This is a test file.
By invoking the OTS client without supplying any parameters, the following usage guide - which indicates that the Stamp function can be invoked with the s argument, followed by the file to be stamped - is produced:
1 user@host:~/otsdemo$ ots
2 usage: ots [-h] [--version] [-q] [-v] [-l URL] [--no-default-whitelist]
3 [--cache CACHE_PATH | --no-cache]
4 [--btc-testnet | --btc-regtest | --no-bitcoin] [-w]
5 [--socks5-proxy SOCKS5_PROXY] [--bitcoin-node BITCOIN_NODE]
6 {stamp,s,upgrade,u,verify,v,info,i,git-extract} ...
There is also a more detailed version of the help function that can be invoked by calling OTS with the -h flag, which produces the following output:
30 immediately.
31 --socks5-proxy SOCKS5_PROXY
32 Route all traffic through a socks5 proxy, including
33 DNS queries. The default port is 1080. Format:
34 domain[:port] (e.g. localhost:9050)
35 --bitcoin-node BITCOIN_NODE
36 Bitcoin node URL to connect to (defaults to local
37 configuration)
38
39 Subcommands:
40 All operations are done through subcommands:
41
42 {stamp,s,upgrade,u,verify,v,info,i,git-extract}
43 stamp (s) Timestamp files
44 upgrade (u) Upgrade remote calendar timestamps to be locally
45 verifiable
46 verify (v) Verify a timestamp
47 info (i) Show information on a timestamp
48 git-extract Extract timestamp for a single file from a timestamp
49 git commit
For the sake of detail when demonstrating the different function calls, all calls are made with the -v flag set, which enables verbose output.
The Stamp operation, which is logically the first operation a user of OTS would perform, invokes the Stamp function which produces the timestamp that can later be verified.
Calling the Stamp function is depicted below and can be done by invoking OTS with the
s subcommand:
1 user@host:~/otsdemo$ ots -v s testots.txt
2 Doing 2-of-3 request, timeout is 5 seconds
3 Submitting to remote calendar https://a.pool.opentimestamps.org
4 Submitting to remote calendar https://b.pool.opentimestamps.org
5 Submitting to remote calendar https://a.pool.eternitywall.com
6 1.66 seconds elapsed
This function produces the initial timestamp and saves the testots.txt.ots file. It is important to note that this is an initial, or incomplete, timestamp and that there are further actions to be taken to make it a complete timestamp.
1 user@host:~/otsdemo$ ls -l
2 total 8
3 -rw-rw-r-- 1 user user 21 Oct 3 19:05 testots.txt
The second function is Info, which parses the timestamp and displays information about it. The Info function can be executed by using the i subcommand and supplying the timestamp file as the input argument:
1 user@host:~/otsdemo$ ots -v i testots.txt.ots
2 File sha256 hash: 649b8b471e7d7bc175eec758a7006ac693c434c8297c07db15286788c837154a
3 Timestamp:
4 append b148f67dd8c0081046b196cb5aa8dcc2 == 649b8b471e7d7bc175eec758a7006ac693c434c8297
↪→ c07db15286788c837154ab148f67dd8c0081046b196cb5aa8dcc2
5 sha256 == bd2c8ac682b8ed4b544ddd29ce229ef42479162b1ff14cdd51c653601600b40b
6 -> append 4bc7414f9b79e5b2a9699a15449f79d8 == bd2c8ac682b8ed4b544ddd29ce229ef42479162
↪→ b1ff14cdd51c653601600b40b4bc7414f9b79e5b2a9699a15449f79d8
7 sha256 == 04b5b76515735a80be9b465887d2f83d5423bf5e3540741e6e1ca5be78728e93
8 prepend 59d3cd30 == 59d3cd3004b5b76515735a80be9b465887d2f83d5423bf5e3540741e6e1ca5b
↪→ e78728e93
9 append 68a95ad6aeade4bd == 59d3cd3004b5b76515735a80be9b465887d2f83d5423bf5e3540741
↪→ e6e1ca5be78728e9368a95ad6aeade4bd
10 verify PendingAttestation(’https://finney.calendar.eternitywall.com’)
11 -> append 782372bb88a18335daf2a8e596338454 == bd2c8ac682b8ed4b544ddd29ce229ef42479162
↪→ b1ff14cdd51c653601600b40b782372bb88a18335daf2a8e596338454
12 sha256 == 15bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e
13 prepend 59d3cd30 == 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0
↪→ b0ebbc75e
14 append 50d3d70950797d54 == 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7
↪→ a625121313c7a0b0ebbc75e50d3d70950797d54
15 verify PendingAttestation(’https://alice.btc.calendar.opentimestamps.org’)
16 -> append e97b3ae7e7270e1778c5e596cc440842 == bd2c8ac682b8ed4b544ddd29ce229ef42479162
↪→ b1ff14cdd51c653601600b40be97b3ae7e7270e1778c5e596cc440842
17 sha256 == d2146113dcd4fbdc95b3f4cb984d59a78ead957d07bca92ddc8a0439dc4aa5ff
18 prepend 59d3cd30 == 59d3cd30d2146113dcd4fbdc95b3f4cb984d59a78ead957d07bca92ddc8
↪→ a0439dc4aa5ff
19 append 75cc0807f01f2591 == 59d3cd30d2146113dcd4fbdc95b3f4cb984d59a78ead957d07bca92d
↪→ dc8a0439dc4aa5ff75cc0807f01f2591
20 verify PendingAttestation(’https://bob.btc.calendar.opentimestamps.org’)
Once the timestamp attestation has been generated, it takes some time for it to be incorporated into the Bitcoin blockchain by OTS (this process will be discussed in more detail in the following section). Once this happens, the timestamp needs to be upgraded to reflect this commitment to the Bitcoin blockchain and form the final timestamp. The Upgrade function can be run by invoking OTS with the u subcommand:
As is clear from the output of the Upgrade function, the timestamp is now complete and ready to be verified. Similar to previous functions, Verify can be called by invoking OTS with a subcommand, in this case v:
1 user@host:~/otsdemo$ ots -v v testots.txt.ots
2 Assuming target filename is ’testots.txt’
3 Hashing file, algorithm sha256
4 Got digest 649b8b471e7d7bc175eec758a7006ac693c434c8297c07db15286788c837154a
5 Attestation block hash: 00000000000000000031944aee9496e6c77f909508b797b19b9f6a662a6
↪→ e6996
6 Success! Bitcoin attests data existed as of Tue Oct 3 20:15:45 2017 CEST
The above examples cover the most basic OTS functionality and the logical order in which functions can be executed to generate and verify an OTS timestamp. This is not to say that this sequence will always be followed or that the results will always be the same. OTS can, for instance, be configured to submit file hashes to custom aggregation servers, use proxies, use a local cache, or use testnet (the Bitcoin test network). In the above examples, no custom configuration was done and OTS was executed with the default configuration in place.
To achieve its functional goal, OTS relies on multiple different components, each built on various technologies. OTS was designed to strike a careful balance between ease-of-use
and dependencies on systems outside the control of the user.
Due to the nature of OTS and its focus on trust, any system that is not the Bitcoin blockchain or the end-user system introduces a level of uncertainty and potential risk into the OTS timestamp system. Simultaneously, OTS tries to be simple to configure and use in order to encourage adoption; this necessitates that highly technical components be abstracted and performed on behalf of the user to preserve the user experience. This abstraction leads to the introduction of other systems into the OTS lifecycle. It is, therefore, important that an exploration of these systems is undertaken to understand how they impact the trust placed in an OTS timestamp.
Trust domains - a logical boundary which denotes where a party’s control of a particular system begins and ends - are used to better explain OTS components. Recall that OTS attempts to provide easy and trustworthy proofs by eliminating the need for a verifier of a timestamp to trust a third party, as trust becomes more fragile as more and more parties are added to the trust chain. It is worth noting then that the failure of any one party will cause the complete trust chain to be broken. This is why OTS attempts to limit the number of systems to trust to the user themselves and the Bitcoin network; essentially two trust domains.
Three trust domains are designated for explaining the various OTS components:
• SELF: systems under the direct control of the user
• BTC: the Bitcoin network
• OTHER: third-party systems outside the control of the user, such as public calendar servers
Ideally, instances where OTHER is trusted need to be avoided where possible. In cases
where OTHER cannot be avoided, it is essential to understand how OTHER functions,
what protection it provides, and what degree of trust can safely be placed in OTHER
without completely compromising the trust of the OTS timestamp.
The OTS client is one of the main components in the SELF trust domain, as it is
controlled by the user and runs on systems under their control. The libraries and code
embedded in the OTS client to interact with the Bitcoin blockchain are therefore also included in the SELF trust domain.
The Bitcoin network is the only other essential and necessary component of OTS and
resides in the BTC trust domain. This domain is considered trustworthy in as far as the
Bitcoin network is trusted, underpinned by the resiliency and trust mechanisms which
have been discussed previously.
Calendar servers are the only other significant OTS component that potentially falls within the OTHER trust domain. Calendar servers are used to centralise, simplify, and speed up the creation of timestamps at the cost of delegating some trust to the OTHER domain. They are used to provide aggregation services, blockchain interaction services and attestation services for users who choose to, or cannot, run these services locally. Note that the use of calendar servers is not required and that OTS, if configured to do so with the installation of the necessary Bitcoin services, can directly interact with the Bitcoin blockchain to create and verify timestamps.
Calendar servers are not necessarily in the OTHER domain since they can be run privately
by the user if they choose to centralise the aggregation and blockchain interaction within
the SELF trust domain. Think of a company providing OTS calendar servers as part of
a private OTS notary service.
The default OTS conguration, as used for illustrative purposes in this work, relies on
three public calendar servers:
• https://a.pool.opentimestamps.org
Alias: https://alice.btc.calendar.opentimestamps.org
• https://b.pool.opentimestamps.org
Alias: https://bob.btc.calendar.opentimestamps.org
• https://a.pool.eternitywall.com
Alias: https://finney.calendar.eternitywall.com
These public calendar servers are maintained by the creators of OTS and are used by
default in OTS to allow the easy creation of OTS timestamps by foregoing the need
for the user to install, configure and maintain a local instance of the necessary Bitcoin
software to interact with the blockchain. The installation and maintenance of a full local
Bitcoin node can be a daunting task to potential users of OTS, and thus is delegated away
from the user and presented as a service in the form of calendar servers. The complexities
of configuring, maintaining, and securing a full Bitcoin node are not within the scope of this work.
By using a combination of the defined trust domains and the technology dependencies OTS needs to perform timestamps, three distinct configurations (A, B and C) are defined, two of which can be considered fully-trusted (only the SELF and BTC trust domains are involved) and the other semi-trusted (the SELF, BTC and OTHER trust domains are involved). These are illustrated in Table 3.1.
Configuration A, being fully trusted, is depicted in Figure 3.1. This configuration requires that the user install and run the necessary Bitcoin software on the local environment to enable the OTS client to interact directly with the Bitcoin network.
The configuration depicted in Figure 3.1 would require increased effort to configure and run, as all the components would have to be installed by the user. Additionally, this configuration would also carry a cost to the user, since they would be responsible for the
transaction fees required to perform the Bitcoin transaction. It is therefore implied that
the user would have to have a Bitcoin wallet and a positive Bitcoin balance to successfully
interact with the Bitcoin network.
Configuration B, also being fully trusted, is depicted in Figure 3.2. This configuration extends the functionality of Configuration A outside the scope of the local system by using a private calendar server. This configuration requires that users install and run a calendar server, as well as install and run the necessary Bitcoin software on the calendar server to enable the OTS client to interact with the Bitcoin network.
By using Conguration B, multiple OTS clients in the SELF trust domain can create and
upgrade timestamps without each having to install and run the required Bitcoin services.
As with Configuration A, Configuration B would require more effort and skill to configure and maintain, while also carrying a cost, in the form of transaction fees, for performing Bitcoin transactions.
Configuration C, the semi-trusted option, is depicted in Figure 3.3. It is similar to B in terms of the required components; the only design change is the fact that the calendar server moves from the SELF to the OTHER trust domain.
By using these public calendar servers, the OTHER trust domain is included in the complete trust chain, and therefore this can be considered the least trustworthy use case for OTS. It was thought prudent to discuss this configuration, as any other configuration that does not make use of public calendars will inherently be more trustworthy, and will therefore only increase the confidence level of the OTS timestamp. Essentially, from a trust and complexity perspective, the worst case scenario for OTS is evaluated. OTS strikes a careful balance between usability and trust, by giving the user the choice of placing their trust only in themselves and the Bitcoin blockchain, or delegating some trust to external OTS systems not controlled by them.
The lifecycle of an OTS timestamp depends heavily on the OTS configuration, since it will determine which systems come into play to create and verify the timestamp. Going forward, the semi-trusted Configuration C is assumed, which involves the following components:
• OTS client: For creating and validating the timestamp and interacting with the public calendar servers.
• Local Bitcoin node: For verifying completed timestamps against the Bitcoin blockchain.
• Public calendar server(s): For timestamp aggregation and interacting with the Bitcoin network.
• Bitcoin network: For storing the data that enables the OTS proof mechanism.
The above mentioned Bitcoin node can be a pruned node. A pruned node is a node which can function without storing the complete blockchain history with all blocks. A pruned node works by keeping a configurable cache of the latest blocks (specified in MB), thus saving space (Bitcoin Foundation, 2016).
Using the same file (testots.txt) as in the previous example, a detailed description of the
processes and systems involved in each of the core OTS functions is given below.
Stamp
When stamping a file, the OTS client generates a SHA256 hash H of the target file. A MHT is constructed with H to produce a Merkle root (MR). In the case of a single file being timestamped, the values of H and MR will be the same, since the root of a MHT with only one leaf will be the value of that leaf. If multiple files are timestamped at the same time, the OTS client performs a round of local aggregation by constructing a MHT from the H values of all the files being timestamped to produce a value for MR.
When calculating the MR value, the OTS client appends a random nonce n to the H value of each file. The purpose of this nonce is to preserve privacy, since the MR will be
sent to an untrusted public calendar server. The nonce process will be explained in more
detail later.
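The local aggregation step can be sketched as follows. This is an illustrative Python approximation, not the actual OTS implementation; in particular, the odd-leaf pairing rule shown is an assumption.

import hashlib, os

# Illustrative sketch of client-side noncing and local aggregation.
def leaf_digest(file_hash):
    nonce = os.urandom(16)                       # 128-bit privacy nonce appended to H
    return hashlib.sha256(file_hash + nonce).digest()

def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                       # odd count: duplicate last leaf (assumption)
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]                              # a single leaf is its own Merkle root (MR)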
Once the MR value has been derived, an OTS RPC call is made to all of the configured calendar servers, supplying the hexadecimal encoded MR value to the digest endpoint. This call is a REST-based web service call over HTTPS and would look similar to the below:
https://finney.calendar.eternitywall.com/digest/59d3cd3004b5b76515735a80be9b465887
d2f83d5423bf5e3540741e6e1ca5be78728e9368a95ad6aeade4bd
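A hedged sketch of this call, using the Python requests library and the URL form shown above; the OTS client uses its own RPC wrapper, so the exact request details may differ from this sketch.

import requests

CALENDAR = "https://finney.calendar.eternitywall.com"   # one of the default public calendars
mr_hex = ("59d3cd3004b5b76515735a80be9b465887d2f83d5423bf5e3540741e6e1ca5be787"
          "28e9368a95ad6aeade4bd")

# Submit the Merkle root to the calendar's digest endpoint; the response
# body carries the calendar's (incomplete) timestamp commitment.
resp = requests.post(CALENDAR + "/digest/" + mr_hex, timeout=5)
resp.raise_for_status()
incomplete_timestamp = resp.content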
Once the calendar server receives the MR value it performs some validation on the length
and structure of the MR value. Upon completion of the validation, the calendar server
then performs its own aggregation function by incorporating the MR value into another
MHT with all the MR values received from other clients. As mentioned before, this is
necessary to make the solution scalable and keep costs low by aggregating many hashes
into a single MHT, the MR of which will be embedded into a single Bitcoin transaction as an OP_RETURN opcode.
Depending on the extent of local and remote aggregation, OTS effectively creates nested MHTs, as illustrated in Figure 3.4, where the root of one MHT becomes a leaf in a higher order MHT. This can theoretically be done any number of times to create a single MR from an arbitrarily large number of leaves.
Since the calendar server might take some time to aggregate other timestamps, complete the Bitcoin transaction, and wait for it to be verified on the blockchain, it cannot synchronously provide the complete proof, because the complete timestamp does not yet exist. In lieu of the complete timestamp, the calendar server returns a reduced timestamp, which is essentially a commitment that it guarantees it will incorporate the submitted timestamp into a future transaction and return a full timestamp at that point. This is one of the primary examples where trust is placed squarely in the OTHER domain. A malicious calendar server may provide a commitment but discard the timestamp.
It is for this reason that OTS allows the user the ability to submit to multiple calendar
servers at the same time while specifying that m of n calendars should return a positive
commitment before considering the timestamp submitted. A user also has the ability to
provide a whitelist of calendar servers that will be used by the client. If none of those
calendars are available, or if the m of n minimum is not met, the timestamp will be
considered failed.
Once the incomplete timestamp is received from the calendar server, the OTS client
saves the timestamp to the same directory as that of the original file. The returned timestamp will contain the relevant commitment operations and timestamp identifier for
each calendar server that committed to submitting the timestamp. This commitment by
the calendar server can be seen in the output below:
↪→ e78728e93
9 append 68a95ad6aeade4bd == 59d3cd3004b5b76515735a80be9b465887d2f83d5423bf5e3540741
↪→ e6e1ca5be78728e9368a95ad6aeade4bd
10 verify PendingAttestation(’https://finney.calendar.eternitywall.com’)
11 -> append 782372bb88a18335daf2a8e596338454 == bd2c8ac682b8ed4b544ddd29ce229ef42479162
↪→ b1ff14cdd51c653601600b40b782372bb88a18335daf2a8e596338454
12 sha256 == 15bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e
13 prepend 59d3cd30 == 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0
↪→ b0ebbc75e
14 append 50d3d70950797d54 == 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7
↪→ a625121313c7a0b0ebbc75e50d3d70950797d54
15 verify PendingAttestation(’https://alice.btc.calendar.opentimestamps.org’)
16 -> append e97b3ae7e7270e1778c5e596cc440842 == bd2c8ac682b8ed4b544ddd29ce229ef42479162
↪→ b1ff14cdd51c653601600b40be97b3ae7e7270e1778c5e596cc440842
17 sha256 == d2146113dcd4fbdc95b3f4cb984d59a78ead957d07bca92ddc8a0439dc4aa5ff
18 prepend 59d3cd30 == 59d3cd30d2146113dcd4fbdc95b3f4cb984d59a78ead957d07bca92ddc8
↪→ a0439dc4aa5ff
19 append 75cc0807f01f2591 == 59d3cd30d2146113dcd4fbdc95b3f4cb984d59a78ead957d07bca92d
↪→ dc8a0439dc4aa5ff75cc0807f01f2591
20 verify PendingAttestation(’https://bob.btc.calendar.opentimestamps.org’)
Once this has been performed the Stamp process is complete, albeit with a reduced or
incomplete timestamp.
Info
The simplest of all the OTS functions is the function which takes any timestamp as input,
parses the commitment operations contained within it and presents them in a legible way
to the user.
This function is useful if there is a need to see the commitment operations of a particular timestamp, or to check whether the timestamp is correctly formatted, as any small change in the timestamp will result in a complete parsing failure. The Info function can also be used to determine if a timestamp is complete or if an upgrade request needs to be sent to the
calendar server to retrieve the complete timestamp; it also operates only locally in the
SELF trust domain.
The Info function does not perform any verification of the commitment operations of the timestamp; it only checks the integrity of the structure of the timestamp.
Upgrade
The Upgrade function attempts to upgrade any given incomplete timestamp to a complete timestamp by requesting the complete timestamp from the relevant calendar server(s). A complete timestamp is a timestamp that is locally verifiable without the need to contact a calendar server.
Similar to the Stamp function, the Upgrade function needs to interact with a calendar
server in the OTHER trust domain, as only the calendar server has the ability to interact
with the Bitcoin blockchain. The mechanism for interacting with the calendar server is also very similar to the digest call, and is performed via an OTS RPC call over HTTPS to a REST endpoint called timestamp:
https://finney.calendar.eternitywall.com/timestamp/59d3cd3004b5b76515735a80be9b465
887d2f83d5423bf5e3540741e6e1ca5be78728e9368a95ad6aeade4bd
If the timestamp has been completed by the calendar server, the complete timestamp is returned synchronously to the OTS client as a downloadable binary .ots file. Once the OTS client verifies the structure of the timestamp, it proceeds to create a backup of the original incomplete timestamp by appending the .bak extension to it, before merging the complete timestamp into the existing .ots file. The OTS client also confirms in the CLI that the timestamp has been upgraded and that it is now a complete timestamp which can be validated locally if a Bitcoin node is present; it then no longer requires interaction with the calendar server.
In the case where an upgrade request is made to a calendar server and the timestamp
is not yet complete or was not found on the calendar server, the appropriate message is
returned synchronously to the OTS client. Incomplete but found timestamps can again
be requested at a later stage by the OTS client.
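The upgrade flow described above can be sketched as follows. This is illustrative Python; the real client merges the returned proof into the existing .ots structure rather than simply replacing the file, and the function name is made up.

import pathlib, shutil, requests

def upgrade(ots_path, calendar, digest_hex):
    # Ask the calendar's timestamp endpoint for the completed proof.
    resp = requests.get(calendar + "/timestamp/" + digest_hex, timeout=10)
    if resp.status_code != 200:
        return False                                       # pending or not found
    path = pathlib.Path(ots_path)
    shutil.copy(path, path.with_name(path.name + ".bak"))  # back up the incomplete proof
    path.write_bytes(resp.content)                         # simplified stand-in for the merge
    return True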
Verify
Verification is the final OTS function, and provides an OTS user the most value by validating the saved timestamp through replaying its commitment operations and verifying the result against the state of the current file. Since it is essential to obtain a very good understanding of how this verification works, a partial manual verification based on the commitment operations contained in the timestamp is conducted below.
It is important to note that verification does not necessarily require any interaction with a calendar server if the timestamp has been upgraded. Since verification is such a sensitive and critical operation, OTS was designed in such a way as to ensure it does not require interaction with the OTHER trust domain.
Verification does require that the OTS client be able to query the Bitcoin blockchain for block headers, since the timestamp ultimately points to the block header which contains the transaction which contains the MR derived from the file hash. Verification is performed between the OTS client (SELF) and the Bitcoin blockchain (BTC) by using a locally running Bitcoin node. In the scenario where access to a local Bitcoin node, or one in the SELF domain, is not possible, the timestamp can still be verified by contacting the calendar server; however, that necessarily weakens the proof, as the OTHER domain is involved in attesting to the validity of the timestamp.
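A hedged sketch of that final check against a local node, using Bitcoin Core's JSON-RPC interface: getblockhash and getblockheader are real Bitcoin Core calls, but the URL, credentials and the final comparison value here are placeholders.

import requests

RPC_URL = "http://rpcuser:rpcpassword@127.0.0.1:8332"   # placeholder node credentials

def rpc(method, params):
    return requests.post(RPC_URL, json={"method": method,
                                        "params": params}).json()["result"]

def block_merkle_root(height):
    # Resolve the block hash for the attested height, then read its header.
    return rpc("getblockheader", [rpc("getblockhash", [height])])["merkleroot"]

# The timestamp is valid if the replayed commitment chain ends at this value.
print(block_merkle_root(483695))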
Below is an example of a complete timestamp which is locally verifiable after being parsed in verbose mode via the Info function:
Note the significant difference in size and complexity between the incomplete timestamp in Listing 3.2 and the complete timestamp in Listing 3.3. This size difference is a direct result of the Upgrade function, since the entire timestamp and all relevant commitment operations have been retrieved from the calendar server. This includes commitment operations for local aggregation, calendar server aggregation, and the Bitcoin transaction itself.
Also note that there are still three distinct commitments starting on lines 6, 11 and 79 in Listing 3.3. This is because the initial timestamp was submitted to three calendar servers as a redundancy mechanism, and the complete timestamp was retrieved only from https://alice.btc.calendar.opentimestamps.org, starting at line 11 and ending at line 78. The complete timestamps were not retrieved from the other calendar servers, as one valid timestamp is sufficient to perform local verification.
Below is a step-by-step walkthrough of exactly how this timestamp was verified, and how it was possible to make the attestation that it did. For the sake of brevity, each commitment operation in the complete timestamp will not be manually reproduced; rather, select examples will illustrate how this can be done.
Description: The first step the OTS client performs is to look up the original file based on the timestamp name. If the file is found in the same directory, it performs a sha256 hash of the file. This hash value serves as the starting point for the timestamp verification and is the first commitment in a series of commitments.
Manual reproduction:
Description: Due to the privacy concerns of sending the hash of a potentially sensitive file to an untrusted calendar server, the OTS client appends a 128-bit random nonce. The result of the concatenated file hash and nonce is then hashed again to produce the value ‘bd2c8ac682b8ed4b544ddd29ce229ef42479162b1ff14cdd51c653601600b40b’.
Manual reproduction:
unhexSha256Hex.py, used in Listing 3.8, is a small Python script that is necessary for manual commitment replays. It is necessary to first convert the textual value back to raw binary format, perform the hashing operation, and then convert the hash back to a hexadecimal textual value to be displayed in the timestamp. This is done because the OTS client performs hashing on the raw binary data, which is not printable, and not on the textual hexadecimal representation shown in the commitment operation.
1 import sys
2 from hashlib import sha256
3
4 data = bytes.fromhex(sys.argv[1])     # convert the hex text back to raw binary
5 for _ in range(int(sys.argv[2])):     # second argument: number of sha256 rounds
6     data = sha256(data).digest()      # hash the raw binary data
7 print(data.hex())                     # convert the hash back to hex text
Description: Using the output from Step 3, the OTS client performs another round of noncing by appending and prepending values that will serve as indexes to increase the performance of searching for the timestamp in the local or remote OTS cache. Notice that these indexing nonces are unique per calendar server. The result of the noncing is then hashed again to produce the start leaf hash (‘56367592cd684c5c2a03e71d353173a921f2441c9ff957cb1bdb926caed295a3’) that will be aggregated by the calendar server.
Manual reproduction:
Listing 3.11: Manual reproduction of noncing and hashing and the addition of indexing nonces
1 user@host:~/otsdemo$ { echo "bd2c8ac682b8ed4b544ddd29ce229ef42479162b1ff14cdd51c653601600b40b"; echo "782372bb88a18335daf2a8e596338454"; } | tr -d "\n"
2 bd2c8ac682b8ed4b544ddd29ce229ef42479162b1ff14cdd51c653601600b40b782372bb88a18335daf2a8e596338454
3
4 user@host:~/otsdemo$ python3 unhexSha256Hex.py bd2c8ac682b8ed4b544ddd29ce229ef42479162b1ff14cdd51c653601600b40b782372bb88a18335daf2a8e596338454 1
5 15bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e
6
7 user@host:~/otsdemo$ { echo "59d3cd30"; echo "15bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e"; } | tr -d "\n"
8 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e
9
10 user@host:~/otsdemo$ { echo "59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e"; echo "50d3d70950797d54"; } | tr -d "\n"
11 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e50d3d70950797d54
12
13 user@host:~/otsdemo$ python3 unhexSha256Hex.py 59d3cd3015bc3053e59447106c4b233c36336389e7e6b5ee7a625121313c7a0b0ebbc75e50d3d70950797d54 1
14 56367592cd684c5c2a03e71d353173a921f2441c9ff957cb1bdb926caed295a3
Description: The commitment operations depicted in Listing 3.12 are actions that were performed on the calendar server. Specifically, they represent the aggregation activities on the calendar server, where the hash is incorporated into a MHT with other submitted hashes as leaves of the MHT. The more hashes that were submitted, and subsequently aggregated, by the calendar server in question, the more commitment operations will appear in this portion of the timestamp.
Listing 3.12 shows how the output from Step 3, the leaf hash (‘56367592cd684c5c2a03e71d353173a921f2441c9ff957cb1bdb926caed295a3’), is concatenated with another leaf hash, and hashed again to produce a new hash that forms the leaf for the next level in the MHT. This concatenation and hashing is performed for each commitment operation recorded in the timestamp until the MR is produced (‘37fa6f0f61f4fc61da240549f87f5a48b2e633e1be9020fbb2b8ecf747bd67’).
Description: Once the MR has been calculated, it serves as input to the Bitcoin transaction. More specifically, the MR will become the OP_RETURN field (in place of a receiving Bitcoin address) of the transaction. Listing 3.13 shows the construction of the Bitcoin transaction by the Bitcoin client on the calendar server. The structure of a Bitcoin transaction was discussed in an earlier section of this work.
Description: As noted in the section on Bitcoin block and transaction structure, the transactions within a Bitcoin block also form a MHT, since each block contains a number of transactions from different sources. Depicted in Listing 3.14 is the process of creating this MHT contained in the Bitcoin block. Note how a double sha256 hash is now performed, as per the Bitcoin specification. These commitment operations in Listing 3.14 are performed by the Bitcoin network, and defined by the Bitcoin protocol, in the BTC trust domain at the time the block is created. These commitment operations are not performed by the OTS calendar server.
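For reference, the double hash used inside Bitcoin's transaction Merkle tree is simply SHA-256 applied twice, as in this small Python helper:

import hashlib

def sha256d(data):
    # Bitcoin applies SHA-256 twice when building its transaction Merkle tree.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()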
Description: Now that all of the commitment operations in the timestamp have been replayed, from the initial hash of the source file up to the Bitcoin block MR, it is time to look up and validate the data associated with the Bitcoin block and transaction to determine the file integrity and timestamp validity.
The OTS client, as depicted in Configuration C, does this by querying the block in question directly from the local Bitcoin node, which has a copy of all blocks replicated locally from the Bitcoin network. Once the block data is found, the OTS client parses out the block date and time, formats it for easy readability, and presents it to the user in their local time zone.
As for the manual proof, a popular web application with block exploration capability, Blockchain Luxembourg S.A (2017b), is used, as shown in Figure 3.5. It allows a user to search based on criteria such as block number and transaction ID. Obviously, the OTS client does not perform the same action, as it cannot necessarily trust the data on Blockchain Luxembourg S.A (2017b). In Figure 3.5, a search based on the transaction ID recorded in the timestamp in Listing 3.13 is performed. The transaction data can be located at blockchain.info1
1
https://blockchain.info/tx/9fd255b44d2373d98382a5469bda07862e1c1f6b89b5a4d7750309d958ce8809
Figure 3.5: blockchain.info lookup of the Bitcoin transaction ID recorded in the timestamp
The result of the search successfully returned a Bitcoin transaction linked to block 488163.
Looking at the scripts associated with this transaction, it can be seen that the root of the
MHT calculated as part of the calendar server aggregation in Step 4
(‘37fa6f0f61f4fc61da240549f87f5a48b2e633e1be9020fbb2b8ecf747bd67’) is present
in the Output Scripts section of Figure 3.5.
This implies that the original file hash that formed a leaf in the MHT is still the same, since the recorded MHT root (‘37fa6f0f61f4fc61da240549f87f5a48b2e633e1be9020fbb2b8ecf747bd67’) and the calculated MHT root (‘37fa6f0f61f4fc61da240549f87f5a48b2e633e1be9020fbb2b8ecf747bd67’) are the same, thus proving the integrity of the file.
Furthermore, looking at the block-specific data for block 488163, as shown in Figure 3.6, it can be seen that the block in question has a timestamp date of 2017-10-03 18:15:45. Note that this is UTC time. The block data can be found at https://blockchain.info/block-index/1627658
Figure 3.6: blockchain.info lookup of the Bitcoin block number recorded in the timestamp
By trusting the Bitcoin network, with its inherent integrity and immutability, assurance is established that this timestamp cannot be forged and that the contents of the block also cannot be forged or altered. And since the file hash indirectly exists in a confirmed transaction output script in that block, it is known that the file that produced that hash must have existed on or before 2017-10-03 18:15:45 UTC. Hence the attestation by the OTS client in local time:
Success! Bitcoin attests data existed as of Tue Oct 3 20:15:45 2017 CEST
Having a much deeper understanding of how OTS works, it is now possible to discuss
some of the challenges and limitations faced by the protocol and its implementation as
illustrated above in Figure 3.3.
The attestation received from OTS in Listing 3.4 clearly states that the data existed as of a specific time and date, with up-to-the-second accuracy. As Todd (2016b) noted,
this granularity is not necessarily completely accurate up to the second, since the block header time of a Bitcoin block is considered accurate within two to three hours, depending on a range of factors; as he explained: “Every Bitcoin block header has a field in it, called nTime. For a Bitcoin block to be accepted by the network, the Bitcoin protocol requires that the field be set to approximately the time the block was created. Exactly how accurate that field must be for a block to be valid, is a complex topic, but for our purposes it’s fair to say it’ll very likely be accurate to within two or three hours - even if a sizable minority of Bitcoin miners are trying to create invalid timestamps - and almost certainly within a day.”
The details of the accuracy of nTime, and the possibility that it could maliciously be set to an inaccurate time, are discussed in detail by Todd (2016a). The possibility of malicious tampering with the nTime field within a block is a function of the number of honest versus dishonest miners that validate the block contents, and thus diminishes as the ratio of honest to dishonest miners increases. Todd (2016a) noted that if the majority of the hashing power is controlled by dishonest colluding nodes, the situation is hopeless and nTime cannot be trusted; but at that point the entire Bitcoin network can also not be trusted, and it would be immediately apparent to the network. Todd (2016a) also noted that inaccuracies in nTime can be accidental and non-malicious in nature, and could be a result of misinterpretation of daylight savings time or misconfigured NTP servers. In the case of non-malicious nTime inaccuracies, however, correction of the time by any honest node is very probable, and the scope for inaccuracy is reduced to two to three hours.
A secondary aspect of accuracy to consider is the lag introduced by the calendar server and aggregation operations. Because the calendar server aggregates multiple timestamps and incorporates them into a single Bitcoin transaction, there might be considerable lag between when a timestamp is submitted and when it is actually incorporated into a transaction and Bitcoin block. This lag is a side effect of having to rely on a calendar server for performing the timestamp. The frequency with which the calendar server submits aggregated timestamps is up to the administrator of that specific server, but more frequent submissions will result in higher costs, as each individual submission carries a cost.
As mentioned previously, OTS does make provision for use cases where this aggregation lag is not acceptable: it allows the user either to submit directly to the Bitcoin network, if the necessary Bitcoin dependencies are configured locally, or to use a local calendar server which can be configured to submit more frequently. In these use cases, the user can configure OTS in such a way as to ensure there is no aggregation, which will result in a higher cost.
Clearly the timestamp cannot be accurate up to the second if the accuracy of the block time is not, because the accuracy of the timestamp is directly tied to the accuracy of the block time. Furthermore, the timestamp accuracy can be affected by calendar server aggregation lag, which means that a possibly non-negligible amount of time might pass between the submission of a timestamp and its incorporation into a Bitcoin transaction by a public calendar server.
We can, therefore, conclude that although the timestamp attests with accuracy up to the
second, the timestamp should be interpreted as accurate within two to three hours, and
accurate to within a day in a worst case scenario.
The accuracy of OTS timestamps in the configuration depicted in Figure 3.3 will be the subject of further detailed scrutiny in a later section of this work.
Opentimestamps (2017) notes that privacy and security were fundamental considerations in the design and implementation of OTS. There are, however, a few trade-offs between usability and privacy, as noted earlier, when making use of public calendar servers. Even in the case where public calendar servers are used, OTS was designed to protect the confidentiality of the content being timestamped to a very high degree.
This protection comes in the form of noncing, as illustrated in Listing 3.7 and Listing
3.10.
The purpose of the nonce in Listing 3.7 is to add necessary entropy to files whose entropy is very low. Because there is no authentication on the calendar server, the file hash will be in the public domain, and therefore its confidentiality needs to be protected. Smaller files, with low entropy, are susceptible to hash brute force attacks, where an attacker who knows the hash can derive the content of the file by brute force: guessing the content of the file, hashing it, and comparing it to the known hash. By appending a 128-bit nonce, the OTS client ensures that the hash that is sent to calendar servers has sufficient entropy to make a brute force attack impractical. The technicalities and effort required to brute force files with greater than 128 bits of entropy are outside the scope of this work.
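The brute-force risk on low-entropy content can be illustrated with a small, contrived Python example; the content and guesses here are purely illustrative.

import hashlib

# An attacker who knows only the bare hash of short, guessable content
# can recover it by enumeration; the 128-bit nonce defeats this.
known_hash = hashlib.sha256(b"This is a test file.\n").hexdigest()

for guess in (b"hello\n", b"secret\n", b"This is a test file.\n"):
    if hashlib.sha256(guess).hexdigest() == known_hash:
        print("recovered content:", guess)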
Similarly, the noncing performed in Listing 3.10 further preserves the confidentiality of the submitted hash for each calendar server it is submitted to. This per-calendar-server noncing ensures that no information is leaked that could allow the correlation of interactions with calendar servers; without it, such correlation would be possible by looking up the same timestamp identifier on multiple servers. If it were possible to gather this sensitive information, an attacker in possession of the timestamp identifier, which is inherently public, would be able to identify which calendar servers are being used by a particular OTS client or set of clients.
Opentimestamps (2017) notes that the biggest privacy risk associated with the use of OTS is the leakage of potentially sensitive metadata: “If you create multiple timestamps in close succession it’s quite likely that an adversary will be able to link those timestamps as related simply on the basis of when they were created; if you make use of the timestamp multiple files in one command functionality most of the commitment operations in the timestamps themselves will be identical, providing an adversary very strong evidence that the files were timestamped by the same person.”
Opentimestamps (2017) notes that although the connections to public calendar servers are intended to be secured, the calendar server makes no attempt at providing privacy through mechanisms like authentication or authorisation. Therefore, all calendar server content should be treated as public, and the necessary measures should be taken to preserve the confidentiality of the file content being timestamped. It should be noted that the public calendar servers configured by default in the version of OTS being discussed in this work do implement Transport Layer Security (TLS). However, other calendar servers, public or private, might not.
For the use and adoption of OTS in the digital forensic discipline to increase, it needs to be implemented in a useful and open way, so as to allow and encourage scrutiny and further development. OTS has been illustrated to be a comprehensive tool for timestamping and verifying the integrity of digital artefacts, like digital evidence. However, OTS in isolation is less likely to be adopted by digital forensic practitioners if it is not integrated into existing tools and processes. Making the use of OTS in digital forensics a success depends on how it can be incorporated into existing DF tools for easy adoption.
As noted previously, there is a vast number of both COTS and FOSS forensic tools
available to digital forensic practitioners. SleuthKit Autopsy (Autopsy) is but one of
these FOSS tools which has a range of desirable properties that make it a good candidate for integrating OTS. Carrier (2002) noted that SleuthKit is a collection of command line tools with accompanying libraries that enables the user to analyse disk images and recover files from them. Autopsy is an easy-to-use GUI program that enables the user to efficiently analyse hard drives. Importantly, Carrier (2002) noted that Autopsy’s plug-in architecture allows the user to find and develop third-party modules that can be integrated into Autopsy to perform a range of tasks. Finally, Carrier (2002) highlighted that Autopsy has thousands of users worldwide and that there is an active community associated with it.
Autopsy, therefore, has three key properties that make it an ideal candidate technology
to base further OTS development on:
Being FOSS means that there is no barrier to using and further developing Autopsy or
Autopsy modules. Additionally, it means that Autopsy is open to scrutiny and that any
interested party could review and validate its source code.
The modular architecture is another key benefit, as it allows the easy installation and use of any modules developed for Autopsy. The FOSS nature of Autopsy supports this open architecture by allowing potential module developers deeper insight into the core of Autopsy and how best to integrate with it. Being plug-in friendly also implies that developing, installing and using third-party plug-ins should be relatively easy.
Lastly, the active user base and community are a good indication that Autopsy has stood the test of time, and will enjoy a growing base of active users and developers to maintain and progress the platform as requirements and tools advance. This active community ensures that there are many freely available online resources to guide users and developers alike (Sleuthkit.org, 2017b; Sleuthkit.org, 2017a).
Sabernick III (2016) also noted that the FOSS nature of Autopsy is the main driving
factor behind their use of the framework. Sabernick III (2016) further noted that the
ability to develop Autopsy modules, and then share them with the wider community, is
one of the most valuable aspects of Autopsy with its open and modular architecture.
The combination of the above factors makes Autopsy the ideal framework for developing an OTS module and exposing OTS to the digital forensics community. The research design will therefore focus on developing an OTS module for Autopsy that allows its users to easily create and verify OTS timestamps for data sources in Autopsy.
Figure 3.9 shows the proposed use case diagram for the Autopsy plugin. The plugin would enable the easy creation and verification of a timestamp for a particular file or set of files by integrating it into the existing software used during such an investigation. Furthermore, the timestamp could also easily be verified outside of the investigator's environment, as the OTS protocol is open and free, enabling all other parties to verify the timestamp given they have a copy of the file in question and the timestamp created by the investigator. The timestamp can easily be shared with all parties to an investigation for independent verification.
OTS was analysed in some detail in the above section. The scope of the analysis was narrow and focused on showing how OTS works in a single instance for a specific file.
Given the acquired low-level understanding of how OTS creates and verifies timestamps, the analysis is expanded to evaluate OTS from a different perspective. To develop a better understanding of how OTS functions at scale, the scope of analysis is shifted from a single timestamp to many timestamps over time. Testing and measuring OTS at a higher level allows the determination of its benefits and drawbacks, as well as the forming of an opinion about its consistency and reliability over a larger sample of files and timestamps.
3.4.1 Design
To perform analyses of the nature described above, the following test design is proposed,
namely, one that will continuously and automatically create and validate OTS timestamps,
while recording results and key metrics about the process.
Creating and validating OTS timestamps at scale would not be practical if performed manually, as it would take a lot of time and resources. Fortunately, the OTS client can easily be invoked programmatically to create and validate timestamps for files.
By automating the OTS functions through the use of a script, it becomes possible to execute a large number of OTS functions, generate a large sample of OTS timestamps, and record a multitude of data points about the functions being executed. The script is tasked with performing the OTS actions, as well as capturing and storing key data points for further analysis.
1. Create a file.
2. Create an OTS timestamp for this file.
3. Verify the OTS timestamp for this file and all previous files not yet verified.
Once a large sample of OTS timestamps has been gathered, the associated data set
will be analysed to identify any individual or systemic errors with OTS timestamps.
Furthermore, the data will be analysed to possibly uncover trends associated with the
creation and validation of OTS timestamps; certain performance metrics of OTS will also
be examined.
3.4.2 Environment
A stable and consistent environment is essential for this type of testing. The test environment was a dedicated virtual server running Ubuntu Linux, as well as the necessary software dependencies like Python, OTS and a Bitcoin full node.
All of the software dependencies were installed with default configurations where possible. Each software package listed in Table 3.3 installed and configured its own dependencies as per the normal install process. These sub-dependencies are not within the scope of this discussion.
Python 3+ is required for the OTS client, and is also the programming language of the test script that interacts with the OTS client and the database.
Bitcoin Core libraries are necessary to run a local Bitcoin node to interact with the Bitcoin network. In the testing configuration, Bitcoin would not function as a wallet, but only as a full node that maintains a copy of the Bitcoin blockchain. From an OTS perspective, the Bitcoin libraries would only be used to verify timestamps and not to create them; calendar servers are still used to create the timestamps. The Bitcoin node was set up according to guidelines by Bitcoin Project (2015).
The OTS client, necessary to perform all the OTS functions, was installed and configured according to the project documentation found at Opentimestamps (2017).
Finally, MongoDB, serving as the data store for the recorded OTS function data, is
required. MongoDB was installed according to the product documentation noted by
MongoDB Inc. (2016).
The test script called ots-test.py is the main component of the OTS tests and is responsible
for all OTS functions, execution, and data gathering. The script code can be seen in
Listing A.1. Listing 3.16 is the pseudocode for ots-test.py and describes at a high level
what the script does and in which sequence.
As is clear from Listing 3.16, the sequence of actions in ots-test.py is very straightforward: create a new file and timestamp it; then upgrade all existing incomplete timestamps; and finally, verify all unverified complete timestamps. During each of these operations, the necessary information and results are saved to otsobj, which in turn is saved to the database for persistent storage.
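A hedged sketch of this cycle is shown below; the real script is given in Listing A.1, and the database, collection and field names used here are illustrative only.

import subprocess, time, uuid
from pymongo import MongoClient

db = MongoClient()["otstest"]["timestamps"]          # illustrative names

def run_cycle():
    name = str(uuid.uuid4()) + ".txt"
    with open(name, "w") as f:                       # 1. create a new file
        f.write(str(time.time()))
    subprocess.run(["ots", "s", name], check=True)   # 2. stamp it
    db.insert_one({"name": name, "proof": {"verified": False},
                   "events": [{"op": "stamp", "time": time.time()}]})
    for doc in list(db.find({"proof.verified": False})):
        ots_file = doc["name"] + ".ots"
        subprocess.run(["ots", "u", ots_file])       # 3. upgrade pending proofs
        if subprocess.run(["ots", "v", ots_file]).returncode == 0:
            db.update_one({"_id": doc["_id"]},       # mark verified proofs complete
                          {"$set": {"proof.verified": True}})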
The structure of otsobj warrants further discussion. The data structure in question is created as shown in Listing 3.17.
OtsObj is a nested object used to store the file and timestamp data for each stamped file created by the script. The object has a range of simple properties, such as name, path and size, combined with complex properties, which are themselves collections, like proof and events. The final structure of the object, with some reference data, is shown in Listing 3.18.
The object shown in Listing 3.18 is created for each file by the script at the time of creating the file and stamping it. This serves as the data structure in which the data about the timestamp and associated functions is recorded. The object is saved to the database and subsequently read from the database whenever the file or timestamp is operated on. Any changes are recorded in the appropriate field within the data structure and are subsequently written to the database for persistent storage.
2. proof: A nested complex object that contains data about the timestamp status.
3. events: A collection of nested complex objects that contain all events related to the
timestamp.
All time fields are recorded as UNIX time (UT), or seconds since 00:00:00 Jan 1 1970 UTC, to avoid ambiguity and complexities with time zones and conversions.
Many fields are metadata fields that help keep track of the status of the file and its progress through the various OTS operations, and are not necessarily significant to the overall testing goal. They will therefore not be discussed in detail. Table 3.4 gives a brief overview of the most important fields recorded by the script and their meaning.
The above mentioned fields, as well as other metadata about the associated files, are recorded and maintained for each file and related timestamp as each OTS function is executed. As soon as a particular timestamp has a proof.verified value of True, the timestamp is considered complete and no further actions are taken or recorded for that timestamp.
The script was set to execute automatically on the Ubuntu server by configuring a systemd job. The timing of the execution was set to every ten minutes. This timing was chosen
for two reasons. Firstly, to ensure that a large sample of timestamps could be generated
in a reasonable amount of time (at least 3 000 in two weeks of continuous execution).
Secondly, to ensure that there is a large enough time gap between executions to enable
the script to perform all of the necessary OTS functions, thereby preventing concurrent
execution that could lead to data integrity issues.
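A hypothetical systemd service and timer pair achieving this schedule could look as follows; unit names and paths are illustrative, not taken from the actual test environment.

# /etc/systemd/system/ots-test.service
[Unit]
Description=Run the OTS test script once

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /opt/ots/ots-test.py

# /etc/systemd/system/ots-test.timer
[Unit]
Description=Run the OTS test script every ten minutes

[Timer]
OnCalendar=*:0/10
Persistent=true

[Install]
WantedBy=timers.target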
3.5 Summary
In this section, the OTS protocol was discussed and investigated in detail to understand how and why it can potentially be trusted. Multiple configuration options were discussed, along with how each configuration results in a unique balance between convenience and security. A better understanding of its potential strengths and weaknesses was developed and, through a practical example, it was shown how the verification mechanism works in the desired configuration for an optimal balance of usability and security.
Subsequent to the discussion of OTS, Autopsy as a vehicle for OTS integration was examined, and it was concluded that Autopsy is the ideal platform as a result of its open nature and modular design. Autopsy also has an active community of support and development resources to enable the development of an Autopsy OTS module. This subsection was concluded with a basic Autopsy OTS use case and a brief discussion of how different parties to an investigation could use it.
Finally, a design to test OTS at scale, investigating factors such as usability, error rates, and response times, was discussed. A verbose set of metrics was recorded during the creation of a large sample of timestamps to generate a data set of OTS-related data from which to draw potential insights.
Chapter 4
Implementation
4.1 Introduction
In this section of the paper, a discussion on how the OTS protocol was implemented within
the Autopsy framework to create an easy-to-install and use Autopsy module is presented.
The module allows a user of Autopsy to timestamp data sources as they are imported
into an Autopsy case (project) and view the timestamp results in an easy-to-understand
format. A brief overview of how Autopsy module development works, its dependencies and the development environment is outlined. The autopsy-opentimestamps module is then discussed, and its functions briefly illustrated. The section concludes with a discussion about some challenges experienced during development and why the results are believed to be useful. This is followed by a discussion of the end result, and how and where the module can be downloaded.
As noted in the previous section, Autopsy is an application and framework built with
modularity and extensibility in mind. It is, therefore, quite easy to find resources on
developing modules for the platform, as is apparent from Sleuthkit.org (2015). The Autopsy
Developer’s Guide notes: “Autopsy was developed to be a platform for plug-in modules.
The Developer’s Guide contains the API docs and information on how to write modules.
When you create a module, add it to the list of Autopsy 3rd Party Modules.”
It further clearly states that there are two types of Autopsy development: development
of the core Autopsy platform itself, and development of add-on modules that run within
Autopsy. The focus of this work is on the second type of development, where a module
is developed that can be freely distributed for use by the community. Sleuthkit.org (2017a)
noted that modules can be developed in either Python or Java. Autopsy itself is developed
in Java but can incorporate modules developed in Python through Jython, a free and open
Java implementation of Python.
From a developer’s perspective, there are a few basic terms that should be well understood
before starting to develop an Autopsy module. Firstly, a case, which translates broadly to
an Autopsy project, is a logical container for data sources and resources related to
that project. A case can have many data sources. Secondly, there are data sources,
which refer to disk images or collections of logical files. The central database is another
important component and serves as the persistent storage layer for modules to write and
read metadata and analysis results. The Blackboard is a form of intermediate storage
shared between modules and can be used to communicate data between two or more
modules by posting what are called artefacts to the Blackboard. Finally, there are many
services and utilities available to developers, as depicted in the list below.
These services and utilities are primarily exposed as APIs and provide supporting
functions to the module. These include:
2. Logging: API for logging informational or error messages to the Autopsy log file.
6. Platform Utilities: API to determine user context and save resources to the user
directories.
Using a collection of the above-mentioned services and resources, modules can be developed
with rich functionality and standardised interaction with the framework and underlying
data storage mechanisms.
There are four main types of Autopsy modules that can be leveraged by module
developers to perform different functions.
1. Ingest modules
2. Report modules
3. Content viewers
4. Result viewers
Ingest modules, as the name suggests, are used to operate on data as it is ingested into
an Autopsy case. Ingestion can happen when the case is created initially or throughout
the lifecycle of the case as more data sources are added. Ingest modules can be further
classified into two types: file ingest modules and data source ingest modules. File ingest
modules are triggered for each individual file in a data source, whereas data source ingest
modules are triggered once per data source.
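To make the distinction concrete, the following minimal sketch shows the general shape of a data source ingest module, modelled on the publicly documented Autopsy sample modules; the class name and method bodies are illustrative assumptions rather than the autopsy-opentimestamps implementation:

import org.sleuthkit.autopsy.ingest.DataSourceIngestModule;
import org.sleuthkit.autopsy.ingest.DataSourceIngestModuleProgress;
import org.sleuthkit.autopsy.ingest.IngestJobContext;
import org.sleuthkit.datamodel.Content;

public class ExampleDataSourceIngestModule implements DataSourceIngestModule {

    @Override
    public void startUp(IngestJobContext context) {
        // One-time setup (e.g. reading job settings) before process() is called.
    }

    @Override
    public ProcessResult process(Content dataSource, DataSourceIngestModuleProgress progressBar) {
        // Called once per data source; a file ingest module would instead
        // implement FileIngestModule and be called once per file.
        progressBar.switchToIndeterminate();
        // ... operate on the data source here ...
        return ProcessResult.OK;
    }
}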
Report modules are usually invoked after ingestion and are used to deliver the results of
any analysis to the user. Report modules can also be used to perform further analysis if
desired.
Content viewers are modules with graphical components that allow a module to display
the content of a data source in a specific and visually appealing way, e.g., as graphs,
rendered images, or plain text.
Result viewers are modules that present data and information related to a collection of
files in a data source or case.
This brief overview of Autopsy module development resources and the architecture of the
Autopsy framework clearly illustrates that the platform is ideally suited to support and
promote easy module development. Additionally, the framework and its services are well
documented and present a rich set of standardised functionality.
Furthermore, the IDE had the primary software dependencies shown in Table 4.2. The
choice of operating system and software dependencies present in the development
environment was driven by some compatibility prerequisites. Autopsy 4.4.0 is fully
supported only on Windows, and thus an IDE running on Windows was necessary. Autopsy
modules also require NetBeans, as Autopsy modules are built on top of the NetBeans Rich
Client Platform to support the plug-and-play nature of Autopsy modules. Guidance on
Sleuthkit.org (2015) was followed to install and configure NetBeans and to create a basic
module project.
The Java Runtime Environment (JRE) and Java Development Kit (JDK) listed in Table
4.2 are required by both NetBeans and Autopsy. As with the OTS testing script, a local
Bitcoin node was required and was installed to enable OTS to interact with the Bitcoin
blockchain. The choice of java-opentimestamps, as opposed to the Python variant used
for the OTS test script, was made for two reasons. Firstly, although Autopsy can support
modules written in Python, its use of Jython means that it can only support modules
written in Python 2.7; OTS, however, requires at a minimum Python 3.5 to run. This
would mean that, in order to use the OTS libraries from the Python implementation,
they would have to be ported to Python 2.7, which in turn would require significant effort
outside the immediate goals of this project. Fortunately, a Java implementation of OTS
was available, and it was decided that this OTS implementation would be used for the
development of the autopsy-opentimestamps module. Secondly, Autopsy itself is written
in Java and thus natively supports modules developed in Java, resulting in less complexity.
All of the software dependencies listed in Table 4.2 were installed using the default
configuration options.
During the initial design phase, it was important to make the appropriate design decisions
to ensure the module would be functional. As discussed in the previous section, there
are multiple types of modules within the Autopsy framework, and deciding which would
be most appropriate for the development of an OTS module was crucial. Since the
purpose of the autopsy-opentimestamps module is to create and validate timestamps for
data sources, it was determined that the most appropriate module type was an ingest
module.
By using an ingest module design, the module could create timestamps for data sources
as soon as they are ingested into a case. This default behaviour of timestamping at the
time of ingestion would mean that the possibility of accidentally not timestamping a data
source is minimised. Of the two ingest module types, it was decided that a data source
ingest module would be best suited to achieve the desired functionality. This would mean
that timestamps are created at a data source level and not necessarily at an individual
file level. Autopsy supports a range of data source types, as can be seen in Figure 4.1.
Each of these data source types possesses a unique set of characteristics that dictates how
it can be used within an ingest module. As an example, the “Disk Image or VM File”
data source, when ingested, is seen as a single container for a collection of other files,
whereas the “Logical Files” data source type is not seen as a container but rather as a
collection of individual files. This meant the module would have to process these data
sources differently.
In the case of a disk image, which is itself a container for a hierarchical file structure, it
would be sufficient to timestamp only the container, since its hash would by design cover
all the files within the container. This implies that if any single file or artefact within
that container changed, the hash of the container would also change and invalidate the
timestamp, as expected. With a disk image, a file or folder contained within its structure
cannot be moved outside of the container without altering the hash of the container. With
a logical container such as a disk image, it would therefore not be necessary to timestamp
each file within that container.
Conversely, the logical file set data source does not necessarily belong to a container
with a hierarchical file structure. This means there is no top-level element that can be
timestamped that would include all of the files within the data source. In the case of a
logical file set, where multiple folders or files can be at the same level, each of the files in
the file set would have to be individually timestamped. This is achieved within an ingest
module by recursively enumerating each of the folders and timestamping all files at each
level. Essentially, the behaviour of a file ingest module is mimicked within a data source
ingest module to achieve the desired functionality.
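The following sketch illustrates this branching and recursion using Sleuth Kit datamodel types; the timestampContent helper is a hypothetical placeholder for the actual OTS stamping call:

import org.sleuthkit.datamodel.AbstractFile;
import org.sleuthkit.datamodel.Content;
import org.sleuthkit.datamodel.Image;
import org.sleuthkit.datamodel.TskCoreException;

public class DataSourceWalker {

    // A disk image is a single container: one timestamp over its hash covers
    // every file inside it. A logical file set has no such top-level element,
    // so each file must be timestamped individually.
    void timestampDataSource(Content dataSource) throws TskCoreException {
        if (dataSource instanceof Image) {
            timestampContent(dataSource);
        } else {
            timestampChildren(dataSource);
        }
    }

    void timestampChildren(Content parent) throws TskCoreException {
        for (Content child : parent.getChildren()) {
            if (child instanceof AbstractFile && ((AbstractFile) child).isFile()) {
                timestampContent(child);
            } else {
                timestampChildren(child); // descend into folders
            }
        }
    }

    void timestampContent(Content item) {
        // Hypothetical placeholder for hashing the content and creating an
        // OTS timestamp over it.
    }
}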
Figure 4.3 shows the high-level logical execution flow of the autopsy-opentimestamps
module for all supported data source types.
Note that there are two distinct sources of execution: execution upon ingestion and manual
execution. Both execution paths operate on the selected data source. In both cases, when
importing a new data source or manually running ingest modules from the context menu,
a user is presented with the screen shown in Figure 4.4, where they can choose to list
private calendar servers if they wish to use those. By leaving this option empty, the
module will use the default calendar servers configured in OTS. Support for extended
configuration options, such as the proxy awareness and custom Bitcoin node configuration
shown in Listing 3.1, was specifically excluded from the scope of this module
implementation, since the aim was to create a minimum viable solution that is both easy
to use and simple to configure.
When invoked, the first action taken by the module is to determine the data source type,
since different data sources need to be handled differently. If the data source is an image
file, the module proceeds to check whether a timestamp already exists for that data source.
If the data source is a set of logical files, it enumerates each of the files and, in turn, checks
whether a timestamp exists for the file in question.
If a timestamp for the file does not exist, the module proceeds to create a timestamp
and exits, or returns control to the enumeration process to advance to the next available
file. If a timestamp does exist for a file, the module checks whether the timestamp is
complete. In the case of an incomplete timestamp, the module attempts an Upgrade to
retrieve the complete timestamp from the calendar server and exits the current execution.
If the timestamp is already upgraded, the module performs a verification operation and
logs the results to a report, which it saves locally before exiting. All actions performed on
each of the files are logged to the Autopsy log file for audit purposes. A sample of such a
log file can be seen in Figure 4.5.
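Condensed into code, the decision flow reads as follows; every helper method here is a hypothetical stand-in rather than the module's actual API:

public class OtsFlowSketch {

    void process(Object item) {
        if (!timestampExists(item)) {
            stamp(item);               // create a new, incomplete timestamp
        } else if (!isComplete(item)) {
            upgrade(item);             // fetch the Bitcoin attestation from the calendar server
        } else {
            boolean ok = verify(item); // check the proof against the local Bitcoin node
            appendToReport(item, ok);  // verification results are appended to the OTS report
        }
    }

    // Stubs so the sketch is self-contained; the real checks and operations
    // live in the module's OTS functions.
    boolean timestampExists(Object item) { return false; }
    boolean isComplete(Object item) { return false; }
    boolean verify(Object item) { return true; }
    void stamp(Object item) {}
    void upgrade(Object item) {}
    void appendToReport(Object item, boolean ok) {}
}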
The results of OTS operations are recorded in an OTS report for each data source in a
case. The reports are displayed in the dedicated Autopsy report section in the left-hand
pane of the Autopsy user interface, as can be seen in Figure 4.6. A report can be viewed
by clicking on the report entry, which opens the report from its containing folder.
The OTS report is text based, and its name is derived from the associated data source
name, appended with the text OTS Report.txt.
An example of the content of the OTS report can be seen in Figure 4.7, which notes all
the important OTS operations, such as timestamp creation and timestamp Upgrade results,
as well as verification results. The report is append-only, and any subsequent verification
results are appended to the end of the report for that data source. The report also
contains the complete list of OTS timestamp commitment operations to enable manual
verification of the proof, if necessary.
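A hedged sketch of how such an append-only report might be written and registered with the case follows; the file naming mirrors the convention described above, while the directory helper and report-registration call are believed to match the Autopsy Case API but remain assumptions to be verified:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import org.sleuthkit.autopsy.casemodule.Case;
import org.sleuthkit.datamodel.TskCoreException;

public class OtsReportWriter {

    // Appends one OTS result line to the per-data-source report and registers
    // the report with the case so it appears in the Autopsy report section.
    void appendResult(String dataSourceName, String entry) throws IOException, TskCoreException {
        Case currentCase = Case.getCurrentCase();
        Path report = Paths.get(currentCase.getModuleDirectory(),
                dataSourceName + " OTS Report.txt");
        Files.write(report, (entry + System.lineSeparator()).getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // Registering on every append is shown for brevity; registering once
        // per report would suffice in practice.
        currentCase.addReport(report.toString(), "OpenTimestamps",
                dataSourceName + " OTS Report");
    }
}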
As is clear from Figure 4.8, there are a number of source code components in the
autopsy-opentimestamps project under the source packages view. Many of these
components, for instance OpentimestampsModuleFactory.java and OpentimestampsJobSettingsPanel.java,
are necessary module components for any Autopsy module and were created in accordance
with the guidance on Sleuthkit.org (2015). The source files most relevant to the OTS
functionality are OpentimestampsModule.java and OpentimestampsFunctions.java, which
are briefly discussed below.
The methods exposed by the java-opentimestamps library are not ideal, as they take
text-based input parameters and provide textual return types. Because of this, these
methods were reimplemented in OpentimestampsFunctions.java with defined data types
as input parameters, and with complex return types such as lists. These complex return
types made it easier to programmatically interact with the OTS methods by passing more
complex objects to and from the OTS calling methods in OpentimestampsModule.java.
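To illustrate the idea (this is not the module's actual code; the method name, parameters, and parsing logic are assumptions), a typed wrapper over a text-oriented library call might look like this:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public final class TypedOtsWrapper {

    // Hypothetical wrapper: File parameters instead of path strings, and a
    // structured list instead of free-form text for the caller to iterate.
    public static List<String> verify(File original, File otsProof) {
        // In the real module this would invoke the java-opentimestamps verify
        // routine for the given files; the call is stubbed here so the sketch
        // stays self-contained.
        String rawLibraryOutput = "";
        List<String> attestations = new ArrayList<>();
        for (String line : rawLibraryOutput.split("\\R")) {
            if (!line.trim().isEmpty()) {
                attestations.add(line.trim()); // one structured entry per line
            }
        }
        return attestations;
    }
}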
4.2 Challenges
During the implementation phase, it was confirmed that the Autopsy framework does
provide a rich feature set that simplifies the development of modules. It was also observed
that the guides, sample projects, and how-to modules listed on Sleuthkit.org (2015) were
very useful in guiding development.
Implementation did, however, become more difficult when attempting to build
functionality not implemented in an example project. It was also unclear how to approach
development where a module should act as both a data source ingest module and a file
ingest module. This lack of clarity led to the development of custom logic to determine
the data source type and to choose the appropriate execution path.
Implementation was started without a clear idea of how the OTS result would be
displayed to the user, and several options were explored during development. Initially,
the Blackboard seemed like the appropriate solution due to its ease of access, but it was
found that artefacts posted to the Blackboard are stored in volatile memory and would
not persist across multiple executions. This led to the use of the Report functionality
rather than the Blackboard.
It was found that module execution is primarily geared towards synchronous execution
and feedback, and it proved difficult to design a feedback mechanism that could
persistently show the state of a long-running asynchronous operation (such as an OTS
Upgrade). Finally, it was found that the ability to give rich, real-time feedback to the user,
although possible, was not easy given the functionality in the Autopsy framework. Even
though there is a feedback mechanism during module execution, it is limited to a progress
bar.
In order to support testing and validation of the module by the community, it was decided
to use publicly accessible data sources during development where possible. During
development, a small disk image named nps-2009-canon2-gen2.E01, located at Digital
Corpora2, was used. This image is approximately 29 MB in size and resulted in quick
execution of ingest modules. The module was not tested with larger image files, such
as 1 TB or 2 TB hard drives, as the size of the image and the subsequent performance of
the hashing operation performed on it are outside the scope of this work. Functionally,
there should be no difference in the execution of the module on small versus large files.
Naturally, there will be a performance cost when executing on larger files, as the hashing
operation takes longer. The OTS methods, outside of hashing, are relatively constant in
cost and should execute within seconds regardless of the data source size.
The autopsy-opentimestamps project is open sourced under the GNU Lesser General
Public License, Version 3 (29 June 2007), and can be found on GitHub at Weilbach (2017).
By making this software open source, it is intended to encourage its use to validate the
proposed use case and to introduce OTS to Autopsy users and the digital forensics
community at large. A secondary objective of releasing this module as open source
software is to promote further collaboration on and refinement of it, and to develop it
beyond a sample implementation into a fully functional module with improved reporting
and user interaction.
1 https://github.com/opentimestamps/java-opentimestamps/pull/9
2 http://downloads.digitalcorpora.org/corpora/drives/nps-2009-canon2/nps-2009-canon2-gen1.E01
At the time of publication, the project had one release in the form of a NetBeans module
that can be downloaded and installed in Autopsy version 4.4.0 or later. Table 4.2 lists
the other runtime dependencies for the module.
4.4 Summary
In this section, the design approach taken to create the autopsy-opentimestamps module
was illustrated by looking at the structure of Autopsy modules in general and how that
impacts design decisions. The invocation and execution flow of the module was described
at a high level, followed by an overview of the project structure. This was accompanied by
a discussion of some implementation details, including the difficulties and challenges
experienced during development. More detail was provided about the
autopsy-opentimestamps project and where it can be found.
Chapter 5
5.1 Introduction
Testing and measurement are of critical importance in determining how effective and
efficient solutions are. Based on the research design and the implementation, there are two
main testing topics that will be discussed in this section. The first is the large-scale testing
of OTS timestamps, which is quantitative in nature. The second is the testing and
measurement of the functionality of the autopsy-opentimestamps module.
Data gathering, according to the OTS test design discussed in Section 3.4.1, lasted for 34
days, from 5 September 2017 up to and including 8 October 2017. OTS timestamps were
created, upgraded, and verified every 10 minutes, resulting in a data set of 4 702 unique
files, their timestamps, timestamp results, and operational metadata.
Data Set A
The data set gathered was saved in the complex structure noted in Listing 3.18, and was
not easily analysable directly in the database. For this reason, the data was extracted
from the database in a flattened format using a Python script that produced a Comma
Separated Values (CSV) output file. The flattened CSV data set has one file per line. A
sample can be seen in Figure 5.1 after being imported into Microsoft Excel.
Figure 5.1: CSV data loaded into Microsoft Excel for analysis
Red columns in Figure 5.1 are unmodified data values, and green columns are calculated
fields translating UNIX timestamps into UTC dates and times for easy legibility. This data
set served as the basis for further analysis, and was also enriched with additional metadata
during the analysis stages.
Table 5.1 shows the detailed descriptions of each of the column headers for the base data
set.
Table 5.1: Description of data fields extracted from the data set
name: The name of the file which was created
size: The size in bytes of the file that was created
fileCreated: The name of the event
fileCreatedTime: The time (UNIX timestamp) the CreateFile event occurred
proofCreated: The name of the event
proofCreatedTime: The time (UNIX timestamp) the StampFile event occurred
proofUpgraded: The name of the event
proofUpgradedTime: The time (UNIX timestamp) the UpgradeFile event occurred
proofVerified: The name of the event
proofVerifiedTime: The time (UNIX timestamp) the VerifyFile event occurred
dataExistedTime: The time (UNIX timestamp) as of which OpenTimestamps can attest the data existed
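As an illustration of the flattening step (the thesis used a Python script for this; the Java rendering below is illustrative and the method names are assumptions), one CSV line is emitted per file using the column set from Table 5.1:

import java.util.StringJoiner;

public class FlattenToCsv {

    static String header() {
        return "name,size,fileCreated,fileCreatedTime,proofCreated,proofCreatedTime,"
                + "proofUpgraded,proofUpgradedTime,proofVerified,proofVerifiedTime,dataExistedTime";
    }

    // Builds one flattened CSV row; the event names follow Table 5.1.
    static String row(String name, long size, long created, long stamped,
                      long upgraded, long verified, long dataExisted) {
        StringJoiner row = new StringJoiner(",");
        row.add(name).add(Long.toString(size))
           .add("CreateFile").add(Long.toString(created))
           .add("StampFile").add(Long.toString(stamped))
           .add("UpgradeFile").add(Long.toString(upgraded))
           .add("VerifyFile").add(Long.toString(verified))
           .add(Long.toString(dataExisted));
        return row.toString();
    }
}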
The fields listed in Table 5.1 enabled the calculation of further metrics regarding the
performance and accuracy of OTS operations. A sample of these calculated fields can be
seen in Figure 5.2.
Table 5.2 gives the detailed descriptions of each of the column headers for the calculated
data set. These calculations were performed for each of the 4 703 records.
Data Set B
Supplementary to the base data set, some metadata about OTS operations was captured
by calculating the start and end times of each OTS operation performed by the script.
These measurements can be seen in Listing A.1, lines 240 to 256, and are aimed at
accurately measuring the execution time of these OTS operations. Initially, this was done
simply for potential troubleshooting, but it became clear that a data set of OTS operation
times could be valuable, so this data set was also analysed. A sample of this data set can
be seen in Figure 5.3.
Table 5.2: Description of data fields calculated from the data in Table 5.1
timeToStamp: The time, in seconds, it took for a calendar server to confirm the timestamp would be committed, measured from the time the file was created. This is the time required to create an incomplete OTS timestamp.
timeToUpgrade: The time, in seconds, it took for a complete timestamp to be retrieved, measured from the time the timestamp was committed on the calendar server (timeToStamp). This is the time required to create a complete OTS timestamp.
timeToVerifyFromStamp: The time, in seconds, it took to verify a complete timestamp, measured from the time the timestamp was committed on the calendar server (timeToStamp).
timeToVerifyFromUpgrade: The time, in seconds, it took to verify a complete timestamp, measured from the time the complete timestamp was retrieved (timeToUpgrade).
timestampAccuracy: The time difference, in seconds, between the time the timestamp was completed (timeToUpgrade) and the attestation time received from the Bitcoin blockchain as per the OTS verify operation.
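One plausible reading of these definitions, expressed as arithmetic over the raw fields of Table 5.1 (the sample values are illustrative, and the exact reference point for timestampAccuracy follows the table description above), is:

public class DerivedMetrics {
    public static void main(String[] args) {
        // Illustrative UNIX times in seconds; field names follow Table 5.1.
        long fileCreatedTime   = 1504569600L;
        long proofCreatedTime  = 1504569610L;
        long proofUpgradedTime = 1504573210L;
        long proofVerifiedTime = 1504573220L;
        long dataExistedTime   = 1504570800L;

        long timeToStamp             = proofCreatedTime - fileCreatedTime;   // incomplete timestamp
        long timeToUpgrade           = proofUpgradedTime - proofCreatedTime; // complete timestamp
        long timeToVerifyFromStamp   = proofVerifiedTime - proofCreatedTime;
        long timeToVerifyFromUpgrade = proofVerifiedTime - proofUpgradedTime;
        // Completed-proof time versus the Bitcoin attestation time, per Table 5.2.
        long timestampAccuracy       = proofUpgradedTime - dataExistedTime;

        System.out.printf("tStamp=%d tUpgrade=%d tVerifyS=%d tVerifyU=%d accuracy=%d%n",
                timeToStamp, timeToUpgrade, timeToVerifyFromStamp,
                timeToVerifyFromUpgrade, timestampAccuracy);
    }
}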
Data Set C
A final data set was gathered pertaining to the failure rate of OTS. The data set was
generated by intentionally tampering with OTS components to induce invalid files and
timestamps, and re-verifying them using OTS. A sample of the data set can be seen in
Figure 5.4. The modification and validation were performed using another Python script,
noted in Listing A.2, which also recorded the results.
The script in Listing A.2 enumerates all of the previously generated files and timestamps,
and alternates between modifying the files, or the associated timestamps, by appending
a few fixed bytes. By intentionally breaking the timestamps, or modifying the files in a
known and consistent way, more insight into potential false positive and false negative
results could be gained.
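An illustrative Java rendering of this tampering step follows (the thesis used the Python script in Listing A.2; the file names and the marker bytes here are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TamperExample {

    // Appends a few fixed bytes to a file so that a subsequent OTS verify
    // must fail; applied alternately to the data file and its .ots proof.
    static void tamper(Path target) throws IOException {
        byte[] fixedBytes = { 0x41, 0x41, 0x41, 0x41 }; // assumed marker bytes
        Files.write(target, fixedBytes, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        tamper(Paths.get("sample.bin"));       // invalidate the file itself
        // tamper(Paths.get("sample.bin.ots")); // or corrupt the proof instead
    }
}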
Using the above-mentioned data sets, more in-depth analysis was performed on each data
set to highlight trends, issues, and other potentially significant facts.
Analysis A
Analysis started by looking at the data from Data Set A (Figure 5.2), where various
operations and their timings were recorded. From the perspective of an OTS user,
there are a few metrics in this data set that can be significant:
• tStamp: The time it takes to create a timestamp and get a commitment from the
calendar server. Includes local processing time.
• tVerify: The time it takes to verify a timestamp. Includes local processing time.
For each of the above-mentioned metrics, an average, minimum, maximum, and standard
deviation were calculated and are listed in Table 5.3.
All of the measurement values in Table 5.3 are in seconds and are rounded to two
decimal places. These values will be referred to in the coming sections by concatenating
the names of the relevant row and column; e.g., the Average (A) timeToUpgrade (tUpgrade)
will be denoted by A-tUpgrade.
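The summary statistics themselves are straightforward; a minimal sketch of their calculation over one metric column (with illustrative values) is:

import java.util.Arrays;

public class SummaryStats {
    public static void main(String[] args) {
        double[] t = { 12.3, 9.8, 15.1, 11.4 }; // e.g. tStamp values, illustrative
        double avg = Arrays.stream(t).average().orElse(0);
        double min = Arrays.stream(t).min().orElse(0);
        double max = Arrays.stream(t).max().orElse(0);
        // Population standard deviation over the same column.
        double var = Arrays.stream(t).map(x -> (x - avg) * (x - avg)).average().orElse(0);
        double std = Math.sqrt(var);
        System.out.printf("A=%.2f Min=%.2f Max=%.2f SD=%.2f%n", avg, min, max, std);
    }
}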
Analysis B
To better analyse the data from Data Set A, the various data points were plotted on a
graph. Figure 5.5 visualises the time to create a complete timestamp: the creation time
of the timestamp (proofCreatedTime) is on the x-axis, and the time taken to create the
proof (timeToStamp) is on the y-axis.
Additionally, a moving average per 144 data points (one day) was calculated to assist
in visualising the timestamp completion time without some of the outlier values. The
overall average for timeToUpgrade (A-tUpgrade) is 3 563.04 seconds, as can be seen in
Table 5.3.
Similarly, Figure 5.6 shows the timestamp accuracy. Timestamp accuracy is defined as
the difference in time between the point at which the timestamp was created (the known
time the data existed and was committed) and the time from which verification can attest
the data first existed. This is used to measure accuracy, as it shows how precise OTS
attestations are for a sample with a known creation date.
A moving average over 144 data points is also calculated and shown in Figure 5.6 to
account for outliers, with the overall average A-tAccuracy being 2 687.64 seconds.
Both of the metrics visualised in Figure 5.5 and Figure 5.6 are relevant to the
responsiveness and performance of OTS within the test environment.
Analysis C
Another aspect of OTS performance is the time it takes to perform individual granular
functions. Granular functions refer to the actual time taken to perform a single operation,
e.g., Stamp or Verify; the previous measurements related to the time between multiple
operations, e.g., between Stamp and Verify. The following data set relates to the time it
takes to perform individual functions, or the time it takes for OTS functions to deliver a
requested result:
• tStampG: Stamp (all stamp actions, including the RPC call to the remote calendar server).
• tVerifyG: Verify (all verify actions, including the RPC call to the local Bitcoin node).
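Measuring such a granular function amounts to timing a single call; a minimal sketch (the doOtsOperation method is a placeholder for one actual Stamp or Verify invocation) is:

public class GranularTiming {
    public static void main(String[] args) {
        long start = System.nanoTime();
        doOtsOperation(); // a single granular OTS function, including its RPC call
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("tStampG = %.2f s%n", elapsedSeconds);
    }

    static void doOtsOperation() {
        // Hypothetical placeholder for the OTS call being measured.
    }
}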
Figure 5.5: Time to complete a timestamp relative to the date and time the timestamp was created.