History of Database Systems
Information processing drives the growth of computers, as it has from the earliest days of
commercial computers. In fact, automation of data processing tasks predates computers. Punched
cards, invented by Herman Hollerith, were used at the very beginning of the twentieth century to
record U.S. census data, and mechanical systems were used to process the cards and tabulate
results. Punched cards were later widely used as a means of entering data into computers.
Techniques for data storage and processing have evolved over the years:
• 1950s and early 1960s: Magnetic tapes were developed for data storage. Data processing tasks
such as payroll were automated, with data stored on tapes. Processing of data consisted of reading
data from one or more tapes and writing data to a new tape. Data could also be input from punched
card decks and output to printers. For example, salary raises were processed by entering the raises
on punched cards and reading the punched card deck in synchronization with a tape containing the
master salary details. The records had to be in the same sorted order. The salary raises would be
added to the salary read from the master tape and written to a new tape; the new tape would become
the new master tape.
Tapes (and card decks) could be read only sequentially, and data sizes were much larger than main
memory; thus, data-processing programs were forced to process data in a particular order by
reading and merging data from tapes and card decks.
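The salary-raise example above amounts to a single sequential merge of two sorted inputs. A minimal sketch, assuming records are (employee ID, amount) pairs and that both inputs are sorted by employee ID; the names `apply_raises`, `old_master`, and `raise_deck` are illustrative and do not come from any historical system:

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[int, int]  # (employee_id, amount); a hypothetical record layout


def apply_raises(master: Iterable[Record], raises: Iterable[Record]) -> Iterator[Record]:
    """Merge a sorted master file with a sorted deck of raises in one pass.

    Each input is read strictly front to back, as a tape or card deck would be.
    """
    raise_iter = iter(raises)
    pending: Optional[Record] = next(raise_iter, None)
    for emp_id, salary in master:
        # Skip raise records whose ID has no matching master record.
        while pending is not None and pending[0] < emp_id:
            pending = next(raise_iter, None)
        # Add every raise that matches the current master record.
        while pending is not None and pending[0] == emp_id:
            salary += pending[1]
            pending = next(raise_iter, None)
        # Each yielded record corresponds to one write to the new master tape.
        yield emp_id, salary


old_master = [(101, 50000), (102, 60000), (104, 55000)]
raise_deck = [(102, 3000), (104, 1500)]
new_master = list(apply_raises(old_master, raise_deck))
```

Because neither input is ever rewound, the sketch works only when both are in the same sorted order, which is the constraint the text describes.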
• Late 1960s and early 1970s: Widespread use of hard disks in the late 1960s changed the scenario
for data processing greatly, since hard disks allowed direct access to data. The position of data on
disk was immaterial, since any location on disk could be accessed in just tens of milliseconds. Data
were thus freed from the tyranny of sequentiality. With the advent of disks, the network and
hierarchical data models were developed, which allowed data structures such as lists and trees to
be stored on disk. Programmers could construct and manipulate these data structures.
A landmark paper by Edgar Codd in 1970 defined the relational model and nonprocedural ways
of querying data in the relational model, and relational databases were born. The simplicity of the
relational model and the possibility of hiding implementation details completely from the
programmer were enticing indeed. Codd later won the prestigious Association for Computing
Machinery Turing Award for his work.
• Late 1970s and 1980s: Although academically interesting, the relational model was not used in
practice initially because of its perceived performance disadvantages; relational databases could
not match the performance of existing network and hierarchical databases. That changed with
System R, a groundbreaking project at IBM Research that developed techniques for the
construction of an efficient relational database system. The fully functional System R prototype
led to IBM’s first relational database product, SQL/DS. At the same time, the Ingres system was
being developed at the University of California at Berkeley. It led to a commercial product of the
same name. Also around this time, the first version of Oracle was released. Initial commercial
relational database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a major role
in advancing techniques for efficient processing of declarative queries.
By the early 1980s, relational databases had become competitive with network and hierarchical
database systems even in the area of performance. Relational databases were so easy to use that
they eventually replaced network and hierarchical databases. Programmers using those older
models were forced to deal with many low-level implementation details, and they had to code their
queries in a procedural fashion. Most importantly, they had to keep efficiency in mind when
designing their programs, which involved a lot of effort. In contrast, in a relational database, almost
all these low-level tasks are carried out automatically by the database system, leaving the
programmer free to work at a logical level. Since attaining dominance in the 1980s, the relational
model has reigned supreme among data models.
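The contrast between procedural and declarative querying can be illustrated with a small sketch. It uses Python's standard `sqlite3` module as a stand-in for a relational system; the `instructor` table, its sample rows, and both function names are hypothetical and chosen only for illustration:

```python
import sqlite3

# Hypothetical sample data: (name, dept_name, salary).
instructors = [
    ("Katz", "Comp. Sci.", 75000),
    ("Wu", "Finance", 90000),
    ("Brandt", "Comp. Sci.", 92000),
]


def names_in_dept_procedural(records, dept):
    """Procedural style: the programmer spells out how to scan, filter, and sort."""
    result = []
    for name, record_dept, _salary in records:
        if record_dept == dept:
            result.append(name)
    result.sort()
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", instructors)


def names_in_dept_declarative(connection, dept):
    """Declarative style: state what is wanted; the system chooses how to find it."""
    cursor = connection.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name", (dept,)
    )
    return [row[0] for row in cursor]
```

Both functions return the same names, but only the first commits the program to a particular access strategy. If an index were later added on `dept_name`, the SQL version could benefit without any change to the application code.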
The 1980s also saw much research on parallel and distributed databases, as well as initial work on
object-oriented databases.
• 1990s: The SQL language was designed primarily for decision support applications, which are
query-intensive, yet the mainstay of databases in the 1980s was transaction-processing
applications, which are update-intensive.
In the early 1990s, decision support and querying re-emerged as a major application area for
databases. Tools for analyzing large amounts of data saw a large growth in usage. Many database
vendors introduced parallel database products in this period. Database vendors also began to add
object-relational support to their databases.
The major event of the 1990s was the explosive growth of the World Wide Web. Databases were
deployed much more extensively than ever before. Database systems now had to support very high
transaction-processing rates, as well as very high reliability and 24 × 7 availability (availability 24
hours a day, 7 days a week, meaning no downtime for scheduled maintenance activities). Database
systems also had to support web interfaces to data.
• 2000s: The types of data stored in database systems evolved rapidly during this period. Semi-
structured data became increasingly important. XML emerged as a data-exchange standard. JSON,
a more compact data-exchange format well suited for storing objects from JavaScript or other
programming languages, subsequently grew in importance. Increasingly, such data were
stored in relational database systems as support for the XML and JSON formats was added to the
major commercial systems. Spatial data (that is, data that include geographic information) saw
widespread use in navigation systems and advanced applications. Database systems added support
for such data.
Open-source database systems, notably PostgreSQL and MySQL, saw increased use. “Auto-admin”
features were added to database systems in order to allow automatic reconfiguration to
adapt to changing workloads. This helped reduce the human workload in administering a database.
Social network platforms grew at a rapid pace, creating a need to manage data about connections
between people and their posted data that did not fit well into a tabular row-and-column format.
This led to the development of graph databases.
In the latter part of the decade, the use of data analytics and data mining in enterprises became
ubiquitous. Database systems were developed specifically to serve this market. These systems
featured physical data organizations suitable for analytic processing, such as “column-stores,” in
which tables are stored by column rather than the traditional row-oriented storage of the major
commercial database systems.
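The difference between the two layouts can be sketched with ordinary in-memory structures. This is only an illustration of the idea, not how any particular system stores data on disk; the table contents and the names `row_store`, `column_store`, and `to_columns` are assumptions made for the example:

```python
# Row-oriented layout: each record's fields are kept together.
row_store = [
    (1, "Katz", 75000),
    (2, "Wu", 90000),
    (3, "Brandt", 92000),
]


def to_columns(rows, names):
    """Re-organize the same table so that each column's values are kept together."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


column_store = to_columns(row_store, ["id", "name", "salary"])

# An analytic query such as "total salary" must walk every full record
# in the row layout ...
total_from_rows = sum(row[2] for row in row_store)

# ... but touches only the one relevant column in the column layout.
total_from_columns = sum(column_store["salary"])
```

Analytic queries typically read a few columns of many rows, which is why keeping each column contiguous suits them; fetching or updating one whole record, by contrast, is simpler in the row layout.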
The huge volumes of data, as well as the fact that much of the data used for analytics was textual
or semi-structured, led to the development of programming frameworks, such as map-reduce, to
facilitate application programmers’ use of parallelism in analyzing data. In time, support for these
features migrated into traditional database systems. Even in the late 2010s, debate continued in the
database research community over the relative merits of a single database system serving both
traditional transaction processing applications and the newer data-analysis applications versus
maintaining separate systems for these roles.
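The map-reduce style of programming mentioned above can be sketched in a few lines. This single-process word count only mimics the structure of such a framework (map, group by key, reduce); a real framework would run the map and reduce steps in parallel across many machines. All function names here are illustrative:

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group the emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all the values emitted for one key."""
    return word, sum(counts)


documents = ["data on tape", "data on disk"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

The programmer writes only the map and reduce functions. Because each document is mapped independently and each key is reduced independently, a framework is free to distribute both steps.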
The variety of new data-intensive applications and the need for rapid development, particularly by
startup firms, led to “NoSQL” systems that provide a lightweight form of data management. The
name was derived from those systems’ lack of support for the ubiquitous database query language
SQL, though the name is now often viewed as meaning “not only SQL.” The lack of a high-level
query language based on the relational model gave programmers greater flexibility to work with
new types of data. The lack of traditional database systems’ support for strict data consistency
provided more flexibility in an application’s use of distributed data stores. The NoSQL model of
“eventual consistency” allowed for distributed copies of data to be inconsistent as long as they would
eventually converge in the absence of further updates.
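One simple way to obtain such convergence is a "last writer wins" rule, sketched below. This is an assumed reconciliation strategy chosen for illustration, not the mechanism of any specific NoSQL system; the replica layout, the `merge` function, and the integer timestamps are all hypothetical:

```python
from typing import Dict, Tuple

Replica = Dict[str, Tuple[int, str]]  # key -> (timestamp, value); hypothetical layout


def merge(local: Replica, remote: Replica) -> Replica:
    """Last-writer-wins merge: for each key, keep the version with the newer timestamp.

    Assumes timestamps are unique per key; a real system would need a tie-breaker.
    """
    merged = dict(local)
    for key, versioned in remote.items():
        if key not in merged or versioned[0] > merged[key][0]:
            merged[key] = versioned
    return merged


# Two copies of the same data start out identical.
replica_a: Replica = {"status": (1, "offline")}
replica_b: Replica = {"status": (1, "offline")}

# Each copy accepts a different update, so the copies temporarily disagree.
replica_a["status"] = (2, "online")
replica_b["city"] = (3, "Pune")
diverged = replica_a != replica_b

# With no further updates, exchanging state in both directions makes them agree.
replica_a, replica_b = merge(replica_a, replica_b), merge(replica_b, replica_a)
```

A read sent to either replica before the exchange may return stale data; that window of inconsistency is what the application tolerates in return for availability.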
• 2010s: The limitations of NoSQL systems, such as lack of support for consistency, and lack of
support for declarative querying, were found acceptable by many applications (e.g., social
networks), in return for the benefits they provided such as scalability and availability. However,
by the early 2010s it was clear that the limitations made life significantly more complicated for
programmers and database administrators. As a result, these systems evolved to provide features
to support stricter notions of consistency, while continuing to support high scalability and
availability. Additionally, these systems increasingly support higher levels of abstraction so that
programmers need not reimplement features that are standard in a traditional database
system.
Enterprises are increasingly outsourcing the storage and management of their data. Rather than
maintaining in-house systems and expertise, enterprises may store their data in “cloud” services
that host data for various clients in multiple, widely distributed server farms. Data are delivered to
users via web-based services.
Other enterprises are outsourcing not only the storage of their data but also whole applications. In
such cases, termed “software as a service,” the vendor not only stores the data for an enterprise but
also runs (and maintains) the application software. These trends result in significant savings in
costs, but they create new issues not only in responsibility for security breaches, but also in data
ownership, particularly in cases where a government requests access to data.
The huge influence of data and data analytics in daily life has made the management of data a
frequent aspect of the news. There is an unresolved tradeoff between an individual’s right of
privacy and society’s need to know. Various national governments have put regulations on privacy
in place. High-profile security breaches have created a public awareness of the challenges in
cybersecurity and the risks of storing data.