Unit 1
STRUCTURES; ARRAY
Structure
1.0 Introduction
1.1 Objectives
1.2 Program Analysis
1.3 One Dimensional Arrays
1.4 Array Declaration
1.5 Storage of Array in Main Memory
1.6 Sparse Arrays
1.7 Summary
Model Answers
Further Readings
1.0 INTRODUCTION
This unit is introductory and gives you an understanding of what a data structure is. Knowledge of
data structures is required of people who design and develop computer programs of any kind: systems
software or applications software. As you have learnt in earlier blocks, data are represented by data values
held temporarily within a program's data area or recorded permanently on a file. Often the different data
values are related to each other. To enable programs to make use of these relationships, these data values
must be in an organised form. The organised collection of data is called a data structure. The programs
have to follow certain rules to access and process the structured data. We may, therefore, say data are
represented that
If you recall, this is an extension of the concept of data type. We had defined a data type as
Further, we had seen that simple data types can be used to build new scalar data types, for example subrange
and enumerated types in Pascal. Similarly, there are standard data structures which are often used in their
own right and can form the basis for complex data structures. One such basic data structure, the Array, is
also discussed in this unit. Arrays are basic building blocks for more complex data structures. Designing and
using data structures is an important programming skill. In this and in subsequent units, we are going to
discuss various data structures. We may classify these data structures as linear and non-linear data
structures. However, this is not the only way to classify data structures. In a linear data structure the data
items are arranged in a linear sequence, as in an array. In a non-linear data structure, the data items are not in sequence.
An example of a non-linear data structure is a tree. Data structures may also be classified as homogenous
and non-homogenous data structures. An Array is a homogenous structure in which all elements are of the
same type. In non-homogenous structures the elements may or may not be of the same type. Records are a
common example of non-homogenous data structures. Another way of classifying data structures is as static
or dynamic data structures. Static structures are ones whose sizes and associated memory
locations are fixed at compile time. Dynamic structures are ones which expand or shrink as required during
program execution, and their associated memory locations change.
In this unit we first discuss the performance of an algorithm. You may find it very relevant as
you read the subsequent blocks and develop programs. Then we introduce the array data structure and
review array declarations in Pascal and C. A section is devoted to a discussion of how single and
multi-dimensional arrays are mapped to storage. Finally, a discussion of sparse arrays closes the unit.
1.1 OBJECTIVES
At the end of this unit, you will be able to
- analyze the trade-offs of the data handling needs of a particular problem situation
(iii) checking that modifications can be made easily, without introducing new errors.
(iv) we may also analyze program execution time and the storage complexity associated with
it, i.e. how fast the program runs and how much storage it requires.
Another related question can be: how big must its data structure be and how many steps
will be required to execute its algorithm?
Since this course concerns data representation and writing programs, we shall analyse programs in terms of
storage and time complexity.
Performance Issues
In considering the performance of a program, we are primarily interested in
Generally we need to analyze efficiencies, when we need to compare alternative algorithms and data
representations for the same problem or when we deal with very large programs.
We often find that we can trade time efficiency for space efficiency, or vice versa. To estimate either
time or space efficiency, we need to have some estimate of the problem size. Let us assume that some
number N represents the size of the problem. N can reflect one or more features
of the problem; for instance, N might be the number of input data values, or the number of elements of
an array, etc.
Suppose that we are given two algorithms for finding the largest value in a list of N numbers. It is also
given that the second algorithm executes twice the number of instructions executed by the first algorithm for
each value of N.
Let the first algorithm execute S instructions for each of the N numbers. Then the second algorithm would execute 2S
instructions for each. If each instruction takes 1 unit of time, say 1 millisecond, then for N = 10, 100, 1000 and 10000
the number of instructions and the estimated execution times (taking S = 1) may be as given below:
           Algorithm I                       Algorithm II
N          Number of      Estimated          Number of      Estimated
           instructions   execution time     instructions   execution time
10         10S            10 msec            20S            20 msec
100        100S           100 msec           200S           200 msec
1000       1000S          1000 msec          2000S          2000 msec
10000      10000S         10000 msec         20000S         20000 msec
You may notice that for larger values of N, the difference between the execution times of the two algorithms is
appreciable, and one may clearly say that Algorithm II is slower than Algorithm I. Also, the advantage of
Algorithm I becomes more noticeable as the problem size gets larger. This kind of performance improvement is
termed an order of improvement. Two algorithms may also compare with each other by a constant factor, i.e.
the improvement from one to the other does not change as the problem size gets larger. For example, one of
them can be two times faster than the other, and it always remains two times faster regardless of the problem
size.
Associated with orders of improvement are several order classifications of algorithms, as given below.
We shall be discussing these again in Block 6, when we shall introduce you to various sorting and
searching algorithms.
- If the problem size doubles and the algorithm takes one more step, we relate the number of steps
to the problem size by
O(log2 N)
- If the problem size doubles and the algorithm takes twice as many steps, the number of steps is
related to problem size by
O(N)
- If the problem size doubles and the algorithm takes somewhat more than twice as many steps, the
number of steps may be related to problem size by
O(N log2 N)
You may notice that the growth rate of the complexity is more than double the growth rate of the
problem size, but it is not a lot faster.
- If the number of steps used is proportional to the square of the problem size, we say the complexity
is of the order of N^2 or O(N^2).
- If the algorithm is independent of problem size, the complexity is constant in time and space, i.e.
O(1).
The notation being used, i.e. a capital O(), is called Big-Oh notation.
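To get a feel for these growth rates, the following C sketch counts steps for three simple loops. The function names (halving_steps, linear_steps, quadratic_steps, print_growth) are our own and are not part of any library; the loops only stand in for real algorithms of the corresponding order.

```c
#include <stdio.h>

/* Steps taken when the problem is halved each time: grows like log2 N. */
unsigned long halving_steps(unsigned long n)
{
    unsigned long steps = 0;
    while (n > 1) {
        n /= 2;
        steps++;
    }
    return steps;
}

/* Steps taken when every item is visited once: grows like N. */
unsigned long linear_steps(unsigned long n)
{
    unsigned long steps = 0;
    for (unsigned long i = 0; i < n; i++)
        steps++;
    return steps;
}

/* Steps taken when every pair of items is visited: grows like N^2. */
unsigned long quadratic_steps(unsigned long n)
{
    unsigned long steps = 0;
    for (unsigned long i = 0; i < n; i++)
        for (unsigned long j = 0; j < n; j++)
            steps++;
    return steps;
}

/* Print the three step counts for one problem size. */
void print_growth(unsigned long n)
{
    printf("N=%lu  halving: %lu  linear: %lu  quadratic: %lu\n",
           n, halving_steps(n), linear_steps(n), quadratic_steps(n));
}
```

Calling print_growth(8) and then print_growth(16) shows the behaviour described in the list: when N doubles, the first count grows by one step, the second doubles and the third becomes four times as large.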
1.3 ARRAYS
In applications where we have a small number of items to handle, we tend to specify separate variable
names for each item. When we have to keep track of more pieces of data, we need to organise the data
such that we can use one name to refer to several items. Let us see this through a simple example. Consider the
following problem:
The problem requires all the numbers as they are read. Further, we cannot print anything until all 25
numbers are read; therefore, we need to store all twenty-five numbers. Reading 25 numbers into 25
different variables would be quite cumbersome, and so would be writing these numbers in reverse order. It is
much simpler to call the numbers NUM1, NUM2, NUM3, ..., NUM25.
Each number is a NUM, and the numbers are distinguished by subscripts. Also, they are read in succession. Thus
we can abbreviate this sequence as NUMi for i = 0, 1, 2, ..., 24 when the subscripts are counted from zero, as in C. Such a subscripted variable is called an
Array. More formally, an array is a finite ordered set of homogenous elements which are stored in adjacent
cells in memory. Arrays are usually used when a program includes a list of recurring elements.
In C, subscripts are placed in square brackets [ ]. Repetition over a sequence of values of i may also be
implemented using a loop construct. For example, the following statement reads all 25 values:
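A minimal sketch of such a loop in C is given here. It assumes that the problem is to read 25 integers and print them in reverse order, as the discussion suggests; the names COUNT, read_numbers and print_reversed are illustrative only.

```c
#include <stdio.h>

#define COUNT 25   /* number of values in the problem */

/* Read up to count integers from in into num[0..count-1].
   Returns how many values were actually read. */
int read_numbers(FILE *in, int num[], int count)
{
    int i;
    for (i = 0; i < count; i++) {
        if (fscanf(in, "%d", &num[i]) != 1)
            break;          /* stop early on end of input or bad data */
    }
    return i;
}

/* Print num[count-1] down to num[0], one value per line. */
void print_reversed(FILE *out, const int num[], int count)
{
    for (int i = count - 1; i >= 0; i--)
        fprintf(out, "%d\n", num[i]);
}
```

With an array declared as int num[COUNT], the call read_numbers(stdin, num, COUNT) fills num[0] to num[24], and print_reversed(stdout, num, COUNT) then writes them from num[24] down to num[0].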
The simplest form of an array is a one-dimensional array or vector. As stated earlier, the various elements
of an array are distinguished by giving each piece of data a separate index or subscript. The subscript of an
element designates its position in the array's ordering. An array named A which consists of N elements can be
depicted as shown in figure 1.
A(0)   A(1)   A(2)   ...   A(N-1)
Arrays can be multi-dimensional. Any array defined to have more than one dimension is considered to be a
multi-dimensional array. An array can be 2-dimensional, 3-dimensional, 4-dimensional, or N-dimensional,
although arrays rarely exceed three dimensions. Two-dimensional arrays, sometimes called matrices, are
quite common. The best way to think about a two-dimensional array is to visualize a table of columns and
rows: the first dimension in the array refers to the rows, and the second dimension refers to the columns.
A collection of data about the grades of students in a class in four different exams can be represented
using a 2-dimensional array. If we have 10 students and each is given grades in 4 exams, we can depict it as
in the table (Figure 2).
Figure 2
Each cell in this table contains a grade value for the student number (given by the corresponding row
number) and exam number (given by the corresponding column number). We may map it on to an array A of
order 10x4. A[I][J] represents an element of A, where I runs from 0 to 9 and J runs from 0 to 3. A[3][3]
will have the grade value of the 4th student in the fourth exam, A[8][1] will have the grade value of the 9th student in the
second exam, and so on.
By convention the first subscript of a 2-dimensional array refers to a row of the array, while the second
subscript refers to a column of the array. Because subscripts start from 0, the row and column numbers are one more than the
subscripts that represent them.
In general an array of the order M X N (read as M by N) consists of M rows, N columns and MN elements.
It may be depicted as
        0    1    ...    N-1
0
1
.
.
M-1
Figure 3
To assign a value in a multi-dimensional array, specify all the dimensions, as shown in following
statement:
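In C, for instance, such an assignment might look like the sketch below. It assumes the 10x4 grade table described earlier; the names STUDENTS, EXAMS, set_grade and student_average, and any grade values used with them, are hypothetical.

```c
#include <assert.h>

#define STUDENTS 10   /* rows: one per student */
#define EXAMS     4   /* columns: one per exam */

/* Record a grade for one student in one exam.
   Both subscripts are needed to name a single element. */
void set_grade(int A[STUDENTS][EXAMS], int student, int exam, int grade)
{
    assert(student >= 0 && student < STUDENTS);
    assert(exam >= 0 && exam < EXAMS);
    A[student][exam] = grade;   /* e.g. A[8][1] = 75; */
}

/* Average grade of one student: fix the row, vary the column. */
double student_average(int A[STUDENTS][EXAMS], int student)
{
    int sum = 0;
    for (int j = 0; j < EXAMS; j++)
        sum += A[student][j];
    return (double)sum / EXAMS;
}
```

The call set_grade(A, 8, 1, 75) stores 75 as the grade of the 9th student in the second exam, which is the same as writing A[8][1] = 75 directly.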
Let us now discuss the syntax and semantics of an array. We can divide our discussion in three parts:
- Array declaration
In the first declaration A is the array name; the elements of A can hold integer data and the number of elements
is 24, i.e. subscripts range from 0 to 23.
In the next declaration B is the array name; the data type of its elements is real and it is a 2-dimensional
array with subscripts ranging from 0 to 99 and 0 to 24.
Some languages, C and C++ for example, require arrays to begin with index zero. Failing to remember that
the zero element is the first item in the array, and therefore that the element at index
5 is the sixth, not the fifth, is a frequent cause of programming bugs.
Where the language allows it, as Pascal does, it makes more sense to start the array at a value that
corresponds to the context of your data. You may notice that a subscript need not always be positive; it
can be zero or negative.
An array declaration tells the computer two major pieces of information about an array. First, the range of
subscripts allows the computer to determine how many memory locations must be allocated. Second, the
array type tells the computer how much space is required to hold each value. Let us consider the following
declarations:
int A[10];
float B[10];
The first declaration tells the computer to allocate enough space for the variable A to store 10 integers. The
second declaration tells the computer to allocate enough space for the variable B to store 10 reals. Since a
real number may take more space than an integer, depending on the compiler, the storage allocated need not be the same. Array declarations
have already been discussed in Block 1, Unit 3 and Block 3, Unit 1 respectively.
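One way to see what the compiler actually allocates for these two declarations is the sizeof operator. The sketch below is only illustrative; the helper function names are our own, and the sizes reported depend on the compiler and machine.

```c
#include <stddef.h>

int   A[10];   /* the first declaration from the text */
float B[10];   /* the second declaration from the text */

/* Number of bytes the compiler set aside for each array. */
size_t bytes_for_A(void) { return sizeof A; }
size_t bytes_for_B(void) { return sizeof B; }

/* Number of elements, recovered as total size / size of one element. */
size_t elements_in_A(void) { return sizeof A / sizeof A[0]; }
```

On many present-day compilers an int and a float both occupy 4 bytes, so the two arrays may turn out to be the same size; on others they differ.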
Operations on Arrays
The array is a homogenous structure, i.e. the elements of an array are of the same type. The following set of
operations is defined for this structure.
These operations apply to arrays of any dimension. Some of these functions have already been discussed in the
units on arrays in C, i.e. Block 1, Unit 3 and Block 2, Unit ... The operation of searching for a value in an
array looks for a match of the value with the elements in the array. If found, it returns the index of an
element that matches the value. Otherwise it returns an appropriate message. Sorting means rearranging the
array elements in a particular order.
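As an illustration of the search operation, here is a minimal linear search in C. It is a sketch rather than the definition used in later units: the function name search is our own, and it signals "not found" by returning -1 instead of printing a message.

```c
/* Linear search: return the index of the first element equal to
   value, or -1 if no element matches. */
int search(const int a[], int n, int value)
{
    for (int i = 0; i < n; i++) {
        if (a[i] == value)
            return i;
    }
    return -1;
}
```

Since every element may have to be examined, the number of steps grows in proportion to the number of elements in the array.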
16 17 18 19 20 21 22
Memory Cells
A[0] A[1] A[2] A[3] A[4] .. .. ..
Figure 4
Therefore, it is necessary to know the starting address of the space allocated to the array and the size of
each element, which is the same for all the elements of an array. We may call the starting address the base
address and denote it by B, and denote the size of each element by S. Then the location of the Ith element would be
B + I*S        (1)
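Expression (1) can be checked against what a C compiler does. In the sketch below, which assumes an array of int, the function element_address (a name chosen for illustration) computes B + I*S by hand in bytes.

```c
#include <stddef.h>

/* Address of a[i] computed by hand as B + I*S, where B is the base
   address and S the element size in bytes. */
const void *element_address(const int a[], size_t i)
{
    const char *base = (const char *)a;   /* B, counted in bytes */
    return base + i * sizeof a[0];        /* B + I*S */
}
```

For any valid subscript i, element_address(a, i) gives the same address as &a[i].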
Let us now consider storage mappings for multi-dimensional arrays. As we saw in the previous section,
in a 2-dimensional array we think of the data as being arranged in rows and columns. However, a machine's
memory is arranged as a row of memory cells. Thus the rectangular structure of a 2-dimensional array must
be simulated. We first calculate the amount of storage area needed and allocate a block of contiguous
memory cells of that size. One way to store the data in the cells is row by row. That is, we first store the
first row of the array, then the second row, then the next, and so on. For example, the array
defined by A, which logically appears as given in Figure 5, appears physically as given in Figure 6. Such a
storage scheme is called Row Major Order.
The other alternative is to store the array column by column. It is called Column Major Order. The array of
Figure 5 would then appear physically as given in Figure 7.
Question 2: Show how the array
    1 3 7
    5 2 8
    9 7 1
would appear in memory when stored in
(i) row major order
(ii) column major order
In all implementations of C the storage allocation scheme used is the Row Major Order.
Let us now see how we calculate the address of an element of a 2-dimensional array which is mapped in
Row Major Order. Consider a 4x6 array A[4][6]. Take B as the array's base address and S as the size of
each element of the array.
To locate element A[I][J] we must skip I rows (rows 0 to I-1), each having 6 elements of length S, and
J elements of row I (elements 0 to J-1), each of length S. Therefore, the address of element A[I][J] would be
B + I*6*S + J*S
We may generalize this for an array
A[U0][U1]
where (U0-1) and (U1-1) are the upper bounds of the two subscript ranges. The address of A[I][J] would then be
B + (I*U1*S) + (J*S)
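Since C stores 2-dimensional arrays in Row Major Order, the calculation for the 4x6 array can be compared with the addresses the compiler uses: skip I whole rows of 6 elements, then J elements of row I. This is a sketch under that assumption; ROWS, COLS, row_major_offset and matches_c_layout are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 6

/* Byte offset of A[i][j] from the base address in Row Major Order:
   skip i whole rows of COLS elements, then j elements of row i. */
size_t row_major_offset(size_t i, size_t j, size_t elem_size)
{
    assert(i < ROWS && j < COLS);
    return i * COLS * elem_size + j * elem_size;
}

/* Returns 1 if the compiler placed A[i][j] at base + offset, else 0. */
int matches_c_layout(int A[ROWS][COLS], size_t i, size_t j)
{
    const char *base = (const char *)A;
    return (const char *)&A[i][j] == base + row_major_offset(i, j, sizeof(int));
}
```

For an element size of 4 bytes, row_major_offset(2, 3, 4) gives 2*6*4 + 3*4 = 60, so A[2][3] lies 60 bytes beyond the base address.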
The row major order varies the subscripts in right to left order. For example the elements of a 2-
dimensional array A[U1] [U2] would be stored in following order:
A[0][0]
A[0][1]
...
A[0][U2-1]
A[1][0]
A[1][1]
A[1][2]
...
A[1][U2-1]
...
A[U1-1][U2-1]
We may generalize it for an N-dimensional array A[U0] [U1] .... [Un-1]. The elements would be stored in
following order:
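One way to express the N-dimensional case is to compute, for any element, its position in the stored sequence. The following C sketch does this by taking the subscripts from left to right, so that the rightmost subscript varies fastest. The function name row_major_index and its parameters dims and idx are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Position (counted in elements from 0) of A[idx[0]][idx[1]]...[idx[n-1]]
   in Row Major Order, for an array whose dimension sizes are dims[0..n-1].
   The rightmost subscript varies fastest. */
size_t row_major_index(const size_t dims[], const size_t idx[], size_t n)
{
    size_t offset = 0;
    for (size_t k = 0; k < n; k++) {
        assert(idx[k] < dims[k]);          /* subscript must be in range */
        offset = offset * dims[k] + idx[k];
    }
    return offset;
}
```

For an array A[2][3][4], the element A[1][2][3] is found at position 23, i.e. it is the last of the 24 elements, while A[0][0][1] is at position 1 and A[0][1][0] at position 4.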
Let us see how the above expressions work out for column major order.
We once again consider a 4x6 array A[4][6]. Also take B as the base address and S as the size of each element.
Then the address of A[I][J] would be
B + J*4*S + I*S
To reach A[I][J] we skip J columns, each of length 4*S, and I elements, each of length S. We
may further generalize this for an array A[U1][U2].
Following the same logic, the address of A[I][J] would be given as
B + (J*U1*S) + (I*S)
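The column major formula can be sketched in C as follows. Because C itself uses Row Major Order, the second function copies a matrix into a separate one-dimensional array laid out column by column. Both function names are hypothetical, and the element size is passed explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of A[i][j] in Column Major Order for an array with
   `rows` rows: skip j whole columns of `rows` elements, then i elements. */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t elem_size)
{
    assert(i < rows);
    return j * rows * elem_size + i * elem_size;
}

/* Copy a rows x cols matrix, given in C's row major layout, into `out`
   in column major order. An element size of 1 turns the byte offset
   into a plain element position. */
void to_col_major(const int *in, int *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            out[col_major_offset(i, j, rows, 1)] = in[i * cols + j];
}
```

For the 2x3 matrix with rows 1 2 3 and 4 5 6, to_col_major produces the sequence 1 4 2 5 3 6.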
The column major order varies the subscripts in left to right order. For example the elements of a 2-
dimensional array A[4] [3] would be stored in the sequence as given below:
A[0][0]
A[1][0]
A[2][0]
A[3][0]
A[0][1]
A[1][1]
A[2][1]
A[3][1]
A[0][2]
A[1][2]
A[2][2]
A[3][2]
Compare this sequence with the one you obtained in Check Your Progress.
Question 2: How would an m x n array A[m][n] be stored in Column Major Order?
As we had done for Row Major Order, we may generate the sequence of N-dimensional array
0 0 0 0 0 1 0
0 0 0 0 1 0 0
0 0 0 0 0 0 0
0 0 0 0 3 0 0
0 2 0 0 0 0 0
0 0 0 0 4 0 0
0 0 0 0 2 0 0
If we store such arrays using the techniques presented in the previous section, there would be much wasted
space.
Let us consider 2 alternative representations that will store explicitly only the non-zero elements.
1. Vector representation
Each element of a 2-dimensional array is uniquely characterized by its row and column position. We may,
therefore, store a sparse array with n non-zero elements in another array of the form
A[n+1][3]
The sparse array given in Figure 8 may be stored in the array A[7][3] as shown in Figure 9.
The elements A[0][0] and A[0][1] contain the number of rows and columns of the sparse array. A[0][2]
contains the number of non-zero elements of the sparse array. The first and second elements of each of the remaining rows
store the row and column numbers of a non-zero term, and the third element stores the value of that non-zero
term. In other words, each non-zero element in a 2-dimensional sparse array is represented as a triplet
with the format (row subscript, column subscript, value).
If the sparse array were one-dimensional, each non-zero element would be represented by a pair. In general,
for an N-dimensional sparse array, non-zero elements are represented by an entry with N+1 values.
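The construction of the triplet form for a 2-dimensional sparse array can be sketched in C as below. The sketch assumes a 7x7 sparse array of integers, as in the example; the function name to_triplets and the constant DIM are our own choices.

```c
#define DIM 7   /* rows and columns of the example sparse array */

/* Build the triplet form of the sparse array m.
   t[0] holds (rows, columns, number of non-zero elements); each later
   row of t holds (row subscript, column subscript, value).
   t must have one more row than there are non-zero elements.
   Returns the number of non-zero elements stored. */
int to_triplets(int m[DIM][DIM], int t[][3])
{
    int count = 0;
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            if (m[i][j] != 0) {
                count++;
                t[count][0] = i;
                t[count][1] = j;
                t[count][2] = m[i][j];
            }
        }
    }
    t[0][0] = DIM;
    t[0][1] = DIM;
    t[0][2] = count;
    return count;
}
```

For a 7x7 array with 6 non-zero elements, the result occupies 7 x 3 = 21 cells instead of 49. The saving grows as the array becomes larger and sparser.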
1.7 SUMMARY
In this unit, we formally introduced the concept of a data structure, followed by the most common and simple
data structure, the Array.
Arrays help to solve problems that require keeping track of many pieces of data. To use an array structure, the
name of the array, the type of its elements and the type of its subscripts must be declared. The declaration
tells the computer to allocate the appropriate memory space.
We have also discussed the storage of arrays in main memory in row major order and in column major
order. In the last section, we learnt about a special kind of array called the sparse array. In a sparse array, a
large proportion of the elements are zero, and those which are non-zero are randomly distributed.
Many applications involving sparse arrays use arrays which are too large to be represented in the standard
way. Some way of compacting the information contained in an array is needed. One of the methods to store
the non-zero elements is presented in this unit.
Arrays are used in programming languages for holding groups of elements all of the same kind. Vectors,
matrices, chess boards, networks, polynomials, etc. can be represented as arrays. The space requirement of
an array can be large. Good programming practice suggests that arrays should not be used unless there is a
good reason for their use.
MODEL ANSWERS
Check Your Progress 1