Abstract Data Types
Data Object
set or collection of instances
integer = {0, +1, -1, +2, -2, +3, -3, …}
daysOfWeek = {S,M,T,W,Th,F,Sa}
Data Object
instances may or may not be related
myDataObject = {apple, chair, 2, 5.2, red, green, Jack}
Data Structure
Data object +
relationships that exist among instances
and elements that comprise an instance
Among instances of integer
369 < 370
280 + 4 = 284
Data Structure
Among elements that comprise an instance
369
3 is more significant than 6
3 is immediately to the left of 6
9 is immediately to the right of 6
Data Structure
The relationships are usually specified by
specifying operations on one or more
instances.
add, subtract, predecessor, multiply
Linear (or Ordered) Lists
instances are of the form
(e_0, e_1, e_2, …, e_{n-1})
where e_i denotes a list element
n >= 0 is finite
list size is n
Linear Lists
L = (e_0, e_1, e_2, e_3, …, e_{n-1})
relationships
e_0 is the zeroth (or front) element
e_{n-1} is the last element
e_i immediately precedes e_{i+1}
Linear List Examples/Instances
Students in MyClass =
(Jack, Jill, Abe, Henry, Mary, …, Judy)
Exams in MyClass =
(exam1, exam2, exam3)
Days of Week = (S, M, T, W, Th, F, Sa)
Months = (Jan, Feb, Mar, Apr, …, Nov, Dec)
Linear List Operations—Length()
determine number of elements in list
L = (a,b,c,d,e)
length = 5
Linear List Operations—
Retrieve(theIndex)
retrieve element with given index
L = (a,b,c,d,e)
Retrieve(0) = a
Retrieve(2) = c
Retrieve(4) = e
Retrieve(-1) = error
Retrieve(9) = error
Linear List Operations—
IndexOf(theElement)
determine the index of an element
L = (a,b,d,b,a)
IndexOf(d) = 2
IndexOf(a) = 0
IndexOf(z) = -1
Linear List Operations—
Delete(theIndex)
delete and return element with given index
L = (a,b,c,d,e,f,g)
Delete(2) returns c
and L becomes (a,b,d,e,f,g)
index of d,e,f, and g decrease by 1
Linear List Operations—
Delete(theIndex)
delete and return element with given index
L = (a,b,c,d,e,f,g)
Delete(-1) => error
Delete(20) => error
Linear List Operations—
Insert(theIndex, theElement)
insert an element so that the new element
has a specified index
L = (a,b,c,d,e,f,g)
Insert(0,h) => L = (h,a,b,c,d,e,f,g)
index of a,b,c,d,e,f, and g increase by 1
Linear List Operations—
Insert(theIndex, theElement)
L = (a,b,c,d,e,f,g)
Insert(2,h) => L = (a,b,h,c,d,e,f,g)
index of c,d,e,f, and g increase by 1
Insert(10,h) => error
Insert(-6,h) => error
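A minimal sketch of how Insert and Delete shift the indexes of the later elements, assuming the list is kept in a fixed-capacity array. The struct CharList, its capacity of 16, the use of exceptions to signal the error cases, and allowing Insert at index = list size (an append) are all assumptions made for illustration.

```cpp
#include <algorithm>  // std::copy, std::copy_backward
#include <stdexcept>  // std::out_of_range, std::length_error

// Hypothetical fixed-capacity list of chars, used only for this sketch.
struct CharList {
    static constexpr int kCapacity = 16;
    char element[kCapacity];
    int size = 0;
};

// Insert theElement so that it ends up at theIndex.
// Elements at theIndex..size-1 move one position to the right.
void Insert(CharList& list, int theIndex, char theElement) {
    if (theIndex < 0 || theIndex > list.size)
        throw std::out_of_range("Insert: bad index");
    if (list.size == CharList::kCapacity)
        throw std::length_error("Insert: list is full");
    std::copy_backward(list.element + theIndex, list.element + list.size,
                       list.element + list.size + 1);
    list.element[theIndex] = theElement;
    ++list.size;
}

// Delete and return the element at theIndex.
// Elements at theIndex+1..size-1 move one position to the left.
char Delete(CharList& list, int theIndex) {
    if (theIndex < 0 || theIndex >= list.size)
        throw std::out_of_range("Delete: bad index");
    char removed = list.element[theIndex];
    std::copy(list.element + theIndex + 1, list.element + list.size,
              list.element + theIndex);
    --list.size;
    return removed;
}
```

Both operations move every element to the right of theIndex, so each takes time proportional to the number of elements shifted.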
Data Structure Specification
Language independent
Abstract Data Type
C++
Class
Linear List Abstract Data Type
AbstractDataType LinearList
{
instances
ordered finite collections of zero or more elements
operations
IsEmpty(): return true iff the list is empty, false otherwise
Length(): return the list size (i.e., number of elements in the list)
Retrieve(index): return the indexth element of the list
IndexOf(x): return the index of the first occurrence of x in
the list, return -1 if x is not in the list
Delete(index): remove and return the indexth element,
elements with higher index have their index reduced by 1
Insert(index, x): insert x as the indexth element, elements
whose index was >= index have their index increased by 1
}
Linear List As A C++ Class
To specify a general linear list as a C++
class, we need to use a template class.
We shall study C++ templates later.
So, for now we restrict ourselves to linear
lists whose elements are integers.
Linear List As A C++ Class
class LinearListOfIntegers
{
public:
   bool IsEmpty() const;
   int Length() const;
   int Retrieve(int index) const;
   int IndexOf(int theElement) const;
   int Delete(int index);
   void Insert(int index, int theElement);
};
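One possible way to implement this interface is sketched below. It assumes the elements are kept in a std::vector<int> and that a bad index is reported by throwing std::out_of_range; the private helper CheckIndex and the member name element_ are hypothetical and not part of the class declaration above.

```cpp
#include <algorithm>  // std::find
#include <stdexcept>  // std::out_of_range
#include <vector>

class LinearListOfIntegers {
public:
    bool IsEmpty() const { return element_.empty(); }
    int Length() const { return static_cast<int>(element_.size()); }

    int Retrieve(int index) const {
        CheckIndex(index, Length() - 1);
        return element_[index];
    }

    // Index of the first occurrence of theElement, or -1 if it is absent.
    int IndexOf(int theElement) const {
        auto it = std::find(element_.begin(), element_.end(), theElement);
        return it == element_.end() ? -1
                                    : static_cast<int>(it - element_.begin());
    }

    // Remove and return the element at index; later elements move down by 1.
    int Delete(int index) {
        CheckIndex(index, Length() - 1);
        int removed = element_[index];
        element_.erase(element_.begin() + index);
        return removed;
    }

    // Insert theElement at index; elements from index onward move up by 1.
    void Insert(int index, int theElement) {
        CheckIndex(index, Length());  // assumption: index == Length() appends
        element_.insert(element_.begin() + index, theElement);
    }

private:
    // Hypothetical helper: valid indexes are 0..maxIndex.
    static void CheckIndex(int index, int maxIndex) {
        if (index < 0 || index > maxIndex)
            throw std::out_of_range("index out of range");
    }

    std::vector<int> element_;  // assumed storage for the list elements
};
```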
Data Structures In Text
Generally specified as a C++ (template) class.
Arrays
1D Array Representation In C++
Memory
a b c d
start
1-dimensional array x = [a, b, c, d]
map into contiguous memory locations
• location(x[i]) = start + i
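The formula above counts locations in elements. As a small illustration, the sketch below measures the same distance in bytes for an int array; the function name ByteOffset is hypothetical.

```cpp
#include <cstddef>  // std::size_t, std::ptrdiff_t

// Distance in bytes from the start of x to x[i].
// Because the elements are contiguous, this equals i * sizeof(int).
std::ptrdiff_t ByteOffset(const int* x, std::size_t i) {
    const char* start = reinterpret_cast<const char*>(x);
    const char* location = reinterpret_cast<const char*>(&x[i]);
    return location - start;
}
```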
Space Overhead
Memory
a b c d
start
space overhead = 4 bytes for start
(excludes space needed for the elements of x)
2D Arrays
The elements of a 2-dimensional array a
declared as:
int a[3][4];
may be shown as a table
a[0][0] a[0][1] a[0][2] a[0][3]
a[1][0] a[1][1] a[1][2] a[1][3]
a[2][0] a[2][1] a[2][2] a[2][3]
Rows Of A 2D Array
a[0][0] a[0][1] a[0][2] a[0][3] row 0
a[1][0] a[1][1] a[1][2] a[1][3] row 1
a[2][0] a[2][1] a[2][2] a[2][3] row 2
Columns Of A 2D Array
a[0][0] a[0][1] a[0][2] a[0][3]
a[1][0] a[1][1] a[1][2] a[1][3]
a[2][0] a[2][1] a[2][2] a[2][3]
column 0 column 1 column 2 column 3
2D Array Representation In C++
2-dimensional array x
a, b, c, d
e, f, g, h
i, j, k, l
view 2D array as a 1D array of rows
x = [row 0, row 1, row 2]
row 0 = [a, b, c, d]
row 1 = [e, f, g, h]
row 2 = [i, j, k, l]
and store as 4 1D arrays
2D Array Representation In C++
x[]
a b c d
e f g h
i j k l
Space Overhead
x[]
a b c d
e f g h
i j k l
space overhead = overhead for 4 1D arrays
= 4 * 4 bytes
= 16 bytes
= (number of rows + 1) x 4 bytes
Array Representation In C++
x[]
a b c d
e f g h
i j k l
This representation is called the array-of-arrays
representation.
Requires contiguous memory of size 3, 4, 4, and 4 for the
4 1D arrays.
That is, 1 memory block of size (number of rows) and
(number of rows) blocks of size (number of columns).
Row-Major Mapping
• Example 3 x 4 array:
a b c d
e f g h
i j k l
• Convert into 1D array y by collecting elements by rows.
• Within a row elements are collected from left to right.
• Rows are collected from top to bottom.
• We get y[] = {a, b, c, d, e, f, g, h, i, j, k, l}
row 0 row 1 row 2 … row i
Locating Element x[i][j]
0 c 2c 3c ic
row 0 row 1 row 2 … row i
• assume x has r rows and c columns
• each row has c elements
• i rows to the left of row i
• so ic elements to the left of x[i][0]
• so x[i][j] is mapped to position
ic + j of the 1D array
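The mapping above can be sketched as follows. The function names RowMajorIndex and FlattenByRows are hypothetical, and the 2D array is represented here as a vector of rows purely for illustration.

```cpp
#include <vector>

// Position of x[i][j] in the 1D array when each row has c elements.
int RowMajorIndex(int i, int j, int c) { return i * c + j; }

// Collect the elements of x by rows into a 1D array.
std::vector<char> FlattenByRows(const std::vector<std::vector<char>>& x) {
    std::vector<char> y;
    for (const std::vector<char>& row : x)          // rows, top to bottom
        y.insert(y.end(), row.begin(), row.end());  // left to right in a row
    return y;
}
```

For the 3 x 4 example, x[1][2] = g is mapped to position 1 * 4 + 2 = 6 of y.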
Space Overhead
row 0 row 1 row 2 … row i
4 bytes for start of 1D array +
4 bytes for c (number of columns)
= 8 bytes
Disadvantage
Need contiguous memory of size rc.
Column-Major Mapping
a b c d
e f g h
i j k l
• Convert into 1D array y by collecting elements
by columns.
• Within a column elements are collected from
top to bottom.
• Columns are collected from left to right.
• We get y = {a, e, i, b, f, j, c, g, k, d, h, l}
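By the same reasoning used for row-major order, j columns of r elements each precede column j, so x[i][j] maps to position jr + i. The sketch below illustrates this; the function names are hypothetical and the vector-of-rows representation is an assumption.

```cpp
#include <cstddef>  // std::size_t
#include <vector>

// Position of x[i][j] in the 1D array when each column has r elements.
int ColumnMajorIndex(int i, int j, int r) { return j * r + i; }

// Collect the elements of a rectangular array x by columns into a 1D array.
std::vector<char> FlattenByColumns(const std::vector<std::vector<char>>& x) {
    std::vector<char> y;
    if (x.empty()) return y;
    for (std::size_t j = 0; j < x[0].size(); ++j)   // columns, left to right
        for (std::size_t i = 0; i < x.size(); ++i)  // top to bottom in a column
            y.push_back(x[i][j]);
    return y;
}
```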
Matrix
Table of values. Has rows and columns, but
numbering begins at 1 rather than 0.
a b c d row 1
e f g h row 2
i j k l row 3
• Use notation x(i,j) rather than x[i][j].
• May use a 2D array to represent a matrix.
Shortcomings Of Using A 2D
Array For A Matrix
• Indexes are off by 1.
• C++ arrays do not support matrix operations
such as add, transpose, multiply, and so on.
– Suppose that x and y are 2D arrays. Can’t do x + y,
x - y, x * y, etc. in C++.
• Develop a class Matrix for object-oriented
support of all matrix operations.
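A minimal sketch of what such a class might look like, showing only 1-based access through x(i,j) and addition. Storing the elements in row-major order in a std::vector<int>, the member names, and the use of exceptions are assumptions; the remaining operations (transpose, multiply, and so on) are omitted.

```cpp
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::out_of_range, std::invalid_argument
#include <vector>

class Matrix {
public:
    Matrix(int rows, int cols)
        : rows_(rows), cols_(cols), element_(rows * cols, 0) {}

    // 1-based access: x(i, j) with 1 <= i <= rows and 1 <= j <= cols.
    int& operator()(int i, int j) { return element_[Position(i, j)]; }
    int operator()(int i, int j) const { return element_[Position(i, j)]; }

    // Element-by-element sum of two matrices of the same size.
    Matrix operator+(const Matrix& other) const {
        if (rows_ != other.rows_ || cols_ != other.cols_)
            throw std::invalid_argument("matrix size mismatch");
        Matrix sum(rows_, cols_);
        for (std::size_t k = 0; k < element_.size(); ++k)
            sum.element_[k] = element_[k] + other.element_[k];
        return sum;
    }

private:
    // Row-major position of x(i, j), correcting for the off-by-1 indexes.
    int Position(int i, int j) const {
        if (i < 1 || i > rows_ || j < 1 || j > cols_)
            throw std::out_of_range("matrix index out of range");
        return (i - 1) * cols_ + (j - 1);
    }

    int rows_;
    int cols_;
    std::vector<int> element_;
};
```

With this in place, code can write x(1,1) and x + y directly, which a plain 2D array does not allow.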
Diagonal Matrix
An n x n matrix in which all nonzero
terms are on the diagonal.
Diagonal Matrix
1 0 0 0
0 2 0 0
0 0 3 0
0 0 0 4
• x(i,j) is on diagonal iff i = j
• number of diagonal elements in an
n x n matrix is n
• nondiagonal elements are zero
• store only the n diagonal elements rather than all n^2 elements
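Storing only the diagonal could look like the sketch below. The class name DiagonalMatrix, the Get/Set interface, and the decision to reject a nonzero value off the diagonal with an exception are assumptions made for illustration.

```cpp
#include <stdexcept>  // std::out_of_range, std::invalid_argument
#include <vector>

class DiagonalMatrix {
public:
    explicit DiagonalMatrix(int n) : n_(n), diagonal_(n, 0) {}

    // x(i,j) with 1-based indexes; nondiagonal elements are always zero.
    int Get(int i, int j) const {
        CheckIndex(i, j);
        return i == j ? diagonal_[i - 1] : 0;
    }

    void Set(int i, int j, int value) {
        CheckIndex(i, j);
        if (i == j)
            diagonal_[i - 1] = value;
        else if (value != 0)
            throw std::invalid_argument("nondiagonal elements must be zero");
    }

private:
    void CheckIndex(int i, int j) const {
        if (i < 1 || i > n_ || j < 1 || j > n_)
            throw std::out_of_range("matrix index out of range");
    }

    int n_;
    std::vector<int> diagonal_;  // n elements instead of n * n
};
```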
Lower Triangular Matrix
An n x n matrix in which all nonzero terms are either
on or below the diagonal.
1 0 0 0
2 3 0 0
4 5 6 0
7 8 9 10
• x(i,j) is part of lower triangle iff i >= j.
• number of elements in lower triangle is 1 + 2 +
… + n = n(n+1)/2.
• store only the lower triangle
Array Of Arrays Representation
x[]
1
2 3
4 5 6
7 8 9 10
Use an irregular 2-D array … length of rows is not
required to be the same.
Creating And Using An Irregular Array
// declare a two-dimensional array variable
// and allocate the desired number of rows
int ** irregularArray = new int* [numberOfRows];
// now allocate space for the elements in each row
for (int i = 0; i < numberOfRows; i++)
irregularArray[i] = new int [length[i]];
// use the array like any regular array
irregularArray[2][3] = 5;
irregularArray[4][6] = irregularArray[2][3] + 2;
irregularArray[1][1] += 3;
Map Lower Triangular Array Into A 1D Array
Use row-major order, but omit terms that are
not part of the lower triangle.
For the matrix
1 0 0 0
2 3 0 0
4 5 6 0
7 8 9 10
we get
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Index Of Element [i][j]
0  1  3  6
r1 r2 r3 … row i
• Order is: row 1, row 2, row 3, …
• Row i is preceded by rows 1, 2, …, i-1
• Size of row i is i.
• Number of elements that precede row i is
1 + 2 + 3 + … + i-1 = i(i-1)/2
• So element (i,j) is at position i(i-1)/2 + j - 1 of
the 1D array.
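The position formula can be sketched as below, using the 1-based (i,j) notation of the slides. The function names are hypothetical, and reporting an index outside the lower triangle with an exception is an assumption.

```cpp
#include <stdexcept>  // std::out_of_range
#include <vector>

// Position of x(i,j), with 1 <= j <= i, in the row-major 1D array
// that stores only the lower triangle.
int LowerTriangleIndex(int i, int j) {
    if (j < 1 || i < j)
        throw std::out_of_range("element is not in the lower triangle");
    return i * (i - 1) / 2 + j - 1;
}

// Value of x(i,j) for valid 1-based i and j; elements above the diagonal
// are zero and are not stored. No check is made against the matrix size n.
int LowerTriangleGet(const std::vector<int>& element, int i, int j) {
    return i >= j ? element[LowerTriangleIndex(i, j)] : 0;
}
```

For the 4 x 4 example, x(4,2) is at position 4 * 3 / 2 + 2 - 1 = 7 of the array 1, 2, …, 10, which holds the value 8.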
Sparse Matrices
Matrix: table of values
0 0 3 0 4
0 0 5 7 0   Row 2
0 0 0 0 0
0 2 6 0 0
      Column 4
4 x 5 matrix
4 rows
5 columns
20 elements
Sparse Matrices
Sparse matrix: #nonzero elements / #elements
is small.
Examples:
• Diagonal
• Only elements along diagonal may be nonzero
• For an n x n matrix, ratio is n/n^2 = 1/n
• Tridiagonal
• Only elements on 3 central diagonals may be nonzero
• Ratio is (3n-2)/n^2 = 3/n - 2/n^2
Sparse Matrices
• Lower triangular (?)
• Only elements on or below diagonal may be nonzero
• Ratio is n(n+1)/(2n^2) ~ 0.5
These are structured sparse matrices. Nonzero
elements are in a well-defined portion of the
matrix.
Sparse Matrices
An n x n matrix may be stored as an n x n array.
This takes O(n^2) space.
The example structured sparse matrices may be
mapped into a 1D array so that a mapping
function can be used to locate an element
quickly; the space required by the 1D array is
less than that required by an n x n array (next
lecture).
Unstructured Sparse Matrices
Airline flight matrix.
airports are numbered 1 through n
flight(i,j) = list of nonstop flights from airport i
to airport j
n = 1000 (say)
n x n array of list pointers => 4 million bytes
total number of nonempty flight lists = 20,000
(say)
need at most 20,000 list pointers => at most
80,000 bytes
Unstructured Sparse Matrices
Web page matrix.
web pages are numbered 1 through n
web(i,j) = number of links from page i to page j
Web analysis.
authority page … page that has many links to it
hub page … links to many authority pages
Web Page Matrix
n = 2 billion (and growing by 1 million a day)
n x n array of ints => 16 * 10^18 bytes (16 * 10^9
GB)
each page links to 10 (say) other pages on
average
on average there are 10 nonzero entries per row
space needed for nonzero elements is
approximately 20 billion x 4 bytes = 80 billion
bytes (80 GB)
Representation Of Unstructured
Sparse Matrices
Single linear list in row-major order.
scan the nonzero elements of the sparse matrix in row-
major order (i.e., scan the rows left to right
beginning with row 1 and picking up the nonzero
elements)
each nonzero element is represented by a triple
(row, column, value)
the list of triples is stored in a 1D array
Single Linear List Example
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0
list =
row    1 1 2 2 4 4
column 3 5 3 4 2 3
value  3 4 5 7 2 6
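The row-major scan that produces this list can be sketched as follows. The function name ToTriples and the use of a vector of rows for the dense matrix are assumptions; MatrixTerm here is a plain struct holding the (row, column, value) triple with 1-based indexes.

```cpp
#include <cstddef>  // std::size_t
#include <vector>

struct MatrixTerm {
    int row, col, value;
};

// Scan the dense matrix in row-major order and keep only the nonzero
// elements, each as a (row, column, value) triple with 1-based indexes.
std::vector<MatrixTerm> ToTriples(const std::vector<std::vector<int>>& dense) {
    std::vector<MatrixTerm> terms;
    for (std::size_t i = 0; i < dense.size(); ++i)         // rows in order
        for (std::size_t j = 0; j < dense[i].size(); ++j)  // left to right
            if (dense[i][j] != 0)
                terms.push_back({static_cast<int>(i) + 1,
                                 static_cast<int>(j) + 1, dense[i][j]});
    return terms;
}
```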
One Linear List Per Row
0 0 3 0 4   row1 = [(3,3), (5,4)]
0 0 5 7 0   row2 = [(3,5), (4,7)]
0 0 0 0 0   row3 = []
0 2 6 0 0   row4 = [(2,2), (3,6)]
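A sketch of the one-list-per-row representation, in which each row keeps (column, value) pairs for its nonzero elements. The type alias RowList, the function name ToRowLists, and the use of std::vector for each list are assumptions.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::pair
#include <vector>

// One list per row; each entry is (column, value) with a 1-based column.
using RowList = std::vector<std::pair<int, int>>;

std::vector<RowList> ToRowLists(const std::vector<std::vector<int>>& dense) {
    std::vector<RowList> rows(dense.size());
    for (std::size_t i = 0; i < dense.size(); ++i)
        for (std::size_t j = 0; j < dense[i].size(); ++j)
            if (dense[i][j] != 0)
                rows[i].emplace_back(static_cast<int>(j) + 1, dense[i][j]);
    return rows;
}
```

A row with no nonzero elements, such as row 3 in the example, is simply an empty list.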
Single Linear List
Class SparseMatrix
• Array smArray of triples of type MatrixTerm
int row, col, value
• int rows, // number of rows
cols, // number of columns
terms, // number of nonzero elements
capacity; // size of smArray
Size of smArray generally not predictable at time
of initialization.
• Start with some default capacity/size (say 10)
• Increase capacity as needed
Approximate Memory Requirements
500 x 500 matrix with 1994 nonzero elements, 4
bytes per element
2D array 500 x 500 x 4 = 1 million bytes
Class SparseMatrix 3 x 1994 x 4 + 4 x 4
= 23,944 bytes
Array Resizing
if (newSize < terms) throw "Error";
MatrixTerm *temp = new MatrixTerm[newSize];
copy(smArray, smArray+terms, temp);
delete [] smArray;
smArray = temp;
capacity = newSize;
Array Resizing
To avoid spending too much overall time
resizing arrays, we generally set newSize =
c * oldSize, where c > 1 is some constant.
Quite often, we use c = 2 (array doubling)
or c = 1.5.
Now, we can show that the total time spent
in resizing is O(s), where s is the maximum
number of elements added to smArray.
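The O(s) claim can be checked empirically with a small counting experiment. The sketch below assumes array doubling (c = 2) and an initial capacity of 1, appends s ints, and counts how many element copies the resizing steps perform; the function name CopiesDuringAppends is hypothetical.

```cpp
#include <algorithm>  // std::copy

// Append s ints to an array that doubles its capacity whenever it is full.
// Returns the total number of element copies caused by resizing.
long CopiesDuringAppends(int s) {
    int capacity = 1;  // assumed initial capacity
    int size = 0;
    int* data = new int[capacity];
    long copies = 0;
    for (int k = 0; k < s; ++k) {
        if (size == capacity) {
            int newSize = 2 * capacity;  // array doubling, c = 2
            int* temp = new int[newSize];
            std::copy(data, data + size, temp);
            copies += size;
            delete[] data;
            data = temp;
            capacity = newSize;
        }
        data[size++] = k;
    }
    delete[] data;
    return copies;
}
```

The copies form the sum 1 + 2 + 4 + … over the capacities that filled up, which stays below 2s. Growing by a fixed amount instead of a fixed factor would make the same sum quadratic in s.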
Matrix Transpose
original matrix      transpose
0 0 3 0 4            0 0 0 0
0 0 5 7 0            0 0 0 2
0 0 0 0 0            3 5 0 6
0 2 6 0 0            0 7 0 0
                     4 0 0 0
original list        transpose list
row    1 1 2 2 4 4   row    2 3 3 3 4 5
column 3 5 3 4 2 3   column 4 1 2 4 2 1
value  3 4 5 7 2 6   value  2 3 5 6 7 4
Matrix Transpose
0 0 3 0 4            0 0 0 0
0 0 5 7 0            0 0 0 2
0 0 0 0 0            3 5 0 6
0 2 6 0 0            0 7 0 0
                     4 0 0 0
row    1 1 2 2 4 4
column 3 5 3 4 2 3
value  3 4 5 7 2 6
Step 1: #nonzero in each row of transpose
= #nonzero in each column of original matrix
= [0, 1, 3, 1, 1]
Step 2: Start of each row of transpose
= sum of size of preceding rows of transpose
= [0, 0, 1, 4, 5]
Matrix Transpose
Step 1: #nonzero in each row of transpose
= #nonzero in each column of original matrix
= [0, 1, 3, 1, 1]
Step 2: Start of each row of transpose
= sum of size of preceding rows of transpose
= [0, 0, 1, 4, 5]
Step 3: Move elements, left to right, from original list to transpose list
Complexity
m x n original matrix
t nonzero elements
Step 1: O(n+t)
Step 2: O(n)
Step 3: O(t)
Overall O(n+t)
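The three steps can be sketched as the function below, which works on a row-major list of (row, column, value) triples with 1-based indexes. The function name FastTranspose, the use of std::vector in place of the smArray member, and the helper arrays rowSize and rowStart are assumptions made for illustration.

```cpp
#include <vector>

struct MatrixTerm {
    int row, col, value;
};

// Transpose a sparse matrix stored as row-major triples.
// cols is the number of columns of the original matrix.
std::vector<MatrixTerm> FastTranspose(const std::vector<MatrixTerm>& a,
                                      int cols) {
    // Step 1: rowSize[c] = #nonzero in column c of the original
    //                    = #nonzero in row c of the transpose.
    std::vector<int> rowSize(cols + 1, 0);  // index 0 is unused
    for (const MatrixTerm& t : a) ++rowSize[t.col];

    // Step 2: rowStart[c] = sum of the sizes of the preceding rows.
    std::vector<int> rowStart(cols + 1, 0);
    for (int c = 2; c <= cols; ++c)
        rowStart[c] = rowStart[c - 1] + rowSize[c - 1];

    // Step 3: move elements, left to right, into the transpose list.
    std::vector<MatrixTerm> b(a.size());
    for (const MatrixTerm& t : a) {
        int position = rowStart[t.col]++;  // next free slot in that row
        b[position] = {t.col, t.row, t.value};
    }
    return b;
}
```

On the example list, Step 1 yields [0, 1, 3, 1, 1] and Step 2 yields [0, 0, 1, 4, 5] for rows 1 through 5 of the transpose, matching the values on the slide.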
Runtime Performance
Matrix Transpose
500 x 500 matrix with 1994 nonzero elements
Run time measured on a 300MHz Pentium II PC
2D array 210 ms
SparseMatrix 6 ms
Performance
Matrix Addition.
500 x 500 matrices with 1994 and 999 nonzero
elements
2D array 880 ms
SparseMatrix 18 ms