Chapter 6
Data Types
ISBN 0-321-33025-0
Primitive Data Types
• Almost all programming languages provide a set of
primitive data types
• Primitive data types: Those not defined in terms of
other data types
Copyright © 2006 Addison-Wesley. All rights reserved.
Primitive Data Types: Integer
• It is the most common primitive numeric data type.
• Java includes four signed integer sizes: byte, short,
int, and long.
Primitive Data Types: Floating Point
• Languages for scientific use support at least two
floating-point types (e.g., float and double;
sometimes more)
Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for “true” and
one for “false”
• Could be implemented as bits, but often as bytes.
Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode (UCS-2)
– Includes characters from most natural languages
– Java was the first widely used language to adopt it
– C# and JavaScript also support Unicode
Character String Type Operations
• Values are sequences of characters
• Typical operations:
– Assignment and copying
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
Character String Type in Certain Languages
• C and C++
– Not primitive
– Use char arrays and a library of functions that provide
operations
• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern matching
• Java
– Primitive via the String class
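As a rough illustration of the C case above, the typical string operations can be expressed with char arrays and <string.h> library calls. The helpers join_names, same_string, and find_substring are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "first last" into out, which holds cap bytes.
   Returns 0 on success, -1 if the result plus its terminator would not fit. */
int join_names(char *out, size_t cap, const char *first, const char *last) {
    assert(out != NULL && first != NULL && last != NULL);
    if (strlen(first) + 1 + strlen(last) + 1 > cap) {
        return -1;
    }
    strcpy(out, first);  /* assignment and copying */
    strcat(out, " ");    /* catenation */
    strcat(out, last);
    return 0;
}

/* Comparison: strcmp returns 0 when the two strings are equal. */
int same_string(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Substring search: index of needle within haystack, or -1 if absent. */
long find_substring(const char *haystack, const char *needle) {
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```

Because the string is not a primitive type here, the caller must supply the storage and its capacity; the library functions do not check bounds themselves.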
Character String Length Options
• Several design choices for the length of a string:
1. Static:
– The length is static and set when the string is created, for example
Java’s String class
2. Limited Dynamic Length:
– The length is varying up to a declared and fixed maximum.
– In C-based languages, a special character is used to indicate the end of
a string’s characters, rather than maintaining the length.
3. Dynamic (no maximum):
– The length is varying with no maximum.
– SNOBOL4, Perl, JavaScript
• Ada supports all three string length options.
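The limited dynamic option in C-based languages can be sketched as follows. Since no length is stored, the current length has to be recovered by scanning for the terminating null character. The function scan_length is a hypothetical stand-in for what the library's strlen does.

```c
#include <assert.h>
#include <stddef.h>

/* Counts characters up to, but not including, the terminating '\0'. */
size_t scan_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

For example, with the declaration char name[8] = "Bob"; the call scan_length(name) yields 3, while the declared maximum stays fixed at 7 characters plus the terminator.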
Character String Implementation
• Static length: compile-time descriptor
• Limited dynamic length: may need a run-time
descriptor for length (but not in C and C++)
• Dynamic length: need run-time descriptor;
allocation/de-allocation is the biggest
implementation problem
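A run-time descriptor for a limited dynamic string might look like the sketch below. The field names (max_length, current_length, address) and the helper ld_assign are assumptions made for illustration, not a layout prescribed by any particular language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical run-time descriptor for a limited dynamic string. */
struct ld_string {
    size_t max_length;      /* declared, fixed maximum */
    size_t current_length;  /* varies at run time, never exceeds max_length */
    char  *address;         /* storage for the characters */
};

/* Stores src in the string if it fits; returns 0 on success, -1 otherwise. */
int ld_assign(struct ld_string *d, const char *src) {
    assert(d != NULL && d->address != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > d->max_length) {
        return -1;  /* would exceed the declared maximum */
    }
    memcpy(d->address, src, n);
    d->current_length = n;  /* the length is kept in the descriptor */
    return 0;
}
```

Keeping the current length in the descriptor is the alternative to the null-terminator convention of C and C++, which is why those languages need no run-time descriptor.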
Compile- and Run-Time Descriptors
[Figure: compile-time descriptor for static strings (left); run-time descriptor for limited dynamic strings (right)]
Enumeration Types
• Provides a way of defining and grouping collections
of named constants, which are called enumeration
constants.
• C# example
enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun};
• The enumeration constants are implicitly
assigned the integer values, 0, 1, …
• Design issues
– Is an enumeration constant allowed to appear in more than one type
definition, and if so, how is the type of an occurrence of that constant
checked?
– Are enumeration values coerced to integer?
– Is any other type coerced to an enumeration type?
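A C analogue of the example above illustrates both the implicit integer values and one answer to the coercion question: C coerces enumeration values to int. The helpers day_number and is_weekend are hypothetical.

```c
#include <assert.h>

enum days { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

/* C coerces enumeration values to int, so no cast is needed here. */
int day_number(enum days d) {
    int n = d;
    assert(n >= Mon && n <= Sun);
    return n;
}

/* Hypothetical helper: nonzero for Sat and Sun. */
int is_weekend(enum days d) {
    return d == Sat || d == Sun;
}
```

Here Mon has the value 0 and Sun the value 6, following the implicit 0, 1, … numbering.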
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a color as a
number.
• Aid to reliability, e.g., compiler can check:
– Operations (don’t allow colors to be added)
– No enumeration variable can be assigned a value outside
its defined range.
– Ada, C#, and Java 5.0 provide better support for
enumeration than C++ because enumeration type variables
in these languages are not coerced into integer types.
Subrange Types
• An ordered contiguous subsequence of an ordinal
type
– Example: 12..18 is a subrange of integer type
• Ada’s design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1: Days;
Day2: Weekdays;
Day2 := Day1; -- legal as long as Day1 is not sat or sun
Subrange Evaluation
• Aid to readability
– Make it clear to readers that variables of a subrange type can
store only a certain range of values
• Reliability
– Assigning a value to a subrange variable that is outside the
specified range is detected as an error
Implementation of User-Defined Ordinal Types
• Enumeration types are implemented as integers
• Subrange types are implemented like the parent types
with code inserted (by the compiler) to restrict
assignments to subrange variables
Array Indexing
• Indexing (or subscripting) is a mapping from indices
to elements
array_name(index_value_list) → element
• Index Syntax
– FORTRAN, PL/I, Ada use parentheses
– Most other languages use brackets
Array Index (Subscript) Types
• FORTRAN, C: integer only
• Pascal: integer, boolean, char, and enumeration
• Ada: integer, enumeration, boolean and char
• Java: integer only
Array Initialization
• Some languages allow initialization at the time of
storage allocation
– C, C++, Java example (C# places the brackets after the type: int[] list)
int list [] = {4, 5, 7, 83};
– Character strings in C and C++
char name [] = "freddie";
– Arrays of strings in C and C++
char *names [] = {"Bob", "Jake", "Joe"};
– Java initialization of String objects
String[] names = {"Bob", "Jake", "Joe"};
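In C, the compiler derives each array's size from its initializer. The sketch below makes those sizes visible; the accessor functions are hypothetical and exist only to expose the counts.

```c
#include <assert.h>
#include <stddef.h>

static const int   list[]  = {4, 5, 7, 83};
static const char  name[]  = "freddie";             /* 7 characters plus '\0' */
static const char *names[] = {"Bob", "Jake", "Joe"};

/* Number of elements, computed from the sizes the compiler assigned. */
size_t list_count(void)  { return sizeof list / sizeof list[0]; }
size_t name_size(void)   { return sizeof name; }
size_t names_count(void) { return sizeof names / sizeof names[0]; }

/* Bounds-checked read of list, for illustration. */
int list_item(size_t i) {
    assert(i < list_count());
    return list[i];
}
```

Note that name occupies 8 elements, not 7, because the string literal contributes its terminating null character.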
Array Operations
• APL provides the most powerful array processing operations
for vectors and matrices, as well as unary operators.
• Example: Consider the APL code
+/A×B
– Computes (A[1]×B[1]) + (A[2]×B[2]) + …
• Ada allows array assignment
• Fortran provides elemental operations
• For example, + operator between two arrays results in an
array of the sums of the element pairs of the two arrays.
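C has no built-in array operations, so the two examples above have to be written as explicit loops. The functions sum_of_products and add_arrays are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of elementwise products: the value the APL example computes. */
double sum_of_products(const double *a, const double *b, size_t n) {
    assert(a != NULL && b != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}

/* Elementwise sum: out[i] = a[i] + b[i], as an elemental + would produce. */
void add_arrays(const double *a, const double *b, double *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The contrast in length between these loops and the one-line array expressions is the point of the comparison.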
Implementation of Arrays
• Access function maps subscript expressions to an
address in the array
• Access function for single-dimensioned arrays:
address(array1[k]) = address (array1[lower_bound])
+ ((k-lower_bound) * element_size)
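The access function above can be written out directly. In this sketch, element_address is a hypothetical function and addresses are modeled as uintptr_t integers so that the arithmetic matches the formula term by term.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* address(array1[k]) = address(array1[lower_bound])
                        + ((k - lower_bound) * element_size) */
uintptr_t element_address(uintptr_t base, long k, long lower_bound,
                          size_t element_size) {
    assert(k >= lower_bound);
    return base + (uintptr_t)(k - lower_bound) * element_size;
}
```

With a base address of 1000, a lower bound of 1, and 4-byte elements, element 5 is at 1000 + (5 - 1) * 4 = 1016. In C the lower bound is always 0, so the formula reduces to base + k * element_size.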
Accessing Multi-dimensioned Arrays
• Two common ways:
– Row major order (by rows) – used in most languages
– Column major order (by columns) – used in Fortran
Locating an Element in a Multi-dimensioned
Array (row major)
• General format:
Location(a[i,j]) = address of a[row_lb,col_lb]
+ (((i - row_lb) * n) + (j - col_lb)) * element_size
where n is the number of elements per row.
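The row-major formula can be checked against a real C two-dimensional array, since C stores arrays by rows. The function row_major_offset is a hypothetical helper that returns the byte offset from the first element, that is, the part of the formula added to the base address.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of a[i,j] from a[row_lb,col_lb] in row-major order:
   (((i - row_lb) * n) + (j - col_lb)) * element_size,
   where n is the number of elements per row. */
size_t row_major_offset(long i, long j, long row_lb, long col_lb,
                        size_t n, size_t element_size) {
    assert(i >= row_lb && j >= col_lb);
    assert((size_t)(j - col_lb) < n);
    return ((size_t)(i - row_lb) * n + (size_t)(j - col_lb)) * element_size;
}
```

For a 3-by-4 array with lower bounds of 1 and 4-byte elements, a[3,2] lies ((3 - 1) * 4 + (2 - 1)) * 4 = 36 bytes past the first element.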
Record Types
• A record is a possibly heterogeneous aggregate of
data elements in which the individual elements are
identified by names
• Design issues:
– What is the syntactic form of references to the field?
– Are elliptical references allowed?
Examples
• COBOL uses level numbers to show nested records:
01 EMP-REC.
05 FIRST PIC X(20).
05 MID PIC X(10).
05 LAST PIC X(20).
05 HOURLY-RATE PIC 99V99.
• The EMP-REC record consists of four fields.
• The level numbers, such as 01 and 05, indicate the position of a data item
in a hierarchical structure of the data.
• The 01-level is called a record. The numbers 02-49 are available for
subdivision of a record. Gaps are usually left between the level numbers to
allow for ease in modifying the record structure.
• PIC X(n), called the picture clause, specifies n alphanumeric characters for
the field.
• 99V99 specifies four decimal digits with the decimal point in the middle.
Examples
• Record structures are indicated in Ada as follows:
type Emp_Rec_Type is record
First: String (1..20);
Mid: String (1..10);
Last: String (1..20);
Hourly_Rate: Float;
end record;
Emp_Rec: Emp_Rec_Type;
References to Fields
• Record Field References
1. COBOL
field_name OF record_name_1 OF ... OF record_name_n
For example, the MID field in the above COBOL example
can be referenced with
MID OF EMP-REC
2. Most other languages use dot notation.
record_name_1. ... record_name_n.field_name
For example, the Mid field in the above Ada example can be
referenced with
Emp_Rec.Mid
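Dot notation through nested records looks like this in C. The nesting of the name fields inside a separate name_rec is an assumption added here to show a two-level reference; the employee records in the examples above are flat. The constructor make_emp is likewise hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical nested layout: the name fields form an inner record. */
struct name_rec {
    char first[21];
    char mid[11];
    char last[21];
};

struct emp_rec {
    struct name_rec name;
    float hourly_rate;
};

/* Builds a record; fields are reached as record.inner_record.field. */
struct emp_rec make_emp(const char *first, const char *mid,
                        const char *last, float rate) {
    struct emp_rec e;
    assert(strlen(first) <= 20 && strlen(mid) <= 10 && strlen(last) <= 20);
    memset(&e, 0, sizeof e);
    strcpy(e.name.first, first);
    strcpy(e.name.mid, mid);
    strcpy(e.name.last, last);
    e.hourly_rate = rate;
    return e;
}
```

A fully qualified reference such as e.name.mid names every enclosing record from the outermost inward, the reverse of the COBOL OF ordering.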
References to Records
• Fully qualified references must include all record
names.
• Elliptical references allow leaving out record names
as long as the reference is unambiguous, for example
in COBOL:
FIRST, instead of FIRST OF EMP-REC, is an
elliptical reference to the employee’s first name.
Operations on Records
• Assignment is very common.
• Ada allows record comparison for equality and
inequality.
• Ada records can be initialized.
Evaluation and Comparison to Arrays
• Records are used when a collection of data values is
heterogeneous.
• Arrays are used when all the data values have the
same type.
• Access to array elements can be slower than access
to record fields, because subscripts are often computed
at run time while field offsets are fixed at compile time.
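The static versus dynamic distinction can be seen in C. A field's offset is a constant available through offsetof, whereas an element's offset depends on the subscript value. The struct layout and the helpers mid_offset and element_offset are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct emp_rec {
    char  first[20];
    char  mid[10];
    char  last[20];
    float hourly_rate;
};

/* Field offset: a constant the compiler fixes when it lays out the struct. */
size_t mid_offset(void) {
    return offsetof(struct emp_rec, mid);
}

/* Element offset: depends on k, so in general it is computed at run time. */
size_t element_offset(size_t k, size_t element_size) {
    assert(element_size > 0);
    return k * element_size;
}
```

When a subscript is itself a compile-time constant, a compiler can fold the multiplication away, so the cost difference applies mainly to subscripts that vary at run time.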
Pointer and Reference Types
• A pointer type variable has a range of values that
consists of memory addresses and a special value,
nil.
• The value nil indicates that a pointer cannot currently
be used to reference any memory cell.
• Pointers, unlike arrays and records, are not structured
types.
• Very useful for implementing dynamic data structures,
such as linked lists and trees.
Pointer Operations
• Two fundamental operations: assignment and
dereferencing.
• Assignment is used to set a pointer variable’s value
to some useful address.
• Dereferencing yields the value stored at the location
represented by the pointer’s value
– Dereferencing can be explicit or implicit
– Fortran 95 uses implicit dereferencing.
– C++ uses an explicit operation via *
j = *ptr sets j to the value located at ptr
Pointer Assignment Illustrated
The assignment operation j = *ptr sets j to 206
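A minimal C sketch of the same assignment follows. The variable name cell and the function deref_demo are hypothetical; the value 206 is taken from the statement above.

```c
#include <assert.h>
#include <stddef.h>

/* Sets up a cell holding 206, points ptr at it, and dereferences ptr. */
int deref_demo(void) {
    int cell = 206;    /* the memory cell being pointed to */
    int *ptr = &cell;  /* assignment: ptr now holds the address of cell */
    assert(ptr != NULL);
    int j = *ptr;      /* dereferencing: j receives the value stored there */
    return j;
}
```

The value of ptr itself is an address; only the explicit * operator reaches through it to the stored value.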
Problems with Pointers
• Dangling pointers (dangerous)
– A pointer points to a heap-dynamic variable that has been
deallocated.
– It could be created by the following sequence of operations:
• Pointer p1 is set to point to a new heap-dynamic
variable.
• Pointer p2 is assigned p1’s value.
• The heap-dynamic variable pointed to by p1 is explicitly
deallocated (possibly setting p1 to nil), but p2 is not changed by
the operation. p2 is now a dangling pointer.
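The sequence above can be reproduced in C with malloc and free. This sketch deliberately never dereferences the dangling pointer, since doing so is undefined behavior; the function name dangling_demo is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Walks through the dangling-pointer sequence, then clears the alias.
   Returns 0 on success, -1 if allocation fails. */
int dangling_demo(void) {
    int *p1 = malloc(sizeof *p1);  /* p1 points to a new heap-dynamic variable */
    if (p1 == NULL) {
        return -1;
    }
    *p1 = 42;

    int *p2 = p1;      /* p2 is assigned p1's value */
    assert(p2 == p1);  /* both names refer to the same cell */

    free(p1);          /* explicit deallocation */
    p1 = NULL;         /* p1 is reset, but p2 was not changed by this */

    /* At this point p2 still holds the old address: it is dangling.
       Reading or writing *p2 here would be undefined behavior. */
    p2 = NULL;         /* defensive fix: clear the alias as well */

    return (p1 == NULL && p2 == NULL) ? 0 : 1;
}
```

Nothing in the language forces the final assignment to p2, which is why dangling pointers are easy to create when several aliases exist.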
Problems with Pointers
• Lost heap-dynamic variable
– An allocated heap-dynamic variable that is no longer
accessible to the user program (often called garbage)
– It could be created by the following sequence of
operations:
• Pointer p1 is set to point to a newly created heap-dynamic
variable.
• Pointer p1 is later set to point to another newly created
heap-dynamic variable. The first variable is now inaccessible.
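In C, the lost-variable sequence is what is usually called a memory leak. The sketch below shows the reassignment done safely, with a comment marking where the leak would occur. The helpers new_cell and replace_without_leak are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates one int on the heap and stores value in it; NULL on failure. */
int *new_cell(int value) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* Points p1 at a second cell without losing the first one. */
int replace_without_leak(void) {
    int *p1 = new_cell(1);
    if (p1 == NULL) {
        return -1;
    }
    /* Writing p1 = new_cell(2) right here would overwrite the only pointer
       to the first cell, leaving it allocated but unreachable (garbage). */
    free(p1);          /* release the first cell before reusing p1 */
    p1 = new_cell(2);
    if (p1 == NULL) {
        return -1;
    }
    int result = *p1;
    assert(result == 2);
    free(p1);
    return result;
}
```

Languages with garbage collection reclaim such unreachable cells automatically; in C the programmer has to free them explicitly.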
Pointer Arithmetic in C and C++
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]
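The equivalences above can be checked in a small function. The name equivalent_at and the choice to fill stuff with its own indices are assumptions made for this sketch.

```c
#include <assert.h>

/* Returns nonzero when *(p+i), p[i], and stuff[i] all denote the same element. */
int equivalent_at(int i) {
    float stuff[100];
    assert(i >= 0 && i < 100);
    for (int k = 0; k < 100; k++) {
        stuff[k] = (float)k;  /* element k holds the value k */
    }
    float *p = stuff;         /* the array name yields the address of stuff[0] */
    return *(p + i) == stuff[i]
        && p[i] == stuff[i]
        && (p + i) == &stuff[i];
}
```

The addition p + i is scaled by the element size, so it advances i floats rather than i bytes.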
Summary
• The data types of a language are a large part of what
determines that language’s style and usefulness
• The primitive data types of most imperative languages
include numeric, character, and boolean types
• The user-defined enumeration and subrange types are
convenient and add to the readability and reliability of
programs.
• Arrays and records are included in most languages.
• Pointers are used for addressing flexibility and to control
dynamic storage management.