Introduction
• A data type defines a collection of data
objects and a set of predefined operations
on those objects
• A descriptor is the collection of the
attributes of a variable
• An object represents an instance of a
user-defined (abstract data) type
• One design issue for all data types: What
operations are defined and how are they
specified?
Copyright © 2015 Pearson. All rights reserved. 1-2
Primitive Data Types
• Almost all programming languages provide
a set of primitive data types
• Primitive data types: Those not defined in
terms of other data types
• Some primitive data types are merely
reflections of the hardware
• Others require only a little non-hardware
support for their implementation
Primitive Data Types: Integer
• Almost always an exact reflection of the
hardware so the mapping is trivial
• There may be as many as eight different
integer types in a language
• Java’s signed integer sizes: byte, short,
int, long
• Typically, the leftmost bit defines the sign
Primitive Data Types: Floating Point
• Model real numbers, but only as
approximations
• Languages for scientific use support at least
two floating-point types (e.g., float and
double); sometimes more
• Usually exactly like the hardware, but not
always
• IEEE Floating-Point
Standard 754 (single
and double precision)
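The approximation point can be seen in a few lines of Python. This is only a sketch; it assumes the host's float is an IEEE 754 double, which holds on common platforms, and the variable names are illustrative.

```python
import math
import struct

# 0.1 and 0.2 have no exact binary representation, so their sum is slightly off.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False with IEEE 754 doubles
close_enough = math.isclose(total, 0.3)   # compare with a tolerance instead

# Round-trip 0.1 through single precision to see the coarser approximation.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

single_size = struct.calcsize("<f")   # bytes in a single-precision value
double_size = struct.calcsize("<d")   # bytes in a double-precision value
```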
Primitive Data Types: Complex
• Some languages support a complex type,
e.g., C99, Fortran, and Python
• Each value consists of two floats, the real
part and the imaginary part
• Literal form (in Python):
(7 + 3j), where 7 is the real part and 3 is
the imaginary part
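A short Python sketch of the literal above; the names are illustrative only.

```python
z = 7 + 3j                 # complex literal: real part 7, imaginary part 3
real_part = z.real         # both parts are stored as floats
imag_part = z.imag

# Ordinary arithmetic works on complex values:
# (7 + 3j)(1 - 1j) = 7 - 7j + 3j - 3j^2 = 10 - 4j
product = z * (1 - 1j)
```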
Primitive Data Types: Decimal
• For business applications (money)
– Essential to COBOL
– C# offers a decimal data type
• Store a fixed number of decimal digits, in
coded form (BCD - Binary Coded Decimal)
• Advantage: accuracy
• Disadvantages: limited range, wastes
memory (1 or 2 digits per byte)
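The "2 digits per byte" storage can be sketched as packed BCD. The helper names to_packed_bcd and from_packed_bcd are hypothetical, and real decimal types also store a sign and scale, which this sketch omits.

```python
def to_packed_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits two per byte, one digit per nibble."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ASCII decimal digits only")
    if len(digits) % 2:                  # pad to an even number of digits
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(data: bytes) -> str:
    """Unpack each byte back into its two decimal digits."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in data)


packed = to_packed_bcd("1234")           # two bytes: 0x12, 0x34
```

Each digit is stored exactly, which is the accuracy advantage; the cost is that a byte holds only 100 distinct values instead of 256.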
Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for
“true” and one for “false”
• Could be implemented as bits, but often as
bytes
– Advantage: readability
Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII (1 byte)
• An alternative, 16-bit coding: Unicode
(UCS-2)
– Includes characters from most natural languages
– Originally used in Java
– C# and JavaScript also support Unicode
• 32-bit Unicode (UCS-4)
– Supported by Fortran, starting with 2003
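The storage cost of the codings above can be checked in Python. This sketch uses the UTF-16 and UTF-32 codecs as stand-ins for UCS-2 and UCS-4 (an assumption; they agree for a basic character such as "A"), with the big-endian variants chosen so that no byte-order mark is added.

```python
ch = "A"
code_point = ord(ch)                    # numeric coding of the character

ascii_bytes = ch.encode("ascii")        # 1 byte per character
utf16_bytes = ch.encode("utf-16-be")    # 2 bytes for this character
utf32_bytes = ch.encode("utf-32-be")    # 4 bytes per character
```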
Character String Types
• Values are sequences of characters
• Design issues:
– Is it a primitive type or just a special kind of
array?
– Should the length of strings be static or
dynamic?
Character String Types Operations
• Typical operations:
– Assignment and copying
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
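The typical operations listed above, sketched in Python; the strings and variable names are illustrative.

```python
import re

s = "data"
t = s                                 # assignment (binds a second name to the same string)
u = s + " types"                      # catenation
is_less = "apple" < "banana"          # comparison, lexicographic by character code
sub = u[5:10]                         # substring reference
match = re.search(r"t[a-z]+s", u)     # pattern matching with a regular expression
```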
Character String Type in Certain
Languages
• C and C++
– Not primitive
– Use char arrays and a library of functions that provide
operations
• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern matching
• Fortran and Python
– Primitive type with assignment and several operations
• Java
– Supported via the String class (treated much like a primitive, though not one of Java’s primitive types)
• Perl, JavaScript, Ruby, and PHP
- Provide built-in pattern matching, using regular
expressions
Character String Length Options
• Static: COBOL, Java’s String class
• Limited Dynamic Length: C and C++
– In these languages, a special character is used
to indicate the end of a string’s characters,
rather than maintaining the length
• Dynamic (no maximum): SNOBOL4, Perl,
JavaScript
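The limited dynamic length scheme of C and C++ can be imitated in Python with a fixed-size buffer and a terminating zero byte. This is a sketch only; c_strlen is a hypothetical helper that mirrors what C's strlen does, not a library function.

```python
def c_strlen(buffer) -> int:
    """Count bytes up to, but not including, the first zero byte."""
    n = 0
    while buffer[n] != 0:     # raises IndexError if no terminator is present
        n += 1
    return n


buf = bytearray(16)           # fixed capacity of 16 bytes, zero-filled
buf[:7] = b"freddie"          # current contents; buf[7] is still 0 and ends the string
current_length = c_strlen(buf)
```

The length is not stored anywhere; it is rediscovered by scanning, and the contents can grow only up to the fixed capacity minus the terminator.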
Character String Type Evaluation
• Aid to writability
• As a primitive type with static length, they
are inexpensive to provide--why not have
them?
• Dynamic length is nice, but is it worth the
expense?
Character String Implementation
• Static length: compile-time descriptor
• Limited dynamic length: may need a
run-time descriptor for length (but not in C
and C++)
• Dynamic length: need run-time descriptor;
allocation/deallocation is the biggest
implementation problem
Compile- and Run-Time Descriptors
[Figure: compile-time descriptor for static strings; run-time descriptor for limited dynamic strings]
Enumeration Types
• All possible values, which are named
constants, are provided in the definition
• C# example
enum days {mon, tue, wed, thu, fri, sat, sun};
• Design issues
– Is an enumeration constant allowed to appear in
more than one type definition, and if so, how is
the type of an occurrence of that constant
checked?
– Are enumeration values coerced to integer?
– Is any other type coerced to an enumeration type?
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a
color as a number
• Aid to reliability, e.g., the compiler can check that:
– Operations are legal (don’t allow colors to be added)
– No enumeration variable is assigned a value
outside its defined range
• C# and Java 5.0 provide better support for
enumeration than C++ because enumeration
type variables in these languages are not
coerced into integer types
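Python's enum module behaves in a similar non-coercing way, which makes it a convenient illustration (the checks happen at run time rather than compile time). The Day class below is an illustrative counterpart of the C# days example.

```python
from enum import Enum


class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3
    THU = 4
    FRI = 5
    SAT = 6
    SUN = 7


# Members are not coerced to integers, so arithmetic on them is rejected.
try:
    Day.MON + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

# A value outside the defined range cannot be turned into a member.
try:
    Day(9)
    out_of_range_allowed = True
except ValueError:
    out_of_range_allowed = False
```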
Array Types
• An array is a homogeneous aggregate of
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element.
Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions in element
references range checked?
• When are subscript ranges bound?
• When does allocation take place?
• Are ragged or rectangular multidimensional
arrays allowed, or both?
• What is the maximum number of subscripts?
• Can array objects be initialized?
• Is any kind of slice supported?
Array Indexing
• Indexing (or subscripting) is a mapping
from indices to elements
array_name (index_value_list) → an element
• Index Syntax
– Fortran and Ada use parentheses
• Ada explicitly uses parentheses to show uniformity
between array references and function calls because
both are mappings
– Most other languages use brackets
Arrays Index (Subscript) Types
• FORTRAN, C: integer only
• Java: integer types only
• Index range checking
- C, C++, Perl, and Fortran do not specify
range checking
- Java, ML, C# specify range checking
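A sketch of what specified range checking amounts to. checked_get is a hypothetical helper; Python lists already check ranges themselves, but they also accept negative indices, so the check is written out explicitly here.

```python
def checked_get(items, index):
    """Return items[index], rejecting any index outside 0..len(items)-1."""
    if not 0 <= index < len(items):
        raise IndexError(f"index {index} is outside 0..{len(items) - 1}")
    return items[index]


values = [4, 5, 7, 83]
last = checked_get(values, 3)      # in range

try:
    checked_get(values, 4)         # one past the end
    error_raised = False
except IndexError:
    error_raised = True
```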
Subscript Binding and Array Categories
• Static: subscript ranges are statically bound
and storage allocation is static (before
run-time)
– Advantage: efficiency (no dynamic allocation)
• Fixed stack-dynamic: subscript ranges are
statically bound, but the allocation is done
at declaration elaboration time (during execution)
– Advantage: space efficiency
Subscript Binding and Array Categories
(continued)
• Fixed heap-dynamic: similar to fixed
stack-dynamic: storage binding is dynamic
but fixed after allocation (i.e., binding is
done when requested and storage is
allocated from heap, not stack)
Subscript Binding and Array Categories
(continued)
• Heap-dynamic: binding of subscript ranges
and storage allocation is dynamic and can
change any number of times
– Advantage: flexibility (arrays can grow or shrink
during program execution)
Subscript Binding and Array Categories
(continued)
• C and C++ arrays that include static modifier
are static
• C and C++ arrays without static modifier are
fixed stack-dynamic
• C and C++ provide fixed heap-dynamic
arrays
• C# includes a second array class, ArrayList,
that provides heap-dynamic arrays
• Perl, JavaScript, Python, and Ruby support
heap-dynamic arrays
Array Initialization
• Some languages allow initialization at the
time of storage allocation
– C, C++, and Java example (C# writes the brackets after the type: int[] list = {4, 5, 7, 83};)
int list [] = {4, 5, 7, 83};
– Character strings in C and C++
char name [] = "freddie";
– Arrays of strings in C and C++
char *names [] = {"Bob", "Jake", "Joe"};
– Java initialization of String objects
String[] names = {"Bob", "Jake", "Joe"};
Heterogeneous Arrays
• A heterogeneous array is one in which the
elements need not be of the same type
• Supported by Perl, Python, JavaScript, and
Ruby
Arrays Operations
• APL provides the most powerful array processing
operations for vectors and matrices, as well as
unary operators, for example:
○ ϕV reverses the elements of vector V
○ ØM transposes matrix M
• Python supports array assignments, but they are only
reference changes. Python also supports array
catenation and element membership operations
• Ruby also provides array catenation
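The Python operations mentioned above, in a short sketch with illustrative names.

```python
a = [1, 2, 3]
b = a                  # assignment is a reference change: b and a name the same list
b[0] = 99              # so the update is visible through a as well

c = a + [4, 5]         # catenation builds a new list
has_two = 2 in a       # element membership
has_one = 1 in a       # 1 was overwritten by 99 above
```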
Slices
• A slice is some substructure of an array;
nothing more than a referencing
mechanism
• Slices are only useful in languages that have
array operations
Slice Examples
• Python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat[0][0:2] is the first and second element of the
first row of mat
• Ruby supports slices with the slice method
list.slice(2, 2) returns the third and fourth
elements of list
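The Python examples can be run as written. The extra slices of vector and the column extraction are additions for illustration; note that slicing a Python list produces a new list rather than a view of the original.

```python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

first_two = mat[0][0:2]            # first and second elements of the first row
middle = vector[3:6]               # elements at indices 3, 4, and 5
every_other = vector[0:7:2]        # a stride of 2 picks indices 0, 2, 4, 6
column = [row[1] for row in mat]   # second column, built with a comprehension
```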
Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensioned
array in which all of the rows have the same
number of elements and all columns have
the same number of elements
• A jagged matrix has rows with varying
number of elements
– Possible when multi-dimensioned arrays actually
appear as arrays of arrays
• C, C++, and Java support jagged arrays
• F# and C# support rectangular arrays and
jagged arrays
Implementation of Arrays
• Access function maps subscript expressions
to an address in the array
• Access function for single-dimensioned
arrays:
address(list[k]) = address (list[lower_bound])
+ ((k-lower_bound) * element_size)
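The access function translates directly into code. In this sketch the base address 1000 and the 4-byte element size are made-up values, and element_address is a hypothetical name.

```python
def element_address(base_address, k, lower_bound, element_size):
    """Address of list[k], given the address of list[lower_bound]."""
    return base_address + (k - lower_bound) * element_size


# An array with lower bound 1, 4-byte elements, and first element at address 1000:
addr_first = element_address(1000, 1, 1, 4)
addr_fifth = element_address(1000, 5, 1, 4)   # 1000 + (5 - 1) * 4
```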
Accessing Multi-dimensioned Arrays
• Two common ways:
– Row major order (by rows) – used in most
languages
– Column major order (by columns) – used in
Fortran
[Figure: a compile-time descriptor for a multidimensional array]
Locating an Element in a
Multi-dimensioned Array
• General format
Location (a[i,j]) = address of a[row_lb,col_lb] +
(((i - row_lb) * n) + (j - col_lb)) * element_size
where n is the number of elements in a row
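A sketch of the row-major formula, checked against a flattened copy of a small matrix. The function name location, the sample matrix, and the addresses are illustrative assumptions.

```python
def location(base, i, j, row_lb, col_lb, n, element_size):
    """Row-major address of a[i, j]; n is the number of elements in a row."""
    return base + ((i - row_lb) * n + (j - col_lb)) * element_size


matrix = [[10, 11, 12, 13],
          [20, 21, 22, 23],
          [30, 31, 32, 33]]
flat = [x for row in matrix for x in row]      # the same data laid out by rows

# With base 0, zero lower bounds, and element size 1, the address is an index into flat.
offset = location(base=0, i=2, j=1, row_lb=0, col_lb=0, n=4, element_size=1)
same_element = flat[offset] == matrix[2][1]

# Lower bounds of 1 and 4-byte elements, starting at address 1000.
addr = location(base=1000, i=3, j=2, row_lb=1, col_lb=1, n=4, element_size=4)
```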
Compile-Time Descriptors
[Figure: compile-time descriptors for a single-dimensioned array and a multidimensional array]
Associative Arrays
• An associative array is an unordered
collection of data elements that are
indexed by an equal number of values
called keys
– User-defined keys must be stored
• Design issues:
- What is the form of references to elements?
- Is the size static or dynamic?
• Built-in type in Perl, Python, Ruby, and Lua
– In Lua, they are supported by tables
Associative Arrays in Perl
• Names begin with %; literals are delimited
by parentheses
%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" =>
65, …);
• Subscripting is done using braces and keys
$hi_temps{"Wed"} = 83;
– Elements can be removed with delete
delete $hi_temps{"Tue"};
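For comparison, the same operations with a Python dictionary, Python's built-in associative array. Only the three entries shown in the Perl literal are used; the elided ones are left out.

```python
hi_temps = {"Mon": 77, "Tue": 79, "Wed": 65}   # literal with key: value pairs

hi_temps["Wed"] = 83        # subscripting uses brackets and a key
del hi_temps["Tue"]         # remove an element

has_tue = "Tue" in hi_temps
```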
Record Types
• A record is a possibly heterogeneous
aggregate of data elements in which the
individual elements are identified by names
• Design issues:
– What is the syntactic form of references to the
field?
– Are elliptical references allowed?
Definition of Records in COBOL
• COBOL uses level numbers to show nested
records; others use recursive definition
01 EMP-REC.
    02 EMP-NAME.
        05 FIRST PIC X(20).
        05 MID PIC X(10).
        05 LAST PIC X(20).
    02 HOURLY-RATE PIC 99V99.
– PIC X(20): 20 alphanumeric characters
– PIC 99V99: 4 decimal digits; V is the decimal point
References to Records
• Record field references
1. COBOL
field_name OF record_name_1 OF ... OF record_name_n
2. Others (dot notation)
record_name_1.record_name_2. ... record_name_n.field_name
• Fully qualified references must include all record names
• Elliptical references allow leaving out record names as long
as the reference is unambiguous, for example in COBOL
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are
elliptical references to the employee’s first name
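Dot notation for a nested record can be sketched with Python dataclasses. The classes mirror the COBOL EMP-REC layout; the field values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EmpName:
    first: str
    mid: str
    last: str


@dataclass
class EmpRec:
    emp_name: EmpName      # nested record
    hourly_rate: float


emp = EmpRec(EmpName("Pat", "Q", "Smith"), 42.50)   # illustrative values

# Fully qualified reference: every enclosing record name is written out.
first_name = emp.emp_name.first
```

Python has no elliptical references; a field of a nested record must always be reached through its enclosing record names.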
Evaluation and Comparison to Arrays
• Records are used when the collection of data
values is heterogeneous
• Access to array elements is much slower
than access to record fields, because
subscripts are dynamic (field names are
static)
• Dynamic subscripts could be used with
record field access, but it would disallow
type checking and it would be much slower
Descriptor of Record Type
• An offset address, relative to the beginning
of the record, is associated with each field
Tuple Types
• A tuple is a data type that is similar to a
record, except that the elements are not
named
• Used in Python, ML, and F# to allow
functions to return multiple values
– Python
• Closely related to its lists, but immutable
• Create with a tuple literal
myTuple = (3, 5.8, 'apple')
• Referenced with subscripts (begin at 0), e.g., myTuple[0]
• Catenated with + and deleted with del
Tuple Types (continued)
• ML
val myTuple = (3, 5.8, "apple");
– Access as follows:
#1(myTuple) is the first element
– A new tuple type can be defined
type intReal = int * real;
• F#
let tup = (3, 5, 7)
let a, b, c = tup
This assigns a tuple to a tuple pattern (a, b, c)
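A Python sketch of the points above: returning multiple values, unpacking into a pattern, catenation, and immutability. min_max is a hypothetical helper.

```python
def min_max(values):
    """Return two results at once by packing them into a tuple."""
    return min(values), max(values)


lo, hi = min_max([3, 9, 1])          # unpack into a tuple pattern, as in the F# example

my_tuple = (3, 5.8, "apple")
first = my_tuple[0]                  # subscripts begin at 0
longer = my_tuple + (7,)             # catenation creates a new tuple

# Tuples are immutable, so element assignment is rejected.
try:
    my_tuple[0] = 4
    mutated = True
except TypeError:
    mutated = False
```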
List Types
• Lists in Lisp and Scheme are delimited by
parentheses and use no commas
(A B C D) and (A (B C) D)
• Data and code have the same form
As data, (A B C) is literally what it is
As code, (A B C) is the function A applied to the
parameters B and C
• The interpreter needs to know which a list
is, so if it is data, we quote it with an
apostrophe
'(A B C) is data
List Types (continued)
• List Operations in Scheme
– CAR returns the first element of its list parameter
(CAR '(A B C)) returns A
– CDR returns the remainder of its list parameter
after the first element has been removed
(CDR '(A B C)) returns (B C)
– CONS puts its first parameter into its second
parameter, a list, to make a new list
(CONS 'A '(B C)) returns (A B C)
– LIST returns a new list of its parameters
(LIST 'A 'B '(C D)) returns (A B (C D))
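The four Scheme operations can be imitated with Python lists. This is a sketch only: car, cdr, cons, and make_list are hypothetical helpers, and Scheme's lists are linked structures, so the copying done here does not reflect how Scheme implements them.

```python
def car(lst):
    """First element of a non-empty list."""
    return lst[0]


def cdr(lst):
    """The list without its first element."""
    return lst[1:]


def cons(item, lst):
    """A new list with item placed in front of lst."""
    return [item] + lst


def make_list(*items):
    """A new list of the given parameters (Scheme's LIST)."""
    return list(items)


abc = ["A", "B", "C"]
nested = make_list("A", "B", ["C", "D"])
```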
List Types (continued)
• List Operations in ML
– Lists are written in brackets and the elements
are separated by commas
– List elements must be of the same type
– The Scheme CONS function is a binary operator in
ML, ::
3 :: [5, 7, 9] evaluates to [3, 5, 7, 9]
– The Scheme CAR and CDR functions are named hd
and tl, respectively
List Types (continued)
• Python Lists
– The list data type also serves as Python’s arrays
– Unlike Scheme, Common Lisp, ML, and F#,
Python’s lists are mutable
– Elements can be of any type
– Create a list with an assignment
myList = [3, 5.8, "grape"]
List Types (continued)
• Python Lists (continued)
– List elements are referenced with subscripting,
with indices beginning at zero
x = myList[1] Sets x to 5.8
– List elements can be deleted with del
del myList[1]
– List Comprehensions – derived from set notation
[x * x for x in range(6) if x % 3 == 0]
range(6) creates [0, 1, 2, 3, 4, 5]
Constructed list: [0, 9]
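The comprehension above, next to an equivalent explicit loop that shows the order of evaluation: generate, filter, then apply the expression. Variable names are illustrative.

```python
squares = [x * x for x in range(6) if x % 3 == 0]

# The same computation written out step by step.
result = []
for x in range(6):            # generates 0, 1, 2, 3, 4, 5
    if x % 3 == 0:            # keeps 0 and 3
        result.append(x * x)  # appends 0 and 9
```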
List Types (continued)
• Haskell’s List Comprehensions
– The original
[n * n | n <- [1..10]]
• Both C# and Java support lists through
their generic heap-dynamic collection
classes, List and ArrayList, respectively
Union Types
• A union is a type whose variables are
allowed to store different type values at
different times during execution
• Design issue
– Should type checking be required?
Discriminated vs. Free Unions
• C and C++ provide union constructs in
which there is no language support for type
checking; the union in these languages is
called a free union
• Type checking of unions requires that each
union include a type indicator called a
discriminant
– Supported by ML, Haskell, and F#
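A rough Python sketch of a discriminated union. Here the class of each variant plays the role of the discriminant; the names IntValue, RealValue, Number, and describe are illustrative, and the checking happens at run time, unlike in ML, Haskell, or F#.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IntValue:
    value: int


@dataclass(frozen=True)
class RealValue:
    value: float


Number = Union[IntValue, RealValue]    # a value is one variant or the other


def describe(n: Number) -> str:
    """Inspect the discriminant (the variant's class) before using the value."""
    if isinstance(n, IntValue):
        return f"int {n.value}"
    if isinstance(n, RealValue):
        return f"real {n.value}"
    raise TypeError(f"not a Number variant: {n!r}")
```

Because every access goes through the discriminant test, a stored integer can never be misread as a real, which is exactly what a free union cannot guarantee.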
Evaluation of Unions
• Free unions are unsafe
– Do not allow type checking
• Java and C# do not support unions
– Reflective of growing concerns for safety in
programming languages
Type Checking
• Generalize the concept of operands and operators to include
subprograms and assignments
• Type checking is the activity of ensuring that the operands of
an operator are of compatible types
• A compatible type is one that is either legal for the operator,
or is allowed under language rules to be implicitly converted,
by compiler-generated code, to a legal type
– This automatic conversion is called a coercion.
• A type error is the application of an operator to an operand
of an inappropriate type
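Coercion and a type error can both be observed in Python, which checks types dynamically. In this sketch the mixed-mode addition coerces the integer to a float, while adding a string and an integer has no legal coercion and is reported as a type error.

```python
mixed = 1 + 2.0              # the int operand is coerced to float

try:
    "abc" + 1                # no coercion is defined for str + int
    type_error_detected = False
except TypeError:
    type_error_detected = True
```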
Type Checking (continued)
• If all type bindings are static, nearly all type
checking can be static
• If type bindings are dynamic, type checking
must be dynamic
• A programming language is strongly typed
if type errors are always detected
• Advantage of strong typing: allows the
detection of the misuses of variables that
result in type errors
Strong Typing
Language examples:
– C and C++ are not: parameter type checking can
be avoided; unions are not type checked
– Java and C# are, almost (because of explicit type
casting)
- ML and F# are
Strong Typing (continued)
• Coercion rules strongly affect strong
typing--they can weaken it considerably
(C++ versus ML and F#)
• Although Java has just half the assignment
coercions of C++, its strong typing is still
far less effective than that of ML or F#
Name Type Equivalence
• Name type equivalence means that two
variables have equivalent types if they are
in either the same declaration or in
declarations that use the same type name
• Easy to implement but highly restrictive:
– Subranges of integer types are not equivalent
with integer types
– Formal parameters must be the same type as
their corresponding actual parameters
Structure Type Equivalence
• Structure type equivalence means that two
variables have equivalent types if their
types have identical structures
• More flexible, but harder to implement
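The difference between the two notions can be sketched with two record types that have identical structure but different names. The classes and the two comparison functions are illustrative assumptions, not how any particular compiler decides equivalence.

```python
from dataclasses import dataclass, fields


@dataclass
class Celsius:
    degrees: float


@dataclass
class Fahrenheit:
    degrees: float


def name_equivalent(a, b) -> bool:
    """Name equivalence: both values must have the very same named type."""
    return type(a) is type(b)


def structure_equivalent(a, b) -> bool:
    """Structure equivalence: same field names and field types, in order."""
    shape_a = [(fld.name, fld.type) for fld in fields(a)]
    shape_b = [(fld.name, fld.type) for fld in fields(b)]
    return shape_a == shape_b


temp_c = Celsius(21.0)
temp_f = Fahrenheit(69.8)
```

Under name equivalence the two temperatures cannot be mixed up; under structure equivalence they are interchangeable, which is more flexible but hides the distinction the programmer intended.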
Theory and Data Types
• Type theory is a broad area of study in
mathematics, logic, computer science, and
philosophy
• Two branches of type theory in computer
science:
– Practical – data types in commercial languages
– Abstract – typed lambda calculus
• A type system is a set of types and the rules
that govern their use in programs
Summary
• The data types of a language are a large part of
what determines that language’s style and
usefulness
• The primitive data types of most imperative
languages include numeric, character, and Boolean
types
• The user-defined enumeration and subrange types
are convenient and add to the readability and
reliability of programs
• Arrays and records are included in most languages
• Pointers are used for addressing flexibility and to
control dynamic storage management