Principles of Programming Languages
http://www.di.unipi.it/~andrea/Didattica/PLP-14/
Prof. Andrea Corradini
Department of Computer Science, Pisa

Lesson 24
Composite data types (contd)

Summary

Data Types in programming languages
- Type system, Type safety, Type checking
- Equivalence, compatibility and coercion
- Primitive and composite types
- Discrete and scalar types
- Tuples and records
- Arrays
- Unions
- Pointers
- Recursive types

A brief overview of composite types

We review type constructors in several languages corresponding to the following mathematical concepts:
- Cartesian products (records and tuples)
- mappings (arrays)
- disjoint unions (algebraic data types, unions)
- recursive types (lists, trees, etc.)

Mappings

We write m : S → T to state that m is a mapping from set S to set T. In other words, m maps every value in S to some value in T.
If m maps value x to value y, we write y = m(x). The value y is called the image of x under m.
Some of the mappings in {u, v} → {a, b, c}:
  m1 = {u → a, v → c}
  m2 = {u → c, v → c}
  m3 = {u → c, v → b}    (image of u is c, image of v is b)

Arrays (1)

Arrays (found in all imperative and OO PLs) can be understood as mappings.
If the array's elements are of type T (base type) and its index values are of type S, the array's type is S → T.
An array's length is the number of components, #S.
Basic operations on arrays:
- construction of an array from its components
- indexing: using a computed index value to select a component (so we can select the i-th component).
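
To make the "array as mapping" reading concrete, here is a minimal C sketch that stores the mappings m1, m2 and m3 from the Mappings slide as arrays. The enum and function names (S, T, S_COUNT, image, length_of_S) are chosen for illustration and are not part of the lecture.

```c
#include <stddef.h>

/* Index set S = {U, V} and base type T = {A, B, C} (illustrative names). */
typedef enum { U, V, S_COUNT } S;
typedef enum { A, B, C } T;

/* Construction: each mapping S -> T is built from its components. */
static const T m1[S_COUNT] = { [U] = A, [V] = C };
static const T m2[S_COUNT] = { [U] = C, [V] = C };
static const T m3[S_COUNT] = { [U] = C, [V] = B };

/* Indexing: the image of x under the mapping m. */
static T image(const T m[], S x) { return m[x]; }

/* The array's length is #S, the number of index values. */
static size_t length_of_S(void) { return S_COUNT; }
```

With these definitions, image(m3, U) yields C and image(m3, V) yields B, matching the annotation on m3.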

Arrays (2)

An array of type S → T is a finite mapping.
Here S is nearly always a finite range of consecutive values {l, l+1, …, u}, where l is the lower bound and u is the upper bound. This is called the array's index range.
- In C and Java, the index range must be {0, 1, …, n-1}.
- In Pascal and Ada, the index range may be any scalar (sub)type other than real/float.
We can generalise to n-dimensional arrays. If an array has index ranges of types S1, …, Sn, the array's type is S1 × … × Sn → T.
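
Since C fixes the lower bound at 0, an index range {l, …, u} with another lower bound has to be simulated by subtracting l. The following is a rough sketch under that assumption; RangedArray, ranged_length and ranged_at are hypothetical names, not a standard API.

```c
#include <stddef.h>

/* A hypothetical array with index range {lo, ..., hi}, stored 0-based. */
typedef struct {
    int lo, hi;        /* lower and upper bound of the index range */
    double *data;      /* storage for hi - lo + 1 components */
} RangedArray;

/* Number of components, #S. */
static size_t ranged_length(const RangedArray *a) {
    return (size_t)(a->hi - a->lo + 1);
}

/* Returns a pointer to component i, or NULL if i lies outside the index range. */
static double *ranged_at(const RangedArray *a, int i) {
    if (i < a->lo || i > a->hi) return NULL;
    return &a->data[i - a->lo];
}
```

A vector with index range {1, …, 4} would be set up as `RangedArray v = {1, 4, storage};` where storage is an array of four doubles.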

When is the index range known?

- A static array is an array variable whose index range is fixed by the program code.
- A dynamic array is an array variable whose index range is fixed at the time when the array variable is created.
  - In Ada, the definition of an array type must fix the index type, but need not fix the index range. Only when an array variable is created must its index range be fixed.
  - Arrays as formal parameters of subroutines are often dynamic (e.g. conformant arrays in Pascal).
- A flexible (or fully dynamic) array is an array variable whose index range is not fixed at all, but may change whenever a new array value is assigned.

Example: C static arrays

Array variable declarations:
  float v1[] = {2.0, 3.0, 5.0, 7.0};    // index range is {0, …, 3}
  float v2[10];                          // index range is {0, …, 9}
Function:
  void print_vector (float v[], int n) {
    // Print the array v[0], …, v[n-1] in the form [… … …].
    int i;
    printf("[%f", v[0]);
    for (i = 1; i < n; i++)
      printf(" %f", v[i]);
    printf("]");
  }
  …
  print_vector(v1, 4);
  print_vector(v2, 10);
A C array doesn't know its own length!
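
Because a C array does not carry its length, the caller has to supply it. One common idiom, sketched here, computes the length with sizeof while the array itself (not a pointer to it) is still in scope. ARRAY_LEN and sum_vector are illustrative names.

```c
#include <stddef.h>

/* Number of elements of a true array (not a pointer) that is in scope. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Sum of v[0], ..., v[n-1]. Inside the function v is only a pointer,
   so the length n must be passed explicitly. */
static float sum_vector(const float v[], size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

A call then reads `sum_vector(v1, ARRAY_LEN(v1))`. Applying ARRAY_LEN to the parameter v inside sum_vector would not give the array's length, since v is a pointer there.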

Example: Ada dynamic arrays

Array type and variable declarations:
  type Vector is
    array (Integer range <>) of Float;
  v1: Vector(1 .. 4) := (1.0, 0.5, 5.0, 3.5);
  v2: Vector(0 .. m) := (0 .. m => 0.0);
Procedure:
  procedure print_vector (v: in Vector) is
  -- Print the array v in the form [… … …].
  begin
    put('['); put(v(v'first));
    for i in v'first + 1 .. v'last loop
      put(' '); put(v(i));
    end loop;
    put(']');
  end;
  …
  print_vector(v1); print_vector(v2);

Example: Java flexible arrays

Array variable declarations:
  float[] v1 = {1.0f, 0.5f, 5.0f, 3.5f};    // index range is {0, …, 3}
  float[] v2 = {0.0f, 0.0f, 0.0f};          // index range is {0, …, 2}
  …
  v1 = v2;                                  // v1's index range is now {0, …, 2}
Method:
  static void printVector (float[] v) {
    // Print the array v in the form [… … …].
    System.out.print("[" + v[0]);
    for (int i = 1; i < v.length; i++)
      System.out.print(" " + v[i]);
    System.out.print("]");
  }
  …
  printVector(v1); printVector(v2);
Enhanced for:
  for (float f : v)
    System.out.print(" " + f);

Array allocation

- static array, global lifetime: If a static array can exist throughout the execution of the program, then the compiler can allocate space for it in static global memory.
- static array, local lifetime: If a static array should not exist throughout the execution of the program, then space can be allocated in the subroutine's stack frame at run time.
- dynamic array, local lifetime: If the index range is known only at run time, the array can still be allocated in the stack, but in a variable-size area.
- fully dynamic: If the index range can be modified at run time, the array has to be allocated in the heap.

Allocation of dynamic arrays on stack

  -- Ada:
  procedure foo (size : integer) is
    M : array (1..size, 1..size) of real;
    ...
  begin
    ...
  end foo;

  // C99:
  void foo(int size) {
    double M[size][size];
    ...
  }

[Figure: stack frame of foo, delimited by sp and fp. Labels: variable-size part of the frame; fixed-size part of the frame, containing temporaries, local variables (with the pointer to M and the dope vector), bookkeeping, return address, arguments and returns.]

Arrays: memory layout

- Contiguous elements
  - column major: mainly in Fortran
  - row major: used by most other languages
- Row pointers
  - an option in C, the rule in Java
  - allows rows to be put anywhere: nice for big arrays on machines with segmentation problems
  - avoids multiplication
  - nice for matrices whose rows are of different lengths, e.g. an array of strings
  - requires extra space for the pointers

Arrays: memory layout in C

- Address computation varies a lot
- With contiguous allocation, part of the computation can be done statically
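
The address computations for the layouts above can be sketched as follows. This is an illustration only: the function names are hypothetical, and offsets are counted in elements rather than bytes.

```c
#include <stddef.h>

/* Offset of element (i, j) in a contiguous matrix with n_cols columns,
   stored row by row (row major). */
static size_t row_major_offset(size_t i, size_t j, size_t n_cols) {
    return i * n_cols + j;
}

/* Offset of element (i, j) in a contiguous matrix with n_rows rows,
   stored column by column (column major). */
static size_t col_major_offset(size_t i, size_t j, size_t n_rows) {
    return j * n_rows + i;
}

/* Row-pointer layout: rows[i] points to row i, and rows may differ in length.
   Indexing follows a pointer instead of multiplying by the row length. */
static char char_at(const char *const rows[], size_t i, size_t j) {
    return rows[i][j];
}
```

When n_cols is a compile-time constant, the multiplication in row_major_offset can be folded or strength-reduced by the compiler, which is one sense in which part of the computation is done statically.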

Strings

A string is a sequence of 0 or more characters.
Usually ad-hoc syntax is supported.
- Some PLs (ML, Python) treat strings as primitive.
- Haskell treats strings as lists of characters. Strings are thus equipped with general list operations (length, head selection, tail selection, concatenation, …).
- Ada treats strings as arrays of characters. Strings are thus equipped with general array operations (length, indexing, slicing, concatenation, …).
- Java treats strings as objects, of class String.

Disjoint Union (1)

In a disjoint union, a value is chosen from one of several different types.
Let S + T stand for a set of disjoint-union values, each of which consists of a tag together with a variant chosen from either type S or type T. The tag indicates the type of the variant:
  S + T = { left x | x ∈ S } ∪ { right y | y ∈ T }
- left x is a value with tag left and variant x chosen from S
- right y is a value with tag right and variant y chosen from T.
We write left S + right T (instead of S + T) when we want to make the tags explicit.

Disjoint Union (2)

Basic operations on disjoint-union values in S + T:
- construction of a disjoint-union value from its tag and variant
- tag test, to determine whether the variant was chosen from S or T
- projection, to recover either the variant in S or the variant in T.
Algebraic data types (Haskell), discriminated records (Ada), unions (C) and objects (Java) can all be understood in terms of disjoint unions.
We can generalise to multiple variants: S1 + S2 + … + Sn.
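
The three basic operations can be sketched in C by pairing a union with an explicit tag. C itself does not enforce the discipline, so the checks below are a programmer convention. The type and function names (Number, make_exact, get_exact, and so on) are illustrative assumptions.

```c
#include <stdbool.h>

/* Number = Exact int + Inexact double, written as a tag plus a C union. */
typedef enum { EXACT, INEXACT } NumberTag;

typedef struct {
    NumberTag tag;          /* which variant is currently stored */
    union {
        int    exact;       /* meaningful only when tag == EXACT */
        double inexact;     /* meaningful only when tag == INEXACT */
    } as;
} Number;

/* Construction: build a value from its tag and variant. */
static Number make_exact(int i)      { Number n; n.tag = EXACT;   n.as.exact = i;   return n; }
static Number make_inexact(double r) { Number n; n.tag = INEXACT; n.as.inexact = r; return n; }

/* Tag test. */
static bool is_exact(Number n) { return n.tag == EXACT; }

/* Checked projection: succeeds only if the tag matches. */
static bool get_exact(Number n, int *out) {
    if (n.tag != EXACT) return false;
    *out = n.as.exact;
    return true;
}

/* Tag test followed by projection; halves are rounded away from zero. */
static long rounded(Number n) {
    switch (n.tag) {
    case EXACT:   return n.as.exact;
    case INEXACT: return (long)(n.as.inexact + (n.as.inexact < 0 ? -0.5 : 0.5));
    }
    return 0;  /* not reached for well-formed values */
}
```

Nothing stops client code from reading n.as.exact directly while the tag says INEXACT; get_exact only helps callers that choose to go through it.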

Example: Haskell/ML algebraic data types

Type declaration:
  data Number = Exact Integer | Inexact Float
Each Number value consists of a tag, together with either an Integer variant (if the tag is Exact) or a Float variant (if the tag is Inexact).
Application code:
  pi = Inexact 3.1416
  rounded :: Number -> Integer
  rounded num =
    case num of
      Exact i   -> i          -- projection (by pattern matching)
      Inexact r -> round r

Variant records (unions)

- Origin: Fortran I equivalence statement: variables should share the same memory location
- C's union types
- Motivations:
  - Saving space
  - Need of different access to the same memory locations for system programming
  - Alternative configurations of a data type

Fortran I -- equivalence statement
  integer i
  real r
  logical b
  equivalence (i, r, b)

C -- union
  union {
    int i;
    double d;
    _Bool b;
  };

Variant records (unions) (2)

- In Ada and Pascal, unions are discriminated by a tag, called the discriminant
- Integrated with records in Pascal/Ada, not in C

Ada -- discriminated variant
  type Form is
    (pointy, circular, rectangular);
  type Figure (f: Form := pointy) is record    -- f is the tag
    x, y: Float;
    case f is
      when pointy      => null;
      when circular    => r: Float;
      when rectangular => w, h: Float;
    end case;
  end record;

Using discriminated records in Ada

Application code:
  box: Figure :=
    (rectangular, 1.5, 2.0, 3.0, 4.0);    -- discriminated-record construction
  function area (fig: Figure) return Float is
  begin
    case fig.f is                          -- tag test
      when pointy =>
        return 0.0;
      when circular =>
        return 3.1416 * fig.r**2;          -- projection
      when rectangular =>
        return fig.w * fig.h;              -- projection
    end case;
  end;

(Lack of) Safety in variant records

- Only Ada has strict rules for assignment: tag and variant have to be changed together
- For nondiscriminated unions (Fortran, C) there is no runtime check: correct use is the responsibility of the programmer
- In Pascal the tag field can be modified independently of the variant. Even worse: the tag field is optional.
- Unions are not included in Modula-3, Java, and recent OO languages: they are replaced by classes + inheritance

Example: Java objects (1)

Type declarations:
  class Point {
    private float x, y;
    … // methods
  }
  class Circle extends Point {       // inherits x and y from Point
    private float r;
    … // methods
  }
  class Rectangle extends Point {    // inherits x and y from Point
    private float w, h;
    … // methods
  }

Example: Java objects (2)

Methods:
  class Point {
    …
    public float area()
    { return 0.0f; }
  }
  class Circle extends Point {
    …
    public float area()              // overrides Point's area() method
    { return 3.1416f * r * r; }
  }
  class Rectangle extends Point {
    …
    public float area()              // overrides Point's area() method
    { return w * h; }
  }

Example: Java objects (3)

Application code:
  Rectangle box =
    new Rectangle(1.5, 2.0, 3.0, 4.0);
  float a1 = box.area();
  Point it = …;             // it can refer to a Point, Circle, or Rectangle object
  float a2 = it.area();     // calls the appropriate area() method

Value model vs. reference model

What happens when a composite value is assigned to a variable of the same type?
- Value model (aka copy semantics): all components of the composite value are copied into the corresponding components of the composite variable.
- Reference model: the composite variable is made to contain a reference to the composite value.
- Note: this makes no difference for basic or immutable types.
C and Ada adopt copy semantics.
Java adopts the value model for primitive values, the reference model for objects.
Functional languages usually adopt the reference model.
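
The difference between the two models can be observed directly in C, which copies structs on assignment and offers pointers for explicit sharing. The sketch below uses a Date struct with integer fields; the function names are hypothetical.

```c
typedef struct { int y, m, d; } Date;

/* Value model: struct assignment copies every component. */
static int year_after_copy_update(void) {
    Date dateA = {2004, 1, 1};
    Date dateB = dateA;      /* copies y, m and d */
    dateB.y = 2005;          /* changes dateB only */
    return dateA.y;          /* dateA is unaffected */
}

/* Reference model, simulated with pointers: two names share one variable. */
static int year_after_alias_update(void) {
    Date date = {2004, 1, 1};
    Date *dateR = &date;
    Date *dateS = dateR;     /* copies the reference, not the Date */
    dateR->y = 2005;         /* visible through dateS as well */
    return dateS->y;
}
```

The first function returns 2004 and the second 2005.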

Example: Ada value model (1)

Declarations:
  type Date is
    record
      y: Year_Number;
      m: Month;
      d: Day_Number;
    end record;
  dateA: Date := (2004, jan, 1);
  dateB: Date;
Effect of copy semantics:
  dateB := dateA;
  dateB.y := 2005;
[Figure: dateA holds (2004, jan, 1). dateB, initially undefined (?, ?, ?), holds (2004, jan, 1) after the assignment and (2005, jan, 1) after the update; dateA is unchanged.]

Example: Java reference model (1)

Declarations:
  class Date {
    int y, m, d;
    public Date (int y, int m, int d)
    { … }
  }
  Date dateR = new Date(2004, 1, 1);
  Date dateS = new Date(2004, 12, 25);
Effect of reference semantics:
  dateS = dateR;
  dateR.y = 2005;
[Figure: after the assignment, dateR and dateS refer to the same object, which holds (2005, 1, 1) after the update; the object (2004, 12, 25) is no longer referenced.]

Ada reference model with pointers (2)

We can achieve the effect of the reference model in Ada by using explicit pointers:
  type Date_Pointer is access Date;
  dateP: Date_Pointer := new Date;
  dateQ: Date_Pointer := new Date;
  …
  dateP.all := dateA;
  dateQ := dateP;

Java value model with cloning (2)

We can achieve the effect of copy semantics in Java by cloning:
  Date dateR = new Date(2004, 4, 1);
  Date dateT = dateR.clone();    // assumes Date implements Cloneable and overrides clone() to return Date

Pointers

- Thus in a language adopting the value model, the reference model can be simulated with the use of pointers.
- A pointer (value) is a reference to a particular variable.
- A pointer's referent is the variable to which it refers.
- A null pointer is a special pointer value that has no referent.
- A pointer is essentially the address of its referent in the store, but it also has a type. The type of a pointer allows us to infer the type of its referent.
- Pointers mainly serve two purposes:
  - efficient (sometimes intuitive) access to elaborated objects (as in C)
  - dynamic creation of linked data structures, in conjunction with a heap storage manager

Dangling pointers

- A dangling pointer is a pointer to a variable that has been destroyed.
- Dangling pointers arise from the following situations:
  - where a pointer to a heap variable still exists after the heap variable is destroyed by a deallocator
  - where a pointer to a local variable still exists at exit from the block in which the local variable was declared.
- A deallocator immediately destroys a heap variable. All existing pointers to that heap variable become dangling pointers. Thus deallocators are inherently unsafe.

Dangling pointers in languages

- C is highly unsafe:
  - After a heap variable is destroyed, pointers to it might still exist.
  - At exit from a block, pointers to its local variables might still exist (e.g., stored in global variables).
- Ada and Pascal are safer:
  - After a heap variable is destroyed, pointers to it might still exist.
  - But pointers to local variables may not be stored in global variables.
- Java is very safe:
  - It has no deallocator.
  - Pointers to local variables cannot be obtained.
- Functional languages are even safer: they don't have pointers

Example: C dangling pointers

Consider this C code:
  struct Date {int y, m, d;};
  struct Date *dateP, *dateQ;
  dateP = (struct Date*)malloc(sizeof (struct Date));    // allocates a new heap variable
  dateP->y = 2004; dateP->m = 1; dateP->d = 1;
  dateQ = dateP;              // makes dateQ point to the same heap variable as dateP
  free(dateQ);                // deallocates that heap variable (dateP and dateQ are now dangling pointers)
  printf("%d", dateP->y);     // fails (undefined behaviour)
  dateP->y = 2005;            // fails (undefined behaviour)

Techniques to avoid dangling pointers

Tombstones
- A pointer variable refers to a tombstone that in turn refers to an object
- If the object is destroyed, the tombstone is marked as expired
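
A tombstone scheme can be sketched in C as one extra level of indirection. This is a simplified illustration, not how any particular runtime implements it: Tombstone, ts_new, ts_destroy and ts_deref are hypothetical names, and an expired tombstone is represented by a NULL object pointer.

```c
#include <stdlib.h>

/* A tombstone sits between pointer variables and the object. */
typedef struct {
    void *object;     /* NULL once the object has been destroyed */
} Tombstone;

/* Allocates an object of the given size together with its tombstone.
   Returns NULL if either allocation fails. */
static Tombstone *ts_new(size_t size) {
    Tombstone *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->object = malloc(size);
    if (t->object == NULL) { free(t); return NULL; }
    return t;
}

/* Destroys the object and marks the tombstone as expired. The tombstone
   itself stays allocated so that old pointers can still consult it. */
static void ts_destroy(Tombstone *t) {
    free(t->object);
    t->object = NULL;
}

/* Every access goes through the tombstone; NULL signals a dangling reference. */
static void *ts_deref(const Tombstone *t) {
    return t->object;
}
```

The costs are visible in the sketch: every access pays for an extra indirection, and deciding when the tombstone itself can be reclaimed is a separate problem that is left open here.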

Locks and Keys

- Heap objects are associated with an integer (lock) initialized when created.
- A valid pointer contains a key that matches the lock on the object in the heap.
- Every access checks that they match
- A dangling reference is unlikely to match.
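
The following sketch illustrates locks and keys over a small fixed pool, so that a lock can still be inspected after its object has been released. All names (Slot, KeyedPtr, kp_alloc, kp_free, kp_deref) and the pool itself are assumptions made for this example; a real allocator would store the lock next to each heap block.

```c
#include <stddef.h>

#define POOL_SIZE 8

typedef struct { unsigned lock; int in_use; int value; } Slot;
typedef struct { Slot *slot; unsigned key; } KeyedPtr;

static Slot pool[POOL_SIZE];
static unsigned next_lock = 1;     /* lock 0 means "no live object" */

/* Allocates a slot, gives it a fresh lock, and returns a pointer carrying the matching key.
   The returned slot field is NULL if the pool is exhausted. */
static KeyedPtr kp_alloc(int value) {
    KeyedPtr p = { NULL, 0 };
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].lock = next_lock++;
            pool[i].value = value;
            p.slot = &pool[i];
            p.key = pool[i].lock;
            return p;
        }
    }
    return p;
}

/* Releases the object; resetting the lock invalidates every outstanding key. */
static void kp_free(KeyedPtr p) {
    if (p.slot != NULL && p.slot->lock == p.key) {
        p.slot->lock = 0;
        p.slot->in_use = 0;
    }
}

/* Checked access: returns NULL unless the key matches the lock. */
static int *kp_deref(KeyedPtr p) {
    if (p.slot == NULL || p.slot->lock != p.key) return NULL;
    return &p.slot->value;
}
```

When a slot is reused it receives a new lock, so an old pointer to the same slot still fails the check. The match is only "unlikely" rather than impossible because the counter can eventually wrap around.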

Pointers and arrays in C

- In C, an array name used in an expression is, in most contexts, converted to a pointer to its first element. As parameter declarations, the following are equivalent:
    int *a   ==  int a[]
    int **a  ==  int *a[]
- BUT equivalences don't always hold
  - Specifically, a declaration allocates an array if it specifies a size for the first dimension, otherwise it allocates a pointer
      int **a, int *a[]    pointer to pointer to int
      int *a[n]            n-element array of row pointers
      int a[n][m]          2-d array
- Pointer arithmetic: operations on pointers are scaled by the base type size. All these expressions denote the third element of a:
    a[2]    (a+2)[0]    (a+1)[1]    2[a]    0[a+2]
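
The equivalences follow from the definition of a[i] as *(a + i). A short sketch, with hypothetical function names, that checks them by comparing addresses:

```c
#include <stddef.h>

/* Returns 1 if all five spellings designate the same element.
   The array a must have at least three elements. */
static int third_element_spellings_agree(const int *a) {
    const int *p = &a[2];
    return p == &(a + 2)[0]
        && p == &(a + 1)[1]
        && p == &2[a]
        && p == &0[a + 2];
}

/* Distance in bytes between a + 2 and a: the offset 2 is scaled by sizeof(int). */
static size_t byte_distance_of_two_elements(const int *a) {
    return (size_t)((const char *)(a + 2) - (const char *)a);
}
```

For an array declared as `int a[4]`, sizeof a is 4 * sizeof(int) at the point of declaration, whereas inside these functions a is a pointer; this is one place where the array/pointer equivalence does not hold.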

C pointers and recursive types

- C declaration rule: read right as far as you can (subject to parentheses), then left, then out a level and repeat
    int *a[n]      n-element array of pointers to integer
    int (*a)[n]    pointer to n-element array of integers
- The compiler has to be able to tell the size of the things to which you point
  - So the following aren't usable:
      int a[][]      bad (the element type int[] has unknown size)
      int (*a)[]     bad (pointer arithmetic and a[i] need the size of the array pointed to)
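
The two declarations differ in what they allocate and in how pointer arithmetic is scaled. The sketch below makes this observable with sizeof; N and the function names are chosen for illustration.

```c
#include <stddef.h>

enum { N = 4 };

/* int *a[N]: an N-element array of pointers to int. */
static size_t size_of_array_of_pointers(void) {
    int *a[N];
    return sizeof a;                 /* room for N pointers */
}

/* int (*a)[N]: a single pointer to an N-element array of int. */
static size_t size_of_pointer_to_array(void) {
    int (*a)[N];
    return sizeof a;                 /* room for one pointer */
}

/* Stepping a pointer-to-array by one skips a whole row of N ints,
   which is why the compiler must know N. */
static size_t row_stride_in_bytes(void) {
    int m[2][N] = {{0}};
    int (*row)[N] = m;               /* points to m[0] */
    return (size_t)((char *)(row + 1) - (char *)row);
}
```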

Recursive types: Lists

- A recursive type is one defined in terms of itself, like lists and trees
- A list is a sequence of 0 or more component values.
- The length of a list is its number of components. The empty list has no components.
- A non-empty list consists of a head (its first component) and a tail (all but its first component).
- A list is homogeneous if all its components are of the same type. Otherwise it is heterogeneous.

List operations

Typical list operations:
- length
- emptiness test
- head selection
- tail selection
- concatenation.
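
These operations can be sketched for integer lists in C, using a pointer to a node as the list type and NULL as the empty list. The names are illustrative, and deallocation is left out to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* A list is either empty (NULL) or a node holding a head and a tail. */
typedef struct IntNode {
    int head;
    struct IntNode *tail;
} IntNode;
typedef IntNode *IntList;

/* Builds a new list with h in front of t. Returns NULL if allocation fails,
   which this sketch does not distinguish from the empty list. */
static IntList cons(int h, IntList t) {
    IntNode *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->head = h;
    n->tail = t;
    return n;
}

static bool is_empty(IntList l) { return l == NULL; }   /* emptiness test */
static int head(IntList l)      { return l->head; }     /* l must be non-empty */
static IntList tail(IntList l)  { return l->tail; }     /* l must be non-empty */

static size_t length(IntList l) {
    size_t n = 0;
    for (; l != NULL; l = l->tail) n++;
    return n;
}

/* Concatenation: copies the nodes of a and shares b as the final tail. */
static IntList concat(IntList a, IntList b) {
    if (is_empty(a)) return b;
    return cons(head(a), concat(tail(a), b));
}
```

Because concat shares b rather than copying it, the result and b must not be freed independently; languages with garbage collection avoid this concern.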

Example: Haskell lists

Type declaration for integer-lists:
  data IntList = Nil | Cons Int IntList    -- recursive
Some IntList constructions:
  Nil
  Cons 2 (Cons 3 (Cons 5 (Cons 7 Nil)))
Actually, Haskell has built-in list types:
  [Int]    [String]    [[Int]]
Some list constructions:
  []    [2,3,5,7]    ["cat","dog"]    [[1],[2,3]]
Built-in operator for cons:
  [2,3,5,7] = 2:[3,5,7] = 2:3:5:[7] = 2:3:5:7:[]

Example: Ada lists

Type declarations for integer-lists:
  type IntNode;
  type IntList is access IntNode;
  type IntNode is record            -- IntList and IntNode are mutually recursive
    head: Integer;
    tail: IntList;
  end record;
An IntList construction:
  new IntNode'(2,
    new IntNode'(3,
      new IntNode'(5,
        new IntNode'(7, null))))

Example: Java lists

Class declarations for generic lists:
  class List<E> {                   // recursive
    public E head;
    public List<E> tail;
    public List (E h, List<E> t) {
      head = h; tail = t;
    }
  }
A list construction:
  List<Integer> list =
    new List<Integer>(2,
      new List<Integer>(3,
        new List<Integer>(5, null)));