Device trees everywhere

David Gibson <[email protected]>
Benjamin Herrenschmidt <[email protected]>
OzLabs, IBM Linux Technology Center
February 13, 2006
Abstract

We present a method for booting a PowerPC® Linux® kernel on an embedded machine. To do this, we supply the kernel with a compact flattened-tree representation of the system's hardware based on the device tree supplied by Open Firmware on IBM® servers and Apple® Power Macintosh® machines.

The blob representing the device tree can be created using dtc, the Device Tree Compiler, which turns a simple text representation of the tree into the compact representation used by the kernel. The compiler can produce either a binary blob or an assembler file ready to be built into a firmware or bootwrapper image.

This flattened-tree approach is now the only supported method of booting a ppc64 kernel without Open Firmware, and we plan to make it the only supported method for all powerpc kernels in the future.

1 Introduction

1.1 OF and the device tree

Historically, everyday PowerPC machines have booted with the help of Open Firmware (OF), a firmware environment defined by IEEE1275 [4]. Among other boot-time services, OF maintains a device tree that describes all of the system's hardware devices and how they're connected. During boot, before taking control of memory management, the Linux kernel uses OF calls to scan the device tree and transfer it to an internal representation that is used at run time to look up various device information.

The device tree consists of nodes representing devices or buses (footnote 1). Each node contains properties, name-value pairs that give information about the device. The values are arbitrary byte strings, and for some properties, they contain tables or other structured information.

Footnote 1: Well, mostly. There are a few special exceptions.

1.2 The bad old days

Embedded systems, by contrast, usually have a minimal firmware that might supply a few vital system parameters (size of RAM and the like), but nothing as detailed or complete as the OF device tree. This has meant that the various 32-bit PowerPC embedded ports have required a variety of hacks spread across the kernel to deal with the lack of device tree. These vary from specialised boot wrappers to parse parameters (which are at least reasonably localised) to CONFIG-dependent hacks in drivers to override normal probe logic with hard-coded addresses for a particular board. As well as being ugly of itself, such CONFIG-dependent hacks make it hard to build a single kernel image that supports multiple embedded machines.

Until relatively recently, the only 64-bit PowerPC machines without OF were legacy (pre-POWER5®) iSeries® machines. iSeries machines often only have virtual IO devices, which makes it quite simple to work around the lack of a device tree. Even so, the lack means the iSeries boot sequence must be quite different from the pSeries or Macintosh, which is not ideal.

The device tree also presents a problem for implementing kexec(). When the kernel boots, it takes over full control of the system from OF, even re-using OF's memory. So, when kexec() comes to boot another kernel, OF is no longer around for the second kernel to query.

2 The Flattened Tree

In May 2005 Ben Herrenschmidt implemented a new approach to handling the device tree that addresses all these problems. When booting on OF systems, the first thing the kernel runs is a small piece of code in prom_init.c, which executes in the context of OF. This code walks the device tree using OF calls, and transcribes it into a compact, flattened format. The resulting device tree blob is then passed to the kernel proper, which eventually unflattens the tree into its runtime form. This blob is the only data communicated between the prom_init.c bootstrap and the rest of the kernel.

When OF isn't available, either because the machine doesn't have it at all or because kexec() has been used, the kernel instead starts directly from the entry point taking a flattened device tree. The device tree blob must be passed in from outside, rather than generated by part of the kernel from OF. For kexec(), the userland kexec tools build the blob from the runtime device tree before invoking the new kernel. For embedded systems the blob can come either from the embedded bootloader, or from a specialised version of the zImage wrapper for the system in question.

2.1 Properties of the flattened tree

The flattened tree format should be easy to handle, both for the kernel that parses it and the bootloader that generates it. In particular, the following properties are desirable:

relocatable: the bootloader or kernel should be able to move the blob around as a whole, without needing to parse or adjust its internals. In practice that means we must not use pointers within the blob.

insert and delete: sometimes the bootloader might want to make tweaks to the flattened tree, such as deleting or inserting a node (or whole subtree). It should be possible to do this without having to effectively regenerate the whole flattened tree. In practice this means limiting the use of internal offsets in the blob that need recalculation if a section is inserted or removed with memmove().

compact: embedded systems are frequently short of resources, particularly RAM and flash memory space. Thus, the tree representation should be kept as small as conveniently possible.

2.2 Format of the device tree blob

    Header
    Offset  Contents
    0x00    0xd00dfeed       magic number
    0x04    totalsize
    0x08    off_struct
    0x0C    off_strs
    0x10    off_rsvmap
    0x14    version
    0x18    last_comp_ver
    0x1C    boot_cpu_id      (>v2 only)
    0x20    size_strs        (>v3 only)

    Memory reserve table (starts at off_rsvmap)
            address0
            len0
            ...
            0x0000000000000000   end marker
            0x0000000000000000

    Structure block (starts at off_struct)
            OF_DT_BEGIN_NODE
            '/' 0 0 0            root node
            OF_DT_PROP
            0x00000005           model
            0x00000008
            'M' 'y' 'B' 'o'
            'a' 'r' 'd' 0
            ...
            OF_DT_END_NODE
            OF_DT_END

    Strings block (starts at off_strs, size size_strs, ends at totalsize)
            'n' 'a' 'm' 'e'
            0 'm' 'o' 'd'        "model" begins at off_strs + 0x05
            'e' 'l' 0 ...

Figure 1: Device tree blob layout
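To make the header layout of Figure 1 concrete, the following is a minimal sketch that packs and parses the nine header words. It assumes every field is a 32-bit big-endian integer and that the blob uses a format version in which all nine fields are present. The helper names (`parse_header`, `section`, `build_demo_blob`) and the placeholder structure block are illustrative only and do not come from the kernel or from dtc.

```python
import struct

# Field order follows the header in Figure 1. Assumption: each field is a
# 32-bit big-endian integer, and all nine fields are present.
HEADER_FIELDS = (
    "magic", "totalsize", "off_struct", "off_strs", "off_rsvmap",
    "version", "last_comp_ver", "boot_cpu_id", "size_strs",
)
HEADER_FORMAT = ">9I"      # nine words covering offsets 0x00 to 0x20
OF_DT_MAGIC = 0xD00DFEED


def parse_header(blob):
    """Return the header fields of a flattened tree blob as a dict."""
    header = dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FORMAT, blob, 0)))
    if header["magic"] != OF_DT_MAGIC:
        raise ValueError("not a device tree blob")
    if header["totalsize"] > len(blob):
        raise ValueError("blob is truncated")
    return header


def section(blob, header, offset_field):
    """Return the bytes from the given section offset to the end of the blob."""
    return blob[header[offset_field]:header["totalsize"]]


def build_demo_blob():
    """Assemble a tiny blob; the structure block is a placeholder, not real tags."""
    rsvmap = struct.pack(">QQ", 0, 0)        # end marker only, no reserved regions
    structure = b"\0" * 8                    # hypothetical placeholder contents
    strings = b"name\0model\0"               # strings block as drawn in Figure 1
    off_rsvmap = 40                          # 36-byte header rounded up to 8 bytes
    off_struct = off_rsvmap + len(rsvmap)
    off_strs = off_struct + len(structure)
    totalsize = off_strs + len(strings)
    header = struct.pack(HEADER_FORMAT, OF_DT_MAGIC, totalsize, off_struct,
                         off_strs, off_rsvmap, 0x10, 0x10, 0, len(strings))
    padding = b"\0" * (off_rsvmap - len(header))
    return header + padding + rsvmap + structure + strings


blob = build_demo_blob()

# Copy the blob to two different places in a pretend memory image. Nothing
# inside the blob is adjusted, because every location is a relative offset.
memory = bytearray(4096)
memory[0x100:0x100 + len(blob)] = blob
memory[0x800:0x800 + len(blob)] = blob
first = parse_header(memory[0x100:])
second = parse_header(memory[0x800:])
```

Because the header stores only offsets from the start of the blob, both copies parse to identical field values, which is the relocatable property from Section 2.1 in miniature.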
The format for the blob we devised was first described on the linuxppc64-dev mailing list in [2]. The format has since evolved through various revisions, and the current version is included as part of the dtc (see §3) git tree [1].

Figure 1 shows the layout of the blob of data containing the device tree. It has three sections of variable size: the memory reserve table, the structure block and the strings block. A small header gives the blob's size and version and the locations of the three sections, plus a handful of vital parameters used during early boot.

The memory reserve map section gives a list of regions of memory that the kernel must not use (footnote 2). The list is represented as a simple array of (address, size) pairs of 64-bit values, terminated by a zero size entry. The strings block is similarly simple, consisting of a number of null-terminated strings appended together, which are referenced from the structure block as described below.

Footnote 2: Usually such ranges contain some data structure initialised by the firmware that must be preserved by the kernel.

The structure block contains the device tree proper. Each node is introduced with a 32-bit OF_DT_BEGIN_NODE tag, followed by the node's name as a null-terminated string, padded to a 32-bit boundary. Then follow all of the properties of the node, each introduced with an OF_DT_PROP tag, then all of the node's subnodes, each introduced with their own OF_DT_BEGIN_NODE tag. The node ends with an OF_DT_END_NODE tag, and after the OF_DT_END_NODE for the root node is an OF_DT_END tag, indicating the end of the whole tree (footnote 3). The structure block starts with the OF_DT_BEGIN_NODE introducing the description of the root node (named /).

Footnote 3: This is redundant, but included for ease of parsing.

Each property, after the OF_DT_PROP, has a 32-bit value giving an offset from the beginning of the strings block at which the property name is stored. Because it's common for many nodes to have properties with the same name, this approach can substantially reduce the total size of the blob. The name offset is followed by the length of the property value (as a 32-bit value) and then the data itself padded to a 32-bit boundary.

2.3 Contents of the tree

Having seen how to represent the device tree structure as a flattened blob, what actually goes into the tree? The short answer is the same as an OF tree. On OF systems, the flattened tree is transcribed directly from the OF device tree, so for simplicity we also use OF conventions for the tree on other systems.

In many cases a flat tree can be simpler than a typical OF provided device tree. The flattened tree need only provide those nodes and properties that the kernel actually requires; the flattened tree generally need not include devices that the kernel can probe itself. For example, an OF device tree would normally include nodes for each PCI device on the system. A flattened tree need only include nodes for the PCI host bridges; the kernel will scan the buses thus described to find the subsidiary devices. The device tree can include nodes for devices where the kernel needs extra information, though: for example, for ISA devices on a subsidiary PCI/ISA bridge, or for devices with unusual interrupt routing.

Where they exist, we follow the IEEE1275 bindings that specify how to describe various buses in the device tree (for example, [5] describes how to represent PCI devices). The standard has not been updated for a long time, however, and lacks bindings for many modern buses and devices. In particular, embedded specific devices such as the various System-on-Chip buses are not covered. We intend to create new bindings for such buses, in keeping with the general conventions of IEEE1275 (a simple such binding for a System-on-Chip bus was included in [3], a revision of [2]).

One complication arises for representing phandles in the flattened tree. In OF, each node in the tree has an associated phandle, a 32-bit integer that uniquely identifies the node (footnote 4). This handle is used by the various OF calls to query and traverse the tree. Sometimes phandles are also used within the tree to refer to other nodes in the tree. For example, devices that produce interrupts generally have an interrupt-parent property giving the phandle of the interrupt controller that handles interrupts from this device. Parsing these and other interrupt related properties allows the kernel to build a complete representation of the system's interrupt tree, which can be quite different from the tree of bus connections.

Footnote 4: In practice usually implemented as a pointer or offset within OF memory.
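The difference between the interrupt tree and the bus tree can be sketched with a toy in-memory model. Everything below is an assumption made for illustration: the dict layout, the helper names, the phandle values and the /pci@40000000000000/isa@4 node are hypothetical and are not the kernel's data structures.

```python
# Hypothetical model: each node has a full path, a phandle, and a dict of
# properties. An interrupt-parent property holds another node's phandle.
nodes = [
    {"path": "/mpic@0x3fffdd08400", "phandle": 1, "props": {}},
    {"path": "/pci@40000000000000", "phandle": 2,
     "props": {"interrupt-parent": 1}},
    {"path": "/pci@40000000000000/isa@4", "phandle": 3,
     "props": {"interrupt-parent": 1}},
]


def index_by_phandle(all_nodes):
    """Build a phandle -> node lookup table, rejecting duplicate phandles."""
    index = {}
    for node in all_nodes:
        phandle = node["phandle"]
        if phandle in index:
            raise ValueError("duplicate phandle %d" % phandle)
        index[phandle] = node
    return index


def interrupt_parent(node, index):
    """Follow interrupt-parent to the controller node, or None if absent."""
    phandle = node["props"].get("interrupt-parent")
    if phandle is None:
        return None
    return index[phandle]          # KeyError here means a dangling phandle


def bus_parent_path(node):
    """Parent in the bus tree is simply the enclosing path."""
    return node["path"].rsplit("/", 1)[0] or "/"


index = index_by_phandle(nodes)
isa = nodes[2]
```

In this sketch the ISA bridge sits under the PCI host bridge in the bus tree, yet its interrupts are handled by the controller at the top level, so the two trees give different parents for the same node.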
In the flattened tree, a node's phandle is represented by a special linux,phandle property. When the kernel generates a flattened tree from OF, it adds a linux,phandle property to each node, containing the phandle retrieved from OF. When the tree is generated without OF, however, only nodes that are actually referred to by phandle need to have this property.

Another complication arises because nodes in an OF tree have two names. First they have the unit name, which is how the node is referred to in an OF path. The unit name generally consists of a device type followed by an @ followed by a unit address. For example /memory@0 is the full path of a memory node at address 0, and /ht@0,f2000000/pci@1 is the path of a PCI bus node, which is under a HyperTransport™ bus node. The form of the unit address is bus dependent, but is generally derived from the node's reg property. In addition, nodes have a property, name, whose value is usually equal to the first part of the unit name. For example, the nodes in the previous example would have name properties equal to memory and pci, respectively. To save space in the blob, the current version of the flattened tree format only requires the unit names to be present. When the kernel unflattens the tree, it automatically generates a name property from the node's path name.

3 The Device Tree Compiler

As we've seen, the flattened device tree format provides a convenient way of communicating device tree information to the kernel. It's simple for the kernel to parse, and simple for bootloaders to manipulate. On OF systems, it's easy to generate the flattened tree by walking the OF maintained tree. However, for embedded systems, the flattened tree must be generated from scratch.

Embedded bootloaders are generally built for a particular board. So, it's usually possible to build the device tree blob at compile time and include it in the bootloader image. For minor revisions of the board, the bootloader can contain code to make the necessary tweaks to the tree before passing it to the booted kernel.

The device trees for embedded boards are usually quite simple, and it's possible to construct the necessary blob by hand, but doing so is tedious. The device tree compiler, dtc (footnote 5), is designed to make creating device tree blobs easier by converting a text representation of the tree into the necessary blob.

Footnote 5: dtc can be obtained from [1].

    /memreserve/ 0x20000000-0x21FFFFFF;

    / {
        model = "MyBoard";
        compatible = "MyBoardFamily";
        #address-cells = <2>;
        #size-cells = <2>;

        cpus {
            #address-cells = <1>;
            #size-cells = <0>;
            PowerPC,970@0 {
                device_type = "cpu";
                reg = <0>;
                clock-frequency = <5f5e1000>;
                timebase-frequency = <1FCA055>;
                linux,boot-cpu;
                i-cache-size = <10000>;
                d-cache-size = <8000>;
            };
        };

        memory@0 {
            device_type = "memory";
            memreg: reg = <00000000 00000000
                           00000000 20000000>;
        };

        mpic@0x3fffdd08400 {
            /* Interrupt controller */
            /* ... */
        };

        pci@40000000000000 {
            /* PCI host bridge */
            /* ... */
        };

        chosen {
            bootargs = "root=/dev/sda2";
            linux,platform = <00000600>;
            interrupt-controller =
                <&/mpic@0x3fffdd08400>;
        };
    };

Figure 2: Example dtc source
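The rule from the end of Section 2.3, that a node's name property equals the first part of its unit name, can be sketched in a few lines. This is only an illustration of the rule as stated in the text; the function name is hypothetical and the kernel's actual unflattening code is not shown here.

```python
def name_from_path(path):
    """Derive a node's name property from its full path.

    The unit name is the last path component; the name is whatever
    precedes the '@' that introduces the unit address, if there is one.
    """
    unit_name = path.rstrip("/").rsplit("/", 1)[-1]
    return unit_name.split("@", 1)[0]


# Paths taken from the examples in the text and from Figure 2.
derived = {
    path: name_from_path(path)
    for path in ("/memory@0", "/ht@0,f2000000/pci@1", "/cpus/PowerPC,970@0", "/cpus")
}
```

Note that the split is on the first @ of the last path component only, so a comma in the device type (as in PowerPC,970) or in the unit address (as in ht@0,f2000000) does not disturb the result.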
3.1 Input and output formats

As well as the normal mode of compiling a device tree blob from text source, dtc can convert a device tree between a number of representations. It can take its input in one of three different formats:

source: the normal case. The device tree is described in a text form, described in §3.2.

blob (dtb): the flattened tree format described in §2.2. This mode is useful for checking a pre-existing device tree blob.

filesystem (fs): input is a directory tree in the layout of /proc/device-tree (roughly, a directory for each node in the device tree, a file for each property). This is useful for building a blob for the device tree in use by the currently running kernel.

In addition, dtc can output the tree in one of three different formats:

blob (dtb): as in §2.2. The most straightforward use of dtc is to compile from source to blob format.

source (dts): as in §3.2. If used with blob input, this allows dtc to act as a decompiler.

assembler source (asm): dtc can produce an assembler file, which will assemble into a .o file containing the device tree blob, with symbols giving the beginning of the blob and its various subsections. This can then be linked directly into a bootloader or firmware image.

For maximum applicability, dtc can both read and write any of the existing revisions of the blob format. When reading, dtc takes the version from the blob header, and when writing it takes a command line option specifying the desired version. It automatically makes any adjustments to the tree that are necessary for the specified version. For example, formats before 0x10 require each node to have an explicit name property. When dtc creates such a blob, it will automatically generate name properties from the unit names.

3.2 Source format

The source format for dtc is a text description of the device tree in a vaguely C-like form. Figure 2 shows an example. The file starts with /memreserve/ directives, which give address ranges to add to the output blob's memory reserve table, then the device tree proper is described.

Nodes of the tree are introduced with the node name, followed by a { ... }; block containing the node's properties and subnodes. Properties are given as just name = value;. The property values can be given in any of three forms:

string (for example, "MyBoard"): the property value is the given string, including terminating NULL. C-style escapes (\t, \n, \0 and so forth) are allowed.

cells (for example, <0 8000 f0000000>): the property value is made up of a list of 32-bit cells, each given as a hex value.

bytestring (for example, [1234abcdef]): the property value is given as a hex bytestring.

Cell properties can also contain references. Instead of a hex number, the source can give an ampersand (&) followed by the full path to some node in the tree. For example, in Figure 2, the /chosen node has an interrupt-controller property referring to the interrupt controller described by the node /mpic@0x3fffdd08400. In the output tree, the value of the referenced node's phandle is included in the property. If that node doesn't have an explicit phandle property, dtc will automatically create a unique phandle for it. This approach makes it easy to create interrupt trees without having to explicitly assign and remember phandles for the various interrupt controller nodes.

The dtc source can also include labels, which are placed on a particular node or property. For example, Figure 2 has a label memreg on the reg property of the node /memory@0. When using assembler output, corresponding labels in the output are generated, which will assemble into symbols addressing the part of the blob with the node or property in question. This is useful for the common case where an embedded board has an essentially fixed device tree with a few variable properties, such as the size of memory. The bootloader for such a board can have a device tree linked in, including a symbol referring to the right place in the blob to update the parameter with the correct value determined at runtime.
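The three property value forms of Section 3.2 can be illustrated by encoding each into the bytes that would be stored as the property value. This is a sketch under stated assumptions rather than dtc's implementation: the function names are hypothetical, cells are assumed to be stored big-endian, escapes and & references are not handled, and the surrounding <>, [] and quote delimiters are assumed to have been stripped already.

```python
import struct


def encode_string(text):
    """String form: the characters followed by a terminating NUL byte."""
    return text.encode("ascii") + b"\0"


def encode_cells(cells):
    """Cells form: whitespace-separated hex values, one 32-bit cell each.

    Assumption: cells are stored big-endian.
    """
    return b"".join(struct.pack(">I", int(cell, 16)) for cell in cells.split())


def encode_bytestring(hex_digits):
    """Bytestring form: pairs of hex digits, one byte per pair."""
    return bytes.fromhex(hex_digits)


def pad_to_cell(data):
    """Pad a value with zero bytes up to the next 32-bit boundary."""
    return data + b"\0" * (-len(data) % 4)


model = encode_string("MyBoard")                 # from Figure 2
ranges = encode_cells("0 8000 f0000000")         # example from the text
raw = encode_bytestring("1234abcdef")            # example from the text
```

The encoded model value is 8 bytes long, which matches the 0x00000008 shown for the model property in Figure 1. The 5-byte bytestring shows why padding matters: `pad_to_cell(raw)` brings it up to 8 bytes so that whatever follows stays aligned to a 32-bit boundary.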
3.3 Tree checking

Between reading in the device tree and writing it out in the new format, dtc performs a number of checks on the tree:

syntactic structure: dtc checks that node and property names contain only allowed characters and meet length restrictions. It checks that a node does not have multiple properties or subnodes with the same name.

semantic structure: in some cases, dtc checks that properties whose contents are defined by convention have appropriate values. For example, it checks that reg properties have a length that makes sense given the address forms specified by the #address-cells and #size-cells properties. It checks that properties such as interrupt-parent contain a valid phandle.

Linux requirements: dtc checks that the device tree contains those nodes and properties that are required by the Linux kernel to boot correctly.

These checks are useful to catch simple problems with the device tree, rather than having to debug the results on an embedded kernel. With the blob input mode, dtc can also be used for diagnosing problems with an existing blob.

4 Future Work

4.1 Board ports

The flattened device tree has always been the only supported way to boot a ppc64 kernel on an embedded system. With the merge of ppc32 and ppc64 code it has also become the only supported way to boot any merged powerpc kernel, 32-bit or 64-bit. In fact, the old ppc architecture exists mainly just to support the old ppc32 embedded ports that have not been migrated to the flattened device tree approach. We plan to remove the ppc architecture eventually, which will mean porting all the various embedded boards to use the flattened device tree.

4.2 dtc features

While it is already quite usable, there are a number of extra features that dtc could include to make creating device trees more convenient:

better tree checking: although dtc already performs a number of checks on the device tree, they are rather haphazard. In many cases dtc will give up after detecting a minor error early and won't pick up more interesting errors later on. There is a -f parameter that forces dtc to generate an output tree even if there are errors. At present, this needs to be used more often than one might hope, because dtc is bad at deciding which errors should really be fatal, and which rate mere warnings.

binary include: occasionally, it is useful for the device tree to incorporate as a property a block of binary data for some board-specific purpose. For example, many of Apple's device trees incorporate bytecode drivers for certain platform devices. dtc's source format ought to allow this by letting a property's value be read directly from a binary file.

macros: it might be useful for dtc to implement some sort of macros so that a tree containing a number of similar devices (for example, multiple identical ethernet controllers or PCI buses) can be written more quickly. At present, this can be accomplished in part by running the source file through CPP before compiling with dtc. It's not clear whether native support for macros would be more useful.

References

[1] David Gibson et al., dtc, git tree, http://ozlabs.org/~dgibson/dtc/dtc.git.

[2] Benjamin Herrenschmidt, Booting the Linux/ppc kernel without Open Firmware, May 2005, v0.1, http://ozlabs.org/pipermail/linuxppc64-dev/2005-May/004073.html.

[3] Benjamin Herrenschmidt, Booting the Linux/ppc kernel without Open Firmware, November 2005, v0.5, http://ozlabs.org/pipermail/linuxppc64-dev/2005-December/006994.html.

[4] IEEE Standard for Boot (Initialization Configuration) Firmware: Core Requirements and Practices, IEEE Std 1275-1994, IEEE Computer Society, 345 E. 47th St, New York, NY 10017, USA, 1994.

[5] PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot (Initialization Configuration) Firmware, IEEE Computer Society, 345 E. 47th St, New York, NY 10017, USA, 1998, Revision 2.1.

About the authors

David Gibson has been a member of the IBM Linux Technology Center, working from Canberra, Australia, since 2001. Recently he has worked on Linux hugepage support and performance counter support for ppc64, as well as the device tree compiler. In the past, he has worked on bringup for various ppc and ppc64 embedded systems, the orinoco wireless driver, ramfs, and a userspace checkpointing system (esky).

Benjamin Herrenschmidt was a MacOS developer for about 10 years, but ultimately saw the light and installed Linux on his Apple PowerPC machine. After writing a bootloader, BootX, for it in 1998, he started contributing to the PowerPC Linux port in various areas, mostly around the support for Apple machines. He became official PowerMac maintainer in 2001. In 2003, he joined the IBM Linux Technology Center in Canberra, Australia, where he ported the 64-bit PowerPC kernel to Apple G5 machines and the Maple embedded board, among other things. He's a member of the ppc64 development team and one of his current goals is to make the integration of embedded platforms smoother and more maintainable than in the 32-bit PowerPC kernel.

Legal Statement

This work represents the view of the authors and does not necessarily represent the view of IBM.

IBM, PowerPC, PowerPC Architecture, POWER5, pSeries and iSeries are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries.

Apple and Power Macintosh are registered trademarks of Apple Computer Inc. in the United States, other countries, or both.

Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service marks of others.