The
UNIX
Programming
Environment
Brian W. Kernighan
Rob Pike
Bell Laboratories
Murray Hill, New Jersey
PRENTICE-HALL, INC.
Englewood Cliffs, New Jersey 07632
* UNIX is a Trademark of Bell Laboratories
Library of Congress Catalog Card Number 83-62851
Prentice-Hall Software Series
Brian W. Kernighan, Advisor
Editorial/production supervision: Ros Herion
Cover design: Photo Plus Art, Celine Brandes
Manufacturing buyer: Gordon Osbourne
Copyright © 1984 by Bell Telephone Laboratories, Incorporated.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical, photocopy-
ing, recording, or otherwise, without the prior written permission of the publisher.
Printed in the United States of America. Published simultaneously in Canada.
This book was typeset in Times Roman and Courier by the authors, using a Mer-
genthaler Linotron 202 phototypesetter driven by a VAX-11/750 running the 8th Edition
of the UNIX operating system.
UNIX is a trademark of Bell Laboratories. DEC, PDP and VAX are trademarks of
Digital Equipment Corporation.
20 19 18 17 16 15 14
ISBN 0-13-937699-2
ISBN 0-13-937681-X {PBK}
PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
EDITORA PRENTICE-HALL DO BRASIL, LTDA., Rio de Janeiro
PRENTICE-HALL CANADA INC., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand
CONTENTS
Preface
1. UNIX for Beginners
1.1 Getting started
1.2 Day-to-day use: files and common commands
1.3 More about files: directories
1.4 The shell
1.5 The rest of the UNIX system
2. The File System
2.1 The basics of files
2.2 What’s in a file?
2.3 Directories and filenames
2.4 Permissions
2.5 Inodes
2.6 The directory hierarchy
2.7 Devices
3. Using the Shell
3.1 Command line structure
3.2 Metacharacters
3.3 Creating new commands
3.4 Command arguments and parameters
3.5 Program output as arguments
3.6 Shell variables
3.7 More on I/O redirection
3.8 Looping in shell programs
3.9 bundle: putting it all together
3.10 Why a programmable shell?
4. Filters
4.1 The grep family
4.2 Other filters
4.3 The stream editor sed
4.4 The awk pattern scanning and processing language
4.5 Good files and good filters
5. Shell Programming
5.1 Customizing the cal command
5.2 Which command is which?
5.3 while and until loops: watching for things
5.4 Traps: catching interrupts
5.5 Replacing a file: overwrite
5.6 zap: killing processes by name
5.7 The pick command: blanks vs. arguments
5.8 The news command: community service messages
5.9 get and put: tracking file changes
5.10 A look back
6. Programming with Standard I/O
6.1 Standard input and output: vis
6.2 Program arguments: vis version 2
6.3 File access: vis version 3
6.4 A screen-at-a-time printer: p
6.5 An example: pick
6.6 On bugs and debugging
6.7 An example: zap
6.8 An interactive file comparison program: idiff
6.9 Accessing the environment
7. UNIX System Calls
7.1 Low-level I/O
7.2 File system: directories
7.3 File system: inodes
7.4 Processes
7.5 Signals and interrupts
8. Program Development
8.1 Stage 1: A four-function calculator
8.2 Stage 2: Variables and error recovery
8.3 Stage 3: Arbitrary variable names; built-in functions
8.4 Stage 4: Compilation into a machine
8.5 Stage 5: Control flow and relational operators
8.6 Stage 6: Functions and procedures; input/output
8.7 Performance evaluation
8.8 A look back
9. Document Preparation
9.1 The ms macro package
9.2 The troff level
9.3 The tbl and eqn preprocessors
9.4 The manual page
9.5 Other document preparation tools
10. Epilog
Appendix 1: Editor Summary
Appendix 2: hoc Manual
Appendix 3: hoc Listing
Index
PREFACE
“The number of UNIX installations has grown to 10, with more expected.”
(The UNIX Programmer's Manual, 2nd Edition, June, 1972.)
The UNIX† operating system started on a cast-off DEC PDP-7 at Bell Labora-
tories in 1969. Ken Thompson, with ideas and support from Rudd Canaday,
Doug McIlroy, Joe Ossanna, and Dennis Ritchie, wrote a small general-
purpose time-sharing system comfortable enough to attract enthusiastic users
and eventually enough credibility for the purchase of a larger machine — a
PDP-11/20. One of the early users was Ritchie, who helped move the system
to the PDP-11 in 1970. Ritchie also designed and wrote a compiler for the C
programming language. In 1973, Ritchie and Thompson rewrote the UNIX ker-
nel in C, breaking from the tradition that system software is written in assem-
bly language. With that rewrite, the system became essentially what it is
today.
Around 1974 it was licensed to universities “for educational purposes” and
a few years later became available for commercial use. During this time, UNIX
systems prospered at Bell Labs, finding their way into laboratories, software
development projects, word processing centers, and operations support systems
in telephone companies. Since then, it has spread world-wide, with tens of
thousands of systems installed, from microcomputers to the largest main-
frames.
What makes the UNIX system so successful? We can discern several rea-
sons. First, because it is written in C, it is portable — UNIX systems run on a
range of computers from microprocessors to the largest mainframes; this is a
strong commercial advantage. Second, the source code is available and written
in a high-level language, which makes the system easy to adapt to particular
requirements. Finally, and most important, it is a good operating system,
especially for programmers. The UNIX programming environment is unusually
rich and productive.

† UNIX is a trademark of Bell Laboratories. “UNIX” is not an acronym, but a weak pun on
MULTICS, the operating system that Thompson and Ritchie worked on before UNIX.

Even though the UNIX system introduces a number of innovative programs
and techniques, no single program or idea makes it work well. Instead, what
makes it effective is an approach to programming, a philosophy of using the
computer. Although that philosophy can’t be written down in a single sen-
tence, at its heart is the idea that the power of a system comes more from the
relationships among programs than from the programs themselves. Many UNIX
programs do quite trivial tasks in isolation, but, combined with other pro-
grams, become general and useful tools.
Our goal in this book is to communicate the UNIX programming philosophy.
Because the philosophy is based on the relationships between programs, we
must devote most of the space to discussions about the individual tools, but
throughout run the themes of combining programs and of using programs to
build programs. To use the UNIX system and its components well, you must
understand not only how to use the programs, but also how they fit into the
environment.
As the UNIX system has spread, the fraction of its users who are skilled in
its application has decreased. Time and again, we have seen experienced
users, ourselves included, find only clumsy solutions to a problem, or write
programs to do jobs that existing tools handle easily. Of course, the elegant
solutions are not easy to see without some experience and understanding. We
hope that by reading this book you will develop the understanding to make
your use of the system — whether you are a new or seasoned user — effective
and enjoyable. We want you to use the UNIX system well.
We are aiming at individual programmers, in the hope that, by making
their work more productive, we can in turn make the work of groups more
productive. Although our main target is programmers, the first four or five
chapters do not require programming experience to be understood, so they
should be helpful to other users as well.
Wherever possible we have tried to make our points with real examples
rather than artificial ones. Although some programs began as examples for the
book, they have since become part of our own set of everyday programs. All
examples have been tested directly from the text, which is in machine-readable
form.
The book is organized as follows. Chapter 1 is an introduction to the most
basic use of the system. It covers logging in, mail, the file system, commonly-
used commands, and the rudiments of the command interpreter. Experienced
users can skip this chapter.
Chapter 2 is a discussion of the UNIX file system. The file system is central
to the operation and use of the system, so you must understand it to use the
system well. This chapter describes files and directories, permissions and file
modes, and inodes. It concludes with a tour of the file system hierarchy and
an explanation of device files.
The command interpreter, or shell, is a fundamental tool, not only for run-
ning programs, but also for writing them. Chapter 3 describes how to use the
shell for your own purposes: creating new commands, command arguments,
shell variables, elementary control flow, and input-output redirection.
Chapter 4 is about filters: programs that perform some simple transforma-
tion on data as it flows through them. The first section deals with the grep
pattern-searching command and its relatives; the next discusses a few of the
more common filters such as sort; and the rest of the chapter is devoted to
two general-purpose data transforming programs called sed and awk. sed is
a stream editor, a program for making editing changes on a stream of data as
it flows by. awk is a programming language for simple information retrieval
and report generation tasks. It’s often possible to avoid conventional program-
ming entirely by using these programs, sometimes in cooperation with the
shell.
Chapter 5 discusses how to use the shell for writing programs that will
stand up to use by other people. Topics include more advanced control flow
and variables, traps and interrupt handling. The examples in this chapter
make considerable use of sed and awk as well as the shell.
Eventually one reaches the limits of what can be done with the shell and
other programs that already exist. Chapter 6 talks about writing new programs
using the standard I/O library. The programs are written in C, which the
reader is assumed to know, or at least be learning concurrently. We try to
show sensible strategies for designing and organizing new programs, how to
build them in manageable stages, and how to make use of tools that already
exist.
Chapter 7 deals with the system calls, the foundation under all the other
layers of software. The topics include input-output, file creation, error pro-
cessing, directories, inodes, processes, and signals.
Chapter 8 talks about program development tools: yacc, a parser-
generator; make, which controls the process of compiling a big program; and
lex, which generates lexical analyzers. The exposition is based on the
development of a large program, a C-like programmable calculator.
Chapter 9 discusses the document preparation tools, illustrating them with a
user-level description and a manual page for the calculator of Chapter 8. It
can be read independently of the other chapters.
Appendix 1 summarizes the standard editor ed. Although many readers
will prefer some other editor for daily use, ed is universally available, efficient
and effective. Its regular expressions are the heart of other programs like
grep and sed, and for that reason alone it is worth learning.
Appendix 2 contains the reference manual for the calculator language of
Chapter 8.
Appendix 3 is a listing of the final version of the calculator program,
presenting the code all in one place for convenient reading.
Some practical matters. First, the UNIX system has become very popular,
and there are a number of versions in wide use. For example, the 7th Edition
comes from the original source of the UNIX system, the Computing Science
Research Center at Bell Labs. System III and System V are the official Bell
Labs-supported versions. The University of California at Berkeley distributes
systems derived from the 7th Edition, usually known as UCB 4.xBSD. In
addition, there are numerous variants, particularly on small computers, that
are derived from the 7th Edition.
We have tried to cope with this diversity by sticking closely to those aspects
that are likely to be the same everywhere. Although the lessons that we want
to teach are independent of any particular version, for specific details we have
chosen to present things as they were in the 7th Edition, since it forms the
basis of most of the UNIX systems in widespread use. We have also run the
examples on Bell Labs’ System V and on Berkeley 4.1BSD; only trivial changes
were required, and only in a few examples. Regardless of the version your
machine runs, the differences you find should be minor.
Second, although there is a lot of material in this book, it is not a reference
manual. We feel it is more important to teach an approach and a style of use
than just details. The UNIX Programmer's Manual is the standard source of
information. You will need it to resolve points that we did not cover, or to
determine how your system differs from ours.
Third, we believe that the best way to learn something is by doing it. This
book should be read at a terminal, so that you can experiment, verify or con-
tradict what we say, explore the limits and the variations. Read a bit, try it
out, then come back and read some more.
We believe that the UNIX system, though certainly not perfect, is a mar-
velous computing environment. We hope that reading this book will help you
to reach that conclusion too.
We are grateful to many people for constructive comments and criticisms,
and for their help in improving our code. In particular, Jon Bentley, John
Linderman, Doug McIlroy, and Peter Weinberger read multiple drafts with
great care. We are indebted to Al Aho, Ed Bradford, Bob Flandrena, Dave
Hanson, Ron Hardin, Marion Harris, Gerard Holzmann, Steve Johnson, Nico
Lomuto, Bob Martin, Larry Rosler, Chris Van Wyk, and Jim Weythman for
their comments on the first draft. We also thank Mike Bianchi, Elizabeth
Bimmler, Joe Carfagno, Don Carter, Tom De Marco, Tom Duff, David Gay,
Steve Mahaney, Ron Pinter, Dennis Ritchie, Ed Sitar, Ken Thompson, Mike
Tilson, Paul Tukey, and Larry Wehr for valuable suggestions.
Brian Kernighan
Rob Pike
CHAPTER 1: UNIX FOR BEGINNERS
What is “UNIX”? In the narrowest sense, it is a time-sharing operating sys-
tem kernel: a program that controls the resources of a computer and allocates
them among its users. It lets users run their programs; it controls the peri-
pheral devices (discs, terminals, printers, and the like) connected to the
machine; and it provides a file system that manages the long-term storage of
information such as programs, data, and documents.
In a broader sense, “UNIX” is often taken to include not only the kernel,
but also essential programs like compilers, editors, command languages, pro-
grams for copying and printing files, and so on.
Still more broadly, “UNIX” may even include programs developed by you or
other users to be run on your system, such as tools for document preparation,
routines for statistical analysis, and graphics packages.
Which of these uses of the name “UNIX” is correct depends on which level
of the system you are considering. When we use “UNIX” in the rest of this
book, context should indicate which meaning is implied.
The UNIX system sometimes looks more difficult than it is — it’s hard for a
newcomer to know how to make the best use of the facilities available. But
fortunately it’s not hard to get started — knowledge of only a few programs
should get you off the ground. This chapter is meant to help you to start using
the system as quickly as possible. It’s an overview, not a manual; we'll cover
most of the material again in more detail in later chapters. We'll talk about
these major areas:
• basics: logging in and out, simple commands, correcting typing mistakes,
  mail, inter-terminal communication.
• day-to-day use: files and the file system, printing files, directories,
  commonly-used commands.
• the command interpreter or shell: filename shorthands, redirecting input
  and output, pipes, setting erase and kill characters, and defining your own
  search path for commands.
If you’ve used a UNIX system before, most of this chapter should be familiar;
you might want to skip straight to Chapter 2.
You will need a copy of the UNIX Programmer's Manual, even as you read
this chapter; it’s often easier for us to tell you to read about something in the
manual than to repeat its contents here. This book is not supposed to replace
it, but to show you how to make best use of the commands described in it.
Furthermore, there may be differences between what we say here and what is
true on your system. The manual has a permuted index at the beginning that’s
indispensable for finding the right programs to apply to a problem; learn to use
it.
Finally, a word of advice: don’t be afraid to experiment. If you are a
beginner, there are very few accidental things you can do to hurt yourself or
other users. So learn how things work by trying them. This is a long chapter,
and the best way to read it is a few pages at a time, trying things out as you
go.
1.1 Getting started
Some prerequisites about terminals and typing
To avoid explaining everything about using computers, we must assume you
have some familiarity with computer terminals and how to use them. If any of
the following statements are mystifying, you should ask a local expert for help.
The UNIX system is full duplex: the characters you type on the keyboard are
sent to the system, which sends them back to the terminal to be printed on the
screen. Normally, this echo process copies the characters directly to the
screen, so you can see what you are typing, but sometimes, such as when you
are typing a secret password, the echo is turned off so the characters do not
appear on the screen.
Most of the keyboard characters are ordinary printing characters with no
special significance, but a few tell the computer how to interpret your typing.
By far the most important of these is the RETURN key. The RETURN key sig-
nifies the end of a line of input; the system echoes it by moving the terminal’s
cursor to the beginning of the next line on the screen. RETURN must be
pressed before the system will interpret the characters you have typed.
RETURN is an example of a control character — an invisible character that
controls some aspect of input and output on the terminal. On any reasonable
terminal, RETURN has a key of its own, but most control characters do not.
Instead, they must be typed by holding down the CONTROL key, sometimes
called CTL or CNTL or CTRL, then pressing another key, usually a letter. For
example, RETURN may be typed by pressing the RETURN key or,
equivalently, holding down the CONTROL key and typing an ‘m’. RETURN
might therefore be called a control-m, which we will write as ctl-m. Other control
characters include ctl-d, which tells a program that there is no more input;
ctl-g, which rings the bell on the terminal; ctl-h, often called backspace, which
can be used to correct typing mistakes; and ctl-i, often called tab, which
advances the cursor to the next tab stop, much as on a regular typewriter. Tab
stops on UNIX systems are eight spaces apart. Both the backspace and tab characters
have their own keys on most terminals.
Two other keys have special meaning: DELETE, sometimes called RUBOUT
or some abbreviation, and BREAK, sometimes called INTERRUPT. On most
UNIX systems, the DELETE key stops a program immediately, without waiting
for it to finish. On some systems, ctl-c provides this service. And on some
systems, depending on how the terminals are connected, BREAK is a synonym
for DELETE or ctl-c.
A Session with UNIX
Let’s begin with an annotated dialog between you and your UNIX system.
Throughout the examples in this book, what you type is printed in slanted
letters, computer responses are in typewriter-style characters, and
explanations are in italics.
Establish a connection: dial a phone or turn on a switch as necessary.
Your system should say
login: you                          Type your name, then press RETURN
Password:                           Your password won't be echoed as you type it
You have mail.                      There's mail to be read after you log in
$                                   The system is now ready for your commands
$                                   Press RETURN a couple of times
$ date                              What’s the date and time?
Sun Sep 25 23:02:57 EDT 1983
$ who                               Who's using the machine?
jib      tty0    Sep 25 1
you      tty2    Sep 25 2
mary     tty4    Sep 25 1
doug     tty5    Sep 25 1
egb      tty7    Sep 25 17:17
bob      tty8    Sep 25 20:48
$ mail                              Read your mail
From doug Sun Sep 25 20:53 EDT 1983
give me a call sometime monday
?                                   RETURN moves on to the next message
From mary Sun Sep 25 19:07 EDT 1983 Next message
Lunch at noon tomorrow?
? d                                 Delete this message
$                                   No more mail
$ mail mary                         Send mail to mary
lunch at 12 is fine
ctl-d                               End of mail
$                                   Hang up phone or turn off terminal
                                    and that's the end
Sometimes that’s all there is to a session, though occasionally people do
some work too. The rest of this section will discuss the session above, plus
other programs that make it possible to do useful things.
Logging in
You must have a login name and password, which you can get from your
system administrator. The UNIX system is capable of dealing with a wide
variety of terminals, but it is strongly oriented towards devices with lower case;
case distinctions matter! If your terminal produces only upper case (like some
video and portable terminals), life will be so difficult that you should look for
another terminal.
Be sure the switches are set appropriately on your device: upper and lower
case, full duplex, and any other settings that local experts advise, such as the
speed, or baud rate. Establish a connection using whatever magic is needed
for your terminal; this may involve dialing a telephone or merely flipping a
switch. In either case, the system should type
login:
If it types garbage, you may be at the wrong speed; check the speed setting and
other switches. If that fails, press the BREAK or INTERRUPT key a few times,
slowly. If nothing produces a login message, you will have to get help.
When you get the login: message, type your login name in lower case.
Follow it by pressing RETURN. If a password is required, you will be asked
for it, and printing will be turned off while you type it.
The culmination of your login efforts is a prompt, usually a single charac-
ter, indicating that the system is ready to accept commands from you. The
prompt is most likely to be a dollar sign $ or a percent sign %, but you can
change it to anything you like; we'll show you how a little later. The prompt is
actually printed by a program called the command interpreter or shell, which is
your main interface to the system.
There may be a message of the day just before the prompt, or a notification
that you have mail. You may also be asked what kind of terminal you are
using; your answer helps the system to use any special properties the terminal
might have.
Typing commands
Once you receive the prompt, you can type commands, which are requests
that the system do something. We will use program as a synonym for com-
mand. When you see the prompt (let’s assume it’s $), type date and press
RETURN. The system should reply with the date and time, then print another
prompt, so the whole transaction will look like this on your terminal:
$ date
Mon Sep 26 12:20:57 EDT 1983
$
Don’t forget RETURN, and don’t type the $. If you think you're being
ignored, press RETURN; something should happen. RETURN won't be men-
tioned again, but you need it at the end of every line.
The next command to try is who, which tells you everyone who is currently
logged in:
$ who
rim      tty0    Sep 26 11:17
piw      tty4    Sep 26 11:30
gerard   tty7    Sep 26 10:27
mark     tty9    Sep 26 07:59
you      ttya    Sep 26 12:20
$
The first column is the user name. The second is the system’s name for the
connection being used (“tty” stands for “teletype,” an archaic synonym for
“terminal”). The rest tells when the user logged on. You might also try
$ who am i
you ttya Sep 26 12:20
$
If you make a mistake typing the name of a command, and refer to a non-
existent command, you will be told that no command of that name can be
found:
$ whom                  Misspelled command name ...
whom: not found         ... so system didn’t know how to run it
$
Of course, if you inadvertently type the name of an actual command, it will
run, perhaps with mysterious results.
Strange terminal behavior
Sometimes your terminal will act strangely, for example, each letter may be
typed twice, or RETURN may not put the cursor at the first column of the next
line. You can usually fix this by turning the terminal off and on, or by logging
out and logging back in. Or you can read the description of the command
stty (“set terminal options”) in Section 1 of the manual. To get intelligent
treatment of tab characters if your terminal doesn’t have tabs, type the com-
mand
$ stty -tabs
and the system will convert tabs into the right number of spaces. If your ter-
minal does have computer-settable tab stops, the command tabs will set them
correctly for you. (You may actually have to say
$ tabs terminal-type
to make it work; see the tabs command description in the manual.)
Mistakes in typing
If you make a typing mistake, and see it before you have pressed RETURN,
there are two ways to recover: erase characters one at a time or kill the whole
line and re-type it.
If you type the line kill character, by default an at-sign @, it causes the
whole line to be discarded, just as if you’d never typed it, and starts you over
on a new line:
$ ddtae@ Completely botched; start over
date on a new line
Mon Sep 26 12:23:39 EDT 1983
$
The sharp character # erases the last character typed; each # erases one
more character, back to the beginning of the line (but not beyond). So if you
type badly, you can correct as you go:
$ dd#atte##e Fix it as you go
Mon Sep 26 12:24:02 EDT 1983
$
The particular erase and line kill characters are very system dependent. On
many systems (including the one we use), the erase character has been changed
to backspace, which works nicely on video terminals. You can quickly check
which is the case on your system:
$ datee←                        Try ←
datee←: not found               It's not ←
$ datee#                        Try #
Mon Sep 26 12:26:08 EDT 1983    It is #
$
(We printed the backspace as ← so you can see it.) Another common choice is
ctl-u for line kill.
We will use the sharp as the erase character for the rest of this section
because it’s visible, but make the mental adjustment if your system is different.
Later on, in “tailoring the environment,” we will tell you how to set the erase
and line kill characters to whatever you like, once and for all.
What if you must enter an erase or line kill character as part of the text? If
you precede either # or @ by a backslash \, it loses its special meaning. So to
enter a # or @, type \# or \@. The system may advance the terminal’s cursor
to the next line after your @, even if it was preceded by a backslash. Don’t
worry — the at-sign has been recorded.
The backslash, sometimes called the escape character, is used extensively to
indicate that the following character is in some way special. To erase a
backslash, you have to type two erase characters: \##. Do you see why?
The characters you type are examined and interpreted by a sequence of programs
before they reach their destination, and exactly how they are interpreted
depends not only on where they end up but how they got there.
Every character you type is immediately echoed to the terminal, unless
echoing is turned off, which is rare. Until you press RETURN, the characters
are held temporarily by the kernel, so typing mistakes can be corrected with
the erase and line kill characters. When an erase or line kill character is pre-
ceded by a backslash, the kernel discards the backslash and holds the following
character without interpretation.
When you press RETURN, the characters being held are sent to the pro-
gram that is reading from the terminal. That program may in turn interpret
the characters in special ways; for example, the shell turns off any special
interpretation of a character if it is preceded by a backslash. We'll come back
to this in Chapter 3. For now, you should remember that the kernel processes
erase and line kill, and backslash only if it precedes erase or line kill; whatever
characters are left after that may be interpreted by other programs as well.
Exercise 1-1. Explain what happens with
$ date\@
□
Exercise 1-2. Most shells (though not the 7th Edition shell) interpret # as introducing a
comment, and ignore all text from the # to the end of the line. Given this, explain the
following transcript, assuming your erase character is also #:
$ date
Mon Sep 26 12:39:56 EDT 1983
$ #date
Mon Sep 26 12:40:21 EDT 1983
$ \#date
$ \\#date
#date: not found
$
□
Type-ahead
The kernel reads what you type as you type it, even if it’s busy with some-
thing else, so you can type as fast as you want, whenever you want, even when
some command is printing at you. If you type while the system is printing,
your input characters will appear intermixed with the output characters, but
they will be stored away and interpreted in the correct order. You can type
commands one after another without waiting for them to finish or even to
begin.
Stopping a program
You can stop most commands by typing the character DELETE. The
BREAK key found on most terminals may also work, although this is system
dependent. In a few programs, like text editors, DELETE stops whatever the
program is doing but leaves you in that program. Turning off the terminal or
hanging up the phone will stop most programs.
If you just want output to pause, for example to keep something critical
from disappearing off the screen, type ctl-s. The output will stop almost
immediately; your program is suspended until you start it again. When you
want to resume, type ctl-q.
Logging out
The proper way to log out is to type ctl-d instead of a command; this tells
the shell that there is no more input. (How this actually works will be
explained in the next chapter.) You can usually just turn off the terminal or
hang up the phone, but whether this really logs you out depends on your sys-
tem.
Mail
The system provides a postal system for communicating with other users, so
some day when you log in, you will see the message
You have mail.
before the first prompt. To read your mail, type
$ mail
Your mail will be printed, one message at a time, most recent first. After each
item, mail waits for you to say what to do with it. The two basic responses
are d, which deletes the message, and RETURN, which does not (so it will still
be there the next time you read your mail). Other responses include p to
reprint a message, s filename to save it in the file you named, and q to quit
from mail. (If you don’t know what a file is, think of it as a place where you
can store information under a name of your choice, and retrieve it later. Files
are the topic of Section 1.2 and indeed of much of this book.)
mail is one of those programs that is likely to differ from what we describe
here; there are many variants. Look in your manual for details.
Sending mail to someone is straightforward. Suppose it is to go to the person
with the login name nico. The easiest way is this:
$ mail nico
Now type in the text of the letter
on as many lines as you like ...
After the last line of the letter
type a control-d.
ctl-d
$
The ctl-d signals the end of the letter by telling the mail command that there
is no more input. If you change your mind half-way through composing the
letter, press DELETE instead of ctl-d. The half-formed letter will be stored in
a file called dead.letter instead of being sent.
For practice, send mail to yourself, then type mail to read it. (This isn’t
as aberrant as it might sound — it’s a handy reminder mechanism.)
There are other ways to send mail — you can send a previously prepared
letter, you can mail to a number of people all at once, and you may be able to
send mail to people on other machines. For more details see the description of
the mail command in Section 1 of the UNIX Programmer's Manual. Henceforth
we'll use the notation mail(1) to mean the page describing mail in
Section 1 of the manual. All of the commands discussed in this chapter are
found in Section 1.
There may also be a calendar service (see calendar(1)); we'll show you in
Chapter 4 how to set one up if it hasn’t been done already.
Writing to other users
If your UNIX system has multiple users, someday, out of the blue, your ter-
minal will print something like
Message from mary tty7...
accompanied by a startling beep. Mary wants to write to you, but unless you
take explicit action you won’t be able to write back. To respond, type
$ write mary
This establishes a two-way communication path. Now the lines that Mary
types on her terminal will appear on yours and vice versa, although the path is
slow, rather like talking to the moon.
If you are in the middle of something, you have to get to a state where you
can type a command. Normally, whatever program you are running has to
stop or be stopped, but some programs, such as the editor and write itself,
have a ‘!’ command to escape temporarily to the shell — see Table 2 in
Appendix 1.
The write command imposes no rules, so a protocol is needed to keep
what you type from getting garbled up with what Mary types. One convention
is to take turns, ending each turn with (o), which stands for “over,” and to
signal your intent to quit with (oo), for “over and out.”
Mary's terminal:                  Your terminal:
$ write you
                                  $ Message from mary tty7...
                                  write mary
Message from you ttya...
did you forget lunch? (o)
                                  did you forget lunch? (o)
                                  five@
                                  ten minutes (o)
ten minutes (o)
ok (oo)
                                  ok (oo)
                                  ctl-d
EOF
ctl-d
$                                 EOF
                                  $
You can also exit from write by pressing DELETE. Notice that your typing
errors do not appear on Mary’s terminal.
If you try to write to someone who isn’t logged in, or who doesn’t want to
be disturbed, you'll be told. If the target is logged in but doesn’t answer after
a decent interval, the person may be busy or away from the terminal; simply
type ctl-d or DELETE. If you don’t want to be disturbed, use mesg(1).
News
Many UNIX systems provide a news service, to keep users abreast of
interesting and not so interesting events. Try typing
$ news
There is also a large network of UNIX systems that keep in touch through tele-
phone calls; ask a local expert about netnews and USENET.
The manual
The UNIX Programmer’s Manual describes most of what you need to know
about the system. Section 1 deals with commands, including those we discuss
in this chapter. Section 2 describes the system calls, the subject of Chapter 7,
and Section 6 has information about games. The remaining sections talk about
functions for use by C programmers, file formats, and system maintenance.
(The numbering of these sections varies from system to system.) Don’t forget
the permuted index at the beginning; you can skim it quickly for commands
that might be relevant to what you want to do. There is also an introduction
to the system that gives an overview of how things work.
Often the manual is kept on-line so that you can read it on your terminal.
If you get stuck on something, and can’t find an expert to help, you can print
any manual page on your terminal with the command man command-name.
Thus to read about the who command, type
$ man who
and, of course,
$ man man
tells about the man command.
Computer-aided instruction
Your system may have a command called learn, which provides
computer-aided instruction on the file system and basic commands, the editor,
document preparation, and even C programming. Try
$ learn
If learn exists on your system, it will tell you what to do from there. If that
fails, you might also try teach.
Games
It’s not always admitted officially, but one of the best ways to get comfort-
able with a computer and a terminal is to play games. The UNIX system comes
with a modest supply of games, often supplemented locally. Ask around, or
see Section 6 of the manual.
1.2 Day-to-day use: files and common commands
Information in a UNIX system is stored in files, which are much like ordi-
nary office files. Each file has a name, contents, a place to keep it, and some
administrative information such as who owns it and how big it is. A file might
contain a letter, or a list of names and addresses, or the source statements of a
program, or data to be used by a program, or even programs in their execut-
able form and other non-textual material.
The UNIX file system is organized so you can maintain your own personal
files without interfering with files belonging to other people, and keep people
from interfering with you too. There are myriad programs that manipulate
files, but for now, we will look at only the more frequently used ones.
Chapter 2 contains a systematic discussion of the file system, and introduces
many of the other file-related commands.
Creating files — the editor
If you want to type a paper or a letter or a program, how do you get the
information stored in the machine? Most of these tasks are done with a text
editor, which is a program for storing and manipulating information in the
computer. Almost every UNIX system has a screen editor, an editor that takes
advantage of modern terminals to display the effects of your editing changes in
context as you make them. Two of the most popular are vi and emacs. We
won’t describe any specific screen editor here, however, partly because of typo-
graphic limitations, and partly because there is no standard one.
There is, however, an older editor called ed that is certain to be available
on your system. It takes no advantage of special terminal features, so it will
work on any terminal. It also forms the basis of other essential programs
(including some screen editors), so it’s worth learning eventually. Appendix 1
contains a concise description.
No matter what editor you prefer, you'll have to learn it well enough to be
able to create files. We'll use ed here to make the discussion concrete, and to
ensure that you can make our examples run on your system, but by all means
use whatever editor you like best.
To use ed to create a file called junk with some text in it, do the follow-
ing:
$ ed Invokes the text editor
a ed command to add text
now type in
whatever text you want ...
. Type a ‘.’ by itself to stop adding text
w junk Write your text into a file called junk
39 ed prints number of characters written
q Quit ed
$
The command a (“append”) tells ed to start collecting text. The “.” that sig-
nals the end of the text must be typed at the beginning of a line by itself.
Don’t forget it, for until it is typed, no other ed commands will be recognized
— everything you type will be treated as text to be added.
The editor command w (“write”) stores the information that you typed;
“w junk” stores it in a file called junk. The filename can be any word you
like; we picked junk to suggest that this file isn’t very important.
ed responds with the number of characters it put in the file. Until the w
command, nothing is stored permanently, so if you hang up and go home the
information is not stored in the file. (If you hang up while editing, the data
you were working on is saved in a file called ed.hup, which you can continue
with at your next session.) If the system crashes (i.e., stops unexpectedly
because of software or hardware failure) while you are editing, your file will
contain only what the last write command placed there. But after w the infor-
mation is recorded permanently; you can access it again later by typing
$ ed junk
Of course, you can edit the text you typed in, to correct spelling mistakes,
change wording, rearrange paragraphs and the like. When you're done, the q
command (“quit”) leaves the editor.
What files are out there?
Let’s create two files, junk and temp, so we know what we have:
$ ed
a
To be or not to be
.
w junk
19
q
$ ed
a
That is the question.
.
w temp
22
q
$
The character counts from ed include the character at the end of each line,
called newline, which is how the system represents RETURN.
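The counts can be checked without the editor. What follows is a minimal sketch rather than part of the session above: it assumes a current shell that provides printf and mktemp, and it uses the > and < notation for connecting a command to a file, which this chapter has not yet introduced.

```shell
# Work in a scratch directory so nothing of yours is overwritten.
cd "$(mktemp -d)" || exit 1

# printf writes the text followed by an explicit newline (\n),
# standing in for the line typed into ed.
printf 'To be or not to be\n' >junk
printf 'That is the question.\n' >temp

# wc -c counts characters; reading from standard input keeps
# the filename out of the output.
wc -c <junk     # 19: 18 visible characters plus the newline
wc -c <temp     # 22: 21 visible characters plus the newline
```

The counts agree with the ones ed reported, which is the point: the newline is an ordinary character stored in the file.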
The ls command lists the names (not contents) of files:
$ ls
junk
temp
$
which are indeed the two files just created. (There might be others as well
that you didn’t create yourself.) The names are sorted into alphabetical order
automatically.
ls, like most commands, has options that may be used to alter its default
behavior. Options follow the command name on the command line, and are
usually made up of an initial minus sign ‘-’ and a single letter meant to suggest
the meaning. For example, ls -t causes the files to be listed in “time” order:
the order in which they were last changed, most recent first.
$ ls -t
temp
junk
$
The -l option gives a “long” listing that provides more information about each
file:
$ ls -l
total 2
-rw-r--r-- 1 you 19 Sep 26 16:25 junk
-rw-r--r-- 1 you 22 Sep 26 16:26 temp
$
“total 2” tells how many blocks of disc space the files occupy; a block is
usually either 512 or 1024 characters. The string -rw-r--r-- tells who has
permission to read and write the file; in this case, the owner (you) can read
and write, but others can only read it. The “1” that follows is the number of
links to the file; ignore it until Chapter 2. “you” is the owner of the file, that
is, the person who created it. 19 and 22 are the number of characters in the
corresponding files, which agree with the numbers you got from ed. The date
and time tell when the file was last changed.
Options can be grouped: ls -lt gives the same data as ls -l, but sorted
with most recent files first. The -u option gives information on when files
were used: ls -lut gives a long (-l) listing in the order of most recent use.
The option -r reverses the order of the output, so ls -rt lists the least
recently changed files first. You can also name the files you're interested in, and ls will
list the information about them only:
$ ls -l junk
-rw-r--r-- 1 you 19 Sep 26 16:25 junk
$
$
The strings that follow the program name on the command line, such as -l
and junk in the example above, are called the program’s arguments. Argu-
ments are usually options or names of files to be used by the command.
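To see the effect of these options without depending on how quickly the files were typed in, the times can be set explicitly. This is only a sketch under assumptions: touch -t (which sets the time of last change) and mktemp are facilities of current systems that the text does not describe, and the > notation is introduced later.

```shell
cd "$(mktemp -d)" || exit 1
printf 'To be or not to be\n' >junk
printf 'That is the question.\n' >temp

# touch -t sets the time of last change (format YYYYMMDDhhmm),
# so the listing order below is the same on every run.
touch -t 198309261625 junk
touch -t 198309261626 temp

ls          # alphabetical order: junk, then temp
ls -t       # most recently changed first: temp, then junk
ls -rt      # the reverse: junk, then temp
ls -lt      # long listing, in the same order as ls -t
ls -l junk  # long listing for the named file only
```

The exact layout of the long listing varies from system to system, so it is not shown here.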
Specifying options by a minus sign and a single letter, such as -t or the
combined -lt, is a common convention. In general, if a command accepts
such optional arguments, they precede any filename arguments, but may other-
wise appear in any order. But UNIX programs are capricious in their treatment
of multiple options. For example, standard 7th Edition ls won’t accept
$ ls -l -t Doesn't work in 7th Edition
as a synonym for ls -lt, while other programs require multiple options to be
separated.
As you learn more, you will find that there is little regularity or system to
optional arguments. Each command has its own idiosyncrasies, and its own
choices of what letter means what (often different from the same function in
other commands). This unpredictable behavior is disconcerting and is often
cited as a major flaw of the system. Although the situation is improving —
new versions often have more uniformity — all we can suggest is that you try
to do better when you write your own programs, and in the meantime keep a
copy of the manual handy.
Printing files — cat and pr
Now that you have some files, how do you look at their contents? There
are many programs to do that, probably more than are needed. One possibility
is to use the editor:
$ ed junk
19 ed reports 19 characters in junk
1,$p Print lines 1 through last
To be or not to be File has only one line
q All done
$
ed begins by reporting the number of characters in junk; the command 1,$p
tells it to print all the lines in the file. After you learn how to use the editor,
you can be selective about the parts you print.
There are times when it’s not feasible to use an editor for printing. For
example, there is a limit — several thousand lines — on how big a file ed can
handle. Furthermore, it will only print one file at a time, and sometimes you
want to print several, one after another without pausing. So here are a couple
of alternatives.
First is cat, the simplest of all the printing commands. cat prints the con-
tents of all the files named by its arguments:
$ cat junk
To be or not to be
$ cat temp
That is the question.
$ cat junk temp
To be or not to be
That is the question.
$
The named file or files are catenated† (hence the name “cat”) onto the terminal
one after another with nothing between.
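That nothing is added between the files can be confirmed by counting characters. A small sketch, assuming the two files from earlier are recreated in a scratch directory; the | notation, which passes the output of one command to the next, belongs to a later section and is used here only for illustration.

```shell
cd "$(mktemp -d)" || exit 1
printf 'To be or not to be\n' >junk
printf 'That is the question.\n' >temp

cat junk temp            # both files, one after the other
cat junk temp | wc -c    # 41 characters: 19 + 22, with nothing in between
```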
There’s no problem with short files, but for long ones, if you have a high-speed
connection to your computer, you have to be quick with ctl-s to stop
output from cat before it flows off your screen. There is no “standard” com-
mand to print a file on a video terminal one screenful at a time, though almost
every UNIX system has one. Your system might have one called pg or more.
Ours is called p; we'll show you its implementation in Chapter 6.
Like cat, the command pr prints the contents of all the files named in a
list, but in a form suitable for line printers: every page is 66 lines (11 inches)
long, with the date and time that the file was changed, the page number, and
the filename at the top of each page, and extra lines to skip over the fold in
the paper. Thus, to print junk neatly, then skip to the top of a new page and
print temp neatly:
† “Catenate” is a slightly obscure synonym for “concatenate.”
$ pr junk temp
Sep 26 16:25 1983 junk Page 1
To be or not to be
(60 more blank lines)
Sep 26 16:26 1983 temp Page 1
That is the question.
(60 more blank lines)
$
pr can also produce multi-column output:
$ pr -3 filenames
prints each file in 3-column format. You can use any reasonable number in
place of “3” and pr will do its best. (The word filenames is a place-holder for
a list of names of files.) pr -m will print a set of files in parallel columns.
See pr(1).
It should be noted that pr is not a formatting program in the sense of re-
arranging lines and justifying margins. The true formatters are nroff and
troff, which are discussed in Chapter 9.
There are also commands that print files on a high-speed printer. Look in
your manual under names like lp and lpr, or look up “printer” in the permuted
index. Which to use depends on what equipment is attached to your
machine. pr and lpr are often used together; after pr formats the information
properly, lpr handles the mechanics of getting it to the line printer. We
will return to this a little later.
Moving, copying, removing files — mv, cp, rm
Let’s look at some other commands. The first thing is to change the name
of a file. Renaming a file is done by “moving” it from one name to another,
like this:
$ mv junk precious
This means that the file that used to be called junk is now called precious;
the contents are unchanged. If you run ls now, you will see a different list:
junk is not there but precious is.
$ ls
precious
temp
$ cat junk
cat: can’t open junk
$
Beware that if you move a file to another one that already exists, the target file
is replaced.
To make a copy of a file (that is, to have two versions of something), use
the cp command:
$ cp precious precious.save
makes a duplicate copy of precious in precious.save.
Finally, when you get tired of creating and moving files, the rm command
removes all the files you name:
$ rm temp junk
rm: junk nonexistent
$
You will get a warning if one of the files to be removed wasn’t there, but otherwise
rm, like most UNIX commands, does its work silently. There is no
prompting or chatter, and error messages are curt and sometimes unhelpful.
Brevity can be disconcerting to newcomers, but experienced users find talkative
commands annoying.
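The three commands can be tried together in a scratch directory, where a mistake costs nothing. This is a sketch under assumptions: mktemp and printf are conveniences of current systems, and cmp, which compares two files, is described later in this section.

```shell
cd "$(mktemp -d)" || exit 1
printf 'To be or not to be\n' >junk
printf 'That is the question.\n' >temp

mv junk precious              # rename: junk is gone, precious has its contents
cp precious precious.save     # now there are two copies
cmp precious precious.save    # prints nothing, because the copies are identical
rm temp                       # removed without any questions asked
ls                            # precious and precious.save remain
```

Notice that none of mv, cp and rm printed anything; silence means the work was done.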
What's in a filename?
So far we have used filenames without ever saying what a legal name is, so
it’s time for a couple of rules. First, filenames are limited to 14 characters.
Second, although you can use almost any character in a filename, common
sense says you should stick to ones that are visible, and that you should avoid
characters that might be used with other meanings. We have already seen, for
example, that in the ls command, ls -t means to list in time order. So if
you had a file whose name was -t, you would have a tough time listing it by
name. (How would you do it?) Besides the minus sign as a first character,
there are other characters with special meaning. To avoid pitfalls, you would
do well to use only letters, numbers, the period and the underscore until you're
familiar with the situation. (The period and the underscore are conventionally
used to divide filenames into chunks, as in precious.save above.) Finally,
don’t forget that case distinctions matter — junk, Junk, and JUNK are three
different names.
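One possible answer to the question about a file called -t, offered as a sketch rather than the only way: write the name so that it no longer begins with a minus sign. The ./ prefix, meaning "in the current directory", relies on the pathname notation of Section 1.3.

```shell
cd "$(mktemp -d)" || exit 1

# Create an empty file whose name is -t. Written as ./-t, the
# argument starts with a period, so it is not taken for an option.
touch ./-t

ls ./-t      # lists the file by name; prints ./-t
```

The same spelling works with rm when the time comes to get rid of the file.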
A handful of useful commands
Now that you have the rudiments of creating files, listing their names, and
printing their contents, we can look at a half-dozen file-processing commands.
To make the discussion concrete, we'll use a file called poem that contains a
familiar verse by Augustus De Morgan. Let’s create it with ed:
$ ed
a
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
.
w poem
263
q
$
The first command counts the lines, words and characters in one or more
files; it is named wc after its word-counting function:
$ wc poem
8 46 263 poem
$
That is, poem has 8 lines, 46 words, and 263 characters. The definition of a
“word” is very simple: any string of characters that doesn’t contain a blank,
tab or newline.
wc will count more than one file for you (and print the totals), and it will
also suppress any of the counts if requested. See wc(1).
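The individual counts can be requested one at a time. In the sketch below the poem is recreated with a here-document (the <<'EOF' ... EOF construction), which is an assumption on our part: it is a shell feature this chapter does not cover. Every second line of the poem begins with two blanks, and those blanks are counted as characters.

```shell
cd "$(mktemp -d)" || exit 1

# Recreate poem; every second line starts with two blanks.
cat >poem <<'EOF'
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
EOF

wc poem       # lines, words, characters: 8 46 263
wc -l poem    # lines only
wc -w poem    # words only
wc -c poem    # characters only
```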
The second command is called grep; it searches files for lines that match a
pattern. (The name comes from the ed command g/regular-expression/p,
which is explained in Appendix 1.) Suppose you want to look for the word
“fleas” in poem:
$ grep fleas poem
Great fleas have little fleas
And little fleas have lesser fleas,
And the great fleas themselves, in turn,
  have greater fleas to go on;
$
grep will also look for lines that don’t match the pattern, when the option -v
is used. (It’s named ‘v’ after the editor command; you can think of it as
inverting the sense of the match.)
$ grep -v fleas poem
  upon their backs to bite 'em,
  and so ad infinitum.
While these again have greater still,
  and greater still, and so on.
$
$
grep can be used to search several files; in that case it will prefix the
filename to each line that matches, so you can tell where the match took place.
There are also options for counting, numbering, and so on. grep will also
handle much more complicated patterns than just words like “fleas,” but we
will defer consideration of that until Chapter 4.
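The counting, numbering and several-file behavior can be sketched as follows. The file notes is a made-up second file, created only so that there is something else to search; the here-document used to recreate poem is a shell feature not covered in this chapter.

```shell
cd "$(mktemp -d)" || exit 1
cat >poem <<'EOF'
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
EOF
printf 'fleas are mentioned here too\n' >notes   # hypothetical second file

grep -c fleas poem        # count of matching lines: 4
grep -vc fleas poem       # count of lines that do not match: 4
grep -n fleas poem        # matching lines, each prefixed by its line number
grep fleas poem notes     # with two files, each match is prefixed by its filename
```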
The third command is sort, which sorts its input into alphabetical order
line by line. This isn’t very interesting for the poem, but let’s do it anyway,
just to see what it looks like:
$ sort poem
  and greater still, and so on.
  and so ad infinitum.
  have greater fleas to go on;
  upon their backs to bite 'em,
And little fleas have lesser fleas,
And the great fleas themselves, in turn,
Great fleas have little fleas
While these again have greater still,
$
The sorting is line by line, but the default sorting order puts blanks first, then
upper case letters, then lower case, so it’s not strictly alphabetical.
sort has zillions of options to control the order of sorting — reverse
order, numerical order, dictionary order, ignoring leading blanks, sorting on
fields within the line, etc. — but usually one has to look up those options to be
sure of them. Here are a handful of the most common:
sort -r     Reverse normal order
sort -n     Sort in numeric order
sort -nr    Sort in reverse numeric order
sort -f     Fold upper and lower case together
sort +n     Sort starting at n+1-st field
Chapter 4 has more information about sort.
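A few of these options in action, as a sketch under stated assumptions. Many current systems sort according to local language rules unless told otherwise, so the example sets LC_ALL=C, an environment setting not mentioned in the text, to ask for the plain character order described above (blanks, then upper case, then lower case). The file numbers is made up for the illustration.

```shell
cd "$(mktemp -d)" || exit 1
cat >poem <<'EOF'
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
EOF

LC_ALL=C sort poem       # lines with leading blanks first, then upper case
LC_ALL=C sort -r poem    # the same order, reversed

# Character order versus numeric order on a hypothetical file of numbers.
printf '10\n9\n100\n' >numbers
LC_ALL=C sort numbers    # 10, 100, 9: compared character by character
sort -n numbers          # 9, 10, 100: compared as numbers
sort -nr numbers         # 100, 10, 9
```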
Another file-examining command is tail, which prints the last 10 lines of
a file. That’s overkill for our eight-line poem, but it’s good for larger files.
Furthermore, tail has an option to specify the number of lines, so to print
the last line of poem:
$ tail -1 poem
  and greater still, and so on.
$
tail can also be used to print a file starting at a specified line:
$ tail +3 filename
starts printing with the 3rd line. (Notice the natural inversion of the minus
sign convention for arguments.)
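On current systems the same two requests are usually spelled with an explicit -n option, and the older forms shown above may or may not be accepted. A minimal sketch, assuming a tail that follows the present-day convention:

```shell
cd "$(mktemp -d)" || exit 1
cat >poem <<'EOF'
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
EOF

tail -n 1 poem     # the last line only
tail -n +3 poem    # from the 3rd line to the end: six lines
```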
The final pair of commands is for comparing files. Suppose that we have a
variant of poem in the file new_poem:
$ cat poem
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
$ cat new_poem
Great fleas have little fleas
  upon their backs to bite them,
And little fleas have lesser fleas,
  and so on ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
$
There’s not much difference between the two files; in fact you'll have to look
hard to find it. This is where file comparison commands come in handy. cmp
finds the first place where two files differ:
$ cmp poem new_poem
poem new_poem differ: char 58, line 2
$
This says that the files are different in the second line, which is true enough,
but it doesn’t say what the difference is, nor does it identify any differences
beyond the first.
The other file comparison command is diff, which reports on all lines that
are changed, added or deleted:
$ diff poem new_poem
2c2
<   upon their backs to bite 'em,
---
>   upon their backs to bite them,
4c4
<   and so ad infinitum.
---
>   and so on ad infinitum.
$
This says that line 2 in the first file (poem) has to be changed into line 2 of the
second file (new_poem), and similarly for line 4.
Generally speaking, cmp is used when you want to be sure that two files
really have the same contents. It’s fast and it works on any kind of file, not
just text. diff is used when the files are expected to be somewhat different,
and you want to know exactly which lines differ. diff works only on files of
text.
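Both comparisons can be reproduced as follows. This is a sketch with assumptions: the || echo part, which prints a note when a command signals that the files differ, uses shell notation that is not described in this chapter, and the exact wording of the cmp and diff messages varies somewhat between systems.

```shell
cd "$(mktemp -d)" || exit 1
cat >poem <<'EOF'
Great fleas have little fleas
  upon their backs to bite 'em,
And little fleas have lesser fleas,
  and so ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
EOF
cat >new_poem <<'EOF'
Great fleas have little fleas
  upon their backs to bite them,
And little fleas have lesser fleas,
  and so on ad infinitum.
And the great fleas themselves, in turn,
  have greater fleas to go on;
While these again have greater still,
  and greater still, and so on.
EOF

# cmp stops at the first difference: character 58, in line 2.
cmp poem new_poem || echo "cmp: the files differ"

# diff goes on to report every changed line.
diff poem new_poem || echo "diff: the files differ"

# Comparing a file with itself produces no output at all.
cmp poem poem
```

The position 58 is the 30 characters of the first line (29 plus the newline) and then 28 characters into the second, counting its two leading blanks.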
A summary of file system commands
Table 1.1 is a brief summary of the commands we’ve seen so far that deal
with files.
1.3 More about files: directories
The system distinguishes your file called junk from anyone else’s of the
same name. The distinction is made by grouping files into directories, rather
in the way that books are placed on shelves in a library, so files in different
directories can have the same name without any conflict.
Generally each user has a personal or home directory, sometimes called
login directory, that contains only the files that belong to him or her. When
you log in, you are “in” your home directory. You may change the directory
you are working in — often called your working or current directory — but
your home directory is always the same. Unless you take special action, when
you create a new file it is made in your current directory. Since this is initially
your home directory, the file is unrelated to a file of the same name that might
exist in someone else’s directory.
A directory can contain other directories as well as ordinary files (“Great
directories have lesser directories ...”). The natural way to picture this organi-
zation is as a tree of directories and files. It is possible to move around within
this tree, and to find any file in the system by starting at the root of the tree
and moving along the proper branches. Conversely, you can start where you
are and move toward the root.
Table 1.1: Common File System Commands

ls                        list names of all files in current directory
ls filenames              list only the named files
ls -t                     list in time order, most recent first
ls -l                     list long: more information; also ls -lt
ls -u                     list by time last used; also ls -lu, ls -lut
ls -r                     list in reverse order; also -rt, -rlt, etc.
ed filename               edit named file
cp file1 file2            copy file1 to file2, overwrite old file2 if it exists
mv file1 file2            move file1 to file2, overwrite old file2 if it exists
rm filenames              remove named files, irrevocably
cat filenames             print contents of named files
pr filenames              print contents with header, 66 lines per page
pr -n filenames           print in n columns
pr -m filenames           print named files side by side (multiple columns)
wc filenames              count lines, words and characters for each file
wc -l filenames           count lines for each file
grep pattern filenames    print lines matching pattern
grep -v pattern files     print lines not matching pattern
sort filenames            sort files alphabetically by line
tail filename             print last 10 lines of file
tail -n filename          print last n lines of file
tail +n filename          start printing file at line n
cmp file1 file2           print location of first difference
diff file1 file2          print all differences between files

Let’s try the latter first. Our basic tool is the command pwd (“print working
directory”), which prints the name of the directory you are currently in:
$ pwd
/usr/you
$
This says that you are currently in the directory you, in the directory usr,
which in turn is in the root directory, which is conventionally called just ‘/’.
The / characters separate the components of the name; the limit of 14 charac-
ters mentioned above applies to each component of such a name. On many
systems, /usr is a directory that contains the directories of all the normal
users of the system. (Even if your home directory is not /usr/you, pwd will
print something analogous, so you should be able to follow what happens
below.)
If you now type
$ ls /usr/you
you should get exactly the same list of file names as you get from a plain ls.
When no arguments are provided, ls lists the contents of the current directory;
given the name of a directory, it lists the contents of that directory.
Next, try
$ ls /usr
This should print a long series of names, among which is your own login direc-
tory you.
The next step is to try listing the root itself. You should get a response
similar to this:
$ ls /
bin
boot
dev
etc
lib
tmp
unix
usr
$
(Don’t be confused by the two meanings of /: it’s both the name of the root
and a separator in filenames.) Most of these are directories, but unix is actu-
ally a file containing the executable form of the UNIX kernel. More on this in
Chapter 2.
Now try
$ cat /usr/you/junk
(if junk is still in your directory). The name
/usr/you/junk
is called the pathname of the file. “Pathname” has an intuitive meaning: it
represents the full name of the path from the root through the tree of direc-
tories to a particular file. It is a universal rule in the UNIX system that wher-
ever you can use an ordinary filename, you can use a pathname.
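The rule is easy to check in a scratch directory, whatever its pathname turns out to be. In this sketch the variable here and the $(...) notation, which captures the output of a command, are conveniences of current shells and are our assumptions, not something the text has introduced.

```shell
cd "$(mktemp -d)" || exit 1
printf 'To be or not to be\n' >junk

here=$(pwd)          # the pathname of the current directory
ls                   # plain listing of the current directory
ls "$here"           # the same listing, naming the directory by its pathname
cat junk             # an ordinary filename
cat "$here/junk"     # the full pathname of the same file
```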
The file system is structured like a genealogical tree; here is a picture that
may make it clearer.
[Figure: the file system drawn as a tree. The root / contains bin, dev, etc, usr, tmp, unix and boot; usr contains user directories such as you, mike and paul, which in turn hold files such as junk and temp.]
Your file named junk is unrelated to Paul’s or to Mary’s.
Pathnames aren’t too exciting if all the files of interest are in your own
directory, but if you work with someone else or on several projects con-
currently, they become handy indeed. For example, your friends can print
your junk by saying
$ cat /usr/you/junk
Similarly, you can find out what files Mary has by saying
$ ls /usr/mary
data
junk
$
or make your own copy of one of her files by
$ cp /usr/mary/data data
or edit her file:
$ ed /usr/mary/data
If Mary doesn’t want you poking around in her files, or vice versa, privacy
can be arranged. Each file and directory has read-write-execute permissions
for the owner, a group, and everyone else, which can be used to control access.
(Recall ls -l.) In our local systems, most users most of the time find openness
of more benefit than privacy, but policy may be different on your system,
so we'll get back to this in Chapter 2.
As a final set of experiments with pathnames, try
$ ls /bin /usr/bin
Do some of the names look familiar? When you run a command by typing its
name after the prompt, the system looks for a file of that name. It normally
looks first in your current directory (where it probably doesn’t find it), then in
/bin, and finally in /usr/bin. There is nothing special about commands
like cat or ls, except that they have been collected into a couple of directories
to be easy to find and administer. To verify this, try to execute some of
these programs by using their full pathnames:
$ /bin/date
Mon Sep 26 23:29:32 EDT 1983
$ /bin/who
srm tty1 Sep 26 22:20
cvw tty4 Sep 26 22:40
you tty5 Sep 26 23:04
$
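On a current system the commands may live in directories other than /bin and /usr/bin, so it can help to ask the shell where it found one. A sketch, assuming a shell that provides command -v, a facility the text does not describe:

```shell
# Print the pathname of the file the shell would run for "date".
command -v date

# Running the command by that pathname behaves like typing plain date.
"$(command -v date)"
```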
Exercise 1-3. Try
$ ls /usr/games
and do whatever comes naturally. Things might be more fun outside of normal working
hours.
Changing directory — cd
If you work regularly with Mary on information in her directory, you can
say “I want to work on Mary’s files instead of my own.” This is done by
changing your current directory with the cd command:
$ cd /usr/mary
Now when you use a filename (without /’s) as an argument to cat or pr, it
refers to the file in Mary’s directory. Changing directories doesn’t affect any
permissions associated with a file — if you couldn’t access a file from your
own directory, changing to another directory won’t alter that fact.
It is usually convenient to arrange your own files so that all the files related
to one thing are in a directory separate from other projects. For example, if
you want to write a book, you might want to keep all the text in a directory
called book. The command mkdir makes a new directory.
$ mkdir book          Make a directory
$ cd book             Go to it
$ pwd                 Make sure you're in the right place
/usr/you/book
...                   Write the book (several minutes pass)
$ cd ..               Move up one level in file system
$ pwd
/usr/you
$
'..' refers to the parent of whatever directory you are currently in, the directory
one level closer to the root. '.' is a synonym for the current directory.
$ cd                  Return to home directory
all by itself will take you back to your home directory, the directory where you
log in.
Once your book is published, you can clean up the files. To remove the
directory book, remove all the files in it (we'll show a fast way shortly), then
cd to the parent directory of book and type
$ rmdir book
rmdir will only remove an empty directory.
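To see that restriction in action, here is a small sketch that works in a scratch directory so nothing real is touched. The mktemp command and the $(...) notation are assumptions about a modern POSIX system, and the names book and ch1 are only illustrative.

```shell
# Work in a throwaway directory (mktemp is an assumption, not a 1983 command).
scratch=$(mktemp -d) && cd "$scratch" || exit 1

mkdir book                   # make a directory
echo "Chapter 1" >book/ch1   # put one file in it

# rmdir refuses, because book is not empty
if rmdir book 2>/dev/null; then first=removed; else first=refused; fi

rm book/ch1                  # remove the file ...
rmdir book                   # ... and now rmdir succeeds
if [ -d book ]; then second=still-there; else second=removed; fi

echo "first attempt: $first, second attempt: $second"
cd / && rm -rf "$scratch"
```

The script prints "first attempt: refused, second attempt: removed".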
1.4 The shell
When the system prints the prompt $ and you type commands that get exe-
cuted, it’s not the kernel that is talking to you, but a go-between called the
command interpreter or shell. The shell is just an ordinary program like date
or who, although it can do some remarkable things. The fact that the shell sits
between you and the facilities of the kernel has real benefits, some of which
we'll talk about here. There are three main ones:
Filename shorthands: you can pick up a whole set of filenames as argu-
ments to a program by specifying a pattern for the names — the shell will
find the filenames that match your pattern.
Input-output redirection: you can arrange for the output of any program to
go into a file instead of onto the terminal, and for the input to come from a
file instead of the terminal. Input and output can even be connected to
other programs.
Personalizing the environment: you can define your own commands and
shorthands.
Filename shorthand
Let’s begin with filename patterns. Suppose you're typing a large document
like a book. Logically this divides into many small pieces, like chapters and
perhaps sections. Physically it should be divided too, because it is cumbersome
to edit large files. Thus you should type the document as a number of files.
You might have separate files for each chapter, called ch1, ch2, etc. Or, if
each chapter were broken into sections, you might create files called
ch1.1
ch1.2
ch1.3
ch2.1
ch2.2
which is the organization we used for this book. With a systematic naming
convention, you can tell at a glance where a particular file fits into the whole.
What if you want to print the whole book? You could say
$ pr ch1.1 ch1.2 ch1.3 ...
but you would soon get bored typing filenames and start to make mistakes.
This is where filename shorthand comes in. If you say
$ pr ch*
the shell takes the * to mean “any string of characters,” so ch* is a pattern
that matches all filenames in the current directory that begin with ch. The
shell creates the list, in alphabetical† order, and passes the list to pr. The pr
command never sees the *; the pattern match that the shell does in the current
directory generates a list of strings that are passed to pr.
The crucial point is that filename shorthand is not a property of the pr
command, but a service of the shell. Thus you can use it to generate a
sequence of filenames for any command. For example, to count the words in
the first chapter:
$ wc ch1.*
    113     562    3200 ch1.0
    935    4081   22435 ch1.1
    974    4191   22756 ch1.2
    378    1561    8481 ch1.3
   1293    5298   28841 ch1.4
     33     194    1190 ch1.5
     75     323    2030 ch1.6
   3801   16210   88933 total
$
There is a program called echo that is especially valuable for experiment-
ing with the meaning of the shorthand characters. As you might guess, echo
does nothing more than echo its arguments:
$ echo hello world
hello world
$
But the arguments can be generated by pattern-matching:
$ echo ch1.*
lists the names of all the files in Chapter 1,
$ echo *
lists all the filenames in the current directory in alphabetical order,
$ pr *
prints all your files (in alphabetical order), and
† Again, the order is not strictly alphabetical, in that upper case letters come before lower case
letters. See ascii(7) for the ordering of the characters used in the sort.
$ rm *
removes all files in your current directory. (You had better be very sure that’s
what you wanted to say!)
The * is not limited to the last position in a filename — *’s can be any-
where and can occur several times. Thus
$ rm *.save
removes all files that end with .save.
Notice that the filenames are sorted alphabetically, which is not the same as
numerically. If your book has ten chapters, the order might not be what you
intended, since ch10 comes before ch2:
$ echo *
ch1.1 ch1.2 ... ch10.1 ch10.2 ... ch2.1 ch2.2 ...
$
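The ordering is easy to reproduce. The following is a sketch under a few assumptions: a modern POSIX shell, the mktemp command, and the C locale (set explicitly here, because other locales may sort punctuation differently). The filenames are illustrative.

```shell
# Use the plain ASCII ordering the text describes; other locales may differ.
LC_ALL=C; export LC_ALL
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch ch1.1 ch2.1 ch10.1 notes

# The shell, not echo, expands ch* into the sorted list of matching names.
listing=$(echo ch*)
echo "$listing"

cd / && rm -rf "$scratch"
```

This prints "ch1.1 ch10.1 ch2.1": chapter 10 sorts ahead of chapter 2 because the comparison is made character by character, not by numeric value.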
The * is not the only pattern-matching feature provided by the shell,
although it’s by far the most frequently used. The pattern [...] matches any
of the characters inside the brackets. A range of consecutive letters or digits
can be abbreviated:
$ pr ch[12346789]*    Print chapters 1,2,3,4,6,7,8,9 but not 5
$ pr ch[1-46-9]*      Same thing
$ rm temp[a-z]        Remove any of tempa, ..., tempz that exist
The ? pattern matches any single character:
$ ls ?                List files with single-character names
$ ls -l ch?.1         List ch1.1 ch2.1 ch3.1, etc. but not ch10.1
$ rm temp?            Remove files temp1, ..., tempa, etc.
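A harmless way to try the bracket and ? patterns is to echo them in a scratch directory. This sketch assumes a modern POSIX shell, the mktemp command and the C locale; the filenames are made up for the experiment.

```shell
LC_ALL=C; export LC_ALL
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch temp1 tempa tempab ch1.1 ch5.1 ch10.1

single=$(echo temp?)        # exactly one character after temp
bracket=$(echo ch[1-4].1)   # one character from the range 1 through 4, then .1
star=$(echo ch[1-4]*)       # a character from the range, then any string

echo "temp?      -> $single"
echo "ch[1-4].1  -> $bracket"
echo "ch[1-4]*   -> $star"
cd / && rm -rf "$scratch"
```

Here temp? yields "temp1 tempa" (tempab has two trailing characters), ch[1-4].1 yields only "ch1.1", and ch[1-4]* yields "ch1.1 ch10.1".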
Note that the patterns match only existing filenames. In particular, you cannot
make up new filenames by using patterns. For example, if you want to expand
ch to chapter in each filename, you cannot do it this way:
$ mv ch.* chapter.*   Doesn't work!
because chapter.* matches no existing filenames.
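One way to get the intended effect is to let the pattern match the existing names and build each new name from the old one. This is only a sketch, not the book's method: the for loop is standard shell, but the ${f#ch} notation (strip a leading "ch") and mktemp are assumptions about a modern POSIX system.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch ch.1 ch.2 ch.3

# ch.* matches the files that exist; the new name is computed per file.
# ${f#ch} removes the leading "ch" (POSIX syntax, an assumption here).
for f in ch.*
do
    mv "$f" "chapter${f#ch}"
done

renamed=$(echo *)
echo "$renamed"
cd / && rm -rf "$scratch"
```

The directory now holds chapter.1, chapter.2 and chapter.3.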
Pattern characters like * can be used in pathnames as well as simple
filenames; the match is done for each component of the path that contains a
special character. Thus /usr/mary/* performs the match in /usr/mary,
and /usr/*/calendar generates a list of pathnames of all user calendar
files.
If you should ever have to turn off the special meaning of *, ?, etc.,
enclose the entire argument in single quotes, as in
$ ls '?'
You can also precede a special character with a backslash:
$ ls \?
(Remember that because ? is not the erase or line kill character, this backslash
is interpreted by the shell, not by the kernel.) Quoting is treated at length in
Chapter 3.
Exercise 1-4. What are the differences among these commands?
$ ls junk             $ echo junk
$ ls /                $ echo /
$ ls                  $ echo
$ ls *                $ echo *
$ ls '*'              $ echo '*'
Input-output redirection
Most of the commands we have seen so far produce output on the terminal;
some, like the editor, also take their input from the terminal. It is nearly
universal that the terminal can be replaced by a file for either or both of input
and output. As one example,
$ ls
makes a list of filenames on your terminal. But if you say
$ ls >filelist
that same list of filenames will be placed in the file filelist instead. The
symbol > means “put the output in the following file, rather than on the termi-
nal.” The file will be created if it doesn’t already exist, or the previous con-
tents overwritten if it does. Nothing is produced on your terminal. As
another example, you can combine several files into one by capturing the out-
put of cat in a file:
$ cat f1 f2 f3 >temp
The symbol >> operates much as > does, except that it means “add to the
end of.” That is,
$ cat f1 f2 f3 >>temp
copies the contents of f1, f2 and f3 onto the end of whatever is already in
temp, instead of overwriting the existing contents. As with >, if temp doesn’t
exist, it will be created initially empty for you.
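The difference between > and >> shows up clearly in a short experiment. This sketch assumes a modern POSIX shell with mktemp; f1, f2 and temp are scratch files, and tr is used only to strip the padding some versions of wc print.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
echo one >f1
echo two >f2

cat f1 f2 >temp          # temp is created and holds two lines
cat f1 >>temp            # a third line is added to the end
after_append=$(wc -l <temp | tr -d ' ')

cat f2 >temp             # the previous contents are overwritten
after_overwrite=$(cat temp)

echo "lines after >>: $after_append"
echo "contents after >: $after_overwrite"
cd / && rm -rf "$scratch"
```

The line count after the append is 3, and after the final > the file holds only "two".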
In a similar way, the symbol < means to take the input for a program from
the following file, instead of from the terminal. Thus, you can prepare a letter
in file let, then send it to several people with
$ mail mary joe tom bob <let
In all of these examples, blanks are optional on either side of > or <, but our
formatting is traditional.
Given the capability of redirecting output with >, it becomes possible to
combine commands to achieve effects not possible otherwise. For example, to
print an alphabetical list of users,
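$ who >temp
$ sort <temp
In the same way, you can count the users, count the files in the current directory,
list the filenames in three columns, or look for a particular user:
$ who >temp
$ wc -l <temp
$ ls >temp
$ wc -l <temp
$ ls >temp
$ pr -3 <temp
$ who >temp
$ grep mary <temp
In all of these examples, the interpretation of > and < is being done by the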
shell, not by the individual programs. Centralizing the facility in the shell
means that input and output redirection can be used with any program; the
program itself isn’t aware that something unusual has happened.
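This brings up an important convention. The command
$ sort <temp
sorts the contents of the file temp, as does
$ sort temp
In the first case the shell redirects the file to the standard input of sort; in the
second, sort is given the filename as an argument and opens the file itself. A
command that is given no filenames reads its standard input.
Exercise 1-5. Explain why
$ ls >ls.out
causes ls.out to be included in the list of names.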
Exercise 1-6. Explain the output from
$ wc temp >temp
If you misspell a command name, as in
$ woh >temp
what happens?
Pipes
All of the examples at the end of the previous section rely on the same
trick: putting the output of one program into the input of another via a tem-
porary file. But the temporary file has no other purpose; indeed, it’s clumsy to
have to use such a file. This observation leads to one of the fundamental con-
tributions of the UNIX system, the idea of a pipe. A pipe is a way to connect
the output of one program to the input of another program without any tem-
porary file; a pipeline is a connection of two or more programs through pipes.
Let us revise some of the earlier examples to use pipes instead of tem-
poraries. The vertical bar character | tells the shell to set up a pipeline:
$ who | sort          Print sorted list of users
$ who | wc -l         Count users
$ ls | wc -l          Count files
$ ls | pr -3          3-column list of filenames
$ who | grep mary     Look for particular user
Any program that reads from the terminal can read from a pipe instead;
any program that writes on the terminal can write to a pipe. This is where the
convention of reading the standard input when no files are named pays off: any
program that adheres to the convention can be used in pipelines. grep, pr,
sort and wc are all used that way in the pipelines above.
You can have as many programs in a pipeline as you wish:
$ ls | pr -3 | lpr
creates a 3-column list of filenames on the line printer, and
$ who | grep mary | wc -l
counts how many times Mary is logged in.
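To convince yourself that a pipe gives the same answer as a temporary file, compare the two directly. This is a sketch assuming a modern POSIX shell and mktemp; the temporary file is deliberately kept outside the directory being listed so that it does not count itself.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a b c

# With a temporary file (placed next to, not inside, the scratch directory)
ls >"$scratch.list"
via_file=$(wc -l <"$scratch.list" | tr -d ' ')
rm "$scratch.list"

# With a pipe: no temporary file at all
via_pipe=$(ls | wc -l | tr -d ' ')

echo "temporary file: $via_file, pipe: $via_pipe"
cd / && rm -rf "$scratch"
```

Both counts are 3; the pipe simply removes the bookkeeping.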
The programs in a pipeline actually run at the same time, not one after
another. This means that the programs in a pipeline can be interactive; the
kernel looks after whatever scheduling and synchronization is needed to make
it all work.
As you probably suspect by now, the shell arranges things when you ask for
a pipe; the individual programs are oblivious to the redirection. Of course,
programs have to operate sensibly if they are to be combined this way. Most
commands follow a common design, so they will fit properly into pipelines at
any position. Normally a command invocation looks like
command optional-arguments optional-filenames
If no filenames are given, the command reads its standard input, which is by
default the terminal (handy for experimenting) but which can be redirected to
come from a file or a pipe. At the same time, on the output side, most com-
mands write their output on the standard output, which is by default sent to the
terminal. But it too can be redirected to a file or a pipe.
Error messages from commands have to be handled differently, however,
or they might disappear into a file or down a pipe. So each command has a
standard error output as well, which is normally directed to your terminal.
Or, as a picture:
standard input  ---->  command,  ---->  standard
or files               options          output
                          |
                          v
                       standard
                       error
Almost all of the commands we have talked about so far fit this model; the
only exceptions are commands like date and who that read no input, and a
few like cmp and diff that have a fixed number of file inputs. (But look at
the ‘-’ option on these.)
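The separation of the two output streams can be seen by capturing each in its own file. This sketch assumes a modern POSIX shell and mktemp; the 2> notation for redirecting the standard error has not been introduced at this point in the text, and the wording of the error message varies from system to system.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
echo hello >present

# cat writes the contents of present to its standard output and a
# complaint about the missing file to its standard error.
status=0
cat present no-such-file >out 2>err || status=$?

stdout_text=$(cat out)
err_nonempty=no
[ -s err ] && err_nonempty=yes

echo "standard output captured: $stdout_text"
echo "standard error non-empty: $err_nonempty (exit status $status)"
cd / && rm -rf "$scratch"
```

The file out holds only "hello"; the complaint went to err and did not get mixed into the data.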
Exercise 1-7. Explain the difference between
$ who | sort
and
$ who >sort
Processes
The shell does quite a few things besides setting up pipes. Let us turn
briefly to the basics of running more than one program at a time, since we
have already seen a bit of that with pipes. For example, you can run two pro-
grams with one command line by separating the commands with a semicolon;
the shell recognizes the semicolon and breaks the line into two commands:
$ date; who
Tue Sep 27 01:03:17 EDT 1983
ken      tty0    Sep 27 00:43
dmr      tty1    Sep 26 23:45
rob      tty2    Sep 26 23:59
bwk      tty3    Sep 27 00:06
3        tty4    Sep 26 23:31
you      tty5    Sep 26
ber      tty7    Sep 26
Both commands are executed (in sequence) before the shell returns with a
prompt character.
You can also have more than one program running simultaneously if you
wish. For example, suppose you want to do something time-consuming like
counting the words in your book, but you don’t want to wait for wc to finish
before you start something else. Then you can say
$ wc ch* >wc.out &
6944                  Process-id printed by the shell
$
The ampersand & at the end of a command line says to the shell “start this
command running, then take further commands from the terminal immedi-
ately,” that is, don’t wait for it to complete. Thus the command will begin,
but you can do something else while it’s running. Directing the output into the
file wc.out keeps it from interfering with whatever you're doing at the same
time.
An instance of a running program is called a process. The number printed
by the shell for a command initiated with & is called the process-id; you can
use it in other commands to refer to a specific running program.
It’s important to distinguish between programs and processes. wc is a program;
each time you run the program wc, that creates a new process. If
several instances of the same program are running at the same time, each is a
separate process with a different process-id.
If a pipeline is initiated with &, as in
$ pr ch* | lpr &
6951                  Process-id of lpr
$
the processes in it are all started at once — the & applies to the whole pipeline.
Only one process-id is printed, however, for the last process in the sequence.
The command
$ wait
waits until all processes initiated with & have finished. If it doesn’t return
immediately, you have commands still running. You can interrupt wait with
DELETE.
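In a script, the process-id that the shell prints for an interactive & command is available in the variable $!. The following sketch uses it together with wait. It assumes a POSIX shell; sleep stands in for a long-running command, and kill -0, which only tests whether a process exists, is an assumption beyond what the text covers.

```shell
sleep 1 &            # start a background process
pid=$!               # $! holds the process-id of the last & command
echo "started process $pid"

wait                 # returns only when every background process has finished

# kill -0 sends no signal; it merely checks whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then state=running; else state=finished; fi
echo "process $pid is $state"
```

After wait returns, the process is reported as finished.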
You can use the process-id printed by the shell to stop a process initiated
with &:
$ kill 6944
If you forget the process-id, you can use the command ps to tell you about
everything you have running. If you are desperate, kill 0 will kill all your
processes except your login shell. And if you're curious about what other users
are doing, ps -ag will tell you about all processes that are currently running.
Here is some sample output:
$ ps -ag
  PID TTY TIME CMD
   36 co  6:29 /etc/cron
 6423 5   0:02 -sh
 6704 1   0:04 -sh
 6722 1   0:12 vi paper
 4430 2   0:03 -sh
 6612 7   0:03 -sh
 6628 7   1:13 rogue
 6843 2   0:02 write dmr
 6949 4   0:01 login bimmler
 6952 5   0:08 pr ch1.1 ch1.2 ch1.3 ch1.4
 6951 5   0:03 lpr
 6959 5   0:02 ps -ag
 6844 1   0:02 write rob
$
PID is the process-id; TTY is the terminal associated with the process (as in
who); TIME is the processor time used in minutes and seconds; and the rest is
the command being run. ps is one of those commands that is different on dif-
ferent versions of the system, so your output may not be formatted like this.
Even the arguments may be different — see the manual page ps(1).
Processes have the same sort of hierarchical structure that files do: each
process has a parent, and may well have children. Your shell was created by a
process associated with whatever terminal line connects you to the system. As
you run commands, those processes are the direct children of your shell. If
you run a program from within one of those, for example with the ! command
to escape from ed, that creates its own child process which is thus a grandchild
of the shell.
Sometimes a process takes so long that you would like to start it running,
then turn off the terminal and go home without waiting for it to finish. But if
you turn off your terminal or break your connection, the process will normally
be killed even if you used &. The command nohup (“no hangup”) was
created to deal with this situation: if you say
$ nohup command &
the command will continue to run if you log out. Any output from the com-
mand is saved in a file called nohup.out. There is no way to nohup a com-
mand retroactively.
If your process will take a lot of processor resources, it is kind to those who
share your system to run your job with lower than normal priority; this is done
by another program called nice:
$ nice expensive-command &
nohup automatically calls nice, because if you're going to log out you can
afford to have the command take a little longer.
Finally, you can simply tell the system to start your process at some wee
hour of the morning when normal people are asleep, not computing. The com-
mand is called at(1):
$ at time
whatever commands
you want ...
ctl-d
$
This is the typical usage, but of course the commands could come from a file:
$ at 3am temp
$ ed ch2.1
1534
r temp
168
od produces text on its standard output, which can then be used anywhere text
can be used. This uniformity is unusual; most systems have several file for-
mats, even for text, and require negotiation by a program or a user to create a
file of a particular type. In UNIX systems there is just one kind of file, and all
that is required to access a file is its name.†
The lack of file formats is an advantage overall — programmers needn’t
worry about file types, and all the standard programs will work on any file —
but there are a handful of drawbacks. Programs that sort and search and edit
really expect text as input: grep can’t examine binary files correctly, nor can
sort sort them, nor can any standard editor manipulate them.
There are implementation limitations with most programs that expect text as
input. We tested a number of programs on a 30,000 byte text file containing
no newlines, and surprisingly few behaved properly, because most programs
make unadvertised assumptions about the maximum length of a line of text
(for an exception, see the BUGS section of sort(1)).
† There’s a good test of file system uniformity, due originally to Doug McIlroy, that the UNIX file
system passes handily. Can the output of a FORTRAN program be used as input to the FORTRAN
compiler? A remarkable number of systems have trouble with this test.
Non-text files definitely have their place. For example, very large data-
bases usually need extra address information for rapid access; this has to be
binary for efficiency. But every file format that is not text must have its own
family of support programs to do things that the standard tools could perform
if the format were text. Text files may be a little less efficient in machine
cycles, but this must be balanced against the cost of extra software to maintain
more specialized formats. If you design a file format, you should think care-
fully before choosing a non-textual representation. (You should also think
about making your programs robust in the face of long input lines.)
2.3 Directories and filenames
All the files you own have unambiguous names, starting with /usr/you,
but if the only file you have is junk, and you type ls, it doesn’t print
/usr/you/junk; the filename is printed without any prefix:
$ ls
junk
$
That is because each running program, that is, each process, has a current
directory, and all filenames are implicitly assumed to start with the name of
that directory, unless they begin directly with a slash. Your login shell, and
ls, therefore have a current directory. The command pwd (print working
directory) identifies the current directory:
$ pwd
/usr/you
$
The current directory is an attribute of a process, not a person or a program
— people have login directories, processes have current directories. If a pro-
cess creates a child process, the child inherits the current directory of its
parent. But if the child then changes to a new directory, the parent is unaf-
fected — its current directory remains the same no matter what the child does.
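The independence of parent and child is easy to check. In this sketch the $(...) notation, an assumption about a modern POSIX shell, runs the enclosed commands in a child shell process and captures what they print.

```shell
start=$(pwd)

# The commands inside $(...) run in a child shell; its cd is private to it.
child=$(cd / && pwd)

after=$(pwd)
echo "parent before: $start"
echo "child:         $child"
echo "parent after:  $after"
```

The child reports /, while the parent's current directory is the same before and after.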
The notion of a current directory is certainly a notational convenience,
because it can save a lot of typing, but its real purpose is organizational.
Related files belong together in the same directory. /usr is often the top
directory of the user file system. (user is abbreviated to usr in the same
spirit as cmp, ls, etc.) /usr/you is your login directory, your current directory
when you first log in. /usr/src contains source for system programs,
/usr/src/cmd contains source for UNIX commands, /usr/src/cmd/sh
contains the source files for the shell, and so on. Whenever you embark on a
new project, or whenever you have a set of related files, say a set of recipes,
you could create a new directory with mkdir and put the files there.
$ pwd
/usr/you
$ mkdir recipes
$ cd recipes
$ pwd
/usr/you/recipes
$ mkdir pie cookie
$ ed pie/apple
$ ed cookie/choc.chip
$
Notice that it is simple to refer to subdirectories. pie/apple has an obvious
meaning: the apple pie recipe, in directory /usr/you/recipes/pie. You
could instead have put the recipe in, say, recipes/apple.pie, rather than
in a subdirectory of recipes, but it seems better organized to put all the pies
together, too. For example, the crust recipe could be kept in
recipes/pie/crust rather than duplicating it in each pie recipe.
Although the file system is a powerful organizational tool, you can forget
where you put a file, or even what files you’ve got. The obvious solution is a
command or two to rummage around in directories. The ls command is certainly
helpful for finding files, but it doesn’t look in sub-directories.
$ cd
$ ls
junk
recipes
$ file *
junk:      ascii text
recipes:   directory
$ ls recipes
cookie
pie
$ ls recipes/pie
apple
crust
$
This piece of the file system can be shown pictorially as:
                /usr/you
               /        \
           junk          recipes
                        /       \
                     pie         cookie
                    /   \           |
               apple     crust   choc.chip
The command du (disc usage) was written to tell how much disc space is
consumed by the files in a directory, including all its subdirectories.
$ du
6       ./recipes/pie
4       ./recipes/cookie
11      ./recipes
13      .
$
The filenames are obvious; the numbers are the number of disc blocks — typi-
cally 512 or 1024 bytes each — of storage for each file. The value for a direc-
tory indicates how many blocks are consumed by all the files in that directory
and its subdirectories, including the directory itself.
du has an option -a, for “all,” that causes it to print out all the files in a
directory. If one of those is a directory, du processes that as well:
$ du -a
2       ./recipes/pie/apple
3       ./recipes/pie/crust
6       ./recipes/pie
3       ./recipes/cookie/choc.chip
4       ./recipes/cookie
11      ./recipes
1       ./junk
13      .
The output of du -a can be piped through grep to look for specific files:
$ du -a | grep choc
3 ./recipes/cookie/choc.chip
$
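For comparison, here is a sketch that locates the same file both ways in a scratch copy of the recipes hierarchy. It assumes a modern POSIX system: mktemp, the -name and -print options of find, and the sed expression that strips the block count from the du output are all assumptions beyond the text.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
mkdir recipes recipes/pie recipes/cookie
echo flour >recipes/pie/crust
echo chips >recipes/cookie/choc.chip

# du -a lists every file with its size; grep picks out the name we want,
# and sed removes the leading block count so that only the path remains.
by_du=$(du -a | grep choc | sed 's/^[0-9]*[[:space:]]*//')

# find searches the hierarchy by name directly.
by_find=$(find . -name 'choc*' -print)

echo "du:   $by_du"
echo "find: $by_find"
cd / && rm -rf "$scratch"
```

Both report ./recipes/cookie/choc.chip.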
Recall from Chapter 1 that the name ‘.’ is a directory entry that refers to the
directory itself; it permits access to a directory without having to know the full
name. du looks in a directory for files; if you don’t tell it which directory, it
assumes ‘.’, the directory you are in now. Therefore, junk and ./junk are
names for the same file.
Despite their fundamental properties inside the kernel, directories sit in the
file system as ordinary files. They can be read as ordinary files. But they
can’t be created or written as ordinary files — to preserve its sanity and the
users’ files, the kernel reserves to itself all control over the contents of direc-
tories.
The time has come to look at the bytes in a directory:
$ od -cb .
0000000   4   ;   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
        064 073 056 000 000 000 000 000 000 000 000 000 000 000 000 000
0000020 273   (   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
        273 050 056 056 000 000 000 000 000 000 000 000 000 000 000 000
0000040 252   ;   r   e   c   i   p   e   s  \0  \0  \0  \0  \0  \0  \0
        252 073 162 145 143 151 160 145 163 000 000 000 000 000 000 000
0000060 230   =   j   u   n   k  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
        230 075 152 165 156 153 000 000 000 000 000 000 000 000 000 000
0000100
$
See the filenames buried in there? The directory format is a combination of
binary and textual data. A directory consists of 16-byte chunks, the last 14
bytes of which hold the filename, padded with ASCII NUL’s (which have value
0) and the first two of which tell the system where the administrative informa-
tion for the file resides — we'll come back to that. Every directory begins
with the two entries ‘.’ (“dot”) and ‘..’ (“dot-dot”).
$ cd                  Home
$ cd recipes
$ pwd
/usr/you/recipes
$ cd ..; pwd          Up one level
/usr/you
$ cd ..; pwd          Up another level
/usr
$ cd ..; pwd          Up another level
/
$ cd ..; pwd          Up another level
/                     Can't go any higher
$
The directory / is called the root of the file system. Every file in the sys-
tem is in the root directory or one of its subdirectories, and the root is its own
parent directory.
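Both facts, that every directory contains ‘.’ and ‘..’ and that the root is its own parent, can be checked in a few lines. This is a sketch assuming a modern POSIX system; mktemp and the -a option of ls (which shows names beginning with a dot) are assumptions beyond the text.

```shell
# A brand-new, otherwise empty directory already has two entries.
scratch=$(mktemp -d) || exit 1
count=$(ls -a "$scratch" | wc -l | tr -d ' ')
echo "entries in an empty directory: $count"
rmdir "$scratch"

# Going up from the root leaves you in the root.
top=$(cd / && cd .. && pwd)
echo "parent of /: $top"
```

The count is 2 (for ‘.’ and ‘..’), and the parent of / is /.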
Exercise 2-2. Given the information in this section, you should be able to understand
roughly how the ls command operates. Hint: cat . >foo; ls -f foo.
Exercise 2-3. (Harder) How does the pwd command operate?
Exercise 2-4. du was written to monitor disc usage. Using it to find files in a directory
hierarchy is at best a strange idiom, and perhaps inappropriate. As an alternative, look
at the manual page for find(1), and compare the two commands. In particular, compare
the command du -a | grep ... with the corresponding invocation of find.
Which runs faster? Is it better to build a new tool or use a side effect of an old one?
2.4 Permissions
Every file has a set of permissions associated with it, which determine who
can do what with the file. If you're so organized that you keep your love
letters on the system, perhaps hierarchically arranged in a directory, you prob-
ably don’t want other people to be able to read them. You could therefore
change the permissions on each letter to frustrate gossip (or only on some of
the letters, to encourage it), or you might just change the permissions on the
directory containing the letters, and thwart snoopers that way.
But we must warn you: there is a special user on every UNIX system, called
the super-user, who can read or modify any file on the system. The special
login name root carries super-user privileges; it is used by system administra-
tors when they do system maintenance. There is also a command called su
that grants super-user status if you know the root password. Thus anyone
who knows the super-user password can read your love letters, so don’t keep
sensitive material in the file system.
If you need more privacy, you can change the data in a file so that even the
super-user cannot read (or at least understand) it, using the crypt command
(crypt(1)). Of course, even crypt isn’t perfectly secure. A super-user can
change the crypt command itself, and there are cryptographic attacks on the
crypt algorithm. The former requires malfeasance and the latter takes hard
work, however, so crypt is in practice fairly secure.
In real life, most security breaches are due to passwords that are given
away or easily guessed. Occasionally, system administrative lapses make it
possible for a malicious user to gain super-user permission. Security issues are
discussed further in some of the papers cited in the bibliography at the end of
this chapter.
When you log in, you type a name and then verify that you are that person
by typing a password. The name is your login identification, or login-id. But
the system actually recognizes you by a number, called your user-id, or uid. In
fact different login-id’s may have the same uid, making them indistinguishable
to the system, although that is relatively rare and perhaps undesirable for secu-
rity reasons. Besides a uid, you are assigned a group identification, or group-
id, which places you in a class of users. On many systems, all ordinary users
(as opposed to those with login-id’s like root) are placed in a single group
called other, but your system may be different. The file system, and therefore
the UNIX system in general, determines what you can do by the
permissions granted to your uid and group-id.
The file /etc/passwd is the password file; it contains all the login infor-
mation about each user. You can discover your uid and group-id, as does the
system, by looking up your name in /etc/passwd:
$ grep you /etc/passwd
you:gkmbCTrJ04COM:604:1:Y.O.A.People:/usr/you:
$
The fields in the password file are separated by colons and are laid out like this
(as seen in passwd(5)):
login-id:encrypted-password:uid:group-id:miscellany:login-directory:shell
The file is ordinary text, but the field definitions and separator are a conven-
tion agreed upon by the programs that use the information in the file.
The shell field is often empty, implying that you use the default shell,
/bin/sh. The miscellany field may contain anything; often, it has your name
and address or phone number.
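Because the file is plain text with a fixed separator, ordinary tools can take an entry apart. The sketch below works on a sample entry modeled on the one shown above rather than on the real /etc/passwd. The cut command, the $(...) notation and the ${shell:-/bin/sh} default are assumptions about a modern POSIX system.

```shell
# A sample entry in the layout described above; it is not a real account.
entry='you:gkmbCTrJ04COM:604:1:Y.O.A.People:/usr/you:'

uid=$(echo "$entry" | cut -d: -f3)     # third field: uid
gid=$(echo "$entry" | cut -d: -f4)     # fourth field: group-id
home=$(echo "$entry" | cut -d: -f6)    # sixth field: login directory
shell=$(echo "$entry" | cut -d: -f7)   # seventh field: shell (empty here)

# An empty shell field means the default shell, /bin/sh.
echo "uid=$uid gid=$gid home=$home shell=${shell:-/bin/sh}"
```

This prints "uid=604 gid=1 home=/usr/you shell=/bin/sh".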
Note that your password appears here in the second field, but only in an
encrypted form. Anybody can read the password file (you just did), so if your
password itself were there, anyone would be able to use it to masquerade as
you. When you give your password to login, it encrypts it and compares the
result against the encrypted password in /etc/passwd. If they agree, it lets
you log in. The mechanism works because the encryption algorithm has the
property that it’s easy to go from the clear form to the encrypted form, but
very hard to go backwards. For example, if your password is ka-boom, it
might be encrypted as gkmbCTrJ04COM, but given the latter, there's no easy
way to get back to the original.
The kernel decided that you should be allowed to read /etc/passwd by
looking at the permissions associated with the file. There are three kinds of
permissions for each file: read (i.e., examine its contents), write (i.e., change
its contents), and execute (i.e., run it as a program). Furthermore, different
permissions can apply to different people. As file owner, you have one set of
read, write and execute permissions. Your “group” has a separate set. Every-
one else has a third set.
The -1 option of 1s prints the permissions information, among other
things:
$ ls -l /etc/passwd
-rw-r--r-- 1 root     5115 Aug 30 10:40 /etc/passwd
$ ls -lg /etc/passwd
-rw-r--r-- 1 adm      5115 Aug 30 10:40 /etc/passwd
$
These two lines may be collectively interpreted as: /etc/passwd is owned by
login-id root, group adm, is 5115 bytes long, was last modified on August 30
at 10:40 AM, and has one link (one name in the file system; we'll discuss links
in the next section). Some versions of ls give both owner and group in one
invocation.
The string -rw-r--r-- is how ls represents the permissions on the file.
The first - indicates that it is an ordinary file. If it were a directory, there
would be a d there. The next three characters encode the file owner’s (based
on uid) read, write and execute permissions. rw- means that root (the
owner) may read or write, but not execute the file. An executable file would
have an x instead of a dash.
The next three characters (r--) encode group permissions, in this case that
people in group adm, presumably the system administrators, can read the file
but not write or execute it. The next three (also r--) define the permissions
for everyone else — the rest of the users on the system. On this machine,
then, only root can change the login information for a user, but anybody may
read the file to discover the information. A plausible alternative would be for
group adm to also have write permission on /etc/passwd.
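The layout of the mode string can be taken apart mechanically. The following is a minimal sketch, not part of any standard system: explain_mode is a hypothetical helper that splits a ten-character mode, as printed by ls -l, into the file type and the three permission sets described above.

```shell
# explain_mode is a hypothetical helper, not a standard command.
# It takes a 10-character mode string as printed by ls -l.
explain_mode() {
    mode=$1
    # The first character gives the file type.
    case $mode in
        d*) echo "type: directory" ;;
        -*) echo "type: ordinary file" ;;
        *)  echo "type: other" ;;
    esac
    # Characters 2-4 are the owner's bits, 5-7 the group's,
    # and 8-10 those of everyone else.
    owner=$(printf '%s\n' "$mode" | cut -c2-4)
    group=$(printf '%s\n' "$mode" | cut -c5-7)
    other=$(printf '%s\n' "$mode" | cut -c8-10)
    echo "owner: $owner"
    echo "group: $group"
    echo "other: $other"
}

explain_mode -rw-r--r--
```

For the mode of /etc/passwd shown earlier, the sketch reports rw- for the owner and r-- for both the group and everyone else.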
The file /etc/group encodes group names and group-id’s, and defines
which users are in which groups. /etc/passwd identifies only your login
group; the newgrp command changes your group permissions to another
group.
Anybody can say
$ ed /etc/passwd
and edit the password file, but only root can write back the changes. You
might therefore wonder how you can change your password, since that involves
editing the password file. The program to change passwords is called passwd;
you will probably find it in /bin:
$ ls -l /bin/passwd
-rwsr-xr-x 1 root 8454 Jan 4 1983 /bin/passwd
$
(Note that /etc/passwd is the text file containing the login information,
while /bin/passwd, in a different directory, is a file containing an executable
program that lets you change the password information.) The permissions here
state that anyone may execute the command, but only root can change the
passwd command. But the s instead of an x in the execute field for the file
owner states that, when the command is run, it is to be given the permissions
corresponding to the file owner, in this case root. Because /bin/passwd is
“set-uid” to root, any user can run the passwd command to edit the pass-
word file.
The set-uid bit is a simple but elegant idea† that solves a number of security
problems. For example, the author of a game program can make the program
set-uid to the owner, so that it can update a score file that is otherwise
protected from other users’ access. But the set-uid concept is potentially
dangerous. /bin/passwd has to be correct; if it were not, it could destroy
system information under root’s auspices. If it had the permissions
-rwsrwxrwx, it could be overwritten by any user, who could therefore replace
the file with a program that does anything. This is particularly serious for a
set-uid program, because root has access permissions to every file on the system.
(Some UNIX systems turn the set-uid bit off whenever a file is modified,
to reduce the danger of a security hole.)
† The set-uid bit is patented by Dennis Ritchie.
The set-uid bit is powerful, but used primarily for a few system programs
such as passwd. Let's look at a more ordinary file.
$ ls -l /bin/who
-rwxrwxr-x 1 root 6348 Mar 29 1983 /bin/who
$
who is executable by everybody, and writable by root and the owner’s group.
What “executable” means is this: when you type
$ who
to the shell, it looks in a set of directories, one of which is /bin, for a file
named “who.” If it finds such a file, and if the file has execute permission,
the shell calls the kernel to run it. The kernel checks the permissions, and, if
they are valid, runs the program. Note that a program is just a file with exe-
cute permission. In the next chapter we will show you programs that are just
text files, but that can be executed as commands because they have execute
permission set.
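The search just described can be imitated in the shell itself. This is only a sketch under stated assumptions: find_cmd is a hypothetical name, the list of directories is taken from the PATH variable, and the real shell hands the file to the kernel to run rather than printing its name.

```shell
# find_cmd is a hypothetical helper; it mimics the shell's search
# for a command and prints the first match instead of running it.
find_cmd() {
    name=$1
    # Split the search path on colons, one directory at a time.
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty entry in the search path means the current directory.
        [ -n "$dir" ] || dir=.
        # A candidate must be an ordinary file with execute permission.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            echo "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    echo "$name: not found" >&2
    return 1
}

find_cmd ls
```

A file of the right name that lacks execute permission is skipped, which is the sense in which a program is just a file with execute permission.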
Directory permissions operate a little differently, but the basic idea is the
same.
$ ls -ld .
drwxrwxr-x 3 you 80 Sep 27 06:11 .
$
The -d option of ls asks it to tell you about the directory itself, rather than its
contents, and the leading d in the output signifies that ‘.’ is indeed a directory.
An r field means that you can read the directory, so you can find out what
files are in it with ls (or od, for that matter). A w means that you can create
and delete files in this directory, because that requires modifying and therefore
writing the directory file.
Actually, you cannot simply write in a directory — even root is forbidden
to do so.
$ who >.                        Try to overwrite ‘.’
.: cannot create                You can't
$
Instead there are system calls that create and remove files, and only through
them is it possible to change the contents of a directory. The permissions idea,
however, still applies: the w fields tell who can use the system routines to
modify the directory.
Permission to remove a file is independent of the file itself. If you have
write permission in a directory, you may remove files there, even files that are
protected against writing. The rm command asks for confirmation before
removing a protected file, however, to check that you really want to do so —
one of the rare occasions that a UNIX program double-checks your intentions.
(The -f flag to rm forces it to remove files without question.)
The x field in the permissions on a directory does not mean execution; it
means “search.” Execute permission on a directory determines whether the
directory may be searched for a file. It is therefore possible to create a direc-
tory with mode --x for other users, implying that users may access any file
that they know about in that directory, but may not run ls on it or read it to
see what files are there. Similarly, with directory permissions r--, users can
see (ls) but not use the contents of a directory. Some installations use this
device to turn off /usr/games during busy hours.
The chmod (change mode) command changes permissions on files.
$ chmod permissions filenames ...
The syntax of the permissions is clumsy, however. They can be specified in
two ways, either as octal numbers or by symbolic description. The octal
numbers are easier to use, although the symbolic descriptions are sometimes
convenient because they can specify relative changes in the permissions. It
would be nice if you could say
$ chmod rw-rw-rw- junk Doesn't work this way!
rather than
$ chmod 666 junk
but you cannot. The octal modes are specified by adding together a 4 for
read, 2 for write and 1 for execute permission. The three digits specify, as in
ls, permissions for the owner, group and everyone else. The symbolic codes
are difficult to explain; you must look in chmod(1) for a proper description.
For our purposes, it is sufficient to note that + turns a permission on and that
- turns it off. For example
$ chmod +x command
allows everyone to execute command, and
$ chmod -w file
turns off write permission for everyone, including the file’s owner. Except for
the usual disclaimer about super-users, only the owner of a file may change the
permissions on a file, regardless of the permissions themselves. Even if somebody
else allows you to write a file, the system will not allow you to change its
permission bits.
$ ls -ld /usr/mary
drwxrwxrwx 5 mary 704 Sep 25 10:18 /usr/mary
$ chmod 444 /usr/mary
chmod: can’t change /usr/mary
$
If a directory is writable, however, people can remove files in it regardless of
the permissions on the files themselves. If you want to make sure that you or
your friends never delete files from a directory, remove write permission from
it:
$ cd
$ date >temp
$ chmod -w . Make directory unwritable
$ ls -ld .
dr-xr-xr-x 3 you 80 Sep 27 11:48 .
$ rm temp
rm: temp not removed            Can't remove file
$ chmod 775 .                   Restore permission
$ ls -ld .
drwxrwxr-x 3 you 80 Sep 27 11:48 .
$ rm temp                       Now you can
$
temp is now gone. Notice that changing the permissions on the directory
didn’t change its modification date. The modification date reflects changes to
the file’s contents, not its modes. The permissions and dates are not stored in
the file itself, but in a system structure called an index node, or i-node, the
subject of the next section.
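The arithmetic behind the octal modes used in this section (4 for read, 2 for write, 1 for execute, one digit each for owner, group and everyone else) is easy to mechanize. The following sketch assumes a hypothetical helper named mode_to_octal; it handles only the nine plain r, w and x positions and ignores special bits such as set-uid.

```shell
# mode_to_octal is a hypothetical helper: it turns a nine-character
# permission string such as rw-rw-rw- into the octal form chmod accepts.
mode_to_octal() {
    perms=$1
    result=
    for start in 1 4 7; do
        # Take three characters at a time: owner, then group, then others.
        triple=$(printf '%s\n' "$perms" | cut -c"$start"-$((start + 2)))
        digit=0
        # Add 4 for read, 2 for write and 1 for execute.
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        result=$result$digit
    done
    echo "$result"
}

mode_to_octal rw-rw-rw-     # prints 666
mode_to_octal rwxrwxr-x     # prints 775
```

With such a helper, something close to the wished-for syntax becomes possible: chmod "$(mode_to_octal rw-rw-rw-)" junk.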
Exercise 2-5. Experiment with chmod. Try different simple modes, like 0 and 1. Be
careful not to damage your login directory! □
2.5 Inodes
A file has several components: a name, contents, and administrative information
such as permissions and modification times. The administrative information
is stored in the inode (over the years, the hyphen fell out of “i-node”),
along with essential system data such as how long it is, where on the disc the
contents of the file are stored, and so on.
There are three times in the inode: the time that the contents of the file
were last modified (written); the time that the file was last used (read or exe-
cuted); and the time that the inode itself was last changed, for example to set
the permissions.
$ date
Tue Sep 27 12:07:24 EDT 1983
$ date >junk
$ ls -l junk
-rw-rw-rw- 1 you 29 Sep 27 12:07 junk
$ ls -lu junk
-rw-rw-rw- 1 you 29 Sep 27 06:11 junk
$ ls -lc junk
-rw-rw-rw- 1 you 29 Sep 27 12:07 junk
$
Changing the contents of a file does not affect its usage time, as reported by
ls -lu, and changing the permissions affects only the inode change time, as
reported by ls -lc.
$ chmod 444 junk
$ ls -lu junk
-r--r--r-- 1 you 29 Sep 27 06:11 junk
$ ls -lc junk
-r--r--r-- 1 you 29 Sep 27 12:11 junk
$ chmod 666 junk
$
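The experiment above can be repeated without watching the clock. This sketch works in a scratch directory and compares the date columns of ls -l before and after a chmod. It assumes that mktemp is available and that the date occupies fields 6 to 8 of the ls -l output, which holds on most current systems but is not guaranteed.

```shell
# Work in a scratch directory so nothing of yours is touched.
dir=$(mktemp -d "${TMPDIR:-/tmp}/times.XXXXXX") || exit 1
cd "$dir" || exit 1
date >junk

# Record the mode and the modification date before the change.
mode_before=$(ls -l junk | cut -c1-10)
mtime_before=$(ls -l junk | awk '{print $6, $7, $8}')

chmod 444 junk

# Record them again: the mode differs, the modification date does not.
mode_after=$(ls -l junk | cut -c1-10)
mtime_after=$(ls -l junk | awk '{print $6, $7, $8}')

echo "mode:     $mode_before -> $mode_after"
echo "modified: $mtime_before -> $mtime_after"
ls -lc junk      # inode change time, which the chmod updated
ls -lu junk      # usage time, which the chmod left alone

cd / && rm -rf "$dir"
```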
The -t option to ls, which sorts the files according to time, by default that
of last modification, can be combined with -c or -u to report the order in
which inodes were changed or files were read:
$ ls recipes
cookie
pie
$ ls -lut
total 2
drwxrwxrwx 4 you 64 Sep 27 12:11 recipes
-rw-rw-rw- 1 you 29 Sep 27 06:11 junk
$
recipes is most recently used, because we just looked at its contents.
It is important to understand inodes, not only to appreciate the options on
ls, but because in a strong sense the inodes are the files. All the directory
hierarchy does is provide convenient names for files. The system’s internal
name for a file is its i-number: the number of the inode holding the file’s
information. ls -i reports the i-number in decimal:
$ date >x
$ ls -i
15768 junk
15274 recipes
15852 x
$
It is the i-number that is stored in the first two bytes of a directory, before the
name. od -d will dump the data in decimal by byte pairs rather than octal by
bytes and thus make the i-number visible.
$ od -c .
0000000   4   ;   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 273   (   .   .  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040 252   ;   r   e   c   i   p   e   s  \0  \0  \0  \0  \0  \0  \0
0000060 230   =   j   u   n   k  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000100 354   =   x  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000120
$ od -d .
0000000 15156 00046 00000 00000 00000 00000 00000 00000
0000020 10427 11822 00000 00000 00000 00000 00000 00000
0000040 15274 25970 26979 25968 00115 00000 00000 00000
0000060 15768 30058 27502 00000 00000 00000 00000 00000
0000100 15852 00120 00000 00000 00000 00000 00000 00000
0000120
$
The first two bytes in each directory entry are the only connection between the
name of a file and its contents. A filename in a directory is therefore called a
link, because it links a name in the directory hierarchy to the inode, and hence
to the data. The same i-number can appear in more than one directory. The
rm command does not actually remove inodes; it removes directory entries or
links. Only when the last link to a file disappears does the system remove the
inode, and hence the file itself.
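Modern systems generally do not let you read a directory with od, so the 16-byte entry format shown above is easiest to explore on a synthetic copy. The sketch below builds one entry by hand, using the two bytes (octal 230 and 075) that od -c printed for junk, and decodes it. The scratch file and the variable names are assumptions made for illustration.

```shell
# Build one synthetic 16-byte directory entry: the i-number as two
# bytes, low byte first, then the name padded to 14 bytes with NULs.
entry=$(mktemp "${TMPDIR:-/tmp}/entry.XXXXXX") || exit 1
printf '\230\075junk\0\0\0\0\0\0\0\0\0\0' >"$entry"

# Read the first two bytes as unsigned decimal values.
set -- $(od -An -tu1 -N2 "$entry")

# Combine them by hand so the result does not depend on the byte
# order of the machine running the sketch.
inum=$(( $1 + 256 * $2 ))

# The name is the remaining 14 bytes with the NUL padding stripped.
name=$(tail -c 14 "$entry" | tr -d '\000')

echo "$inum $name"      # prints 15768 junk
rm -f "$entry"
```

The result agrees with the i-number that ls -i reported for junk.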
If the i-number in a directory entry is zero, it means that the link has been
removed, but not necessarily the contents of the file — there may still be a link
somewhere else. You can verify that the i-number goes to zero by removing
the file:
$ rm x
$ od -d .
0000000 15156 00046 00000 00000 00000 00000 00000 00000
0000020 10427 11822 00000 00000 00000 00000 00000 00000
0000040 15274 25970 26979 25968 00115 00000 00000 00000
0000060 15768 30058 27502 00000 00000 00000 00000 00000
0000100 00000 00120 00000 00000 00000 00000 00000 00000
0000120
$
The next file created in this directory will go into the unused slot, although it
will probably have a different i-number.
The ln command makes a link to an existing file, with the syntax
$ ln old-file new-file
The purpose of a link is to give two names to the same file, often so it can
appear in two different directories. On many systems there is a link to
/bin/ed called /bin/e, so that people can call the editor e. Two links to a
file point to the same inode, and hence have the same i-number:
$ ln junk linktojunk
$ ls -li
total 3
15768 -rw-rw-rw- 2 you 29 Sep 27 12:07 junk
15768 -rw-rw-rw- 2 you 29 Sep 27 12:07 linktojunk
15274 drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$
The integer printed between the permissions and the owner is the number of
links to the file. Because each link just points to the inode, each link is equally
important — there is no difference between the first link and subsequent ones.
(Notice that the total disc space computed by ls is wrong because of double
counting.)
When you change a file, access to the file by any of its names will reveal
the changes, since all the links point to the same file.
$ echo x >junk
$ ls -l
total 3
-rw-rw-rw- 2 you 2 Sep 27 12:37 junk
-rw-rw-rw- 2 you 2 Sep 27 12:37 linktojunk
drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$ rm linktojunk
$ ls -l
total 2
-rw-rw-rw- 1 you 2 Sep 27 12:37 junk
drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$
After linktojunk is removed the link count goes back to one. As we said
before, rm’ing a file just breaks a link; the file remains until the last link is
removed. In practice, of course, most files only have one link, but again we
see a simple idea providing great flexibility.
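The behavior of links can be checked mechanically rather than by eye. The following sketch repeats the experiment in a scratch directory and records the i-numbers and link counts in shell variables. It assumes that mktemp is available and that the scratch directory is on a file system that permits links.

```shell
# Work in a scratch directory so nothing of yours is touched.
dir=$(mktemp -d "${TMPDIR:-/tmp}/links.XXXXXX") || exit 1
cd "$dir" || exit 1
date >junk
ln junk linktojunk

# ls -i prints the i-number first; ls -l prints the link count second.
inum_junk=$(ls -i junk | awk '{print $1}')
inum_link=$(ls -i linktojunk | awk '{print $1}')
links_before=$(ls -l junk | awk '{print $2}')

# A change made through one name is visible through the other.
echo x >junk
contents=$(cat linktojunk)

# Removing one name only breaks a link; the file survives.
rm linktojunk
links_after=$(ls -l junk | awk '{print $2}')

echo "i-numbers: $inum_junk $inum_link"
echo "links: $links_before, then $links_after"
echo "linktojunk held: $contents"

cd / && rm -rf "$dir"
```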
A word to the hasty: once the last link to a file is gone, the data is irretriev-
able. Deleted files go into the incinerator, rather than the waste basket, and
there is no way to call them back from the ashes. (There is a faint hope of
resurrection. Most large UNIX systems have a formal backup procedure that
periodically copies changed files to some safe place like magnetic tape, from
which they can be retrieved. For your own protection and peace of mind, you
should know just how much backup is provided on your system. If there is
none, watch out — some mishap to the discs could be a catastrophe.)
Links to files are handy when two people wish to share a file, but some-
times you really want a separate copy — a different file with the same infor-
mation. You might copy a document before making extensive changes to it,
for example, so you can restore the original if you decide you don’t like the
changes. Making a link wouldn’t help, because when the data changed, both
links would reflect the change. cp makes copies of files:
$ cp junk copyofjunk
$ ls -li
total 3
15850 -rw-rw-rw- 1 you  2 Sep 27 13:13 copyofjunk
15768 -rw-rw-rw- 1 you  2 Sep 27 12:37 junk
15274 drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$
The i-numbers of junk and copyofjunk are different, because they are different
files, even though they currently have the same contents. It’s often a
good idea to change the permissions on a backup copy so it’s harder to remove
it accidentally.
$ chmod -w copyofjunk                   Turn off write permission
$ ls -li
total 3
15850 -r--r--r-- 1 you  2 Sep 27 13:13 copyofjunk
15768 -rw-rw-rw- 1 you  2 Sep 27 12:37 junk
15274 drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$ rm copyofjunk
rm: copyofjunk 444 mode n               No! It's precious
$ date >junk
$ ls -li
total 3
15850 -r--r--r-- 1 you  2 Sep 27 13:13 copyofjunk
15768 -rw-rw-rw- 1 you 29 Sep 27 13:16 junk
15274 drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$ rm copyofjunk
rm: copyofjunk 444 mode y               Well, maybe not so precious
$ ls -li
total 2
15768 -rw-rw-rw- 1 you 29 Sep 27 13:16 junk
15274 drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
$
Changing the copy of a file doesn’t change the original, and removing the copy
has no effect on the original. Notice that because copyofjunk had write
permission turned off, rm asked for confirmation before removing the file.
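The difference between a copy and a link comes down to the i-number, and that too can be tested directly. This sketch, again in a scratch directory created with mktemp (an assumption about your system), confirms that a copy is a separate file.

```shell
# Work in a scratch directory so nothing of yours is touched.
dir=$(mktemp -d "${TMPDIR:-/tmp}/copy.XXXXXX") || exit 1
cd "$dir" || exit 1
echo original >junk
cp junk copyofjunk

# A copy is a new file, so it gets an i-number of its own.
inum_junk=$(ls -i junk | awk '{print $1}')
inum_copy=$(ls -i copyofjunk | awk '{print $1}')

# Changing the original leaves the copy alone.
echo changed >junk
copy_contents=$(cat copyofjunk)

echo "i-numbers: $inum_junk $inum_copy"
echo "copy still holds: $copy_contents"

cd / && rm -rf "$dir"
```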
There is one more common command for manipulating files: mv moves or
renames files, simply by rearranging the links. Its syntax is the same as cp
and ln:
$ mv junk sameoldjunk
$ ls -li
total 2
15274 drwxrwxrwx 4 you 64 Sep 27 09:34 recipes
15768 -rw-rw-rw- 1 you 29 Sep 27 13:16 sameoldjunk
$
sameoldjunk is the same file as our old junk, right down to the i-number;
only its name — the directory entry associated with inode 15768 — has been
changed.
We have been doing all this file shuffling in one directory, but it also works
across directories. ln is often used to put links with the same name in several
directories, such as when several people are working on one program or docu-
ment. mv can move a file or directory from one directory to another. In fact,
these are common enough idioms that mv and cp have special syntax for them:
$ mv (or cp) file1 file2 ... directory
moves (or copies) one or more files to the directory which is the last argument.
The links or copies are made with the same filenames. For example, if you
wanted to try your hand at beefing up the editor, you might begin by saying
$ cp /usr/src/cmd/ed.c .
to get your own copy of the source to play with. If you were going to work on
the shell, which is in a number of different source files, you would say
$ mkdir sh
$ cp /usr/src/cmd/sh/* sh
and cp would duplicate all of the shell’s source files in your subdirectory sh
(assuming no subdirectory structure in /usr/src/cmd/sh — cp is not very
clever). On some systems, ln also accepts multiple file arguments, again with
a directory as the last argument. And on some systems, mv, cp and ln are
themselves links to a single file that examines its name to see what service to
perform.
Exercise 2-6. Why does ls -l report 4 links to recipes? Hint: try
$ ls -ld /usr/you
Why is this useful information? □
Exercise 2-7. What is the difference between
$ mv junk junk1
and
$ cp junk junk1
$ rm junk
Hint: make a link to junk, then try it. □
Exercise 2-8. cp doesn’t copy subdirectories, it just copies files at the first level of a
hierarchy. What does it do if one of the argument files is a directory? Is this kind or
even sensible? Discuss the relative merits of three possibilities: an option to cp to descend
directories, a separate command rcp (recursive copy) to do the job, or just having
cp copy a directory recursively when it finds one. See Chapter 7 for help on providing
this facility. What other programs would profit from the ability to traverse the directory
tree? □
2.6 The directory hierarchy
In Chapter 1, we looked at the file system hierarchy rather informally,
starting from /usr/you. We're now going to investigate it in a more orderly
way, starting from the top of the tree, the root.
The top directory is /.
$ ls /
bin
boot
dev
etc
lib
tmp
unix
usr
$
/unix is the program for the UNIX kernel itself: when the system starts,
/unix is read from disc into memory and started. Actually, the process
occurs in two steps: first the file /boot is read; it then reads in /unix. More
information about this “bootstrap” process may be found in boot(8). The rest
of the files in /, at least here, are directories, each a somewhat self-contained
section of the total file system. In the following brief tour of the hierarchy,
play along with the text: explore a bit in the directories mentioned. The more
familiar you are with the layout of the file system, the more effectively you
will be able to use it. Table 2.1 suggests good places to look, although some of
the names are system dependent.
/bin (binaries) we have seen before: it is the directory where the basic
programs such as who and ed reside.
/dev (devices) we will discuss in the next section.
/etc (et cetera) we have also seen before. It contains various administrative
files such as the password file and some system programs such as
/etc/getty, which initializes a terminal connection for /bin/login.
/etc/rc is a file of shell commands that is executed after the system is
bootstrapped. /etc/group lists the members of each group.
/lib (library) contains primarily parts of the C compiler, such as
/lib/cpp, the C preprocessor, and /lib/libc.a, the C subroutine library.
Table 2.1: Interesting Directories (see also hier(7))
/                   root of the file system
/bin                essential programs in executable form (“binaries”)
/dev                device files
/etc                system miscellany
/etc/motd           login message of the day
/etc/passwd         password file
/lib                essential libraries, etc.
/tmp                temporary files; cleaned when system is restarted
/unix               executable form of the operating system
/usr                user file system
/usr/adm            system administration: accounting info., etc.
/usr/bin            user binaries: troff, etc.
/usr/dict           dictionary (words) and support for spell(1)
/usr/games          game programs
/usr/include        header files for C programs, e.g. math.h
/usr/include/sys    system header files for C programs, e.g. inode.h
/usr/lib            libraries for C, FORTRAN, etc.
/usr/man            on-line manual
/usr/man/man1       manual pages for section 1 of manual
/usr/mdec           hardware diagnostics, bootstrap programs, etc.
/usr/news           community service messages
/usr/pub            public oddments: see ascii(7) and eqnchar(7)
/usr/src            source code for utilities and libraries
/usr/src/cmd        source for commands in /bin and /usr/bin
/usr/src/lib        source code for subroutine libraries
/usr/spool          working directories for communications programs
/usr/spool/lpd      line printer temporary directory
/usr/spool/mail     mail in-boxes
/usr/spool/uucp     working directory for the uucp programs
/usr/sys            source for the operating system kernel
/usr/tmp            alternate temporary directory (little used)
/usr/you            your login directory
/usr/you/bin        your personal programs

/tmp (temporaries) is a repository for short-lived files created during the
execution of a program. When you start up the editor ed, for example, it
creates a file with a name like /tmp/e00512 to hold its copy of the file you
are editing, rather than working with the original file. It could, of course,
create the file in your current directory, but there are advantages to placing it
in /tmp: although it is unlikely, you might already have a file called e00512
in your directory; /tmp is cleaned up automatically when the system starts, so
your directory doesn’t get an unwanted file if the system crashes; and often
/tmp is arranged on the disc for fast access.
There is a problem, of course, when several programs create files in /tmp
at once: they might interfere with each other’s files. That is why ed’s tem-
porary file has a peculiar name: it is constructed in such a way as to guarantee
that no other program will choose the same name for its temporary file. In
Chapters 5 and 6 we will see ways to do this.
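As a foretaste, here is one common way for a shell program to pick such a name. It is a sketch of a convention, and an assumption rather than a description of what ed itself does: the shell expands $$ to its own process id, and no two processes running at the same time share a process id.

```shell
# $$ expands to the process id of the running shell. Because no other
# running process has the same id, the name is unlikely to collide.
tmp=/tmp/e$$

# Use the file as scratch space, then clean up.
date >"$tmp"
ls -l "$tmp"
rm -f "$tmp"
```

Removing the file at the end matters: /tmp is cleaned only when the system restarts, so a program that leaves its temporaries behind wastes space until then.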
/usr is called the “user file system,” although it may have little to do with
the actual users of the system. On our machine, our login directories are
/usr/bwk and /usr/rob, but on your machine the /usr part might be different,
as explained in Chapter 1. Whether or not your personal files are in a
subdirectory of /usr, there are a number of things you are likely to find there
(although local customs vary in this regard, too). Just as in /, there are direc-
tories called /usr/bin, /usr/lib and /usr/tmp. These directories have
functions similar to their namesakes in /, but contain programs less critical to
the system. For example, nroff is usually in /usr/bin rather than /bin,
and the FORTRAN compiler libraries live in /usr/lib. Of course, just what
is deemed “critical” varies from system to system. Some systems, such as the
distributed 7th Edition, have all the programs in /bin and do away with
/usr/bin altogether; others split /usr/bin into two directories according to
frequency of use.
Other directories in /usr are /usr/adm, containing accounting information,
and /usr/dict, which holds a modest dictionary (see spell(1)). The
on-line manual is kept in /usr/man — see /usr/man/man1/spell.1, for
example. If your system has source code on-line, you will probably find it in
/usr/src.
It is worth spending a little time exploring the file system, especially /usr,
to develop a feeling for how the file system is organized and where you might
expect to find things.
2.7 Devices
We skipped over /dev in our tour, because the files there provide a nice
review of files in general. As you might guess from the name, /dev contains
device files.
One of the prettiest ideas in the UNIX system is the way it deals with peri-
pherals — discs, tape drives, line printers, terminals, etc. Rather than having
special system routines to, for example, read magnetic tape, there is a file
called /dev/mt0 (again, local customs vary). Inside the kernel, references to
that file are converted into hardware commands to access the tape, so if a pro-
gram reads /dev/mt0, the contents of a tape mounted on the drive are
returned. For example,
$ cp /dev/mt0 junk
copies the contents of the tape to a file called junk. cp has no idea there is
anything special about /dev/mt0; it is just a file, a sequence of bytes.
The device files are something of a zoo, each creature a little different, but
the basic ideas of the file system apply to each. Here is a significantly shor-
tened list of our /dev:
$ ls -l /dev
crw--w--w- 1 root  0,  0 Sep 27 23:09 console
crw-r--r-- 1 root  3,  1 Sep 27 14:37 kmem
crw-r--r-- 1 root  3,  0 May  6  1981 mem
brw-rw-rw- 1 root  1, 64 Aug 24 17:41 mt0
crw-rw-rw- 1 root  3,  2 Sep 28 02:03 null
crw-rw-rw- 1 root  4, 64 Sep  9 15:42 rmt0
brw-r----- 1 root  2,  0 Sep  8 08:07 rp00
brw-r----- 1 root  2,  1 Sep 27 23:09 rp01
crw-r----- 1 root 13,  0 Apr 12  1983 rrp00
crw-r----- 1 root 13,  1 Jul 28 15:18 rrp01
crw-rw-rw- 1 root  2,  0 Jul  5 08:04 tty
crw--w--w- 1 you   1,  0 Sep 28 02:38 tty0
crw--w--w- 1 root  1,  1 Sep 27 23:09 tty1
crw--w--w- 1 root  1,  2 Sep 27 17:33 tty2
crw--w--w- 1 root  1,  3 Sep 27 18:48 tty3
$
The first things to notice are that instead of a byte count there is a pair of
small integers, and that the first character of the mode is always a ‘b’ or a ‘c’.
This is how 1s prints the information from an inode that specifies a device
rather than a regular file. The inode of a regular file contains a list of disc
blocks that store the file’s contents. For a device file, the inode instead con-
tains the internal name for the device, which consists of its type — character
(c) or block (b) — and a pair of numbers, called the major and minor device
numbers. Discs and tapes are block devices; everything else — terminals,
printers, phone lines, etc. — is a character device. The major number encodes
the type of device, while the minor number distinguishes different instances of
the device. For example, /dev/tty0 and /dev/tty1 are two ports on the
same terminal controller, so they have the same major device number but dif-
ferent minor numbers.
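The type character is easy to pick out of the ls output with a small script. In this sketch dev_type is a hypothetical helper, and the -L flag is there because some current systems populate /dev with symbolic links, which did not exist on the systems described here.

```shell
# dev_type is a hypothetical helper that reports how ls classifies a file.
dev_type() {
    # The first character of the ls -l mode gives the type.
    # -L follows symbolic links, which some modern systems use in /dev.
    case $(ls -ldL "$1" | cut -c1) in
        b) echo block ;;
        c) echo character ;;
        d) echo directory ;;
        -) echo regular ;;
        *) echo other ;;
    esac
}

dev_type /dev/null
ls -l /dev/null      # the size column holds the major and minor numbers
```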
Disc files are usually named after the particular hardware variant they
represent. /dev/rp00 and /dev/rp01 are named after the DEC RP06 disc
drive attached to the system. There is just one drive, divided logically into two
file systems. If there were a second drive, its associated files would be named
/dev/rp10 and /dev/rp11. The first digit specifies the physical drive, and
the second which portion of the drive.
You might wonder why there are several disc device files, instead of just
one. For historical reasons and for ease of maintenance, the file system is
divided into smaller subsystems. The files in a subsystem are accessible
through a directory in the main system. The program /etc/mount reports
the correspondence between device files and directories:
$ /etc/mount
rp01 on /usr
$
In our case, the root system occupies /dev/rp00 (although this isn’t reported
by /etc/mount) while the user file system, that is, the files in /usr and its
subdirectories, resides on /dev/rp01.
The root file system has to be present for the system to execute. /bin,
/dev and /etc are always kept on the root system, because when the system
starts only files in the root system are accessible, and some files such as
/bin/sh are needed to run at all. During the bootstrap operation, all the file
systems are checked for self-consistency (see icheck(8) or fsck(8)), and
attached to the root system. This attachment operation is called mounting, the
software equivalent of mounting a new disc pack in a drive; it can normally be
done only by the super-user. After /dev/rp01 has been mounted as /usr,
the files in the user file system are accessible exactly as if they were part of the
root system.
For the average user, the details of which file subsystem is mounted where
are of little interest, but there are a couple of relevant points. First, because
the subsystems may be mounted and dismounted, it is illegal to make a link to
a file in another subsystem. For example, it is impossible to link programs in
/bin to convenient names in private bin directories, because /usr is in a dif-
ferent file subsystem from /bin:
$ ln /bin/mail /usr/you/bin/m
ln: Cross-device link
$
There would also be a problem because inode numbers are not unique in dif-
ferent file systems.
Second, each subsystem has fixed upper limits on size (number of blocks
available for files) and inodes. If a subsystem fills up, it will be impossible to
enlarge files in that subsystem until some space is reclaimed. The df (disc
free space) command reports the available space on the mounted file subsys-
tems:
$ df
/dev/rp00 1989
/dev/rp01 21257
$
/usr has 21257 free blocks. Whether this is ample space or a crisis depends
on how the system is used; some installations need more file space headroom
than others. By the way, of all the commands, df probably has the widest
variation in output format. Your df output may look quite different.
Let’s turn now to some more generally useful things. When you log in, you
get a terminal line and therefore a file in /dev through which the characters
you type and receive are sent. The tty command tells you which terminal you
are using:
$ who am i
you tty0 Sep 28 01:02
$ tty
/dev/tty0
$ ls -l /dev/tty0
crw--w--w- 1 you 1, 12 Sep 28 02:40 /dev/tty0
$ date >/dev/tty0
Wed Sep 28 02:40:51 EDT 1983
$
Notice that you own the device, and that only you are permitted to read it. In
other words, no one else can directly read the characters you are typing. Any-
one may write on your terminal, however. To prevent this, you could chmod
the device, thereby preventing people from using write to contact you, or you
could just use mesg.
$ mesg n Turn off messages
$ ls -l /dev/tty0
crw------- 1 you 1, 12 Sep 28 02:41 /dev/tty0
$ mesg y Restore
$
It is often useful to be able to refer by name to the terminal you are using,
but it’s inconvenient to determine which one it is. The device /dev/tty is a
synonym for your login terminal, whatever terminal you are actually using.
$ date >/dev/tty
Wed Sep 28 02:42:23 EDT 1983
$
/dev/tty is particularly useful when a program needs to interact with a user
even though its standard input and output are connected to files rather than the
terminal. crypt is one program that uses /dev/tty. The “clear text”
comes from the standard input, and the encrypted data goes to the standard
output, so crypt reads the encryption key from /dev/tty:
$ crypt <cleartext >cryptedtext
Enter key: Type encryption key
$
The use of /dev/tty isn’t explicit in this example, but it is there. If crypt
read the key from the standard input, it would read the first line of the clear
text. So instead crypt opens /dev/tty, turns off automatic character echo-
ing so your encryption key doesn’t appear on the screen, and reads the key. In
Chapters 5 and 6 we will come across several other uses of /dev/tty.
Occasionally you want to run a program but don’t care what output is produced.
For example, you may have already seen today’s news, and don’t want
to read it again. Redirecting news to the file /dev/null causes its output to
be thrown away:
$ news >/dev/null
$
Data written to /dev/null is discarded without comment, while programs
that read from /dev/null get end-of-file immediately, because reads from
/dev/null always return zero bytes.
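Both properties of /dev/null are easy to confirm from the shell. The sketch below uses wc to count what can be read from it; the scratch file, created with mktemp, is an assumption made for the illustration.

```shell
# Writing: the output of date disappears without comment.
date >/dev/null

# Reading: end-of-file arrives at once, so wc counts zero bytes.
count=$(wc -c </dev/null)
echo "bytes read from /dev/null:" $count

# cat has nothing to copy, so this is a quick way to make an empty file.
empty=$(mktemp "${TMPDIR:-/tmp}/empty.XXXXXX") || exit 1
cat /dev/null >"$empty"
size=$(wc -c <"$empty")
echo "bytes in the new file:" $size
rm -f "$empty"
```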
One common use of /dev/null is to throw away regular output so that
diagnostic messages are visible. For example, the time command (time(1))
reports the CPU usage of a program. The information is printed on the stan-
dard error, so you can time commands that generate copious output by sending
the standard output to /dev/null:
$ ls -l /usr/dict/words
r-- 1 bin 196513 Jan 20 1979 /usr/dict/words
$ time grep e /usr/dict/words >/dev/null
real 4
user
sys
$ time egrep e /usr/dict/words >/dev/null
real 8.0
user 3.9
sys 2.8
$
The numbers in the output of time are elapsed clock time, CPU time spent in
the program and CPU time spent in the kernel while the program was running.
egrep is a high-powered variant of grep that we will discuss in Chapter 4; it’s
about twice as fast as grep when searching through large files. If output from
grep and egrep had not been sent to /dev/null or a real file, we would
have had to wait for hundreds of thousands of characters to appear on the ter-
minal before finding out the timing information we were after.
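The separation of the two output streams can be seen without time itself. The following is only a sketch: the stand-in command held in cmd is hypothetical, and the 2>&1 notation is used here solely so that the diagnostic stream can be captured in a variable.

```shell
# A stand-in command (hypothetical) that writes one line to each stream.
cmd='echo regular output; echo diagnostic >&2'

# Standard error is first pointed at the capture, then standard output
# is sent to /dev/null, so only the diagnostic line survives.
seen=`sh -c "$cmd" 2>&1 >/dev/null`
echo "still visible: $seen"
```

The order of the two redirections matters: written the other way round, both streams would end up in /dev/null.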
Exercise 2-9. Find out about the other files in /dev by reading Section 4 of the
manual. What is the difference between /dev/mt0 and /dev/rmt0? Comment on
the potential advantages of having subdirectories in /dev for discs, tapes, etc. □
Exercise 2-10. Tapes written on non-UNIX systems often have different block sizes, such
as 800 bytes — ten 80-character card images — but the tape device /dev/mt0 expects
512-byte blocks. Look up the dd command (dd(1)) to see how to read such a tape. □
Exercise 2-11. Why isn’t /dev/tty just a link to your login terminal? What would
happen if it were mode rw--w--w- like your login terminal? □
Exercise 2-12. How does write(1) work? Hint: see utmp(5). □
Exercise 2-13. How can you tell if a user has been active at the terminal recently? □
History and bibliographic notes
The file system forms one part of the discussion in “UNIX implementation,”
by Ken Thompson (BSTJ, July, 1978). A paper by Dennis Ritchie, entitled
“The evolution of the UNIX time-sharing system” (Symposium on Language
Design and Programming Methodology, Sydney, Australia, Sept. 1979) is a
fascinating description of how the file system was designed and implemented
on the original PDP-7 UNIX system, and how it grew into its present form.
The UNIX file system adapts some ideas from the MULTICS file system. The
MULTICS System: An Examination of its Structure, by E. I. Organick (MIT
Press, 1972) provides a comprehensive treatment of MULTICS.
“Password security: a case history,” by Bob Morris and Ken Thompson, is
an entertaining comparison of password mechanisms on a variety of systems; it
can be found in Volume 2B of the UNIX Programmer's Manual.
In the same volume, the paper “On the security of UNIX,” by Dennis
Ritchie, explains how the security of a system depends more on the care taken
with its administration than with the details of programs like crypt.
CHAPTER 3: USING THE SHELL
The shell — the program that interprets your requests to run programs — is
the most important program for most UNIX users; with the possible exception of
your favorite text editor, you will spend more time working with the shell than
any other program. In this chapter and in Chapter 5, we will spend a fair
amount of time on the shell’s capabilities. The main point we want to make is
that you can accomplish a lot without much hard work, and certainly without
resorting to programming in a conventional language like C, if you know how
to use the shell.
We have divided our coverage of the shell into two chapters. This chapter
goes one step beyond the necessities covered in Chapter 1 to some fancier but
commonly used shell features, such as metacharacters, quoting, creating new
commands, passing arguments to them, the use of shell variables, and some
elementary control flow. These are topics you should know for your own use
of the shell. The material in Chapter 5 is heavier going — it is intended for
writing serious shell programs, ones that are bullet-proofed for use by others.
The division between the two chapters is somewhat arbitrary, of course, so
both should be read eventually.
3.1 Command line structure
To proceed, we need a slightly better understanding of just what a com-
mand is, and how it is interpreted by the shell. This section is a more formal
coverage, with some new information, of the shell basics introduced in the first
chapter.
The simplest command is a single word, usually naming a file for execution
(later we will see some other types of commands):
$ who Execute the file /bin/who
you tty2 Sep 28 07:51
jpl tty4 Sep 28 08:32
$
A command usually ends with a newline, but a semicolon ; is also a command
terminator:
$ date;
Wed Sep 28 09:07:15 EDT 1983
$ date; who
Wed Sep 28 09:07:23 EDT 1983
you tty2 Sep 28 07:51
jpl tty4 Sep 28 08:32
$
Although semicolons can be used to terminate commands, as usual nothing
happens until you type RETURN. Notice that the shell only prints one prompt
after multiple commands, but except for the prompt,
$ date; who
is identical to typing the two commands on different lines. In particular, who
doesn’t run until date has finished.
Try sending the output of “date; who” through a pipe:
$ date; who | wc
Wed Sep 28 09:08:48 EDT 1983
2 10 60
$
This might not be what you expected, because only the output of who goes to
wc. Connecting who and wc with a pipe forms a single command, called a
pipeline, that runs after date. The precedence of | is higher than that of ‘;’
as the shell parses your command line.
Parentheses can be used to group commands:
$ (date; who)
Wed Sep 28 09:11:09 EDT 1983
you tty2 Sep 28 07:51
jpl tty4 Sep 28 08:32
$ (date; who) | wc
3 16 89
$
The outputs of date and who are concatenated into a single stream that can be
sent down a pipe.
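The effect of precedence and of grouping can be made visible with commands whose output is easy to predict. This is a small sketch, not taken from the original examples; echo and tr stand in for date, who and wc, and the variable names are illustrative.

```shell
# Without parentheses, only the second echo is connected to tr;
# the first echo writes its output directly.
ungrouped=`echo a; echo b | tr a-z A-Z`

# With parentheses, both outputs form one stream that tr reads.
grouped=`(echo a; echo b) | tr a-z A-Z`

echo "$ungrouped"
echo "$grouped"
```

In the first case only b is translated to upper case; in the second, both lines are.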
Data flowing through a pipe can be tapped and placed in a file (but not
another pipe) with the tee command, which is not part of the shell, but is
nonetheless handy for manipulating pipes. One use is to save intermediate
output in a file:
$ (date; who) | tee save | wc
3 16 89 Output from wc
$ cat save
Wed Sep 28 09:13:22 EDT 1983
you tty2 Sep 28 07:51
jpl tty4 Sep 28 08:32
$ wc <save
...
The special characters interpreted by the shell, such as <, >, |, ;
and &, are not arguments to the programs the shell runs. They instead control
how the shell runs them. For example,
$ echo Hello >junk
tells the shell to run echo with the single argument Hello, and place the out-
put in the file junk. The string >junk is not an argument to echo; it is
interpreted by the shell and never seen by echo. In fact, it need not be the
last string in the command:
$ >junk echo Hello
is identical, but less obvious.
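One way to convince yourself that the redirection never reaches the program is to have the program report how many arguments it received. The sketch below assumes a POSIX-style sh; the temporary filename and the placeholder name count (which becomes $0 of the inner shell) are our own choices.

```shell
tmp=${TMPDIR:-/tmp}/junk.$$

# The inner shell prints $#, the number of arguments it was given.
sh -c 'echo $#' count Hello >"$tmp"
after=`cat "$tmp"`

# The redirection may come first; the argument count is unchanged.
>"$tmp" sh -c 'echo $#' count Hello
before=`cat "$tmp"`

rm -f "$tmp"
echo "arguments seen: $after and $before"
```

In both cases the inner shell sees exactly one argument, Hello; the redirection has been consumed by the outer shell.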
Exercise 3-1. What are the differences among the following three commands?
$ cat file | pr
$ pr <file
$ pr file
...
prog >file direct standard output to file
prog >>file append standard output to file
prog <file take standard input from file
...
p1 || p2 run p1; if unsuccessful, run p2
...
In this last example, because the quotes are discarded after they’ve done their
job, echo sees a single argument containing no quotes.
Quoted strings can contain newlines:
$ echo 'hello
> world'
hello
world
$
The string ‘> ’ is a secondary prompt printed by the shell when it expects you
to type more input to complete a command. In this example the quote on the
first line has to be balanced with another. The secondary prompt string is
stored in the shell variable PS2, and can be modified to taste.
In all of these examples, the quoting of a metacharacter prevents the shell
from trying to interpret it. The command
$ echo x*y
echoes all the filenames beginning x and ending y. As always, echo knows
nothing about files or shell metacharacters; the interpretation of *, if any, is
supplied by the shell.
What happens if no files match the pattern? The shell, rather than com-
plaining (as it did in early versions), passes the string on as though it had been
quoted. It’s usually a bad idea to depend on this behavior, but it can be
exploited to learn of the existence of files matching a pattern:
$ ls x*y
x*y not found Message from ls: no such files exist
$ >xyzzy Create xyzzy
$ ls x*y
xyzzy File xyzzy matches x*y
$ ls 'x*y'
x*y not found ls doesn’t interpret the *
$
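The same three cases can be observed without relying on the wording of any error message from ls. The sketch below works in a scratch directory (its name is our invention) and uses set to capture what the shell produces for each form of the pattern. It assumes a shell that passes unmatched patterns through unchanged, as described above.

```shell
old=`pwd`
dir=${TMPDIR:-/tmp}/glob.$$
mkdir "$dir" && cd "$dir" || exit 1

set -- x*y            # no file matches, so the pattern is passed unchanged
nomatch=$1

>xyzzy                # create an empty file whose name matches x*y
set -- x*y
match=$1

set -- 'x*y'          # quoted: the shell never tries to expand it
quoted=$1

cd "$old" && rm -rf "$dir"
echo "$nomatch $match $quoted"
```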
A backslash at the end of a line causes the line to be continued; this is the
way to present a very long line to the shell.
$ echo abc\
> def\
> ghi
abcdefghi
$
Notice that the newline is discarded when preceded by backslash, but is
retained when it appears in quotes.
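The contrast between the two treatments of newline can be sketched with set, which stores each resulting word as a positional parameter. The variable names are illustrative only.

```shell
# Backslash-newline: the newline is discarded and the pieces are joined.
set -- abc\
def
joined=$1

# Quoted newline: the newline is kept as part of the single argument.
set -- 'abc
def'
kept=$1

echo "$joined"
echo "$kept"
```

The first echo prints one line, the second prints two.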
The metacharacter # is almost universally used for shell comments; if a
shell word begins with #, the rest of the line is ignored:
$ echo hello # there
hello
$ echo hello#there
hello#there
$
The # was not part of the original 7th Edition, but it has been adopted very
widely, and we will use it in the rest of the book.
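A short sketch of the rule, assuming a shell that supports # comments: the # starts a comment only when it begins a word.

```shell
set -- hello # there
# Everything from the # on the line above was ignored, leaving one word.
word_count=$#
first=$1

set -- hello#there    # inside a word the # is an ordinary character
joined=$1

echo "$word_count $first $joined"
```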
Exercise 3-2. Explain the output produced by
$ ls .*
□
A digression on echo
Even though it isn’t explicitly asked for, a final newline is provided by
echo. A sensible and perhaps cleaner design for echo would be to print only
what is requested. This would make it easy to issue prompts from the shell:
$ pure-echo Enter a command:
Enter a command:$ No trailing newline
but has the disadvantage that the most common case — providing a newline —
is not the default and takes extra typing:
$ pure-echo 'Hello!
> '
Hello!
$
Since a command should by default execute its most commonly used function,
the real echo appends the final newline automatically.
But what if it isn’t desired? The 7th Edition echo has a single option, -n,
to suppress the last newline:
$ echo -n Enter a command:
Enter a command:$ Prompt on same line
$ echo -
- Only -n is special
$
The only tricky case is echoing -n followed by a newline:
$ echo -n '-n
> '
-n
$
It’s ugly, but it works, and this is a rare situation anyway.
A different approach, taken in System V, is for echo to interpret C-like
backslash sequences, such as \b for backspace and \c (which isn’t actually in
the C language) to suppress the last newline:
$ echo 'Enter a command: \c' System V version
Enter a command: $
Although this mechanism avoids confusion about echoing a minus sign, it has
other problems. echo is often used as a diagnostic aid, and backslashes are
interpreted by so many programs that having echo look at them too just adds
to the confusion.
Still, both designs of echo have good and bad points. We shall use the 7th
Edition version (-n), so if your local echo obeys a different convention, a
couple of our programs will need minor revision.
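If your system is neither of these, one possible adaptation is the printf command. This is an assumption on our part: printf is specified by POSIX but is not part of the 7th Edition and is not used elsewhere in this book. The sketch shows that it prints only what the format requests, so both the prompt and the awkward -n case are straightforward.

```shell
# printf adds no newline unless the format contains \n.
# The '<' marker makes the absence of a newline visible in the result.
prompt=`printf '%s' 'Enter a command: '; echo '<'`

# Echoing -n followed by a newline needs no special trick.
dash=`printf '%s\n' -n`

echo "$prompt"
echo "$dash"
```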
Another question of philosophy is what echo should do if given no argu-
ments — specifically, should it print a blank line or nothing at all? All the
current echo implementations we know print a blank line, but past versions
didn’t, and there were once great debates on the subject. Doug McIlroy
imparted the right feelings of mysticism in his discussion of the topic:
The UNIX and the Echo
There dwelt in the land of New Jersey the UNIX, a fair maid whom savants traveled far to
admire. Dazzled by her purity, all sought to espouse her, one for her virginal grace, another for
her polished civility, yet another for her agility in performing exacting tasks seldom accomplished
even in much richer lands. So large of heart and accommodating of nature was she that the UNIX
adopted all but the most insufferably rich of her suitors. Soon many offspring grew and prospered
and spread to the ends of the earth.
Nature herself smiled and answered to the UNIX more eagerly than to other mortal beings.
Humbler folk, who knew little of more courtly manners, delighted in her echo, so precise and crys-
tal clear they scarce believed she could be answered by the same rocks and woods that so garbled
their own shouts into the wilderness. And the compliant UNIX obliged with perfect echoes of what-
ever she was asked.
When one impatient swain asked the UNIX, ‘Echo nothing,’ the UNIX obligingly opened her
mouth, echoed nothing, and closed it again.
‘Whatever do you mean,’ the youth demanded, ‘opening your mouth like that? Henceforth
never open your mouth when you are supposed to echo nothing!’ And the UNIX obliged.
‘But I want a perfect performance, even when you echo nothing,’ pleaded a sensitive youth,
‘and no perfect echoes can come from a closed mouth.’ Not wishing to offend either one, the UNIX
agreed to say different nothings for the impatient youth and for the sensitive youth. She called the
sensitive nothing ‘\n.’
Yet now when she said ‘\n,’ she was really not saying nothing so she had to open her mouth
twice, once to say ‘\n,’ and once to say nothing, and so she did not please the sensitive youth, who
said forthwith, ‘The \n sounds like a perfect nothing to me, but the second one ruins it. I want you
to take back one of them.’ So the UNIX, who could not abide offending, agreed to undo some
echoes, and called that ‘\c.’ Now the sensitive youth could hear a perfect echo of nothing by asking
for ‘\n’ and ‘\c’ together. But they say that he died of a surfeit of notation before he ever heard
one.
Exercise 3-3. Predict what each of the following grep commands will do, then verify
your understanding.
grep \$ grep \\
grep \\$ grep \\\\
grep \\\$ grep "\$"
grep '\$' grep '"$'
grep '\'$' grep "$"
A file containing these commands themselves makes a good test case if you want to
experiment. □
Exercise 3-4. How do you tell grep to search for a pattern beginning with a ‘-’? Why
doesn’t quoting the argument help? Hint: investigate the -e option. □
Exercise 3-5. Consider
$ echo */*
Does this produce all names in all directories? In what order do the names appear? □
Exercise 3-6. (Trick question) How do you get a / into a filename (i.e., a / that
doesn’t separate components of the path)? □
Exercise 3-7. What happens with
$ cat x y >y
and with
$ cat x >>x
Think before rushing off to try them. □
Exercise 3-8. If you type
$ rm *
why can’t rm warn you that you're about to delete all your files? □
3.3 Creating new commands
It’s now time to move on to something that we promised in Chapter 1 —
how to create new commands out of old ones.
Given a sequence of commands that is to be repeated more than a few
times, it would be convenient to make it into a “new” command with its own
name, so you can use it like a regular command. To be specific, suppose you
intend to count users frequently with the pipeline
$ who | wc -l
that was mentioned in Chapter 1, and you want to make a new program nu to
do that.
The first step is to create an ordinary file that contains ‘who | wc -l’.
You can use a favorite editor, or you can get creative:
$ echo 'who | wc -l' >nu
(Without the quotes, what would appear in nu?)
As we said in Chapter 1, the shell is a program just like an editor or who or
wc; its name is sh. And since it’s a program, you can run it and redirect its
input. So run the shell with its input coming from the file nu instead of the
terminal:
$ who
you tty2 Sep 28 07:51
xhh tty4 Sep 28 10:02
moh tty5 Sep 28 09:38
ava tty6 Sep 28 10:17
$ cat nu
who | wc -l
$ sh <nu
...
$ echo 'chmod +x $1' >cx Create cx originally
$ sh cx cx Make cx itself executable
$ echo echo Hi, there! >hello Make a test program
$ hello Try it
hello: cannot execute
$ cx hello Make it executable
$ hello Try again
Hi, there! It works
$ mv cx /usr/you/bin Install cx
$ rm hello Clean up
$
Notice that we said
$ sh cx cx
exactly as the shell would have automatically done if cx were already execut-
able and we typed
$ cx cx
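The way sh hands arguments to a shell file can be sketched with a harmless file that merely reports its first argument instead of changing permissions. The filename is a hypothetical scratch name, not one of the commands built in this chapter.

```shell
file=${TMPDIR:-/tmp}/show.$$

# A one-line shell file that reports its first argument.
echo 'echo "first argument: $1"' >"$file"

# Run it explicitly with sh; the word after the filename becomes $1.
result=`sh "$file" cx`

rm -f "$file"
echo "$result"
```

Because the file is run with sh explicitly, it does not need to be executable, which is exactly why sh cx cx works before cx has execute permission.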
What if you want to handle more than one argument, for example to make
a program like cx handle several files at once? A crude first cut is to put nine
arguments into the shell program, as in
chmod +x $1 $2 $3 $4 $5 $6 $7 $8 $9
(it only works up to $9, because the string $10 is parsed as “first argument,
$1, followed by a 0”!) If the user of this shell file provides fewer than nine
arguments, the missing ones are null strings; the effect is that only the argu-
ments that were actually provided are passed to chmod by the sub-shell. So
this implementation works, but it’s obviously unclean, and it fails if more than
nine arguments are provided.
Anticipating this problem, the shell provides a shorthand $* that means “all
the arguments.” The proper way to define cx, then, is
chmod +x $*
which works regardless of how many arguments are provided.
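Both points, the parsing of $10 and the behavior of $*, can be tried directly with sh -c, where the word after the command string (demo here, an arbitrary placeholder) becomes $0 and the rest become $1, $2, and so on. This is a sketch assuming a POSIX-style sh.

```shell
# $10 is $1 followed by a literal 0, not the tenth argument.
ten=`sh -c 'echo $10' demo a b c d e f g h i j`

# $* expands to every argument, however many there are.
all=`sh -c 'echo $*' demo 1 2 3 4 5 6 7 8 9 10 11`

# With no arguments, $* is null and nothing at all is passed on.
none=`sh -c 'set -- $*; echo $#' demo`

echo "$ten / $all / $none"
```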
With $* added to your repertoire, you can make some convenient shell
files, such as lc or m:
$ cd /usr/you/bin
$ cat lc
# lc: count number of lines in files
wc -l $*
$ cat m
# m: a concise way to type mail
mail $*
$
Both can sensibly be used without arguments. If there are no arguments, $*
will be null, and no arguments at all will be passed to wc or mail. With or
without arguments, the command is invoked properly:
$ lc /usr/you/bin/*
1 /usr/you/bin/cx
/usr/you/bin/lc
/usr/you/bin/m
/usr/you/bin/nu
/usr/you/bin/what
/usr/you/bin/where
total
$ ls /usr/you/bin | lc
6
$
These commands and the others in this chapter are examples of personal
programs, the sort of things you write for yourself and put in your bin, but
are unlikely to make publicly available because they are too dependent on per-
sonal taste. In Chapter 5 we will address the issues of writing shell programs
suitable for public use.
The arguments to a shell file need not be filenames. For example, consider
searching a personal telephone directory. If you have a file named
/usr/you/lib/phone-book that contains lines like
dial-a-joke 212-976-3838
dial-a-prayer 212-246-4200
dial santa 212-976-3636
dow jones report 212-976-4141
then the grep command can be used to search it. (Your own lib directory is
a good place to store such personal data bases.) Since grep doesn’t care about
the format of information, you can search for names, addresses, zip codes or
anything else that you like. Let’s make a directory assistance program, which
we'll call 411 in honor of the telephone directory assistance number where we
live:
$ echo 'grep $* /usr/you/lib/phone-book' >411
$ cx 411
$ 411 joke
dial-a-joke 212-976-3838
$ 411 dial
dial-a-joke 212-976-3838
dial-a-prayer 212-246-4200
dial santa 212-976-3636
$ 411 'dow jones'
grep: can’t open jones Something is wrong
$
The final example is included to show a potential problem: even though dow
jones is presented to 411 as a single argument, it contains a space and is no
longer in quotes, so the sub-shell interpreting the 411 command converts it
into two arguments to grep: it’s as if you had typed
$ grep dow jones /usr/you/lib/phone-book
and that’s obviously wrong.
One remedy relies on the way the shell treats double quotes. Although
anything quoted with '...' is inviolate, the shell looks inside "..." for $’s, \’s,
and `...`’s. So if you revise 411 to look like
grep "$*" /usr/you/lib/phone-book
the $* will be replaced by the arguments, but it will be passed to grep as a
single argument even if it contains spaces.
$ 411 dow jones
dow jones report 212-976-4141
$
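The difference between the two versions of 411 can be reproduced with a scratch copy of the phone book. In this sketch the file location and the BOOK variable are our own devices (the variable is exported only so that the inner shells can find the file), and sh -c stands in for the 411 shell file.

```shell
BOOK=${TMPDIR:-/tmp}/phone-book.$$
export BOOK
cat >"$BOOK" <<'EOF'
dial-a-joke 212-976-3838
dial santa 212-976-3636
dow jones report 212-976-4141
EOF

# Unquoted $*: grep is told to search a file called "jones" and fails.
sh -c 'grep $* "$BOOK"' 411 'dow jones' >/dev/null 2>&1
unquoted_status=$?

# Quoted "$*": the whole phrase reaches grep as a single pattern.
found=`sh -c 'grep "$*" "$BOOK"' 411 dow jones`

rm -f "$BOOK"
echo "unquoted status: $unquoted_status"
echo "quoted result: $found"
```

The unquoted form exits with a non-zero status because grep cannot open jones; the quoted form prints the one matching line.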
By the way, you can make grep (and thus 411) case-independent with the
-y option:
$ grep -y pattern .
With -y, lower case letters in pattern will also match upper case letters in the
input. (This option is in 7th Edition grep, but is absent from some other sys-
tems.)
There are fine points about command arguments that we are skipping over
until Chapter 5, but one is worth noting here. The argument $0 is the name
of the program being executed; in cx, $0 is “cx.” A novel use of $0 is in
the implementation of the programs 2, 3, 4, ..., which print their output in
that many columns:
$ who | 2
arh tty0 Sep 28 21:23 cw tty5 Sep 28 21:09
amr tty6 Sep 28 22:10 scj tty7 Sep 28 22:11
you tty9 Sep 28 23:00 jib ttyb Sep 28 19:58
$
The implementations of 2, 3, ... are identical; in fact they are links to the
same file:
$ ln 2 3; ln 2 4; ln 2 5; ln 2 6
$ ls -li [1-9]
16722 -rwxrwxrwx 5 you 51 Sep 28 23:21 2
16722 -rwxrwxrwx 5 you 51 Sep 28 23:21 3
16722 -rwxrwxrwx 5 you 51 Sep 28 23:21 4
16722 -rwxrwxrwx 5 you 51 Sep 28 23:21 5
16722 -rwxrwxrwx 5 you 51 Sep 28 23:21 6
$ ls /usr/you/bin | 5
2 3 4 411 5
6 cx lc m nu
what where
$ cat 5
# 2, 3, ...: print in n columns
pr -$0 -t -l1 $*
$
The -t option turns off the heading at the top of the page and the -ln option
sets the page length to n lines. The name of the program becomes the
number-of-columns argument to pr, so the output is printed a row at a time in
the number of columns specified by $0.
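The idea of one file behaving differently under several names can be sketched without pr. The version below is an illustration under stated assumptions, not the implementation above: it runs the links through sh with a full pathname, so $0 contains a directory part, and the ${0##*/} expansion (a POSIX feature absent from the 7th Edition shell) is added to strip it.

```shell
dir=${TMPDIR:-/tmp}/cols.$$
mkdir "$dir" || exit 1

# One file; ${0##*/} removes any directory part from the invoked name.
echo 'echo "${0##*/} columns"' >"$dir/2"
ln "$dir/2" "$dir/3"          # a second name for the same file

two=`sh "$dir/2"`
three=`sh "$dir/3"`

rm -rf "$dir"
echo "$two; $three"
```

The two links share one inode, yet each reports the name it was invoked by.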
3.5 Program output as arguments
Let us turn now from command arguments within a shell file to the genera-
tion of arguments. Certainly filename expansion from metacharacters like * is
the most common way to generate arguments (other than by providing them
explicitly), but another good way is by running a program. The output of any
program can be placed in a command line by enclosing the invocation in back-
quotes `...`:
$ echo At the tone the time will be `date`.
At the tone the time will be Thu Sep 29 00:02:15 EDT 1983.
$
A small change illustrates that `...` is interpreted inside double quotes "...":
$ echo "At the tone
> the time will be `date`."
At the tone
the time will be Thu Sep 29 00:03:07 EDT 1983.
$
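Because the output of date changes from second to second, a deterministic sketch is easier to check. Here echo plays the part of the program whose output is substituted, and set is used to count how many arguments result in each case.

```shell
# Unquoted: the program's output is split into separate arguments.
set -- `echo At the tone`
split=$#

# Inside double quotes the output is still substituted,
# but it stays part of a single argument.
set -- "time: `echo At the tone`"
whole=$#
text=$1

echo "$split arguments unquoted, $whole quoted: $text"
```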
As another example, suppose you want to send mail to a list of people
whose login names are in the file mailinglist. A clumsy way to handle this
is to edit mailinglist into a suitable mail command and present it to the
shell, but it’s far easier to say
$ mail `cat mailinglist`
dir=/usr/you/bin
$ echo $dir
/usr/you/bin
$
The value of a variable is associated with the shell that creates it, and is not
automatically passed to the shell’s children.
$ x=Hello Create x
$ sh New shell
$ echo $x
Newline only: x undefined in the sub-shell
$ ctl-d Leave this shell
$ Back in original shell
$ echo $x
Hello x still defined
$
This means that a shell file cannot change the value of a variable, because the
shell file is run by a sub-shell:
$ echo 'x="Good Bye" Make a two-line shell file ...
> echo $x' >setx ... to set and print x
$ cat setx
x="Good Bye"
echo $x
$ echo $x
Hello x is Hello in original shell
$ sh setx
Good Bye x is Good Bye in sub-shell...
$ echo $x
Hello ... but still Hello in this shell
$
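The same behavior can be condensed into a few lines that need no terminal interaction. In this sketch sh -c takes the place of both the interactive sub-shell and the setx file; the unset at the top is our precaution in case x was already exported in the environment.

```shell
unset x               # make sure x is not already exported
x=Hello

# A child shell does not inherit x, so the brackets come back empty.
child=`sh -c 'echo "[$x]"'`

# Assigning x in a child changes only the child's copy.
changed=`sh -c 'x="Good Bye"; echo "$x"'`

echo "child saw $child, child set $changed, parent still has $x"
```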
There are times when using a shell file to change shell variables would be
useful, however. An obvious example is a file to add a new directory to your
PATH. The shell therefore provides a command ‘.’ (dot) that executes the
commands in a file in the current shell, rather than in a sub-shell. This was
originally invented so people could conveniently re-execute their .profile
files without having to log in again, but it has other uses: