SAS Programming Concepts: In-Depth
Explanatory Guide
1. INPUT Statement Behavior in DATA Steps
Concept:
When a DATA step contains several INPUT statements, where each one reads depends on whether
the previous INPUT statement released the current record or held it with a line-hold specifier (@ or @@).
a. Workflow
Single use:
input x y;
Reads both variables from the current input record (row).
Multiple uses (no hold):
input x;
input y;
Each new input statement moves to the next raw data line, so x and y are read from
separate lines.
b. Hold Specifiers
A trailing @ holds the current line so that the next INPUT statement in the same iteration keeps reading from it:
input x @;
input y;
Both x and y read from the same line.
Diagram (no hold specifier)
Raw data, line 1: 5
Raw data, line 2: 10
input x; x=5 (line 1)
input y; y=10 (line 2)
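The two behaviors can be compared side by side. The following is a minimal sketch, not a definitive reference; the dataset names no_hold and with_hold and the sample values are made up for illustration.

```sas
/* Without a hold specifier: each INPUT statement starts on a new line. */
data no_hold;
  input x;      /* reads the first value on line 1 */
  input y;      /* moves to line 2 and reads its first value */
  datalines;
5 10
6 20
;
run;
/* no_hold has one observation: x=5, y=6 (10 and 20 are never read) */

/* With a trailing @: the second INPUT continues on the held line. */
data with_hold;
  input x @;    /* reads x and holds the line */
  input y;      /* reads y from the same line, then releases it */
  datalines;
5 10
6 20
;
run;
/* with_hold has two observations: (x=5, y=10) and (x=6, y=20) */
```

Running PROC PRINT on both datasets makes the difference in observation counts easy to see.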
2. Implicit Data Conversion in SAS
Concept:
SAS automatically converts data types in certain cases (called implicit conversions):
Character-to-numeric:
When you use a character variable in a numeric context.
data test;
charval = "123";
num = charval + 1; /* SAS converts charval to 123 */
run;
Numeric-to-character:
When you use a numeric variable in a character context.
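data test;
numval = 456;
char = numval || ' units'; /* implicit: numval is converted to character (BEST12.) before concatenation */
char2 = put(numval, 8.); /* explicit conversion with PUT, which avoids the log NOTE */
run;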
Warning:
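SAS writes a NOTE to the log whenever an implicit conversion occurs, for example
"NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column)."
The numeric-to-character case produces the matching "Numeric values have been converted to character values" NOTE.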
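To avoid these NOTEs, the conversions can be written explicitly with INPUT and PUT. The sketch below is illustrative only; the dataset and variable names are assumptions, not part of the original guide.

```sas
data convert;
  charval = "123";
  numval  = 456;

  num_implicit = charval + 1;             /* 124, with a conversion NOTE in the log */
  num_explicit = input(charval, 8.) + 1;  /* 124, no conversion NOTE */

  chr_explicit = put(numval, 8.);         /* '     456', right-aligned in 8 characters */
  chr_trimmed  = strip(put(numval, 8.));  /* '456' once the leading blanks are removed */
run;
```

The explicit forms also document the intended informat and format, which makes the width of the result predictable.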
3. Difference between .sas and .sas7bdat
.sas:
Type: Text program file.
Use: Contains SAS code, data steps, procedures, macros, etc.
Editable: Yes, in any text editor.
.sas7bdat:
Type: Binary data file.
Use: Stores SAS datasets (tables).
Editable: Not in a text editor. Open it with SAS or with another tool that understands the format.
Extension    File Type      Use            Editable in a text editor?
.sas         SAS Program    Code/Script    Yes
.sas7bdat    SAS Dataset    Data Tables    No
4. Controlled Terminology vs. Codelist
Aspect          Controlled Terminology                   Codelist
Definition      A standard set of terms defined by an    A finite list of allowable values, which may
                authority (e.g., CDISC, MedDRA)          or may not be controlled standards
Source          External standards (MedDRA, CDISC,       Study-specific, sponsor-defined, or external
                SNOMED)                                  source
Change Control  Strict (managed by organizations)        Variable, may be flexible
Example         Gender: M=Male, F=Female (CDISC)         Status: 1=Active, 2=Inactive
5. WHERE Statement Restrictions Inside DATA Steps
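In a DATA step, the WHERE statement works only when the step reads an existing SAS dataset
through SET, MERGE, UPDATE, or MODIFY. It is a declarative statement: SAS applies it as
observations are read, regardless of where it appears in the step.
data new;
set old;
where x=1; /* Valid: x is a variable in OLD */
/* ... */
run;
A WHERE statement in a DATA step that has no SET, MERGE, UPDATE, or MODIFY statement causes an
error. So does a WHERE condition that references a variable created in the step rather than
one stored in the input dataset.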
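As a minimal sketch of how WHERE interacts with the input dataset, the example below builds a small dataset and filters it. The names old, new, x, y, and z are illustrative assumptions.

```sas
data old;
  input x y;
  datalines;
1 10
2 20
1 30
;
run;

data new;
  set old;
  z = y * 2;
  where x = 1;     /* filters rows as they are read from OLD, even though it is written after the assignment */
  /* where z > 20;    would fail: z is created in this step and is not stored in OLD */
run;
/* new has two observations: (x=1, y=10, z=20) and (x=1, y=30, z=60) */
```

To filter on a computed variable such as z, a subsetting IF (if z > 20;) is the usual choice, because it runs after the assignment.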
6. No Statements Allowed After DATALINES
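The DATALINES (or CARDS) statement must be the last statement in the DATA step: put every other
statement before it, and follow the data lines only with the closing semicolon and run;.
The data lines end at the first line that contains a semicolon; by convention this is a line
holding only a semicolon. If the data themselves contain semicolons, use DATALINES4 and end the block with four semicolons.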
data test;
input x y;
datalines;
1 2
3 4
; /* NO code between this and the run; */
run;
7. WHERE vs. IF After INFILE/INPUT
When reading raw/external data (with infile + input), WHERE statements cannot be used to
filter observations. Use IF instead.
The WHERE statement is only valid when reading from an existing SAS dataset (e.g., SET
statement).
Example:
data filter1;
infile 'external.txt';
input id age;
if age > 20; /* Valid */
run;
data filter2;
set data_existing;
where age > 20; /* Valid */
run;
8. FIND Function Case Sensitivity
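The find() function in SAS is case-sensitive by default.
find(string, 'XX') will not match 'xx' in the string.
To ignore case, pass the 'i' modifier as the third argument,
e.g., find(string, 'xx', 'i').
findc() is a different function: it searches for any single character from a list rather than
for a substring, and it also accepts the 'i' modifier.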
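The difference between a substring search and a character search can be sketched as follows. The variable names and the sample string are assumptions chosen for illustration.

```sas
data _null_;
  string = 'Apple Pie';

  pos_cs = find(string, 'ap');        /* 0: no lowercase 'ap' in the string */
  pos_ic = find(string, 'ap', 'i');   /* 1: 'Ap' matches when case is ignored */
  pos_ch = findc(string, 'ep');       /* 2: first 'e' or 'p' character is the 'p' in position 2 */

  put pos_cs= pos_ic= pos_ch=;
run;
```

The PUT statement writes the three positions to the log so the results can be checked directly.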
9. Nesting DATA Steps in SAS
Standard Practice:
SAS does not allow one DATA step to be placed inside another (no true nesting).
Implementation:
You can have multiple DATA steps sequentially, but not nested:
data a;
/* code */
run;
data b;
/* code */
run;
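If you place a DATA statement inside an unfinished step, SAS treats it as a step boundary: the
first step ends there and a new one begins, so the code never nests and usually fails or gives unintended results.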
10. LEAVE vs. BREAK Statements
Statement   Behavior                                                      Where Used
LEAVE       Immediately exits the current DO loop (iterative DO,          DATA step
            DO WHILE, DO UNTIL) or SELECT group
BREAK       Produces a summary line when the value of a group or          PROC REPORT only
            order variable changes
LEAVE is similar to "break" in other languages: it stops the nearest enclosing loop, and
execution continues with the statement after that loop.
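A short sketch of LEAVE inside an iterative DO loop; the loop bounds and the exit condition are arbitrary values chosen for illustration.

```sas
data _null_;
  do i = 1 to 10;
    if i = 4 then leave;      /* exit the loop as soon as i reaches 4 */
    put 'Inside loop: ' i=;   /* written for i=1, 2, 3 */
  end;
  put 'After loop: ' i=;      /* i=4: execution resumes after END */
run;
```

In nested loops, LEAVE exits only the innermost loop that contains it.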
11. Default Length of Character Variables
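By default:
With list input (for example, input name $;) and no LENGTH statement or informat, SAS gives a
character variable a length of 8 bytes.
Exception:
A variable first created from the result of certain character functions, such as CATX or
TRANWRD, gets a default length of 200 bytes. (The CHAR function itself returns a single character.)
Variable Creation                                      Default Length
char variable (list input, no explicit length)         8 bytes
result of CATX, TRANWRD, and similar functions         200 bytes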
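The default lengths can be inspected with PROC CONTENTS. This is a sketch under stated assumptions: the dataset name lengths, the variable names, and the sample value are illustrative.

```sas
data lengths;
  input name $;                          /* list input, no length given: name is 8 bytes */
  literal = 'ABCD';                      /* length taken from the first assigned value: 4 bytes */
  joined  = catx('-', name, literal);    /* CATX result with no prior length: 200 bytes */
  datalines;
Alexander
;
run;

proc contents data=lengths;              /* the Len column shows 8, 4, and 200 */
run;
```

Because name is only 8 bytes long, the nine-character value Alexander is stored as Alexande. A LENGTH statement before the INPUT statement avoids this kind of truncation.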
12. Default SAS Date Format
SAS stores dates as numbers (days since 1960-01-01).
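A date variable has no default display format: without a format it prints as the raw number
(for example, 0). Assign a format such as DATE9. to display it as 01JAN1960.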
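The effect of a date format can be sketched by writing the same value to the log with and without one. The variable name d is an assumption for illustration.

```sas
data _null_;
  d = '01JAN1960'd;      /* date literal: stored as the number 0 */
  put d=;                /* d=0 */
  put d= date9.;         /* d=01JAN1960 */
  put d= yymmdd10.;      /* d=1960-01-01 */
run;
```

A FORMAT statement (format d date9.;) attaches the format to the variable so that procedures display it the same way.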
13. Character Variables: Length by Value
If you create a character variable by assignment without a LENGTH statement, its length is
set by the first value assigned to it.
data test;
x = 'ABCD'; /* x has length 4 */
run;
The length is then fixed: a shorter value assigned later is padded with trailing blanks, and a longer value is truncated.
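A minimal sketch of the fixed-length behavior, with an explicit LENGTH statement shown as the usual remedy. The variable names are illustrative assumptions.

```sas
data test;
  x = 'ABCD';           /* x is created with length 4 */
  x = 'ABCDEFGH';       /* truncated to 'ABCD' */
  x_len = lengthc(x);   /* 4 */

  length y $8;          /* declare the length before the first assignment */
  y = 'ABCD';           /* stored as 'ABCD' plus four trailing blanks */
  y = 'ABCDEFGH';       /* fits without truncation */
  y_len = lengthc(y);   /* 8 */
run;
```

LENGTHC returns the declared storage length, including trailing blanks, which is why it is used here instead of LENGTH.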
Diagrams
For illustration, here's a text-based example for a few points:
Input Statement Flow (1)
Data: line 1 = 10, line 2 = 20
-------------------
input x; --> x=10 (line 1)
input y; --> y=20 (next line!)
Case Sensitivity in FIND Function (8):
String: 'Apple'
find('Apple', 'ap') --> 0 (no match)
find('Apple', 'ap', 'i') --> 1 (match, ignores case)