Shell Scripting & awk
2013 GlobalLogic Inc.
CONFIDENTIAL
Agenda
Shell scripting
Standard file descriptors & IO redirection
File test operators
Functions
Executing SQLs/FTP from a shell-script
awk
Programming model
Records and fields
Pattern matching
Functions
System/built-in variables
Data manipulation and report generation
Training schedule
Day-1
Shell scripting
awk
Shell scripting
Standard file descriptors & IO redirection
Executing SQLs/FTP from a shell-script
File test operators
Functions
UNIX standard files: IO-redirection
Redirection of input and output: >, >>, <, <<
Standard streams, their C names and file descriptors:
Input stdin - 0
Output stdout - 1
Error stderr - 2
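A minimal sketch of the three descriptors in action, assuming mktemp is available (as on Linux and BSD): the braces group two echo commands, one writing to fd 1 and one to fd 2, and each stream is then redirected to its own temporary file.

```shell
#!/bin/sh
out=$(mktemp)
err=$(mktemp)
# echo writes to stdout (fd 1); >&2 sends the second line to stderr (fd 2)
{ echo "normal output"; echo "error message" >&2; } > "$out" 2> "$err"
stdout_text=$(cat "$out")
stderr_text=$(cat "$err")
echo "fd 1 got: $stdout_text"   # fd 1 got: normal output
echo "fd 2 got: $stderr_text"   # fd 2 got: error message
rm -f "$out" "$err"
```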
UNIX standard files [contd.]
Ex.1
cat x1 vs cat < x1
cat x1 > x2 vs cat < x1 > x2
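The difference in Ex.1 is who opens the file: with cat x1 the command gets the name and opens the file itself, while with cat < x1 the shell opens it and the command only sees data on stdin. cat prints the same text either way, so this sketch uses wc -l instead, which shows the difference because it prints a file name only when it was given one. The three-line temporary file is made up for the example.

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"
# With a file name, wc opens the file itself and prints the name too
with_arg=$(wc -l "$tmp")
# With <, the shell opens the file and wc only sees data on stdin
with_redir=$(wc -l < "$tmp")
echo "wc -l file  : $with_arg"
echo "wc -l < file: $with_redir"
rm -f "$tmp"
```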
UNIX standard files [contd.]
Ex.2
1. Log in as a non-root user and go to the root directory: cd /
2. Find everything: find .
3. Again, find everything, but redirect the output to a file:
find . > $HOME/x1
What is being shown on screen? Why?
4. Redirect o/p & errors to different files:
find . > $HOME/x1 2>$HOME/x2
5. Send o/p & errors to the same file:
find . > $HOME/x1 2>&1
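To send both streams to one file, stdout is redirected first and 2>&1 then makes fd 2 a copy of fd 1. Redirections are processed left to right, so the order matters. A sketch, using a small hypothetical function, both, as a stand-in for find (results on stdout, "Permission denied" messages on stderr):

```shell
#!/bin/sh
# Stand-in for find: one line on stdout, one on stderr
both() { echo "result"; echo "error" >&2; }
log=$(mktemp)

# stdout goes to the file first, then stderr joins it: 2 lines in the file
both > "$log" 2>&1
lines_in_log=$(wc -l < "$log")

# Reversed: stderr copies the OLD stdout (here the $(...) capture)
# before stdout moves to the file, so "error" escapes the file
escaped=$(both 2>&1 > "$log")

echo "lines in log: $lines_in_log"
echo "escaped from the file: $escaped"   # escaped from the file: error
rm -f "$log"
```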
UNIX Shell Scripting
Here document (heredoc/hereis/here-script)
A special-purpose code block: the lines that follow are fed to a command's stdin
Can be spread over multiple lines
The closing delimiter goes on the last line, in the first column
Example: send mail with a message spread over multiple lines:
Test command
!
See more examples: http://tldp.org/LDP/abs/html/here-docs.html
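Two details of here documents are worth trying out. With an unquoted delimiter, the shell expands variables and `...` inside the body. Quoting the delimiter ('EOF') passes the body through literally. In this sketch cat stands in for mail or ftp, and EOF is an arbitrary delimiter name, used the same way as !, END_SCRIPT and EndFTP on these slides.

```shell
#!/bin/sh
name=world
# Unquoted delimiter: $name is expanded in the body
expanded=$(cat <<EOF
Hello, $name
EOF
)
# Quoted delimiter: the body is taken literally
literal=$(cat <<'EOF'
Hello, $name
EOF
)
echo "$expanded"   # Hello, world
echo "$literal"    # Hello, $name
```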
UNIX FTP examples
FTP
ftp -n -i ftp.FreeBSD.org <<END_SCRIPT
user anonymous
pass MySecretPassword
ls
bye
END_SCRIPT
Read the commands from a file instead:
ftp -n -i my.ftp.server < FileContainingCommands
Redirect output and errors to a file for further analysis:
ftp -n -i ftp.FreeBSD.org <<END_SCRIPT >TMPLOG 2>&1
user anonymous
pass MySecretPassword
ls
bye
END_SCRIPT
UNIX FTP example
cat 07ExampleFTP.sh
#!/bin/sh
# Usage:
#   07ExampleFTP.sh machine file
# set -x    (uncomment to trace every command)
SOURCE=$1
FILE=$2
GETHOST="uname -n"
BFILE=`basename "$FILE"`
# -n disables auto-login, so "user" must be the first command sent
ftp -n "$SOURCE" <<EndFTP
user anonymous $USER@`$GETHOST`
ascii
get $FILE /tmp/$BFILE
bye
EndFTP
UNIX Shell Scripting
Ex: Redirect output, as well as errors, to the location where output is being directed:
isql $DATABASE_NAME <<END_EXEC >/home/ymahajan/Log 2>&1
select
request_id ,
trade_date
from
trade_table
where
request_id = "$REQUEST_ID";
END_EXEC
UNIX SQL handling
Executing SQL queries:
sqlite3 TrainingDB "select * from orders;"
Write query over multiple lines:
sqlite3 TrainingDB<<!
select * from orders;
!
Redirect output to a file:
sqlite3 TrainingDB > Output.txt <<END_SQL
select * from t1;
END_SQL
cat Output.txt
Or process the output without using a file:
sqlite3 TrainingDB <<! | cut -d '|' -f 2
select * from t1;
!
UNIX SQL handling [contd.]
Multiple SQLs:
sqlite3 TrainingDB <<END_SQL
.mode column
.header on
select DATA AS DATA from t1;
select * from orders;
END_SQL
A single column:
sqlite3 TrainingDB <<END_SQL
.mode column
.header on
select DATA AS DATA
from t1;
END_SQL
UNIX SQL handling [contd. 2]
Process the output of this single column:
sqlite3 TrainingDB <<END_SQL | sed -e "/^$/d" -e "/MyOwnData/d"
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
Assign this to some variable, and process it:
SomeVariable=`sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d"
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
`
echo $SomeVariable
for eachVar in $SomeVariable; do echo "xxxxx $eachVar" ;done;
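Because $SomeVariable is expanded without quotes, the for loop splits on every space, tab and newline, so a value such as New York becomes two items. Below is a sketch of a line-by-line alternative using while read. fake_query is a hypothetical stand-in for the sqlite3 | sed pipeline, so the example runs without a database. The rows are fed in with a here document rather than a pipe, so the loop runs in the current shell and its counter survives the loop.

```shell
#!/bin/sh
# Hypothetical stand-in for the sqlite3 | sed pipeline: two rows,
# one of which contains a space
fake_query() { printf '%s\n' 'New York' 'Pune'; }

rows=$(fake_query)

words=0
for w in $rows; do words=$((words + 1)); done   # splits on spaces too: 3 items

lines=0
while IFS= read -r row; do                      # one iteration per line: 2 rows
    lines=$((lines + 1))
    echo "xxxxx $row"
done <<EOF
$rows
EOF
echo "for loop: $words items, while loop: $lines rows"
```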
UNIX SQL handling [contd. 3]
Redirect output to some file:
SomeVariable=`sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
`
cat TMPLOG
Don't type it every time; use the queries from an existing file:
sqlite3 TrainingDB < SQLInput
UNIX SQL handling [contd. 4]
tee:
sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" > TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
--- vs ---
sqlite3 TrainingDB <<END_SQL | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG 2>&1
.mode column
.header on
select DATA AS MyOwnData
from t1;
END_SQL
The first writes the result only to TMPLOG; the second, via tee, writes it to TMPLOG and also shows it on screen.
UNIX SQL handling [contd. 5]
Error handling: use redirection operators to direct output (and errors) to a file for further analysis:
SomeVariable=`sqlite3 TrainingDB <<END_SQL 2>&1 | sed -e "s/-//g" | sed -e "/^$/d" -e "/MyOwnData/d" | tee TMPLOG
.mode column
.header on
select ColNotPresent AS MyOwnData
from t1;
END_SQL
`
echo $SomeVariable;
grep -ie error TMPLOG
if [ $? = 0 ]
then
    echo "$0: INFORMIX SQL ERROR: Script $0 failed in $Query" | tee -a $OPC >> $LOG
    exit $retcd
else
    echo "$0: QUERY $Query SUCCESSFUL" >> $LOG
fi
WHY did we NOT compare $? with 0/1 directly after the SQL query, instead of running grep on TMPLOG and
then checking it?
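One part of the answer can be checked with a small sketch. After a pipeline, $? holds the exit status of the last command only (here tee), so a failure earlier in the pipeline is hidden. false stands in for a failing SQL call. Bash and ksh also offer set -o pipefail. The sketch sticks to a portable workaround that saves the first command's status in a temporary file.

```shell
#!/bin/sh
# false stands in for a failing command such as the SQL call
false | tee /dev/null
after_pipeline=$?                   # 0: only tee's status survives

# Portable workaround: save the first command's status in a file
status_file=$(mktemp)
{ false; echo $? > "$status_file"; } | tee /dev/null
first_status=$(cat "$status_file")  # non-zero (1 on common systems)
rm -f "$status_file"

echo "pipeline status: $after_pipeline, first command: $first_status"
```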
The tee command
tee [-a] file: reads from standard input and writes to
standard output and to the file(s)
The tee command can be used to send standard output to
the screen and to a file simultaneously.
make | tee build.log
Runs the make command and stores its output to build.log.
make install | tee -a build.log
Runs the make install command and appends its output to
build.log.
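A quick sketch of both forms, with echo standing in for make. The text reaches stdout (captured here with $(...)) and the log file at the same time, and -a appends to the log instead of overwriting it.

```shell
#!/bin/sh
log=$(mktemp)
# Without -a, tee truncates the file and then writes to it
shown1=$(echo "step 1: build" | tee "$log")
# With -a, tee appends, like >> does
shown2=$(echo "step 2: install" | tee -a "$log")
echo "on screen: $shown1 / $shown2"
echo "in the log:"
cat "$log"
log_lines=$(wc -l < "$log")   # 2 lines: both steps were kept
rm -f "$log"
```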
Utilities to automate tasks
Run a task in the background: command &
Keep it running after you log out: nohup command &
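A sketch of both tools together. nohup makes the command ignore the hang-up signal sent at logout, & puts it in the background, $! holds its process ID and wait blocks until it finishes. The sh -c 'sleep 1; echo finished' command is a stand-in for a long-running job.

```shell
#!/bin/sh
out=$(mktemp)
# Redirecting output ourselves stops nohup from creating nohup.out
nohup sh -c 'sleep 1; echo finished' > "$out" 2>&1 &
job=$!                      # PID of the most recent background job
echo "started job $job; the shell is free to do other work meanwhile"
wait "$job"                 # block until the job ends
status=$?                   # the job's exit status
cat "$out"
```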
Sending mails from UNIX
mail -s "Hello" [email protected] <<!
Test message `date`
From: $USER
Server: $HOSTNAME
!
File test operators
File attribute tests
[ -operator FILE ] OR test -operator FILE
Ex. test -x x.awk && echo Executable || echo NOT executable
-- OR --
if [ -x x.awk ]; then echo Executable; else echo NOT executable; fi
Returns true if...
-e      file exists
-a      file exists: identical to -e, but has been "deprecated"
-f      file is a regular file, i.e. not a directory or device file
-s      file is not zero size
-d      file is a directory
-b      file is a block device
-c      file is a character device
-p      file is a pipe
-h/-L   file is a symbolic link
File test operators [contd.]
-S          file is a socket
-t          file descriptor is associated with a terminal device, e.g. whether the stdin [ -t 0 ] or stdout [ -t 1 ] in a given script is a terminal
-r/-w/-x    file has read / write / execute permission
-g          set-group-id (sgid) flag is set on the file or directory; if set on a directory, any file created in it gets the directory's group ID
-u          set-user-id (suid) flag is set on the file
-k          sticky bit is set (the t at the end of the ls -l permissions). On a directory (e.g. /tmp) it restricts deleting and renaming files to their owners; historically, on an executable ("save-text-mode") it kept the program text in memory/swap for quicker start-up
-O          you are the owner of the file
-G          group ID of the file is the same as yours
-N          file was modified since it was last read
f1 -nt f2   file f1 is newer than f2
f1 -ot f2   file f1 is older than f2
f1 -ef f2   files f1 and f2 are hard links to the same file
!           "not": reverses the sense of the tests above (returns true if the condition is absent)
File test operators
File existence:
if [ -f /home/user11/Yogesh ]; then echo "File exists"; else echo "File NOT present"; fi;
Directory existence:
if [ -d /home/user11/Yogesh ]; then echo 'This is not a file' ;fi;
Executable file:
if [ -x /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Wow, I can run this!' ;fi;
Writeable file:
if [ -w /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Warning! This file could be over-written!!' ;fi;
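A sketch that combines several of these tests in one helper. describe is a hypothetical function name, and a temporary directory stands in for /home/user11. -L is checked first because -f and -d follow symbolic links, so a link to a file would otherwise be reported as a regular file.

```shell
#!/bin/sh
# describe PATH: report what kind of filesystem object PATH is
describe() {
    if [ -L "$1" ]; then
        echo "symbolic link"        # checked first: -f/-d follow links
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ] && [ -s "$1" ]; then
        echo "non-empty regular file"
    elif [ -f "$1" ]; then
        echo "empty regular file"
    elif [ ! -e "$1" ]; then
        echo "missing"
    else
        echo "something else (device, pipe, socket, etc.)"
    fi
}

dir=$(mktemp -d)
: > "$dir/empty"                    # create a zero-size file
echo data > "$dir/full"
ln -s "$dir/full" "$dir/link"
for p in "$dir" "$dir/empty" "$dir/full" "$dir/link" "$dir/nothing"; do
    echo "$p: $(describe "$p")"
done
rm -rf "$dir"
```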
File-test operators
Logical operators:
! : NOT
-a : AND
-o : OR
Examples:
File empty or not?
if [ -f /home/user11/Yogesh.data -a -s /home/user11/Yogesh.data ];
then echo 'Some data in file' ;
fi;
Risk of losing data?
if [ -f /home/user11/Yogesh.data -a -s /home/user11/Yogesh.data -a -w /home/user11/Yogesh.data ];
then echo 'Danger of a non-empty file being written over!' ;
fi;
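POSIX marks -a and -o inside [ ] as obsolescent because longer combined expressions can be ambiguous. Chaining separate tests with && and || is the portable equivalent. A sketch of the "risk of losing data" check written that way, wrapped in a hypothetical at_risk function, with a temporary file standing in for Yogesh.data:

```shell
#!/bin/sh
# at_risk FILE: succeeds if FILE is a regular, non-empty, writable file
at_risk() {
    [ -f "$1" ] && [ -s "$1" ] && [ -w "$1" ]
}

data=$(mktemp)
if at_risk "$data"; then echo "at risk"; else echo "empty file: nothing to lose"; fi
echo "some records" > "$data"
if at_risk "$data"; then echo "Danger of a non-empty file being written over!"; fi
rm -f "$data"
```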
Functions
To be used within the script
Functions, or procedures
A function may return a value in one of four different ways:
Change the state of a variable or variables
Use the exit command to end the shell script
Use the return command to end the function, and return the supplied value to the calling section of the shell script
echo output to stdout, which will be caught by the caller just as c=`expr $a + $b` is caught
Can be defined within a file
or inside a project library as well.
There is _NO_ scoping, other than for the parameters ($1, $2, $@ etc.): by default, every other variable is global
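A sketch of three of those return styles, plus the one piece of scoping that does exist: each call gets its own $1, $2, and so on. bump, is_even, add and remember_first are hypothetical names. The exit variant is left out because it would end the example script itself.

```shell
#!/bin/sh
counter=0
# 1) Change a variable: it is global, so the caller sees the change
bump() { counter=$((counter + 1)); }
# 2) return a status (0-255), read by the caller from $?
is_even() { return $(( $1 % 2 )); }
# 3) echo a result, captured by the caller with $(...) or `...`
add() { echo $(( $1 + $2 )); }
# Positional parameters ARE per call: $1 here is the function's argument
remember_first() { seen=$1; }

set -- script-arg                 # give the script itself a $1
bump; bump
is_even 4; even_rc=$?
sum=$(add 2 3)
remember_first function-arg
echo "counter=$counter even_rc=$even_rc sum=$sum"   # counter=2 even_rc=0 sum=5
echo "function saw: $seen, script still has: $1"    # function saw: function-arg, script still has: script-arg
```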
Functions: scoping
Declare as:
function_name () {
list of commands
}
Invoke as:
function_name
function_name 1 b 3 other-arguments
Return a value as:
return 10
Evaluate the return code with $?:
iRC=$?
if [ $iRC -ge 0 ] ...
Checklist - Shell scripting
01 Standard file descriptors & IO redirection
02 Executing SQLs/FTP from a shell-script
03 File test operators
04 Shell-scripting: functions
awk
awk programming model
Records and fields
Pattern matching
Functions
System/built-in variables
Data manipulation and report generation
awk programming model
Designed for text-processing and used typically for data-extraction
Data-driven, Interpreted
awk views input stream as a collection of records
Records are made up of fields
A field is a word: one or more non-whitespace characters
Fields are separated by one or more whitespace characters
An awk program consists of pairs of patterns and braced actions
All patterns are examined for every input record
Fields can be accessed as $1, $2 etc.; $0 is the whole record
A program consists of a main input loop, which is executed for every record
A typical awk program looks like this:
BEGIN { }              <- runs once, before Input_file is read
pattern { action }     <- main input loop: tried on every record of Input_file
pattern { action }
END { }                <- runs once, after the last record
awk programming model
BEGIN What happens before processing,
e.g. initialization part
Main input loop the processing
END What happens after processing, e.g.
print some concluding stats
Main input loop:
- You don't write the loop yourself, as you would in C, e.g. while(!EOF) do {readline()}.
- Instructions are written as a series of pattern/action procedures
- Multiple BEGIN/END blocks and main-loop procedures are possible; they are executed in the order of appearance
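A small check of those rules, feeding two made-up lines to awk on stdin. Both BEGIN blocks run, in order, before any input is read. The unconditional block runs once per record without any loop in the program. END runs once at the end, when NR still holds the record count.

```shell
#!/bin/sh
out=$(printf 'first\nsecond\n' | awk '
BEGIN { print "begin 1" }
BEGIN { print "begin 2" }
{ print "record " NR ": " $0 }
END   { print "end after " NR " records" }')
echo "$out"
# begin 1
# begin 2
# record 1: first
# record 2: second
# end after 2 records
```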
awk programming model - examples
awk '$0 ~ /Rent/{print}' file                # Rent,900
awk '/Rent/{print}' file                     # same result
awk '/Rent/' file                            # same result: print is the default action
awk -F, '$1 ~ /Rent/' file                   # search only in the first field
awk -F, '$1 == "Medicine"{print $2}' file    # 200 and 600, on separate lines
awk '/Rent|cine/' file                       # 3 lines for Medicine and Rent
awk '!/Medicine/' file                       # the non-medicine lines
awk -F, '$2>500' file                        # where did I spend more than 500?
awk 'NR==3|| NR==5' file                     # 3rd and 5th lines
awk 'NR!=1' file                             # skip the header (first line)
awk 'BEGIN{IGNORECASE=1} /Rent/' file        # Rent + Restaurent as well! (IGNORECASE is gawk-only)
awk '/Rent/{print} /cine/{print}' file       # + Medicine
awk 'BEGIN{IGNORECASE=1;print("--START--")} /Rent/{print} /cine/;END{print ("--END--")}' file    # ...+ report heading / footer!
awk 'NR>2{ print x} {x=$0}' file             # skips the first and the last line... what/how?
cat file
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600
Restaurent,300
<empty line>
awk programming model - BEGIN and END
Implementation of wc -l in awk (run as awk -f awkScriptName inputFileName)
BEGIN { lines = 0 }
{ lines = lines + 1 }
END { print lines }
Implementation of cat -n in awk (run as awk -f awkScriptName inputFileName)
BEGIN { linenum = 0 }
{
    linenum = linenum + 1
    print linenum "\t" $0
}
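A sketch that runs both programs inline on a made-up three-line file, and compares the explicit counter with awk's built-in NR, which in the END block still holds the number of records read.

```shell
#!/bin/sh
input=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$input"
# wc -l, counting explicitly as above
count_loop=$(awk 'BEGIN { lines = 0 } { lines = lines + 1 } END { print lines }' "$input")
# The same count from the built-in record counter NR, still set in END
count_nr=$(awk 'END { print NR }' "$input")
# cat -n: line number, a tab, then the record
numbered=$(awk '{ linenum = linenum + 1; print linenum "\t" $0 }' "$input")
echo "$count_loop $count_nr"    # 3 3
echo "$numbered"
rm -f "$input"
```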
awk programming model - examples
awk -f awkscript02.awk file
Run the script from a file
awk's input language is C-like, so you'll see printf(), if, while and for with syntax very close to
that of C
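A short program in that C-like style, run inline rather than from a file. It uses a for loop over the fields, an if/else and printf() to sum each line of numbers. The threshold of 10 and the input are arbitrary.

```shell
#!/bin/sh
out=$(printf '1 2 3\n4 5 6\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)      # C-style loop over the fields $1..$NF
        sum += $i
    if (sum > 10)
        printf("line %d: sum %d (big)\n", NR, sum)
    else
        printf("line %d: sum %d\n", NR, sum)
}')
echo "$out"
# line 1: sum 6
# line 2: sum 15 (big)
```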
awk programming model - basic awk programs
cat awkscript01.awk
BEGIN{
    IGNORECASE=1
    print("--START--")
}
/Rent/{print} /cine/;
END{
    print ("--END--")
}
cat awkscript02.awk
BEGIN{
    IGNORECASE=1
    print("--START--")
Records and fields
Input is structured and not just an endless string of characters.
- Fields are delimited by spaces or tabs:
echo a b c d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'    # c
echo a,b,c,d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'    # null string: the whole line is one field
$0: whole record, $1 / $2: first / second field etc., $NF: last field
- You can change the field separator with the -F option on the command line:
echo a,b,c,d | awk -F, 'BEGIN { one = 1; two = 2 } { print $(one + two) }'    # c
- -f vs -F:
awk -F, -f awkScriptFile.awk inputDataFile.dat
A better option is to specify it in BEGIN:
BEGIN { FS = "," }
- FS = "\t"       a single tab as the field separator
- FS = "\t+"      tabs: one or more!
- FS = "[':\t]"   any of these three: ', : or tab
- awk -F 'word[0-9][0-9][0-9]' file    fields separated by "word" followed by 3 digits
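A sketch of FS set in BEGIN, on made-up one-line inputs. It shows the difference between "\t" and "\t+": with a single-tab separator, adjacent tabs produce empty fields, while "\t+" treats a run of tabs as one separator.

```shell
#!/bin/sh
# FS set in BEGIN takes effect before the first record is split
third=$(echo 'a,b,c,d' | awk 'BEGIN { FS = "," } { print $3 }')
# A bracket-expression FS: a colon OR a tab ends a field
mixed=$(printf 'a:b\tc\n' | awk 'BEGIN { FS = "[:\t]" } { print $1 "|" $2 "|" $3 }')
# "\t": every single tab is a separator, so adjacent tabs give empty fields
single=$(printf 'a\t\t\tb\n' | awk 'BEGIN { FS = "\t" } { print NF }')
# "\t+": a run of tabs counts as one separator
runs=$(printf 'a\t\t\tb\n' | awk 'BEGIN { FS = "\t+" } { print NF, $NF }')
echo "$third / $mixed / $single / $runs"   # c / a|b|c / 4 / 2 b
```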
Records and Fields
- RS: how records are separated; the default value is \n
- It can be changed:
awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
or from the command line, like this (i.e. even before it starts processing BBS-list!):
awk '{ print $0 }' RS="/" BBS-list
- NR: number of records read so far (a running total across all input files); FNR: record number within the current file, reset for each file
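A sketch with two small temporary files, showing FNR restarting for the second file while NR keeps counting, and RS changed so that every slash ends a record. The input has no trailing newline, so the last record is just z.

```shell
#!/bin/sh
f1=$(mktemp)
f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\n' > "$f2"
# FNR restarts at 1 for the second file; NR keeps counting
counts=$(awk '{ print FNR, NR }' "$f1" "$f2")
# RS = "/": every slash ends a record
parts=$(printf 'x/y/z' | awk 'BEGIN { RS = "/" } { print NR ": " $0 }')
echo "$counts"    # 1 1, then 2 2, then 1 3
echo "$parts"     # 1: x, then 2: y, then 3: z
rm -f "$f1" "$f2"
```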
awk pattern and actions
Kinds of Patterns
- /regular expression/ It matches when the text of the input record fits the regular expression.
- Expression
A single expression, matches when its value is non-zero (if a number) or
non-null (if a string)
- Range pattern, e.g. pat1, pat2: a pair of patterns separated by a comma, specifying a range of records.
The range includes both the initial record that matches pat1 and the final record that matches pat2.
- BEGIN / END
Special patterns to supply start-up or clean-up actions
- Empty
The empty pattern matches every input record
awk pattern and actions: Regular expressions
It matches when the text of the input record fits the regular expression.
awk '/foo/ { print $2 }' BBS-list
awk '$1 ~ /J/' inventory-shipped
awk '$1 !~ /J/' inventory-shipped
tolower($1) ~ /foo/ { ... }
Regexp metacharacters, e.g. ^ (start), $ (end), . (any 1 char), [] (character list), * (0 or more), + (1 or more) etc., can be used.
awk pattern and actions: Expressions and Range
A single expression matches when its value is non-zero (if a number) or non-null (if a string)
awk '$1 == "foo" { print $2 }' BBS-list     # $1 is exactly the word foo
awk '$1 ~ /foo/ { print $2 }' BBS-list      # $1 contains foo
awk '/2400/ && /foo/' BBS-list              # 2400 and foo, both should be present
awk '/2400/ || /foo/' BBS-list              # either of these two
awk '! /foo/' BBS-list                      # all lines except those containing foo
Range pattern, pat1, pat2: a pair of patterns separated by a comma, specifying a range of records.
The range includes both the initial record that matches pat1 and the final record that matches pat2.
awk '$1 == "on", $1 == "off"'               # everything between on and off, inclusive
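A sketch of the range pattern on made-up input. Note the second range: it starts at the second "on", finds no "off", and therefore runs to the end of the input.

```shell
#!/bin/sh
out=$(printf 'a\non\nb\noff\nc\non\nd\n' | awk '$1 == "on", $1 == "off"')
echo "$out"
# on
# b
# off
# on
# d
```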
awk pattern and actions: BEGIN-END and Empty pattern
BEGIN / END - Special patterns to supply start-up or clean-up actions
awk 'BEGIN { print "Analysis of \"foo\"" }
/foo/ { ++n }
END { print "\"foo\" appears " n " times." }' BBS-list
Empty: the action is run for every input record
awk '{ print $1 }' BBS-list
awk functions
Built-in functions:
C-like operations, and operators.
Arithmetic functions
int(), sqrt(), sin(), cos(), exp(), atan2(), rand(), srand()
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_91.html#SEC94
String functions
index(), length(), match(), split(), sprintf(), sub(), gsub(), substr(), tolower(), toupper()
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html
awk '{ x = sqrt($2 + $3); printf("%f,%f,%f,%f\n", x, $2, $3, $2 + $3) }' file2
awk '{ x = sqrt($2 + $3); printf("%s %.2f %d %d %d\n", substr($1, 3, 3), x, $2, $3, $2 + $3) }' file2
awk functions
datafile:
1.2    3.4    5.6    7.8
9.10   11.12  -13.14 15.16
17.18  19.20  21.22  23.24
User-defined functions:
Define:
function myprint(num)
{
printf "%6.3g\n", num
}
File rev.awk
function rev(str)
{
if (str == "")
return ""
return (rev(substr(str, 2)) substr(str, 1, 1))
}
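A sketch that tries both functions directly on the command line, with each definition placed in the program text. myprint is run on the third column of the datafile rows above, and rev (which is recursive, something awk supports) on a single word.

```shell
#!/bin/sh
formatted=$(printf '1.2 3.4 5.6 7.8\n9.10 11.12 -13.14 15.16\n17.18 19.20 21.22 23.24\n' |
  awk 'function myprint(num)
{
    printf "%6.3g\n", num    # width 6, 3 significant digits
}
{ myprint($3) }')
reversed=$(echo "hello" | awk 'function rev(str)
{
    if (str == "")
        return ""
    return (rev(substr(str, 2)) substr(str, 1, 1))
}
{ print rev($0) }')
echo "$formatted"   # "   5.6", " -13.1", "  21.2"
echo "$reversed"    # olleh
```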
awk Operators
Arithmetic operators:
^, **, -, +, *, /, %
Comparison operators: <, <=, >, >=, ==, !=, ~, !~, in
String concatenation:
No explicit operator; simply write strings next to each other, e.g. print "Field number one: " $1
Assignment:
=, plus the compound forms +=, -=, *=, /=, %=, ^=
Increment/Decrement: ++, -- : both post- and prefix
Regexp operators:
\          Suppress the special meaning of a character, e.g. \$ matches a literal $ and not the end of a line
^          Beginning of a string
$          End of a string
. (period) Any single character
()         Group regexps together, e.g. @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}
*          Repeat the preceding item as many times as possible (0 or more), e.g. ph* matches one p followed by 0 or more h: p, ph, phhh
+          Repeat at least once, e.g. ph+ matches one p followed by 1 or more h, i.e. ph, phh etc. but not p
?          Match once or not at all, e.g. fe?d matches fd or fed, but not feed
{n}, {n,}, {n,m}   Match exactly n / n or more / n to m times, e.g. wh{3}y matches whhhy; wh{1,2}y matches why and whhy; wh{1,}y matches why, whhy, whhhy etc.
[]         Bracket expression: matches any one of the listed characters, e.g. [Yog] matches any one of Y, o or g
[^]        Complemented bracket expression, e.g. [^Yog] matches any one character other than Y, o or g
|          Alternation operator, e.g. ^P|[aeiouy] matches if the string either starts with a P or contains any of a, e, i, o, u, y
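A sketch exercising a few of these operators on made-up words. The interval forms ({n,m}) are left out on purpose. gawk supports them by default, but some awk implementations, notably older mawk releases, do not.

```shell
#!/bin/sh
# ? : the e is optional
optional=$(printf 'fd\nfed\nfeed\n' | awk '/^fe?d$/')
# + : at least one h after the p
plus=$(printf 'p\nph\nphh\n' | awk '/^ph+$/')
# | : starts with P, or contains a vowel (y included, as above)
alt=$(printf 'Pfft\nsky\ncrwth\n' | awk '/^P|[aeiouy]/')
# [^...] : a single character that is not Y, o or g
other=$(printf 'Y\no\nx\n' | awk '/^[^Yog]$/')
echo "$optional"   # fd, fed
echo "$plus"       # ph, phh
echo "$alt"        # Pfft, sky
echo "$other"      # x
```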
awk built-in variables
Field variables: $1, $2, $3, and so on ($0 represents the entire record).
NR:       Current count of the number of input records read, i.e. the number of the line being read
NF:       Count of the number of fields in the current input record; $NF is the last field of the record
FILENAME: Name of the current input file
FS:       "Field separator" for the input record; the default is "white space" (one or more spaces/tabs). FS can be reassigned to another character to change the field separator.
RS:       "Record separator" character; newline is the default record separator
OFS:      Output field separator, placed between output fields when awk prints them; the default is a "space" character
ORS:      Output record separator, placed after each output record when awk prints it; the default is a "newline" character
OFMT:     Format for numeric output. The default format is "%.6g".
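A sketch of the output-side variables on made-up one-line inputs. OFS is used between the comma-separated items of print, and also when awk rebuilds $0 after a field is assigned. $1 = $1 is a common idiom to force that rebuild. ORS replaces the newline printed after each record.

```shell
#!/bin/sh
# OFS goes between the comma-separated items of print
joined=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }')
# Assigning to any field ($1 = $1) rebuilds $0 with OFS
rebuilt=$(echo 'a   b c' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
# ORS replaces the newline printed after each record
one_line=$(printf 'x\ny\n' | awk 'BEGIN { ORS = ";" } { print }')
# NF and $NF: field count and last field
last=$(echo 'one two three' | awk '{ print NF, $NF }')
echo "$joined | $rebuilt | $one_line | $last"   # a-b-c | a-b-c | x;y; | 3 three
```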
awk Data Manipulation
The input file is a flat file of records and fields, available for string manipulation and arithmetic operations.
E.g. consider file2:
If the 5th column is "+", then subtract 5000 from column 2 and add 2000 to column 3
If the 5th column is "-", then add 5000 to column 3 and subtract 2000 from column 2
awk '$5 == "+" {$2-=5000;$3+=2000}; $5 == "-"{$3+=5000;$2-=2000};{print}' file2
awk -f awkscript04.awk file2
cat awkscript04.awk
BEGIN { print("---START-----") }
{
    if($5 == "+"){
        $2-=5000;
        $3+=2000
    }
    if($5 == "-") {
        $3+=5000;
        $2-=2000
    }
    print
}
END { print ("---END-----") }
cat file2
#track
chr11  61731756  61735132  FTH1
chr12  6643584   6647537   GAPDH
chr11  18415935  18429765  LDHA
chr12  21788274  21810728  LDHB
chr22  24236564  24237409  MIF
chr4   6641817   6644470   MRFAP1
chr15  72491369  72523727  PKM
chr10  73576054  73611082  PSAP
chr2   85132762  85133799  TMSB10
chr13  45911303  45915297  TPT1
5th column: + or -
Ref. Stack Exchange:
http://unix.stackexchange.com/questions/127471/using-awk-for-data-manipulation
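A sketch that runs the one-line program on made-up rows (hypothetical data, one for each sign) plus the #track header line. It also shows a side effect worth knowing. Once a field is modified, awk rebuilds the whole record with OFS, so the original spacing becomes single spaces, while untouched lines such as the header are printed exactly as read.

```shell
#!/bin/sh
out=$(printf '#track\nchrA   10000   20000   GENE1   +\nchrB   10000   20000   GENE2   -\n' | awk '
$5 == "+" { $2 -= 5000; $3 += 2000 }
$5 == "-" { $3 += 5000; $2 -= 2000 }
{ print }')
echo "$out"
# #track
# chrA 5000 22000 GENE1 +
# chrB 8000 25000 GENE2 -
```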
Data transformation and report generation language
Data manipulation and retrieval of information from text files
Initialize variables before reading a file:
awk -f progfile a=1 f1 f2 a=2 f3
sets a to 1 before reading input from f1 and sets a to 2 before reading input from f3
The -v option lets you assign a value to a variable before the awk program begins running (that is, before the
BEGIN action). For example, in
awk -v v1=10 -f prog datafile
v1 is already 10 when the BEGIN action runs.
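A sketch showing the timing difference on a one-line temporary file. A -v assignment is visible in BEGIN. An assignment given among the file operands (late=2 here) only takes effect when awk reaches that position in the argument list, so BEGIN still sees it as empty.

```shell
#!/bin/sh
data=$(mktemp)
echo "row" > "$data"
out=$(awk -v early=1 '
BEGIN { printf("BEGIN:  early=%s late=%s\n", early, late) }
      { printf("record: early=%s late=%s\n", early, late) }' late=2 "$data")
rm -f "$data"
echo "$out"
# BEGIN:  early=1 late=
# record: early=1 late=2
```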
Report Generation-I
- Get employee names and salary:
awk '{print $2, $5}' employee.txt
employee.txt
100 Thomas Manager   Sales      $5,000
200 Jason  Developer Technology $5,500
300 Sanjay Sysadmin  Technology $7,000
400 Nisha  Manager   Marketing  $9,500
500 Randy  DBA       Technology $6,000
..
- or something more report-like:
cat report01.awk
BEGIN {
    printf("Salary Report\n");
    printf("EName\tSalary\n");
    printf("=====\t=======\n")
}
{ printf("%s\t%s\n", $2, $5) }    # %s, not %d: "$5,000" is not a number
END {
    printf("--END OF REPORT--\n")
}
awk -f report01.awk employee.txt
Report Generation - II
- An HTML report:
awk -f report02.awk -v v1=Technology employee.txt > abc.html
cat report02.awk:
BEGIN {
    title="Salary Report by awk"
    print "<html>\n<title>"title"</title><body bgcolor=\"#aabbcc\">"
    print "\n<table border=1><th colspan=3 align=center>Salary Report"
    print "for "v1" department</th>";
    print "<tr><td>#</td><td>EName</td><td>Salary</td>"
    totalSal=0
    count=0
}
{
    #if($4=="Technology")
    if($4==v1) {
        count++
        print "<tr><td>"count"</td><td>"$2"</td><td>"$5"</td>"
        sal=$5
        gsub(/[$,]/, "", sal)    # "$5,500" -> 5500, so that it can be summed
        totalSal+=sal
    }
}
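A plain-text sketch of the same filtering, with the salary total printed in an END block. It recreates employee.txt from the listing above, takes the department from -v, and strips $ and , from the salary so that it can be summed. The output layout (number, name, salary, then a Total line) is an assumption, not the original report's format.

```shell
#!/bin/sh
emp=$(mktemp)
cat > "$emp" <<'EOF'
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
EOF
report=$(awk -v dept=Technology '
$4 == dept {
    sal = $5
    gsub(/[$,]/, "", sal)      # "$5,500" -> "5500" so it can be added up
    count++
    total += sal
    printf("%d\t%s\t%d\n", count, $2, sal)
}
END { printf("Total\t%d\t%d\n", count, total) }' "$emp")
echo "$report"
rm -f "$emp"
```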
Assignment
Create an HTML report for the states data in states.dat (file shared with all)
Print the name of the state / UT, its capital, and the year in which the capital was established.
Skip the header and footer (first and last row) of the file
The background of UTs should be red
Names of UTs contain the words "union territory"; since you're using highlighting, don't print that part
For states, set the background colour to blue
At the bottom, print counts of:
States
UTs
Email the report as an attachment to yourself
Nice job! Now try this as well; it would be fun!
- Count and list of states / UTs sharing capitals, e.g. PB, HR, CH
- Has there been any city which was the capital of more than one state at different times? (e.g. Shimla has been the capital of PB and HP; also Kolkata, Mumbai)
- The oldest and the newest capitals
- Whatever else catches your fancy!
Checklist - awk
01 awk programming model
02 Records and fields
03 Pattern matching
04 Functions
05 System/built-in variables
06 Data manipulation and report generation
Questions?
Thank you
[email protected]