Regex Cheatsheet
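? : (shell glob) match any single character; the regex equivalent is .   ? = a, b, c, d
* : (shell glob) match zero or more characters; as a regex quantifier, * repeats the preceding item zero or more times   ab*c = abc | abbc | abbbc (the regex also matches ac)
[ ] : match any one character from the list specified   [1-9] = 1, 2, 3, 4, 5, 6, 7, 8, 9
[! ] : (shell glob) match any one character not in the list specified; the regex form is [^ ]   [!1-4] = 5, 6, 7, 8, 9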
[^ ] : match a single character that's not in the list specified   [^a-c] = d-z (among lowercase letters)
       applied to Carlos, [^C-a] = rlos (the range C-a excludes both C and a)
{m,n} : at least m times and not more than n times   a{3,5} = aaa | aaaa | aaaaa
{m,} : at least m times   a{3,} = aaa | aaaa | …
{n} : exactly n times a{2} = aa
+ : match the last "block" one or more times   ba+ = ba | baa | baaa…
? : match the last "block" zero or one time   ba? = b | ba
    (to match any single character, use . instead: av. = ave in microwave; lo. = lov in love, lop in lopez)
^ : matches the start of the line   ^Ba = Basic | Bass | Based | Batido
$ : matches the end of the string; in multiline mode it matches the end of each line
\s : Matches any whitespace character (space, tab, newline).
\S : Matches any non-whitespace character. In "Hola yo soy" it matches every letter and skips the spaces = Holayosoy
\d : Matches any digit (numbers).
\D : Matches any non-digit (everything except numbers).
\w : Matches any word character (letter, digit, or underscore).
\W : Matches any non-word character.
\b : Matches any word boundary.
\B : Matches any non-word boundary.
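The entries above can be checked with a short script. This is a minimal sketch that assumes Python's re module, which the sheet does not mention; grep, sed and awk use slightly different regex dialects, so treat the results as illustrative only.

```python
import re

# Bounded quantifier: a{3,5} accepts three to five a's and nothing else.
bounded = re.compile(r"a{3,5}")
lengths_ok = [n for n in range(1, 8) if bounded.fullmatch("a" * n)]
print(lengths_ok)  # [3, 4, 5]

# + needs at least one repetition; ? allows zero or one.
plus_empty = bool(re.fullmatch(r"ba+", "b"))
plus_many = bool(re.fullmatch(r"ba+", "baaa"))
optional_ok = bool(re.fullmatch(r"ba?", "b"))
print(plus_empty, plus_many, optional_ok)  # False True True

# \S keeps only non-whitespace characters; \d+ finds runs of digits.
no_spaces = "".join(re.findall(r"\S", "Hola yo soy"))
digits = re.findall(r"\d+", "room 12, floor 3")
print(no_spaces, digits)  # Holayosoy ['12', '3']

# Negated class from the Carlos example: C and a fall inside the range C-a.
carlos = "".join(re.findall(r"[^C-a]", "Carlos"))
print(carlos)  # rlos
```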
Character classes
. any character except newline
Example:
a.c = abc
\w \d \s word, digit, whitespace
\W \D \S not word, digit, whitespace
[abc] any of a, b, or c
[^abc] not a, b, or c
[a-g] character between a & g
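A small sketch of the character classes above, assuming Python's re module (not part of the original sheet). The sample string is made up for illustration.

```python
import re

text = "a big cage"  # hypothetical sample text

any_of = re.findall(r"[abc]", text)     # any of a, b, or c
none_of = re.findall(r"[^abc ]", text)  # not a, b, c, or a space
in_range = re.findall(r"[a-g]", text)   # characters between a and g
print(any_of)    # ['a', 'b', 'c', 'a']
print(none_of)   # ['i', 'g', 'g', 'e']
print(in_range)  # ['a', 'b', 'g', 'c', 'a', 'g', 'e']

# . matches any character except newline, unless DOTALL is set.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)
print(dot_default)  # ['a', 'b']
print(dot_all)      # ['a', '\n', 'b']
```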
Anchors
^abc$ start / end of the string
\b \B word, not-word boundary
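The anchors can be tried out as follows. This is a sketch assuming Python's re module; the sample strings are invented for the example.

```python
import re

# ^abc$ only matches when the whole string is abc.
whole = bool(re.search(r"^abc$", "abc"))
partial = bool(re.search(r"^abc$", "xabc"))
print(whole, partial)  # True False

# With MULTILINE, ^ also matches at the start of every line.
lines = "Basic\nrebase\nBatido"
starts_with_ba = re.findall(r"^Ba\w*", lines, flags=re.MULTILINE)
print(starts_with_ba)  # ['Basic', 'Batido']

# \b requires a word boundary, \B requires that there is none.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
inside_word = re.findall(r"\Bcat", "cat concat")
print(len(whole_words), inside_word)  # 2 ['cat']
```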
Escaped characters
\. \* \\ escaped special characters
\t \n \r tab, linefeed, carriage return
Groups & Lookaround
(abc) capture group
\1 backreference to group #1
(?:abc) non-capturing group
(?=abc) positive lookahead
(?!abc) negative lookahead
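A brief sketch of groups and lookaround, assuming Python's re module. The sample strings are hypothetical.

```python
import re

# (\w+) captures a word; \1 demands the same text again (a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group(1))  # is

# (?:ab)+ groups without capturing, so findall returns whole matches.
repeats = re.findall(r"(?:ab)+", "ababab ab")
print(repeats)  # ['ababab', 'ab']

# Positive lookahead: a name only if .txt follows (the .txt is not consumed).
txt_names = re.findall(r"\w+(?=\.txt)", "a.txt b.csv notes.txt")
print(txt_names)  # ['a', 'notes']

# Negative lookahead: a q that is not followed by u.
lonely_q = re.findall(r"q(?!u)", "quit qatar iraq")
print(len(lonely_q))  # 2
```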
Quantifiers & Alternation
a* a+ a? 0 or more, 1 or more, 0 or 1
a{5} a{2,} exactly five, two or more
a{1,3} between one & three
a+? a{2,}? match as few as possible
ab|cd match ab or cd
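The difference between greedy and lazy quantifiers is easiest to see side by side. This is a minimal sketch assuming Python's re module; the input strings are made up.

```python
import re

html = "<b>one</b><b>two</b>"  # hypothetical input

# Greedy .+ runs to the last </b>; lazy .+? stops at the first one.
greedy = re.findall(r"<b>.+</b>", html)
lazy = re.findall(r"<b>.+?</b>", html)
print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# a{2,} takes as many a's as it can; a{2,}? takes the minimum of two.
greedy_run = re.findall(r"a{2,}", "aaaaa")
lazy_run = re.findall(r"a{2,}?", "aaaaa")
print(greedy_run, lazy_run)  # ['aaaaa'] ['aa', 'aa']

# Alternation: match ab or cd.
either = re.findall(r"ab|cd", "abcdxab")
print(either)  # ['ab', 'cd', 'ab']
```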
Commands
cut
-c list : the list specifies character positions
-b list : the list specifies byte positions
-f list : select only these fields
-d delimiter : use delimiter as the field delimiter instead of the tab character
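To make the -f, -d and -c options concrete, here is a rough Python analogue. It is not how cut is implemented; the helper names cut_fields and cut_chars are hypothetical, and details such as list ranges and lines without a delimiter are ignored.

```python
def cut_fields(line, fields, delimiter="\t"):
    """Rough analogue of `cut -d delimiter -f fields` for one line (1-based)."""
    parts = line.split(delimiter)
    # Like cut, emit the selected fields in input order, not in list order.
    wanted = [i for i in sorted(set(fields)) if 1 <= i <= len(parts)]
    return delimiter.join(parts[i - 1] for i in wanted)


def cut_chars(line, start, end):
    """Rough analogue of `cut -c start-end` (1-based, inclusive)."""
    return line[start - 1:end]


passwd_line = "root:x:0:0:root:/root:/bin/bash"
picked = cut_fields(passwd_line, [7, 1], delimiter=":")
chars = cut_chars("abcdefgh", 2, 4)
print(picked)  # root:/bin/bash
print(chars)   # bcd
```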
grep
-i : ignore case during search
-r : search recursively
-v : invert match i.e. match everything except pattern
-l : list files that match pattern
-L : list files that do not match pattern
-n : prefix each line of output with the line number within its input file.
-A num : print num lines of trailing context after matching lines.
-B num : print num lines of leading context before matching lines.
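The -i, -v and -n options can be pictured with a small Python analogue. This is a sketch under assumptions: the function name grep_lines is hypothetical, it works on a list of strings rather than files, and Python's regex dialect differs from grep's default one.

```python
import re


def grep_lines(pattern, lines, ignore_case=False, invert=False, number=False):
    """Rough analogue of grep with -i, -v and -n over a list of lines."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    out = []
    for lineno, line in enumerate(lines, start=1):
        matched = rx.search(line) is not None
        if matched != invert:  # -v flips which lines are kept
            out.append(f"{lineno}:{line}" if number else line)
    return out


log = ["Error: disk full", "ok", "error: retry", "done"]  # made-up input
numbered = grep_lines("error", log, ignore_case=True, number=True)
inverted = grep_lines("error", log, ignore_case=True, invert=True)
print(numbered)  # ['1:Error: disk full', '3:error: retry']
print(inverted)  # ['ok', 'done']
```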
sed (stream editor)
Command Operation
-e : combines multiple commands
-f : read commands from file
-h : print help info
-n : disable automatic printing
-V : print version info
-i : in-file substitution
Pattern Operation
s - substitution
g - global replacement
p - print
I - ignore case
d - delete
G - add newline
w - write to file
x - exchange pattern with hold buffer
h - copy pattern to hold buffer
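The s, g, I and d operations have close counterparts in Python's re module. The sketch below is an analogue for illustration, not sed itself; the sample text is invented.

```python
import re

line = "fish fish Fish"  # hypothetical input line

# s/fish/cat/ replaces only the first match on the line.
first_only = re.sub(r"fish", "cat", line, count=1)
# s/fish/cat/g replaces every match.
global_sub = re.sub(r"fish", "cat", line)
# s/fish/cat/gI also ignores case.
ignore_case = re.sub(r"fish", "cat", line, flags=re.IGNORECASE)
print(first_only)   # cat fish Fish
print(global_sub)   # cat cat Fish
print(ignore_case)  # cat cat cat

# /^#/d deletes lines that match the pattern.
config = ["# comment", "key=value", "# another", "mode=fast"]
kept = [l for l in config if not re.search(r"^#", l)]
print(kept)  # ['key=value', 'mode=fast']
```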
awk
awk pattern {action}
awk patterns may be one of the following
BEGIN : special pattern which is not tested against input. Mostly used for preprocessing, setting
constants, etc. before input is read.
END : special pattern which is not tested against input. Mostly used for postprocessing after input
has been read.
/regular expression/ : the associated regular expression is matched to each input line that
is read
relational expression : a comparison built with relational operators, e.g. $1 > 10; the same expressions are used in if and while statements
&& : logical AND operator used as pattern1 && pattern2. Execute action if pattern1 and pattern2
are true
|| : logical OR operator used as pattern1 || pattern2. Execute action if either pattern1 or
pattern2 is true
! : logical NOT operator used as !pattern. Execute action if pattern is not matched
?: : Used as pattern1 ? pattern2 : pattern3. If pattern1 is true use pattern2 for testing else use pattern3
pattern1, pattern2 : Range pattern, match all records starting with record that matches pattern1
continuing until a record has been reached that matches pattern2
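The range pattern is the least obvious of these, so here is a Python sketch of its behaviour. It is an analogue, not awk: the function name range_select is hypothetical, and records are taken to be plain lines.

```python
import re


def range_select(lines, start_rx, end_rx):
    """Collect lines from one matching start_rx through the next matching end_rx."""
    inside = False
    selected = []
    for line in lines:
        if not inside and re.search(start_rx, line):
            inside = True  # the range opens on this record
        if inside:
            selected.append(line)
            if re.search(end_rx, line):
                inside = False  # the range closes, end record included
    return selected


records = ["intro", "START here", "a", "b", "STOP here", "outro"]  # made-up input
in_range = range_select(records, r"^START", r"^STOP")
print(in_range)  # ['START here', 'a', 'b', 'STOP here']

# pattern1 && !pattern2 as a plain filter: contains "here" but not "STOP".
both = [r for r in records if re.search(r"here", r) and not re.search(r"STOP", r)]
print(both)  # ['START here']
```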
Most common action: print
string constants supported by awk
\\ : Literal backslash
\n : newline
\r : carriage-return
\t : horizontal tab
\v : vertical tab
Format specifiers are similar to the C-programming language
%d,%i : decimal number
%e,%E : floating point number of the form [-]d.dddddde[±]dd. The %E format uses E instead of e.
%f : floating point number of the form [-]ddd.dddddd
%g,%G : Use %e or %f conversion with nonsignificant zeros truncated. The %G
format uses %E instead of %e
%s : character string
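Python's % operator follows the same C conventions for these specifiers, so it gives a quick way to preview them. This is a sketch under that assumption; awk's printf may differ in edge cases, and the values are arbitrary.

```python
value = 1234.5  # arbitrary sample number

as_int = "%d" % 42.9           # decimal integer, fraction dropped
as_exp = "%e" % value          # [-]d.dddddde[±]dd
as_exp_upper = "%E" % value    # same, with E
as_fixed = "%f" % value        # [-]ddd.dddddd
as_general = "%g" % value      # shorter form, trailing zeros removed
as_general_small = "%g" % 0.00001234
as_string = "%s" % "awk"

print(as_int)            # 42
print(as_exp)            # 1.234500e+03
print(as_exp_upper)      # 1.234500E+03
print(as_fixed)          # 1234.500000
print(as_general)        # 1234.5
print(as_general_small)  # 1.234e-05
print(as_string)         # awk
```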