Save time Googling with this list of text manipulation commands!

greatestusername/TextManipulationTidbits

Useful Text Manipulation Tidbits For Working With Twitter Messages

Sure, you could find the same answers by searching Google and clicking through Stack Overflow links. But I thought I should collect these sed/awk/grep/etc. commands, which have proven very useful when working with a large corpus of tweets, in one place. Hopefully this collection saves others some time searching.

(Use the -i flag with the sed commands to edit a file in place rather than print the result to standard output.)
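A minimal sketch of the difference, assuming GNU sed and a throwaway file (sample.txt is a hypothetical name used only here). Attaching a suffix to -i, as in -i.bak, also keeps a backup copy of the original.

```shell
#!/bin/sh
# Work in a throwaway directory so no real file is modified.
dir=$(mktemp -d)
printf '#comment\nkeep me\n' > "$dir/sample.txt"

# Without -i: the edited text goes to stdout; sample.txt is untouched.
sed '/^#/d' "$dir/sample.txt"

# With -i.bak (GNU sed): sample.txt is rewritten in place and the
# original content is saved as sample.txt.bak.
sed -i.bak '/^#/d' "$dir/sample.txt"

edited=$(cat "$dir/sample.txt")
backup=$(cat "$dir/sample.txt.bak")
echo "edited: $edited"
```

On BSD/macOS sed, -i expects a suffix argument, so use -i '' there when no backup is wanted.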

show only lines between 36 and 150 characters in length, inclusive (-r enables extended regular expressions in GNU sed)

sed -nr '/^.{36,150}$/p' temp.txt
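A quick check on hypothetical sample lines (runs of zeros of length 10, 40 and 160), assuming GNU sed; only the 40-character line falls inside the 36 to 150 range.

```shell
#!/bin/sh
# Hypothetical sample lines of 10, 40 and 160 characters.
short=$(printf '%010d' 0)
medium=$(printf '%040d' 0)
long=$(printf '%0160d' 0)

# -n suppresses automatic printing; p prints only lines of 36..150 characters.
kept=$(printf '%s\n' "$short" "$medium" "$long" | sed -nr '/^.{36,150}$/p')
echo "$kept"
```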

delete lines longer than 36 characters

sed '/.\{36\}./d' file.txt
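The pattern needs 36 characters plus at least one more, so a line of exactly 36 characters survives. A small boundary check on two hypothetical lines:

```shell
#!/bin/sh
# Two hypothetical lines: exactly 36 characters and exactly 37 characters.
# .\{36\}. matches only when a line has 37 or more characters,
# so only the second line is deleted.
left=$(printf '%036d\n%037d\n' 0 0 | sed '/.\{36\}./d')
echo "${#left}"   # length of the surviving line: 36
```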

delete all empty lines, including lines that contain only whitespace (\s is a GNU sed extension)

sed '/^\s*$/d' file.txt
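A quick check on hypothetical input, assuming GNU sed: both the truly empty line and the line holding only spaces and a tab are removed.

```shell
#!/bin/sh
# Hypothetical input: text, an empty line, a spaces-and-tab line, text.
cleaned=$(printf 'first\n\n  \t\nsecond\n' | sed '/^\s*$/d')
echo "$cleaned"
```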

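remove non-ASCII characters (every byte from 128 to 255). Run it under the C locale: in a UTF-8 locale GNU sed may reject the byte range or fail to match multibyte characters.

LC_ALL=C sed 's/[\d128-\d255]//g' file.txt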
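A sketch on a hypothetical UTF-8 string, assuming GNU sed (which understands the \dNNN decimal escapes). Because the command works byte by byte, an accented letter or emoji is deleted outright rather than converted to a plain ASCII letter.

```shell
#!/bin/sh
# Hypothetical UTF-8 input: in "café" the é is stored as bytes \303\251 (octal).
# Under LC_ALL=C each byte is one character, so both bytes fall in 128..255
# and are removed.
ascii=$(printf 'caf\303\251 ok\n' | LC_ALL=C sed 's/[\d128-\d255]//g')
echo "$ascii"
```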

delete lines starting with a hash sign (#), such as comments or tweets that begin with a hashtag

sed '/^#/d' file.txt
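A quick check on hypothetical tweets: only lines whose first character is # are dropped, so a hashtag later in the line does not trigger a deletion.

```shell
#!/bin/sh
# Hypothetical tweets; the middle one has a hashtag, but not at the start.
kept=$(printf '#comment line\nreading about #sed today\n#TBT old tweet\n' | sed '/^#/d')
echo "$kept"
```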

print every paragraph that contains the pattern "brains" (a paragraph is a block of lines separated from its neighbors by blank lines)

sed '/./{H;$!d};x;/brains/!d' file.txt
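A sketch on hypothetical input showing how the one-liner works: non-blank lines are appended to the hold space, and each blank line (or the end of input) releases the collected paragraph for the /brains/ test. Every printed paragraph comes out preceded by an empty line.

```shell
#!/bin/sh
# Hypothetical input: three paragraphs separated by blank lines.
text='zombies like
brains

cats like
fish

brains are
tasty'
# H appends each non-blank line to the hold space and $!d starts the next
# cycle (except on the last line). On a blank line, or at the end of input,
# x swaps the collected paragraph into the pattern space, and /brains/!d
# deletes it unless it contains "brains"; otherwise it is auto-printed.
out=$(printf '%s\n' "$text" | sed '/./{H;$!d};x;/brains/!d')
printf '%s\n' "$out"
```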

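split a file into numbered files (1, 2, 3, ...) at each occurrence of a marker symbol, in this case \xa9 (the © copyright symbol). Each output file begins with the marker, and any text before the first marker is discarded.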

awk -v RS="\xa9" 'NR > 1 {print RS $0 > (NR-1)}' file.txt
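A sketch run in a throwaway directory, since the awk program creates files literally named 1, 2, and so on. The sample data is hypothetical Latin-1 text where the byte 0xA9 (octal \251) is ©. The marker byte is passed through printf instead of the "\xa9" escape because, as far as I know, not every awk implementation accepts \x escapes; LC_ALL=C keeps the byte handling consistent.

```shell
#!/bin/sh
# Throwaway directory: the awk program writes files named 1, 2, ...
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical input: "header" comes before the first marker and is skipped.
printf 'header\251first para\n\251second para\n' > file.txt

# Each record is the text between markers; NR > 1 skips the leading text,
# and "print RS $0" puts the marker back at the start of each output file.
marker=$(printf '\251')
LC_ALL=C awk -v RS="$marker" 'NR > 1 {print RS $0 > (NR-1)}' file.txt

ls
```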

(j)oin all lines of file.txt and silently (p)rint the result using ex, then strip ^M (Windows line breaks), the \xa9 character (copyright symbol), and lines starting with a hash sign (#). Because ex quits with q!, file.txt itself is not modified.

ex +%j +%p -scq! file.txt | sed -e 's/\^M//g' -e 's/\xa9//g' -e '/^#/d'

display, for each file in the current directory (*), the number of lines containing the hashtag #cool, ignoring case (-i). Note that -c counts matching lines, not individual occurrences. Use the -r flag to search subdirectories recursively.

grep -c -i "#cool" *
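A sketch with two hypothetical files showing that -c counts lines, plus one common way (grep -o piped to wc -l) to count every occurrence instead.

```shell
#!/bin/sh
# Hypothetical files in a throwaway directory.
dir=$(mktemp -d)
printf '#cool #COOL day\nnothing here\n#Cool again\n' > "$dir/a.txt"
printf 'not cool\n' > "$dir/b.txt"

# Per-file counts of matching LINES: the first line of a.txt holds two
# matches but is counted once.
counts=$(cd "$dir" && grep -c -i "#cool" *)
printf '%s\n' "$counts"

# To count every occurrence, print each match on its own line (-o) and
# count those lines; $(( )) strips any padding that wc adds.
total=$(( $(grep -o -i "#cool" "$dir/a.txt" | wc -l) ))
echo "$total"
```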

search and replace text containing a literal / character, here replacing "/grin" with "/cheer", by using @ as the delimiter instead of / (this avoids escaping every slash)

sed 's@/grin@/cheer@g' file.txt
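A side-by-side comparison on a hypothetical line: the @-delimited command and the escaped /-delimited command produce the same result.

```shell
#!/bin/sh
# Hypothetical input line.
line='I /grin at you'
# @ as the delimiter: slashes need no escaping.
with_at=$(printf '%s\n' "$line" | sed 's@/grin@/cheer@g')
# / as the delimiter: every slash must be escaped.
escaped=$(printf '%s\n' "$line" | sed 's/\/grin/\/cheer/g')
echo "$with_at"
```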

substitute "the" with "ppp" only where "the" is a whole word (\< and \> match the start and end of a word, not the surrounding spaces; this is a GNU sed extension)

sed 's/\<the\>/ppp/g' file.txt
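A quick check on a hypothetical sentence, assuming GNU sed: "the" inside "other", "theme" and "then" is left alone.

```shell
#!/bin/sh
# Hypothetical input: "the" as a word and inside other, theme and then.
out=$(printf 'the other theme, then the end\n' | sed 's/\<the\>/ppp/g')
echo "$out"   # ppp other theme, then ppp end
```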

replace all instances of "Dang" or "dang" with "Oops"

sed 's/[dD]ang/Oops/g' file.txt
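The bracket expression also matches inside longer words. A sketch on hypothetical input, assuming GNU sed, comparing the plain command with a variant that adds the \< and \> word boundaries:

```shell
#!/bin/sh
# Hypothetical input: "Dangling" starts with "Dang".
text='Dang it, dang.
Dangling wire'
# Plain version: also rewrites the prefix of "Dangling".
loose=$(printf '%s\n' "$text" | sed 's/[dD]ang/Oops/g')
# Word-boundary version: only whole-word "Dang"/"dang" is replaced.
whole=$(printf '%s\n' "$text" | sed 's/\<[dD]ang\>/Oops/g')
printf '%s\n---\n%s\n' "$loose" "$whole"
```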
