Codestin Search App

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

- Vasudev Ram - Online Python training and consulting

Share |

Wednesday, May 10, 2017

Python utility like the Unix cut command - Part 1 - cut1.py

By Vasudev Ram

Regular readers of my blog might have noticed that I sometimes write (and write about) command line utilities in posts here.

Recently, I thought of implementing a utility like the Unix cut command in Python.

Here is my first cut at it (pun intended :), below.

However, instead of just posting the final code and some text describing it, this time, I had the idea of doing something different in a post (or a series of posts).

I thought it might be interesting to show some of the stages of development of the utility, such as incremental versions of it, with features or code improvements added in each successive version, and a bit of discussion on the design and implementation, and also on the thought processes occurring during the work.

(For beginners: incidentally, speaking of thought processes, during interviews for programming jobs at many companies, open-ended questions are often asked, where you have to talk about your thoughts as you work your way through to a solution to a problem posed. I know this because I've been on both sides of that many times. This interviewing technique helps the interviewers gauge your thought processes, and thereby, helps them decide whether they think you are good at design and problem-solving or not, which helps decide whether you get the job or not.)

One reason for doing this format of post, is just because it can be fun (for me to write about, and hopefully for others to read - I know I myself like to read such posts), and another is because one of my lines of business is training (on Python and other areas), and I've found that beginners sometimes have trouble going from a problem or exercise spec to a working implementation, even if they understand well the syntax of the language features needed to implement a solution. This can happen because 1) there can be many possible solutions to a given programming problem, and 2) the way to break down a problem into smaller pieces (stepwise refinement), that are more amenable to translating into programming statements and constructs, is not always self-evident to beginners. It is a skill one acquires over time, as one keeps on programming for months and years.

So I'm going to do it that way (with more explanation and multiple versions), over the course of a few posts.

For this first post, I'll just describe the rudimentary first version that I implemented, and show the code and the output of a few runs of the program.

The Wikipedia link about Unix cut near the top of this post describes its behavior and command line options.

In this first version, I only implement a subset of those:
- reading from a file (only a single file, and not reading from standard input (stdin))
- only support the -c (for cut by column) option (not the -b (by byte) or -f (by field) options)
- only support one column specification, i.e. -cm-n, not forms like -cm1-n1,m2-n2,...

In subsequent versions, I'll add support for some of the omitted features, and also fix any errors that I find in previous versions, by testing.

I'll call the versions cutN.py, where 1 <= N <= the highest version I implement. So this current post is about cut1.py.

Here is the code for cut1.py:

"""
File: cut1.py
Purpose: A Python tool somewhat similar to the Unix cut command.
Does not try to be exactly the same or implement all the features 
of Unix cut. Created for educational purposes.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
"""

from __future__ import print_function

import sys
from error_exit import error_exit

def usage(args):
    #"Cuts the specified columns from each line of the input.\n", 
    lines = [
    "Print only the specified columns from lines of the input.\n", 
    "Usage: {} -cm-n file\n".format(args[0]), 
    "or: cmd_generating_text | {} -cm-n\n".format(args[0]), 
    "where -c means cut by column and\n", 
    "m-n means select (character) columns m to n\n", 
    "For each line, the selected columns will be\n", 
    "written to standard output. Columns start at 1.\n", 
    ]
    for line in lines:
        sys.stderr.write(line)

def cut(in_fil, start_col, end_col):
    for lin in in_fil:
        print(lin[start_col:end_col])

def main():

    sa, lsa = sys.argv, len(sys.argv)
    # Support only one input file for now.
    # Later extend to support stdin or more than one file.
    if lsa != 3:
        usage(sa)
        sys.exit(1)

    prog_name = sa[0]

    # If first two chars of first arg after script name are not "-c",
    # exit with error.
    if sa[1][:2] != "-c":
        usage(sa)
        error_exit("{}: Expected -c option".format(prog_name))

    # Get the part of first arg after the "-c".
    c_opt_arg = sa[1][2:]
    # Split that on "-".
    c_opt_flds = c_opt_arg.split("-")
    if len(c_opt_flds) != 2:
        error_exit("{}: Expected two field numbers after -c option, like m-n".format(prog_name))

    try:
        start_col = int(c_opt_flds[0])
        end_col = int(c_opt_flds[1])
    except ValueError as ve:
        error_exit("Conversion of either start_col or end_col to int failed".format(
        prog_name))

    if start_col < 1:
        error_exit("Error: start_col ({}) < 1".format(start_col))
    if end_col < 1:
        error_exit("Error: end_col ({}) < 1".format(end_col))
    if end_col < start_col:
        error_exit("Error: end_col < start_col")
    
    try:
        in_fil = open(sa[2], "r")
        cut(in_fil, start_col - 1, end_col)
        in_fil.close()
    except IOError as ioe:
        error_exit("Caught IOError: {}".format(repr(ioe)))

if __name__ == '__main__':
    main()

Here are the outputs of a few runs of cut1.py. I used this text file for the tests.
The line of digits at the top acts like a ruler :) which helps you know what character is at what column:

$ type cut-test-file-01.txt
12345678901234567890123456789012345678901234567890
this is a line with many words in it. how is it.
here is another line which also has many words.
now there is a third line that has some words.
can you believe it, a fourth line exists here.

$ python cut1.py
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c a
cut1.py: Expected two field numbers after -c option, like m-n

$ python cut1.py -c0-0
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.

$ python cut1.py -c0-0 a
Error: start_col (0) < 1

$ python cut1.py -c1-0 a
Error: end_col (0) < 1

$ python cut1.py -c1-1 a
Caught IOError: IOError(2, 'No such file or directory')

$ python cut1.py -c1-1 cut-test-file-01.txt
1
t
h
n
c

$ python cut1.py -c6-12 cut-test-file-01.txt
6789012
is a li
is anot
here is
ou beli

$ python cut1.py -20-12 cut-test-file-01.txt
Print only the specified columns from lines of the input.
Usage: cut1.py -cm-n file
or: cmd_generating_text | cut1.py -cm-n
where -c means cut by column and
m-n means select (character) columns m to n
For each line, the selected columns will be
written to standard output. Columns start at 1.
cut1.py: Expected -c option

$ python cut1.py -c20-12 cut-test-file-01.txt
Error: end_col < start_col

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

- Vasudev Ram - Online Python training and consulting

Share |

Friday, April 14, 2017

Quick CMD one-liner like "which python" command - and more

By Vasudev Ram

Today, while browsing StackOverflow, I saw this question:

Is there an equivalent of 'which' on the Windows command line?

And I liked the second answer.

The first part of that answer had this solution:

c:\> for %i in (cmd.exe) do @echo.   %~$PATH:i
   C:\WINDOWS\system32\cmd.exe

c:\> for %i in (python.exe) do @echo.   %~$PATH:i
   C:\Python25\python.exe

Then I modified the one-liner above to add a few more commands to search for, and ran it:

$ for %i in (cmd.exe python.exe metapad.exe file_sizes.exe tp.exe alarm_clock.py
) do @echo.   %~$PATH:i
   C:\Windows\System32\cmd.exe
   D:\Anaconda2-4.2.0-32bit\python.exe
   C:\util\metapad.exe
   C:\util\file_sizes.exe
   C:\util\tp.exe
   C:\util\alarm_clock.py

Notice that I also included a .py file in the list of commands to search for, and it worked for that too. This must be because .py files are registered as executable files (i.e. executable by the Python interpreter) at the time of installing Python on Windows.

Of course, as a user comments in that SO post, this is not exactly the same as the which command, since you have to specify the .exe extension (for the commands you search for), and there are other extensions for executable files, such as .bat and others. But since the Python interpeter is normally named python.exe, this can be used as a quick-and-dirty way to find out which Python executable is going to be run when you type "python", if you have more than one of them installed on your system.

I had also written a simple Python tool roughly like the Unix which command, earlier:
A simple UNIX-like "which" command in Python

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Share |

Friday, March 31, 2017

A Python class like the Unix tee command

By Vasudev Ram

Tee image attribution

Hi readers,

A few days ago, while doing some work with Python and Unix (which I do a lot of), I got the idea of trying to implement something like the Unix tee command, but within Python code - i.e., not as a Python program but as a small Python class that Python programmers could use to get tee-like functionality in their code.

Today I wrote the class and a test program and tried it out. Here is the code, in file tee.py:

# tee.py
# Purpose: A Python class with a write() method which, when 
# used instead of print() or sys.stdout.write(), for writing 
# output, will cause output to go to both sys.stdout and 
# the filename passed to the class's constructor. The output 
# file is called the teefile in the below comments and code.

# The idea is to do something roughly like the Unix tee command, 
# but from within Python code, using this class in your program.

# The teefile will be overwritten if it exists.

# The class also has a writeln() method which is a convenience 
# method that adds a newline at the end of each string it writes, 
# so that the user does not have to.

# Python's string formatting language is supported (without any 
# effort needed in this class), since Python's strings support it, 
# not the print method.

# Author: Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com/vasudevram

from __future__ import print_function
import sys
from error_exit import error_exit

class Tee(object):
    def __init__(self, tee_filename):
        try:
            self.tee_fil = open(tee_filename, "w")
        except IOError as ioe:
            error_exit("Caught IOError: {}".format(repr(ioe)))
        except Exception as e:
            error_exit("Caught Exception: {}".format(repr(e)))

    def write(self, s):
        sys.stdout.write(s)
        self.tee_fil.write(s)

    def writeln(self, s):
        self.write(s + '\n')

    def close(self):
        try:
            self.tee_fil.close()
        except IOError as ioe:
            error_exit("Caught IOError: {}".format(repr(ioe)))
        except Exception as e:
            error_exit("Caught Exception: {}".format(repr(e)))

def main():
    if len(sys.argv) != 2:
        error_exit("Usage: python {} teefile".format(sys.argv[0]))
    tee = Tee(sys.argv[1])
    tee.write("This is a test of the Tee Python class.\n")
    tee.writeln("It is inspired by the Unix tee command,")
    tee.write("which can send output to both a file and stdout.\n")
    i = 1
    s = "apple"
    tee.writeln("This line has interpolated values like {} and '{}'.".format(i, s))
    tee.close()

if __name__ == '__main__':
    main()

And when I ran it, I got this output:

$ python tee.py test_tee.out
This is a test of the Tee Python class.
It is inspired by the Unix tee command,
which can send output to both a file and stdout.
This line has interpolated values like 1 and 'apple'.

$ type test_tee.out
This is a test of the Tee Python class.
It is inspired by the Unix tee command,
which can send output to both a file and stdout.
This line has interpolated values like 1 and 'apple'.

$ python tee.py test_tee.out > main.out

$ fc /l main.out test_tee.out
Comparing files main.out and TEST_TEE.OUT
FC: no differences encountered

As you can see, I compared the teefile with the redirected stdout output, and they are the same.

I have not implemented the exact same features as the Unix tee. E.g. I did not implement the -a option (to append to a teefile if it exists, instead of overwriting it), and did not implement the option of multiple teefiles. Both are straightforward.

Ideas for the use of this Tee class and programs using it:

- the obvious one - use it like the Unix tee, to both make a copy of some program's output in a file, and show the same output on the screen. We could even pipe the screen (i.e. stdout) output to a Python (or other) text file pager :-)

- to capture intermediate output of some of the commands in a pipeline, before the later commands change it. For another way of doing that, see:

Using PipeController to run a pipe incrementally

- use it to make multiple copies of a file, by implementing the Unix tee command's multiple output file option in the Tee class.

Then we can even use it like this, so we don't get any screen output, and also copy some data to multiple files in a single step:

program_using_tee_class.py >/dev/null # or >NUL if on Windows.

Assuming that multiple teefiles were specified when creating the Tee object that the program will use, this will cause multiple copies of the program's output to be made in different specified teefiles, while the screen output will be thrown away. IOW, it will act like a command to copy some data (the output of the Python program) to multiple locations at the same time, e.g. one could be on a directory on your hard disk, another could be on a USB thumb/pen drive, a third could be on a network share, etc. The advantage here is that by copying from the source only once, to multiple destinations, we avoid reading or generating data multiple times, one for the copy to each destination. This can be more efficient, particularly for large outputs / copies.

For more fun Unixy / Pythonic stdin / stdout / pipe stuff, check out:

[xtopdf] PDFWriter can create PDF from standard input

and a follow-up post, that shows how to use the StdinToPDF program in that post, along with my selpg Unix C utility, to print only selected pages of text to PDF:

Print selected text pages to PDF with Python, selpg and xtopdf on Linux

Enjoy your tea :)

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Follow me on: LinkedIn * Twitter

Are you a blogger with some traffic? Get Convertkit:

Share |

Saturday, March 4, 2017

m, a Unix shell utility to save cleaned-up man pages as text

By Vasudev Ram

mu image attribution

I was using this Unix utility called m today, as I often do, when working on Linux. It's a shell script that lets you save the man pages for one or more Unix commands, system calls or other topics, to text files, after cleaning up the man command output to remove formatting meant for emphasis, printing, etc.

I had first written m a while ago, on Unix boxes that I used to work on earlier, as opposed to Linux, which I use these days.
At that time it was needed, because on some Unix versions, man page output used to be formatted for printers (by text-processing tools such as nroff and troff. These tools would insert extra formatting characters in the output, for effects like bold and underscore, that made it less easy to read the text file on screen, if you simply redirected it to a file, and opened it in a text editor. (Reading the page via the man command itself would work fine.)

Here is the m script, shown by cat [1]:

cat ~/bin/m

cat displays:

mkdir -p ~/man
for i
do
    man $i | col -bx > ~/man/$i.m
done

[1] Check out "useless use of cat" at the cat link above.

(A less-known fact is that "for i" is shorthand for "for i in $*", i.e. it iterates over all the command-line arguments to the script. Not to be confused with "for i in *" which will iterate over all the filenames in the current directory, because the * expands to that.)

m uses the convention of putting all the text files that it creates (one per command-line argument), into a directory called man, under your home directory, i.e. ~/man. If the directory does not exist, it will be created.

You have to save the above script as a file called m in a directory that is in your Unix PATH. Creating a directory called ~/bin is a good choice - your local bin directory:

mkdir ~/bin
cp m ~/bin  # Assumes you created m in your current directory.

and make it executable using chmod:

chmod u+x ~/man/m

Now if I run m as follows, to generate (as cleaned-up text) the man pages for, say, the fopen and fclose C stdio library functions,

m fopen fclose

it creates the text files fopen.m and fclose.m in my ~/man directory.
I can then open fopen.m with the view command (vi in read-only mode):

view ~/man/fopen.m

Here is a screenshot of the file opened in vi(ew):

Enjoy.

P.S. If you are new to vi and want to get up and running with it fast, check out my vi quickstart tutorial. I first wrote it for a couple of friends, Windows system administrator colleagues of mine, who had been given additional charge of a few Unix boxes, at their request, to help them to get up to speed with vi. They later said it helped with that.

P.P.S. If you like short words and commands like m, check out the Japanese word mu for some interesting points.

A few excerpts:
[
The Japanese and Korean term mu (Japanese: 無; Korean: 무) or Chinese wú (traditional Chinese: 無; simplified Chinese: 无) meaning "not have; without" is a key word in Buddhism, especially Zen traditions.
...
Some English translation equivalents of wú or mu 無 are:
"no", "not", "nothing", or "without"[2]
nothing, not, nothingness, un-, is not, has not, not any[3]
[1] Nonexistence; nonbeing; not having; a lack of, without. [2] A negative. [3] Caused to be nonexistent. [4] Impossible; lacking reason or cause. [5] Pure human awareness, prior to experience or knowledge. This meaning is used especially by the Chan school.
...
The character wu 無 originally meant "dance" and was later used as a graphic loan for wu "not". The earliest graphs for 無 pictured a person with outstretched arms holding something (possibly sleeves, tassels, ornaments) and represented the word wu "dance; dancer".
...
The Gateless Gate, which is a 13th-century collection of Chan or Zen kōans, uses the word wu or mu in its title (Wumenguan or Mumonkan 無門關) and first kōan case ("Joshu's Dog" 趙州狗子). Chinese Chan calls the word mu 無 "the gate to enlightenment".[9] The Japanese Rinzai school classifies the Mu Kōan as hosshin 発心 "resolve to attain enlightenment", that is, appropriate for beginners seeking kenshō "to see the Buddha-nature"'.[10]
...
In the original text, the question is used as a conventional beginning to a question-and-answer exchange (mondo). The reference is to the Mahāyāna Mahāparinirvāṇa Sūtra[14] which says for example:
In this light, the undisclosed store of the Tathagata is proclaimed: "All beings have the Buddha-Nature".[15]
...
In Robert M. Pirsig's 1974 novel Zen and the Art of Motorcycle Maintenance, mu is translated as "no thing", saying that it meant "unask the question". He offered the example of a computer circuit using the binary numeral system, in effect using mu to represent high impedance:
...
"Mu" may be used similarly to "N/A" or "not applicable," a term often used to indicate the question cannot be answered because the conditions of the question do not match the reality.
...
Because of this meaning, programming language Perl 6 uses "Mu" for the root of its type hierarchy.[23]
]

The image at the top of the post, is the character mu in seal script.

P.P.P.S. Really the last this time :) Another m-word:

Check out Muji - simplicity is deceptively complex.

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Managed WordPress Hosting by FlyWheel

Follow me on: LinkedIn * Twitter

Share |

Wednesday, March 1, 2017

Show error numbers and codes from the os.errno module

By Vasudev Ram

While browsing the Python standard library docs, in particular the module os.errno, I got the idea of writing this small utility to display os.errno error codes and error names, which are stored in the dict os.errno.errorcode:

Here is the program, os_errno_info.py:

from __future__ import print_function
'''
os_errno_info.py
To show the error codes and 
names from the os.errno module.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys
import os

def main():
    
    print("Showing error codes and names\nfrom the os.errno module:")
    print("Python sys.version:", sys.version[:6])
    print("Number of error codes:", len(os.errno.errorcode))
    print("{0:>4}{1:>8}   {2:<20}    {3:<}".format(\
        "Idx", "Code", "Name", "Message"))
    for idx, key in enumerate(sorted(os.errno.errorcode)):
        print("{0:>4}{1:>8}   {2:<20}    {3:<}".format(\
            idx, key, os.errno.errorcode[key], os.strerror(key)))

if __name__ == '__main__':
    main()

And here is the output on running it:

$ py -2 os_errno_info.py >out2 && gvim out2
Showing error codes and names
from the os.errno module:
Python sys.version: 2.7.12
Number of error codes: 86
 Idx    Code   Name                    Message
   0       1   EPERM                   Operation not permitted
   1       2   ENOENT                  No such file or directory
   2       3   ESRCH                   No such process
   3       4   EINTR                   Interrupted function call
   4       5   EIO                     Input/output error
   5       6   ENXIO                   No such device or address
   6       7   E2BIG                   Arg list too long
   7       8   ENOEXEC                 Exec format error
   8       9   EBADF                   Bad file descriptor
   9      10   ECHILD                  No child processes
  10      11   EAGAIN                  Resource temporarily unavailable
  11      12   ENOMEM                  Not enough space
  12      13   EACCES                  Permission denied
  13      14   EFAULT                  Bad address
  14      16   EBUSY                   Resource device
  15      17   EEXIST                  File exists
  16      18   EXDEV                   Improper link
  17      19   ENODEV                  No such device
  18      20   ENOTDIR                 Not a directory
  19      21   EISDIR                  Is a directory
  20      22   EINVAL                  Invalid argument
  21      23   ENFILE                  Too many open files in system
  22      24   EMFILE                  Too many open files
  23      25   ENOTTY                  Inappropriate I/O control operation
  24      27   EFBIG                   File too large
  25      28   ENOSPC                  No space left on device
  26      29   ESPIPE                  Invalid seek
  27      30   EROFS                   Read-only file system
  28      31   EMLINK                  Too many links
  29      32   EPIPE                   Broken pipe
  30      33   EDOM                    Domain error
  31      34   ERANGE                  Result too large
  32      36   EDEADLOCK               Resource deadlock avoided
  33      38   ENAMETOOLONG            Filename too long
  34      39   ENOLCK                  No locks available
  35      40   ENOSYS                  Function not implemented
  36      41   ENOTEMPTY               Directory not empty
  37      42   EILSEQ                  Illegal byte sequence
  38   10000   WSABASEERR              Unknown error
  39   10004   WSAEINTR                Unknown error
  40   10009   WSAEBADF                Unknown error
  41   10013   WSAEACCES               Unknown error
  42   10014   WSAEFAULT               Unknown error
  43   10022   WSAEINVAL               Unknown error
  44   10024   WSAEMFILE               Unknown error
  45   10035   WSAEWOULDBLOCK          Unknown error
  46   10036   WSAEINPROGRESS          Unknown error
  47   10037   WSAEALREADY             Unknown error
  48   10038   WSAENOTSOCK             Unknown error
  49   10039   WSAEDESTADDRREQ         Unknown error
  50   10040   WSAEMSGSIZE             Unknown error
  51   10041   WSAEPROTOTYPE           Unknown error
  52   10042   WSAENOPROTOOPT          Unknown error
  53   10043   WSAEPROTONOSUPPORT      Unknown error
  54   10044   WSAESOCKTNOSUPPORT      Unknown error
  55   10045   WSAEOPNOTSUPP           Unknown error
  56   10046   WSAEPFNOSUPPORT         Unknown error
  57   10047   WSAEAFNOSUPPORT         Unknown error
  58   10048   WSAEADDRINUSE           Unknown error
  59   10049   WSAEADDRNOTAVAIL        Unknown error
  60   10050   WSAENETDOWN             Unknown error
  61   10051   WSAENETUNREACH          Unknown error
  62   10052   WSAENETRESET            Unknown error
  63   10053   WSAECONNABORTED         Unknown error
  64   10054   WSAECONNRESET           Unknown error
  65   10055   WSAENOBUFS              Unknown error
  66   10056   WSAEISCONN              Unknown error
  67   10057   WSAENOTCONN             Unknown error
  68   10058   WSAESHUTDOWN            Unknown error
  69   10059   WSAETOOMANYREFS         Unknown error
  70   10060   WSAETIMEDOUT            Unknown error
  71   10061   WSAECONNREFUSED         Unknown error
  72   10062   WSAELOOP                Unknown error
  73   10063   WSAENAMETOOLONG         Unknown error
  74   10064   WSAEHOSTDOWN            Unknown error
  75   10065   WSAEHOSTUNREACH         Unknown error
  76   10066   WSAENOTEMPTY            Unknown error
  77   10067   WSAEPROCLIM             Unknown error
  78   10068   WSAEUSERS               Unknown error
  79   10069   WSAEDQUOT               Unknown error
  80   10070   WSAESTALE               Unknown error
  81   10071   WSAEREMOTE              Unknown error
  82   10091   WSASYSNOTREADY          Unknown error
  83   10092   WSAVERNOTSUPPORTED      Unknown error
  84   10093   WSANOTINITIALISED       Unknown error
  85   10101   WSAEDISCON              Unknown error

In the above Python command line, you can of course skip the "&& gvim out2" part. It is just there to automatically open the output file in gVim (text editor) after the utility runs.

The above output was from running it with Python 2.
The utility is written to also work with Python 3.
To change the command line to use Python 3, just change 2 to 3 everywhere in the above Python command :)
(You need to install or already have py, the Python Launcher for Windows, for the py command to work. If you don't have it, or are not on Windows, use python instead of py -2 or py -3 in the above python command line - after having set your OS PATH to point to Python 2 or Python 3 as wanted.)

The only differences in the output are the version message (2.x vs 3.x), and the number of error codes - 86 in Python 2 vs. 101 in Python 3.
Unix people will recognize many of the messages (EACCES, ENOENT, EBADF, etc.) as being familiar ones that you get while programming on Unix.
The error names starting with W are probably Windows-specific errors. Not sure how to get the messages for those, need to look it up. (It currently shows "Unknown error" for them.)

This above Python utility was inspired by an earlier auxiliary utility I wrote, called showsyserr.c, as part of my IBM developerWorks article, Developing a Linux command-line utility (not the main utility described in the article). Following (recursively) the link in the previous sentence will lead you to the code for both the auxiliary and the main utility, as well as the PDF version of the article.

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Managed WordPress Hosting by FlyWheel

Follow me on: LinkedIn * Twitter

Share |

Sunday, January 8, 2017

An Unix seq-like utility in Python

By Vasudev Ram

Due to a chain (or sequence - pun intended :) of thoughts, I got the idea of writing a simple version of the Unix seq utility (command-line) in Python. (Some Unix versions have a similar command called jot.)

Note: I wrote this program just for fun. As the seq Wikipedia page says, modern versions of bash can do the work of seq. But this program may still be useful on Windows - not sure if the CMD shell has seq-like functionality or not. PowerShell probably has it, is my guess.)

The seq command lets you specify one or two or three numbers as command-line arguments (some of which are optional): the start, stop and step values, and it outputs all numbers in that range and with that step between them (default step is 1). I have not tried to exactly emulate seq, instead I've written my own version. One difference is that mine does not support the step argument (so it can only be 1), at least in this version. That can be added later. Another is that I print the numbers with spaces in between them, not newlines. Another is that I don't support floating-point numbers in this version (again, can be added).

The seq command has more uses than the above description might suggest (in fact, it is mainly used for other things than just printing a sequence of numbers - after all, who would have a need to do that much). Here is one example, on Unix (from the Wikipedia article about seq):

# Remove file1 through file17:
for n in `seq 17`
do
    rm file$n
done

Note that those are backquotes or grave accents around seq 17 in the above code snippet. It uses sh / bash syntax, so requires one of them, or a compatible shell.

Here is the code for seq1.py:

'''
seq1.py
Purpose: To act somewhat like the Unix seq command.
Author: Vasudev Ram
Copyright 2017 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys

def main():
    sa, lsa = sys.argv, len(sys.argv)
    if lsa < 2:
        sys.exit(1)
    try:
        start = 1
        if lsa == 2:
            end = int(sa[1])
        elif lsa == 3:
            start = int(sa[1])
            end = int(sa[2])
        else: # lsa > 3
            sys.exit(1)
    except ValueError as ve:
        sys.exit(1)

    for num in xrange(start, end + 1):
        print num, 
    sys.exit(0)
    
if __name__ == '__main__':
    main()

And here are a few runs of seq1.py, and the output of each run, below:

$ py -2 seq1.py

$ py -2 seq1.py 1
1

$ py -2 seq1.py 2
1 2

$ py -2 seq1.py 3
1 2 3

$ py -2 seq1.py 1 1
1

$ py -2 seq1.py 1 2
1 2

$ py -2 seq1.py 1 3
1 2 3

$ py -2 seq1.py 4
1 2 3 4

$ py -2 seq1.py 1 4
1 2 3 4

$ py -2 seq1.py 2 2
2

$ py -2 seq1.py 5 3

$ py -2 seq1.py -6 -2
-6 -5 -4 -3 -2

$ py -2 seq1.py -4 -0
-4 -3 -2 -1 0

$ py -2 seq1.py -5 5
-5 -4 -3 -2 -1 0 1 2 3 4 5

There are many other possible uses for seq, if one uses one's imagination, such as rapidly generating various filenames or directory names, with numbers in them (as a prefix, suffix or in the middle), for testing or other purposes, etc.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates (via Gumroad) on my forthcoming apps and content.

Jump to posts: Python * DLang * xtopdf

Managed WordPress Hosting by FlyWheel

Follow me on: LinkedIn * Twitter

Share |

Friday, November 25, 2016

Processing DSV data (Delimiter-Separated Values) with Python

By Vasudev Ram

I needed to process some DSV files recently. Here is a Python program I wrote for it, with a few changes over the original. E.g. I do not show the processing of the data here; I only read and print it. Also, I support two different command-line options (and ways) to specify the delimiter character.

DSV (Delimiter-separated values) is a common tabular text data format, with one record per line, and some number of fields per record, where the fields are separated or delimited by some specific character. Some common delimiter characters used in DSV files are tab (which makes them TSV files, Tab-Separated Values, common on Unix), comma (CSV files, Comma-Separated Values, a common spreadsheet and database import-export format), the pipe character (|), the colon (:) and others.

They - DSV files - are described in this section, Data File Metaformats, under Chapter 5: Textuality, of Eric Raymond (ESR)'s book, The Art of Unix Programming, which is a recommended read for anyone interested in Unix (one of the longest-lived operating systems [1]) and in software design and development.

[1] And, speaking a bit loosely, nowaday Unix is also the most widely used OS in the world, by a fair margin, due to its use (as a variant) in Android and iOS based mobile devices, both of which are Unix-based, not to mention Apple MacOS and Linux computers, which also are. Android devices alone number in the billions.

The program, read_dsv.py, is a command-line utility, written in Python, that allows you to specify the delimiter character in one of two ways:

- with a "-c delim_char" option, in which delim_char is an ASCII character,
- with a "-n delim_code" option, in which delim_code is an ASCII code.

It then reads either the files specified as command-line arguments after the -n or -c option, or if no files are given, it reads its standard input.

Here is the code for read_dsv.py:

from __future__ import print_function

"""
read_dsv.py
Author: Vasudev Ram
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
Purpose: Shows how to read DSV data, i.e. 

https://en.wikipedia.org/wiki/Delimiter-separated_values 

from either files or standard input, split the fields of each
line on the delimiter, and process the fields in some way.
The delimiter character is configurable by the user and can
be specified as either a character or its ASCII code.

Reference:
TAOUP (The Art Of Unix Programming): Data File Metaformats:
http://www.catb.org/esr/writings/taoup/html/ch05s02.html
ASCII table: http://www.asciitable.com/
"""

import sys
import string

def err_write(message):
    sys.stderr.write(message)

def error_exit(message):
    err_write(message)
    sys.exit(1)

def usage(argv, verbose=False):
    usage1 = \
        "{}: read and process DSV (Delimiter-Separated-Values) data.\n".format(argv[0])
    usage2 = "Usage: python" + \
        " {} [ -c delim_char | -n delim_code ] [ dsv_file ] ...\n".format(argv[0])
    usage3 = [
        "where one of either the -c or -n option must be given,\n",  
        "delim_char is a single ASCII delimiter character, and\n", 
        "delim_code is a delimiter character's ASCII code.\n", 
        "Text lines will be read from specified DSV file(s) or\n", 
        "from standard input, split on the specified delimiter\n", 
        "specified by either the -c or -n option, processed, and\n", 
        "written to standard output.\n", 
    ]
    err_write(usage1)
    err_write(usage2)
    if verbose:
        for line in usage3:
            err_write(line)

def str_to_int(s):
    try:
        return int(s)
    except ValueError as ve:
        error_exit(repr(ve))

def valid_delimiter(delim_code):
    return not invalid_delimiter(delim_code)

def invalid_delimiter(delim_code):
    # Non-ASCII codes not allowed, i.e. codes outside
    # the range 0 to 255.
    if delim_code < 0 or delim_code > 255:
        return True
    # Also, don't allow some specific ASCII codes;
    # add more, if it turns out they are needed.
    if delim_code in (10, 13):
        return True
    return False

def read_dsv(dsv_fil, delim_char):
    for idx, lin in enumerate(dsv_fil):
        fields = lin.split(delim_char)
        assert len(fields) > 0
        # Knock off the newline at the end of the last field,
        # since it is the line terminator, not part of the field.
        if fields[-1][-1] == '\n':
            fields[-1] = fields[-1][:-1]
        # Treat a blank line as a line with one field,
        # an empty string (that is what split returns).
        print("Line", idx, "fields:")
        for idx2, field in enumerate(fields):
            print(str(idx2) + ":", "|" + field + "|")

def main():
    # Get and check validity of arguments.
    sa = sys.argv
    lsa = len(sa)
    if lsa == 1:
        usage(sa)
        sys.exit(0)
    if lsa == 2:
        # Allow the help option with any letter case.
        if sa[1].lower() in ("-h", "--help"):
            usage(sa, verbose=True)
            sys.exit(0)
        else:
            usage(sa)
            sys.exit(0)

    # If we reach here, lsa is >= 3.
    # Check for valid mandatory options (sic).
    if not sa[1] in ("-c", "-n"):
        usage(sa, verbose=True)
        sys.exit(0)

    # If -c option given ...
    if sa[1] == "-c":
        # If next token is not a single character ...
        if len(sa[2]) != 1:
            error_exit(
            "{}: Error: -c option needs a single character after it.".format(sa[0]))
        if not sa[2] in string.printable:
            error_exit(
            "{}: Error: -c option needs a printable ASCII character after it.".format(\
            sa[0]))
        delim_char = sa[2]
    # else if -n option given ...
    elif sa[1] == "-n":
        delim_code = str_to_int(sa[2])
        if invalid_delimiter(delim_code):
            error_exit(
            "{}: Error: invalid delimiter code {} given for -n option.".format(\
            sa[0], delim_code))
        delim_char = chr(delim_code)
    else:
        # Checking for what should not happen ... a bit of defensive programming here.
        error_exit("{}: Program error: neither -c nor -n option given.".format(sa[0]))

    try:
        # If no filenames given, read sys.stdin ...
        if lsa == 3:
            print("processing sys.stdin")
            dsv_fil = sys.stdin
            read_dsv(dsv_fil, delim_char)
            dsv_fil.close()
        # else (filenames given), read them ...
        else:
            for dsv_filename in sa[3:]:
                print("processing file:", dsv_filename)
                dsv_fil = open(dsv_filename, 'r')
                read_dsv(dsv_fil, delim_char)
                dsv_fil.close()
    except IOError as ioe:
        error_exit("{}: Error: {}".format(sa[0], repr(ioe)))
        
if __name__ == '__main__':
    main()

Here are test runs of the program (both valid and invalid), and the results of each one:

Run it without any arguments. Gives a brief usage message.

$ python read_dsv.py
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...

Run it with a -h option (for help). Gives the verbose usage message.

$ python read_dsv.py -h
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...
where one of either the -c or -n option must be given,
delim_char is a single ASCII delimiter character, and
delim_code is a delimiter character's ASCII code.
Text lines will be read from specified DSV file(s) or
from standard input, split on the specified delimiter
specified by either the -c or -n option, processed, and
written to standard output.

Run it with a -v option (invalid run). Gives the brief usage message.

$ python read_dsv.py -v
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...

Run it with a -c option but no ASCII character argument (invalid run). Gives the brief usage message.

$ python read_dsv.py -c
read_dsv.py: read and process DSV (Delimiter-Separated-Values) data.
Usage: python read_dsv.py [ -c delim_char | -n delim_code ] [ dsv_file ] ...

Run it with a -c option followed by the pipe character (invalid run). The OS (here, Windows) gives an error message because the pipe character cannot be used to end a pipeline.

$ python read_dsv.py -c |
The syntax of the command is incorrect.

Run it with the -c option and the pipe character as the delimiter character, but protected (by double quotes) from interpretation by the OS shell (CMD).

$ python read_dsv.py -c "|" file1.dsv
processing file: file1.dsv
Line 0 fields:
0: |1|
1: |2|
2: |3|
3: |4|
4: |5|
5: |6|
6: |7|
Line 1 fields:
0: |field1|
1: |fld2|
2: | fld3 with spaces around it |
3: |    fld4 with leading spaces|
4: |fld5 with trailing spaces     |
5: |next field is empty|
6: ||
7: |last field|
Line 2 fields:
0: ||
1: |1|
2: |22|
3: |333|
4: |4444|
5: |55555|
6: |666666|
7: |7777777|
8: |88888888|
Line 3 fields:
0: ||
1: ||
2: ||
3: ||
4: ||
5: |                      |
6: ||
7: ||
8: ||
9: ||
10: ||

Run it with the -n option followed by 124, the ASCII code for the pipe character as the delimiter.

$ python read_dsv.py -n 124 file1.dsv
[Gives exact same output as the run above, as it should, because both use the same delimiter and read the same input file.]

Copy file1.dsv to file3.dsv. Change all the pipe characters (delimiters) to colons:
Run it with the -n option followed by 58, the ASCII code for the colon character.

$ python read_dsv.py -n 58 file3.dsv
[Gives exact same output as the run above, as it should, because other than the delimiters (pipe versus colon), the input is the same.]

I added support for the -n option to the program because it makes it more flexible, since you can specify any ASCII character as the delimiter (that makes sense), by giving its ASCII code.

And of course, to find out the values of the ASCII codes for these delimiter characters, I used the char_to_ascii_code.py program from my recent post:

Trapping KeyboardInterrupt and EOFError for program cleanup

You may have noticed that I mentioned delimiter characters and DSV files in that post too. The char_to_ascii_code.py utility shown in that post was created to find the ASCII code for any character (without having to look it up on the web each time).

- Enjoy.

- Vasudev Ram - Online Python training and consulting

- Black Flyday at Flywheel Wordpress Managed Hosting - get 3 months free on the annual plan.

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf

- Vasudev Ram - Online Python training and consulting

My ActiveState recipes

Share |

Saturday, October 1, 2016

min_fgrep: minimal fgrep command in D

By Vasudev Ram

min_fgrep pattern < file

The Unix fgrep command finds fixed-string patterns (i.e. not regular expression patterns) in its input and prints the lines containing them. It is part of the grep family of Unix commands, which also includes grep itself, and egrep.

All three of grep, fgrep and egrep originally had slightly different behaviors (with respect to number and types of patterns supported), although, according to the Open Group link in the previous paragraph, fgrep is marked as LEGACY. I'm guessing that means the functionality of fgrep is now included in grep, via some command-line option, and maybe the same for egrep too.

Here is a minimal fgrep-like program written in D. It does not support reading from a filename given as a command-line argument, in this initial version; it only supports reading from stdin (standard input), which means you have to either redirect its standard input to come from a file, or pipe the output of another command to it.

/*
File: min_fgrep.d
Purpose: A minimal fgrep-like command in D.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
*/

import std.stdio;
import std.algorithm.searching;

void usage(string[] args) {
    stderr.writeln("Usage: ", args[0], " pattern");
    stderr.writeln("Prints lines from standard input that contain pattern.");
}

int main(string[] args) {
    if (args.length != 2) {
        stderr.writeln("No pattern given.");
        usage(args);
        return 1;
    }

    string pattern = args[1];
    try {
        foreach(line; stdin.byLine)
        {
            if (canFind(line, pattern)) {
                writeln(line);
            }
        }
    } catch (Exception e) {
        stderr.writeln("Caught an Exception. ", e.msg);
    }
    return 0;
}

Compile with:

$ dmd min_fgrep.d

I ran it with its standard input redirected to come from its own source file:

$ min_fgrep < min_fgrep.d string

and got the output:

void usage(string[] args) {
int main(string[] args) {
    string pattern = args[1];

(Bold style added by me in the post, not part of the min_fgrep output, unlike with some greps.)

Running it as part of a pipeline:

$ type min_fgrep.d | min_fgrep string > c

gave the same output. The interesting thing here is not the min_fgrep program itself - that is very simple, as can be seen from the code. What is interesting is that the canFind function is not specific to type string; instead it is from the std.algorithm.searching module of the D standard library (Phobos), and it is generalized - i.e. it works on D ranges, which are a rather cool and powerful feature of D. This is done using D's template metaprogramming / generic programming features. And since the version of the function specialized to the string type is generated at compile time, there is no run-time overhead due to genericity (I need to verify the exact details, but I think that is what happens).

Get updates on my software products / ebooks / courses.

Jump to posts: Python DLang xtopdf