PYTHON PROGRAMMING
ADVANCED:
The Guide for Data Analysis and Data Science.
Discover Machine Learning with the Optimum
Recipes for Mastering Python and Powerful Object-
Oriented Programming
Table of Contents
Introduction
Chapter 1: Getting Started With Python
Basic Knowledge of How Python Works
Chapter 2: What is the Python Code?
Programs
What Happens When You Run a Program
Benefits of knowing Python code
Chapter 3: Code Style
Recommendations for coding styles
Chapter 4: Functional Programming
Types of functional programming languages
Advantages of Functional programming
The efficiency of a program code
Comparison of functional programming and object-oriented
programming
Chapter 5: Generators and Coroutines
Generators:
Coroutines
Async I/O and the asyncio module:
Native Coroutines and the async/await:
The interoperability of the Native and Generator Based Coroutines:
Chapter 6: The asyncio library
Coroutines and future
Chapter 7: Dynamically creating classes
How classes are dynamically created
Chapter 8: Documentation
Style guide for Python documentation
Chapter 9: How to use Sphinx and restructured text
How Sphinx is used
Workflow
Preparations
Chapter 10: Styles of documenting class - NumPy
Chapter 11: Testing logging
Techniques of capture
Chapter 12: Debugging
Chapter 13: Tracking and reducing memory and CPU usage
Chapter 14: Performance Improvement
Chapter 15: Multiprocessing: Is a single-core CPU enough?
The basics of multiprocessing
Locks
Logging
The pool class
Chapter 16: C/C++
History of C and C++
The uses of C and C++
Characteristics of C and C++
The development of C and C++:
Chapter 17: Windows
Windows and Python
Chapter 18: OS X
OS X and Python:
How to correctly install Python on OS X
How to work with Python 3
Virtual Environments and Pipenv
Chapter 19: Linux
Linux and Python
How to use the virtual environment in Python
How to execute Python programs in Linux
Chapter 20: Unix
Unix and Python
Chapter 21: Creating your libraries
Conclusion
© Copyright 2019 - All rights reserved.
The contents of this book may not be reproduced, duplicated or transmitted
without direct written permission from the author.
Under no circumstances will any legal responsibility or blame be held
against the publisher for any reparation, damages, or monetary loss due to
the information herein, either directly or indirectly.
Legal Notice:
This book is copyright protected. This is only for personal use. You cannot
amend, distribute, sell, use, quote or paraphrase any part of the content
within this book without the consent of the author.
Disclaimer Notice:
Please note the information contained within this document is for
educational and entertainment purposes only. Every attempt has been made
to provide accurate, up to date and complete, reliable information. No
warranties of any kind are expressed or implied. Readers acknowledge that
the author is not engaging in the rendering of legal, financial, medical or
professional advice. The content of this book has been derived from various
sources. Please consult a licensed professional before attempting any
techniques outlined in this book.
By reading this document, the reader agrees that under no circumstances is
the author responsible for any losses, direct or indirect, which are incurred
as a result of the use of information contained within this document,
including, but not limited to, errors, omissions, or inaccuracies.
Introduction
Python is a multipurpose programming language in a state of constant
evolution and growth. This property makes it a pioneer among its
contemporaries, and a good enough reason for everyone to learn it. In this
third volume of the Python series, we will be exploring the limits to which
Python can be used, although frankly, there is no telling how many more
uses it can be put to now or in the future. It is that excellent a
programming language.
The knowledge contained within this book will spur your programming
skills to even greater heights and endow you with all the necessary skills
needed to operate Python. Taken together, the three volumes of this series
will take you from a newbie in the programming world to a seasoned
programmer, and better still, at your own pace. Every discussion is a
well-detailed, systematic process written in a basic and fluid style to
facilitate a complete knowledge of the subject matter and help you learn
without a hassle.
Become the programmer you always wanted to be. Choose Python. Choose
the future.
Chapter 1: Getting Started With Python
This section is a refresher outline of the previous volumes of the Python
series. Here we discuss the primary method of getting started with Python.
To start with, you will have to download the appropriate version of Python
for your system. Python is available on a wide range of platforms, so it
shouldn't be difficult to find one that matches yours. You can get the
Python interpreter for free on the official Python website. Ensure you
download a version that is suitable for your operating system to avoid
complications when installing the program. At the time of writing, the
latest version of Python available is the 3.x series. On Linux and Mac OS X
systems, Python comes pre-installed, so it is not necessary to install any
other Python-related software, though you might want to get a code editor
for writing your code. The pre-installed versions of Python on some Mac OS
X and Linux systems are usually the 2.x series, so they might differ a bit
from the one used in this book (the 3.x series). One significant difference
between the two series is the change made to the print() statement: print
is an ordinary function in Python 3, so it must be called with parentheses.
So, feel free to upgrade to the 3.x series on your Linux or Mac OS X system
by downloading the installer from the official Python website.
Once you have successfully downloaded the appropriate version for your
system, the next stage is to install the program. The installation process
allows you to make some customizations depending on what you want, but it
is best to keep it simple and just use the default settings. On Windows,
Python can be integrated into the command prompt of your system by enabling
the option that adds Python to the PATH, which appears at the end of the
list of installable components.
The next stage is to get and install a code editor. Although it is possible
to code in Python with apps like TextEdit or Notepad, it is usually much
easier to write your code with the help of a dedicated code editor. You
will find there are lots of code editors, both free and paid, from which
you can choose. For example, on Windows you can get Notepad++, on the Mac
operating system you can use TextWrangler, and on other operating systems
you can use jEdit.
Once you have completed your downloading and installing, proceed to the
next phase, in which you test everything you installed. Open the Command
Prompt if you use a Windows system, or the Terminal if your system runs on
Linux or Mac OS. Enter python into the command line and press Enter. The
Python program will boot up, displaying the version number of the installed
program. From there, you will be shown the prompt of the Python
interpreter, which displays as three angle brackets (>>>). Enter the
following line of code and press Enter:
print('Hello, world!')
The code should output the result shown as follows, just beneath the
command line:
Hello, world!
Basic Knowledge of How Python Works
Keep in mind that Python need not be compiled to be executed. Being an
interpreted language, Python code can be executed as soon as it has been
modified, without having to use a compiler. The fact that it doesn't need a
compile step makes it all the more comfortable and quicker to iterate,
revise, and troubleshoot than most other programming languages of like
status.
Also, owing to a high preference for simplicity and readability, you can
begin right away to code some simple programs, which you can set up in a
few moments.
Feel free to try out what you have learned in the interpreter environment.
With the interpreter, you can try out code without necessarily adding it to
your program. You can use this environment to groom your coding skills and
experiment with specific snippets you don't want to include in your
program.
Understand how variables and objects are handled in Python. As an object-
oriented language, Python considers everything within it an object. Another
thing to note is that declaring variables at the start of a program is
unnecessary, because a variable can be created at any point in the program
without any need to specify what type of variable it is, be it a string,
integer, or what have you.
Another thing you can familiarize yourself with is the syntax used by
Python. It will help attune you to the method Python uses in handling
strings and numbers. To begin, enter the interpreter environment. Open the
Terminal or Command Prompt, depending on your operating system. Type
python into the prompt and press Enter. You will be taken to the Python
prompt (>>>).
If, by now, you haven't already integrated Python into the command prompt
of your system, you will have to visit the Python installation directory to
be able to execute the interpreter.
Put your knowledge of Python operators to use in performing simple
arithmetic calculations. Below are some lines of code showing how to use
operators to execute fundamental mathematical problems. Keep in mind that #
is used to indicate comments when coding in Python, and the interpreter
doesn't execute them with the rest of the code.
>>> 5 + 6
11
>>> 150 - 5*6
120
>>> (150 - 5*6) / 5 # Division would always output a floating-point
(otherwise known as a decimal) number
24.0
>>> (150 - 5*6) // 5 # Floor division (double forward slashes) would
remove any decimal in the results
24
>>> 23 % 4 # It calculates the remainder of the division
3
>>> 15.23 * 4.26 / 2.3
28.2086087
You can also calculate powers using the double asterisk (**), the operator
that indicates exponentiation. Python can handle very large numbers in
such calculations. See the sample below:
>>> 8 ** 2 # 8 squared
64
>>> 2 ** 10 # 2 to the power of 10
1024
You can also create and manage variables of your own. For example, in
performing algebraic calculations, you can create and assign variables to
carry out your calculations. In Python, we assign variables using the
equals sign (=). See the sample below for clarity:
>>> x = 3
>>> y = 9
>>> x * y
27
>>> 27 * x // y
9
>>> x ** 2
9
>>> length = 12 # Almost any name can be used as a variable
>>> breadth = 4
>>> length * breadth
48
You can also practice basic flow control statements and see how good you
are at them. Flow control statements let a user take charge of the actions
of the program based on certain given conditions. These statements serve as
the core of programming in Python and let users design programs which
perform many functions depending on the conditions set and the input given.
For instance, the while statement can be used in calculating a Fibonacci
sequence up to 100. Let's consider the sample below:
# Every number in the Fibonacci sequence is
# the sum of the two numbers before it
x, y = 0, 1
while y < 100:
    print(y, end=' ')
    x, y = y, x+y
Here, the loop body executes provided (while) y is less than 100. The
output would typically follow after this fashion:
1 1 2 3 5 8 13 21 34 55 89
Rather than separating each value onto multiple lines, the end=' ' argument
makes print() output all the results on the same line. There are some
elements worth noting in this sample which are critical to the creation of
complex Python programs:
Take note of indentation when coding. A colon (:) indicates that the
following indented lines form a block. In the sample above, the lines
print(y, end=' ') and x, y = y, x+y act as parts of the while block in the
code. Thus, proper indentation is a necessity if a program is to work.
Also, it is possible to define several variables on one line. Consider the
sample above where x and y were both defined on the same line.
When inputting the program into the interpreter directly, take care to
include a blank line at the end, which lets the interpreter know that the
block has terminated.
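Along the same lines, you can try other flow control statements in the
interpreter. Below is a minimal if/elif/else sketch; the variable and the
messages are made up for illustration:
x = 10
if x > 5:
    print('x is greater than 5')
elif x == 5:
    print('x equals 5')
else:
    print('x is less than 5')
Entered into the interpreter, this prints x is greater than 5.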
Chapter 2: What is the Python Code?
Python code refers to a series of commands written in Python which, when
executed under standard conditions, perform intended tasks. Since we
covered the how-tos of coding in Python in the second book, we will
consider coding from a general perspective in this section.
We can design websites, create apps, develop software, among other things,
with the help of coding. The browsers we use, the operating systems of our
PCs, and the apps that fill our devices are all products of coding.
So, in a nutshell, what is coding?
Coding is the process of giving a machine a specific set of instructions.
However, unlike how it is with humans, instructing a machine isn't an easy
task. To put it more aptly, we have to consider what type of instructions
the computer understands. A computer is capable of understanding two
primary states, namely, on and off. In fact, we can regard a computer as a
broad array of on and off switches known as transistors. The computer can
do anything based on a unique combination of turning its transistors on and
off.
The transistors in a computer are represented by a series of digits, 0s and
1s. These numbers are known as binary code, and they represent the
combination of instructions fed to the computer. Binary code is divided
into bytes, groups of 8 digits which correspond to 8 transistors. For
instance, 10010011 is a byte. Contemporary computers contain millions or
even billions of transistors, meaning they are capable of many different
kinds of combinations. So, how does coding come in here?
To be able to instruct a computer, one must be able to communicate in the
language it understands. However, this becomes an inhibition because the
only language understood by the computer is binary. To write computer code
directly, one would have to write lots of 0s and 1s, which becomes a
herculean task in itself. Putting time into consideration, it could well
take till the end of days to finish a simple block of code. It is for this
reason that programming languages come into the fray. For instance, a
simple line of code in Python is:
print("Hello, world!")
A programming language, also known as a coding language, refers to a range
of syntax rules which determine how code has to be formatted and written.
Programming languages let us write instructions in a relatively more
straightforward, faster, and more readable format than writing in binary
code. Every programming language comes with a unique program that handles
the translation of what is written into binary form.
Programs
A program is a text file that is written in a specific programming
language. Source code is the text written in a program file. Each coding
language comes with a file extension peculiar to it, which is used to
indicate files coded in that language. For instance, ".py" is the file
extension for Python. To create a program, you write your code in a code
editor. For instance, the simple hello.py program, which can be written as
shown below, is complete in itself:
print("Hello, world!")
However, merely writing a program is not enough to get the computer to
execute your commands. Moreover, since execution varies by programming
language, there are some disparities in how the computer runs programs in
different languages. In some programming languages, the program is compiled
into a distinct file in binary format, which can be directly executed by
the computer. In other languages, the programs are executed by special
software. For instance, JavaScript and PHP programs need special software
to run: JavaScript programs are executed by browsers like Google Chrome,
while PHP programs are executed by web server stacks such as LAMP.
In the case of Python, like with our hello.py, the program is executed from
the command line, where its output is also displayed. If the file name is
entered into the command line after the python command, the program is
executed, as shown below.
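For instance, assuming hello.py is saved in the current directory, a
terminal session would look roughly like this (the prompt is illustrative):
$ python hello.py
Hello, world!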
What Happens When You Run a Program
Computers don't understand the phrase "Hello, world!" and would be unable
to reproduce it on a screen, because they only understand sequences of on
and off transistors. As such, to execute a line of code such as
print("Hello, world!"), all the code in the program has to be translated
into a language the computer understands, a series of ons and offs. For
this translation to be performed, the following has to occur:
1. The line of code is converted into an assembly language.
2. The assembly code is then converted into a machine language.
3. The machine language is then run directly as binary code.
Here is a breakdown of the process:
The programming language begins by converting its source code into an
assembly language, a much lower-level language that uses short words and
numbers to form patterns corresponding to binary. This conversion is done
using a compiler or an interpreter, depending on the programming language
used. In the case of Python, an interpreter is used, and the program is
converted on a line-by-line basis. With other languages that require a
compiler, the entirety of the program is converted at once. From here, the
assembly code is fed into the assembler of the computer, where it is
converted into a machine language that can be understood and run directly
in the form of binary code.
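As a side note, CPython in particular first compiles your source into an
intermediate bytecode, which its virtual machine then interprets. You can
peek at that bytecode with the standard dis module; a minimal sketch:
import dis

def greet():
    print("Hello, world!")

# Print the bytecode instructions CPython generated for greet()
dis.dis(greet)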
Benefits of knowing Python code
1. You can take it professional:
If you want a career change, you can switch to computer programming easily
with a good knowledge of Python. As a matter of fact, computer programming
is an underrated profession that pays well, and with the relative scarcity
of programmers, there has been a steady growth rate of about 30% in recent
years. For instance, at Facebook and Google, programmers have been known to
be paid upwards of $125k in salary. Quite the lucrative profession.
2. You can make a business of your coding skills:
If you have an excellent idea for a software or web product that could have
business potential, knowing how to code comes in really handy. Not only
would it save you time and money, but you would be able to manage your
product the way you want. Lately, it is becoming increasingly popular for
people to take to coding as a form of business. People have gone from no
knowledge of coding whatsoever to becoming great entrepreneurs. Take Nick
D'Aloisio, for instance, the creator of Summly, an iOS application. Nick
sold his application for around 30 million dollars to Yahoo!. Other
businesses you can branch into with some knowledge of coding include, but
are not limited to, the following:
a) Selling software
b) Selling mobile apps
c) E-commerce
d) Selling your coding time.
e) Offering coding services.
Should you have a marketable product with potential, you don't have to sit
on it or wait for funding. If you engage in coding and build substantial
knowledge, you can create whatever you need to bring your product to
market. By doing so, you would know the source code of your program while
being able to actualize your products in the same vein.
3. You would understand how computers work:
This is no doubt the best advantage of having coding knowledge, and it
applies regardless of why or how you learned to code. We all make use of
the internet every day, and computers, phones, and software are commonly
used elements in the world today. However, it is no exaggeration to say
that the inner workings of these things are still largely unknown to most
of us. Good knowledge of coding would help you find better methods of going
about your everyday life, both at your job and in your personal life. You
can choose to write your own code to carry out specific tasks such as
managing data, replying to emails, cross-checking texts, and making
calculations. Regardless of your field of discipline, knowing the way
code-related things work is a viable asset at any time.
Chapter 3: Code Style
In this section, we will consider more coding conventions, continuing from
where we left off in the second volume.
Guido van Rossum believes that code is read far more often than it is
written. Thus, the coding conventions point towards improving the general
readability of code while maintaining consistency across a wide range of
Python code. As contained in PEP 20, "Readability counts."
1. Consistency:
Consistency is quite an important subject when it comes to code style. A
style guide revolves around consistency, and being consistent with a style
guide is quite essential. However, consistency within a function or module
is considered the most important of all.
Regardless of how important consistency seems in coding, it is just as
important to know the limits to which you should be consistent. That is,
you have to know when inconsistency should replace consistency, because
sometimes the recommendations of a style guide won't readily fit in.
Whenever you have doubts, apply your own judgment in figuring things out.
Consider other examples and figure out which applies better to your code.
Don't be hesitant to ask questions. In particular, do not break backward
compatibility just to comply with this PEP.
Below are some reasons for which you can choose to ignore the guidelines of
consistency:
If applying the guideline would reduce the readability of your code, even
to people who are used to reading code that adheres to this PEP.
To be consistent with surrounding code that also breaks it (perhaps for
historical reasons), although this could be an opportunity to clean up
someone else's mess (in true XP fashion).
When the code in question predates the introduction of the guideline, and
there is no other reason to modify the said code.
If the code has to maintain compatibility with older Python versions that
do not support the features recommended by the style guide.
2. Module-level dunders:
These refer to names that have two leading and two trailing underscores.
Examples include __version__, __author__, and __all__, among others. These
dunders should always come after the module docstring but precede any
import statements, with the only exception being from __future__ imports.
It is mandatory in Python for future-imports to appear in a module before
any other code aside from docstrings. Consider the sample below:
"""This is a sample module.

This module performs tasks.
"""
from __future__ import barry_as_FLUFL

__all__ = ["x", "y", "z"]
__version__ = "0.1"
__author__ = "Cardinal Biggles"

import os
import sys
3. Trailing Commas:
When writing code, a trailing comma is an optional element, the exception
being when a tuple of one element is being made, in which case it is
mandatory. (It also carried some semantics for the print statement in
earlier versions of Python, like the 2.x series.) For clarity, the
mandatory trailing comma should be surrounded by technically redundant
parentheses. Consider the samples below:
The right method of use:
FILES = ("setup.cfg",)
An okay but confusing method of use:
FILES = "setup.cfg",
In their redundancy, trailing commas are often very useful when using a
version control system, and when a list of imported items, arguments, or
values is expected to be extended over time. The pattern is to put each
value on a line by itself, closely followed by a trailing comma, and to add
the closing brace, parenthesis, or bracket on the following line. However,
it does not make sense to place a trailing comma on the same line as the
closing delimiter, with the exception of singleton tuples as seen above.
Consider the sample below:
The correct method of use:
FILES = [
    "setup.cfg",
    "tox.ini",
    ]
initialize(FILES,
           error=True,
           )
Wrong method of use:
FILES = ["setup.cfg", "tox.ini",]
initialize(FILES, error=True,)
4. Encoding source files:
Code in the core Python distribution should always use UTF-8 (or ASCII in
the Python 2.x series). Files using ASCII in the 2.x series, or UTF-8 in
the 3.x series, should not have an encoding declaration. In the standard
Python library, non-default encodings are used only in tests, or when a
comment or docstring has to mention an author name containing non-ASCII
characters; otherwise, \x, \u, \U, or \N escapes are the preferred way of
including non-ASCII data in string literals.
For the Python 3.x series, the following policies are prescribed for the
standard library: all identifiers in the Python standard library must use
ASCII-only identifiers and should use English words wherever feasible (in
many cases, abbreviations and technical terms which aren't English are
used). Furthermore, comments and string literals have to be in ASCII as
well, with the only exceptions being those listed below:
Test cases testing non-ASCII features.
Author names. Authors whose names are not based in the Latin
alphabet (the Latin-1, ISO/IEC 8859-1 character set) must
transliterate their names into this character set.
Open-source projects targeted at a worldwide audience are
advised to adopt a similar policy.
5. Comments:
Comments that contradict the code are worse than no comments at all. Ensure
that you prioritize keeping your comments up to date as the code is
modified over time. Comments should be made up of complete sentences. The
first word should begin with a capital letter, unless it is an identifier
that begins with a lower case letter, in which case it is forbidden to
alter the case of the identifier. Block comments are generally made up of
one or more paragraphs built out of complete sentences, with every sentence
ending in a period. It is advisable to use two spaces after a
sentence-ending period in a comment made up of multiple sentences, except
after the final sentence.
When writing in English, follow Strunk and White. Python coders from
countries speaking languages other than English should write their comments
in English, unless the code will only ever be read by people who speak
their language.
6. Block Comments:
Block comments generally apply to some (or all) of the code that follows
them and are indented to the same level as that code. Each line of a block
comment starts with a (#) followed by a single space, unless the text
inside the comment is itself indented. Paragraphs inside a block comment
are separated by a line containing a single (#).
7. Inline Comments:
An inline comment is one that is on the same line as a statement. It should
be used sparingly. Inline comments should be separated by at least two
spaces from the statement. They should start with a (#) followed by a
single space. Most of the time, inline comments are unnecessary and can be
somewhat distracting when they state the obvious.
Refrain from coding this way:
x = x + 1  # Increment x
However, this is sometimes useful:
x = x + 1  # Compensate for border
8. Docstrings:
The conventions for writing a good docstring, otherwise known as a
documentation string, can be found in PEP 257. Some of them include the
following:
Write docstrings for every public method, function, module, and class. For
non-public methods, docstrings are not necessary, but there should be a
comment that describes the purpose of the method. This comment should
appear immediately after the def line.
PEP 257 also notes that the (""") which terminates a multiline docstring
should occupy a line of its own. See the sample below:
"""Return a foobang

Optional plotz says to first frobnicate the bizbaz.
"""
For docstrings that fit on one line, keep the closing (""") on the same
line.
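For instance, a one-line docstring in the spirit of the examples in PEP 257
would look like the sketch below (the function name and return value here
are made up for illustration):
def kos_root():
    """Return the pathname of the KOS root directory."""
    return "/usr/local/kos"  # illustrative value only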
Recommendations for coding styles
Code should always be written in a manner that isn't disadvantageous to
other implementations of Python (Psyco, IronPython, Jython, CPython, PyPy,
among others). For instance, CPython's efficient implementation of in-place
string concatenation for statements in the forms x = x + y and x += y
cannot be relied upon. This optimization is fragile even in CPython (it
only works for some types) and is entirely absent in implementations that
do not use reference counting. In performance-sensitive parts of the
library, the ''.join() form should be used instead. This will ensure that
concatenation occurs in linear time across the various implementations.
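A quick sketch of the recommended pattern (the list of words is, of course,
illustrative):
parts = []
for word in ["spam", "eggs", "ham"]:
    parts.append(word)      # collect the pieces first
result = " ".join(parts)    # one linear-time concatenation
print(result)               # spam eggs ham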
Comparisons to singletons like None should always be done with "is" or
"is not", never with equality operators. Also, be wary of writing "if x"
when what you actually mean is "if x is not None", for example, when
testing whether a variable or argument that defaults to None was set to
some other value. The other value might have a type (such as a container)
that could be false in a Boolean context.
Utilize the "is not" operator instead of the "not…is." Although these
expressions share some measure of functional similarities, the former is the
most preferred and more readable form.
The correct way to write:
if foo is not None:
The wrong way to write:
if not foo is None:
When implementing ordering operations with rich comparisons, it is a good
idea to implement all six operations, namely __eq__, __ne__, __lt__,
__le__, __gt__, and __ge__, rather than relying on other code to exercise
only a particular comparison. To reduce the effort required, the
functools.total_ordering() decorator provides a tool for generating the
missing comparison methods.
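As a brief illustration (the Version class below is a made-up example), you
supply __eq__ and one ordering method, and the decorator generates the
rest:
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor
    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)
    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

# __le__, __gt__, __ge__, and __ne__ are generated automatically
print(Version(1, 2) < Version(1, 10))   # True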
In PEP 207, it is indicated that Python assumes reflexivity rules. As such,
the interpreter may swap a < b with b > a, a <= b with b >= a, and may swap
the arguments of a == b and a != b. The sort() and min() operations are
guaranteed to use the less-than operator (<), and the max() function uses
the greater-than operator (>). However, it is advisable to implement all
six operations to avoid the risk of confusion in any other context. Also,
always use a def statement rather than an assignment statement that binds a
lambda expression directly to an identifier.
The right method of use:
def f(x): return 2*x
Wrong method of use:
f = lambda x: 2*x
Chapter 4: Functional Programming
Functional programming languages are crafted especially for handling list
processing applications as well as symbolic computation. As such,
functional programming typically revolves around the use of mathematical
functions. Widely used languages that support functional programming
include, but are not limited to, Erlang, Clojure, Haskell, Python, and
Lisp.
Types of functional programming languages
There are two major categories of functional programming languages. They
are:
Pure functional language: This type of functional language supports
only functional paradigms. An example of such a functional
programming language is Haskell.
Impure functional language: This type of functional language supports
imperative paradigms alongside the functional ones. An example of
such a functional programming language is Lisp.
Characteristics of functional programming languages
Discussed below are some of the characteristics of functional programming
languages:
1. Functional programming languages are designed on the concept of
mathematical functions that use recursion as well as conditional
expressions to perform computation.
2. Functional programming also supports lazy evaluation and
higher-order functions.
3. However, flow control statements such as conditionals and loops
(if-else and switch statements, for example) are not supported
directly; functional programming uses function calls and recursion
instead.
4. Like object-oriented programming languages, functional programming
languages utilize widely known concepts like polymorphism,
encapsulation, inheritance, and abstraction.
Advantages of Functional programming
Below are some of the benefits offered by functional programming:
1. Code is freer of bugs: Since functional programming doesn't support
state, its functions have no side effects whatsoever, meaning it is
easier to write error-free code.
2. Its parallel programming is effective: There are no mutable states
in functional programming languages, so there are no issues related
to a change of state. Thus, functions can be programmed to work in
parallel. Code of this sort usually has high testability and
reusability.
3. High efficiency: Functional programs are made up of a series of
independent units that can be run in sequence. As such, programs of
this sort have better efficiency than most others.
4. It provides support for nested functions: Nested functions are
supported in functional programming.
5. It performs lazy evaluation: Lazy functional constructs such as lazy
maps and lazy lists, among others, are supported by functional
programming.
One major downside to functional programming is that it needs a large
amount of memory. Owing to the absence of state, new objects have to be
created every time an action is performed. Functional programming comes in
handy in cases where many different operations have to be performed on one
given data set.
Lisp is put to use in artificial intelligence applications, speech and
vision modeling, language processing, machine learning, among others.
Interpreters embedded in Lisp extend an arm of programmability to other
systems, such as Emacs.
The efficiency of a program code
Efficiency is a crucial part of coding because excellent efficiency results
in better performance of the overall code. The efficiency of any program is
directly proportional to its execution speed and algorithmic efficiency.
Several factors influence the efficiency of a given program. They are:
1. The speed of the compiler
2. The manner the data of a program is arranged
3. The operating system of the machine
4. The selected algorithms for solving a problem
5. The machine's overall speed
6. Picking the appropriate programming language
By taking the steps listed below, the efficiency of any program can be
significantly improved:
1. Ensure to take out every unnecessary piece of code, or any part of
the code that amounts to redundant processing.
2. Ensure to make optimal use of memory and of nonvolatile storage.
3. Keep in mind to make use of reusable components whenever you can
find them, and wherever they can be applied.
4. In all aspects of a program, ensure to utilize error and exception
handling at all times.
5. Keep in mind to always create program code which complies with the
flow and logic design of the system.
6. Ensure that the code you create allows for consistency and the
integrity of data.
Comparison of functional programming and object-oriented
programming
In this section, we compare functional programming to object-oriented
programming and highlight the primary differences that exist between the
two.
1. While functional programming works on immutable data,
object-oriented programming cannot run on such; thus, it makes use
of mutable data.
2. Functional programming follows a declarative programming model,
while object-oriented programming takes after an imperative
programming model.
3. In functional programming, the focus is placed on what is being
done, while object-oriented programming tends to focus on how an
activity is being carried out.
4. Functional programming is suitable for and supports parallel
programming, while object-oriented programming does not support
parallel programming well.
5. The methods in object-oriented programming are capable of producing
significant side effects, while the functions in functional
programming have no side effects whatsoever.
6. In functional programming, flow control is carried out by means of
function calls and recursion alone. In object-oriented programming,
flow control is performed with the use of conditional statements and
loops.
7. Functional programming makes use of the concept of "recursion" when
iterating over collection data. In object-oriented programming, the
concept of the "loop" is used in the iteration of collection data,
for instance, the for-each loop used in Java programming (see the
sketch following this comparison).
8. In functional programming, the order in which statements execute is
not a critical concern. In object-oriented programming, however, the
case is different because the order of execution is considered
critical.
Functional programming supports "Abstraction over Behavior" as well as
"Abstraction over Data," while object-oriented programming supports
"Abstraction over Data" alone.
Chapter 5: Generators and Coroutines
Generators:
Generators are functions that generate values. An ordinary function would
typically return a value and then have its underlying scope terminated.
When it is called again, the function starts over from the beginning. Its
execution is a one-time thing. A generator function, however, is capable of
yielding a value and pausing the execution of the function. Control is then
returned to the calling scope, and execution can later be resumed so that
another value, if any, can be obtained. Consider the sample code below:
def simple_gen():
    yield "Hello"
    yield "World"

gen = simple_gen()
print(next(gen))
print(next(gen))
Take note that the generator function does not return any value directly;
when called, it produces a generator object, which is similar to an
iterable. Thus, next() can be called on the generator object to iterate
over the values, or a for loop can be run over it. In summary, a generator
function is one that is capable of pausing execution and generating several
values rather than returning a single value alone. When called, it produces
a generator object, which behaves much like an iterable and can be used to
obtain the values one at a time.
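As a quick sketch of the for-loop form mentioned above, reusing the
simple_gen() function defined earlier:
for word in simple_gen():
    print(word)   # prints "Hello", then "World"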
Coroutines
As can be seen in the discussion above, generators can be used to pull data
out of the context of a function and even pause the execution of the
function. However, what if we wanted to do more than pausing or pulling?
What if we need to push some data in; what do we use then? At this point,
coroutines enter the fray. The keyword yield, used in pulling values, can
also be utilized as an expression within a function, only this time it
appears to the right of the equals sign (=). Afterward, the send() method
found on a generator object can be used to send values back into the
function. This idea is known as "generator-based coroutines." Consider the
sample below:
def coro():
    hello = yield "Hello"
    yield hello

c = coro()
print(next(c))
print(c.send("World"))
So, in this code, the first value is taken as usual through the next()
function. Execution runs up to yield "Hello", and "Hello" is printed. Next,
a value is sent in through the send() method. The function then resumes,
the value sent in is assigned to hello, and execution moves on to the
following line, where yield hello is executed. As such, "World" is obtained
as the return value of the call to send(). When we use generator-based
coroutines, keep in mind that the terms "coroutine" and "generator" are
often treated as synonymous, so the same thing is implied. However, they
are not necessarily the same thing, merely used interchangeably in many
such cases. In the Python 3.x series, native coroutines arrived alongside
the async/await keywords, which are discussed below.
Async I/O and the asyncio module:
In the Python 3.x series, beginning from version 3.4, the asyncio module is
a new addition that offers useful APIs for general async programming.
Coroutines can be used alongside the asyncio module to carry out async I/O
efficiently. Below is a code sample adapted from the official
documentation:
import asyncio
import datetime
import random

@asyncio.coroutine
def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        yield from asyncio.sleep(random.randint(0, 5))

loop = asyncio.get_event_loop()
asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))
loop.run_forever()
Not much needs to be explained about the code above because it is quite
complete and well-documented in itself. The coroutine display_date(num,
loop) is created; it takes a number (an identifier) as well as an event
loop, and goes on to print the current time. From there, the yield from
keyword is used to await the result of the call to asyncio.sleep(). That
function is itself a coroutine that completes after a given amount of time;
a random number of seconds is passed to it. Then asyncio.ensure_future() is
used to schedule the running of the coroutine in the default event loop,
and the loop is told to run forever. The output returned is proof of the
concurrency with which the coroutines run.
Whenever yield from is used, the event loop knows that the coroutine will
be busy for a given timeframe, so it briefly stops running that coroutine
while it executes another. As such, the two coroutines are executed
concurrently, though not in a parallel manner, seeing as the event loop
runs on a single thread. Keep in mind that yield from here acts as a pretty
cool syntactic sugar for the line for x in asyncio.sleep(random.randint(0,
5)): yield x, which makes the async code cleaner.
Native Coroutines and the async/await:
Note that so far we have been using generator-based coroutines. In the
Python 3.x series, native coroutines that make use of the async/await
syntax were introduced in version 3.5. Consider the code below:
import asyncio
import datetime
import random

async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(random.randint(0, 5))

loop = asyncio.get_event_loop()
asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))
loop.run_forever()
If you consider the code above, you will find that the native coroutine is
defined using the keyword async placed before the keyword def. Within a
native coroutine, the await keyword is used in place of yield from.
The interoperability of the Native and Generator Based
Coroutines:
The native and generator-based coroutines have no primary difference in
terms of function; the only distinction is syntax, and it is impermissible
to mix the two syntaxes. Thus, the await keyword cannot be used within a
generator-based coroutine, the same way yield/yield from cannot be used
within a native coroutine. Regardless of these differences, it is possible
to interoperate between the native and generator-based coroutines. All we
need do is apply the decorator @types.coroutine to the old generator-based
coroutine. From there, we can easily use either kind of coroutine within
the other. Put simply, once the decorator is introduced, we can await a
generator-based coroutine from within a native coroutine, and, conversely,
use yield from on a native coroutine within a generator-based coroutine.
Consider the code below for more insight:
import asyncio
import datetime
import random
import types

@types.coroutine
def my_sleep_func():
    yield from asyncio.sleep(random.randint(0, 5))

async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await my_sleep_func()

loop = asyncio.get_event_loop()
asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))
loop.run_forever()
Chapter 6: The asyncio library
The asyncio library was introduced into Python in the 3.x series,
particularly version 3.4. It is used for running single-threaded concurrent
programs. The asyncio library is famous among Python frameworks and
libraries for its exceptional speed and variety of applications. It is used
for creating, running, and structuring coroutines, and is capable of
handling several tasks concurrently without executing them in parallel. The
asyncio library is divided into a number of parts, which are discussed
below:
1. Coroutine: A coroutine is a piece of code that is capable of being
paused and resumed in a single-threaded script. Coroutines work
cooperatively in such a program; one has to pause for another to be
executed.
2. Event loop: The event loop is used to initialize the running of
coroutines and to manage input/output operations. It takes charge of
several tasks and sees them through to completion.
3. Task: Tasks define the running and output of coroutines in a script.
It is possible to assign any number of tasks with the asyncio
library and execute them all asynchronously.
4. Future: A future serves as storage in which the result of a
coroutine is placed once it completes. Futures come in handy,
especially when one coroutine is required to wait until the result
of another coroutine is received.
The concept of the asyncio library can be implemented in several ways, some
of which are discussed below:
i. Creating a single coroutine with a single task: To begin, create a
file saved under the name async1.py. Next, enter the code shown
below:
import asyncio

async def add(start, end, wait):
    # Initialize sum variable
    sum = 0
    # Find the sum of all numbers
    for n in range(start, end):
        sum += n
    # Wait for given seconds
    await asyncio.sleep(wait)
    # Print the result
    print(f'Sum from {start} to {end} is {sum}')

async def main():
    # Assign a sole task
    task = loop.create_task(add(1, 101, 1))
    # Execute the task asynchronously
    await asyncio.wait([task])

if __name__ == '__main__':
    # Declare event loop
    loop = asyncio.get_event_loop()
    # Run the code until all tasks are completed
    loop.run_until_complete(main())
    # Close the loop
    loop.close()
First, the asyncio library is imported so that its functions are available
within the script. The add function is then declared; it calculates the sum
of a specific range of numbers. The task assigns the number range, which
begins from 1 and stretches to 101 (exclusive), with a delay interval of
one second. The declaration of the event loop comes next, and it is run
until every task in the main method arrives at completion. Once the value
has been calculated, the function waits for a second before the result is
printed.
The output would typically take after this fashion:
$ python3 async1.py
ubuntu@ubuntu-VirtualBox:~/python$ python3 async1.py
Sum from 1 to 101 is 5050
ubuntu@ubuntu-VirtualBox:~/python$
The output indicates that the sum of the numbers from 1 up to (but not
including) 101 is 5050.
ii. Creating more than one coroutine:
The power of the asyncio library becomes more evident when it is used for
the concurrent execution of multiple coroutines. We start by creating a new
file under the name async2.py. Next, the code shown below is entered:
import asyncio

async def add(start, finish, delay):
    # Initialize sum variable
    sum = 0
    # Find the sum of all numbers
    for n in range(start, finish):
        sum += n
    # Wait for given seconds
    await asyncio.sleep(delay)
    # Print the result
    print(f'Sum from {start} to {finish} is {sum}')

async def main():
    # Assign first task
    task1 = loop.create_task(add(5, 500000, 3))
    # Assign second task
    task2 = loop.create_task(add(2, 300000, 2))
    # Assign third task
    task3 = loop.create_task(add(10, 1000, 1))
    # Run the tasks asynchronously
    await asyncio.wait([task1, task2, task3])

if __name__ == '__main__':
    # Declare event loop
    loop = asyncio.get_event_loop()
    # Run the code until all tasks are completed
    loop.run_until_complete(main())
    # Close the loop
    loop.close()
In the main() method, three tasks are created with different ranges and
wait values. The first task finds the sum from 5 to 500000 with a delay
time of three seconds, the second task finds the sum from 2 to 300000 with
a delay time of two seconds, and the third and final task finds the sum
from 10 to 1000 with a delay time of one second. The task with the shortest
delay time is the first to reach completion, while the task with the
longest delay time reaches completion last.
The output would typically take this fashion:
$ python3 async2.py
ubuntu@ubuntu-VirtualBox:~/python$ python3 async2.py
Sum from 10 to 1000 is 499455
Sum from 2 to 300000 is 44999849999
Sum from 5 to 500000 is 124999749990
ubuntu@ubuntu-VirtualBox:~/python$
In the output, it can be seen that task 3 attains completion first because
its delay time lasted only one second, and task 1 is the last to attain
completion because its delay time lasts about 3 seconds.
Coroutines and future
In this sample, we consider how future objects are used in the asyncio
library. To begin, we create a file under the name async3.py and enter the
following lines of code:
import asyncio

async def show_message(number, wait):
    # Print a message
    print(f'Task {number} is running')
    # Wait for given seconds
    await asyncio.sleep(wait)
    print(f'Task {number} is finished')

async def stop_after(time):
    await asyncio.sleep(time)
    loop.stop()

async def main():
    # Assign first task
    task1 = asyncio.ensure_future(show_message(1, 2))
    print('Schedule 1')
    # Assign second task
    task2 = asyncio.ensure_future(show_message(2, 1))
    print('Schedule 2')
    # Run the tasks asynchronously
    await asyncio.wait([task1, task2])

if __name__ == '__main__':
    # Declare the event loop
    loop = asyncio.get_event_loop()
    # Run the code of the main method until all tasks are completed
    loop.run_until_complete(main())
In the code above, two tasks are created with ensure_future, which attaches
each coroutine to a future. The show_message function prints one message
when the coroutine starts running and another after its wait has concluded.
The first task has a delay time of about 2 seconds and thus attains
completion last. The second task, with a delay time of one second, attains
completion first.
The output would typically follow after this fashion:
$ python3 async3.py
ubuntu@ubuntu-VirtualBox:~/python$ python3 async3.py
Schedule 1
Schedule 2
Task 1 is running
Task 2 is running
Task 2 is finished
Task 1 is finished
ubuntu@ubuntu-VirtualBox:~/python$
From the output produced, it can be seen that task 1, although it started
first, attained completion behind task 2. Conversely, task 2 started later
but finished first as a result of its shorter delay period.
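It is worth noting that from Python 3.7 onwards, the standard library also
offers asyncio.run() as a higher-level entry point that creates, runs, and
closes the event loop for you. A minimal sketch:
import asyncio

async def main():
    await asyncio.sleep(1)
    print('done')

asyncio.run(main())   # handles loop creation and cleanup (Python 3.7+)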
Chapter 7: Dynamically creating classes
The flexibility of dynamically typed coding languages gives them a
significant edge over statically typed languages. Code can be dynamically
imported, the types of variables are subject to change, and classes can be
dynamically created at run-time.
How classes are dynamically created
The type(name, bases, attributes) form of the type() function has to be
used for a class to be dynamically created. Consider the code below:
def constructor(self, arg1):
    self.constructor_arg = arg1

def some_func(self, arg1):
    print(arg1)

@classmethod
def some_class_method(cls, arg1):
    print("{} - arg1: {}".format(cls.__name__, arg1))

NewClass = type("NewClass", (object,), {
    "string_val": "this is val1",
    "int_val": 10,
    "__init__": constructor,
    "func_val": some_func,
    "class_func": some_class_method
})
instance = NewClass("constructor arg")
print(instance.constructor_arg)
print(instance.string_val)
print(instance.int_val)
instance.func_val("test")
NewClass.class_func("test")
Running the code above yields the output:
constructor arg
this is val1
10
test
NewClass - arg1: test
i. The type function:
Another method of dynamically creating classes is by making use of the type
function, which is normally used to return the type of a supplied object.
Consider the sample below:
print(type(5))        # <class 'int'>
print(type("hello"))  # <class 'str'>
It is also possible to use the type function to dynamically create classes.
Following the official documentation, the type function comes with two
signatures, namely:
type(object): returns the type of an object.
type(name, bases, dict): returns a new type object.
It is the second signature of the type function that is used in the dynamic
creation of classes.
Let's consider some examples involving the creation of classes.
As a basic example, we will highlight some code which dynamically creates
classes that simulate the inheritance of genes and attributes between
related individuals. Class names come in as last names, and every instance
of a person is assigned a member variable indicating the first name of the
individual. Consider the code below:
class Person(object):
    def __init__(self, firstname):
        self.firstname = firstname
Suppose a person is capable of spawning another new individual who inherits
exact replicas of every function and attribute. When spawning is done from
a new individual (marking the dynamic creation and instantiation of a new
class), it is possible to also add certain new attributes to the person who
was spawned. Consider the code below:
class Person(object):
    # ...
    @classmethod
    def spawn(cls, firstname, lastname, **attributes):
        new_class = type(lastname, (cls,), attributes)
        globals()[lastname] = new_class
        return new_class(firstname)
Keep in mind that the newly created class is added to the globals() dict
right away. Doing this gives us global access to the class, which was named
after the last name of the person who was spawned. Consider the code below:
susy = Person.spawn("Susy", "Awesome")
susys_sister = Awesome("Sister")
Outlined below is the complete code of this example:
class Person(object):
    @classmethod
    def spawn(cls, firstname, lastname, **attributes):
        new_class = type(lastname, (cls,), attributes)
        globals()[lastname] = new_class
        return new_class(firstname)

    def __init__(self, firstname):
        self.firstname = firstname

    def say_hi(self, to_person):
        print("Hi {}".format(to_person.wholename()))

    def wholename(self):
        return "{} {}".format(
            self.firstname.capitalize(),
            self.__class__.__name__
        )

def punch(self, person):
    print("{} punched {}! ({} damage)".format(
        self.wholename(),
        person.wholename(),
        self.punch_damage
    ))

frank = Person.spawn("Frank", "Puncherson",
    punch_damage=10,
    punch=punch
)
# a normal person
harold = Person.spawn("Harold", "Hill")
frank.punch(harold)
franks_bro = Puncherson("Ralph")
franks_bro.punch(harold)

def kick(self, person):
    print("{} kicked {}! ({} damage)".format(
        self.wholename(),
        person.wholename(),
        self.kick_damage
    ))

susy = Puncherson.spawn("Susy", "KickPuncherson",
    kick_damage=20,
    kick=kick
)
susy.punch(frank)
susy.kick(frank)
susy.punch(franks_bro)
susy.kick(franks_bro)
The output of the example above would typically follow this fashion:
Frank Puncherson punched Harold Hill! (10 damage)
Ralph Puncherson punched Harold Hill! (10 damage)
Susy KickPuncherson punched Frank Puncherson! (10 damage)
Susy KickPuncherson kicked Frank Puncherson! (20 damage)
Susy KickPuncherson punched Ralph Puncherson! (10 damage)
Susy KickPuncherson kicked Ralph Puncherson! (20 damage)
Chapter 8: Documentation
In Python programming, there is a substantial amount of documentation, much
of which is credited to several authors within and outside the Python
franchise. The markup used in the Python documentation, reStructuredText,
was developed by the docutils project and is amended with custom directives
by a toolset called Sphinx, which also handles the post-processing into
HTML output. In this section, we consider the style guide applicable to
general documentation during programming, the custom reStructuredText
markup which Sphinx introduced as a way of supporting Python documentation,
and how it should be put to use. The documentation exists in EPUB, PDF, and
HTML formats and is generated from text files written in the
reStructuredText format. It can be found in the CPython Git repository.
Keep in mind that if you want to become a contributor to documentation in
Python, you need not write in reStructuredText if you aren't inclined to it
because contributions in plain text are just as welcomed.
Documentation in Python has long been considered good practice, especially
as the programming language is free to all. Many different reasons herald
this belief, but the most vital of them are credited to the commitment of
Guido van Rossum in the early periods of the language's development. Van
Rossum provided extensive documentation on the language, as well as its
libraries, and engaged the user base in offering ongoing assistance for the
creation and maintenance of documentation.
The community's involvement in the documentation process takes many forms,
ranging from mere bug reports to authoring documentation to complaints that
documentation could be easier and more concise to use. This section is
aimed at encouraging potential and existing authors of documentation in
Python. It is primarily aimed at those who contribute to the standard
documentation and develop additional documents with the same tools as the
standard documents. However, this section will come in less handy for
authors who look to use the tools of Python documentation for topics
outside of Python, and it will be of even less use for authors with no
inclination to the tools at all.
Style guide for Python documentation
I. The use of whitespace:
All reST files use an indentation of 3 spaces, and the use of tabs is
forbidden. The maximum line length is 80 characters in standard text;
however, exceptions can occur for long links, tables, and deeply indented
code samples, which may require longer lines. Bodies of code examples
should use the standard 4-space Python indentation.
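As an illustrative sketch, a reST file might combine the 3-space indentation of a directive body with Python's own 4-space indentation inside the code sample:

.. code-block:: python

   def example():
       return 42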
Where applicable, use blank lines generously, as they aid in grouping
things together. One or two spaces can be used after the period that
terminates a sentence. Although reST generally ignores a second space, some
users find it customary to add one, for instance, to aid the auto-fill mode
in Emacs.
II. Use of footnotes:
The use of footnotes is generally frowned upon and discouraged; however,
they can be used in cases where no other option presents a given piece of
information better. When introducing a footnote reference at the tail of a
sentence, it should follow the punctuation that terminates the sentence. In
this case, the reST markup should look like this:
This sentence contains a footnote reference. [#]_ This sentence is next in
line.
Footnotes ought to be gathered at the end of a file; should the file be
longer than usual, the end of a section suffices. The docutils will
automatically create backlinks to the footnote reference. When appropriate,
footnotes can also appear in the middle of a sentence.
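A matching footnote definition, gathered at the end of the file or section, would look like this minimal sketch:

.. [#] Text of the footnote, written where footnotes are collected.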
III. The use of capitalization:
In the documentation of Python, it is preferable to use sentence case in
section titles; however, consistency within a unit is prioritized over
adherence to this rule. Sentence case refers to a range of capitalization
rules applied to sentences in English: the first word always begins in
uppercase, and any other word is capitalized only if some rule requires it.
When adding a section to a chapter where most sections appear in title
case, it is possible to convert all the titles to sentence case or, better
still, to use the dominant style for the new section title. It is best to
avoid beginning sentences with words that specific rules require to begin
with a lowercase letter.
Keep in mind that the sections which explain a library module often contain
titles in this format: "modulename — Brief description about the module."
In such cases, you should capitalize the description as if it were an
independent sentence.
There are lots of unique names used in the process of documenting Python,
including but not limited to the names of standard bodies, programming
languages, and operating systems. Most of these entities have no specially
assigned markup; however, some of the generally preferred spellings are
outlined below to help authors attain consistency in their documentation.
Various other terms deserve special mention as well. The conventions
outlined below ought to be used to ensure consistency across the
documentation:
1. CPU: Acronym for the central processing unit. Many style guides
require that it be spelled out on first use; however, it is best
to avoid the abbreviation altogether in Python documentation,
because there is no reasonable way to predict which occurrence a
reader will encounter first. It is considered even better to use
the term "processor" instead.
2. POSIX: This term is always capitalized. It is a name assigned to
a specific group of standards used in documentation.
3. Python: The name of the programming language always begins
with a letter in the uppercase.
4. reST: Short for "reStructuredText," an easy-to-read plaintext
markup syntax used to produce Python documentation. When spelled
out, it is written as one word, and in both forms, it begins with
a lowercase "r."
5. Unicode: This refers to the name given to a character coding
system. It always begins with a capitalized first letter.
6. Unix: This is the name of the operating system which AT&T
developed at Bell Labs in the early 1970s.
The use of affirmative tone:
Every piece of documentation ought to focus on stating affirmatively the
purpose of the language and how it can be effectively put to use. Aside
from security risks or risks of a segfault, the docs ought to refrain from
warnings along the lines of "experts only" or "feature x is dangerous."
Value judgments of this sort belong in external wikis and blogs, not in the
primary documentation.
An unfortunate instance (stirring up worry in the minds of
readers):
Warning: Failure to close files explicitly would result in
excessive consumption of resources or loss of data. Avoid
depending on reference counting to close a file automatically.
A good instance (encouraging confident knowledge in the
efficient utilization of the programming language):
The best way of using files is to employ a try/finally pair to close a
file explicitly once usage is completed. Alternatively, a with statement
can be used to deliver a similar effect, ensuring that files are flushed
and file descriptor resources are released in a timely fashion.
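A minimal sketch of both approaches, assuming a hypothetical file named data.txt:

# explicit close with try/finally
f = open("data.txt")
try:
    contents = f.read()
finally:
    f.close()

# the same effect with a with statement
with open("data.txt") as f:
    contents = f.read()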
The concept of economy of expression:
More documentation is not necessarily better documentation. Err on the side
of being concise. It is unfortunate, but increasing the length of a piece
of documentation can inhibit understanding and increase the chances of the
text being misinterpreted or misread. Lengthy descriptions full of caveats
and corner cases can give the impression that a function is more difficult
or complicated to use than it really is.
Security concerns and relative considerations:
Some of the modules contained in Python have an inherent exposure to
security risks, which can be traced to the function of the module in
question; shell injection, for example, probes specific vulnerabilities.
Putting warning boxes in the documentation of these modules for every
problem that arises doesn't exactly make for a great reading experience.
Instead, security concerns of this sort ought to be gathered into a
dedicated section tagged "Security Considerations" within the documentation
of the module, and cross-referenced from the documentation of all affected
interfaces with a note like:
"Please refer to the :ref:`security considerations` section for vital
information on how common mistakes can be avoided."
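In reST, such a cross-reference might be written with a note directive; a minimal sketch, where the label name security-considerations is illustrative:

.. note::

   Please see :ref:`security-considerations` for vital information on
   how to avoid common mistakes.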
Similarly, if a common error affects several interfaces of a module (for
instance, an operating system's pipe buffers filling up and hanging child
processes), it should be documented in a section named "Common errors" and
cross-referenced, instead of being repeated for every interface.
The use of code samples:
Short code examples can serve as essential adjuncts to the understanding
process. A reader will often grasp a simple example much faster than a
formal description in prose. People tend to learn faster with substantial,
motivating examples that match the context of a typical use case. Take, for
example, the method str.rpartition(): it is demonstrated better with an
instance of splitting a domain from a URL than with an instance of taking
the final word out of a Monty Python quote.
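As a minimal sketch (the domain is illustrative), splitting at the last dot with str.rpartition() looks like this:

>>> "docs.python.org".rpartition(".")
('docs.python', '.', 'org')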
The ellipsis used for the sys.ps2 secondary interpreter prompt ought to be
used as sparingly as possible: only when it is important to make a clear
distinction between input and output lines. Aside from adding to visual
clutter, it makes it almost impossible for readers to cut and paste samples
with which to run their own experiments.
The code equivalents:
An adjunct that can come in handy in a prose description is giving code
equivalents or approximate equivalents in pure Python. A documenter should
carefully weigh whether or not a code equivalent adds value to a document.
An excellent instance to consider is the code equivalent for all(). The
short four-line equivalent is quickly taken in. It places emphasis on the
early-out behavior, and it makes clear how the corner case of an empty
iterable is handled. Furthermore, it serves as a model for people who want
to implement a popularly requested alternative where all() returns the
specific object that evaluated to false whenever the function terminates
early; a sketch is shown below.
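For reference, the documented pure-Python equivalent of all() is the following four-line function:

def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True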
A more questionable example is the code equivalent for itertools.groupby().
Its code equivalent is arguably too complicated for a quick, basic aid to
understanding. Aside from the complexity, the code equivalent was kept
because it can serve as a model for alternative implementations, and
because the way the grouper operates is shown far more quickly in the code
than in English prose.
Consider an instance when a code equivalent should not be used: the
function oct(). The actual steps necessary for converting a number into
octal add no value for a user who is trying to learn the purpose of the
function.
The audience:
Any tutorial, documentation included, has to maintain a tone respectful of
the reader's intelligence.
Chapter 9: How to use Sphinx and restructured
text
In this section, we will make use of the Sample Project, a simple project
in Python, to show how Sphinx and reStructuredText can be used to generate
documents in HTML format. The Sample Project is a simple binary search tree
and binary tree traversal implementation, documented according to the NumPy
style of docstrings. The primary function of the Sample Project in this
section goes beyond merely being sample code: it demonstrates how a NumPy
style docstring can be translated into a proper document through Sphinx.
The Sample Project used here was obtained from GitHub via the command shown
below:
$ git clone https://github.com/shunsvineyard/python-sample-code.git
Requirements: This demonstration is performed using the software outlined
below:
1. Python 3.7
2. Sphinx 2.2
Keep in mind that Sphinx is compatible with both the Windows and Linux
operating systems.
How Sphinx is used
The markup language Sphinx makes use of is reStructuredText. Sphinx
generates documents through a process similar to the following:
Project source code (Python or another supported language) ->
reStructuredText files -> documents (HTML or another supported format)
Sphinx offers two command-line tools, namely sphinx-apidoc and
sphinx-quickstart.
The sphinx-apidoc is used to generate reStructuredText documentation files
for all the modules it finds. The sphinx-quickstart, on the other hand, is
used to set up a source directory; it creates a standard configuration,
conf.py, as well as a master document, index.rst, which serves as the
welcome page of the generated document.
Put simply, both tools generate the Sphinx sources, that is, the
reStructuredText files, and these files are then modified before Sphinx
uses them to build the actual documents.
Workflow
Just as software requires regular maintenance from its developers,
authoring software documentation is not a one-time task: the documentation
changes along with the software. When making use of Sphinx, the workflow
typically assumes the fashion shown below:
1. Run sphinx-quickstart on the project's docs directory. It generates a
standard conf.py and index.rst.
2. Run sphinx-apidoc on the project. It generates one rst file per module
(module1.rst, module2.rst, and so on) alongside conf.py and index.rst. This
step is repeated whenever a new class, module, or API is added.
3. Run make html (or another format) to generate the HTML-based documents
from conf.py, index.rst, and the module rst files.
The outline above shows the general workflow that occurs when Sphinx is
used; the details of what goes on at every phase are demonstrated in the
subsections below.
Preparations
We have to set up an environment before we delve into how to use Sphinx for
documentation. Below is an outline of how to set up an environment on both
Windows and Linux.
Windows:
c:\Workspace>python -m venv sphinxenv
c:\Workspace>sphinxenv\Scripts\activate
(sphinxenv) c:\Workspace>git clone
https://github.com/shunsvineyard/python-sample-code.git
(sphinxenv) c:\Workspace>cd python-sample-code
(sphinxenv) c:\Workspace\python-sample-code>pip install -r
requirements.txt
Linux:
user@ubuntu:~$ python3 -m venv sphinxvenv
user@ubuntu:~$ source sphinxvenv/bin/activate
(sphinxvenv) user@ubuntu:~$ git clone
https://github.com/shunsvineyard/python-sample-code.git
(sphinxvenv) user@ubuntu:~$ cd python-sample-code/
(sphinxvenv) user@ubuntu:~/python-sample-code$ pip install -r
requirements.txt
Once this is completed, we have obtained the Sample Project as well as an
environment in which to carry out the Sphinx demonstration. Since the
Sample Project already has a document folder, we have to get rid of it. The
Sample Project should have a layout similar to that shown below once the
document folder has been deleted:
python-sample-code
├── LICENSE
├── README.rst
├── binary_trees
│   ├── __init__.py
│   ├── binary_search_tree.py
│   ├── binary_tree.py
│   └── traversal.py
├── pytest.ini
├── requirements.txt
├── setup.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_binary_search_tree.py
    └── test_traversal.py
First step: Use the sphinx-quickstart to generate a Sphinx source directory
containing index.rst and conf.py.
Say we want to store every file associated with the documentation in a docs
directory. To begin, we create the documentation directory for Sphinx,
docs. Next, we move into that directory and execute the sphinx-quickstart.
On Windows OS:
(sphinxenv) c:\Workspace\python-sample-code>mkdir docs
(sphinxenv) c:\Workspace\python-sample-code>cd docs
(sphinxenv) c:\Workspace\python-sample-code\docs>sphinx-quickstart
On Linux OS:
(sphinxvenv) user@ubuntu:~/python-sample-code$ mkdir docs
(sphinxvenv) user@ubuntu:~/python-sample-code$ cd docs/
(sphinxvenv) user@ubuntu:~/python-sample-code/docs$ sphinx-quickstart
When sphinx-quickstart is executed, it asks a series of questions about the
project. Below are some model answers:
Welcome to the Sphinx 2.x.x quickstart utility.
Please enter values for the following settings (just press Enter to accept
a default value, if one is given in brackets).
Selected root path: .
You have two options for placing the build directory for Sphinx output.
Either you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/n) [n]: y
The project name will occur in several places in the built documentation.
> Project name: Sample Project
> Author name(s): Author
> Project release []: 0.1.0
If the documents are to be written in a language other than English, you
can select a language here by its language code. Sphinx will then translate
text that it generates into that language.
For a list of supported codes, see
https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-
language.
> Project language [en]:
Creating file ./source/conf.py.
Creating file ./source/index.rst.
Creating file ./Makefile.
Creating file ./make.bat.
Finished: An initial directory structure has been created.
You should now populate your master file ./source/index.rst and create
other documentation source files. Use the Makefile to build the docs, like
so:
make builder
where "builder" is one of the supported builders, e.g., html, latex, or
linkcheck.
As shown at the end of its output, sphinx-quickstart tells us how the
documents can be built once the source files are filled in.
If we run make html at this point, Sphinx generates the default documents,
which contain nothing related to the Sample Project. You can preview the
output of this step using the link provided below:
https://github.com/shunsvineyard/shunsvineyard/blob/master/use-sphinx-
for-python-documentation/step1_output/index.html
Keep in mind that Sphinx isn't a tool that provides totally automated
document generation like Doxygen. The sphinx-quickstart generates some
standard files such as conf.py and index.rst using the general information
entered by the user; work still has to be done for the documents to look
like real documentation.
Once you have completed the execution of the sphinx-quickstart, the outline
of the project should look similar to this:
python-sample-code
├── LICENSE
├── README.rst
├── binary_trees
│   ├── __init__.py
│   ├── binary_search_tree.py
│   ├── binary_tree.py
│   └── traversal.py
├── docs
│   ├── Makefile
│   ├── build
│   ├── make.bat
│   └── source
│       ├── _static
│       ├── _templates
│       ├── conf.py
│       └── index.rst
├── pytest.ini
├── requirements.txt
├── setup.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_binary_search_tree.py
    └── test_traversal.py
Keep in mind that make.bat is specific to Windows, while on Linux the
equivalent is the Makefile.
Second step: The configuration of conf.py
The sphinx-quickstart generates a few files, one of which is quite
essential: conf.py. It represents the configuration of a document, and
although it acts as a configuration file, it is in fact a Python file
itself; conf.py uses Python syntax.
Document generation with Sphinx is highly configurable. In this section, we
consider a basic configuration: the theme of the document, the path to the
project's source code, and an added extension.
Define a path to the project:
For Sphinx to be able to locate the project, we have to uncomment these
three lines:
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
Then update the path to point at the project:
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))
Pick a theme:
Sphinx comes with a variety of built-in themes; the default is alabaster.
html_theme = 'alabaster'
However, for this demonstration, we use bizstyle instead:
html_theme = 'bizstyle'
You can find a variety of themes, as well as their configurations using the
link provided below:
https://www.sphinx-doc.org/en/master/usage/theming.html
Include an extension for the NumPy style:
As mentioned earlier, the Sample Project uses the NumPy style for its
docstrings, so we have to include an extension (in this case, napoleon) to
parse NumPy style docstrings:
extensions = [
    'sphinx.ext.napoleon'
]
The napoleon extension supports docstrings in both Google and NumPy styles
and offers many different features, all of which are configurable. Since
NumPy style docstrings are what the Sample Project uses in this demo, we
disable the Google style:
napoleon_google_docstring = False
You can find other settings for napoleon through this link:
https://www.sphinx-
doc.org/en/master/usage/extensions/napoleon.html#module-
sphinx.ext.napoleon
The concluded example on conf.py can be viewed through the link below:
https://github.com/shunsvineyard/python-sample-
code/blob/master/docs/source/conf.py
Aside from its many built-in extensions, Sphinx also supports custom
extensions. You can learn more about extensions via the link provided:
https://www.sphinx-doc.org/en/master/usage/extensions/index.html
Having now gathered the basic configuration needed for this demonstration,
we use sphinx-apidoc to generate reStructuredText files from the source
code of the Sample Project.
Third step: Generate reStructuredText files from source code using the
sphinx-apidoc:
The sphinx-apidoc is a tool that automatically generates reStructuredText
files from source code, for example, Python modules. To make use of the
sphinx-apidoc, execute the following:
sphinx-apidoc -f -o <path-to-output> <path-to-module>
where -f forces overwriting of any existing generated files, and -o
indicates the path where the output files are placed.
The command is the same on Windows and Linux (shown here on Linux):
(sphinxvenv) user@ubuntu:~/python-sample-code/docs$ sphinx-apidoc -f -
o source/ ../binary_trees/
For this Sample Project, the sphinx-apidoc generates two files, namely
modules.rst and binary_trees.rst. The outline of the project now looks like
this:
python-sample-code
├── LICENSE
├── README.rst
├── binary_trees
│   ├── __init__.py
│   ├── binary_search_tree.py
│   ├── binary_tree.py
│   └── traversal.py
├── docs
│   ├── Makefile
│   ├── build
│   ├── make.bat
│   └── source
│       ├── _static
│       ├── _templates
│       ├── binary_trees.rst
│       ├── conf.py
│       ├── index.rst
│       └── modules.rst
├── pytest.ini
├── requirements.txt
├── setup.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_binary_search_tree.py
    └── test_traversal.py
Fourth step: Edit the index.rst, as well as the generated reStructuredText
files.
The index.rst is the second important file generated by the
sphinx-quickstart. It is the master document that serves as the welcome
page and contains the root of the toctree, the table of contents tree.
Initially, the toctree is empty after sphinx-quickstart creates the
index.rst:
.. toctree::
   :maxdepth: 2
Include the modules in the index.rst:
All the modules can be found in the generated modules.rst. In this case,
however, only binary_trees exists. We have to include the modules.rst in
the index.rst.
To include the document so that it is listed on the welcome page (that is,
the index.rst), we enter the following:
.. toctree::
   :maxdepth: 2

   modules
Take note that when another reStructuredText file has to be added, we use
the filename but omit the extension. If the file lives in a subdirectory, a
forward slash "/" is used as the directory separator, as in the sketch
below.
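For instance, a hypothetical file stored at docs/source/api/modules.rst would be listed in the toctree as:

.. toctree::
   :maxdepth: 2

   api/modules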
Include the README.rst in the index.rst:
Seeing as the Sample Project already contains a readme file, README.rst,
placed at the top of the project, it can be included in the welcome page of
the documentation.
Under the docs/source path, create a readme.rst file and include the line:
.. include:: ../../README.rst
(See: https://github.com/shunsvineyard/python-sample-
code/blob/master/docs/source/readme.rst)
Then include the readme in the index.rst so it is added to the welcome
page. Once both steps are completed, the index.rst looks something like
this:
At the top of the file sits a comment created by sphinx-quickstart ("Sample
Project documentation master file, created by sphinx-quickstart on Sun Sep
19 15:37:32 2019"). This file can be adapted to your liking, but it has to
contain at least the root "toctree" directive. The output should be similar
to this:

Welcome to Sample Project's documentation!
==========================================

.. toctree::
   :maxdepth: 2

   readme
   modules

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Keep in mind that steps 3 and 4 have to be performed again whenever a new
class, API, module, or other code change is made, to ensure the documents
stay up to date.
Fifth step: Build the documents.
The final step in generating the documents is issuing make html, assuming
the documents are needed in HTML format. The command is the same on Windows
and Linux (shown here on Linux):
(sphinxvenv) user@ubuntu:~/python-sample-code/docs$ make html
Once the make html command has been executed, it creates a build folder
under docs; the HTML-based documents are stored in build/html.
Chapter 10: Styles of documenting class - NumPy
In this section, we consider the best syntax and techniques for docstrings
used with the numpydoc extension in Sphinx. Keep in mind that a few
features used in this section are only compatible with more recent versions
of numpydoc; for instance, the Yields section was added in numpydoc version
0.6.
To begin, we introduce some code checkers:
1. Pylint: a static code analysis tool for Python.
2. pyflakes: a tool for checking Python code for errors by parsing
the source file rather than importing it.
3. pycodestyle: formerly known as pep8, a tool for checking Python
code against the style conventions in the PEP 8 guide.
4. Flake8: a tool that combines mccabe, pyflakes, and pycodestyle
to check the quality and style of Python code.
5. vim-flake8: a Vim plugin for flake8.
Conventions for importing:
For the NumPy documentation and source, the following import conventions
are used throughout:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
Refrain from abbreviating "scipy": there is no motivating use case for
abbreviating it, so it is avoided altogether in documentation to prevent
confusion.
Documentation string standard:
A docstring, or documentation string, is a string which describes a module,
function, class, or method definition. The docstring is a special attribute
of the object (object.__doc__) and, for the sake of consistency, is
surrounded by triple double-quotes. That is:
"""This is the form of a docstring.

It can be spread over several lines.

"""
NumPy, SciPy, and the scikits follow a common docstring convention, which
provides consistency while also allowing the toolchain to produce
well-formatted reference guides. This document describes the current
community consensus for such a standard. The docstring standard used here
is written in reST (reStructuredText) syntax and is rendered with Sphinx, a
pre-processor that understands the particular documentation style we intend
to use. Although a wide variety of markup is available, we limit ourselves
to a fairly basic subset, in order to produce docstrings that can be read
easily on text-only terminals.
As a guiding principle, human readers of the text are given precedence over
contorting docstrings so that tools produce nice output. Instead of
sacrificing the readability of the docstrings, pre-processors have been
written to assist Sphinx in its task. Docstring lines should be kept to a
maximum length of 75 characters, which bolsters their readability in text
terminals.
Sections:
A docstring consists of a number of sections separated by headings, except
for the deprecation warning. Each heading should be underlined with
hyphens, and the section order should be consistent with the description
below:
Summary:
A one-line summary that uses neither the function name nor any variable
names. For example:
def add(a, b):
    """The sum of two numbers.

    """
Signature:
A function signature is normally found by introspection and displayed with
the help function. For some functions, particularly those written in C, the
signature is unavailable; in that case, it has to be specified as the first
line of the docstring:
"""
add(a, b)
The sum of two numbers.
"""
Deprecation warning:
This section is used, where applicable, to warn users that the object is
deprecated. Its contents should include:
1. The NumPy version in which the object became deprecated, and when it was
or will be removed.
2. The reason for the deprecation, if this information is useful (for
example, the object is superseded by, or duplicates, functionality found
elsewhere).
3. Newly recommended ways of obtaining the same functionality, if any.
This section uses the deprecated Sphinx directive rather than an underlined
section header:
.. deprecated:: 1.6.0
   `ndobj_old` will be removed in NumPy 2.0.0; it is replaced by
   `ndobj_new` because the latter also works with array subclasses.
Extended summary:
A few sentences providing an extended description. This section should
clarify functionality; implementation details and background theory should
not be discussed here, as they belong in the Notes section further below.
Parameters:
A description of the function arguments and keywords, along with their
respective types:
Parameters
----------
x : type
    Description of parameter `x`.
y
    Description of parameter `y` (with type not specified).
Enclose variables in single backticks. A space should precede the colon,
and the colon can be omitted if the type is absent.
Be as precise as possible when describing parameter types. A few examples
of parameters and their respective types are shown below:
Parameters
----------
filename : str
copy : bool
dtype : data-type
iterable : iterable object
shape : int or tuple of int
files : list of str
If it is not necessary to specify a keyword argument, use the keyword
optional:
x : int, optional
Optional keyword parameters have default values, which are displayed as
part of the function's signature. They can also be detailed in the
description:
Description of parameter `x` (the default is -1, which implies summation
over all axes).
When a parameter can assume only one of a fixed set of values, those values
can be listed in braces, with the default appearing first:
order : {'C', 'F', 'A'}
    Description of `order`.
When two or more input parameters have exactly the same shape, type, and
description, they can be combined:
x1, x2 : array_like
    Input arrays, description of `x1`, `x2`.
Returns:
An explanation of the returned values and their types. The Returns section
is quite similar to the Parameters section, the difference being that each
returned value has an optional name, while the type of every return value
is always required:
Returns
-------
int
    Description of anonymous integer return value.
If both the name and the type are specified, the Returns section takes the
same form as the Parameters section:
Returns
-------
err_code : int
    Non-zero value indicates error code, or zero on success.
err_msg : str or None
    Human readable error message, or None on success.
Yields:
An explanation of the yielded values and their types. Yields is relevant
only to generators. It is similar to the Returns section in that the type
of every value is required, while each value's name is optional:
Yields
------
int
    Description of the anonymous integer return value.
If both the name and the type are defined, the Yields section takes the
same form as the Returns section:
Yields
------
err_code : int
    Non-zero value indicates error code, or zero on success.
err_msg : str or None
    Human readable error message, or None on success.
Support for the Yields section was added in numpydoc version 0.6.
Receives:
An explanation of the parameters passed to a generator's send() method,
formatted as for Parameters above. Since, as with Returns and Yields, a
single object is always passed to the method, this may describe either the
single parameter or positional arguments passed as a tuple. If a docstring
includes Receives, it must also include Yields; a short sketch follows.
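As a minimal sketch (the names and descriptions are illustrative), a generator accepting values through send() might document both sections like this:

Receives
--------
new_value : int
    Value fed into the running computation via send().

Yields
------
total : int
    Running total after incorporating the received value.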
Other Parameters:
This is an optional section used to describe infrequently used parameters.
It should only be used when a function has a large number of keyword
parameters, to prevent cluttering the Parameters section; a short sketch
follows.
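An illustrative sketch (the keyword name is hypothetical):

Other Parameters
----------------
only_seldom_used_keyword : int, optional
    Description of this infrequently used keyword.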
Raises:
This is an optional section detailing which errors are raised and under
what conditions:
Raises
------
LinAlgException
    If the matrix is not numerically invertible.
This section should be used judiciously, that is, only for errors that are
non-obvious or have a large chance of being raised.
Warns:
This is an optional section detailing which warnings are raised and under
what conditions, formatted similarly to Raises.
Warnings:
This is an optional section containing cautions to the user in free
text/reST.
See Also:
This is an optional section used to refer users to related code. It can be
very useful, but should be put to good use. The aim is to direct users to
other functions they may not be aware of or have an easy means of
discovering (for example, by checking the module docstring). Routines whose
docstrings further explain parameters used by this function are good
candidates.
For instance, for numpy.mean, we would have:
See Also
--------
average : Weighted average
When referring to functions in the same sub-module, no prefix is needed,
and the tree is searched upwards for a match.
Prefix functions from other sub-modules appropriately. For example, while
documenting the random module, refer to a function in fft using:
fft.fft2 : 2-D fast discrete Fourier transform
When referring to an entirely different module:
scipy.random.norm : Random variates, PDFs, etc.
Functions may be listed without descriptions; this is preferable if the
functionality is clear from the function name:
See Also
--------
func_a : Function a with its description.
func_b, func_c, func_d
func_e
Notes:
This is an optional section that provides additional information about the
code, and it may include a discussion of the algorithm. The section may
include mathematical equations, written in LaTeX format:
FFT refers to a fast implementation of the discrete Fourier transform:
.. math:: X(e^{j\omega } ) = x(n)e^{ - j\omega n}
It is also possible to typeset equations beneath a math directive:
The discrete-time Fourier time-convolution property states that
.. math::
x(n) * y(n) \Leftrightarrow X(e^{j\omega } )Y(e^{j\omega } )\\
another equation here
In addition, it is possible to use math inline, that is:
The value of :math:`\omega` is more significant than 5.
Variable names are displayed in typewriter font, obtained with
\mathtt{var}:
We square the input parameter `alpha` to obtain
:math:`\mathtt{alpha}^2`.
Keep in mind that LaTeX isn't especially easy to read, so use equations
sparingly. Images are allowed but shouldn't be central to the explanation;
users who view the docstring as text should be able to understand its
meaning without needing an image viewer. Additional illustrations are
included using:
.. image:: filename
where filename is a path relative to the source directory of the reference
guide.
References:
References cited in the Notes section may be listed here. For example, if
the article shown below is cited using the text [1]_, add it as in the list
shown below:
.. [1] O. McNoleg, "The integration of GIS, remote sensing,
expert systems, and adaptive co-kriging for environmental habitat modeling
of the Highland Haggis using object-oriented, fuzzy-logic and neural-
network techniques," Computers & Geosciences, vol. 22,
pp. 585-588, 1996.
The output renders as:
O. McNoleg, "The integration of GIS, remote sensing, expert systems, and
adaptive co-kriging for environmental habitat modeling of the Highland
Haggis using object-oriented, fuzzy-logic and neural-network techniques,"
Computers & Geosciences, vol. 22, pp. 585-588, 1996.
Referencing sources of a temporary nature, such as web pages, is strongly
discouraged. References are meant to augment the docstring, but should not
be required to understand it. References are numbered, starting from one,
in the order in which they are cited.
Keep in mind that references tend to break tables: when a reference such as
[1] is found in a table inside a NumPy docstring, the table markup is
broken by numpydoc processing.
Examples:
This is an optional section for examples, given in doctest format. It is
aimed at illustrating usage rather than providing a testing framework; for
tests, users should use the tests/ directory. Although optional, this
section is very strongly encouraged.
When several examples are given, they should be separated by blank lines.
Comments explaining the examples should have blank lines both above and
below them. See the sample below:
>>> np.add(1, 2)
3
Comment explaining the second example
>>> np.add([1, 2], [3, 4])
array([4, 6])
The example code may be split across several lines, provided each line
after the first begins with an ellipsis (...):
>>> np.add([[1, 2], [3, 4]],
... [[5, 6], [7, 8]])
array([[ 6, 8],
[10, 12]])
For tests with results that are random or platform-dependent, the output
should be marked as follows:
>>> import numpy.random
>>> np.random.rand(2)
array([ 0.35773152, 0.38568979]) #random
Examples can be executed as doctests through:
>>> np.test(doctests=True)
>>> np.linalg.test(doctests=True) # for a single module
It is also possible to execute single examples in IPython by simply copying
and pasting them in doctest mode:
In [1]: %doctest_mode
Exception reporting mode: Plain
Doctest mode is: ON
>>> %paste
import numpy.random
np.random.rand(2)
## -- End pasted text --
array([ 0.8519522 , 0.15492887])
It isn't necessary to use the <BLANKLINE> doctest markup to indicate empty
lines in the output. Keep in mind that the option to execute the examples
through np.test is provided to check that the examples work, not to make
the examples part of the testing framework. The examples may assume that
import numpy as np is executed before the example code. Other examples may
use matplotlib for plotting, but it should be imported explicitly, for
example, import matplotlib.pyplot as plt; every other import, including the
demonstrated function itself, has to be explicit.
When matplotlib is imported in the example, the example code is wrapped by
the Sphinx plot directive for matplotlib
(http://matplotlib.org/sampledoc/extensions.html). When matplotlib is not
explicitly imported, .. plot:: can be used directly, provided
matplotlib.sphinxext.plot_directive is loaded as an extension in Sphinx's
conf.py.
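A minimal sketch of the plot directive, assuming matplotlib.sphinxext.plot_directive is listed among the extensions in conf.py:

.. plot::

   import matplotlib.pyplot as plt
   plt.plot([1, 2, 3], [1, 4, 9])
   plt.show()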
Chapter 11: Testing logging
Python comes with a logging package, and although it is popular among
users, people tend to assume that logging calls need not be tested, or they
consider the prospect rather tiring. As an aid, TestFixtures lets a user
easily capture the output of calls to the logging framework in Python and
ensure that the output fits what is expected of it.
Keep in mind that LogCapture is used to check that your code logs the right
messages.
Techniques of capture
There are three distinct methods used in capturing messages that are logged
into the logging framework in Python, depending on the form of the test
being written. Below is a description of these techniques:
The context manager:
If the version of Python you use supports the with keyword, you can use the
context manager that TestFixtures provides. Enter the following code:
>>> import logging
>>> from testfixtures import LogCapture
>>> with LogCapture() as l:
...     logger = logging.getLogger()
...     logger.info('a message')
...     logger.error('an error')
For the duration of the with block, the log messages are captured. The
context manager offers a check method, which raises an exception if the
logging is not as you expected:
>>> l.check(
... ('root', 'INFO', 'a message'),
... ('root', 'ERROR', 'another error'),
... )
Traceback (most recent call last):
...
AssertionError: sequence not as expected:
<BLANKLINE>
same:
(('root', 'INFO', 'a message'),)
<BLANKLINE>
expected:
(('root', 'ERROR', 'another error'),)
<BLANKLINE>
actual:
(('root', 'ERROR', 'an error'),)
It also has a string representation that lets you see what has been logged,
which is useful for doctests:
>>> print(l)
root INFO
a message
root ERROR
an error
The decorator:
If you are using a traditional unittest environment and wish to capture the
logging of a specific function, you might find the decorator suits your
needs:
from testfixtures import log_capture

@log_capture()
def test_function(capture):
    logger = logging.getLogger()
    logger.info('a message')
    logger.error('an error')
    capture.check(
        ('root', 'INFO', 'a message'),
        ('root', 'ERROR', 'an error'),
    )
Keep in mind that this method isn't compatible with pytest's fixture
discovery feature. Instead, you can put a fixture such as the following
into your conftest.py:
import pytest
from testfixtures import LogCapture

@pytest.fixture(autouse=True)
def capture():
    with LogCapture() as capture:
        yield capture
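With that fixture in place, a test can receive it by name; a minimal sketch:

def test_function(capture):
    logger = logging.getLogger()
    logger.info('a message')
    capture.check(('root', 'INFO', 'a message'))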
Manual usage:
If you wish to capture logging for the duration of a doctest, or for every
test in a TestCase, you can use LogCapture manually. The instantiation and
replacement are carried out in the TestCase's setUp function, or passed to
the DocTestSuite constructor:
>>> from testfixtures import LogCapture
>>> l = LogCapture()
You can then execute whatever would log the messages you want to test for:
>>> from logging import getLogger
>>> getLogger().info('a message')
It is possible to check whatever has been logged in at any point with the use
of the check method:
>>> l.check(('root', 'INFO', 'a message'))
Conversely, you can also make use of the string representation belonging to
the LogCapture:
>>> print(l)
root INFO
a message
Next, in the tearDown function found in the TestCase or introduced into the
DocTestSuite constructor, you should ensure to end the capturing process:
>>> l.uninstall()
In the case of multiple objects in the LogCapture, it is possible to uninstall
all of them quite easily:
>>> LogCapture.uninstall_all()
Checking for captured log messages:
However you use LogCapture to capture messages, there are three ways to
check that the captured messages are as you expected. The example shown
below helps demonstrate them:
from testfixtures import LogCapture
from logging import getLogger

logger = getLogger()
with LogCapture() as log:
    logger.info('start of block number %i', 1)
    try:
        logger.debug('inside try block')
        raise RuntimeError('No code to run!')
    except:
        logger.error('error occurred', exc_info=True)
The check technique:
LogCapture instances have check() and check_present() methods for making
assertions about the entries that have been logged. check() compares the
captured log messages with the expected ones. Expected messages are
expressed, by default, as three-element tuples, where the first element is
the name of the logger to which the message should have been logged, the
second is the string representation of the level at which it should have
been logged, and the third is the message that should have been logged,
after any parameter interpolation has taken place.
If things are as expected, the method will not raise any exceptions:
>>> log.check(
... ('root', 'INFO', 'start of block number 1'),
... ('root', 'DEBUG', 'inside try block'),
... ('root', 'ERROR', 'error occurred'),
... )
However, if the messages actually logged were different, you will get an
AssertionError explaining what happened:
>>> log.check(('root', 'INFO', 'start of block number 1'))
Traceback (most recent call last):
...
AssertionError: sequence not as expected:
<BLANKLINE>
same:
(('root', 'INFO', 'start of block number 1'),)
<BLANKLINE>
expected:
()
<BLANKLINE>
actual:
(('root', 'DEBUG', 'inside try block'), ('root', 'ERROR', 'error occurred'))
In contrast, check_present() only checks that the messages specified are
present and that their order matches the specified one; any other messages
are ignored. Consider the code below:
>>> log.check_present(
...     ('root', 'INFO', 'start of block number 1'),
...     ('root', 'ERROR', 'error occurred'),
...     )
If the order of messages is non-deterministic, you can be explicit that the
order doesn't matter:
>>> log.check_present(
... ('root', 'ERROR', 'error occurred'),
... ('root', 'INFO', 'start of block number 1'),
... order_matters=False
... )
Printing:
The LogCapture has a string representation that shows which messages have
been captured. This can prove useful in doctests. Consider the code below:
>>> print(log)
root INFO
start of block number 1
root DEBUG
inside try block
root ERROR
error occurred
The string representation can also be used to check whether any logging has
been captured at all:
>>> empty = LogCapture()
>>> print(empty)
No logging captured
Inspection:
The LogCapture also keeps a list of the LogRecord instances it has
captured. This comes in handy when you want to check specifics of the
captured logging that aren't available through either the check() method or
the string representation.
A relatively common case is wanting to check that exception information was
logged for a given message:
from testfixtures import compare, Comparison as C

compare(C(RuntimeError('No code to run!')),
        log.records[-1].exc_info[1])
If you want the extraction specified in the attributes parameter of the
LogCapture constructor to be taken into account, examine the list of
entries returned by the actual() method:
assert log.actual()[-1][-1] == 'error occurred'
Capturing specific logging alone:
The actions you are testing may produce a lot of logging, of which you care
about only some. The logging you care about is usually at or above a given
log level; if this is the case, LogCapture can be tweaked to capture only
logging at or above a specified level. Consider the code below:
>>> with LogCapture(level=logging.INFO) as l:
...     logger = getLogger()
...     logger.debug('junk')
...     logger.info('something we care about')
...     logger.error('an error')
>>> print(l)
root INFO
something we care about
root ERROR
an error
Alternatively, the problem can be allayed by capturing only a specific
logger:
>>> with LogCapture('specific') as l:
...     getLogger('something').info('junk')
...     getLogger('specific').info('what we care about')
...     getLogger().info('more junk')
>>> print(l)
specific INFO
what we care about
It may also be that, while you don't want to capture all logging, you do
want to capture logging from several specific loggers. Consider the code
below:
>>> with LogCapture(('one', 'two')) as l:
...     getLogger('three').info('3')
...     getLogger('two').info('2')
...     getLogger('one').info('1')
>>> print(l)
two INFO
2
one INFO
1
Sometimes, the easiest option is to capture logging only for particular
parts of the testing process; this is especially common in long doctests.
To make the task easier, LogCapture supports manual installation and
uninstallation, as shown in the demonstration below:
>>> l = LogCapture(install=False)
>>> getLogger().info('junk')
>>> l.install()
>>> getLogger().info('something we care about')
>>> l.uninstall()
>>> getLogger().info('more junk')
>>> l.install()
>>> getLogger().info('something else we care about')
>>> print(l)
root INFO
something we care about
root INFO
something else we care about
Once you have filtered down to the entries you wish to make assertions
about, you may also want to consider the set of attributes that LogCapture
extracts by default. Consider the code below:
>>> with LogCapture(attributes=('levelname', 'getMessage')) as log:
...     logger = getLogger()
...     logger.debug('a debug message')
...     logger.info('something %s', 'info')
...     logger.error('an error')
>>> log.check(('DEBUG', 'a debug message'), ('INFO', 'something info'),
...           ('ERROR', 'an error'))
As is evident in the example, if a given attribute is callable, it is
called and its result is used to form part of the entry. Should you need
even more control, you can pass a callable as the attributes parameter; it
can extract whatever is needed from the records and return it in the proper
format:
def extract(record):
    return {'level': record.levelname, 'message': record.getMessage()}
>>> with LogCapture(attributes=extract) as log:
...     logger = getLogger()
...     logger.debug('a debug message')
...     logger.error('an error')
>>> log.check(
...     {'level': 'DEBUG', 'message': 'a debug message'},
...     {'level': 'ERROR', 'message': 'an error'},
...     )
Check the configuration of your log handlers:
LogCapture can be used to check that your code logs the right messages; in
the same way, it is vital to check that your application configures its log
handlers appropriately. This can be carried out with a unit test, as shown
in the code below:
from testfixtures import Comparison as C, compare
from unittest import TestCase
import logging
import sys

class LoggingConfigurationTests(TestCase):

    # We save the handlers list of the logger we are configuring,
    # so that no handlers are configured at the beginning of the
    # test and the handlers our configuration installs are removed
    # at the end of the test.

    def setUp(self):
        self.logger = logging.getLogger()
        self.orig_handlers = self.logger.handlers
        self.logger.handlers = []
        self.level = self.logger.level

    def tearDown(self):
        self.logger.handlers = self.orig_handlers
        self.logger.level = self.level

    def test_basic_configuration(self):
        # Our logging configuration code, in this case just a
        # call to basicConfig:
        logging.basicConfig(format='%(levelname)s %(message)s',
                            level=logging.INFO)

        # Now we check the configuration is as expected:
        compare(self.logger.level, logging.INFO)
        compare([
            C('logging.StreamHandler',
              stream=sys.stderr,
              formatter=C('logging.Formatter',
                          _fmt='%(levelname)s %(message)s',
                          strict=False),
              level=logging.NOTSET,
              strict=False)
        ], self.logger.handlers)
Chapter 12: Debugging
To debug your Python programs, you don't have to use an IDE. In this
section, we cover debugging a basic Python script using the pdb module from
Python's standard library, which is available in virtually every Python
installation.
Let's consider the following lines of code below:
def funcA(first_val, second_val):
    result = (first_val*2) - (second_val/4)
    return result

def functionB(first_val=23, last_val=72):
    response = funcA(first_val, last_vale)
    result = response * first_val / 7
    return result

functionB(33, 88)  # we are evaluating the function
Should we attempt to execute the lines of code shown above, the result
would be an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in functionB
NameError: global name 'last_vale' is not defined
To deduce what is wrong with this code, we need to know where to place our
breakpoint. A breakpoint tells the Python interpreter to pause execution at
a given part of the application. From the error message shown above, we can
deduce that the issue is traceable to functionB; however, depending on your
level of expertise, you might be unsure of what causes the error.
To debug this problem, you would include a breakpoint at the beginning of
the implementation of functionB:
def functionB(first_val=23, last_val=72):
    # we place our breakpoint here
    import pdb
    pdb.set_trace()
    response = funcA(first_val, last_vale)
    result = response * first_val / 7
    return result
In this case, the breakpoint consists of the two lines import pdb and
pdb.set_trace(). With both lines in place, executing the application again
results in output similar to that shown below:
Amos@Amos-pc MINGW32
~/Documents/projects/tuteria/microservices/tuteria (breakout/tutor-service)
$ cd ..
Amos@Amos-pc MINGW32 ~/Documents/projects/tuteria/microservices
$ python sample.py
> c:\users\amos\documents\projects\tuteria\microservices\sample.py(9)
functionB()
-> response = funcA(first_val, last_vale)
(Pdb)
The interface appears similar to Python's interactive shell. The values
passed into the function can be inspected by typing their names:
(Pdb) first_val
33
(Pdb) last_val
88
(Pdb)
To move onto the next line, the key n, signifying "next," is used to step
through the code one line at a time:
(Pdb) n
NameError: global name 'last_vale' is not defined
> c:\users\amos\documents\projects\tuteria\microservices\sample.py(9)
functionB()
-> response = funcA(first_val, last_vale)
(Pdb)
It can be noted from the output above that on entering the next line, the
error experienced previously occurred again. As such, you can be sure that
the problem in the code lies on the line that was just executed. To let
Python continue executing the program, we enter the key "c," which means
"continue," and the program continues execution as expected.
Assuming you fix the implementation by using the appropriate variable
name and taking out the breakpoint so the program no longer pauses, the
code should now look similar to the following (note that a new bug, a
division by zero, has been introduced for the sake of the next example):
def funcA(first_val, second_val):
    result = (first_val * 2) - (second_val / 0)
    return result

def functionB(first_val=23, last_val=72):
    # the breakpoint has been removed from here
    response = funcA(first_val, last_val)
    result = response * first_val / 7
    return result

functionB(33, 88)
This time, the error you would experience would be different:
$ python sample.py
Traceback (most recent call last):
File "sample.py", line 13, in <module>
functionB(33, 88)
File "sample.py", line 8, in functionB
response = funcA(first_val, last_val)
File "sample.py", line 2, in funcA
result = (first_val * 2) - (second_val / 0)
ZeroDivisionError: integer division or modulo by zero
Although you might not be so sure where the error is emanating from this
time, especially since you are still getting to know Python, you can
incrementally move toward the exact location of the error using the Python
debugger (pdb).
We know that the functionB function serves as the entry point of the
application; hence, the breakpoint would be placed there, and we go over
the application one line after another. By now, you are already quite
acquainted with using "n" to move onto the following line. Now "s" is
introduced into the fray: "s" signifies "step," which shifts the flow of
control into funcA. "n" should then be entered on every line of funcA
until we arrive at the line of code where the error is generated. There are
several more commands, all of which can be found at the link provided
below:
https://docs.python.org/2/library/pdb.html
However, to fulfill your daily debugging needs, c, s, and n are all that you
require. Should you prefer this method of debugging, one thing to keep in
mind is to make sure that variable names that clash with debugger
commands are not used in your application. Take, for example:
import pdb
pdb.set_trace()
n = 84
s = 45
c = 23
If we entered n to navigate to the following line, we would encounter a
conflict, since n has a particular significance in the Python debugger
(pdb) and is also used here as a variable name.
Chapter 13: Tracking and reducing memory and
CPU usage
When a sophisticated Python program takes a rather long time to run, it is
only natural to attempt to improve its execution time. How can this be
done?
To start with, you have to possess a set of tools that identify the
bottlenecks in your code, that is, the parts that take the longest to run. By
so doing, you would be able to concentrate on speeding up those parts first.
Also, it is crucial to keep track of CPU and memory usage, as they can
prove helpful in directing you to new areas of code which could be
improved upon. Thus, in this section, we would be considering seven
distinct tools in Python which provide you with insights into the execution
time of your code, as well as its CPU and memory usage.
Make use of a decorator in tracking functions:
A straightforward method of tracking a function is to define a decorator
which measures the time used up in executing the function and prints the
result:
import time
from functools import wraps

def fn_timer(function):
    @wraps(function)
    def function_timer(*args, **kwargs):
        t0 = time.time()
        result = function(*args, **kwargs)
        t1 = time.time()
        print("Total time running %s: %s seconds" %
              (function.__name__, str(t1 - t0)))
        return result
    return function_timer
Then, you add this decorator before the function you wish to measure, like
so:
@fn_timer
def myfunction(...):
...
For instance, let's find out how much time it takes to sort an array
of 2000000 random numbers:
import random

@fn_timer
def random_sort(n):
    return sorted([random.random() for i in range(n)])

if __name__ == "__main__":
    random_sort(2000000)
Whenever you run the script, you should see something similar to this:
Total time running random_sort: 1.41124916077 seconds
Make use of the timeit module:
An alternative method is to make use of the timeit module, which gives
you an averaged measure of the time. To run it, enter the following
command in your terminal:
$ python -m timeit -n 4 -r 5 -s "import timing_functions" "timing_functions.random_sort(2000000)"
Here, timing_functions represents the name of the script.
At the end of the output, the result should be something similar to this:
4 loops, best of 5: 2.08 sec per loop
This means that the statement was executed 4 loops at a time (-n 4),
repeated 5 times (-r 5), and the best result was 2.08 seconds per loop.
Should the number of loops and repetitions be left unspecified, timeit
picks a suitable number of loops automatically and falls back to its
default number of repetitions.
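The same measurement can also be taken from inside Python with the
timeit.repeat function. Below is a minimal sketch, assuming the script and
function names used above (timing_functions and random_sort):
import timeit

# Run the statement 4 times per repetition, repeated 5 times,
# mirroring the -n 4 -r 5 options used on the command line:
results = timeit.repeat("timing_functions.random_sort(2000000)",
                        setup="import timing_functions",
                        repeat=5, number=4)
print(min(results))  # the best total time across the 5 repetitions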
Make use of the time Unix command:
Both the timeit module and the decorator are Python-based. That is
exactly why the Unix time utility can come in handy: it measures from
outside Python.
To execute the time utility, enter the following:
$ time -p python timing_functions.py
The output given would be similar to the following:
Total time running random_sort: 1.3931210041 seconds
real 1.49
user 1.40
sys 0.08
The first line is produced by the decorator we defined earlier; the time
utility produces the other three lines:
1. real indicates the total time expended in running the script.
2. user represents the amount of CPU time spent running the script.
3. sys represents the amount of time expended on functions at the
kernel level.
Take note: Wikipedia defines a kernel as the computer program that
manages input and output requests from software and changes them into
data-processing instructions for the CPU (central processing unit) and the
other electronic components of a computer. Thus, a difference between the
sum of user + sys and the real time might indicate that time was spent
waiting for input or output, or that the system was busy with other
external activities.
Make use of the cProfile module:
If you wish to find out how much time is expended executing each method
and function, as well as the number of times each one is called, you can
make use of the cProfile module:
$ python -m cProfile -s cumulative timing_functions.py
You would get a detailed printout of the number of times each function is
called in the code, sorted by the cumulative time expended on each one
(thanks to the -s cumulative option). In the end, you would discover that
the total time taken to execute your script is higher than before. This is
the result of measuring the time taken to run every function.
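The same profile can also be produced from within a script or an
interactive session using cProfile.run. Here is a minimal sketch, again
assuming the timing_functions script from above:
import cProfile
import timing_functions  # the example script assumed in this chapter

# Profile one call and sort the report by cumulative time,
# the equivalent of the -s cumulative option above:
cProfile.run("timing_functions.random_sort(2000000)", sort="cumulative")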
Make use of the line_profiler module:
The line_profiler module is used to obtain data regarding the time
expended by the CPU on each line of your code. To use it, you first
have to install it:
$ pip install line_profiler
The next step is to mark the functions you wish to evaluate with the
@profile decorator (you need not import it into your file):
@profile
def random_sort2(n):
    l = [random.random() for i in range(n)]
    l.sort()
    return l

if __name__ == "__main__":
    random_sort2(2000000)
Finally, you can obtain a detailed line-by-line description of the
random_sort2 function by entering the command below:
$ kernprof -l -v timing_functions.py
Here the -l flag stands for line-by-line, while the -v flag requests verbose
output. Using this method, you would find that constructing the array
takes up about 44% of the computation time, whereas the remaining 56%
is taken up by the sort() method. You would also notice that, owing to the
number of time measurements performed, the script takes longer to run.
Make use of the memory_profiler module:
We use the memory_profiler module to measure the memory usage of
your code on a line-by-line basis. However, it can make your code execute
even more slowly. The memory_profiler can be installed using:
$ pip install memory_profiler
A recommendation is to also have the psutil package installed, so that
memory_profiler runs much faster:
$ pip install psutil
In a fashion similar to that of the line_profiler, use the @profile
decorator to mark the tracked functions. When this is done, enter the
following:
$ python -m memory_profiler timing_functions.py
Keep in mind that the script now takes considerably longer than the 1 to 2
seconds it needed before; without the psutil package, you may still be
waiting for the results. In the output, note that memory usage is expressed
in MiB (mebibytes), where 1 MiB = 1.05 MB.
Make use of the guppy package:
With the guppy package, you are able to keep track of the number of
objects of every type (str, tuple, dict, etcetera) that were created at each
stage of the code. To install it, use:
$ pip install guppy
The next step is to introduce it into the code:
from guppy import hpy
import random

def random_sort3(n):
    hp = hpy()
    print "Heap at the beginning of the function\n", hp.heap()
    l = [random.random() for i in range(n)]
    l.sort()
    print "Heap at the end of the function\n", hp.heap()
    return l

if __name__ == "__main__":
    random_sort3(2000000)
And run your code with:
$ python timing_functions.py
You would see a heap-statistics table (object counts, sizes, and types)
printed for each snapshot. By placing heap snapshots at many different
points across the code, it is possible to study the object creation and
deletion performed in the script flow.
Make use of generators in calculating an extensive array of
results:
Generators provide lazy evaluation. They can be used by iterating over
them explicitly with the "for" statement, or implicitly by passing them to
any function or construct that iterates. You can think of a generator as
returning several items as though it were returning a list, except that
rather than returning them all at once, the items are returned one after
the other, with the generator function paused until the next item is
requested.
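As a minimal sketch of the idea, the hypothetical generator below yields
one square at a time, so the whole sequence never has to be held in
memory at once:
def squares(n):
    # each value is produced on demand instead of being stored in a list
    for i in range(n):
        yield i * i

# sum() pulls the items one by one, keeping memory usage flat:
total = sum(squares(2000000))
print(total)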
- In cases of large numbers or data crunching, libraries such as
NumPy can be used, since they are quite adept at managing memory.
- Keep in mind never to use the (+) operator to build up long strings.
Recall that str in Python is immutable; hence, the strings on the right
and left have to be copied into a new string every time a
concatenation is performed. Should you concatenate, say,
four strings of length 10, you would altogether be copying
(10+10) + ((10+10)+10) + (((10+10)+10)+10) = 90 characters rather
than the intended 40 characters. Things get quadratically worse as the
number and size of the strings go up. Java sometimes optimizes a
series of concatenations by using StringBuilder, but nothing of that
sort exists in CPython. It is thus advisable to use the % or .format
syntax, even though they are somewhat slower than (+) for shorter
strings. Better still, if the contents are already available in the form
of an iterable object, "".join(iterable_object) can be used, as it is a
much faster alternative:
def add_string_with_plus(iters):
    s = ""
    for i in range(iters):
        s += "xyz"
    assert len(s) == 3 * iters

def add_string_with_format(iters):
    fs = "{}" * iters
    s = fs.format(*(["xyz"] * iters))
    assert len(s) == 3 * iters

def add_string_with_join(iters):
    l = []
    for i in range(iters):
        l.append("xyz")
    s = "".join(l)
    assert len(s) == 3 * iters

def convert_list_to_string(l, iters):
    s = "".join(l)
    assert len(s) == 3 * iters
The output (timed here with IPython's %timeit magic) would be similar to
this:
>>> %timeit add_string_with_plus(10000)
100 loops, best of 3: 9.73 ms per loop
>>> %timeit add_string_with_format(10000)
100 loops, best of 3: 5.47 ms per loop
>>> %timeit add_string_with_join(10000)
100 loops, best of 3: 10.1 ms per loop
>>> l = ["xyz"]*10000
>>> %timeit convert_list_to_string(l, 10000)
10000 loops, best of 3: 75.3 µs per loop
- Ensure to make use of slots when defining a Python class. By setting
__slots__ on the class to a fixed set of attribute names, you inform the
program not to use a dynamic dict and to allocate space only for that
set of attributes, removing the overhead of one dict per object. Slots
also stop arbitrary attribute assignment on an object; the object
therefore retains a single shape throughout its lifetime, as the sketch
below shows.
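A minimal sketch of the idea, using a hypothetical Point class:
class Point(object):
    __slots__ = ('x', 'y')  # only these attributes; no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3  # arbitrary attribute assignment is blocked
except AttributeError:
    print("Point has no slot named 'z'")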
Chapter 14: Performance Improvement
Although Python is a relatively excellent programming language, where a
little code can do so much and many different tasks such as
multiprocessing can be performed efficiently, Python has often been
tagged as slow by some detractors. However, this is not necessarily the
case. Below are some tips with which you can improve the performance of
Python in your applications.
Make use of keys for sorts and generators:
Generators are a helpful tool in optimizing memory because they allow
you to design functions capable of returning items one at a time rather
than all at once. A good instance of this can be seen in the creation of a
vast array of numbers which are then added together. When sorting the
items of a list, also ensure that you make use of the default sort() method
and keys whenever applicable. In the example shown below, take note that
for every case, the list is sorted based on the index selected as part of the
key argument. This approach can be used when working on numbers and
strings alike. Custom sorts written by hand are old-fashioned and
time-consuming to develop, and they limit the speed of the sort at
runtime. Consider the code sample shown below:
import operator
somelist = [(2, 5, 7), (8, 3, 4), (5, 7, 9)]
somelist.sort(key=operator.itemgetter(0))
somelist
#Output = [(2, 5, 7), (5, 7, 9), (8, 3, 4)]
somelist.sort(key=operator.itemgetter(1))
somelist
#Output = [(8, 3, 4), (2, 5, 7), (5, 7, 9)]
somelist.sort(key=operator.itemgetter(2))
somelist
#Output = [(8, 3, 4), (2, 5, 7), (5, 7, 9)]
An external package is the best alternative for critical code:
While Python makes programming relatively easy, it may not always offer
the finest performance, especially for critical tasks. Using an external
package written in machine language, C++, or C for critical tasks can help
improve the performance of the application. However, keep in mind that
packages of this sort are usually platform-specific, meaning that a package
adequate for your platform is needed. Put simply, with this method,
application portability is given up in exchange for optimized performance,
which can be acquired only through direct programming against the
underlying host. Outlined below are some packages that could come in
handy for improving performance on critical code:
1. Cython
2. PyInline
3. PyPy
4. Pyrex
These packages behave differently, so disparities exist in how they are
applied. For instance, Pyrex allows Python to be extended to perform
tasks such as using C data types for easier and more efficient memory
handling. PyInline, on the other hand, allows a user to apply C code in
Python applications directly. While the inline code is compiled separately,
everything else is kept in one place, and the efficiency provided by C is
put to use.
Refrain from using too many or unnecessary loops:
It is generally believed that using way too many loops in any programming
language whatsoever is a bad practice and strains the server more than is
necessary. Simple tweaks, such as saving the length of an array into a
variable rather than scanning the length at every iteration of the loop, can
be beneficial in making sure things are effectively executed. Also, you can
try to refactor your code to make use of unions and intersections. For
instance, rather than do this:
for x in a:
    for y in b:
        if x == y:
            yield (x, y)
You could simply do this instead:
return set(a) & set(b)
Learn to optimize loops:
In every programming language known, emphasis is always placed on the
need for the optimization of loops. When coding in Python, you can rely
on a variety of techniques to make your loops execute even faster. One
technique many developers miss is avoiding attribute lookups (dots) in a
loop. Consider the instance below:
lowerlist = ['this', 'is', 'lowercase']
upper = str.upper
upperlist = []
append = upperlist.append
for word in lowerlist:
    append(upper(word))
print(upperlist)
#Output = ['THIS', 'IS', 'LOWERCASE']
Every time a call is made to str.upper, the method has to be looked up by
Python. However, if the lookup result is stored in a variable, Python can
carry out the task faster since the value is already defined. The goal is to
lower the amount of work performed within each loop, since Python's
interpreted nature tends to reduce its speed on such occasions. Keep in
mind that there are different ways in which loops can be optimized, and
this is but one of them. For instance, it is common knowledge among
developers to use list comprehensions because they are believed to help
speed up looping, as in the sketch below. The point here is that the
optimization of loops accounts for one of the finest ways of attaining
better speed in Python applications.
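As a quick illustration, here is a list-comprehension variant of the loop
shown earlier:
lowerlist = ['this', 'is', 'lowercase']
# the loop body runs in the interpreter's optimized machinery
# rather than as explicit Python-level statements:
upperlist = [word.upper() for word in lowerlist]
print(upperlist)
#Output = ['THIS', 'IS', 'LOWERCASE']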
Experiment with other methods of coding:
It is common practice to use one general technique of coding in Python all
through your application. However, experimenting with other techniques
to find the more optimal one can be helpful. It keeps you abreast and
shows innovation in your approach to coding, regardless of the language
used. The sheer practice of thinking outside the box would lend you
creative methods of applying new and better coding approaches to obtain
results faster in your Python applications.
Take a look at this sample of code:
n = 16
myDict = {}
for i in range(0, n):
    char = 'abcd'[i % 4]
    if char not in myDict:
        myDict[char] = 0
    myDict[char] += 1
print(myDict)
This code runs fastest when myDict starts off empty. However, when
myDict is typically already filled with data, an alternative approach may
be the better method:
n = 16
myDict = {}
for i in range(0, n):
    char = 'abcd'[i % 4]
    try:
        myDict[char] += 1
    except KeyError:
        myDict[char] = 1
print(myDict)
Both cases produce the same output: {'d': 4, 'c': 4, 'b': 4, 'a': 4}.
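For completeness, a third variant worth experimenting with is dict.get,
which avoids both the membership test of the first version and the
exception handling of the second; a minimal sketch:
n = 16
myDict = {}
for i in range(0, n):
    char = 'abcd'[i % 4]
    # get() supplies 0 when the key is missing:
    myDict[char] = myDict.get(char, 0) + 1
print(myDict)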
Make use of a more current version of Python:
Sometimes, the problem isn't with the code, but with the version of Python
used. On the official Python website, one can often find messages
suggesting a move to more recent versions. Generally, every version
update of Python brings better optimizations, making it better than its
predecessors.
The limiting element lies in whether the libraries you are used to have also
been moved to the new Python version or have become obsolete. Instead
of asking whether or not to make the move, the essential inquiry is to find
out when the new version contains enough support to make the move
practical. Before that, you have to verify that your code is still viable:
review the new Python version, check for breaking changes in your
application, and move only when the appropriate corrections have been
made.
However, merely ensuring that your application runs under the new
version is not enough, as you could miss out on its new features. Once you
have moved, update the parts of your code that can make use of the new
version's features first; you should see a performance boost in the
process.
Keep your coding light and simple:
In programming, the simplest is often the fastest. In a time and era where
performance is a critical factor, it is highly important to make your Python
code as concise as possible to lower latency and pick up the pace. To help
keep your code clean, there are specific questions you ought to ask
yourself in the development phase, like: "Why is this framework being
used?" "Is that module really important to this code?" "Is there a more
straightforward way to do things?" "Is any of it worth the overhead?"
Ensure to cross-compile your application:
Sometimes developers forget that the languages used in creating most
modern applications appear foreign to a system; computers understand
machine code as their primary language. To execute an application, you
need some means of translating the human-readable code into a language
the computer comprehends. There are times when coding an application in
a language like Python and executing it in another language like C++ is a
fine plan in terms of performance. Whether it pays off, however, depends
on what you expect of the application and the resources the host system
can offer.
Nuitka is an excellent cross-compiler because it translates your Python
code into C++ code. The result is that you can run the application natively
rather than depending on an interpreter of any sort. Depending on the task
in question and the platform used, there can be a significant increase in
performance. As of now, Nuitka is still in beta testing, so it is best used
cautiously with production applications; as a matter of fact, it is best used
for experimenting at the moment. There are still some slight debates over
whether cross-compilation is an excellent way to attain better
performance. Over the years, developers have experimented with
cross-compilation to attain specific goals like improving the speed of
applications. Just keep in mind that each solution comes with its
downside(s), which should be taken into cognizance before the solution is
tried out in a production environment.
When using any cross-compiler of your choice, ensure that it is supported
by the version of Python you use. Python versions 2.6 and 2.7 in the 2.x
series, as well as 3.2 and 3.3 in the 3.x series, support Nuitka. For this
method to work, you need both a C++ compiler and a Python interpreter.
A variety of C++ compilers are compatible with Nuitka, like MinGW,
Clang/LLVM, and Microsoft Visual Studio. Be aware of the downsides of
cross-compilation, as they can be pretty serious sometimes. For instance,
when using Nuitka, you may find that the smallest program takes up a
large amount of space in your computer's memory, because Nuitka
implements Python's functionality using several DLLs (Dynamic Link
Libraries). As such, this solution may not be practical if your system faces
resource constraints.
Chapter 15: Multiprocessing, is a single-core CPU
enough?
Every system user would like more time to carry out one activity or the
other, but, unfortunately, time isn't one of the things we can create. In this
vein, everyone turns their attention to making certain parts of their tasks
run faster. Python offers a plethora of tools, such as NumPy and Pandas,
that make the task of managing data rather easy. However, owing to slight
technical issues in code, performance can sometimes be significantly
hindered. For this reason, there have been many interventions aimed at
bolstering the performance of programs in Python. In this section, we
would be considering how multiprocessing can be used to speed up
Python.
It is worth noting that most programming languages usually employ a
single processor only. The "multi" in the term "multiprocessing" refers to
the multiple cores of the CPU (central processing unit) of a computer.
Originally, computers had a single processor or CPU core, which served
as the unit that took care of all mathematical calculations. However, in
present times, computers can have up to 128 cores, indicating that a
single-core view of the CPU is no longer enough to get the best processing
time out of a program.
In most programs and programming languages, multiple cores are seldom
taken advantage of. Programming languages such as C and Java can send
tasks to multiple CPUs simultaneously, while it is not necessarily so for
languages like Python and R. Higher-level languages of this sort come
equipped with out-of-the-box functionality and packages which make it
much easier to handle data, but they tend to default to the use of a single
core, even on systems with multiple CPUs, and so tend to be less effective
in this regard. In Python, the use of a single core can be attributed to the
GIL, the acronym for global interpreter lock, which lets the Python
interpreter run no more than a single thread at any point in time. The GIL
came about as an implementation choice meant to take care of memory
management, but in turn it gives up multiprocessing within a single
interpreter.
Multiprocessing is capable of increasing the speed of processing
significantly. To do this, we have to bypass the GIL when running Python
code. This technique not only improves runtime, but also takes advantage
of the upsides of multiple processors. The built-in multiprocessing module
in Python lets a user mark specific sections of code to bypass the GIL and
send that code to several processors to be executed simultaneously. The
multiprocessing module was first introduced in Python in the late 2.x
series (version 2.6). Richard Oudkerk and Jesse Noller, in PEP 371,
originally defined it as the multiprocessing module. It allows a user to
spawn processes in much the same way that threads are spawned using the
threading module. The general point here is that since processes are being
spawned, the Global Interpreter Lock can be avoided, and full advantage
can be taken of the multiple processors available in a system. The
multiprocessing package also contains specific APIs which cannot be
found in any threading module. For instance, there is a Pool class which
can be used for the parallel execution of a function across many different
inputs. Pool and Process constitute the main classes of the multiprocessing
module in Python. Let's consider them below:
The basics of multiprocessing
The Process class is quite similar to the Thread class found in the
threading module. As an example, we would attempt to create a series of
processes which call the same function, and see how it works out:
import os
from multiprocessing import Process

def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc = os.getpid()
    print('{0} doubled to {1} by process id: {2}'.format(
        number, result, proc))

if __name__ == '__main__':
    numbers = [15, 20, 25, 30, 35]
    procs = []
    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()

    for proc in procs:
        proc.join()
In this example, Process was imported, and a doubler function was
created. Within the function, the number fed in is doubled. The os module
of Python is also used to acquire the PID of the current process; the PID
tells us exactly which process calls the function. Then, in the lines of code
at the bottom, a set of Processes is created and started. The final loop calls
the join() method on every process, telling the program to wait for each
process to end. To stop a process, all you need do is call its terminate()
method. When this code is executed, the expected output would typically
take after this fashion:
15 doubled to 30 by process id: 10468
20 doubled to 40 by process id: 10469
25 doubled to 50 by process id: 10470
30 doubled to 60 by process id: 10471
35 doubled to 70 by process id: 10472
Sometimes, it is preferable to use a more human-readable name for your
processes. Fortunately, the Process class lets you give each process a name
of your own. Take a look at the example below:
import os
from multiprocessing import Process, current_process

def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format(
        number, result, proc_name))

if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []
    # this Process object is created but never started; it still
    # reserves the default name Process-1
    proc = Process(target=doubler, args=(5,))

    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()

    proc = Process(target=doubler, name='Test', args=(2,))
    proc.start()
    procs.append(proc)

    for proc in procs:
        proc.join()
In this case, something else is imported: current_process, which is quite
similar to current_thread in the threading module. current_process is used
to obtain the name of the process which calls the function. You would
notice that no explicit name is defined for the first five started processes.
Then, for the sixth, "Test" is set as the name of the process. The output of
the run looks like this:
5 doubled to 10 by: Process-2
10 doubled to 20 by: Process-3
15 doubled to 30 by: Process-4
20 doubled to 40 by: Process-5
25 doubled to 50 by: Process-6
2 doubled to 4 by: Test
The output shown indicates that the multiprocessing module, by default,
assigns a number to every process as part of its name. When a name is
specified, however, no number is added to it.
Locks
The multiprocessing module supports locks in much the same way as the
threading module does. All a user needs to do is import Lock, acquire it,
carry out a task, and release it. Let's consider an example below:
from multiprocessing import Process, Lock

def printer(item, lock):
    """
    Prints out the item passed in
    """
    lock.acquire()
    try:
        print(item)
    finally:
        lock.release()

if __name__ == '__main__':
    lock = Lock()
    items = ['tango', 'foxtrot', 10]
    for item in items:
        p = Process(target=printer, args=(item, lock))
        p.start()
In this case, the goal is to create a printing function that prints whatever is
passed into it. To prevent the processes from interfering with one another,
a Lock object is used. The code loops over the provided list of three items
and creates a process for each of them. Each process calls the function and
is passed one of the items from the iterable. Since we are using locks, the
next process in line would continue only after the lock has been released.
Logging
There is a bit of a disparity between logging processes and logging
threads. The reason for the difference is that Python's logging package
does not use process-shared locks, so it is possible to end up with
messages from different processes getting mixed up. Let's consider
introducing some basic logging into the prior sample. The code would
assume a fashion similar to this:
import logging
import multiprocessing
from multiprocessing import Process, Lock

def printer(item, lock):
    """
    Prints out the item passed in
    """
    lock.acquire()
    try:
        print(item)
    finally:
        lock.release()

if __name__ == '__main__':
    lock = Lock()
    items = ['tango', 'foxtrot', 10]
    multiprocessing.log_to_stderr()
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    for item in items:
        p = Process(target=printer, args=(item, lock))
        p.start()
The easiest method of logging is to send everything to stderr, which is
done by calling the log_to_stderr() function. Then get_logger() is called to
access the logger, and its logging level is set to INFO. The remainder of
the code stays the same. It is important to note that the join() method is
not called here; rather, the parent process (that is, your script) calls join()
implicitly when it ends. In doing this, you should end up with output
similar to that shown below:
[INFO/Process-1] child process calling self.run()
tango
[INFO/Process-1] process shutting down
[INFO/Process-1] process exiting with exitcode 0
[INFO/Process-2] child process calling self.run()
[INFO/MainProcess] process shutting down
foxtrot
[INFO/Process-2] process shutting down
[INFO/Process-3] child process calling self.run()
[INFO/Process-2] process exiting with exitcode 0
10
[INFO/MainProcess] calling join() for process Process-3
[INFO/Process-3] process shutting down
[INFO/Process-3] process exiting with exitcode 0
[INFO/MainProcess] calling join() for process Process-2
The pool class
The Pool class represents a pool of worker processes. It comprises
methods that let you offload tasks to the worker processes. Consider the
sample of code shown below:
from multiprocessing import Pool

def doubler(number):
    return number * 2

if __name__ == '__main__':
    numbers = [5, 10, 20]
    pool = Pool(processes=3)
    print(pool.map(doubler, numbers))
Put simply, the task in this code involves creating an instance of a Pool
and telling it to create three worker processes. Next, the map method is
used to map a function over an iterable, handing each item to a worker
process. Finally, the result is printed, which in this case takes the form of
a list: [10, 20, 40].
The result of a process in a pool can also be obtained using the
apply_async method. Consider the sample below:
from multiprocessing import Pool

def doubler(number):
    return number * 2

if __name__ == '__main__':
    pool = Pool(processes=3)
    result = pool.apply_async(doubler, (25,))
    print(result.get(timeout=1))
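Several tasks can also be submitted asynchronously at once, with the
results collected later; a minimal sketch building on the same doubler
function:
from multiprocessing import Pool

def doubler(number):
    return number * 2

if __name__ == '__main__':
    pool = Pool(processes=3)
    # submit the tasks without blocking, then fetch each result:
    async_results = [pool.apply_async(doubler, (n,)) for n in (5, 10, 20)]
    print([res.get(timeout=1) for res in async_results])  # [10, 20, 40]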
Chapter 16: C/C++
History of C and C++
C, a multipurpose programming language, was designed in 1972 by
Dennis Ritchie of Bell Labs for use with UNIX, an operating system of the
time. C is mainly used in system software programming, but it can also be
used in the creation of general-purpose software. The C programming
language can be described as procedural, imperative, and block-structured.
C++ was initially known as "C with Classes" and is still regarded by
computer scientists as a superset of the C programming language. C++
was developed by Bjarne Stroustrup at Bell Labs as an enhanced form of
C: in 1979, Stroustrup began introducing operator overloading, classes,
multiple inheritance, virtual functions, exception handling, and templates,
among other features, and the name C++ was adopted in 1983. In 1998,
the C++ programming language was ratified as ISO/IEC 14882:1998. The
version still in use dates back to 2003, ISO/IEC 14882:2003; as a matter of
fact, this version is but a correction of the 1998 version of C++. In 2005,
the "Library Technical Report 1" was released, providing insights into
extensions of the standard library which would not be part of the standard
version. A much more recent version, now known informally as C++0x, is
still being developed. Over the years since 1990, C++ has grown into a
successful programming language. Although C++ has a royalty-free
policy, its documentation, however, is not available for free.
The uses of C and C++
C is a great language used in executing applications coded in assembly
language as a result of its great features such as simple compilation,
reduced level of run-time supports, reduced levels of access to memory,
high efficiency as a constructing language, which is in sync with the
hardware instructions. Another credit shows that C++ can be put to great
use as a result of its high portability, which is compatible with many
different platforms and operating systems, and little to no changes are made
or required in the source code. Thus, independence and remote operations
have been enabled through the hardware. Also, since C complies with many
different standards, it is capable of working with everything.
On the other hand, C++ is known to be a mid-level language, seeing as it is
composed of features found in both low-level and high-level programming
languages.
Characteristics of C and C++
C:
Below are some of the critical characteristics peculiar to the C
programming language:
Conformance to the traditions of ALGOL (Algorithmic
Language).
A static typing system for the prevention of unintended
operations.
Free-format source text and reserved keywords.
Characters treated as integers, in a manner peculiar to assembly
languages.
Structured programming features.
Parameters passed by value, with pass-by-reference achieved by
passing pointer values.
A large capacity for hiding variables, though function definitions
are non-nestable.
Short-circuit evaluation: only the first operand is evaluated
should the result be determinable from it alone.
A large range of compound operators, like ++ and +=, among
others.
Heterogeneous manipulation and combination of data.
Other features added unofficially over time due to use and updates include
the following:
1. Enumerated types
2. Assignment support for struct datatypes
3. Void functions
4. Tools created to prevent the fundamental problems peculiar
to the programming language
5. Const qualifiers used to make read-only objects
6. Functions returning struct or union types rather than pointers
7. Over time, C evolved to the point of being able to rewrite the
UNIX kernel, which was initially coded in assembly language,
making UNIX one of the first operating system kernels coded
in a language other than assembly.
C++:
1. The design of C++ is such that it can be statically typed while
serving as a multipurpose language with as much effectiveness
and portability as the C programming language.
2. The design of C++ offers a series of choices to a programmer,
even though misuse of them can lead the programmer to an
incorrect choice.
3. C++ generally keeps away from features that are platform-specific
or not aimed at general purposes.
4. The design of C++ makes it capable of functioning in the
absence of a complex programming environment.
5. The structure of C++ is such that it supports a variety of
programming styles comprehensively and directly, including
data abstraction, procedural programming, generic
programming, and object-oriented programming.
6. C++ avoids incurring overhead for features that are not used.
7. Polymorphism accounts for one of the attractive characteristics
of C++, as it allows the implementation of several tasks within
a single interface and makes objects act based on the situation.
Both dynamic (run-time) and static (compile-time) forms of
polymorphism are supported in C++.
The development of C and C++:
As time passed, standardization grew important as a result of the vast
array of extensions, a random collection of libraries in widespread use,
and the apparent absence of compilers precisely implementing the given
specifications. One of the aims of the standardization process in C was to
provide users with a superset of K&R C which would incorporate many of
the subsequently, unofficially introduced features. The standards
committee also introduced a variety of new features, including support for
locales and international character sets, void pointers, a more capable
pre-processor, and function prototypes.
On the other hand, the evolution of C++ was rapid, while C trailed behind
until Normative Amendment 1 in 1995, when a new standard was created.
This standard was placed under further supervision, resulting in the 1999
publication of ISO 9899:1999. The standard, typically identified as "C99,"
was adopted as an ANSI (American National Standards Institute) standard
in March of 2000. Below are a few of the new functions added in the
standard:
1. Library functions such as snprintf
2. Inline functions
3. Variable-length arrays
4. Support for one-line comments beginning with //
5. Designated initializers
6. Type-generic math functions (tgmath.h)
7. Compound literals
8. The ability to declare variables at any point in a compound
statement, rather than only at its beginning
9. Support for variadic macros, that is, macros of variable arity
10. New header files like inttypes.h and stdbool.h
11. Better support for IEEE floating point
12. New data types, such as long long int, optional extended
integer types, an explicit boolean data type, and a complex
type to represent complex numbers
Chapter 17: Windows
Windows refers to a group of graphical operating system families that are
produced, marketed, and distributed by Microsoft. Every family in this
group caters to a given part of the computing world. The active Microsoft
Windows families include Windows IoT and Windows NT, and these
families encompass subfamilies such as Windows CE (Windows
Embedded Compact) and Windows Server. Now-defunct Microsoft
Windows families include Windows Phone, Windows Mobile, and
Windows 9x.
The operating environment now known as Windows was introduced by
Microsoft on the 20th of November, 1985. The release was to serve as a
graphical operating system shell for the then MS-DOS, as a response to
the growing interest in GUIs (graphical user interfaces). Over time,
Microsoft Windows came to dominate the PC (personal computer) market
across the world, claiming over 90% of the market share worldwide and
overtaking Apple's Mac OS, which had been released earlier, in 1984.
Windows and Python
In this section, we would consider the relationship between Windows and
Python.
Installing Python on Windows:
Contrary to many of the systems and services running on Unix, Windows
doesn't traditionally need Python, and as such contains no pre-installed
version of the programming language. However, Windows installers
known as MSI packages have been compiled by the CPython team for
every release of Python over the years. With the ongoing evolution of
Python, some of the platforms that used to be supported no longer are, as a
result of the absence of a user base or developers. PEP 11 contains details
about the platforms that are no longer supported.
Windows 3.x and DOS have been deprecated since Python 2.0. For this
reason, code peculiar to these platforms was removed with the release of
Python 2.1.
Up until version 2.5, Python was still supported on platforms like
Windows ME, 98, and 95, even though a deprecation warning was usually
raised during installation. From Python 2.6 and every release since, up to
the time of this writing, compatibility with those platforms ended, and
new releases are expected to be compatible solely with the Windows NT
family. Windows CE is still supported.
The Python interpreter is installed by the Cygwin installer as well (cf.
Cygwin package source, maintainer releases).
Optional bundles:
Aside from the standard CPython distribution, specific modified packages
include extra functionality. Shown below are some of the widely known
versions, as well as their essential characteristics:
· ActivePython:
An installer that is compatible across multiple platforms and contains
PyWin32 and documentation.
· Enthought Python Distribution:
Widely known modules like PyWin32, the documentation peculiar to
them, and a tool suite for developing extensible Python applications.
Keep in mind that the packages mentioned above often install an older
version of Python.
Python Configuration:
To be able to execute Python code flawlessly, you may find that specific
changes need to be made to the Windows environment.
Excursus: Setting up environment variables:
Windows has a built-in dialog for changing environment variables; the
guide below mainly applies to the classical view in XP. To start,
right-click the My Computer icon on your Windows system and click
Properties. Next, open the Advanced tab and select the button marked
Environment Variables.
Put simply, the path used is:
My Computer → Properties → Advanced → Environment Variables
The system and user variables can be added or modified in this dialog. To
alter the system variables, unrestricted access to the machine is required,
that is, administrator rights.
An alternative way to add variables to the environment is to make use of
the set command:
set PYTHONPATH=%PYTHONPATH%;C:\My_python_lib
To make this setting permanent, you can add the corresponding command
line to your autoexec.bat; msconfig is a graphical interface to this file.
Viewing environment variables is a relatively straightforward process:
strings wrapped in percent signs are expanded by the command prompt, as
shown below:
echo %PATH%
Locating the Python executable:
Aside from using the start menu entry automatically created for the
Python interpreter, it is possible to start Python from the DOS prompt. For
this to work, you have to set up the %PATH% environment variable to
include the directory of the Python distribution, separated from other
entries with a semicolon. A sample variable would typically look like this
(assuming the first two entries are Windows defaults):
C:\WINDOWS\system32;C:\WINDOWS;C:\Python25
Typing python at your command prompt will start up the Python
interpreter. As such, you would also be able to run Python scripts with
command line options.
Locating Modules:
The Python libraries, including the site-packages folder, are usually stored
in the Python installation directory. As such, if you have Python installed
in C:\Python\, the default library can be found in C:\Python\Lib\, while
third-party modules would typically be saved in
C:\Python\Lib\site-packages\.
sys.path is populated in Windows as follows:
At the start, an empty entry is added, which corresponds to the current
directory.
If the environment variable PYTHONPATH is set, as described in the
section on environment variables, its entries are added next. Keep in mind
that on Windows, paths in this variable have to be separated with
semicolons, to tell them apart from the colon used in drive identifiers
(C:\ and so on).
Additional "application paths" can be added in the registry as subkeys of
\SOFTWARE\Python\PythonCore{version}\PythonPath under both the
HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER hives.
Subkeys whose default values are path strings delimited by semicolons
cause each of those paths to be added to sys.path. Take note that all
known installers use HKLM only; hence, HKCU is typically empty.
Should the environment variable PYTHONHOME be set, it is taken as
the "Python Home." Otherwise, the path of the main Python executable is
used to locate a "landmark file" (Lib\os.py) to work out the "Python
Home." If a Python home is found, the relevant subdirectories added to
sys.path (Lib, plat-win, among others) are based on that folder.
Otherwise, the core Python path is constructed from the PYTHONPATH
stored in the registry.
If the Python Home cannot be located, no PYTHONPATH is specified in
the environment, and no registry entries can be found, a default path with
relative entries is used (for example, .\Lib;.\plat-win, etcetera).
The eventual result of this process is as follows:
When python.exe, or another .exe in the main Python directory, is run
(either from an installed version or directly from the PC build directory),
the core path is deduced and the core paths in the registry are ignored. The
other "application paths" found in the registry are always read.
When Python is hosted in another .exe (say, embedded via COM, or
placed in a different directory), the "Python Home" will not be deduced;
hence, the core path from the registry is used instead. The other
"application paths" in the registry are still read at all times.
Should Python be unable to locate its home, and there is no registry (say,
a frozen .exe or some other very unusual installation setup), you end up
with a path made up of some default, albeit relative, paths.
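To see the concrete result of this search-path construction on your own
machine, you can print sys.path directly; a minimal check:
import sys

for entry in sys.path:
    print(entry)  # each directory Python will search for modules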
Running Python scripts:
Python scripts are files which carry the .py extension. They are executed
by default using python.exe, which opens a terminal window; the window
stays open even if the program uses a GUI. To avoid this, use the
extension .pyw, which makes the script run by default with pythonw.exe
(both executables are located in the top level of your Python installation
directory); this suppresses the terminal window at startup. You can also
arrange to run all your Python scripts with pythonw.exe, setting this
through the usual facilities, for example (administrative rights might be
required):
Start up a command prompt environment.
Assign the correct file group to Python scripts:
assoc .py=Python.File
Proceed to redirect every Python file to the new executable:
ftype Python.File=C:\Path\to\pythonw.exe "%1" %*
Chapter 18: OS X
OS X used to be known as Mac OS X, and is presently macOS. It refers to
a series of graphical operating systems that have been designed, marketed,
and distributed by Apple Inc. since the year 2001. OS X is the primary
operating system used in Apple's Mac family of personal computers.
Among laptops, desktops, web usage, and home systems, OS X is the
second most popular desktop operating system, after Windows by
Microsoft.
OS X and Python:
Python comes already installed on OS X, so you can begin using it right
away. However, to be able to utilize the features of the newer versions,
you would have to download and install the version updates alongside the
pre-installed version. The easiest method is to find and install a binary
installer for OS X from the official Python website. Installers are
available for both Python 2 and Python 3, depending on your preference,
and are compatible with every Mac system running OS X 10.5 and later.
The Python releases include IDLE, Python's built-in integrated
development environment. When you get and install Python from the
official webpage, you might require a more recent version of Tcl/Tk for
OS X.
A Python interpreter can be run on OS X by double-clicking Applications,
entering Utilities, then Terminal, and finally typing python3 (if your
installed version is Python 3) or python (if your installed version is
Python 2) in the window that opens. It is also possible to launch IDLE for
the version of Python you installed by double-clicking its icon in the
matching Python version folder inside the Applications folder. Another
alternative is to enter idle or idle3 in a terminal window.
There is a variety of additional Python software packages that can be
accessed via PyPI, the Python Package Index. You should make use of pip
for ease of installation and management of these additional packages. pip
comes bundled with Python 3.4 and upwards, so for earlier versions of
Python you might want to follow the instructions regarding the
installation of pip.
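As a quick illustration of installing and inspecting packages with pip (the
requests package here is only an example), the commands might look like
this:
$ python3 -m pip install requests
$ python3 -m pip list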
Among the many packages that can be accessed via PyPI are several
which are particularly meant for OS X environments. These include, but
are not limited to, the following:
1. Pyobjc: This package serves as a bridge between Objective-C
and Python, letting users code native Cocoa applications in pure
Python with complete features.
2. Py2app: This package lets users create independent OS X
plugins and application bundles that can be double-clicked from
Python scripts.
Other packages compatible with OS X include:
1. Enthought Python Distribution: The Enthought Python
Distribution offers scientists a complete set of tools with which
to carry out visualization and thorough analysis of data.
2. ActiveState ActivePython: This package exists in both
community and commercial versions, and includes scientific
computing modules.
Python, as well as a complete range of third-party libraries and packages,
can be accessed from a variety of open-source package managers
compatible with OS X, like:
Fink
Homebrew
MacPorts
How to correctly install Python on OS X
In this section, we would be exploring how a current Python version can
be installed.
Before beginning the installation process, you would first have to install
GCC. You can acquire GCC by getting Xcode, the smaller OSX-GCC-
Installer package, or the even more compact Command Line Tools (you
need an Apple account for this).
Keep in mind that if you have already installed Xcode, there is no point in
installing the OSX-GCC-Installer; when combined, the two packages tend
to cause problems that are hard to diagnose. Moreover, should you have
newly installed Xcode, it is necessary to include the command line tools
by visiting your Terminal and entering:
xcode-select --install
Although OS X comes pre-installed with a vast array of Unix utilities,
users familiar with Linux will discover one missing component: the
package manager. On OS X, that void is filled by Homebrew.
To install Homebrew, open your Terminal or any other OS X-compatible
terminal emulator of your choice and enter the following:
$ ruby -e "$(curl -fsSL
https://raw.githubusercontent.com/Homebrew/install/master/install)"
The script explains any alterations it will make, and you are shown some
prompts before the installation starts. Having installed Homebrew, add the
Homebrew directory to the top of the PATH environment variable by
including the line shown below at the bottom of your ~/.profile file:
export PATH="/usr/local/opt/python/libexec/bin:$PATH"
Should your system use OS X 10.12 or any older version, make use of this
line instead:
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
From here on, we are able to install Python 3:
$ brew install python
The process takes a moment to complete.
How to work with Python 3
If, at this point, your installed version of Python is the 2.7 release, the
chances are that you have the Homebrew version of Python 2 installed, as
well as that of Python 3.
$ python
Entering the command above launches the Python 3 interpreter installed
by Homebrew.
$ python2
The command above opens the Python 2 interpreter installed by
Homebrew.
$ python3
The command above is an alternative way to launch the Python 3
interpreter installed by Homebrew.
If Homebrew's Python 2 is the installed package on your system, pip2
points to Python 2. Conversely, pip points to Python 3 if the Homebrew
version installed is Python 3.
For the remainder of this guide, we assume your installed version
references Python 3:
# Do I have a Python 3 installed?
$ python --version
Python 3.7.1 # Success!
Virtual Environments and Pipenv
Chapter 19: Linux
Linux refers to a family of open-source operating systems bearing
similarities to Unix and based on the Linux kernel. The first Linux kernel
was published on the 17th of September, 1991, by its creator, Linus
Torvalds. Usually, Linux is packaged in a Linux distribution. The
operating system was primarily designed for PCs (personal computers)
built around the Intel x86 architecture, but has since been ported to more
platforms than any other operating system of its stature. In mainframe
computers, big iron systems, supercomputers, and servers, Linux has been
the leading operating system since November 2017, growing until it
ousted all rivals. Up to 2.3% of all desktop computers around the world
run the Linux operating system. The Chromebook primarily uses Chrome
OS, an operating system based on the Linux kernel; Chromebooks
account for nearly 20% of all notebook sales in the educational market of
the United States.
Linux and Python
How to set up Python environments in Unix and Linux Operating Systems:
On Linux systems (much as on OS X), you will find that Python comes pre-installed, although there is no guarantee that it is a recent version. As a matter of fact, Python accounts for a significant fraction of how the package installer functions on Linux and Unix systems. The main point here lies in discovering the version of Python pre-installed on your system, as well as deciding the version you plan to program with. Begin by opening a terminal and finding out what version you have. Enter the command below:
python --version
When entered, the command returns one of the following, depending on the installed version:
Python 2.x.x
or
Python 3.x.x
Depending on the output, you can now decide whether or not to install another version. If you wish to install another version while still retaining the pre-installed one, append the version number to the python command. If, say, your pre-installed version is Python 3, you can check for Python 2 by entering:
python2 --version
Once entered, you would receive a response of the form Python 2.x.x.
This step is necessary because it determines which Python interpreter executes your code. If you find that your version of Python is obsolete and wish to install a newer one, enter the following command:
sudo apt-get install python3
Python environments:
The environment on your system matters a great deal. While a significant part of Python's appeal is that it is relatively easy to use, that simplicity also accounts for one of its significant caveats: it is tempting to think a simple installation makes you all set and ready to go. Setting up an adequate Python environment to work in is essential, and it can seem confusing on the first try. Keep in mind that whatever version of Python you prefer, you will need a matching set-up in your production environment. Any package you obtain from the package index, for instance, must also be installed on that system before it can be used. It is advisable to keep track of these packages in a text file that pip can read later when you need to install them.
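A common convention, assuming the file is named requirements.txt, is to let pip generate and consume the list itself:
pip freeze > requirements.txt
pip install -r requirements.txt
The first command records every package (with its exact version) installed in the current environment; the second installs that same set on another machine.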
To get started, you first have to create a virtual environment matching your installed version of Python.
· For Python 2:
For this version of Python, the first step is to install virtualenv through pip:
pip install virtualenv
Should you receive an error message stating that pip first has to be installed, install pip before continuing. Pip is the most reliable method of managing packages in Python and, as is usually advised, the only recommended method to use. Once you have successfully installed pip, proceed to install virtualenv. From this point, you can cd into the project directory and create a new environment of your choice:
virtualenv [name_of_your_project]
Running this command creates a set of Python files in a new directory named after your project (my_project in our examples). And you are set.
· Python 3:
Not much difference exists in Python 3. Here, you may have to install the
Python virtual environment module:
sudo apt-get install python3-venv
Having installed it, the next step is to cd into the project directory, where you execute the command shown below:
python -m venv [name_of_your_project]
Running the command creates a set of Python files in a new directory named after your project.
How to use the virtual environment in Python
Having installed and set up the virtual environment, the process is quite similar in both Python versions. For clarity, a session in a working directory is shown below:
me@path/to/my_dir$ source my_project/bin/activate
(my_project)me@path/to/my_dir$
In essence, the purpose of this command is to make the clean, local Python installation of the virtual environment the one that executes the commands you enter. To try it out, run the Python interpreter from within the environment and try importing a module you already have in the main installation of Python, such as NumPy.
To return and leave the environment, enter:
(my_project)me@path/to/my_dir$ deactivate
When you begin working on a project, keep in mind that you should make your changes in the activated project environment rather than the main environment. Hence, anything done with Python inside the virtual environment stays there.
How to execute Python programs in Linux
Seeing as you have the Python virtual environment already set up, you can try it out by writing some simple Python code. Get a code editor of your choice. In the example shown in this section, the code editor used is vim, and the code is written in Python 3. Bear in mind that the Django package used in the ensuing example is installed in the central installation of Python, not in the virtual environment.
import django
print("Got here")
Basically, all you need to run a Python program on a Linux system is the command shown below:
python program-name.py
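As a variation on the example above, here is a minimal sketch of program-name.py that makes the isolation visible; it assumes Django is installed only in the central Python installation and not in the active virtual environment:
try:
    import django
    print("Got here:", django.get_version())
except ImportError:
    # Inside the virtual environment, the central installation's
    # packages are not visible
    print("django is not available in this environment")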
Chapter 20: Unix
Unix refers to a family of multiuser, multipurpose operating systems that trace back to the original AT&T Unix, developed in the 1970s by Dennis Ritchie, Ken Thompson, and others at the Bell Labs research center. Primarily, the operating system was meant to be used within the Bell System alone; however, AT&T licensed it to third parties in the later part of the '70s, resulting in a variety of both commercial and academic Unix variants from different vendors such as Microsoft (Xenix), Sun Microsystems (Solaris), IBM (AIX), and the University of California, Berkeley (BSD). Unix systems typically follow a modular design, often known as the "Unix philosophy." The philosophy revolves around the concept of the operating system providing a series of essential tools that individually carry out a limited but well-defined function, with a unified filesystem (the Unix filesystem) as the main method of communication, plus a command language (the Unix shell) and shell scripting to combine the tools into sophisticated workflows. Unix set itself apart from its predecessor operating systems as the first to be portable: almost the entirety of the operating system is written in the C programming language, making Unix usable on a variety of platforms.
Unix and Python
How to set up Python on a Unix machine using pyenv
In this section, we will take a look at how to set up a Python version, as well as an environment, on a Unix system with pyenv rather than conda.
· The installation of pyenv:
Installing pyenv is the first step in the process. Pyenv is a tool that lets a user compile and manage many different versions of the Python interpreter on one Unix machine. Do not confuse it with the similarly named pyvenv: pyvenv was a command-line script that shipped with Python as a thin wrapper (effectively a shell alias) for python -m venv, used to create and manage virtual environments, and it has since been deprecated.
On Apple systems, pyenv can be installed using the command shown below:
brew install pyenv
However, one sure way of installing pyenv on any Unix-based system
regardless of its type is to use a git clone:
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
Once done, add the following command to your system's shell startup file to include the functionality of pyenv in your shell:
eval "$(pyenv init -)"
Keep in mind that if pyenv is cloned into a path that isn't on your shell's $PATH, you will have to add it, as in the example below:
export PATH=~/.pyenv/bin:$PATH
· Install a C compiler, as well as its corresponding libraries:
Pyenv compiles Python from source, meaning you require a C compiler and many different development libraries. Both the compiler and libraries may already be present on your system, but if they are not, you can use the following:
For Apple systems enter:
xcode-select --install
For Ubuntu enter:
sudo apt-get install -y \
make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncurses5-dev \
libncursesw5-dev \
xz-utils \
tk-dev
· Using pyenv to install multiple versions of Python:
Once you have completed the installation of pyenv, the C compiler, and the libraries, you are all set to install any Python version of your choice. For instance, you can choose between versions including but not limited to the following:
pyenv install 2.7.13
pyenv install 3.5.3
pyenv install 3.6.2
When these commands are executed, they download and build the chosen releases from Python's official website. To get a comprehensive list of all the versions of Python that pyenv is capable of installing, simply run the command below:
pyenv install --list
· Setting up pyenv preferences:
Often, you may find that you are working outside a virtual environment. In that scenario, you can interact with pyenv directly to select the version you would like to use. The command shown below sets up a default version that pyenv launches whenever you run python:
If you wish to automatically override the global settings whenever you visit
a given directory, run the following commands:
cd ~/oldpython2project
pyenv local 2.7.13
For overriding the global preference in a single shell session, execute the
command shown below:
pyenv shell 3.5.3
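To check which interpreter is currently active, and where that preference comes from, you can run pyenv version at any time; the output below is illustrative:
pyenv version
3.5.3 (set by PYENV_VERSION environment variable)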
· How to set up virtual environments:
On a general note, pyenv itself should be used only rarely, for instance when a new release of Python comes out. Rather than switching interpreters with pyenv, you should stick to the basic creation and removal of virtual environments. To use this method, you first have to ensure that the packaging basics are up to date for every version of Python installed through pyenv:
for v in $(pyenv versions --bare) ; do
    pyenv shell $v
    pyenv which python
    python -m pip install --upgrade pip virtualenv wheel
done
Next, proceed to create a file (~/.pip/pip.conf) that contains:
[global]
require-virtualenv = true
Doing this makes sure that nothing is accidentally installed with pip outside
of the virtual environment.
The next phase is to create the virtual environment itself. Begin by temporarily activating the pyenv version that corresponds to the Python interpreter you want to use for the virtual environment. Then proceed to create the environment:
pyenv shell 3.6.1
python -m venv path/to/virtual/environment
· Using direnv to manage the creation and activation of virtual environments:
To begin, you first have to install direnv and hook it into your shell. When using bash on an Apple system, the commands look like this:
brew install direnv
echo 'eval "$(direnv hook bash)"' >> ~/.bashrc
At this juncture, you have to inform direnv about the presence of pyenv by adding the function below to ~/.direnvrc:
use_python() {
    local python_root=$HOME/.pyenv/versions/$1
    load_prefix "$python_root"
    layout_python "$python_root/bin/python"
}
Once completed, you are all set up. At this point, if you create a file such as ~/project/.envrc containing:
use python 3.6.1
then the first time you attempt to cd into ~/project, you will be prompted to enter direnv allow, which is a security feature of the program. Afterward, direnv uses the version of Python specified in .envrc to create and activate a virtual environment.
Chapter 21: Creating your libraries
Python libraries are compact collections of Python modules organized as Python packages. Put simply, a library is a group of modules sharing one directory located on the Python search path.
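For illustration, a minimal library (with the hypothetical name mylib) laid out as a package looks like this:
mylib/
    setup.py
    mylib/
        __init__.py
        helpers.py
The inner mylib directory is the importable package; setup.py, covered later in this chapter, describes how to build and distribute it.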
· Pathology package:
The Path object in Python 3's pathlib is great; however, it has one caveat: it cannot locate the path of the currently running script. This capability is especially important when you have to access files relative to the script you are working on. Most of the time, the script can be run from anywhere, so neither absolute nor relative paths can be used, since there is no defined value for the working directory. You have to be able to figure out the directory of the current script if you want to access a file in a sub-directory.
To do this, enter:
import pathlib
script_dir = pathlib.Path(__file__).parent.resolve()
To gain access to the file tagged "file.txt" in the data sub-directory of the present script's directory, use the code:
print(open(str(script_dir/'data/file.txt')).read())
The pathology package provides a built-in script_dir() method, and it can be used as follows:
from pathology.path import Path
print(open(str(Path.script_dir()/'data/file.txt')).read())
The pathology package derives its Path class from pathlib's Path and adds a static script_dir() method that always returns the path of the calling script.
View the implementation below:
import pathlib
import inspect

class Path(type(pathlib.Path())):
    @staticmethod
    def script_dir():
        # inspect.stack()[1] is the caller's frame; its filename
        # attribute holds the path of the calling script
        print(inspect.stack()[1].filename)
        p = pathlib.Path(inspect.stack()[1].filename)
        return p.parent.resolve()
Because pathlib.Path instantiates a platform-specific subclass (WindowsPath or PosixPath) rather than itself, the class above derives from type(pathlib.Path()) so that the correct base class is picked on every platform. The resolution of the script's directory makes use of the inspect module to locate the caller, as well as its filename attribute.
· Trying out the pathology package:
Whenever you write a script that you consider to be more than a throw-away, test it. The pathology module is no exception. Below are the tests, written with the standard unittest framework:
import os
import shutil
from unittest import TestCase
from pathology.path import Path

class PathTest(TestCase):
    def test_script_dir(self):
        # script_dir() should match the directory of this test file
        expected = os.path.abspath(os.path.dirname(__file__))
        actual = str(Path.script_dir())
        self.assertEqual(expected, actual)

    def test_file_access(self):
        # Create a fresh test_data sub-directory next to this test file
        script_dir = os.path.abspath(os.path.dirname(__file__))
        subdir = os.path.join(script_dir, 'test_data')
        if Path(subdir).is_dir():
            shutil.rmtree(subdir)
        os.makedirs(subdir)
        # Write a known payload, then read it back via script_dir()
        file_path = str(Path(subdir)/'file.txt')
        content = '123'
        open(file_path, 'w').write(content)
        test_path = Path.script_dir()/subdir/'file.txt'
        actual = open(str(test_path)).read()
        self.assertEqual(content, actual)
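Assuming the tests above are saved in a file such as test_path.py, they can be run with unittest's standard test discovery:
python -m unittest discover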
· Python Path:
Python packages have to be installed in a location on the Python search path so that their modules can be imported. The Python search path is a list of directories and is available at all times in sys.path.
Below is the sys.path used in this sample:
>>> print('\n'.join(sys.path))
/Users/gigi.sayfan/miniconda3/envs/py3/lib/python36.zip
/Users/gigi.sayfan/miniconda3/envs/py3/lib/python3.6
/Users/gigi.sayfan/miniconda3/envs/py3/lib/python3.6/lib-dynload
/Users/gigi.sayfan/miniconda3/envs/py3/lib/python3.6/site-packages
/Users/gigi.sayfan/miniconda3/envs/py3/lib/python3.6/site-
packages/setuptools-27.2.0-py3.6.egg
Keep in mind that the first line of the output is empty; this empty entry stands for the current directory, which is why modules can be imported from the current directory in use. Directories can be added to or removed from sys.path directly.
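For example, to make a private directory of modules importable for the duration of a run (the path below is hypothetical):
import sys

# Prepend the directory so its modules take precedence
sys.path.insert(0, '/home/me/my_libs')
import my_module  # a hypothetical module living in /home/me/my_libs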
· Packaging the Python library:
Having now written our code and tests, it is time to package everything into a standard library. Python provides an easy way to do this through the setup module. Begin by creating a file under the name setup.py in the root directory of your package. Then, to create a source distribution, execute the command:
python setup.py sdist
To create a binary distribution, known as a wheel, enter the following:
python setup.py bdist_wheel
Below is the setup.py file used by the pathology package:
from setuptools import setup, find_packages

setup(name='pathology',
      version='0.1',
      url='https://github.com/the-gigi/pathology',
      license='MIT',
      author='Gigi Sayfan',
      author_email='[email protected]',
      description='Add static script_dir() method to Path',
      packages=find_packages(exclude=['tests']),
      long_description=open('README.md').read(),
      zip_safe=False)
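While developing, it is often convenient to install the package in editable mode, so that code changes take effect without reinstalling; pip supports this with the -e flag, run from the package root:
pip install -e .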
Here, we create a source distribution:
$ python setup.py sdist
running sdist
running egg_info
creating pathology.egg-info
writing pathology.egg-info/PKG-INFO
writing dependency_links to pathology.egg-info/dependency_links.txt
writing top-level names to pathology.egg-info/top_level.txt
writing manifest file 'pathology.egg-info/SOURCES.txt'
reading manifest file 'pathology.egg-info/SOURCES.txt'
writing manifest file 'pathology.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README,
README.rst, README.txt
running check
creating pathology-0.1
creating pathology-0.1/pathology
creating pathology-0.1/pathology.egg-info
copying files to pathology-0.1...
copying setup.py -> pathology-0.1
copying pathology/__init__.py -> pathology-0.1/pathology
copying pathology/path.py -> pathology-0.1/pathology
copying pathology.egg-info/PKG-INFO -> pathology-0.1/pathology.egg-
info
copying pathology.egg-info/SOURCES.txt -> pathology-0.1/pathology.egg-
info
copying pathology.egg-info/dependency_links.txt -> pathology-
0.1/pathology.egg-info
copying pathology.egg-info/not-zip-safe -> pathology-0.1/pathology.egg-
info
copying pathology.egg-info/top_level.txt -> pathology-0.1/pathology.egg-
info
Writing pathology-0.1/setup.cfg
creating dist
Creating tar archive
removing 'pathology-0.1' (and everything under it)
The warning appears because setuptools does not recognize the README.md file; it expects README, README.rst, or README.txt. It is safe to ignore. The output is a tar-gzipped file placed in the dist directory:
$ ls -la dist
total 8
drwxr-xr-x 3 gigi.sayfan gigi.sayfan 102 Apr 18 21:20 .
drwxr-xr-x 12 gigi.sayfan gigi.sayfan 408 Apr 18 21:20 ..
-rw-r--r-- 1 gigi.sayfan gigi.sayfan 1223 Apr 18 21:20 pathology-0.1.tar.gz
The binary distribution is shown below:
$ python setup.py bdist_wheel
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/pathology
copying pathology/__init__.py -> build/lib/pathology
copying pathology/path.py -> build/lib/pathology
installing to build/bdist.macosx-10.7-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.7-x86_64
creating build/bdist.macosx-10.7-x86_64/wheel
creating build/bdist.macosx-10.7-x86_64/wheel/pathology
copying build/lib/pathology/__init__.py -> build/bdist.macosx-10.7-
x86_64/wheel/pathology
copying build/lib/pathology/path.py -> build/bdist.macosx-10.7-
x86_64/wheel/pathology
running install_egg_info
running egg_info
writing pathology.egg-info/PKG-INFO
writing dependency_links to pathology.egg-info/dependency_links.txt
writing top-level names to pathology.egg-info/top_level.txt
reading manifest file 'pathology.egg-info/SOURCES.txt'
writing manifest file 'pathology.egg-info/SOURCES.txt'
Copying pathology.egg-info to build/bdist.macosx-10.7-
x86_64/wheel/pathology-0.1-py3.6.egg-info
running install_scripts
creating build/bdist.macosx-10.7-x86_64/wheel/pathology-0.1.dist-
info/WHEEL
Because pathology is pure Python, a single wheel (note the py3-none-any tag below) works on every platform. Should your package contain extensions in C, you would have to build a distinct wheel for every platform individually:
$ ls -la dist
total 16
drwxr-xr-x 4 gigi.sayfan gigi.sayfan 136 Apr 18 21:24 .
drwxr-xr-x 13 gigi.sayfan gigi.sayfan 442 Apr 18 21:24 ..
-rw-r--r-- 1 gigi.sayfan gigi.sayfan 2695 Apr 18 21:24 pathology-0.1-py3-
none-any.whl
-rw-r--r-- 1 gigi.sayfan gigi.sayfan 1223 Apr 18 21:20 pathology-0.1.tar.gz
Distributing the Python package:
To distribute the created pathology package, you have to upload it to PyPI and provide some additional metadata that PyPI requires. The steps include:
Creating an account:
To create an account, you have to visit the official PyPI webpage. Next, a
.pypirc file has to be created in your home directory:
[distutils]
index-servers=pypi
[pypi]
repository = https://pypi.python.org/pypi
username = the_gigi
Registration of your package:
Make use of the register command of setup.py. It will usually request a password from you:
$ python setup.py register -r pypitest
running register
running egg_info
writing pathology.egg-info/PKG-INFO
writing dependency_links to pathology.egg-info/dependency_links.txt
writing top-level names to pathology.egg-info/top_level.txt
reading manifest file 'pathology.egg-info/SOURCES.txt'
writing manifest file 'pathology.egg-info/SOURCES.txt'
running check
Password:
Registering pathology to https://testpypi.python.org/pypi
Server response (200): OK
Uploading the package:
Once you have successfully registered the package, the next step is to upload it. It is advisable to use twine, as it uploads over a verified connection and is more secure than the older setup.py upload approach.
$ twine upload -r pypitest -p <redacted> dist/*
Uploading distributions to https://testpypi.python.org/pypi
Uploading pathology-0.1-py3-none-any.whl
[================================] 5679/5679 - 00:00:02
Uploading pathology-0.1.tar.gz
[================================] 4185/4185 - 00:00:01
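To verify the upload, you can try installing the package from the test index using pip's --index-url option (the URL below matches the test server used above):
pip install --index-url https://testpypi.python.org/pypi pathology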
Conclusion
Many thanks for adding this read to your collection and following the series
all through from book one. You have been a rather fantastic audience.
Warmest regards,
Lewis Taylor