Three players when running a python script
python myscript.py
[Diagram: the Python interpreter runs the Python script, and the script imports other packages]
• Python interpreter: software to compile/execute the script.
• Python script: script you wrote
• Python packages: python libraries called by script
Two different ways to run a python script
python myscript.py
If the script has a shebang* line, you can also run the script like this:
./myscript.py
(the script must be executable: chmod +x myscript.py)
* The shebang line is the first line of a script and specifies the path of the interpreter, e.g. “#!/usr/bin/python3.9.6”;
** In Linux, a file extension like “.py” is ignored. It is the shebang line that defines the type of a script.
In Windows, the file name extension defines the script type.
In Linux, the shebang line defines the script type, whether it is a Python, R, Perl, or shell script.
Python script: bamCoverage.py
#!/usr/bin/python3.6
import sys
import deeptools.misc
from deeptools.bamCoverage import main   # assumed import, not shown on the slide
if __name__ == "__main__":
    args = None
    if len(sys.argv) == 1:
        args = ["--help"]
    main(args)
Two different formats of Shebang line
#!/usr/bin/python3.6      Full path of the Python interpreter
#!/usr/bin/env python3    Default python3 on the system, as defined in $PATH.
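To see which interpreter a script asks for, you can read its first line. The following is a minimal sketch; the helper name read_shebang and the throwaway demo script are assumptions made for illustration.

```python
import os
import tempfile


def read_shebang(path):
    """Return the interpreter command from a script's shebang line, or None."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().rstrip("\n")
    if not first_line.startswith("#!"):
        return None  # no shebang: the script must be run as "python script.py"
    return first_line[2:].strip()


# Demo on a throwaway script (hypothetical content written to a temp file).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("#!/usr/bin/env python3\nprint('hello')\n")
    demo_path = tmp.name

print(read_shebang(demo_path))  # /usr/bin/env python3
os.remove(demo_path)
```

A result of “/usr/bin/env python3” means the script uses whichever python3 comes first in $PATH; a full path pins one specific interpreter.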
Python interpreter & Python packages (libraries)
Which Python?
Multiple Python installations co-exist on the same computer. On BioHPC, we have v2.7.5, v2.7.15,
v3.6.7, v3.9.6. There are more versions of Python in Conda.
How to verify which Python is being used?
which python
python -V
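The same check can be done from inside Python, which is useful when a script is started through a shebang line or a Jupyter kernel and it is not obvious which interpreter is running. A minimal sketch; the function name describe_python is an assumption for illustration.

```python
import shutil
import sys


def describe_python():
    """Collect the interpreter actually running this code and what $PATH would pick."""
    return {
        "running_interpreter": sys.executable,        # path of the interpreter in use
        "version": sys.version.split()[0],            # e.g. "3.9.6"
        "python_on_path": shutil.which("python"),     # same answer as "which python"
        "python3_on_path": shutil.which("python3"),   # same answer as "which python3"
    }


for key, value in describe_python().items():
    print(key, "=", value)
```

If running_interpreter and python_on_path differ, the script is not being run by the Python that “which python” reports.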
Alternative ways to use a different version of Python:
• Shebang line:    #!/usr/bin/python3.9.6
• Add to PATH:     export PATH=/programs/python-3.9.6/bin:$PATH
• Linux Module:    module load python/3.9.6
Each Python has its own library directories, and a companion “pip” for library installation
For example:
Python      /usr/bin/python3.6
            Alias (symbolic link): /usr/local/bin/python
Pip         /usr/bin/pip3.6
            Alias (symbolic link): /usr/local/bin/pip
Packages    /usr/lib/python3.6/ & /usr/lib64/python3.6/
If you run “pip install” with the system Python as a regular user, you will get the error
message “permission denied”. Run “pip install --user” instead, which installs Python
packages under your home directory.
When running a script, Python looks for packages in three
different places, in this order. The first one found is used.
1. Directories defined in $PYTHONPATH
• Custom location, e.g. export PYTHONPATH=/workdir/lib:$PYTHONPATH. This is independent
of which “python” or which version of “python” you use.
2. $HOME/.local
• If you run “pip install --user packageName”, the packages are installed under $HOME/.local.
This is independent of which “python” you use, but different for each python version.
3. sys.path
• Each python installation has its own unique sys.path, e.g. /usr/lib/python3.6
(At run time Python merges all three into the list sys.path; “sys.path” here means the
installation's own library directories.)
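The three places can be printed from the interpreter you are actually using. A minimal sketch; the function name package_search_locations is an assumption for illustration.

```python
import os
import site
import sys


def package_search_locations():
    """Return the three places described above for the running interpreter."""
    pythonpath = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    return {
        "PYTHONPATH": pythonpath,                  # custom locations, if any
        "user_site": site.getusersitepackages(),   # where "pip install --user" puts packages
        "sys.path": list(sys.path),                # the full search list, in lookup order
    }


locations = package_search_locations()
print("PYTHONPATH:", locations["PYTHONPATH"])
print("User site :", locations["user_site"])
for directory in locations["sys.path"]:
    print("sys.path  :", directory)
```

Run it with each Python you have (for example python3.6 and python3.9) to see that the user site and sys.path differ per version while $PYTHONPATH stays the same.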
Install python software with Pip
(Pip downloads software from PyPI)
pip install deepTools
    Installed into: sys.path, e.g. /usr/lib/python3.6
    # you need write permission to the sys.path.
pip install deepTools --user
    Installed into: $HOME/.local
    # packages are only accessible by the user
pip install deepTools --prefix=/workdir/$USER
    Installed into: /workdir/$USER
    # when using this library, you need to specify it in $PYTHONPATH
Some other features of pip
1. Install a specific version of a python package
pip install --user deepTools==3.5.1
2. Upgrade a package to the latest version (by default, dependencies are upgraded only when the new version requires it)
pip install --upgrade deepTools
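After installing or upgrading, it can help to confirm which version the current interpreter actually sees. A minimal sketch using the standard library (Python 3.8 or later); the function name installed_version is an assumption for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "deepTools" is the package used in the examples above; the result is None
# if this interpreter cannot see it.
print("deepTools:", installed_version("deepTools"))
```

A result of None for a package you just installed usually means pip and python belong to different Python installations.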
Conda
What is Conda?
• Online software repository (independent from PyPI);
• A package manager for software installation;
• An environment manager for running software;
Why Conda?
Traditionally, Linux software is installed into these three directories:
/usr/bin    For executables
/usr/lib    For libraries
/etc        For config files
Only a system admin can install software into these directories.
Conda adds a directory where a user can install software
System directories: /usr/bin, /usr/lib
/home/qs24/miniconda3/
    bin    Python, pip and other executables go here
    lib    Python packages go here
    etc    Some config files go here
A regular user can install software into these directories.
Conda envs directory is a collection of multiple environments
Each software can have its own isolated environment.
System directories: /usr/bin, /usr/lib
$HOME/miniconda3/
    bin
    etc
    lib
    envs/
        env_1/
        env_2/
        env_3/
        ...
        env_n/
Each environment under envs has its own bin, etc and lib, so one environment can hold
Python3.6 while another holds Python3.9.
Each Conda environment has its own python, libraries
and companion pip
[Diagram: Conda packages can be installed in Conda base or in a Conda environment; each has its own python and pip]
Install software in Conda base vs Conda environment
Install under Conda base:
conda install -c bioconda deeptools
Create a Conda environment and install software:
conda create -c bioconda -n deeptools deeptools
• -c bioconda: name of the Conda channel. It is the place where conda finds the package.
• -n deeptools: name of the environment you will create. It can be any name.
• deeptools (last argument): name of the Conda package. This name must exist in the channel.
Activate/deactivate a Conda environment
Activate:
#activate conda base
source ~/miniconda3/bin/activate
#activate an environment
conda activate busco
or
#activate Conda and the environment in one step
source ~/miniconda3/bin/activate busco
De-activate:
conda deactivate
During installation, Conda offers to activate itself by default at every login. Don’t do
that! If you have already done that, disable it by modifying the .bashrc file.
Within a conda environment, you can run either
“conda install” or “pip install”.
# create and activate an environment, which only has python in it
conda create -n myEnv python=3.9
conda activate myEnv
# install deeptools in the environment
conda install -c bioconda deeptools #installation through the Anaconda repository (bioconda channel)
or
pip install deeptools #installation through the PyPI repository
Compatibility of software versions within a Conda environment
When depositing a software, the developer provides an installation recipe.
For example, the recipe for Deeptools:
run:
- deeptoolsintervals >=0.1.8
- matplotlib-base >=3.1.0
- numpy >=1.9.0
- plotly >=2.0.0
- py2bit >=0.2.0
- pybigwig >=0.2.3
- pysam >=0.14.0
- python >=3
- scipy >=0.17.0
When installing a software, the Conda package manager reads the recipe to determine which version to download:
• Check whether a package exists in the current environment;
• Find a package available in the repository and compatible with all software within the same environment.
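The first check, whether an installed package satisfies a recipe line, can be sketched as follows. This is a simplified illustration and not Conda's actual solver: it assumes purely numeric versions with at most four components, and the names parse_version, satisfies and check_recipe, as well as the installed versions, are hypothetical.

```python
import operator

# Supported comparison symbols; two-character symbols come first so that
# ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text, width=4):
    """Turn '1.9.0' into (1, 9, 0, 0) so versions of different lengths compare correctly."""
    parts = [int(piece) for piece in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed, constraint):
    """Return True if an installed version meets a constraint such as '>=1.9.0'."""
    for symbol, compare in OPERATORS:
        if constraint.startswith(symbol):
            required = constraint[len(symbol):]
            return compare(parse_version(installed), parse_version(required))
    raise ValueError("unsupported constraint: " + constraint)


def check_recipe(requirements, installed):
    """Label each recipe line as 'ok', 'missing' or 'incompatible'."""
    report = {}
    for line in requirements:
        name, constraint = line.split()
        if name not in installed:
            report[name] = "missing"
        elif satisfies(installed[name], constraint):
            report[name] = "ok"
        else:
            report[name] = "incompatible"
    return report


requirements = ["numpy >=1.9.0", "python >=3", "scipy >=0.17.0"]
installed = {"numpy": "1.14.3", "python": "3.9.6"}  # hypothetical environment
print(check_recipe(requirements, installed))
# {'numpy': 'ok', 'python': 'ok', 'scipy': 'missing'}
```

The real package manager has the harder job of choosing versions for the missing and incompatible entries so that every recipe in the environment is satisfied at once, which is why “solving” can take a long time.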
Conda as a package manager
conda create -n deeptools deeptools
Mamba, an alternative to the Conda package manager
Install mamba:
conda install mamba
Use mamba:
mamba install …
mamba create …
* Mamba is often much faster than conda and more robust.
A few tips for using Conda
Sometimes, a little intervention is needed.
For example, when “biopython” was upgraded to 1.77, it was no longer compatible with
“hicexplorer”. In this case, you need to explicitly specify a lower biopython version.
conda install -c bioconda hicexplorer biopython=1.76
* Afterwards, the hicexplorer developers noticed this problem and updated the recipe
to require biopython “<1.77”
You might need to update Conda software once in a while
conda update conda
Conda channels
conda install -c bioconda -c conda-forge deeptools
* conda-forge is more comprehensive, but less strictly managed.
Including conda-forge could take much longer to “solve packages”.
Troubleshooting Python
Step 1. verify which Python you are using
which python
Common errors:
1. You are using the wrong version of Python.
For example, running a Python2 script with Python3. You would see this
error message: SyntaxError: Missing parentheses in call to 'print'.
To fix:
module load python/2.7.15
2. A python module is missing, and you need to install it.
If you are using system Python
pip install --user theModuleName
If you are using Python in Conda
pip install theModuleName
3. You are using the wrong version of a Python module. You need to re-install the right version.
pip install theModuleName==3.12
* When running into a version issue, it is better to fix it within a Conda environment, to
avoid interference with other software.
If you installed the right version but still get an error message, you need to
verify which python module is actually being used
Python follows this order to find a library
1. echo $PYTHONPATH
2. ls -l ~/.local/lib
   (Under $HOME/.local, libraries for different python versions are separated)
3. Check which file is actually imported:
>>> import numpy
>>> print(numpy.__file__)
/usr/lib64/python2.7/site-packages/numpy/__init__.pyc
>>> print(numpy.__version__)
1.14.3
* run the “>>>” commands in the “python” prompt
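You can also ask Python where it would load a module from without importing it, which helps when the import itself fails. A minimal sketch for Python 3; the function name locate_module is an assumption for illustration.

```python
import importlib.util


def locate_module(name):
    """Return the file Python would load for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# "json" ships with Python; replace it with the module you are troubleshooting.
for module_name in ("json", "numpy"):
    print(module_name, "->", locate_module(module_name))
```

If the path printed is under $HOME/.local or a $PYTHONPATH directory rather than the installation you expected, an older copy is shadowing the version you installed.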
The most common error: you are in Conda base, but try to run
software not installed through Conda
Three possible Python setups:
• System default
• Conda base
• Conda environment
Typical case: you are in Conda base, but try to run a Python script installed by the BioHPC admin.
How to tell that you are in Conda?
Your shell prompt starts with “(base)”.
How to correct?
Edit the .bashrc file in your home directory. Insert a line with the word “return” before
“conda initialize”. Then logout and login again.
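Besides looking at the prompt, the environment variables that “conda activate” sets can be inspected. A minimal sketch; the function name conda_status is an assumption, while CONDA_DEFAULT_ENV and CONDA_PREFIX are the variables Conda sets on activation.

```python
import os
import sys


def conda_status(environ=None):
    """Report whether a Conda environment is active in the given environment variables."""
    environ = os.environ if environ is None else environ
    return {
        "active_env": environ.get("CONDA_DEFAULT_ENV"),  # e.g. "base", or None if not in Conda
        "env_prefix": environ.get("CONDA_PREFIX"),       # directory of the active environment
        "interpreter_prefix": sys.prefix,                # where the running Python lives
    }


status = conda_status()
if status["active_env"] is None:
    print("No Conda environment is active.")
else:
    print("Active Conda environment:", status["active_env"], "at", status["env_prefix"])
print("This Python is installed under:", status["interpreter_prefix"])
```

If a Conda environment is active but interpreter_prefix points to /usr, the script is running with the system Python while its packages may be looked up in the wrong place.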
Jupyter Notebook
Three ways to run Python:
Python shell, Python script and Jupyter Notebook (Jupyter Lab)
Python shell
Python script (run in Linux shell)
python myscript.py    (#!shebang line ignored)
or
./myscript.py         (#!shebang line defines which python interpreter to use)
Jupyter notebook (Jupyter Lab)
(https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=263#c )
Jupyter notebook runs Python through a web browser
http://cbsum1c2b010.biohpc.cornell.edu:8016/?token=72cc017561bd59ba4dab4a5604d7857c93dd8f68a45d520b
Client: your laptop                                Server: cbsum1c2b010.biohpc.cornell.edu
Putty (ssh)
    ssh cbsum1c2b010.biohpc.cornell.edu            ssh daemon       • Port 22     • Protocol: ssh
Browser (http)
    http://cbsum1c2b010.biohpc.cornell.edu         http daemon 1    • Port 80     • Protocol: http
    http://cbsum1c2b010.biohpc.cornell.edu:8009    http daemon 2    • Port 8009   • Protocol: http
Parts of the URL http://cbsum1c2b010.biohpc.cornell.edu:8009
http: communication protocol
cbsum1c2b010.biohpc.cornell.edu: server address
8009: port
• A ‘daemon’ is a software process that is continuously
running in the background, often listening to a port;
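The three parts of such a URL can be separated with Python's standard library. A minimal sketch; the function name describe_url and the small table of default ports are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Ports used when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22}


def describe_url(url):
    """Split a URL into (protocol, server address, port)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port


print(describe_url("http://cbsum1c2b010.biohpc.cornell.edu:8009"))
# ('http', 'cbsum1c2b010.biohpc.cornell.edu', 8009)
print(describe_url("http://cbsum1c2b010.biohpc.cornell.edu"))
# ('http', 'cbsum1c2b010.biohpc.cornell.edu', 80)
```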
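Server cbsum1c2b010:
User 1    cbsum1c2b010:8009    jupyter daemon 1 (port 8009)
User 2    cbsum1c2b010:8010    jupyter daemon 2 (port 8010)
User 3    cbsum1c2b010:8011    jupyter daemon 3 (port 8011)
Also running on the server: ssh daemon (port 22), rstudio daemon (port 8015)
With ssh and rstudio, one daemon can serve multiple users.
With Jupyter, each user has their own daemon on their own port.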
To start a Jupyter notebook daemon with default Python (v3.6)
It is important to keep the server daemon
running in a persistent “screen” session
screen
export PYTHONPATH=/programs/jupyter3/lib/python3.6/site-packages:/programs/jupyter3/lib64/python3.6/site-packages
export PATH=/programs/jupyter3/bin:$PATH
jupyter notebook --ip=0.0.0.0 --port=8017 --no-browser
You will be provided with a URL which you can open in a web browser:
http://cbsum1c2b010.biohpc.cornell.edu:8017/?token=dfe3b002ca2d7721c4a2c0c641de91645e74f59d6519e31b
How to use “screen”: https://biohpc.cornell.edu/lab/doc/Linux_exercise_part2.pdf
If you need a different version of Python,
install and run Jupyter with Conda or Docker
source ~/miniconda3/bin/activate         #activate Conda
conda create -n mypython3 python=3.8     #create a Conda environment “mypython3” with python v3.8
conda activate mypython3                 #activate mypython3 environment
mamba install -c conda-forge notebook    #install Jupyter Notebook. I use “mamba” here as it is a lot faster than Conda.
To run Jupyter installed in a Conda environment:
screen
source ~/miniconda3/bin/activate mypython3
jupyter notebook --ip=0.0.0.0 --port=8019 --no-browser
• On BioHPC, only ports between 8009-8039 are open to users;
• Check if a port is already being used: netstat -tulpn | grep 8019
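The port check can also be scripted, for example to pick the first free port in the range open to users. A minimal sketch, meant to be run on the server itself; the function names port_in_use and first_free_port are assumptions, and the probe only tests IPv4 on the local machine.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something on this machine accepts connections on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. a daemon is listening.
        return probe.connect_ex((host, port)) == 0


def first_free_port(start=8009, end=8039):
    """Return the first port in the inclusive range that nothing is listening on, or None."""
    for port in range(start, end + 1):
        if not port_in_use(port):
            return port
    return None


print("First free port in 8009-8039:", first_free_port())
```

The port printed can then be passed to “jupyter notebook --port=...”. Another user could still take the port between the check and the start of your daemon, in which case Jupyter reports that the port is in use.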
In summary
Installing Python software
Python software repository:
    PyPI
    Anaconda
Installation package manager:
    Pip
    Conda or Mamba
Installation directory:
    Pip: sys.path or ~/.local (--user option)
    Conda: Conda base and Conda environment
Running Python software
Which python interpreter?
    which python
    python -V
    #check shebang line of the script
Which python package?
Some afterthoughts
Why is it so complicated?
Because a server is shared by many people and many
applications. To work peacefully together, we have to follow
certain rules.
Maybe someday computers will be cheap enough that I
can have a dedicated computer for each job.
Not likely in the near future.
… But wait, we have something that is close enough, “Docker”
and “Singularity”.