python_scripts_modules.rst

/notes/python_scripts_modules.rst

https://bitbucket.org/andybaoxv/uwhpsc
ReStructuredText | 407 lines | 292 code | 115 blank | 0 comment | 0 complexity | 9547cde5001f5e6ed17fcc65ad97c219 MD5 | raw file


.. _python_scripts_modules:

=============================================================
Python scripts and modules
=============================================================

A Python script is a collection of commands in a file designed to be
executed like a program.  The file can of course contain functions and
import various modules, but the idea is that it will be run or executed
from the command line or from within a Python interactive shell to perform a
specific task.  Often a script first contains a set of function definitions
and then has the *main program* that might call the functions.  

Consider this script,  found in $UWHPSC/python/script1.py:

**script1.py**

.. literalinclude:: ../codes/python/script1.py
   :language: python
   :linenos:


The *main program* starts with the print statement.

There are several ways to run a script contained in a file.  

At the Unix prompt::

     $ python script1.py
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000

From within Python::

    >>> execfile("script1.py")
    [same output as above]

From within IPython, using either `execfile` as above, or `run`::

    In [48]: run script1.py
    [same output as above]

Or, you can `import` the file as a module (see :ref:`importing_modules`
below for more about this)::

    >>> import script1
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000

Note that this also gives the same output.  Whenever a module is imported,
any statements that are in the main body of the module are executed when it
is imported.  In addition, any variables or functions defined in the file
are available as attributes of the module, e.g., ::

    >>> script1.f(4)
    17.0

    >>> script1.np
    <module 'numpy' from
    '/Library/Python/2.5/site-packages/numpy-1.4.0.dev7064-py2.5-macosx-10.3-fat.egg/numpy/__init__.pyc'>


Note there are some differences between executing the script and importing
it.  When it is executed as a script, 
it is as if the commands were typed at the command line.  Hence::

    >>> execfile('script1.py')
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000

    >>> f
    <function f at 0x1c0430>

    >>> np
    <module 'numpy' from
    '/Library/Python/2.5/site-packages/numpy-1.4.0.dev7064-py2.5-macosx-10.3-fat.egg/numpy/__init__.pyc'>

In this case `f` and `np` are in the namespace of the interactive session as
if we had defined them at the prompt.

.. _python_name_main:

Writing scripts for ease of importing
-------------------------------------

The script used above as an example contains a function `f(x)` that we might
want to be able to import without necessarily running the *main program*.
This can be arranged by modifying the script as follows:

**script2.py**

.. literalinclude:: ../codes/python/script2.py
   :language: python
   :linenos:

When a file is imported or executed, an attribute `__name__` is
automatically set, and has the value `__main__` only if the file is executed
as a script, not if it is imported as a module.  So we see the following
behavior::

     $ python script2.py 
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000

as with `script1.py`, but::

    >>> import script2           # does not print table

    >>> script2.__name__
    'script2'                    # not '__main__'

    >>> script2.f(4)
    17.0

    >>> script2.print_table()
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000


.. _python_reload:

Reloading modules
-----------------

When you import a module, Python keeps track of the fact that it is imported
and if it encounters another statement to import the same module will not
bother to do so again (the list of modules already import is in
`sys.modules`).  This is convenient since loading a module can be
time consuming.  So if you're debugging a script using `execfile` or `run`
from an IPython shell, each time you change it and then re-execute it will
not reload `numpy`, for example.

Sometimes, however, you want to force reloading of a module, in particular
if it has changed (e.g. when we are debugging it).  

Suppose, for example, that we modify `script2.py` so
that the quadratic function is changed from `y = x**2 + 1 ` to 
`y = x**2 + 10`.  
If we make this change and then try the following (in the same Python
session as above, where `script2` was already imported as a module)::

    >>> import script2

    >>> script2.print_table()
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000

we get the same results as above, even though we changed `script2.py`.  

We have to use the `reload` command to see the change we want::

    >>> reload(script2)
    <module 'script2' from 'script2.py'>

    >>> script2.print_table()
         x        f(x)
       0.000    10.000
       2.000    14.000
       4.000    26.000

.. _python_argv:

Command line arguments
----------------------

We might want to make this script a bit fancier by adding an optional argument
to the `print_table` function to print a different number of points, rather
than the 3 points shown above.

The next version has this change, and also has a modified version of the
main program that allows the user to specify this value `n` as a command
line argument:

**script3.py**

.. literalinclude:: ../codes/python/script3.py
   :language: python
   :linenos:

Note that:

 * The function `sys.argv` from the `sys` module returns the arguments that
   were present if the script is executed from the command line.  It is a
   list of strings, with `sys.argv[0]` being the name of the script itself,
   `sys.argv[1]` being the next thing on the line, etc. (if there were more
   than one command line argument, separated by spaces).

 * We use `int(sys.argv[1])` to convert the argument, if present, from a 
   string to an integer.

 * We put this conversion in a try-except block in case the user gives an
   invalid argument.

Sample output::

    $ python script3.py
         x        f(x)
       0.000     1.000
       2.000     5.000
       4.000    17.000

    $ python script3.py 5
         x        f(x)
       0.000     1.000
       1.000     2.000
       2.000     5.000
       3.000    10.000
       4.000    17.000

    $ python script3.py 5.2
    *** Error: expect an integer n as the argument

.. _importing_modules:

Importing modules
-----------------

When Python starts up there are a certain number of basic commands defined
along with the general syntax of the language, but most useful things needed
for specific purposes (such as working with webpages, or solving linear
systems) are in *modules* that do not load by default.  Otherwise it would
take forever to start up Python, loading lots of things you don't plan to
use.  So when you start using Python, either interactively or at the top of
a script, often the  first thing you do is *import* one or more modules.

A Python module is often defined simply by grouping a set of parameters and
functions together in a single .py file.  
See :ref:`python_scripts_modules` for some examples.

Two useful modules are *os* and *sys* that help you interact with the
operating system and the Python system that is running.  These are standard
modules that should be available with any Python implementation, so you
should be able to import them at the Python prompt::

    >>> import os, sys

Each module contains many different functions and parameters which are the
*methods* and *attributes* of the module.   Here we will only use a couple
of these.  The
*getcwd* method of the os module is called to return the "current working
directory"  (the same thing *pwd* prints in Unix), e.g.::

    >>> os.getcwd()
    /home/uwhpsc/uwhpsc/codes/python

Note that this function is called with no arguments, but you need the open
and close parens.  If you type "os.getcwd" without these, Python will
instead print what type of object this function is::

    >>> os.getcwd
    <built-in function getcwd>

.. python_path:

The Python Path
---------------

The *sys* module has an attribute *sys.path*, a variable that is set by
default to the search path for modules.  Whenever you perform an *import*,
this is the set of directories that Python searches through looking for a
file by that name (with a .py extension).  If you print this, you will see a
list of strings, each one of which is the full path to some directory.
Sometimes the first thing in this list is the empty string, which means "the
current directory", so it looks for a module in your working directory first
and if it doesn't find it, searches through the other directories in order:

    >>> print sys.path
    ['', '/usr/lib/python2.7', ....]

If you try to import a module and it doesn't find a file with this name on
the path, then you will get an import error::

    >>> import junkname
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named junkname

When new Python software such as NumPy or SciPy is installed, the
installation script should modify the path appropriately so it can be found.
You can also add to the path if you have your own directory that you want
Python to look in, e.g.::

    >>> sys.path.append("/home/uwhpsc/mypython")

will append the directory indicated to the path.  To avoid having to do this
each time you start Python, you can set a Unix environment variable that
is used to modify the path every time Python is started.  First print out
the current value of this variable::

    $ echo $PYTHONPATH

It will probably be blank unless you've set this before or have installed
software that sets this automatically.  
To append the above example directory to this path::

    $ export PYTHONPATH=$PYTHONPATH:/home/uwhpsc/mypython

This appends another directory to the search path already specified (if any).
You can repeat this multiple times to add more directories, or put something
like::

    export PYTHONPATH=$PYTHONPATH:dir1:dir2:dir3

in your *.bashrc* file if there are the only 3 personal 
directories you always want to search.


Other forms of import
----------------------------------------

If all we want to use from the *os* module is *getcwd*, then another option
is to do::

    >>> from os import getcwd
    >>> getcwd()
    '/Users/rjl/uwhpsc/codes/python'

In this case we only imported one method from the module, not the whole
thing.  Note that now *getcwd* is called by just giving the name of the
method, not *module.method*.  The name *getcwd* is
now in our *namespace*.  If we only imported *getcwd* and tried typing 
"os.getcwd()" we'd get an error, since it wouldn't find *os* in our
namespace.

You can rename things when you import them, which is sometimes useful if
different modules contain different objects with the same name.
For example, to compare how the `sqrt` function in the standard Python math
module compares to the numpy version::

    >>> from math import sqrt as sqrtm
    >>> from numpy import sqrt as sqrtn

    >>> sqrtm(-1.)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: math domain error

    >>> sqrtn(-1.)
    nan

The standard function gives an error whereas the *numpy* version returns
*nan*, a special *numpy* object representing "Not a Number".


You can also import a module and give it a different name locally.  This is
particularly useful if you import a module with a long name, but even for
*numpy* many examples you'll find on the web abbreviate this as *np*
(see :ref:`numerical_python`)::

    >>> import numpy as np
    >>> theta = np.linspace(0., 2*np.pi, 5)
    >>> theta
    array([ 0.        ,  1.57079633,  3.14159265,  4.71238898,  6.28318531])

    >>> np.cos(theta)
    array([  1.00000000e+00,   6.12323400e-17,  -1.00000000e+00, -1.83697020e-16,   1.00000000e+00])

If you don't like having to type the module name repeatedly you can import
just the things you need into your namespace::

    >>> from numpy import pi, linspace, cos
    >>> theta = linspace(0., 2*pi, 5)
    >>> theta
    array([ 0.        ,  1.57079633,  3.14159265,  4.71238898,  6.28318531])
    >>> cos(theta)
    array([  1.00000000e+00,   6.12323400e-17,  -1.00000000e+00, -1.83697020e-16,   1.00000000e+00])


If you're going to be using lots of things form *numpy* you might want to
import everything into your namespace::

    >>> from numpy import *

Then *linspace*, *pi*, *cos*, and several hundred other things will be available
without the prefix.

When writing code it is often best to not do this, however, since then it is
not clear to the reader (or even to the programmer sometimes)
what methods or attributes are coming from which module if several
different modules are being used. (They may define methods with the same
names but that do very different things, for example.)

When using IPython, it is often convenient to start it with::

    $ ipython --pylab

This automatically imports everything from *numpy* into the namespace, and
also all of the plotting tools from *matplotlib*.  

Further reading
----------------

`Modules section of Python documentation
<http://docs.python.org/2/tutorial/modules.html>`_