1.5. Basics of Code Organization
Back in Programming Basics, we wrote our first program:
print("Hello, world!")
And saw that we could run it directly in the Python interpreter, like this:
>>> print("Hello, world!")
Hello, world!
Or by placing the code in a file called hello.py, and running that file
from the terminal:
$ python hello.py
Hello, world!
In the previous chapter, we also saw that it is possible to write function
definitions in a Python file, such as the primes.py file that contained
our is_prime function, and then “import” the contents of that file
into the interpreter:
>>> import primes
>>> primes.is_prime(4)
False
>>> primes.is_prime(7)
True
So, it seems that Python files can serve two purposes: I can
run a Python file, which may produce some result (like printing Hello, world!), or
I can import a Python file, which allows me to use functions that are defined inside
that file. In this chapter, we will expand on this distinction (running vs. importing)
by introducing the notion of Python modules, and providing an introduction to
how to organize Python code in a program.
1.5.1. Python modules
A Python module is, quite simply, a file containing Python code. The hello.py file
mentioned above is a Python module, as is the primes.py file from the previous
chapter. However, as we saw earlier, we used each of these files in very different ways:
we “ran” the hello.py file, but we imported the primes.py file. To
explore this distinction, we’ll start by elaborating on what it means to “import” a module.
To do this, we will use two
modules which you can find in our example code: getting-started/code-organization/primes.py
and getting-started/code-organization/mersenne.py. In Python, modules are usually referred
to without the .py extension, so we will refer to these as the “primes module” and the
“mersenne module”.
If you look at the primes module, you’ll see it contains two functions, print_primes
and is_prime:
def print_primes(max_n):
    """
    Print the primes between 2 and max_n inclusive.

    Args:
        max_n (int): the upper bound on the range

    Returns: None
    """
    # code omitted

def is_prime(n):
    """
    Is n a prime number?

    Args:
        n (int): the value to check

    Returns (bool): True if n is prime and False otherwise.
    """
    # code omitted
The actual implementation of these functions won’t be relevant to our discussion of modules,
as we’ll be focusing on how these functions are called across modules, and not on how they
work internally. So, we won’t be digging into their implementation, but we nonetheless
encourage you to take a quick look at the implementation of the is_prime function:
it is a bit more complex than the one we saw in the previous chapter, because
we need a faster algorithm for this chapter’s example, and it provides several examples
of using conditional and looping statements in a non-trivial way.
As we’ve seen previously, we can use the import statement to access the contents of
the primes module from the interpreter:
>>> import primes
>>> primes.is_prime(4)
False
>>> primes.is_prime(7)
True
The import statement also allows us to import only specific functions from a module,
like this:
>>> from primes import is_prime
>>> is_prime(17)
True
>>> is_prime(42)
False
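The two styles of import differ only in which name ends up available to us. The following sketch illustrates this with the standard math module (used here instead of primes so that the example runs anywhere):

```python
import math              # makes the name "math" available
from math import sqrt    # makes only the name "sqrt" available

# With "import math", we reach functions through the module name.
print(math.sqrt(16))

# With "from math import sqrt", we call the function directly.
print(sqrt(16))

# Both names refer to the very same function.
print(math.sqrt is sqrt)
```

Both calls print 4.0, and the final line prints True.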
We can also import modules from other modules. In particular, the mersenne
module provides a number of functions related to Mersenne primes, or prime numbers of the
form \(2^p-1\), where \(p\) is itself a prime number. One of these functions
needs to call the is_prime function, located in the primes module, so we will
need to import the primes module from the mersenne module, which looks something
like this:
import primes

def is_mersenne_prime_exponent(p):
    # docstring and code omitted

def get_power_of_two_exponent(n):
    # docstring and code omitted

def print_mersenne_primes(max_p):
    """
    Print the mersenne primes that have an exponent in the range from
    1 to max_p (non-inclusive).

    Args:
        max_p (int): the upper bound (non-inclusive) for the range of
            exponents to consider.
    """
    i = 1
    for p in range(max_p):
        if not primes.is_prime(p):
            continue
        if is_mersenne_prime_exponent(p):
            m = 2 ** p - 1
            print(f"M{i}: {m} = 2**{p} - 1")
            i += 1
As with the primes module, the exact implementation of these
functions won’t be relevant to our discussion, but notice how the
print_mersenne_primes function uses the is_prime function from
the primes module.
We are able to use this function because we included the import primes
statement at the top of the mersenne module. If you look
at mersenne.py, you’ll see that the math library is also
imported, because the get_power_of_two_exponent function (which is
not shown) uses math.log2.
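To give a rough idea of how math.log2 could be used here, the following is a minimal sketch of such a function. It is our own guess, not the code in mersenne.py: we assume the function returns the exponent p when n is exactly 2**p, and None otherwise.

```python
import math


def get_power_of_two_exponent(n):
    """
    Hypothetical sketch: return p if n == 2**p, and None otherwise.
    """
    if n < 1:
        return None
    # math.log2 returns a float, so we round it to the nearest integer...
    p = round(math.log2(n))
    # ...and then verify the candidate with exact integer arithmetic,
    # because floating-point results can be imprecise for large n.
    if 2 ** p == n:
        return p
    return None


print(get_power_of_two_exponent(128))
print(get_power_of_two_exponent(100))
```

The first call prints 7 (since 128 is 2**7), and the second prints None.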
Now, let’s try using a function from the mersenne module
from the interpreter. Before doing so, exit the interpreter and start it again,
so we can make sure the previous import primes statement we ran from
the interpreter isn’t affecting this next example.
>>> import mersenne
>>> mersenne.print_mersenne_primes(100)
M1: 3 = 2**2 - 1
M2: 7 = 2**3 - 1
M3: 31 = 2**5 - 1
M4: 127 = 2**7 - 1
M5: 8191 = 2**13 - 1
M6: 131071 = 2**17 - 1
M7: 524287 = 2**19 - 1
M8: 2147483647 = 2**31 - 1
M9: 2305843009213693951 = 2**61 - 1
M10: 618970019642690137449562111 = 2**89 - 1
Notice how we are able to call the print_mersenne_primes function in the mersenne module,
which internally requires using the is_prime function, located in a different module.
However, we are not required to run import primes ourselves in the interpreter,
because it is already being imported from inside the mersenne module.
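One subtlety is worth noting: when a module imports another module, the second module gets loaded, but its name does not become available in our own code. The sketch below illustrates this with two small stand-in modules written to a temporary directory. The names inner_demo and outer_demo are made up for this example, and sys.path (the list of directories Python searches for modules) is adjusted so Python can find them.

```python
import sys
import tempfile
from pathlib import Path

# Create two tiny stand-in modules in a temporary directory.
# "inner_demo" and "outer_demo" are hypothetical names for this sketch.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "inner_demo.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)
(module_dir / "outer_demo.py").write_text(
    "import inner_demo\n"
    "\n"
    "def quadruple(x):\n"
    "    return inner_demo.double(inner_demo.double(x))\n"
)

# Let Python find the modules we just created.
sys.path.insert(0, str(module_dir))

import outer_demo

# outer_demo can use inner_demo internally (this prints 12)...
print(outer_demo.quadruple(3))

# ...and Python has loaded inner_demo as a side effect...
print("inner_demo" in sys.modules)

# ...but the name inner_demo was never defined in our own code.
print("inner_demo" in globals())
```

If we wanted to call double ourselves, we would still need to run import inner_demo in our own code.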
1.5.2. Running vs importing a module
So far, we’ve seen that Python modules can be imported from the interpreter and from
other modules, but modules can also be run from the command-line. To better understand
this distinction, we will use an arithmetic module which you can find in the
getting-started/code-organization/ directory of the examples. You’ll see that there
are three versions of this module: arithmetic, arithmetic_nomain, and arithmetic_main.
We’ll start by looking at the arithmetic module, which contains two very simple functions:
def add(x, y):
    """ Add x and y """
    return x + y

def multiply(x, y):
    """ Multiply x and y """
    return x * y
As expected, we can import this module and use it from the interpreter:
>>> import arithmetic
>>> arithmetic.add(2, 10)
12
>>> arithmetic.multiply(2, 10)
20
But what happens if we run this module from the command-line?
$ python3 arithmetic.py
$
Nothing happens: Python returns immediately. The reason for this is
that Python ran through all the code in the arithmetic.py file, and only encountered
function definitions. Python internally makes a note that these functions have been defined,
but there are no statements in the file that would make Python do something, like
print a message or call the functions.
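The difference between defining a function and calling it can be seen in this small sketch (the greet function is made up for this example):

```python
def greet():
    print("Hello from greet!")

# At this point, nothing has been printed: the def statement above only
# created a function and bound it to the name "greet".
print(callable(greet))

# The body of the function only runs when we call it.
greet()
```

Running this prints True, followed by the greeting. If we removed the last two statements, the file would behave just like arithmetic.py: Python would define greet and then exit without printing anything.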
So, let’s take a look at this slightly modified version, arithmetic_nomain:
def add(x, y):
    """ Add x and y """
    return x + y

def multiply(x, y):
    """ Multiply x and y """
    return x * y

a = add(2, 10)
m = multiply(2, 10)
print("add(2, 10) =", a)
print("multiply(2, 10) =", m)
This version includes some statements after the function definitions. If we run this module we’ll see the following:
$ python3 arithmetic_nomain.py
add(2, 10) = 12
multiply(2, 10) = 20
$
What’s happening here is that Python runs through the code, makes a note that the functions
add and multiply have been defined, and then encounters code that calls those functions
and prints something, and runs that code as well.
This may seem like a convenient way to define a few functions, and then include some basic code to informally test those functions, but there is a snag: that code will also run when we import the module, resulting in this:
>>> import arithmetic_nomain
add(2, 10) = 12
multiply(2, 10) = 20
>>> arithmetic_nomain.add(2, 10)
12
>>> arithmetic_nomain.multiply(2, 10)
20
The reason for this is that importing a module also causes Python to run through
all the code in the corresponding Python file, which is why we are then able
to use the functions defined in that module. However, we may want to be selective
about what code is run exactly and, in particular, we may want to separate out
the code that should only run when the module is run from the command line.
We can do this by placing the code in a main block.
We can see what this looks like in the arithmetic_main module:
def add(x, y):
    """ Add x and y """
    return x + y

def multiply(x, y):
    """ Multiply x and y """
    return x * y

if __name__ == "__main__":
    a = add(2, 10)
    m = multiply(2, 10)
    print("add(2, 10) =", a)
    print("multiply(2, 10) =", m)
When we run it, the code under if __name__ == "__main__": runs as expected:
$ python3 arithmetic_main.py
add(2, 10) = 12
multiply(2, 10) = 20
$
But that code won’t be run if we import the module:
>>> import arithmetic_main
>>> arithmetic_main.add(2, 10)
12
>>> arithmetic_main.multiply(2, 10)
20
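The main block works because Python sets a special variable, __name__, in every module: its value is the string "__main__" when the file is being run, and the module’s own name when the file is being imported. The sketch below shows both cases with a made-up module called name_demo. It uses the standard runpy module to approximate what happens when a file is run from the command line.

```python
import runpy
import sys
import tempfile
from pathlib import Path

# Write a tiny module that records the value of __name__.
# "name_demo" is a hypothetical module name used only for this sketch.
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "name_demo.py"
module_path.write_text("seen_name = __name__\n")

# Case 1: importing the module. __name__ is the module's own name.
sys.path.insert(0, str(module_dir))
import name_demo
print(name_demo.seen_name)

# Case 2: running the file, roughly what "python3 name_demo.py" does.
# runpy.run_path returns the variables defined by the code it executed.
result = runpy.run_path(str(module_path), run_name="__main__")
print(result["seen_name"])
```

The first print shows name_demo and the second shows __main__, which is why the condition in the main block is only true when the file is run.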
All that said, this doesn’t mean that every Python module has to have a main block. In the next section, we will elaborate on what a computer program is, and how Python programs often span multiple modules (where typically only one module will have a main block).
Reloading modules
When you import a module from the interpreter, Python will import the current version of that module, and won’t track changes in that module. This means that, if you make a change to the module, that change won’t automatically propagate to the interpreter. For example, try doing the following:
>>> import arithmetic
>>> arithmetic.add(2, 10)
12
Now, try modifying the arithmetic.py file so that the add function
looks like this:
def add(x, y):
    return x + y + 1
Let’s also add the following function:
def subtract(x, y):
    return x - y
If you try to use the add function, you’ll see it still behaves according
to the original (correct) version. Python will also tell you it can’t find
a subtract function:
>>> arithmetic.add(2, 10)
12
>>> arithmetic.subtract(100, 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'arithmetic' has no attribute 'subtract'
Interestingly, importing the module again won’t actually resolve the situation:
>>> import arithmetic
>>> arithmetic.add(2, 10)
12
>>> arithmetic.subtract(100, 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'arithmetic' has no attribute 'subtract'
You need to explicitly reload the module using Python’s built-in importlib module:
>>> import importlib
>>> importlib.reload(arithmetic)
<module 'arithmetic' from 'arithmetic.py'>
>>> arithmetic.add(2, 10)
13
>>> arithmetic.subtract(100, 10)
90
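The whole sequence can also be reproduced in a single script. The following sketch writes a made-up module, reload_demo, to a temporary directory, edits it, and reloads it. Turning off bytecode caching is our own precaution so that the reload always reads the current source file; it is not something the steps above require.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing cached bytecode, so that the reload below always reads
# the current source file (a precaution for this sketch).
sys.dont_write_bytecode = True

# "reload_demo" is a hypothetical module name used only for this sketch.
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "reload_demo.py"
module_path.write_text(
    "def add(x, y):\n"
    "    return x + y\n"
)
sys.path.insert(0, str(module_dir))

import reload_demo
before = reload_demo.add(2, 10)

# Modify the file: change add and introduce subtract.
module_path.write_text(
    "def add(x, y):\n"
    "    return x + y + 1\n"
    "\n"
    "def subtract(x, y):\n"
    "    return x - y\n"
)

# Importing again has no effect, because the module is already loaded.
import reload_demo
after_reimport = reload_demo.add(2, 10)
has_subtract = hasattr(reload_demo, "subtract")

# Reloading re-executes the file and updates the existing module.
importlib.reload(reload_demo)
after_reload = reload_demo.add(2, 10)

print(before, after_reimport, has_subtract, after_reload)
print(reload_demo.subtract(100, 10))
```

This prints 12 12 False 13, followed by 90: only the explicit reload picks up the modified add and the new subtract.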
1.5.3. Computer programs revisited
In Programming Basics, we said that “a computer program is, at its
core, a collection of instructions that the computer must perform”. We now have a better
sense of what these instructions look like: if statements, for and while loops,
assignments, function definitions, function calls, etc. And, as we’ve seen a few times
already, the vessel for these instructions is a text file with a name ending in .py, i.e.,
a Python module.
However, this doesn’t mean that a program is composed of exactly one module. While simple
programs can often be implemented in one module, it is very common for programs to span
multiple modules. To see an example of this, take a look at the prime-checker module
in the getting-started/code-organization/ examples directory. This is a module
with a main block that asks the user to enter a number, and will then print out
some information about whether the number is prime or not, and whether it is a Mersenne
prime or not:
"""
Simple program for checking primes.
"""
from primes import is_prime
from mersenne import is_mersenne_prime_exponent, get_power_of_two_exponent

if __name__ == "__main__":
    n = input("Enter a number: ")
    n = int(n)
    if not is_prime(n):
        print(f"{n} is not a prime number.")
    else:
        p = get_power_of_two_exponent(n + 1)
        if p is not None:
            if is_mersenne_prime_exponent(n):
                print(f"{n} is a double Mersenne prime: both {n}")
                print(f" and 2**{n} - 1 are both Mersenne primes.")
            else:
                print(f"{n} is a Mersenne prime ({n} == 2**{p} - 1).")
        else:
            if is_mersenne_prime_exponent(n):
                print(f"{n} is a prime number, but not a Mersenne prime")
                print(f" (however, 2**{n} - 1 is a Mersenne prime).")
            else:
                print(f"{n} is a prime number, but not a Mersenne prime")
                print(f" and neither is 2**{n} - 1.")
Here are some sample runs of this program:
$ python3 prime-checker.py
Enter a number: 16
16 is not a prime number.
$ python3 prime-checker.py
Enter a number: 23
23 is a prime number, but not a Mersenne prime
 and neither is 2**23 - 1.
$ python3 prime-checker.py
Enter a number: 61
61 is a prime number, but not a Mersenne prime
 (however, 2**61 - 1 is a Mersenne prime).
$ python3 prime-checker.py
Enter a number: 127
127 is a double Mersenne prime: both 127
 and 2**127 - 1 are both Mersenne primes.
All the logic involved in actually testing each number’s primality is
contained in two other modules: primes and mersenne. Our
program, thus, spans three modules: prime-checker, primes, and
mersenne.
This is a further example of decomposition: while we could have placed all the
code in a single module, dividing it into distinct modules, each with a related
set of functions, makes the code more manageable. It also improves the reusability
of our code: if we wanted to write a different program that involves checking a
number’s primality, all we need to do is import our primes module.
All that said, this doesn’t mean that “a collection of modules” is a program.
A program is specifically something that is executable, which is the technical term for
“something I can run” (in the manner we’ve described above, as opposed to
just importing a module). In a Python program, this often means that at least
one of the modules must include a __main__ block.
On the other hand, when we have a collection of modules that provides some useful functionality,
but which is not executable, that is what we would call a software library, or simply a library.
For example, we could distribute the primes and mersenne modules as a “prime number library”.
Neither of these modules has a __main__ block and that is totally fine: these
modules are not meant to be run but, rather, to be imported by other modules.
While we may not develop that many libraries ourselves, we will almost certainly use existing libraries in our code. In particular, Python itself includes a vast collection of modules, called the Python Standard Library, that we can use in our code and which we describe next.
1.5.4. The Python Standard Library
When you install Python on your computer, you are not only getting a Python interpreter, but also access to a huge collection of modules that is already included with Python. This is known as the Python Standard Library, or PSL, and it provides all sorts of functionality that can come in handy when writing our code. It’s hard to overstate how large and useful this library is: it includes modules for most common tasks you can imagine, such as string processing, math functions, file and directory access, network utilities, and much, much more. You can see the full content of the PSL here: https://docs.python.org/3/library/
We have actually already used two of the modules included with the PSL: the random module
and the math module. Now that we know what modules are, we can better understand what is
happening when we do this:
>>> import random
>>> random.randint(1,100)
65
Based on what we saw earlier in this chapter, it would seem that running import random requires
that there be a random.py file in the same directory as our code. Python will look for
a random.py file in the same directory as our code first but, if it does not find it,
it will check whether the PSL includes such a module (which it does). So, somewhere
in the official Python code, there is a random.py file containing a bunch of functions
related to random number generation, including a randint function (the real random.py
is a bit more complicated, and involves classes and objects, which we have not yet
seen).
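We can ask Python which file it actually loaded, and which directories it searches. Modules loaded from a file have a __file__ attribute, and the search order is stored in the list sys.path (by default, the directory of our code comes first). The output depends on your installation, so none is shown here.

```python
import random
import sys

# The file that "import random" actually loaded.
print(random.__file__)

# The directories Python searches, in order, when importing a module.
for directory in sys.path:
    print(directory)
```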
Where exactly is the PSL?
When you install any piece of software, some of that software will usually be installed in a “system directory”, separate from the directories where you (a regular user) keep your own files. Without going too deep into how operating systems organize their filesystems, it is enough to know that the location of these system directories is well-known by applications running on your computer, including the Python interpreter.
This means that the Python interpreter knows to search through
those system directories if we ask it to import a module (and that module can’t be found
in the same directory as our code). For example, this is where you would find the
random.py file in most operating systems:
Windows:
C:\Program Files\Python 3.8\lib\random.py
MacOS:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/random.py
Linux:
/usr/lib/python3.8/random.py
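The exact location varies with the Python version and with how Python was installed, so rather than relying on the paths above, we can ask the interpreter itself. A minimal sketch using the standard sysconfig module:

```python
import sysconfig

# Directory holding the standard library modules written in Python,
# for the interpreter that is running this code.
stdlib_dir = sysconfig.get_path("stdlib")
print(stdlib_dir)
```

On a typical installation, the random.py file discussed above is located inside the directory that this prints.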