Other topics
Caching function calls (“memoization”)
Simple example
Consider the following function:
import time
def slow(x):
time.sleep(1) # mimicking some heavy computation
return x
Calling it 20 times will result in a 20-sec wait:
%%time
for i in range(10):
slow(2)
slow(3)
Wall time: 20.1 s
In reality there are only 2 functions calls: slow(2)
and slow(3)
, and everything else is just a repeat.
Now let’s cache our function calls with a decorator:
import collections
import functools
class memoized(object):
'''Decorator. Caches a function's return value each time it is called.
If called later with the same arguments, the cached value is returned
(not reevaluated).
'''
def __init__(self, func):
self.func = func
self.cache = {}
def __call__(self, *args):
if not isinstance(args, collections.abc.Hashable):
# uncacheable. a list, for instance.
# better to not cache than blow up.
return self.func(*args)
if args in self.cache:
return self.cache[args]
else:
value = self.func(*args)
self.cache[args] = value
return value
def __repr__(self):
'''Return the function's docstring.'''
return self.func.__doc__
def __get__(self, obj, objtype):
'''Support instance methods.'''
return functools.partial(self.__call__, obj)
@memoized
def fast(x):
time.sleep(1) # mimicking some heavy computation
return x
The function will be called twice, and the other 18 times it’ll reuse previously computed results:
%%time
for i in range(10):
fast(2)
fast(3)
Wall time: 2.01 s
Badly-coded Fibonacci problem
Consider a top-down (and thus terribly inefficient) Fibonacci number calculation:
def fibonacci(n):
if n == 0:
return 0
elif n == 1:
return 1
return fibonacci(n-1) + fibonacci(n-2)
%%timeit
fibonacci(30)
181 ms ± 2.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The total number of fibonacci()
calls in this example is 2,692,537, and it scales exponentially towards
lower n
, e.g. f(1)
is evaluated 832,040 times. It turns out it is possible to reuse previous function
calls and only compute new ones using the decorator defined earlier:
@memoized
def fastFibonacci(n):
if n == 0:
return 0
elif n == 1:
return 1
return fastFibonacci(n-1) + fastFibonacci(n-2)
%%timeit
fastFibonacci(30)
268 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
We have a speedup by a factor of 675,000! This is even better than the anticipated speedup of $2,692,537/30 \approx 89,751$ (from the reduction in the number of function calls) which probably tells us something about the overhead of managing many thousands of simultaneous nested function calls in Python in the first place.
Programming Style and Wrap-Up
- Comment your code as much as possible.
- Use meaningful variable names.
- Very good idea to break complex programs into blocks using functions.
- Change one thing at a time, then test.
- Use version control.
- Use docstrings to provide online help:
def average(values):
"Return average of values, or None if no values are supplied."
if len(values) > 0:
return(sum(values)/len(values))
print(average([1,2,3,4]))
print(average([]))
help(average)
def moreComplexFunction(values):
"""This string spans
multiple lines.
Blank lines are allowed."""
help(moreComplexFunction)
- Very good idea to add assertions to your code to check input:
assert n > 0., 'Data should only contain positive values'
is the same as
import sys
if n <= 0.:
print('Data should only contain positive values')
sys.exit(1)
Links
- Python 3 documentation
- Matplotlib gallery
- NumPy scientific computing package
- SciPy collection of scientific utilities
- Pandas library
- Xarray documentation