Speed up your python code by caching

Featured on Hashnode
Listen to this article

Let's say you have a function that is a super-slow function. Not sure how you can find which is a super slow function? Measure it with this.

Now there is now way you can optimize the function, what you can do instead is that you can store results from a previous computation and reuse those results in a new computation to find solutions to other problem.

What we are gonna see in this post?

  • We'll implement finding n-th fibonacci number problem
  • We'll find out how much time it takes to compute 40th fibonacci number.
  • and at the end we'll make our code 400k times faster. (and yeah you are reading it right)

Let's get to code

Let's create world's worst fibonacci series computation code. The algorithm might look like this:

FIBONACCI (n):
    if n -> 0: f = 0
    elif n -> 1: f = 1
    else:
        f = FIBONACCI(n - 1) + FIBONACCI (n - 2)
    return f

This is a correct algorithm for fibonacci. But if you see the recurrence relation T(n) = T(n-1) + T(n-2) + O(1) you can see that the code is running in exponential time O(2^N) which is really really bad.

The equivalent python code would be:

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

If you draw a recursion tree you can find that you are computing same computation over and over again in different trees. Let's see what I mean:

+--+-----------+-----------+--------+-----------+-----------+--+
|  |           |           | Fib(n) |           |           |  |
+--+-----------+-----------+--------+-----------+-----------+--+
|  |           | Fib (n-1) |        | Fib (n-2) |           |  |
+--+-----------+-----------+--------+-----------+-----------+--+
|  | Fib (n-2) | Fib (n-3) |        | Fib (n-3) | Fib (n-4) |  |
+--+-----------+-----------+--------+-----------+-----------+--+

See for calculating fib(n) you are calculating Fib (n-1) and Fib (n-2). In a separate computation you are computing Fib (n-2) for that you are computing Fib (n-3) and Fib (n-4).

If you had Fib (n-2) from the previous computation stored, you wouldn't have to recompute that Fib (n-2) and it's subsequent branches. So you would've saved a lot of time by just not recomputing anything.

Let's without caching how much time it would take to compute fib(40) that is 50th fibonacci number:

import time

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


start = time.perf_counter()
print(fibonacci(40))
end = time.perf_counter()

print(f"Computed in {(end - start) * 1000} ms")

Total time for computation is 40.635853995000005 seconds. So our python program is taking 40 seconds to compute fib(40). Now let's store intermediate step's data in a dictionary so that we can retrieve those data at a later time in constant time.

Creating a decorator

AKDSFramework has a built in decorator for caching purposes. You can find AKDSFramework here. You can pretty much use this on any python function as you like, small-big-has other dependency anything.

If you install it you can get the benchmarking of python programs, caching python functions and implementation of several data structures and algorithms using best practices in it.

If you don't wish to use my package at the end of the blog I'll paste the source code for @cached decorator.

Now let's import the cached decorator from AKDSFramework

from AKDSFramework.applications.decorators import cached

Now with the cached decorator let's implement the fibonacci series code and see how much time it takes to find fib(40)

import time

@cached
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


start = time.perf_counter()
print(fibonacci(40))
end = time.perf_counter()

print(f"Computed in {(end - start)} seconds")

Now it takes around 8.945500000000217e-05 seconds. Which is 400k times faster to compute.

Source code

If you don't wish to use our AKDSFramework here is the source code for the caching decorator.

Our cache storage stores unlimited data, but if your program has limited storage you can update the dictionary to hold predefined amount of data and if one data is not used for long enough you can kick it out with pre defined cache replacement policies.

def cached(func):
    cache = dict()

    def caching(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result

    return caching
Luiz Filipe da Silva's photo

Great one! I'll be using this decorator in future needs. Thanks for sharing!

theroyakash's photo

Glad you liked it

Sébastien Portebois's photo

Did you try timing it against the built-in cache functions Python provides?

eg docs.python.org/3/library/functools.html

It’s indeed a good way to introduce how to create custom decorators, but for this caching purpose, I’d believe the built-in implementation will be faster.

theroyakash's photo

The built in caching function has a limited storage space and unused cached elements are evicted with LRU policy. As both uses a dictionary to store the data cache access time would be same O(1) constant