… and how we might fix it.
This is NOT a hate post. Let me say that again: this is NOT a hate post! Python is a fine language, and it was my first programming language. Like the initial stages of falling in love with someone, you can't see the flaws; it was the same with Python :) But as time goes by and the initial limerence fades, you start seeing the flaws. Nobody is free of flaws, and neither are programming languages (not even Lisp).
The pythonic way of doing map and filter
Using list comprehensions is the pythonic way of doing this.
def inc(x):
    return x + 1
>>> list(map(inc,range(10)))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# pythonic way
>>> [inc(i) for i in range(10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
def is_even(x):
    return x % 2 == 0
>>> list(filter(is_even, range(10)))
[0, 2, 4, 6, 8]
# pythonic way
>>> [i for i in range(10) if is_even(i)]
[0, 2, 4, 6, 8]
There are several benefits to this: list comprehensions are highly optimized, and they avoid some of the pitfalls of map and filter.
TL;DR: if you don't care about using map and filter, use list comprehensions.
But since this post is about map and filter, let’s focus on that.
What is a value?
To understand this post you need to understand what values are. In simple terms, entities that can be compared, have a magnitude, and are immutable are values.
How can you tell if something is a value? Easy, just ask whether it is immutable.
Is a string a value? yes, it’s immutable.
Is a tuple a value? yes, it’s immutable.
Is a list a value? no, it’s mutable.
Is an iterator a value? hmm… this one is hard, let’s try it out.
>>> a = iter((1,2,3))
>>> next(a)
1
>>> next(a)
2
>>> next(a)
3
Each time you call next on a, the return value changes. So I'd say it's not a value.
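To make the test concrete, here's a quick REPL check comparing a tuple to a list:
>>> t = (1, 2, 3)
>>> t[0] = 99
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment
>>> l = [1, 2, 3]
>>> l[0] = 99   # lists happily mutate in place
>>> l
[99, 2, 3]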
Programming with values is very beneficial; see Rich Hickey's talk, The Value of Values.
Problem #1: map and filter return an iterator.
>>> res = map(inc, range(10))
# let's check if it worked
>>> list(res)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# let's filter all even integers from res
>>> list(filter(is_even, res))
[]
That is unexpected. If you’re an experienced pythonista, you probably know what went wrong, and this is expected (because you have learned to expect it).
Here's why you shouldn't have to expect this: if we used list comprehensions, we wouldn't run into it.
>>> res=[inc(i) for i in range(10)]
# let's check if it worked
>>> res
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# let's filter all even integers from res
>>> [i for i in res if is_even(i)]
[2, 4, 6, 8, 10]
# unless you directly mutate res
# you can do more things with res.
I'm simplifying a bit, but map and filter return an iterator. Calling list or tuple on them consumes it: list(res) exhausts the iterator, and res becomes empty.
>>> res = map(inc, range(10))
# res is an iterator here
>>> list(res)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# list(res) exhausts the iterator
# so you're filtering an empty iterator here
# so you get an empty list
>>> list(filter(is_even, res))
[]
A workaround
You can immediately realize the iterator and store the result.
>>> res = list(map(inc, range(10)))
>>> list(res)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# works fine!
>>> list(filter(is_even,res))
[2, 4, 6, 8, 10]
But you lose the laziness of map and filter, and it is inconvenient to write list(map(...)) every time.
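If you want to keep the laziness and still consume the results more than once, the standard library offers itertools.tee, which splits one iterator into independent copies (at the cost of buffering internally):
>>> from itertools import tee
>>> res = map(inc, range(10))
>>> res1, res2 = tee(res)
>>> list(res1)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(filter(is_even, res2))   # res2 is not affected by exhausting res1
[2, 4, 6, 8, 10]
But this pushes the bookkeeping onto the caller, so it's more of a patch than a fix.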
Problem #2: map and filter are lazy.
>>> filter(is_even, [1,2,3])
<filter object at 0x0000018B347B0EB0>
Here, when you call filter, you're creating a filter object; you are not computing the result.
The result is only computed when it's absolutely required. This is laziness, and it's quite common in functional programming.
Here's why it's a problem in Python.
>>> a = [1,2,3,4]
>>> res = filter(is_even, a)
>>> a.append(10)
>>> a.append(12)
What do you think the result of the filtering will be?
What would you get if you did list(res)?
I want you to think long and hard.
Here’s the answer
>>> list(res)
[2, 4, 10, 12]
Most people would've guessed the answer correctly, but that's not the tough part. I want you to think about what you meant when you wrote this. What did you really mean?
>>> res = filter(is_even, a)
I definitely meant: filter the value of a, which was [1,2,3,4].
This can result in bugs that are hard to track down and, more importantly, it makes your code hard to reason about.
Remember this, always:
Laziness coupled with mutability is a recipe for setting your hair on fire.
There is a reason that most functional languages have immutability baked in. You can only postpone evaluation of an expression when you can guarantee that the arguments to the expression will mean the same thing EVERY time.
In this case, the result of filter(is_even, a) depends on when we realize the iterator.
It is dependent on time.
>>> a = [1,2,3,4]
>>> res = filter(is_even,a)
>>> a.append(10)
>>> a.append(12)
>>> a.append(14)
>>> a.append(16)
>>> list(res)
[2, 4, 10, 12, 14, 16]
It's exactly the same line of code, but the result changes. Here's another way to think of it.
Your future actions affect the results of your past actions. We are essentially changing the past, which makes it extremely difficult to reason about your code.
I'll quickly show you a Clojure example (don't worry, it looks almost like Python).
user=> (def a [1,2,3,4]) ; equivalent to a = [1,2,3,4]
#'user/a
user=> (def res (filter even? a)) ; even? = is_even
#'user/res
user=> (def a (concat a [10])) ; concat is similar to append
#'user/a
user=> (def a (concat a [12]))
#'user/a
user=> res
(2 4) ; isn't this what you expected?
user=> a ; proof that a is something else
(1 2 3 4 10 12)
filter is lazy in Clojure, and yet you get the correct result, i.e. the filtering of [1,2,3,4], not [1,2,3,4,10,12].
You can’t change the past. You can see why time-travel might be a bad idea 😂
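You can approximate the Clojure behaviour in today's Python by snapshotting the input yourself before handing it to filter; a manual workaround, not a fix:
>>> a = [1,2,3,4]
>>> res = filter(is_even, tuple(a))   # tuple(a) copies a's current elements
>>> a.append(10)
>>> a.append(12)
>>> list(res)
[2, 4]
But nothing forces you (or your colleagues) to remember to do this.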
Just to remind you, list comprehensions solve these problems.
How can we fix map and filter?
To fix the problems we must:
- return a value, not an iterator
- eliminate laziness or make sure mutability doesn’t affect the return value.
Fixing problem #1 is as easy as returning a list or a tuple.
Fixing problem #2 is harder.
If we want to make sure the return value is not affected by mutability, while still keeping some laziness, we need to do a deepcopy on the input iterable.
Here’s one way.
from copy import deepcopy

class filter:
    def __init__(self, fn, iterable):
        self.fn = fn
        # snapshot the input so later mutations can't affect the result
        self.iterable = deepcopy(iterable)
    def __iter__(self):
        # return a generator (not a list) so __iter__ yields a proper
        # iterator and the filtering itself stays lazy
        return (i for i in self.iterable if self.fn(i))
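With this, mutating the input after the fact no longer changes the result:
>>> a = [1,2,3,4]
>>> res = filter(is_even, a)   # the class above; deepcopy snapshots a
>>> a.append(10)
>>> list(res)
[2, 4]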
But laziness isn't only about delaying computation; it is also about computing results only when they are required.
user=> (take 10 (map inc (range)))
(1 2 3 4 5 6 7 8 9 10)
Here, (range) returns an infinite sequence starting from 0. Since map is lazy, it only computes the first ten elements.
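Python's built-in map can do the same over an infinite iterator; a rough equivalent using itertools:
>>> from itertools import count, islice
>>> list(islice(map(inc, count()), 10))   # count() is an infinite sequence starting at 0
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]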
So the deepcopy in my filter implementation means that it isn't fully lazy: it copies the whole input up front. The only upside to this implementation is when the filtering function is expensive.
Using eager evaluation
I think the most pragmatic solution would be to eagerly evaluate map and filter.
def map(fn, *iterables):
    return [fn(*i) for i in zip(*iterables)]

def filter(fn, iterable):
    return [i for i in iterable if fn(i)]
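With these eager definitions, the session from Problem #1 behaves the way you'd naively expect:
>>> res = map(inc, range(10))   # eager map from above; res is a real list
>>> res
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> filter(is_even, res)
[2, 4, 6, 8, 10]
>>> filter(is_even, res)   # nothing was exhausted, so this works again
[2, 4, 6, 8, 10]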
The benefit of this is that it can work as a drop-in replacement for Python's default map and filter, and if iterable is hashable, we can even add an lru_cache to these functions.
But lists are the most commonly used containers and they're not hashable, so maybe that's not that big of a benefit?
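For illustration, here's a minimal sketch of what the cached variant could look like; cached_map is a hypothetical name, and it only works when both the function and the iterable are hashable:
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_map(fn, iterable):
    # iterable must be hashable, e.g. a tuple, for lru_cache to accept it;
    # return a tuple so the cached value is itself immutable
    return tuple(fn(i) for i in iterable)

>>> cached_map(inc, (1, 2, 3))
(2, 3, 4)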
Why should you use this over a list comprehension?
This is a matter of personal opinion: after doing functional programming for a long time, I find map and filter more readable than list comprehensions. They are more concise, and using lambda functions in list comprehensions can be confusing; see this Stack Overflow answer.
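Compare the two side by side with a lambda:
>>> list(map(lambda x: x * 2, range(5)))
[0, 2, 4, 6, 8]
>>> [(lambda x: x * 2)(i) for i in range(5)]
[0, 2, 4, 6, 8]
The comprehension forces you to both define and immediately call the lambda, which is easy to misread.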
A compromise
There might be some rare cases where the user wants to iterate over an infinite sequence, or a sequence so huge that laziness is necessary. In that case, we can define a lazymap and a lazyfilter.
In my opinion, making the default case eager and forcing the user to explicitly reach for the lazy version when needed is better. This would reduce surprises when a novice uses map and filter.
Can we do better than Python's default lazy implementation?
Indeed we can.
class lazymap:
    def __init__(self, fn, *iterables):
        self.fn = fn
        self.iterables = iterables
    def __iter__(self):
        # a fresh generator is created on every call to iter()
        return (self.fn(*i) for i in zip(*self.iterables))

class lazyfilter:
    def __init__(self, fn, iterable):
        self.fn = fn
        self.iterable = iterable
    def __iter__(self):
        # a fresh generator is created on every call to iter()
        return (i for i in self.iterable if self.fn(i))
Here’s why it is better.
Let's define take.
# taken from functionali
from typing import Iterable, Tuple

def take(n: int, iterable: Iterable) -> Tuple:
    """Returns the first n elements of iterable.
    Returns an empty tuple if iterable is empty.
    >>> take(3, [1,2,3,4,5])
    (1, 2, 3)
    """
    it = iter(iterable)
    accumulator = []
    i = 1
    while i <= n:
        try:
            accumulator.append(next(it))
            i += 1
        except StopIteration:
            break
    return tuple(accumulator)
Now let's see an example with the default Python implementation.
>>> res = map(inc, range(100))
>>> take(5, res)
(1, 2, 3, 4, 5)
>>> take(5, res)
(6, 7, 8, 9, 10)
You don’t get the same result, even though it looks like you’re evaluating the same expression.
Here's the same thing with lazymap.
>>> res = lazymap(inc, range(100))
>>> take(5, res)
(1, 2, 3, 4, 5)
>>> take(5, res)
(1, 2, 3, 4, 5)
>>> take(5, res)
(1, 2, 3, 4, 5)
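lazyfilter behaves the same way:
>>> res = lazyfilter(is_even, range(100))
>>> take(5, res)
(0, 2, 4, 6, 8)
>>> take(5, res)
(0, 2, 4, 6, 8)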
You always get the same result, just like you would in Clojure or any other functional programming language.
user=> (def res (map inc (range 100)))
#'user/res
user=> (take 5 res)
(1 2 3 4 5)
user=> (take 5 res)
(1 2 3 4 5)
Shameless self-plug
If you liked the ideas in this post and would like to use them, functionali implements these functions.
Could this be a PEP/PR?
If you're a member of the core Python team and think the default Python implementation of map and filter could be changed to something similar to lazymap and lazyfilter, please let me know; I'd love to make a PR/PEP.
As far as I can tell, there should be no breaking changes.
Do reach out to me on twitter :)
Closing thoughts.
There is an overarching theme in this post: how mutability makes things harder to reason about. A lot of languages have mutability by default because they lean towards object-oriented programming. Even though Python supports multiple paradigms, it heavily favors OOP and an imperative style. The design decisions behind Python were driven by a desire to make code readable, and an imperative/OOP style is more readable. The underlying assumption was that code that is easy to read is easy to reason about. That is definitely true (you can't reason about code written in Brainfuck), but readability isn't the only factor that makes code easy to reason about.
This is very readable but hard to reason about.
>>> a = [1,2,3,4]
>>> res = filter(is_even,a)
>>> a.append(10)
>>> a.append(12)
>>> a.append(14)
>>> a.append(16)
>>> list(res)
[2, 4, 10, 12, 14, 16]
Besides readability, immutability is another major factor that makes code easy to reason about.
Thanks to Jacob Zimmerman for pointing out a mistake in my lazymap and lazyfilter implementations; see this issue.