Introduction
Clojure transducers are a thing of beauty. They may be hard to wrap your head around the first time, but they are a joy to work with once they click. I’m not going to cover transducers themselves in this post; you can watch Lambda Island’s video on transducers for that.
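All you need to know is that transducers are fast: they fuse the steps of a pipeline into a single pass and skip the intermediate sequences that chained map and filter calls create. With a small change in your code, you can often get a performance boost almost for free.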
I have been thinking about writing code in a way that enables us to use transducers and this is still an idea, a suggestion. I’d like your input on this, so do send feedback :)
A small example
(->> (range 10)
(map inc)
(filter even?)
(map #(* 2 %)))
;; (4 8 12 16 20)
You can turn the transformation into a transducer quite simply like this.
(def transducer
(comp
(map inc)
(filter even?)
(map #(* 2 %))))
;; realizing the transducer
(into [] transducer (range 10))
; [4 8 12 16 20]
As you can see, turning a threading-macro pipeline into a transducer is trivial.
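As a small sketch of why this form is handy, the same composed transducer can be handed to different contexts. The name xf is only for illustration; into, sequence and transduce are all in clojure.core.

```clojure
;; The same pipeline as above, under the illustrative name `xf`.
(def xf
  (comp
    (map inc)
    (filter even?)
    (map #(* 2 %))))

;; Eagerly pour the results into a vector.
(into [] xf (range 10))
;; => [4 8 12 16 20]

;; Produce a lazy sequence instead.
(sequence xf (range 10))
;; => (4 8 12 16 20)

;; Reduce directly, without building a collection first.
(transduce xf + 0 (range 10))
;; => 60
```

The transformation is defined once and the caller picks how it runs.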
The problem
It was trivial to create a transducer because we had control over how we called map and filter. What if we had defined each transformation step like this instead?
(defn increment-all [coll]
(map inc coll))
(defn filter-evens [coll]
(filter even? coll))
(defn double-all [coll]
(map #(* 2 %) coll))
We can easily thread them, but we can’t compose them as transducers: each function demands a collection, so there is no way to get at the bare transformation. The code is not transducer friendly.
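To make the limitation concrete, here is a minimal sketch, assuming JVM Clojure (the exception class is JVM specific, and :no-zero-arity is just an illustrative marker value). Threading works, but asking one of these functions for a transducer by calling it with no arguments fails at runtime.

```clojure
(defn increment-all [coll]
  (map inc coll))

(defn filter-evens [coll]
  (filter even? coll))

(defn double-all [coll]
  (map #(* 2 %) coll))

;; Threading works, because each function takes and returns a collection.
(->> (range 10)
     increment-all
     filter-evens
     double-all)
;; => (4 8 12 16 20)

;; There is no 0-arity, so we cannot get a transducer out of the function.
(try
  (increment-all)
  (catch clojure.lang.ArityException e
    :no-zero-arity))
;; => :no-zero-arity
```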
A solution
I wanted a solution that lets me use the defined functions individually as if they were normal functions that work on collections.
(filter-evens (range 10))
; (0 2 4 6 8)
and also compose them as transducers.
(def transducer
(comp
(increment-all)
(filter-evens)
(double-all)))
(into [] transducer (range 10))
The solution I propose is simple, though it requires a few additional keystrokes. We make the function a multi-arity function that supports a 0-arity and a 1-arity. The 0-arity returns a transducer and the 1-arity returns the result of applying the transformation to a collection.
(defn increment-all
([]
(map inc))
([coll]
(sequence (increment-all) coll)))
(increment-all (range 10))
; (1 2 3 4 5 6 7 8 9 10)
Alternatively, you could define increment-all in this way.
(defn increment-all
([]
(map inc))
([coll]
(map inc coll)))
The reason I chose the first way is that using anonymous functions with map, filter, etc. is common, and defining the anonymous function twice (once for each arity) is error prone: what if you change one but not the other?
The other subtle benefit of using the first way is that when you have multiple seq functions like map and filter inside the function, you can internally use a transducer and the caller won’t know (or care).
;; 1st impl
(defn filter-then-map
([] (comp
(filter even?)
(map inc)))
([coll]
(sequence (filter-then-map) coll)))
;; 2nd impl
(defn filter-then-map
([] (comp
(filter even?)
(map inc)))
([coll]
; not using a transducer
(->> coll
(filter even?)
(map inc))))
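Here, the first implementation should generally be faster than the second when the 1-arity version is called, because it runs a single fused pass instead of building an intermediate lazy sequence for each step. Plus, if both map and filter use lambdas, or there are a lot of transformations, the first implementation is also more concise, since the pipeline is written only once.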
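A quick sanity check, as a sketch: the two variants are given hypothetical -v1 and -v2 names here so they can live side by side. From the caller's point of view they behave the same, and both stay lazy. I'm not including timings; the actual speed difference is something to measure on your own workload.

```clojure
;; 1st impl: the 1-arity reuses the transducer from the 0-arity.
(defn filter-then-map-v1
  ([] (comp (filter even?) (map inc)))
  ([coll] (sequence (filter-then-map-v1) coll)))

;; 2nd impl: the 1-arity repeats the pipeline with plain seq functions.
(defn filter-then-map-v2
  ([] (comp (filter even?) (map inc)))
  ([coll] (->> coll (filter even?) (map inc))))

;; Same result either way.
(= (filter-then-map-v1 (range 10))
   (filter-then-map-v2 (range 10))
   '(1 3 5 7 9))
;; => true

;; Both can be used on an infinite input.
(take 3 (filter-then-map-v1 (range)))
;; => (1 3 5)
(take 3 (filter-then-map-v2 (range)))
;; => (1 3 5)
```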
Let’s redefine all our functions in this way.
(defn increment-all
([]
(map inc))
([coll]
(sequence (increment-all) coll)))
(defn filter-evens
([]
(filter even?))
([coll]
(sequence (filter-evens) coll)))
(defn double-all
([]
(map #(* 2 %)))
([coll]
(sequence (double-all) coll)))
Now, what if we want to find the product of all the numbers after they go through our transformation? It’s a classic reduce.
(defn product
([coll]
(reduce * 1 coll)))
(->> (range 10)
(map inc)
(filter even?)
(map #(* 2 %))
product)
If we want product to work with transducers, we need to use transduce instead of reduce.
Here’s how we can support both.
(defn product
([] (completing *))
([coll]
(reduce (product) 1 coll))
([xform coll]
(transduce xform (product) 1 coll)))
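The 0-arity version returns the reducing function *, wrapped in completing. In this case completing isn’t strictly necessary, because * already accepts a single argument. But transduce calls the reducing function with one argument (the final result) when it finishes, so a reducing function that only supports the 2-arity call will throw there. I’ve included completing here for completeness (haha).
The 1-arity version is a normal reduce, and the 2-arity version is a transduce.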
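Here is a minimal sketch of the completing problem, assuming JVM Clojure. multiply-2 is a hypothetical reducing function that only accepts an accumulator and an input, and :missing-completion-arity is just an illustrative marker value.

```clojure
;; A reducing function with only the 2-arity (accumulator, input).
(defn multiply-2 [acc x]
  (* acc x))

;; reduce only ever calls the 2-arity, so this is fine.
(reduce multiply-2 1 [1 2 3])
;; => 6

;; transduce also makes a 1-arity completion call at the end,
;; which multiply-2 does not support.
(try
  (transduce (map inc) multiply-2 1 [1 2 3])
  (catch clojure.lang.ArityException e
    :missing-completion-arity))
;; => :missing-completion-arity

;; completing adds the missing arity (identity by default).
(transduce (map inc) (completing multiply-2) 1 [1 2 3])
;; => 24
```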
The beauty of writing functions like this is that if the caller wishes to use transducers they can, and if they wish to use them as plain old seq functions they can.
Bringing it all together
;;; definitions
(defn increment-all
([]
(map inc))
([coll]
(sequence (increment-all) coll)))
(defn filter-evens
([]
(filter even?))
([coll]
(sequence (filter-evens) coll)))
(defn double-all
([]
(map #(* 2 %)))
([coll]
(sequence (double-all) coll)))
(defn product
([] (completing *))
([coll]
(reduce (product) 1 coll))
([xform coll]
(transduce xform (product) 1 coll)))
;; normal transformation
(->> (range 10)
(increment-all)
(filter-evens)
(double-all)
(product))
; 122880
;; transducer version
(def transducer
(comp
(increment-all)
(filter-evens)
(double-all)))
(product transducer (range 10))
; 122880
Why should I do things this way?
You might ask, “Why should I define increment-all instead of letting the caller decide whether they want to use map, mapv, or a transducer?”
Indeed, I struggled with this question quite a bit.
Here’s my answer.
In application code, we build abstraction layers. map, filter and reduce are low-level building blocks used to build the lower layers, i.e. your basic transformation steps, but composing these transformations happens one level higher.
Let’s take a concrete example. If we are building software for schools then we might have a map called Student, and a list of students. Each student has a name, and a discipline they belong to.
This is our domain model.
(def students
[{:student-name "Luke Skywalker"
:discipline "Jedi"}
{:student-name "Hermione Granger"
:discipline "Magic"}])
(defn student-name [student]
(:student-name student))
(defn discipline [student]
(:discipline student))
If we want a list of all students’ names, we have to map student-name over all students. You could create a function called student-names that does this for you, or let the caller map student-name wherever they want.
If we take the second option, we might encounter something like this.
(require 'clojure.string)
;; Displaying student names in the UI
(->> students
(map student-name)
(map clojure.string/capitalize))
;; sorting in the UI
(->> students
(map student-name)
sort)
;; generating slugs in the backend using a transducer
(->> students
(into [] (comp
(map student-name)
(map clojure.string/lower-case)
(map #(clojure.string/replace % #" " "-")))))
You’ll notice that the (map student-name) logic is repeated in many places. You’re also burdening the caller with always having to call (map student-name). It’s obvious that most of our actions will be on all students, so it makes sense to define a student-names function.
(defn student-names
([] (map student-name))
([students]
(sequence (student-names) students)))
;; for the sake of completeness
(defn capitalize-names
([] (map clojure.string/capitalize))
([students]
(sequence (capitalize-names) students)))
(defn lowercase-names
([] (map clojure.string/lower-case))
([students]
(sequence (lowercase-names) students)))
(defn slugify-names
([] (map #(clojure.string/replace % #" " "-")))
([students]
(sequence (slugify-names) students)))
Now let’s see how the application code will look
;; Displaying student names in the UI
(->> students
student-names
capitalize-names)
;; sorting in the UI
(->> students
student-names
sort)
;; generating slugs in the backend using a transducer
(->> students
(into [] (comp
(student-names)
(lowercase-names)
(slugify-names))))
Notice how it seamlessly works with threading macros and transducers?
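For anyone who wants to paste this into a REPL, here is a self-contained sketch of the slug example run both ways. It repeats the definitions from above, with two liberties of mine: clojure.string is aliased as str, and the name-level functions call their argument names.

```clojure
(require '[clojure.string :as str])

(def students
  [{:student-name "Luke Skywalker" :discipline "Jedi"}
   {:student-name "Hermione Granger" :discipline "Magic"}])

(defn student-name [student]
  (:student-name student))

(defn student-names
  ([] (map student-name))
  ([students] (sequence (student-names) students)))

(defn lowercase-names
  ([] (map str/lower-case))
  ([names] (sequence (lowercase-names) names)))

(defn slugify-names
  ([] (map #(str/replace % #" " "-")))
  ([names] (sequence (slugify-names) names)))

;; Threaded, one collection function after another.
(->> students student-names lowercase-names slugify-names)
;; => ("luke-skywalker" "hermione-granger")

;; The same steps composed into a single transducer.
(into [] (comp (student-names) (lowercase-names) (slugify-names)) students)
;; => ["luke-skywalker" "hermione-granger"]
```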
Closing Thoughts
This way of writing functions lets you compose them as plain old functions or transducers, which is beautiful if you ask me. Doing something like this is only possible because of Clojure’s elegant design.
I don’t know if this is actually a good idea (I think it is, but I’m biased). I don’t know whether it will work well in production or cause problems.
Ideas need to be scrutinized, tested and built upon. That’s why I’m putting this out there for you. Send me your feedback, and if you use it, tell me how it goes. You can tweet at me @the_lazy_folder.
EDIT
Thanks to @timothypratley for pointing out an error in my code.
Initially I wrote increment-all and other functions as
(defn increment-all
([]
(map inc))
([coll]
(lazy-seq (into [] (increment-all) coll))))
Turns out putting a lazy-seq in front of a vector does not make it lazy, as evidenced by (take 5 (increment-all (range))), which never returns because into tries to consume the entire infinite range first.
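A small sketch of the difference, with the old version renamed to the hypothetical increment-all-eager so both can be defined at once. The infinite call on the eager version is left as a comment on purpose, since evaluating it would hang the REPL.

```clojure
;; Old version: lazy-seq only delays the work until the first element
;; is requested; at that point into consumes the whole input.
(defn increment-all-eager
  ([] (map inc))
  ([coll] (lazy-seq (into [] (increment-all-eager) coll))))

;; Fine on finite input.
(increment-all-eager (range 5))
;; => (1 2 3 4 5)

;; Do not evaluate: this would try to realize all of (range).
;; (take 5 (increment-all-eager (range)))

;; Current version: sequence applies the transducer incrementally.
(defn increment-all
  ([] (map inc))
  ([coll] (sequence (increment-all) coll)))

(take 5 (increment-all (range)))
;; => (1 2 3 4 5)
```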