September 21, 2011
These are notes from a talk I gave at Clojure.mn on September 7, 2011. The goal was to explore Clojure sequences in more depth than the standard introductory books and tutorials. As you will see, our investigations quickly led us to the depths of Clojure's Java implementation.
A sequence is a logical list that, unlike a standard Lisp cons cells, is not tied to a particular implementation. A seq is an object that implements the seq interface on a collection, allowing you to access it like a sequence. Rich Hickey says, "A seq is like a logical cursor," and that, unlike iterators, they are immutable.
At the implementation level, sequences are defined by the
clojure.lang.ISeq
Java interface. When this interface is implemented on a collection,
it becomes accessible as a sequence.
Collections that can be "viewed as sequences" are called seq-able. Programming Clojure provides a nice overview such collections:
So if you are familiar with sequences, you can manipulate most of the data structures you'll use in Clojure.
Let's look at some collections to see what is and isn't a sequence.
We'll use the seq? predicate to check if a collection implements the
ISeq interface.
user> (seq? '(1 2)) ;=> true
user> (seq? [1 2]) ;=> false
user> (seq? {:foo :bar}) ;=> false
user> (seq? #{1 2}) ;=> false
user> (seq? ()) ;=> true
So lists (even empty ones) are sequences, and vectors, maps, and sets
are not. But aren't all Clojure collections sequences? Technically,
they are all "viewable as sequences," which means that they can
implement the ISeq interface, and return a seq object that can be
treated like a sequence. To make them do that, you use the the
seq function, which implements the ISeq interface on the
collection passed to it. Let's try it out on the collections we just
tested.
user> (seq '(1 2)) ;=> (1 2)
user> (seq [1 2]) ;=> (1 2)
user> (seq {:foo :bar}) ;=> ([:foo :bar])
user> (seq #{1 2}) ;=> (1 2)
user> (seq ()) ;=> nil
We see that calling seq on the list, vector, and set all printed the
same output, while the map was turned into something else, and the
empty set returned nil. But this is just how the REPL represents
sequences. To see what's really happening under the hood, let's look
at the classes that are returned.
user> (class '(1 2))
;=> clojure.lang.PersistentList
user> (class (seq '(1 2)))
;=> clojure.lang.PersistentList
So calling seq on a list returns a list. This is unsurprising,
since the seq? predicate returns true for lists, meaning they
implement ISeq out of the box. Another interpretation is that since
the argument to seq is already a sequence, it returns the original
sequence rather than a seq object which accesses the argument.
user> (class [1 2])
;=> clojure.lang.PersistentVector
user> (class (seq [1 2]))
;=> clojure.lang.PersistentVector$ChunkedSeq
Here we see that
clojure.lang.PersisitentVector
is the vector class, and clojure.lang.PersistentVector$ChunkedSeq is
a seq object that accesses vectors. We see similar seq objects
returned for maps
(clojure.lang.PersistentArrayMap)
and sets
(clojure.lang.PersistentHashSet).
user> (class (seq {:foo :bar}))
;=> clojure.lang.PersistentArrayMap$Seq
user> (class (seq #{1 2}))
;=> clojure.lang.APersistentMap$KeySeq
The point is not to dwell on the specific implementation details, but
to see that while they all look like (1 2) in the REPL, each is
actually a different collection, and each collection returns a unique
seq object to access it as a sequence.
Laziness means you can specify a computation that would theoretically
take forever to complete, and evaluate as much or as little of it as you
want. An example of this is the primes sequence taken from
clojure.contrib.lazy-seqs.
It's said that "most" Clojure sequences are lazy. So why can't we simply say they're all lazy? One reason is that you can always define your own non-lazy sequence. There may also be implementation details that make some sequences less lazy than others, such as with those collections that are implemented as chunked sequences.
A simple lazy sequence can be created using the range function.
user> (range 10)
;=> (0 1 2 3 4 5 6 7 8 9)
user> (class (range 10))
;=> clojure.lang.LazySeq
While the REPL evaluates the entire sequence and prints it out as part
of it's P step, a program using this structure would not count from
0 to 9 until it was told to. One way to keep the REPL from evaluating
the whole sequence is to use the take function:
user> (take 5 (range 10))
;=> (0 1 2 3 4)
user> (class (take 5 (range 10)))
;=> clojure.lang.LazySeq
We see that take returns a sequence that is a subset of its input
sequence. As a consequence, only that subsequence is evaluated in the
REPL.
So how do we force the evaluation of an entire sequence, say, to
realize its side effects? You can use the dorun and doall
functions (or the doseq macro, which works like a list
comprehension),
as follows:
user> (dorun (range 10))
;=> nil
user> (doall (range 10))
;=> (0 1 2 3 4 5 6 7 8 9)
This illustrates the difference between these two functions: doall
returns the sequence after evaluating it; dorun does not. Both
produce all of the side effects resulting from evaluating the
sequence.
first returns itemsThe first function evaluates the first element of a sequence and
returns it. So the output is of the same class as the elements of the
sequence. If its argument isn't a sequence, it calls seq on it
first.
user> (first '(1 2)) ;=> 1
user> (first [1 2]) ;=> 1
user> (first {:foo :bar, :baz :bot}) ;=> [:foo :bar]
user> (first #{1 2}) ;=> 1
user> (first ()) ;=> nil
The behavior is pretty straightforward for standard Clojure collections. Notice that
Calling first on the empty set returns nil, as expected. And
what's the class of the first map pair?
user> (class (first {:foo :bar, :baz :bot}))
;=> clojure.lang.MapEntry
So calling seq on a map returns a sequence of
clojure.lang.MapEntrys,
which are printed like vectors in the REPL.
rest and cons always return sequencesThe rest function returns the remaining items in a sequence. The
result is always a sequence, regardless of it's
input. So with vectors we see that rest outputs a sequence:
user> (seq? [1 2 3]) ;=> false
user> (rest [1 2 3]) ;=> (2 3)
The cons function constructs new sequences from old sequences by appending the
first argument to the front of the second.
user> (cons 1 '(1 2)) ;=> (1 1 2)
user> (class (cons 1 '(1 2))) ;=> clojure.lang.Cons
This clojure.lang.Cons class is a sequence inasmuch as it
implements ISeq, but it does not implement all the features of a standard
list (PersistentList).
consing a vector also produces a clojure.lang.Cons sequence.
user> (cons 1 [1 2]) ;=> (1 1 2)
user> (class (cons 1 [1 2])) ;=> clojure.lang.Cons
Sets operate similarly, but since the second argument has seq called
on it before appending the first argument to it, it no longer
acts like a set, which can only have unique elements.
user> (cons 1 #{1 2}) ;=> (1 1 2)
Since cons always outputs clojure.lang.Conss, I'll omit any
further class checking of its output.
consing maps is not as straightforward, since a map is a set of
key/value pairs and cons will simply call seq on its input and
append an item to the front of it. So consing a keyword will produce
a sequence with a keyword followed by a map entry.
user> (cons :baz {:foo :bar}) ;=> (:baz [:foo :bar])
consing another map will produce a sequence whose first entry is a
map, followed by a map entry, which is a pretty odd thing.
user> (cons {:baz :bot} {:foo :bar})
;=> ({:baz :bot} [:foo :bar])
The preferred method is to cons a two-element vector, which is
interpreted as a map entry, onto a map.
user> (cons [:baz :bot] {:foo :bar})
;=> ([:baz :bot] [:foo :bar])
The result is a sequence of map entries — the standard way Clojure converts maps to sequences.
rest versus nextBoth rest and next return the remaining items in the sequence, but they deal with
empty sequences differently. rest returns a sequence no matter what,
and never returns nil. next returns nil instead of an empty
sequence, by calling seq on the result. So you can think of (next
aseq) as equivalent to (seq (rest aseq)).
user> (rest '(1)) ;=> ()
user> (seq ()) ;=> nil
user> (next '(1)) ;=> nil
So next is less lazy than rest because it looks ahead at the
returned sequence to see if it's empty via the seq function. This
makes rest the preferred method.
cons versus conjThe main difference between cons and conj is that cons returns
sequences (of type clojure.lang.Cons) regardless of its input, while
conj returns the type of collection that was passed to it. Also,
conj can add multiple items to a collection, whereas cons can only
add one at a time. This means the collection argument comes first in
conj, so that there can be a variable number of items added to the
collection.
A more subtle difference is that conj operates on the collection's
underlying data structure, rather than on the ISeq interface, so it
can be more efficient. As a consequence, conj doesn't always append
items to the beginning of collections. Here are some illustrations.
user> (conj '(1 2) 1) ;=> (1 1 2)
user> (class (conj '(1 2) 1)) ;=> clojure.lang.PersistentList
So for lists, conj appends to the beginning, like cons. But it
returns a PersistentList rather than a Cons sequence. On a
vector, conj appends to the end, and returns a vector:
user> (conj [1 2] 1) ;=> [1 2 1]
user> (class (conj [1 2] 1)) ;=> clojure.lang.PersistentVector
Sets are unordered, so there's no concept of beginning or end, and they contain only unique elements.
user> (conj #{1 2} 1) ;=> #{1 2}
user> (class (conj #{1 2} 1)) ;=> clojure.lang.PersistentHashSet
So conjing 1 to a set that already contains 1 produces an identical
set. In this way, conj operates like a set union operation. It's
worth noting that consing 1 to #{1 2} not only outputs a
different data structure, but that data structure also contains the
element 1 twice, since sequences don't preserve the uniqueness
constraint of sets.
conjing a map pair onto a map produces a new map, which is unordered.
user> (conj {:foo :bar} [:baz :bot])
;=> {:baz :bot, :foo :bar}
user> (class (conj {:foo :bar} [:baz :bot]))
;=> clojure.lang.PersistentArrayMap
Unlike with cons, we can also append maps to maps.
user> (conj {:foo :bar} {:baz :bot})
;=> {:baz :bot, :foo :bar}
There are many other interesting features of sequences, but our focus here has been on their implementations, rather than their day to day use. Much more could be said about these implementations but, as we've seen, the further you go down the implementation rabbit hole, the more your inquiry begins to resemble a Java exercise rather than a Clojure one. Maybe its best, next time you're wondering what's under the hood of your favorite sequence, to remind yourself that you're here to program Clojure, not Java.
Update: (October 1, 2011) Fixed some typos and made more precise use of the terms seq and sequence, following the conventions in The Joy of Clojure, §5.1.2.