With a first preview released during RubyConf 2012, Ruby 2.0 is getting closer.
After Module#prepend and Module#refine, let's talk about Enumerator::Lazy.
Chainable iteration
Given this nonsensical example:
hashes = (1..10000).select(&:even?).map(&:hash).map(&:to_s)
This code does the following:
- keeps only even numbers
- fetches their internal hash
- converts those hashes to Strings
But it also creates an intermediate array at each step, and iterates several times.
That can be done in one iteration:
hashes = (1..10000).inject([]) do |accumulator, number|
  if number.even?
    accumulator << number.hash.to_s
  else
    accumulator
  end
end
Let's face the truth: the code is less readable.
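As a side note, the single-pass version can be tidied up a little. The sketch below uses each_with_object, which is not what the example above uses, so treat it as one possible alternative: the accumulator is passed along automatically, so the else branch disappears.

```ruby
# Single-pass variant using each_with_object instead of inject.
# The block's return value is ignored, so no `else` branch is needed.
hashes = (1..10000).each_with_object([]) do |number, accumulator|
  accumulator << number.hash.to_s if number.even?
end

puts hashes.size
```

It is still one explicit loop rather than a chain of small steps, so the readability problem is only partly solved.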
Enumerator::Lazy
Here comes the laziness.
hashes = (1..10000).lazy.select(&:even?).map(&:hash).map(&:to_s).to_a
When to_a is called, the code is evaluated. Internally, Ruby builds a specific block: no intermediate arrays are created and only one iteration occurs.
Without calling to_a (which is an alias to the force method), it returns an Enumerator::Lazy object.
#<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: 1..10000>:select>:map>:map>
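To make the "only one iteration" point concrete, here is a minimal sketch that logs the order in which the blocks run. The tiny range, the log arrays and the odd?/doubling blocks are made up for illustration. The eager chain finishes select before map starts, while the lazy chain pushes each element through the whole chain, and does nothing at all until to_a is called.

```ruby
# Eager: select runs over the whole range, then map runs over the result.
eager_log = []
(1..3).select { |n| eager_log << "select #{n}"; n.odd? }
      .map    { |n| eager_log << "map #{n}"; n * 2 }

# Lazy: building the chain runs no block at all.
lazy_log = []
lazy_chain = (1..3).lazy
                   .select { |n| lazy_log << "select #{n}"; n.odd? }
                   .map    { |n| lazy_log << "map #{n}"; n * 2 }
log_before_forcing = lazy_log.dup

# Forcing the chain sends each element through select, then map, one by one.
lazy_result = lazy_chain.to_a

p eager_log          # ["select 1", "select 2", "select 3", "map 1", "map 3"]
p log_before_forcing # []
p lazy_log           # ["select 1", "map 1", "select 2", "select 3", "map 3"]
p lazy_result        # [2, 6]
```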
Benchmarking
Since there is only one iteration, it should be faster. Well, let's run some benchmarks first.
require 'benchmark'
[10000, 100000, 1000000, 10000000].each do |size|
  Benchmark.bm do |b|
    b.report("chainable #{size}") do
      hashes = (1..size).select(&:even?).map(&:hash).map(&:to_s)
    end
  end
  Benchmark.bm do |b|
    b.report("one iteration #{size}") do
      hashes = (1..size).inject([]) do |accumulator, number|
        if number.even?
          accumulator << number.hash.to_s
        else
          accumulator
        end
      end
    end
  end
  Benchmark.bm do |b|
    b.report("chainable lazy #{size}") do
      hashes = (1..size).lazy.select(&:even?).map(&:hash).map(&:to_s).to_a
    end
  end
end
So, Enumerator::Lazy seems to be the slowest in every case. According to a bug report, the cost of creating the blocks needed for laziness outweighs its gain. That being said, these benchmarks should be rerun once Ruby 2.0 is released.
As pointed out in this bug report, there are some cases where Enumerator::Lazy is definitely the best choice you can make. For example, you can use it when extracting elements from a huge, or even infinite, enumerator.
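Before comparing timings, it is worth checking that the three variants really compute the same thing. The following is only a sketch: the variants hash, the small size and the use of Benchmark.realtime are choices made for this illustration, and the timings it prints will vary from machine to machine and from one Ruby version to another.

```ruby
require 'benchmark'

size = 10_000 # small, arbitrary size chosen for illustration

# The three approaches discussed above, wrapped in lambdas.
variants = {
  chained:     -> { (1..size).select(&:even?).map(&:hash).map(&:to_s) },
  single_pass: -> { (1..size).inject([]) { |acc, n| n.even? ? acc << n.hash.to_s : acc } },
  lazy:        -> { (1..size).lazy.select(&:even?).map(&:hash).map(&:to_s).to_a }
}

results = {}
timings = {}
variants.each do |name, code|
  # Benchmark.realtime returns the elapsed wall-clock time in seconds.
  timings[name] = Benchmark.realtime { results[name] = code.call }
end

all_agree = results.values.uniq.size == 1

timings.each { |name, seconds| puts format('%-12s %.6fs', name, seconds) }
puts "all variants agree: #{all_agree}"
```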
require 'prime'
Prime.select {|x| x % 4 == 3 }.take(10)
This code tries to run select over all prime numbers before taking the first ten. Because there are infinitely many primes, select never finishes, so take is never reached.
a = []
Prime.each do |x|
  next if x % 4 != 3
  a << x
  break if a.size == 10
end
This code would do the job, but is not easily readable.
With lazy, we can do:
Prime.lazy.select {|x| x % 4 == 3 }.take(10).to_a
This returns almost immediately and stays readable.
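The same pattern works on any infinite source. The sketch below swaps Prime for an infinite range of integers, an assumption made to keep the example dependency-free, and adds a hypothetical examined counter to show that only the elements needed to produce ten results are ever visited.

```ruby
examined = 0 # counts how many elements the select block actually sees

result = (1..Float::INFINITY).lazy
                             .select { |n| examined += 1; n % 4 == 3 }
                             .take(10)
                             .to_a

p result # [3, 7, 11, 15, 19, 23, 27, 31, 35, 39]
puts "examined #{examined} elements of an infinite range"
```

The counter stays at around 40, since iteration stops once the tenth match has been found.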
Coming next in this series: Named arguments.