SoftOver
 


 

Cool vs. Clear

    Recently I stumbled upon an interesting piece of code on the Ruby Quiz site.

With the unnecessary details dropped, it looks like this:

arr = File.open("/usr/share/dict/words") do |dict|
  dict.inject(Hash.new) do |all, word|
    all.update(word.delete("^A-Za-z").downcase => true)
  end.keys
end
What does this neat piece of code do? First, we open the system word list (available on most Unix platforms). Then we call inject on the file, starting from an empty hash: for each line, the block strips everything except letters, downcases the result, and stores it as a key in the hash. After the injection we return the hash keys...
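To make the accumulation step concrete, here is a minimal sketch that runs the same inject block over a small in-memory array instead of the dictionary file. The sample `lines` array is a hypothetical stand-in for the lines of /usr/share/dict/words, not data from the original quiz.

```ruby
# Hypothetical sample standing in for the lines of /usr/share/dict/words.
lines = ["Apple\n", "apple\n", "can't\n", "Zebra\n"]

result = lines.inject(Hash.new) do |all, word|
  # delete("^A-Za-z") drops every character that is not an ASCII letter,
  # which includes the apostrophe and the trailing newline.
  key = word.delete("^A-Za-z").downcase
  # Hash#update mutates the hash and returns it, so the block's value
  # becomes the accumulator passed to the next iteration.
  all.update(key => true)
end.keys

puts result.inspect
```

Duplicate words collapse because they land on the same hash key; the `true` values are never read.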

Stop, stop, stop! So we do not need a hash at all? We are only interested in cleaning up the input dictionary? Then why not just use Array#uniq?

The code becomes cleaner (though it no longer shows the full extent of the author's coolness :), and it should be faster, since Array#uniq is precompiled even if it uses the same idea internally:

arr = File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word| word.delete("^A-Za-z").downcase}.uniq
We can improve this a bit further. Just by swapping the order of the operations (downcase first, then delete), we can shorten this line by 3 characters and slightly improve its performance:
arr = File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word| word.downcase.delete("^a-z")}.uniq
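As a quick sanity check, the sketch below applies both map blocks to a small in-memory sample and compares the results. The `lines` array is a hypothetical sample, and the comparison only demonstrates equivalence for plain ASCII input like this; it is not a proof for arbitrary text.

```ruby
# Hypothetical sample; the real input is the system word list.
lines = ["Apple\n", "apple\n", "can't\n", "Zebra\n", "ZEBRA\n"]

# Strip non-letters first, then downcase (needs both A-Z and a-z in the set).
delete_then_downcase = lines.map { |word| word.delete("^A-Za-z").downcase }.uniq

# Downcase first, so the delete set only has to keep a-z.
downcase_then_delete = lines.map { |word| word.downcase.delete("^a-z") }.uniq

puts delete_then_downcase.inspect
puts downcase_then_delete.inspect
puts delete_then_downcase == downcase_then_delete
```

Array#uniq keeps the first occurrence of each element, so both variants also return the words in the same order.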
Here is the complete code with a benchmark, followed by its results:
require 'benchmark'

def hash_uniq
  arr = File.open("/usr/share/dict/words") do |dict|
    dict.inject(Hash.new) do |all, word|
      all.update(word.delete("^A-Za-z").downcase => true)
    end.keys
  end
end

def array_uniq
  arr = File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word| word.delete("^A-Za-z").downcase}.uniq
end

def arydc_uniq
  arr = File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word| word.downcase.delete("^a-z")}.uniq
end

# Sanity check: all three variants must produce the same set of words.
hus = hash_uniq.sort
aus = array_uniq.sort
dus = arydc_uniq.sort

raise "My bad!" unless hus == aus && aus == dus

# Time each variant for 1, 2, 4, ..., 128 consecutive runs.
num = 1
(1..8).each do
  Benchmark.bm do |x|
    x.report(" hash_uniq(#{num})") { num.times { hash_uniq } }
    x.report("array_uniq(#{num})") { num.times { array_uniq } }
    x.report("arydc_uniq(#{num})") { num.times { arydc_uniq } }
    puts '-' * 58
    num *= 2
  end
end
      user     system      total        real
 hash_uniq(1)  0.440000   0.010000   0.450000 (  0.457720)
array_uniq(1)  0.280000   0.020000   0.300000 (  0.295444)
arydc_uniq(1)  0.270000   0.000000   0.270000 (  0.274489)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(2)  0.900000   0.000000   0.900000 (  0.908177)
array_uniq(2)  0.780000   0.000000   0.780000 (  0.815331)
arydc_uniq(2)  0.700000   0.000000   0.700000 (  0.698690)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(4)  2.160000   0.000000   2.160000 (  2.158588)
array_uniq(4)  1.700000   0.000000   1.700000 (  1.718859)
arydc_uniq(4)  1.230000   0.000000   1.230000 (  1.271515)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(8)  4.260000   0.000000   4.260000 (  4.324980)
array_uniq(8)  3.160000   0.010000   3.170000 (  3.164200)
arydc_uniq(8)  2.560000   0.010000   2.570000 (  2.628832)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(16)  8.730000   0.030000   8.760000 (  8.837948)
array_uniq(16)  5.790000   0.000000   5.790000 (  5.859001)
arydc_uniq(16)  5.260000   0.010000   5.270000 (  5.327904)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(32) 17.500000   0.020000  17.520000 ( 17.778069)
array_uniq(32) 11.870000   0.020000  11.890000 ( 12.002044)
arydc_uniq(32) 11.330000   0.000000  11.330000 ( 11.437371)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(64) 37.030000   0.020000  37.050000 ( 37.569187)
array_uniq(64) 21.680000   0.030000  21.710000 ( 22.173968)
arydc_uniq(64) 19.600000   0.010000  19.610000 ( 19.888466)
----------------------------------------------------------
      user     system      total        real
 hash_uniq(128) 70.690000   0.040000  70.730000 ( 71.883314)
array_uniq(128) 41.160000   0.040000  41.200000 ( 41.889894)
arydc_uniq(128) 38.110000   0.020000  38.130000 ( 38.757940)
----------------------------------------------------------
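
To read the numbers more easily, the timings can be reduced to speedup ratios relative to the hash-based version. The sketch below does this for the 128-run row only; the "real" values are copied from the output above, while the `real` and `ratios` names are just illustrative.

```ruby
# Wall-clock ("real") seconds copied from the 128-run results above.
real = {
  "hash_uniq"  => 71.883314,
  "array_uniq" => 41.889894,
  "arydc_uniq" => 38.757940
}

baseline = real["hash_uniq"]

# Speedup of each variant relative to the hash-based version.
ratios = real.map { |name, secs| [name, (baseline / secs).round(2)] }.to_h

ratios.each { |name, ratio| puts format("%-10s %.2fx", name, ratio) }
```

On this run both Array#uniq variants come out roughly 1.7 to 1.9 times faster than the hash version; other machines and Ruby versions may give different ratios.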