As you may recall from last time, kata number eight has three parts: write it to be readable, write it to be fast, and write it to be extensible. I covered readability last time, so now let's go for speed. (Cue The Distance, by Cake.)
#! /usr/bin/ruby # Dave Thomas Code Kata #8: Conflicting Objectives # See http://codekata.pragprog.com/2007/01/kata_eight_conf.html # Solution by Dave Aronson # version 2: make it fast # note that it's no longer in functions, to save the call overhead # unroll the inner loop again, for speed one_letter_words = {} two_letter_words = {} three_letter_words = {} four_letter_words = {} five_letter_words = {} # this one is still fine as an array, since we just chug through it. # no need for the extra functionality of a hash, nor the overhead. six_letter_words = [] start_time = Time.now File.new(ARGV[0], 'r').each_line do |line| word = line.chomp case word.length # TODO: check if there's a faster way than this... maybe a trie? # we don't need the value, just whether the key is there. when 1: one_letter_words [word] = 0 when 2: two_letter_words [word] = 0 when 3: three_letter_words[word] = 0 when 4: four_letter_words [word] = 0 when 5: five_letter_words [word] = 0 when 6: six_letter_words << word end end # accumulate results, rather than reporting them as we find them; # also use this to track # of hits results = [] six_letter_words.each do |word| part_1 = word[0 .. 0] part_2 = word[1 .. 5] # do longer part first to make dropout most likely if five_letter_words.has_key? part_2 and one_letter_words.has_key? part_1 results << "#{part_1} + #{part_2} => #{word}\n" end part_1 = word[0 .. 1] part_2 = word[2 .. 5] # do longer part first to make dropout most likely if four_letter_words.has_key? part_2 and two_letter_words.has_key? part_1 results << "#{part_1} + #{part_2} => #{word}\n" end part_1 = word[0 .. 2] part_2 = word[3 .. 5] if three_letter_words.has_key? part_1 and three_letter_words.has_key? part_2 results << "#{part_1} + #{part_2} => #{word}\n" end part_1 = word[0 .. 3] part_2 = word[4 .. 5] if four_letter_words.has_key? part_1 and two_letter_words.has_key? part_2 results << "#{part_1} + #{part_2} => #{word}\n" end part_1 = word[0 .. 4] part_2 = word[5 .. 5] if five_letter_words.has_key? part_1 and one_letter_words.has_key? part_2 results << "#{part_1} + #{part_2} => #{word}\n" end end puts results puts "#{results.length} hits" puts "Running time: #{Time.now - start_time} seconds"
This version is about 1/3 faster than the "readable to Ruby programmers" version. (And of course hugely faster than the "readable to non-Rubyists" version, with arrays!) I also surmised that perhaps the "case word.length
" part might be inefficient, and used again an array of hashes... but apparently the indexing was sufficiently slow that it was in fact worse than before!
Now it's your turn again. Despite the fact that I'm doing these in Ruby, I don't claim to be a Ruby guru, so there may have been some cute Ruby tricks I missed, that could speed this up. Maybe even some algorithm tweaks. What would you do differently?
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.