Attack of the Codosaurus!: 2010

Saturday, November 20, 2010

Code Kata Nine: Supermarket Checkout, Part B (Decouple)

As you may recall, before life (real life, not Conway's) got rather busy, I left you hanging off the cliff of Dave Thomas's Code Kata #9. I had finished the "just make it work" part. Now for the "decouple the price rule format" part.

As Dave (the other Dave) says, there are many ways to do it. I chose to use the Strategy Pattern. (If you're not familiar with design patterns, you really need to go read up on them.)

As you can see in the code below, the piece that gets swapped out is actually very small. It does nothing but parse the pricing rules. How they get applied remains the same.

#! /usr/bin/ruby

# Dave Thomas Code Kata #9: Supermarket Checkout
# See http://codekata.pragprog.com/2007/01/kata_nine_back_.html
# Solution by Dave Aronson
# VERSION B: decouple the pricing format.


class CheckOut

  def initialize rules, parser
    @items = {}
    # only really need this in total, but IRL a customer
    # might just want a price-check on something....
    @pricer = Pricer.new rules, parser
  end

  def scan sku
    @items[sku] = 0 if @items[sku].nil?
    @items[sku] += 1
  end

  def total
    tot = 0
    @items.each_pair { |sku, qty| tot += @pricer.price sku, qty }
    tot
  end

end


# separate from CheckOut 'cuz IRL there may be price-check stations
class Pricer

  def initialize rules, parser
    @prices = parser.parse rules
  end

  def price sku, quantity
    qty = quantity
    tot = 0
    while qty > 0 do
      deal_qty, deal_price = find_best_deal sku, qty
      num_deals = qty / deal_qty
      tot += num_deals * deal_price
      qty -= num_deals * deal_qty
    end
    tot
  end

  private

  def find_best_deal sku, qty
    prices = @prices[sku]
    if prices.nil?
      puts "ERROR: NO PRICE FOUND for #{sku}!"
      return nil
    end
    deal = nil
    prices.each_pair do |rule_qty, rule_tot|
      if rule_qty > qty then
        break
      else
        # Inefficient to keep overwriting this,
        # but shouldn't happen too much....
        deal = [rule_qty, rule_tot]
      end
    end
    puts "ERROR: NO PRICE FOUND for #{qty} of #{sku}!" if deal.nil?
    deal
  end

end


# now we get to the heart of the matter:
# different ways to parse prices.
# essentially the strategy pattern.


class CSVPriceParser
  def parse rules
    prices = {}
    rules.each do |line|
      line.chomp!
      cur_sku, qty, tot = line.split ","
      prices[cur_sku] = {} if prices[cur_sku].nil?
      prices[cur_sku][qty.to_i] = tot.to_i
    end
    prices
  end
end


CSV_RULES = [
  "A,1,50",
  "A,3,130",
  "B,1,30",
  "B,2,45",
  "C,1,20",
  "D,1,15",
]


class YAMLPriceParser
  def parse rules
    prices = {}
    cur_sku = ''
    rules.each do |line|
      line.chomp!
      if line[0..0] == "\t"
        qty, tot = line[1..-1].split "\t"  # 1 to ignore tab
        # really should check for cur_sku not being set yet,
        # though that would mean it's an invalid file....
        prices[cur_sku] = {} if prices[cur_sku].nil?
        prices[cur_sku][qty.to_i] = tot.to_i
      else cur_sku = line
      end
    end
    prices
  end
end


YAML_RULES = [
  "A",
  "\t1\t50",
  "\t3\t130",
  "B",
  "\t1\t30",
  "\t2\t45",
  "C",
  "\t1\t20",
  "D",
  "\t1\t15",
]


# TESTS, initially supplied by Dave Thomas,
# but refactored by Dave Aronson
# to support swapping out strategies

require 'test/unit'

class TestPrice < Test::Unit::TestCase

  def helper_test_incremental rules, parser
    co = CheckOut.new rules, parser
    assert_equal 0, co.total
    co.scan "A";  assert_equal( 50, co.total)
    co.scan "B";  assert_equal( 80, co.total)
    co.scan "A";  assert_equal(130, co.total)
    co.scan "A";  assert_equal(160, co.total)
    co.scan "B";  assert_equal(175, co.total)
  end

  def helper_test_totals rules, parser
    assert_equal(  0, price("", rules, parser))
    assert_equal( 50, price("A", rules, parser))
    assert_equal( 80, price("AB", rules, parser))
    assert_equal(115, price("CDBA", rules, parser))

    assert_equal(100, price("AA", rules, parser))
    assert_equal(130, price("AAA", rules, parser))
    assert_equal(180, price("AAAA", rules, parser))
    assert_equal(230, price("AAAAA", rules, parser))
    assert_equal(260, price("AAAAAA", rules, parser))

    assert_equal(160, price("AAAB", rules, parser))
    assert_equal(175, price("AAABB", rules, parser))
    assert_equal(190, price("AAABBD", rules, parser))
    assert_equal(190, price("DABABA", rules, parser))
  end

  def price goods, rules, parser
    co = CheckOut.new rules, parser
    goods.split(//).each { |item| co.scan(item) }
    co.total
  end

  def helper_test rules, parser
    helper_test_incremental rules, parser
    helper_test_totals rules, parser
  end

  def test_rulesets
    helper_test CSV_RULES, CSVPriceParser.new
    helper_test YAML_RULES, YAMLPriceParser.new
  end


end

As always (just had to continue starting each paragraph with "As"!), it's now your turn. How would you have designed and/or implemented this differently? Leave a comment.

Sunday, November 14, 2010

And now for something completely different...

Okay, maybe not completely different, as it will still be code-related. But, coding time is at quite a premium right now. So, until I manage to get some time to sit down and code for fun, I'm just going to post little snippets of advice.

I'm working on a team with some newer developers -- just a few years under each belt, versus my decades. One advantage, as far as this blog goes, is that it lets me spot times when they make newbie-ish mistakes that I've since learned solutions to. Then I can share those solutions with you! (And of course my colleagues.)

The first one is to keep track of work not-yet-done, or klugily done in a way that should be fixed ASAP.

On one of the programs we're working on now, there's a part that counts down "3, 2, 1" in some particular (human) language, depending on the "content set" it's using. (As you may recall, I'm now working for Rosetta Stone.) We got a bug report that this was always in Spanish, even if the content set it was dealing with was in French, English, etc. Turns out this wasn't so much a bug per se, as an unimplemented feature -- or rather a chain of unimplemented features.

It turns out that the guy who originally wrote the program did not know how to make Part A (which knows all about the content set) tell Part B anything. So, he hardcoded a bunch of things, such as that Part B should launch Part C (which does the countdown) in Spanish.

Since he's fairly new, I can't fault him too much for not knowing how to do something. I had to Google it myself. I can fault him for not trying to find out, though! Over on my other blog (personal excellence blog Dare2XL), I have an old post about asking for help. But that's only part of the story -- the non-coding part, and this is my coding blog. B-)

The other part is how to keep track of such unimplemented features, temporary kluges, etc. How do you avoid releasing something as "done", only to get bug reports due to such things?

What I've usually done is to mark them with "TODO" or "FIXME" comments. When I think I've finished, I grep the code, configs, etc. for such tags. (Use grep, not your eyeballs, even if you're using an editor that syntax-highlights those words. Grep is much more reliable -- and when combinded with find or ls -R, it's much more certain to find all the relevant files.)

Some of them are about ideas for future features, so I use a colon, and those I mark as "TODO MAYBE:" or "TODO LATER:", rather than simple "TODO:". Note how the colon is now not next to TODO, so I can grep for "TODO:", not just "TODO".

Another way is to use your issue-tracker (such as Bugzilla, Jira, Trac, etc.) to make small tickets for yourself for each such feature. The effectiveness of this approach depends on many factors, such as your team's policy on making tickets, your tool's complexity, and its filtering ability. If you can easily make tickets that come quickly to your attention without getting in other people's way, great. If not, you can still use a private task list, be it in some purpose-built tool, a spreadsheet, a simple text file, or whatever -- just so long as you refer to it often. (I won't go into tons of advice on how to use a task list; surely there are gazillions of web pages on that. That said, though, Google "Getting Things Done" for one very good approach.)

Yet another way is to make sure that the requirement is covered by a test. (Our group is not doing much developer-level testing right now; I intend to fix that.) In this case, there could have been a test like "Make sure that a French content set makes the countdown launch in French. Repeat for Spanish." (Or maybe, to get broader coverage, pick two random languages each time.)

So now, as always, dear reader, it's your turn! How do you keep track of unfinished, or klugily finished, work, to avoid releasing it into "the wild"? Leave a comment.

Tuesday, October 5, 2010

Ads? What ads?

Oops! Sorry for the large number of ads on here. Long story short I goofed the setup. I knew they wouldn't show up for me, due to Adblock Plus. (One of my most favoritest Firefox plugins EVAR!) What I didn't realize is that it would also block their representation on the Blogger layout design screen! So I had a bunch, at the top, when I thought I had none, and intended to just put one at the bottom. Mega chalupa!

Sunday, October 3, 2010

Rosetta Stone: the company, and my own

In other news, I have accepted a post with Rosetta Stone, in Rosslyn VA, and will be starting on October 18. Oddly enough, I'll be working mainly in ActionScript and Flex... neither of which I know yet! I'm told that ActionScript is generally similar to JavaScript, which I do know, so at least that's a head-start.

What does this mean for you, my loyal readers? I may soon start doing some of these katas in those languages. Are you disappointed that I'll be discontinuing Ruby? You were bound to be disappointed that way eventually anyway. My plan has long been to build a matrix of problems (such as Dave Thomas katas, other katas, normal life, N-dimensional life, multi-lifeform life, etc.), versus languages, with links to blog posts showing solutions for each problem in each language. Sort of... my own little electronic Rosetta Stone.

Life in N Dimensions

Sometimes we all feel as though life has gotten a bit out of hand. It's certainly been that way for me lately. As you may have noticed, I haven't updated this blog in quite some time, despite the way I was obviously trying to do it weekly. Was this just yet another case of someone up and walking away from their blog? No, not me! Really! I promise! (Well, at least this time it wasn't....)

So what's been keeping me so busy? My life has been very busy in several dimensions. I'll spare you most of it, but one was Ruby DCamp, an "unconference" centered around Ruby. The first day featured a "code retreat". Not an official one with Corey Haines himself present, but at least an event of that style. Apparently, the canonical problem to tackle in these is Conway's Game of Life.

Since each round is only 45 minutes, I didn't manage to get all the way through a successful implementation. And since it's so simple, I won't bother you with the one I did later.

BUT...

Evan Light, the organizer of Ruby DCamp, sort of threw down a gauntlet about extending Life from two dimensions to... unlimited. So, of course I then spent much of my spare time figuring that out.

The really hard part was to figure out how to display the state of the world. Eventually I settled on a sort of recursive concept:

An individual cell is just output as-is.
A row of cells (which I represented internally as a String) is output as-is, again.
A two-dimensional world was output as a grid, as usual.
For any higher dimensions, though:
- If it's odd, it's output as a row of blocks from the next dimension down, with a border between each and around the whole set.
- If it's even, it's output as a block of rows from the next dimension down, with a border between each and around the whole set.

Yeah, I know, that sounds kinda complex. But take a look at it in action, and you'll see what I mean. For instance, a three-dimensional world would look like this:

|* *| * |  *|
|***| **|*  |
|** | * |*  |

a four-dimensional world would look like this:

+---+---+---+
|* *| * |  *|
|***| **|*  |
|** | * |*  |
+---+---+---+
|* *| **|  *|
| **| **|* *|
|** |  *|*  |
+---+---+---+
|* *| * |  *|
|* *| **|***|
|** |** |* *|
+---+---+---+

a five-dimensional world would look like this:

|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||***| **|*  |||***| **|*  |||***| **|*  ||
||** | * |*  |||** | * |*  |||** | * |*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| **|  *|||* *| **|  *|||* *| **|  *||
|| **| **|* *||| **| **|* *||| **| **|* *||
||** |  *|*  |||** |  *|*  |||** |  *|*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||* *| **|***|||* *| **|***|||* *| **|***||
||** |** |* *|||** |** |* *|||** |** |* *||
|+---+---+---+|+---+---+---+|+---+---+---+|

a six-dimensional world would look like this:

+-------------+-------------+-------------+
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||***| **|*  |||***| **|*  |||***| **|*  ||
||** | * |*  |||** | * |*  |||** | * |*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| **|  *|||* *| **|  *|||* *| **|  *||
|| **| **|* *||| **| **|* *||| **| **|* *||
||** |  *|*  |||** |  *|*  |||** |  *|*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||* *| **|***|||* *| **|***|||* *| **|***||
||** |** |* *|||** |** |* *|||** |** |* *||
|+---+---+---+|+---+---+---+|+---+---+---+|
+-------------+-------------+-------------+
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||***| **|*  |||***| **|*  |||***| **|*  ||
||** | * |*  |||** | * |*  |||** | * |*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| **|  *|||* *| **|  *|||* *| **|  *||
|| **| **|* *||| **| **|* *||| **| **|* *||
||** |  *|*  |||** |  *|*  |||** |  *|*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||* *| **|***|||* *| **|***|||* *| **|***||
||** |** |* *|||** |** |* *|||** |** |* *||
|+---+---+---+|+---+---+---+|+---+---+---+|
+-------------+-------------+-------------+
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||***| **|*  |||***| **|*  |||***| **|*  ||
||** | * |*  |||** | * |*  |||** | * |*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| **|  *|||* *| **|  *|||* *| **|  *||
|| **| **|* *||| **| **|* *||| **| **|* *||
||** |  *|*  |||** |  *|*  |||** |  *|*  ||
|+---+---+---+|+---+---+---+|+---+---+---+|
||* *| * |  *|||* *| * |  *|||* *| * |  *||
||* *| **|***|||* *| **|***|||* *| **|***||
||** |** |* *|||** |** |* *|||** |** |* *||
|+---+---+---+|+---+---+---+|+---+---+---+|
+-------------+-------------+-------------+

and so on.

And now, what you've all been waiting for... the code:

#!/usr/bin/ruby


# first some general utility stuff


# combine several blocks of text, as columns.
# might be some more elegant way to do this with recursive
# calls to Array#zip or something like that....
def combine_columns blocks, separator = '|', border = true
  exploded = blocks.map { |blk| blk.split "\n" }
  num_rows = exploded[0].length
  result = []
  num_rows.times do |row_num|
    row = exploded.map { |blk| blk[row_num] }
    if border then
      row.push ''
      row.unshift ''
    end
    result << row.join(separator)
  end
  result.join "\n"
end


# combine several rows of text, with separators made of a given char, using an
# alternate char wherever another certain char appears in the top or bottom of
# the row, assuming that they all start (and end) with the same thing as the
# start of the first.
def combine_rows rows, norm_char = '-', except = '|', response = '+', border = true
  first_line = rows[0]
  len = first_line.index("\n")
  sep = norm_char * len
  start = 0
  while not (start.nil? or start >= len)
    start = first_line.index except, start
    if not start.nil?
      sep[start] = response
      start += 1
    end
  end
  tmp = rows.join("\n#{sep}\n")
  tmp = "#{sep}\n#{tmp}\n#{sep}" if border
end


# now the stuff specifically for this app:


class World


  # apply the basic rules of life and death
  # TODO MAYBE: Speed this up.  This is where the profiler says it spends
  # the second most time out of all the parts I wrote.
  def apply_rule cur, nbrs
    if cur == '*' then
      (nbrs >= @min_stay  and nbrs <= @max_stay ) ? '*' : ' '
    else
      (nbrs >= @min_birth and nbrs <= @max_birth) ? '*' : ' '
    end
  end


  # public only for external testing purposes
  def count_neighbors coords
    count_neighbors_helper [], coords, coords, @cells
  end


  def initialize opts
    @cells = opts[:cells]
    @num_dims = get_num_dims_helper @cells
    # Note, these formulas are rather ad-hoc.  Just my SWAG at how to adjust
    # for varying number of dimensions, and give the "standard" results in 2d.
    base = 2 ** @num_dims
    @min_stay  = opts.fetch :min_stay,  base / 2
    @max_stay  = opts.fetch :max_stay,  base     - 1
    @min_birth = opts.fetch :min_birth, base / 2 + 1
    @max_birth = opts.fetch :max_birth, base     - 1
    @max_stay = @min_stay if @max_stay < @min_stay
    @max_birth = @min_birth if @max_birth < @min_birth
  end


  # convert current state to a string, for display purposes
  def to_s
    to_s_helper @cells, @num_dims
  end


  # let the world turn....
  def turn
    @cells = turn_helper @cells, []
  end


private


  # see what's in a particular subset of cells neighboring a given center
  # TODO MAYBE: Speed this up.  This is where the profiler says it spends
  # the most time out of all the parts I wrote.
  def count_neighbors_helper coords_done, coords_todo, center, cells
    todo_dup = coords_todo.dup
    count = 0
    next_coord = todo_dup.shift
    ((next_coord - 1) .. (next_coord + 1)).each do |idx|
      next if idx < 0 or idx >= cells.length
      done_dup = coords_done.dup
      done_dup << idx
      if cells.class == Array
        count += count_neighbors_helper done_dup, todo_dup, center, cells[idx]
      elsif done_dup != center and cells[idx..idx] == '*'
        count += 1
      end
    end
    count
  end


  # figure out how many dimensions are in this world.
  # (need this in order to set default rule parameters.)
  def get_num_dims_helper cells
    if cells.class == Array: get_num_dims_helper(cells[0]) + 1
    elsif cells.class == String: 1
    else
      puts "ERROR: cells must be of type [Array of ...] String"
      puts "(i.e., String, Array of String, Array of Array of String, etc.)"
      # TODO: throw an exception?
      exit 1
    end
  end


  # turn a chunk of the world into a string,
  # by recursing if it's too big to do at once (3+ dimensions).
  # 1d is a simple string (should only happen in a total 1d world),
  # 2d is a block of strings printed atop each other,
  # higher odd dimensions are rows of blocks,
  # higher even dimensions are blocks of rows.
  def to_s_helper cells, num_dims
    if num_dims == 1: cells
    elsif num_dims == 2: cells.join("\n")
    elsif num_dims % 2 == 0 then
      rows = cells.map { |row| to_s_helper(row, num_dims - 1) }
      combine_rows rows
    else
      blocks = cells.map { |blk| to_s_helper blk, num_dims-1 }
      combine_columns blocks
    end
  end


  # turn the world a piece at a time, drilling down into subsections as needed,
  # accumulating results as we go
  def turn_helper cells, coords
    results = []
    cells.length.times do |idx|
      coords_dup = coords.dup
      coords_dup << idx
      if cells.class == Array: results << turn_helper(cells[idx], coords_dup)
      else
        results << apply_rule(cells[idx..idx], count_neighbors(coords_dup))
      end
    end
    results = results.join if cells.class == String
    results
  end


end

But wait! There's more! You get a special bonus: the tests used to develop this program. Sorry they're not complete, but as you can probably guess, the ones I've left blank would be horribly tedious to fill in.

#! /usr/bin/ruby

require 'nd_life'
require 'profile'

describe 'nd_life' do

  it 'should apply altered rules correctly' do
    w = World.new :cells => '',
                  :min_stay  => 4,
                  :max_stay  => 6,
                  :min_birth => 2,
                  :max_birth => 5
    w.apply_rule('*', 3).should == ' '
    w.apply_rule('*', 4).should == '*'
    w.apply_rule('*', 6).should == '*'
    w.apply_rule('*', 7).should == ' '
    w.apply_rule(' ', 1).should == ' '
    w.apply_rule(' ', 2).should == '*'
    w.apply_rule(' ', 5).should == '*'
    w.apply_rule(' ', 6).should == ' '
  end

  it 'should apply default 1d rules correctly' do
    w = World.new :cells => ''
    w.apply_rule('*', 0).should == ' '
    w.apply_rule('*', 1).should == '*'
    w.apply_rule('*', 2).should == ' '
    w.apply_rule(' ', 1).should == ' '
    w.apply_rule(' ', 2).should == '*'
    w.apply_rule(' ', 3).should == ' '
  end

  it 'should apply default 2d rules correctly' do
    w = World.new :cells => ['']
    w.apply_rule('*', 1).should == ' '
    w.apply_rule('*', 2).should == '*'
    w.apply_rule('*', 3).should == '*'
    w.apply_rule('*', 4).should == ' '
    w.apply_rule(' ', 2).should == ' '
    w.apply_rule(' ', 3).should == '*'
    w.apply_rule(' ', 4).should == ' '
  end

  it 'should apply default 3d rules correctly' do
    w = World.new :cells => [['']]
    w.apply_rule('*', 3).should == ' '
    w.apply_rule('*', 4).should == '*'
    w.apply_rule('*', 7).should == '*'
    w.apply_rule('*', 8).should == ' '
    w.apply_rule(' ', 4).should == ' '
    w.apply_rule(' ', 5).should == '*'
    w.apply_rule(' ', 7).should == '*'
    w.apply_rule(' ', 8).should == ' '
  end

  it 'should apply default 4d rules correctly' do
    w = World.new :cells => [[['']]]
    w.apply_rule('*',  7).should == ' '
    w.apply_rule('*',  8).should == '*'
    w.apply_rule('*', 15).should == '*'
    w.apply_rule('*', 16).should == ' '
    w.apply_rule(' ',  8).should == ' '
    w.apply_rule(' ',  9).should == '*'
    w.apply_rule(' ', 15).should == '*'
    w.apply_rule(' ', 16).should == ' '
  end

  it 'should count 1d neighbors correctly' do
    w = World.new :cells => '* ** '
    w.count_neighbors([0]).should == 0
    w.count_neighbors([1]).should == 2
    w.count_neighbors([2]).should == 1
    w.count_neighbors([3]).should == 1
    w.count_neighbors([4]).should == 1
  end

  it 'should count 2d neighbors correctly' do
    w = World.new :cells => ['*  ',
                             ' * ',
                             ' **']
    w.count_neighbors([0,0]).should == 1
    w.count_neighbors([0,1]).should == 2
    w.count_neighbors([0,2]).should == 1
    w.count_neighbors([1,0]).should == 3
    w.count_neighbors([1,1]).should == 3
    w.count_neighbors([1,2]).should == 3
    w.count_neighbors([2,0]).should == 2
    w.count_neighbors([2,1]).should == 2
    w.count_neighbors([2,2]).should == 2
  end

  it 'should count 3d neighbors correctly' do
    c = [['* ',
          '  '],
         [' *',
          '**'],
         ['* ',
          ' *']]
    w = World.new :cells => c
    w.count_neighbors([0,0,0]).should == 3
    w.count_neighbors([0,0,1]).should == 4
    w.count_neighbors([0,1,0]).should == 4
    w.count_neighbors([0,1,1]).should == 4
    w.count_neighbors([1,0,0]).should == 6
    w.count_neighbors([1,0,1]).should == 5
    w.count_neighbors([1,1,0]).should == 5
    w.count_neighbors([1,1,1]).should == 5
    w.count_neighbors([2,0,0]).should == 4
    w.count_neighbors([2,0,1]).should == 5
    w.count_neighbors([2,1,0]).should == 5
    w.count_neighbors([2,1,1]).should == 4
  end

  it 'should count 4d neighbors correctly'

  it 'should generate the next setup correctly in 1d' do
    w = World.new :cells => '* ** '
    c_str = w.to_s
    neighbor_counts = {}
    c_str.length.times { |idx| neighbor_counts[idx] = w.count_neighbors [idx] }
    w.turn
    c2_str = w.to_s
    c_str.length.times do |idx|
      old = c_str[idx..idx]
      new = c2_str[idx..idx]
      nbrs = neighbor_counts[idx]
      new.should == w.apply_rule(old, nbrs)
    end
  end

  it 'should generate the next setup correctly in 2d' do
    c = ['*  ',
         ' * ',
         ' **']
    w = World.new :cells => c
    c_str = w.to_s
    neighbor_counts = {}
    c.length.times do |row_idx|
      c[row_idx].length.times do |col_idx|
        neighbor_counts["#{row_idx}_#{col_idx}"] = w.count_neighbors [row_idx, col_idx]
      end
    end
    w.turn
    c2_str = w.to_s
    c.length.times do |row_idx|
      cur_row = c[row_idx]
      cur_row.length.times do |col_idx|
        idx = row_idx * (cur_row.length + 1) + col_idx
        old = c_str[idx..idx]
        new = c2_str[idx..idx]
        nbrs = neighbor_counts["#{row_idx}_#{col_idx}"]
        new.should == w.apply_rule(old, nbrs)
      end
    end
  end

  it 'should generate the next setup correctly in 3d'

  it 'should generate the next setup correctly in 4d'

  it 'should store a 1d setup correctly' do
    c = '*  '
    w = World.new :cells => c
    w.to_s.should == c
  end

  it 'should store a 2d setup correctly' do
    c = ['*  ', ' * ', ' **']
    w = World.new :cells => c
    w.to_s.should == c.join("\n")
  end

  it 'should store a 3d setup correctly' do
    c = [['*  ',
          '*  ',
          ' * '],
         [' **',
          ' * ',
          '* *'],
         ['** ',
          '** ',
          ' **']]
    w = World.new :cells => c
    w.to_s.should == "|*  | **|** |\n"\
                     "|*  | * |** |\n"\
                     "| * |* *| **|"
  end

  it 'should store a 4d setup correctly' do
    c = [[['*  ', '*  ', ' * '],
          [' **', ' * ', '* *'],
          ['** ', '** ', ' **']],
         [['*  ', '*  ', ' * '],
          [' **', ' * ', '* *'],
          ['** ', '** ', ' **']],
         [['*  ', '*  ', ' * '],
          [' **', ' * ', '* *'],
          ['** ', '** ', ' **']]]
    w = World.new :cells => c
    w.to_s.should == "+---+---+---+\n"\
                     "|*  | **|** |\n"\
                     "|*  | * |** |\n"\
                     "| * |* *| **|\n"\
                     "+---+---+---+\n"\
                     "|*  | **|** |\n"\
                     "|*  | * |** |\n"\
                     "| * |* *| **|\n"\
                     "+---+---+---+\n"\
                     "|*  | **|** |\n"\
                     "|*  | * |** |\n"\
                     "| * |* *| **|\n"\
                     "+---+---+---+"
  end
                     
  it 'should store a 5d setup correctly'

  it 'should store a 6d setup correctly'

end

If you're especially observant (no I don't mean Orthodox!), you may have noticed that my previous tests were done with Test::Unit, and these are RSpec. Yes, I finally learned RSpec, at Ruby DCamp. Also note the profiler; that helped me spot the bottlenecks. (You may think you know where the bottlenecks in your code are, but chances are, you're wrong. That's just one of the things humans generally aren't very good at. Let the machine do the work and tell you where it hurts.)

And! If you order by midnight tonight, you get a special second bonus: a test driver, so you can watch this thing run, in six whopping dimensions! Just look:

#!/usr/bin/ruby

require 'nd_life'

# six dimensions
cells = [[[[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]],
          [[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]],
          [[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]]],
         [[[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]],
          [[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]],
          [[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]]],
         [[[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]],
          [[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]],
          [[['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']],
           [['   ','   ','   '],['   ','   ','   '],['   ','   ','   ']]]]]


def randomize cells
  if cells.class == Array then
    cells.map { |part| randomize part }
  else
    cells.length.times { |i| cells[i..i] = rand(2) == 1 ? '*' : ' ' }
  end
end

randomize cells
w = World.new :cells => cells

gen_num = 0

prev_worlds = []
stable = false

rep = w.to_s
while not stable do
  gen_num += 1
  puts "Generation #{gen_num}:"
  puts rep
  prev_worlds << rep
  w.turn
  rep = w.to_s
  idx = prev_worlds.rindex rep  # use rindex since later most likely to match
  if not idx.nil?
    stable = true
    puts "Next state is same as at Generation #{idx + 1}"
  end
end

Now how much would you pay? (crickets . . . crickets . . .) Well that's okay, because it's not $999, not $99, not even $9, but free, though as always I would appreciate you donating the time to tell me how I'm Doin' It Wrong.

Seriously. This thing is pretty slow, at about seven seconds per generation, on my machine. (On the other claw, it is pretty ancient, a Powerbook, and yes I mean Powerbook not Macbook.) Let's hear some ideas for speeding it up. The profiler says the main timesink is the counting of neighbors, specifically count_neighbors_helper. Any better, or at least other, ideas? Why would you do it your way?

As for other assorted improvements, I've already thought of, but decided not to do quite yet:

Telling it only how many dimensions it has, and how big it is, letting it assume that all dimensions are the same size, and generate random contents.
Telling it only how many dimensions it has, and how big each is, letting it generate random contents.
Having it detect when stability has been reached. (Actually I had that in earlier, just detecting when a new generation was the same as the old, but since so many forms of stability are cyclic, I took it out, in favor of moving that functionality to the driver.)

Monday, September 6, 2010

Code Kata Nine: Supermarket Checkout, Part A (Just Make It Work!)

Wow, I really managed to milk Conflicting Objectives for quite a run! Now let's move on to Dave Thomas's ninth code kata: Supermarket Checkout. Follow the link to go read up on it. I'll wait.

. . .

Ah, you're back. Good. Now, I want to make a little twist on the basic kata, as I believe this will be more instructive for the readers. First I am writing as though I am a naive but competent programmer, who actually fell for it when the customer swore up and down that the input format would never ever change. Of course, said programmer is also under pressure from The Boss (no I don't mean Bruce Springsteen) to just get the thing working and out the door.

The initial data format was not chosen with much thought in mind. That's not really part of the exercise, only abstracting it later. But that, as I said, is for later. Next time, I'll abstract it... as though the customer is, as a seasoned programmer would have predicted, changing it for the Nth time.

#! /usr/bin/ruby

# Dave Thomas Code Kata #9: Supermarket Checkout
# See http://codekata.pragprog.com/2007/01/kata_nine_back_.html
# Solution by Dave Aronson
# VERSION A: Just get it working

# This one is a bit tricky requirements-wise.  He doesn't say what
# format the rules are in!  My first cut at this will use YAML-ish
# input:
#
# sku
# quantitytotal
# quantitytotal
# sku
# quantitytotal
#
# See the definition of the RULES array for an example.
#
# For this one I am JUST going to get it working.  It will be as
# though this format was dictated by the customer and they swore up
# and down it would never ever change.  Of course you realize that in
# the real world, that means that the format is pretty much GUARANTEED
# to change, but this version will demonstrate how a naive (but
# otherwise competent) programmer would design it while holding that
# belief.
#
# Next time, in Version B, I will pretend as though the customer has
# changed his mind (as a seasoned programmer would predict), TWICE, so
# I'll be trying to make it easier for the NEXT time he does so.


class CheckOut

  def initialize rules
    cur_sku = ''
    @prices = {}
    rules.each do |line|
      line.chomp!  # just in case; not needed w/ my test, but....
      if line[0..0] == "\t"
        qty, tot = line[1..-1].split "\t"
        @prices[cur_sku] = {} if @prices[cur_sku].nil?
        @prices[cur_sku][qty.to_i] = tot.to_i
      else cur_sku = line
      end
    end
    @items = {}
  end

  def scan sku
    @items[sku] = 0 if @items[sku].nil?
    @items[sku] += 1
  end

  def total
    tot = 0
    @items.each_pair do |sku, qty|
      while qty > 0 do
        deal_qty, deal_price = find_best_deal sku, qty
        num_deals = qty / deal_qty
        tot += num_deals * deal_price
        qty -= num_deals * deal_qty
      end
    end
    tot
  end

  private

  # Find the best deal on a given quantity of a given SKU.
  # API design point: make args "sku, qty" or "qty, sku"?  The latter
  # is more like the above description, but the programmer writing a
  # call to this is likely to already be dealing with it the former
  # way (as a key and value).
  def find_best_deal sku, qty
    deal = nil
    @prices[sku].each_pair do |rule_qty, rule_tot|
      if rule_qty > qty then
        break
      else
        # Inefficient to keep overwriting this,
        # but shouldn't happen too much....
        deal = [rule_qty, rule_tot]
      end
    end
    puts "ERROR: NO PRICE FOUND for #{qty} of #{sku}!" if deal.nil?
    deal
  end

end


RULES = [
  "A",
  "\t1\t50",
  "\t3\t130",
  "B",
  "\t1\t30",
  "\t2\t45",
  "C",
  "\t1\t20",
  "D",
  "\t1\t15",
]


# TESTS, supplied by Dave Thomas

require 'test/unit'

class TestPrice < Test::Unit::TestCase

  def price(goods)
    co = CheckOut.new(RULES)
    goods.split(//).each { |item| co.scan(item) }
    co.total
  end

  def test_totals
    assert_equal(  0, price(""))
    assert_equal( 50, price("A"))
    assert_equal( 80, price("AB"))
    assert_equal(115, price("CDBA"))

    assert_equal(100, price("AA"))
    assert_equal(130, price("AAA"))
    assert_equal(180, price("AAAA"))
    assert_equal(230, price("AAAAA"))
    assert_equal(260, price("AAAAAA"))

    assert_equal(160, price("AAAB"))
    assert_equal(175, price("AAABB"))
    assert_equal(190, price("AAABBD"))
    assert_equal(190, price("DABABA"))
  end

  def test_incremental
    co = CheckOut.new(RULES)
    assert_equal(  0, co.total)
    co.scan("A");  assert_equal( 50, co.total)
    co.scan("B");  assert_equal( 80, co.total)
    co.scan("A");  assert_equal(130, co.total)
    co.scan("A");  assert_equal(160, co.total)
    co.scan("B");  assert_equal(175, co.total)
  end
end

So now it's your turn. Please hold off until next time on the topic of abstracting the rules. The rest of it may well be besides the main point of the kata, but I'd still be interested to hear how you would have done the rest of it differently, and why.

Sunday, August 29, 2010

Kata 8: Conflicting Objectives, Part 3B

Okay, now I've had some time to think about what it means for the code to be "extensible". Offhand I'm thinking it's as in the "open" part of the Open-Closed Principle, which states that a class should be open to extension but closed to modification. That is, you should be able to extend it o do new and different things, but not so much monkey around with the existing functionality and change how that works -- at least in its interactions with the rest of the world. (How it works inside is another story. That's why encapsulation and Information Hiding are such Good Things.)

That got me thinking, just wrap the functionality in a class, arranged such that one could easily override key methods. So, forthwith my class, and a couple of subclasses:

#! /usr/bin/ruby

# Dave Thomas Code Kata #8: Conflicting Objectives
# See http://codekata.pragprog.com/2007/01/kata_eight_conf.html
# Solution by Dave Aronson
# version 3: make it extensible
# part B: take "extensible" as meaning "open to doing additional things"
# or "open to doing the same things in different ways".  In an OO system,
# this generally means "more subclassable".  Remember the open part of the
# open/closed principle: a class should be *open to extension*.


class Word_Manager

  # This could be overridden to have, for instance, a variable
  # number of subparts; here we assume two.  It would also need to
  # override find_combos, and decide @max_whole_len differently.
  def initialize min_len, max_len
    @max_part_len = max_len
    @min_part_len = min_len
    @max_whole_len = @min_part_len + @max_part_len
    @long_words = []
    @short_word_lists = []
    (@min_part_len .. @max_part_len).each do |len|
      @short_word_lists[len] = {}
    end
  end

  def append word
    len = word.length
    if len == @max_whole_len then @long_words << word
    elsif @min_part_len <= len and len <= @max_part_len
      @short_word_lists[len][word] = nil
    end
  end

  def find_combos
    hits = []
    # hmmm, i'm thinking there may be some way to iterate over
    # the long words generically, passing in a block....
    @long_words.each do |word|
      (@min_part_len .. @max_part_len).each do |len|
        part_1 = word[0 .. len - 1]
        part_2 = word[len .. @max_whole_len - 1]
        if has_word? part_1 and has_word? part_2
          hits << "#{part_1} + #{part_2} => #{word}"
        end
      end
    end
    hits
  end

  def has_word? word
    len = word.length
    # note: generally for parts, not wholes
    if len == @max_whole_len then @long_words.include? word
    elsif @min_part_len <= len and len <= @max_part_len
      @short_word_lists[len].has_key? word
    else false
    end
  end

  def read_file file_name
    # a subclass could insert code to unify case, or translate, or . . . .
    File.new(file_name, 'r').each_line { |line| append line.chomp }
  end

end


# Now let's make a couple subclasses.


# one that's not case-sensitive
class Caseless_Word_Manager < Word_Manager
  def append word
    super word.downcase
  end
end


# one that will consider a part to be OK both forward and backward;
# only makes sense if also caseless
class Reversible_Word_Manager < Word_Manager
  def append word
    down = word.downcase
    super down
    # could instead override has_word to check for reversed version;
    # which one makes sense depends on frequency of insert vs. check
    super down.reverse if down.length < @max_whole_len
  end
end


# Now let's see what each one does with our word list!


def helper obj, description
  obj.read_file 'conflict_words.txt'
  hits = obj.find_combos
  puts "#{description} hits:\n#{hits.join "\n"}\n#{hits.length} hits"
end


helper Word_Manager.new(1, 5), 'Normal'
puts
helper Caseless_Word_Manager.new(1, 5), 'Caseless'
puts
helper Reversible_Word_Manager.new(1, 5), 'Reversible'

And while I'm at it... here is the list of words I've been using, so you can test the code yourself without waiting while it chugs through your system wordlist:

A
AT
ATTACK
Reb
ate
atwirl
be
bed
bedpan
bemoan
benighted
bie
big
bigger
boo
booger
but
conic
credible
days
debbie
ed
er
ers
ger
German
i
iconic
in
incredible
inside
insight
intake
irides
istoke
land
lander
man
moan
morrow
night
on
pan
rebate
rides
rub
rubber
side
sight
stoke
tack
tackon
take
takeon
to
todays
tomorrow
tubers
twirl

So now, as always, it's your turn. What does "extensible" mean to you? How would you have made it so?

Sunday, August 22, 2010

Kata 8: Conflicting Objectives, Part 3A

Part 3, much like Part 1, is open to much interpretation. I figure that the sense of "extensible" that Dave Thomas probably meant to drive at, was "easy to add features to", most likely by making subclassing easy, and accepting code blocks, and things like that. I've been giving that some thought, and it makes my head hurt. :-)

So for now, I'm going to "creatively misinterpret" it as "flexible" and "easy to put to many different uses by accepting a number of different ways it can do the job". Here is my ~~extensible~~ flexible version:

#! /usr/bin/ruby

# Dave Thomas Code Kata #8: Conflicting Objectives
# See http://codekata.pragprog.com/2007/01/kata_eight_conf.html
# Solution by Dave Aronson
# version 3: make it extensible
# part A: creatively misinterpret "extensible" as "flexible" :-)


# IDEA to make it more extensible, tho MUCH slower:
# ability to split into larger # of parts?
# could mitigate by also allowing minimum part length.


def make_word_lists file_name, case_sensitive, max_part_len, min_part_len
  long_words = []
  short_word_lists = []

  (min_part_len .. max_part_len).each { |len| short_word_lists[len] = {} }

  max_len = min_part_len + max_part_len

  File.new(file_name, 'r').each_line do |line|
    word = line.chomp
    len = word.length
    word.downcase! if not case_sensitive
    if len == max_len then long_words << word
    elsif min_part_len <= len and len <= max_part_len
      short_word_lists[len][word] = nil
    end
  end
  return [long_words, short_word_lists]
end


def find_combos long_words, short_word_lists, max_part_len, min_part_len
  hits = []
  max_len = min_part_len + max_part_len
  long_words.each do |word|
    (min_part_len .. max_part_len).each do |len|
      part_1 = word[0 .. len - 1]
      part_2 = word[len .. max_len - 1]
      if short_word_lists[len].has_key? part_1 and
         short_word_lists[max_len-len].has_key? part_2
        hits << "#{part_1} + #{part_2} => #{word}"
      end
    end
  end
  hits
end


file_name = 'conflict_words.txt'
max_len = 6
max_part_len = 5
min_part_len = 1
case_sensitive = false

# I'll skip all the overhead of getting and parsing the above as args....

start_time = Time.now
long_words, short_word_lists = make_word_lists file_name, case_sensitive,
                                               max_part_len, min_part_len

hits = find_combos long_words, short_word_lists, max_part_len, min_part_len
puts "hits:\n#{hits.join "\n"}\n#{hits.length} hits"
puts "Running time: #{Time.now - start_time} seconds"

Okay, now I could use a bit of help from the assorted Rubyists out there. Where do you see places I could install hooks on which people can hang code blocks? What do you think would ease subclassing to change the behavior? Or do you think that's the wrong approach entirely, that I should separate the workings out, as in the Strategy or Visitor patterns? I've got too much more important stuff to do today, to give this the thought it deserves right now.

Sunday, August 15, 2010

Code Kata Eight: Conflicting Objectives, Part 2 (Make It Fast)

As you may recall from last time, kata number eight has three parts: write it to be readable, write it to be fast, and write it to be extensible. I covered readability last time, so now let's go for speed. (Cue The Distance, by Cake.)

#! /usr/bin/ruby

# Dave Thomas Code Kata #8: Conflicting Objectives
# See http://codekata.pragprog.com/2007/01/kata_eight_conf.html
# Solution by Dave Aronson
# version 2: make it fast

# note that it's no longer in functions, to save the call overhead

# unroll the inner loop again, for speed
one_letter_words = {}
two_letter_words = {}
three_letter_words = {}
four_letter_words = {}
five_letter_words = {}

# this one is still fine as an array, since we just chug through it.
# no need for the extra functionality of a hash, nor the overhead.
six_letter_words = []

start_time = Time.now

File.new(ARGV[0], 'r').each_line do |line|
  word = line.chomp
  case word.length
    # TODO: check if there's a faster way than this... maybe a trie?
    # we don't need the value, just whether the key is there.
    when 1: one_letter_words  [word] = 0
    when 2: two_letter_words  [word] = 0
    when 3: three_letter_words[word] = 0
    when 4: four_letter_words [word] = 0
    when 5: five_letter_words [word] = 0
    when 6: six_letter_words << word
  end
end

# accumulate results, rather than reporting them as we find them;
# also use this to track # of hits
results = []

six_letter_words.each do |word|

  part_1 = word[0 .. 0]
  part_2 = word[1 .. 5]
  # do longer part first to make dropout most likely
  if five_letter_words.has_key? part_2 and one_letter_words.has_key? part_1
    results << "#{part_1} + #{part_2} => #{word}\n"
  end

  part_1 = word[0 .. 1]
  part_2 = word[2 .. 5]
  # do longer part first to make dropout most likely
  if four_letter_words.has_key? part_2 and two_letter_words.has_key? part_1
    results << "#{part_1} + #{part_2} => #{word}\n"
  end

  part_1 = word[0 .. 2]
  part_2 = word[3 .. 5]
  if three_letter_words.has_key? part_1 and three_letter_words.has_key? part_2
    results << "#{part_1} + #{part_2} => #{word}\n"
  end

  part_1 = word[0 .. 3]
  part_2 = word[4 .. 5]
  if four_letter_words.has_key? part_1 and two_letter_words.has_key? part_2
    results << "#{part_1} + #{part_2} => #{word}\n"
  end

  part_1 = word[0 .. 4]
  part_2 = word[5 .. 5]
  if five_letter_words.has_key? part_1 and one_letter_words.has_key? part_2
    results << "#{part_1} + #{part_2} => #{word}\n"
  end

end

puts results
puts "#{results.length} hits"

puts "Running time: #{Time.now - start_time} seconds"

This version is about 1/3 faster than the "readable to Ruby programmers" version. (And of course hugely faster than the "readable to non-Rubyists" version, with arrays!) I also surmised that perhaps the "case word.length" part might be inefficient, and used again an array of hashes... but apparently the indexing was sufficiently slow that it was in fact worse than before!

Now it's your turn again. Despite the fact that I'm doing these in Ruby, I don't claim to be a Ruby guru, so there may have been some cute Ruby tricks I missed, that could speed this up. Maybe even some algorithm tweaks. What would you do differently?

Saturday, August 7, 2010

Code Kata Eight: Conflicting Objectives, Part 1

Kata number seven is a non-coding kata, so let's move on to kata number eight. This one is about conflicting objectives. To quote the original:

Write the program three times.

The first time, make program as readable as you can make it.

The second time, optimize the program to run fast fast as you can make it.

The third time, write as extendible a program as you can.

Even that, though, leaves a lot of room for interpretation. For instance, "readable" to whom? I decided to divide this into two audiences. The first is people entirely new to the language (in this case, Ruby) and possibly to programming in general. Ruby has quite an advantage here, being so expressive and easy to pick up. The second is people who do know the language, including common idioms, and presumably programming in general.

Since the two are so radically different, at least the way I did them, I'll give you them both at once. First the one for complete newbies:

#! /usr/bin/ruby

# Dave Thomas Code Kata #8: Conflicting Objectives
# See http://codekata.pragprog.com/2007/01/kata_eight_conf.html
# Solution by Dave Aronson
# version 1: make it readable
# part A: make it readable to a ruby newbie, possibly non-programmer


def make_word_lists file_name

  word_lists = []

  for len in 1 .. 6 do
    word_lists[len] = []
  end

  puts "Reading #{file_name}..."
  dict = File.new file_name, 'r'
  for line in dict do
    word = line.chomp
    if    word.length == 1 then word_lists[1] << word
    elsif word.length == 2 then word_lists[2] << word
    elsif word.length == 3 then word_lists[3] << word
    elsif word.length == 4 then word_lists[4] << word
    elsif word.length == 5 then word_lists[5] << word
    elsif word.length == 6 then word_lists[6] << word
    end
  end
  dict.close

  return word_lists

end



def find_combinations word_lists

  puts 'Finding combinations....'
  
  combinations = 0
  
  for word in word_lists[6] do
  
    part_1 = word[0 .. 0]
    part_2 = word[1 .. 5]
    if word_lists[1].include? part_1 and word_lists[5].include? part_2
      puts "#{part_1} + #{part_2} = #{word}"
      combinations = combinations + 1
    end
  
    part_1 = word[0 .. 1]
    part_2 = word[2 .. 5]
    if word_lists[2].include? part_1 and word_lists[4].include? part_2
      puts "#{part_1} + #{part_2} = #{word}"
      combinations = combinations + 1
    end
  
    part_1 = word[0 .. 2]
    part_2 = word[3 .. 5]
    if word_lists[3].include? part_1 and word_lists[3].include? part_2
      puts "#{part_1} + #{part_2} = #{word}"
      combinations = combinations + 1
    end
  
    part_1 = word[0 .. 3]
    part_2 = word[4 .. 5]
    if word_lists[4].include? part_1 and word_lists[2].include? part_2
      puts "#{part_1} + #{part_2} = #{word}"
      combinations = combinations + 1
    end
  
    part_1 = word[0 .. 4]
    part_2 = word[5 .. 5]
    if word_lists[5].include? part_1 and word_lists[1].include? part_2
      puts "#{part_1} + #{part_2} = #{word}"
      combinations = combinations + 1
    end
  
  end
  
  return combinations

end


start_time = Time.now
word_lists = make_word_lists ARGV[0]
combinations = find_combinations word_lists
puts "We found a total of #{combinations} combinations"

end_time = Time.now
run_time = end_time - start_time

puts "Running time: #{run_time} seconds"

Yes, it's incredibly slow, due to use of simple arrays rather than hashes, since Ruby-newbies, or even more so, programming-newbies, would not be familiar with hashes or other such semi-advanced data structures. I could make it faster by using a binary search function, but that would also have to be in there and count against the readability! Besides, speed is for Part 2. And now the one for Ruby programmers (who are presumably familiar with hashes):

#! /usr/bin/ruby

# Dave Thomas Code Kata #8: Conflicting Objectives
# See http://codekata.pragprog.com/2007/01/kata_eight_conf.html
# Solution by Dave Aronson
# version 1: make it readable
# part B: make it readable to someone who knows Ruby


def make_word_lists file_name

  word_lists = []
  
  for len in 1 .. 6 do word_lists[len] = {} end
  
  dict = File.open file_name, 'r'
  dict.each_line do |line|
    word = line.chomp
    len = word.length
    word_lists[len][word] = nil if len <= 6
  end
  dict.close

  word_lists
end
  

def find_combinations word_lists
  combinations = 0
  for word in word_lists[6].keys.sort do
    for first_part_len in 1 .. 5 do
      part_1 = word[0 .. first_part_len - 1]
      part_2 = word[first_part_len .. 5]
      if word_lists[first_part_len].include? part_1 and
         word_lists[6-first_part_len].include? part_2
        puts "#{part_1} + #{part_2} = #{word}"
        combinations += 1
      end
    end
  end
  combinations
end


start_time = Time.now
word_lists = make_word_lists ARGV[0]
combinations = find_combinations word_lists
puts "We found a total of #{combinations} combinations"
puts "Running time: #{Time.now - start_time} seconds"

So, dear readers, let's hear your input. Did you find any of that poorly readable? What do you think would have been more readable? Should I have coded up a binary search, to make Part A tolerably fast? Or going the other way, do you think I went overboard? (Especially in the horribly verbose Part A, with the inner loop unrolled?)

Sunday, August 1, 2010

Anna Graham?

No, kata number six is about anagrams, not some lady named Anna Graham. Follow the link for more details, but briefly the objective is to load in a bunch of words, say which ones were anagrams of which, what the longest anagrammed word is, and what the largest set of anagrams (i.e., words using the same letters) is. My solution hinges around the load_words method. The implementation, in Ruby, complete with tests of course, is:

#! /usr/bin/ruby

# Dave Thomas Code Kata #6: Anagrams
# See http://codekata.pragprog.com/2007/01/kata_six_anagra.html
# for problem -- but not data, he's taken down the word list.
# Solution by Dave Aronson.
# Note, some of the things that are recalculated upon demand,
# could have been figured out up front once and for all and cached,
# and vice-versa.  Meant to show mix of options.  Real-world choice
# would depend on intended use.


class Anagram_Detector

  def initialize
    @sets = {}
    @longest_word = ''
    @longest_set = []
  end

  attr :sets
  attr :longest_set
  attr :longest_word

  def bad_sets
    @sets.reject { |key, val| val.length > 1 }
  end

  def canonicalize word
    word.split(//).sort.join
  end

  # TODO MAYBE: let user specify whether to
  # show anagrams of longest anagrammed word
  def dump
    puts 'Anagrams:'
    good_sets.each_pair { |key, val| puts "  #{key}: #{val.join ' '}" }
    puts "#{num_words} words in #{num_sets} sets"
    # note that the longest_word will always be the *first* of its set,
    # *in order of insertion*; whether that's alphabetical depends on use.
    # also it may not be the first one of its size, but the first one
    # with a second arrangement (i.e., an anagram) inserted.
    # further words of that same size will not replace it.
    puts "Longest word is #{@longest_word} (and its anagrams)"
    puts "Longest set is #{@longest_set.join ' '}"
    puts 'Others:'
    bad_sets.each_pair { |key, val| puts "  #{key}: #{val}" }
  end

  def good_sets
    @sets.reject { |key, val| val.length == 1 }
  end

  # TODO MAYBE: let user specify options, such as:
  # whether to fold case
  # whether to dump dupes
  # how to handle spaces
  def load_words words
    words.each do |word|
      letters = canonicalize word
      # could alternately use "sets.key? letters" but
      # we need to retrieve what's there (if any) anyway.
      so_far = @sets[letters]
      if so_far == nil
        so_far = []
      # only bother checking length upon finding 1st anagram
      elsif so_far.length == 1 and word.length > @longest_word.length:
        @longest_word = so_far[0] 
      end
      so_far << word
      @longest_set = so_far if so_far.length > @longest_set.length
      @sets[letters] = so_far
    end
  end

  def num_sets
    good_sets.length
  end

  def num_words
    good_sets.inject(0) { |count, set| count += set[1].length }
  end

end



# TESTS

Words = [ 'foo', 'bar', 'baz', 'clang', 'honk', 'tweet',
          'post', 'pots', 'spot', 'stop', 'tops',
          'dog', 'god', 'waster', 'waters', 'rawest' ]

require 'test/unit'

class TC_MyTest < Test::Unit::TestCase

  # WARNING: done by hand, so possibly wrong;
  # must be correct for tests to work correctly!
  # Should be the result for Words, below.
  Anagrammed_By_Hand = {
    'abr'      => [ 'bar' ],
    'abz'      => [ 'baz' ],
    'acgln'    => [ 'clang' ],
    'dgo'      => [ 'dog', 'god' ],
    'eettw'    => [ 'tweet' ],
    'foo'      => [ 'foo' ],
    'hkno'     => [ 'honk' ],
    'opst'     => [ 'post', 'pots', 'spot', 'stop', 'tops' ],
    'aerstw'   => [ 'rawest', 'waster', 'waters' ]
  }

  def setup
    @detector = Anagram_Detector.new
    @detector.load_words Words
  end

  def test_number_of_sets_match
    # beware magic numbers in tests... should find a way around this....
    assert_equal 3, @detector.num_sets
  end

  def test_number_of_words_match
    # beware magic numbers in tests... should find a way around this....
    assert_equal 10, @detector.num_words
  end

  # sort results just in case things got switched around;
  # it's set membership that's important, not order

  def test_keys_match
    assert_equal Anagrammed_By_Hand.keys.sort,
                 @detector.sets.keys.sort
  end

  def test_results_match
    @detector.sets.each_key do |key|
      assert_equal Anagrammed_By_Hand[key].sort,
                   @detector.sets[key].sort
    end
  end

  def test_got_right_longest_set
    assert_equal Anagrammed_By_Hand['opst'].sort,
                 @detector.longest_set.sort
  end

  def test_got_right_longest_word
    assert Anagrammed_By_Hand['aerstw'].include? @detector.longest_word
  end

  # TODO MAYBE: test erroneous conditions

end

detector = Anagram_Detector.new
detector.load_words Words
puts "Dump:"
detector.dump
puts "\nTests:"
# Test::Unit will run the tests AFTER this

Another thing a bit unusual about this one is that I tried to make more use of Ruby's functional-style methods, like inject and reject. (Sorry, Alice's Restaurant fans, I used inspect during informal testing, and select in a previous version, but couldn't find a decent excuse to use detect, and AFAIK there are no common Ruby classes with methods infect or neglect.) Having "grown up" mostly with C, the imperative/procedural style is most intuitive to me. However, IMNSHO, one should always strive to at least consider, or better yet understand, alternatives to The Way We've Always Done It -- and maybe even think up more.

So now, as always, it's your turn. Comments? Questions? Concerns? Character assassinations? Compliments even?

Saturday, July 24, 2010

Code Kata Five (Bloom Filter) Solution

Kata number five is about Bloom Filters. No, that isn't anything to do with flowers, though of course it could be used for that purpose. Dave Thomas' description will quite suffice. If you're not familiar with them, go read it now. I'll wait.

Oh, you're back. Good. Now that you have a good understanding of what they do and how, here's my implementation. It uses each pair of bytes out of an MD5 digest, as an individual hash. (For the security geeks out there, yes I know MD5 has been declared too weak to rely on for security, but this is just a simple demo.)

#! /usr/bin/ruby

# Dave Thomas Code Kata #5: Bloom Filter
# See http://codekata.pragprog.com/2007/01/kata_five_bloom.html
# for problem -- but not data, he's taken down the word list.
# Solution by Dave Aronson


require 'digest/md5'


class Bloom_Filter

  FixnumSize = 32

  def initialize
    # WARNING WILL ROBINSON, DANGER, DANGER!
    # There's probably some way to tell Ruby "give me this size chunk
    # of RAM" but I haven't stumbled across it yet.  We need 2^16 bits
    # cuz we're going to take the hash 2 bytes at a time; divide by
    # FixnumSize cuz that's the storage unit for our bits.
    @bytes = Array.new((2 ** 16) / FixnumSize, 0)
  end

  def add_datum datum
    hash = Digest::MD5.hexdigest datum
    # could also convert hash to arbitrary-size integer en masse
    # and then take byte pairs as binary... this way seemed easier....
    (hash.length / 2).times { |pair| set_bit hash[pair*2 .. pair*2+1].hex }
  end

  def check_bit bit_num
    (@bytes[bit_num / FixnumSize] & (1 << (bit_num % FixnumSize))) != 0
  end

  def check_datum datum
    hash = Digest::MD5.hexdigest datum
    (hash.length / 2).times do |subhash|
      return false if ! check_bit hash[subhash * 2 .. subhash * 2 + 1].hex
    end
    true
  end

  def load_data data
    data.each { |datum| add_datum datum }
  end

  def set_bit bit_num
    @bytes[bit_num / FixnumSize] |= (1 << bit_num % FixnumSize)
  end

end


# TESTS

require 'test/unit'

class TC_MyTest < Test::Unit::TestCase

  def setup
    @words = [ 'zero', 'one', 'two', 'three', 'four', 'five',
                       'six', 'seven', 'eight', 'nine', 'ten' ]
    @filter = Bloom_Filter.new
    @filter.load_data @words
  end

  def helper lang_words
    lang_words.each do |word|
      assert_equal @words.member?(word), @filter.check_datum(word),
                   "error: checking #{word} failed!"
    end
  end

  def test_finding_same_words
    helper @words
  end

  def test_find_commonality_with_english
    helper ['zero', 'one',    'two',    'three', 'four',   'five',
                    'six',    'seven',  'eight', 'nine',   'ten',]
  end

  # given how the helper works, this is really meaningless,
  # included just for illustration
  def test_finding_nothing
    helper []
  end

  def test_finding_empty
    helper ['']
  end

  def test_finding_space
    helper [' ']
  end

  def test_find_commonality_with_french
    helper ['zero', 'un',     'deux',   'trois', 'quatre', 'cinq',
                    'six',    'sept',   'huite', 'neuf',   'dix']
  end

  def test_find_commonality_with_spanish
    helper ['zero', 'uno',   'dos',    'tres',  'cuatro', 'cinco',
                    'seis',  'siete',  'ocho',  'neuve',  'dies']
  end

  def test_find_commonality_with_german
    helper ['zero', 'eins',   'zwei',   'drei',  'fier',   'funf',
                    'sex',    'sieben', 'ocht',  'neun',   'zehn']
  end

  def test_find_commonality_with_japanese
    helper ['zero', 'ichi',   'ni',     'san',   'yon',    'go',
                    'ryoku',  'shichi', 'hachi', 'kyu',    'ju']
  end

  def test_find_commonality_with_multiple
    helper ['onetwo', 'one two three']
  end

  def test_find_commonality_with_nonsense
    helper ['ooee', 'ooahah', 'tingtang', 'wallawallabingbang',
            'flibbertygibbet', 'asdfasdf', 'qwertyuiop', '@#$%^&*!']
  end

end

So now, as usual, it's the Peanut Gallery's turn. What would you do differently, and why? Would you use a completely different algorithm? A different way to allocate the memory, and set and check the bits? Would you test it differently? Or is it (ha!) perfect?

Saturday, July 17, 2010

Code Kata Four (Data Munging) Solution

   Kata number three isn't a coding exercise, so let's move on to number four: find the minimum temperature spread in a file of weather data, find the minumum win-loss spread in a file of soccer league results... and then refactor the two together, to remove duplication.

   This is an example of the Agile principle of DRY: Don't Repeat Yourself. If you have the same objective being done in multiple places, there are multiple places where code needs to be changed if a bug is found or the objective is changed. Likewise, if you suspect from the symptoms, that some particular algorithm is being done wrong, you know exactly where to change it if your code is dry, i.e., it adheres to the DRY principle. If it isn't... then you're all wet. (Badum-pum.)

   Actually, it's not so much an Agile principle, as a good basic software development principle that the Agile movement has chosen to emphasize. Somewhat like how so much of so-called Object Oriented programming is really just good old encapsulation, which was a well-known and commonly-taught software development principle long before OOP got so popular. Just about the only thing OOP added, other than convenient shorthand for such encapsulation, was polymorphism. But don't get me started -- I'm a self-starter. ;-)

   Anyway, here's my Ruby solution to #4, including of course tests, as Bryan "TATFT" Liles will be happy to see. (What's TATFT stand for? "Test All The Time". At least on a G-rated blog it does....)

#! /usr/bin/ruby

# Dave Thomas Code Kata 4, Data Munging.
# See http://codekata.pragprog.com/2007/01/kata_four_data_.html
# for purpose, and data files.
# Solution by Dave Aronson.

# Note: Could do these with File.readlines(file).each, but then
# I'd have to keep track of whether we'd read the header yet.



# Part 1: Weather Data

def min_weather_spread
  file = File.open 'weather.dat', 'r'
  first = file.gets.split[0] until first == 'Dy'
  min_spread = 9999  # unrealistically high value for this application
  while not file.eof?
    parts = file.gets.split
    next if parts.length == 0
    day = parts[0].to_i
    # to_i returns 0 on non-numeric; no exceptions thrown;
    # maybe there should be a String#is_numeric? method....
    break if day == 0
    spread = (parts[1].to_i - parts[2].to_i).abs
    if spread < min_spread
      min_spread = spread
      min_day = day
    end
  end
  puts "Min spread is #{min_spread}, on day #{min_day}"
  min_spread
end


# Part 2: Soccer League Table
# Soccer, football, whatever.  Most of the people reading this will be
# my fellow 'Murrikens, so I'll call it what we call it.

def min_soccer_spread
  file = File.open 'football.dat', 'r'
  first = file.gets.split[0] until first == 'Team'
  min_spread = 9999  # unrealistically high value for this application
  while not file.eof?
    parts = file.gets.split
    # data format is more consistent than weather one, so we can do this;
    # just be aware that in other applications, # of line parts may stay same.
    next if parts.length != 10
    spread = (parts[6].to_i - parts[8].to_i).abs
    if spread < min_spread
      min_spread = spread
      min_team = parts[1]
    end
  end
  puts "Min spread is #{min_spread}, for #{min_team}"
  min_spread
end



# Part 3: DRY Fusion.  Dry up the weather, and dry out the drunken football
# hooligans, er, I mean, soccer fans.  ;-)
#
# Note that the 9999 init, and skipping any where the first column evaluates to
# zero, works fine in THESE specific two cases, but may be problematic in other
# applications.  So why didn't I go ahead and make it fully generalized, such
# as by using the maximum supported number, or a parameter?  Another agile
# acronym: YAGNI.  If you don't grok it, Google it.

def min_spread filename, first_header, name_col, data_col_1, data_col_2
  file = File.open filename, 'r'
  first = file.gets.split[0] until first == first_header
  min_spread = 9999
  while not file.eof?
    parts = file.gets.split
    next if parts[0].to_i == 0
    spread = (parts[data_col_1].to_i - parts[data_col_2].to_i).abs
    if spread < min_spread
      min_spread = spread
      min_item = parts[name_col]
    end
  end
  # Could also have passed in a descriptor here, like 'day' or 'team'....
  puts "Min spread is #{min_spread}, for #{min_item}"
  min_spread
end



# TESTS


require 'test/unit'

class TC_MyTest < Test::Unit::TestCase

  def test_weather_case
    assert_equal min_spread('weather.dat', 'Dy', 0, 1, 2), min_weather_spread,
                 'Error: DRY equivalent does not work same as weather case'
  end

  def test_soccer_case
    assert_equal min_spread('football.dat', 'Team', 1, 6, 8), min_soccer_spread,
                 'Error: DRY equivalent does not work same as soccer case'

  end

end

So why am I not including the data files? I want you to go read Dave Thomas' CodeKata blog (and his regular blog as well) too. Yes, the first hasn't been updated in years and the second in months, but if you're the kind of person who'd read this blog, you'd want to see that one too, and the comments.

Thursday, July 8, 2010

Code Katas

Some of you may be familiar with the term kata. It comes from martial arts, mainly karate, and means a practice exercise, especially one done over and over to train your "muscle memory".

After enough years programming, you build up similar semi-automatic approaches, like how to organize a program, or thinking imperatively or functionally, or iteratively or recursively. Code katas help you practice such things, with deliberate emphasis on a particular aspect. (Actually, IMHO this makes them more like Toastmasters speech projects, but that's another story.)

There are many sources of lists of code katas out there. Recently, to sharpen my Ruby-fu, I began working through the ones that Dave Thomas, of Pragmatic Programmer fame, put up at http://codekata.pragprog.com/. The first kata does not involve code. So, here is my solution to the second one:

#! /usr/bin/ruby

# Dave Thomas Code Kata #2, Karate Chop
# See http://codekata.pragprog.com/2007/01/kata_two_karate.html
# Solution by Dave Aronson

# TODO MAYBE: see if I can do the helper funcs as lambdas or some such....


# version 1: iterative

def chop1 tgt, arr
  # TODO: check arg types
  # TODO: check for empty array
  hi = arr.length - 1
  lo = 0
  while hi >= lo do
     mid = (hi + lo) / 2
     test = arr[mid]
     if test < tgt: lo = mid + 1
     elsif test > tgt: hi = mid - 1
     else return mid
     end
  end
  -1
end



# version 2: recursive and not functional

def chop2 tgt, arr
  chop2_helper tgt, arr, 0, arr.length - 1
end

def chop2_helper tgt, arr, min, max
  slot = (min + max) / 2
  test = arr[slot]
  if test == tgt: slot
  elsif min == max: -1
  elsif test < tgt: chop2_helper tgt, arr, slot + 1, max
  else chop2_helper tgt, arr, min, slot - 1
  end
end



# version 3: recursive and functional, passing array slice.

def chop3 tgt, arr
  # use another func to stop having to calc middle slot over and over
  chop3_pass_mid tgt, arr, arr.length / 2
end

def chop3_pass_mid tgt, arr, mid
  # use another func to stop having to retrieve middle value over and over
  chop3_pass_test tgt, arr, mid, arr[mid]
end

def chop3_pass_test tgt, arr, mid, test
  if test == tgt: mid
  elsif mid == 0: -1
  elsif test < tgt
    chop3_fix_offset mid + 1, chop3(tgt, arr[mid + 1 .. arr.length])
  else chop3 tgt, arr[0 .. mid - 1]
  end
end

# kind of a kluge; TODO MAYBE: do something cleaner with this problem!
def chop3_fix_offset offset, result
  result == -1 ? -1 : result + offset
end



# version 4: recursive and functional, passing whole array.

def chop4 tgt, arr
  chop4_helper tgt, arr, 0, arr.length - 1
end

def chop4_helper tgt, arr, min, max
  # use another func to not have to calc midpoint over and over
  chop4_pass_midpoint tgt, arr, min, max, (min + max) / 2
end

def chop4_pass_midpoint tgt, arr, min, max, midpt
  # use another func to not have to retrieve middle value over and over
  chop4_pass_midval tgt, arr, min, max, midpt, arr[midpt]
end

def chop4_pass_midval tgt, arr, min, max, midpt, midval
  if midval == tgt: midpt
  elsif min == max: -1
  elsif midval < tgt: chop4_helper tgt, arr, midpt + 1, max
  else chop4_helper tgt, arr, min, midpt - 1
  end
end



# TESTS

require 'test/unit'

class TC_MyTest < Test::Unit::TestCase

  def setup
    @arr = [ 0, 1, 2, 3, 5, 8, 13, 21, 34 ]  # fibonacci, btw
  end

  def test_chops
    [:chop1, :chop2, :chop3, :chop4].each do |func|

      # test that each thing in there, is found
      @arr.length.times do |slot|
        tgt = @arr[slot]
        assert_equal slot, send(func, tgt, @arr), "testing #{func.to_s}(#{tgt})"
      end

      # test that a bunch of things that should NOT be in there, are not found
      [@arr[0]-1, @arr[0]-0.1, 4, 33, @arr.last+0.1, @arr.last+1].each do |tgt|
        assert_equal -1, send(func, tgt, @arr), "testing #{func.to_s}(#{tgt})"
      end

    end

  end

end

Note, I'm not saying that this is the best solution, let alone the One True Solution. Your mileage is practically guaranteed to vary, especially on matters of style.

In fact, I'd like to hear how you would have done it differently -- feedback is how one improves. What would you have done differently, and why? Would it be faster, clearer, more canonically Rubyish, or what? Inquiring minds want to know!

Welcome to Codosaurus!

Greetings, dear readers! This is my brand-spanking-new coding blog, as opposed to my leadership blog over at http://www.dare2xl.com/. Dare2XL is mainly about daring to stand up to The Man for what's right, such as choice of language, "wasting" time on developer-level testing, and so on, including many situations having nothing whatsoever to do with software development. This blog will be about how to choose and use your language, write and run unit and integration tests, and other aspects of software development -- primarily those dealing directly with code, versus, say, requirements definition, or writing documentation. You know, the fun stuff! :-)

Why "codosaurus"? Long story short, let's just say I'm a bit older than most of my current colleagues -- enough so, that I grew up on plain old C, not Java. But I'm not a dinosaur, as this old mammal continues to learn new tricks. I learned C++, Java, Python, Javascript (including AJAX), Perl, Ruby (including Rails), and a lot more, in my spare time. This blog is my way of sharing the overall software engineering experience I've gained over the years -- though often instantiated via specific languages, especially the ones the li'l whippersnappers have probably been exposed to -- or might want to be exposed to.

Now it's your turn. What advice would you seasoned programmers/engineers/developers/whatever, give to today's up-and-comers? What questions would you ask of me? What is the airspeed velocity of an unladen swallow?