hardscrabble

When to use defined? to memoize in Ruby

06 Feb 2017

Here’s a quick Ruby thing.

traditional memoization in Ruby

Let’s say you have an object whose responsibility is to give a haircut to a dog.

(I may have recently been reading about this)

class DogStylist
  def initialize(dog_id)
    @dog_id = dog_id
  end

  def perform
    if dog
      dog.sedate
      dog.groom
      dog.instagram
    end
  end

  private

  def dog
    Dog.find(@dog_id)
  end
end

This is kind of fine, but it has one problem: every reference to dog calls the dog method, and every call to the dog method queries the database. You end up running the same query over and over when you really only need to run it once.

Better to write it like this:

def dog
  @dog ||= Dog.find(@dog_id)
end

Here you’re still calling the dog method over and over, but now it’s “memoizing” the result of the database query.
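If you'd like to see that for yourself, here is a minimal sketch. The Dog class below is a stand-in I made up for the example (it is not a real database model), and its find method just counts how many times it gets called.

```ruby
# Stand-in for a database-backed model. `find` only counts its calls.
class Dog
  @queries = 0

  class << self
    attr_reader :queries

    def find(id)
      @queries += 1 # pretend this is the expensive database query
      new
    end
  end
end

class DogStylist
  def initialize(dog_id)
    @dog_id = dog_id
  end

  def dog
    @dog ||= Dog.find(@dog_id)
  end
end

stylist = DogStylist.new(1)
3.times { stylist.dog }
puts Dog.queries # prints 1
```

The dog method ran three times, but the fake query ran only once.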

But what does that mean?

Here’s a more verbose version of the dog method that does the same thing:

def dog
  @dog = @dog || Dog.find(@dog_id)
end

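You can see that ||= is a syntactic shorthand similar to +=. (Strictly speaking, Ruby treats @dog ||= x more like @dog || @dog = x, which skips the assignment when @dog is already truthy, but the outcome here is the same.)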

In case you’re unfamiliar with +=, here’s an example. These two statements are equivalent:

count = count + 1
count += 1
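And here is a small sketch of ||= itself. The default_name method is hypothetical and exists only for this illustration. Note that false gets replaced just like nil does.

```ruby
def default_name(name)
  name ||= "Rex" # assigns only when name is nil or false
  name
end

puts default_name(nil)    # prints Rex
puts default_name("Fido") # prints Fido
puts default_name(false)  # prints Rex
```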

Here’s an even more verbose version of the dog method that does the same thing:

def dog
  if @dog
    @dog
  else
    @dog = Dog.find(@dog_id)
  end
end

The goal here is to avoid evaluating the database query more than once. The first time the method is called, the @dog instance variable is not defined. In Ruby, it’s safe to reference an instance variable that isn’t defined. It will return nil. And nil is falsey, so the database query will be evaluated, and its result assigned to the instance variable.
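A tiny sketch of that nil behavior, using a made-up Kennel class:

```ruby
class Kennel
  def peek
    @dog # never assigned anywhere, so this evaluates to nil
  end
end

p Kennel.new.peek # prints nil
```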

This is where things get interesting.

Ponder this question: does this memoization strategy guarantee that the database query will only be executed once, no matter how many times the dog method is called?

It doesn’t.

Why????

I’ll tell you.

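What if there is no dog with that ID? Let’s assume Dog.find(4000) returns either a dog or nil. (If Dog is an ActiveRecord model, find actually raises ActiveRecord::RecordNotFound; find_by(id: 4000) is the one that returns nil.) And, like we said earlier, nil is falsey. So hypothetically, if our perform method looked like this: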

def perform
  dog
  dog
  dog
  dog
  dog
end

… then we would execute the database query five times, even though we made an effort to prevent that.

This is actually totally fine, because our perform method isn’t written like that (again, that was just a hypothetical). Our perform method only calls the dog method more than once if its result is truthy, so there’s no problem here.
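Here is a sketch of that hypothetical, in case you want to watch it happen. The Dog stub is an assumption of mine: its find never finds anything, always returns nil, and counts its calls.

```ruby
# Stand-in model whose `find` never finds a dog.
class Dog
  @queries = 0

  class << self
    attr_reader :queries

    def find(id)
      @queries += 1
      nil # assumption: a missing dog comes back as nil
    end
  end
end

class DogStylist
  def initialize(dog_id)
    @dog_id = dog_id
  end

  def dog
    @dog ||= Dog.find(@dog_id) # nil is falsey, so this re-runs every time
  end
end

stylist = DogStylist.new(4000)
5.times { stylist.dog }
puts Dog.queries # prints 5
```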

memoization using defined?

Let’s consider another example, where things aren’t as hunky-dory. Hold please while I contrive one.

OK, I’ve got it.

Let’s say we only want to groom a dog when he or she is unkempt. When she logs into our web site, we want to pepper some subtle calls to action throughout the page encouraging her to book an appointment. We’ll need a method to check if she is unkempt, and we’re going to call it a few times. It looks like this:

class Dog
  HAIRS_THRESHOLD = 3_000_000

  def unkempt?
    Hair.count_for(self) > HAIRS_THRESHOLD
  end
end

That’s right: we’ve got a table in our database for all of the hairs on all of our dogs.

You can imagine this unkempt? method might be kind of “expensive”, which is to say “slow”.

Let’s try adding some memoization to this method:

def unkempt?
  @unkempt ||= Hair.count_for(self) > HAIRS_THRESHOLD
end

Here our goal is to prevent doing the expensive database query (Hair.count_for(self)) more than once.

Ponder this question: does our memoization strategy accomplish this goal?

Answer: it does not.

What?????

I know. Let me show you.

You can try running this Ruby script yourself:

$count = 0
class Hair
  def self.count_for(dog)
    $count += 1
    puts "called #{$count} times"
    2_000_000
  end
end

class Dog
  HAIRS_THRESHOLD = 3_000_000

  def unkempt?
    @unkempt ||= Hair.count_for(self) > HAIRS_THRESHOLD
  end
end

dog = Dog.new
puts "Is the dog unkempt? #{dog.unkempt?}"
puts "Is the dog unkempt? #{dog.unkempt?}"

It outputs the following:

called 1 times
Is the dog unkempt? false
called 2 times
Is the dog unkempt? false

In this script, I have a fake implementation of the Hair class. It’s meant to demonstrate that the count_for method is being called more than once, even though we specifically tried to prevent that.

So what’s going on here?

Well, in a way, everything is working as it’s supposed to. The first time we call the unkempt? method, the @unkempt instance variable is not defined, which means it returns nil, which is falsey. When the instance variable is falsey, we evaluate the expression and assign its result, false, to the instance variable. The second time we call the unkempt? method, the @unkempt instance variable is defined, but its value is now false, which is also falsey (which you have to admit is only fair). So, again, because the instance variable is falsey, we evaluate the expression and assign its result, false, to the instance variable.

Shoot – that kind of makes sense.

So what to do? Here’s another way to write this:

def unkempt?
  if defined?(@unkempt)
    @unkempt
  else
    @unkempt = Hair.count_for(self) > HAIRS_THRESHOLD
  end
end

This approach uses Ruby’s built-in defined? keyword to check whether the instance variable is defined at all, rather than whether its value is truthy. This is more resilient to the possibility that your value may be falsey (false or nil).
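To check that this really fixes it, here is a variation on the earlier script. The Hair stub is still fake, and this time it keeps its call count on the class instead of in a global variable.

```ruby
# Fake Hair class that counts how often count_for is called.
class Hair
  @calls = 0

  class << self
    attr_reader :calls

    def count_for(dog)
      @calls += 1
      2_000_000
    end
  end
end

class Dog
  HAIRS_THRESHOLD = 3_000_000

  def unkempt?
    if defined?(@unkempt)
      @unkempt # already computed, even if the answer was false
    else
      @unkempt = Hair.count_for(self) > HAIRS_THRESHOLD
    end
  end
end

dog = Dog.new
2.times { puts "Is the dog unkempt? #{dog.unkempt?}" }
puts "count_for was called #{Hair.calls} time(s)"
```

Both calls print false, and count_for reports being called 1 time(s).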

I wish there were a more succinct way to write this, because I think it’s generally how you actually want your code to behave when you use ||=.
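One slightly shorter spelling, offered as a sketch rather than a recommendation, is to use defined? as a guard clause with an early return. The Hair stub is the same kind of fake as before.

```ruby
# Fake Hair class that counts how often count_for is called.
class Hair
  @calls = 0

  class << self
    attr_reader :calls

    def count_for(dog)
      @calls += 1
      2_000_000
    end
  end
end

class Dog
  HAIRS_THRESHOLD = 3_000_000

  def unkempt?
    # Return the stored answer if we have one, even when it is false.
    return @unkempt if defined?(@unkempt)

    @unkempt = Hair.count_for(self) > HAIRS_THRESHOLD
  end
end

dog = Dog.new
dog.unkempt?
dog.unkempt?
puts Hair.calls # prints 1
```

It behaves the same as the if/else version. It just trades two lines for an early return.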

To be fair, you can avoid defined? and instead write this method like this:

def unkempt?
  @hair_count ||= Hair.count_for(self)
  @hair_count > HAIRS_THRESHOLD
end

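This works because the memoized value is now the count itself, which is always an integer, and every integer (even 0) is truthy in Ruby. Beyond that, it’s really just a matter of taste if you prefer one over the other.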
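Here is a sketch of that version with a deliberately awkward case: a fake Hair class (my invention for the example) that reports zero hairs. In Ruby 0 is truthy, so ||= still holds on to it.

```ruby
# Fake Hair class for a completely bald dog.
class Hair
  @calls = 0

  class << self
    attr_reader :calls

    def count_for(dog)
      @calls += 1
      0
    end
  end
end

class Dog
  HAIRS_THRESHOLD = 3_000_000

  def unkempt?
    @hair_count ||= Hair.count_for(self) # 0 is truthy, so this sticks
    @hair_count > HAIRS_THRESHOLD
  end
end

dog = Dog.new
2.times { puts "Is the dog unkempt? #{dog.unkempt?}" }
puts "count_for was called #{Hair.calls} time(s)"
```

Both calls print false, and count_for reports being called 1 time(s).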

Alright, take care.