
Let’s talk about Memoization

Memoization[1] is a common pattern in Ruby, so common that Ruby has a built-in operator for it. We have all seen or used something like these. (If this is new to you, scroll to the end. There’s some code that shows how this works.)

# 1. simple 
foo ||= 42

# 2. multi-line version
foo ||= begin
  ...
end

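# 3. more robust version
# (use an instance variable: with a local, `foo = 42 unless defined? foo`
# never assigns, because the parser has already declared foo)
@foo = 42 unless defined?(@foo)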

The memoization operator (or the "or equals" operator) assigns the value on the right to the name on the left only if the name currently holds nil or false. That is also its weakness: if your expensive operation legitimately returns nil (or false), ||= stores that value but runs the operation again on every call. Version 3 is more robust because it checks whether the variable has been assigned at all, not what it holds.
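As a rough illustration of that caveat, here is a small sketch. The `Settings` class, its method names and the call counter are all made up for this example. It shows that `||=` re-runs the right-hand side when the stored value is false, while a `defined?` guard on an instance variable runs it once.

```ruby
# Hypothetical example: count how often the "expensive" work runs.
class Settings
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stand-in for an expensive operation that legitimately returns false.
  def expensive_lookup
    @calls += 1
    false
  end

  # ||= treats false (and nil) as "not set yet", so this recomputes every time.
  def flag
    @flag ||= expensive_lookup
  end

  # defined? only checks whether the variable has been assigned at all.
  def safe_flag
    return @safe_flag if defined?(@safe_flag)

    @safe_flag = expensive_lookup
  end
end

with_or_equals = Settings.new
3.times { with_or_equals.flag }
puts with_or_equals.calls # 3

with_defined = Settings.new
3.times { with_defined.safe_flag }
puts with_defined.calls # 1
```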

Memoization is most commonly used in accessor methods like the one below:

def recent_orders
  @recent_orders ||= Order.where(updated_at: (Date.today - ONE_WEEK)..)
end

I often find a need for memoization in reports or complex views with dense information. You might have an object that is accessed in many partials so you can save a few cycles by caching the value.

These methods for memoization will get you what you need in 99% of cases but there’s one gotcha that I have seen folks fall victim to quite often. What happens if recent_orders takes an argument?

def recent_orders(date)
  @recent_orders ||= Order.where(updated_at: date..)
end

The problem here is that once the value has been evaluated the first time, you are stuck with it for every subsequent call, whatever date you pass in. This is probably not what you want. You need a way to store the result of your expensive operation for each argument value.

Here’s a tiny class that solves that issue.

class Memo
  def initialize
    @mutex = Mutex.new
    @values = {}
  end

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    @mutex.synchronize do
      if @values.key?(key) 
        @values[key]
      else
        @values[key] = yield
      end
    end
  end
end

You use this class like so:

recent_orders_cache = Memo.new
recent_orders_cache.memoize(last_queried_at) { Order.where(updated_at: last_queried_at..) }

When this code is called the first time, the @values Hash won’t have the key so the block will be called and the result of the block will be saved in the @values hash with the argument as the key. Subsequent calls with the same argument will return the cached value skipping the block. Calls with different arguments will execute the block and save the value. Using a block in this way is a nice Rubyish way of doing things and feels quite natural IMO.
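To make that flow concrete, here is a self-contained sketch that repeats the `Memo` class from above and swaps the database query for a made-up `slow_lookup` lambda with a call counter, so it can be pasted into IRB without Rails. The lambda, the counter and the keys are assumptions for illustration only.

```ruby
# Memo is copied from the post; slow_lookup and lookup_calls are hypothetical stand-ins.
class Memo
  def initialize
    @mutex = Mutex.new
    @values = {}
  end

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    @mutex.synchronize do
      if @values.key?(key)
        @values[key]
      else
        @values[key] = yield
      end
    end
  end
end

lookup_calls = 0
slow_lookup = lambda do |days|
  lookup_calls += 1
  "orders from the last #{days} days"
end

cache = Memo.new
first  = cache.memoize(7)  { slow_lookup.call(7) }  # key missing, block runs
second = cache.memoize(7)  { slow_lookup.call(7) }  # key present, block skipped
third  = cache.memoize(30) { slow_lookup.call(30) } # new key, block runs
cache.memoize(:maybe) { nil }                       # a nil result is cached too

puts first
puts third
puts lookup_calls # 2
```

Because the class checks `key?` instead of the stored value, the nil stored under `:maybe` is returned on later calls without running the block again.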

You can add a method to clear the cache if you want. I’ve never found the need though.

# Clear specific or all memoized values
def clear(key = nil)
  @mutex.synchronize do
    if key
      @values.delete(key)
    else
      @values.clear
    end
  end
end

Do we really need a Mutex? That’s a good question. In theory the value is only ever going to be set once, and if two threads try to set it at the same time they will most likely compute the same value, so does it matter? Maybe you will have one thread reading while another thread is clearing the value? I think it’s better to be safe. Incidentally, the built-in memoization operator might not be thread safe[2].
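One way to see what the Mutex buys you is the sketch below. It is a made-up experiment, not a benchmark: ten threads ask for the same key while the block sleeps briefly, and a counter (`runs`, hypothetical) records how many times the block actually ran. With the lock around the check-and-set, the block should run once. Without the lock, several threads could all see a missing key and each run the block.

```ruby
# Memo is copied from the post; the threads, the counter and the key are illustrative.
class Memo
  def initialize
    @mutex = Mutex.new
    @values = {}
  end

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    @mutex.synchronize do
      if @values.key?(key)
        @values[key]
      else
        @values[key] = yield
      end
    end
  end
end

runs = 0
memo = Memo.new

threads = 10.times.map do
  Thread.new do
    memo.memoize(:report) do
      runs += 1   # safe here: the block only runs while the Mutex is held
      sleep 0.01  # pretend to do slow work
      :report_data
    end
  end
end

results = threads.map(&:value) # value joins each thread and returns its result
puts runs                 # 1
puts results.uniq.inspect # [:report_data]
```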

Multiple parameters

What do you do if your function takes more than one argument? Luckily, Ruby Hash keys can be almost anything, so passing in an array as your key is perfectly fine:

recent_orders_cache.memoize([user_id, last_queried_at]) { User.find(user_id).orders.where(updated_at: last_queried_at..) }
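Array keys work because Ruby hashes compare keys with `eql?` and `hash`, and two arrays with equal elements agree on both. Here is a small sketch using a plain Hash; the ids, dates and symbols are made up.

```ruby
require "date"

cache = {}
cache[[42, Date.new(2024, 1, 1)]] = :orders_for_user_42

# A freshly built array with the same contents finds the same entry.
same_key = [42, Date.new(2024, 1, 1)]
puts cache.key?(same_key)                   # true
puts cache.key?([42, Date.new(2024, 1, 2)]) # false

# Caveat: mutating an array after using it as a key changes its hash,
# so freeze keys (or build a new array on each call) to be safe.
frozen_key = [7, "2024-01-01"].freeze
cache[frozen_key] = :orders_for_user_7
puts cache[[7, "2024-01-01"]]               # orders_for_user_7
```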

Alternatives

There are plenty of gems that provide similar functionality:

tycooon/memery

panorama-ed/memo_wise

matthewrudy/memoist

Ruby on Rails used to have a memoize module (ActiveSupport::Memoizable) and memoist is that code extracted into a gem.

There are probably more, but why add an extra dependency to your Gemfile for something that’s quite simple to implement?

Ruby on Rails

Memo is effectively a cache, and Ruby on Rails has caching built in[3], so you could just use Rails’ low-level caching. Rails.cache.fetch is effectively the same as Memo#memoize. Based on this, there’s an argument to be made that Memo#memoize should be called Memo#fetch.

Thread Local Storage

The first implementation creates a “global” cache. That is, all threads share the same values. If you don’t want that and want each thread to have its own values, you can change the class to use thread-local variables. I’m not sure which is best. It’s probably up to your use case. I have flip-flopped between both, but have settled on the mutex version for my uses. I think if you have a value in your app it’s most likely to be consistent app-wide. Anyway, here’s the alternate version that uses a cache per thread.

class Memo
  MEMO_KEY = :memo_values
  private_constant :MEMO_KEY

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    values = thread_values
    values.key?(key) ? values[key] : values[key] = yield
  end

  private

  def thread_values
    Thread.current.thread_variable_get(MEMO_KEY) || begin
      hash = {}
      Thread.current.thread_variable_set(MEMO_KEY, hash)
      hash
    end
  end
end

This version stores its state in Thread.current and doesn’t store any state of its own. As such, it could be a module with a couple of module methods. This simplifies it even more and removes the need to instantiate the class, e.g.

Memo.memoize(last_queried_at) { Order.where(updated_at: last_queried_at..) }
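Here is a minimal sketch of what that module might look like, assuming `module_function` is used to expose the methods. This is one possible shape and not necessarily what ends up in the final memo.rb. The thread demonstration at the bottom is made up for illustration.

```ruby
# A sketch of the per-thread Memo as a module (an assumption, not the author's final code).
module Memo
  MEMO_KEY = :memo_values
  private_constant :MEMO_KEY

  module_function

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    values = thread_values
    values.key?(key) ? values[key] : values[key] = yield
  end

  # Returns the current thread's cache, creating it on first use.
  def thread_values
    Thread.current.thread_variable_get(MEMO_KEY) || begin
      hash = {}
      Thread.current.thread_variable_set(MEMO_KEY, hash)
      hash
    end
  end
  private_class_method :thread_values
end

# Each thread gets its own cache, so the same key can hold different values.
main_value  = Memo.memoize(:who) { "main thread" }
other_value = Thread.new { Memo.memoize(:who) { "other thread" } }.value
again       = Memo.memoize(:who) { "never evaluated" }

puts main_value  # main thread
puts other_value # other thread
puts again       # main thread
```

Note that every caller on a thread shares one hash here, so keys need to be unique across the whole app, not just within one report or view.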

Or, as I said above, you could remove the Mutex code entirely and YOLO it 😜.

The final Memo class is here

memo.rb


Here’s an explainer if you haven’t come across Ruby memoization before. Paste these code samples into IRB and have a play.

def long_process
  p "Waiting..."
  42
end

def foo
  @foo ||= long_process
end

foo
"Waiting..."
=> 42
foo
=> 42

We see that the first time we call #foo we call #long_process, but the second time we don’t because @foo is no longer nil.

If the result of the #long_process call is nil we will keep calling it.

def long_process
  p "Waiting for nil..."
  nil
end

def nil_foo
  @nil_foo ||= long_process
end

nil_foo
"Waiting for nil..."
=> nil
nil_foo
"Waiting for nil..."
=> nil

This is where defined? comes in useful:

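def better_nil_foo
  @better_nil_foo = long_process unless defined?(@better_nil_foo)
  @better_nil_foo # return the stored value explicitly; a skipped `unless` modifier evaluates to nil
end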

better_nil_foo
"Waiting for nil..."
=> nil
better_nil_foo
=> nil

In this case our behaviour is correct. So why use ||=? Good question. I think it might be as simple as less to type ¯\_(ツ)_/¯

This demonstrates the problem with arguments:

def long_process(arg)
  p "Waiting for #{arg} days..."
  "#{arg} process"
end

def arg_foo(arg)
  @arg_foo ||= long_process(arg)
end

arg_foo("one")
"Waiting for one days..."
=> "one process"
arg_foo("one")
=> "one process"
arg_foo("two")
=> "one process"

Our second call to #arg_foo behaves as expected, i.e. we didn’t call the long-running process, but our third call with a different argument returns the stale value, which isn’t what we want.


  1. Wikipedia defines memoization as: “In computing, memoization or memoisation is an optimisation technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again.”

  2. So You Want To Remove The GVL? An article about thread safety in Ruby that coincidentally mentions ||=

  3. You need to set up some infrastructure to use Rails cache. Caching with Rails: An Overview — Ruby on Rails Guides

#ruby