Let’s talk about Memoization
Memoization[1] is a common pattern in Ruby. So common that Ruby has a built-in operator for it. We have all seen or used something like these. (If this is new to you, scroll to the end. There's some code that shows how this works.)
# 1. simple
foo ||= 42
# 2. multi-line version
foo ||= begin
...
end
# 3. more robust version
foo = 42 unless defined? foo
The memoization operator (or the "or equals" operator) means: assign the value on the right to the name on the left only if the name currently points to a nil (or false) value.
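Roughly speaking, ||= expands like this (a simplified sketch; the full semantics also cover names that aren't defined yet):
# a ||= b behaves (roughly) like:
a || (a = b)
# i.e. b is only evaluated, and assigned to a, when a is nil or false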
Version 3 is more robust because it avoids one of the issues with the ||= operator: your memoized operation might return nil. The ||= operator only executes the right-hand side if the left-hand side is nil (or false). But if your expensive operation evaluates to nil, it will assign nil and be executed again on every call.
Memoization is most commonly used in accessor methods like the one below:
def recent_orders
  @recent_orders ||= Orders.where(updated_at: (Date.today - ONE_WEEK)..)
end
I often find a need for memoization in reports or complex views with dense information. You might have an object that is accessed in many partials, so you can save a few cycles by caching the value.
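For example, a report object handed to several partials might expose an expensive total like this (a made-up sketch; the model and column are hypothetical):
class WeeklyReport
  def total_revenue
    # The query runs once, however many partials call total_revenue.
    @total_revenue ||= Orders.where(updated_at: (Date.today - ONE_WEEK)..).sum(:total)
  end
end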
These methods for memoization will get you what you need in 99% of cases, but there's one gotcha that I have seen folks fall victim to quite often. What happens if recent_orders takes an argument?
def recent_orders(date)
  @recent_orders ||= Orders.where(updated_at: date..)
end
The problem here is that once the value has been evaluated the first time, you are stuck with that value for all subsequent calls, whatever argument you pass. This is probably not what you want. You need a way to store the value of your expensive operation for each passed-in argument value.
Here’s a tiny class that solves that issue.
class Memo
  def initialize
    @mutex = Mutex.new
    @values = {}
  end

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    @mutex.synchronize do
      if @values.key?(key)
        @values[key]
      else
        @values[key] = yield
      end
    end
  end
end
You use this class like so:
recent_orders_cache = Memo.new
recent_orders_cache.memoize(last_queried_at) { Order.where(updated_at: last_queried_at..) }
When this code is called the first time, the @values hash won't have the key, so the block will be called and its result saved in the @values hash with the argument as the key. Subsequent calls with the same argument will return the cached value, skipping the block. Calls with different arguments will execute the block and save the value. Using a block in this way is a nice Rubyish way of doing things and feels quite natural IMO.
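To tie it back to the earlier example, you could wrap the cache in the accessor itself. A sketch (the instance variable name is arbitrary):
def recent_orders(date)
  # One Memo per object, keyed by the date argument.
  @recent_orders_memo ||= Memo.new
  @recent_orders_memo.memoize(date) { Orders.where(updated_at: date..) }
end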
You can add a method to clear the cache if you want. I've never found the need though.
# Clear specific or all memoized values
def clear(key = nil)
  @mutex.synchronize do
    if key
      @values.delete(key)
    else
      @values.clear
    end
  end
end
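Usage is what you'd expect, assuming the clear method above has been added to Memo:
recent_orders_cache.clear(last_queried_at) # forget one key
recent_orders_cache.clear                  # forget everything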
Do we really need a Mutex? That's a good question. In theory the value is only ever going to be set once, and if two threads try to set it at the same time the result will most likely be the same, so does it matter? Maybe you will have one thread reading while another thread clears the value? I think it's better to be safe. Incidentally, the built-in memoization operator might not be thread safe.[2]
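If you want to convince yourself, here's a rough IRB experiment (a sketch, not a rigorous test): several threads race on the same key and the block still only runs once.
memo = Memo.new
calls = 0

threads = 10.times.map do
  Thread.new { memo.memoize(:answer) { calls += 1; sleep 0.1; 42 } }
end
threads.each(&:join)

calls # => 1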
Multiple parameters
What do you do if your function takes more than one argument? Luckily Ruby Hash keys can be almost anything, so passing in an array as your key is perfectly fine:
recent_orders_cache.memoize([user_id, last_queried_at]) { User.find(user_id).orders.where(updated_at: last_queried_at..) }
Alternatives
There are plenty of gems that provide similar functionality.
Ruby on Rails used to have a memoize module (ActiveSupport::Memoizable) and memoist is that code extracted into a gem.
There are probably more, but why add an extra dependency to your Gemfile for something that's quite simple to implement?
Ruby on Rails
Memo is effectively a cache, and Ruby on Rails has caching built in,[3] so you could just use Rails' low-level caching. Rails.cache.fetch is effectively the same as Memo#memoize. Based on this, there's an argument to be made that Memo#memoize should be called Memo#fetch.
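For comparison, the Rails version of the earlier example might look something like this (a sketch; the cache key is whatever you choose, and you'd normally cache a plain array rather than a relation):
Rails.cache.fetch(["recent_orders", last_queried_at]) do
  Order.where(updated_at: last_queried_at..).to_a
end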
Thread Local Storage
The first implementation creates a "global" cache. That is, all threads share the same values. If you don't want that and want each thread to have its own values, you can change the class to use thread-local variables. I'm not sure which is best; it probably depends on your use case. I have flip-flopped between both, but have settled on the mutex version for my uses. I think if you have a value in your app it's most likely to be consistent app-wide. Anyway, here's the alternate version that uses a cache per thread.
class Memo
  MEMO_KEY = :memo_values
  private_constant :MEMO_KEY

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    values = thread_values
    values.key?(key) ? values[key] : (values[key] = yield)
  end

  private

  def thread_values
    Thread.current.thread_variable_get(MEMO_KEY) || begin
      hash = {}
      Thread.current.thread_variable_set(MEMO_KEY, hash)
      hash
    end
  end
end
This version stores its state in Thread.current and doesn't keep any state of its own. As such it could be a module with a couple of module methods. This would simplify it even more and get rid of the need to instantiate the class, e.g.
Memo.memoize(last_queried_at) { Order.where(updated_at: last_queried_at..) }
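A module version might look something like this (a sketch based on the thread-local class above; module_function is one way to do it):
module Memo
  MEMO_KEY = :memo_values
  private_constant :MEMO_KEY

  module_function

  def memoize(key = :_default_key)
    raise ArgumentError, "Block is required" unless block_given?

    values = thread_values
    values.key?(key) ? values[key] : (values[key] = yield)
  end

  def thread_values
    Thread.current.thread_variable_get(MEMO_KEY) || begin
      hash = {}
      Thread.current.thread_variable_set(MEMO_KEY, hash)
      hash
    end
  end
end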
Or, as I said above, you could remove the Mutex code entirely and YOLO it 😜.
The final Memo class is here.
Here’s an explainer if you haven’t come across Ruby memoization before. Paste these code samples into IRB and have a play.
def long_process
  p "Waiting..."
  42
end

def foo
  @foo ||= long_process
end

foo
"Waiting..."
=> 42
foo
=> 42
We see that the first time we call #foo we call #long_process, but the second time we don't, because @foo is no longer nil.
If the result of the #long_process call is nil, we will keep calling it.
def long_process
  p "Waiting for nil..."
  nil
end

def nil_foo
  @nil_foo ||= long_process
end

nil_foo
"Waiting for nil..."
=> nil
nil_foo
"Waiting for nil..."
=> nil
This is where defined? comes in useful:
def better_nil_foo
  @better_nil_foo = long_process unless defined? @better_nil_foo
end

better_nil_foo
"Waiting for nil..."
=> nil
better_nil_foo
=> nil
In this case our behaviour is correct. So why use ||=? Good question. I think it might be as simple as less to type ¯\_(ツ)_/¯
This demonstrates the problem with arguments:
def long_process(arg)
  p "Waiting for #{arg} days..."
  "#{arg} process"
end

def arg_foo(arg)
  @arg_foo ||= long_process(arg)
end

arg_foo("one")
"Waiting for one days..."
=> "one process"
arg_foo("one")
=> "one process"
arg_foo("two")
=> "one process"
Our second call to #arg_foo behaves as expected, i.e. we didn't call the long-running process, but our third call with a different argument isn't what we want.
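For completeness, the same example routed through the Memo class from earlier behaves the way we'd like (a sketch; the constant name is made up):
ARG_FOO_CACHE = Memo.new

def arg_foo(arg)
  ARG_FOO_CACHE.memoize(arg) { long_process(arg) }
end

arg_foo("one")
"Waiting for one days..."
=> "one process"
arg_foo("two")
"Waiting for two days..."
=> "two process"
arg_foo("two")
=> "two process"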
[1] Wikipedia defines memoization as: "In computing, memoization or memoisation is an optimisation technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again."
[2] So You Want To Remove The GVL? — an article about thread safety in Ruby that coincidentally mentions ||=.
[3] You need to set up some infrastructure to use the Rails cache. See Caching with Rails: An Overview — Ruby on Rails Guides.