Beginners Guide to Ruby on Rails Performance: Part 2
This is the continuation of my post about simple changes to improve the performance of a Ruby on Rails application.
Part 1 is here: Beginners Guide to Ruby on Rails Performance: Part 1
Part 1 dealt with fairly straightforward fixes. Part 2 is a bit more complicated, but not much more.
Nested loops
When you first learn the Rails Way™ you're taught to write nested loops because of the way associations are defined on models. For example, suppose we have a set of models with has_many associations:
class Customer < ActiveRecord::Base
  has_many :invoices
end

class Invoice < ActiveRecord::Base
  has_many :products
  belongs_to :customer
end

class Product < ActiveRecord::Base
  belongs_to :invoice
  belongs_to :refund, optional: true
  has_many :charges
end

class Charge < ActiveRecord::Base
  belongs_to :product
  scope :discounted, -> { where(discount: true) }
end

class Refund < ActiveRecord::Base
  has_one :product
end
It's easy and tempting (and kind of encouraged) to write code like this:
discounts = []
Customer.find_by(name: 'Jane').invoices.each do |invoice|
  invoice.products.each do |product|
    product.charges.each do |charge|
      discounts << charge if charge.discount?
    end
  end
end
We iterate through each association, looking at each record and its associations as we go. It's easy to understand because it aligns with the models, but it's really inefficient.
[Lots of queries]
Memory Usage: 41.44 MB
GC Runs: 12
Objects Created: 245,969
The garbage collector ran twelve times! That's not great. This code loads every record in the database to find what could be a small subset of the data. To fix this we want to turn the loops into joins. This can be a bit hard to reason about, but the way to think about it is that we turn the loops inside out. That is, start with the object "type" that you are looking for, usually the deepest loop, and join your way back up to the outer loop. An example will make it clearer what I mean. For the above code we want to return a list of charges that have discounts, so we start from there and join each association as we move out of the loops, adding any where clauses as we go:
Charge.where(discount: true)
      .joins(product: { invoice: :customer })
      .where(customers: { name: 'Jane' }) # <--- table name
I'd argue that this version is more readable. It's definitely more declarative, though it requires you to understand more SQL. One thing to note when doing joins like this is the table name in the where clause. The association from Invoice to Customer is customer, but we have to use the table name customers. This is especially important if the table name doesn't follow the Rails convention.
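As a hypothetical illustration (the clients table name is made up for this sketch): if Customer were backed by a legacy table, joins would still take the association names while where needs the actual table name.

# Hypothetical: Customer backed by a non-conventional table name.
class Customer < ActiveRecord::Base
  self.table_name = 'clients'
  has_many :invoices
end

# joins uses association names; where must use the real table name.
Charge.where(discount: true)
      .joins(product: { invoice: :customer })
      .where(clients: { name: 'Jane' })

Back to our conventional schema, the join version produces this query: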
Charge Load (6.8ms) SELECT "charges".* FROM "charges" INNER JOIN "products" ON "products"."id" = "charges"."product_id" INNER JOIN "invoices" ON "invoices"."id" = "products"."invoice_id" INNER JOIN "customers" ON "customers"."id" = "invoices"."customer_id" WHERE "charges"."discount" = ? AND "customers"."name" = ? [["discount", 1], ["name", "Jane"]]
Memory Usage: 2.88 MB
GC Runs: 0
Objects Created: 12,643
Comparison:
query: 195.4 i/s
nest loop with includes: 5.9 i/s - 32.99x slower
nest loop: 1.7 i/s - 113.85x slower
Whereas the first version loads every Charge, even the ones we immediately throw away, the new version only loads the Charges we know we need. It executes a single SQL query and will be much faster even for small data sets. At the risk of spoiling my next post, I benchmarked a version of the nested loop code using includes. That made four queries rather than the hundred-thousand-plus the naive version makes. Memory consumption didn't change. See my next post for details.
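For reference, the includes version I benchmarked looked something like this (a sketch, not the exact benchmark code):

discounts = []
Customer.includes(invoices: { products: :charges })
        .find_by(name: 'Jane')
        .invoices.each do |invoice|
  invoice.products.each do |product|
    product.charges.each do |charge|
      discounts << charge if charge.discount?
    end
  end
end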
This isn't exactly a nested loop problem, but don't forget about update_all, insert_all, delete_all and destroy_all:
Customer.find_by(name: 'Jane').invoices.each do |invoice|
  invoice.products.each do |product|
    product.charges.each do |charge|
      charge.update(discount_amount: 0) if charge.discount?
    end
  end
end
Charge.where(discount: true)
      .joins(product: { invoice: :customer })
      .where(customers: { name: 'Jane' })
      .update_all(discount_amount: 0)
This will make a single UPDATE query. It's not always possible, but it's useful when it is.
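While we're on the *_all methods, it's worth remembering the difference between delete_all and destroy_all. A quick sketch:

# delete_all issues a single DELETE; no callbacks, no objects instantiated.
Charge.where(discount: true).delete_all

# destroy_all loads every Charge and runs its callbacks, one DELETE per record.
Charge.where(discount: true).destroy_all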
Queries
Earlier I had this code example.
warm_hues = Colour.where(name: %w|red orange yellow|).pluck(:id)
warm_records = TestRecord.where(colour_id: warm_hues)
Colour Pluck (0.3ms) SELECT "colours"."id" FROM "colours" WHERE "colours"."name" IN (?, ?, ?) [["name", "red"], ["name", "orange"], ["name", "yellow"]]
TestRecord Load (344.4ms) SELECT "test_records".* FROM "test_records" WHERE "test_records"."colour_id" IN (?, ?, ?) [["colour_id", 1], ["colour_id", 2], ["colour_id", 3]]
Memory Usage: 83.25 MB
GC Runs: 6
Objects Created: 3,000,272
This executes two queries when it could easily be reduced to one:
warm_records = TestRecord.joins(:colour).where(colours: {name: %w|red orange yellow|})
TestRecord Load (425.0ms) SELECT "test_records".* FROM "test_records" INNER JOIN "colours" ON "colours"."id" = "test_records"."colour_id" WHERE "colours"."name" IN (?, ?, ?) [["name", "red"], ["name", "orange"], ["name", "yellow"]]
Memory Usage: 88.98 MB
GC Runs: 5
Objects Created: 3,000,281
We're not saving anything here, but not all ActiveRecord query chains are this simple.
Here's a more complex example that will benefit more from a bit of tuning. This is actual code from a system that I worked on. The names have been changed to protect the innocent (me). This code was spread across several functions, but inlining it like this hasn't changed the behaviour.
count_objects do
  Customer.find_by(name: 'Jane')
          .invoices
          .flat_map(&:products)
          .select { |product| product.refund_id.nil? }
          .uniq
          .flat_map(&:charges)
          .select(&:discount?)
          .sum(&:total_price_incl_gst)
end
[Lots of queries]
Memory Usage: 38.05 MB
GC Runs: 14
Objects Created: 279,756
To optimise these sorts of complex queries we use the same technique as we did with nested loops: turn the query inside out. What we are after is the charges, so we start with those and join in reverse back to the customer, adding the where clauses as we go:
count_objects do
  Charge.where(discount: true)
        .joins(product: { invoice: :customer })
        .where(products: { refund_id: nil }) # <--- table name
        .where(customers: { name: 'Jane' })  # <--- table name
        .sum(&:total_price_incl_gst)
end
Charge Load (7.2ms) SELECT "charges".* FROM "charges" INNER JOIN "products" ON "products"."id" = "charges"."product_id" INNER JOIN "invoices" ON "invoices"."id" = "products"."invoice_id" INNER JOIN "customers" ON "customers"."id" = "invoices"."customer_id" WHERE "charges"."discount" = ? AND "products"."refund_id" IS NULL AND "customers"."name" = ? [["discount", 1], ["name", "Jane"]]
Memory Usage: 1.42 MB
GC Runs: 0
Objects Created: 15,449
Comparison:
query: 156.2 i/s
iterate: 1.8 i/s - 85.71x slower
This time we have made significant savings. It should be possible to get the database to sum the fields too; I'll leave that as an exercise for the reader. I mentioned that the original code was spread across multiple functions. There's nothing stopping you doing that here either. You just have to pass ActiveRecord::Relations around instead of arrays. There's an example of that below.
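If you want a head start on that exercise: passing sum a column name rather than a block should push the aggregation into the database, so the Charge records are never loaded at all.

Charge.where(discount: true)
      .joins(product: { invoice: :customer })
      .where(products: { refund_id: nil })
      .where(customers: { name: 'Jane' })
      .sum(:total_price_incl_gst)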
Here are a couple more examples to help you spot patterns.
Filters are a common place where these sorts of improvements can be made. Once again, here's some real production code. Names have been changed.
scope = customer.invoices
                .order(processed_at: :desc)
                .filter { |invoice| invoice.processed_at.present? }

selected = scope
           .filter { |invoice| invoice.processed_at >= 5.days.ago }
           .filter { |invoice| invoice.status == params[:filter][:status] }

selected.map { |invoice| Invoice.find_by(id: invoice.id) }
In this case all the records will be loaded by the first filter, so the next two operate on records in memory. Then we load them all again on the last line.
[Lots of queries]
Memory Usage: 0.13 MB
GC Runs: 0
Objects Created: 8,367
And the refactored code:
Invoice.where(customer: customer)
       .order(processed_at: :desc)
       .where.not(processed_at: nil)
       .where(processed_at: 5.days.ago..)
       .where(status: params[:filter][:status])
Invoice Load (0.5ms) SELECT "invoices".* FROM "invoices" WHERE "invoices"."customer_id" = ? AND "invoices"."processed_at" IS NOT NULL AND "invoices"."processed_at" >= ? AND "invoices"."status" = ? ORDER BY "invoices"."processed_at" [["customer_id", 1], ["processed_at", "2025-02-22 23:38:49.125166"], ["status", "pending"]]
Memory Usage: 0.0 MB
GC Runs: 0
Objects Created: 164
Comparison:
scopes: 3355.8 i/s
filter: 156.5 i/s - 21.45x slower
Memory consumption isn't that different on this data set, but the improved version is much faster.
Often we inadvertently pass arrays around, as well as loading too many records. The new_invoices method returns a potentially large array.
def new_invoices
  invoices.select { |invoice| invoice.status == 'processed' }
end

def recalculate_estimated_invoices(reason)
  new_invoices.each do |invoice|
    invoice.recalculate!(reason) if invoice.estimated?
  end
end
select and each
Invoice Load (15.6ms) SELECT "invoices".* FROM "invoices"
Memory Usage: 1.86 MB
GC Runs: 0
Objects Created: 8,523
This can be fixed in a couple of ways. We could write it as a single expression:
Invoice.where(status: 'processed', estimated: true).each do |invoice|
  invoice.recalculate!(reason)
end
Or, if we want to keep the methods, we can pass ActiveRecord::Relations instead of arrays. Remember, ActiveRecord::Relations aren't evaluated until they are accessed, so they can be passed around with (almost) zero cost.
def new_invoices
  invoices.where(status: 'processed')
end

def recalculate_estimated_invoices(reason)
  new_invoices.where(estimated: true).each do |invoice|
    invoice.recalculate!(reason)
  end
end
single query
Invoice Load (3.6ms) SELECT "invoices".* FROM "invoices" WHERE "invoices"."status" = ? AND "invoices"."estimated" = ? [["status", "processed"], ["estimated", 1]]
Memory Usage: 0.0 MB
GC Runs: 0
Objects Created: 1,016
Even better is to write scopes, but now we're getting a bit off topic:
def recalculate_estimated_invoices(reason)
  invoices.processed.estimated.each do |invoice|
    invoice.recalculate!(reason)
  end
end
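For that last version to work, Invoice needs matching scopes. A minimal sketch, assuming the status and estimated columns from the earlier examples:

class Invoice < ActiveRecord::Base
  scope :processed, -> { where(status: 'processed') }
  scope :estimated, -> { where(estimated: true) }
end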
N+1
Some of you are probably shouting at the screen, "What about N+1 and includes!?" I haven't forgotten. I have a separate post about that coming soon.
Stuff to think about when deciding to make these changes
How many objects do you expect?
If you know that the collection will always be small, then there is no harm in grabbing the entire set. PostgreSQL (and probably other databases) takes the same approach: if your dataset is small, PostgreSQL will often ignore an index because it's quicker to just scan the table.
TestRecord.limit(10).length
TestRecord Load (0.3ms) SELECT "test_records".* FROM "test_records" LIMIT ? [["LIMIT", 10]]
Memory Usage: 22.63 MB
GC Runs: 0
Objects Created: 216
Just make sure you consider the future possibility that your app goes viral!
Has the data already been loaded?
As we saw in the discussion about size and blank?, if the data has already been loaded then it might be quicker to use Ruby to process it; otherwise you might trigger a second, unnecessary trip to the database. It's worth tracking backwards to find out why the data has been loaded. Can you defer the load? Should you memoize the collection to make sure it doesn't get reloaded?
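As a sketch of what that can look like (the helper methods here are hypothetical): ActiveRecord's loaded? tells you whether an association is already in memory, and memoization stops a collection being rebuilt on every call.

# Filter in Ruby only if the data is already in memory,
# otherwise let the database do the filtering.
def processed_invoices
  if invoices.loaded?
    invoices.select { |invoice| invoice.status == 'processed' }
  else
    invoices.where(status: 'processed')
  end
end

# Memoize so repeated calls don't reload the collection.
def recent_invoices
  @recent_invoices ||= invoices.where(processed_at: 5.days.ago..).to_a
end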
Conclusion
Hopefully you got this far and learnt something along the way. As I said in the introduction, I think the goal is to make this way of coding your default so you don't have to think about it. You might get some raised eyebrows at code review time because even now folks still code the Rails Way™. But if you show them the stats there's really no argument. And, with the exception of the nested loops and queries, these problems are easy to find in your code: a simple grep will find uses of enumerable methods. Nested loops and queries can be harder to track down, especially if you're passing arrays around without realising it. That's when you want to start using more dedicated performance techniques like profilers and application performance monitoring (APM) tools such as Sentry.
Don't just look in your application code for these improvements. Tests are a place where people take a lot of shortcuts, and your tests are probably run a lot more than your production code. You might save a lot of developer time by speeding up your tests. On the other hand, your tests probably aren't dealing with millions of rows, so YMMV.
When this way of coding becomes second nature you will start from a position of strength, and you will save a lot of time in the future, both for you and your customers.
Code
Here is the code I used to generate the results in this post.
First, the script to generate the database of sample data. I asked Claude to write this for me; it's the sort of thing LLMs are quite good at. Using SQLite makes the pure SQL versions look a bit better because there's no network latency, but not by much.
# frozen_string_literal: true
require 'sqlite3'
# Create a new SQLite database (or open if it exists)
db = SQLite3::Database.new "test.db"
# Drop the table if it exists
db.execute("DROP TABLE IF EXISTS test_records;")
db.execute("DROP TABLE IF EXISTS colours;")
# Create the table
db.execute(<<~SQL)
  CREATE TABLE test_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data INTEGER,
    colour_id INTEGER
  );
SQL
# Create the colours table
db.execute(<<~SQL)
  CREATE TABLE IF NOT EXISTS colours (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    code TEXT
  );
SQL
# Use a transaction to speed up insertion
db.execute("BEGIN TRANSACTION;")
COLOURS = {
  'red' => '#FF0000',
  'orange' => '#FFA500',
  'yellow' => '#FFFF00',
  'green' => '#008000',
  'blue' => '#0000FF',
  'indigo' => '#4B0082',
  'violet' => '#8F00FF'
}.freeze
colour_ids = COLOURS.map do |name, hex|
  db.execute("INSERT INTO colours (name, code) VALUES (?, ?);", [name, hex])
  db.get_first_value("SELECT last_insert_rowid()")
end
1_000_000.times do
  db.execute("INSERT INTO test_records (data, colour_id) VALUES (?, ?);", [rand(1..1_000_000), colour_ids.sample])
end
# Commit the transaction
db.execute("COMMIT;")
db.close
puts "Database and table created, 1,000,000 rows inserted."
I had a second script to generate the data for the nested loop and query tests. That's just more of the same, only bigger.
This is the code used to measure memory consumption. It isn't a rigorous way of gathering performance metrics. To do it properly you should run the tests in a loop multiple times and take averages (see the benchmarking section below), and you should be more careful about how you record memory consumption. But for this post we're not measuring per se; we're just comparing different methods, and we expect the differences to be orders of magnitude, not a few percentage points, so this code is fine. I did run each test multiple times to get the stats used in the post. I picked stats that were roughly at the mean, and the object counts were verified using a more rigorous method¹. You can use this to test any Ruby code. If you are testing Rails you'll need to define your models and the database connection somewhere.
Update 14 Mar 2025: The original code had an object counting bug. This is the fixed version. I also swapped GC.start for ObjectSpace.garbage_collect; it seems to give more stable numbers.
# frozen_string_literal: true
require 'active_record'
require 'benchmark/ips'
ActiveRecord::Base.logger = Logger.new($stdout)
EXCLUDED_TYPES = %i[FREE TOTAL T_IMEMO].freeze
def format_number_with_commas(number)
  number.to_s.gsub(/(\d)(?=(\d{3})+(?!\d))/, '\1,')
end

def memory_usage
  `ps -o rss= -p #{Process.pid}`.strip.to_i # Get memory usage in KB
end

def count_objects(message = nil)
  puts message if message

  # Run GC multiple times to ensure a clean baseline
  3.times { ObjectSpace.garbage_collect }
  sleep(1)

  # Capture baseline metrics
  start_objects = ObjectSpace.count_objects
  start_memory = memory_usage
  start_gc_count = GC.count
  start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  result = yield

  # Measure after execution
  end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end_objects = ObjectSpace.count_objects
  end_memory = memory_usage

  # Calculate differences
  stats = {
    wall_time: end_time - start_time,
    memory_diff_mb: (end_memory - start_memory).to_f / 1024,
    gc_runs: GC.count - start_gc_count
  }

  object_diffs = end_objects.to_h do |key, count|
    [key, count - (start_objects[key] || 0)]
  end
  p object_diffs

  stats[:total_objects_created] = object_diffs
                                  .except(*EXCLUDED_TYPES)
                                  .sum { |_, diff| [diff, 0].max }

  print_stats(stats)
  result
end

def print_stats(stats)
  puts <<~STATS
    Time: #{stats[:wall_time].round(6)} seconds
    Memory Usage: #{stats[:memory_diff_mb].round(2)} MB
    GC Runs: #{stats[:gc_runs]}
    Objects Created: #{format_number_with_commas(stats[:total_objects_created])}
  STATS
end
This is a handy function to have lying around. Feel free to add it to your tool bag. You use it like this:
count_objects "map(&:id)" do
TestRecord.where(colour_id: red_id).map(&:id)
end
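If you're testing Rails code like the snippets in this post, you'll also need the database connection and model definitions mentioned above. A minimal sketch against the generated test.db:

require 'active_record'

ActiveRecord::Base.establish_connection(
  adapter: 'sqlite3',
  database: 'test.db'
)

class Colour < ActiveRecord::Base; end

class TestRecord < ActiveRecord::Base
  belongs_to :colour
end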
The count_objects function does record some timing information, but it's better to use a proper tool for that, like the excellent benchmark-ips gem.
Benchmark.ips do |x|
  x.report('map') { TestRecord.where(colour_id: red_id).map(&:id) }
  x.report('pluck') { TestRecord.where(colour_id: red_id).pluck(:id) }
  x.report('ids') { TestRecord.where(colour_id: red_id).ids }
  x.report('select') { TestRecord.where(colour_id: red_id).select(:id).load }
  x.compare!
end
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Warming up --------------------------------------
map 1.000 i/100ms
pluck 1.000 i/100ms
ids 1.000 i/100ms
select 1.000 i/100ms
Calculating -------------------------------------
map 2.868 (± 0.0%) i/s (348.66 ms/i) - 15.000 in 5.244710s
pluck 15.291 (± 6.5%) i/s (65.40 ms/i) - 77.000 in 5.051319s
ids 15.689 (± 6.4%) i/s (63.74 ms/i) - 79.000 in 5.043278s
select 3.534 (± 0.0%) i/s (283.00 ms/i) - 18.000 in 5.186966s
Comparison:
ids: 15.7 i/s
pluck: 15.3 i/s - same-ish: difference falls within error
select: 3.5 i/s - 4.44x slower
map: 2.9 i/s - 5.47x slower
This gem runs the benchmark in a couple of phases. It "warms up" the code and gets a rough measurement of speed. Then it runs the code for a fixed period of time to get the timings. Finally it spits out the comparison, which is the part I included in the main post. It's a solid benchmarking technique. The gem has a heap more functionality, so if you're interested in benchmarking Ruby have a read of the docs.
¹ I verified the object counts and memory consumption with the MemoryProfiler gem. ↩
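Basic usage of MemoryProfiler looks like this (a sketch, wrapping one of the queries from this post):

require 'memory_profiler'

report = MemoryProfiler.report do
  TestRecord.where(colour_id: red_id).map(&:id)
end
report.pretty_print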