David Bernheisel
Fast Postgres table counting in Ecto
`SELECT COUNT(id) from my_big_table` could be too slow for huge tables since it traverses all rows. Instead, you can get an approximation that's updated every AUTOVACUUM.
defmodule MyApp.Repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres
@doc """
# Fast Postgres counting.
This will query internal PG tables that're updated by AUTOVACUUM. It's an approximate count based on the total disk
size of the table divided by the average size of row.
For example:
ACTUAL count `SELECT count(id) from my_big_table` = 4535133
APPROXIMATE count 4499917 (off by 35216)
Crash course:
reltuples = # of rows (relation) on a page
relpages = # of pages of rows
These ^ are updated based on the last AUTOVACUUM which runs periodically
Get the average # of rows on a page
eg: reltuples = 4499917, relpages = 2050761
4499917 / 2050761 = 2.1942669087232
Postgres essentially stores a page on a disk block.
So now we need to query disk information:
pg_relation_size = disk size of table
current_setting('block_size') = disk block size
Get the row size per block for the table
eg: pg_relation_size = 16799834112, block_size = 8192
16799834112 / 8192 = 2050761
Now multiply the rows per page with the row size per block
eg: 2.1942669087232 * 2050761 = 4499917
"""
def fast_table_count(ecto_schema) do
table = ecto_schema.__schema__(:source)
case Ecto.Adapters.SQL.query!(
__MODULE__,
"""
SELECT
(NULLIF(reltuples, 0) / NULLIF(relpages, 0)) * (
pg_relation_size('#{table}') / (
current_setting('block_size')::INTEGER)
)
FROM pg_class
WHERE relname = '#{table}'
"""
) do
%{rows: [[nil]]} -> 0
%{rows: [[count]]} -> count
end
end
end
29
upvotes