Developer Utilities

David Bernheisel 2021-08-05

Fast Postgres table counting in Ecto

`SELECT COUNT(id) from my_big_table` could be too slow for huge tables since it traverses all rows. Instead, you can get an approximation that's updated every AUTOVACUUM.

defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  @doc """
  # Fast Postgres counting.

  This will query internal PG tables that're updated by AUTOVACUUM. It's an approximate count based on the total disk
  size of the table divided by the average size of row.

  For example:
    ACTUAL count `SELECT count(id) from my_big_table` = 4535133
    APPROXIMATE count 4499917 (off by 35216)

  Crash course:

    reltuples = # of rows (relation) on a page
    relpages = # of pages of rows
    These ^ are updated based on the last AUTOVACUUM which runs periodically

    Get the average # of rows on a page
      eg: reltuples = 4499917, relpages = 2050761
          4499917 / 2050761 = 2.1942669087232

    Postgres essentially stores a page on a disk block.
    So now we need to query disk information:

    pg_relation_size = disk size of table
    current_setting('block_size') = disk block size

    Get the row size per block for the table
      eg: pg_relation_size = 16799834112, block_size = 8192
          16799834112 / 8192 = 2050761

    Now multiply the rows per page with the row size per block
      eg: 2.1942669087232 * 2050761 = 4499917
  """
  def fast_table_count(ecto_schema) do
    table = ecto_schema.__schema__(:source)
    case Ecto.Adapters.SQL.query!(
           __MODULE__,
           """
           SELECT
             (NULLIF(reltuples, 0) / NULLIF(relpages, 0)) * (
              pg_relation_size('#{table}') / (
              current_setting('block_size')::INTEGER)
             )
           FROM pg_class
           WHERE relname = '#{table}'
           """
         ) do
      %{rows: [[nil]]} -> 0
      %{rows: [[count]]} -> count
    end
  end
end

Elixir Tips

Fast Postgres table counting in Ecto