[SOLVED] Inconsistent cold-start query performance in Postgres

Issue

We’re having trouble getting consistent performance from a simple query against a Postgres-hosted datamart. The query touches a modest amount of data, yet its execution time varies wildly between runs, sometimes taking around 20 seconds and other times just 3 seconds, and we cannot see what causes the difference. Only two tables are involved: one holds order rows (OrderItemTransactionFact, 2,937,264 rows) and the other records the current stock level for each item (stocklevels, 62,353 rows). There are no foreign keys because this is a datamart that we load via ETL processes, so fast loading is required.

The query is:

select 
    oitf."SKUId",
    sum(oitf."ConvertedLineTotal") as "totalrevenue",
    sum(oitf."Quantity") as "quantitysold",
    coalesce(sl."Available", 0) as "availablestock"
from "OrderItemTransactionFact" oitf
left join stocklevels sl on sl."SKUId" = oitf."SKUId"
where 
    oitf."transactionTypeId" = 2 
    and oitf."hasComposite" = false
    and oitf."ReceivedDate" >= extract(epoch from timestamp '2020-07-01 00:00:00') 
    and oitf."ReceivedDate" <= extract(epoch from timestamp '2021-10-01 00:00:00')
group by 
    oitf."SKUId", sl."Available"
order by oitf."SKUId";

The OrderItemTransactionFact table has a couple of indexes:

create index IX_OrderItemTransactionFact_ReceivedDate on public."OrderItemTransactionFact" ("ReceivedDate" DESC);
create index IX_OrderItemTransactionFact_ReceivedDate_transactionTypeId on public."OrderItemTransactionFact" ("ReceivedDate" desc, "transactionTypeId");

Execution plan output for a slow run (about 25 seconds):

GroupAggregate  (cost=175096.24..195424.66 rows=813137 width=52) (actual time=24100.268..24874.065 rows=26591 loops=1)
  Group Key: oitf."SKUId", sl."Available"
  Buffers: shared hit=659 read=43311 written=1042
  ->  Sort  (cost=175096.24..177129.08 rows=813137 width=19) (actual time=24100.249..24275.594 rows=916772 loops=1)
        Sort Key: oitf."SKUId", sl."Available"
        Sort Method: quicksort  Memory: 95471kB
        Buffers: shared hit=659 read=43311 written=1042
        ->  Hash Left Join  (cost=20671.85..95274.08 rows=813137 width=19) (actual time=239.392..23127.993 rows=916772 loops=1)
              Hash Cond: (oitf."SKUId" = sl."SKUId")
              Buffers: shared hit=659 read=43311 written=1042
              ->  Bitmap Heap Scan on "OrderItemTransactionFact" oitf  (cost=18091.90..73485.91 rows=738457 width=15) (actual time=200.178..22413.601 rows=701397 loops=1)
                    Recheck Cond: (("ReceivedDate" >= '1585699200'::double precision) AND ("ReceivedDate" <= '1625097600'::double precision))
                    Filter: ((NOT "hasComposite") AND ("transactionTypeId" = 2))
                    Rows Removed by Filter: 166349
                    Heap Blocks: exact=40419
                    Buffers: shared hit=55 read=42738 written=1023
                    ->  Bitmap Index Scan on ix_orderitemtransactionfact_receiveddate  (cost=0.00..17907.29 rows=853486 width=0) (actual time=191.274..191.274 rows=867746 loops=1)
                          Index Cond: (("ReceivedDate" >= '1585699200'::double precision) AND ("ReceivedDate" <= '1625097600'::double precision))
                          Buffers: shared hit=9 read=2365 written=181
              ->  Hash  (cost=1800.53..1800.53 rows=62353 width=8) (actual time=38.978..38.978 rows=62353 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 2948kB
                    Buffers: shared hit=604 read=573 written=19
                    ->  Seq Scan on stocklevels sl  (cost=0.00..1800.53 rows=62353 width=8) (actual time=0.031..24.301 rows=62353 loops=1)
                          Buffers: shared hit=604 read=573 written=19
Planning Time: 0.543 ms
Execution Time: 24889.522 ms

But here is the execution plan for a run of the same query that took just 3 seconds:

GroupAggregate  (cost=173586.52..193692.59 rows=804243 width=52) (actual time=2616.588..3220.394 rows=26848 loops=1)
  Group Key: oitf."SKUId", sl."Available"
  Buffers: shared hit=2 read=43929
  ->  Sort  (cost=173586.52..175597.13 rows=804243 width=19) (actual time=2616.570..2813.571 rows=889937 loops=1)
        Sort Key: oitf."SKUId", sl."Available"
        Sort Method: quicksort  Memory: 93001kB
        Buffers: shared hit=2 read=43929
        ->  Hash Left Join  (cost=20472.48..94701.25 rows=804243 width=19) (actual time=185.018..1512.626 rows=889937 loops=1)
              Hash Cond: (oitf."SKUId" = sl."SKUId")
              Buffers: shared hit=2 read=43929
              ->  Bitmap Heap Scan on "OrderItemTransactionFact" oitf  (cost=17892.54..73123.18 rows=730380 width=15) (actual time=144.000..960.232 rows=689090 loops=1)
                    Recheck Cond: (("ReceivedDate" >= '1593561600'::double precision) AND ("ReceivedDate" <= '1633046400'::double precision))
                    Filter: ((NOT "hasComposite") AND ("transactionTypeId" = 2))
                    Rows Removed by Filter: 159949
                    Heap Blocks: exact=40431
                    Buffers: shared read=42754
                    ->  Bitmap Index Scan on ix_orderitemtransactionfact_receiveddate  (cost=0.00..17709.94 rows=844151 width=0) (actual time=134.806..134.806 rows=849039 loops=1)
                          Index Cond: (("ReceivedDate" >= '1593561600'::double precision) AND ("ReceivedDate" <= '1633046400'::double precision))
                          Buffers: shared read=2323
              ->  Hash  (cost=1800.53..1800.53 rows=62353 width=8) (actual time=40.500..40.500 rows=62353 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 2948kB
                    Buffers: shared hit=2 read=1175
                    ->  Seq Scan on stocklevels sl  (cost=0.00..1800.53 rows=62353 width=8) (actual time=0.025..24.620 rows=62353 loops=1)
                          Buffers: shared hit=2 read=1175
Planning Time: 0.565 ms
Execution Time: 3235.300 ms

The server config is:
version: PostgreSQL 12.1, compiled by Visual C++ build 1914, 64-bit
work_mem: 1048576 kB (1 GB)
shared_buffers: 16384 pages × 8 kB (128 MB)

Thanks in advance!

Solution

It is the filesystem cache. The slow run had to read the data from disk; the fast run fetched it from memory, probably because the slow run had already read it and left it in the cache. You can make this show up explicitly in the plans by turning on track_io_timing.
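For example, track_io_timing can be enabled per session (changing it requires superuser or equivalent privileges), after which EXPLAIN (ANALYZE, BUFFERS) reports an "I/O Timings" line next to the Buffers line of every node that hit disk:

```sql
-- Enable I/O timing for the current session only (superuser-settable)
SET track_io_timing = on;

-- Re-run the original query under EXPLAIN; nodes that read from disk
-- now show "I/O Timings: read=..." alongside their Buffers counts
EXPLAIN (ANALYZE, BUFFERS)
select
    oitf."SKUId",
    sum(oitf."ConvertedLineTotal") as "totalrevenue",
    sum(oitf."Quantity") as "quantitysold",
    coalesce(sl."Available", 0) as "availablestock"
from "OrderItemTransactionFact" oitf
left join stocklevels sl on sl."SKUId" = oitf."SKUId"
where
    oitf."transactionTypeId" = 2
    and oitf."hasComposite" = false
    and oitf."ReceivedDate" >= extract(epoch from timestamp '2020-07-01 00:00:00')
    and oitf."ReceivedDate" <= extract(epoch from timestamp '2021-10-01 00:00:00')
group by
    oitf."SKUId", sl."Available"
order by oitf."SKUId";
```

A large read time on the Bitmap Heap Scan node would confirm that the variance comes from disk I/O rather than from the plan itself.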

It should help a little to have an index on ("transactionTypeId", "hasComposite", "ReceivedDate"), and perhaps a lot to crank up effective_io_concurrency (depending on your storage system).
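A sketch of both suggestions (the index name and the concurrency value are illustrative; sensible effective_io_concurrency values depend heavily on whether the storage is SSD-backed or spinning disk):

```sql
-- Equality-filter columns first, range column last, so rows matching
-- the query's filters sit adjacent in the index
create index ix_orderitemtransactionfact_type_composite_received
    on public."OrderItemTransactionFact" ("transactionTypeId", "hasComposite", "ReceivedDate" desc);

-- Let bitmap heap scans prefetch more pages concurrently; values in the
-- low hundreds are common for SSDs, while 1-2 suits a single spinning disk
SET effective_io_concurrency = 200;
```

With that index, the Bitmap Index Scan no longer has to visit rows that the Filter step currently discards (the "Rows Removed by Filter" lines in both plans), shrinking the heap pages it must fetch.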

But mostly, get faster disks.
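If consistent cold-start timings matter more than raw speed, one further option worth noting (an addition beyond the answer above, but consistent with its cache explanation) is the contrib extension pg_prewarm, which pre-loads relations into cache after a restart. The relation names below are the ones from the question:

```sql
-- Load the fact table and its index into shared buffers so the first
-- query after a restart doesn't pay the full disk-read penalty
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('public."OrderItemTransactionFact"');
SELECT pg_prewarm('public.ix_orderitemtransactionfact_receiveddate');
```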

Answered By – jjanes

Answer Checked By – Terry (BugsFixing Volunteer)
