[SOLVED] Join to XML query is only performant with redundant hardcoded WHERE clause

Issue

Oracle 18c:

I have a query that joins from a view to a subquery (related post: Join between XML queries).

with
subtype as (select * from sub_lc_events_asset_class_activity_vw),
domain  as (select      
                cast(rownum as number(38,0)) as rownum_,
                x.code,
                x.description,
                i.name as domain_name
            from        
                sde.gdb_items_vw i
            cross apply xmltable(
                '/GPCodedValueDomain2/CodedValues/CodedValue' 
                passing xmltype(i.definition)
                columns
                    code        varchar2(255) path './Code',
                    description varchar2(255) path './Name'
                ) x    
            where      
                i.name in('ACTIVITY_ATN','ACTIVITY_GCSM','ACTIVITY_MS','ACTIVITY_RD','ACTIVITY_SS_SL','ACTIVITY_WAT_RES','ACTIVITY_SWM_FACILITY','ACTIVITY_UND_SERV','ACTIVITY_BARRIER','ACTIVITY_WM_SAN')
                and i.name is not null)
select
    d.code as domain_code,
    d.description as domain_description,
    d.domain_name,
    s.subtype_code,
    s.subtype_description,
    s.subtype_field,
    s.subtype_field_domain,
    s.table_name as table_name
from
    subtype s
left join
    domain d
    on s.subtype_field_domain = d.domain_name

Explain Plan Screenshot

Normally, I’d include the explain plan as text, not as a screenshot. But this particular explain plain is a nightmare due to the XML extraction and the underlying data source (a view called GDB_ITEMS_VW) that is beyond my control. If I were to provide the explain plan as text in this post, it would be unreadable, and make the post too long. So I’ve included a screenshot link from SQL Developer instead.

DOMAIN_COD DOMAIN_DESCRIPTION                       DOMAIN_NAME   SUBTYPE_CODE  SUBTYPE_DESCRIPTION  SUBTYPE_FIELD SUBTYPE_FIELD_DOMAIN TABLE_NAME        
---------- ---------------------------------------- ------------- ------------- -------------------- ------------- -------------------- ------------------
RECON_CL   RECONSTRUCT CYCLING LANE                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRS  RECONSTRUCT CYCLING ROUTE - SIGNED ONLY  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRU  RECONSTRUCT CYCLING ROUTE - URBAN PAVED  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_FW   RECONSTRUCT FACILITY WALKWAY             ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_MUP  RECONSTRUCT MULTI-USE PATH               ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_NT   RECONSTRUCT NATURE TRAIL                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_PP   RECONSTRUCT PARK PATHWAY                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_R    RECONSTRUCT RAMP                         ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_S    RECONSTRUCT SIDEWALK                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_STWY RECONSTRUCT STAIRWAY                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
...
121 rows selected. 

The query is very fast: 0.09 seconds.


However, that query is hardcoded to certain domain names:

where
    i.name in('ACTIVITY_ATN','ACTIVITY_GCSM','ACTIVITY_MS','ACTIVITY_RD','ACTIVITY_SS_SL','ACTIVITY_WAT_RES','ACTIVITY_SWM_FACILITY','ACTIVITY_UND_SERV','ACTIVITY_BARRIER','ACTIVITY_WM_SAN')
    and i.name is not null)

I want to avoid hardcoding the domain name if I can help it.

If I remove the hardcoded domain names from the WHERE clause, then the query produces the same result, which is expected, due to the left join.

... 
where      
    --I removed the domain names from the WHERE clause.
    i.name is not null)
...

Explain Plan Screenshot

DOMAIN_COD DOMAIN_DESCRIPTION                       DOMAIN_NAME   SUBTYPE_CODE  SUBTYPE_DESCRIPTION  SUBTYPE_FIELD SUBTYPE_FIELD_DOMAIN TABLE_NAME        
---------- ---------------------------------------- ------------- ------------- -------------------- ------------- -------------------- ------------------
RECON_CL   RECONSTRUCT CYCLING LANE                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRS  RECONSTRUCT CYCLING ROUTE - SIGNED ONLY  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_CRU  RECONSTRUCT CYCLING ROUTE - URBAN PAVED  ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_FW   RECONSTRUCT FACILITY WALKWAY             ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_MUP  RECONSTRUCT MULTI-USE PATH               ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_NT   RECONSTRUCT NATURE TRAIL                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_PP   RECONSTRUCT PARK PATHWAY                 ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_R    RECONSTRUCT RAMP                         ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_S    RECONSTRUCT SIDEWALK                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
RECON_STWY RECONSTRUCT STAIRWAY                     ACTIVITY_ATN  0             ACTIVE TRANSPORTATIO ACTIVITY      ACTIVITY_ATN         INFRASTR.LC_EVENTS
...
121 rows selected. 

Unfortunately, the query is now extremely slow: 35 seconds.

I don’t understand why the query is so slow now. I wouldn’t have expected the domain names in the WHERE clause to have served much of a purpose, since the left join achieves the same thing; the join omits the same rows.


Question:

Why is the query only performant when the domains are hardcoded in the WHERE clause?

Apologies if I haven’t included enough information. But I felt this post was long enough already. It’s hard to know what amount of information would be right in the "sweet spot" (not too much, not too little). If more information is needed, then feel free to let me know.

Solution

There is a combination of factors:

  • When using ROWNUM Oracle must materialise the sub-query to generate the row numbering and generates all the rows from the XMLTABLE at that point (see 1, 2, 3 for examples of problems which require materialising the sub-query).
  • When you filter on i.name in(...) then the explain plan shows that the SQL engine can use an index on the column and without the filter it is not using the index and is performing a full table scan (which is much slower).
  • Because the SQL engine has to materialise the sub-query then it cannot use the join condition from the outer query to filter the rows as it has to generate all the rows to generate the row numbering first.

If you remove ROWNUM from the SELECT clause of the sub-query then:

  1. The SQL engine does not need to materialise the sub-query.
  2. Which means it has the opportunity to re-write the query to push the sub-query into the outer query and perform the LEFT JOIN before it generates the XMLTABLE.
  3. Once it re-writes the query then the ON condition can be used to provide the values to use the index, similar to using i.name in(...), and the full table scan can be eliminated again and index scans used.

Answered By – MT0

Answer Checked By – Pedro (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published.