Saturday, 13 October 2012

Using Statistical Views in DB2 to improve query performance

In a previous post i mentioned about how to find slow queries in DB2 and this has proved very useful. By looking through a lot of different bits of SQL that were consistently slow i found a common bit of SQL was used in lots of places.

Here is the common bit of SQL I was finding in lots of places...

...
LEFT OUTER JOIN (  
 eone.TPM_USER_ADRS_DTL UAD
  INNER JOIN eone.TSCH_LKP_DTL LKP
       ON UAD.AddressType = LKP.LOOKUPDATAID 
       AND LKP.LOOKUPID = 2 
      AND LKP.ISDELETED = '0' 
       AND LKP.LOOKUPCode = 'C'
  ) ON PCTD.CONTACTID = UAD.USERID
...

Just these joins caused the SQL to take 1-2 minutes to run.

My first thought is to check to see if an index would help but unfortunately it didn't. So i logged a ticket in our Jira for each different bit of SQL and waited patiently for someone to fix each and every place this bit of SQL was used. However this can take months to be fixed and released due to our internal development and release processes.

So after a couple of months with no progress i decided to have a look at the SQL again and see if i could help DB2 pick a better access plan using a statistical view.

I created the following statistical view...

CREATE VIEW EONE.Statistical_User_Address_View
    ( Col1 )
AS
select 1 from 
       eone.TPM_USER_ADRS_DTL UAD
       INNER JOIN eone.TSCH_LKP_DTL LKP
       ON UAD.AddressType = LKP.LOOKUPDATAID 
          AND LKP.LOOKUPID = 2 
          AND LKP.ISDELETED = '0' 
          AND LKP.LOOKUPCode = 'C'
WITH NO ROW MOVEMENT;

--This makes the view a statistical view...
ALTER VIEW EONE.Statistical_User_Address_View ENABLE QUERY OPTIMIZATION;

runstats on table eone.TPM_USER_ADRS_DTL WITH DISTRIBUTION AND DETAILED INDEXES ALL;
runstats on table eone.tsch_lkp_dtl WITH DISTRIBUTION AND DETAILED INDEXES ALL;
runstats on table EONE.Statistical_User_Address_View WITH DISTRIBUTION;

After creating this view all the SQL statements that matched this views joins and where clauses were now taking just a couple of seconds to run. A statistical view won't cause any issues with existing functionality and it won't require much testing so its very quick and easy to apply to production.

This statistical view is a great temporary solution to a performance problem like this. However i would only recommend it as a temporary solution because fixing the SQL is always a better option.

Also if you want to find all your statistical views in the SYSCAT.TABLES table then you need to run this...

--The 13th charater in the PROPERTY column will be Y for statistical views...
select * from SYSCAT.TABLES where TYPE='V' and substr(PROPERTY,13,1)='Y';

1 comment: