KEMBAR78
Analytic & Windowing functions in oracle | PPSX
Analytic and Window Functions in Oracle

           Logan Palanisamy
Agenda

 Difference between aggregate and analytic
  functions
 Introduction to various analytic functions
 Functions that are both aggregate and
  analytic
 Break
 More examples
 Enhanced Aggregation (CUBE, ROLLUP)
Meeting Basics

 Put your phones/pagers on vibrate/mute
 Messenger: Change the status to offline or
  in-meeting
 Remote attendees: Mute yourself (*6). Ask
  questions via Adobe Connect.
Aggregates vs. Analytics
 Aggregate functions
   Rows are collapsed. One row per group

   Non-Group-By columns not allowed in SELECT list.

 Analytic functions
   Rows are not collapsed

   As many rows in the output as in the input

   No restrictions on the columns in the SELECT list

   Evaluated after joins, WHERE, GROUP BY, HAVING clauses

   Nesting not allowed

   Can appear only in the SELECT or ORDER BY clause

 analytic_aggr_diff.sql
Analytics vs. other methods
 Show the dept, empno, sal and the sum of all salaries
  in their dept
 Three possible ways
    Using Joins
    Using Scalar Sub-queries
    Using Analytic Functions

 analytics_vs_others.sql
Anatomy of an analytic funcion
 function (arg1, ..., argN) OVER ([partition_by_clause]
    [order_by_clause [windowing_clause]])
   The OVER keyword
   partition_by_clause: Optional. Not related to table/index
    partitions. Analogous to GROUP BY
   order_by_clause: Mandatory for Ranking and Windowing
    functions. Optional or meaningless for others
   windowing_clause: Optional. Should always be preceded by
    ORDER BY clause
Types of analytical functions
 Ranking functions
 FIRST_VALUE/LAST_VALUE/NTH_VALUE
 Windowing functions
 Reporting functions
 LAG/LEAD
 FIRST/LAST
Ranking Functions
 ROW_NUMBER()
 RANK() – Skips ranks after duplicate ranks
 DENSE_RANK() – Doesn't skip rank after duplicate ranks
 NTILE(n) – Sorts the rows into N equi-sized buckets
 CUME_DIST() – % of rows with values lower or equal
 PERCENT_RANK() - (rank of row -1)/(#rows – 1)
 function OVER ([PARTITION BY <c1,c2..>] ORDER BY
  <c3, ..>)
 PARTITION BY clause: Optional
 ORDER BY clause: Mandatory
 rank_dense_rank.sql
FIRST_VALUE/LAST_VALUE/NTH_VALUE

 Returns the first/last/nth value from an ordered set
 FIRST_VALUE(expr, [IGNORE NULLS]) OVER
    ([partitonby_clause] orderby_clause)
   IGNORE NULLS options helps you "carry forward".
    Often used in "Data Densification"
   Operates on Default Window (unbounded preceding
    and current row) when a window is not explicitly
    specified.
   NTH_VALUE introduced in 11gR2
   flnth_value.sql
Window functions

 Used for computing cumulative/running totals (YTD, MTD,
    QTD), moving/centered averages
   function(args) OVER([partition_by_clause] order_by_clause
    [windowing_clause])
   ORDER BY clause: mandatory.
   Windowing Clause: Optional. Defaults to: UNBOUNDED
    PRECEDING and CURRENT ROW
   anchored or sliding windows
   Two ways to specify windows: ROWS, RANGE
   [ROW | RANGE ] BETWEEN <start_exp> AND <end_exp>
   window.sql
ROWS type windows

 Physical offset. Number of rows before or after current
    row
   Non deterministic results if rows are not sorted uniquely
   Any number of columns in the ORDER BY clause
   ORDER BY columns can be of any type
   function(args) OVER ([partition_by_clause] order by c1,
    .., cN ROWS between <start_exp> and <end_exp>)
   windows_rows.sql
RANGE type Windows

 Logical offset
 non-unique rows treated as one logical row
 Only one column allowed in ORDER BY clause
 ORDER BY column should be numeric or date
 function(args) OVER ([partition_by_clause] order
  by c1 RANGE between <start_exp> and <end_exp>)
 windows_range.sql
Reporting function

 Computes the ratio of a value to the sum of a set of
  values
 RATIO_TO_REPORT(arg) OVER ([PARTITION BY
  <c1, .., cN>]
 PARTITION BY clause: Optional.
 ratio_to_report.sql
LAG/LEAD

 Gives the ability to access other rows without self-join.
 Allows you to treat cursor as an array
 Useful for making inter-row calculations (year-over-year
    comparison, time between events)
   LEAD (expr, <offset>, <default value>) [IGNORE
    NULLS] OVER ([partioning_clause] orderby_clause)
   Physical offset. Can be fixed or varying. default offset is 1
   default value: value returned if offset points to a non-
    existent row
   IGNORE NULLS determines whether null values of are
    included or eliminated from the calculation.
   lead_lag.sql
FIRST/LAST

 Very different from FIRST_VALUE/LAST_VALUE
 Returns the results of aggregate/analytic function applied
    on column B on the first or last ranked rows sorted by
    column A
   function (expr_with_colB) KEEP (DENSE_RANK
    FIRST/LAST ORDER BY colA) [OVER
    (<partitioning_clause)>)]
   Slightly different syntax. Note the word KEEP
   analytic clause is optional.
   first_last.sql
Above/Below average calculation
 Find the list of employees whose salary is higher than
  the department average.
 above_average.sql
Top-N queries
 Find the full details of "set of" employees with the top-
  N salaries
 Find the two most recent hires in each department
 List the names and employee count of departments
  with the highest employee count
 top_n.sql
Top-N% queries
 List the top 5% of the customers by revenue
 top_np.sql
Multi Top-N queries
 For each customer, find out
   the maximum sale in the last 7 days

   the date of that sale

   the maximum sale in the last 30 days

   the date of that sale

 multi_top.sql
De-Duping
 Deleting duplicate records
 dedup.txt
Hypothetical Ranking
 CUME_DIST, DENSE_RANK, RANK,
  PERCENT_RANK
 Used for what-if analysis
 hypothetical_rank.sql
Inverse Percentile functions

 Return the value corresponding to a certain
    percentile (opposite of CUME_DIST)
   PERCENTILE_CONT (continuous)
   PERCENTILE_DISC (discrete)
   PERCENTILE_CONT(0.5) is the same as MEDIAN
   inverse_p.sql
String Aggregation: LISTAGG, STRAGG

 Concatenated string of values for a particular group
    (e.g. employees working in a dept)
   Tom Kyte's STRAGG
   11gR2 has LISTAGG
   10g has COLLECT
   listagg.sql
Pivoting/Unpivoting
 Pivoting
   transposes rows to columns

   DECODE/CASE and GROUP BY used

 Unpivoting
   Converts columns to rows

   Join the base table with a one column serial number table



 11gR2 introduced PIVOT and UNPIVOT clauses to
  SELECT
 pivot.sql
Data Densification

 Data normally stored in sparse form (e.g. No rows
    if there is no sales for a particular period)
   Missing data needed for comparison (e.g. month-
    over-month comparison)
   Data Densification comes in handy
   LAG (col, INGORE NULLS), and PARTITION BY
    OUTER JOIN are used.
   http://hoopercharles.wordpress.com/2009/12/07
    /sql-filling-in-gaps-in-the-source-data/
When not to use analytics
 When a simple group by would do the job
 when_not_to_use_analytics.sq
Drawback of analytics
 Lot of sorting.
 Set
  PGA_AGGREGATE_TARGET/SORT_AREA_SIZE
  appropriately
 New versions reduce the number of sorts (same
  partition_by and order_by clauses on multiple
  analytic functions use single sort)
 http://asktom.oracle.com/pls/asktom/f?p=100:11:0::
  NO::P11_QUESTION_ID:1137250200346660664
 http://jonathanlewis.wordpress.com/2009/09/07/an
  alytic-agony/
Recap of Analytic Functions
 Analytic Functions:
   Were introduced in 8.1.6 (~1998)

   Are supported within PL/SQL only from 10g. Use "view" or
    "dynamic sql" older versions.
   Compute the 'aggregates' while preserving the 'details'

   Eliminate the need for self-joins or multiple passes on the same
    table
   Reduce the amount of data transferred between DB and client.

   Can be used only in SELECT and ORDER BY clauses. Use sub-
    queries if there is a need to filter.
   Are computed at the end - after join, where, group by, having
    clauses
Advanced Aggregation

 GROUP BY col1, col2
 GROUP BY ROLLUP(col1, col2)
 GROUP BY CUBE(col1, col2)
 GROUP BY GROUPING SETS ((col1, col2), col1)
ROLLUP

 GROUP BY ROLLUP(col1, col2)
 Generates subtotals automatically
 Generally used in hierarchical dimensions (region,
  state, city), (year, quarter, month, day)
 n + 1 different groupings where n is the number of
  expressions in the ROLLUP operator in the GROUP
  BY clause.
 Order of the columns in ROLLUP matter.
  ROLLUP(col1, col2), ROLLUP(col2, col1) produce
  different outputs
CUBE


 GROUP BY CUBE(col1, col2)
 Gives subtotals automatically for every possible combination
 Used in cross-tabular reports.
 Suitable when dimensions are independent of each other
 2n different groupings where n is the number of expressions
  in the CUBE operator in the GROUP BY clause.
 Have to be careful with higher values for n
 Order of the columns in CUBE doesn’t really matter.
  CUBE(col1, col2), CUBE(col2, col1) produce same results, but
  in a different order.
Grouping Sets

 GROUP BY GROUPING SETS (col1, (col1, col2))
 Explicitly lists the needed groupings
 GROUPING, GROUPING_ID, GROUP_ID functions
  help you differentiate one grouping from the other.
 Advanced aggregation functions more efficient than
  their UNION ALL equivalents (why?)
        Grouping   Equivalent GROUPING SETS
 advanced_agg.sql
       CUBE(a,b)     GROUPING SETS((a,b), (a), (b), ())
       ROLLUP(a,b)   GROUPING SETS((a,b), (a), ())
       ROLLUP(b,a)   GROUPING SETS((a,b), (b), ())
       ROLLUP(a)     GROUPING SETS((a), ())
Composite Columns

 Treat multiple columns as a single column
 Composite_columns.sql
Concatenated Groupings

 GROUP BY GROUPING SETs (a,b), GROUPING
  SETS (c,d)
 The above is same as GROUP BY GROUPING SETS
  ((a,c), (a,d), (b,c), (b,d))
References

 http://orafaq.com/node/55
 http://orafaq.com/node/56
 http://www.orafaq.com/node/1874
 http://www.morganslibrary.org/reference/a
  nalytic_functions.html
 http://morganslibrary.org/reference/rollup.
  html
 http://www.oracle.com/technology/oramag/
  oracle/05-mar/o25dba.html
 http://www.gennick.com/magic.html
References

 Chapter 12 of "Expert Database Architecture"
  by Tom Kyte
 Business-Savy SQL by Ganesh Variar, Oracle
  magazine, Mar/April 2002
 http://asktom.oracle.com/pls/asktom/asktom.searc
  h?p_string=rock+and+roll
 http://forums.oracle.com/forums/search.jspa?q=an
  alytic&objID=f75&dateRange=all&numResults=30&
  forumID=75&rankBy=10001&start=0
Q&A

 devel_oracle@
Predicate merging in views with analytics
 create view v select .. over(partition by ...) from t;
 select ... from v where col1 = 'A'
 In some cases predicates don't get merged.
 Reasons:
     http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QU
      ESTION_ID:12864646978683#30266389821111
     http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_
      QUESTION_ID:1137250200346660664
     http://forums.oracle.com/forums/thread.jspa?messageID=416
      9151&#4169151

Analytic & Windowing functions in oracle

  • 1.
    Analytic and WindowFunctions in Oracle Logan Palanisamy
  • 2.
    Agenda  Difference betweenaggregate and analytic functions  Introduction to various analytic functions  Functions that are both aggregate and analytic  Break  More examples  Enhanced Aggregation (CUBE, ROLLUP)
  • 3.
    Meeting Basics  Putyour phones/pagers on vibrate/mute  Messenger: Change the status to offline or in-meeting  Remote attendees: Mute yourself (*6). Ask questions via Adobe Connect.
  • 4.
    Aggregates vs. Analytics Aggregate functions  Rows are collapsed. One row per group  Non-Group-By columns not allowed in SELECT list.  Analytic functions  Rows are not collapsed  As many rows in the output as in the input  No restrictions on the columns in the SELECT list  Evaluated after joins, WHERE, GROUP BY, HAVING clauses  Nesting not allowed  Can appear only in the SELECT or ORDER BY clause  analytic_aggr_diff.sql
  • 5.
    Analytics vs. othermethods  Show the dept, empno, sal and the sum of all salaries in their dept  Three possible ways  Using Joins  Using Scalar Sub-queries  Using Analytic Functions  analytics_vs_others.sql
  • 6.
    Anatomy of ananalytic funcion  function (arg1, ..., argN) OVER ([partition_by_clause] [order_by_clause [windowing_clause]])  The OVER keyword  partition_by_clause: Optional. Not related to table/index partitions. Analogous to GROUP BY  order_by_clause: Mandatory for Ranking and Windowing functions. Optional or meaningless for others  windowing_clause: Optional. Should always be preceded by ORDER BY clause
  • 7.
    Types of analyticalfunctions  Ranking functions  FIRST_VALUE/LAST_VALUE/NTH_VALUE  Windowing functions  Reporting functions  LAG/LEAD  FIRST/LAST
  • 8.
    Ranking Functions  ROW_NUMBER() RANK() – Skips ranks after duplicate ranks  DENSE_RANK() – Doesn't skip rank after duplicate ranks  NTILE(n) – Sorts the rows into N equi-sized buckets  CUME_DIST() – % of rows with values lower or equal  PERCENT_RANK() - (rank of row -1)/(#rows – 1)  function OVER ([PARTITION BY <c1,c2..>] ORDER BY <c3, ..>)  PARTITION BY clause: Optional  ORDER BY clause: Mandatory  rank_dense_rank.sql
  • 9.
    FIRST_VALUE/LAST_VALUE/NTH_VALUE  Returns thefirst/last/nth value from an ordered set  FIRST_VALUE(expr, [IGNORE NULLS]) OVER ([partitonby_clause] orderby_clause)  IGNORE NULLS options helps you "carry forward". Often used in "Data Densification"  Operates on Default Window (unbounded preceding and current row) when a window is not explicitly specified.  NTH_VALUE introduced in 11gR2  flnth_value.sql
  • 10.
    Window functions  Usedfor computing cumulative/running totals (YTD, MTD, QTD), moving/centered averages  function(args) OVER([partition_by_clause] order_by_clause [windowing_clause])  ORDER BY clause: mandatory.  Windowing Clause: Optional. Defaults to: UNBOUNDED PRECEDING and CURRENT ROW  anchored or sliding windows  Two ways to specify windows: ROWS, RANGE  [ROW | RANGE ] BETWEEN <start_exp> AND <end_exp>  window.sql
  • 11.
    ROWS type windows Physical offset. Number of rows before or after current row  Non deterministic results if rows are not sorted uniquely  Any number of columns in the ORDER BY clause  ORDER BY columns can be of any type  function(args) OVER ([partition_by_clause] order by c1, .., cN ROWS between <start_exp> and <end_exp>)  windows_rows.sql
  • 12.
    RANGE type Windows Logical offset  non-unique rows treated as one logical row  Only one column allowed in ORDER BY clause  ORDER BY column should be numeric or date  function(args) OVER ([partition_by_clause] order by c1 RANGE between <start_exp> and <end_exp>)  windows_range.sql
  • 13.
    Reporting function  Computesthe ratio of a value to the sum of a set of values  RATIO_TO_REPORT(arg) OVER ([PARTITION BY <c1, .., cN>]  PARTITION BY clause: Optional.  ratio_to_report.sql
  • 14.
    LAG/LEAD  Gives theability to access other rows without self-join.  Allows you to treat cursor as an array  Useful for making inter-row calculations (year-over-year comparison, time between events)  LEAD (expr, <offset>, <default value>) [IGNORE NULLS] OVER ([partioning_clause] orderby_clause)  Physical offset. Can be fixed or varying. default offset is 1  default value: value returned if offset points to a non- existent row  IGNORE NULLS determines whether null values of are included or eliminated from the calculation.  lead_lag.sql
  • 15.
    FIRST/LAST  Very differentfrom FIRST_VALUE/LAST_VALUE  Returns the results of aggregate/analytic function applied on column B on the first or last ranked rows sorted by column A  function (expr_with_colB) KEEP (DENSE_RANK FIRST/LAST ORDER BY colA) [OVER (<partitioning_clause)>)]  Slightly different syntax. Note the word KEEP  analytic clause is optional.  first_last.sql
  • 16.
    Above/Below average calculation Find the list of employees whose salary is higher than the department average.  above_average.sql
  • 17.
    Top-N queries  Findthe full details of "set of" employees with the top- N salaries  Find the two most recent hires in each department  List the names and employee count of departments with the highest employee count  top_n.sql
  • 18.
    Top-N% queries  Listthe top 5% of the customers by revenue  top_np.sql
  • 19.
    Multi Top-N queries For each customer, find out  the maximum sale in the last 7 days  the date of that sale  the maximum sale in the last 30 days  the date of that sale  multi_top.sql
  • 20.
    De-Duping  Deleting duplicaterecords  dedup.txt
  • 21.
    Hypothetical Ranking  CUME_DIST,DENSE_RANK, RANK, PERCENT_RANK  Used for what-if analysis  hypothetical_rank.sql
  • 22.
    Inverse Percentile functions Return the value corresponding to a certain percentile (opposite of CUME_DIST)  PERCENTILE_CONT (continuous)  PERCENTILE_DISC (discrete)  PERCENTILE_CONT(0.5) is the same as MEDIAN  inverse_p.sql
  • 23.
    String Aggregation: LISTAGG,STRAGG  Concatenated string of values for a particular group (e.g. employees working in a dept)  Tom Kyte's STRAGG  11gR2 has LISTAGG  10g has COLLECT  listagg.sql
  • 24.
    Pivoting/Unpivoting  Pivoting  transposes rows to columns  DECODE/CASE and GROUP BY used  Unpivoting  Converts columns to rows  Join the base table with a one column serial number table  11gR2 introduced PIVOT and UNPIVOT clauses to SELECT  pivot.sql
  • 25.
    Data Densification  Datanormally stored in sparse form (e.g. No rows if there is no sales for a particular period)  Missing data needed for comparison (e.g. month- over-month comparison)  Data Densification comes in handy  LAG (col, INGORE NULLS), and PARTITION BY OUTER JOIN are used.  http://hoopercharles.wordpress.com/2009/12/07 /sql-filling-in-gaps-in-the-source-data/
  • 26.
    When not touse analytics  When a simple group by would do the job  when_not_to_use_analytics.sq
  • 27.
    Drawback of analytics Lot of sorting.  Set PGA_AGGREGATE_TARGET/SORT_AREA_SIZE appropriately  New versions reduce the number of sorts (same partition_by and order_by clauses on multiple analytic functions use single sort)  http://asktom.oracle.com/pls/asktom/f?p=100:11:0:: NO::P11_QUESTION_ID:1137250200346660664  http://jonathanlewis.wordpress.com/2009/09/07/an alytic-agony/
  • 28.
    Recap of AnalyticFunctions  Analytic Functions:  Were introduced in 8.1.6 (~1998)  Are supported within PL/SQL only from 10g. Use "view" or "dynamic sql" older versions.  Compute the 'aggregates' while preserving the 'details'  Eliminate the need for self-joins or multiple passes on the same table  Reduce the amount of data transferred between DB and client.  Can be used only in SELECT and ORDER BY clauses. Use sub- queries if there is a need to filter.  Are computed at the end - after join, where, group by, having clauses
  • 29.
    Advanced Aggregation  GROUPBY col1, col2  GROUP BY ROLLUP(col1, col2)  GROUP BY CUBE(col1, col2)  GROUP BY GROUPING SETS ((col1, col2), col1)
  • 30.
    ROLLUP  GROUP BYROLLUP(col1, col2)  Generates subtotals automatically  Generally used in hierarchical dimensions (region, state, city), (year, quarter, month, day)  n + 1 different groupings where n is the number of expressions in the ROLLUP operator in the GROUP BY clause.  Order of the columns in ROLLUP matter. ROLLUP(col1, col2), ROLLUP(col2, col1) produce different outputs
  • 31.
    CUBE  GROUP BYCUBE(col1, col2)  Gives subtotals automatically for every possible combination  Used in cross-tabular reports.  Suitable when dimensions are independent of each other  2n different groupings where n is the number of expressions in the CUBE operator in the GROUP BY clause.  Have to be careful with higher values for n  Order of the columns in CUBE doesn’t really matter. CUBE(col1, col2), CUBE(col2, col1) produce same results, but in a different order.
  • 32.
    Grouping Sets  GROUPBY GROUPING SETS (col1, (col1, col2))  Explicitly lists the needed groupings  GROUPING, GROUPING_ID, GROUP_ID functions help you differentiate one grouping from the other.  Advanced aggregation functions more efficient than their UNION ALL equivalents (why?) Grouping Equivalent GROUPING SETS  advanced_agg.sql CUBE(a,b) GROUPING SETS((a,b), (a), (b), ()) ROLLUP(a,b) GROUPING SETS((a,b), (a), ()) ROLLUP(b,a) GROUPING SETS((a,b), (b), ()) ROLLUP(a) GROUPING SETS((a), ())
  • 33.
    Composite Columns  Treatmultiple columns as a single column  Composite_columns.sql
  • 34.
    Concatenated Groupings  GROUPBY GROUPING SETs (a,b), GROUPING SETS (c,d)  The above is same as GROUP BY GROUPING SETS ((a,c), (a,d), (b,c), (b,d))
  • 35.
    References  http://orafaq.com/node/55  http://orafaq.com/node/56 http://www.orafaq.com/node/1874  http://www.morganslibrary.org/reference/a nalytic_functions.html  http://morganslibrary.org/reference/rollup. html  http://www.oracle.com/technology/oramag/ oracle/05-mar/o25dba.html  http://www.gennick.com/magic.html
  • 36.
    References  Chapter 12of "Expert Database Architecture" by Tom Kyte  Business-Savy SQL by Ganesh Variar, Oracle magazine, Mar/April 2002  http://asktom.oracle.com/pls/asktom/asktom.searc h?p_string=rock+and+roll  http://forums.oracle.com/forums/search.jspa?q=an alytic&objID=f75&dateRange=all&numResults=30& forumID=75&rankBy=10001&start=0
  • 37.
  • 38.
    Predicate merging inviews with analytics  create view v select .. over(partition by ...) from t;  select ... from v where col1 = 'A'  In some cases predicates don't get merged.  Reasons:  http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QU ESTION_ID:12864646978683#30266389821111  http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_ QUESTION_ID:1137250200346660664  http://forums.oracle.com/forums/thread.jspa?messageID=416 9151&#4169151