Analytic & Windowing functions in oracle

Analytic and Window Functions in Oracle

Logan Palanisamy

Agenda

 Difference between aggregate and analytic
functions
 Introduction to various analytic functions
 Functions that are both aggregate and
analytic
 Break
 More examples
 Enhanced Aggregation (CUBE, ROLLUP)

Meeting Basics

 Put your phones/pagers on vibrate/mute
 Messenger: Change the status to offline or
in-meeting
 Remote attendees: Mute yourself (*6). Ask
questions via Adobe Connect.

Aggregates vs. Analytics
 Aggregate functions
 Rows are collapsed. One row per group

 Non-Group-By columns not allowed in SELECT list.

 Analytic functions
 Rows are not collapsed

 As many rows in the output as in the input

 No restrictions on the columns in the SELECT list

 Evaluated after joins, WHERE, GROUP BY, HAVING clauses

 Nesting not allowed

 Can appear only in the SELECT or ORDER BY clause

 analytic_aggr_diff.sql

Analytics vs. other methods
 Show the dept, empno, sal and the sum of all salaries
in their dept
 Three possible ways
 Using Joins
 Using Scalar Sub-queries
 Using Analytic Functions

 analytics_vs_others.sql

Anatomy of an analytic funcion
 function (arg1, ..., argN) OVER ([partition_by_clause]
[order_by_clause [windowing_clause]])
 The OVER keyword
 partition_by_clause: Optional. Not related to table/index
partitions. Analogous to GROUP BY
 order_by_clause: Mandatory for Ranking and Windowing
functions. Optional or meaningless for others
 windowing_clause: Optional. Should always be preceded by
ORDER BY clause

Types of analytical functions
 Ranking functions
 FIRST_VALUE/LAST_VALUE/NTH_VALUE
 Windowing functions
 Reporting functions
 LAG/LEAD
 FIRST/LAST

Ranking Functions
 ROW_NUMBER()
 RANK() – Skips ranks after duplicate ranks
 DENSE_RANK() – Doesn't skip rank after duplicate ranks
 NTILE(n) – Sorts the rows into N equi-sized buckets
 CUME_DIST() – % of rows with values lower or equal
 PERCENT_RANK() - (rank of row -1)/(#rows – 1)
 function OVER ([PARTITION BY <c1,c2..>] ORDER BY
<c3, ..>)
 PARTITION BY clause: Optional
 ORDER BY clause: Mandatory
 rank_dense_rank.sql

FIRST_VALUE/LAST_VALUE/NTH_VALUE

 Returns the first/last/nth value from an ordered set
 FIRST_VALUE(expr, [IGNORE NULLS]) OVER
([partitonby_clause] orderby_clause)
 IGNORE NULLS options helps you "carry forward".
Often used in "Data Densification"
 Operates on Default Window (unbounded preceding
and current row) when a window is not explicitly
specified.
 NTH_VALUE introduced in 11gR2
 flnth_value.sql

Window functions

 Used for computing cumulative/running totals (YTD, MTD,
QTD), moving/centered averages
 function(args) OVER([partition_by_clause] order_by_clause
[windowing_clause])
 ORDER BY clause: mandatory.
 Windowing Clause: Optional. Defaults to: UNBOUNDED
PRECEDING and CURRENT ROW
 anchored or sliding windows
 Two ways to specify windows: ROWS, RANGE
 [ROW | RANGE ] BETWEEN <start_exp> AND <end_exp>
 window.sql

ROWS type windows

 Physical offset. Number of rows before or after current
row
 Non deterministic results if rows are not sorted uniquely
 Any number of columns in the ORDER BY clause
 ORDER BY columns can be of any type
 function(args) OVER ([partition_by_clause] order by c1,
.., cN ROWS between <start_exp> and <end_exp>)
 windows_rows.sql

RANGE type Windows

 Logical offset
 non-unique rows treated as one logical row
 Only one column allowed in ORDER BY clause
 ORDER BY column should be numeric or date
 function(args) OVER ([partition_by_clause] order
by c1 RANGE between <start_exp> and <end_exp>)
 windows_range.sql

Reporting function

 Computes the ratio of a value to the sum of a set of
values
 RATIO_TO_REPORT(arg) OVER ([PARTITION BY
<c1, .., cN>]
 PARTITION BY clause: Optional.
 ratio_to_report.sql

LAG/LEAD

 Gives the ability to access other rows without self-join.
 Allows you to treat cursor as an array
 Useful for making inter-row calculations (year-over-year
comparison, time between events)
 LEAD (expr, <offset>, <default value>) [IGNORE
NULLS] OVER ([partioning_clause] orderby_clause)
 Physical offset. Can be fixed or varying. default offset is 1
 default value: value returned if offset points to a non-
existent row
 IGNORE NULLS determines whether null values of are
included or eliminated from the calculation.
 lead_lag.sql

FIRST/LAST

 Very different from FIRST_VALUE/LAST_VALUE
 Returns the results of aggregate/analytic function applied
on column B on the first or last ranked rows sorted by
column A
 function (expr_with_colB) KEEP (DENSE_RANK
FIRST/LAST ORDER BY colA) [OVER
(<partitioning_clause)>)]
 Slightly different syntax. Note the word KEEP
 analytic clause is optional.
 first_last.sql

Above/Below average calculation
 Find the list of employees whose salary is higher than
the department average.
 above_average.sql

Top-N queries
 Find the full details of "set of" employees with the top-
N salaries
 Find the two most recent hires in each department
 List the names and employee count of departments
with the highest employee count
 top_n.sql

Top-N% queries
 List the top 5% of the customers by revenue
 top_np.sql

Multi Top-N queries
 For each customer, find out
 the maximum sale in the last 7 days

 the date of that sale

 the maximum sale in the last 30 days

 the date of that sale

 multi_top.sql

De-Duping
 Deleting duplicate records
 dedup.txt

Hypothetical Ranking
 CUME_DIST, DENSE_RANK, RANK,
PERCENT_RANK
 Used for what-if analysis
 hypothetical_rank.sql

Inverse Percentile functions

 Return the value corresponding to a certain
percentile (opposite of CUME_DIST)
 PERCENTILE_CONT (continuous)
 PERCENTILE_DISC (discrete)
 PERCENTILE_CONT(0.5) is the same as MEDIAN
 inverse_p.sql

String Aggregation: LISTAGG, STRAGG

 Concatenated string of values for a particular group
(e.g. employees working in a dept)
 Tom Kyte's STRAGG
 11gR2 has LISTAGG
 10g has COLLECT
 listagg.sql

Pivoting/Unpivoting
 Pivoting
 transposes rows to columns

 DECODE/CASE and GROUP BY used

 Unpivoting
 Converts columns to rows

 Join the base table with a one column serial number table

 11gR2 introduced PIVOT and UNPIVOT clauses to
SELECT
 pivot.sql

Data Densification

 Data normally stored in sparse form (e.g. No rows
if there is no sales for a particular period)
 Missing data needed for comparison (e.g. month-
over-month comparison)
 Data Densification comes in handy
 LAG (col, INGORE NULLS), and PARTITION BY
OUTER JOIN are used.
 http://hoopercharles.wordpress.com/2009/12/07
/sql-filling-in-gaps-in-the-source-data/

When not to use analytics
 When a simple group by would do the job
 when_not_to_use_analytics.sq

Drawback of analytics
 Lot of sorting.
 Set
PGA_AGGREGATE_TARGET/SORT_AREA_SIZE
appropriately
 New versions reduce the number of sorts (same
partition_by and order_by clauses on multiple
analytic functions use single sort)
 http://asktom.oracle.com/pls/asktom/f?p=100:11:0::
NO::P11_QUESTION_ID:1137250200346660664
 http://jonathanlewis.wordpress.com/2009/09/07/an
alytic-agony/

Recap of Analytic Functions
 Analytic Functions:
 Were introduced in 8.1.6 (~1998)

 Are supported within PL/SQL only from 10g. Use "view" or
"dynamic sql" older versions.
 Compute the 'aggregates' while preserving the 'details'

 Eliminate the need for self-joins or multiple passes on the same
table
 Reduce the amount of data transferred between DB and client.

 Can be used only in SELECT and ORDER BY clauses. Use sub-
queries if there is a need to filter.
 Are computed at the end - after join, where, group by, having
clauses

Advanced Aggregation

 GROUP BY col1, col2
 GROUP BY ROLLUP(col1, col2)
 GROUP BY CUBE(col1, col2)
 GROUP BY GROUPING SETS ((col1, col2), col1)

ROLLUP

 GROUP BY ROLLUP(col1, col2)
 Generates subtotals automatically
 Generally used in hierarchical dimensions (region,
state, city), (year, quarter, month, day)
 n + 1 different groupings where n is the number of
expressions in the ROLLUP operator in the GROUP
BY clause.
 Order of the columns in ROLLUP matter.
ROLLUP(col1, col2), ROLLUP(col2, col1) produce
different outputs

CUBE

 GROUP BY CUBE(col1, col2)
 Gives subtotals automatically for every possible combination
 Used in cross-tabular reports.
 Suitable when dimensions are independent of each other
 2n different groupings where n is the number of expressions
in the CUBE operator in the GROUP BY clause.
 Have to be careful with higher values for n
 Order of the columns in CUBE doesn’t really matter.
CUBE(col1, col2), CUBE(col2, col1) produce same results, but
in a different order.

Grouping Sets

 GROUP BY GROUPING SETS (col1, (col1, col2))
 Explicitly lists the needed groupings
 GROUPING, GROUPING_ID, GROUP_ID functions
help you differentiate one grouping from the other.
 Advanced aggregation functions more efficient than
their UNION ALL equivalents (why?)
Grouping Equivalent GROUPING SETS
 advanced_agg.sql
CUBE(a,b) GROUPING SETS((a,b), (a), (b), ())
ROLLUP(a,b) GROUPING SETS((a,b), (a), ())
ROLLUP(b,a) GROUPING SETS((a,b), (b), ())
ROLLUP(a) GROUPING SETS((a), ())

Composite Columns

 Treat multiple columns as a single column
 Composite_columns.sql

Concatenated Groupings

 GROUP BY GROUPING SETs (a,b), GROUPING
SETS (c,d)
 The above is same as GROUP BY GROUPING SETS
((a,c), (a,d), (b,c), (b,d))

References

 http://orafaq.com/node/55
 http://orafaq.com/node/56
 http://www.orafaq.com/node/1874
 http://www.morganslibrary.org/reference/a
nalytic_functions.html
 http://morganslibrary.org/reference/rollup.
html
 http://www.oracle.com/technology/oramag/
oracle/05-mar/o25dba.html
 http://www.gennick.com/magic.html

References

 Chapter 12 of "Expert Database Architecture"
by Tom Kyte
 Business-Savy SQL by Ganesh Variar, Oracle
magazine, Mar/April 2002
 http://asktom.oracle.com/pls/asktom/asktom.searc
h?p_string=rock+and+roll
 http://forums.oracle.com/forums/search.jspa?q=an
alytic&objID=f75&dateRange=all&numResults=30&
forumID=75&rankBy=10001&start=0

Predicate merging in views with analytics
 create view v select .. over(partition by ...) from t;
 select ... from v where col1 = 'A'
 In some cases predicates don't get merged.
 Reasons:
 http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QU
ESTION_ID:12864646978683#30266389821111
 http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_
QUESTION_ID:1137250200346660664
 http://forums.oracle.com/forums/thread.jspa?messageID=416
9151&#4169151

Analytic & Windowing functions in oracle

More Related Content

What's hot

Similar to Analytic & Windowing functions in oracle

Recently uploaded

Analytic & Windowing functions in oracle