Ranges, ranges everywhere (Oracle SQL)

Ranges, Ranges Everywhere!
Stew Ashton (stewashton.wordpress.com)
UKOUG Tech 2016
Can you read the following line? If not, please move closer.
It's much better when you can read the code ;)

Agenda
• Defining ranges
• Relating ranges: gaps, overlaps
• Range DDL: sensible data
• Ranges in one table
• Ranges in two tables

Who am I?
• 36 years in IT
– Developer, Technical Sales Engineer, Technical Architect
– Aeronautics, IBM, Finance
– Mainframe, client-server, Web apps
• 12 years using Oracle database
– SQL performance analysis
– Replace Java with SQL
• 4 years as in-house “Oracle Development Expert”
• Conference speaker since 2014
• Currently independent

What is a range?
• Two values that can be compared
– Always use the same datatype 
– Comparable datatypes:
• integer, date (without time)
• number, datetime, interval, (n)(var)char
• rowid
• Range design questions:
– Is the "end" value part of the range?
– Are NULLs allowed?

Allen’s Interval Algebra
Each relation and its inverse, with example ranges for A and B:

Relation (inverse)                   A     B     Category
A precedes B (B preceded by A)       1-2   3-4   Gap
A meets B (B met by A)               1-2   2-3   Meet
A overlaps B (B overlapped by A)     1-3   2-4   "Overlap"
A finished by B (B finishes A)       1-3   2-3   "Overlap"
A contains B (B during A)            1-4   2-3   "Overlap"
A starts B (B started by A)          1-2   1-3   "Overlap"
A and B are equal                    1-2   1-2   "Overlap"
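
A minimal sketch of how the relations listed above could be told apart with plain comparisons. The inline test data and the column names a_start, a_end, b_start, b_end are assumptions for illustration, and the end values are treated as exclusive (so 1-2 meets 2-3).

```sql
-- Hypothetical pairs of ranges, one row per relation
with pairs (a_start, a_end, b_start, b_end) as (
  select 1, 2, 3, 4 from dual union all
  select 1, 2, 2, 3 from dual union all
  select 1, 3, 2, 4 from dual union all
  select 1, 3, 2, 3 from dual union all
  select 1, 4, 2, 3 from dual union all
  select 1, 2, 1, 3 from dual union all
  select 1, 2, 1, 2 from dual
)
select a_start, a_end, b_start, b_end,
  case
    -- Gap and Meet: A ends before or exactly where B starts
    when a_end < b_start then 'A precedes B'
    when a_end = b_start then 'A meets B'
    -- From here on A and B share at least one point
    when a_start = b_start and a_end = b_end then 'A and B are equal'
    when a_start = b_start and a_end < b_end then 'A starts B'
    when a_start < b_start and a_end = b_end then 'A finished by B'
    when a_start < b_start and a_end > b_end then 'A contains B'
    when a_start < b_start and a_end < b_end then 'A overlaps B'
    -- Remaining cases are the inverse relations: swap A and B
    else 'inverse relation'
  end as relation
from pairs;
```

Each of the seven rows should come back with the relation it was built to illustrate; the inverse relations are reached by swapping the two ranges.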

End value: Inclusive or Exclusive
• Design must allow ranges to "meet"
• Discrete quantities can be inclusive
– [1-3] meets [4-6] : no intermediate integer
– [Jan. 1-31] meets [Feb. 1-28] : no intermediate date
• Continuous quantities require exclusive end values
– Most ranges are continuous (including dates, really)

Votes for Exclusive end values
• SQL:2011 and Oracle 12c Temporal Validity
– "Period": date/time range
• [Closed-Open): includes start time but not end time
• WIDTH_BUCKET() function
– Puts values in equiwidth histogram
– Buckets must touch
– [Closed-open): upper boundary value goes in higher bucket
• Me!
– Exclusive end values work for every kind of range
– Except: ROWID ranges must be inclusive
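
As a small illustration of the [closed-open) behaviour of WIDTH_BUCKET() mentioned above, here is a sketch with five buckets of width 2 between 0 and 10. The sample values are assumptions chosen to sit on and near the bucket boundaries.

```sql
-- Five equiwidth buckets over [0, 10): [0,2) [2,4) [4,6) [6,8) [8,10)
with vals (val) as (
  select 0   from dual union all
  select 1.9 from dual union all
  select 2   from dual union all
  select 9.9 from dual union all
  select 10  from dual
)
select val,
       width_bucket(val, 0, 10, 5) as bucket
from vals
order by val;
-- 0 and 1.9 land in bucket 1, 2 lands in bucket 2 (the boundary value
-- goes in the higher bucket), 9.9 lands in bucket 5, and 10 lands in
-- the overflow bucket 6 because the upper bound is excluded.
```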

DDL: make sure data is sensible
• Start_range < End_range
• If date without time, CHECK( dte = trunc(dte))
• If integer, say so
• Is NULL allowed?
– If so, what does it mean?
– Ex. Temporal Validity :
NULL end value means "until the end of time"
• Are overlaps allowed?
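
The checks listed above could be declared roughly as follows. This is a sketch only: the table name date_ranges, the column names and the constraint names are hypothetical, and it assumes a date-without-time range whose end is NOT NULL.

```sql
create table date_ranges (
  start_date date not null,
  end_date   date not null,
  -- Dates without time: reject any value that carries a time component
  constraint date_ranges_no_time check (
    start_date = trunc(start_date) and end_date = trunc(end_date)
  ),
  -- A range must have a positive length
  constraint date_ranges_order check (start_date < end_date)
);
```

If NULL end values were allowed, the end_date column would drop its NOT NULL; both checks would still pass for a NULL end because a check constraint only rejects rows where the condition is false.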

Overlaps avoided by unique constraints
Y = the constraint prevents this kind of overlap

Relation            A     B     Unique(start,end)  Unique(start)  Unique(end)
A overlaps B        1-3   2-4   No constraint works
A finished by B     1-3   2-3   -                  -              Y
A contains B        1-4   2-3   No constraint works
A starts B          1-2   1-3   -                  Y              -
A and B are equal   1-2   1-2   Y                  Y              Y

Avoiding Overlaps: 3 solutions
1. Triggers
– Hard to do right, not very scalable
2. "Refresh on commit" materialized views
– Not scalable?
3. Virtual ranges

Virtual range: no gaps, no overlaps
• One column: start value
• End value is calculated: = next row's start
– Putting identical value in 2 rows is denormalization
• Last row has unlimited end
• Maybe OK for audit trails?

Physical (table)
START_VALUE
16-11-15 08:30
16-11-15 09:30
16-11-15 18:30

Virtual (view)
START_VALUE      END_VALUE
16-11-15 08:30   16-11-15 09:30
16-11-15 09:30   16-11-15 18:30
16-11-15 18:30   (null)
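
One way the virtual range could be exposed, as a minimal sketch: the table stores only the start, and a view derives the end with LEAD(). The names events and event_ranges are hypothetical.

```sql
-- Physical table: only the start value is stored
create table events (
  start_value date not null primary key
);

-- Virtual range: the end is the next row's start, NULL for the last row
create or replace view event_ranges as
select start_value,
       lead(start_value) over (order by start_value) as end_value
from events;
```

Because every end value is derived from the following start, the view cannot show gaps or overlaps, whatever rows are inserted.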

Semi-Virtual range: no overlaps
• Start column always used
• End column optional:
– If null, use next row's start
– If not null, use lesser of end column and next row's start
– Last row can have limited end
• Or: intermediate row with 'not exists' flag
– ≅ Change Data Capture format

With optional end column
START_VALUE      END_VALUE
16-11-15 08:30   16-11-15 09:30
16-11-15 18:30   (null)

With 'not exists' flag (D)
START_VALUE      D
16-11-15 08:30
16-11-15 09:30   D
16-11-15 18:30
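
Both semi-virtual variants can be sketched as views. The table, column and view names (semi_ranges, flagged_ranges, deleted_flag and so on) are hypothetical, and the rules coded here are only those listed above.

```sql
-- Variant 1: optional end column
create table semi_ranges (
  start_value date not null primary key,
  end_value   date,
  check (start_value < end_value)
);

create or replace view semi_ranges_v as
select start_value,
       case
         when end_value  is null then next_start   -- use next row's start
         when next_start is null then end_value    -- last row, limited end
         else least(end_value, next_start)         -- never overlap the next row
       end as end_value
from (
  select start_value, end_value,
         lead(start_value) over (order by start_value) as next_start
  from semi_ranges
);

-- Variant 2: intermediate row flagged 'D' marks where nothing exists
create table flagged_ranges (
  start_value  date not null primary key,
  deleted_flag char(1) check (deleted_flag = 'D')
);

create or replace view flagged_ranges_v as
select start_value, end_value
from (
  select start_value, deleted_flag,
         lead(start_value) over (order by start_value) as end_value
  from flagged_ranges
)
where deleted_flag is null;  -- hide the 'not exists' rows themselves
```

In the second variant the flagged row still supplies the end of the preceding range before it is filtered out, which is what leaves a gap in the output.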

Range-related SQL
• Why hard?
– Can't use BETWEEN
– Inequality joins impact performance
– With overlaps, 1 value point can be in any number of rows
– Joining 2 tables with overlaps -> row explosion
– NULLs have special meanings
• Common problems
– Find gaps
– Intersect: find overlaps
– Union: packing ranges between gaps
– Joins
• Today, ends are exclusive, everything is NOT NULL (unless specified)
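
To make the BETWEEN point concrete: with an exclusive end, the lookup of a single value needs two separate comparisons. A sketch, assuming a hypothetical table ranges(start_n, end_n) and a bind variable :p:

```sql
-- BETWEEN is inclusive on both sides, so it would also return the range
-- that ends exactly at :p. With exclusive ends, compare explicitly:
select r.*
from ranges r
where :p >= r.start_n
  and :p <  r.end_n;
```

If the ranges do not overlap, this returns at most one row; with overlaps it can return any number of rows.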

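Gaps, ex. Free time in calendar

Input (table t):
FROM_TM  TO_TM
07:00    08:00
09:00    10:50
10:00    10:45
12:00    12:45
18:00    23:00

First attempt: pair each row's end with the next row's start.

select
  to_tm
    as gap_from,
  lead(from_tm) over(order by from_tm)
    as gap_to
from t;

FROM_TM  GAP_FROM  GAP_TO   (FROM_TM shown for reference)
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:45     12:00
12:00    12:45     18:00
18:00    23:00

The row starting at 10:00 is wrong: the 09:00 range is still running until 10:50. Use the latest end seen so far instead of the current row's end, then keep only real gaps.

select * from (
  select
    max(to_tm) over(order by from_tm)
      as gap_from,
    lead(from_tm) over(order by from_tm)
      as gap_to
  from t
) where gap_from < gap_to;

Inner query, before the filter:
FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:50     12:00
12:00    12:45     18:00
18:00    23:00

Final result:
GAP_FROM  GAP_TO
08:00     09:00
10:50     12:00
12:45     18:00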

Intersect: finding Overlaps

Test data (table t, columns test_case, start_date, end_date):
Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      2
02:meets        2      3
03:overlaps     1      3
03:overlaps     2      4
04:finished by  1      3
04:finished by  2      3
05:contains     1      4
05:contains     2      3
06:starts       1      2
06:starts       1      3
07:equals       1      2
07:equals       1      2

Idea: cut the ranges at every boundary. Ex. A overlaps B (1-3 and 2-4) becomes 1-2, 2-3, 3-4.

Step 1: one row per boundary, +1 for a start and -1 for an end.

select test_case, dte, col
from t
unpivot (dte for col in (
  start_date as 1, end_date as -1));

Test case       Dte  Col
01:precedes     1    1
01:precedes     2    -1
01:precedes     3    1
01:precedes     4    -1
02:meets        1    1
02:meets        2    -1
02:meets        2    1
02:meets        3    -1
03:overlaps     1    1
03:overlaps     3    -1
03:overlaps     2    1
03:overlaps     4    -1

Step 2: each boundary starts a new range that ends at the next boundary; the running sum of col is the number of original rows covering it.

select test_case, dte "Start",
  lead(dte,1,dte) over(
    partition by test_case
    order by dte, col desc
  ) "End",
  sum(col) over(
    partition by test_case
    order by dte, col desc
  ) "Rows"
from t
unpivot (dte for col in (
  start_date as 1, end_date as -1));

Test case       Start  End  Rows
01:precedes     1      2    1
01:precedes     2      3    0
01:precedes     3      4    1
01:precedes     4      4    0     ✖
02:meets        1      2    1
02:meets        2      2    2     ✖
02:meets        2      3    1
02:meets        3      3    0     ✖
03:overlaps     1      2    1
03:overlaps     2      3    2
03:overlaps     3      4    1
03:overlaps     4      4    0     ✖

Step 3: the rows marked ✖ are empty ranges; wrapping the query in
select * from ( ... ) where "Start" < "End";
removes them.

Test case       Start  End  Rows
01:precedes     1      2    1
01:precedes     2      3    0
01:precedes     3      4    1
02:meets        1      2    1
02:meets        2      3    1
03:overlaps     1      2    1
03:overlaps     2      3    2
03:overlaps     3      4    1

Step 4: an overlap is a range covered by more than one row.

select * from (
  select test_case, dte "Start",
    lead(dte,1,dte) over(
      partition by test_case
      order by dte, col desc
    ) "End",
    sum(col) over(
      partition by test_case
      order by dte, col desc
    ) "Rows"
  from t
  unpivot (dte for col in (
    start_date as 1, end_date as -1))
) where "Rows" > 1
and "Start" < "End";

Test case       Start  End  Rows
03:overlaps     2      3    2
04:finished by  2      3    2
05:contains     2      3    2
06:starts       1      2    2
07:equals       1      2    2

Packing Ranges

Input (table t):
Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      2
02:meets        2      3
03:overlaps     1      3
03:overlaps     2      4
04:finished by  1      3
04:finished by  2      3
05:contains     1      4
05:contains     2      3
06:starts       1      2
06:starts       1      3
07:equals       1      2
07:equals       1      2

select * from t
match_recognize(
  partition by test_case
  order by end_date, start_date
  measures min(start_date) start_date,
    last(end_date) end_date
  pattern(a* b)
  define a as end_date >= next(start_date)
);

Result:
Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      3
03:overlaps     1      4
04:finished by  1      3
05:contains     1      4
06:starts       1      3
07:equals       1      2

With NULL end values ("until the end of time"):

select * from t
match_recognize(
  partition by test_case
  order by end_date, start_date
  measures min(start_date) start_date,
    last(end_date) end_date
  pattern(a* b)
  define a as end_date >= next(start_date)
    or end_date is null
);

Result:
Test case       Start  End
01:precedes     1      2
01:precedes     3      (null)
02:meets        1      (null)
03:overlaps     1      (null)
04:finished by  1      (null)
05:contains     1      (null)
06:starts       1      (null)
07:equals       1      (null)
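
The test cases above have only two ranges each. As a hedged sketch that is not taken from the slides, a common variation for groups with many ranges sorts by start and compares the running maximum end with the next start, so that a long early range still absorbs shorter ranges that follow it. It assumes the same table t(test_case, start_date, end_date) and NOT NULL end values.

```sql
select * from t
match_recognize(
  partition by test_case
  order by start_date, end_date
  measures first(start_date) as start_date,
           max(end_date)     as end_date
  pattern(a* b)
  -- keep extending the packed range while the latest end seen so far
  -- reaches (or passes) the start of the next row
  define a as max(end_date) >= next(start_date)
);
```

For example, ranges 0-7, 1-2 and 5-6 in one group would pack into a single 0-7 row, because the running maximum stays at 7 while the later rows are read.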

JOIN: range to range

create table A(start_n, end_n) as
select level, level+1 from dual
connect by level <= 10000;

create table B as
select start_n+9995 start_n,
  end_n+9996 end_n
from A;

select * from A
join B
on (A.start_n <= B.start_n
  and B.start_n < A.end_n)
or (B.start_n <= A.start_n
  and A.start_n < B.end_n);

Elapsed: 00:00:13.332
• Exadata? All data in buffer cache
Elapsed: 00:00:13.332
• InMemory?
Elapsed: 00:00:09.842
JOIN: range to range
------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:17.82 | 90 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:17.82 | 90 |
| 2 | CONCATENATION | | 1 | | 10 |00:00:00.01 | 90 |
| 3 | MERGE JOIN | | 1 | 55 | 10 |00:00:00.01 | 45 |
| 4 | SORT JOIN | | 1 | 10000 | 10000 |00:00:00.01 | 24 |
| 5 | TABLE ACCESS FULL | T_NEW | 1 | 10000 | 10000 |00:00:00.01 | 24 |
|* 6 | FILTER | | 10000 | | 10 |00:00:00.01 | 21 |
|* 7 | SORT JOIN | | 10000 | 10000 | 55 |00:00:00.01 | 21 |
| 8 | TABLE ACCESS FULL| T_OLD | 1 | 10000 | 10000 |00:00:00.02 | 21 |
| 9 | MERGE JOIN | | 1 | 55 | 0 |00:00:17.80 | 45 |
| 10 | SORT JOIN | | 1 | 10000 | 10000 |00:00:00.02 | 24 |
| 11 | TABLE ACCESS FULL | T_NEW | 1 | 10000 | 10000 |00:00:00.01 | 24 |
|* 12 | FILTER | | 10000 | | 0 |00:00:17.78 | 21 |
|* 13 | SORT JOIN | | 10000 | 10000 | 99M|00:01:21.50 | 21 |
| 14 | TABLE ACCESS FULL| T_OLD | 1 | 10000 | 10000 |00:00:00.01 | 21 |
------------------------------------------------------------------------------------------

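Join, or Sort and Match?
Rows from both tables, sorted by start, then end. Each pass takes one row as the starting point and reads forward.

Row                  Start  End   Pass 1: from A (1-4)   Pass 2: from "B is equal" (1-4)
A                    1      4     starting row
B is equal           1      4     ✔                      starting row
B started by A       1      5     ✔                      skip
B during A           2      3     ✔                      skip
B finishes A         3      4     ✔                      skip
B overlapped by A    3      5     ✔                      skip
B met by A           4      5     ✖                      ✖
B preceded by A      5      6     ?                      ?
another A            5      7     ?                      ?

✔ = starts before the starting row ends and comes from the other table: a match
skip = starts before the starting row ends but comes from the same table: read, not output
✖ = starts at or after the end of the starting row: the pass stops here
? = never read in this pass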
select A_start_n, A_end_n, B_start_n, B_end_n from (
  select 'A' ttype, A.* from A
  union all
  select 'B' ttype, B.* from B
) match_recognize (
  order by start_n, end_n
  measures decode(f.ttype,'A',f.start_n, o.start_n) A_start_n,
    decode(f.ttype,'A',f.end_n, o.end_n) A_end_n,
    decode(f.ttype,'B',f.start_n, o.start_n) B_start_n,
    decode(f.ttype,'B',f.end_n, o.end_n) B_end_n
  all rows per match
  after match skip to next row
  pattern ( {-f-} (o|{-x-})+ )
  define o as ttype != f.ttype and start_n < f.end_n,
    x as start_n < f.end_n
);

Elapsed: 00:00:00.063

Pattern syntax used:
• {- exclusion -}
• ( grouping )
• + at least one
• Alternation A | B

More!
• Overlapping ranges with priority
• Data warehouses with date ranges:
– Trickle feed
• Impact on foreign keys
• OLTP
• Take advantage of MATCH_RECOGNIZE
