Tuesday, October 9, 2012

How To Use Oracle Analytic Functions in Oracle SQL

Analytic FunctionA while ago I wrote several similar articles on how to use the different Analytic functions in Oracle. I have now chosen to merge these articles into one useful article/document that you can print out or save to your computer for your own reference.
In this tutorial I will try to explain to you how to use them, through practical examples. You may have some problems following this tutorial if you do not have knowledge about Oracle’s aggregated functions.
An analytic function differs from a regular aggregated function (like i.e. SUM, AVG, MAX, MIN, etc) in the way that they return multiple rows for each group. The group of rows is called a window. In a way you can think about it as if you have a resultset from your query, and for each record there is a little window to set of other data.
All the samples in this tutorial are using the tables in the default SCOTT schema that follows a standard Oracle database installation. This schema is not always available in your database, depending on the administrative database setup. And, it might not be available in newer versions of Oracle. You can i.e. find the scripts here: Oracle SCOTT demo tables.
The common format of the use of an analytic funtion is as follows:
analytic_function(arguments) OVER (PARTITION BY ORDER BY ) []

The syntax can vary a little bit based on which of the analytic functions it is.
 If you have a big interest for details you can read about Oracle’s Analytic Functions (and everything else about Oracle SQL) in the SQL Reference, found in the Oracle Database Documentation Library .
 UPDATE 6/27/2012: I added a post on how to use the RANK() and DENSE_RANK() analytic functions: How To Rank Records in Oracle SQL

COUNT aggregate function sample

Consider this use of the COUNT aggregate function
1SQL> select  e.deptno
22          ,count(e.empno) emp_count
3from    emp e
4group by e.deptno
55  /
6DEPTNO  EMP_COUNT
7------ ----------
830          6
920          5
1010          3
As you can see, for each of the departments in the resultset you will find one number.

COUNT analytic function sample

Now, let us use the COUNT analytic function and se what happens:
1SQL> select   e.deptno
22           ,e.empno
33           ,count(e.empno) over (partition by e.deptno) emp_count
4from     emp e
5order by e.deptno, e.empno
66  /
7DEPTNO EMPNO  EMP_COUNT
8------ ----- ----------
910  7782          3
1010  7839          3
1110  7934          3
1220  7369          5
1320  7566          5
1420  7788          5
1520  7876          5
1620  7902          5
1730  7499          6
1830  7521          6
1930  7654          6
2030  7698          6
2130  7844          6
2230  7900          6
I added the empno field to the query so that you easier can see what happens in the resultset. And, as you can see the result set shows the count of the number of employees in each department…for EACH of the employees in the department.
The equivalent query using the COUNT aggregate function would be:
1SQL> select   e.deptno
22           ,e.empno
33           ,(
44               select      count(e.empno)
55               from        emp e2
66               where       e2.deptno = e.deptno
77            ) emp_count
8from     emp e
9order by e.deptno, e.empno
1010  /
11DEPTNO EMPNO  EMP_COUNT
12------ ----- ----------
1310  7782          3
1410  7839          3
1510  7934          3
1620  7369          5
1720  7566          5
1820  7788          5
1920  7876          5
2020  7902          5
2130  7499          6
2230  7521          6
2330  7654          6
2430  7698          6
2530  7844          6
2630  7900          6
Ok, so how can we practically use the COUNT analytic funtion?

COUNT analytic function, practical example

Let us pretend that the HR department wants to know how many employees there are in each department in our company, and also how many percent the count of department employees is out of the total employees. This is one way you can do this using the COUNT analytic function:
1SQL> select  distinct e.deptno
22          ,count(e.empno) over (partition by e.deptno) dept_emp_count
33          ,count(e.empno) over (partition by 1) overall_emp_count
44          ,round((
55              (count(e.empno) over (partition by e.deptno) * 100) / count(e.empno) over (partition by 1)
66          ), 2) deptno_emp_perc
7from    emp e
88  /
9DEPTNO DEPT_EMP_COUNT OVERALL_EMP_COUNT DEPTNO_EMP_PERC
10------ -------------- ----------------- ---------------
1120              5                14           35.71
1230              6                14           42.86
1310              3                14           21.43
In the query above you will see a “trick” that I use many times in my queries containing analytic function: partition by 1. This simply means every record in the resultset..

SUM aggregate function sample

Consider this use of the SUM aggregate function:
1SQL> select  e.deptno
22          ,sum(e.sal) emp_salary_sum
3from    emp e
4group by e.deptno
5order by e.deptno
66  /
7DEPTNO EMP_SALARY_SUM
8------ --------------
910           8750
1020          10875
1130           9400
As you can see, for each of the departments in the resultset you will find one number.

SUM analytic function sample

Now, let us use the SUM analytic function and se what happens:
1SQL> select   e.deptno
22           ,e.empno
33           ,e.sal
44           ,sum(e.sal) over (partition by e.deptno) emp_salary_sum
5from     emp e
6order by e.deptno, e.empno
77  /
8DEPTNO EMPNO       SAL EMP_SALARY_SUM
9------ ----- --------- --------------
1010  7782   2450.00           8750
1110  7839   5000.00           8750
1210  7934   1300.00           8750
1320  7369    800.00          10875
1420  7566   2975.00          10875
1520  7788   3000.00          10875
1620  7876   1100.00          10875
1720  7902   3000.00          10875
1830  7499   1600.00           9400
1930  7521   1250.00           9400
2030  7654   1250.00           9400
2130  7698   2850.00           9400
2230  7844   1500.00           9400
2330  7900    950.00           9400
I added the empno field to the query so that you easier can see what happens in the resultset. And, as you can see the result set shows the sum of the amount of salary for the employees in each department…for EACH of the employees in the department.
The equivalent query using the SUM aggregate function would be:
1SQL> select   e.deptno
22           ,e.empno
33           ,e.sal
44           ,(
55               select      sum(e2.sal)
66               from        emp e2
77               where       e2.deptno = e.deptno
88            ) sum_dept_salary
9from     emp e
1010  order by e.deptno, e.empno
1111  /
12DEPTNO EMPNO       SAL SUM_DEPT_SALARY
13------ ----- --------- ---------------
1410  7782   2450.00            8750
1510  7839   5000.00            8750
1610  7934   1300.00            8750
1720  7369    800.00           10875
1820  7566   2975.00           10875
1920  7788   3000.00           10875
2020  7876   1100.00           10875
2120  7902   3000.00           10875
2230  7499   1600.00            9400
2330  7521   1250.00            9400
2430  7654   1250.00            9400
2530  7698   2850.00            9400
2630  7844   1500.00            9400
2730  7900    950.00            9400
Ok, so how can we practically use the SUM analytic funtion?

SUM analytic function, practical example

Let us pretend that the HR department wants to know how much money each of departments in our company make, and also how many percent this is out of the total infome for the whole comapny. This is one way you can do this using the SUM analytic function:
1SQL> select  distinct e.deptno
22          ,sum(e.sal) over (partition by e.deptno) dept_emp_salary
33          ,sum(e.sal) over (partition by 1) overall_emp_salary
44          ,round((
55              (sum(e.sal) over (partition by e.deptno) * 100) / sum(e.sal) over (partition by 1)
66          ), 2) deptno_emp_perc
7from    emp e
88  /
9DEPTNO DEPT_EMP_SALARY OVERALL_EMP_SALARY DEPTNO_EMP_PERC
10------ --------------- ------------------ ---------------
1110            8750              29025           30.15
1220           10875              29025           37.47
1330            9400              29025           32.39
In the query above you will see a “trick” that I use many times in my queries containing analytic function: partition by 1. This simply means every record in the resultset..

AVG aggregate function sample

Consider this use of the AVG aggregate function:
1SQL> select  e.deptno
22          ,round(avg(e.sal), 2) avg_salary
3from    emp e
4group by e.deptno
5order by e.deptno
66  /
7DEPTNO AVG_SALARY
8------ ----------
910     2916.67
1020        2175
1130     1566.67
As you can see, for each of the departments in the resultset you will find one number.

AVG analytic function sample

Now, let us use the AVG analytic function and se what happens:
1SQL> select  e.deptno
22          ,e.empno
33          ,e.sal
44          ,round(avg(e.sal) over (partition by e.deptno), 2) emp_salary_avg
5from     emp e
6order by e.deptno, e.empno
77  /
8DEPTNO EMPNO       SAL EMP_SALARY_AVG
9------ ----- --------- --------------
1010      7782   2450.00        2916.67
1110      7839   5000.00        2916.67
1210      7934   1300.00        2916.67
1320      7369    800.00           2175
1420      7566   2975.00           2175
1520      7788   3000.00           2175
1620      7876   1100.00           2175
1720      7902   3000.00           2175
1830      7499   1600.00        1566.67
1930      7521   1250.00        1566.67
2030      7654   1250.00        1566.67
2130      7698   2850.00        1566.67
2230      7844   1500.00        1566.67
2330      7900    950.00        1566.67
I added the empno field to the query so that you easier can see what happens in the resultset. And, as you can see the result set shows the avg of the amount of salary for the employees in each department…for EACH of the employees in the department.
The equivalent query using the AVG aggregate function would be:
1SQL> SELECT    e.deptno
22          ,e.empno
33          ,e.sal
44          ,(
55              SELECT    ROUND(AVG(e2.sal), 2)
66              FROM    scott.emp e2
77              WHERE    e2.deptno = e.deptno
88          ) sum_dept_salary
9FROM    scott.emp e
1010  ORDER BY e.deptno, e.empno
1111  /
12DEPTNO EMPNO       SAL SUM_DEPT_SALARY
13------ ----- --------- ---------------
1410      7782   2450.00         2916.67
1510      7839   5000.00         2916.67
1610      7934   1300.00         2916.67
1720      7369    800.00            2175
1820      7566   2975.00            2175
1920      7788   3000.00            2175
2020      7876   1100.00            2175
2120      7902   3000.00            2175
2230      7499   1600.00         1566.67
2330      7521   1250.00         1566.67
2430      7654   1250.00         1566.67
2530      7698   2850.00         1566.67
2630      7844   1500.00         1566.67
2730      7900    950.00         1566.67
Ok, so how can we practically use the AVG analytic funtion?

AVG analytic function, practical example

Let us pretend that the HR department wants to know how much money, in average, each of departments in our company make, and also how many percent this is out of the total income for the whole company. This is one way you can do this using the AVG analytic function:
1SQL> select    distinct e.deptno
22          ,round(avg(e.sal) over (partition by e.deptno), 2) avg_dept_salary
33          ,round(avg(e.sal) over (partition by 1), 2) avg_overall_salary
44          ,round(((avg(e.sal) over (partition by e.deptno) * 100) / avg(e.sal) over (partition by 1)), 2) avg_perc_of_avg_tot
5from    emp e
66  /
7DEPTNO AVG_DEPT_SALARY AVG_OVERALL_SALARY AVG_PERC_OF_AVG_TOT
8------ --------------- ------------------ -------------------
910             2916.67            2073.21              140.68
1030             1566.67            2073.21               75.57
1120                2175            2073.21              104.91
In the query above you will see a “trick” that I use many times in my queries containing analytic function: partition by 1. This simply means every record in the resultset..

MIN aggregate function sample

Consider this use of the MIN aggregate function:
1SQL> SELECT    e.deptno
22          ,ROUND(MIN(e.sal), 2) min_salary
3FROM    scott.emp e
4GROUP BY e.deptno
5ORDER BY e.deptno
66  /
7DEPTNO MIN_SALARY
8------ ----------
910       1300
1020        800
1130        950
As you can see, for each of the departments in the resultset you will find one number.

MIN analytic function sample

Now, let us use the MIN analytic function and se what happens:
1SQL> SELECT    e.deptno
22          ,e.empno
33          ,e.sal
44          ,MIN(e.sal) OVER (PARTITION BY e.deptno) emp_salary_min
5FROM    scott.emp e
6ORDER BY e.deptno, e.empno
77  /
8DEPTNO EMPNO       SAL EMP_SALARY_MIN
9------ ----- --------- --------------
1010      7782   2450.00           1300
1110      7839   5000.00           1300
1210      7934   1300.00           1300
1320      7369    800.00            800
1420      7566   2975.00            800
1520      7788   3000.00            800
1620      7876   1100.00            800
1720      7902   3000.00            800
1830      7499   1600.00            950
1930      7521   1250.00            950
2030      7654   1250.00            950
2130      7698   2850.00            950
2230      7844   1500.00            950
2330      7900    950.00            950
I added the empno field to the query so that you easier can see what happens in the resultset. And, as you can see the result set shows the min salary for the employees in each department…for EACH of the employees in the department.
The equivalent query using the MIN aggregate function would be:
1SQL> SELECT    e.deptno
22          ,e.empno
33          ,e.sal
44          ,(
55              SELECT    ROUND(MIN(e2.sal), 2)
66              FROM    scott.emp e2
77              WHERE    e2.deptno = e.deptno
88          ) sum_dept_salary
9FROM    scott.emp e
1010  ORDER BY e.deptno, e.empno
1111  /
12DEPTNO EMPNO       SAL SUM_DEPT_SALARY
13------ ----- --------- ---------------
1410      7782   2450.00            1300
1510      7839   5000.00            1300
1610      7934   1300.00            1300
1720      7369    800.00             800
1820      7566   2975.00             800
1920      7788   3000.00             800
2020      7876   1100.00             800
2120      7902   3000.00             800
2230      7499   1600.00             950
2330      7521   1250.00             950
2430      7654   1250.00             950
2530      7698   2850.00             950
2630      7844   1500.00             950
2730      7900    950.00             950
Ok, so how can we practically use the MIN analytic funtion?

MIN analytic function, practical example

Let us pretend that the HR department wants to know how many employees in each department that make more than the minimum salary for the department. This is one way you can do this using the MIN analytic function (together with an outer SUM aggregate function):
1SQL> SELECT    x.deptno
22          ,x.dept_emp_salary_min
33          ,SUM(CASE
44              WHEN x.sal > dept_emp_salary_min THEN 1
55              ELSE 0
66          END) dept_emp_sal_above_min
7FROM    (
88              SELECT    e.deptno
99                      ,e.sal
1010                      ,MIN(e.sal) OVER (PARTITION BY e.deptno) dept_emp_salary_min
1111              FROM    scott.emp e
1212          ) x
1313  GROUP BY x.deptno
1414          ,x.dept_emp_salary_min
1515  /
16DEPTNO DEPT_EMP_SALARY_MIN DEPT_EMP_SAL_ABOVE_MIN
17------ ------------------- ----------------------
1820                     800                      4
1930                     950                      5
2010                    1300                      2

MAX aggregate function sample

Consider this use of the MAX aggregate function:
1SQL> SELECT    e.deptno
22          ,ROUND(MAX(e.sal), 2) max_salary
3FROM    scott.emp e
4GROUP BY e.deptno
5ORDER BY e.deptno
66  /
7DEPTNO MAX_SALARY
8------ ----------
910           5000
1020           3000
1130           2850
As you can see, for each of the departments in the resultset you will find one number.

MAX analytic function sample

Now, let us use the MAX analytic function and se what happens:
1SQL> SELECT    e.deptno
22          ,e.empno
33          ,e.sal
44          ,MAX(e.sal) OVER (PARTITION BY e.deptno) emp_salary_max
5FROM    scott.emp e
6ORDER BY e.deptno, e.empno
77  /
8DEPTNO EMPNO       SAL EMP_SALARY_MAX
9------ ----- --------- --------------
1010      7782   2450.00           5000
1110      7839   5000.00           5000
1210      7934   1300.00           5000
1320      7369    800.00           3000
1420      7566   2975.00           3000
1520      7788   3000.00           3000
1620      7876   1100.00           3000
1720      7902   3000.00           3000
1830      7499   1600.00           2850
1930      7521   1250.00           2850
2030      7654   1250.00           2850
2130      7698   2850.00           2850
2230      7844   1500.00           2850
2330      7900    950.00           2850
I added the empno field to the query so that you easier can see what happens in the resultset. And, as you can see the result set shows the max salary for the employees in each department…for EACH of the employees in the department.
The equivalent query using the MAX aggregate function would be:
1SQL> SELECT    e.deptno
22          ,e.empno
33          ,e.sal
44          ,(
55              SELECT    ROUND(MAX(e2.sal), 2)
66              FROM    scott.emp e2
77              WHERE    e2.deptno = e.deptno
88          ) max_dept_salary
9FROM    scott.emp e
1010  ORDER BY e.deptno, e.empno
1111  /
12DEPTNO EMPNO       SAL MAX_DEPT_SALARY
13------ ----- --------- ---------------
1410      7782   2450.00            5000
1510      7839   5000.00            5000
1610      7934   1300.00            5000
1720      7369    800.00            3000
1820      7566   2975.00            3000
1920      7788   3000.00            3000
2020      7876   1100.00            3000
2120      7902   3000.00            3000
2230      7499   1600.00            2850
2330      7521   1250.00            2850
2430      7654   1250.00            2850
2530      7698   2850.00            2850
2630      7844   1500.00            2850
2730      7900    950.00            2850
Ok, so how can we practically use the MAX analytic funtion?

MAX analytic function, practical example

Let us pretend that the HR department wants to know how many employees in each department that make less than the maximum salary for the department. This is one way you can do this using the MAX analytic function (together with an outer SUM aggregate function):
1SQL> SELECT    x.deptno
22          ,x.dept_emp_salary_max
33          ,SUM(CASE
44              WHEN x.sal < dept_emp_salary_max THEN 1
55              ELSE 0
66          END) dept_emp_sal_less_max
7FROM    (
88              SELECT    e.deptno
99                      ,e.sal
1010                      ,MAX(e.sal) OVER (PARTITION BY e.deptno) dept_emp_salary_max
1111              FROM    scott.emp e
1212          ) x
1313  GROUP BY x.deptno
1414          ,x.dept_emp_salary_max
1515  /
16DEPTNO DEPT_EMP_SALARY_MAX DEPT_EMP_SAL_LESS_MAX
17------ ------------------- ---------------------
1810                    5000                     2
1920                    3000                     3
2030                    2850                     5

The ROWNUM pseudocolumn

Before we start using the ROW_NUMBER analytic function, let’s look a little bit on how ROWNUM works in a query. ROWNUM is a so-called pseudocolumn. It can be added to any select statement as a field.
Here is a sample of the use of ROWNUM:
1SELECT     e.deptno
2    ,e.hiredate
3    ,e.empno
4    ,e.ename
5    ,ROWNUM
6FROM scott.emp e 
7
8Resultset
9DEPTNO    HIREDATE     EMPNO     ENAME ROWNUM
1020     12/17/1980     7369     SMITH 1
1130     2/20/1981     7499     ALLEN 2
1230     2/22/1981     7521     WARD 3
1320     4/2/1981     7566     JONES 4
1430     9/28/1981     7654     MARTIN 5
1530     5/1/1981     7698     BLAKE 6
1610     6/9/1981     7782     CLARK 7
1720     4/19/1987     7788     SCOTT 8
1810     11/17/1981     7839     KING 9
1930     9/8/1981     7844     TURNER 10
2020     5/23/1987     7876     ADAMS 11
2130     12/3/1981     7900     JAMES 12
2220     12/3/1981     7902     FORD 13
2310     1/23/1982     7934     MILLER 14
The ROWNUM field just shows the row number in the resultset that is returned from the query. If you i.e. wanted to only see the first five records in the resultset you could do like this:
1SELECT e.deptno
2 ,e.hiredate
3 ,e.empno
4 ,e.ename
5 ,ROWNUM
6FROM scott.emp e
7WHERE ROWNUM <= 5
 Note that if you want to do an ORDER BY you will need to do an ORDER BY in the query above, and then use ROWNUM in a query that you wrap around the inner query…otherwise the ORDER BY will not work correctly.

The ROW_NUMBER analytic function usage and sample

The ROW_NUMBER analytic function is a bit different than the ROWNUM pseudocolumn. When you use ROWNUM you get the row number based on the WHOLE resultset. When you use the ROW_NUMBER analytical function you will get a row number based on what is in the PARTITION BY and the ORDER BY segments of the expression. Let’s look at a sample. 
1SELECT     e.deptno
2    ,e.hiredate
3    ,e.empno
4    ,e.ename
5    ,ROW_NUMBER() OVER (PARTITION BY e.deptno ORDER BY e.hiredate) rownumber
6FROM     scott.emp e
7
8Resultset
9DEPTNO     HIREDATE     EMPNO     ENAME     ROWNUMBER
1010     6/9/1981     7782     CLARK     1
1110     11/17/1981     7839     KING     2
1210     1/23/1982     7934     MILLER     3
1320     12/17/1980     7369     SMITH     1
1420     4/2/1981     7566     JONES     2
1520     12/3/1981     7902     FORD     3
1620     4/19/1987     7788     SCOTT     4
1720     5/23/1987     7876     ADAMS     5
1830     2/20/1981     7499     ALLEN     1
1930     2/22/1981     7521     WARD     2
2030     5/1/1981     7698     BLAKE     3
2130     9/8/1981     7844     TURNER     4
2230     9/28/1981     7654     MARTIN     5
2330     12/3/1981     7900     JAMES     6
In the above sample the employees are ranged within their department when they were hired. The first hire in the department is number 1, etc.
So, if we only want to see i.e. the first employee hired in each department, we can do like this:
1SELECT    x.deptno
2    ,x.empno
3    ,x.ename
4    ,x.hiredate
5    ,x.rownumber
6FROM (
7    SELECT    e.deptno
8        ,e.hiredate
9        ,e.empno
10        ,e.ename
11        ,ROW_NUMBER() OVER (PARTITION BY e.deptno ORDER BY e.hiredate) rownumber
12    FROM    scott.emp e
13     ) x
14WHERE rownumber = 1
Please note that it might be tempting to do this:
1SELECT    e.deptno
2    ,e.hiredate
3    ,e.empno
4    ,e.ename
5    ,ROW_NUMBER() OVER (PARTITION BY e.deptno ORDER BY e.hiredate) rownumber
6FROM    scott.emp e
7WHERE    ROW_NUMBER() OVER (PARTITION BY e.deptno ORDER BY e.hiredate) = 1
But, this will cause the following error: “ORA-30483: window functions are not allowed here”.

No comments :

Post a Comment