Monday, October 8, 2012

How To Rank Records in Oracle SQL



Oracle Rank Analytic FunctionA long time ago I put together a tutorial (How To Use Oracle Analytic Functions in Oracle SQL) on how to use some of the more common Oracle Analytic Functions. In this tutorial I am going to discuss the RANK()  and the DENSE_RANK() analytic functions. I chose to put this in a separate post, since it is more advanced than the other analytic functions I covered in the mentioned previous post.

Introduction

In the examples in this little tutorial I will be using the tables found in the “OE” (Order Entry) sample schema that comes with most Oracle Installations. You can i.e. read more about it here: Oracle® Database Sample Schemas
Ok, so let us pretend that we have gotten the task to look at the data for a given year, and then rank the customers with the highest order total, and we want to do this for each month.

Example # 1: Using the RANK() analytic function

If you want to present your data with people having the same order total show as ties, RANK() is the analytic function you should use.
The syntax for using the RANK() analytic function is as follows:
RANK() OVER ( [ query_partition_clause] ORDER BY clause )

1SQL> set pagesize 500
2SQL>
3SQL> SELECT  x.customer_id
4  2         ,x.mnth
5  3          ,x.order_total
6  4          ,RANK() OVER (PARTITION BY x.mnth ORDER BY x.order_total DESC) month_rank
7  FROM   (
8  6              SELECT o.customer_id
9  7                      ,EXTRACT (MONTH FROM o.order_date) mnth
10  8                      ,ROUND(SUM(o.order_total), -3) order_total
11  9              FROM   oe.orders o
12 10              WHERE  EXTRACT (YEAR FROM o.order_date) = 2007
13 11              GROUP BY o.customer_id, EXTRACT (MONTH FROM o.order_date)
14 12         ) x
15 13  ORDER BY x.mnth, 4 ASC
16 14  ;
17
18CUSTOMER_ID       MNTH ORDER_TOTAL MONTH_RANK
19----------- ---------- ----------- ----------
20        122          2      104000          1
21        119          2       16000          2
22        109          3       73000          1
23        149          3       53000          2
24        107          3       32000          3
25        108          3       30000          4
26        102          3       11000          5
27        105          3        2000          6
28        104          3           0          7
29        147          4        2000          1
30        118          5       79000          1
31        143          5       27000          2
32        142          5       26000          3
33        146          5       18000          4
34        169          5       16000          5
35        150          6      283000          1
36        141          6       38000          2
37        123          6       11000          3
38        116          6       11000          3
39        165          6        3000          5
40        148          6        3000          5
41        145          6        1000          7
42        167          6           0          8
43        146          6           0          8
44        166          6           0          8
45        117          7      104000          1
46        109          7       78000          2
47        159          7       69000          3
48        154          7       27000          4
49        163          7        1000          5
50        161          7        1000          5
51        160          7        1000          5
52        164          7        1000          5
53        162          7           0          9
54        101          8       78000          1
55        107          8       71000          2
56        147          8       37000          3
57        146          8       27000          4
58        105          8       22000          5
59        145          8       10000          6
60        149          9      269000          1
61        144          9       62000          2
62        145          9       14000          3
63        102          9       11000          4
64        116          9        6000          5
65        104          9        5000          6
66        103          9           0          7
67        101         10       30000          1
68        118         10       22000          2
69        149         10       13000          3
70        144         10       10000          4
71        103         10        7000          5
72        116         10           0          6
73        108         11      120000          1
74        109         11       93000          2
75        117         11       50000          3
76        102         11       42000          4
77        158         11       25000          5
78        116         11       15000          6
79        146         11       14000          7
80        145         11       10000          8
81        149         11        9000          9
82        152         11        8000         10
83        157         11        7000         11
84        107         11        1000         12
85        147         12      296000          1
86        148         12       32000          2
87        144         12       11000          3
88
8968 rows selected
NOTE:  To be able to show you how the data ties, I wrapped the ROUND function around the order_total field like this: ROUND(SUM(o.order_total), -3) .
As you can see, we have some ties.  Take a look at i.e. month # 6, and month # 7.
Also, if you look at month # 7, you will see that the rank numbers are consecutive:
1SQL> SELECT  x.customer_id
2  2         ,x.mnth
3  3          ,x.order_total
4  4          ,RANK() OVER (PARTITION BY x.mnth ORDER BY x.order_total DESC) month_rank
5  FROM   (
6  6              SELECT o.customer_id
7  7                      ,EXTRACT (MONTH FROM o.order_date) mnth
8  8                      ,ROUND(SUM(o.order_total), -3) order_total
9  9              FROM   oe.orders o
10 10              WHERE  EXTRACT (YEAR FROM o.order_date) = 2007
11 11              GROUP BY o.customer_id, EXTRACT (MONTH FROM o.order_date)
12 12         ) x
13 13  WHERE  x.mnth = 7
14 14  ORDER BY x.mnth, 4 ASC
15 15  ;
16
17CUSTOMER_ID       MNTH ORDER_TOTAL MONTH_RANK
18----------- ---------- ----------- ----------
19        117          7      104000          1
20        109          7       78000          2
21        159          7       69000          3
22        154          7       27000          4
23        163          7        1000          5
24        161          7        1000          5
25        160          7        1000          5
26        164          7        1000          5
27        162          7           0          9
28
299 rows selected
The rank number show # 4, then four of rank # 5, and then rank # 9.

Example # 2: Using the DENSE_RANK() analytic function

Some times you do not want to show the rank numbers as consecutive. If this is the case for you, you should use the DENSE_RANK() analytic function.
The syntax for using the DENSE_RANK() analytic function is as follows:
DENSE_RANK() OVER ( [ query_partition_clause] ORDER BY clause )

1SQL> SELECT  x.customer_id
2  2         ,x.mnth
3  3          ,x.order_total
4  4          ,DENSE_RANK() OVER (PARTITION BY x.mnth ORDER BY x.order_total DESC) month_rank
5  FROM   (
6  6              SELECT o.customer_id
7  7                      ,EXTRACT (MONTH FROM o.order_date) mnth
8  8                      ,ROUND(SUM(o.order_total), -3) order_total
9  9              FROM   oe.orders o
10 10              WHERE  EXTRACT (YEAR FROM o.order_date) = 2007
11 11              GROUP BY o.customer_id, EXTRACT (MONTH FROM o.order_date)
12 12         ) x
13 13  WHERE  x.mnth = 7
14 14  ORDER BY x.mnth, 4 ASC
15 15  ;
16
17CUSTOMER_ID       MNTH ORDER_TOTAL MONTH_RANK
18----------- ---------- ----------- ----------
19        117          7      104000          1
20        109          7       78000          2
21        159          7       69000          3
22        154          7       27000          4
23        163          7        1000          5
24        161          7        1000          5
25        160          7        1000          5
26        164          7        1000          5
27        162          7           0          6
28
299 rows selected
If you look at the rank numbers in example # 2, you will see they will go like this: # 4, four of rank # 5, and then rank # 6. The rank numbers are then non-consecutive.

Example # 3: What if I do not want to show any ties?

In some situations you will not want to show any ties at all in your list. How do we do this? One answer is to actually use the ROW_NUMBER() analytic function like this:
1SQL> SELECT  x.customer_id
2  2         ,x.mnth
3  3          ,x.order_total
4  4          ,ROW_NUMBER() OVER (PARTITION BY x.mnth ORDER BY x.order_total DESC) month_rank
5  FROM   (
6  6              SELECT o.customer_id
7  7                      ,EXTRACT (MONTH FROM o.order_date) mnth
8  8                      ,ROUND(SUM(o.order_total), -3) order_total
9  9              FROM   oe.orders o
10 10              WHERE  EXTRACT (YEAR FROM o.order_date) = 2007
11 11              GROUP BY o.customer_id, EXTRACT (MONTH FROM o.order_date)
12 12         ) x
13 13  WHERE  x.mnth = 7
14 14  ORDER BY x.mnth, 4 ASC
15 15  ;
16
17CUSTOMER_ID       MNTH ORDER_TOTAL MONTH_RANK
18----------- ---------- ----------- ----------
19        117          7      104000          1
20        109          7       78000          2
21        159          7       69000          3
22        154          7       27000          4
23        163          7        1000          5
24        161          7        1000          6
25        160          7        1000          7
26        164          7        1000          8
27        162          7           0          9
28
299 rows selected
As you can see in example # 3: Mission accomplished – no ties. This example was kinda outside the scope of this post, but it also kinda belongs here.

A bit of cautiousness

I just wanted to add a note for example # 1 and 2. If you use the sql as shown in these examples, you might run into a situation where you get a different result each time you run it. The reason for this is that it is not described in the examples how to sort the people that ties. This might be ok for what you are doing, but if you are i.e. showing results for a contest for hundreds of distributors in a Multi Level Marketing (MLM) company, and the results keep on changing…I guarantee you: The phones will be chiming.
So, to solve this you can i.e. change the ORDER BY line to something like this:
ORDER BY x.mnth, 4 ASC, x.customer_id
This way you will always get the same result.

Conclusion

So, in conclusion we have the following rules:
  • If you want to show ties, and non-consecutive numbers: Use the RANK() analytic function
  • If you want to show ties, and consecutive numbers: Use the DENSE_RANK() analytic function
  • If you do not want to show ties at all, but consecutive numbers: Use the ROW_NUMBER analytic function
  • Make sure you always show the same result

No comments :

Post a Comment