
If you have worked with SQL and PL/SQL for a while, you have probably been in a situation where you have a
string that is delimited with a character, and you want the chunks of the string by themselves… In this tutorial we are going to look at a few ways to do this.
Introduction
In my previous post (
Quick Tip: Slicing A Long PLSQL String Into Smaller Pieces ),
I kind of touched something similar, but there was no logic in this
code that handled any delimiters, etc. We just brutally cut the string
into pieces.
So, in this tutorial we are actually going to show
how to pick out the pieces more carefully. Throughout the examples we
are going to use a collection,
SQL, the
REGEXPR_SUBSTR function,
SUBSTR,
INSTR, and a
PL/SQL Collection.
Ok, let’s get down to some business…
Example 1: Using SQL and REGEXP_SUBSTR
In this example we are only going to look at how to
tokenize ONE single string, since that is the purpose of this tutorial. If you would like to see an example using
REGEXPR_SUBSTR handling multiple records, please take a look at Alex Nuijten’s post found
here.
For this example, here is the code:
1 | SQL> set serveroutput on |
6 | 4 SELECT 'Paris#London#Rome#Oslo#Amsterdam#New York' city_string |
9 | 7 SELECT regexp_substr (city_string, '[^#]+' , 1, ROWNUM) city |
11 | 9 CONNECT BY LEVEL <= LENGTH(regexp_replace (city_string, '[^#]+' )) + 1 |
15 | 13 FOR rec IN cur LOOP |
16 | 14 dbms_output.put_line( 'City:' || rec.city); |
28 | PL/SQL procedure successfully completed |
Ok, so then we’re done with the SQL part. Let’s jump over to PL/SQL code to do some cool stuff.
Example 2: Using a PL/SQL Collection
In this example we are going to
tokenize our string and put it into a PL/SQL collection for later use.
3 | 3 TYPE typTokenTab IS TABLE OF VARCHAR (20) INDEX BY BINARY_INTEGER; |
4 | 4 vglCityString VARCHAR2(200); |
6 | 6 tglTokenTab typTokenTab; |
15 | 15 FUNCTION creTokenList(pLine IN VARCHAR2, pDelimiter IN VARCHAR2) RETURN typTokenTab IS |
16 | 16 sLine VARCHAR2(2000); |
22 | 22 sToken VARCHAR2(200); |
23 | 23 tTokenTab typTokenTab; |
26 | 26 IF (SUBSTR(sLine, LENGTH(sLine), 1) <> '|' ) THEN |
27 | 27 sLine := sLine || '|' ; |
32 | 32 nLength := LENGTH(sLine); |
35 | 35 FOR nIndex IN 1..nLength LOOP |
36 | 36 IF ((SUBSTR(sLine, nIndex, 1) = pDelimiter) OR (nIndex = nLength)) THEN |
40 | 40 sToken := SUBSTR(sLine, nPosOld + 1, nPos - nPosOld - 1); |
42 | 42 tTokenTab(nCnt) := sToken; |
51 | 51 vglCityString := 'Paris#London#Rome#Oslo#Amsterdam#New York' ; |
53 | 53 tglTokenTab := creTokenList( vglCityString, '#' ); |
55 | 55 FOR indx IN tglTokenTab. FIRST ..tglTokenTab. LAST LOOP |
56 | 56 dbms_output.put_line( 'City:' || tglTokenTab(indx)); |
59 | 59 tglTokenTab. DELETE ; |
70 | PL/SQL procedure successfully completed |
Remember
that when we are working with PL/SQL Collections, we need to clean up
our mess. What I mean with this is that when we are playing with memory
it is like cluttering up a child’s room with toys. We need to put the
toys back into the box and into the closet again so that we can use the
floor space for something else later. Hmm…to be technically correct we
would need to put the toys in the box and give the box to the Salvation
Army…but I don’t think we would have very happy children then.
Example # 3: Using INSTR and SUBSTR to get a token at a certain position
So,
let us say that you have a customer notes table with a field that
always follows the same structure for how the customer service reps
enter the notes. Let’s say it has the following format:
“
;;#“
That would i.e. give us something similar to this in our table:
“
2/5/2012;23412;Customer
called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection
department.#2/6/2012;23412;Compliance department have decided that
customer have sold products earlier on eBay. Customer’s status set to
ON-HOLD. Waiting for further instruction“
If we break down the notes field we will see that there are actually three different parts in the note:
- 2/5/2012;23412;Customer called reg. invoice
- 2/6/2012;23412;Contacted Lisa in the collection department.
- 2/6/2012;23412;Compliance
department have decided that customer have sold products earlier on
eBay. Customer’s status set to ON-HOLD. Waiting for further instruction
Our task is to create a function that can get any of the three parts, AND get the individual elements of a single note.
Here is a sample of how this can be done:
1 | SQL> set serveroutput on |
4 | 2 vglNotesString VARCHAR2(2000); |
8 | 6 ,pPosition IN INTEGER |
9 | 7 ,pDelimiter IN VARCHAR2 DEFAULT ';' |
10 | 8 ,pNullVal IN VARCHAR2 DEFAULT 'n/a' |
11 | 9 ) RETURN VARCHAR2 IS |
12 | 10 sLine VARCHAR2(2000); |
16 | 14 vToken VARCHAR2(200); |
17 | 15 bContinue BOOLEAN := TRUE ; |
20 | 18 IF ( pPosition = 1) THEN |
22 | 20 ELSIF ( pPosition < 0) THEN |
23 | 21 iStartPos := INSTR(pString, pDelimiter, -1, ABS (pPosition)) + 1; |
25 | 23 iStartPos := INSTR(pString, pDelimiter, 1, pPosition - 1) + 1; |
26 | 24 IF (iStartPos = 1) THEN |
27 | 25 bContinue := FALSE ; |
31 | 29 IF (bContinue) THEN |
33 | 31 IF (pPosition > 0) THEN |
34 | 32 iEndPos := INSTR(pString, pDelimiter, 1, pPosition ); |
35 | 33 IF (iEndPos > 0) THEN |
36 | 34 iLength := iEndPos - iStartPos; |
37 | 35 ELSIF (iEndPos = 0) THEN |
38 | 36 iLength := LENGTH(pString) - iStartPos + 1; |
41 | 39 IF (pPosition = -1) THEN |
42 | 40 iEndPos := LENGTH(pString); |
44 | 42 iEndPos := INSTR(pString, pDelimiter, -1, ABS (pPosition) - 1) - 1; |
46 | 44 iLength := iEndPos - iStartPos + 1; |
49 | 47 vToken := NVL(SUBSTR(pString, iStartPos, iLength), pNullVal); |
51 | 49 vToken := pNullVal; |
54 | 52 RETURN TO_CHAR(vToken); |
58 | 56 vglNotesString := '2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection ' ; |
59 | 57 vglNotesString := vglNotesString || 'department.#2/6/2012;23412;Compliance department have decided that customer ' ; |
60 | 58 vglNotesString := vglNotesString || 'have sold products earlier on eBay. Customer' 's status set to ON-HOLD. ' ; |
61 | 59 vglNotesString := vglNotesString || 'Waiting for further instruction' ; |
64 | 62 dbms_output.put_line( 'Positive tokenizing:' ); |
65 | 63 dbms_output.put_line( 'Note 1:[' || getToken(vglNotesString, 1, '#' , 'n/a' ) || ']' ); |
66 | 64 dbms_output.put_line( 'Note 2:[' || getToken(vglNotesString, 2, '#' , 'n/a' ) || ']' ); |
67 | 65 dbms_output.put_line( 'Note 3:[' || getToken(vglNotesString, 3, '#' , 'n/a' ) || ']' ); |
68 | 66 dbms_output.put_line( 'Note 4:[' || getToken(vglNotesString, 4, '#' , 'n/a' ) || ']' ); |
69 | 67 dbms_output.put_line( ' -- ' ); |
70 | 68 dbms_output.put_line( 'Negative tokenizing:' ); |
71 | 69 dbms_output.put_line( 'Note -1:[' || getToken(vglNotesString, -1, '#' , 'n/a' ) || ']' ); |
72 | 70 dbms_output.put_line( 'Note -2:[' || getToken(vglNotesString, -2, '#' , 'n/a' ) || ']' ); |
73 | 71 dbms_output.put_line( 'Note -3:[' || getToken(vglNotesString, -3, '#' , 'n/a' ) || ']' ); |
74 | 72 dbms_output.put_line( 'Note -4:[' || getToken(vglNotesString, -4, '#' , 'n/a' ) || ']' ); |
75 | 73 dbms_output.put_line( ' -- ' ); |
76 | 74 dbms_output.put_line( 'Positive tokenizing of a tokenized string:' ); |
77 | 75 dbms_output.put_line( 'Note 1:[' || getToken(getToken(vglNotesString, 1, '#' , 'n/a' ), 1, ';' , 'n/a' ) || ']' ); |
78 | 76 dbms_output.put_line( 'Note 2:[' || getToken(getToken(vglNotesString, 1, '#' , 'n/a' ), 2, ';' , 'n/a' ) || ']' ); |
79 | 77 dbms_output.put_line( 'Note 3:[' || getToken(getToken(vglNotesString, 1, '#' , 'n/a' ), 3, ';' , 'n/a' ) || ']' ); |
84 | Note 1:[2/5/2012;23412;Customer called reg. invoice] |
85 | Note 2:[2/6/2012;23412;Contacted Lisa in the collection department.] |
86 | Note 3:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer 's status set to ON-HOLD. Waiting for further instruction] |
90 | Note -1:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer' s status set to ON -HOLD. Waiting for further instruction] |
91 | Note -2:[2/6/2012;23412;Contacted Lisa in the collection department.] |
92 | Note -3:[2/5/2012;23412;Customer called reg. invoice] |
95 | Positive tokenizing of a tokenized string: |
98 | Note 3:[Customer called reg. invoice] |
100 | PL/SQL procedure successfully completed |
As
you can see, I left the third parameter as a VARCHAR2, rather than
using a CHAR. The reason for this is that sometimes our delimiter can
actually consist of more than one character, like i.e. “
##“
In addition I have made the function so that it can handle both
positive and negative string positioning.
This will be valuable if you for some reason wanted to tokenize / parse
a string where the token positions are numbered from the end of the
string, going towards the beginning.
Both the
INSTR and the
SUBSTR functions can handle both positive and negative positioning.
In Conclusion
As you can see, there are many ways to
tokenize or parse a string in PL/SQL.
In this little tutorial we have touched a few of them. No matter what
method you choose, an advice would be to choose your delimiter wisely.
Usually I use a character that most likely would not show up in a normal
string, like:
#, @, $, ^, etc.
One thing to consider as
well is what you want your function to do if there is nothing returned.
Would you i.e. just return the text “
n/a“, or something?
No comments :
Post a Comment