If you have worked with SQL and PL/SQL for a while, you have probably been in a situation where you have a string that is delimited with a character, and you want the chunks of the string by themselves… In this tutorial we are going to look at a few ways to do this.
So, in this tutorial we are actually going to show how to pick out the pieces more carefully. Throughout the examples we are going to use a collection, SQL, the REGEXPR_SUBSTR function, SUBSTR, INSTR, and a PL/SQL Collection.
Ok, let’s get down to some business…
For this example, here is the code:
Ok, so then we’re done with the SQL part. Let’s jump over to PL/SQL code to do some cool stuff.
Remember
that when we are working with PL/SQL Collections, we need to clean up
our mess. What I mean with this is that when we are playing with memory
it is like cluttering up a child’s room with toys. We need to put the
toys back into the box and into the closet again so that we can use the
floor space for something else later. Hmm…to be technically correct we
would need to put the toys in the box and give the box to the Salvation
Army…but I don’t think we would have very happy children then.
“;;# “
That would i.e. give us something similar to this in our table:
“2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection department.#2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction“
If we break down the notes field we will see that there are actually three different parts in the note:
Here is a sample of how this can be done:
As you can see, I left the third parameter as a VARCHAR2, rather than using a CHAR. The reason for this is that sometimes our delimiter can actually consist of more than one character, like i.e. “## “
In addition I have made the function so that it can handle both positive and negative string positioning. This will be valuable if you for some reason wanted to tokenize / parse a string where the token positions are numbered from the end of the string, going towards the beginning.
Both the INSTR and the SUBSTR functions can handle both positive and negative positioning.
One thing to consider as well is what you want your function to do if there is nothing returned. Would you i.e. just return the text “n/a“, or something?
Introduction
In my previous post ( Quick Tip: Slicing A Long PLSQL String Into Smaller Pieces ), I kind of touched something similar, but there was no logic in this code that handled any delimiters, etc. We just brutally cut the string into pieces.So, in this tutorial we are actually going to show how to pick out the pieces more carefully. Throughout the examples we are going to use a collection, SQL, the REGEXPR_SUBSTR function, SUBSTR, INSTR, and a PL/SQL Collection.
Ok, let’s get down to some business…
Example 1: Using SQL and REGEXP_SUBSTR
In this example we are only going to look at how to tokenize ONE single string, since that is the purpose of this tutorial. If you would like to see an example using REGEXPR_SUBSTR handling multiple records, please take a look at Alex Nuijten’s post found here.For this example, here is the code:
1 | SQL> set serveroutput on |
2 | SQL> |
3 | SQL> DECLARE |
4 | 2 CURSOR cur IS |
5 | 3 WITH qry AS ( |
6 | 4 SELECT 'Paris#London#Rome#Oslo#Amsterdam#New York' city_string |
7 | 5 FROM dual |
8 | 6 ) |
9 | 7 SELECT regexp_substr (city_string, '[^#]+' , 1, ROWNUM) city |
10 | 8 FROM qry |
11 | 9 CONNECT BY LEVEL <= LENGTH(regexp_replace (city_string, '[^#]+' )) + 1 |
12 | 10 ; |
13 | 11 |
14 | 12 BEGIN |
15 | 13 FOR rec IN cur LOOP |
16 | 14 dbms_output.put_line( 'City:' || rec.city); |
17 | 15 END LOOP; |
18 | 16 END ; |
19 | 17 / |
20 |
21 | City:Paris |
22 | City:London |
23 | City:Rome |
24 | City:Oslo |
25 | City:Amsterdam |
26 | City:New York |
27 |
28 | PL/SQL procedure successfully completed |
Example 2: Using a PL/SQL Collection
In this example we are going to tokenize our string and put it into a PL/SQL collection for later use.1 | SQL> DECLARE |
2 | 2 |
3 | 3 TYPE typTokenTab IS TABLE OF VARCHAR (20) INDEX BY BINARY_INTEGER; |
4 | 4 vglCityString VARCHAR2(200); |
5 | 5 |
6 | 6 tglTokenTab typTokenTab; |
7 | 7 |
8 | 8 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
9 | 9 -- Function: creTokenList |
10 | 10 -- |
11 | 11 -- This function takes a string with "tokens" delimited by pDelimiter |
12 | 12 -- and put each "token" into a separate record in a PL/SQL collection. The |
13 | 13 -- PL/SQL collection is returned back to the caller of the function. |
14 | 14 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
15 | 15 FUNCTION creTokenList(pLine IN VARCHAR2, pDelimiter IN VARCHAR2) RETURN typTokenTab IS |
16 | 16 sLine VARCHAR2(2000); |
17 | 17 nPos INTEGER ; |
18 | 18 nPosOld INTEGER ; |
19 | 19 nIndex INTEGER ; |
20 | 20 nLength INTEGER ; |
21 | 21 nCnt INTEGER ; |
22 | 22 sToken VARCHAR2(200); |
23 | 23 tTokenTab typTokenTab; |
24 | 24 BEGIN |
25 | 25 sLine := pLine; |
26 | 26 IF (SUBSTR(sLine, LENGTH(sLine), 1) <> '|' ) THEN |
27 | 27 sLine := sLine || '|' ; |
28 | 28 END IF; |
29 | 29 |
30 | 30 nPos := 0; |
31 | 31 sToken := '' ; |
32 | 32 nLength := LENGTH(sLine); |
33 | 33 nCnt := 0; |
34 | 34 |
35 | 35 FOR nIndex IN 1..nLength LOOP |
36 | 36 IF ((SUBSTR(sLine, nIndex, 1) = pDelimiter) OR (nIndex = nLength)) THEN |
37 | 37 nPosOld := nPos; |
38 | 38 nPos := nIndex; |
39 | 39 nCnt := nCnt + 1; |
40 | 40 sToken := SUBSTR(sLine, nPosOld + 1, nPos - nPosOld - 1); |
41 | 41 |
42 | 42 tTokenTab(nCnt) := sToken; |
43 | 43 END IF; |
44 | 44 |
45 | 45 END LOOP; |
46 | 46 |
47 | 47 RETURN tTokenTab; |
48 | 48 END creTokenList; |
49 | 49 |
50 | 50 BEGIN |
51 | 51 vglCityString := 'Paris#London#Rome#Oslo#Amsterdam#New York' ; |
52 | 52 |
53 | 53 tglTokenTab := creTokenList( vglCityString, '#' ); |
54 | 54 |
55 | 55 FOR indx IN tglTokenTab. FIRST ..tglTokenTab. LAST LOOP |
56 | 56 dbms_output.put_line( 'City:' || tglTokenTab(indx)); |
57 | 57 END LOOP; |
58 | 58 |
59 | 59 tglTokenTab. DELETE ; |
60 | 60 END ; |
61 | 61 / |
62 |
63 | City:Paris |
64 | City:London |
65 | City:Rome |
66 | City:Oslo |
67 | City:Amsterdam |
68 | City:New York |
69 |
70 | PL/SQL procedure successfully completed |
Example # 3: Using INSTR and SUBSTR to get a token at a certain position
So, let us say that you have a customer notes table with a field that always follows the same structure for how the customer service reps enter the notes. Let’s say it has the following format:“
That would i.e. give us something similar to this in our table:
“2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection department.#2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction“
If we break down the notes field we will see that there are actually three different parts in the note:
- 2/5/2012;23412;Customer called reg. invoice
- 2/6/2012;23412;Contacted Lisa in the collection department.
- 2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer’s status set to ON-HOLD. Waiting for further instruction
Here is a sample of how this can be done:
1 | SQL> set serveroutput on |
2 | SQL> |
3 | SQL> DECLARE |
4 | 2 vglNotesString VARCHAR2(2000); |
5 | 3 |
6 | 4 FUNCTION getToken( |
7 | 5 pString IN VARCHAR2 |
8 | 6 ,pPosition IN INTEGER |
9 | 7 ,pDelimiter IN VARCHAR2 DEFAULT ';' |
10 | 8 ,pNullVal IN VARCHAR2 DEFAULT 'n/a' |
11 | 9 ) RETURN VARCHAR2 IS |
12 | 10 sLine VARCHAR2(2000); |
13 | 11 iStartPos INTEGER ; |
14 | 12 iEndPos INTEGER ; |
15 | 13 iLength INTEGER ; |
16 | 14 vToken VARCHAR2(200); |
17 | 15 bContinue BOOLEAN := TRUE ; |
18 | 16 BEGIN |
19 | 17 -- Get start position |
20 | 18 IF ( pPosition = 1) THEN |
21 | 19 iStartPos := 1; |
22 | 20 ELSIF ( pPosition < 0) THEN |
23 | 21 iStartPos := INSTR(pString, pDelimiter, -1, ABS (pPosition)) + 1; |
24 | 22 ELSE |
25 | 23 iStartPos := INSTR(pString, pDelimiter, 1, pPosition - 1) + 1; |
26 | 24 IF (iStartPos = 1) THEN |
27 | 25 bContinue := FALSE ; |
28 | 26 END IF; |
29 | 27 END IF; |
30 | 28 |
31 | 29 IF (bContinue) THEN |
32 | 30 -- Get the end position, and then the length of the token |
33 | 31 IF (pPosition > 0) THEN |
34 | 32 iEndPos := INSTR(pString, pDelimiter, 1, pPosition ); |
35 | 33 IF (iEndPos > 0) THEN |
36 | 34 iLength := iEndPos - iStartPos; |
37 | 35 ELSIF (iEndPos = 0) THEN |
38 | 36 iLength := LENGTH(pString) - iStartPos + 1; |
39 | 37 END IF; |
40 | 38 ELSE |
41 | 39 IF (pPosition = -1) THEN |
42 | 40 iEndPos := LENGTH(pString); |
43 | 41 ELSE |
44 | 42 iEndPos := INSTR(pString, pDelimiter, -1, ABS (pPosition) - 1) - 1; |
45 | 43 END IF; |
46 | 44 iLength := iEndPos - iStartPos + 1; |
47 | 45 END IF; |
48 | 46 |
49 | 47 vToken := NVL(SUBSTR(pString, iStartPos, iLength), pNullVal); |
50 | 48 ELSE |
51 | 49 vToken := pNullVal; |
52 | 50 END IF; |
53 | 51 |
54 | 52 RETURN TO_CHAR(vToken); |
55 | 53 END getToken; |
56 | 54 |
57 | 55 BEGIN |
58 | 56 vglNotesString := '2/5/2012;23412;Customer called reg. invoice#2/6/2012;23412;Contacted Lisa in the collection ' ; |
59 | 57 vglNotesString := vglNotesString || 'department.#2/6/2012;23412;Compliance department have decided that customer ' ; |
60 | 58 vglNotesString := vglNotesString || 'have sold products earlier on eBay. Customer' 's status set to ON-HOLD. ' ; |
61 | 59 vglNotesString := vglNotesString || 'Waiting for further instruction' ; |
62 | 60 |
63 | 61 |
64 | 62 dbms_output.put_line( 'Positive tokenizing:' ); |
65 | 63 dbms_output.put_line( 'Note 1:[' || getToken(vglNotesString, 1, '#' , 'n/a' ) || ']' ); |
66 | 64 dbms_output.put_line( 'Note 2:[' || getToken(vglNotesString, 2, '#' , 'n/a' ) || ']' ); |
67 | 65 dbms_output.put_line( 'Note 3:[' || getToken(vglNotesString, 3, '#' , 'n/a' ) || ']' ); |
68 | 66 dbms_output.put_line( 'Note 4:[' || getToken(vglNotesString, 4, '#' , 'n/a' ) || ']' ); |
69 | 67 dbms_output.put_line( ' -- ' ); |
70 | 68 dbms_output.put_line( 'Negative tokenizing:' ); |
71 | 69 dbms_output.put_line( 'Note -1:[' || getToken(vglNotesString, -1, '#' , 'n/a' ) || ']' ); |
72 | 70 dbms_output.put_line( 'Note -2:[' || getToken(vglNotesString, -2, '#' , 'n/a' ) || ']' ); |
73 | 71 dbms_output.put_line( 'Note -3:[' || getToken(vglNotesString, -3, '#' , 'n/a' ) || ']' ); |
74 | 72 dbms_output.put_line( 'Note -4:[' || getToken(vglNotesString, -4, '#' , 'n/a' ) || ']' ); |
75 | 73 dbms_output.put_line( ' -- ' ); |
76 | 74 dbms_output.put_line( 'Positive tokenizing of a tokenized string:' ); |
77 | 75 dbms_output.put_line( 'Note 1:[' || getToken(getToken(vglNotesString, 1, '#' , 'n/a' ), 1, ';' , 'n/a' ) || ']' ); |
78 | 76 dbms_output.put_line( 'Note 2:[' || getToken(getToken(vglNotesString, 1, '#' , 'n/a' ), 2, ';' , 'n/a' ) || ']' ); |
79 | 77 dbms_output.put_line( 'Note 3:[' || getToken(getToken(vglNotesString, 1, '#' , 'n/a' ), 3, ';' , 'n/a' ) || ']' ); |
80 | 78 END ; |
81 | 79 / |
82 |
83 | Positive tokenizing: |
84 | Note 1:[2/5/2012;23412;Customer called reg. invoice] |
85 | Note 2:[2/6/2012;23412;Contacted Lisa in the collection department.] |
86 | Note 3:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer 's status set to ON-HOLD. Waiting for further instruction] |
87 | Note 4:[n/a] |
88 | -- |
89 | Negative tokenizing: |
90 | Note -1:[2/6/2012;23412;Compliance department have decided that customer have sold products earlier on eBay. Customer' s status set to ON -HOLD. Waiting for further instruction] |
91 | Note -2:[2/6/2012;23412;Contacted Lisa in the collection department.] |
92 | Note -3:[2/5/2012;23412;Customer called reg. invoice] |
93 | Note -4:[n/a] |
94 | -- |
95 | Positive tokenizing of a tokenized string: |
96 | Note 1:[2/5/2012] |
97 | Note 2:[23412] |
98 | Note 3:[Customer called reg. invoice] |
99 |
100 | PL/SQL procedure successfully completed |
As you can see, I left the third parameter as a VARCHAR2, rather than using a CHAR. The reason for this is that sometimes our delimiter can actually consist of more than one character, like i.e. “
In addition I have made the function so that it can handle both positive and negative string positioning. This will be valuable if you for some reason wanted to tokenize / parse a string where the token positions are numbered from the end of the string, going towards the beginning.
Both the INSTR and the SUBSTR functions can handle both positive and negative positioning.
In Conclusion
As you can see, there are many ways to tokenize or parse a string in PL/SQL. In this little tutorial we have touched a few of them. No matter what method you choose, an advice would be to choose your delimiter wisely. Usually I use a character that most likely would not show up in a normal string, like: #, @, $, ^, etc.One thing to consider as well is what you want your function to do if there is nothing returned. Would you i.e. just return the text “n/a“, or something?
No comments :
Post a Comment