Oracle introduced regular expression functions in
10g version.
REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR. These functions are similar in functionality that of standard string functions INSTR, REPLACE and SUBSTR respectively but with more powerful capabilities of pattern matching. Using regular expression functions, we can search, validate, remove unwanted text and/or replace characters from the string. It can be used for any character data including CLOB and NCLOB as well.
Introduced in Oracle 11g the REGEXP_COUNT function greatly simplifies counting the number of times a pattern appears inside a string. The following examples demonstrate how to use REGEXP_COUNT with some simple patterns. For versions prior to 11g alternative logic that provides similar functionality is also presented.
REGEXP_COUNT functions returns the number of occurrences of the pattern in the source string. It takes four arguments.
Run the code on this page in SQL*Plus to create the sample tables, data, etc. used by the examples in this section.
Be sure to read Using SQL Snippets ™ before executing any of these setup steps.
=====================================================================
create table targets ( target varchar2(15) );
—————————————————
3
There are 3 incidents of the ‘i’ in the string. If we change the match parameter to ‘c’, occurrence will be only 2.
—————————————————
2
Following is the example, where we are searching for pattern starting from the position 3.
———————————————
2
This function really helps to identify number of occurrences of the specific pattern in the source string. It reduces the long PL/SQL code to identify number of occurrences into one single function call.
We can even make it case sensitive or insensitive by specifying the Match Option as under
While Oracle 10g supports most of the regular
expression features described in this section it unfortunately does not support
REGEXP_COUNT. However, we can use the LENGTH and REGEXP_REPLACE
functions to produce similar results.
To count the occurrences of the letter "b" in the target strings we can use a query like this (the column INTERMEDIATE_STRING is included to demonstrate the inner workings of each solution).
column intermediate_string format a12 heading "INTERMEDIATE|STRING"
This query finds the total occurrences of the
letters "a", "c", or "f" in the target strings.
To calculate the number of times a string like
"def" occurs we use a query like this.
(I found
this last technique in this OTN Discussion Forum
thread by cd after reading this blog post by Volder.)
To find the number of words in a string we can use a query like this.
Count String Pattern Occurrences (Any Oracle
Version)
In any Oracle version we can use the LENGTH and REPLACE functions to count patterns, though the logic is slightly more complex than the techniques above.
To count the occurrences of the letter "b" in the target strings we can use a query like this.
column target_length heading "TARGET|LENGTH"
column intermediate_length heading "INTERMEDIATE|LENGTH"
(The INTERMEDIATE_STRING value in the third row is
six spaces. In the fourth row it is three spaces followed by "xx"
followed by two spaces.)
This query finds the total occurrences of the letters "a", "c", or "f" in the target strings.
To find the number of words in a string we can use
a query like this. Note the query will not work for target strings containing
more than one space between words.
column trimmed_length_plus_1 heading "TRIMMED LENGTH|PLUS 1"
REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR. These functions are similar in functionality that of standard string functions INSTR, REPLACE and SUBSTR respectively but with more powerful capabilities of pattern matching. Using regular expression functions, we can search, validate, remove unwanted text and/or replace characters from the string. It can be used for any character data including CLOB and NCLOB as well.
Introduced in Oracle 11g the REGEXP_COUNT function greatly simplifies counting the number of times a pattern appears inside a string. The following examples demonstrate how to use REGEXP_COUNT with some simple patterns. For versions prior to 11g alternative logic that provides similar functionality is also presented.
REGEXP_COUNT functions returns the number of occurrences of the pattern in the source string. It takes four arguments.
•
Source string: It is the
source string from which we are going to find the occurrence of the pattern.
This can be CLOB or NCLOB characters as well.
•
Pattern: Is any string,
pattern or text which will be found in the source string. Data type of pattern
will be converted to data type of source string, if it is different than the
source string.
•
Position: It is a non-zero
positive value from which search should begin.
•
Matching parameter: which
controls the behavior of the search? It takes one or more value from.
o
I (ignore case)
o
c – Case Sensitive
o
m – Allows source
string as multiple lines
o
n – Allows period.
o
x – Ignore white space
characters
Now let us start with the some simple example. In
following example, we are counting occurrence of ‘i’ in the source string. We are
ignoring case in matching parameter.Setup Regular Expression Data
Before proceeding however, the following code will set up some test data for us (see Setup if you want to create the SET_TARGET and ADD_TARGET procedures on your own test system).Run the code on this page in SQL*Plus to create the sample tables, data, etc. used by the examples in this section.
Be sure to read Using SQL Snippets ™ before executing any of these setup steps.
=====================================================================
Setup for pattern matching examples
=====================================================================create table targets ( target varchar2(15) );
create
procedure set_target( p_target varchar2 ) as
begin
delete from targets ;
insert into targets values ( p_target ) ;
end;
/
create
procedure add_target( p_target varchar2 ) as
begin
insert into targets values ( p_target );
end;
/
create table patterns ( pattern varchar2(25),
sort_key number );
create
procedure set_pattern( p_pattern varchar2 ) as
begin
delete from patterns;
insert into patterns values ( p_pattern, 1 );
end;
/
create
procedure add_pattern( p_pattern varchar2 ) as
begin
insert into patterns
values
( p_pattern ,
( select count(*) + 1 from patterns )
);
end;
/
create
view test_results as
select
pattern ,
target, case when regexp_like( target, pattern ) then
'Y' else 'N' end as match,
regexp_substr( target, pattern ) as
matched_value ,
regexp_instr( target, pattern ) as position
from patterns,
targets
order
by patterns.sort_key, target;
column
pattern format a15
column
target format a10
column
match format a5
column
matched_value format a15
column
modifier format a8
column
replaced_target format a15
column
position format 999
column
substring format a15
break on
pattern skip 1
set
null '(null)'
set
linesize 100
=====================================================================
Setup for the section on POSIX character
classes
=====================================================================
create
table character_class_map
(
data varchar2(2) ,
symbol
varchar2(3) ,
dec
number ,
hex
varchar2(4) ,
upper
varchar2(1) ,
lower
varchar2(1) ,
alpha
varchar2(1) ,
digit
varchar2(1) ,
xdigit
varchar2(1) ,
alnum
varchar2(1) ,
blank
varchar2(1) ,
space
varchar2(1) ,
cntrl
varchar2(1) ,
punct
varchar2(1) ,
graph
varchar2(1) ,
print
varchar2(1)
);
declare
v_data
varchar2(2) ;
v_hex_value varchar2(4) ;
v_symbol
varchar2(3) ;
begin
for
i in 0
.. 255
loop
v_hex_value := to_char( i, 'FM000X' ) ;
v_data := unistr( '\' || v_hex_value ) ;
v_symbol :=
case i
when 0
then 'NUL' when 1 then 'SOH'
when 2 then 'STX'
when 3
then 'ETX' when 4 then 'EOT'
when 5 then 'ENQ'
when 6
then 'ACK' when 7 then 'BEL'
when 8 then 'BS'
when 9
then 'TAB' when 10 then 'LF'
when 11 then 'VT'
when 12
then 'FF' when 13 then 'CR'
when 14 then 'SO'
when 15
then 'SI' when 16 then 'DLE'
when 17 then 'DC1'
when 18
then 'DC2' when 19 then 'DC3'
when 20 then 'DC4'
when 21
then 'NAK' when 22 then 'SYN'
when 23 then 'ETB'
when 24
then 'CAN' when 25 then 'EM'
when 26 then 'SUB'
when 27
then 'ESC' when 28 then 'FS'
when 29 then 'GS'
when 30
then 'RS' when 31 then 'US'
when 127 then 'DEL'
when 128 then 'XXX' when 129 then 'XXX' when 130 then 'BPH'
when 131 then 'NBH' when 132 then 'IND' when 133 then 'NEL'
when 134 then 'SSA' when 135 then
'ESA' when 136 then 'HTS'
when 137 then 'HTJ' when 138 then 'VTS' when 139 then 'PLD'
when 140 then 'PLU' when 141 then 'RI' when 142 then 'SS2'
when 143 then 'SS3' when 144 then 'DCS' when 145 then 'PU1'
when 146 then 'PU2' when 147 then 'STS' when 148 then 'CCH'
when 149 then 'MW' when 150 then 'SPA' when 151 then 'EPA'
when 152 then 'SOS' when 153 then 'XXX' when 154 then 'SCI'
when 155 then 'CSI' when 156 then 'ST' when 157 then 'OSC'
when 158 then 'PM' when 159 then 'APC' when 173 then 'SHY'
else v_data
end
;
insert
into character_class_map (data , symbol , dec , hex )
values ( v_data , v_symbol , i , v_hex_value );
end loop;
end;
/
update
character_class_map set upper = 'Y' where regexp_like( data, '[[:upper:]]' );
update
character_class_map set lower = 'Y' where regexp_like( data, '[[:lower:]]' );
update
character_class_map set alpha = 'Y' where regexp_like( data, '[[:alpha:]]' );
update
character_class_map set digit = 'Y' where regexp_like( data, '[[:digit:]]' );
update
character_class_map set xdigit= 'Y' where regexp_like( data, '[[:xdigit:]]');
update
character_class_map set alnum = 'Y' where regexp_like( data, '[[:alnum:]]' );
update
character_class_map set blank = 'Y' where regexp_like( data, '[[:blank:]]' );
update
character_class_map set space = 'Y' where regexp_like( data, '[[:space:]]' );
update
character_class_map set cntrl = 'Y' where regexp_like( data, '[[:cntrl:]]' );
update
character_class_map set punct = 'Y' where regexp_like( data, '[[:punct:]]' );
update
character_class_map set graph = 'Y' where regexp_like( data, '[[:graph:]]' );
update
character_class_map set print = 'Y' where regexp_like( data, '[[:print:]]' );
commit;
column
symbol format a6
column
dec format 999
column
hex format a4
column
upper format a5
column
lower format a5
column
alpha format a5
column
digit format a5
column
xdigit format a6
column
alnum format a5
column
blank format a5
column
space format a5
column
cntrl format a5
column
punct format a5
column
graph format a5
column
print format a5
execute
set_target( 'aaa' )
execute
add_target( 'a b c' )
execute
add_target( 'b b b b' )
execute
add_target( ' bbb xx ' )
execute
add_target( 'wdef def w' )
execute
add_target( 'defxdefdef' )
execute
add_target( null )
set null
"(null)"
column
target heading
"TARGET|STRING"
column
pattern_count heading "PATTERN|COUNT"
REGEXP_COUNT
SQL> SELECT REGEXP_COUNT(‘Decipher Information
System’,’i’,1,’i’) FROM DUAL;
REGEXP_COUNT(‘DECIPHERINFORMATIONSYSTEM’,’I’,1,’I’)—————————————————
3
There are 3 incidents of the ‘i’ in the string. If we change the match parameter to ‘c’, occurrence will be only 2.
SQL> SELECT REGEXP_COUNT('Decipher Information
System','i',1,'c') FROM DUAL;
REGEXP_COUNT(‘DECIPHERINFORMATIONSYSTEM’,’I’,1,’C’)—————————————————
2
Following is the example, where we are searching for pattern starting from the position 3.
SQL> SELECT REGEXP_COUNT('Cat in the hat is on the
bat.','at',4) FROM DUAL;
REGEXP_COUNT(‘CATINTHEHATISONTHEBAT.’,’AT’,4)———————————————
2
This function really helps to identify number of occurrences of the specific pattern in the source string. It reduces the long PL/SQL code to identify number of occurrences into one single function call.
We can start counting from a specified
position as shown under
We have started counting from position 10 till the
end of the string. That means the operation will happen inside 'learning the REGEXP_COUNT function of Oracle 11g' test and hence 3 matches found. So
the count is 3.Select
REGEXP_COUNT ('We
are now learning the
REGEXP_COUNT function
of Oracle 11g.more at www.beyondrelational.com', 'r',10) As
Result From
Dual;
We can even make it case sensitive or insensitive by specifying the Match Option as under
Select REGEXP_COUNT ('We are
now learning the regexp_count function of Oracle 11g.more at
www.beyondrelational.com', 'R',10) As CaseSensitiveResult
,REGEXP_COUNT ('We are
now learning the regexp_count function of Oracle 11g.more at
www.beyondrelational.com', 'R',10,'i') As CaseInSensitiveResult
From Dual;
We can even find occurrences of multiple characters.
We can identify digits using "\d" or "[0-9]" and the "{4}"
operator signifies 4 of them, so using "\d{4}" or
"[0-9]{4}" with the REGEXP_COUNT function
seems to be a valid option.
WITH T1 AS
(SELECT '1234' DATA FROM DUAL
UNION ALL
SELECT '1234 1234' FROM DUAL
UNION ALL
SELECT '1234 1234 1234' FROM DUAL
)SELECT REGEXP_COUNT(data, '[0-9]{4}') AS pattern_count_1,
REGEXP_COUNT(data, '\d{4}') AS pattern_count_2 FROM T1
Count String Pattern Occurrences (Oracle 11g)
To count
the occurrences of the letter "b" in the target strings we can use a
query like this.
Select target , REGEXP_COUNT( TARGET, 'b' ) as pattern_count from targets ;
TARGET |
PATTERN_COUNT |
bbb
xx |
3 |
a b c |
1 |
aaa |
0 |
b b b b |
4 |
defxdefdef |
0 |
wdef def w |
0 |
(null)
|
(null)
|
This query
finds the total occurrences of the letters "a", "c", or
"f" in the target strings.
Select target , REGEXP_COUNT( TARGET, '[acf]' ) as
pattern_count from targets ;
TARGET |
PATTERN_COUNT |
aaa |
3 |
a b c |
2 |
b b b b |
0 |
bbb
xx |
0 |
wdef def w |
2 |
defxdefdef |
3 |
(null)
|
(null)
|
To
calculate the number of times a string like "def" occurs we use a
query like this.
Select target , REGEXP_COUNT( TARGET, 'def' ) as
pattern_count from targets;
TARGET |
PATTERN_COUNT |
aaa |
0 |
a b c |
0 |
b b b b |
0 |
bbb
xx |
0 |
wdef def w |
2 |
defxdefdef |
3 |
(null)
|
(null)
|
To find
the number of words in a string we can use a query like this (for our purposes
we define a word as any series of characters that are not a space).
Select target , REGEXP_COUNT( TARGET, '[^ ]+' ) as pattern_count
from targets;
TARGET |
PATTERN_COUNT |
aaa |
1 |
a b c |
3 |
b b b b |
4 |
bbb
xx |
2 |
wdef def w |
3 |
defxdefdef |
1 |
(null)
|
(null)
|
To ensure
PATTERN_COUNT always returns a non-null value use NVL.
Select target , NVL( regexp_count( target, '[^ ]+' ), 0 ) as
pattern_count from targets;
TARGET |
PATTERN_COUNT |
aaa |
1 |
a b c |
3 |
b b b b |
4 |
bbb
xx |
2 |
wdef def w |
3 |
defxdefdef |
1 |
(null) |
0 |
To count the occurrences of the letter "b" in the target strings we can use a query like this (the column INTERMEDIATE_STRING is included to demonstrate the inner workings of each solution).
column intermediate_string format a12 heading "INTERMEDIATE|STRING"
select target , regexp_replace( target, '[^b]', null ) as intermediate_string ,
NVL( LENGTH(
REGEXP_REPLACE( TARGET, '[^b]', NULL ) ), 0 ) as pattern_count
From targets;
TARGET |
INTERMEDIATE_STRING |
PATTERN_COUNT |
aaa |
(null) |
0 |
a b c |
b |
1 |
b b b b |
bbbb |
4 |
bbb
xx |
bbb |
3 |
wdef def w |
(null) |
0 |
defxdefdef |
(null) |
0 |
(null) |
(null) |
0 |
Select target , regexp_replace( target, '[^acf]', null ) as intermediate_string ,
NVL( LENGTH(
REGEXP_REPLACE( TARGET, '[^acf]', NULL ) ), 0 ) as pattern_count
from targets;
TARGET |
INTERMEDIATE_STRING |
PATTERN_COUNT |
aaa |
aaa |
3 |
a b c |
ac |
2 |
b b b b |
(null) |
0 |
bbb
xx |
(null) |
0 |
wdef def w |
ff |
2 |
defxdefdef |
fff |
3 |
(null) |
(null) |
0 |
select target , regexp_replace( target, '(d)ef|.', '\1'
) as intermediate_string ,
NVL( LENGTH(
REGEXP_REPLACE( TARGET, '(d)ef|.', '\1' ) ), 0 ) as pattern_count
From targets;
TARGET |
INTERMEDIATE_STRING |
PATTERN_COUNT |
aaa |
(null) |
0 |
a b c |
(null) |
0 |
b b b b |
(null) |
0 |
bbb
xx |
(null) |
0 |
wdef def w |
dd |
2 |
defxdefdef |
ddd |
3 |
(null) |
(null) |
0 |
To find the number of words in a string we can use a query like this.
Select target , regexp_replace( target, ' *[^ ]+ *', 'w'
) as intermediate_string ,
NVL( LENGTH( REGEXP_REPLACE( TARGET, ' *[^ ]+ *', 'w' ) ), 0 )
as pattern_count
From targets;
TARGET |
INTERMEDIATE_STRING |
PATTERN_COUNT |
aaa |
w |
1 |
a b c |
www |
3 |
b b b b |
wwww |
4 |
bbb
xx |
ww |
2 |
wdef def w |
www |
3 |
defxdefdef |
w |
1 |
(null) |
(null) |
0 |
In any Oracle version we can use the LENGTH and REPLACE functions to count patterns, though the logic is slightly more complex than the techniques above.
To count the occurrences of the letter "b" in the target strings we can use a query like this.
column target_length heading "TARGET|LENGTH"
column intermediate_length heading "INTERMEDIATE|LENGTH"
select target , replace( target, 'b', null ) as intermediate_string ,
nvl( length( target ), 0
) as target_length
,
nvl( length( replace(
target, 'b', null ) ), 0 ) as intermediate_length ,
NVL( LENGTH( TARGET ), 0
) - NVL( LENGTH( REPLACE( TARGET, 'b',
NULL ) ), 0 )
as pattern_count
from targets;
TARGET |
INTERMEDIATE_STRING |
TARGET_LENGTH |
INTERMEDIATE_LENGTH |
PATTERN_COUNT |
aaa |
aaa |
3 |
3 |
0 |
a b c |
a c |
5 |
4 |
1 |
b b b b |
|
10 |
6 |
4 |
bbb
xx |
xx |
10 |
7 |
3 |
wdef def w |
wdef def w |
10 |
10 |
0 |
defxdefdef |
defxdefdef |
10 |
10 |
0 |
(null) |
(null) |
0 |
0 |
0 |
This query finds the total occurrences of the letters "a", "c", or "f" in the target strings.
Select target, translate(target, 'xacf', 'x') as intermediate_string, nvl(length( target ),0) as target_length , nvl( length( translate(
target, 'xacf', 'x' ) ), 0 ) as intermediate_length ,
NVL( LENGTH( TARGET ), 0
) - NVL( LENGTH( TRANSLATE( TARGET, 'xacf', 'x' ) ), 0 )
as pattern_count
from targets;
TARGET |
INTERMEDIATE_STRING |
TARGET_LENGTH |
INTERMEDIATE_LENGTH |
PATTERN_COUNT |
aaa |
(null) |
3 |
0 |
3 |
a b c |
b |
5 |
3 |
2 |
b b b b |
b b b b |
10 |
10 |
0 |
bbb
xx |
bbb
xx |
10 |
10 |
0 |
wdef def w |
wde de w |
10 |
8 |
2 |
defxdefdef |
dexdede |
10 |
7 |
3 |
(null) |
(null) |
0 |
0 |
0 |
To
calculate the number of times a string like "def" occurs we use a
query like this.
column
intermediate_string_1 format a12 heading "INTERMEDIATE|STRING 1"
column
intermediate_string_2 format a12 heading "INTERMEDIATE|STRING 2"
select target, replace(
target, 'def', 'd' ) as
intermediate_string_1 , replace (target,'def',
null) as intermediate_string_2, nvl(length(replace(target,'def','d'
)),0) as length_1 , nvl( length(
replace( target, 'def', null ) ), 0 ) as length_2 , NVL( LENGTH( REPLACE(
TARGET, 'def', 'd' ) ), 0 ) - NVL( LENGTH( REPLACE( TARGET, 'def', NULL ) ), 0
) as pattern_count
from targets;
TARGET |
INTERMEDIATE_STRING_1 |
INTERMEDIATE_STRING_2 |
LENGTH_1 |
LENGTH_2 |
PATTERN_COUNT |
aaa |
aaa |
aaa |
3 |
3 |
0 |
a b c |
a b c |
a b c |
5 |
5 |
0 |
b b b b |
b b b b |
b b b b |
10 |
10 |
0 |
bbb
xx |
bbb xx |
bbb
xx |
10 |
10 |
0 |
wdef def w |
wd d w |
w w |
6 |
4 |
2 |
defxdefdef |
dxdd |
x |
4 |
1 |
3 |
(null) |
(null) |
(null) |
0 |
0 |
0 |
column trimmed_length_plus_1 heading "TRIMMED LENGTH|PLUS 1"
select target , trimmed ,
replace( trimmed, ' ', null ) as intermediate_string ,
nvl( length( trimmed ) +
1, 0 ) as
trimmed_length_plus_1 ,
nvl( length( replace(
trimmed, ' ', null ) ), 0 ) as intermediate_length ,
NVL( LENGTH( TRIMMED ) +
1, 0 ) - NVL( LENGTH( REPLACE( TRIMMED,
' ', NULL ) ), 0 ) as pattern_count
from ( select target,
ltrim( rtrim(target) ) as trimmed from targets )
where nvl( ltrim( rtrim(
target ) ), 'x' ) not like '% %' ;
TARGET |
TRIMMED |
INTERMEDIATE_STRING |
TRIMMED_LENGTH_PLUS_1 |
INTERMEDIATE_LENGTH |
PATTERN_COUNT |
aaa |
aaa |
aaa |
4 |
3 |
1 |
a b c |
a b c |
abc |
6 |
3 |
3 |
bbb
xx |
bbb xx |
bbbxx |
7 |
5 |
2 |
wdef def w |
wdef def w |
wdefdefw |
11 |
8 |
3 |
defxdefdef |
defxdefdef |
defxdefdef |
11 |
10 |
1 |
(null) |
(null) |
(null) |
0 |
0 |
0 |
In Oracle
8i or later the LTRM and RTRIM functions can be replaced with a single TRIM
call.
select target , trimmed ,
replace( trimmed, ' ', null ) as intermediate_string ,
nvl( length( trimmed ) +
1, 0 ) as
trimmed_length_plus_1 ,
nvl( length( replace(
trimmed, ' ', null ) ), 0 ) as intermediate_length ,
nvl( length( trimmed ) +
1, 0 ) - nvl( length( replace( trimmed,
' ', null ) ), 0 ) as pattern_count
from ( select target,
TRIM( target ) as trimmed from targets )
Where nvl( TRIM( target
), 'x' ) not like '% %' ;
TARGET |
TRIMMED |
INTERMEDIATE_STRING |
TRIMMED_LENGTH_PLUS_1 |
INTERMEDIATE_LENGTH |
PATTERN_COUNT |
aaa |
aaa |
aaa |
4 |
3 |
1 |
a b c |
a b c |
abc |
6 |
3 |
3 |
bbb
xx |
bbb xx |
bbbxx |
7 |
5 |
2 |
wdef def w |
wdef def w |
wdefdefw |
11 |
8 |
3 |
defxdefdef |
defxdefdef |
defxdefdef |
11 |
10 |
1 |
(null) |
(null) |
(null) |
0 |
0 |
0 |
No comments :
Post a Comment