Knowledge Bank https://www.youtube.com/c/BhavamCreations: Regular Expressions 1

Oracle introduced regular expression functions in 10g version.
REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR. These functions are similar in functionality that of standard string functions INSTR, REPLACE and SUBSTR respectively but with more powerful capabilities of pattern matching. Using regular expression functions, we can search, validate, remove unwanted text and/or replace characters from the string. It can be used for any character data including CLOB and NCLOB as well.
Introduced in Oracle 11g the REGEXP_COUNT function greatly simplifies counting the number of times a pattern appears inside a string. The following examples demonstrate how to use REGEXP_COUNT with some simple patterns. For versions prior to 11g alternative logic that provides similar functionality is also presented.
REGEXP_COUNT functions returns the number of occurrences of the pattern in the source string. It takes four arguments.

• Source string: It is the source string from which we are going to find the occurrence of the pattern. This can be CLOB or NCLOB characters as well.

• Pattern: Is any string, pattern or text which will be found in the source string. Data type of pattern will be converted to data type of source string, if it is different than the source string.

• Position: It is a non-zero positive value from which search should begin.

• Matching parameter: which controls the behavior of the search? It takes one or more value from.

o I (ignore case)

o c – Case Sensitive

o m – Allows source string as multiple lines

o n – Allows period.

o x – Ignore white space characters

Now let us start with the some simple example. In following example, we are counting occurrence of ‘i’ in the source string. We are ignoring case in matching parameter.

Setup Regular Expression Data

Before proceeding however, the following code will set up some test data for us (see Setup if you want to create the SET_TARGET and ADD_TARGET procedures on your own test system).
Run the code on this page in SQL*Plus to create the sample tables, data, etc. used by the examples in this section.
Be sure to read Using SQL Snippets ™ before executing any of these setup steps.
=====================================================================

Setup for pattern matching examples

=====================================================================
create table targets ( target varchar2(15) );

create procedure set_target( p_target varchar2 ) as

begin

delete from targets ;

insert into targets values ( p_target ) ;

end;

create procedure add_target( p_target varchar2 ) as

begin

insert into targets values ( p_target );

end;

create table patterns ( pattern varchar2(25), sort_key number );

create procedure set_pattern( p_pattern varchar2 ) as

begin

delete from patterns;

insert into patterns values ( p_pattern, 1 );

end;

create procedure add_pattern( p_pattern varchar2 ) as

begin

insert into patterns

values

( p_pattern ,

( select count(*) + 1 from patterns )

);

end;

create view test_results as

select

pattern , target, case when regexp_like( target, pattern ) then 'Y' else 'N' end as match,

regexp_substr( target, pattern ) as matched_value ,

regexp_instr( target, pattern ) as position

from patterns, targets

order by patterns.sort_key, target;

column pattern format a15

column target format a10

column match format a5

column matched_value format a15

column modifier format a8

column replaced_target format a15

column position format 999

column substring format a15

break on pattern skip 1

set null '(null)'

set linesize 100

=====================================================================

Setup for the section on POSIX character classes

=====================================================================

create table character_class_map

( data varchar2(2) ,

symbol varchar2(3) ,

dec number ,

hex varchar2(4) ,

upper varchar2(1) ,

lower varchar2(1) ,

alpha varchar2(1) ,

digit varchar2(1) ,

xdigit varchar2(1) ,

alnum varchar2(1) ,

blank varchar2(1) ,

space varchar2(1) ,

cntrl varchar2(1) ,

punct varchar2(1) ,

graph varchar2(1) ,

print varchar2(1)

);

declare

v_data varchar2(2) ;

v_hex_value varchar2(4) ;

v_symbol varchar2(3) ;

begin

for i in 0 .. 255

loop

v_hex_value := to_char( i, 'FM000X' ) ;

v_data := unistr( '\' || v_hex_value ) ;

v_symbol :=

case i

when 0 then 'NUL' when 1 then 'SOH' when 2 then 'STX'

when 3 then 'ETX' when 4 then 'EOT' when 5 then 'ENQ'

when 6 then 'ACK' when 7 then 'BEL' when 8 then 'BS'

when 9 then 'TAB' when 10 then 'LF' when 11 then 'VT'

when 12 then 'FF' when 13 then 'CR' when 14 then 'SO'

when 15 then 'SI' when 16 then 'DLE' when 17 then 'DC1'

when 18 then 'DC2' when 19 then 'DC3' when 20 then 'DC4'

when 21 then 'NAK' when 22 then 'SYN' when 23 then 'ETB'

when 24 then 'CAN' when 25 then 'EM' when 26 then 'SUB'

when 27 then 'ESC' when 28 then 'FS' when 29 then 'GS'

when 30 then 'RS' when 31 then 'US' when 127 then 'DEL'

when 128 then 'XXX' when 129 then 'XXX' when 130 then 'BPH'

when 131 then 'NBH' when 132 then 'IND' when 133 then 'NEL'

when 134 then 'SSA' when 135 then 'ESA' when 136 then 'HTS'

when 137 then 'HTJ' when 138 then 'VTS' when 139 then 'PLD'

when 140 then 'PLU' when 141 then 'RI' when 142 then 'SS2'

when 143 then 'SS3' when 144 then 'DCS' when 145 then 'PU1'

when 146 then 'PU2' when 147 then 'STS' when 148 then 'CCH'

when 149 then 'MW' when 150 then 'SPA' when 151 then 'EPA'

when 152 then 'SOS' when 153 then 'XXX' when 154 then 'SCI'

when 155 then 'CSI' when 156 then 'ST' when 157 then 'OSC'

when 158 then 'PM' when 159 then 'APC' when 173 then 'SHY'

else v_data

end

;

insert into character_class_map (data , symbol , dec , hex )

values ( v_data , v_symbol , i , v_hex_value );

end loop;

end;

update character_class_map set upper = 'Y' where regexp_like( data, '[[:upper:]]' );

update character_class_map set lower = 'Y' where regexp_like( data, '[[:lower:]]' );

update character_class_map set alpha = 'Y' where regexp_like( data, '[[:alpha:]]' );

update character_class_map set digit = 'Y' where regexp_like( data, '[[:digit:]]' );

update character_class_map set xdigit= 'Y' where regexp_like( data, '[[:xdigit:]]');

update character_class_map set alnum = 'Y' where regexp_like( data, '[[:alnum:]]' );

update character_class_map set blank = 'Y' where regexp_like( data, '[[:blank:]]' );

update character_class_map set space = 'Y' where regexp_like( data, '[[:space:]]' );

update character_class_map set cntrl = 'Y' where regexp_like( data, '[[:cntrl:]]' );

update character_class_map set punct = 'Y' where regexp_like( data, '[[:punct:]]' );

update character_class_map set graph = 'Y' where regexp_like( data, '[[:graph:]]' );

update character_class_map set print = 'Y' where regexp_like( data, '[[:print:]]' );

commit;

column symbol format a6

column dec format 999

column hex format a4

column upper format a5

column lower format a5

column alpha format a5

column digit format a5

column xdigit format a6

column alnum format a5

column blank format a5

column space format a5

column cntrl format a5

column punct format a5

column graph format a5

column print format a5

execute set_target( 'aaa' )

execute add_target( 'a b c' )

execute add_target( 'b b b b' )

execute add_target( ' bbb xx ' )

execute add_target( 'wdef def w' )

execute add_target( 'defxdefdef' )

execute add_target( null )

set null "(null)"

column target heading "TARGET|STRING"

column pattern_count heading "PATTERN|COUNT"

REGEXP_COUNT

SQL> SELECT REGEXP_COUNT(‘Decipher Information System’,’i’,1,’i’) FROM DUAL;

REGEXP_COUNT(‘DECIPHERINFORMATIONSYSTEM’,’I’,1,’I’)
—————————————————
3
There are 3 incidents of the ‘i’ in the string. If we change the match parameter to ‘c’, occurrence will be only 2.

SQL> SELECT REGEXP_COUNT('Decipher Information System','i',1,'c') FROM DUAL;

REGEXP_COUNT(‘DECIPHERINFORMATIONSYSTEM’,’I’,1,’C’)
—————————————————
2
Following is the example, where we are searching for pattern starting from the position 3.

SQL> SELECT REGEXP_COUNT('Cat in the hat is on the bat.','at',4) FROM DUAL;

REGEXP_COUNT(‘CATINTHEHATISONTHEBAT.’,’AT’,4)
———————————————
2
This function really helps to identify number of occurrences of the specific pattern in the source string. It reduces the long PL/SQL code to identify number of occurrences into one single function call.

We can start counting from a specified position as shown under Select REGEXP_COUNT

('We
are now learning the

REGEXP_COUNT

function
of Oracle 11g.more at www.beyondrelational.com', 'r',10) As

Result From Dual;

We have started counting from position 10 till the end of the string. That means the operation will happen inside 'learning the REGEXP_COUNT function of Oracle 11g' test and hence 3 matches found. So the count is 3.
We can even make it case sensitive or insensitive by specifying the Match Option as under

Select REGEXP_COUNT ('We are now learning the regexp_count function of Oracle 11g.more at www.beyondrelational.com', 'R',10) As CaseSensitiveResult

,REGEXP_COUNT ('We are now learning the regexp_count function of Oracle 11g.more at www.beyondrelational.com', 'R',10,'i') As CaseInSensitiveResult

From Dual;

We can even find occurrences of multiple characters. We can identify digits using "\d" or "[0-9]" and the "{4}" operator signifies 4 of them, so using "\d{4}" or "[0-9]{4}" with the REGEXP_COUNT function seems to be a valid option.

WITH T1 AS

(SELECT '1234' DATA FROM DUAL

UNION ALL

SELECT '1234 1234' FROM DUAL

UNION ALL

SELECT '1234 1234 1234' FROM DUAL

)SELECT REGEXP_COUNT(data, '[0-9]{4}') AS pattern_count_1,

REGEXP_COUNT(data, '\d{4}') AS pattern_count_2 FROM T1

Count String Pattern Occurrences (Oracle 11g)

To count the occurrences of the letter "b" in the target strings we can use a query like this.

Select target , REGEXP_COUNT( TARGET, 'b' ) as pattern_count from targets ;

TARGET	PATTERN_COUNT
bbb xx	3
a b c	1
aaa	0
b b b b	4
defxdefdef	0
wdef def w	0
(null)	(null)

This query finds the total occurrences of the letters "a", "c", or "f" in the target strings.

Select target , REGEXP_COUNT( TARGET, '[acf]' ) as pattern_count from targets ;

TARGET	PATTERN_COUNT
aaa	3
a b c	2
b b b b	0
bbb xx	0
wdef def w	2
defxdefdef	3
(null)	(null)

To calculate the number of times a string like "def" occurs we use a query like this.

Select target , REGEXP_COUNT( TARGET, 'def' ) as pattern_count from targets;

TARGET	PATTERN_COUNT
aaa	0
a b c	0
b b b b	0
bbb xx	0
wdef def w	2
defxdefdef	3
(null)	(null)

To find the number of words in a string we can use a query like this (for our purposes we define a word as any series of characters that are not a space).

Select target , REGEXP_COUNT( TARGET, '[^ ]+' ) as pattern_count from targets;

TARGET	PATTERN_COUNT
aaa	1
a b c	3
b b b b	4
bbb xx	2
wdef def w	3
defxdefdef	1
(null)	(null)

To ensure PATTERN_COUNT always returns a non-null value use NVL.

Select target , NVL( regexp_count( target, '[^ ]+' ), 0 ) as pattern_count from targets;

TARGET	PATTERN_COUNT
aaa	1
a b c	3
b b b b	4
bbb xx	2
wdef def w	3
defxdefdef	1
(null)	0

While Oracle 10g supports most of the regular expression features described in this section it unfortunately does not support REGEXP_COUNT. However, we can use the LENGTH and REGEXP_REPLACE functions to produce similar results.
To count the occurrences of the letter "b" in the target strings we can use a query like this (the column INTERMEDIATE_STRING is included to demonstrate the inner workings of each solution).
column intermediate_string format a12 heading "INTERMEDIATE|STRING"

select target , regexp_replace( target, '[^b]', null ) as intermediate_string ,

NVL( LENGTH( REGEXP_REPLACE( TARGET, '[^b]', NULL ) ), 0 ) as pattern_count

From targets;

TARGET	INTERMEDIATE_STRING	PATTERN_COUNT
aaa	(null)	0
a b c	b	1
b b b b	bbbb	4
bbb xx	bbb	3
wdef def w	(null)	0
defxdefdef	(null)	0
(null)	(null)	0

This query finds the total occurrences of the letters "a", "c", or "f" in the target strings.

Select target , regexp_replace( target, '[^acf]', null ) as intermediate_string ,

NVL( LENGTH( REGEXP_REPLACE( TARGET, '[^acf]', NULL ) ), 0 ) as pattern_count

from targets;

TARGET	INTERMEDIATE_STRING	PATTERN_COUNT
aaa	aaa	3
a b c	ac	2
b b b b	(null)	0
bbb xx	(null)	0
wdef def w	ff	2
defxdefdef	fff	3
(null)	(null)	0

To calculate the number of times a string like "def" occurs we use a query like this.

select target , regexp_replace( target, '(d)ef|.', '\1' ) as intermediate_string ,

NVL( LENGTH( REGEXP_REPLACE( TARGET, '(d)ef|.', '\1' ) ), 0 ) as pattern_count

From targets;

TARGET	INTERMEDIATE_STRING	PATTERN_COUNT
aaa	(null)	0
a b c	(null)	0
b b b b	(null)	0
bbb xx	(null)	0
wdef def w	dd	2
defxdefdef	ddd	3
(null)	(null)	0

(I found this last technique in this OTN Discussion Forum thread by cd after reading this blog post by Volder.)
To find the number of words in a string we can use a query like this.

Select target , regexp_replace( target, ' *[^ ]+ *', 'w' ) as intermediate_string ,

NVL( LENGTH( REGEXP_REPLACE( TARGET, ' *[^ ]+ *', 'w' ) ), 0 ) as pattern_count

From targets;

TARGET	INTERMEDIATE_STRING	PATTERN_COUNT
aaa	w	1
a b c	www	3
b b b b	wwww	4
bbb xx	ww	2
wdef def w	www	3
defxdefdef	w	1
(null)	(null)	0

Count String Pattern Occurrences (Any Oracle Version)
In any Oracle version we can use the LENGTH and REPLACE functions to count patterns, though the logic is slightly more complex than the techniques above.
To count the occurrences of the letter "b" in the target strings we can use a query like this.
column target_length heading "TARGET|LENGTH"
column intermediate_length heading "INTERMEDIATE|LENGTH"

select target , replace( target, 'b', null ) as intermediate_string ,

nvl( length( target ), 0 ) as target_length ,

nvl( length( replace( target, 'b', null ) ), 0 ) as intermediate_length ,

NVL( LENGTH( TARGET ), 0 ) - NVL( LENGTH( REPLACE( TARGET, 'b', NULL ) ), 0 )

as pattern_count

from targets;

TARGET	INTERMEDIATE_STRING	TARGET_LENGTH	INTERMEDIATE_LENGTH	PATTERN_COUNT
aaa	aaa	3	3	0
a b c	a c	5	4	1
b b b b		10	6	4
bbb xx	xx	10	7	3
wdef def w	wdef def w	10	10	0
defxdefdef	defxdefdef	10	10	0
(null)	(null)	0	0	0

(The INTERMEDIATE_STRING value in the third row is six spaces. In the fourth row it is three spaces followed by "xx" followed by two spaces.)
This query finds the total occurrences of the letters "a", "c", or "f" in the target strings.

Select target, translate(target, 'xacf', 'x') as intermediate_string, nvl(length( target ),0) as target_length , nvl( length( translate( target, 'xacf', 'x' ) ), 0 ) as intermediate_length ,

NVL( LENGTH( TARGET ), 0 ) - NVL( LENGTH( TRANSLATE( TARGET, 'xacf', 'x' ) ), 0 )

as pattern_count

from targets;

TARGET	INTERMEDIATE_STRING	TARGET_LENGTH	INTERMEDIATE_LENGTH	PATTERN_COUNT
aaa	(null)	3	0	3
a b c	b	5	3	2
b b b b	b b b b	10	10	0
bbb xx	bbb xx	10	10	0
wdef def w	wde de w	10	8	2
defxdefdef	dexdede	10	7	3
(null)	(null)	0	0	0

To calculate the number of times a string like "def" occurs we use a query like this.

column intermediate_string_1 format a12 heading "INTERMEDIATE|STRING 1"

column intermediate_string_2 format a12 heading "INTERMEDIATE|STRING 2"

select target, replace( target, 'def', 'd' ) as intermediate_string_1 , replace (target,'def', null) as intermediate_string_2, nvl(length(replace(target,'def','d' )),0) as length_1 , nvl( length( replace( target, 'def', null ) ), 0 ) as length_2 , NVL( LENGTH( REPLACE( TARGET, 'def', 'd' ) ), 0 ) - NVL( LENGTH( REPLACE( TARGET, 'def', NULL ) ), 0 ) as pattern_count

from targets;

TARGET	INTERMEDIATE_STRING_1	INTERMEDIATE_STRING_2	LENGTH_1	LENGTH_2	PATTERN_COUNT
aaa	aaa	aaa	3	3	0
a b c	a b c	a b c	5	5	0
b b b b	b b b b	b b b b	10	10	0
bbb xx	bbb xx	bbb xx	10	10	0
wdef def w	wd d w	w w	6	4	2
defxdefdef	dxdd	x	4	1	3
(null)	(null)	(null)	0	0	0

To find the number of words in a string we can use a query like this. Note the query will not work for target strings containing more than one space between words.
column trimmed_length_plus_1 heading "TRIMMED LENGTH|PLUS 1"

select target , trimmed , replace( trimmed, ' ', null ) as intermediate_string ,

nvl( length( trimmed ) + 1, 0 ) as trimmed_length_plus_1 ,

nvl( length( replace( trimmed, ' ', null ) ), 0 ) as intermediate_length ,

NVL( LENGTH( TRIMMED ) + 1, 0 ) - NVL( LENGTH( REPLACE( TRIMMED, ' ', NULL ) ), 0 ) as pattern_count

from ( select target, ltrim( rtrim(target) ) as trimmed from targets )

where nvl( ltrim( rtrim( target ) ), 'x' ) not like '% %' ;

TARGET	TRIMMED	INTERMEDIATE_STRING	TRIMMED_LENGTH_PLUS_1	INTERMEDIATE_LENGTH	PATTERN_COUNT
aaa	aaa	aaa	4	3	1
a b c	a b c	abc	6	3	3
bbb xx	bbb xx	bbbxx	7	5	2
wdef def w	wdef def w	wdefdefw	11	8	3
defxdefdef	defxdefdef	defxdefdef	11	10	1
(null)	(null)	(null)	0	0	0

In Oracle 8i or later the LTRM and RTRIM functions can be replaced with a single TRIM call.

select target , trimmed , replace( trimmed, ' ', null ) as intermediate_string ,

nvl( length( trimmed ) + 1, 0 ) as trimmed_length_plus_1 ,

nvl( length( replace( trimmed, ' ', null ) ), 0 ) as intermediate_length ,

nvl( length( trimmed ) + 1, 0 ) - nvl( length( replace( trimmed, ' ', null ) ), 0 ) as pattern_count

from ( select target, TRIM( target ) as trimmed from targets )

Where nvl( TRIM( target ), 'x' ) not like '% %' ;

TARGET	TRIMMED	INTERMEDIATE_STRING	TRIMMED_LENGTH_PLUS_1	INTERMEDIATE_LENGTH	PATTERN_COUNT
aaa	aaa	aaa	4	3	1
a b c	a b c	abc	6	3	3
bbb xx	bbb xx	bbbxx	7	5	2
wdef def w	wdef def w	wdefdefw	11	8	3
defxdefdef	defxdefdef	defxdefdef	11	10	1
(null)	(null)	(null)	0	0	0

Discover Millions of Products your daily Need

Amazon::::Discover Millions of Products your daily Need

Amway Products in Hyderabad. All India Distributors (Certified Dietitian / Nutrition Advisor )

Amway Products in Kakinada. All India Distributors ( Certified Dietitian / Nutrition Advisor)

Friday, November 2, 2012

Regular Expressions 1

Setup Regular Expression Data

REGEXP_COUNT

No comments :

Post a Comment