Thursday, July 23, 2015

Regular Expressions 5

Column pattern format a15 
The following table shows how the first 256 Unicode characters map to POSIX classes. (Note: If you run the following snippet in SQL*Plus some of the values under the SYMBOL column may appear different in your system from what is shown. You should not see any differences if you use iSQL*Plus or SQL Developer.)
set null ''
select  symbol, dec, hex,  cntrl, punct, graph, print , upper, lower, alpha, alnum, digit, xdigit, blank, space
from  character_class_map
order by  dec;



set null '(null)' 
Equivalence Classes
To match any character in the set of all its accented variations, use the equivalence class syntax '[=character=]'.  
This example finds all accented variations of the character 'a'.

execute set_pattern( '[[=a=]]' )
execute set_target( 'a' )
execute add_target( 'à' )
execute add_target( 'á' )
execute add_target( 'ã' )
execute add_target( 'â' )
execute add_target( 'ä' )
execute add_target( 'å' )
execute add_target( 'æ' )
execute add_target( 'b' )
execute add_target( 'c' )
execute add_target( 'ç' )
select * from test_results ;
PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
[[=a=]]
a
Y
a
1
[[=a=]]
a
Y
a
1
[[=a=]]
b
N

0
[[=a=]]
c
N

0
[[=a=]]

N

0
[[=a=]]
ƒ
N

0
[[=a=]]

N

0
[[=a=]]

N

0
[[=a=]]

N

0
[[=a=]]

N

0
[[=a=]]

N

0
Specifying an equivalence class for 'à' yields the same results as using 'a'.
execute set_pattern( '[[=à=]]' )
select * from test_results ;
 PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
[[=…=]]
a
N

0
[[=…=]]
a
N

0
[[=…=]]
b
N

0
[[=…=]]
c
N

0
[[=…=]]

N

0
[[=…=]]
ƒ
N

0
[[=…=]]

N

0
[[=…=]]

N

0
[[=…=]]

N

0
[[=…=]]

N

0
[[=…=]]

Y

1
You can combine equivalence classes like this.
execute set_pattern( '[[=a=][=c=]]' )
select * from test_results ; 
PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
[[=a=][=c=]]
a
Y
a
1
[[=a=][=c=]]
a
Y
a
1
[[=a=][=c=]]
b
N

0
[[=a=][=c=]]
c
Y
c
1
[[=a=][=c=]]

N

0
[[=a=][=c=]]
ƒ
N

0
[[=a=][=c=]]

N

0
[[=a=][=c=]]

N

0
[[=a=][=c=]]

N

0
[[=a=][=c=]]

N

0
[[=a=][=c=]]

N

0

 Specifying WHAT to Match - Part 3

In this tutorial we continue examining ways to specify WHAT text to match. Patterns for strings of two or more characters is explored.
Matching Strings with Two or More Characters
To matching a string with two or more characters simply use the string itself as the regular expression.
execute set_pattern( 'bc' );

execute set_target( 'aaa'  );
execute add_target( 'ab'   );
execute add_target( 'abc'  );
execute add_target( 'ac'   );
execute add_target( 'bcd'  );
execute add_target( 'cba'  );

select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
bc
aaa
N

0
bc
ab
N

0
bc
abc
Y
bc
2
bc
ac
N

0
bc
bcd
Y
bc
1
bc
cba
N

0
OR Conditions and Grouping
Use the "|" character to specify an OR condition between strings. For example, the pattern "ab|ba" matches either an "ab" OR a "ba".
execute set_pattern( 'ab|ba' );
select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
ab|ba
aaa
N

0
ab|ba
ab
Y
ab
1
ab|ba
abc
Y
ab
1
ab|ba
ac
N

0
ab|ba
bcd
N

0
ab|ba
cba
Y
ba
2
To avoid ambiguity, it helps to use "(" and ")" to group the terms of OR expressions. For example, does the pattern "a|bc" mean

("a" OR "b") followed by "c" or does it mean "a" OR ("bc")?

Using parentheses removes any ambiguity, as we see in the next three examples.

execute set_pattern( 'a|bc' );
select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
a|bc
aaa
Y
a
1
a|bc
ab
Y
a
1
a|bc
abc
Y
a
1
a|bc
ac
Y
a
1
a|bc
bcd
Y
bc
1
a|bc
cba
Y
a
3

execute set_pattern( 'a|(bc)' );
select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
a|(bc)
aaa
Y
a
1
a|(bc)
ab
Y
a
1
a|(bc)
abc
Y
a
1
a|(bc)
ac
Y
a
1
a|(bc)
bcd
Y
bc
1
a|(bc)
cba
Y
a
3

execute set_pattern( '(a|b)c' );
select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
(a|b)c
aaa
N

0
(a|b)c
ab
N

0
(a|b)c
abc
Y
bc
2
(a|b)c
ac
Y
ac
1
(a|b)c
bcd
Y
bc
1
(a|b)c
cba
N

0
Once you have grouped a subexpression you can reference the string it matched by using a back reference in other sections of the pattern. The back reference "\1" refers to the result of the first bracketed subexpression in the pattern; "\2" refers to the second, and so on up to "\9".
In this example, the back reference "\1" matches the string "abc".
execute set_pattern( '(abc),\1' );

execute set_target( 'abc,'    );
execute add_target( 'abc,abc' );
execute add_target( 'abc,def' );
execute add_target( 'def,def' );

select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
(abc),\1
abc,
N

0
(abc),\1
abc,abc
Y
abc,abc
1
(abc),\1
abc,def
N

0
(abc),\1
def,def
N

0
Here, the backreference "\1" matches the string "abc" in one case and "def" in another.
execute set_pattern( '(...),\1' );
select * from test_results ;

PATTERN
TARGET
MATCH
MATCHED_VALUE
POSITION
(...),\1
abc,
N

0
(...),\1
abc,abc
Y
abc,abc
1
(...),\1
abc,def
N

0
(...),\1
def,def
Y
def,def
1
Ignoring White Space
To ignore white space in a regular expression, use the pattern matching modifier "x" in REGEXP conditions and functions.
execute set_target( 'abc' )
select   'abc' as pattern ,  null as modifier ,  target ,  regexp_substr( target, 'abc' ) as atched_value
from  targets
union
select   'abc' as pattern ,  'x' as modifier ,  target ,  regexp_substr( target, 'abc', 1, 1, 'x' ) as matched_value
from  targets
union
select  'a b c' as pattern,  null as modifier ,  target ,  regexp_substr( target, 'a b c' ) as atched_value
from  targets
union
select  'a b c' as pattern,  'x' as modifier ,  target ,  regexp_substr( target, 'a b c', 1, 1, 'x' ) as matched_value
from  targets
order by  1 desc ,  2 desc ,  3;

 PATTERN
MODIFIER
TARGET
MATCHED_VALUE
abc

abc
abc
abc
x
abc
abc
a b c

abc

a b c
x
abc
abc
Note that the "x" modifier ignores white space only in the pattern, *not* the search target.
execute set_target( 'a b c' )
select  'abc' as pattern ,  null as modifier ,  target ,  regexp_substr( target, 'abc' ) as matched_value
from  targets
union
select  'abc' as pattern ,  'x' as modifier ,  target ,  regexp_substr( target, 'abc', 1, 1, 'x' ) as matched_value
from  targets
union
select  'a b c' as pattern,  null as modifier ,  target ,  regexp_substr( target, 'a b c' ) as matched_value
from  targets
union
select  'a b c' as pattern,  'x' as modifier ,  target ,  regexp_substr( target, 'a b c', 1, 1, 'x' ) as matched_value
from  targets
order by  1 desc ,  2 desc ,  3; 

PATTERN
MODIFIER
TARGET
MATCHED_VALUE
abc

a b c

abc
x
a b c

a b c

a b c
a b c
a b c
x
a b c

Excluding Strings
Techniques for excluding a string from a pattern match vary depending on the requirements.
To find all targets that do not contain a string like "ab", use the "not" condition.
execute set_target( 'abb'  );
execute add_target( 'abc'  );
execute add_target( 'acc'  );

select target from   targets where  not regexp_like( target, 'ab' );

TARGET
----------
acc 
To match a pattern like 'a..' where the second and third characters after an "a" are anything other than "bc" use two WHERE conditions.
select target from   targets where  regexp_like( target, 'a..' ) and    not regexp_like( target, 'abc' );

TARGET
----------
abb
acc 












































































































































































































































































































































No comments :

Post a Comment