Column pattern format a15
The following table shows how the first 256 Unicode characters map to POSIX classes. (Note: If you run the following snippet in SQL*Plus some of the values under the SYMBOL column may appear different in your system from what is shown. You should not see any differences if you use iSQL*Plus or SQL Developer.)
set null '(null)'
Equivalence Classes
Specifying an equivalence class for 'à' yields the
same results as using 'a'.
execute set_pattern( '[[=à=]]' )
select * from test_results ;
You can combine equivalence classes like this.
execute set_pattern( '[[=a=][=c=]]' )
select * from test_results ;
Matching Strings with Two or More Characters
To matching a string with two or more characters simply use the string itself as the regular expression.
OR Conditions
and Grouping
Use the "|" character to specify an OR condition between strings. For example, the pattern "ab|ba" matches either an "ab" OR a "ba".
To avoid ambiguity, it helps to use "("
and ")" to group the terms of OR expressions. For example, does the
pattern "a|bc" mean
("a" OR "b") followed by "c" or does it mean "a" OR ("bc")?
Using parentheses removes any ambiguity, as we see in the next three examples.
Once you have grouped a subexpression you can
reference the string it matched by using a back reference in other sections of
the pattern. The back reference "\1" refers to the result of the
first bracketed subexpression in the pattern; "\2" refers to the
second, and so on up to "\9".
In this example, the back reference "\1" matches the string "abc".
Here, the backreference "\1" matches the
string "abc" in one case and "def" in another.
Ignoring
White Space
To ignore white space in a regular expression, use the pattern matching modifier "x" in REGEXP conditions and functions.
execute set_target( 'abc' )
Note that the "x" modifier ignores white
space only in the pattern, *not* the search target.
Excluding
Strings
The following table shows how the first 256 Unicode characters map to POSIX classes. (Note: If you run the following snippet in SQL*Plus some of the values under the SYMBOL column may appear different in your system from what is shown. You should not see any differences if you use iSQL*Plus or SQL Developer.)
set null
''
select symbol, dec, hex, cntrl, punct, graph, print , upper, lower, alpha,
alnum, digit, xdigit, blank, space
from character_class_map
order
by dec;
set null '(null)'
Equivalence Classes
To match
any character in the set of all its accented variations, use the equivalence
class syntax '[=character=]'.
This
example finds all accented variations of the character 'a'.
execute
set_pattern( '[[=a=]]' )
execute
set_target( 'a' )
execute
add_target( 'à' )
execute
add_target( 'á' )
execute
add_target( 'ã' )
execute
add_target( 'â' )
execute
add_target( 'ä' )
execute
add_target( 'å' )
execute
add_target( 'æ' )
execute
add_target( 'b' )
execute
add_target( 'c' )
execute
add_target( 'ç' )
select * from test_results ;
PATTERN |
TARGET |
MATCH |
MATCHED_VALUE |
POSITION |
[[=a=]] |
a |
Y |
a |
1 |
[[=a=]] |
a |
Y |
a |
1 |
[[=a=]] |
b |
N |
|
0 |
[[=a=]] |
c |
N |
|
0 |
[[=a=]] |
|
N |
|
0 |
[[=a=]] |
ƒ |
N |
|
0 |
[[=a=]] |
‘ |
N |
|
0 |
[[=a=]] |
„ |
N |
|
0 |
[[=a=]] |
† |
N |
|
0 |
[[=a=]] |
‡ |
N |
|
0 |
[[=a=]] |
… |
N |
|
0 |
execute set_pattern( '[[=à=]]' )
select * from test_results ;
PATTERN |
TARGET |
MATCH |
MATCHED_VALUE |
POSITION |
[[=…=]] |
a |
N |
|
0 |
[[=…=]] |
a |
N |
|
0 |
[[=…=]] |
b |
N |
|
0 |
[[=…=]] |
c |
N |
|
0 |
[[=…=]] |
|
N |
|
0 |
[[=…=]] |
ƒ |
N |
|
0 |
[[=…=]] |
‘ |
N |
|
0 |
[[=…=]] |
„ |
N |
|
0 |
[[=…=]] |
† |
N |
|
0 |
[[=…=]] |
‡ |
N |
|
0 |
[[=…=]] |
… |
Y |
… |
1 |
execute set_pattern( '[[=a=][=c=]]' )
select * from test_results ;
PATTERN |
TARGET |
MATCH |
MATCHED_VALUE |
POSITION |
[[=a=][=c=]] |
a |
Y |
a |
1 |
[[=a=][=c=]] |
a |
Y |
a |
1 |
[[=a=][=c=]] |
b |
N |
|
0 |
[[=a=][=c=]] |
c |
Y |
c |
1 |
[[=a=][=c=]] |
|
N |
|
0 |
[[=a=][=c=]] |
ƒ |
N |
|
0 |
[[=a=][=c=]] |
‘ |
N |
|
0 |
[[=a=][=c=]] |
„ |
N |
|
0 |
[[=a=][=c=]] |
† |
N |
|
0 |
[[=a=][=c=]] |
‡ |
N |
|
0 |
[[=a=][=c=]] |
… |
N |
|
0 |
Specifying WHAT to Match - Part 3
In this tutorial we continue examining ways to specify WHAT text to match. Patterns for strings of two or more characters is explored.Matching Strings with Two or More Characters
To matching a string with two or more characters simply use the string itself as the regular expression.
execute
set_pattern( 'bc' );
execute
set_target( 'aaa' );
execute
add_target( 'ab' );
execute
add_target( 'abc' );
execute
add_target( 'ac' );
execute
add_target( 'bcd' );
execute
add_target( 'cba' );
select *
from test_results ;
PATTERN |
TARGET |
MATCH |
MATCHED_VALUE |
POSITION |
bc |
aaa |
N |
|
0 |
bc |
ab |
N |
|
0 |
bc |
abc |
Y |
bc |
2 |
bc |
ac |
N |
|
0 |
bc |
bcd |
Y |
bc |
1 |
bc |
cba |
N |
|
0 |
Use the "|" character to specify an OR condition between strings. For example, the pattern "ab|ba" matches either an "ab" OR a "ba".
execute
set_pattern( 'ab|ba' );
select *
from test_results ;
PATTERN |
TARGET |
MATCH |
MATCHED_VALUE |
POSITION |
ab|ba |
aaa |
N |
|
0 |
ab|ba |
ab |
Y |
ab |
1 |
ab|ba |
abc |
Y |
ab |
1 |
ab|ba |
ac |
N |
|
0 |
ab|ba |
bcd |
N |
|
0 |
ab|ba |
cba |
Y |
ba |
2 |
("a" OR "b") followed by "c" or does it mean "a" OR ("bc")?
Using parentheses removes any ambiguity, as we see in the next three examples.
execute
set_pattern( 'a|bc' );
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
a|bc
|
aaa
|
Y
|
a
|
1
|
a|bc
|
ab
|
Y
|
a
|
1
|
a|bc
|
abc
|
Y
|
a
|
1
|
a|bc
|
ac
|
Y
|
a
|
1
|
a|bc
|
bcd
|
Y
|
bc
|
1
|
a|bc
|
cba
|
Y
|
a
|
3
|
execute
set_pattern( 'a|(bc)' );
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
a|(bc)
|
aaa
|
Y
|
a
|
1
|
a|(bc)
|
ab
|
Y
|
a
|
1
|
a|(bc)
|
abc
|
Y
|
a
|
1
|
a|(bc)
|
ac
|
Y
|
a
|
1
|
a|(bc)
|
bcd
|
Y
|
bc
|
1
|
a|(bc)
|
cba
|
Y
|
a
|
3
|
execute
set_pattern( '(a|b)c' );
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
(a|b)c
|
aaa
|
N
|
|
0
|
(a|b)c
|
ab
|
N
|
|
0
|
(a|b)c
|
abc
|
Y
|
bc
|
2
|
(a|b)c
|
ac
|
Y
|
ac
|
1
|
(a|b)c
|
bcd
|
Y
|
bc
|
1
|
(a|b)c
|
cba
|
N
|
|
0
|
In this example, the back reference "\1" matches the string "abc".
execute
set_pattern( '(abc),\1' );
execute
set_target( 'abc,' );
execute
add_target( 'abc,abc' );
execute
add_target( 'abc,def' );
execute
add_target( 'def,def' );
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
(abc),\1
|
abc,
|
N
|
|
0
|
(abc),\1
|
abc,abc
|
Y
|
abc,abc
|
1
|
(abc),\1
|
abc,def
|
N
|
|
0
|
(abc),\1
|
def,def
|
N
|
|
0
|
execute
set_pattern( '(...),\1' );
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
(...),\1
|
abc,
|
N
|
|
0
|
(...),\1
|
abc,abc
|
Y
|
abc,abc
|
1
|
(...),\1
|
abc,def
|
N
|
|
0
|
(...),\1
|
def,def
|
Y
|
def,def
|
1
|
To ignore white space in a regular expression, use the pattern matching modifier "x" in REGEXP conditions and functions.
execute set_target( 'abc' )
select 'abc' as pattern , null as modifier , target ,
regexp_substr( target, 'abc' ) as atched_value
from targets
union
select 'abc' as pattern , 'x' as modifier , target ,
regexp_substr( target, 'abc', 1, 1, 'x' ) as matched_value
from targets
union
select 'a b c' as pattern, null as modifier , target ,
regexp_substr( target, 'a b c' ) as atched_value
from targets
union
select 'a b c' as pattern, 'x' as modifier , target ,
regexp_substr( target, 'a b c', 1, 1, 'x' ) as matched_value
from targets
order
by 1 desc , 2 desc ,
3;
PATTERN
|
MODIFIER
|
TARGET
|
MATCHED_VALUE
|
abc
|
|
abc
|
abc
|
abc
|
x
|
abc
|
abc
|
a b c
|
|
abc
|
|
a b c
|
x
|
abc
|
abc
|
execute
set_target( 'a b c' )
select 'abc' as pattern , null as modifier , target ,
regexp_substr( target, 'abc' ) as matched_value
from targets
union
select 'abc' as pattern , 'x' as modifier , target ,
regexp_substr( target, 'abc', 1, 1, 'x' ) as matched_value
from targets
union
select 'a b c' as pattern, null as modifier , target ,
regexp_substr( target, 'a b c' ) as matched_value
from targets
union
select 'a b c' as pattern, 'x' as modifier , target ,
regexp_substr( target, 'a b c', 1, 1, 'x' ) as matched_value
from targets
order by 1 desc ,
2 desc , 3;
PATTERN
|
MODIFIER
|
TARGET
|
MATCHED_VALUE
|
abc
|
|
a b c
|
|
abc
|
x
|
a b c
|
|
a b c
|
|
a b c
|
a b c
|
a b c
|
x
|
a b c
|
|
Techniques
for excluding a string from a pattern match vary depending on the requirements.
To find all
targets that do not contain a string like "ab", use the
"not" condition.
execute
set_target( 'abb' );
execute
add_target( 'abc' );
execute
add_target( 'acc' );
select
target from targets where not regexp_like( target, 'ab' );
TARGET
----------
acc
To match a pattern like 'a..' where the second and
third characters after an "a" are anything other than "bc"
use two WHERE conditions.
select
target from targets where regexp_like( target, 'a..' ) and not regexp_like( target, 'abc' );
TARGET
----------
abb
acc
No comments :
Post a Comment