Specifying WHAT to Match - Part 1
In this tutorial we begin examining ways to specify WHAT text to match by exploring patterns for individual characters.Matching Individual Characters
To match a single character string, e.g. the letter 'b', simply use it as the regular expression pattern.
execute
set_pattern( 'b' )
execute
set_target( 'aaa' )
execute
add_target( 'abc' )
execute
add_target( 'bbb' )
execute
add_target( 'bcd' )
execute
add_target( 'cde' )
execute
add_target( 'def' )
execute
add_target( 'efg' )
create or
replace view test_results as
select pattern , target, case
when regexp_like( target, pattern ) then 'Y' else 'N' end as match,
regexp_substr( target, pattern ) as
matched_value , regexp_instr( target,
pattern ) as position
from patterns,
targets
order
by patterns.sort_key, target;
column
pattern format a15
column
target format a10
column
match format a5
column
matched_value format a15
column
modifier format a8
column
replaced_target format a15
column
position format 999
column
substring format a15
break on
pattern skip 1
set
null '(null)'
set
linesize 100
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
b
|
aaa
|
N
|
|
0
|
b
|
abc
|
Y
|
b
|
2
|
b
|
bbb
|
Y
|
b
|
1
|
b
|
bcd
|
Y
|
b
|
1
|
b
|
cde
|
N
|
|
0
|
b
|
def
|
N
|
|
0
|
b
|
efg
|
N
|
|
0
|
To match
any one of a set of individual characters use character list syntax '[ ]'.
The
following example matches either 'a', 'c', or 'g'.
execute
set_pattern( '[acg]' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
[acg]
|
aaa
|
Y
|
a
|
1
|
[acg]
|
abc
|
Y
|
a
|
1
|
[acg]
|
bbb
|
N
|
|
0
|
[acg]
|
bcd
|
Y
|
c
|
2
|
[acg]
|
cde
|
Y
|
c
|
1
|
[acg]
|
def
|
N
|
|
0
|
[acg]
|
efg
|
Y
|
g
|
3
|
To exclude
characters place '^' after the '['.
execute
set_pattern( '[^acg]' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
[^acg]
|
aaa
|
N
|
|
0
|
[^acg]
|
abc
|
Y
|
b
|
2
|
[^acg]
|
bbb
|
Y
|
b
|
1
|
[^acg]
|
bcd
|
Y
|
b
|
1
|
[^acg]
|
cde
|
Y
|
d
|
2
|
[^acg]
|
def
|
Y
|
d
|
1
|
[^acg]
|
efg
|
Y
|
e
|
1
|
Matching
any one of a range of characters, e.g. 'a' through 'c', can be done with the
range operator '-'.
execute
set_pattern( '[a-c]' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
[a-c]
|
aaa
|
Y
|
a
|
1
|
[a-c]
|
abc
|
Y
|
a
|
1
|
[a-c]
|
bbb
|
Y
|
b
|
1
|
[a-c]
|
bcd
|
Y
|
b
|
1
|
[a-c]
|
cde
|
Y
|
c
|
1
|
[a-c]
|
def
|
N
|
|
0
|
[a-c]
|
efg
|
N
|
|
0
|
Be careful
when working with alphabets other than English since a range like '[a-z]' may
not include all characters in your alphabet.
execute
set_pattern( '[a-z]' )
execute
set_target( 'aeiou' )
execute
add_target( 'à éîõü' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
[a-z]
|
aeiou
|
Y
|
a
|
1
|
[a-z]
|
…‚Å’o
|
Y
|
o
|
4
|
POSIX
character classes and equivalence classes (discussed in WHAT to Match - 2)
offer a solution when working with alphabets other than English.
Case Insensitive Matching
By
default, pattern matching with regular expressions is case sensitive. To
perform case insensitive matches use the pattern matching modifier 'i' in
REGEXP conditions and functions.
First,
consider a case sensitive match.
execute
set_target( 'a' )
execute
add_target( 'b' )
execute
add_target( 'c' )
execute
add_target( 'A' )
execute
add_target( 'B' )
execute
add_target( 'C' )
select '[b]' as pattern, target ,
regexp_substr( target, '[b]' ) as matched_value from targets;
PATTERN
|
TARGET
|
MATCHED_VALUE
|
[b]
|
a
|
(null)
|
[b]
|
b
|
b
|
[b]
|
c
|
(null)
|
[b]
|
A
|
(null)
|
[b]
|
B
|
(null)
|
[b]
|
C
|
(null)
|
Now
consider the same pattern in a case insensitive match. The 'i' modifier is
specified in the fifth parameter of the REGEXP_SUBSTR function.
Select '[b]'
as pattern, target, regexp_substr( target, '[b]', 1, 1, 'i' ) as matched_value from
targets
PATTERN
|
TARGET
|
MATCHED_VALUE
|
[b]
|
a
|
(null)
|
[b]
|
b
|
b
|
[b]
|
c
|
(null)
|
[b]
|
A
|
(null)
|
[b]
|
B
|
B
|
[b]
|
C
|
(null)
|
Escaping Special Characters
Attempting
to match special characters, like '[', may cause problems if you fail to escape
them. The pattern 'a[b', for example, will not work.
execute
set_pattern( 'a[b' )
execute
set_target( 'abc' )
execute
add_target( 'a[b[c' )
execute
add_target( 'a\b\c' )
select *
from test_results ;
select *
from test_results
*
ERROR at
line 1:
ORA-12726:
unmatched bracket in regular expression
To escape
special characters, prefix them with the '\' character.
execute
set_pattern( 'a\[b' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
a\[b
|
a[b[c
|
Y
|
a[b
|
1
|
a\[b
|
a\b\c
|
N
|
(null)
|
0
|
a\[b
|
abc
|
N
|
(null)
|
0
|
execute
set_pattern( 'a\\b' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
a\\b
|
a[b[c
|
N
|
(null)
|
0
|
a\\b
|
a\b\c
|
Y
|
a\b
|
1
|
a\\b
|
abc
|
N
|
(null)
|
0
|
Wildcard Character
To match
any character at all, except null and the newline character, use the '.'
character.
execute
set_pattern( '.' );
execute
set_target( 'abc' )
execute
add_target( '#24' )
execute
add_target( '123' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
.
|
#24
|
Y
|
#
|
1
|
.
|
123
|
Y
|
1
|
1
|
.
|
abc
|
Y
|
a
|
1
|
Be careful
with nulls, nuls, line feeds, and carriage returns. '.' does not match them
all.
execute
set_pattern( '.' );
begin
set_target( chr(0) ); end; -- NUL
character, do not confuse with NULL
begin
add_target( chr(10) ); end; -- a Linefeed
begin
add_target( chr(13) ); end; -- a Carriage Return
begin
add_target( 'a' ); end;
begin
add_target( null ); end;
begin
add_target( '' ); end; -- an empty
string is equivalent to NULL
select pattern ,
ascii( target ) as target_ascii_code ,
match ,
ascii( matched_value ) as
matched_value_ascii_code , position
from test_results;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
.
|
0
|
Y
|
0
|
1
|
.
|
10
|
N
|
(null)
|
0
|
.
|
13
|
Y
|
13
|
1
|
.
|
97
|
Y
|
97
|
1
|
.
|
(null)
|
N
|
(null)
|
(null)
|
.
|
(null)
|
N
|
(null)
|
(null)
|
If you
need '.' to match the newline character you can use the 'n' modifier in REGEXP
conditions and functions.
Select '.' as pattern , ascii( target ) as target_ascii_code ,
case
when regexp_like( target, '.', 'n' ) then 'Y' else 'N'
end as match ,
regexp_instr( target, '.', 1, 1, 0, 'n' ) as
position
from targets;
PATTERN
|
TARGET_ASCII_CODE
|
MATCH
|
POSITION
|
.
|
0
|
Y
|
1
|
.
|
10
|
Y
|
1
|
.
|
13
|
Y
|
1
|
.
|
97
|
Y
|
1
|
.
|
(null)
|
N
|
(null)
|
.
|
(null)
|
N
|
(null)
|
Nulls
Using a
null for either a regular expression pattern or a search target always yields
an unknown result.
execute
set_pattern( null )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
(null)
|
|
N
|
(null)
|
(null)
|
(null)
|
|
N
|
(null)
|
(null)
|
(null)
|
|
N
|
(null)
|
(null)
|
(null)
|
a
|
N
|
(null)
|
(null)
|
(null)
|
(null)
|
N
|
(null)
|
(null)
|
(null)
|
(null)
|
N
|
(null)
|
(null)
|
execute
set_pattern( '.' )
execute
set_target( 'abc' )
execute
add_target( null )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
.
|
abc
|
Y
|
a
|
1
|
.
|
(null)
|
N
|
(null)
|
(null)
|
Gotchas
You cannot
specify multi-character strings in character lists. A pattern like 'a[(bc)]'
does NOT mean 'a' followed by the string 'bc'. It has the same meaning as
'a[bc]'.
execute
set_pattern( 'a[(bc)]' )
execute
set_target( 'abc' )
execute
add_target( 'ab' )
execute
add_target( 'ac' )
execute
add_target( 'bc' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
a[(bc)]
|
ab
|
Y
|
ab
|
1
|
a[(bc)]
|
abc
|
Y
|
ab
|
1
|
a[(bc)]
|
ac
|
Y
|
ac
|
1
|
a[(bc)]
|
bc
|
N
|
(null)
|
0
|
execute
set_pattern( 'a[bc]' )
select *
from test_results ;
PATTERN
|
TARGET
|
MATCH
|
MATCHED_VALUE
|
POSITION
|
a[bc)]
|
ab
|
Y
|
ab
|
1
|
a[bc)]
|
abc
|
Y
|
ab
|
1
|
a[bc)]
|
ac
|
Y
|
ac
|
1
|
a[bc)]
|
bc
|
N
|
(null)
|
0
|
You cannot
use 'not' to turn an expression with an unknown result into an expression with
a true result. For example, neither of the following two queries return any
rows, though some might expect the second one would.
set
feedback on
select * from dual where
regexp_like( null, 'abc' ) ;
no rows
selected
No comments :
Post a Comment