Specifying WHAT to Match - Part 2
In this tutorial we continue examining ways to specify WHAT text to match by exploring POSIX character classes and equivalence classes.
Character Classes
We saw earlier how a range expression like '[a-z]' works well with the English alphabet but does not yield accurate results with other alphabets. POSIX character classes, on the other hand, are portable across languages.
With character classes you can match any one of a group of characters, like a lowercase character or a numeric digit, inside a character list pattern. Character class names take the format '[:class:]'. A character list containing a character class looks like '[[:class:]]'. "class" can be any one of the following terms.
Class  | Description
lower  | lowercase letters
upper  | uppercase letters
alpha  | all characters in the classes 'lower' and 'upper'
digit  | numeric digits
xdigit | hexadecimal digits
alnum  | all characters in the classes 'alpha' and 'digit'
blank  | space and tab characters
space  | white-space characters
cntrl  | control characters
punct  | punctuation characters
graph  | printable characters, not including the space character
print  | printable characters, including the space character
execute set_pattern( '[[:lower:]]' )
execute set_target( '+24' )
execute add_target( '-abc' )
execute add_target( '.1415' )
execute add_target( '@@@' )
execute add_target( 'ABC' )
execute add_target( 'àéîõü' )
select * from test_results ;
PATTERN     | TARGET | MATCH | MATCHED_VALUE | POSITION
[[:lower:]] | +24    | N     | (null)        | 0
[[:lower:]] | -abc   | Y     | a             | 2
[[:lower:]] | .1415  | N     | (null)        | 0
[[:lower:]] | @@@    | N     | (null)        | 0
[[:lower:]] | ABC    | N     | (null)        | 0
[[:lower:]] | …‚Œo   | Y     | o             | 4
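The set_pattern, set_target, and test_results helpers are presumably defined elsewhere in this series. As a rough, self-contained sketch of the same check, assuming an Oracle database where the built-in REGEXP_SUBSTR and REGEXP_INSTR functions are available, the query below applies '[[:lower:]]' to the ASCII targets used above. The inline view and column aliases are illustrative only and are not the author's code.

```sql
-- Assumption: Oracle 10g or later, where REGEXP_SUBSTR and REGEXP_INSTR are built in.
-- The inline view is a hypothetical stand-in for the tutorial's test targets.
select target,
       regexp_substr( target, '[[:lower:]]' ) as matched_value,  -- first lowercase letter, or null
       regexp_instr ( target, '[[:lower:]]' ) as position        -- its position, or 0 if none
from ( select '+24'   as target from dual union all
       select '-abc'  from dual union all
       select '.1415' from dual union all
       select '@@@'   from dual union all
       select 'ABC'   from dual ) ;
```

Only the '-abc' row should return a value here ('a' at position 2); the other rows contain no lowercase letter, so REGEXP_SUBSTR returns null and REGEXP_INSTR returns 0. The accented target is left out because its result can depend on the database character set and NLS settings.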
The 'digit' class includes all numeric digits.
execute set_pattern( '[[:digit:]]' )
select * from test_results ;
PATTERN     | TARGET | MATCH | MATCHED_VALUE | POSITION
[[:digit:]] | +24    | Y     | 2             | 2
[[:digit:]] | -abc   | N     | (null)        | 0
[[:digit:]] | .1415  | Y     | 1             | 2
[[:digit:]] | @@@    | N     | (null)        | 0
[[:digit:]] | ABC    | N     | (null)        | 0
[[:digit:]] | …‚Œo   | N     | (null)        | 0
To match any character in either the 'lower' or the 'digit' class, combine both classes in one character list like this.
execute set_pattern( '[[:lower:][:digit:]]' )
column pattern format a25
select * from test_results ;
PATTERN              | TARGET | MATCH | MATCHED_VALUE | POSITION
[[:lower:][:digit:]] | +24    | Y     | 2             | 2
[[:lower:][:digit:]] | -abc   | Y     | a             | 2
[[:lower:][:digit:]] | .1415  | Y     | 1             | 2
[[:lower:][:digit:]] | @@@    | N     | (null)        | 0
[[:lower:][:digit:]] | ABC    | N     | (null)        | 0
[[:lower:][:digit:]] | …‚Œo   | Y     | o             | 4
column pattern format a15
This example shows how to exclude all characters in both classes, so that only a character that is neither a lowercase letter nor a digit matches.
execute set_pattern( '[^[:lower:][:digit:]]' )
column pattern format a25
select * from test_results ;
PATTERN               | TARGET | MATCH | MATCHED_VALUE | POSITION
[^[:lower:][:digit:]] | +24    | Y     | +             | 1
[^[:lower:][:digit:]] | -abc   | Y     | -             | 1
[^[:lower:][:digit:]] | .1415  | Y     | .             | 1
[^[:lower:][:digit:]] | @@@    | Y     | @             | 1
[^[:lower:][:digit:]] | ABC    | Y     | A             | 1
[^[:lower:][:digit:]] | …‚Œo   | Y     | …             | 1
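Every target in the results above contains at least one character outside both classes, so each row matches. To see a case that does not match, here is a minimal sketch that uses Oracle's built-in REGEXP_SUBSTR and REGEXP_INSTR functions directly. The two targets 'abc123' and 'abc-123' are hypothetical values chosen for illustration; they are not part of the tutorial's test data.

```sql
-- Assumption: Oracle 10g or later, where REGEXP_SUBSTR and REGEXP_INSTR are built in.
-- The targets below are hypothetical examples, not the author's test data.
select target,
       regexp_substr( target, '[^[:lower:][:digit:]]' ) as matched_value,
       regexp_instr ( target, '[^[:lower:][:digit:]]' ) as position
from ( select 'abc123'  as target from dual union all  -- only lowercase letters and digits
       select 'abc-123' from dual ) ;                  -- contains one character outside both classes
```

The first target consists solely of lowercase letters and digits, so the negated list finds nothing (null and 0). In the second target the hyphen at position 4 is the first character outside both classes, so it is the matched value.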