Thursday, July 23, 2015

Regular Expressions 4

Specifying WHAT to Match - Part 2

In this tutorial we continue examining ways to specify WHAT text to match by exploring POSIX character classes and equivalence classes.
Character Classes
We saw earlier how a range expression like '[a-z]' works well with the English alphabet but does not yield accurate results with other alphabets. POSIX character classes, on the other hand, are portable between languages.
With character classes you can match any one of a group of characters, such as a lowercase letter or a numeric digit, inside a character list pattern. Character class names take the format '[:class:]', so a character list containing a character class looks like '[[:class:]]'. The "class" part can be any one of the following terms.
Class    Description
-------  --------------------------------------------------
lower    lowercase letters
upper    uppercase letters
alpha    all characters in the classes 'lower' and 'upper'
digit    numeric digits
xdigit   hexadecimal digits
alnum    all characters in the classes 'alpha' and 'digit'
blank    space and tab
space    white-space characters
cntrl    control characters
punct    punctuation characters
graph    printable characters, not including space
print    printable characters, including space
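As a quick way to try several of these classes without any supporting objects, the sketch below uses Oracle's built-in REGEXP_SUBSTR function against a single literal. The sample string 'Room 42b, Wing-C' and the column aliases are made up for illustration and do not come from the original examples.

```sql
-- Return the first character of the (hypothetical) sample string that
-- belongs to each class.
select regexp_substr( 'Room 42b, Wing-C', '[[:upper:]]' ) as first_upper  -- R
     , regexp_substr( 'Room 42b, Wing-C', '[[:digit:]]' ) as first_digit  -- 4
     , regexp_substr( 'Room 42b, Wing-C', '[[:punct:]]' ) as first_punct  -- ,
     , regexp_substr( 'Room 42b, Wing-C', '[[:space:]]' ) as first_space  -- a single space
from   dual ;
```

Each call returns only the first match because REGEXP_SUBSTR defaults to the first occurrence starting at position 1.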
This example shows how the class named 'lower' works, an alternative to using the range '[a-z]'.
execute set_pattern( '[[:lower:]]' )
execute set_target( '+24'   )
execute add_target( '-abc'  )
execute add_target( '.1415' )
execute add_target( '@@@'   )
execute add_target( 'ABC'   )
execute add_target( 'àéîõü' )

select * from test_results ;

PATTERN         TARGET     MATCH  MATCHED_VALUE  POSITION
--------------- ---------- -----  -------------  --------
[[:lower:]]     +24        N      (null)                0
[[:lower:]]     -abc       Y      a                     2
[[:lower:]]     .1415      N      (null)                0
[[:lower:]]     @@@        N      (null)                0
[[:lower:]]     ABC        N      (null)                0
[[:lower:]]     …‚Œo       Y      o                     4
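The set_pattern, set_target, and add_target procedures and the test_results view are not defined in this part of the series. For readers who want to reproduce the idea without them, here is a minimal sketch that relies only on Oracle's built-in REGEXP_LIKE, REGEXP_SUBSTR, and REGEXP_INSTR functions. The inline "targets" list is a hypothetical stand-in for the test harness, and the accented target is left out because its result depends on the database character set.

```sql
-- Hypothetical inline list of targets; not the author's test harness.
with targets as
  ( select '+24'   as target from dual union all
    select '-abc'  as target from dual union all
    select '.1415' as target from dual union all
    select '@@@'   as target from dual union all
    select 'ABC'   as target from dual
  )
select target
     , case when regexp_like( target, '[[:lower:]]' )
            then 'Y' else 'N'
       end                                      as match
     , regexp_substr( target, '[[:lower:]]' )   as matched_value  -- null when nothing matches
     , regexp_instr ( target, '[[:lower:]]' )   as position       -- 0 when nothing matches
from   targets ;
```

Under these assumptions only '-abc' should report a match, with 'a' found at position 2, which agrees with the results shown above.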

The 'digit' class includes all numeric digits.
execute set_pattern( '[[:digit:]]' )

select * from test_results ;

PATTERN         TARGET     MATCH  MATCHED_VALUE  POSITION
--------------- ---------- -----  -------------  --------
[[:digit:]]     +24        Y      2                     2
[[:digit:]]     -abc       N      (null)                0
[[:digit:]]     .1415      Y      1                     2
[[:digit:]]     @@@        N      (null)                0
[[:digit:]]     ABC        N      (null)                0
[[:digit:]]     …‚Œo       N      (null)                0

To match any character that belongs to either the 'lower' class or the 'digit' class, combine them in a single character list like this.

execute set_pattern( '[[:lower:][:digit:]]' )
column pattern format a25
select * from test_results ;

PATTERN                   TARGET     MATCH  MATCHED_VALUE  POSITION
------------------------- ---------- -----  -------------  --------
[[:lower:][:digit:]]      +24        Y      2                     2
[[:lower:][:digit:]]      -abc       Y      a                     2
[[:lower:][:digit:]]      .1415      Y      1                     2
[[:lower:][:digit:]]      @@@        N      (null)                0
[[:lower:][:digit:]]      ABC        N      (null)                0
[[:lower:][:digit:]]      …‚Œo       Y      o                     4

column pattern format a15
This example shows how to match any character that is in neither class by negating the list with a leading '^'.

execute set_pattern( '[^[:lower:][:digit:]]' )
column pattern format a25
select * from test_results ;

PATTERN                   TARGET     MATCH  MATCHED_VALUE  POSITION
------------------------- ---------- -----  -------------  --------
[^[:lower:][:digit:]]     +24        Y      +                     1
[^[:lower:][:digit:]]     -abc       Y      -                     1
[^[:lower:][:digit:]]     .1415      Y      .                     1
[^[:lower:][:digit:]]     @@@        Y      @                     1
[^[:lower:][:digit:]]     ABC        Y      A                     1
[^[:lower:][:digit:]]     …‚Œo       Y                            1
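One practical use for a negated list is stripping unwanted characters. The sketch below is an illustration only, not part of the original test harness: it passes the same pattern to Oracle's built-in REGEXP_REPLACE function with no replacement string, which removes every matching character. The column aliases are invented for this example.

```sql
-- Remove every character that is neither a lowercase letter nor a digit.
-- Omitting the replacement argument deletes each match.
select regexp_replace( '+24'  , '[^[:lower:][:digit:]]' ) as cleaned_1  -- 24
     , regexp_replace( '-abc' , '[^[:lower:][:digit:]]' ) as cleaned_2  -- abc
     , regexp_replace( '.1415', '[^[:lower:][:digit:]]' ) as cleaned_3  -- 1415
     , regexp_replace( 'ABC'  , '[^[:lower:][:digit:]]' ) as cleaned_4  -- null, every character removed
from   dual ;
```

Because Oracle treats an empty string as null, a target made up entirely of excluded characters comes back as null rather than as a zero-length string.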
