Thursday, July 23, 2015

Regular Expressions 2

REGEXP_LIKE

The REGEXP_LIKE pattern matching condition is similar to the LIKE condition. Whereas LIKE only performs simple pattern matching using the wildcards "%" and "_", REGEXP_LIKE performs complex regular expression pattern matching allowing it to match a much greater range of string patterns than LIKE.
The following examples compare pattern matching with both LIKE and REGEXP_LIKE.
Before proceeding however, the following code will set up some test data for us (see Regular Expression Data Setup if you want to create the SET_TARGET and ADD_TARGET procedures on your own test system).
execute set_target( 'aaa'  )
execute add_target( 'abc'  )
execute add_target( 'bbb'  )
execute add_target( 'bcd'  )
execute add_target( 'cde'  )
execute add_target( 'def'  )
execute add_target( 'efg'  )

column target heading "TARGET|STRING"
LIKE vs REGEXP_LIKE
Matching an entire string is performed as follows.
select target select target
from   targets from   targets
where  target LIKE 'abc' ; where  REGEXP_LIKE( target, '^abc$' ) ;
TARGET STRING TARGET STRING
----------------------- ----------------------
abc abc
This is how to match a string anywhere in the target.
select target select target
from   targets from   targets
where  target LIKE '%bc%' ; where  REGEXP_LIKE( target, 'bc' ) ;
TARGET STRING TARGET STRING
----------------------- ----------------------
abc abc
bcd bcd
These examples show how to match a string at the start of the target.
select target select target
from   targets from   targets
where  target LIKE 'b%' ; where  REGEXP_LIKE( target, '^b' ) ;
TARGET STRING TARGET STRING
----------------------- ----------------------
bbb bbb
bcd bcd
These examples show how to match a string at the end of the target.
select target select target
from   targets from   targets
where  target LIKE '%b' ; where  REGEXP_LIKE( target, 'b$' ) ;
TARGET STRING TARGET STRING
----------------------- ----------------------
bbb bbb
To match any single characters do the following.
select target select target
from   targets from   targets
where  target LIKE 'a_c' ; where  REGEXP_LIKE( target, '^a.c$' ) ;
TARGET STRING TARGET STRING
----------------------- ----------------------
abc abc
One example of a complex pattern that REGEXP_LIKE can handle but a single, basic LIKE condition cannot (without using text manipulation functions) is a list of characters. In this example we search for strings that have either an "a", "c", or "f" as the second character of the string.
-- select target
-- not possible from   targets
-- where  REGEXP_LIKE( target, '^.[acf]' ) ;

TARGET STRING

-----------------------

aaa

bcd

efg

REGEXP_INSTR

The REGEXP_INSTR function is similar to the INSTR function except that it finds the location of a pattern of characters, specified with a regular expression, instead of finding the location of a simple string literal. The following examples compare searches using INSTR with those that use REGEXP_INSTR.
Before proceeding however, the following code will set up some test data for us (see Regular Expression Data Setup if you want to create the SET_TARGET and ADD_TARGET procedures on your own test system).
execute set_target( 'aaa'  )
execute add_target( 'abc'  )
execute add_target( 'bbb'  )
execute add_target( 'bcd'  )
execute add_target( 'cde'  )
execute add_target( 'def'  )
execute add_target( 'efg'  )

column target heading "TARGET|STRING"
INSTR vs REGEXP_INSTR
Matching a string literal is performed as follows.
select   target ,   INSTR( target, 'b' ) position from   targets
select   target ,    REGEXP_INSTR( target, 'b' ) position from   targets
TARGET
TARGET
STRING     POSITION
STRING     POSITION
---------- --------
---------- --------
aaa               0
aaa               0
abc               2
abc               2
bbb               1
bbb               1
bcd               1
bcd               1
cde               0
cde               0
def               0
def               0
efg               0
efg               0
Matching a complex pattern is possible with REGEXP_INSTR, but not INSTR (unless you use other text manipulation functions as well). The following example shows how to find the position of either the first "a", "c", or "f" character in the target.
--
select   target ,      REGEXP_INSTR( target, '[acf]' ) position from   targets
-- not possible
TARGET

STRING     POSITION

---------- --------

aaa               1

abc               1

bbb               0

bcd               2

cde               1

def               3

efg               2

REGEXP_SUBSTR

The REGEXP_SUBSTR function provides a superset of the functionality available with the SUBSTR function. While SUBSTR extracts a string from a specific location in the target, REGEXP_SUBSTR extracts a string that matches a given pattern, specified with a regular expression, from anywhere in the target. The following examples compare expressions using SUBSTR with those that use REGEXP_SUBSTR.
Before proceeding however, the following code will set up some test data for us (see Regular Expression Data Setup if you want to create the SET_TARGET and ADD_TARGET procedures on your own test system).
execute set_target( 'aaa'  )
execute add_target( 'abc'  )
execute add_target( 'bbb'  )
execute add_target( 'bcd'  )
execute add_target( 'cde'  )
execute add_target( 'def'  )
execute add_target( 'efg'  )

column target heading "TARGET|STRING"
SUBSTR vs REGEXP_SUBSTR
Extracting a string from a fixed position, e.g. characters 2 and 3, is performed as follows.

select   target , SUBSTR( target, 2, 2 ) substring from   targets
select   target ,      REGEXP_SUBSTR( target, '..', 2 ) substring from   targets
TARGET
TARGET
STRING     SUBSTRING
STRING     SUBSTRING
---------- ---------------
---------- ---------------
aaa        aa
aaa        aa
abc        bc
abc        bc
bbb        bb
bbb        bb
bcd        cd
bcd        cd
cde        de
cde        de
def        ef
def        ef
efg        fg
efg        fg
Matching a complex pattern is possible with REGEXP_SUBSTR, but not SUBSTR (unless you use other functions as well). The following example shows how to extract either the first "a", "c", or "f" character in the target.
--
select   target ,        REGEXP_SUBSTR( target, '^[acf]' ) substring from   targets
-- not possible
TARGET

STRING     SUBSTRING

---------- ---------------

aaa        a

abc        a

bbb        (null)

bcd        (null)

cde        c

def        (null)

efg        (null)

Pattern Matching

These patterns can be used in any of the REGEXP_ features described earlier in this section. For simplicity, pattern matching will be broken down into four different categories.
-          WHAT single piece of text are we trying to match?
-          HOW MANY repetitions of that text are we trying to match?
-          WHERE in the source string should we look for a match?
-          WHICH occurrence of the text do we want to match?
Note: When using regular expressions in your own code you will do so using the REGEXP_LIKE, REGEXP_INSTR, REGEXP_REPLACE, and REGEXP_SUBSTR features described in preceding tutorials. In many of the examples to follow you will not see these features used directly. Instead they are embedded in a view called TEST_RESULTS, which we will select from to see the effect of various pattern and target combinations (initialized with custom procedures like SET_PATTERN, SET_TARGET, and ADD_TARGET). This is done so that we can focus solely on pattern matching and not repeat syntax for features already presented.
The source code for the custom procedures and the view can be found in the Regular Expression Data Setup topic at the Beginning of this section. The TEST_RESULTS view contains the following columns.
Column Name Description
PATTERN
contains the regular expression for the pattern being tested
TARGET
contains the target string being examined
MATCH
indicates whether or not a match was found using REGEXP_LIKE
MATCHED_VALUE
contains the value within the TARGET that matched the PATTERN; the value is extracted using REGEXP_SUBSTR; if a match was not found or a match to an empty string was made MATCHED_VALUE will contain a null (indicated as '(null)')
POSITION
indicates the matched value's location within the TARGET as returned by REGEXP_INSTR

No comments :

Post a Comment