Recently, I needed to filter out some instance paths from my UVM testbench hierarchy. I discovered that this can be done using regular expressions and that UVM already has a function called uvm_pkg::uvm_re_match(), which is a DPI-C function that makes use of the POSIX function regexec() to perform a string match.
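The uvm_re_match() function returns 0 if the regular expression matches and 1 if it does NOT match, following the regexec() convention of returning zero on success. Keep this inverted sense in mind when storing the result in a variable named match.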
This function is very easy to use. Here is an example which can be found on EDAPlayground:
module top;
  import uvm_pkg::*;

  bit match;
  string str = "abcdef.ghij[2]";
  string regex;

  initial begin
    // match - returns 0
    regex = "abcdef.ghij[[][2-7][]]";
    match = uvm_re_match(regex, str);
    printResult();

    // match - returns 0
    regex = "abcdef*";
    match = uvm_re_match(regex, str);
    printResult();

    // NO match - returns 1
    regex = "xyz";
    match = uvm_re_match(regex, str);
    printResult();
  end

  function void printResult();
    $display(" MATCH=", match, " when searching for regular expression:", regex, " inside string: ", str);
  endfunction
endmodule
OUTPUT:
 MATCH=0 when searching for regular expression:abcdef.ghij[[][2-7][]] inside string: abcdef.ghij[2]
 MATCH=0 when searching for regular expression:abcdef* inside string: abcdef.ghij[2]
 MATCH=1 when searching for regular expression:xyz inside string: abcdef.ghij[2]
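Because 0 means "match", code that tests the return value directly can read backwards. A minimal sketch of one way around this is a small wrapper; the name re_matches is hypothetical and not part of UVM:

```systemverilog
module top;
  import uvm_pkg::*;

  string path = "abcdef.ghij[2]";

  // Hypothetical wrapper (not part of UVM): returns 1 when str matches re,
  // hiding the regexec-style convention where 0 means "match".
  function automatic bit re_matches(string re, string str);
    return (uvm_re_match(re, str) == 0);
  endfunction

  initial begin
    if (re_matches("abcdef*", path))
      $display("'%s' matches abcdef*", path);
    if (!re_matches("xyz", path))
      $display("'%s' does not match xyz", path);
  end
endmodule
```

With a wrapper like this, call sites read as a plain boolean test and the 0/1 inversion lives in one place.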
So I started to use the uvm_pkg::uvm_re_match() function to match my class instances.
While playing with this function, I discovered some non-obvious behavior, which I thought I would share with you.
This is best illustrated using this example on EDAPlayground:
module top;
  import uvm_pkg::*;

  bit match;
  string str = "abcdef.ghij[2]";
  string regex;

  initial begin
    // case 1 - NO match
    regex = "abcdef.ghij[2]";
    $display("Case1:", regex);
    match = uvm_re_match(regex, str);
    $display(match);

    // case 2 - NO match
    regex = "abcdef.ghij\[2\]";
    $display("Case2:", regex);
    match = uvm_re_match(regex, str);
    $display(match);

    // case 3 - MATCHES
    regex = "abcdef.ghij\\[2\\]";
    $display("Case3:", regex);
    match = uvm_re_match(regex, str);
    $display(match);

    // case 4 - MATCHES
    regex = "abcdef.ghij[[]2[]]";
    $display("Case4:", regex);
    match = uvm_re_match(regex, str);
    $display(match);
  end
endmodule
OUTPUT:
Case1:abcdef.ghij[2]
1
Case2:abcdef.ghij[2]
1
Case3:abcdef.ghij\[2\]
0
Case4:abcdef.ghij[[]2[]]
0
“Case 1” is clearly a mistake: in a POSIX regular expression, [2] is a bracket expression that matches the single character 2, so the pattern expects ghij2 and never matches the literal [ and ] characters in the string. Here is a great website for testing the behavior of regular expressions on a sample text.
I expected “Case 2” to work because the bracket characters are escaped using \[ and \]. However, \ is also the escape character inside a SystemVerilog string literal, and the simulator consumes it before the string ever reaches uvm_re_match() (for more details see this stackoverflow question). The printed regex for “Case 2” confirms this: it contains no backslashes and is identical to “Case 1”. The backslash itself therefore has to be escaped with another \ character, as in “Case 3”.
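The effect of the string-literal escape can be checked without UVM at all. The following is a minimal sketch that prints the two literals and their lengths; the variable names are illustrative, and some simulators may warn about the unrecognized \[ escape:

```systemverilog
module top;
  // Same literals as Case 2 and Case 3.
  string single_bs = "abcdef.ghij\[2\]";    // \[ is not a defined escape sequence
  string double_bs = "abcdef.ghij\\[2\\]";  // \\ is the escape for one backslash

  initial begin
    // If the backslashes in single_bs are dropped by the simulator,
    // it ends up two characters shorter than double_bs.
    $display("single: %s (len=%0d)", single_bs, single_bs.len());
    $display("double: %s (len=%0d)", double_bs, double_bs.len());
  end
endmodule
```

Going by the “Case 2” output, the first line should show no backslashes, while the second keeps one backslash in front of each bracket.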
“Case 4” also works because it uses bracket expressions (character sets) instead of backslash escapes: [[] matches a literal [ and []] matches a literal ].
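When the string to match is an instance path known only at run time, the escaping from “Case 3” can be applied programmatically. The sketch below assumes a hypothetical helper called escape_re (not part of UVM) and an assumed list of metacharacters. It also relies on the regex library accepting \] the same way it does in “Case 3”:

```systemverilog
module top;
  import uvm_pkg::*;

  string path = "abcdef.ghij[2]";
  string re;

  // Hypothetical helper (not part of UVM): put a backslash in front of
  // every character that is special in a POSIX extended regex.
  function automatic string escape_re(string s);
    string meta   = ".[]()*+?{}|^$\\";  // assumed metacharacter list
    string result = "";
    for (int i = 0; i < s.len(); i++) begin
      bit is_meta = 0;
      for (int j = 0; j < meta.len(); j++)
        if (s.getc(i) == meta.getc(j)) is_meta = 1;
      // "\\" in the source is a single backslash in the resulting string
      if (is_meta) result = {result, "\\", s.substr(i, i)};
      else         result = {result, s.substr(i, i)};
    end
    return result;
  endfunction

  initial begin
    re = escape_re(path);
    $display("escaped regex: %s", re);  // abcdef\.ghij\[2\]
    if (uvm_re_match(re, path) == 0)
      $display("literal match");
  end
endmodule
```

Note that this also escapes the dot, which in the earlier examples was silently matching any character. The pattern is still unanchored, so it matches the path anywhere inside a longer string.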
uvm_re_match inside the UVM code
Note that the implementation of uvm_re_match() has two variants:
- The POSIX regular expression (default)
- The glob style
The implementation is chosen based on the DPI mode of the UVM library. DPI mode is selected whenever UVM_NO_DPI is not defined. If DPI mode is used, then the uvm_re_match function will use the POSIX implementation, otherwise it will use the glob style implementation, as can be seen below:
`ifdef UVM_NO_DPI
`define UVM_REGEX_NO_DPI
`endif
`ifndef UVM_REGEX_NO_DPI
import "DPI-C" context function int uvm_re_match(string re, string str);
import "DPI-C" context function void uvm_dump_re_cache();
import "DPI-C" context function string uvm_glob_to_re(string glob);
`else
// The Verilog only version does not match regular expressions,
// it only does glob style matching.
function int uvm_re_match(string re, string str);
//...code
endfunction
function void uvm_dump_re_cache();
endfunction
function string uvm_glob_to_re(string glob);
// code
endfunction
`endif
If your code defines UVM_NO_DPI or UVM_REGEX_NO_DPI, uvm_re_match() falls back to the SystemVerilog-only implementation, which does glob-style matching only. POSIX regular expressions such as the bracket expressions used above will then not work as expected.
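One way to see how the two styles relate is to pass a glob through uvm_glob_to_re(), which appears in the listing above, before matching. This is a minimal sketch with illustrative variable names. The exact form of the converted string depends on the UVM version and on whether DPI is enabled, so it is printed rather than assumed:

```systemverilog
module top;
  import uvm_pkg::*;

  string path = "abcdef.ghij[2]";
  string glob = "abcdef.*";
  string re;

  initial begin
    // Translate the glob into whatever pattern string uvm_re_match() expects.
    re = uvm_glob_to_re(glob);
    $display("glob '%s' -> pattern '%s'", glob, re);

    // uvm_re_match() returns 0 when the pattern matches.
    if (uvm_re_match(re, path) == 0)
      $display("'%s' matches", path);
    else
      $display("'%s' does not match", path);
  end
endmodule
```

Running this once with and once without UVM_NO_DPI defined is a quick way to confirm which implementation a given testbench is actually using.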
Conclusion
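When a regular expression needs the escape character \ inside a SystemVerilog string literal, write it as \\ so that a backslash actually reaches uvm_re_match(). Otherwise the simulator consumes it and the pattern might not do what you expect. Alternatively, use a bracket expression such as [[] and avoid the backslash altogether.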
Have you always done this? Please share your experience of using regular expressions in SystemVerilog.