mb_ereg
(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)
mb_ereg — Regular expression match with multibyte support
Description
Executes the regular expression match with multibyte support.
Parameters
pattern
The search pattern.
string
The search string.
matches
If matches are found for parenthesized substrings of pattern and the function is called with the third argument matches, the matches will be stored in the elements of the array matches. If no matches are found, matches is set to an empty array. $matches[1] will contain the substring which starts at the first left parenthesis; $matches[2] will contain the substring starting at the second, and so on. $matches[0] will contain a copy of the complete string matched.
Return Values
Returns whether pattern matches string.
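As a rough illustration of the pattern, string, and matches parameters and of the return value, the sketch below extracts a date from a string that contains multibyte characters. It assumes the mbstring extension is built with multibyte regex (Oniguruma) support and that the file is saved as UTF-8; the subject string is made up. Casting the result to bool keeps the check valid both before and after the 8.0.0 return-value change.

```php
<?php
// Sketch only: assumes mbstring with multibyte regex support and a UTF-8 source file.
mb_regex_encoding('UTF-8');

$subject = '日付: 2024-05-17';   // example input (made up)
$found = (bool) mb_ereg('([0-9]{4})-([0-9]{2})-([0-9]{2})', $subject, $matches);

if ($found) {
    // $matches[0] holds the whole match; $matches[1]..[3] hold the parenthesized groups.
    printf("year=%s month=%s day=%s\n", $matches[1], $matches[2], $matches[3]);
    // prints: year=2024 month=05 day=17
}

// When nothing matches, $matches is set to an empty array (PHP 7.1.0 and later).
$none = (bool) mb_ereg('[0-9]{5}', $subject, $noMatches);
// $none is false and $noMatches is []
```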
Changelog
| Version | Description |
|---|---|
| 8.0.0 | This function now returns true on success. Previously, it returned the byte length of the matched string if a match for pattern was found in string and matches was passed. If the optional parameter matches was not passed, or the length of the matched string was 0, this function returned 1. |
| 7.1.0 | mb_ereg() now sets matches to an empty array if nothing matched. Formerly, matches was not modified in that case. |
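Because of the 8.0.0 change, the same successful call can return true, an int byte length, or 1 depending on the PHP version, so code that must run on both sides of that change is safest testing only for truthiness. A minimal sketch, assuming PHP 7.1 or later; mbRegexMatches() is a hypothetical helper name, not part of mbstring, and it does not catch the ValueError that PHP 8 throws for an empty pattern.

```php
<?php
// Hypothetical helper (not an mbstring function): normalizes mb_ereg()'s
// return value to a plain bool on both PHP 7.1+ and PHP 8.
function mbRegexMatches(string $pattern, string $subject, ?array &$matches = null): bool
{
    // PHP < 8.0.0: byte length of the match (or 1) on success, false on failure.
    // PHP >= 8.0.0: true on success, false on failure.
    // Casting to bool gives the same answer in both cases, because a
    // 0-byte match was reported as 1, never as 0.
    return (bool) mb_ereg($pattern, $subject, $matches);
}

var_dump(mbRegexMatches('b+', 'abbc', $m)); // bool(true); $m[0] is "bb"
var_dump(mbRegexMatches('^', 'abc'));       // bool(true) even though the match is 0 bytes long
var_dump(mbRegexMatches('z', 'abc', $m2));  // bool(false); $m2 is []
```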
Notes
Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
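To see the effect of the regex encoding, the sketch below sets it to UTF-8 and shows that "." consumes one whole character rather than one byte. The sample word and variable names are illustrative; the previous setting is saved and restored so the snippet does not leak its change into other code. It assumes mbstring with multibyte regex support and a UTF-8 source file.

```php
<?php
// Sketch: the regex encoding decides how mb_ereg() splits the subject into characters.
$previous = mb_regex_encoding();      // remember the current setting
mb_regex_encoding('UTF-8');

$subject = 'ñandú';                   // 5 characters, 7 bytes in UTF-8
$ok = (bool) mb_ereg('^(.)(.)', $subject, $m);
// With UTF-8 as the regex encoding, each "." consumes one whole character.
echo $m[1], ' ', $m[2], "\n";         // prints: ñ a

mb_regex_encoding($previous);         // restore the previous setting
```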
See Also
- mb_regex_encoding() - Set/Get character encoding for multibyte regex
- mb_eregi() - Regular expression match ignoring case with multibyte support
User Contributed Notes (12 notes)
One difference between preg_match() and mb_ereg() concerns a captured parenthesized subpattern that matches an empty string:
<?php
preg_match('/(abc)(.*)/', 'abc', $match);
var_dump($match);
mb_ereg('(abc)(.*)', 'abc', $match);
var_dump($match);
?>
array(3) {
[0]=>
string(3) "abc"
[1]=>
string(3) "abc"
[2]=>
string(0) "" // <-- "string"(0) "" : preg_match()
}
array(3) {
[0]=>
string(3) "abc"
[1]=>
string(3) "abc"
[2]=>
bool(false) // <-- "bool"(false) : mb_ereg()
}

The old link to the Oniguruma regex syntax no longer works; here is a working one:
https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt

Note that mb_ereg() does not support the \uFFFF Unicode syntax; it uses \x{FFFF} instead:
<?php
$text = 'Peter is a boy.'; // English
$text = 'بيتر هو صبي.'; // arabic
//$text = 'פיטר הוא ילד.'; // hebrew
mb_regex_encoding('UTF-8');
if(mb_ereg('[\x{0600}-\x{06FF}]', $text)) // arabic range
//if(mb_ereg('[\x{0590}-\x{05FF}]', $text)) // hebrew range
{
echo "Text has some Arabic/Hebrew characters.";
}
else
{
echo "Text doesn't have Arabic/Hebrew characters.";
}
?>

mb_ereg() cannot match more than 100,000 (100K) characters (characters, not bytes), whereas preg_match() can match over 1,000,000,000 (1G, if it fits within memory_limit).
Try this:
<?php
ini_set("memory_limit", "512M"); // <-- must be changed if you try 1G.
$length = 100000; // <-- 99999 is OK / 100000 is NG
$str = "";
for ($i=0; $i<$length; $i++):
$str .= "1"; // <-- same result if it is a multibyte character.
endfor;
if (mb_ereg('.*', $str)):
echo '<br><span style="background-color:lightgreen">OK!</span><br>memory_limit = '.ini_get("memory_limit").'<br>$length = '.$length;
else:
echo '<br><span style="background-color:orange">NG!</span><br>memory_limit = '.ini_get("memory_limit").'<br>$length = '.$length;
endif;
?>

If appending ".*" to the pattern makes mb_ereg() return false whereas a single "." returns true, suspect that the string is too long for pattern matching.
In this case, preg_match() returns true with ".*" appended, but adding "$" or "\z" as well makes it return false, as expected.

mb_ereg() with a named subpattern never captures non-named subpatterns
(an Oniguruma restriction).
<?php
$str = 'abcdefg';
$patternA = '\A(abcd)(.*)\z'; // both caught [1]abcd [2]efg
$patternB = '\A(abcd)(?<rest>.*)\z'; // non-named 'abcd' never caught
mb_ereg($patternA, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';
mb_ereg($patternB, $str, $match);
echo '<pre>'.print_r($match, true).'</pre>';
?>
Array
(
[0] => abcdefg
[1] => abcd
[2] => efg
)
Array
(
[0] => abcdefg
[1] => efg
[rest] => efg
)

<?php
# What mb_ereg() returns, and what it sets its 3rd argument to
# (just run this script)
function dump2str($var) {
ob_start();
var_dump($var);
$output = ob_get_contents();
ob_end_clean();
return $output;
}
# (PHP7)empty pattern returns bool(false) with Warning
# (PHP8)empty pattern throws ValueError
$emp_ptn = '';
try{
$emp_ptn.= dump2str(mb_ereg('', 'abcde'));
}catch(Exception | Error $e){
$emp_ptn.= get_class($e).'<br>';
$emp_ptn.= $e->getMessage();
$emp_ptn.= '<pre>'.$e->getTraceAsString().'</pre>';
}
echo
'PHP '.phpversion().'<br><br>'.
'# match<br>'.
dump2str(mb_ereg("bcd", "abcde")).
' : mb_ereg("bcd", "abcde")<br><br>'.
'# match with 3rd argument<br>'.
dump2str(mb_ereg("bcd", "abcde", $_3rd)).
' : mb_ereg("bcd", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.
'# match (0 byte)<br>'.
dump2str(mb_ereg("^", "abcde")).
' : mb_ereg("^", "abcde")<br><br>'.
'# match (0 byte) with 3rd argument<br>'.
dump2str(mb_ereg("^", "abcde", $_3rd)).
' : mb_ereg("^", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.
'# unmatch<br>'.
dump2str(mb_ereg("f", "abcde")).
' : mb_ereg("f", "abcde")<br><br>'.
'# unmatch with 3rd argument<br>'.
dump2str(mb_ereg("f", "abcde", $_3rd)).
' : mb_ereg("f", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>'.
'# empty pattern<br>'.
$emp_ptn.
' : mb_ereg("", "abcde")<br><br>'.
'# empty pattern with 3rd argument<br>'.
$emp_ptn.
' : mb_ereg("", "abcde", $_3rd) // '.dump2str($_3rd).'<br><br>';
?>

I hope this information is shown somewhere on php.net.
According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma", the bundled Oniguruma regex library version seems to be:
- 4.7.1 in PHP 5.3 through 5.4.45,
- 5.9.2 in PHP 5.5 through 7.1.16,
- 6.3.0 since PHP 7.2.

mb_ereg() seems unable to use named subpatterns.
preg_match() seems to be a substitute, but only for UTF-8 encoding.
<?php
$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*'; // "?P" causes "mbregex compile err" in PHP 5.3.5
if(mb_ereg($pattern, $text, $matches)){
echo '<pre>'.print_r($matches, true).'</pre>';
}else{
echo 'no match';
}
?>
This code ignores "?<name>" in $pattern and displays the following:
Array
(
[0] => multi_byte_string
[1] => string
)
Using
$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){
instead of the $pattern assignment and the if(mb_ereg(...)) line displays the following (in UTF-8 encoding):
Array
(
[0] => multi_byte_string
[name] => string
[1] => string
)

<?php
// in PHP_VERSION 7.1
// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('ab', '_ab_'); // [2 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('^', '_ab_'); // [0 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('ab', '__'); // [not match]
var_dump($int); // bool(false)
$int = mb_ereg('', '_ab_'); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
// Without 3rd argument, mb_ereg() returns either int(1) or bool(false).
// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
var_dump($int); // int(5)
var_dump($regs); // array(1) { [0]=> string(5) "abcde" }
$int = mb_ereg('ab', '_ab_', $regs); // [2 bytes match]
var_dump($int); // int(2)
var_dump($regs); // array(1) { [0]=> string(2) "ab" }
$int = mb_ereg('^', '_ab_', $regs); // [0 bytes match]
var_dump($int); // int(1)
var_dump($regs); // array(1) { [0]=> bool(false) }
$int = mb_ereg('ab', '__', $regs); // [not match]
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
$int = mb_ereg('', '_ab_', $regs); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
// With the 3rd argument, mb_ereg() returns either int(number of bytes matched; 1 for a 0-byte match) or bool(false),
// and the contents of the 3rd argument are a bit complicated.
?>

While it is hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (Ruby) is described here:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Hebrew regex tested on PHP 5, Ubuntu 8.04.
It seems to work fine without the mb_regex_encoding() lines (commented out).
It didn't seem to work with \uxxxx (also commented out).
<?php
// Example input; the original note read $this->current_line inside a class.
$current_line = 'שלום world';
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $current_line))
if(mb_ereg(".*([א-ת]).*", $current_line))
{
echo "has";
}
else
{
echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>
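Following the \x{FFFF} advice from the earlier note, a variant of this check can name the Hebrew letter range (U+05D0 to U+05EA) by code point, so the pattern itself is plain ASCII and does not depend on how the source file is encoded. This is a sketch assuming UTF-8 input, a UTF-8 regex encoding, and mbstring with multibyte regex support; hasHebrewLetters() is a hypothetical name.

```php
<?php
// Sketch: detect Hebrew letters (U+05D0..U+05EA) using the \x{...} escape.
// Assumes UTF-8 input; hasHebrewLetters() is a hypothetical helper name.
mb_regex_encoding('UTF-8');

function hasHebrewLetters(string $line): bool
{
    // Single-quoted so PHP passes \x{05D0} through to Oniguruma unchanged.
    return (bool) mb_ereg('[\x{05D0}-\x{05EA}]', $line);
}

foreach (['שלום world', 'hello world'] as $line) {
    echo $line, ': ', hasHebrewLetters($line) ? 'has' : "doesn't have", " Hebrew letters\n";
}
```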