This page collects typical usage examples of the PHP method LanguageUtf8::stripForSearch. If you are wondering how LanguageUtf8::stripForSearch is used in practice, the examples selected here may help. You can also look at usage examples for the class that contains the method, LanguageUtf8.
Two code examples of LanguageUtf8::stripForSearch are shown below, sorted by popularity by default. Both are overrides that preprocess the string and then delegate to LanguageUtf8::stripForSearch.
Example 1: stripForSearch
function stripForSearch($string)
{
    $fname = "LanguageZh::stripForSearch";
    wfProfileIn($fname);
    // Eventually this should be a word segmentation;
    // for now just treat each character as a word.
    // The /e modifier was removed in PHP 7; a plain replacement
    // string prepends the same space to each multibyte sequence.
    $t = preg_replace('/([\xc0-\xff][\x80-\xbf]*)/', ' $1', $string);
    // Always convert to zh-cn before indexing. It is better to
    // use zh-cn for search, since conversion from Traditional to
    // Simplified is less ambiguous than the other way around.
    $t = $this->mConverter->autoConvert($t, 'zh-cn');
    $t = LanguageUtf8::stripForSearch($t);
    wfProfileOut($fname);
    return $t;
}
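The per-character splitting step in Example 1 depends on MediaWiki internals (wfProfileIn, mConverter), so it cannot be run on its own. The following is a minimal standalone sketch of just that step; the function name spaceMultibyteChars is hypothetical and not part of MediaWiki.

```php
<?php
// Hypothetical helper (not a MediaWiki function): put a space before every
// multibyte UTF-8 sequence so that each CJK character is treated as a word.
function spaceMultibyteChars(string $text): string
{
    // A lead byte 0xC0-0xFF followed by any continuation bytes 0x80-0xBF.
    // The pattern works on raw bytes, so the /u modifier is not used.
    $result = preg_replace('/([\xc0-\xff][\x80-\xbf]*)/', ' $1', $text);

    // preg_replace() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo spaceMultibyteChars("abc\u{4E2D}\u{6587}"), "\n"; // prints "abc 中 文"
```

ASCII text passes through unchanged, because only bytes at or above 0xC0 start a match.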
Example 2: stripForSearch
function stripForSearch($string)
{
    # MySQL fulltext index doesn't grok utf-8, so we
    # need to fold cases and convert to hex
    $s = $string;
    # Strip known punctuation ?
    #$s = preg_replace( '/\xe3\x80[\x80-\xbf]/', '', $s ); # U3000-303f
    # Space strings of like hiragana/katakana/kanji
    $hiragana = '(?:\\xe3(?:\\x81[\\x80-\\xbf]|\\x82[\\x80-\\x9f]))';
    # U3040-309f
    $katakana = '(?:\\xe3(?:\\x82[\\xa0-\\xbf]|\\x83[\\x80-\\xbf]))';
    # U30a0-30ff
    $kanji = '(?:\\xe3[\\x88-\\xbf][\\x80-\\xbf]'
        . '|[\\xe4-\\xe8][\\x80-\\xbf]{2}'
        . '|\\xe9[\\x80-\\xa5][\\x80-\\xbf]'
        . '|\\xe9\\xa6[\\x80-\\x99])';
    # U3200-9999 = \xe3\x88\x80-\xe9\xa6\x99
    $s = preg_replace("/({$hiragana}+|{$katakana}+|{$kanji}+)/", ' $1 ', $s);
    # Double-width roman characters: ff00-ff5f ~= 0020-007f
    # The /e modifier was removed in PHP 7, so use callbacks instead.
    $s = preg_replace_callback('/\\xef\\xbc([\\x80-\\xbf])/', function ($m) {
        return chr((ord($m[1]) & 0x3f) + 0x20);
    }, $s);
    # Note: \x80-\x99 covers U+FF40..U+FF59 only (up to fullwidth 'y')
    $s = preg_replace_callback('/\\xef\\xbd([\\x80-\\x99])/', function ($m) {
        return chr((ord($m[1]) & 0x3f) + 0x60);
    }, $s);
    # Do general case folding and UTF-8 armoring
    return LanguageUtf8::stripForSearch($s);
}
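The script-run spacing in Example 2 can be hard to picture from the byte patterns alone. The sketch below isolates that one step so it can be run without MediaWiki; the function name spaceJapaneseRuns is hypothetical, while the byte ranges are copied from the example.

```php
<?php
// Hypothetical helper (not a MediaWiki function): surround each run of
// hiragana, katakana, or kanji with spaces so the runs become separate words.
function spaceJapaneseRuns(string $text): string
{
    // UTF-8 byte patterns taken from Example 2.
    $hiragana = '(?:\xe3(?:\x81[\x80-\xbf]|\x82[\x80-\x9f]))'; // U+3040-309F
    $katakana = '(?:\xe3(?:\x82[\xa0-\xbf]|\x83[\x80-\xbf]))'; // U+30A0-30FF
    $kanji = '(?:\xe3[\x88-\xbf][\x80-\xbf]'
        . '|[\xe4-\xe8][\x80-\xbf]{2}'
        . '|\xe9[\x80-\xa5][\x80-\xbf]'
        . '|\xe9\xa6[\x80-\x99])';                             // U+3200-9999

    $result = preg_replace(
        "/({$hiragana}+|{$katakana}+|{$kanji}+)/",
        ' $1 ',
        $text
    );

    // preg_replace() returns null on error; fall back to the input.
    return $result ?? $text;
}

// Kanji run followed by a hiragana run: each run gets its own padding.
var_dump(spaceJapaneseRuns("\u{6F22}\u{5B57}\u{304B}\u{306A}"));
```

Because every run is padded on both sides, two adjacent runs end up separated by two spaces; this sketch leaves any whitespace cleanup to later processing.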