This article collects typical usage examples of the Java method com.hankcs.hanlp.seg.Segment.seg. If you are wondering how Segment.seg is used in practice, the curated examples below may help. You can also look further into usage examples of the enclosing class, com.hankcs.hanlp.seg.Segment.
The 15 code examples of Segment.seg shown below are sorted by popularity by default.
Example 1: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    String[] testCase = new String[]{
            "签约仪式前,秦光荣、李纪恒、仇和等一同会见了参加签约的企业家。",
            "区长庄木弟新年致辞",
            "朱立伦:两岸都希望共创双赢 习朱历史会晤在即",
            "陕西首富吴一坚被带走 与令计划妻子有交集",
            "据美国之音电台网站4月28日报道,8岁的凯瑟琳·克罗尔(凤甫娟)和很多华裔美国小朋友一样,小小年纪就开始学小提琴了。她的妈妈是位虎妈么?",
            "凯瑟琳和露西(庐瑞媛),跟她们的哥哥们有一些不同。",
            "王国强、高峰、汪洋、张朝阳光着头、韩寒、小四",
            "张浩和胡健康复员回家了",
            "王总和小丽结婚了",
            "编剧邵钧林和稽道青说",
            "这里有关天培的有关事迹",
            "龚学平等领导说,邓颖超生前杜绝超生",
    };
    // Switch on Chinese person-name recognition explicitly.
    Segment segment = HanLP.newSegment().enableNameRecognize(true);
    for (String sentence : testCase)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Example 2: testSpeedOfSecondViterbi
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.Viterbi.ViterbiSegment;
public void testSpeedOfSecondViterbi() throws Exception
{
    String text = "王总和小丽结婚了";
    Segment segment = new ViterbiSegment().enableAllNamedEntityRecognize(false)
            .enableNameRecognize(false) // person-name recognition needs a second Viterbi pass, which is slow
            .enableCustomDictionary(false);
    System.out.println(segment.seg(text));
    long start = System.currentTimeMillis();
    int pressure = 1000000;
    for (int i = 0; i < pressure; ++i)
    {
        segment.seg(text);
    }
    double costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("Segmentation speed: %.2f characters per second", text.length() * pressure / costTime);
}
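The timing loop above reads System.currentTimeMillis() around one million calls with no warm-up, so JIT compilation time is folded into the result. The following is a minimal sketch of the same measurement with a warm-up phase and System.nanoTime(). The class SpeedSketch, the method charsPerSecond, and the perCharacter stand-in for segment.seg are hypothetical names introduced for illustration; they are not part of HanLP, so the block runs without the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SpeedSketch {
    /** Returns characters processed per second for the given segmenter stand-in. */
    static double charsPerSecond(Function<String, List<String>> seg, String text, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) {
            seg.apply(text); // untimed calls so the JIT can compile the hot path first
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            seg.apply(text);
        }
        // Clamp to 1 ns so a very fast run can never divide by zero.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return (double) text.length() * rounds / (elapsedNanos / 1e9);
    }

    /** Hypothetical stand-in for segment.seg: one token per character. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>(text.length());
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        double speed = charsPerSecond(SpeedSketch::perCharacter, "王总和小丽结婚了", 10_000, 100_000);
        System.out.printf("Segmentation speed: %.2f characters per second%n", speed);
    }
}
```

To time a real segmenter, the stand-in could be replaced by a lambda that calls segment.seg and maps each Term to its word; that adaptation is left out here because it needs HanLP on the classpath.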
Example 3: testIssue193
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public void testIssue193() throws Exception
{
    String[] testCase = new String[]{
            "以每台约200元的价格送到苹果售后维修中心换新机(苹果的保修基本是免费换新机)",
            "可能以2500~2800元的价格回收",
            "3700个益农信息社打通服务“最后一公里”",
            "一位李先生给高政留言说上周五可以帮忙献血",
            "一位浩宁达高层透露",
            "五和万科长阳天地5个普宅项目",
            "以1974点低点和5178点高点作江恩角度线",
            "纳入统计的18家京系基金公司",
            "华夏基金与嘉实基金两家京系基金公司",
            "则应从排名第八的投标人开始依次递补三名投标人"
    };
    // Organization recognition plus number-quantifier merging.
    Segment segment = HanLP.newSegment().enableOrganizationRecognize(true).enableNumberQuantifierRecognize(true);
    for (String sentence : testCase)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Example 4: testSpeedOfSecondViterbi
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.Viterbi.ViterbiSegment;
public void testSpeedOfSecondViterbi() throws Exception
{
    String text = "王总和小丽结婚了";
    Segment segment = new ViterbiSegment().enableAllNamedEntityRecognize(false)
            .enableNameRecognize(false) // person-name recognition needs a second Viterbi pass, which is slow
            .enableCustomDictionary(false);
    System.out.println(segment.seg(text));
    long start = System.currentTimeMillis();
    int pressure = 1000000;
    for (int i = 0; i < pressure; ++i)
    {
        segment.seg(text);
    }
    double costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("Segmentation speed: %.2f characters per second", text.length() * pressure / costTime);
}
Example 5: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    String[] testCase = new String[]{
            "蓝翔给宁夏固原市彭阳县红河镇黑牛沟村捐赠了挖掘机",
    };
    // Switch on place-name recognition.
    Segment segment = HanLP.newSegment().enablePlaceRecognize(true);
    for (String sentence : testCase)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Example 6: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    String[] testCase = new String[]{
            "我在上海林原科技有限公司兼职工作,",
            "我经常在台川喜宴餐厅吃饭,",
            "偶尔去开元地中海影城看电影。",
    };
    // Switch on organization-name recognition.
    Segment segment = HanLP.newSegment().enableOrganizationRecognize(true);
    for (String sentence : testCase)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Example 7: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    String[] testCase = new String[]{
            "北川景子参演了林诣彬导演的《速度与激情3》",
            "林志玲亮相网友:确定不是波多野结衣?",
            "龟山千广和近藤公园在龟山公园里喝酒赏花",
    };
    // Switch on Japanese person-name recognition.
    Segment segment = HanLP.newSegment().enableJapaneseNameRecognize(true);
    for (String sentence : testCase)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Example 8: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    String[] testCase = new String[]{
            "一桶冰水当头倒下,微软的比尔盖茨、Facebook的扎克伯格跟桑德博格、亚马逊的贝索斯、苹果的库克全都不惜湿身入镜,这些硅谷的科技人,飞蛾扑火似地牺牲演出,其实全为了慈善。",
            "世界上最长的姓名是简森·乔伊·亚历山大·比基·卡利斯勒·达夫·埃利奥特·福克斯·伊维鲁莫·马尔尼·梅尔斯·帕特森·汤普森·华莱士·普雷斯顿。",
    };
    // Switch on recognition of transliterated (foreign) person names.
    Segment segment = HanLP.newSegment().enableTranslatedNameRecognize(true);
    for (String sentence : testCase)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}
Example 9: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.HMM.HMMSegment;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    HanLP.Config.ShowTermNature = false; // hide part-of-speech tags in the output
    Segment segment = new HMMSegment();
    String[] sentenceArray = new String[]
            {
                    "HanLP是由一系列模型与算法组成的Java工具包,目标是普及自然语言处理在生产环境中的应用。",
                    "高锰酸钾,强氧化剂,紫红色晶体,可溶于水,遇乙醇即被还原。常用作消毒剂、水净化剂、氧化剂、漂白剂、毒气吸收剂、二氧化碳精制剂等。", // technical terms are recognized to some extent
                    "《夜晚的骰子》通过描述浅草的舞女在暗夜中扔骰子的情景,寄托了作者对庶民生活区的情感", // non-news text
                    "这个像是真的[委屈]前面那个打扮太江户了,一点不上品[email protected]", // Weibo post
                    "鼎泰丰的小笼一点味道也没有...每样都淡淡的...淡淡的,哪有食堂2A的好次",
                    "克里斯蒂娜·克罗尔说:不,我不是虎妈。我全家都热爱音乐,我也鼓励他们这么做。",
                    "今日APPS:Sago Mini Toolbox培养孩子动手能力",
                    "财政部副部长王保安调任国家统计局党组书记",
                    "2.34米男子娶1.53米女粉丝 称夫妻生活没问题",
                    "你看过穆赫兰道吗",
                    "乐视超级手机能否承载贾布斯的生态梦"
            };
    for (String sentence : sentenceArray)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
    // Measure the speed.
    String text = "江西鄱阳湖干枯,中国最大淡水湖变成大草原";
    System.out.println(segment.seg(text));
    long start = System.currentTimeMillis();
    int pressure = 1000;
    for (int i = 0; i < pressure; ++i)
    {
        segment.seg(text);
    }
    double costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("HMM2 segmentation speed: %.2f characters per second\n", text.length() * pressure / costTime);
}
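Example 10: testSegment
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.HMM.HMMSegment;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public void testSegment() throws Exception
{
    HanLP.Config.ShowTermNature = false; // hide part-of-speech tags in the output
    String text = "我实现了一个基于Character Based TriGram的分词器";
    Segment segment = new HMMSegment();
    List<Term> termList = segment.seg(text);
    System.out.println(termList);
}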
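Example 11: testViterbi
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.dictionary.CustomDictionary;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.Dijkstra.DijkstraSegment;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public void testViterbi() throws Exception
{
    HanLP.Config.enableDebug(true);
    // Register "网剧" as a user word before segmenting.
    CustomDictionary.add("网剧");
    Segment seg = new DijkstraSegment();
    List<Term> termList = seg.seg("优酷总裁魏明介绍了优酷2015年的内容战略,表示要以“大电影、大网剧、大综艺”为关键词");
    System.out.println(termList);
}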
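The test above adds a word to the custom dictionary and then segments with DijkstraSegment. As a rough illustration of why a newly added word can change the chosen segmentation, here is a toy dynamic-programming segmenter that picks the cheapest path through a word lattice. It is a sketch under stated assumptions, not HanLP's implementation: the class ShortestPathSegSketch, its add and seg methods, the word costs, and the unknown-character penalty are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShortestPathSegSketch {
    // Assumed penalty for a single character that is not in the dictionary.
    private static final double UNKNOWN_CHAR_COST = 10.0;
    private final Map<String, Double> cost = new HashMap<>();
    private int maxWordLength = 1;

    /** Registers a word with a made-up cost; a lower cost means the word is preferred. */
    void add(String word, double wordCost) {
        cost.put(word, wordCost);
        maxWordLength = Math.max(maxWordLength, word.length());
    }

    /** Returns the segmentation with the lowest total cost. */
    List<String> seg(String text) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: cheapest cost of segmenting text[0, i)
        int[] prev = new int[n + 1];       // prev[i]: start index of the last word on that path
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxWordLength); start < end; start++) {
                Double c = cost.get(text.substring(start, end));
                if (c == null) {
                    if (end - start > 1) continue; // only single characters may be unknown
                    c = UNKNOWN_CHAR_COST;
                }
                if (best[start] + c < best[end]) {
                    best[end] = best[start] + c;
                    prev[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end of the text, then reverse.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(text.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        ShortestPathSegSketch sketch = new ShortestPathSegSketch();
        sketch.add("大", 3.0);
        sketch.add("网", 4.0);
        sketch.add("剧", 4.0);
        System.out.println(sketch.seg("大网剧")); // prints [大, 网, 剧]
        sketch.add("网剧", 2.0);                   // the "custom dictionary" entry
        System.out.println(sketch.seg("大网剧")); // prints [大, 网剧]
    }
}
```

Once the two-character word exists with a lower cost than its parts, the path through it becomes cheaper (3.0 + 2.0 versus 3.0 + 4.0 + 4.0), so the result changes without touching the segmenter itself.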
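Example 12: testIssue199
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.CRF.CRFSegment;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public void testIssue199() throws Exception
{
    Segment segment = new CRFSegment();
    segment.enableCustomDictionary(false); // disable the custom dictionary
    segment.enablePartOfSpeechTagging(true);
    List<Term> termList = segment.seg("更多采购");
    System.out.println(termList);
    for (Term term : termList)
    {
        // A term without a part-of-speech tag is reported as a new word.
        if (term.nature == null)
        {
            System.out.println("New word detected: " + term.word);
        }
    }
}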
Example 13: testMultiThreading
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.tokenizer.BasicTokenizer;
import java.util.Iterator;
import java.util.List;
// assertEquals is assumed to come from the surrounding JUnit test class, which is not shown.
public void testMultiThreading() throws Exception
{
    Segment segment = BasicTokenizer.SEGMENT;
    // Measure the speed.
    String text = "江西鄱阳湖干枯,中国最大淡水湖变成大草原。";
    System.out.println(segment.seg(text));
    int pressure = 100000;
    // Build one large text by repeating the sentence.
    StringBuilder sbBigText = new StringBuilder(text.length() * pressure);
    for (int i = 0; i < pressure; i++)
    {
        sbBigText.append(text);
    }
    text = sbBigText.toString();
    long start = System.currentTimeMillis();
    List<Term> termList1 = segment.seg(text);
    double costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("Single-threaded segmentation speed: %.2f characters per second\n", text.length() / costTime);
    segment.enableMultithreading(4);
    start = System.currentTimeMillis();
    List<Term> termList2 = segment.seg(text);
    costTime = (System.currentTimeMillis() - start) / (double) 1000;
    System.out.printf("Four-threaded segmentation speed: %.2f characters per second\n", text.length() / costTime);
    // Both runs must produce the same terms, tags, and offsets.
    assertEquals(termList1.size(), termList2.size());
    Iterator<Term> iterator1 = termList1.iterator();
    Iterator<Term> iterator2 = termList2.iterator();
    while (iterator1.hasNext())
    {
        Term term1 = iterator1.next();
        Term term2 = iterator2.next();
        assertEquals(term1.word, term2.word);
        assertEquals(term1.nature, term2.nature);
        assertEquals(term1.offset, term2.offset);
    }
}
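The test above checks that segmenting with four threads yields the same terms as a single thread. One way such a result can hold is to split the text at sentence boundaries, segment the pieces concurrently, and merge them in their original order. The sketch below shows that idea only; how HanLP splits and merges internally is not shown in the source, and ParallelSegSketch, splitSentences, segParallel, and the perCharacter stand-in are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelSegSketch {
    /** Splits text after each delimiter, keeping the delimiter at the end of its sentence. */
    static List<String> splitSentences(String text, char delimiter) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == delimiter) {
                sentences.add(text.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < text.length()) {
            sentences.add(text.substring(start)); // trailing text without a delimiter
        }
        return sentences;
    }

    /** Segments each sentence on a thread pool and merges the results in input order. */
    static List<String> segParallel(Function<String, List<String>> seg, String text, char delimiter, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String sentence : splitSentences(text, delimiter)) {
                Callable<List<String>> task = () -> seg.apply(sentence);
                futures.add(pool.submit(task));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get()); // futures are read in submission order
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while segmenting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segmentation task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical stand-in for segment.seg: one token per character. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>(text.length());
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "江西鄱阳湖干枯。中国最大淡水湖变成大草原。";
        List<String> single = perCharacter(text);
        List<String> multi = segParallel(ParallelSegSketch::perCharacter, text, '。', 4);
        System.out.println(single.equals(multi));
    }
}
```

The equality check only holds when no token can span a split point, which is why the sketch splits after sentence-ending punctuation rather than at arbitrary offsets.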
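Example 14: fileSegment
import com.hankcs.hanlp.corpus.io.IOUtil;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.common.Term;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
// WordFreqStatistics and logger belong to the original project and are not shown here.
public static void fileSegment(Segment segment, String inputFilePath, String outputFilePath) {
    // Default output path sits next to the input file unless the caller supplies one.
    String outPath = inputFilePath.replace(".txt", "") + "-Segment-Result.txt";
    if (outputFilePath != null && outputFilePath.trim().length() > 0) outPath = outputFilePath;
    try {
        WordFreqStatistics.statistics(segment, inputFilePath);
        long allCount = 0;
        long lexCount = 0;
        long start = System.currentTimeMillis();
        // try-with-resources closes both streams even if reading or writing fails.
        try (BufferedReader reader = IOUtil.newBufferedReader(inputFilePath);
             FileOutputStream fos = new FileOutputStream(new File(outPath))) {
            String temp;
            while ((temp = reader.readLine()) != null) {
                List<Term> parse = segment.seg(temp);
                StringBuilder sb = new StringBuilder();
                for (Term term : parse) {
                    sb.append(term.toString()).append("\t");
                    // Count only terms that are not pure whitespace.
                    if (term.word.trim().length() > 0) {
                        allCount += term.length();
                        lexCount += 1;
                    }
                }
                // Explicit UTF-8 so the output does not depend on the platform default charset.
                fos.write(sb.toString().trim().getBytes(StandardCharsets.UTF_8));
                fos.write("\n".getBytes(StandardCharsets.UTF_8));
            }
            fos.flush();
        }
        long end = System.currentTimeMillis();
        // Clamp to 1 ms so a very fast run cannot divide by zero.
        long elapsed = Math.max(1, end - start);
        System.out.println("segment result save:" + outPath);
        System.out.println(allCount + " characters and " + lexCount + " words in total, took " + (end - start) + " ms, " +
                "characters processed per second: " + (allCount * 1000 / elapsed));
    } catch (IOException e) {
        logger.error("IO error: " + e.getLocalizedMessage());
    }
}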
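The method above depends on project classes that are not shown (WordFreqStatistics, logger). For readers who want to try the line-by-line pattern in isolation, here is a minimal self-contained sketch using java.nio.file with an explicit UTF-8 charset. FileSegSketch, segmentFile, and the splitOnSpaces stand-in for segment.seg are hypothetical names introduced for the example, not part of HanLP or of the original project.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class FileSegSketch {
    /**
     * Segments inputPath line by line, writes tab-separated tokens to outputPath,
     * and returns the number of non-blank tokens written.
     */
    static long segmentFile(Function<String, List<String>> seg, Path inputPath, Path outputPath) {
        long tokenCount = 0;
        // try-with-resources closes the reader and the writer on every exit path.
        try (BufferedReader reader = Files.newBufferedReader(inputPath, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(outputPath, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                List<String> tokens = seg.apply(line);
                writer.write(String.join("\t", tokens));
                writer.newLine();
                for (String token : tokens) {
                    if (!token.trim().isEmpty()) {
                        tokenCount++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokenCount;
    }

    /** Hypothetical stand-in for segment.seg: splits a line on runs of whitespace. */
    static List<String> splitOnSpaces(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }
}
```

A call such as FileSegSketch.segmentFile(FileSegSketch::splitOnSpaces, inputPath, outputPath) would process a whitespace-delimited file; with HanLP available, the stand-in could be swapped for a function that calls segment.seg and maps each Term to a string.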
Example 15: main
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment; // import the package/class the method depends on
import com.hankcs.hanlp.seg.CRF.CRFSegment;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;
public static void main(String[] args)
{
    HanLP.Config.ShowTermNature = false; // hide part-of-speech tags in the output
    Segment segment = new CRFSegment();
    String[] sentenceArray = new String[]
            {
                    "HanLP是由一系列模型与算法组成的Java工具包,目标是普及自然语言处理在生产环境中的应用。",
                    "鐵桿部隊憤怒情緒集結 馬英九腹背受敵", // Traditional Chinese is handled without trouble
                    "馬英九回應連勝文“丐幫說”:稱黨內同志談話應謹慎",
                    "高锰酸钾,强氧化剂,紫红色晶体,可溶于水,遇乙醇即被还原。常用作消毒剂、水净化剂、氧化剂、漂白剂、毒气吸收剂、二氧化碳精制剂等。", // technical terms are recognized to some extent
                    "《夜晚的骰子》通过描述浅草的舞女在暗夜中扔骰子的情景,寄托了作者对庶民生活区的情感", // non-news text
                    "这个像是真的[委屈]前面那个打扮太江户了,一点不上品[email protected]", // Weibo post
                    "鼎泰丰的小笼一点味道也没有...每样都淡淡的...淡淡的,哪有食堂2A的好次",
                    "克里斯蒂娜·克罗尔说:不,我不是虎妈。我全家都热爱音乐,我也鼓励他们这么做。",
                    "今日APPS:Sago Mini Toolbox培养孩子动手能力",
                    "财政部副部长王保安调任国家统计局党组书记",
                    "2.34米男子娶1.53米女粉丝 称夫妻生活没问题",
                    "你看过穆赫兰道吗",
                    "国办发布网络提速降费十四条指导意见 鼓励流量不清零",
                    "乐视超级手机能否承载贾布斯的生态梦"
            };
    for (String sentence : sentenceArray)
    {
        List<Term> termList = segment.seg(sentence);
        System.out.println(termList);
    }
}