This article collects typical usage examples of the Python method SolrClient.SolrClient.get_industry_term_field_analysis. If you are wondering how SolrClient.get_industry_term_field_analysis is used in practice, or are looking for a working example of it, the selected code example here may help. You can also look further into usage examples of the method's enclosing class, SolrClient.SolrClient.
One code example of the SolrClient.get_industry_term_field_analysis method is shown below. Examples are sorted by popularity by default.
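As a rough orientation before the full example, the sketch below mirrors the pattern it uses: read the dictionary tagger options from a config with safe defaults, then pass every dictionary term through get_industry_term_field_analysis. StubSolrClient, read_bool and normalise_terms are hypothetical names introduced only for this sketch. The return format of the real SolrClient method is not shown in the source, so the stub simply lower-cases the term and collapses whitespace; treat that behaviour as an assumption.

```python
import configparser


class StubSolrClient:
    """Hypothetical stand-in for the project's SolrClient (not the real class)."""

    def get_industry_term_field_analysis(self, term):
        # Assumption: the real method returns a normalised form of the term.
        # This stub only lower-cases and collapses whitespace for illustration.
        return " ".join(term.lower().split())


def read_bool(config, section, key, default=False):
    """Read a boolean option, falling back to `default` if it is absent or malformed."""
    try:
        return config.getboolean(section, key)
    except (configparser.NoSectionError, configparser.NoOptionError, ValueError):
        return default


def normalise_terms(client, raw_terms):
    """Normalise each dictionary term through the client's analysis method."""
    return [client.get_industry_term_field_analysis(term) for term in raw_terms]


config = configparser.ConfigParser()
config.read_string("""
[DICTIONARY_TAGGER]
dict_tagger_fuzzy_matching = True
""")

# The fuzzy-matching flag is present; the similarity threshold is missing
# from the config, so the default of 0.95 is used instead.
fuzzy = read_bool(config, "DICTIONARY_TAGGER", "dict_tagger_fuzzy_matching")
threshold = config.getfloat("DICTIONARY_TAGGER", "dict_tagger_sim_threshold", fallback=0.95)
terms = normalise_terms(StubSolrClient(), ["Hot  Rolled Coil", "Blast Furnace"])
print(fuzzy, threshold, terms)
```

Catching ValueError alongside the missing-key errors means a malformed value such as "yes please" also falls back to the default rather than aborting start-up.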
Example 1: TaggingProcessor
# Required import: from SolrClient import SolrClient [as alias]
# Or: from SolrClient.SolrClient import get_industry_term_field_analysis [as alias]
# ......... part of the code is omitted here .........
        if not self.dict_tagging:
            self._logger.info("dictionary tagging is set to false. Disable dictionary tagging.")
            return

        self._logger.info("Dictionary tagging is enabled.")
        try:
            self.dictionary_file = config['DICTIONARY_TAGGER']['dictionary_file']
        except KeyError:
            self._logger.exception("Oops! 'dictionary_file' is set incorrectly in config file. Default to use default csv file in config dir.")
            self.dictionary_file = os.path.join(os.path.dirname(__file__), '..', 'config', 'Steel-Terminology-Tata-Steel.csv')

        try:
            fuzzy_matching = config['DICTIONARY_TAGGER']['dict_tagger_fuzzy_matching']
            # Any value other than "true" (case-insensitive) is treated as False.
            self.dict_tagger_fuzzy_matching = fuzzy_matching.strip().lower() == "true"
        except KeyError:
            self._logger.exception("Oops! 'dict_tagger_fuzzy_matching' is set incorrectly in config file. Default to False.")
            self.dict_tagger_fuzzy_matching = False

        try:
            self.dict_tagger_sim_threshold = float(config['DICTIONARY_TAGGER']['dict_tagger_sim_threshold'])
        except (KeyError, ValueError):
            # ValueError covers a value that is present but not a valid float.
            self._logger.exception("Oops! 'dict_tagger_sim_threshold' is set incorrectly in config file. Default to 0.95.")
            self.dict_tagger_sim_threshold = 0.95

        self.dict_terms = load_terms_from_csv(self.dictionary_file)
        self._logger.info("normalising terms from dictionary...")
        # Normalise every dictionary term through the Solr field analysis.
        self.dict_terms = [self.solrClient.get_industry_term_field_analysis(dict_term) for dict_term in self.dict_terms]
        self._logger.info("dictionary terms are normalised and loaded successfully. Total dictionary term size is [%s]", len(self.dict_terms))

        # The trie stays empty unless fuzzy matching is enabled.
        self.dict_terms_trie = TrieNode()
        if self.dict_tagger_fuzzy_matching:
            self._logger.info("loading into Trie nodes for fuzzy matching...")
            for normed_term in self.dict_terms:
                self.dict_terms_trie.insert(normed_term)
            self._logger.info("loaded into Trie nodes successfully.")

    def load_grammars(self):
        grammars = []
        pos_sequences = read_by_line(self.pos_sequences_file)
        for sequence_str in pos_sequences:
            grammars.append(sequence_str.replace('\n', '').strip())
        return grammars

    def parsing_candidates_regexp(self, text_pos_tokens, candidate_grammar):
        cp = nltk.RegexpParser(candidate_grammar)
        candidate_chunk = cp.parse(text_pos_tokens)
        term_candidates = set()
        for node_a in candidate_chunk:
            if isinstance(node_a, nltk.Tree):
                if node_a.label() == 'TermCandidate':
                    term_tokens = []
                    for node_b in node_a:
                        if node_b[0] == '"':
                            # TODO: find a more elegant way to deal with spurious POS tagging for quotes
                            continue