

Python lancaster.LancasterStemmer Code Examples

This article collects typical usage examples of the nltk.stem.lancaster.LancasterStemmer class in Python. If you are wondering how lancaster.LancasterStemmer is used in practice, or are looking for working examples, the selected snippets below may help. You can also explore further usage examples for the containing module, nltk.stem.lancaster.


Seven code examples of lancaster.LancasterStemmer are shown below, sorted by popularity by default.
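Before the project snippets, a minimal sketch of the stemmer on its own may help. It assumes NLTK is installed; the sample words and their stems are the ones given in NLTK's own docstring for LancasterStemmer, and the variable names are illustrative only.

```python
from nltk.stem.lancaster import LancasterStemmer

# The stemmer is rule-based and needs no corpus download.
stemmer = LancasterStemmer()

# Sample words taken from the NLTK docstring for LancasterStemmer.
words = ["maximum", "presumably", "saying", "crying"]
stems = [stemmer.stem(word) for word in words]

print(stems)  # ['maxim', 'presum', 'say', 'cry']
```

The Lancaster algorithm is fairly aggressive, so stems such as "maxim" or "presum" are often not dictionary words; they are meant to be used as matching keys rather than displayed to users.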

Example 1: __init__

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
def __init__(self):

        ###############################################################
        #
        # Sets up all default requirements and placeholders 
        # needed for the NLU engine to run. 
        #
        # - Helpers: Useful global functions
        # - Logging: Logging class
        # - LancasterStemmer: Word stemmer
        #
        ###############################################################
        
        self.ignore  = [',','.','!','?']
        
        self.Helpers = Helpers()
        self._confs  = self.Helpers.loadConfigs()
        self.LogFile = self.Helpers.setLogFile(self._confs["aiCore"]["Logs"]+"JumpWay/")
        
        self.LancasterStemmer = LancasterStemmer() 
Developer: GeniSysAI, Project: NLU, Lines of code: 22, Source: Data.py

Example 2: __init__

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
def __init__(self, Logging, LogFile):
        
        self.LancasterStemmer = LancasterStemmer()

        self.Logging          = Logging
        self.LogFile          = LogFile
        
        self.ignore  = [
            '?',
            '!'
        ]
        
        self.Logging.logMessage(
            self.LogFile,
            "Data",
            "INFO",
            "Data Helper Ready") 
Developer: GeniSysAI, Project: NLU, Lines of code: 19, Source: Users.py

Example 3: getTokens

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
# This snippet also needs: import nltk
def getTokens(self, removeStopwords=True):
        """ Tokenizes the text, breaking it up into words, removing punctuation. """
        tokenizer = nltk.RegexpTokenizer(r"[a-zA-Z]\w+'?\w*") # A custom regex tokenizer; the raw string avoids invalid escape sequences.
        spans = list(tokenizer.span_tokenize(self.text))
        # Record the end offset of the last token span
        self.length = spans[-1][-1]
        tokens = tokenizer.tokenize(self.text)
        tokens = [ token.lower() for token in tokens ] # make them lowercase
        stemmer = LancasterStemmer()
        tokens = [ stemmer.stem(token) for token in tokens ]
        if not removeStopwords:
            self.spans = spans
            return tokens
        tokenSpans = list(zip(tokens, spans)) # zip it up
        stopwords = nltk.corpus.stopwords.words('english') # get stopwords
        tokenSpans = [ token for token in tokenSpans if token[0] not in stopwords ] # remove stopwords from zip
        self.spans = [ x[1] for x in tokenSpans ] # unzip; get spans
        return [ x[0] for x in tokenSpans ] # unzip; get tokens 
Developer: JonathanReeve, Project: text-matcher, Lines of code: 20, Source: matcher.py
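The idea in Example 3, keeping each stem aligned with the character span it came from, can be sketched as a standalone function. This is an illustration under assumptions, not the project's code: `stem_with_spans` and `STOPWORDS` are hypothetical names, the tiny stopword set stands in for `nltk.corpus.stopwords` (which needs a separate corpus download), and stopwords are checked against the lowercased token rather than the stem, which is a deliberate departure from the original.

```python
from nltk.stem.lancaster import LancasterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical stopword set for illustration only; the original snippet
# uses nltk.corpus.stopwords.words('english') instead.
STOPWORDS = {"the", "was", "and"}


def stem_with_spans(text, remove_stopwords=True):
    """Return (stems, spans); spans[i] is the (start, end) offset of stems[i]."""
    tokenizer = RegexpTokenizer(r"[a-zA-Z]\w+'?\w*")
    stemmer = LancasterStemmer()
    stems, spans = [], []
    for start, end in tokenizer.span_tokenize(text):
        token = text[start:end].lower()
        # Assumption: filter on the token itself, not on its stem.
        if remove_stopwords and token in STOPWORDS:
            continue
        stems.append(stemmer.stem(token))
        spans.append((start, end))
    return stems, spans


stems, spans = stem_with_spans("The maximum was crying")
print(stems)  # ['maxim', 'cry']
print(spans)  # [(4, 11), (16, 22)]
```

Because the spans index into the original text, a match found on stems can be mapped back to the exact substring with `text[start:end]`.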

Example 4: extract

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
def extract(self, data=None, splitIt=False):

        ###############################################################
        #
        # Extracts words from sentences, stripping out characters in 
        # the ignore list above
        # 
        # https://www.nltk.org/_modules/nltk/stem/lancaster.html
        # http://insightsbot.com/blog/R8fu5/bag-of-words-algorithm-in-python-introduction
        #
        ###############################################################
        
        return [self.LancasterStemmer.stem(word) for word in (data.split() if splitIt else data) if word not in self.ignore] 
Developer: GeniSysAI, Project: NLU, Lines of code: 15, Source: Data.py

Example 5: extract

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
def extract(self, data=None, lowerIt=True, splitIt=False, ignoreWords=False):
        
        # Split raw strings into words if requested; otherwise treat data as a token list
        words = data.split() if splitIt else data
        if ignoreWords:
            # Drop tokens that appear in the ignore list
            words = [word for word in words if word not in self.ignore]
        return [self.LancasterStemmer.stem(word.lower() if lowerIt else word) for word in words] 
Developer: GeniSysAI, Project: NLU, Lines of code: 8, Source: Users.py
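To see how the extract methods in Examples 4 and 5 behave, the following sketch restates the logic as a module-level function. The names `extract`, `IGNORE`, and the snake_case parameters are hypothetical stand-ins for the methods and `self.ignore` above, and the input words are the samples from NLTK's LancasterStemmer docstring.

```python
from nltk.stem.lancaster import LancasterStemmer

IGNORE = ["?", "!"]  # stand-in for self.ignore in the examples above
stemmer = LancasterStemmer()


def extract(data, lower_it=True, split_it=False, ignore_words=False):
    """Stem each token of `data`, optionally splitting, lowercasing, and filtering."""
    tokens = data.split() if split_it else data
    if ignore_words:
        # Membership is tested on whole tokens, so only standalone
        # punctuation tokens are removed.
        tokens = [token for token in tokens if token not in IGNORE]
    return [stemmer.stem(token.lower() if lower_it else token) for token in tokens]


# Pre-tokenized input: the standalone "!" is dropped.
from_list = extract(["saying", "!", "crying"], ignore_words=True)
print(from_list)  # ['say', 'cry']

# Raw string input: split on whitespace first.
from_text = extract("saying crying", split_it=True)
print(from_text)  # ['say', 'cry']
```

Two details are worth noting. Passing a raw string without splitting makes the comprehension iterate over single characters, so callers need to either tokenize first or request the split. And because the ignore check compares whole tokens, punctuation that is still attached to a word after a whitespace split is not filtered out.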

Example 6: __init__

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
def __init__(self):

        ###############################################################
        #
        # Sets up all default requirements
        #
        # - Helpers: Useful global functions
        # - LancasterStemmer: Word stemmer
        #
        ###############################################################
        
        self.Helpers = Helpers()
        self._confs  = self.Helpers.loadConfigs()

        self.stemmer = LancasterStemmer() 
Developer: GeniSysAI, Project: NLU, Lines of code: 17, Source: Mitie.py

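Example 7: get_vocabularies

# Required import: from nltk.stem import lancaster [as alias]
# Or: from nltk.stem.lancaster import LancasterStemmer [as alias]
# This snippet also assumes json and word_tokenize are imported, and that
# OPTS and PUNCTUATION are defined elsewhere in the module.
def get_vocabularies(dataset, vocab_file, nearby_file):
  """Create map from example ID to (basic_words, nearby_words)."""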
  with open(vocab_file) as f:
    basic_vocab = [line.strip() for line in f]
  with open(nearby_file) as f:
    nearby_words = json.load(f)
  stemmer = LancasterStemmer()
  vocabs = {}
  for a in dataset['data']:
    for p in a['paragraphs']:
      for q in p['qas']:
        q_words = [w.lower() for w in word_tokenize(q['question'])]
        if OPTS.mode == 'basic':
          vocabs[q['id']] = (basic_vocab, [])
        elif OPTS.mode == 'add-question-words':
          vocabs[q['id']] = (basic_vocab, q_words)
        elif OPTS.mode.endswith('-nearby'):
          q_stems = [stemmer.stem(qw) for qw in q_words]
          cur_vocab = [w for w in basic_vocab if w not in q_stems]
          cur_nearby = []
          for q_word, q_stem in zip(q_words, q_stems):
            if q_word in nearby_words:
              qw_nearby = []
              for nearby_word in nearby_words[q_word]:
                if len(qw_nearby) == OPTS.num_nearby: break
                if nearby_word['word'] in PUNCTUATION: continue
                nearby_stem = stemmer.stem(nearby_word['word'])
                if nearby_stem != q_stem:
                  qw_nearby.append(nearby_word['word'])
              cur_nearby.extend(qw_nearby)
          vocabs[q['id']] = (cur_vocab, cur_nearby)
  return vocabs 
Developer: robinjia, Project: adversarial-squad, Lines of code: 34, Source: adversarial_squad.py
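The inner loop of Example 7 keeps a nearby word only when its stem differs from the stem of the question word, and stops once enough words are collected. A minimal sketch of that filter in isolation follows; `filter_nearby` and its parameters are hypothetical names, and the candidate list is made up for illustration rather than taken from the project's data.

```python
from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()


def filter_nearby(q_word, candidates, limit):
    """Keep up to `limit` candidates whose stem differs from the stem of `q_word`."""
    q_stem = stemmer.stem(q_word)
    kept = []
    for candidate in candidates:
        if len(kept) == limit:
            break
        # Skip inflected variants of the question word itself.
        if stemmer.stem(candidate) != q_stem:
            kept.append(candidate)
    return kept


# "saying" and "say" share the stem "say", so "say" is skipped.
nearby = filter_nearby("saying", ["say", "crying", "maximum"], limit=2)
print(nearby)  # ['crying', 'maximum']
```

Comparing stems rather than surface forms is what prevents trivial variants of the question word from being counted as distinct nearby words.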


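Note: The nltk.stem.lancaster.LancasterStemmer examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their respective developers, and copyright in the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not repost without permission.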