This article collects a typical usage example of sklearn.feature_extraction.text.TfidfVectorizer in Python. Note that the count call in the example is the built-in list.count applied to the token list returned by the tokenizer; TfidfVectorizer itself has no count method.
Example 1: TfidfVectorizer
# Required imports
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
data = pd.read_csv('../dataset/combined/Combined_News_DJIA.csv')
train = data[data['Date'] < '2015-01-01']
test = data[data['Date'] > '2014-12-31']
example = train.iloc[3, 10]
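print('EXAMPLE 1 -- ', example)
example2 = example.lower()
print('EXAMPLE 2 -- ', example2)
# Split the lowercased headline into tokens with the vectorizer's default tokenizer
example3 = TfidfVectorizer().build_tokenizer()(example2)
print('EXAMPLE 3 -- ', example3)
# Count how often each distinct token occurs (list.count, not a TfidfVectorizer method)
word_counts = pd.DataFrame([[x, example3.count(x)] for x in set(example3)], columns=['Word', 'Count'])
print(word_counts)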
# Join the headline columns (positions 2 to 26) of each day into one string
trainheadlines = []
for row in range(0, len(train.index)):
    trainheadlines.append(' '.join(str(x) for x in train.iloc[row, 2:27]))
basicvectorizer = TfidfVectorizer()
# Convert trainheadlines into a sparse matrix of TF-IDF weights (not raw counts), one row per day and one column per term
basictrain = basicvectorizer.fit_transform(trainheadlines)
# basictrain is a sparse matrix
# with shape (x, y): x samples (days), y features (terms)
print('The shape of the sparse matrix -- ', basictrain.shape)
basicmodel = LogisticRegression()  # logistic regression classifier
basicmodel = basicmodel.fit(basictrain, train["Label"])  # fit on the feature matrix and the target labels
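
The example above stops after fitting and never uses the test split. The following is a minimal, self-contained sketch of the same pipeline that also scores held-out text. It assumes a recent Python 3 and scikit-learn; the toy headlines, labels, and variable names (train_docs, test_docs, and so on) are made up for illustration and are not taken from Combined_News_DJIA.csv. The key point it illustrates is that the vectorizer is fitted on the training text only and then reused, via transform, on the test text so that both matrices share the same columns.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for the joined daily headlines and their labels.
train_docs = [
    "stocks rally as markets climb",
    "markets climb on strong earnings",
    "stocks fall as markets crash",
    "markets crash on weak earnings",
]
train_labels = [1, 1, 0, 0]
test_docs = ["stocks climb on earnings", "stocks crash again"]

# The default tokenizer keeps runs of two or more word characters.
tokenize = TfidfVectorizer().build_tokenizer()
tokens = tokenize(train_docs[0].lower())

# Counter tallies every token in one pass, an alternative to
# calling list.count once per distinct token.
word_counts = Counter(tokens)
print(word_counts)

# Learn the vocabulary and IDF weights from the training text only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Reuse the fitted vectorizer on the test text; unseen words are ignored.
X_test = vectorizer.transform(test_docs)
print("train shape:", X_train.shape, "test shape:", X_test.shape)

# Fit the classifier and predict labels for the held-out documents.
model = LogisticRegression()
model.fit(X_train, train_labels)
predictions = model.predict(X_test)
print("predictions:", predictions)
```

With the real data, the same pattern would apply to a list of test headlines built the same way as trainheadlines, followed by basicvectorizer.transform and basicmodel.predict.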