This article collects typical usage examples of the Document.Document.tokenize method in Python. If you are wondering how Document.tokenize is used in practice or what a call to it looks like, the selected example below may help. You can also look further into usage examples of Document.Document, the class the method belongs to.
One code example of the Document.tokenize method is shown below. Examples are sorted by popularity by default.
Example 1: main
# Required import: from Document import Document [as alias]
# Or: from Document.Document import tokenize [as alias]
import sys

import matplotlib.pyplot as plt

from Document import Document
# createDocuments, Attributor and Writeup are defined elsewhere in the same
# project; the source does not show which modules they come from.


def main():
    # Read in the training documents and the documents to classify
    documentList = createDocuments( sys.argv[ 1 ] )
    trainDocs = documentList[ 0 ]
    sampleDocs = documentList[ 1 ]

    # Read in the stopwords
    with open( "stopwords.txt" , "r" ) as f:
        stopwords = Document.tokenize( f.read() )

    # Classify the documents with missing authors
    attributor = Attributor( trainDocs , sampleDocs , stopwords )
    attributor.train()
    attributor.classify()

    # Report accuracy and the confusion matrix
    writeup = Writeup()
    results = attributor.get_results()
    writeup.print_accuracy( sampleDocs , results )
    writeup.print_confusion_matrix( sampleDocs , results )
    print()

    # Print the first 20 ranked features (fewer if the ranking is shorter)
    featureRankings = attributor.get_feature_ranking()
    for ranking in featureRankings[ :20 ]:
        print( ranking[ 0 ] , ranking[ 1 ] )
    print()

    # Re-run the attribution with the first 10, 20, 30, ... entries of
    # featureFrequencies as the stopword list and record each accuracy
    featureFrequencies = attributor.get_feature_frequencies()
    featurePlotDataX = []
    featurePlotDataY = []
    for numFeatures in range( 10 , len( featureFrequencies ) + 1 , 10 ):
        newStopwords = [ featureFrequencies[ i ][ 0 ] for i in range( 0 , numFeatures ) ]
        newAttributor = Attributor( trainDocs , sampleDocs , newStopwords )
        newAttributor.train()
        newAttributor.classify()
        newResults = newAttributor.get_results()
        accuracy = writeup.get_accuracy( sampleDocs , newResults )
        featurePlotDataX.append( numFeatures )
        featurePlotDataY.append( accuracy )

    print( "Feature curve:" )
    for x , y in zip( featurePlotDataX , featurePlotDataY ):
        print( x , y )

    # Plot accuracy against the number of features
    plt.plot( featurePlotDataX , featurePlotDataY )
    plt.xlabel( "Number of Features" )
    plt.ylabel( "Accuracy" )
    plt.title( "Accuracy vs. Number of Features" )
    plt.axis( [ 0 , 450 , -0.1 , 1.1 ] )
    plt.show()
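
The example calls Document.tokenize on the raw contents of stopwords.txt, but the method itself is not shown. The following is only a minimal sketch of what such a method might look like, assuming it is a static method that lowercases the text and splits it into word tokens. The Document class here is a hypothetical stand-in, and both the regular expression and the remove_stopwords helper are assumptions for illustration, not the project's actual code.

```python
import re


class Document:
    """Hypothetical stand-in for the project's Document class."""

    @staticmethod
    def tokenize(text):
        # Assumption: a token is a lowercase run of letters, digits, or apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens, stopwords):
    # Hypothetical helper: drop every token that appears in the stopword list.
    stopword_set = set(stopwords)
    return [token for token in tokens if token not in stopword_set]


# A stopword file with one word per line tokenizes to a flat list of words.
stopwords = Document.tokenize("the\na\nof\nand\n")
tokens = Document.tokenize("The author of the letter, and a friend.")
content_tokens = remove_stopwords(tokens, stopwords)
print(content_tokens)
```

Because the same tokenizer is applied to the stopword file and to the document text in this sketch, both sides are normalized the same way before they are compared.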