

Python Document.tokenize Method Code Example

This article collects a typical usage example of the Document.Document.tokenize method in Python. If you are wondering how Document.tokenize is used or what a call to it looks like in practice, the selected example here may help. Further usage examples exist for Document.Document, the class this method belongs to.


One code example of the Document.tokenize method is shown below. Examples are sorted by popularity by default.
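The example below calls Document.tokenize directly on the class, without creating a Document instance, which suggests a static or class-level method. The project's actual implementation is not shown on this page, so the following is only a minimal sketch under assumptions: the class body, the regular expression, and the lowercasing are all hypothetical stand-ins, not the Authorship-Attribution code.

```python
import re


class Document:
    """Hypothetical stand-in for the project's Document class."""

    def __init__(self, text):
        # Assumption: a document stores the tokens of its own text.
        self.tokens = Document.tokenize(text)

    @staticmethod
    def tokenize(text):
        # Assumed behaviour: lowercase the text and keep runs of
        # letters and apostrophes as tokens; everything else separates.
        return re.findall(r"[a-z']+", text.lower())


# Called on the class itself, as in the example's stopword loading.
stopwords = Document.tokenize("The, a; AND of\nthe")
print(stopwords)  # ['the', 'a', 'and', 'of', 'the']
```

Because the method is static in this sketch, the same tokenizer can be applied to a stopword file and to document text, so both end up in the same token form.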

Example 1: main

# Required import: from Document import Document [as alias]
# Or: from Document.Document import tokenize [as alias]
import sys

import matplotlib.pyplot as plt

from Document import Document
# Attributor, Writeup and createDocuments come from the same project;
# their modules are not shown in this excerpt.


def main():
    # Read in the training documents and the documents to classify.
    documentList = createDocuments(sys.argv[1])
    trainDocs = documentList[0]
    sampleDocs = documentList[1]

    # Read in the stopwords.
    with open("stopwords.txt", "r") as f:
        stopwords = Document.tokenize(f.read())

    # Classify the documents with missing authors.
    attributor = Attributor(trainDocs, sampleDocs, stopwords)
    attributor.train()
    attributor.classify()

    # Report accuracy and the confusion matrix.
    writeup = Writeup()
    results = attributor.get_results()
    writeup.print_accuracy(sampleDocs, results)
    writeup.print_confusion_matrix(sampleDocs, results)

    # Print the 20 top-ranked features (fewer if the ranking is shorter).
    print()
    featureRankings = attributor.get_feature_ranking()
    for ranking in featureRankings[:20]:
        print(ranking[0], ranking[1])

    # Re-run the classifier with the first 10, 20, 30, ... entries of the
    # feature-frequency list and record the accuracy for each size.
    print()
    featureFrequencies = attributor.get_feature_frequencies()
    featurePlotDataX = []
    featurePlotDataY = []
    for numFeatures in range(10, len(featureFrequencies) + 1, 10):
        newStopwords = [feature[0] for feature in featureFrequencies[:numFeatures]]
        newAttributor = Attributor(trainDocs, sampleDocs, newStopwords)
        newAttributor.train()
        newAttributor.classify()
        newResults = newAttributor.get_results()
        accuracy = writeup.get_accuracy(sampleDocs, newResults)
        featurePlotDataX.append(numFeatures)
        featurePlotDataY.append(accuracy)

    print("Feature curve:")
    for x, y in zip(featurePlotDataX, featurePlotDataY):
        print(x, y)

    # Plot accuracy against the number of features used.
    plt.plot(featurePlotDataX, featurePlotDataY)
    plt.xlabel("Number of Features")
    plt.ylabel("Accuracy")
    plt.title("Accuracy vs. Number of Features")
    plt.axis([0, 450, -0.1, 1.1])
    plt.show()


if __name__ == "__main__":
    main()
Developer: mjchao, Project: Authorship-Attribution, Lines of code: 53, Source file: mycode.py
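The feature-curve loop in main steps through range(10, len(featureFrequencies) + 1, 10), so it only evaluates prefix sizes that are multiples of 10. A trailing remainder is never included, and a list with fewer than 10 entries yields no data points at all. The sketch below illustrates that stepping with made-up data; prefix_sizes and top_features are hypothetical helper names introduced here for illustration and are not part of the project.

```python
def prefix_sizes(total, step=10):
    """Return the prefix lengths visited by range(step, total + 1, step)."""
    return list(range(step, total + 1, step))


def top_features(ranked, n):
    """Return the first n feature names from (name, count) pairs."""
    return [name for name, _ in ranked[:n]]


# Made-up (feature, frequency) pairs, for illustration only.
ranked = [("the", 50), ("of", 30), ("and", 20)]

print(prefix_sizes(35))         # [10, 20, 30]
print(prefix_sizes(5))          # []
print(top_features(ranked, 2))  # ['the', 'of']
```

If the last partial batch matters, one option is to append the full length as a final size when it is not already a multiple of the step.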


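Note: The Document.Document.tokenize example in this article was compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippet was selected from open-source projects, and copyright in the source code remains with the original authors. Refer to the corresponding project's License for distribution and use. Do not reproduce without permission.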