Python Filter.check_duplicates Method Code Examples

This article collects typical usage examples of the Filter.Filter.check_duplicates method in Python. If you are wondering how Filter.check_duplicates is used in practice, or are looking for working examples of it, the curated code examples here may help. You can also explore usage examples of Filter.Filter, the class this method belongs to.

Two code examples of the Filter.check_duplicates method are shown below, sorted by popularity by default.

Example 1: time

# Required import: from Filter import Filter [as alias]
# Or: from Filter.Filter import check_duplicates [as alias]
import random
from time import time

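# cursor, corpus, ids, tweet_filter and q are defined outside this excerpt
for document in cursor:
    # Collapse runs of whitespace. On Python 3 the text is already a str,
    # so it is joined directly instead of being encoded to bytes first.
    text = ' '.join(document["text"].split())
    corpus.append(text)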
    ids.append(document["_id"])

# filter repeated tweets
t0 = time()
i = 0
status = -1
unique_tweets = ["Dummy Tweet"]
length = len(corpus)

print("Filtering tweets may take a few minutes...")
for document in corpus:
    for tweet in unique_tweets:
        status = tweet_filter.check_duplicates(document, tweet)
        if status:
            break
    if not status:
        unique_tweets.append(document)
    i += 1
    if i > 3000:
        break

print("done in %0.3fs." % (time() - t0))
unique_tweets.pop(0)
corpus = unique_tweets
# create sample by bootstrap sampling
random_indices = random.sample(range(0, len(corpus)), q.num_of_docs)

# Open file I/O streams

Developer: kearnsw, Project: Twitt.IR, Lines of code: 33, Source file: createSample.py
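
The Filter class itself is not shown on this page, so the excerpt above cannot run on its own. The following is a minimal sketch under stated assumptions, not the Twitt.IR implementation: `SimpleFilter` is a hypothetical stand-in whose `check_duplicates(a, b)` returns True when the `difflib.SequenceMatcher` similarity of two strings, scaled to 0-100, reaches a threshold. Both the similarity measure and the meaning of the threshold are assumptions. The helper `filter_unique` (also a hypothetical name) expresses the same pairwise loop with `any()`, which avoids the dummy seed entry and the `status` flag.

```python
from difflib import SequenceMatcher


class SimpleFilter:
    """Hypothetical stand-in for Filter; NOT the Twitt.IR implementation."""

    def __init__(self, threshold):
        # Assumption: threshold is a similarity percentage in [0, 100].
        self.threshold = threshold

    def check_duplicates(self, a, b):
        # True when the two strings are at least `threshold` percent similar.
        ratio = SequenceMatcher(None, a, b).ratio() * 100
        return ratio >= self.threshold


def filter_unique(texts, tweet_filter, limit=None):
    """Keep each text that is not a near-duplicate of one already kept."""
    unique = []
    for count, text in enumerate(texts):
        # Optionally stop after examining `limit` texts.
        if limit is not None and count >= limit:
            break
        if not any(tweet_filter.check_duplicates(text, kept) for kept in unique):
            unique.append(text)
    return unique


tweets = [
    "breaking news: storm hits the coast",
    "breaking news: storm hits the coast!",
    "a completely different message about cats",
]
unique_tweets = filter_unique(tweets, SimpleFilter(90))
print(unique_tweets)
```

Each new text is compared against every text already kept, so the cost grows quadratically with the number of unique tweets. That matches the nested loop in the original excerpt and is likely why it caps the number of documents it processes.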

Example 2: str

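# Required import: from Filter import Filter [as alias]
# Or: from Filter.Filter import check_duplicates [as alias]
import json
import os

from Filter import Filter  # project-local module from Twitt.IR

# months, month, day and cursor are defined outside this excerpt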
# Open file I/O streams
directory = os.path.dirname(os.getcwd())
fn = "sample_" + str(months[month]) + "_" + str(day) + ".json"
f = open(directory + "/data/" + fn, "w+")

# load tweet with id
corpus = [{"text": "dummy"}]
tweetFilter = Filter(45)
i = 0
print("Filtering Results...")
for document in cursor:
    document["_id"] = str(document["_id"])
    document["text"] = document["text"].replace('"', "'")
    for tweet in corpus:
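        # Stop at the first match: the document duplicates a tweet already kept
        status = tweetFilter.check_duplicates(document["text"], tweet["text"])
        if status:
            break
    if not status:
        # Append the whole document (not just its text) so that later
        # tweet["text"] lookups on corpus entries keep working
        corpus.append(document)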
        i += 1
    if i >= 100:
        break
    print(i)


# Remove header
corpus.pop(0)
json.dump(corpus, f, indent=1)
f.close()  # make sure the JSON is flushed to disk

Developer: kearnsw, Project: Twitt.IR, Lines of code: 31, Source file: dump.py
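
The deduplicate-then-dump pattern above can also be written without a dummy header entry. The sketch below is illustrative only: `dedupe_documents` and `same_text` are hypothetical names, and `same_text` (case-insensitive equality) merely stands in for `Filter.check_duplicates`, whose actual behavior is not shown on this page. An in-memory `io.StringIO` buffer replaces the output file so the example is self-contained.

```python
import io
import json


def dedupe_documents(documents, is_duplicate, limit=100):
    """Collect up to `limit` documents whose "text" is not a duplicate."""
    unique = []
    for document in documents:
        if len(unique) >= limit:
            break
        # Copy the record and stringify _id so it is JSON-serializable,
        # without mutating the caller's data.
        record = dict(document, _id=str(document["_id"]))
        if not any(is_duplicate(record["text"], kept["text"]) for kept in unique):
            unique.append(record)
    return unique


def same_text(a, b):
    # Assumed stand-in for Filter.check_duplicates: case-insensitive equality.
    return a.strip().lower() == b.strip().lower()


docs = [
    {"_id": 1, "text": "Hello world"},
    {"_id": 2, "text": "hello world "},
    {"_id": 3, "text": "Something else"},
]
unique_docs = dedupe_documents(docs, same_text)

# Any writable text stream works here; a real run would typically use
# `with open(path, "w") as f:` so the file is closed automatically.
buffer = io.StringIO()
json.dump(unique_docs, buffer, indent=1)
print(buffer.getvalue())
```

Passing the comparison in as a callable keeps the loop independent of any particular filter class, so a bound method such as `tweetFilter.check_duplicates` could be supplied in place of `same_text`.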

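Note: The Filter.Filter.check_duplicates examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their developers, and copyright in the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.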