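Apache log files can be huge and hard to read.
Here is one way to get a list of the most visited pages (or files) from an Apache log file.
In this example, we only need the URL from each GET request. The implementation uses the powerful Counter class from Python's collections module.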
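As a quick illustration of how Counter tallies items, here is a minimal sketch. The URL list is made up for the example and does not come from a real log.

```python
from collections import Counter

# Hypothetical list of requested URLs, invented for illustration.
urls = ["/index.html", "/about.html", "/index.html",
        "/logo.png", "/index.html", "/about.html"]

# Counter maps each distinct item to the number of times it occurs.
hits = Counter(urls)

# most_common(n) returns the n most frequent items as (item, count) pairs,
# ordered from most to least frequent.
top_two = hits.most_common(2)
print(top_two)  # [('/index.html', 3), ('/about.html', 2)]
```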
import collections

clean_log = []
# The with statement closes the file automatically, even if an error occurs.
with open("yourlogfile.log", "r") as logfile:
    for line in logfile:
        try:
            # Copy the URLs to the list.
            # We take the part between GET and HTTP.
            clean_log.append(line[line.index("GET") + 4:line.index("HTTP")])
        except ValueError:
            # str.index raises ValueError when "GET" or "HTTP" is missing,
            # so lines that are not GET requests are skipped.
            pass

counter = collections.Counter(clean_log)
# Print the top 50 most popular URLs as "count URL".
for url, count in counter.most_common(50):
    print(str(count) + " " + url)
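To try the approach without a real log file, the same logic can be wrapped in a function that accepts any iterable of lines. This is only a sketch: the name top_urls and the sample lines are hypothetical, and the lines are assumed to follow the usual Apache layout where the request appears as "GET /path HTTP/x.y".

```python
import collections


def top_urls(lines, n=50):
    """Return the n most requested GET URLs as (url, count) pairs."""
    urls = []
    for line in lines:
        try:
            # The URL sits between "GET " and "HTTP".
            start = line.index("GET") + 4
            end = line.index("HTTP")
        except ValueError:
            # Not a GET request (or not a request line at all): skip it.
            continue
        # strip() drops the space that precedes "HTTP".
        urls.append(line[start:end].strip())
    return collections.Counter(urls).most_common(n)


# Made-up sample lines for illustration only.
sample_lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /about.html HTTP/1.0" 200 1204',
    '127.0.0.1 - - [10/Oct/2000:13:55:42 -0700] "POST /login HTTP/1.0" 302 0',
    '127.0.0.1 - - [10/Oct/2000:13:55:45 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'this line is not a request',
]

result = top_urls(sample_lines, n=2)
for url, count in result:
    print(count, url)
```

Because the function takes an iterable, an open file object can be passed in directly, so large logs are still read one line at a time.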