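This page collects a typical code example for the Python usage indexed as lxml.cssselect.CSSSelector.getiterator. If you are wondering how CSSSelector.getiterator is used in practice, or are looking for a working example of it, the selected example here may help. Note that getiterator is called on the elements a CSSSelector returns, not on the selector object itself. You can also look further into usage examples of the class it is listed under, lxml.cssselect.CSSSelector.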
One code example involving CSSSelector.getiterator is shown here; examples are sorted by popularity by default.
Example 1: unicode
# Required import: from lxml.cssselect import CSSSelector [as alias]
# Note: getiterator is a method of the elements a selector returns, not of
# CSSSelector itself; lxml deprecates it in favor of iter().
# Python 2 fragment: url, useragent, lasturl and page are defined in code not shown here.
import re
import time
import urllib2

from lxml import etree
from lxml.cssselect import CSSSelector

while True:
    # Fetch the page, sending the previous URL as the Referer when there is one.
    req = urllib2.Request(url)
    req.add_header("User-Agent", useragent)
    if lasturl:
        req.add_header("Referer", lasturl)
    html = unicode(urllib2.urlopen(req).read(), errors="ignore")
    doc = etree.HTML(html)

    # Count the result entries under <ol id="rtr">.
    rtr = CSSSelector("ol#rtr")(doc)
    if rtr:
        numresults = len(rtr[0].getchildren())
    else:
        numresults = 0
    print "hit " + url + " got " + str(numresults) + " results"

    # Collect every <a> element inside <div id="rhscol">.
    rhscol = CSSSelector("div#rhscol")(doc)[0]
    links = list(rhscol.getiterator("a"))
    if len(links) != 3 or "Older" not in links[1].text or "Newer" not in links[2].text:
        print "Can't find older and newer links here, backing up"
        oldurl = page.url
        # Advance all four mbl_* offsets by 600, then retry after a pause.
        match = re.search(r"mbl_hs:(\d+),mbl_he:(\d+),mbl_rs:(\d+),mbl_re:(\d+)", oldurl)
        mbl_hs = int(match.group(1)) + 600
        mbl_he = int(match.group(2)) + 600
        mbl_rs = int(match.group(3)) + 600
        mbl_re = int(match.group(4)) + 600
        url = oldurl.replace(
            match.group(0),
            "mbl_hs:" + str(mbl_hs) + ",mbl_he:" + str(mbl_he) + ",mbl_rs:" + str(mbl_rs) + ",mbl_re:" + str(mbl_re),
        )
        lasturl = oldurl
        time.sleep(10)
        continue
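
The two decisions the loop makes (checking for the "Older" and "Newer" links, and bumping the four mbl_* offsets in the URL) can be tried in isolation without any network access. The following is a minimal Python 3 sketch under stated assumptions, not the original author's code: the function names has_pager_links and shift_offsets, the sample markup and the sample URL are all hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml so the block runs without extra packages. It uses iter("a"), which lxml elements also provide as the replacement for getiterator("a").

```python
import re
import xml.etree.ElementTree as ET

# Pattern taken from the example; the raw string keeps \d intact for the regex engine.
OFFSETS = re.compile(r"mbl_hs:(\d+),mbl_he:(\d+),mbl_rs:(\d+),mbl_re:(\d+)")
NAMES = ("mbl_hs", "mbl_he", "mbl_rs", "mbl_re")


def has_pager_links(container):
    """Return True if container holds exactly three <a> elements and the
    second and third contain "Older" and "Newer" respectively."""
    links = list(container.iter("a"))  # iter() is the non-deprecated form of getiterator()
    if len(links) != 3:
        return False
    # .text is None for an <a> without text, so fall back to "" before testing.
    return "Older" in (links[1].text or "") and "Newer" in (links[2].text or "")


def shift_offsets(url, step=600):
    """Return url with all four mbl_* offsets increased by step,
    or None if the offsets are not present in the URL."""
    match = OFFSETS.search(url)
    if match is None:
        return None
    shifted = ",".join(
        "%s:%d" % (name, int(value) + step)
        for name, value in zip(NAMES, match.groups())
    )
    return url.replace(match.group(0), shifted)


# Hypothetical markup and URL, invented purely for illustration.
rhscol = ET.fromstring(
    '<div id="rhscol"><a>Top</a><a>Older results</a><a>Newer results</a></div>'
)
print(has_pager_links(rhscol))
print(shift_offsets("http://example.invalid/s?tbs=mbl_hs:0,mbl_he:600,mbl_rs:0,mbl_re:600"))
```

Unlike the fragment in the example, this sketch guards against a link with no text and against a URL that carries no mbl_* offsets, both of which would raise an exception in the original loop.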