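This article collects typical usage examples of models.Category.combine_url in Python. If you are wondering how Category.combine_url is used in practice, or are looking for a working example of it, the curated code example here may help. You can also explore further usage examples of the class it belongs to, models.Category.
One code example of Category.combine_url is shown below; examples are sorted by popularity by default. In this example, combine_url is used as a field on the Category model that stores the category's absolute URL.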
Example 1: crawl_category
# Required import: from models import Category [as alias]
# Or: from models.Category import combine_url [as alias]
# HOST and common_saved are defined elsewhere in the source project (not shown).
import re
from datetime import datetime

import lxml.html
import requests


def crawl_category(self, ctx='', **kwargs):
    # Fetch the home page and collect the top-navigation department links.
    res = requests.get(HOST)
    res.raise_for_status()
    tree = lxml.html.fromstring(res.content)
    dept_nodes = tree.cssselect('div#top-navigation ul.navigation li.menu-item a')
    for dept_node in dept_nodes:
        key = dept_node.text.strip()
        # Skip brand listings; only departments are stored as categories.
        if 'brand' in key.lower():
            continue

        # Turn a relative href into an absolute URL.
        combine_url = dept_node.get('href')
        match = re.search(r'https?://.+', combine_url)
        if not match:
            combine_url = '%s%s' % (HOST, combine_url)

        r = requests.get(combine_url)
        r.raise_for_status()
        t = lxml.html.fromstring(r.content)

        # The last pager item before the "next" link holds the page count.
        pagesize_node = None
        link_nodes = t.cssselect('div.atg_store_filter ul.atg_store_pager li')
        for link_node in link_nodes:
            if link_node.get('class') and 'nextLink' in link_node.get('class'):
                break
            pagesize_node = link_node
        # Compare with None explicitly: an lxml element without children is falsy.
        anchors = pagesize_node.cssselect('a') if pagesize_node is not None else []
        pagesize = int(anchors[0].text.strip()) if anchors else 1

        # Create the category if it is missing, then record any changed fields.
        is_new = False
        is_updated = False
        category = Category.objects(key=key).first()
        if not category:
            is_new = True
            category = Category(key=key)
            category.is_leaf = True
        if combine_url and combine_url != category.combine_url:
            category.combine_url = combine_url
            is_updated = True
        if pagesize and pagesize != category.pagesize:
            category.pagesize = pagesize
            is_updated = True
        category.hit_time = datetime.utcnow()
        category.save()

        # Debug output; single-argument print() calls run on Python 2 and 3.
        print(category.key)
        print(category.cats)
        print(category.pagesize)
        print(category.combine_url)
        print(is_new)
        print(is_updated)
        print('')

        # A newly created category is reported as new, never as updated.
        common_saved.send(sender=ctx, obj_type='Category', key=category.key,
                          url=category.combine_url, is_new=is_new,
                          is_updated=((not is_new) and is_updated))
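
The example above builds an absolute URL by testing the href against a regular expression and prefixing HOST when the link is relative. As a minimal sketch of the same step, the standard library's urljoin covers both cases in one call. The HOST value and the helper name to_absolute are assumptions for illustration and do not come from the original project.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Hypothetical base URL; the original project defines its own HOST.
HOST = 'http://www.example.com'


def to_absolute(href, base=HOST):
    """Return href unchanged if it is already absolute, else resolve it against base."""
    return urljoin(base, href)


print(to_absolute('/shoes/index.jsp'))
print(to_absolute('http://other.example.com/bags'))
```

Unlike plain string concatenation, urljoin also inserts the missing slash when a relative href does not start with one, so 'shoes' resolves to 'http://www.example.com/shoes' rather than 'http://www.example.comshoes'.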
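
The page count in the example is read from the last pager item that appears before the "next" link. The sketch below isolates that logic so it can be tried without a network request. It assumes a small, well-formed pager fragment and uses the standard library's ElementTree in place of lxml; the markup and the function name last_page_number are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pager markup, modelled on the selectors used in the example.
PAGER_HTML = """
<ul class="atg_store_pager">
  <li><a href="?page=1">1</a></li>
  <li><a href="?page=2">2</a></li>
  <li><a href="?page=7">7</a></li>
  <li class="nextLink"><a href="?page=2">Next</a></li>
</ul>
"""


def last_page_number(pager, default=1):
    """Return the number shown in the last <li> before the 'nextLink' item."""
    last_item = None
    for item in pager.findall('li'):
        if 'nextLink' in (item.get('class') or ''):
            break
        last_item = item
    # Compare with None explicitly; element truthiness depends on child count.
    if last_item is None:
        return default
    anchor = last_item.find('a')
    if anchor is None or not (anchor.text or '').strip():
        return default
    return int(anchor.text.strip())


pager = ET.fromstring(PAGER_HTML)
print(last_page_number(pager))  # 7
```

When the pager is absent or contains only the "next" link, the function falls back to a single page, matching the default of 1 in the example.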