Python Command Line IMDB Scraper

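Overview

This script asks for a movie title and a release year, then queries IMDB through the JSON endpoint at imdbapi.com and prints the movie information it gets back.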

Command Line IMDB Scraper

The first step is to import the necessary modules.

#!/usr/bin/env python27

#Importing the modules

from BeautifulSoup import BeautifulSoup
import sys
import urllib2
import re
import json

#Ask for movie title
title = raw_input("Please enter a movie title: ")

#Ask for which year
year = raw_input("which year? ")

#Search for spaces in the title string
raw_string = re.compile(r' ')

#Replace spaces with a plus sign
searchstring = raw_string.sub('+', title)

#Prints the search string
print searchstring

#The actual query
url = "http://www.imdbapi.com/?t=" + searchstring + "&y="+year

request = urllib2.Request(url)

response = json.load(urllib2.urlopen(request))

print json.dumps(response,indent=2)
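
The substitution above only turns spaces into plus signs, so a title containing characters such as & or = would still produce a malformed query string. As a minimal sketch, assuming the same endpoint and the same t and y parameters used in the script, the standard library's urlencode can build the query instead. The helper name build_query_url is hypothetical and not part of the original script; the import fallback is there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Endpoint taken from the script above
BASE_URL = "http://www.imdbapi.com/"


def build_query_url(title, year):
    """Return the query URL with title and year percent-encoded.

    build_query_url is a hypothetical helper, not part of the original script.
    """
    # A list of pairs keeps the parameter order fixed on every Python version
    query = urlencode([("t", title), ("y", year)])
    return BASE_URL + "?" + query


print(build_query_url("The Matrix", "1999"))
# prints http://www.imdbapi.com/?t=The+Matrix&y=1999
```

With this helper, the re import and the manual space replacement would no longer be needed, since urlencode also encodes spaces as plus signs.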

Pretty neat, right? Enjoy!

IMDB Scraper

For Python 3.3:


#!/usr/bin/env python3

#Importing the modules

from bs4 import BeautifulSoup
import sys
import urllib.request
import urllib.error
import re
import json

#Ask for movie title
title = input("Please enter a movie title: ")

#Ask for which year
year = input("which year? ")

#Search for spaces in the title string
raw_string = re.compile(r' ')

#Replace spaces with a plus sign
searchstring = raw_string.sub('+', title)

#Prints the search string
print(searchstring)

#The actual query
url = "http://www.imdbapi.com/?t=" + searchstring + "&y="+year

request = urllib.request.Request(url)

#Read the response body, decode it as UTF-8 and parse the JSON
response = json.loads(urllib.request.urlopen(request).read().decode("utf-8"))

print(json.dumps(response,indent=2))
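
The Python 3 script imports urllib.error but never uses it, so a network failure or a non-JSON reply ends in a traceback. The following is a rough sketch of how the request step could be wrapped, not the original author's method. The function name fetch_json and its opener parameter are assumptions made for illustration, and the stand-in opener returns made-up placeholder data so the example runs without network access.

```python
import io
import json
import urllib.error
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen):
    """Fetch url and parse the body as JSON; return None on failure.

    fetch_json and the opener parameter are hypothetical, for illustration.
    """
    try:
        with opener(url) as response:
            body = response.read().decode("utf-8")
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so this catches both
        print("Request failed:", exc)
        return None
    try:
        return json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError
        print("Response was not valid JSON:", exc)
        return None


def fake_opener(url):
    # Stand-in for urlopen; the payload is placeholder data, not real API output
    return io.BytesIO(b'{"example": "value"}')


print(fetch_json("http://www.imdbapi.com/?t=test&y=2000", fake_opener))
```

In the script, the call would be fetch_json(url) with the default opener, followed by a check for None before printing the result.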

References

Python Command Line IMDB Scraper