示例说明
在此示例中,使用特征脸和SVM进行人脸识别,使用的数据集是“Labeled Faces in the Wild”(也称为LFW数据集)的预处理摘录:
数据集中代表性最高的5个人的预期结果如下:(可以用来跟后面代码运行的结果做个对比)
准确率 | 召回率 | F1值 | 支持度 | |
---|---|---|---|---|
艾丽尔·沙龙(Ariel Sharon) | 0.67 | 0.92 | 0.77 | 13 |
科林·鲍威尔 | 0.75 | 0.78 | 0.76 | 60 |
唐纳德·拉姆斯菲尔德 | 0.78 | 0.67 | 0.72 | 27 |
乔治·W·布什 | 0.86 | 0.86 | 0.86 | 146 |
格哈德·施罗德 | 0.76 | 0.76 | 0.76 | 25 |
雨果·查韦斯(Hugo Chavez) | 0.67 | 0.67 | 0.67 | 15 |
托尼·布莱尔 | 0.81 | 0.69 | 0.75 | 36 |
平均/总计 | 0.80 | 0.80 | 0.80 | 322 |
代码实现[Python]
# -*- coding: utf-8 -*-
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
print(__doc__)
# 输出进度日志
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
# #############################################################################
# 下载数据并加载为numpy数组。
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# 获得图像数组的形状(用于绘图)
n_samples, h, w = lfw_people.images.shape
# for machine learning we use the 2 data directly (as relative pixel
# positions info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]
# 预测的目标是人的ID
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
# #############################################################################
# Split into a training set and a test set using a stratified k fold
# 切分训练集和测试集
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.25, random_state=42)
# #############################################################################
# 在人脸数据集(当做无标记数据)上计算PCA (eigenfaces,特征脸):
# 无监督特征提取 / 降维
n_components = 150
print("Extracting the top %d eigenfaces from %d faces"
% (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))
eigenfaces = pca.components_.reshape((n_components, h, w))
print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))
# #############################################################################
# 训练SVM分类模型
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'),
param_grid, cv=5, iid=False)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)
# #############################################################################
# 在测试集上评估模型的量化效果
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
# #############################################################################
# 使用 matplotlib 定性分析预测结果
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
"""Helper function to plot a gallery of portraits"""
plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
for i in range(n_row * n_col):
plt.subplot(n_row, n_col, i + 1)
plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
plt.title(titles[i], size=12)
plt.xticks(())
plt.yticks(())
# 绘制部分测试集的预测结果
def title(y_pred, y_test, target_names, i):
pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
return 'predicted: %s\ntrue: %s' % (pred_name, true_name)
prediction_titles = [title(y_pred, y_test, target_names, i)
for i in range(y_pred.shape[0])]
plot_gallery(X_test, prediction_titles, h, w)
# 绘制几个最重要的特征脸的相册
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
plt.show()
代码执行
代码运行时间大约:0分59.514秒。
运行代码输出的文本内容如下:
Total dataset size: n_samples: 1288 n_features: 1850 n_classes: 7 Extracting the top 150 eigenfaces from 966 faces done in 0.118s Projecting the input data on the eigenfaces orthonormal basis done in 0.005s Fitting the classifier to the training set done in 36.078s Best estimator found by grid search: SVC(C=1000.0, class_weight='balanced', gamma=0.005) Predicting people's names on the test set done in 0.061s precision recall f1-score support Ariel Sharon 0.75 0.46 0.57 13 Colin Powell 0.79 0.87 0.83 60 Donald Rumsfeld 0.94 0.63 0.76 27 George W Bush 0.83 0.98 0.90 146 Gerhard Schroeder 0.91 0.80 0.85 25 Hugo Chavez 1.00 0.53 0.70 15 Tony Blair 0.96 0.75 0.84 36 accuracy 0.85 322 macro avg 0.88 0.72 0.78 322 weighted avg 0.86 0.85 0.84 322 [[ 6 2 0 5 0 0 0] [ 1 52 0 7 0 0 0] [ 1 3 17 6 0 0 0] [ 0 3 0 143 0 0 0] [ 0 1 0 3 20 0 1] [ 0 4 0 2 1 8 0] [ 0 1 1 6 1 0 27]]
运行代码输出的图片内容如下:
源码下载
- Python版源码文件: plot_face_recognition.py
- Jupyter Notebook版源码文件: plot_face_recognition.ipynb