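Example description
This example uses eigenfaces and an SVM for face recognition. The dataset is a preprocessed excerpt of "Labeled Faces in the Wild" (also known as the LFW dataset).
Expected results for the most represented people in the dataset (these can be compared with the output of the code run later on this page):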
Person | Precision | Recall | F1-score | Support |
---|---|---|---|---|
Ariel Sharon | 0.67 | 0.92 | 0.77 | 13 |
Colin Powell | 0.75 | 0.78 | 0.76 | 60 |
Donald Rumsfeld | 0.78 | 0.67 | 0.72 | 27 |
George W Bush | 0.86 | 0.86 | 0.86 | 146 |
Gerhard Schroeder | 0.76 | 0.76 | 0.76 | 25 |
Hugo Chavez | 0.67 | 0.67 | 0.67 | 15 |
Tony Blair | 0.81 | 0.69 | 0.75 | 36 |
avg / total | 0.80 | 0.80 | 0.80 | 322 |
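To make the columns of this table concrete, here is a minimal sketch in plain Python that recomputes the F1-score of each person as the harmonic mean of precision and recall, and the last row as a support-weighted average. The numbers are copied from the table; the helper names `f1_from` and `weighted_average` are made up for this illustration. Because the inputs are already rounded to two decimals, a recomputed F1 may differ from the table in the last digit.

```python
# Rows copied from the table above: (person, precision, recall, f1, support)
rows = [
    ("Ariel Sharon", 0.67, 0.92, 0.77, 13),
    ("Colin Powell", 0.75, 0.78, 0.76, 60),
    ("Donald Rumsfeld", 0.78, 0.67, 0.72, 27),
    ("George W Bush", 0.86, 0.86, 0.86, 146),
    ("Gerhard Schroeder", 0.76, 0.76, 0.76, 25),
    ("Hugo Chavez", 0.67, 0.67, 0.67, 15),
    ("Tony Blair", 0.81, 0.69, 0.75, 36),
]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, weights):
    """Average of `values`, each weighted by the matching entry of `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[4] for row in rows]
total_support = sum(supports)

# Per-person F1 recomputed from the (rounded) precision and recall
for name, precision, recall, f1, support in rows:
    print("%-18s table F1 = %.2f, recomputed = %.3f"
          % (name, f1, f1_from(precision, recall)))

# The last table row: each metric averaged with the support as weight
avg_precision = weighted_average([row[1] for row in rows], supports)
avg_recall = weighted_average([row[2] for row in rows], supports)
avg_f1 = weighted_average([row[3] for row in rows], supports)
print("weighted avg: %.2f %.2f %.2f (support %d)"
      % (avg_precision, avg_recall, avg_f1, total_support))
```

Support is simply the number of test images of each person, which is why George W Bush (146 of 322 images) dominates the averaged row.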
Code implementation [Python]
# -*- coding: utf-8 -*-
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
print(__doc__)
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
# #############################################################################
# Download the data and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# Introspect the images array to find its shape (used for plotting)
n_samples, h, w = lfw_people.images.shape
# For machine learning we use the flattened pixel data directly (relative
# pixel position information is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]
# The label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
# #############################################################################
# Split into a training set (75%) and a test set (25%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
# #############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# data): unsupervised feature extraction / dimensionality reduction
n_components = 150
print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))
eigenfaces = pca.components_.reshape((n_components, h, w))
print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))
# #############################################################################
# Train an SVM classification model
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'),
                   param_grid, cv=5)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)
# #############################################################################
# Quantitative evaluation of the model quality on the test set
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
# #############################################################################
# Qualitative evaluation of the predictions using matplotlib
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())
# Plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    # Keep only the last word of each full name (the family name)
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue: %s' % (pred_name, true_name)
prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]
plot_gallery(X_test, prediction_titles, h, w)
# Plot the gallery of the most significant eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
plt.show()
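The script above fits PCA once on the whole training set and then runs the grid search on the projected data, so every cross-validation fold shares the same PCA basis. One possible variation, shown here only as a sketch and not as the approach used in the script, is to put PCA and the SVM into a scikit-learn `Pipeline` so that PCA is refit inside each fold. To keep the sketch quick to run without downloading LFW, it uses synthetic data from `make_classification`; the sample counts, the number of components and the parameter grid are illustrative assumptions, not values from the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the face data (assumption: 300 samples, 50 features)
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)

# stratify=y keeps the class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# PCA and SVC chained, so the grid search refits PCA on each training fold
pipe = Pipeline([
    ("pca", PCA(n_components=10, svd_solver="randomized", whiten=True,
                random_state=0)),
    ("svc", SVC(kernel="rbf", class_weight="balanced")),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    "svc__C": [1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Test accuracy: %.2f" % test_accuracy)
```

With this arrangement the fitted search object can be applied to raw test vectors directly, since the projection step is part of the estimator.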
Running the code
Total running time of the script: approximately 0 minutes 59.514 seconds.
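A large share of the running time goes into the grid search, because every combination of `C` and `gamma` is trained once per cross-validation fold. The sketch below counts those fits in plain Python using the grid from the script; the extra final fit assumes the `GridSearchCV` default `refit=True`, which retrains the best candidate on the full training set.

```python
from itertools import product

# Parameter grid and fold count as used in the script
param_grid = {
    "C": [1e3, 5e3, 1e4, 5e4, 1e5],
    "gamma": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
cv_folds = 5

# Every (C, gamma) pair is one candidate model
candidates = [dict(zip(param_grid, values))
              for values in product(*param_grid.values())]

n_candidates = len(candidates)        # 5 values of C times 6 values of gamma
n_cv_fits = n_candidates * cv_folds   # each candidate is trained on every fold
n_total_fits = n_cv_fits + 1          # plus one refit of the best candidate

print("candidates:", n_candidates)
print("cross-validation fits:", n_cv_fits)
print("total fits including the final refit:", n_total_fits)
```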
The text output of the script is as follows:
Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
Extracting the top 150 eigenfaces from 966 faces
done in 0.118s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.005s
Fitting the classifier to the training set
done in 36.078s
Best estimator found by grid search:
SVC(C=1000.0, class_weight='balanced', gamma=0.005)
Predicting people's names on the test set
done in 0.061s
                   precision    recall  f1-score   support

     Ariel Sharon       0.75      0.46      0.57        13
     Colin Powell       0.79      0.87      0.83        60
  Donald Rumsfeld       0.94      0.63      0.76        27
    George W Bush       0.83      0.98      0.90       146
Gerhard Schroeder       0.91      0.80      0.85        25
      Hugo Chavez       1.00      0.53      0.70        15
       Tony Blair       0.96      0.75      0.84        36

         accuracy                           0.85       322
        macro avg       0.88      0.72      0.78       322
     weighted avg       0.86      0.85      0.84       322

[[  6   2   0   5   0   0   0]
 [  1  52   0   7   0   0   0]
 [  1   3  17   6   0   0   0]
 [  0   3   0 143   0   0   0]
 [  0   1   0   3  20   0   1]
 [  0   4   0   2   1   8   0]
 [  0   1   1   6   1   0  27]]
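The classification report and the confusion matrix in this output carry the same information. As a minimal sketch in plain Python, the per-person precision and recall and the overall accuracy can be recomputed from the matrix printed above. In scikit-learn's convention, rows are the true labels and columns are the predicted labels; the variable names below are chosen for this illustration.

```python
target_names = [
    "Ariel Sharon", "Colin Powell", "Donald Rumsfeld", "George W Bush",
    "Gerhard Schroeder", "Hugo Chavez", "Tony Blair",
]

# Confusion matrix copied from the output above
# (rows: true person, columns: predicted person)
confusion = [
    [6, 2, 0, 5, 0, 0, 0],
    [1, 52, 0, 7, 0, 0, 0],
    [1, 3, 17, 6, 0, 0, 0],
    [0, 3, 0, 143, 0, 0, 0],
    [0, 1, 0, 3, 20, 0, 1],
    [0, 4, 0, 2, 1, 8, 0],
    [0, 1, 1, 6, 1, 0, 27],
]

n = len(confusion)
# Row totals: how many test images each person really has (the support)
row_totals = [sum(row) for row in confusion]
# Column totals: how many times each person was predicted
col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

# Recall: correct predictions divided by the true count of that person
recall = [confusion[i][i] / row_totals[i] for i in range(n)]
# Precision: correct predictions divided by the predicted count of that person
precision = [confusion[i][i] / col_totals[i] for i in range(n)]
# Accuracy: all correct predictions (the diagonal) over all test images
accuracy = sum(confusion[i][i] for i in range(n)) / sum(row_totals)

for name, p, r, support in zip(target_names, precision, recall, row_totals):
    print("%-18s precision = %.2f  recall = %.2f  support = %d"
          % (name, p, r, support))
print("accuracy = %.2f" % accuracy)
```

Reading the fourth column of the matrix shows where most errors go: images of the other six people are most often mislabeled as George W Bush, the person with the most training images.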
The figures produced by the script are shown below:
Source code download
- Python source file: plot_face_recognition.py
- Jupyter Notebook source file: plot_face_recognition.ipynb