sklearn example: face recognition using eigenfaces and SVMs

About this example

This example performs face recognition with eigenfaces and an SVM. The dataset is a preprocessed excerpt of "Labeled Faces in the Wild" (also known as the LFW dataset):

http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

Expected results for the 7 most represented people in the dataset (useful for comparison with the output of the code below):

	precision	recall	f1-score	support
Ariel Sharon	0.67	0.92	0.77	13
Colin Powell	0.75	0.78	0.76	60
Donald Rumsfeld	0.78	0.67	0.72	27
George W Bush	0.86	0.86	0.86	146
Gerhard Schroeder	0.76	0.76	0.76	25
Hugo Chavez	0.67	0.67	0.67	15
Tony Blair	0.81	0.69	0.75	36
avg / total	0.80	0.80	0.80	322
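The last row of the table appears to be a support-weighted average of the per-person rows rather than a plain mean. As a quick sanity check, the sketch below recomputes it in plain Python from the numbers in the table. The `rows` dictionary and the `weighted_average` helper are names made up for this illustration and are not part of scikit-learn.

```python
# (precision, recall, f1-score, support) per person, copied from the table
rows = {
    "Ariel Sharon": (0.67, 0.92, 0.77, 13),
    "Colin Powell": (0.75, 0.78, 0.76, 60),
    "Donald Rumsfeld": (0.78, 0.67, 0.72, 27),
    "George W Bush": (0.86, 0.86, 0.86, 146),
    "Gerhard Schroeder": (0.76, 0.76, 0.76, 25),
    "Hugo Chavez": (0.67, 0.67, 0.67, 15),
    "Tony Blair": (0.81, 0.69, 0.75, 36),
}


def weighted_average(rows, column):
    """Average one metric column, weighting each person by their support."""
    total = sum(r[3] for r in rows.values())
    return sum(r[column] * r[3] for r in rows.values()) / total


total_support = sum(r[3] for r in rows.values())
avg_precision = weighted_average(rows, 0)
avg_recall = weighted_average(rows, 1)
avg_f1 = weighted_average(rows, 2)

print("support:", total_support)
print("precision: %.2f  recall: %.2f  f1-score: %.2f"
      % (avg_precision, avg_recall, avg_f1))
```

Because George W Bush accounts for 146 of the 322 test samples, his row dominates the weighted average, so the summary row says little about the rarer classes.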

Code implementation [Python]


# -*- coding: utf-8 -*- 
from time import time
import logging
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC


# Display progress logs
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')


# #############################################################################
# Download the data and load it as numpy arrays

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Get the shape of the image array (used for plotting)
n_samples, h, w = lfw_people.images.shape

# For machine learning we use the flattened pixel data directly (relative
# pixel position information is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]

# The label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)


# #############################################################################
# Split into a training set (75%) and a test set (25%)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)


# #############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled data):
# unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))


# #############################################################################
# Train an SVM classification model

print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'),
                   param_grid, cv=5)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)


# #############################################################################
# Quantitative evaluation of the model on the test set

print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))


# #############################################################################
# Qualitative evaluation of the predictions using matplotlib

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())


# Plot the predictions for a portion of the test set

def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# Plot a gallery of the most significant eigenfaces

eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
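
To make the eigenface step less of a black box, here is a minimal NumPy sketch of what a whitened PCA projection computes. It is an illustration under stated assumptions, not scikit-learn's implementation: it uses a full SVD instead of the randomized solver, runs on small random "images" instead of LFW, and the helper names `fit_eigenfaces` and `project` are invented for this example.

```python
import numpy as np


def fit_eigenfaces(X, n_components):
    """Fit PCA through a full SVD of the mean-centred data."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal axes, sorted by decreasing
    # singular value; each axis has one entry per pixel.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components], s[:n_components]


def project(X, mean, components, singular_values, n_fit, whiten=True):
    """Project samples onto the principal axes, optionally whitening."""
    Z = (X - mean) @ components.T
    if whiten:
        # Divide each component by its standard deviation on the data
        # used for fitting: sqrt(explained variance) = s / sqrt(n_fit - 1).
        Z = Z / (singular_values / np.sqrt(n_fit - 1))
    return Z


# Toy stand-in for the face data (assumption): 40 random 8x6 "images".
rng = np.random.default_rng(0)
h, w = 8, 6
X_toy = rng.normal(size=(40, h * w))

n_components = 5
mean, components, s = fit_eigenfaces(X_toy, n_components)

# Each principal axis can be viewed as an image: an "eigenface".
eigenfaces = components.reshape((n_components, h, w))

Z = project(X_toy, mean, components, s, n_fit=X_toy.shape[0])
print(eigenfaces.shape)
print(Z.std(axis=0, ddof=1))
```

With whitening, every projected component has unit sample variance on the data the PCA was fitted on. This puts all 150 features fed to the RBF-kernel SVM on a comparable scale, which is likely why a single `gamma` value can work across components.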

Running the code

The script takes roughly one minute to run (0 min 59.514 s).
It prints the following text output:

Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
Extracting the top 150 eigenfaces from 966 faces
done in 0.118s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.005s
Fitting the classifier to the training set
done in 36.078s
Best estimator found by grid search:
SVC(C=1000.0, class_weight='balanced', gamma=0.005)
Predicting people's names on the test set
done in 0.061s
                   precision    recall  f1-score   support

     Ariel Sharon       0.75      0.46      0.57        13
     Colin Powell       0.79      0.87      0.83        60
  Donald Rumsfeld       0.94      0.63      0.76        27
    George W Bush       0.83      0.98      0.90       146
Gerhard Schroeder       0.91      0.80      0.85        25
      Hugo Chavez       1.00      0.53      0.70        15
       Tony Blair       0.96      0.75      0.84        36

         accuracy                           0.85       322
        macro avg       0.88      0.72      0.78       322
     weighted avg       0.86      0.85      0.84       322

[[  6   2   0   5   0   0   0]
 [  1  52   0   7   0   0   0]
 [  1   3  17   6   0   0   0]
 [  0   3   0 143   0   0   0]
 [  0   1   0   3  20   0   1]
 [  0   4   0   2   1   8   0]
 [  0   1   1   6   1   0  27]]
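
The per-class figures in the report can be recovered from the confusion matrix alone: rows are true labels and columns are predicted labels, so recall divides the diagonal entry by its row sum and precision divides it by its column sum. The plain-Python sketch below checks this against the output above; `per_class_metrics` is a helper written for this illustration, not a scikit-learn function.

```python
target_names = ["Ariel Sharon", "Colin Powell", "Donald Rumsfeld",
                "George W Bush", "Gerhard Schroeder", "Hugo Chavez",
                "Tony Blair"]

# Confusion matrix copied from the output above
# (rows: true person, columns: predicted person).
cm = [
    [6, 2, 0, 5, 0, 0, 0],
    [1, 52, 0, 7, 0, 0, 0],
    [1, 3, 17, 6, 0, 0, 0],
    [0, 3, 0, 143, 0, 0, 0],
    [0, 1, 0, 3, 20, 0, 1],
    [0, 4, 0, 2, 1, 8, 0],
    [0, 1, 1, 6, 1, 0, 27],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, support), one tuple per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        true_positives = cm[k][k]
        support = sum(cm[k])                         # row sum: true members
        predicted = sum(cm[i][k] for i in range(n))  # column sum: predictions
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / support if support else 0.0
        metrics.append((precision, recall, support))
    return metrics


metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))

for name, (precision, recall, support) in zip(target_names, metrics):
    print("%-18s precision=%.2f recall=%.2f support=%d"
          % (name, precision, recall, support))
print("accuracy=%.2f" % accuracy)
```

Reading down the fourth column shows where most errors go: 29 of the 49 misclassified test faces are labeled as George W Bush, the largest class, which explains his high recall and the lower recall of the smaller classes.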

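The script also displays two figures, a gallery of 12 test-set portraits titled with their predicted and true names and a gallery of the first 12 eigenfaces: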

Faces recognition example using eigenfaces and SVMs

Source download

Python source file: plot_face_recognition.py
Jupyter Notebook source file: plot_face_recognition.ipynb

References

Faces recognition example using eigenfaces and SVMs