Python sklearn BayesianGaussianMixture: usage and code examples

This article briefly introduces how to use sklearn.mixture.BayesianGaussianMixture in Python.

Usage: class sklearn.mixture.BayesianGaussianMixture(*, n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=None, mean_precision_prior=None, mean_prior=None, degrees_of_freedom_prior=None, covariance_prior=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10)

Variational Bayesian estimation of a Gaussian mixture.

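This class infers an approximate posterior distribution over the parameters of a Gaussian mixture distribution. The effective number of components can be inferred from the data.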

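This class implements two types of prior for the weights distribution: a finite mixture model with a Dirichlet distribution and an infinite mixture model with a Dirichlet process. In practice, the Dirichlet process inference algorithm is approximate and uses a truncated distribution with a fixed maximum number of components (called the stick-breaking representation). The number of components actually used almost always depends on the data.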
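To make the stick-breaking representation concrete, here is a minimal standalone sketch in NumPy. It does not reproduce scikit-learn's internals; the helper `stick_breaking_weights` and the concentration value `alpha` are hypothetical names chosen for illustration. Each weight takes a fraction v_k of whatever is left of a unit-length stick, and truncating at K components (forcing the last fraction to 1) makes the K weights sum to 1.

```python
import numpy as np


def stick_breaking_weights(fractions):
    """Hypothetical helper: map stick-breaking fractions v_k in (0, 1] to weights.

    w_k = v_k * prod_{j<k} (1 - v_j)
    """
    fractions = np.asarray(fractions, dtype=float)
    # Stick length left before each break: 1, (1 - v_1), (1 - v_1)(1 - v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - fractions[:-1])))
    return fractions * remaining


rng = np.random.default_rng(0)
K, alpha = 10, 1.0  # truncation level and a hypothetical concentration value
v = rng.beta(1.0, alpha, size=K)  # v_k ~ Beta(1, alpha)
v[-1] = 1.0  # truncation: the last piece takes the rest of the stick
weights = stick_breaking_weights(v)
print(weights.round(3), weights.sum())
```

Smaller values of `alpha` make the early fractions larger, so most of the mass tends to land on the first few components.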

Read more in the User Guide.

Parameters:

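n_components: int, default=1

The number of mixture components. Depending on the data and the value of weight_concentration_prior, the model can decide not to use all the components by setting some component weights_ to values very close to zero. The number of effective components can therefore be smaller than n_components.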
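A minimal sketch of how the effective number of components can be read off weights_, assuming a synthetic dataset of two well-separated 2-D blobs. The 0.01 threshold for calling a component "active" is an arbitrary choice for this example, not something the class defines.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

# Deliberately allow more components than the data needs.
bgm = BayesianGaussianMixture(n_components=6, max_iter=500, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    bgm.fit(X)

threshold = 0.01  # arbitrary cut-off for an "active" component
n_effective = int(np.sum(bgm.weights_ > threshold))
print(bgm.weights_.round(3))
print("effective components:", n_effective)
```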

covariance_type: {'full', 'tied', 'diag', 'spherical'}, default='full'

String describing the type of covariance parameters to use. Must be one of:

'full' (each component has its own general covariance matrix),
'tied' (all components share the same general covariance matrix),
'diag' (each component has its own diagonal covariance matrix),
'spherical' (each component has its own single variance).

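tol: float, default=1e-3

The convergence threshold. EM iterations stop when the average gain in the lower bound on the likelihood (of the training data with respect to the model) falls below this threshold.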

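reg_covar: float, default=1e-6

Non-negative regularization added to the diagonal of the covariance matrices. It helps ensure that the covariance matrices are all positive definite.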

max_iter: int, default=100

The number of EM iterations to perform.

n_init: int, default=1

The number of initializations to perform. The result with the highest lower bound value on the likelihood is kept.
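A short sketch comparing one initialization with several, assuming the same synthetic two-blob data as a stand-in for real input. The lower_bound_ attribute reports the value reached by the best of the n_init runs; the exact numbers will depend on the data and the library version.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    single = BayesianGaussianMixture(n_components=2, n_init=1, random_state=0).fit(X)
    multi = BayesianGaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)

# Each model keeps the parameters of its best-scoring initialization.
print("n_init=1 lower bound:", single.lower_bound_)
print("n_init=5 lower bound:", multi.lower_bound_)
```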

init_params: {'kmeans', 'random'}, default='kmeans'

The method used to initialize the weights, the means and the covariances. Must be one of:

'kmeans' : responsibilities are initialized using kmeans.
'random' : responsibilities are initialized randomly.
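A minimal sketch fitting the same synthetic two-blob data (an assumption for illustration) with both initialization methods. The rows of means_ are sorted by their first coordinate only so the two runs can be compared regardless of component order; random initialization may need more iterations, hence the larger max_iter.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

means = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for init in ("kmeans", "random"):
        bgm = BayesianGaussianMixture(
            n_components=2, init_params=init, max_iter=500, random_state=0
        ).fit(X)
        # Sort components by their first coordinate for a stable comparison.
        means[init] = bgm.means_[np.argsort(bgm.means_[:, 0])]
        print(init, means[init].round(2))
```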

weight_concentration_prior_type: str, default='dirichlet_process'

String describing the type of the weight concentration prior. Must be one of:

'dirichlet_process' (using the Stick-breaking representation),
'dirichlet_distribution' (can favor more uniform weights).
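A sketch comparing the two prior types on the same synthetic data (an assumption for illustration), with more components than the data needs. How the leftover weight is spread across unused components differs between the two priors; the printed weights are what to inspect, and they always sum to 1.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

weights = {}
for prior_type in ("dirichlet_process", "dirichlet_distribution"):
    bgm = BayesianGaussianMixture(
        n_components=6,
        weight_concentration_prior_type=prior_type,
        max_iter=500,
        random_state=0,
    )
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        bgm.fit(X)
    weights[prior_type] = bgm.weights_
    print(prior_type, np.round(bgm.weights_, 3))
```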

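weight_concentration_prior: float or None, default=None

The Dirichlet concentration of each component on the weight distribution (Dirichlet). This is commonly called gamma in the literature. A higher concentration puts more mass in the center and leads to more components being active, while a lower concentration parameter leads to more mass at the edge of the mixture weights simplex. The value of the parameter must be greater than 0. If it is None, it is set to 1. / n_components.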
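A sketch showing the default value (1. / n_components) and a very low versus very high concentration, on synthetic two-blob data (an assumption). The helpers `fit` and `n_active` and the 0.01 threshold are hypothetical conveniences for this example. A low concentration typically leaves fewer components active, but the exact counts depend on the data, so treat the printed numbers as illustrative.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])
n_components = 5


def fit(**kwargs):
    """Hypothetical helper: fit a model with shared settings plus overrides."""
    model = BayesianGaussianMixture(
        n_components=n_components, max_iter=500, random_state=0, **kwargs
    )
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        return model.fit(X)


def n_active(model, threshold=0.01):
    """Hypothetical helper: count components whose weight exceeds threshold."""
    return int(np.sum(model.weights_ > threshold))


default = fit()
low = fit(weight_concentration_prior=1e-3)
high = fit(weight_concentration_prior=1e3)

print(default.weight_concentration_prior_)  # None resolves to 1 / n_components
print("active (low, high):", n_active(low), n_active(high))
```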

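mean_precision_prior: float or None, default=None

The precision prior on the mean distribution (Gaussian). Controls the extent of where means can be placed. Larger values concentrate the cluster means around mean_prior. The value of the parameter must be greater than 0. If it is None, it is set to 1.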

mean_prior: array-like of shape (n_features,), default=None

The prior on the mean distribution (Gaussian). If it is None, it is set to the mean of X.
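A sketch checking how the mean priors resolve after fitting, assuming synthetic two-blob data. With both parameters left as None, mean_precision_prior_ should be 1 and mean_prior_ should equal the column means of X; explicitly passed values are stored as given.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    default = BayesianGaussianMixture(n_components=2, random_state=0).fit(X)
    # A stronger prior that pulls the component means toward the origin.
    custom = BayesianGaussianMixture(
        n_components=2, mean_precision_prior=10.0, mean_prior=[0.0, 0.0], random_state=0
    ).fit(X)

print(default.mean_precision_prior_)            # resolved default precision
print(default.mean_prior_, X.mean(axis=0))      # prior mean vs. data mean
print(custom.mean_precision_prior_, custom.mean_prior_)
```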

degrees_of_freedom_prior: float or None, default=None

The prior on the number of degrees of freedom of the covariance distributions (Wishart). If it is None, it is set to n_features.

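covariance_prior: float or array-like, default=None

The prior on the covariance distribution (Wishart). If it is None, the empirical covariance prior is initialized using the covariance of X. The shape depends on covariance_type: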

(n_features, n_features) if 'full',
(n_features, n_features) if 'tied',
(n_features)             if 'diag',
float                    if 'spherical'
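A sketch inspecting the resolved covariance and degrees-of-freedom priors for each covariance_type, assuming synthetic two-blob data. The expected defaults checked here (the sample covariance of X for 'full' and 'tied', the per-feature sample variances for 'diag', and their mean for 'spherical') reflect our reading of recent scikit-learn versions; the text above only states that the prior is initialized from the covariance of X.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

priors, dof = {}, {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for cov_type in ("full", "tied", "diag", "spherical"):
        bgm = BayesianGaussianMixture(
            n_components=2, covariance_type=cov_type, random_state=0
        ).fit(X)
        priors[cov_type] = bgm.covariance_prior_
        dof[cov_type] = bgm.degrees_of_freedom_prior_
        # Shape of the prior and the resolved degrees of freedom.
        print(cov_type, np.shape(bgm.covariance_prior_), bgm.degrees_of_freedom_prior_)
```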

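random_state: int, RandomState instance or None, default=None

Controls the random seed given to the method chosen to initialize the parameters (see init_params). In addition, it controls the generation of random samples from the fitted distribution (see the method sample). Pass an int for reproducible output across multiple function calls. See the Glossary.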
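A sketch of reproducibility with an int seed, assuming synthetic two-blob data. Two fits with the same integer random_state should give identical parameters, and, in the scikit-learn versions we are aware of, repeated calls to sample also repeat because the integer seed is re-applied on each call.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    a = BayesianGaussianMixture(n_components=2, random_state=42).fit(X)
    b = BayesianGaussianMixture(n_components=2, random_state=42).fit(X)

same_fit = np.array_equal(a.means_, b.means_)

# sample returns (samples, component labels).
samples_1, labels_1 = a.sample(5)
samples_2, labels_2 = a.sample(5)
print(same_fit, np.array_equal(samples_1, samples_2))
```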

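warm_start: bool, default=False

If warm_start is True, the solution of the last fitting is used as initialization for the next call of fit(). This can speed up convergence when fit is called several times on similar problems. See the Glossary.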
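A sketch of incremental fitting with warm_start, assuming synthetic two-blob data. With a small max_iter, each call to fit resumes from the previous solution instead of re-initializing; the loop stops once converged_ is True or after a fixed number of calls (20 here, an arbitrary cap).

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

bgm = BayesianGaussianMixture(n_components=2, warm_start=True, max_iter=5, random_state=0)
bounds = []
with warnings.catch_warnings():
    # Each short fit may stop before converging; silence those warnings.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for _ in range(20):
        bgm.fit(X)  # continues from the previous fit's parameters
        bounds.append(bgm.lower_bound_)
        if bgm.converged_:
            break

print(len(bounds), "calls, final lower bound:", bounds[-1])
```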

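verbose: int, default=0

Enable verbose output. If 1, it prints the current initialization and each iteration step. If greater than 1, it also prints the log probability and the time needed for each step.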

verbose_interval: int, default=10

Number of iterations done before the next print.

Attributes:

weights_: array-like of shape (n_components,)

The weight of each mixture component.

means_: array-like of shape (n_components, n_features)

The mean of each mixture component.

covariances_: array-like

The covariance of each mixture component. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
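A sketch confirming the covariances_ shapes listed above, assuming synthetic 2-D data (so n_features = 2) and n_components = 3.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

shapes = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for cov_type in ("spherical", "tied", "diag", "full"):
        bgm = BayesianGaussianMixture(
            n_components=3, covariance_type=cov_type, random_state=0
        ).fit(X)
        shapes[cov_type] = bgm.covariances_.shape
        print(cov_type, shapes[cov_type])
```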

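precisions_: array-like

The precision matrices for each component in the mixture. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite, so the Gaussian mixture can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type: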

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
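A sketch checking that each precision matrix is the inverse of the matching covariance matrix, assuming synthetic two-blob data and covariance_type='full'.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    bgm = BayesianGaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

identity = np.eye(X.shape[1])
# For every component k: precisions_[k] @ covariances_[k] should be the identity.
is_inverse = all(
    np.allclose(P @ C, identity) for P, C in zip(bgm.precisions_, bgm.covariances_)
)
print(is_inverse)
```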

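precisions_cholesky_: array-like

The Cholesky decomposition of the precision matrices of each mixture component. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite, so the Gaussian mixture can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type: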

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
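A sketch relating precisions_cholesky_ back to the precision and covariance matrices for covariance_type='full', assuming synthetic two-blob data. It assumes the convention precision = factor @ factor.T, which is how we understand recent scikit-learn versions to rebuild precisions_ from this attribute; the second check compares against a direct matrix inverse of covariances_.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    bgm = BayesianGaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

# Assumed convention: precision matrix = factor @ factor.T for each component.
matches_precisions = all(
    np.allclose(F @ F.T, P) for F, P in zip(bgm.precisions_cholesky_, bgm.precisions_)
)
# Same check against an explicit inverse of each covariance matrix.
matches_inverse = all(
    np.allclose(F @ F.T, np.linalg.inv(C))
    for F, C in zip(bgm.precisions_cholesky_, bgm.covariances_)
)
print(matches_precisions, matches_inverse)
```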

converged_: bool

True when convergence was reached in fit(), False otherwise.

n_iter_: int

Number of steps used by the best fit of inference to reach convergence.
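A sketch contrasting a run cut off after a single iteration with a run given ample iterations, assuming synthetic two-blob data. With max_iter=1 the convergence test cannot pass (there is no previous lower bound to compare against), so converged_ should be False and n_iter_ should be 1; a ConvergenceWarning is expected and silenced here.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    short = BayesianGaussianMixture(n_components=2, max_iter=1, random_state=0).fit(X)
    long_run = BayesianGaussianMixture(n_components=2, max_iter=500, random_state=0).fit(X)

print("max_iter=1:  ", short.converged_, short.n_iter_)
print("max_iter=500:", long_run.converged_, long_run.n_iter_)
```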

lower_bound_: float

Lower bound value on the likelihood (of the training data with respect to the model) of the best fit of inference.

weight_concentration_prior_: tuple or float

The Dirichlet concentration of each component on the weight distribution (Dirichlet). The type depends on weight_concentration_prior_type:

(float, float) if 'dirichlet_process' (Beta parameters),
float          if 'dirichlet_distribution' (Dirichlet parameters).

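A higher concentration puts more mass in the center and leads to more components being active, while a lower concentration parameter leads to more mass at the edge of the simplex.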

weight_concentration_: array-like of shape (n_components,)

The Dirichlet concentration of each component on the weight distribution (Dirichlet).

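mean_precision_prior_: float

The precision prior on the mean distribution (Gaussian). Controls the extent of where means can be placed. Larger values concentrate the cluster means around mean_prior. If mean_precision_prior is set to None, mean_precision_prior_ is set to 1.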

mean_precision_: array-like of shape (n_components,)

The precision of each component on the mean distribution (Gaussian).

mean_prior_: array-like of shape (n_features,)

The prior on the mean distribution (Gaussian).

degrees_of_freedom_prior_: float

The prior on the number of degrees of freedom of the covariance distributions (Wishart).

degrees_of_freedom_: array-like of shape (n_components,)

The number of degrees of freedom of each component in the model.

covariance_prior_: float or array-like

The prior on the covariance distribution (Wishart). The shape depends on covariance_type:

(n_features, n_features) if 'full',
(n_features, n_features) if 'tied',
(n_features)             if 'diag',
float                    if 'spherical'

n_features_in_: int

Number of features seen during fit.

feature_names_in_: ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.
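A sketch of the two input-metadata attributes, assuming synthetic data passed as a plain NumPy array. Because a NumPy array carries no column names, n_features_in_ is set but feature_names_in_ is not defined.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data (assumption): two well-separated 2-D blobs, as a plain ndarray.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    bgm = BayesianGaussianMixture(n_components=2, random_state=0).fit(X)

print(bgm.n_features_in_)                  # number of columns seen during fit
print(hasattr(bgm, "feature_names_in_"))   # no string column names in an ndarray
```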

Examples:

>>> import numpy as np
>>> from sklearn.mixture import BayesianGaussianMixture
>>> X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [12, 4], [10, 7]])
>>> bgm = BayesianGaussianMixture(n_components=2, random_state=42).fit(X)
>>> bgm.means_
array([[2.49... , 2.29...],
       [8.45..., 4.52... ]])
>>> bgm.predict([[0, 0], [9, 3]])
array([0, 1])

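Note: This article was compiled by 純淨天空 from the original English documentation of sklearn.mixture.BayesianGaussianMixture on scikit-learn.org. Unless otherwise stated, the copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.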