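Python SciPy hierarchy.fcluster: usage and code examples

This article briefly introduces the usage of scipy.cluster.hierarchy.fcluster in Python.

Usage: scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)

Form flat clusters from the hierarchical clustering defined by the given linkage matrix.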

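Parameters:

Z: ndarray

The hierarchical clustering encoded with the matrix returned by the linkage function.

t: scalar

For the criteria 'inconsistent', 'distance', or 'monocrit', this is the threshold to apply when forming flat clusters.
For the 'maxclust' or 'maxclust_monocrit' criteria, this is the maximum number of clusters requested.

criterion: str, optional

The criterion to use in forming flat clusters. This can be any of the following values: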

inconsistent :
If a cluster node and all its descendants have an inconsistent value less than or equal to t, then all its leaf descendants belong to the same flat cluster. When no non-singleton cluster meets this criterion, every node is assigned to its own cluster. (Default)

distance :
Forms flat clusters so that the original observations in each flat cluster have no greater a cophenetic distance than t.

maxclust :
Finds a minimum threshold r so that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than t flat clusters are formed.

monocrit :
Forms a flat cluster from a cluster node c with index i when monocrit[i] <= t.

For example, to threshold on the maximum inconsistency coefficient (column 3 of the inconsistency matrix R) with a threshold of 0.8, do:
MR = maxRstat(Z, R, 3)
fcluster(Z, t=0.8, criterion='monocrit', monocrit=MR)

maxclust_monocrit :
Forms a flat cluster from a non-singleton cluster node c when monocrit[i] <= r for all cluster indices i below and including c. r is minimized such that no more than t flat clusters are formed. monocrit must be monotonic.

For example, to minimize the threshold r on maximum inconsistency values so that no more than 3 flat clusters are formed, do:
MI = maxinconsts(Z, R)
fcluster(Z, t=3, criterion='maxclust_monocrit', monocrit=MI)
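
The two snippets for 'monocrit' and 'maxclust_monocrit' assume that Z and R already exist. A minimal self-contained sketch follows. It assumes the 12-point dataset from the Examples section of this page and an inconsistency matrix computed with scipy.cluster.hierarchy.inconsistent at its default depth. The variable names labels_monocrit and labels_maxclust are illustrative only.

```python
from scipy.cluster.hierarchy import (
    ward, fcluster, inconsistent, maxRstat, maxinconsts)
from scipy.spatial.distance import pdist

# Same 12 points as in the Examples section of this page.
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))

# Inconsistency matrix; column 3 holds the inconsistency coefficient.
R = inconsistent(Z)

# 'monocrit': threshold on the maximum of column 3 over each subtree.
MR = maxRstat(Z, R, 3)
labels_monocrit = fcluster(Z, t=0.8, criterion='monocrit', monocrit=MR)

# 'maxclust_monocrit': here t is a cluster count, not a threshold.
MI = maxinconsts(Z, R)
labels_maxclust = fcluster(Z, t=3, criterion='maxclust_monocrit', monocrit=MI)

print(labels_monocrit)
print(labels_maxclust)
```

Both maxRstat and maxinconsts take the maximum over a node and its descendants, so the vectors they return are monotonic, which is what these two criteria require.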

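depth: int, optional

The maximum depth to perform the inconsistency calculation. It has no meaning for the other criteria. Default is 2.

R: ndarray, optional

The inconsistency matrix to use for the 'inconsistent' criterion. This matrix is computed if not provided.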
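
To illustrate how depth and R relate, the sketch below runs the 'inconsistent' criterion twice: once letting fcluster compute the inconsistency matrix itself, and once passing in a matrix precomputed with scipy.cluster.hierarchy.inconsistent at the same depth. The dataset is the one from the Examples section, and the threshold 0.8 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, inconsistent
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))

# Let fcluster compute the inconsistency matrix itself, at depth 2.
labels_auto = fcluster(Z, t=0.8, criterion='inconsistent', depth=2)

# Precompute the matrix at the same depth and pass it in through R.
R = inconsistent(Z, d=2)
labels_given = fcluster(Z, t=0.8, criterion='inconsistent', R=R)

# The two label arrays are expected to agree.
print(np.array_equal(labels_auto, labels_given))
```

Passing R explicitly can be useful when the same inconsistency matrix is reused across several thresholds.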

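monocrit: ndarray, optional

An array of length n-1. monocrit[i] is the statistic upon which non-singleton i is thresholded. The monocrit vector must be monotonic, i.e., given a node c with index i, for all node indices j corresponding to nodes below c, monocrit[i] >= monocrit[j].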
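
The monotonicity condition can be checked directly from the linkage matrix. The sketch below uses a hypothetical helper, is_monocrit_monotonic, which is not part of SciPy. It compares each non-singleton node only with its immediate non-singleton children, which is sufficient because the >= relation is transitive.

```python
import numpy as np
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist


def is_monocrit_monotonic(Z, monocrit):
    """Hypothetical helper: True if no child value exceeds its parent's."""
    Z = np.asarray(Z)
    monocrit = np.asarray(monocrit, dtype=float)
    n = Z.shape[0] + 1  # number of original observations
    for i in range(n - 1):
        for child in (int(Z[i, 0]), int(Z[i, 1])):
            # Indices below n are singletons and carry no monocrit value.
            if child >= n and monocrit[i] < monocrit[child - n]:
                return False
    return True


X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))

print(is_monocrit_monotonic(Z, Z[:, 2]))   # merge heights of this linkage
print(is_monocrit_monotonic(Z, -Z[:, 2]))  # negated heights
```

For this dataset the merge heights never decrease from child to parent, so the first call reports a monotonic vector and the second does not.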

Returns:

fcluster: ndarray

An array of length n. T[i] is the flat cluster number to which original observation i belongs.
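
Because the return value is a label per observation, a common follow-up step is to collect the observation indices that share a label. A minimal sketch, assuming the dataset from the Examples section and the 'distance' criterion with t=3. The dictionary name members is illustrative.

```python
from collections import defaultdict

from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
T = fcluster(ward(pdist(X)), t=3, criterion='distance')

# Map each flat cluster number to the indices of its observations.
members = defaultdict(list)
for i, label in enumerate(T):
    members[int(label)].append(i)

for label in sorted(members):
    print(label, members[label])
```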

Examples:

>>> from scipy.cluster.hierarchy import ward, fcluster
>>> from scipy.spatial.distance import pdist

All cluster linkage methods, for example scipy.cluster.hierarchy.ward, generate a linkage matrix Z as their output:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

>>> Z = ward(pdist(X))

>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])

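This matrix represents a dendrogram: the first and second elements are the two clusters merged at each step, the third element is the distance between these clusters, and the fourth element is the size of the new cluster, i.e. the number of original data points it contains.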
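
To make the row layout concrete, the sketch below walks through Z and prints each merge. It relies on the SciPy convention that the cluster created by row i receives the index n + i, where n is the number of original observations. The list name merges is illustrative.

```python
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
n = Z.shape[0] + 1  # 12 original observations

merges = []
for i, (a, b, dist, size) in enumerate(Z):
    new_id = n + i  # index of the cluster created by row i
    merges.append((int(a), int(b), new_id, int(size)))
    print(f"step {i}: {int(a)} + {int(b)} -> {new_id} "
          f"(distance {dist:.3f}, size {int(size)})")
```

Indices below 12 refer to original data points, and indices from 12 upward refer to clusters formed by earlier rows.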

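scipy.cluster.hierarchy.fcluster can be used to flatten the dendrogram, assigning the original data points to single clusters.

This assignment mostly depends on a distance threshold t, the maximum inter-cluster distance allowed: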

>>> fcluster(Z, t=0.9, criterion='distance')
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12], dtype=int32)

>>> fcluster(Z, t=1.1, criterion='distance')
array([1, 1, 2, 3, 3, 4, 5, 5, 6, 7, 7, 8], dtype=int32)

>>> fcluster(Z, t=3, criterion='distance')
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)

>>> fcluster(Z, t=9, criterion='distance')
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

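In the first case, the threshold t is too small to allow any two samples in the data to form a cluster, so 12 different clusters are returned.

In the second case, the threshold is large enough to allow four pairs of nearest neighbors to merge, so only 8 clusters are returned.

In the third case, the higher threshold lets each remaining point join one of those pairs, so 4 clusters of 3 points each are returned.

Lastly, the threshold of the fourth case is large enough to allow all data points to be merged together, so a single cluster is returned.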
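
The 'distance' criterion is defined in terms of cophenetic distances, which can be inspected with scipy.cluster.hierarchy.cophenet. The sketch below checks that observations sharing a flat cluster are never further apart than t in cophenetic distance, and then requests a cluster count with 'maxclust' instead of a distance threshold. The values t=3 and 4 clusters are chosen only to match the third case above.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, cophenet
from scipy.spatial.distance import pdist, squareform

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))

t = 3
labels = fcluster(Z, t=t, criterion='distance')

# Square matrix of cophenetic distances between all pairs of observations.
coph = squareform(cophenet(Z))

# Pairs that share a flat cluster should never be further apart than t.
same_cluster = labels[:, None] == labels[None, :]
print(coph[same_cluster].max() <= t)

# 'maxclust' asks for a cluster count instead of a distance threshold.
labels_k = fcluster(Z, t=4, criterion='maxclust')
print(len(np.unique(labels_k)))
```

Requesting a cluster count is often more convenient than guessing a distance threshold when the scale of the merge heights is not known in advance.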

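Note: this article was selected and compiled by 純淨天空 from the original English documentation for scipy.cluster.hierarchy.fcluster on scipy.org. Unless otherwise stated, copyright of the original code belongs to its original authors. Do not reproduce or copy this translation without permission or authorization.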
