Python tf.compat.v1.estimator.experimental.KMeans: usage and code examples

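An Estimator for K-Means clustering.

Warning: Estimators are not recommended for new code. Estimators run v1.Session-style code, which is more difficult to write correctly and can behave unexpectedly, especially when combined with TF 2 code. Estimators do fall under our compatibility guarantees, but will receive no fixes other than security vulnerabilities. See the migration guide for details.

Inherits from: Estimator

Usage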

tf.compat.v1.estimator.experimental.KMeans(
    num_clusters, model_dir=None, initial_clusters=RANDOM_INIT,
    distance_metric=SQUARED_EUCLIDEAN_DISTANCE, seed=None, use_mini_batch=True,
    mini_batch_steps_per_iteration=1, kmeans_plus_plus_num_retries=2,
    relative_tolerance=None, config=None, feature_columns=None
)

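Args

num_clusters An integer tensor specifying the number of clusters. This argument is ignored if initial_clusters is a tensor or numpy array.
model_dir The directory in which to save the model results and log files.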
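initial_clusters
Specifies how the initial cluster centers are chosen. One of the following:
- A tensor or numpy array with the initial cluster centers.
- A callable f(inputs, k) that selects and returns up to k centers from an input batch. f is free to return any number of centers from 0 to k. It will be invoked on successive input batches as necessary until all num_clusters centers are chosen.
- KMeansClustering.RANDOM_INIT: Choose centers randomly from an input batch. If the batch size is less than num_clusters, the entire batch is chosen as initial cluster centers and the remaining centers are chosen from successive input batches.
- KMeansClustering.KMEANS_PLUS_PLUS_INIT: Use kmeans++ to choose centers from the first input batch. If the batch size is less than num_clusters, a TensorFlow runtime error occurs.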
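distance_metric
The distance metric used for clustering. One of:
- KMeansClustering.SQUARED_EUCLIDEAN_DISTANCE: Euclidean distance between vectors u and v is defined as \(||u - v||_2\), which is the square root of the sum of the absolute squares of the elements' difference. As the name indicates, this option uses the square of that distance.
- KMeansClustering.COSINE_DISTANCE: Cosine distance between vectors u and v is defined as \(1 - (u . v) / (||u||_2 ||v||_2)\).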
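seed Python integer. Seed for the PRNG used to initialize the centers.
use_mini_batch A boolean specifying whether to use the mini-batch k-means algorithm. See explanation above.
mini_batch_steps_per_iteration The number of steps after which the updated cluster centers are synced back to a master copy. Used only if use_mini_batch=True. See explanation above.
kmeans_plus_plus_num_retries For each point that is sampled during kmeans++ initialization, this parameter specifies the number of additional points to draw from the current distribution before selecting the best. If a negative value is specified, a heuristic is used to sample O(log(num_to_sample)) additional points. Used only if initial_clusters=KMeansClustering.KMEANS_PLUS_PLUS_INIT.
relative_tolerance A relative tolerance of change in the loss between iterations. Learning stops if the loss changes by less than this amount. This may not work correctly if use_mini_batch=True.
config See tf.estimator.Estimator.
feature_columns An optional iterable containing all the feature columns used by the model. All items in the set should be feature column instances that can be passed to tf.feature_column.input_layer. If this is None, all features will be used.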
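
The callable form of initial_clusters can be pictured with a minimal plain-Python sketch. It is not the estimator's implementation: take_first and collect_initial_centers are hypothetical names, plain lists stand in for tensor batches, and passing the number of still-missing centers as k is an assumption made for illustration.

```python
def take_first(inputs, k):
    """Hypothetical initializer: return up to k rows from one input batch."""
    return list(inputs[:k])


def collect_initial_centers(batches, init_fn, num_clusters):
    """Assumed driver loop: call init_fn on successive batches until
    num_clusters centers have been chosen or the batches run out."""
    centers = []
    for batch in batches:
        remaining = num_clusters - len(centers)
        if remaining == 0:
            break
        # init_fn may return anywhere from 0 to `remaining` centers.
        centers.extend(init_fn(batch, remaining))
    return centers


# Three toy batches of two 2-D points each (illustrative data only).
batches = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (6.0, 6.0)],
    [(9.0, 9.0), (8.0, 8.0)],
]
initial_centers = collect_initial_centers(batches, take_first, num_clusters=5)
print(len(initial_centers), initial_centers)
```

In this toy run the first two batches supply two centers each and the third supplies the last one, which mirrors the "invoked on successive input batches" wording above.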

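The two distance_metric formulas listed in the arguments can be checked by hand with a small plain-Python sketch. The function names are hypothetical and the code only evaluates the formulas as written; it makes no claim about how the estimator computes them internally.

```python
import math


def squared_euclidean_distance(u, v):
    """Sum of squared element-wise differences, i.e. ||u - v||_2 squared."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 - (u . v) / (||u||_2 ||v||_2); not defined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = [3.0, 4.0]
origin = [0.0, 0.0]
scaled_u = [6.0, 8.0]

# Squared Euclidean distance grows with the gap between the vectors.
print(squared_euclidean_distance(u, origin))
# Cosine distance ignores magnitude: u and 2*u point the same way.
print(cosine_distance(u, scaled_u))
# Orthogonal vectors are at cosine distance 1.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Because cosine distance depends only on direction, it suits data where vector length is not meaningful, while squared Euclidean distance is sensitive to scale.
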
Raises

ValueError An invalid argument was passed to initial_clusters or distance_metric.

Attributes

config
model_dir
model_fn Returns the model_fn which is bound to self.params.
params

Example:

import numpy as np
import tensorflow as tf

num_points = 100
dimensions = 2
points = np.random.uniform(0, 1000, [num_points, dimensions])

def input_fn():
  # Feed the whole data set as one batch, for exactly one epoch per call.
  return tf.compat.v1.train.limit_epochs(
      tf.convert_to_tensor(points, dtype=tf.float32), num_epochs=1)

num_clusters = 5
kmeans = tf.compat.v1.estimator.experimental.KMeans(
    num_clusters=num_clusters, use_mini_batch=False)

# train
num_iterations = 10
previous_centers = None
for _ in range(num_iterations):
  kmeans.train(input_fn)
  cluster_centers = kmeans.cluster_centers()
  if previous_centers is not None:
    print('delta:', cluster_centers - previous_centers)
  previous_centers = cluster_centers
  print('score:', kmeans.score(input_fn))
print('cluster centers:', cluster_centers)

# map the input points to their clusters
cluster_indices = list(kmeans.predict_cluster_index(input_fn))
for i, point in enumerate(points):
  cluster_index = cluster_indices[i]
  center = cluster_centers[cluster_index]
  print('point:', point, 'is in cluster', cluster_index, 'centered at', center)
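
To make the last step of the example concrete, the following plain-Python sketch shows one way points can be mapped to their nearest center and a clustering score computed. It rests on assumptions: the helper names are hypothetical, squared Euclidean distance is used, and the score is taken to be the sum of squared distances from each point to its nearest center. It does not call the estimator.

```python
def squared_distance(point, center):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - c) ** 2 for p, c in zip(point, center))


def nearest_center_index(point, centers):
    """Index of the center with the smallest squared distance to point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def total_score(points, centers):
    """Assumed score: sum of squared distances to the nearest center."""
    return sum(
        squared_distance(p, centers[nearest_center_index(p, centers)])
        for p in points
    )


# Toy data, unrelated to the random points in the example above.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
toy_centers = [(0.0, 1.0), (10.0, 10.0)]

toy_indices = [nearest_center_index(p, toy_centers) for p in toy_points]
print('cluster indices:', toy_indices)
print('score:', total_score(toy_points, toy_centers))
```

A lower score under this definition means the points sit closer to their assigned centers, which is why the training loop above prints it after every iteration.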

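The SavedModel saved by the export_saved_model method does not include the cluster centers. However, the cluster centers can be retrieved from the latest checkpoint saved during training. Specifically,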

kmeans.cluster_centers()

is equivalent to

tf.train.load_variable(
    kmeans.model_dir, KMeansClustering.CLUSTER_CENTERS_VAR_NAME)

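Note: This article was selected and compiled by 純淨天空 from the original English documentation for tf.compat.v1.estimator.experimental.KMeans on tensorflow.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.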