

Python sklearn KDTree Usage and Code Examples


This article briefly introduces the usage of sklearn.neighbors.KDTree in Python.

Usage:

class sklearn.neighbors.KDTree(X, leaf_size=40, metric='minkowski', **kwargs)

KDTree for fast generalized N-point problems.

Read more in the User Guide.

Parameters

X : array-like of shape (n_samples, n_features)

n_samples is the number of points in the dataset, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles, the data is not copied. Otherwise, an internal copy is made.
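To avoid the internal copy mentioned above, the input can be converted to a C-contiguous double array before building the tree. A minimal sketch (the data here is random and purely illustrative):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
# A Fortran-ordered float32 array: KDTree would copy this internally.
raw = np.asfortranarray(rng.random_sample((10, 3)).astype(np.float32))

# Convert to a C-contiguous float64 array so no internal copy is needed.
X = np.ascontiguousarray(raw, dtype=np.float64)
assert X.flags['C_CONTIGUOUS'] and X.dtype == np.float64

tree = KDTree(X, leaf_size=2)
dist, ind = tree.query(X[:1], k=3)
```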

leaf_size : positive int, default=40

Number of points at which to switch to brute force. Changing leaf_size will not affect the results of a query, but it can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size, a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size, except in the case that n_samples < leaf_size.
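The guarantee that leaf_size does not affect query results can be checked directly. A small sketch with random data, comparing two trees that differ only in leaf size:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((100, 3))

# Two trees over the same data, differing only in leaf size.
small_leaf = KDTree(X, leaf_size=2)
large_leaf = KDTree(X, leaf_size=40)

dist_a, ind_a = small_leaf.query(X[:5], k=3)
dist_b, ind_b = large_leaf.query(X[:5], k=3)

# Results are identical; only speed and memory usage differ.
assert np.array_equal(ind_a, ind_b)
assert np.allclose(dist_a, dist_b)
```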

metric : str or DistanceMetric object

The distance metric to use for the tree. Default='minkowski' with p=2 (that is, a Euclidean metric). See the documentation of the DistanceMetric class for a list of available metrics. kd_tree.valid_metrics gives a list of the metrics which are valid for KDTree.
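As an example of a non-default metric, the tree can be built with the Manhattan metric ('manhattan', i.e. Minkowski with p=1), which is valid for KDTree. A minimal sketch that cross-checks the reported distance against a manual L1 computation:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))

tree = KDTree(X, leaf_size=2, metric='manhattan')
dist, ind = tree.query(X[:1], k=2)

# The nearest neighbor is the point itself (distance 0); the second
# entry's distance matches a manual L1 (sum of absolute differences).
manual = np.abs(X[ind[0, 1]] - X[0]).sum()
assert dist[0, 0] == 0.0
assert np.isclose(dist[0, 1], manual)
```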

Additional keywords are passed to the distance metric class. Note: callable functions in the metric parameter are NOT supported for KDTree and Ball Tree; function call overhead will result in very poor performance.

Attributes

data : memory view

The training data.

Examples

Query for k-nearest neighbors:

>>> import numpy as np
>>> from sklearn.neighbors import KDTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)              
>>> dist, ind = tree.query(X[:1], k=3)                
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0.          0.19662693  0.29473397]

Pickle and unpickle a tree. Note that the state of the tree is saved in the pickle operation: the tree does not need to be rebuilt upon unpickling.

>>> import numpy as np
>>> import pickle
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)        
>>> s = pickle.dumps(tree)                     
>>> tree_copy = pickle.loads(s)                
>>> dist, ind = tree_copy.query(X[:1], k=3)     
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0.          0.19662693  0.29473397]

Query for neighbors within a given radius:

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)     
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
3
>>> ind = tree.query_radius(X[:1], r=0.3)  
>>> print(ind)  # indices of neighbors within distance 0.3
[3 0 1]
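query_radius can also return the distances alongside the indices by passing return_distance=True; sort_results=True additionally orders the neighbors by increasing distance. A short sketch continuing the same random data:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))
tree = KDTree(X, leaf_size=2)

# Return distances too, sorted by increasing distance.
ind, dist = tree.query_radius(X[:1], r=0.3,
                              return_distance=True, sort_results=True)
assert len(ind[0]) == len(dist[0]) == 3
assert np.all(np.diff(dist[0]) >= 0)  # distances come back sorted
```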

Compute a Gaussian kernel density estimate:

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = KDTree(X)                
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([ 6.94114649,  7.83281226,  7.2071716 ])
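The kernel argument selects the kernel shape; for instance, a top-hat kernel ('tophat') can be used instead of the Gaussian. A sketch on the same random data (exact values are not shown, since they depend on the kernel normalization):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(42)
X = rng.random_sample((100, 3))
tree = KDTree(X)

# Density estimates at the first three points with a top-hat kernel.
density = tree.kernel_density(X[:3], h=0.1, kernel='tophat')
assert density.shape == (3,)
assert np.all(density > 0)  # each query point at least counts itself
```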

Compute a two-point auto-correlation function:

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = KDTree(X)                
>>> tree.two_point_correlation(X, r)
array([ 30,  62, 278, 580, 820])
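The counts returned above can be reproduced by brute force: for each radius, count the ordered pairs of points (including self-pairs, which is why the count at r=0 equals the number of points) whose distance is within that radius. A verification sketch:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((30, 3))
r = np.linspace(0, 1, 5)

tree = KDTree(X)
counts = tree.two_point_correlation(X, r)

# Brute-force check: count ordered pairs (including self-pairs)
# whose Euclidean distance is <= each radius.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
brute = np.array([(D <= radius).sum() for radius in r])
assert np.array_equal(counts, brute)
```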

Note: this article was selected and compiled by 純淨天空 from the original English documentation of sklearn.neighbors.KDTree at scikit-learn.org. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not repost or copy this translation without permission.