Python sklearn BallTree: usage and code examples

This article briefly introduces the usage of sklearn.neighbors.BallTree in Python.

Usage:
class sklearn.neighbors.BallTree(X, leaf_size=40, metric='minkowski', **kwargs)

BallTree for fast generalized N-point problems.

Read more in the User Guide.

Parameters:

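X: array-like of shape (n_samples, n_features): n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles, the data will not be copied. Otherwise, an internal copy will be made.
leaf_size: positive int, default=40: Number of points at which to switch to brute force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size, a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size, except in the case that n_samples < leaf_size.
metric: str or DistanceMetric object: The distance metric to use for the tree. Default='minkowski' with p=2 (that is, the Euclidean metric). See the documentation of the DistanceMetric class for a list of available metrics. BallTree.valid_metrics gives the list of metrics that are valid for BallTree.
**kwargs: Additional keywords are passed to the distance metric class. Note: callable functions in the metric parameter are NOT supported for KDTree and BallTree. Function call overhead will result in very poor performance.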
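The parameter notes above can be illustrated with a minimal sketch. It assumes a random data set of 200 points (an arbitrary choice) and checks two things: that leaf_size changes only performance, not results, and that an extra keyword such as p reaches the distance metric. How valid_metrics is exposed appears to differ between scikit-learn releases (a list attribute in some, a method in others), so the sketch handles both and that handling is an assumption.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((200, 3))  # 200 points in 3 dimensions (arbitrary size)

# leaf_size trades memory for query speed; query results should be identical.
small_leaves = BallTree(X, leaf_size=2)
large_leaves = BallTree(X, leaf_size=50)
dist_small, ind_small = small_leaves.query(X[:5], k=4)
dist_large, ind_large = large_leaves.query(X[:5], k=4)
same_neighbors = np.array_equal(ind_small, ind_large)

# Extra keywords are forwarded to the metric: p=1 makes Minkowski equal Manhattan.
minkowski_p1 = BallTree(X, metric="minkowski", p=1)
manhattan = BallTree(X, metric="manhattan")
dist_p1, ind_p1 = minkowski_p1.query(X[:5], k=4)
dist_l1, ind_l1 = manhattan.query(X[:5], k=4)

# Assumption: valid_metrics may be a list or a method depending on the release.
valid = BallTree.valid_metrics
valid = valid() if callable(valid) else valid

print(same_neighbors)
print(sorted(valid))
```

Using a metric name from the printed list, rather than a Python callable, avoids the function call overhead mentioned in the note.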

Attributes:

data: memory view: The training data
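As a small sketch of how the data attribute can be inspected, the memory view can be wrapped with np.asarray to get a regular NumPy array. The variable name stored is only illustrative.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
tree = BallTree(X, leaf_size=2)

# Wrap the memory view as an ndarray to inspect the stored training data.
stored = np.asarray(tree.data)
print(stored.shape)
print(np.allclose(stored, X))
```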

Examples:

Query for k-nearest neighbors

>>> import numpy as np
>>> from sklearn.neighbors import BallTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = BallTree(X, leaf_size=2)
>>> dist, ind = tree.query(X[:1], k=3)
>>> print(ind)  # indices of 3 closest neighbors, shape (n_queries, k)
[[0 3 1]]
>>> print(dist)  # distances to 3 closest neighbors, shape (n_queries, k)
[[0.         0.19662693 0.29473397]]
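To make the k-nearest-neighbor query more concrete, the following sketch queries all points at once and compares the result against a brute-force computation with NumPy. The brute-force part is only a reference check written for this illustration, not part of the BallTree API.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
tree = BallTree(X, leaf_size=2)

# Query every point at once; both outputs have shape (n_queries, k).
dist, ind = tree.query(X, k=3)

# Brute-force reference: full pairwise Euclidean distance matrix.
diff = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))
brute_ind = np.argsort(pairwise, axis=1)[:, :3]
brute_dist = np.take_along_axis(pairwise, brute_ind, axis=1)

# Request indices only when the distances are not needed.
ind_only = tree.query(X, k=3, return_distance=False)

print(ind.shape, dist.shape)
print(np.array_equal(ind, brute_ind))
```

Because each query point is also a training point here, the first neighbor of every row is the point itself at distance 0.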

Pickle and unpickle a tree. Note that the state of the tree is saved in the pickle operation: the tree does not need to be rebuilt upon unpickling.

>>> import numpy as np
>>> import pickle
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = BallTree(X, leaf_size=2)
>>> s = pickle.dumps(tree)
>>> tree_copy = pickle.loads(s)
>>> dist, ind = tree_copy.query(X[:1], k=3)
>>> print(ind)  # indices of 3 closest neighbors, shape (n_queries, k)
[[0 3 1]]
>>> print(dist)  # distances to 3 closest neighbors, shape (n_queries, k)
[[0.         0.19662693 0.29473397]]
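The same round trip can be sketched with a file on disk, which is closer to how a prebuilt tree might be reused between runs. The file name ball_tree.pkl is hypothetical, and a temporary directory is used so that the sketch leaves nothing behind.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
tree = BallTree(X, leaf_size=2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ball_tree.pkl")  # hypothetical file name
    with open(path, "wb") as fh:
        pickle.dump(tree, fh)  # stores the built tree, not just the data
    with open(path, "rb") as fh:
        restored = pickle.load(fh)  # ready to query without rebuilding

dist_a, ind_a = tree.query(X[:1], k=3)
dist_b, ind_b = restored.query(X[:1], k=3)
print(np.array_equal(ind_a, ind_b))
```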

Query for neighbors within a given radius

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = BallTree(X, leaf_size=2)
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))  # one count per query point
[3]
>>> ind = tree.query_radius(X[:1], r=0.3)  # array holding one index array per query point
>>> print(ind[0])  # indices of neighbors within distance 0.3
[3 0 1]
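A radius query can also return the distances and sort each result by distance. The sketch below assumes two query points and the same radius of 0.3; it uses the return_distance and sort_results options of query_radius.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
tree = BallTree(X, leaf_size=2)
radius = 0.3

# With return_distance=True the call returns (indices, distances).
# Each is an array holding one variable-length array per query point.
ind, dist = tree.query_radius(
    X[:2], r=radius, return_distance=True, sort_results=True
)
counts = tree.query_radius(X[:2], r=radius, count_only=True)

for i, (idx, d) in enumerate(zip(ind, dist)):
    # Neighbors of query point i, ordered from nearest to farthest.
    print(i, idx, np.round(d, 3))
```

Since the number of neighbors differs per query point, the results are not a regular 2-D array, which is why each row is handled separately in the loop.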

Compute a Gaussian kernel density estimate:

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = BallTree(X)                
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([ 6.94114649,  7.83281226,  7.2071716 ])
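To relate kernel_density to the underlying formula, the sketch below computes a plain sum of Gaussian kernels exp(-d^2 / (2 h^2)) with NumPy and compares it to the tree result. It deliberately makes no claim about the exact normalization constant the tree applies: the assumption is only that the tree result is a scaled version of this sum, in which case the ratio is the same for every query point.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(42)
X = rng.random_sample((100, 3))
h = 0.1  # kernel bandwidth

tree = BallTree(X)
tree_density = tree.kernel_density(X[:3], h=h, kernel="gaussian")

# Unnormalized Gaussian kernel sum over all training points, computed directly.
sq_dist = ((X[:3, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
kernel_sum = np.exp(-sq_dist / (2.0 * h ** 2)).sum(axis=1)

# If the tree result is a scaled kernel sum, the ratio is constant across queries.
ratio = tree_density / kernel_sum
print(ratio)
```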

Compute a two-point autocorrelation function

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = BallTree(X)                
>>> tree.two_point_correlation(X, r)
array([ 30,  62, 278, 580, 820])
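The counts returned by two_point_correlation can be read as the number of point pairs whose distance is at most each value in r. A brute-force sketch with NumPy, written only as a cross-check for this illustration, makes that reading explicit. Each point is paired with itself as well, which is why the count at r = 0 equals the number of points.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((30, 3))
r = np.linspace(0, 1, 5)

tree = BallTree(X)
counts = tree.two_point_correlation(X, r)

# Brute force: count ordered pairs (i, j), including i == j, with distance <= radius.
diff = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))
brute = np.array([(pairwise <= radius).sum() for radius in r])

print(counts)
print(np.array_equal(counts, brute))
```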

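Note: This article was selected and compiled by 纯净天空 from the original English work sklearn.neighbors.BallTree on scikit-learn.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.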