Python SciPy stats.binned_statistic_2d: Usage and Code Examples

This article briefly introduces the usage of scipy.stats.binned_statistic_2d in Python.

Usage: scipy.stats.binned_statistic_2d(x, y, values, statistic='mean', bins=10, range=None, expand_binnumbers=False)

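Compute a two-dimensional binned statistic for one or more sets of data.

This is a generalization of the histogram2d function. A histogram divides the space into bins and returns the number of points in each bin. This function allows the computation of the sum, mean, median, or another statistic of the values (or sets of values) within each bin.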
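The relationship to histogram2d can be checked directly. The following is a minimal sketch, assuming NumPy's np.histogram2d as the reference histogram; the random data, the weights array w, and the bin edges are illustrative and not part of the original page.

```python
import numpy as np
from scipy import stats

# Illustrative data (an assumption, not taken from the original page).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = rng.uniform(0.0, 1.0, size=200)
w = rng.uniform(0.0, 5.0, size=200)  # hypothetical per-point values

# Explicit edges: 4 bins along x, 3 bins along y.
edges = [np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 4)]

# 'count' does not reference values, so None can be passed.
counts = stats.binned_statistic_2d(x, y, None, statistic='count', bins=edges)
hist, _, _ = np.histogram2d(x, y, bins=edges)

# 'sum' plays the role of a weighted histogram.
sums = stats.binned_statistic_2d(x, y, w, statistic='sum', bins=edges)
whist, _, _ = np.histogram2d(x, y, bins=edges, weights=w)

print(np.array_equal(counts.statistic, hist))
print(np.allclose(sums.statistic, whist))
```

Both comparisons are expected to hold, since 'count' and 'sum' are the two statistics that reduce to an ordinary and a weighted histogram.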

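Parameters:

x: (N,) array_like

A sequence of values to be binned along the first dimension.

y: (N,) array_like

A sequence of values to be binned along the second dimension.

values: (N,) array_like or list of (N,) array_like

The data on which the statistic will be computed. This must be the same shape as x, or a list of sequences, each with the same shape as x. If values is such a list, the statistic is computed on each sequence independently.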
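As a small sketch of the list form of values: the two value arrays v1 and v2 below are made-up data, and 'sum' is chosen only so that empty bins are 0 rather than NaN.

```python
import numpy as np
from scipy import stats

x = np.array([0.1, 0.1, 0.6, 0.6])
y = np.array([2.1, 2.1, 2.6, 2.6])
v1 = np.array([1.0, 3.0, 10.0, 20.0])  # first set of values (illustrative)
v2 = np.array([2.0, 4.0, 6.0, 8.0])    # second set of values (illustrative)

binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# Passing a list of value arrays computes the statistic for each one independently.
ret = stats.binned_statistic_2d(x, y, [v1, v2], statistic='sum',
                                bins=[binx, biny])

print(ret.statistic.shape)  # one (nx, ny) grid per value array
print(ret.statistic[0])     # sums of v1 per bin
print(ret.statistic[1])     # sums of v2 per bin
```

The leading axis of the result indexes the value arrays, so ret.statistic[k] is the (nx, ny) grid for the k-th entry of the list.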

statistic: string or callable, optional

The statistic to compute (default is ‘mean’). The following statistics are available:

‘mean’ : compute the mean of values for points within each bin. Empty bins will be represented by NaN.

‘std’ : compute the standard deviation within each bin. This is implicitly calculated with ddof=0.

‘median’ : compute the median of values for points within each bin. Empty bins will be represented by NaN.

‘count’ : compute the count of points within each bin. This is identical to an unweighted histogram. values array is not referenced.

‘sum’ : compute the sum of values for points within each bin. This is identical to a weighted histogram.

‘min’ : compute the minimum of values for points within each bin. Empty bins will be represented by NaN.

‘max’ : compute the maximum of values for points within each bin. Empty bins will be represented by NaN.

function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.
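A user-defined statistic might look like the sketch below. The helper value_range is hypothetical (it is not part of SciPy), and the data are illustrative. Because the helper raises on an empty input, empty bins fall back to NaN under the rule stated above.

```python
import numpy as np
from scipy import stats

def value_range(v):
    """Hypothetical helper: spread (max - min) of the values in one bin."""
    return np.max(v) - np.min(v)

x = [0.1, 0.2, 0.3, 0.7]
y = [2.1, 2.2, 2.3, 2.7]
values = [1.0, 5.0, 2.0, 9.0]

# The callable receives a 1D array of the values that fall into each bin.
ret = stats.binned_statistic_2d(x, y, values, statistic=value_range,
                                bins=[[0.0, 0.5, 1.0], [2.0, 2.5, 3.0]])

print(ret.statistic)
```

Here the first three points share one bin (spread 5.0 - 1.0), the fourth point sits alone in another bin (spread 0), and the two remaining bins are empty.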

bins: int or [int, int] or array_like or [array, array], optional

The bin specification:

the number of bins for the two dimensions (nx = ny = bins),

the number of bins in each dimension (nx, ny = bins),

the bin edges for the two dimensions (x_edge = y_edge = bins),

the bin edges in each dimension (x_edge, y_edge = bins).

If the bin edges are specified, the number of bins will be (nx = len(x_edge)-1, ny = len(y_edge)-1).
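The four forms of bins can be compared by looking at the shape of the returned statistic. This is a sketch with illustrative random data; the names edges3 and edges5 are assumptions introduced for the example.

```python
import numpy as np
from scipy import stats

# Illustrative data (an assumption, not taken from the original page).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=50)
y = rng.uniform(0.0, 1.0, size=50)

edges3 = np.linspace(0.0, 1.0, 4)  # 4 edges, so 3 bins
edges5 = np.linspace(0.0, 1.0, 6)  # 6 edges, so 5 bins

def stat_shape(bins):
    """Shape of the 'count' statistic for a given bins specification."""
    return stats.binned_statistic_2d(x, y, None, 'count', bins=bins).statistic.shape

shapes = {
    "int": stat_shape(4),                            # nx = ny = 4
    "[int, int]": stat_shape([4, 6]),                # nx = 4, ny = 6
    "array": stat_shape(edges3),                     # same edges in both dimensions
    "[array, array]": stat_shape([edges3, edges5]),  # separate edges per dimension
}
for form, shape in shapes.items():
    print(form, shape)
```

With explicit edges the number of bins is one less than the number of edges, which is why edges3 yields 3 bins and edges5 yields 5.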

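range: (2,2) array_like, optional

The leftmost and rightmost edges of the bins along each dimension (if not specified explicitly in the bins parameter): [[xmin, xmax], [ymin, ymax]]. All values outside this range are treated as outliers and are not tallied in the histogram.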
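The effect of range on out-of-range points can be sketched as follows; the data are illustrative, with one point deliberately placed beyond xmax.

```python
import numpy as np
from scipy import stats

x = np.array([0.1, 0.4, 0.9, 1.5])  # 1.5 lies outside xmax = 1.0
y = np.array([0.2, 0.6, 0.8, 0.5])

# Two bins per dimension over [0, 1] x [0, 1].
ret = stats.binned_statistic_2d(x, y, None, 'count', bins=2,
                                range=[[0.0, 1.0], [0.0, 1.0]])

print(ret.x_edge)           # edges derived from range
print(ret.statistic)        # per-bin counts
print(ret.statistic.sum())  # the outlier is not counted
```

Only three of the four points contribute to the counts, because the point with x = 1.5 falls outside the requested range.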

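expand_binnumbers: bool, optional

‘False’ (default): the returned binnumber is a shape (N,) array of linearized bin indices. ‘True’: the returned binnumber is ‘unraveled’ into a shape (2,N) ndarray, where each row gives the bin numbers in the corresponding dimension. See the binnumber return value and the Examples section.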

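Returns:

statistic: (nx, ny) ndarray: The values of the selected statistic in each two-dimensional bin.
x_edge: (nx + 1) ndarray: The bin edges along the first dimension.
y_edge: (ny + 1) ndarray: The bin edges along the second dimension.
binnumber: (N,) array of ints or (2,N) ndarray of ints: This assigns to each element of the sample an integer that represents the bin in which the observation falls. The representation depends on the expand_binnumbers argument. See Notes for details.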

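Notes:

Binedges: All bins except the last (right-hand-most) one are half-open. In other words, if the bin edges are [1, 2, 3, 4], then the first bin is [1, 2) (including 1, but excluding 2) and the second is [2, 3). The last bin, however, is [3, 4], which includes 4.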
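The edge convention can be illustrated by placing points exactly on the edges [1, 2, 3, 4]. This is a sketch with made-up data; the single y bin [0, 1] is only there to make the problem two-dimensional.

```python
import numpy as np
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0]  # every point sits exactly on an x edge
y = [0.5, 0.5, 0.5, 0.5]

ret = stats.binned_statistic_2d(x, y, None, 'count',
                                bins=[[1, 2, 3, 4], [0, 1]],
                                expand_binnumbers=True)

print(ret.statistic)     # counts per x bin (one y bin)
print(ret.binnumber[0])  # x bin number of each point
```

The points at 3.0 and 4.0 both land in the last x bin, since only that bin is closed on the right; the point at 2.0 belongs to the second bin, not the first.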

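binnumber: This returned argument assigns to each element of the sample an integer that represents the bin in which it belongs. The representation depends on the expand_binnumbers argument. If ‘False’ (default): the returned binnumber is a shape (N,) array of linearized indices mapping each element of the sample to its corresponding bin (using row-major ordering). Note that the returned linearized bin indices refer to an array with extra bins on the outer bin edges, which capture values outside the defined bin bounds. If ‘True’: the returned binnumber is a shape (2,N) ndarray where each row indicates the bin placement for one dimension. In each dimension, a binnumber of i means the corresponding value is between (D_edge[i-1], D_edge[i]), where ‘D’ is either ‘x’ or ‘y’.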
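The link between the two binnumber representations can be sketched with np.unravel_index. The grid shape (nx + 2, ny + 2) used below is inferred from the description of the extra outer bins above; treat it as an assumption of this sketch rather than a documented guarantee. The data are the same as in the Examples section.

```python
import numpy as np
from scipy import stats

x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

linear = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny])
expanded = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny],
                                     expand_binnumbers=True)

nx, ny = len(binx) - 1, len(biny) - 1
# Assumption: one extra bin on each side of every dimension holds the outliers,
# so the linearized indices address an (nx + 2, ny + 2) grid in row-major order.
ix, iy = np.unravel_index(linear.binnumber, (nx + 2, ny + 2))

print(ix, iy)
print(np.array_equal(np.vstack([ix, iy]), expanded.binnumber))

# Bin numbers start at 1 for in-range points, so subtract 1 to index statistic.
per_sample_count = linear.statistic[ix - 1, iy - 1]
print(per_sample_count)
```

The last step looks up, for each sample, the count of the bin it falls in; it is only valid for samples inside the bin edges, since outliers carry bin number 0 or the index past the last bin.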

Examples:

>>> from scipy import stats

Calculate the counts with explicit bin edges:

>>> x = [0.1, 0.1, 0.1, 0.6]
>>> y = [2.1, 2.6, 2.1, 2.1]
>>> binx = [0.0, 0.5, 1.0]
>>> biny = [2.0, 2.5, 3.0]
>>> ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny])
>>> ret.statistic
array([[2., 1.],
       [1., 0.]])

The bin in which each sample is placed is given by the binnumber returned value. By default, these are the linearized bin indices:

>>> ret.binnumber
array([5, 6, 5, 9])

The bin indices can also be expanded into separate entries for each dimension using the expand_binnumbers parameter:

>>> ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx, biny],
...                                 expand_binnumbers=True)
>>> ret.binnumber
array([[1, 1, 1, 2],
       [1, 2, 1, 1]])

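The first row shows that the first three elements belong to xbin 1 and the fourth to xbin 2; the second row gives the ybin of each element in the same way.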

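Note: This article was selected and compiled by 纯净天空 from the original English documentation of scipy.stats.binned_statistic_2d on scipy.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.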
