Python SciPy sparse.csr_array: Usage and Code Examples

This article briefly introduces the usage of scipy.sparse.csr_array in Python.

Usage: class scipy.sparse.csr_array(arg1, shape=None, dtype=None, copy=False)

Compressed Sparse Row array.

This can be instantiated in several ways:

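csr_array(D): where D is a 2-D ndarray.
csr_array(S): with another sparse array or matrix S (equivalent to S.tocsr()).
csr_array((M, N), [dtype]): constructs an empty array with shape (M, N). dtype is optional and defaults to dtype='d'.
csr_array((data, (row_ind, col_ind)), [shape=(M, N)]): where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
csr_array((data, indices, indptr), [shape=(M, N)]): the standard CSR representation, where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the array dimensions are inferred from the index arrays.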
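To make the (data, indices, indptr) layout concrete, the following sketch builds a CSR array from a small dense ndarray and reads each row back out of the three underlying arrays. The matrix values and the helper name `row_entries` are illustrative assumptions and not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_array

# A small dense matrix chosen purely for illustration.
D = np.array([[1, 0, 2],
              [0, 0, 3],
              [4, 5, 6]])
A = csr_array(D)

def row_entries(a, i):
    """Hypothetical helper: (column, value) pairs stored for row i."""
    start, stop = a.indptr[i], a.indptr[i + 1]
    cols = a.indices[start:stop]
    vals = a.data[start:stop]
    return [(int(c), int(v)) for c, v in zip(cols, vals)]

# Walk every row using only indptr, indices and data.
rows = [row_entries(A, i) for i in range(A.shape[0])]
print(rows)
```

Row i occupies the half-open range indptr[i]:indptr[i+1] in both indices and data, so an empty row simply has indptr[i] == indptr[i+1].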

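Notes:

Sparse arrays can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.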
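A minimal sketch of a few of these operations on two small arrays follows. The operand values are made up for illustration. Note that for csr_array the `*` operator is element-wise, while `@` is the matrix product.

```python
import numpy as np
from scipy.sparse import csr_array

# Two small operands chosen for illustration.
A = csr_array(np.array([[1, 0], [0, 2]]))
B = csr_array(np.array([[0, 3], [4, 0]]))

total = A + B         # sparse + sparse gives a sparse result
diff = A - B          # sparse - sparse
scaled = A * 2        # multiplication by a scalar
elementwise = A * B   # element-wise product for csr_array
product = A @ B       # matrix product

print(total.toarray())
print(product.toarray())
```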

Advantages of the CSR format:

Efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
Efficient row slicing
Fast matrix-vector products
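As a small illustration of the row-oriented operations listed above, this sketch takes a row slice and computes a matrix-vector product. The array contents and the vector `v` are arbitrary example values.

```python
import numpy as np
from scipy.sparse import csr_array

A = csr_array(np.array([[1, 0, 2],
                        [0, 0, 3],
                        [4, 5, 6]]))

# Row slicing: the first two rows, returned as a sparse array.
top = A[0:2, :]

# Matrix-vector product with a dense 1-D vector gives a dense result.
v = np.array([1, 1, 1])
y = A @ v

print(top.toarray())
print(y)
```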

Disadvantages of the CSR format:

Slow column slicing operations (consider CSC)
Changes to the sparsity structure are expensive (consider LIL or DOK)
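One common way to work around both drawbacks, sketched here under the assumption that the array is filled entry by entry, is to assemble it in LIL format and convert to CSR once the structure is final, then convert to CSC when column slices are needed. The entries assigned below are illustrative.

```python
import numpy as np
from scipy.sparse import lil_array

# Assemble incrementally in LIL, where single-entry assignment is cheap.
L = lil_array((3, 3), dtype=np.int64)
L[0, 0] = 1
L[1, 2] = 3
L[2, 1] = 5

# Convert once to CSR for arithmetic and row access.
A = L.tocsr()

# Convert to CSC before slicing columns.
col = A.tocsc()[:, 1:2]

print(A.toarray())
print(col.toarray())
```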

Canonical format:

Within each row, indices are sorted by column.
There are no duplicate entries.
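The sketch below builds a CSR array that violates both conditions (unsorted column indices and a duplicated entry in row 0) and then brings it into canonical format with sum_duplicates(). The input arrays are contrived for illustration.

```python
import numpy as np
from scipy.sparse import csr_array

# Row 0 stores columns [2, 0, 2]: unsorted, and column 2 appears twice.
data = np.array([1, 2, 3, 4])
indices = np.array([2, 0, 2, 1])
indptr = np.array([0, 3, 4])
A = csr_array((data, indices, indptr), shape=(2, 3))

stored_before = A.nnz   # counts the duplicate as a separate stored value

# Sort the indices and merge duplicates in place.
A.sum_duplicates()

print(A.has_canonical_format)
print(A.indices, A.data)
print(A.toarray())
```

After the call, the duplicated values for position (0, 2) are added together and stored as a single entry.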

Examples:

>>> import numpy as np
>>> from scipy.sparse import csr_array
>>> csr_array((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_array((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

Duplicate entries are summed together:

>>> row = np.array([0, 1, 2, 0])
>>> col = np.array([0, 1, 1, 0])
>>> data = np.array([1, 2, 4, 8])
>>> csr_array((data, (row, col)), shape=(3, 3)).toarray()
array([[9, 0, 0],
       [0, 2, 0],
       [0, 4, 0]])

As an example of how to construct a CSR array incrementally, the following snippet builds a term-document array from texts:

>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
>>> indptr = [0]
>>> indices = []
>>> data = []
>>> vocabulary = {}
>>> for d in docs:
...     for term in d:
...         index = vocabulary.setdefault(term, len(vocabulary))
...         indices.append(index)
...         data.append(1)
...     indptr.append(len(indices))
...
>>> csr_array((data, indices, indptr), dtype=int).toarray()
array([[2, 1, 0, 0],
       [0, 1, 1, 1]])

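Attributes:

dtype: dtype: Data type of the array
shape: 2-tuple: Shape of the array
ndim: int: Number of dimensions (this is always 2)
nnz: Number of stored values, including explicit zeros
size: Number of stored values
data: CSR format data array of the array
indices: CSR format index array of the array
indptr: CSR format index pointer array of the array
has_sorted_indices: Whether the indices are sorted
has_canonical_format: Whether the array/matrix has sorted indices and no duplicates
T: Transpose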
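To illustrate a few of these attributes, the sketch below stores an explicit zero and compares nnz with the number of truly nonzero values. The input arrays are made up for this example. count_nonzero() and eliminate_zeros() are existing methods of the array, used here only to show the contrast.

```python
import numpy as np
from scipy.sparse import csr_array

# The second stored value is an explicit zero.
data = np.array([1, 0, 3])
indices = np.array([0, 1, 2])
indptr = np.array([0, 2, 3])
A = csr_array((data, indices, indptr), shape=(2, 3))

stored = A.nnz                 # includes the explicit zero
nonzero = A.count_nonzero()    # ignores the explicit zero

# Work on a copy so the original arrays stay untouched.
cleaned = A.copy()
cleaned.eliminate_zeros()

At = A.T                       # transpose has shape (3, 2)

print(A.shape, A.ndim, stored, nonzero, cleaned.nnz, At.shape)
```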

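Note: This article was selected and compiled by 純淨天空 from the original English work scipy.sparse.csr_array by scipy.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.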
