Python pyspark DataFrame.dot: Usage and Code Examples

This article briefly introduces the usage of pyspark.pandas.DataFrame.dot.

Usage: DataFrame.dot(other: Series) → Series

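Compute the matrix multiplication between the DataFrame and other.

This method computes the matrix product between the DataFrame and the values of another Series.

It can also be called using self @ other in Python >= 3.5.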
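The `@` form relies on Python's `__matmul__` hook, which was added in Python 3.5. As a minimal pure-Python sketch, not the pyspark implementation, the hypothetical `TinyFrame` class below shows how `dot` and `@` can be wired to the same row-by-row product. The class name and its list-of-rows storage are assumptions made only for illustration.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: a list of equal-length rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def dot(self, other):
        # One output value per row: the sum of element-wise products with `other`.
        if any(len(row) != len(other) for row in self.rows):
            raise ValueError("matrices are not aligned")
        return [sum(x * w for x, w in zip(row, other)) for row in self.rows]

    def __matmul__(self, other):
        # `frame @ other` calls __matmul__, so delegate to dot().
        return self.dot(other)


frame = TinyFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
weights = [1, 1, 2, 1]
result_dot = frame.dot(weights)
result_matmul = frame @ weights
print(result_dot, result_matmul)
```

Both spellings go through the same code path here, so they give identical results.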

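Note

Owing to the nature of big data, this method relies on an expensive operation. Internally it has to generate every row for every value and then group twice, which is a huge operation. To prevent misuse, the input length is limited by 'compute.max_rows' by default, and a ValueError is raised when the limit is exceeded.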

>>> import pyspark.pandas as ps
>>> from pyspark.pandas.config import option_context
>>> with option_context(
...     'compute.max_rows', 1000, "compute.ops_on_diff_frames", True
... ):  
...     psdf = ps.DataFrame({'a': range(1001)})
...     psser = ps.Series([2], index=['a'])
...     psdf.dot(psser)
Traceback (most recent call last):
  ...
ValueError: Current DataFrame has more then the given limit 1000 rows.
Please set 'compute.max_rows' by using 'pyspark.pandas.config.set_option'
to retrieve to retrieve more than 1000 rows. Note that, before changing the
'compute.max_rows', this operation is considerably expensive.
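The guard described in the note can be pictured with the following pure-Python sketch. It is an illustration under assumptions, not pyspark's code: `guarded_dot` and its `max_rows` parameter are hypothetical names that stand in for the check against 'compute.max_rows'.

```python
def guarded_dot(rows, weights, max_rows=1000):
    """Hypothetical helper: refuse inputs longer than max_rows, then multiply."""
    rows = list(rows)
    if len(rows) > max_rows:
        # Mirrors the idea of the limit only; the real message comes from pyspark.
        raise ValueError(f"input has more than the given limit of {max_rows} rows")
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]


small = guarded_dot([[i] for i in range(3)], [2])  # 3 rows, within the limit

try:
    guarded_dot([[i] for i in range(1001)], [2])   # 1001 rows exceeds 1000
    too_long_rejected = False
except ValueError:
    too_long_rejected = True
```

Checking the length before doing any arithmetic keeps an oversized input from triggering the expensive part of the work.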

Parameters:

other: Series: The other object to compute the matrix product with.

Returns:

Series: The matrix product between self and other, returned as a Series.

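Notes:

The dimensions of the DataFrame and other must be compatible in order to compute the matrix multiplication. In addition, the column names of the DataFrame and the index of other must contain the same values, because they are aligned prior to the multiplication.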
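To make the alignment rule concrete, here is a hedged pure-Python sketch that uses dictionaries keyed by label in place of real DataFrame and Series objects. The function name `aligned_dot` is hypothetical. The point is only that values are matched by label rather than by position, and that mismatched labels are rejected.

```python
def aligned_dot(rows, other):
    """Hypothetical helper: each row is a dict keyed by column label,
    and `other` is a dict keyed by index label."""
    for row in rows:
        if set(row) != set(other):
            raise ValueError("matrices are not aligned")
    # Pair values by label, so the ordering of `other` is irrelevant.
    return [sum(row[label] * other[label] for label in row) for row in rows]


rows = [
    {0: 0, 1: 1, 2: -2, 3: -1},
    {0: 1, 1: 1, 2: 1, 3: 1},
]
other = {0: 1, 1: 1, 2: 2, 3: 1}
reordered = {1: 1, 0: 1, 2: 2, 3: 1}  # same labels, different order

in_order = aligned_dot(rows, other)
shuffled = aligned_dot(rows, reordered)

try:
    aligned_dot(rows, {0: 1})  # labels 1, 2, 3 are missing
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Because the lookup is by label, reordering `other` leaves the result unchanged, while a missing label is reported as an error instead of being silently skipped.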

The dot method for Series computes the inner product, instead of the matrix product here.
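The difference can be seen with plain Python lists. This is a sketch of the arithmetic only, and the helper names `inner` and `matrix_vector` are made up for the example.

```python
def inner(u, v):
    # Inner product of two vectors: a single scalar.
    return sum(a * b for a, b in zip(u, v))


def matrix_vector(matrix, v):
    # Matrix product with a vector: one inner product per row.
    return [inner(row, v) for row in matrix]


v = [1, 1, 2, 1]
scalar = inner([0, 1, -2, -1], v)
vector = matrix_vector([[0, 1, -2, -1], [1, 1, 1, 1]], v)
```

The inner product collapses two vectors to one number, whereas the matrix product yields one number per row of the matrix.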

Examples:

>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)
>>> psdf = ps.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> psser = ps.Series([1, 1, 2, 1])
>>> psdf.dot(psser)
0   -4
1    5
dtype: int64

Note that shuffling the objects does not change the result.

>>> psser2 = psser.reindex([1, 0, 2, 3])
>>> psdf.dot(psser2)
0   -4
1    5
dtype: int64
>>> psdf @ psser2
0   -4
1    5
dtype: int64
>>> reset_option("compute.ops_on_diff_frames")

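Note: This article was compiled by 純淨天空 from the original English documentation for pyspark.pandas.DataFrame.dot at spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.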
