

Python pyspark DataFrame.align usage and code examples


This article briefly describes the usage of pyspark.pandas.DataFrame.align.

Usage:

DataFrame.align(other: Union[DataFrame, Series], join: str = 'outer', axis: Union[int, str, None] = None, copy: bool = True) → Tuple[DataFrame, Union[DataFrame, Series]]

Align two objects on their axes with the specified join method.

The join method is applied to the index of each axis being aligned.

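Parameters

other : DataFrame or Series
join : {'outer', 'inner', 'left', 'right'}, default 'outer'
axis : allowed axis of the other object, default None

Align on index (0), columns (1), or both (None).

copy : bool, default True

Always returns new objects. If copy=False and no reindexing is required, the original objects are returned.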

Returns

(left, right) : (DataFrame, type of other)

The aligned objects.
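
The interaction between join and axis can be sketched in plain Python. This is only an illustration of which labels survive on each axis, not how pandas-on-Spark implements align. The helpers join_labels and aligned_axes are hypothetical names introduced here. The sketch sorts 'outer' and 'inner' results for determinism, whereas the real method does not guarantee row order, and it handles only axis values 0, 1 and None.

```python
def join_labels(left, right, join="outer"):
    """Hypothetical helper: labels kept when joining two label lists."""
    if join == "outer":
        return sorted(set(left) | set(right))   # union of both sides
    if join == "inner":
        return sorted(set(left) & set(right))   # labels present on both sides
    if join == "left":
        return list(left)                        # keep the calling object's labels
    if join == "right":
        return list(right)                       # keep the other object's labels
    raise ValueError(f"unknown join: {join!r}")


def aligned_axes(left, right, join="outer", axis=None):
    """Return the (index, columns) labels of each side after alignment.

    `left` and `right` are (index_labels, column_labels) pairs.
    Only axis values 0, 1 and None are handled in this sketch.
    """
    (l_idx, l_cols), (r_idx, r_cols) = left, right
    if axis in (0, None):   # align the row index
        l_idx = r_idx = join_labels(l_idx, r_idx, join)
    if axis in (1, None):   # align the columns
        l_cols = r_cols = join_labels(l_cols, r_cols, join)
    return (list(l_idx), list(l_cols)), (list(r_idx), list(r_cols))


# Labels taken from df1 and df2 in the examples section of this page.
df1_axes = ([10, 20, 30], ["a", "b"])
df2_axes = ([10, 11, 12], ["a", "c"])

both = aligned_axes(df1_axes, df2_axes)                  # axis=None, join='outer'
index_only = aligned_axes(df1_axes, df2_axes, axis=0)
columns_only = aligned_axes(df1_axes, df2_axes, axis=1)
inner = aligned_axes(df1_axes, df2_axes, join="inner")
left_join = aligned_axes(df1_axes, df2_axes, join="left")
```

With these inputs, both gives each side the index [10, 11, 12, 20, 30] and the columns ['a', 'b', 'c'], while inner leaves only index 10 and column 'a', matching the shapes shown in the examples.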

Examples

>>> import pyspark.pandas as ps
>>> ps.set_option("compute.ops_on_diff_frames", True)
>>> df1 = ps.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]}, index=[10, 20, 30])
>>> df2 = ps.DataFrame({"a": [4, 5, 6], "c": ["d", "e", "f"]}, index=[10, 11, 12])

Align both axes:

>>> aligned_l, aligned_r = df1.align(df2)
>>> aligned_l.sort_index()
      a     b   c
10  1.0     a NaN
11  NaN  None NaN
12  NaN  None NaN
20  2.0     b NaN
30  3.0     c NaN
>>> aligned_r.sort_index()
      a   b     c
10  4.0 NaN     d
11  5.0 NaN     e
12  6.0 NaN     f
20  NaN NaN  None
30  NaN NaN  None

Align only axis=0 (index):

>>> aligned_l, aligned_r = df1.align(df2, axis=0)
>>> aligned_l.sort_index()
      a     b
10  1.0     a
11  NaN  None
12  NaN  None
20  2.0     b
30  3.0     c
>>> aligned_r.sort_index()
      a     c
10  4.0     d
11  5.0     e
12  6.0     f
20  NaN  None
30  NaN  None

Align only axis=1 (columns):

>>> aligned_l, aligned_r = df1.align(df2, axis=1)
>>> aligned_l.sort_index()
    a  b   c
10  1  a NaN
20  2  b NaN
30  3  c NaN
>>> aligned_r.sort_index()
    a   b  c
10  4 NaN  d
11  5 NaN  e
12  6 NaN  f
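
In the axis=1 output above, column a keeps its integer values, whereas the outputs that add rows show 1.0, 2.0 and so on. A likely explanation, assuming pandas-on-Spark follows the usual pandas behavior here, is that NaN is a floating-point value, so an integer column that gains missing rows is represented as floats. The plain-Python sketch below imitates that effect; reindex_column is a hypothetical helper, not part of any library.

```python
import math


def reindex_column(values, index, new_index):
    """Hypothetical helper: place `values` (labelled by `index`) onto `new_index`."""
    by_label = dict(zip(index, values))
    has_missing = any(label not in by_label for label in new_index)
    if not has_missing:
        # No new labels were introduced, so values keep their original type.
        return [by_label[label] for label in new_index]
    # NaN is a float, so the whole column is widened to floats.
    return [
        float(by_label[label]) if label in by_label else math.nan
        for label in new_index
    ]


# Same labels as before (like axis=1, where no rows are added).
same = reindex_column([1, 2, 3], [10, 20, 30], [10, 20, 30])

# Outer-joined index (like axis=0), which introduces labels 11 and 12.
widened = reindex_column([1, 2, 3], [10, 20, 30], [10, 11, 12, 20, 30])

print(same)
print(widened)
```

Here same stays a list of integers, while widened holds floats with NaN at the positions for labels 11 and 12.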

Align with the join type "inner":

>>> aligned_l, aligned_r = df1.align(df2, join="inner")
>>> aligned_l.sort_index()
    a
10  1
>>> aligned_r.sort_index()
    a
10  4

Align with a Series:

>>> s = ps.Series([7, 8, 9], index=[10, 11, 12])
>>> aligned_l, aligned_r = df1.align(s, axis=0)
>>> aligned_l.sort_index()
      a     b
10  1.0     a
11  NaN  None
12  NaN  None
20  2.0     b
30  3.0     c
>>> aligned_r.sort_index()
10    7.0
11    8.0
12    9.0
20    NaN
30    NaN
dtype: float64
>>> ps.reset_option("compute.ops_on_diff_frames")

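Note: This article was selected and compiled by 純淨天空 from the original English documentation pyspark.pandas.DataFrame.align at spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.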