pyspark.pandas.Series.str.split: usage and code examples

This article briefly describes how to use pyspark.pandas.Series.str.split.

Usage: str.split(pat: Optional[str] = None, n: int = -1, expand: bool = False) → Union[ps.Series, ps.DataFrame]

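Split strings around a given separator/delimiter.

Splits each string in the Series from the beginning, at the specified delimiter string. Equivalent to str.split().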

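Parameters:

pat: str, optional

String or regular expression to split on. If not specified, split on whitespace.

n: int, default -1 (all)

Limit the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".

expand: bool, default False

Expand the split strings into separate columns.

If True, n must be a positive integer, and a DataFrame with expanded dimensionality is returned.
If False, a Series containing lists of strings is returned.

Returns:

Series, DataFrame: the type matches the caller unless expand=True (see Notes).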

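Notes:

The handling of the n keyword depends on the number of splits found:

If found splits > n, only the first n splits are made.
If found splits <= n, all splits are made.
If, for a certain row, the number of found splits < n, None is appended as padding up to n when expand=True.

With expand=True, a Series caller returns a DataFrame with n + 1 columns.

Note

Unlike in pandas, the number of columns does not shrink even if n is much larger than the number of splits found.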
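The padding rule above can be sketched in plain Python. This is only an illustration under stated assumptions, not the pandas-on-Spark implementation: `split_expand_row` is a hypothetical helper, it handles a single value, and it treats `pat` as a literal separator rather than a regular expression.

```python
from typing import List, Optional


def split_expand_row(value: Optional[str], pat: Optional[str], n: int) -> List[Optional[str]]:
    """Hypothetical helper: emulate one row of split(pat, n, expand=True)."""
    if n <= 0:
        raise ValueError("n must be a positive integer when expand=True")
    if value is None:
        # A missing value fills every one of the n + 1 columns.
        return [None] * (n + 1)
    parts = value.split(pat, n)  # at most n splits, so at most n + 1 pieces
    # Pad short rows with None so every row has exactly n + 1 entries.
    return parts + [None] * (n + 1 - len(parts))


values = [
    "this is a regular sentence",
    "https://docs.python.org/3/tutorial/index.html",
    None,
]
rows = [split_expand_row(v, None, 4) for v in values]
for row in rows:
    print(row)
```

Every row has five entries here because n=4, regardless of how many splits each string actually contained.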

Examples:

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> s = ps.Series(["this is a regular sentence",
...                "https://docs.python.org/3/tutorial/index.html",
...                np.nan])

With the default settings, the string is split on whitespace.

>>> s.str.split()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.rsplit()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object

The n parameter can be used to limit the number of splits on the delimiter. With a limit in place, the outputs of split and rsplit differ.

>>> s.str.split(n=2)
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object

>>> s.str.rsplit(n=2)
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                               None
dtype: object
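The difference comes from the direction in which the n splits are counted. A minimal sketch using Python's built-in str methods on a single string (the element-wise analogue, not pandas-on-Spark code):

```python
sentence = "this is a regular sentence"

# split counts its 2 splits from the left; the remainder stays in the last piece.
left = sentence.split(maxsplit=2)

# rsplit counts its 2 splits from the right; the remainder stays in the first piece.
right = sentence.rsplit(maxsplit=2)

print(left)
print(right)
```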

The pat parameter can be used to split on other characters.

>>> s.str.split(pat = "/")
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                 None
dtype: object
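The empty element after "https:" in the output above comes from the two consecutive slashes in the URL. The same effect can be seen with Python's built-in str.split on one string (shown here only as an element-wise analogue):

```python
url = "https://docs.python.org/3/tutorial/index.html"

# Two adjacent "/" characters produce an empty string between them.
pieces = url.split("/")

# A string that does not contain the separator comes back as a single piece.
unsplit = "this is a regular sentence".split("/")

print(pieces)
print(unsplit)
```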

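When expand=True is used, the split elements are expanded into separate columns. If NaN is present, it is propagated across all columns during the split.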

>>> s.str.split(n=4, expand=True)
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                           None  None  None     None      None

For slightly more complex use cases, such as splitting the HTML document name off a URL, a combination of parameter settings can be used.

>>> s.str.rsplit("/", n=1, expand=True)
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                None        None
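For a single string, the same idea can be sketched with Python's built-in str.rsplit: one split from the right separates the last path segment from everything before it. The variable names are illustrative only.

```python
url = "https://docs.python.org/3/tutorial/index.html"

# One split counted from the right: everything before the last "/" and the document name.
parent, document = url.rsplit("/", 1)

print(parent)
print(document)
```

A string with no "/" would yield a single piece, which is why the first row in the table above has None in column 1.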

Remember to escape special characters when explicitly using regular expressions.

>>> s = ps.Series(["1+1=2"])
>>> s.str.split(r"\+|=", n=2, expand=True)
   0  1  2
0  1  1  2
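Why escaping matters can be illustrated with Python's re module, which is used here as a stand-in for regular-expression splitting on one string. This is a sketch, not a statement about how pandas-on-Spark evaluates pat internally.

```python
import re

expression = "1+1=2"

# "+" is a regex quantifier, so it must be escaped to match a literal plus sign.
parts = re.split(r"\+|=", expression, maxsplit=2)

# re.escape builds a safe pattern from a literal string.
plus_pattern = re.escape("+")
by_plus = re.split(plus_pattern, expression)

# An unescaped "+" is not a valid pattern on its own.
try:
    re.split("+", expression)
    unescaped_failed = False
except re.error:
    unescaped_failed = True

print(parts)
print(by_plus)
print(unescaped_failed)
```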

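Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.pandas.Series.str.split at spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.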