Python clx.dns.dns_extractor.parse_url: usage and code examples

Usage: clx.dns.dns_extractor.parse_url(url_series, req_cols=None)

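This function extracts the subdomain, domain, and suffix (and, as the example output shows, the hostname) of each given URL.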

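Parameters:

url_series: (cudf.Series) - The URLs to process.
req_cols: (set(strings)) - The columns requested for extraction, drawn from domain, subdomain, suffix, and hostname. With the default of None, all four columns are returned.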

Returns:

The extracted information for the requested columns.

Return type:

cudf.DataFrame

Example:

>>> from cudf import DataFrame
>>> from clx.dns import dns_extractor as dns
>>>
>>> input_df = DataFrame(
...     {
...         "url": [
...             "http://www.google.com",
...             "gmail.com",
...             "github.com",
...             "https://pandas.pydata.org",
...         ]
...     }
... )
>>> dns.parse_url(input_df["url"])
            hostname  domain suffix subdomain
0     www.google.com  google    com       www
1          gmail.com   gmail    com
2         github.com  github    com
3  pandas.pydata.org  pydata    org    pandas
>>> dns.parse_url(input_df["url"], req_cols={'domain', 'suffix'})
   domain suffix
0  google    com
1   gmail    com
2  github    com
3  pydata    org
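
To make the hostname/subdomain/domain/suffix split in the output above concrete, here is a minimal CPU-only sketch in plain Python. It is not the clx implementation: the helper names `split_hostname` and `parse_urls` are hypothetical, and the sketch assumes the suffix is always the last dot-separated label, which is enough to reproduce the four rows shown above but is not a general rule.

```python
from urllib.parse import urlsplit


def split_hostname(url):
    """Split one URL into hostname, subdomain, domain and suffix (simplified)."""
    # urlsplit only recognises the host when a "//" separator is present,
    # so bare hosts such as "gmail.com" get a "//" prefix first.
    if "://" not in url:
        url = "//" + url
    hostname = urlsplit(url).hostname or ""
    labels = hostname.split(".")
    # ASSUMPTION: the suffix is the last label only (e.g. "com", "org").
    if len(labels) < 2:
        return {"hostname": hostname, "subdomain": "", "domain": hostname, "suffix": ""}
    return {
        "hostname": hostname,
        "subdomain": ".".join(labels[:-2]),
        "domain": labels[-2],
        "suffix": labels[-1],
    }


def parse_urls(urls, req_cols=None):
    """Hypothetical stand-in for the req_cols filter: keep only the requested keys."""
    rows = [split_hostname(u) for u in urls]
    if req_cols is None:
        return rows
    return [{k: v for k, v in row.items() if k in req_cols} for row in rows]
```

Calling `parse_urls(["https://pandas.pydata.org"], req_cols={"domain", "suffix"})` keeps only the two requested keys, mirroring the second call in the example. Because of the single-label assumption, a host with a multi-label suffix such as `www.bbc.co.uk` would be split incorrectly by this sketch (domain `co`, suffix `uk`).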


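Note: This article was selected and compiled by 纯净天空 from the original English work clx.dns.dns_extractor.parse_url by rapids.ai. Unless otherwise stated, copyright of the original code belongs to the original author; please do not reproduce or copy this translation without permission or authorization.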
