Python pandas.to_datetime usage and code examples

Usage: pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

Convert argument to datetime.

This function converts a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object.

Parameters:

arg: int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like

The object to convert to a datetime. If a DataFrame is provided, the method expects at minimum the following columns: "year", "month", "day".
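As a minimal sketch of the DataFrame case, the snippet below assembles timestamps from component columns and shows what happens when a required column is missing. The frame `parts` and its values are hypothetical, and the optional "hour" and "minute" columns are included only for illustration.

```python
import pandas as pd

# Hypothetical example data: one row per timestamp, one column per component.
parts = pd.DataFrame({
    "year": [2015, 2016],
    "month": [2, 3],
    "day": [4, 5],
    "hour": [13, 8],      # optional component
    "minute": [30, 15],   # optional component
})

# Each row is combined into a single datetime value.
assembled = pd.to_datetime(parts)
print(assembled)

# Dropping one of the required columns is expected to raise ValueError.
try:
    pd.to_datetime(parts.drop(columns="day"))
    missing_day_raised = False
except ValueError:
    missing_day_raised = True
print(missing_day_raised)
```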

errors: {'ignore', 'raise', 'coerce'}, default 'raise'

If 'raise', then invalid parsing will raise an exception.
If 'coerce', then invalid parsing will be set as NaT.
If 'ignore', then invalid parsing will return the input.
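The difference between 'raise' and 'coerce' can be sketched as follows. The list `raw` is a hypothetical input containing one unparseable entry; the exact exception class may vary between pandas versions, so the sketch only assumes it is a ValueError or a subclass of it.

```python
import pandas as pd

raw = ["2021-01-01", "not a date"]  # hypothetical input with one bad entry

# errors='coerce': the unparseable entry becomes NaT instead of failing.
coerced = pd.to_datetime(raw, errors="coerce")
print(coerced)

# errors='raise' (the default): the same input raises an exception.
try:
    pd.to_datetime(raw)
    raised = False
except ValueError:
    raised = True
print(raised)
```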

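dayfirst: bool, default False

Specify a date parse order if arg is str or is list-like. If True, parses dates with the day first, e.g. "10/11/12" is parsed as 2012-11-10.

Warning

dayfirst=True is not strict, but will prefer to parse with day first. If a delimited date string cannot be parsed in accordance with the given dayfirst option, e.g. to_datetime(['31-12-2021']), then a warning will be shown.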
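A short sketch of how dayfirst changes the reading of an ambiguous string. A four-digit year is used here (an assumption made for clarity) so that only the day/month order is in question.

```python
import pandas as pd

ambiguous = "10/11/2012"  # could be 11 October or 10 November

# Default (dayfirst=False): the first field is read as the month.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first field is read as the day.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date(), day_first.date())
```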

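yearfirst: bool, default False

Specify a date parse order if arg is str or is list-like.

If True, parses dates with the year first, e.g. "10/11/12" is parsed as 2010-11-12.
If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).

Warning

yearfirst=True is not strict, but will prefer to parse with year first.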

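utc: bool, default None

Control timezone-related parsing, localization and conversion.

If True, the function always returns a timezone-aware UTC-localized pandas.Timestamp, pandas.Series or DatetimeIndex. To do this, timezone-naive inputs are localized as UTC, while timezone-aware inputs are converted to UTC.
If False (the default behaviour; None is treated the same way), inputs will not be coerced to UTC. Timezone-naive inputs will remain naive, while timezone-aware ones will keep their time offsets. Limitations exist for mixed offsets (typically, daylight savings); see the Examples section for details.

See also: the pandas general documentation about timezone conversion and localization.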

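format: str, default None

The strftime format used to parse the time, e.g. "%d/%m/%Y". Note that "%f" will parse all the way up to nanoseconds. See the strftime documentation for more information on the choices.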

exact: bool, default True

Control how format is used:

If True, require an exact format match.
If False, allow the format to match anywhere in the target string.
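The interaction between format and exact can be sketched like this. The string `text` is a hypothetical input that contains extra words around the date; the sketch assumes the mismatch under exact=True surfaces as a ValueError.

```python
import pandas as pd

fmt = "%d/%m/%Y"

# The whole string matches the format, so the default exact=True works.
exact_match = pd.to_datetime("25/12/2021", format=fmt)

# Hypothetical string with surrounding text: only exact=False can parse it.
text = "created on 25/12/2021"
inexact_match = pd.to_datetime(text, format=fmt, exact=False)

# With exact=True (the default) the same string is rejected.
try:
    pd.to_datetime(text, format=fmt)
    exact_raised = False
except ValueError:
    exact_raised = True

print(exact_match, inexact_match, exact_raised)
```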

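unit: str, default 'ns'

The unit of arg (D, s, ms, us, ns) when arg is an integer or float number. The value is interpreted relative to the origin. For example, with unit='ms' and origin='unix' (the default), arg is taken as the number of milliseconds since the start of the Unix epoch.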
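To illustrate how unit scales a numeric argument, here is a small sketch that expresses offsets from the Unix epoch in seconds, milliseconds and days. The numbers are arbitrary illustrative values.

```python
import pandas as pd

# One second after the Unix epoch, expressed in two different units.
from_seconds = pd.to_datetime(1, unit="s")
from_millis = pd.to_datetime(1_000, unit="ms")

# One day after the Unix epoch.
from_days = pd.to_datetime(1, unit="D")

print(from_seconds, from_millis, from_days)
```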

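infer_datetime_format: bool, default False

If True and no format is given, attempt to infer the format of the datetime strings based on the first non-NaN element, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by about 5-10x.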

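origin: scalar, default 'unix'

Define the reference date. The numeric values are parsed as the number of units (defined by unit) since this reference date.

If 'unix' (or POSIX) time, origin is set to 1970-01-01.
If 'julian', unit must be 'D', and origin is set to the beginning of the Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
If Timestamp convertible, origin is set to the Timestamp identified by origin.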
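A minimal sketch of the non-default origins. The day counts are illustrative values; the Julian example relies on the fact that Julian day 2440587.5 corresponds to midnight at the start of 1970-01-01.

```python
import pandas as pd

# Custom origin: count days from 1960-01-01 rather than from 1970-01-01.
custom = pd.to_datetime(10, unit="D", origin=pd.Timestamp("1960-01-01"))

# Julian origin: unit must be 'D'; 2440587.5 is the Unix epoch as a Julian day.
julian = pd.to_datetime(2440587.5, unit="D", origin="julian")

print(custom, julian)
```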

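cache: bool, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. This may produce a significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. The cache is only used when there are at least 50 values. The presence of out-of-bounds values will render the cache unusable and may slow down parsing.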
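The cache only affects speed, not the result. The sketch below, using a hypothetical Series of 100 strings with just two distinct values, checks that both settings produce the same output; it makes no claim about how large the speed-up is.

```python
import pandas as pd

# 100 strings but only 2 distinct values, above the 50-value threshold.
repeated = pd.Series(["2021-01-01", "2021-06-01"] * 50)

cached = pd.to_datetime(repeated, cache=True)
uncached = pd.to_datetime(repeated, cache=False)

# Both code paths are expected to yield identical results.
same = cached.equals(uncached)
print(same)
```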

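Returns:

datetime

If parsing succeeded. Return type depends on input (types in parentheses correspond to the fallback in case of unsuccessful timezone or out-of-range timestamp parsing):

scalar: Timestamp (or datetime.datetime)
array-like: DatetimeIndex (or Series with object dtype containing datetime.datetime)
Series: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)
DataFrame: Series of datetime64 dtype (or Series of object dtype containing datetime.datetime)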
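The mapping from input type to return type can be checked with a short sketch. The inputs are hypothetical, in-range values, so only the primary (non-fallback) return types are shown.

```python
import pandas as pd

scalar = pd.to_datetime("2021-03-04")
from_list = pd.to_datetime(["2021-03-04", "2021-03-05"])
from_series = pd.to_datetime(pd.Series(["2021-03-04", "2021-03-05"]))
from_frame = pd.to_datetime(
    pd.DataFrame({"year": [2021], "month": [3], "day": [4]})
)

# Print the container type produced for each kind of input.
for name, value in [("scalar", scalar), ("list", from_list),
                    ("Series", from_series), ("DataFrame", from_frame)]:
    print(name, "->", type(value).__name__)
```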

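Raises:

ParserError: When parsing a date from string fails.
ValueError: When another datetime conversion error happens. For example, when one of the 'year', 'month', 'day' columns is missing in a DataFrame, or when a timezone-aware datetime.datetime is found in an array-like of mixed time offsets and utc=False.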

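Notes:

Many input types are supported, and lead to different output types:

Scalars can be int, float, str, or datetime objects (from the stdlib datetime module or numpy). They are converted to pandas.Timestamp when possible, otherwise they are converted to datetime.datetime. None/NaN/null scalars are converted to NaT.
Array-likes can contain int, float, str, or datetime objects. They are converted to DatetimeIndex when possible, otherwise they are converted to a pandas.Index with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.
Series are converted to pandas.Series with datetime64 dtype when possible, otherwise they are converted to pandas.Series with object dtype, containing datetime.datetime. None/NaN/null entries are converted to NaT in both cases.
DataFrame/dict-like are converted to pandas.Series with datetime64 dtype. For each row, a datetime is created by assembling the various DataFrame columns. Column keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.

The following causes are responsible for datetime.datetime objects being returned (possibly inside an Index or a Series with object dtype) instead of a proper pandas designated type (Timestamp, DatetimeIndex or Series with datetime64 dtype):

When any input element is before Timestamp.min or after Timestamp.max; see timestamp limitations.
When utc=False (default) and the input is an array-like or Series containing mixed naive/aware datetimes, or aware datetimes with mixed time offsets. Note that this happens in the (quite frequent) situation when the timezone has a daylight savings policy. In that case you may wish to use utc=True.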
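As a sketch of the utc=True recommendation for daylight-saving data, the snippet below parses two strings with different offsets into a single UTC index. The follow-up conversion back to "Europe/Paris" with DatetimeIndex.tz_convert is an additional step assumed here for illustration and requires time zone data to be available.

```python
import pandas as pd

# Two readings around the end of daylight saving time in Europe/Paris,
# so their UTC offsets differ (+02:00 and +01:00).
mixed = ["2020-10-25 02:00 +0200", "2020-10-25 04:00 +0100"]

# utc=True converts both entries to UTC, giving a proper DatetimeIndex.
as_utc = pd.to_datetime(mixed, utc=True)

# Optional: view the same instants in a named time zone afterwards.
as_paris = as_utc.tz_convert("Europe/Paris")

print(as_utc)
print(as_paris)
```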

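Examples:

Handling various input formats

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.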

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

Passing infer_datetime_format=True can often speed up parsing if the strings are not exactly in ISO8601 format but are in a regular format.

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
>>> s.head()
0    3/11/2000
1    3/12/2000
2    3/13/2000
3    3/11/2000
4    3/12/2000
dtype: object

>>> %timeit pd.to_datetime(s, infer_datetime_format=True)  
100 loops, best of 3: 10.4 ms per loop

>>> %timeit pd.to_datetime(s, infer_datetime_format=False)  
1 loop, best of 3: 471 ms per loop

Using a Unix epoch time

>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

Warning

For a float arg, precision rounding might happen. To prevent unexpected behavior, use a fixed-width exact type.

Using a non-Unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit='D',
...                origin=pd.Timestamp('1960-01-01'))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
              dtype='datetime64[ns]', freq=None)

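Non-convertible date/times

If a date does not meet the timestamp limitations, passing errors='ignore' will return the original input instead of raising any exception.

Passing errors='coerce' will force an out-of-bounds date to NaT, in addition to forcing non-dates (or non-parseable dates) to NaT.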

>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT

Timezones and time offsets

The default behaviour (utc=False) is as follows:

Timezone-naive inputs are converted to a timezone-naive DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15'])
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
              dtype='datetime64[ns]', freq=None)

Timezone-aware inputs with a constant time offset are converted to a timezone-aware DatetimeIndex:

>>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)

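However, timezone-aware inputs with mixed time offsets (for example, issued from a timezone with daylight savings, such as Europe/Paris) are not successfully converted to a DatetimeIndex. Instead, a simple pandas.Index containing datetime.datetime objects is returned: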

>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
      dtype='object')

A mix of timezone-aware and timezone-naive inputs is converted to a timezone-aware DatetimeIndex if the offsets of the timezone-aware inputs are constant:

>>> from datetime import datetime
>>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)

Finally, mixing timezone-aware strings and datetime.datetime always raises an error, even if the elements all have the same time offset.

>>> from datetime import datetime, timezone, timedelta
>>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
>>> pd.to_datetime(["2020-01-01 17:00 -0100", d])
Traceback (most recent call last):
    ...
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64
            unless utc=True

Setting utc=True solves most of the above issues:

Timezone-naive inputs are localized as UTC:

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

Timezone-aware inputs are converted to UTC (the output represents the exact same datetime, but viewed from the UTC time offset +00:00).

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
...                utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

Inputs can contain both naive and aware values, as strings or datetimes; the above rules still apply:

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
...                datetime(2020, 1, 1, 18),
...                datetime(2020, 1, 1, 18,
...                tzinfo=timezone(-timedelta(hours=1)))],
...                utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00',
               '2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

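Note: This article was selected and compiled by 純淨天空 from the original English work pandas.to_datetime at pandas.pydata.org. Unless otherwise stated, copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.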
