Combine date and time columns using Python Pandas

crosscheck 2020. 9. 1. 07:02

I have a pandas dataframe with the following columns:

Date              Time
01-06-2013      23:00:00
02-06-2013      01:00:00
02-06-2013      21:00:00
02-06-2013      22:00:00
02-06-2013      23:00:00
03-06-2013      01:00:00
03-06-2013      21:00:00
03-06-2013      22:00:00
03-06-2013      23:00:00
04-06-2013      01:00:00

How do I combine data['Date'] and data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?

Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00

It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv, by passing parse_dates=[['Date', 'Time']].

Assuming these are just strings, you can simply add them together (with a space in between), which lets you apply to_datetime:

In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0    01-06-2013 23:00:00
1    02-06-2013 01:00:00
2    02-06-2013 21:00:00
3    02-06-2013 22:00:00
4    02-06-2013 23:00:00
5    03-06-2013 01:00:00
6    03-06-2013 21:00:00
7    03-06-2013 22:00:00
8    03-06-2013 23:00:00
9    04-06-2013 01:00:00
dtype: object

In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0   2013-01-06 23:00:00
1   2013-02-06 01:00:00
2   2013-02-06 21:00:00
3   2013-02-06 22:00:00
4   2013-02-06 23:00:00
5   2013-03-06 01:00:00
6   2013-03-06 21:00:00
7   2013-03-06 22:00:00
8   2013-03-06 23:00:00
9   2013-04-06 01:00:00
dtype: datetime64[ns]
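The output above is read month-first, so 01-06-2013 becomes 2013-01-06. The question does not say which order its dates use. If the dd-mm-yyyy layout means the dates are day-first, a minimal sketch is to pass an explicit format string. dayfirst=True is a looser alternative. The rows below are a subset of the question's sample data.

```python
import pandas as pd

# A few rows of the question's data, assumed to be day-first (01-06-2013 = 1 June 2013)
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013", "04-06-2013"],
    "Time": ["23:00:00", "01:00:00", "01:00:00"],
})

combined = df["Date"] + " " + df["Time"]

# An explicit format removes the month/day ambiguity
parsed = pd.to_datetime(combined, format="%d-%m-%Y %H:%M:%S")
print(parsed)
```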

Note: surprisingly (to me), this works fine with NaNs, which are converted to NaT, but it is worth keeping an eye on the conversion (perhaps using errors='raise').
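Here is a small sketch of the NaN behaviour described in the note, using made-up rows. A missing value propagates through the string concatenation and comes out as NaT. An impossible date such as 31-02-2013 raises with errors='raise', which is the default in current pandas, and becomes NaT with errors='coerce'.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one valid, one missing date, one impossible date
df = pd.DataFrame({
    "Date": ["01-06-2013", np.nan, "31-02-2013"],
    "Time": ["23:00:00", "01:00:00", "21:00:00"],
})

combined = df["Date"] + " " + df["Time"]  # a NaN in either column gives NaN here
fmt = "%d-%m-%Y %H:%M:%S"

# errors="coerce": NaN and unparseable strings both become NaT
coerced = pd.to_datetime(combined, format=fmt, errors="coerce")

# errors="raise": NaN is still fine, but the impossible date raises ValueError
try:
    pd.to_datetime(combined, format=fmt, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced)
print("raised:", raised)
```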


The accepted answer works for columns of string dtype. For completeness: I came across this question when searching how to do this when the columns hold date and time objects.

import datetime
df.apply(lambda r: datetime.datetime.combine(r['date_column_name'], r['time_column_name']), axis=1)
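Below is a runnable sketch of the combine approach for object columns holding datetime.date and datetime.time values. pd.datetime was only an alias for datetime.datetime and was removed in pandas 2.0, so the sketch calls the standard library directly. The column names are the answer's placeholders and the data is made up. A list comprehension over zip is shown as a swapped-in alternative that avoids building a Series for every row.

```python
import datetime
import pandas as pd

# Hypothetical object columns holding datetime.date and datetime.time values
df = pd.DataFrame({
    "date_column_name": [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)],
    "time_column_name": [datetime.time(23, 0), datetime.time(1, 0)],
})

# Row-wise combine()
combined = df.apply(
    lambda r: datetime.datetime.combine(r["date_column_name"], r["time_column_name"]),
    axis=1,
)

# Same result without apply: zip the two columns in a list comprehension
combined_zip = pd.to_datetime([
    datetime.datetime.combine(d, t)
    for d, t in zip(df["date_column_name"], df["time_column_name"])
])
print(combined_zip)
```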

You can use this to merge date and time into a single column of the dataframe.

import pandas as pd
data_file = 'data.csv'  # path of your file

Read the .csv file, merging Date and Time into a single Date_Time column:

data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
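Combining columns through parse_dates=[['Date', 'Time']] is deprecated in recent pandas releases (2.2 onward). The following is a version-independent sketch. It uses an in-memory stand-in for data.csv with hypothetical contents and assumes day-first dates. It reads the columns as plain strings and combines them afterwards, which also keeps the original Date and Time columns.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for data.csv
csv_text = """Date,Time,Value
01-06-2013,23:00:00,1
02-06-2013,01:00:00,2
"""

data = pd.read_csv(io.StringIO(csv_text))

# Combine after reading; Date and Time stay in the frame
data["Date_Time"] = pd.to_datetime(
    data["Date"] + " " + data["Time"], format="%d-%m-%Y %H:%M:%S"
)
print(data)
```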

To keep the original Date and Time columns as well, pass keep_date_col=True:

data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']], keep_date_col=True)

I don't have enough reputation to comment on jka.ne's answer, so:

I had to amend jka.ne's line for it to work:

import datetime
df.apply(lambda r: datetime.datetime.combine(r['date_column_name'], r['time_column_name'].time()), axis=1)
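This sketch shows when the .time() amendment is needed. It assumes the time column holds full timestamps carrying a dummy date, and the data is hypothetical. datetime.datetime.combine() requires a datetime.time as its second argument and rejects a Timestamp, so .time() extracts the time of day first. datetime.datetime is used in place of pd.datetime, which pandas 2.0 removed.

```python
import datetime
import pandas as pd

# Hypothetical: both columns are full timestamps; the time column carries a dummy date
df = pd.DataFrame({
    "date_column_name": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "time_column_name": pd.to_datetime(["1900-01-01 23:00:00", "1900-01-01 01:00:00"]),
})

# .time() keeps only the time of day; combine() takes the date part of the first argument
combined = df.apply(
    lambda r: datetime.datetime.combine(r["date_column_name"], r["time_column_name"].time()),
    axis=1,
)
print(combined)
```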

This might help others.

Also, I have tested a different approach, using replace instead of combine:

def combine_date_time(df, datecol, timecol):
    # Copy the time of day from timecol onto each timestamp in datecol
    return df.apply(lambda row: row[datecol].replace(
                                hour=row[timecol].hour,
                                minute=row[timecol].minute,
                                second=row[timecol].second),
                    axis=1)

which in the OP's case would be:

combine_date_time(df, 'Date', 'Time')

I timed both approaches on a relatively large dataset (>500,000 rows). They have similar runtimes, but combine is faster (59 s for replace vs 50 s for combine).
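Both timings above are for row-wise apply, which calls Python once per row. The following is a vectorised alternative for two datetime64 columns. It is not one of the tested approaches and has not been timed here. The time of day is the Time value minus its own midnight, and that offset can be added to the date directly. The data is hypothetical.

```python
import pandas as pd

# Hypothetical datetime64 columns, as assumed by the replace() approach
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": pd.to_datetime(["1900-01-01 23:00:00", "1900-01-01 01:30:15"]),
})

# Time of day as a timedelta: each timestamp minus its own midnight
time_of_day = df["Time"] - df["Time"].dt.normalize()

# Vectorised: no Python-level loop over rows
combined = df["Date"].dt.normalize() + time_of_day
print(combined)
```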


If the column types differ (e.g. datetime and timestamp, or str), you can cast both to strings and then use to_datetime:

df['Date'] = pd.to_datetime(df.Date.astype(str) + ' ' + df.Time.astype(str))

Result:

0   2013-01-06 23:00:00
1   2013-02-06 01:00:00
2   2013-02-06 21:00:00
3   2013-02-06 22:00:00
4   2013-02-06 23:00:00
5   2013-03-06 01:00:00
6   2013-03-06 21:00:00
7   2013-03-06 22:00:00
8   2013-03-06 23:00:00
9   2013-04-06 01:00:00
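Here is a minimal sketch of the casting approach for mixed types. It assumes a datetime64 Date column and a Time column of datetime.time objects, with hypothetical data. As a small variation on the answer, the date is formatted explicitly with dt.strftime so that no time component can end up in the string. The time objects are cast with astype(str), which yields text such as 23:00:00.

```python
import datetime
import pandas as pd

# Hypothetical mixed types: datetime64 dates and datetime.time objects
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": [datetime.time(23, 0), datetime.time(1, 0)],
})

# Both sides become text, e.g. "2013-06-01" + " " + "23:00:00"
as_text = df["Date"].dt.strftime("%Y-%m-%d") + " " + df["Time"].astype(str)
df["Date"] = pd.to_datetime(as_text)
print(df["Date"])
```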



The answer really depends on what your column types are. In my case, I had datetime and timedelta.

> df[['Date','Time']].dtypes
Date     datetime64[ns]
Time    timedelta64[ns]

If this is your case, then you just need to add the columns:

> df['Date'] + df['Time']
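Here is a minimal sketch of this case with made-up values. Adding a timedelta64 column to a datetime64 column works element by element and returns datetime64, with no string handling at all.

```python
import pandas as pd

# Hypothetical frame that already has datetime64 and timedelta64 columns
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": pd.to_timedelta(["23:00:00", "01:00:00"]),
})
print(df[["Date", "Time"]].dtypes)

# datetime64 + timedelta64 -> datetime64, element by element
combined = df["Date"] + df["Time"]
print(combined)
```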

First, make sure the columns have the right data types:

df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])

Then you can easily combine them:

df["DateTime"] = df["Date"] + df["Time"]

You can also convert to datetime without string concatenation, by combining datetime and timedelta objects. Combined with pd.DataFrame.pop, you can remove the source series simultaneously:

df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))

print(df)

             DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00

print(df.dtypes)

DateTime    datetime64[ns]
dtype: object
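Below is a variant of the pop() one-liner on the question's string data. It assumes day-first dates, hence the explicit format. It also adds a hypothetical extra Value column to show that only Date and Time are removed.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013"],
    "Time": ["23:00:00", "01:00:00"],
    "Value": [1, 2],  # hypothetical extra column, left untouched
})

# pop() returns each column and removes it from df; the sum is then assigned as DateTime
df["DateTime"] = pd.to_datetime(df.pop("Date"), format="%d-%m-%Y") + pd.to_timedelta(df.pop("Time"))
print(df)
```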

Use the combine function:

import datetime
datetime.datetime.combine(date, time)
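Here is a standalone example of the standard-library call with concrete values:

```python
import datetime

d = datetime.date(2013, 6, 1)
t = datetime.time(23, 0)

# combine() builds one datetime from a date part and a time part
dt = datetime.datetime.combine(d, t)
print(dt)  # 2013-06-01 23:00:00
```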

My dataset had 1-second resolution data spanning a few days, and parsing with the methods suggested here was very slow. Instead I used:

import pandas
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times

Note that cache=True makes parsing the dates very efficient, since my files contain only a couple of unique dates. That is not true of a combined date-and-time column, where at 1-second resolution almost every value is unique.
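This sketch reproduces the setting described, using synthetic data at 1-second resolution over two days generated with date_range (hypothetical). The Date column has only two distinct strings, which is where cache=True pays off. The day-first format string is an assumption about the data.

```python
import pandas as pd

# Synthetic data: every second of two days, split into Date and Time strings
stamps = pd.date_range("2013-06-01", periods=2 * 24 * 3600, freq="s")
df = pd.DataFrame({
    "Date": stamps.strftime("%d-%m-%Y"),
    "Time": stamps.strftime("%H:%M:%S"),
})
print("unique dates:", df["Date"].nunique(), "rows:", len(df))

# Only the few distinct date strings need parsing thanks to the cache
dates = pd.to_datetime(df["Date"], format="%d-%m-%Y", cache=True)
times = pd.to_timedelta(df["Time"])
datetimes = dates + times
```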

Reference URL: https://stackoverflow.com/questions/17978092/combine-date-and-time-columns-using-python-pandas
