Combine Date and Time columns using Python pandas
I have a pandas dataframe with the following columns:
Date Time
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
How do I combine data['Date'] and data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv with parse_dates=[['Date', 'Time']].
Assuming these are just strings, you can simply add them together (with a space), which then lets you apply to_datetime:
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Note: surprisingly (for me), this works fine with NaNs being converted to NaT, but it is worth checking the conversion (perhaps using the raise argument).
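In the output above, 01-06-2013 was parsed month-first (as 2013-01-06). If the source dates are actually day-first, passing an explicit format avoids the ambiguity. A minimal sketch, assuming DD-MM-YYYY strings and a small sample frame that I made up from the question's data:

```python
import pandas as pd

# Sample rows taken from the question; both columns are plain strings.
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013", "03-06-2013"],
    "Time": ["23:00:00", "01:00:00", "21:00:00"],
})

# Assumption: the dates are day-first (DD-MM-YYYY), so spell the format out.
combined = pd.to_datetime(
    df["Date"] + " " + df["Time"],
    format="%d-%m-%Y %H:%M:%S",
)

# errors="raise" is the default; errors="coerce" would turn
# unparseable rows into NaT instead of raising.
print(combined)
```

With the format given, a row that does not match raises an error rather than being silently misread.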
The accepted answer works for columns of datatype string. For completeness: I came across this question when searching for how to do this when the columns are of datatypes date and time.
import datetime  # pd.datetime was removed in pandas 2.0; use the standard library module instead
df.apply(lambda r: datetime.datetime.combine(r['date_column_name'], r['time_column_name']), axis=1)
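A small runnable sketch of the row-wise combine, assuming the two columns hold datetime.date and datetime.time objects (the sample values are made up for illustration):

```python
import datetime

import pandas as pd

# Hypothetical frame: one column of date objects, one of time objects.
df = pd.DataFrame({
    "date_column_name": [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)],
    "time_column_name": [datetime.time(23, 0), datetime.time(1, 0)],
})

# Combine each row's date and time into a single datetime.
combined = df.apply(
    lambda r: datetime.datetime.combine(r["date_column_name"], r["time_column_name"]),
    axis=1,
)
print(combined)
```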
You can use this to merge date and time into a single column of the dataframe.
import pandas as pd
data_file = 'data.csv'  # path to your file
Read the .csv file, merging the Date and Time columns into Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep the two original columns as well.
data.set_index(['Date', 'Time'], drop=False)
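As far as I know, the nested-list form of parse_dates was deprecated in pandas 2.2, so it may warn or fail on newer versions. A version-independent sketch is to read the columns as strings and combine them afterwards. The CSV content and the Value column below are made up, and the day-first format is an assumption:

```python
import io

import pandas as pd

# Stand-in for a file on disk; the layout is hypothetical.
csv_text = "Date,Time,Value\n01-06-2013,23:00:00,1\n02-06-2013,01:00:00,2\n"

# Read Date and Time as plain strings, then build the combined column.
data = pd.read_csv(io.StringIO(csv_text))
data["Date_Time"] = pd.to_datetime(
    data["Date"] + " " + data["Time"],
    format="%d-%m-%Y %H:%M:%S",  # assumption: day-first dates
)
print(data)
```

This keeps the original Date and Time columns alongside the new Date_Time column.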
I don't have enough reputation to comment on jka.ne so:
I had to amend jka.ne's line for it to work:
df.apply(lambda r: datetime.datetime.combine(r['date_column_name'], r['time_column_name']).time(), axis=1)  # needs "import datetime"; pd.datetime was removed in pandas 2.0
This might help others.
Also, I have tested a different approach, using replace instead of combine:
def combine_date_time(df, datecol, timecol):
    return df.apply(lambda row: row[datecol].replace(
                        hour=row[timecol].hour,
                        minute=row[timecol].minute),
                    axis=1)
which in the OP's case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches on a relatively large dataset (>500,000 rows). They have similar runtimes, but combine is faster (59s for replace vs 50s for combine).
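For reference, a runnable sketch of the replace approach on a tiny frame. It assumes the date column holds Timestamps and the time column holds datetime.time objects. Note that the function above copies only hour and minute; copying the seconds as well is my addition here:

```python
import datetime

import pandas as pd


def combine_date_time(df, datecol, timecol):
    # Overwrite the time-of-day fields of each Timestamp with those
    # from the time column (seconds included, unlike the version above).
    return df.apply(
        lambda row: row[datecol].replace(
            hour=row[timecol].hour,
            minute=row[timecol].minute,
            second=row[timecol].second,
        ),
        axis=1,
    )


# Made-up sample data for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": [datetime.time(23, 0, 30), datetime.time(1, 0, 0)],
})
result = combine_date_time(df, "Date", "Time")
print(result)
```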
You can cast the columns to str if their types differ (datetime and timestamp, or str) and then use to_datetime:
df['Date'] = pd.to_datetime(df.Date.astype(str) + ' ' + df.Time.astype(str))
Result:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
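A sketch of the cast-to-string approach on non-string columns, assuming the Date column holds datetime.date objects and the Time column holds datetime.time objects (sample values are made up). Their string forms are ISO-style, so to_datetime can parse the concatenation without a format:

```python
import datetime

import pandas as pd

# Hypothetical frame with Python date and time objects.
df = pd.DataFrame({
    "Date": [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)],
    "Time": [datetime.time(23, 0), datetime.time(1, 0)],
})

# str(date) -> '2013-06-01', str(time) -> '23:00:00'
as_text = df["Date"].astype(str) + " " + df["Time"].astype(str)
df["DateTime"] = pd.to_datetime(as_text)
print(df["DateTime"])
```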
The answer really depends on what your column types are. In my case, I had datetime and timedelta.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
First make sure to have the right data types:
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])
Then you easily combine them:
df["DateTime"] = df["Date"] + df["Time"]
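Put together on sample data, the two steps might look like this. The frame is made up from the question's rows, and the day-first format is an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013"],
    "Time": ["23:00:00", "01:00:00"],
})

# Step 1: convert to datetime64 and timedelta64.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")  # assumption: day-first
df["Time"] = pd.to_timedelta(df["Time"])

# Step 2: a date at midnight plus an offset gives the full timestamp.
df["DateTime"] = df["Date"] + df["Time"]
print(df.dtypes)
```

Because both operations are vectorized, this avoids a Python-level apply over the rows.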
You can also convert to datetime without string concatenation, by combining datetime and timedelta objects. Combined with pd.DataFrame.pop, you can remove the source series simultaneously:
df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))
print(df)
DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
print(df.dtypes)
DateTime datetime64[ns]
dtype: object
Use the combine function:
datetime.datetime.combine(date, time)
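For a single pair of values this needs only the standard library. A minimal sketch with made-up values:

```python
import datetime

date = datetime.date(2013, 6, 1)
time = datetime.time(23, 0, 0)

# combine() takes a date and a time and returns a datetime.
combined = datetime.datetime.combine(date, time)
print(combined)  # 2013-06-01 23:00:00
```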
My dataset had 1-second resolution data covering a few days, and parsing with the methods suggested here was very slow. Instead I used:
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times
Note that cache=True makes parsing the dates very efficient, since there are only a couple of unique dates in my files. That would not be true for a combined date and time column.
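A small sketch of the same idea on made-up data with many repeated dates. As far as I know, recent pandas versions default to cache=True and only use the cache for inputs of at least 50 values, so on a frame this small the flag shows the call shape rather than a speedup:

```python
import pandas as pd

# Two unique dates, each repeated for several 1-second timestamps.
df = pd.DataFrame({
    "Date": ["2013-06-01"] * 3 + ["2013-06-02"] * 3,
    "Time": ["00:00:00", "00:00:01", "00:00:02"] * 2,
})

# Each unique date string is parsed once and reused when the cache applies.
dates = pd.to_datetime(df["Date"], cache=True)
times = pd.to_timedelta(df["Time"])
datetimes = dates + times
print(datetimes)
```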
Reference URL: https://stackoverflow.com/questions/17978092/combine-date-and-time-columns-using-python-pandas