
4 FAQ About Date-Time Manipulation with Pandas | by Soner Yıldırım | Jun, 2022



Explained with examples

(image by the author)

The Pandas library handles time series data efficiently. In fact, Wes McKinney created it to work with financial data, which essentially consists of time series.

When working with time series data, a substantial amount of time is spent on date and time manipulation. In this article, we will go over 4 frequently asked questions in this area.

You may have come across some of these questions. They can be solved with simple operations except for the last one, which is a bit tricky and requires multiple steps to solve.

Let’s start with creating a sample DataFrame to work with.

import pandas as pd

df = pd.DataFrame({
    "booking_id": [1001, 1002, 1003, 1004, 1005],
    "property": ["A", "A", "B", "B", "C"],
    "created_at": ["2022-03-01", "2022-02-10", "2022-04-12",
                   "2022-04-11", "2022-06-05"],
    "checkin_date": ["2022-06-01", "2022-06-10", "2022-06-02",
                     "2022-06-20", "2022-08-10"],
    "checkout_date": ["2022-06-06", "2022-06-15",
                      "2022-06-06", "2022-06-28", "2022-08-16"],
    "amount": [5400, 5600, 4800, 9000, 6500]
})
# change the data type
date_cols = ["created_at", "checkin_date", "checkout_date"]
df[date_cols] = df[date_cols].astype("datetime64[ns]")
# display the DataFrame
df
df (image by author)

To be able to use the date manipulation functions of Pandas, we need to have the dates in a proper data type. This is the reason why we change the data type to “datetime64[ns]”.
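As a side note, the conversion can also be done with pd.to_datetime, which is handy when the raw strings may contain bad values. The snippet below is a minimal sketch on a made-up Series (the name raw and its contents are assumptions for illustration, not part of the booking data):

```python
import pandas as pd

# Hypothetical raw column: one well-formed date and one malformed entry
raw = pd.Series(["2022-03-01", "not a date"])

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # how many values failed to parse
```

Counting the NaT values after the conversion is a quick way to see whether any rows need cleaning before the date functions are used.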

1. How to extract year-month?

A date contains different pieces of information such as year, day of week, month, and so on. All of these can be extracted from a date using the attributes and methods that are available through the dt accessor.

For instance, we can get the month using the month attribute. One of the not-so-obvious ones is the year-month combination. We can extract this information with the help of the to_period method.

# create the year_month column
df["year_month"] = df["created_at"].dt.to_period("M")
# display the DataFrame
df
df (image by author)
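To make the dt accessor a bit more concrete, here is a small sketch on a standalone Series (the names dates and parts are made up for this example). It also shows strftime as one way to get the year-month as a plain string rather than a Period:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2022-03-01", "2022-06-05"]))

parts = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day_of_week": dates.dt.dayofweek,             # Monday=0 ... Sunday=6
    "year_month": dates.dt.to_period("M"),         # Period values, e.g. 2022-03
    "year_month_str": dates.dt.strftime("%Y-%m"),  # plain strings
})
print(parts)
```

The Period version keeps its date semantics (it sorts and groups chronologically), while the string version is usually easier to export or use as a label.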

2. How to add a time interval to a date?

Adding a time interval to a date, or subtracting one from it, is a common task in date manipulation. We can perform it using a DateOffset or Timedelta object.

Let’s add 1 day to the checkout date of the booking with id 1001.

df.loc[df["booking_id"]==1001, "checkout_date"] = \
df.loc[df["booking_id"]==1001, "checkout_date"] + \
pd.DateOffset(days=1)
# check the result
print(df.loc[df["booking_id"]==1001, "checkout_date"])
# output
0 2022-06-07
Name: checkout_date, dtype: datetime64[ns]
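The two options are not fully interchangeable. As a rough sketch on a single made-up timestamp (the variable names are only for illustration): Timedelta represents a fixed duration, whereas DateOffset follows the calendar and also accepts units such as months.

```python
import pandas as pd

ts = pd.Timestamp("2022-01-31")

# Timedelta: a fixed duration of exactly one day
plus_day_td = ts + pd.Timedelta(days=1)

# DateOffset: a calendar-aware shift, same result for a single day here
plus_day_do = ts + pd.DateOffset(days=1)

# DateOffset also understands months; January 31 lands on the last day of February
plus_month = ts + pd.DateOffset(months=1)

print(plus_day_td, plus_day_do, plus_month)
```

Timedelta has no months argument because a month is not a fixed length of time, so DateOffset is the one to reach for when shifting by months or years.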

3. How to find the difference between two dates in days?

We can find the difference between two dates by subtracting one from the other. However, the result of this operation is a Timedelta object that looks like this:

df["checkout_date"][0] - df["checkin_date"][0]
# output
Timedelta('6 days 00:00:00')

We can extract the number of days as an integer by using the days attribute. Let’s create a column that shows the number of days between the date the booking was created and the check-in date.

# difference in days
df["days_to_checkin"] = \
(df["checkin_date"] - df["created_at"]).dt.days
# display the DataFrame
df
df (image by author)
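One thing to keep in mind is that days only returns the whole-day component of the difference. When the timestamps carry a time of day, dividing by a Timedelta gives a fractional result instead. A minimal sketch with made-up timestamps (start and end are not columns of our DataFrame):

```python
import pandas as pd

start = pd.Series(pd.to_datetime(["2022-06-01 00:00"]))
end = pd.Series(pd.to_datetime(["2022-06-02 12:00"]))

delta = end - start

whole_days = delta.dt.days                      # whole-day component only
fractional_days = delta / pd.Timedelta(days=1)  # difference as a float number of days
hours = delta.dt.total_seconds() / 3600         # difference in hours

print(whole_days[0], fractional_days[0], hours[0])
```

For date-only columns like the ones in this article, both approaches give the same number of days.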

4. How to expand dates between a starting and ending date?

Suppose we need a calendar that shows the booked days of each property. For instance, the booking in the first row of our DataFrame tells us that property A is booked from 2022-06-01 to 2022-06-07. Therefore, property A is booked for the dates 2022-06-01, 2022-06-02, 2022-06-03, 2022-06-04, 2022-06-05, and 2022-06-06 (assuming checkout is due at 10 AM on 2022-06-07).

We can create such a calendar by finding the dates between the check-in and check-out dates and then expanding the DataFrame based on these dates.

First we create a calendar DataFrame that contains the property, checkin_date, and checkout_date columns.

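# create a calendar DataFrame
# (.copy() gives an independent DataFrame, so adding a column later
# does not raise a SettingWithCopyWarning)
calendar = df[["property", "checkin_date", "checkout_date"]].copy()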

The date_range function gives us the dates between a starting and ending date. Here is how it looks on the first booking:

pd.date_range(calendar["checkin_date"][0], calendar["checkout_date"][0])
# output
DatetimeIndex(['2022-06-01', '2022-06-02', '2022-06-03',
               '2022-06-04', '2022-06-05', '2022-06-06',
               '2022-06-07'],
              dtype='datetime64[ns]', freq='D')

The issue here is that we do not want the check-out date to be shown as booked. Thus, we will subtract 1 day from the check-out date before finding the dates in between.
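Here is a small sketch of that idea on the first booking, with the dates typed in directly. It also shows the inclusive parameter of date_range as an alternative; this assumes pandas 1.4 or later, where that parameter is available.

```python
import pandas as pd

checkin = pd.Timestamp("2022-06-01")
checkout = pd.Timestamp("2022-06-07")

# Option 1: move the end date back by one day
nights_a = pd.date_range(checkin, checkout - pd.Timedelta(days=1))

# Option 2 (assumes pandas >= 1.4): keep the start, drop the end
nights_b = pd.date_range(checkin, checkout, inclusive="left")

print(nights_a)
print(nights_a.equals(nights_b))
```

Both options leave out the check-out date, so only the nights that are actually booked remain.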

To do this operation on all the rows, we need to use the apply function. We will also convert the output of the date_range function to a list using the list constructor.

# create the booked_days column
calendar.loc[:, "booked_days"] = calendar.apply(
    lambda x: list(
        pd.date_range(
            x.checkin_date,
            # subtract 1 day so the check-out date is not counted as booked
            x.checkout_date - pd.DateOffset(days=1)
        ).date
    ),
    axis=1
)

# display the DataFrame
calendar
calendar (image by author)

The next step is to expand the DataFrame based on the dates in the booked_days column. The explode function does exactly this operation.

# explode
calendar = calendar.explode(
    column="booked_days", ignore_index=True
)[["property", "booked_days"]]
# display the first 5 rows
calendar.head()
calendar (image by author)

We now have a calendar of booked days.
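To tie the steps together, the snippet below is a self-contained sketch of the whole process on a smaller, made-up bookings table. It uses a list comprehension in place of apply, which is just one possible variation, and then runs two quick checks on the result: the number of booked nights per property and a search for double-booked days. The names bookings, nights_per_property, and double_booked are assumptions for this example.

```python
import pandas as pd

# Hypothetical bookings with check-out dates that should not count as booked
bookings = pd.DataFrame({
    "property": ["A", "A", "B"],
    "checkin_date": pd.to_datetime(["2022-06-01", "2022-06-10", "2022-06-02"]),
    "checkout_date": pd.to_datetime(["2022-06-07", "2022-06-15", "2022-06-06"]),
})

# One list of booked dates per booking, excluding the check-out date
bookings["booked_days"] = pd.Series(
    [
        list(pd.date_range(start, end - pd.Timedelta(days=1)).date)
        for start, end in zip(bookings["checkin_date"], bookings["checkout_date"])
    ],
    index=bookings.index,
)

# One row per property and booked day
calendar = bookings.explode("booked_days", ignore_index=True)[
    ["property", "booked_days"]
]

# Sanity checks on the calendar
nights_per_property = calendar.groupby("property")["booked_days"].count()
double_booked = calendar[
    calendar.duplicated(["property", "booked_days"], keep=False)
]

print(nights_per_property)
print(double_booked)
```

An empty double_booked DataFrame means no property has two bookings on the same day.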

We have solved 4 issues that you are likely to encounter while working on time series data. The last one is not as common as the first three, but I wanted to include it because it’s a bit tricky to solve.

You can become a Medium member to unlock full access to my writing, plus the rest of Medium. If you already are, don’t forget to subscribe if you’d like to get an email whenever I publish a new article.

Thank you for reading. Please let me know if you have any feedback.


