Dealing with Dates in Python’s DataFrame Part 2— The Basics | by KahEm Chu | Jan, 2023
Data Processing in Python
This article explained the basic pandas’ methods and properties for dealing with the DateTime series in a data frame
As written in the title, this article is part 2 of my Dealing with Dates in Python’s DataFrame series. Below show the content of each part of the Dealing with Dates in Python’s DataFrame series.
In my previous article, I have shown the DateTime series creation methods. Then, in this article, I will show the basic properties and methods to deal with the Datetime series in a data frame.
With that, this article will be structured as below:
- Convert Datatype to DateTime
- General DateTime Info Extraction
- Check if the Date is the Start or the End of the Period
- Check if the Date Belongs to a Leap Year
- Check the Number of Days in the Month
Let’s get started!
For the date creation methods I show in the previous article, the series is created as a DateTime object. When you are reading the data from Excel or any other data source, the DateTime column will be read as a string object if you did not parse them as dates. To extract the DateTime info from the DateTime series, the column needs to be converted into the DateTime data type beforehand.
There are two methods to convert the data type into DateTime.
pandas.Series.astype(“DateTime”)
pandas.datetime(pandas.SeriesSeries)
I created a demo.csv file with the script below to demonstrate the methods and properties for this section.
import pandas as pd
import datetime as dt
df = pd.DataFrame()
initial_date = pd.Timestamp("2023-01-04 21:55:00")
df["timedelta_range"] = pd.timedelta_range(start="1 day", end="5days", freq="4H")
df.insert(loc=0, column="initial_date",value=initial_date)
df.insert(loc=2, column="next_date",value=df["initial_date"] + df["timedelta_range"])
print(df.head(10))
print("Data Type: ", df.dtypes)
df.to_csv("demo.csv")
The output data frame is as in the screenshot below.
Now, we will read the file generated.
import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"])
print(df.info())
df.head()
For demonstration purposes, only the next_date
column will be read.
As you can see, when the column is imported directly without parsing it as DateTime, the column will be a string column where the Dtype is the object. Below are two popular methods to convert the column into a DateTime data type.
- pandas.to_datetime(pandas.Series)
df["next_date"] = pd.to_datetime(df["next_date"])
df.info()
- pandas.Series.astype(“datetime64”)
df["next_date"] = df["next_date"].astype("datetime64")
df.info()
Output from both scripts above:
Alternatively, you can parse the columns into a DateTime object at the point you import the data with parse_dates
parameter.
import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"], parse_dates=["next_date"])
print(df.info())
df.head()
There is a lot of info that can be obtained from the datetime series.
- Timestamp
- Week
- Day of Year
- Quarter
- ISO Calendar
Extract the Timestamp Info
Below is the list of properties and methods to return timestamp info.
import datetime as dt
The datetime module needs to be imported before using the methods or properties under series.dt
. Below is an example of extracting the timestamp info.
df.insert(loc=1, column="Date_",value=df["next_date"].dt.date)
df.insert(loc=2, column="Time",value=df["next_date"].dt.time)
df.insert(loc=3, column="year",value=df["next_date"].dt.year)
df.insert(loc=4, column="month",value=df["next_date"].dt.month)
# note that the month_name is a method instead of properties
df.insert(loc=5, column="month_name",value=df["next_date"].dt.month_name())
df.insert(loc=6, column="day",value=df["next_date"].dt.day)
df.insert(loc=7, column="hour",value=df["next_date"].dt.hour)
df.insert(loc=8, column="minute",value=df["next_date"].dt.minute)
df.head()
Output:
One thing to take note of is the columns created are not a DateTime object, even the “Date_” column.
You may notice that the examples for second, microsecond and nanosecond are not demonstrated. That’s because they do not apply to the dataset. Furthermore, the way to apply is the same. The column type needs to be converted to datetime before using the properties or methods to return the respective value.
Extract Week Info/Day of Year/Quarter/ISO Calendar
Below is the list of properties and methods to return the week number, the day of the year, the quarter and the info based on the ISO Calendar for the DateTime series.
Week Info
Day of Year
Quarter
ISO Calendar
To show the fun stuff we can do with the methods/properties above, I create a list of dates with randomly picked dates distributed throughout the year as shown below.
import pandas as pd
import datetime as dt
date_list = ["2022-10-03", "2022-11-17", "2022-12-14", "2023-01-23", "2023-02-14", "2023-03-23", "2023-04-11", "2023-05-28", "2023-06-24", "2023-07-04", "2023-08-06", "2023-09-08"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.head(12)
As we are not reading from a file, there is no parse_dates
function available. Hence, the column has to be converted into datetime manually.
Below is an example of extracting the week, day of the year, quarter and ISO Calendar. The columns are named based on the properties or methods names for better understanding.
df.insert(loc=1, column="Day of Week",value=df["Date"].dt.day_of_week)
df.insert(loc=2, column="Weekday",value=df["Date"].dt.weekday)
# note that the month_name is a method instead of properties
df.insert(loc=3, column="Day Name",value=df["Date"].dt.day_name())
# day of the year
df.insert(loc=4, column="Day of Year",value=df["Date"].dt.day_of_year)
# quarter
df.insert(loc=5, column="Quarter",value=df["Date"].dt.quarter)
# iso calendar
df.insert(loc=6, column="ISO Year",value=df["Date"].dt.isocalendar().year)
df.insert(loc=7, column="ISO Week",value=df["Date"].dt.isocalendar().week)
df.insert(loc=8, column="ISO Day",value=df["Date"].dt.isocalendar().day)
df[["Date", "Day of Week", "Weekday", "Day Name", "Day of Year", "Quarter", "ISO Year", "ISO Week", "ISO Day"]].head(12)
Output:
Below is a summary of the table above:
- For both
day_of_week
andweekday
properties, they return the day of the week with index counting starting from 0. - For
day_of_year
,quarter
properties and theisocalendar()
method, they return output with index counting starting from 1.
The isocalendar()
method index for counting the weekday is start from 1, while the weekday
starts from 0. They both count weekdays starting from Monday. In another word, the first index is referring to Monday.
For this section, a different date list will be created to better demonstrate the properties below.
Example:
date_list = ["2023-01-01", "2023-01-23", "2023-01-31", "2023-02-01", "2023-02-28", "2023-04-01", "2023-06-30", "2023-09-30", "2023-11-30", "2023-12-31"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.insert(loc=1, column="Month Start",value=df["Date"].dt.is_month_start)
df.insert(loc=2, column="Month End",value=df["Date"].dt.is_month_end)
df.insert(loc=3, column="Quarter Start",value=df["Date"].dt.is_quarter_start)
df.insert(loc=4, column="Quarter End",value=df["Date"].dt.is_quarter_end)
df.insert(loc=5, column="Year Start",value=df["Date"].dt.is_year_start)
df.insert(loc=6, column="Year End",value=df["Date"].dt.is_year_end)
df.head(12)
Output:
Thoughts: I believe the properties are best for anyone that needs to prepare a new report on a monthly, quarterly or yearly basis.
These properties will help them to refresh their report based on the automation logic created. Other than that, the properties above may also be useful in the calculation that needs to be restarted periodically.
A leap year is a year, which has 366 days (instead of 365) including the 29th of February as an intercalary day. Leap years are years which are multiples of four except for the years divisible by 100 but not by 400.
We can demonstrate this function with the date created with the period range.
df = pd.DataFrame()
df["Year"] = pd.period_range(start="2022/1/1", periods=10, freq="Y")
df.insert(loc=1, column="Leap Year",value=df["Year"].dt.is_leap_year)
print(df.head(10))
Output:
Both of the properties below can return the number of days in a month.
df = pd.DataFrame()
df["Month"] = pd.period_range(start="2022/1/1", periods=12, freq="M")
df.insert(loc=1, column="Days in Month",value=df["Month"].dt.days_in_month)
df.head(12)
Output:
In conclusion, some basic properties and methods to deal with the DateTime series have been explained. The method to convert the datatype of columns that contained datetime objects to datetime has been shown. Then, the basic properties and methods to extract or return the datetime info have been demonstrated. The datetime info like weekday has different indexing for different methods.
Other than that, some methods to check the properties of the date, like the date is the start or end of a period, or whether the date belongs to a leap year have been shown. Lastly, the method to check the number of a date in a month have been presented. These methods and properties might be useful for reporting purposes.
That’s all for the basics of dealing with dates in Python. I hope you enjoy reading this article and hope it helps you to understand more about how to deal with the DateTime series in a DataFrame. Thanks! 😊
Subscribe on YouTube
I have explained the possible manipulation you can perform on the DateTime variables in Dealing with Dates in Python.
In Report Automation Tips with Python, I have explained some tips on report automation. Check it out!
Thank you and congrats for reading to the end 😊!
Happy 2023!
Data Processing in Python
This article explained the basic pandas’ methods and properties for dealing with the DateTime series in a data frame
As written in the title, this article is part 2 of my Dealing with Dates in Python’s DataFrame series. Below show the content of each part of the Dealing with Dates in Python’s DataFrame series.
In my previous article, I have shown the DateTime series creation methods. Then, in this article, I will show the basic properties and methods to deal with the Datetime series in a data frame.
With that, this article will be structured as below:
- Convert Datatype to DateTime
- General DateTime Info Extraction
- Check if the Date is the Start or the End of the Period
- Check if the Date Belongs to a Leap Year
- Check the Number of Days in the Month
Let’s get started!
For the date creation methods I show in the previous article, the series is created as a DateTime object. When you are reading the data from Excel or any other data source, the DateTime column will be read as a string object if you did not parse them as dates. To extract the DateTime info from the DateTime series, the column needs to be converted into the DateTime data type beforehand.
There are two methods to convert the data type into DateTime.
pandas.Series.astype(“DateTime”)
pandas.datetime(pandas.SeriesSeries)
I created a demo.csv file with the script below to demonstrate the methods and properties for this section.
import pandas as pd
import datetime as dt
df = pd.DataFrame()
initial_date = pd.Timestamp("2023-01-04 21:55:00")
df["timedelta_range"] = pd.timedelta_range(start="1 day", end="5days", freq="4H")
df.insert(loc=0, column="initial_date",value=initial_date)
df.insert(loc=2, column="next_date",value=df["initial_date"] + df["timedelta_range"])
print(df.head(10))
print("Data Type: ", df.dtypes)
df.to_csv("demo.csv")
The output data frame is as in the screenshot below.
Now, we will read the file generated.
import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"])
print(df.info())
df.head()
For demonstration purposes, only the next_date
column will be read.
As you can see, when the column is imported directly without parsing it as DateTime, the column will be a string column where the Dtype is the object. Below are two popular methods to convert the column into a DateTime data type.
- pandas.to_datetime(pandas.Series)
df["next_date"] = pd.to_datetime(df["next_date"])
df.info()
- pandas.Series.astype(“datetime64”)
df["next_date"] = df["next_date"].astype("datetime64")
df.info()
Output from both scripts above:
Alternatively, you can parse the columns into a DateTime object at the point you import the data with parse_dates
parameter.
import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"], parse_dates=["next_date"])
print(df.info())
df.head()
There is a lot of info that can be obtained from the datetime series.
- Timestamp
- Week
- Day of Year
- Quarter
- ISO Calendar
Extract the Timestamp Info
Below is the list of properties and methods to return timestamp info.
import datetime as dt
The datetime module needs to be imported before using the methods or properties under series.dt
. Below is an example of extracting the timestamp info.
df.insert(loc=1, column="Date_",value=df["next_date"].dt.date)
df.insert(loc=2, column="Time",value=df["next_date"].dt.time)
df.insert(loc=3, column="year",value=df["next_date"].dt.year)
df.insert(loc=4, column="month",value=df["next_date"].dt.month)
# note that the month_name is a method instead of properties
df.insert(loc=5, column="month_name",value=df["next_date"].dt.month_name())
df.insert(loc=6, column="day",value=df["next_date"].dt.day)
df.insert(loc=7, column="hour",value=df["next_date"].dt.hour)
df.insert(loc=8, column="minute",value=df["next_date"].dt.minute)
df.head()
Output:
One thing to take note of is the columns created are not a DateTime object, even the “Date_” column.
You may notice that the examples for second, microsecond and nanosecond are not demonstrated. That’s because they do not apply to the dataset. Furthermore, the way to apply is the same. The column type needs to be converted to datetime before using the properties or methods to return the respective value.
Extract Week Info/Day of Year/Quarter/ISO Calendar
Below is the list of properties and methods to return the week number, the day of the year, the quarter and the info based on the ISO Calendar for the DateTime series.
Week Info
Day of Year
Quarter
ISO Calendar
To show the fun stuff we can do with the methods/properties above, I create a list of dates with randomly picked dates distributed throughout the year as shown below.
import pandas as pd
import datetime as dt
date_list = ["2022-10-03", "2022-11-17", "2022-12-14", "2023-01-23", "2023-02-14", "2023-03-23", "2023-04-11", "2023-05-28", "2023-06-24", "2023-07-04", "2023-08-06", "2023-09-08"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.head(12)
As we are not reading from a file, there is no parse_dates
function available. Hence, the column has to be converted into datetime manually.
Below is an example of extracting the week, day of the year, quarter and ISO Calendar. The columns are named based on the properties or methods names for better understanding.
df.insert(loc=1, column="Day of Week",value=df["Date"].dt.day_of_week)
df.insert(loc=2, column="Weekday",value=df["Date"].dt.weekday)
# note that the month_name is a method instead of properties
df.insert(loc=3, column="Day Name",value=df["Date"].dt.day_name())
# day of the year
df.insert(loc=4, column="Day of Year",value=df["Date"].dt.day_of_year)
# quarter
df.insert(loc=5, column="Quarter",value=df["Date"].dt.quarter)
# iso calendar
df.insert(loc=6, column="ISO Year",value=df["Date"].dt.isocalendar().year)
df.insert(loc=7, column="ISO Week",value=df["Date"].dt.isocalendar().week)
df.insert(loc=8, column="ISO Day",value=df["Date"].dt.isocalendar().day)
df[["Date", "Day of Week", "Weekday", "Day Name", "Day of Year", "Quarter", "ISO Year", "ISO Week", "ISO Day"]].head(12)
Output:
Below is a summary of the table above:
- For both
day_of_week
andweekday
properties, they return the day of the week with index counting starting from 0. - For
day_of_year
,quarter
properties and theisocalendar()
method, they return output with index counting starting from 1.
The isocalendar()
method index for counting the weekday is start from 1, while the weekday
starts from 0. They both count weekdays starting from Monday. In another word, the first index is referring to Monday.
For this section, a different date list will be created to better demonstrate the properties below.
Example:
date_list = ["2023-01-01", "2023-01-23", "2023-01-31", "2023-02-01", "2023-02-28", "2023-04-01", "2023-06-30", "2023-09-30", "2023-11-30", "2023-12-31"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.insert(loc=1, column="Month Start",value=df["Date"].dt.is_month_start)
df.insert(loc=2, column="Month End",value=df["Date"].dt.is_month_end)
df.insert(loc=3, column="Quarter Start",value=df["Date"].dt.is_quarter_start)
df.insert(loc=4, column="Quarter End",value=df["Date"].dt.is_quarter_end)
df.insert(loc=5, column="Year Start",value=df["Date"].dt.is_year_start)
df.insert(loc=6, column="Year End",value=df["Date"].dt.is_year_end)
df.head(12)
Output:
Thoughts: I believe the properties are best for anyone that needs to prepare a new report on a monthly, quarterly or yearly basis.
These properties will help them to refresh their report based on the automation logic created. Other than that, the properties above may also be useful in the calculation that needs to be restarted periodically.
A leap year is a year, which has 366 days (instead of 365) including the 29th of February as an intercalary day. Leap years are years which are multiples of four except for the years divisible by 100 but not by 400.
We can demonstrate this function with the date created with the period range.
df = pd.DataFrame()
df["Year"] = pd.period_range(start="2022/1/1", periods=10, freq="Y")
df.insert(loc=1, column="Leap Year",value=df["Year"].dt.is_leap_year)
print(df.head(10))
Output:
Both of the properties below can return the number of days in a month.
df = pd.DataFrame()
df["Month"] = pd.period_range(start="2022/1/1", periods=12, freq="M")
df.insert(loc=1, column="Days in Month",value=df["Month"].dt.days_in_month)
df.head(12)
Output:
In conclusion, some basic properties and methods to deal with the DateTime series have been explained. The method to convert the datatype of columns that contained datetime objects to datetime has been shown. Then, the basic properties and methods to extract or return the datetime info have been demonstrated. The datetime info like weekday has different indexing for different methods.
Other than that, some methods to check the properties of the date, like the date is the start or end of a period, or whether the date belongs to a leap year have been shown. Lastly, the method to check the number of a date in a month have been presented. These methods and properties might be useful for reporting purposes.
That’s all for the basics of dealing with dates in Python. I hope you enjoy reading this article and hope it helps you to understand more about how to deal with the DateTime series in a DataFrame. Thanks! 😊
Subscribe on YouTube
I have explained the possible manipulation you can perform on the DateTime variables in Dealing with Dates in Python.
In Report Automation Tips with Python, I have explained some tips on report automation. Check it out!
Thank you and congrats for reading to the end 😊!
Happy 2023!