Dealing with Dates in Python’s DataFrame Part 2— The Basics | by KahEm Chu | Jan, 2023


Data Processing in Python

This article explained the basic pandas’ methods and properties for dealing with the DateTime series in a data frame

Photo by Lukas Blazek on Unsplash

As written in the title, this article is part 2 of my Dealing with Dates in Python’s DataFrame series. Below show the content of each part of the Dealing with Dates in Python’s DataFrame series.

Image from Author.

In my previous article, I have shown the DateTime series creation methods. Then, in this article, I will show the basic properties and methods to deal with the Datetime series in a data frame.

With that, this article will be structured as below:

  1. Convert Datatype to DateTime
  2. General DateTime Info Extraction
  3. Check if the Date is the Start or the End of the Period
  4. Check if the Date Belongs to a Leap Year
  5. Check the Number of Days in the Month

Let’s get started!

For the date creation methods I show in the previous article, the series is created as a DateTime object. When you are reading the data from Excel or any other data source, the DateTime column will be read as a string object if you did not parse them as dates. To extract the DateTime info from the DateTime series, the column needs to be converted into the DateTime data type beforehand.

There are two methods to convert the data type into DateTime.

  • pandas.Series.astype(“DateTime”)
  • pandas.datetime(pandas.SeriesSeries)

I created a demo.csv file with the script below to demonstrate the methods and properties for this section.

import pandas as pd
import datetime as dt
df = pd.DataFrame()
initial_date = pd.Timestamp("2023-01-04 21:55:00")
df["timedelta_range"] = pd.timedelta_range(start="1 day", end="5days", freq="4H")
df.insert(loc=0, column="initial_date",value=initial_date)
df.insert(loc=2, column="next_date",value=df["initial_date"] + df["timedelta_range"])
print(df.head(10))
print("Data Type: ", df.dtypes)
df.to_csv("demo.csv")

The output data frame is as in the screenshot below.

Image from Author.

Now, we will read the file generated.

import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"])
print(df.info())
df.head()

For demonstration purposes, only the next_date column will be read.

Image from Author.

As you can see, when the column is imported directly without parsing it as DateTime, the column will be a string column where the Dtype is the object. Below are two popular methods to convert the column into a DateTime data type.

  • pandas.to_datetime(pandas.Series)
df["next_date"] = pd.to_datetime(df["next_date"])
df.info()
  • pandas.Series.astype(“datetime64”)
df["next_date"] = df["next_date"].astype("datetime64")
df.info()

Output from both scripts above:

Image from Author.

Alternatively, you can parse the columns into a DateTime object at the point you import the data with parse_dates parameter.

import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"], parse_dates=["next_date"])
print(df.info())
df.head()
Image from Author.

There is a lot of info that can be obtained from the datetime series.

  • Timestamp
  • Week
  • Day of Year
  • Quarter
  • ISO Calendar

Extract the Timestamp Info

Below is the list of properties and methods to return timestamp info.

import datetime as dt

The datetime module needs to be imported before using the methods or properties under series.dt. Below is an example of extracting the timestamp info.

df.insert(loc=1, column="Date_",value=df["next_date"].dt.date)
df.insert(loc=2, column="Time",value=df["next_date"].dt.time)
df.insert(loc=3, column="year",value=df["next_date"].dt.year)
df.insert(loc=4, column="month",value=df["next_date"].dt.month)
# note that the month_name is a method instead of properties
df.insert(loc=5, column="month_name",value=df["next_date"].dt.month_name())
df.insert(loc=6, column="day",value=df["next_date"].dt.day)
df.insert(loc=7, column="hour",value=df["next_date"].dt.hour)
df.insert(loc=8, column="minute",value=df["next_date"].dt.minute)
df.head()

Output:

Image from Author.

One thing to take note of is the columns created are not a DateTime object, even the “Date_” column.

Image from Author.

You may notice that the examples for second, microsecond and nanosecond are not demonstrated. That’s because they do not apply to the dataset. Furthermore, the way to apply is the same. The column type needs to be converted to datetime before using the properties or methods to return the respective value.

Extract Week Info/Day of Year/Quarter/ISO Calendar

Below is the list of properties and methods to return the week number, the day of the year, the quarter and the info based on the ISO Calendar for the DateTime series.

Week Info

Day of Year

Quarter

ISO Calendar

To show the fun stuff we can do with the methods/properties above, I create a list of dates with randomly picked dates distributed throughout the year as shown below.

import pandas as pd
import datetime as dt
date_list = ["2022-10-03", "2022-11-17", "2022-12-14", "2023-01-23", "2023-02-14", "2023-03-23", "2023-04-11", "2023-05-28", "2023-06-24", "2023-07-04", "2023-08-06", "2023-09-08"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.head(12)

As we are not reading from a file, there is no parse_dates function available. Hence, the column has to be converted into datetime manually.

Image by Author.

Below is an example of extracting the week, day of the year, quarter and ISO Calendar. The columns are named based on the properties or methods names for better understanding.

df.insert(loc=1, column="Day of Week",value=df["Date"].dt.day_of_week)
df.insert(loc=2, column="Weekday",value=df["Date"].dt.weekday)
# note that the month_name is a method instead of properties
df.insert(loc=3, column="Day Name",value=df["Date"].dt.day_name())
# day of the year
df.insert(loc=4, column="Day of Year",value=df["Date"].dt.day_of_year)
# quarter
df.insert(loc=5, column="Quarter",value=df["Date"].dt.quarter)
# iso calendar
df.insert(loc=6, column="ISO Year",value=df["Date"].dt.isocalendar().year)
df.insert(loc=7, column="ISO Week",value=df["Date"].dt.isocalendar().week)
df.insert(loc=8, column="ISO Day",value=df["Date"].dt.isocalendar().day)
df[["Date", "Day of Week", "Weekday", "Day Name", "Day of Year", "Quarter", "ISO Year", "ISO Week", "ISO Day"]].head(12)

Output:

Image by Author.

Below is a summary of the table above:

  1. For both day_of_week and weekday properties, they return the day of the week with index counting starting from 0.
  2. For day_of_year, quarter properties and the isocalendar() method, they return output with index counting starting from 1.

The isocalendar() method index for counting the weekday is start from 1, while the weekday starts from 0. They both count weekdays starting from Monday. In another word, the first index is referring to Monday.

For this section, a different date list will be created to better demonstrate the properties below.

Example:

date_list = ["2023-01-01", "2023-01-23", "2023-01-31", "2023-02-01", "2023-02-28", "2023-04-01", "2023-06-30", "2023-09-30", "2023-11-30", "2023-12-31"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.insert(loc=1, column="Month Start",value=df["Date"].dt.is_month_start)
df.insert(loc=2, column="Month End",value=df["Date"].dt.is_month_end)
df.insert(loc=3, column="Quarter Start",value=df["Date"].dt.is_quarter_start)
df.insert(loc=4, column="Quarter End",value=df["Date"].dt.is_quarter_end)
df.insert(loc=5, column="Year Start",value=df["Date"].dt.is_year_start)
df.insert(loc=6, column="Year End",value=df["Date"].dt.is_year_end)
df.head(12)

Output:

Image by Author.

Thoughts: I believe the properties are best for anyone that needs to prepare a new report on a monthly, quarterly or yearly basis.

These properties will help them to refresh their report based on the automation logic created. Other than that, the properties above may also be useful in the calculation that needs to be restarted periodically.

A leap year is a year, which has 366 days (instead of 365) including the 29th of February as an intercalary day. Leap years are years which are multiples of four except for the years divisible by 100 but not by 400.

We can demonstrate this function with the date created with the period range.

df = pd.DataFrame()
df["Year"] = pd.period_range(start="2022/1/1", periods=10, freq="Y")
df.insert(loc=1, column="Leap Year",value=df["Year"].dt.is_leap_year)
print(df.head(10))

Output:

Image by Author.

Both of the properties below can return the number of days in a month.

df = pd.DataFrame()
df["Month"] = pd.period_range(start="2022/1/1", periods=12, freq="M")
df.insert(loc=1, column="Days in Month",value=df["Month"].dt.days_in_month)
df.head(12)

Output:

Image by Author.

In conclusion, some basic properties and methods to deal with the DateTime series have been explained. The method to convert the datatype of columns that contained datetime objects to datetime has been shown. Then, the basic properties and methods to extract or return the datetime info have been demonstrated. The datetime info like weekday has different indexing for different methods.

Other than that, some methods to check the properties of the date, like the date is the start or end of a period, or whether the date belongs to a leap year have been shown. Lastly, the method to check the number of a date in a month have been presented. These methods and properties might be useful for reporting purposes.

That’s all for the basics of dealing with dates in Python. I hope you enjoy reading this article and hope it helps you to understand more about how to deal with the DateTime series in a DataFrame. Thanks! 😊

Subscribe on YouTube

I have explained the possible manipulation you can perform on the DateTime variables in Dealing with Dates in Python.

In Report Automation Tips with Python, I have explained some tips on report automation. Check it out!

Thank you and congrats for reading to the end 😊!

Happy 2023!

Photo by Adnan Mistry on Unsplash


Data Processing in Python

This article explained the basic pandas’ methods and properties for dealing with the DateTime series in a data frame

Photo by Lukas Blazek on Unsplash

As written in the title, this article is part 2 of my Dealing with Dates in Python’s DataFrame series. Below show the content of each part of the Dealing with Dates in Python’s DataFrame series.

Image from Author.

In my previous article, I have shown the DateTime series creation methods. Then, in this article, I will show the basic properties and methods to deal with the Datetime series in a data frame.

With that, this article will be structured as below:

  1. Convert Datatype to DateTime
  2. General DateTime Info Extraction
  3. Check if the Date is the Start or the End of the Period
  4. Check if the Date Belongs to a Leap Year
  5. Check the Number of Days in the Month

Let’s get started!

For the date creation methods I show in the previous article, the series is created as a DateTime object. When you are reading the data from Excel or any other data source, the DateTime column will be read as a string object if you did not parse them as dates. To extract the DateTime info from the DateTime series, the column needs to be converted into the DateTime data type beforehand.

There are two methods to convert the data type into DateTime.

  • pandas.Series.astype(“DateTime”)
  • pandas.datetime(pandas.SeriesSeries)

I created a demo.csv file with the script below to demonstrate the methods and properties for this section.

import pandas as pd
import datetime as dt
df = pd.DataFrame()
initial_date = pd.Timestamp("2023-01-04 21:55:00")
df["timedelta_range"] = pd.timedelta_range(start="1 day", end="5days", freq="4H")
df.insert(loc=0, column="initial_date",value=initial_date)
df.insert(loc=2, column="next_date",value=df["initial_date"] + df["timedelta_range"])
print(df.head(10))
print("Data Type: ", df.dtypes)
df.to_csv("demo.csv")

The output data frame is as in the screenshot below.

Image from Author.

Now, we will read the file generated.

import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"])
print(df.info())
df.head()

For demonstration purposes, only the next_date column will be read.

Image from Author.

As you can see, when the column is imported directly without parsing it as DateTime, the column will be a string column where the Dtype is the object. Below are two popular methods to convert the column into a DateTime data type.

  • pandas.to_datetime(pandas.Series)
df["next_date"] = pd.to_datetime(df["next_date"])
df.info()
  • pandas.Series.astype(“datetime64”)
df["next_date"] = df["next_date"].astype("datetime64")
df.info()

Output from both scripts above:

Image from Author.

Alternatively, you can parse the columns into a DateTime object at the point you import the data with parse_dates parameter.

import pandas as pd
df = pd.read_csv(r"demo.csv", usecols=["next_date"], parse_dates=["next_date"])
print(df.info())
df.head()
Image from Author.

There is a lot of info that can be obtained from the datetime series.

  • Timestamp
  • Week
  • Day of Year
  • Quarter
  • ISO Calendar

Extract the Timestamp Info

Below is the list of properties and methods to return timestamp info.

import datetime as dt

The datetime module needs to be imported before using the methods or properties under series.dt. Below is an example of extracting the timestamp info.

df.insert(loc=1, column="Date_",value=df["next_date"].dt.date)
df.insert(loc=2, column="Time",value=df["next_date"].dt.time)
df.insert(loc=3, column="year",value=df["next_date"].dt.year)
df.insert(loc=4, column="month",value=df["next_date"].dt.month)
# note that the month_name is a method instead of properties
df.insert(loc=5, column="month_name",value=df["next_date"].dt.month_name())
df.insert(loc=6, column="day",value=df["next_date"].dt.day)
df.insert(loc=7, column="hour",value=df["next_date"].dt.hour)
df.insert(loc=8, column="minute",value=df["next_date"].dt.minute)
df.head()

Output:

Image from Author.

One thing to take note of is the columns created are not a DateTime object, even the “Date_” column.

Image from Author.

You may notice that the examples for second, microsecond and nanosecond are not demonstrated. That’s because they do not apply to the dataset. Furthermore, the way to apply is the same. The column type needs to be converted to datetime before using the properties or methods to return the respective value.

Extract Week Info/Day of Year/Quarter/ISO Calendar

Below is the list of properties and methods to return the week number, the day of the year, the quarter and the info based on the ISO Calendar for the DateTime series.

Week Info

Day of Year

Quarter

ISO Calendar

To show the fun stuff we can do with the methods/properties above, I create a list of dates with randomly picked dates distributed throughout the year as shown below.

import pandas as pd
import datetime as dt
date_list = ["2022-10-03", "2022-11-17", "2022-12-14", "2023-01-23", "2023-02-14", "2023-03-23", "2023-04-11", "2023-05-28", "2023-06-24", "2023-07-04", "2023-08-06", "2023-09-08"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.head(12)

As we are not reading from a file, there is no parse_dates function available. Hence, the column has to be converted into datetime manually.

Image by Author.

Below is an example of extracting the week, day of the year, quarter and ISO Calendar. The columns are named based on the properties or methods names for better understanding.

df.insert(loc=1, column="Day of Week",value=df["Date"].dt.day_of_week)
df.insert(loc=2, column="Weekday",value=df["Date"].dt.weekday)
# note that the month_name is a method instead of properties
df.insert(loc=3, column="Day Name",value=df["Date"].dt.day_name())
# day of the year
df.insert(loc=4, column="Day of Year",value=df["Date"].dt.day_of_year)
# quarter
df.insert(loc=5, column="Quarter",value=df["Date"].dt.quarter)
# iso calendar
df.insert(loc=6, column="ISO Year",value=df["Date"].dt.isocalendar().year)
df.insert(loc=7, column="ISO Week",value=df["Date"].dt.isocalendar().week)
df.insert(loc=8, column="ISO Day",value=df["Date"].dt.isocalendar().day)
df[["Date", "Day of Week", "Weekday", "Day Name", "Day of Year", "Quarter", "ISO Year", "ISO Week", "ISO Day"]].head(12)

Output:

Image by Author.

Below is a summary of the table above:

  1. For both day_of_week and weekday properties, they return the day of the week with index counting starting from 0.
  2. For day_of_year, quarter properties and the isocalendar() method, they return output with index counting starting from 1.

The isocalendar() method index for counting the weekday is start from 1, while the weekday starts from 0. They both count weekdays starting from Monday. In another word, the first index is referring to Monday.

For this section, a different date list will be created to better demonstrate the properties below.

Example:

date_list = ["2023-01-01", "2023-01-23", "2023-01-31", "2023-02-01", "2023-02-28", "2023-04-01", "2023-06-30", "2023-09-30", "2023-11-30", "2023-12-31"]
df = pd.DataFrame(date_list, columns=["Date"])
df["Date"] = pd.to_datetime(df["Date"])
df.insert(loc=1, column="Month Start",value=df["Date"].dt.is_month_start)
df.insert(loc=2, column="Month End",value=df["Date"].dt.is_month_end)
df.insert(loc=3, column="Quarter Start",value=df["Date"].dt.is_quarter_start)
df.insert(loc=4, column="Quarter End",value=df["Date"].dt.is_quarter_end)
df.insert(loc=5, column="Year Start",value=df["Date"].dt.is_year_start)
df.insert(loc=6, column="Year End",value=df["Date"].dt.is_year_end)
df.head(12)

Output:

Image by Author.

Thoughts: I believe the properties are best for anyone that needs to prepare a new report on a monthly, quarterly or yearly basis.

These properties will help them to refresh their report based on the automation logic created. Other than that, the properties above may also be useful in the calculation that needs to be restarted periodically.

A leap year is a year, which has 366 days (instead of 365) including the 29th of February as an intercalary day. Leap years are years which are multiples of four except for the years divisible by 100 but not by 400.

We can demonstrate this function with the date created with the period range.

df = pd.DataFrame()
df["Year"] = pd.period_range(start="2022/1/1", periods=10, freq="Y")
df.insert(loc=1, column="Leap Year",value=df["Year"].dt.is_leap_year)
print(df.head(10))

Output:

Image by Author.

Both of the properties below can return the number of days in a month.

df = pd.DataFrame()
df["Month"] = pd.period_range(start="2022/1/1", periods=12, freq="M")
df.insert(loc=1, column="Days in Month",value=df["Month"].dt.days_in_month)
df.head(12)

Output:

Image by Author.

In conclusion, some basic properties and methods to deal with the DateTime series have been explained. The method to convert the datatype of columns that contained datetime objects to datetime has been shown. Then, the basic properties and methods to extract or return the datetime info have been demonstrated. The datetime info like weekday has different indexing for different methods.

Other than that, some methods to check the properties of the date, like the date is the start or end of a period, or whether the date belongs to a leap year have been shown. Lastly, the method to check the number of a date in a month have been presented. These methods and properties might be useful for reporting purposes.

That’s all for the basics of dealing with dates in Python. I hope you enjoy reading this article and hope it helps you to understand more about how to deal with the DateTime series in a DataFrame. Thanks! 😊

Subscribe on YouTube

I have explained the possible manipulation you can perform on the DateTime variables in Dealing with Dates in Python.

In Report Automation Tips with Python, I have explained some tips on report automation. Check it out!

Thank you and congrats for reading to the end 😊!

Happy 2023!

Photo by Adnan Mistry on Unsplash

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – admin@technoblender.com. The content will be deleted within 24 hours.
Ai Newsartificial intelligencebasicsChuDataframeDatesDealingJanKahEmlatest newsPartPythons
Comments (0)
Add Comment