Create Population Pyramids for Any Country with the US Census Bureau Data | by Randy Runtsch | Nov, 2022


Create dynamic population pyramids through the year 2100 by country, age, sex, and year with the Census Data API, Python, and Tableau Public

Crowd of people. On November 15, 2022, the United Nations reported that the population of the world had reached 8 billion. Photo by San Fermin Pamplona: https://www.pexels.com/photo/bird-s-eye-view-of-group-of-people-1299086/

While there are many sources of population data, the United States Census Bureau keeps population estimate and projection data in its International Database for 227 countries and areas of the world with a population of 5,000 or more through the year 2100. This article will show you how to use Python to retrieve the data from the International Database with the Census Data API and convert it from a nonstandard JSON format to a CSV file. The file is then loaded into Tableau to visualize it as a population pyramid.

A population pyramid is a graph that shows the distribution of population for a country or area by gender and age group. The example population pyramid shown below for Japan in mid-2022 indicates that the country is not replacing its aging population.

Population pyramid for Japan in 2022. Graph created by Randy Runtsch with Tableau Public.

The Census Data Application Programming Interface (API) gives the public direct access to raw statistical data that the US Census Bureau collects as part of its various programs. The Bureau describes each of its datasets in a table that it calls the API Discovery Tool.

The Bureau claims that its Census Data API provides convenient access to its data. Like most data APIs, it can be called directly from a web browser, or from any of various programming languages, including Python and R.

While many US government data APIs return data in a standard JavaScript Object Notation (JSON) format, and sometimes in a comma-separated value (CSV) format, the Census Data API returns data in a nonstandard JSON format: the response is valid JSON, but it is a list of lists rather than a list of keyed objects. This adds a twist, since programmers and data engineers must reformat the retrieved data to make it readable and usable in tools such as Excel or Tableau. In the example shown below, note that the returned data is wrapped in square brackets at both the dataset and record levels.

Try pasting the following URL into a web browser and pressing Enter. It will return the estimated mid-year population of France in 2015 by age, in one-year increments, with combined population values for males and females. The SEX codes are 0 for the combined genders, 1 for males, and 2 for females. GENC is the two-character code of the country of interest. In this case, the GENC value is “FR” for France.

https://api.census.gov/data/timeseries/idb/1year?get=NAME,AGE,POP&GENC=FR&YR=2015&SEX=0

Data returned by URL query shown above. Screenshot by Randy Runtsch.

The Python program shown later in this article makes a similar query to the Census Data API for four countries. Then, it strips the leading and trailing square brackets (“[” and “]”) from the returned dataset. Finally, it writes each record, including the first one, which provides column headers, to a CSV file. That file is then loaded into Tableau Public, which is used to create a population pyramid from the data.
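
Because the response is valid JSON, one alternative to stripping brackets by hand is to parse it with Python's json module and let the csv module write the rows. The sketch below is not the program described in this article. The function name rows_to_csv is hypothetical, the column order is assumed from the query shown above, and the sample response holds placeholder numbers rather than real Census figures.

```python
import csv
import io
import json


def rows_to_csv(response_text: str) -> str:
    """Convert a Census-style JSON list of lists into CSV text."""
    rows = json.loads(response_text)  # the first row holds the column names
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder response in the assumed shape of an API reply (values are made up).
SAMPLE_RESPONSE = """[["NAME","AGE","POP","GENC","YR","SEX"],
["France","0","111","FR","2015","0"],
["France","1","222","FR","2015","0"]]"""

print(rows_to_csv(SAMPLE_RESPONSE))
```

Parsing first and writing second means the csv module decides when a field needs quoting, which can matter for country or area names that contain commas.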

You can use the Census Data API to retrieve midyear population estimates and projections from the International Database for all countries and areas of the world with a population of 5,000 or more. You can retrieve these datasets:

  • timeseries > idb > 1year: This dataset includes population by single year of age and by sex for a given year and country or area.
  • timeseries > idb > 5year: This dataset includes population by age, in 5-year groups, and by sex for a given year and country or area. It also includes fertility, mortality, and migration indicators.

This article will demonstrate the use of the timeseries > idb > 1year dataset to retrieve data and build population pyramids for four countries.

The Census Data API User Guide provides information about how to retrieve data with the API. It, along with information the bureau provides in the API Discovery Tool for each API, provides most of the instructions needed to retrieve data for any of the available datasets.

In addition to the documentation linked above, the Bureau’s International Database (IDB) demo page demonstrates a dynamic population pyramid using data from the timeseries > idb > 1year or 5year datasets. The demo page is useful for checking the accuracy of the population pyramids you create.

Population pyramid for Japan as shown on the US Census Bureau International Database demo page. Screenshot by Randy Runtsch.

The International Database can be queried by year, sex, and country or area. Each returned record also includes these query values.

Following are the sex codes:

0 = Both
1 = Male
2 = Female

Two-character country codes are defined as “Alpha-2 codes” in the ISO 3166 Country Codes standard. For example, the Alpha-2 code for the United States is “US” and the code for France is “FR.” The complete list of country names and codes is documented on this page on the ISO Online Browsing Platform. But you might find this ISO 3166–1 alpha-2 Wikipedia page more convenient to use.

The base URL for the Census Data API International Database (IDB) is http://api.census.gov/data/timeseries/idb/1year. Calling this URL in a web browser returns a JSON structure that describes the query.

The following URL will return data for Japan (GENC code “JP”) in 2022 by age for females (SEX code 2):

https://api.census.gov/data/timeseries/idb/1year?get=NAME,AGE,POP&GENC=JP&YR=2022&SEX=2

In addition to displaying NAME (country or area), AGE, and POP (population) columns, the returned dataset will display the query values of the GENC, YR (year), and SEX columns. Sample return values are shown below.

Data returned by URL query shown above. Screenshot by Randy Runtsch.

According to the Census Data API User Guide, users can submit up to 500 API queries per day per IP address without registering a key. Users who need to make more than 500 queries per day will need to obtain an API key and append the key to each of their queries. To request a key, click on “Request a Key” on the Developers page.

Request a Census Data API key. Image from Census Data web page.

After you register for your key, you will receive it by email. Then, you can append the key to your queries as shown in the example below:

https://api.census.gov/data/timeseries/idb/1year?get=NAME,AGE,POP&GENC=CN&YR=2022&SEX=1&key=YOUR_KEY_GOES_HERE
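
The query URLs shown above can also be assembled in code. This is a minimal sketch using only the standard library. The function name build_idb_url is hypothetical, and the key is appended only when one is supplied.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/timeseries/idb/1year"


def build_idb_url(country_code: str, year: int, sex_code: int,
                  api_key: Optional[str] = None) -> str:
    """Build a query URL for the 1year dataset; add the key only if given."""
    params = {
        "get": "NAME,AGE,POP",
        "GENC": country_code,
        "YR": year,
        "SEX": sex_code,
    }
    if api_key:
        params["key"] = api_key
    # safe="," keeps the commas in the get list unescaped, as in the URLs above.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_idb_url("JP", 2022, 2))
print(build_idb_url("CN", 2022, 1, api_key="YOUR_KEY_GOES_HERE"))
```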

For the project presented in the following sections, I used Microsoft Visual Studio Community 2022 for Python programming and Tableau Public Desktop and the Tableau Public website for data visualization. For Python programming, feel free to use whatever editor or integrated development environment (IDE) you prefer.

Visual Studio Community and Tableau Public are free tools that you can install from these locations:

Please note that while the commercial version of Tableau will allow you to save data visualization workbooks to a local drive or server, in Tableau Public, all visualization workbooks may be saved only to the Tableau Public server. Plus, the visualizations will be visible to the general public.

Now that you have a basic understanding of the Census Data API and its International Database, let’s review the Python code that retrieves population estimate data for four countries, by age and sex, for 2022. The program includes two modules: c_country_pop.py, which defines the class c_country_pop, and get_population_estimates.py, which calls the class. The following pseudocode describes each module, and the code is shown in a subsequent section.

In summary, the class c_country_pop queries the International Database using the Census Data API, reformats its nonstandard JSON records to a CSV format, and writes each record to an output file. This pseudocode describes what it does:

Instantiate (function __init__()) c_country_pop with these arguments:

  • out_file_name: The file that population data will be written to in a CSV format.
  • country_code: The two-character code of the country to obtain data for.
  • year: The four-digit year to retrieve data for.
  • sex_code: The sex to retrieve data for, where 0 = both, 1 = male, and 2 = female.
  • write_type: ‘w’ to write returned records to a new file and ‘a’ to append records to an existing output file. The purpose of the write_type will be explained below.
  • api_key: Your personal Census Data API key.

Call the get_data() function to perform these tasks:

  • Build the Census Data API query URL.
  • Call the API with the URL.
  • Convert the returned data from binary format to a string.
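
The get_data() steps above might be sketched as follows. This is an illustration, not the article's actual code: it assumes the response body is UTF-8 text, and the opener parameter and fake_opener helper are hypothetical additions that let the example run without a network connection.

```python
import io
import urllib.request


def get_data(url: str, opener=urllib.request.urlopen) -> str:
    """Call the API and return the response body as a string."""
    with opener(url) as response:
        raw_bytes = response.read()   # the body arrives as bytes
    return raw_bytes.decode("utf-8")  # assumption: the body is UTF-8 text


def fake_opener(url):
    """Offline stand-in for urlopen that returns a fixed header row."""
    return io.BytesIO(b'[["NAME","AGE","POP","GENC","YR","SEX"]]')


print(get_data("https://example.invalid/query", opener=fake_opener))
```

Calling get_data(url) with no opener argument falls back to urllib.request.urlopen and performs a real request.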

Call the write_data_to_csv() function to perform these tasks:

  • Open the output file with the specified write_type value.
  • Loop through the records returned from the get_data() function.
  • Write the first record, which contains column headers, but only if the write_type is ‘w.’ Do not write the first column header record if the write_type is ‘a,’ because this is the second or later dataset that the file will contain.
  • Strip any leading and trailing square brackets and commas from the record string.
  • Write the record string to the output file.
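
The write_data_to_csv() steps above might look like the following sketch. It is written as a standalone function rather than a class method, assumes the response places one record on each line, and uses placeholder sample values rather than real Census figures.

```python
import os
import tempfile


def write_data_to_csv(response_text: str, out_file_name: str, write_type: str) -> None:
    """Write each line of the response to the output file as a CSV record."""
    with open(out_file_name, write_type, encoding="utf-8", newline="") as out_file:
        for index, line in enumerate(response_text.splitlines()):
            if index == 0 and write_type == "a":
                continue  # the header row is already in the file
            record = line.strip().strip("[],")  # drop brackets and trailing comma
            if record:
                out_file.write(record + "\n")


# Placeholder response text (values are made up).
SAMPLE = '[["NAME","AGE","POP"],\n["Japan","0","111"],\n["Japan","1","222"]]'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "pop.csv")
    write_data_to_csv(SAMPLE, path, "w")  # new file, header included
    write_data_to_csv(SAMPLE, path, "a")  # appended, header skipped
    with open(path, encoding="utf-8") as result:
        csv_text = result.read()

print(csv_text)
```

The quotation marks around each field are left in place, which is still valid CSV and keeps any commas inside a name from splitting the field.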

The module get_population_estimates.py, which I call the driver, simply creates an instance of the c_country_pop class for each country and sex of interest. For the first call, it passes a write_type value of “w,” which instructs c_country_pop to create a new output file to write CSV records to and to write a column header as its first record. Here is the pseudocode for this module:

  1. Get male (sex code 1) records for China (country code ‘CN’) for 2022. Instruct c_country_pop to create a new output file (file name ‘c:/population_data/pop_2022.csv’ and write type ‘w’) and write a column header as its first row.
  2. Get female (sex code 2) records for China for 2022. Use the same output file name with a write type of ‘a.’ This will instruct c_country_pop to open the file created in step 1 and to append its records, excluding the column header record, to the file.
  3. Repeat the steps above for Japan (country code ‘JP’), Norway (country code ‘NO’), and the United States of America (country code ‘US’). In all cases, use the same file name specified in step 1 above. However, use the write type of ‘a’ to append the records to the file created in step 1.
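
The order of calls in the steps above can be expressed as a small loop. This sketch only plans the calls and does not contact the API. The helper name plan_requests is hypothetical, and each tuple holds the country code, sex code, and write type that would be passed to c_country_pop.

```python
def plan_requests(country_codes, sex_codes=(1, 2)):
    """Return (country_code, sex_code, write_type) tuples in call order."""
    plan = []
    for country_code in country_codes:
        for sex_code in sex_codes:
            # Only the very first call creates the file; the rest append to it.
            write_type = "w" if not plan else "a"
            plan.append((country_code, sex_code, write_type))
    return plan


for step in plan_requests(["CN", "JP", "NO", "US"]):
    print(step)
```

Deriving the write type from the position in the plan avoids hard-coding ‘w’ for one particular country, so reordering the country list still produces exactly one header row.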

After you successfully run the program, examine the output file. When opened in Excel, it should look like the example shown below.

Sample population data for China is shown in Excel. Screenshot by Randy Runtsch.

Following is the code for the get_population_estimates.py and c_country_pop.py Python modules.

Code described in this article. The code was written by Randy Runtsch.

This section won’t provide detailed instructions to create the population pyramid in Tableau. For those details, see the instructions in Tableau Help.

The version of the population pyramid I built in Tableau Public uses the data from the CSV file created by the Python program described above. It allows users to switch between countries by clicking on a Country radio button. You can see the live version of the visualization here. You can also download the Tableau workbook and modify it for your needs if you desire.

This article provided information about the Census Data API and its International Database (IDB). It also presented a Python program to retrieve data from IDB and write it to a CSV file. Finally, it displayed a population pyramid that uses the data.

With topics such as population growth and climate change at the forefront, surely data obtained with the Census Data API is of interest to data analysts and data scientists worldwide. I hope that this article provided information useful to your projects that use population data.


