The Most Powerful Climate Data Remains Hidden | by Eric Broda | Feb, 2023
Climate data and models that power global sustainability decisions are far too often proprietary and hidden to us. Data marketplaces, data contracts, data licensing, and data pricing offer the incentives and practices for sharing this vital information.
There are several contributors and advisors that provided a huge helping hand to understand the current climate data landscape and shape this article: Ellie Young (Common Action), Philippe Höij (DFRNT.com), and Andrew Padilla (Datacequia.com).
Learning from others has been the hallmark of human progress. We are “standing on the shoulders of giants” when we build new inventions upon the work of those who came before us. So, when we see the wealth of public climate data, and how it is shared globally, we are standing on the metaphorical shoulders of climate data giants who freely share climate data for the broader public good.
However, the simple and unfortunate fact is that much of the most valuable climate data is hidden from public view. Data that organizations have transformed and enriched. Refined data to enable better investment decisions. Data curated into proprietary models that power global sustainability and climate decisions made by organizations globally. The data that is “hidden” inside organizations.
Now, the term “hidden” suggests a nefarious intent. To be clear, this is not the case. Rather, the “hidden” nature is a consequence of the transformation, refinement, and curation that imbues tremendous value into public data. But that value is not available. It is proprietary. It is closed. Thus, it is hidden.
The practical issue, however, is that each organization performs near identical work to process, enrich, and refine this public data into something that is used to guide corporate sustainability and climate decisions. But this refined data and its associated models are rarely, if ever, shared.
In this article I will explore the root cause of this unfortunate situation. But, perhaps more importantly, I will also offer several potential solutions that make it easier to share data, to see what is behind the curtain, to see what was previously hidden.
Climate change is one of the most pressing issues of our time, and the availability of climate data is crucial for understanding the problem, identifying practical solutions, and determining the actual effects from policy change. Recognizing this, governments and private organizations alike collect and publish voluminous amounts of precipitation, wind, and temperature data to measure and understand their exposure to climate change.
And this data is crucially important — it is used to inform policy decisions, to create predictive models, and to assess the risks of climate change to various sectors of the economy.
Today, we are faced with a bounty of freely available public climate data. Organizations like NASA, NOAA, and many European and international organizations publish immense quantities of finely grained temperature, precipitation, and other sensor data. This data has been published for decades and continues to be published each day. Metaphorically, we are drinking from the climate data firehose!
However, most of this data is still in raw format consisting of numerical measurements or text. And it comes in so many different formats and structures, with varying degrees of consistency and availability, that it is difficult for non-experts and experts alike to understand or use it.
“Applied” climate data, on the other hand, has been transformed, refined, and curated to extract meaningful insights and patterns. This typically involves combining multiple sets of raw public climate data, linking them to other public and private data, cleaning and organizing it, and then applying statistical or machine learning techniques to extract meaningful information.
Consider insurers. An insurer wants to make sure that insured asset prices reflect future climate risks. For example, they may use public weather, topographical (flood plains, rivers), and demographic data to create algorithms and models that better estimate the probability of flooding. And these models help determine whether to insure a property in proximity to a flood plain.
Asset managers, for example, may use a variety of emissions, energy efficiency, and government data, both public and private, to understand the potential exposure of their assets to climate change to better align with their risk and return objectives.
And, with the advent of stricter climate regulations, most organizations are preparing to report on their own internal climate footprint. For example, they may aggregate emissions data from their core operations, subsidiaries, and global operating units to provide reporting to the public, and to regulators in their operating regions.
However, it is abundantly clear that many of these firms gather similar data and create similar algorithms. This constant “re-inventing” of the climate data wheel happens for several reasons.
First, many companies are hesitant to share their data because they fear that their competitors will use the data to gain a competitive advantage, or that their data will be used to create new products or services that will directly compete with their own.
Second, there is a lack of legal and regulatory frameworks to support data sharing and data pricing. This makes it difficult for companies to establish clear rules and expectations around data sharing and data pricing, which further discourages companies from sharing their data.
Third, there may be a lack of provenance and lineage associated with the data, potentially creating a perception of untrustworthiness.
Last, and perhaps most challenging, sharing data imposes a material, latent, and long-term risk and obligation on the publisher of the data, especially when third parties use it in unexpected ways.
Simply put, decisions made with climate data have long time horizons and long-term consequences, especially when the data may be wrong; not intentionally wrong, but perhaps containing small errors that, magnified over long periods, cause large errors, or perhaps an algorithm was incorrectly tweaked. And any decision that has a long tail can drive huge costs.
Consider, for example, the following scenario: an asset manager publishes data on probabilities of flooding for coastal real estate. A property developer uses that data and builds on land that appeared, based upon the asset manager’s analysis, to have a relatively low probability of flooding. Several years later, the properties have been built and, despite the low probabilities, have unexpectedly flooded. What happens? Losses and legal suits.
But what if other organizations could benefit from sharing their private climate data to help others to better predict floods? What if organizations shared their climate models to help others make wiser climate investment decisions? What if it was easier to share private climate data and models, without the downstream risks?
There are some efforts being made to address these challenges. For example, some initiatives (such as OS-Climate) are being established to create common data standards, protocols, and platforms to make it easier to find and use climate data. Other organizations such as the United Nations as well as many forward-thinking governments are establishing regulatory guidelines and frameworks that encourage climate data sharing. And there are data marketplaces and data cooperatives being established to facilitate collaboration and create a fair data ecosystem with transparent data pricing.
Still, more needs to be done. What present-day examples and use cases from other industries may teach us how to share data? Perhaps we can apply lessons learned from Creative Commons licensing to limit or manage risks in sharing climate data; or perhaps open-source software licensing offers some new ideas? Or maybe there are examples of “data contracts” that can standardize climate data formats and structures?
Metaphorically, we should stand on the shoulders of data giants, and use present-day best practices to embolden and empower future climate data work. What lessons have we learned that can be applied — now — to make it easier and safer to share climate data much more broadly?
The Creative Commons organization provides a set of copyright licenses that enable creators to share their work while typically allowing third parties to use a creator’s content for non-commercial purposes or to make derivative works. But it also, crucially, lets creators retain a modicum of control over how their work is used.
Creative Commons is frequently used with creative works, such as music, art, writing, and software. The licenses are easy to understand and free to use, making it relatively easy for creators to share their work while also protecting their rights.
What lessons can we learn from the very popular Creative Commons (CC) licensing scheme? By building on the CC license concept, data licensing could allow a data creator to specify the terms under which others can use the data, such as allowing others to use it for non-commercial purposes, to make derivative works, or to share it with others.
For example, a data scientist who has created a dataset can use a CC license to allow others to use the data to further their analysis or research; a government agency can use a CC license to allow others to use its data to create new products or services; a company may use a CC license to allow others to use its data to create new innovations that benefit society.
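To make the idea concrete, the standard CC license elements can be composed mechanically. The small helper below is an illustrative sketch that maps desired sharing terms onto the common CC license names (it omits the ShareAlike element for brevity; the helper itself is an assumption, not an existing tool):

```python
def suggest_cc_license(attribution: bool, commercial: bool, derivatives: bool) -> str:
    """Map desired sharing terms to a common Creative Commons license name.

    Covers only the common cases: CC0, CC BY, and the NC/ND restrictions.
    """
    if not attribution:
        return "CC0"          # public-domain dedication, no conditions
    parts = ["BY"]            # attribution is always required from here on
    if not commercial:
        parts.append("NC")    # non-commercial use only
    if not derivatives:
        parts.append("ND")    # no derivative works
    return "CC " + "-".join(parts)

# A dataset anyone may reuse, even commercially, with credit:
print(suggest_cc_license(attribution=True, commercial=True, derivatives=True))    # CC BY
# A dataset shared for research only, with no remixing allowed:
print(suggest_cc_license(attribution=True, commercial=False, derivatives=False))  # CC BY-NC-ND
```

A data publisher could attach the resulting license name to a dataset’s metadata, so the terms travel with the data.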
Software licensing is a way for software creators to protect themselves from the risk of unintended use and unintended consequences of published software by clearly defining the terms and conditions under which it can be used. By creating a software license, creators can specify exactly how the software can be used, by whom, and for what purpose. This can help to prevent the software from being used in ways that the creator did not intend or that could potentially harm the creator or others.
Below are a few licensing techniques used in open-source software licensing that protect software creators:
- Usage: Licenses can include restrictions on how the software can be used, such as prohibiting the software from being used for commercial purposes.
- Attribution: Licenses can require that any use of the software must be attributed to the creator, which helps protect the creator’s intellectual property.
- Liability: Licenses can include provisions that hold the user responsible for any damages that may result from the use of the software, which can help to protect the creator from legal action.
- Auditing: Licenses may include provisions that allow the creator to monitor and track how the software is being used, which can help to ensure that terms of the license are being adhered to.
There are several popular open-source licenses in use today, including, for example, the GNU General Public License (GPL), the MIT License, and the Apache License, each of which has similar but distinct terms and conditions.
So, how can we apply the concepts in open-source software licenses to climate data sharing? By adapting a sharing scheme modelled upon an open-source software license, a climate data creator can specify:
- Usage: A climate data publisher can restrict usage or make it as flexible as desired; raw climate data, for example, may be, or may continue to be, shared freely and openly with no restrictions, but sophisticated models and algorithms may require strict terms for their use.
- Attribution: A climate data publisher may require attribution of their data for several reasons: it may build eminence for the publisher which may lead to commercial opportunities, or it may serve as a vehicle to facilitate formal data origination and lineage.
- Liability: A climate data publisher may require provisions that provide protections from use or abuse by third parties. This would let data publishers mitigate potential risks due to data inaccuracies, errors, or lack of precision.
- Auditing: A climate data publisher, especially if their data is used commercially, may require data consumers to verify usage volumes to ensure appropriate billing can take place.
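The four terms above could even be captured in machine-readable form, so that tooling can check a proposed use against a dataset’s license before the data is consumed. The sketch below is purely illustrative; the `DataLicense` structure and its field names are assumptions, not an existing standard:

```python
from dataclasses import dataclass

@dataclass
class DataLicense:
    """A hypothetical machine-readable climate data license descriptor.

    Each field mirrors one of the license terms discussed above.
    """
    name: str
    commercial_use: bool = False        # Usage: may the data be used commercially?
    derivatives_allowed: bool = True    # Usage: may derivative works be made?
    attribution_required: bool = True   # Attribution: must the publisher be credited?
    liability_waived: bool = True       # Liability: publisher disclaims damages
    usage_reporting: bool = False       # Auditing: must consumers report usage volumes?

def permits(lic: DataLicense, *, commercial: bool, derivative: bool) -> bool:
    """Check whether a proposed use is allowed under the license terms."""
    if commercial and not lic.commercial_use:
        return False
    if derivative and not lic.derivatives_allowed:
        return False
    return True

# A raw-data license: open, attribution-only.
raw = DataLicense(name="Open Climate Raw Data", commercial_use=True)
# A refined-model license: stricter terms, usage auditing required.
model = DataLicense(name="Flood Model v2", commercial_use=False, usage_reporting=True)

print(permits(raw, commercial=True, derivative=True))    # True
print(permits(model, commercial=True, derivative=False)) # False
```

This mirrors how the raw data stays open while the refined model carries stricter terms, as described above.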
A data contract is an agreement that defines the terms and conditions under which data can be shared, used, and protected. In many cases it sets out the rights and responsibilities of the data provider, the data recipient, and any intermediaries. But a data contract also provides a definition of the format and structure of the data exchange.
Publishing the format and structure of data exchange has several benefits, including:
- Interoperability: A standardized format and structure allows different systems and applications to easily exchange information.
- Reusability: Data can be reused by many systems and applications, reducing the need for data duplication (and, hence, also increasing efficiency).
- Accessibility: Publishing the format and structure of data exchange makes it easier to understand and, hence, easier to share.
- Innovation: Having well understood data formats and structures fosters innovation by letting developers easily build new applications and services on top of existing data.
So, how can we use a contract modelled upon the OpenAPI Specification (OAS) to define how data can be exchanged? Data creators can use OAS-like data exchange specifications to gain several benefits:
- Data Specification documentation: Like OAS, data creators can create specifications that allow developers to automatically generate human-readable documentation for their API, which makes it easier for other developers to understand and use the API.
- Data Specification testing: Like OAS, data creators can automatically generate test cases for their data exchanges, which helps ensure that it is working as intended.
- Data Specification client generation: Like OAS, data creators can automatically generate client code for their API, which makes it easier to integrate data products with each other and with other applications.
- Data Specification Access Rights: Like OAS, data creators can embed security schemes and scopes that define the permissions required to access a data product.
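As a minimal sketch of what such an OAS-inspired data contract might look like, the example below declares the fields of a dataset and validates records against the declaration. The contract structure and all field names (`station_id`, `temp_c`, and so on) are invented for illustration and are not part of any real specification:

```python
# An OAS-inspired data contract: it names each field, its type, and
# whether the field is required. All names here are hypothetical.
TEMPERATURE_CONTRACT = {
    "title": "Daily Temperature Readings",
    "version": "1.0.0",
    "fields": {
        "station_id": {"type": str, "required": True},
        "date":       {"type": str, "required": True},   # e.g. "2023-02-01" (ISO 8601)
        "temp_c":     {"type": float, "required": True}, # degrees Celsius
        "source":     {"type": str, "required": False},  # optional provenance note
    },
}

def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for one record (empty = valid)."""
    errors = []
    for name, spec in contract["fields"].items():
        if name not in record:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], spec["type"]):
            errors.append(f"wrong type for {name}: expected {spec['type'].__name__}")
    return errors

good = {"station_id": "YYZ", "date": "2023-02-01", "temp_c": -4.5}
bad  = {"station_id": "YYZ", "temp_c": "cold"}

print(validate(good, TEMPERATURE_CONTRACT))  # []
print(validate(bad, TEMPERATURE_CONTRACT))   # two violations: missing date, wrong type
```

Publishing such a contract alongside the data gives consumers the interoperability, testing, and documentation benefits listed above without guessing at the data’s structure.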
A data marketplace is a platform or network that makes it easy to find, understand, share, and trust data. It typically connects data providers, who publish data, with data consumers, who are looking for specific types of data to use. Data marketplaces help organizations access high-quality data quickly and easily, while also providing data providers with new ways of publishing, and potentially monetizing, their data assets.
Consider the following scenario: Suppose a forward-thinking non-profit (a non-profit dispels commercial motives) is establishing a “Climate Data Marketplace”, which serves as a registry of climate data. The non-profit’s Climate Data Marketplace serves a similar function to what DNS (the Domain Name System) does for the internet — the Climate Data Marketplace makes it easy to search for and find desired climate data by providing a directory of pointers (URLs) to climate data in the wild. (Actually, it probably would also provide much more: an intuitive user interface, a curated hierarchy and knowledge graph of climate data categories, documentation, and data glossaries, but that is for another article.)
Data marketplaces also offer lessons that can be applied to make data easier to share. They can be seen to address the current issues of data ownership and control, where individuals and communities have limited control over their data and few benefits from its use. A data marketplace provides the umbrella structure to implement the previously mentioned lessons learned, including:
- User interfaces and supporting platforms, to make it easy to find and understand data.
- Licenses, to make it easy to share data by offering consistent, standard, and understandable terms and conditions that govern data sharing.
- Specifications, to make it easy to integrate data by defining the interactions mechanisms required to access data.
- Contracts, to make it easy to consume data by providing the metadata definitions and information structures and formats required by data scientists and developers to build their models and applications, respectively.
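The DNS-like registry at the heart of such a marketplace can be sketched in a few lines. The toy example below is purely illustrative; the dataset names, tags, and URLs are fictional placeholders, and a real marketplace would of course add authentication, curation, and richer metadata:

```python
# A toy in-memory "Climate Data Marketplace" registry: like DNS, it maps
# human-friendly dataset names to URLs plus the metadata (license, tags)
# a consumer needs to find and evaluate the data.
registry: dict[str, dict] = {}

def register(name: str, url: str, license: str, tags: set[str]) -> None:
    """Publish a dataset pointer into the marketplace directory."""
    registry[name] = {"url": url, "license": license, "tags": tags}

def search(tag: str) -> list[str]:
    """Find the names of all datasets carrying a given tag."""
    return sorted(name for name, entry in registry.items() if tag in entry["tags"])

# Fictional entries: one open raw dataset, one commercially licensed model.
register("noaa-daily-temps", "https://example.org/noaa/temps",
         license="open", tags={"temperature", "raw"})
register("flood-risk-model", "https://example.org/insurer/floods",
         license="commercial", tags={"flooding", "model"})

print(search("model"))                          # ['flood-risk-model']
print(registry["noaa-daily-temps"]["license"])  # open
```

Each registry entry could also carry the license descriptor and data contract discussed earlier, so that discovery, terms, and format travel together.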
This article has discussed why much of the most valuable climate data is hidden from public view and use. I have also highlighted why this is so, while offering several potential solutions, all based on tweaking existing approaches to make data sharing more palatable.
I have also identified a potential root cause that keeps this powerful and refined data hidden: the rights and needs of its creators are not adequately protected by current climate data sharing practices. The arts, software, and data contracts offer plentiful suggestions for protecting the logic, content, and use of data products, and further suggest ways to codify collaboration protocols for data updates, contributions, and improvements.
Our hope is that some of these ideas can be expanded upon and put into use to allow some of the refined, processed, and enriched data — and the algorithms behind them — to be put into the public domain or shared in a simpler and more responsible way. It is with this hope that the current effort expended in each and every organization — to re-invent, re-create, and duplicate data, algorithms, and models — be vectored more constructively and effectively to address our climate crisis.
***
All images in this document except where otherwise noted have been created by Eric Broda (the author of this article). All icons used in the images are stock PowerPoint icons and/or are free from copyrights.
The opinions expressed in this article are those of the author(s) alone and do not necessarily reflect the views of any clients.
Climate data and models that power global sustainability decisions are far too often proprietary and hidden to us. Data marketplaces, data contracts, data licensing, and data pricing offer the incentives and practices for sharing this vital information.
There are several contributors and advisors that provided a huge helping hand to understand the current climate data landscape and shape this article: Ellie Young (Common Action), Philippe Höij (DFRNT.com), and Andrew Padilla (Datacequia.com).
Learning from others has been the hallmark of human progress. We are “standing on the shoulders of giants” when we build new inventions upon the work of those who came before us. So, when we see the wealth of public climate data, and how it is shared globally, we are standing on the metaphorical shoulders of climate data giants whom freely share climate data for the broader public good.
However, the simple and unfortunate fact is that much of the most valuable climate data is hidden from public view. Data that organizations have transformed and enriched. Refined data to enable better investment decisions. Data curated into proprietary models that power global sustainability and climate decisions made by organizations globally. The data that is “hidden” inside organizations.
Now, the term “hidden” suggest a nefarious intent. To be clear, this is not the case. Rather, the “hidden” nature is a consequence of the transformation, refinement, and curation that imbues tremendous value into public data. But it is not available. It is proprietary. It is closed. It is unavailable. Thus, it is hidden.
The practical issue, however, is that each organization performs near identical work to process, enrich, and refine this public data into something that is used to guide corporate sustainability and climate decisions. But this refined data and its associated models are rarely, if ever, shared.
In this article I will explore the root cause for this unfortunate situation. But, perhaps more importantly, I will also offering several potential solutions that make it easier to share data, to see what is behind the curtain, to see what was previously hidden.
Climate change is one of the most pressing issues of our time, and the availability of climate data is crucial for understanding the problem, identifying practical solutions, and determining the actual effects from policy change. Recognizing this, governments and private organizations alike collect and publish voluminous amounts of precipitation, wind, and temperature data to measure and understand their exposure to climate change.
And this data is crucially important — it is used to inform policy decisions, to create predictive models, and to assess the risks of climate change to various sectors of the economy.
But, today, we are faced a bounty of freely available public climate data. Organizations like NASA, NOAA, and many European and international organizations publish immense quantities of finely grained data about temperature, precipitation, and a myriad of sensor data. This data has been published for decades and continues to be published each day. Metaphorically, we are drinking from the climate data firehose!
However, most of this data is often still in raw format consisting of numerical measurements or text. And it comes in so many different formats and structures, with varying degrees of consistency and availability, that it is difficult for non-experts and experts alike to understand or use it.
“Applied” climate data, on the other hand, has been transformed, refined, and curated to extract meaningful insights and patterns. This typically involves combining multiple sets of raw public climate data, linking them to other public and private data, cleaning and organizing it, and then applying statistical or machine learning techniques to extract meaningful information.
Consider insurers. An insurer wants to make sure that insured assets prices reflect future climate risks. For example, they may use public weather, topological (flood plains, rivers), and demographic data to create algorithms and models that better understand the probability of flooding. And these models help determine whether to insure a property that exists in proximity to a flood plain.
Asset managers, for example, may use a variety of emissions, energy efficiency, and government data, both public and private, to understand the potential exposure of their assets to climate change to better align with their risk and return objectives.
And, with the advent of stricter climate regulations, most organizations are preparing to report on their own internal climate footprint. For example, they may aggregate emissions data from their core operations, subsidiaries, and global operating units to provide reporting to the public, and to regulators in their operating regions.
However, it is abundantly clear that many of these firms are gather similar data and create similar algorithms. This constant “re-inventing” of the climate data wheel happens for several reasons.
First, many companies are hesitant to share their data because they fear that their competitors will use the data to gain a competitive advantage, or that their data will be used to create new products or services that will directly compete with their own.
Second, there is a lack of legal and regulatory framework to support data sharing and data pricing. This makes it difficult for companies to establish clear rules and expectations around data sharing and data pricing, which further discourages companies from sharing their data.
Third, there may be a lack of provenance and lineage associated with the data, potentially creating a perception of untrustworthiness.
Last, perhaps the most challenging situation is that sharing data imposes a material latent and long term risk and obligation on the publisher of the data, especially when third parties use it in unexpected ways.
Simply put, decisions made with climate data have long time horizons and long-term consequences especially, when the data may be wrong; not intentionally wrong, but perhaps small errors when magnified over long periods cause large errors, or perhaps an algorithm was incorrectly tweaked. And any decision that has a long tail, can drive huge costs.
Consider, for example, the following scenario: an asset manager publishes data on probabilities of flooding for costal real estate. A property developer uses that data and builds on land that appeared, based upon the asset managers analysis, to have a relatively low probability of flooding. Several years later, properties have been built and, despite low probabilities, have unexpectedly flooded. What happens? Losses and legal suits.
But what if other organizations could benefit from sharing their private climate data to help others to better predict floods? What if organizations shared their climate models to help others make wiser climate investment decisions? What if it was easier to share private climate data and models, without the downstream risks?
There are some efforts being made to address these challenges. For example, some initiatives (such as OS-Climate) are being established to create common data standards, protocols, and platforms to make it easier to find and use climate data. Other organizations such as the United Nations as well as many forward-thinking governments are establishing regulatory guidelines and frameworks that encourage climate data sharing. And there are data marketplaces and data cooperatives being established to facilitate collaboration and create a fair data ecosystem with transparent data pricing.
Still more needs to be done. What type of present-day examples in other industries of use-cases are available that may teach us how to share data? Perhaps we can apply lessons learned from “creative commons” licensing to limit or manage risks in sharing climate data; Or perhaps open-source software licensing offers some new ideas? Or maybe there are examples of “data contracts” that can standardize climate data formats and structures?
Metaphorically, we should stand on the shoulders of data giants, and use present-day best practices to embolden and empower future climate data work. What lessons have we learned that can be applied — now — to make it easier and safer to share climate data much more broadly.
The Creative Commons organization provides a set of copyright licenses that enable creators to share their work while typically allowing third parties to use a creator’s content for non-commercial purposes or to make derivative works. But it also, crucially, lets creators retain a modicum of control over how their work is used.
Creative Commons is frequently used with creative works, such as music, art, writing, and software. The licenses are easy to understand and free to use, making it relatively easy for creators to share their work while also protecting their rights.
What lessons can we learn from the very popular Creative Commons (CC) licensing scheme? By using a CC license concept, data licensing should allow a data creator to specify the terms under which others can use the data, such as allowing others to use it for non-commercial purposes, to make derivative works, or to share it with others.
For example, a data scientist who has created a dataset can use a CC license to allow others to use the data to further analysis or research; A government agency can use a CC license to allow others to use its data for creating new products or services; A company may use a CC license to allow others to use its data for creating new innovations that can benefit the society.
Software licensing is a way for software creators to protect themselves from the risk of unintended use and unintended consequences of published software by clearly defining the terms and conditions under which it can be used. By creating a software license, creators can specify exactly how the software can be used, by whom, and for what purpose. This can help to prevent the software from being used in ways that the creator did not intend or that could potentially harm the creator or others.
Below are a few licensing techniques using in Open Source software licensing that protect software creators:
- Usage: Licenses can include restrictions on how the software can be used, such as prohibiting the software from being used for commercial purposes.
- Attribution: Licenses can require that any use of the software must be attributed to the creator, which helps protect the creator’s intellectual property.
- Liability: Licenses can include provisions that hold the user responsible for any damages that may result from the use of the software, which can help to protect the creator from legal action.
- Auditing: Licenses may include provisions that allow the creator to monitor and track how the software is being used, which can help to ensure that terms of the license are being adhered to.
There are several popular open-source licenses in use today, including, for example, the GNU General Public License (GPL), the MIT License, and the Apache License, each of which similar but distinct terms and conditions.
So, how can we apply the concepts in Open source software licenses to climate data sharing? By adapting sharing scheme modelled upon an open source software license, a climate data creator can specify:
- Usage: A climate data publisher can restrict usage or make it as flexible as desired; Raw climate data, for example, may be, or may continue to be, shared freely and openly with no restrictions. But sophisticated models and algorithms may require strict terms for their use.
- Attribution: A climate data publisher may require attribution of their data for several reasons: it may build eminence for the publisher which may lead to commercial opportunities, or it may serve as a vehicle to facilitate formal data origination and lineage.
- Liability: A climate data publisher may require provisions that provide protections from use or abuse by third parties. This would let data publishers mitigate potential risks due to data inaccuracies, errors, or lack of precision.
- Auditing: A climate data publisher, especially if their data is used commercially, may require data consumers to verify usage volumes to ensure appropriate billing can take place.
A data contract is an agreement that defines the terms and conditions under which data can be shared, used, and protected. In many cases it sets out the rights and responsibilities of the data provider, the data recipient, and any intermediaries. But a data contract also provides a definition of the format and structure of the data exchange.
Publishing the format and structure of data exchange has several benefits, including:
- Interoperability: A standardized format and structure allows different systems and applications to easily exchange information.
- Reusability: Data can be reused by many systems and applications, reducing the need for data duplication (and, hence, also increasing efficiency).
- Accessibility: Publishing the format and structure of data exchange makes it easier to understand and, hence, easier to share.
- Innovation: Having well understood data formats and structures fosters innovation by letting developers easily build new applications and services on top of existing data.
So, how can we use a contract modelled upon OpenAPI Specifications (OAS) to define how data can be exchanged? Data creators can use OAS-like data exchange specifications to gain several benefits:
- Data Specification documentation: Like OAS, data creators can create specifications that allows developers to automatically generate human-readable documentation for their API, which makes it easier for other developers to understand and use the API.
- Data Specification testing: Like OAS, data creators can automatically generate test cases for their data exchanges, which helps ensure that it is working as intended.
- Data Specification client generation: Like OAS, data creators can automatically generate client code for their API, which makes it easier to integrate data products with each other and with other applications.
- Data Specification Access Rights: Like OAS, data creators can embed security schema and scope that defines the permissions required to access a data product.
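To make the idea concrete, here is a minimal sketch of what an OAS-like data contract could look like, expressed as a plain Python dictionary with a small validation helper. Every field name here (`info`, `license`, `access`, `schema`) is an illustrative assumption, not part of any published standard, and the dataset is hypothetical.

```python
# A hypothetical OAS-like data contract: it documents the publisher,
# the license, the access scopes, and the format/structure of records.
contract = {
    "info": {
        "title": "Regional Sea Surface Temperature",   # hypothetical dataset
        "version": "1.0.0",
        "publisher": "Example Climate Org",
    },
    "license": {"name": "CC-BY-4.0", "commercial_use": True},
    "access": {"scopes": ["read:sst"]},   # permissions required, like OAS security scopes
    "schema": {                           # declared format and structure of records
        "station_id": str,
        "timestamp": str,                 # ISO-8601 string
        "sst_celsius": float,
    },
}

def validate(record: dict, schema: dict) -> bool:
    """Check that a record has exactly the declared fields, with the declared types."""
    return set(record) == set(schema) and all(
        isinstance(record[key], expected) for key, expected in schema.items()
    )

record = {
    "station_id": "ST-042",
    "timestamp": "2023-02-01T00:00:00Z",
    "sst_celsius": 14.7,
}
print(validate(record, contract["schema"]))  # True
```

Because the contract is machine-readable, the documentation, testing, and client-generation benefits listed above all follow from the same single artifact, just as they do with OAS.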
A data marketplace is a platform or network that makes it easy to find, understand, share, and trust data. It typically connects data providers, who publish data, with data consumers, who are looking for specific types of data to use. Data marketplaces help organizations access high-quality data quickly and easily, while also providing data providers with new ways of publishing, and potentially monetizing, their data assets.
Consider the following scenario: Suppose a forward-thinking non-profit (a non-profit dispels concerns about commercial motives) is establishing a “Climate Data Marketplace”, which serves as a registry of climate data. The non-profit’s Climate Data Marketplace serves a similar function to what DNS (the Domain Name System) does for the internet — the Climate Data Marketplace makes it easy to search for and find desired climate data by providing a directory of pointers (URLs) to climate data in the wild (actually, it probably would also provide much more: an intuitive user interface, a curated hierarchy and knowledge graph of climate data categories, documentation, and data glossaries, but that is for another article).
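The DNS analogy above can be sketched in a few lines of Python: a registry that maps dataset names and category tags to URLs of climate data hosted elsewhere. The class, dataset names, and URLs are all hypothetical assumptions made for illustration.

```python
# A toy "Climate Data Marketplace" registry. Like DNS, it does not host
# the data itself; it resolves names to URLs of data in the wild.
class ClimateDataRegistry:
    def __init__(self):
        self._entries = {}  # dataset name -> {"url": ..., "tags": ...}

    def register(self, name: str, url: str, tags: list):
        """A data publisher adds a pointer to their dataset."""
        self._entries[name] = {"url": url, "tags": set(tags)}

    def resolve(self, name: str) -> str:
        """Like a DNS lookup: dataset name -> URL of the actual data."""
        return self._entries[name]["url"]

    def search(self, tag: str) -> list:
        """Find dataset names by category tag."""
        return [name for name, entry in self._entries.items()
                if tag in entry["tags"]]

registry = ClimateDataRegistry()
registry.register(
    "global-sst-monthly",                              # hypothetical dataset
    "https://example.org/data/sst-monthly.csv",        # hypothetical URL
    tags=["ocean", "temperature"],
)
print(registry.resolve("global-sst-monthly"))  # https://example.org/data/sst-monthly.csv
print(registry.search("ocean"))                # ['global-sst-monthly']
```

A real marketplace would layer search interfaces, curation, and knowledge graphs on top, but the core lookup role is this simple.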
Data marketplaces also offer lessons that can be applied to make data easier to share. They address the current issues of data ownership and control, where individuals and communities have limited control over their data and see few benefits from its use. A data marketplace provides the umbrella structure to implement the previously mentioned lessons learned, including:
- User interfaces and supporting platforms, to make it easy to find and understand data.
- Licenses, to make it easy to share data by offering consistent, standard, and understandable terms and conditions that govern data sharing.
- Specifications, to make it easy to integrate data by defining the interaction mechanisms required to access data.
- Contracts, to make it easy to consume data by providing the metadata definitions and information structures and formats required by data scientists and developers to build their models and applications, respectively.
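The four elements above can be bundled into a single marketplace listing. The sketch below shows one possible shape for such a listing; every field name and URL is an illustrative assumption rather than an established marketplace schema.

```python
from dataclasses import dataclass, field

# One hypothetical marketplace listing, bundling the umbrella elements:
# a discoverable name, a license, an access specification, and a contract.
@dataclass
class MarketplaceListing:
    name: str
    data_url: str         # where the data itself lives
    license_name: str     # standard terms and conditions governing sharing
    spec_url: str         # OAS-like spec defining how to access the data
    contract_url: str     # metadata, formats, and structures for consumers
    tags: set = field(default_factory=set)

listing = MarketplaceListing(
    name="global-sst-monthly",
    data_url="https://example.org/data/sst-monthly.csv",
    license_name="CC-BY-4.0",
    spec_url="https://example.org/specs/sst-monthly.yaml",
    contract_url="https://example.org/contracts/sst-monthly.json",
    tags={"ocean", "temperature"},
)
print(listing.license_name)  # CC-BY-4.0
```

The design point is that discovery (name, tags), legal terms (license), and technical integration (spec, contract) travel together, so a consumer finding the data also finds everything needed to use it responsibly.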
This article has discussed why much of the most valuable climate data is hidden from public view and use. I have also highlighted why this is so, while offering several potential solutions, all based on tweaking existing approaches to make data easier to share.
I have also identified a potential root cause that keeps this powerful and refined data hidden: the rights and needs of its creators are not adequately protected by current climate data sharing practices. The arts, software, ideas, data contracts, and other areas offer plentiful suggestions for protecting the logic, content, and use of data products, and further suggest how to codify collaboration protocols for data updates, contributions, and improvements.
Our hope is that some of these ideas can be expanded upon and put into use to allow some of the refined, processed, and enriched data — and the algorithms behind them — to be put into the public domain or shared in a simpler and more responsible way. It is with this hope that the effort currently expended in each and every organization — to re-invent, re-create, and duplicate data, algorithms, and models — can be vectored more constructively and effectively to address our climate crisis.
***
All images in this document except where otherwise noted have been created by Eric Broda (the author of this article). All icons used in the images are stock PowerPoint icons and/or are free from copyrights.
The opinions expressed in this article are those of the author(s) alone and do not necessarily reflect the views of any clients.