A Maturity Model for Data Modeling and Design

By Willem Koenders | April 2023


Photo by Clint Adair on Unsplash

As someone who has worked with so-called master modelers across various leading financial institutions throughout the Americas, I have had the good fortune to witness how data modeling can help standardize an organization’s way of defining and describing its data. I’ve seen firsthand how a sound data model accelerated the implementation of a data catalog and the launch of data quality programs, and how a lack of one caused similar programs to self-destruct.

Drawing from this experience, this point of view explores how organizations can grow their data modeling capability by focusing on three key sub-capabilities: metamodeling, conceptual and logical data modeling, and physical data modeling. For each sub-capability, it describes what high maturity looks like, and it closes with overall best practices and success factors.

Sub-capabilities of data modeling

Let’s start with the foundational components of data modeling:

  1. Metamodeling — Metamodeling is a sub-capability of data modeling that involves creating a model that defines the structure, concepts, and relationships of other models. A metamodel provides a standardized way of defining and describing models and their components, which helps ensure consistency and clarity in the development and use of these models. A metamodel stretches across conceptual, logical, and physical models and defines how the respective components are consistently linked together.
  2. Conceptual and Logical Data Modeling — Conceptual and logical data modeling are sub-capabilities of data modeling that involve creating business-oriented views of data that capture the major entities, relationships, and attributes involved in particular domains such as Customers, Employees, and Products. Logical data modeling involves refining the conceptual model by adding more detail, such as specifying data types, keys, and relationships between entities, and by breaking conceptual domains out into logical attributes, such as Customer Name, Employee Name, and Product SKU.
A conceptual model explains at the highest level what the respective domains or concepts are and how they are related. Consider a retail example: Transactions form a key concept, where each transaction can be linked to the Products that were sold, the Customer that bought them, the method of Payment, and the Store the purchase was made in — each of which constitutes its own concept. The relationships also capture cardinality: each individual transaction can have at most one customer, store, or employee associated with it, but each of these can in turn be associated with many transactions — which of course makes sense.
In a logical model, the concepts (or domains) are broken down into logical attributes. Such a logical model conforms to the underlying metamodel and can then be translated into a physical model, where details are added on where exactly (e.g., in which table), and in what format, these data attributes exist. The first sketch after this list illustrates a minimal metamodel and a logical model that conforms to it.
  3. Physical Data Modeling — Physical data modeling involves translating the logical data model into specific database schemas that can be implemented on a particular technology platform. This includes defining tables, columns, indexes, and other database objects, as well as specifying data types, constraints, and other details necessary for implementing the model.
In a physical model, further technical details are added to the logical model, for example to clarify table names, column names, and data types. The second sketch after this list shows one way this translation can be mechanized.
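
To make these layers concrete, here is a minimal sketch of a metamodel and a conforming logical model, expressed in Python. The class and field names (EntityType, Attribute, Relationship, cardinality) are illustrative assumptions rather than an industry standard; in practice, a metamodel typically lives in a modeling tool, not in code.

```python
from dataclasses import dataclass, field
from typing import List

# --- Metamodel: defines what any model may contain (names are illustrative) ---

@dataclass
class Attribute:
    name: str                  # a logical attribute, e.g., "Customer Name"
    data_type: str = "string"  # logical type; refined further in the physical model

@dataclass
class EntityType:
    name: str  # a conceptual domain, e.g., "Customer"
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class Relationship:
    source: EntityType
    target: EntityType
    cardinality: str  # e.g., "many-to-one": many sources per target

# --- Logical model instance: the retail example from the text ---

customer = EntityType("Customer", [Attribute("Customer ID"), Attribute("Customer Name")])
store = EntityType("Store", [Attribute("Store ID"), Attribute("Store Location")])
transaction = EntityType(
    "Transaction",
    [
        Attribute("Transaction ID"),
        Attribute("Timestamp", "datetime"),
        Attribute("Amount", "decimal"),
    ],
)

# Each transaction has at most one customer and one store;
# each customer or store can be linked to many transactions.
relationships = [
    Relationship(transaction, customer, "many-to-one"),
    Relationship(transaction, store, "many-to-one"),
]

for rel in relationships:
    print(f"{rel.source.name} -> {rel.target.name} ({rel.cardinality})")
```

The separation is the point: the dataclasses play the role of the metamodel, while the instances form one particular logical model that conforms to it.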
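
Continuing the sketch, the step from logical to physical can be mechanized by deriving simple DDL from the logical model (reusing the EntityType and Attribute classes from the previous sketch). The type mapping and the snake_case naming convention are assumptions for illustration; a real mapping depends on the target database platform.

```python
# Illustrative logical-to-physical translation; the type mapping and
# naming convention are assumed, not a standard.
SQL_TYPES = {"string": "VARCHAR(255)", "datetime": "TIMESTAMP", "decimal": "DECIMAL(18,2)"}

def to_ddl(entity: EntityType) -> str:
    """Render one logical entity as a simplified CREATE TABLE statement."""
    table_name = entity.name.lower()
    columns = [
        f"    {attr.name.lower().replace(' ', '_')} {SQL_TYPES.get(attr.data_type, 'VARCHAR(255)')}"
        for attr in entity.attributes
    ]
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"

print(to_ddl(transaction))
# CREATE TABLE transaction (
#     transaction_id VARCHAR(255),
#     timestamp TIMESTAMP,
#     amount DECIMAL(18,2)
# );
```

In practice, this generation step is typically handled by database design software, as discussed under physical data modeling below.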

What good looks like

A high maturity in data modeling requires you to think through several dimensions:

  1. Strategy — The organization’s overall data modeling strategy, including the alignment of data modeling efforts with business goals and objectives.
  2. People/Talent — The articulation of specific roles and their responsibilities, as well as required expertise, skills, and training.
  3. Processes — The processes and workflows, including the documentation of data modeling methodologies, the development of data modeling templates and standards, and the establishment of quality control and review processes.
  4. Technology — The tools required to support data modeling efforts, such as data modeling software, and the integration of those tools with other systems and applications.
  5. Adoption — The adoption and usage of data modeling practices within and across the organization. This may include socialization programs, the resolution of barriers to adoption, and the tracking of metrics to measure the effectiveness and impact of data modeling efforts. (A simple self-assessment sketch across these five dimensions follows this list.)
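
One way to make these dimensions actionable is a simple periodic self-assessment. The sketch below is a minimal illustration in Python; the 1 to 5 scale and the scores shown are assumptions for the example, not a published rubric.

```python
# Illustrative maturity self-assessment across the five dimensions.
# The 1-5 scale and these scores are assumed for the example.
maturity_assessment = {
    "Strategy": 3,
    "People/Talent": 2,
    "Processes": 2,
    "Technology": 4,
    "Adoption": 1,
}

# The lowest-scoring dimension is often the most useful place to focus next.
focus_area = min(maturity_assessment, key=maturity_assessment.get)
print(f"Lowest-maturity dimension: {focus_area}")
```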

Let’s apply this to the three sub-capabilities we defined.

Metamodel

  • A high maturity for metamodeling involves having a well-defined and widely adopted metamodel that is used consistently across the organization. To confirm — you need one, and only one. Without a metamodel in place (let’s say, the basic grammar and vocabulary of a universal language), all sorts of data models (dialects or even entirely different languages) may sprout up, causing interpretability and interoperability problems across domains, processes, and systems.
  • You need people who understand the metamodel, and who can maintain and explain it. It is recommended to have a single person with ultimate authority over the metamodel. He or she can take in feedback and collect change requests to ensure it is and stays fit-for-purpose.
  • A defined process should describe how the metamodel is to be used in modeling activities. Its adoption should be widespread across data modelers, data architects, and data engineers — it should actually make their work easier as they have a clear basis to start from.
  • Tools such as a data catalog or metadata management system can provide a centralized repository for storing and managing the metamodel and can support collaboration and version control, but this is not strictly necessary. Some experts might disagree with me, but many perfectly fit-for-purpose metamodels live in tools like Microsoft Excel, PowerPoint, or Visio, and depending on the size and complexity of your organization, that can be appropriate as long as it is strictly adhered to by your conceptual and logical data modelers.

Conceptual and logical data modeling

  • A high maturity involves having a well-defined and consistent approach to conceptual and logical data modeling, with strict adherence to the metamodel. Conceptual domains should have an owner, and there should be a structured process to create new logical attributes and then have them reviewed, approved, and published.
  • Modeling domains and logical attributes is a skill — and a rare one at that. It requires both core data modeling expertise and the ability to project it onto a real-life business domain to describe it in logical attributes that make sense to the business and technology organizations alike. Don’t underestimate how hard this is — mature organizations often have a “master modeler” in place who guards and socializes best practices (Paul Carey, thinking about you!).
  • The conceptual data model should be explicitly aligned to the enterprise or business architecture, and aligned with the data domains where data ownership is assigned.
  • Data modeling software can be used to create and maintain conceptual and logical data models at scale. These tools can provide a visual representation of the models and can support collaboration, version control, and integration with other systems such as a data catalog or metadata management system. A business glossary can be used to define and standardize the business concepts and terms that are used in the models; a minimal sketch of such an entry follows below.
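
As a minimal illustration of what such a glossary entry might hold (the field names are assumptions, not a standard schema):

```python
# Illustrative business glossary entry; the field names are assumptions.
# The status field mirrors the review/approve/publish process described above.
business_glossary = {
    "Customer Name": {
        "definition": "The full legal name of the customer on record.",
        "domain": "Customer",
        "owner": "Customer domain owner",
        "status": "published",  # e.g., draft -> reviewed -> approved -> published
    },
}
```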

Physical data modeling

  • A high maturity level for physical data modeling involves having well-designed and efficient database schemas in place that meet applicable performance and scalability requirements. This requires people who can design and implement the schemas, well-defined processes for schema development and maintenance, and appropriate technology tools to support schema design and implementation.
  • Database design software can be used to create and maintain physical data models. These tools can generate database schemas from the logical data model and can support collaboration, version control, and integration with other systems such as a data catalog or metadata management system. A data dictionary can also be used to define and standardize technical details such as data types, constraints, and other database objects; the sketch below illustrates one such entry.
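
For illustration, a data dictionary entry for a single physical column might look like the sketch below; the field names and values are assumptions for the example.

```python
# Illustrative data dictionary entry for one physical column; values assumed.
data_dictionary = {
    "transaction.amount": {
        "logical_attribute": "Amount",  # ties back to the logical model
        "data_type": "DECIMAL(18,2)",
        "nullable": False,
        "constraints": ["amount >= 0"],
        "description": "Total transaction amount in the store's local currency.",
    },
}
```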

Higher-level best practices and success factors

To enhance their data modeling capability, organizations can follow some best practices and success factors:

  1. Get the metamodel right first. The metamodel drives reusability and consistency across the entire enterprise. It makes sure that all subsequent modeling efforts incrementally build out the overall model. If you don’t have one in place, you’re in for a gargantuan task of aligning and bridging existing, incompatible models in the future.
  2. Consider prebaked industry or universal models. Depending on where you are in your journey, you can consider adopting a preexisting data model. This can drive alignment with international best practices and standards, save you the time and effort of building a model entirely from scratch, and enable efficient and reliable data exchanges with external parties. For example, BIAN (the Banking Industry Architecture Network) provides a standardized banking services reference model that defines a common language, taxonomy, and business process framework for the banking industry.
  3. Iterate between conceptual, logical, and physical. Data modeling takes time — the job will never be done. It is recommended to prioritize domains — reference domains like Customers and Products are good candidates — and start with one or two, where you first complete the logical model and then the guidelines for the physical model, before you move on to the next domain.
  4. Don’t overdo the physical. Data modeling can be complex, time-consuming, and therefore expensive. Completing a basic conceptual and logical model is almost always worth the effort, but once you venture into the physical domain, you may not need to centrally direct and capture all of the physical models. You may want to prioritize here as well — for example, identify “mission critical” systems and document physical models for those, while for the others it may be sufficient to ensure that local application owners abide by specific modeling norms and standards.
  5. Strategically implement technology. These tools can be expensive, and you might not need them for the first domain, but eventually your data models will grow exponentially in size and complexity. Consider a data catalog, business glossary, and data dictionary, or a single platform that can serve as all three. Without such tooling, consumption (and hence value creation) will be poor.

A final recommendation, to find and empower a master modeler, deserves its own section.

The role of the master modeler

Photo by Usman Yousaf on Unsplash.

A “master modeler” is an expert in the field of data modeling who possesses an in-depth understanding of modeling techniques and tools, who can define best practices, and who, critically, has the business savvy not to let “the perfect” distract from “the good.” Someone like that can take ownership of and navigate the five recommendations above.

In addition, master modelers can provide guidance and support to all logical and physical data modelers in the organization. This can be done by creating reusable artifacts like the aforementioned metamodel, but also basic data modeling blueprints and instruction manuals. Master modelers can also safeguard and socialize best practices by defining standards and guidelines and by facilitating training. By staying up to date with the latest trends and advancements, they can introduce new techniques and tools to the organization that drive continuous improvement. They can also advise on whether to leverage existing, premade data models instead of building them from scratch.

Data modeling is a critical input into any effort to design or optimize solutions, products, and services. Business teams can benefit from the support provided by a master modeler, as it accelerates their efforts and ultimately improves the overall quality and consistency of the organization’s data models.

Next steps

Data modeling is a critical capability that helps organizations ensure that their data is well-designed, consistent, and effective in supporting business needs. By focusing on sub-capabilities such as metamodeling, conceptual and logical data modeling, and physical data modeling, organizations can grow their maturity, and enable business users and data scientists to consistently find the right data for their respective use cases. If all of this sounds like too much to take on all at once, your first step might be to find a master modeler.

All images unless otherwise noted are by the author.
