From Data Model to Ontology, Evolution of Enterprise Data Management Practices
This is the second installment in my series on Ontology, continuing from my recent post. In this post I examine why we need Ontology-based data management practices in the first place, and how we might build an enterprise system that leverages the semantic nature of data and not just its structural nature, as we have been doing for decades.
Introduction:
Ever since the first commercial databases came onto the market some 40 years ago, data modeling has been evolving to help users conceptualize, capture, store and analyze data. Data is considered a vital organizational resource that needs to be managed and used like any other business asset. Most organizations, as they grew, recognized the importance of organizing data about their internal operations as well as their external relationships. This led to the evolution of a practice called “Data Modeling”.
Data Modeling has been vital in giving ‘shape’ or ‘structure’ to abstract pieces of information so that they can be queried, analyzed and modified in a systematic manner. Thus, Data Modeling was more vital from the point of view of machines, because computer programs could not understand abstract concepts and needed structured information. This further led to the development of SQL (Structured Query Language) and similar languages (often DBMS-specific), whose main intention was to give the much-needed ‘structure’ to the information collected.
Hidden Problems:
For the decades that followed, structured information was enough. It described content, constraints and relationships very well. SQL was king, which led to its rapid adoption and application and to job creation. It was the de facto standard for data query and analysis. But there was a problem: it was for people who understood the ‘structure’ of data. The very concept of structure, which had led to the development of the DBMS, was proving to be a hindrance when it came to exploring data at a logical or semantic level. The common issues were:
- Data Models were too preoccupied with the ‘physical’ form of data: tables, attributes, attribute data types, lengths and so on. Even though people wanted to enforce practices such as logical data modeling and conceptual data modeling, the final form, which was actually machine-understandable, eventually became more important and in most cases the only concern. It is common to observe even today that organizations have huge discrepancies between their logical and physical data models. The implications are far-reaching. As logical data models are the means of exchanging information between users (because the physical form may be too complex for everybody to understand), the discrepancies lead to inefficiency and mistakes. The only solution was to ask anyone who wanted to deal with data to become a DBMS specialist.
- Since not everyone could be a DBMS specialist, data increasingly became the property and prerogative of a few, belonging to the traditional IT (Information Technology) or BI (Business Intelligence) teams. They were the ones who understood the physical structure of data and were proficient in SQL and other DBMS technologies, and other departments had to depend on them. For many decades this was not a problem, because both the amount of data and what we could do with it were limited. Over the last decade, confining data to the hands of a few has become more of an issue. It led to major roadblocks in empowering employees to innovate. Giving on-demand, real-time access to the organization’s data was the stepping stone towards making innovation part of the company culture. The lack of it prevented new kinds of analysis from being done, new ideas from being pursued and new decisions from being taken with ease and efficiency. In today’s terms, this meant “Data could not be democratized”. This brings us to the next issue.
- Data could not be understood in the same way by different people in the same organization. As data was structured into tables and attributes, it lost or gained information in most cases, through the enforcement of data types, length restrictions, tokenization, transformation and so on. We would mostly extract what we thought was meaningful and deal with that alone. And as only a handful of people did this work in an organization, the issue was multiplied. Information that was meaningful to some might be discarded as not meaningful. Information that was interpreted in one way by some would be captured in a different way. It became common to see two departments referring to the same information using different names or, even worse, to different pieces of information using the same name. The problem went beyond names once people started to explore the inter-relationships between different information elements.
As organizations grew, they produced more data, and there were more opportunities to develop innovative services and products and to optimize existing ones, making data a major business asset. The constraints imposed by the traditional way of analyzing data only after modeling it emerged as an issue. It was hard to democratize data, it was hard to understand data with a shared meaning, and it was hard to work with data only in its physical form.
Introducing Ontology:
The word “Ontology” literally means “studying the way things are”, or in simple terms “studying the real meanings” rather than the “form”. In the context of data management and data analysis, it translates into an ecosystem which allows users to explore data in its real sense (or meaning), unconstrained by the way the data is physically represented. While physical data (the data model and related concepts) was needed for machines, humans needed more flexible and natural ways to explore data, using its meaning and relating to data elements as ‘things’ and ‘concepts’.
Once we have identified which data elements correspond to things and which to concepts, the process of building the Ontology involves defining the logical relationships between things and things, things and concepts, concepts and concepts, and so on. As a result, one is able to express ideas of the ‘real world’ in a flexible way, for example “a mobile device connects to a cell tower”, “a cell tower generates signaling logs”, “a signaling log contains a mobile number”, “a mobile number belongs to a customer” etc. A typical Ontology description of ‘Telecom Customer’ may look like the following *:
* picture courtesy Wu Jianlin et al., Beijing University of Posts and Telecommunications
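To make the idea of things, concepts and relationships a little more concrete, here is a minimal sketch in Python. It is only an illustration and not a real ontology language or tool: the four statements from the text are stored as (subject, relationship, object) triples, and the helper names `build_index` and `find_path` are my own assumptions. The search simply follows relationships from one thing to another, without any notion of tables or columns.

```python
from collections import defaultdict, deque

# The example statements from the text, written as (subject, relationship, object).
TRIPLES = [
    ("mobile device", "connects to", "cell tower"),
    ("cell tower", "generates", "signaling log"),
    ("signaling log", "contains", "mobile number"),
    ("mobile number", "belongs to", "customer"),
]


def build_index(triples):
    """Group outgoing relationships by subject so they can be followed quickly."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def find_path(index, start, goal):
    """Breadth-first search: return the chain of triples from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, relation, obj)]))
    return None


index = build_index(TRIPLES)
path = find_path(index, "mobile device", "customer")
for subject, relation, obj in path:
    print(f"a {subject} {relation} a {obj}")
```

The point of the sketch is that the question "how is a mobile device related to a customer?" is answered purely from the declared relationships. A real system would typically use a dedicated ontology language and store, which this toy example does not attempt to model.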
Building these ‘logical’ meanings and relationships between data elements paves the way for exploring data in its real sense, and not as it may look in the DBMS and data stores. For example, it becomes possible to explore a relationship like “customers belong to a cell tower”, and when we want information about a customer, we can pull information about the cell towers the customer has belonged to over time. The biggest advantage is being able to specify rules and queries without knowing the underlying DBMS, schema, table structures and so on.
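One way to picture querying by relationship is a thin logical layer that maps a business relationship onto whatever SQL realizes it. The sketch below uses Python's built-in sqlite3 module with an in-memory database. Everything about the physical side is hypothetical: the table names `cust_mst` and `twr_attach`, their columns, the sample rows and the `explore` helper are invented for illustration only.

```python
import sqlite3

# Hypothetical physical schema; names and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust_mst (cust_id INTEGER, cust_nm TEXT);
    CREATE TABLE twr_attach (cust_id INTEGER, twr_cd TEXT, attach_dt TEXT);
    INSERT INTO cust_mst VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO twr_attach VALUES
        (1, 'T-100', '2015-01-03'),
        (1, 'T-200', '2015-02-11'),
        (2, 'T-100', '2015-01-20');
""")

# Logical layer: a business relationship mapped to the SQL that realizes it.
# Only this mapping needs to know the table and column names.
RELATIONSHIPS = {
    ("customer", "belongs to", "cell tower"): """
        SELECT t.twr_cd, t.attach_dt
        FROM cust_mst c JOIN twr_attach t ON t.cust_id = c.cust_id
        WHERE c.cust_nm = ?
        ORDER BY t.attach_dt
    """,
}


def explore(subject, relation, obj, subject_name):
    """Answer a question phrased in business terms via the mapped SQL."""
    sql = RELATIONSHIPS[(subject, relation, obj)]
    return conn.execute(sql, (subject_name,)).fetchall()


# The caller speaks only in business terms.
towers = explore("customer", "belongs to", "cell tower", "Alice")
print(towers)
```

The business user asks about "customer belongs to cell tower" and never sees the abbreviated table names. If the physical schema changes, only the mapping has to be updated, which is the kind of separation the text describes.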
Evolving From Data Model to Ontology:
Ontology is important in the sense that it talks about the ‘actual’ meaning of the data rather than its ‘form’ or ‘structure’. As a result, a well-defined Ontology is closer to humans and suitable for business users (people who are good at describing what they want but may not be good at expressing it using concepts such as Tables, Attributes, APIs etc.). This enables any layperson in the organization to explore the data landscape mostly using ‘natural language’ or ‘business terms’.
It would be wrong to think that Ontology is a “smarter” replacement for the Data Model. It is not. It essentially opens a new way of looking at and exploring data. Organizations need not throw away their existing investment in DBMS technologies, but should rather think of a way to evolve existing “Data Model” oriented systems into “Ontological” systems. In other words, it needs an adaptation from the physical layer to the logical layer. This idea is explained at a high level in the diagram below:
While the Data Model is the physical representation of the data, close to the actual storage or DBMS systems, the Ontology is closer to the business views, providing meaningful entities and relationships to business users. How do we evolve our product portfolios from a traditional Data Model driven architecture to an Ontology driven architecture? The following diagram shows a high-level approach:
Most enterprises with a serious and strategic approach towards enterprise data management have already deployed some kind of architecture which supports creating and managing data models, data governance, business glossaries (data dictionaries) and so on. The existing investment is good enough for the traditional BI and IT teams, but not suitable for teams closer to business and investment planning. For them, a platform which enables them to semantically manage and explore data in the system is more valuable, even though it comes with an additional investment.
Conclusion:
Evolution from Data Model oriented analysis to an Ontology-based method opens up new possibilities. It allows data to be viewed from a domain perspective, in a way suitable for different business functions within the organization. Empowering employees with the ability to explore data and extract insights from it leads to innovation and ultimately strengthens the organization’s business. Adopting the Ontological way of capturing and analyzing information does not require discarding the existing ‘Data Model’ oriented systems; they can be reused by introducing appropriate tools and platforms on top of the existing Data Warehouse infrastructure.