From Data Model to Ontology, Evolution of Enterprise Data Management Practices
This is the second installment in my series on Ontology, continuing from my recent post. Here I examine why we need Ontology-based data management practices in the first place, and how we might go about building an enterprise system that leverages the semantic nature of data and not just its structural nature, as we have been doing for decades.
Ever since the first commercial databases came to market some 40 years back, data modeling has been evolving to help users conceptualize, capture, store and analyze data. Data is considered a vital organizational resource that needs to be managed and used like any other business asset. Most organizations, as they grew, discovered the importance of organizing data about their internal operations as well as their external relationships. This led to the evolution of a practice called “Data Modeling”.
Data Modeling has been vital in giving ‘shape’ or ‘structure’ to abstract pieces of information so that they can be queried, analyzed and modified in a systematic manner. It was most vital from the point of view of machines, because computer programs could not understand abstract concepts and needed structured information. This further led to the development of SQL (Structured Query Language) and similar languages (often DBMS-specific), whose main intention was to give the collected information its much needed ‘structure’.
For the decades that followed, structured information was enough. It described content, constraints and relationships very well. SQL was king, and its rapid adoption led to wide application and job creation. It was the de facto standard for data query and analysis. But there was a problem: it was built for people who understood the ‘structure’ of data. The very concept of structure, which had led to the development of the DBMS, was proving to be a hindrance when it came to exploring data at a logical or semantic level. The common issues were:
- Data Models were too preoccupied with the ‘physical’ form of data, such as tables, attributes, attribute data types, lengths and so on. Even though people wanted to enforce practices such as logical data modeling and conceptual data modeling, the final, machine-understandable form eventually became more important and, in most cases, the only concern. It is common to observe even today that organizations have huge discrepancies between their logical and physical data models. The implications are far reaching. Because logical data models are the means of exchanging information between users (the physical form may be too complex for everybody to understand), these discrepancies lead to inefficiency and mistakes. The only solution was to ask anyone who wanted to deal with data to become a DBMS specialist.
- Since not everyone could be a DBMS specialist, data increasingly became the property and prerogative of a few, belonging to the traditional IT (Information Technology) or BI (Business Intelligence) teams. They were the ones who understood the physical structure of data and were proficient in SQL and other DBMS technologies, and other departments had to depend on them. For many decades this was not a problem, because both the amount of data and what we could do with it were limited. Over the last decade, confining data to the hands of a few has become more of an issue. It created major roadblocks to empowering employees to innovate. Giving on-demand, real-time access to the organization’s data was the stepping stone towards making innovation part of company culture, and the lack of it prevented new kinds of analysis, new ideas and new decisions from being pursued with ease and efficiency. In today’s terms, this meant “Data could not be democratized”. This brings us to the next issue.
- Data could not be understood in the same way by different people in the same organization. As data was structured into tables and attributes, it was altered in most cases, losing or gaining information through the enforcement of data types, length restrictions, tokenization, transformation and so on. We would mostly extract what we thought was meaningful and deal with that alone. And as only a handful of people did this work in an organization, the issue was multiplied. Information that was meaningful to some might be discarded as meaningless. Information that some interpreted in one way would get captured in a different way. It became common to see two departments referring to the same information using different names or, even worse, referring to different pieces of information using the same name. The problem went beyond just names once people started to explore the inter-relationships between different information elements.
As organizations grew, they produced more data, and there were more opportunities to develop innovative services and products and to optimize existing ones, making data a major business asset. The constraints imposed by the traditional way of analyzing data only after modeling it emerged as an issue. It was hard to democratize data, it was hard to understand data with a shared meaning, and it was hard to work with data only in its physical form.
The word “Ontology” literally means “studying the way things are” or, in simple terms, “studying the real meanings” rather than the “form”. In data management and data analysis, it translates into an ecosystem that allows users to explore data in its real sense (or meaning), unconstrained by the way the data is physically represented. While physical data (the data model and related concepts) was needed for machines, humans needed more flexible and natural ways to explore data, using its meaning and relating to data elements as ‘things’ and ‘concepts’.
Once we have identified which data elements correspond to things and which to concepts, building the Ontology involves defining the logical relationships between things and other things, between things and concepts, between concepts and other concepts, and so on. As a result, one is able to express ideas of the ‘real world’ in a flexible way, for example “a mobile device connects to a cell tower”, “a cell tower generates signaling logs”, “a signaling log contains a mobile number”, “a mobile number belongs to a customer” etc. A typical Ontology description of ‘Telecom Customer’ may look like the following *:
* picture courtesy Wu Jianlin et al., Beijing University of Posts and Telecommunications
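The example statements above (“a mobile device connects to a cell tower” and so on) can be written down as subject-predicate-object triples. The following is only a minimal sketch in Python, not a real ontology toolkit: the names TRIPLES, build_index and find_path are made up for illustration, and a production system would more likely use a dedicated ontology language and store.

```python
from collections import defaultdict, deque

# Each fact is a (subject, predicate, object) triple, taken from the
# example sentences in the text. The representation itself is an assumption.
TRIPLES = [
    ("mobile device", "connects to", "cell tower"),
    ("cell tower", "generates", "signaling log"),
    ("signaling log", "contains", "mobile number"),
    ("mobile number", "belongs to", "customer"),
]


def build_index(triples):
    """Group outgoing relationships by subject for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def find_path(index, start, goal):
    """Breadth-first search for a chain of relationships from start to goal.

    Returns the list of triples that link the two, or None if no chain exists.
    Relationships are followed only in the direction they are stated.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, predicate, obj)]))
    return None


if __name__ == "__main__":
    index = build_index(TRIPLES)
    for subject, predicate, obj in find_path(index, "cell tower", "customer"):
        print(f"a {subject} {predicate} a {obj}")
```

Asking how a cell tower relates to a customer walks the chain of statements without any reference to tables or columns, which is the point of working at the level of things and concepts.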
Building these ‘logical’ meanings and relationships between data elements paves the way for exploring data in its real sense and not as it may appear in the DBMS and data stores. For example, it becomes possible to explore a relationship like “customers belong to a cell tower”, and when we want information about a customer, we can pull information about the cell towers the customer has belonged to over time. The biggest advantage is being able to specify rules and queries without knowing the underlying DBMS, schema, table structures and so on.
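One way to picture querying by business terms is a thin mapping from those terms to the physical schema. The sketch below is an assumption-laden illustration: the table and column names (cust_mst, twr_dim, cust_twr_hist), the sample rows and the helper functions are all hypothetical, and an in-memory SQLite database stands in for whatever DBMS an organization actually runs.

```python
import sqlite3

# Hypothetical mapping from business terms to physical tables and columns.
ONTOLOGY_MAP = {
    "customer": {"table": "cust_mst", "key": "cust_id", "label": "cust_nm"},
    "cell tower": {"table": "twr_dim", "key": "twr_id", "label": "twr_nm"},
}

# Hypothetical mapping from a business relationship to the link table behind it.
RELATIONS = {
    ("customer", "belongs to", "cell tower"): {
        "table": "cust_twr_hist", "from_key": "cust_id", "to_key": "twr_id",
    },
}


def build_query(subject, predicate, obj):
    """Translate a business-level relationship into SQL over the physical schema."""
    s, o = ONTOLOGY_MAP[subject], ONTOLOGY_MAP[obj]
    r = RELATIONS[(subject, predicate, obj)]
    return (
        f"SELECT o.{o['label']} FROM {s['table']} s "
        f"JOIN {r['table']} r ON r.{r['from_key']} = s.{s['key']} "
        f"JOIN {o['table']} o ON o.{o['key']} = r.{r['to_key']} "
        f"WHERE s.{s['label']} = ? ORDER BY o.{o['label']}"
    )


def related(conn, subject, predicate, obj, subject_name):
    """Answer 'which <obj> does <subject_name> <predicate>?' in business terms."""
    sql = build_query(subject, predicate, obj)
    return [row[0] for row in conn.execute(sql, (subject_name,))]


def demo_connection():
    """Create a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cust_mst (cust_id INTEGER PRIMARY KEY, cust_nm TEXT);
        CREATE TABLE twr_dim (twr_id INTEGER PRIMARY KEY, twr_nm TEXT);
        CREATE TABLE cust_twr_hist (cust_id INTEGER, twr_id INTEGER);
        INSERT INTO cust_mst VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO twr_dim VALUES (10, 'Tower A'), (11, 'Tower B');
        INSERT INTO cust_twr_hist VALUES (1, 10), (1, 11), (2, 11);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(related(conn, "customer", "belongs to", "cell tower", "Alice"))
```

The caller only says “customer belongs to cell tower”; the cryptic physical names stay inside the mapping, so the existing tables can remain as they are.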
Evolving From Data Model to Ontology:
Ontology is important in the sense that it talks about the ‘actual’ meaning of the data rather than its ‘form’ or ‘structure’. As a result, a well-defined Ontology is closer to humans and suitable for business usage (by people who are good at describing what they want but may not be good at expressing it in terms of Tables, Attributes, APIs etc.). This enables any layman in the organization to explore the data landscape mostly using ‘natural language’ or ‘business terms’.
It would be wrong to think that Ontology is a “Smarter” replacement for the Data Model. It is not. It essentially opens a new way of looking at and exploring data. Organizations need not throw away their existing investment in DBMS technologies, but should rather think of a way to evolve existing “Data Model” oriented systems into “Ontological” systems. In other words, this needs an adaptation from the physical layer to the logical layer. This idea is explained at a high level in the diagram below:
While the Data Model is the physical representation of the data, close to the actual storage or DBMS systems, the Ontology is closer to the business view, providing meaningful entities and relationships to business users. How do we evolve our product portfolios from a traditional Data Model driven architecture to an Ontology driven architecture? The following diagram shows a high-level way:
Most enterprises with a serious and strategic approach towards enterprise data management have already deployed some kind of architecture that supports creating and managing data models, data governance, business glossaries (data dictionaries) and so on. The existing investment is good enough for the traditional BI and IT teams, but not suitable for teams closer to business and investment planning. For them, a platform that enables them to semantically manage and explore the data in the system is more valuable, even though it comes with an additional investment.
Evolving from Data Model oriented analysis to an Ontology based method opens up new possibilities. It allows data to be viewed from a domain perspective, in a way suitable for different business functions within the organization. Empowering employees with the ability to explore data and extract insights from it leads to innovation and ultimately strengthens the organization’s business. Adopting the Ontological way of capturing and analyzing information does not require discarding the existing ‘Data Model’ oriented systems; they can be reused by introducing appropriate tools and platforms on top of the existing Data Warehouse infrastructure.