Enterprise Data Warehouse approach - definition and scope
The Enterprise Data Warehouse (EDWH) approach tries to integrate all in-house Business Intelligence apps into one consolidated Data Warehouse. The goal is to provide a one-stop source in which all relevant business data is available in high quality, independant from the original source systems, in it's whole history and aligned to a common business directory and metadata model.
At DNVGL, the EDWH approach is trying to enhance to effects of Business Intelligence by integrating existing on premise BI solutions, the new Oracle BI Cloud (BICS) initialtive and OTBI reporting. Oracle Transactional Business Intelligence (OTBI) is focused on quick delivery of insight on specific Oracle cloud apps data using a custom, closed-shop data model. EDWH is focused on integrating enterprise data independent of source system using a consolidated data model.
Layers and architecture
For data to be used for Enterprise Business Intelligence, it has to be fetched from the source system, integrated, consolidated, aligned and presented to BI tools. This is done moving the data between several architectural layers. This process is called ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform).
In the STAGE, data is fetched from the source systems. This can be done by using a database connection (database link) or by reading files delivered in XML or other formats. The source system data is imported to the stage layer as it is modeled in the source system. The only addition may be DWH-specifiy metadata columns for each stage table: a column identifying the original source system, and another column that states the time this data was first seen in the data warehouse.
In addition to the stage (STG) layer, a pesistant stage layer (PSA) may be established. In this layer, every load iteration is preserved by partitioning using the DELTA DATE column. This enables the ability to reload every iteration of source system data into the subsequent DWH layers.
In a Data Warehouse it is recommended to implement a central Core layer. The core layer is where integration, consolidation and historization are done. To integrate data from various heterogenous sources, it has to be loaded into a common, unified data model. The Core is the only source for the Data Marts that are used for BI applications.
There are two commonly used approaches for DWH core modeling:
Relational Core Modeling
In this approach, data is consolidated and historized in a form as nearly derived from the original structure as possible. Most data usually is in the third normal form (3NF) when originating from a relational database at the source system.
When existing structures – especially relationships – are changed, the effort for data migration in the Core may be high. For example, it may be simple to add a new foreign key column to an existing table. But what contents must be filled in for the historical data records?
Data Vault Modeling
Data Vault modeling, invented by Dan Linstedt in 1990, is a modeling approach specialized to model the Enterprise Data Warehouse Core layer. This approach is suitable for multi-source environments needing a fast adaptation to changes. The model consists of a complete denormalization (“Unified Decomposition” mentioned by Hans Hultgren) of three base constructs into entities:
Data delivery from a Core Data Vault model can generate performance issues, because of the multiple “joins” that are sometimes outer in case all the records are not in the Satellite of a Hub. If the model contains point-in-time tables, the queries to reconsolidate the historized dimensions or facts through multiple SCD2 Satellites is very complex and, again, increase the number of “joins” in the queries.
Recommendations for Core Modeling
Relational Modeling is useful for the Core layer if the analytical requirements are diffuse or not yet defined or if the purpose of the Data Warehouse is the delivery of information to different kind of BI applications and other target systems. In combination with Master Data Versioning, a relational Core gives flexible support of data historization and allows delivering current and historical data to the Data Marts. The ETL processes to load a relational Core are usually easier than for a dimensional Core, but the data delivery to Data Marts can be more complex – especially if the Core model is derived from the data models of the source systems. A relational Core is recommended if the source system structures are relatively stable and if a highly integrated Core data model is required.
Fundamental for all modeling approaches is that the method is accepted by the DWH development team, and all architects and developers understand the concepts, benefits and issues of the chosen modeling approach.
Warehouse or Data Store?
The effort to model data, using the architure concepts mentioned above, is growing in the pace we consolidate source data in the warehouse. To keep the modeling effort minimal, we can decide to keep data transformations at a minimum and use source system data in it's current form whenever possible. This is the concept of Oracle Transactional Business Intelligence (OTBI) and the BI Apps. Data is taken from the source system and directly delivered to the BI framework, or just treated minimally in a dimensional data mart. In this way, most of the benefits of an enterprise data warehouse (EDWH) cannot be realized, though (like enterprise data governance, commonmetadata directories and enterprise data integration and historization).
The Data Mart layer is used for BI instruments like Oracle Business Intelligence (OBIEE) to deliver data for reports, dahboards and analysis. Data is modeled in dimension form as a star schema.
Dimensional modeling is a set of techniques and concepts used in Data Warehouse design proposed by Ralph Kimball, which is oriented around understandability and performance and which uses the concepts of facts (measures), and dimensions (context). Most reporting tools and frameworks require a dimensional model, business users intuitively understand a dimensional model and are able to formulate queries. A dimensional model is a conceptual model rather than a logical/physical data model, since it can be used for any physical form, e.g. as relational or multidimensional database. If a dimensional model is implemented in a relational database, the most commonly used schema types are star and snowflake schema. The latter one normalizes dimension tables, in a star schema the dimension tables are denormalized.
Tools and methodology
Database (cloud approach)
Stage, Core and Mart layers need a relational database (RDBMS) as underlying data storage. Wheras as technically possible with PostgreSQL or other relational databases, practical and corporate considerations suggest to use Oracle RDBMS as data sink.
If availability of the DWH platform is critical, a standby database can be created using Data Guard. In case of any failure of the primary instance, the standby instance can be brought online in short time.
Data Modeling should be done using standard tools, e.g. Oracle SQL Developer Data Modeler. Data Modeler can be configured to use a version control system for best collaboration results. Recommended are Subversion or Git (available in the newer Data Modeler Versions).
Extract, Transform an Load data
To stay in the Oracle world, Oracle Data Integrator ist the ETL (or ELT) tool delivering best results with the other technologies used. ODI has the capability to push data to the Oracle Cloud Database Service, and to obtain data from various sources, including Oracle and Non-Oracle databases and files as CSV or XML.
BI (in the cloud)
Although there are several alternatives available on the market (like Pentaho or Tableau), Oracle Business Intelligence still seems to be the market leader and is the standard tool for DNVGL.
In addition to drill-down analysis using the OBI frontend, there are dashboards and traditions reports then can be delivered.