Formal approach to modelling a multiversion data warehouse

B. BEBEL, Z. KROLIKOWSKI, and R. WREMBEL
A data warehouse (DW) is a large centralized database that stores data integrated from multiple, usually heterogeneous, external
data sources (EDSs). DW content is processed by so-called On-Line Analytical Processing (OLAP) applications, which analyze business
trends and discover anomalies and hidden dependencies in the data. These applications are part of decision support systems. EDSs
constantly change their content and often change their structures. These changes have to be propagated into a DW, causing its
evolution. The propagation of content changes is implemented by means of materialized views, whereas the propagation of structural
changes is mainly based on temporal extensions and schema evolution, which limits the application of these techniques. Our approach
to handling the evolution of a DW is based on schema and data versioning. This mechanism is the core of a so-called multiversion
data warehouse. A multiversion DW is composed of a set of DW versions. A single DW version is in turn composed of a schema version
and the set of data described by that schema version. Every DW version stores a DW state that is valid within a certain time period.
In this paper we present: (1) a formal model of a multiversion data warehouse, (2) a set of operators, with their formal semantics,
that support DW evolution, and (3) an impact analysis of the operators on DW data and user analytical queries. The presented formal
model was the basis for implementing a multiversion DW prototype system.
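
The composition described above (a multiversion DW as a set of DW versions, each pairing a schema version with its data and a validity period) can be illustrated with a minimal Python sketch. This is not the formal model of the paper: the class names, the field names, and the choice of a half-open validity interval are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SchemaVersion:
    # Hypothetical representation: table name -> tuple of column names.
    tables: tuple


@dataclass
class DWVersion:
    # One DW version = a schema version + the data described by it
    # + the time period in which this DW state is valid.
    schema: SchemaVersion
    rows: dict                       # table name -> list of row dicts
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid" (assumption)

    def is_valid_at(self, day: date) -> bool:
        # Assumed half-open interval [valid_from, valid_to).
        return self.valid_from <= day and (
            self.valid_to is None or day < self.valid_to
        )


@dataclass
class MultiversionDW:
    # A multiversion DW is the collection of its DW versions.
    versions: list = field(default_factory=list)

    def version_at(self, day: date) -> Optional[DWVersion]:
        # Return the first DW version whose validity period covers `day`.
        for version in self.versions:
            if version.is_valid_at(day):
                return version
        return None
```

A query that refers to a given date would first call `version_at` to select the DW version whose state was valid at that time, and then be evaluated against that version's schema and data.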

Keywords: schema evolution, data evolution, schema versioning, data versioning, multiversion data warehouse, formal model