Database correlation

Database correlation is the degree of dependency between database entries within a database. Here, a database entry is the smallest physical unit within a database system that is stored on data storage in one piece. Database entries are atomic in the sense that they are addressable and cannot be divided physically. Typically, object instances stored in the database, MEMO or BLOB entries, indexes or parts of those are stored as database entries.

Database entries depend on each other in different ways. Typically, one database entry refers to one or more other database entries (e.g. by some kind of data entry pointer or database address).

D1.1: When a database entry A refers to an entry B, A correlates with B; in other words, a database entry correlation exists between A and B.

The way database entries refer to each other depends on the specific database system, i.e. the reference might be a database address but also a value that identifies an instance in a table (e.g. a primary key). Another way of expressing dependencies is through set relations: a subset correlates with its superset, and so on.

D1.2: The degree of database correlation, which results from the average number of database entry correlations within a database, is called the database correlation factor (DCF).

The DCF is a measure of the complexity and order of a database instantiation, which may influence versioning and transaction strategies. Client/server strategies are not influenced much by the DCF, but the preferred transaction strategies resulting from the DCF may influence the choice of the proper client/server model.

D1.3: Database entry correlations maintained by the database management system (DBMS) are called physical database entry correlations.

D1.4: The physical database correlation factor (DCF(p)) is the average number of physical database entry correlations per database entry within a database instantiation.
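As a minimal sketch of D1.4, assume the physical references of a database instantiation have been exported into a mapping from each entry identifier to the set of entries it refers to. The export format, the entry names and the function name `physical_dcf` are illustrative assumptions and do not correspond to any particular DBMS.

```python
def physical_dcf(references):
    """Average number of physical entry correlations per database entry.

    `references` maps each entry id to the set of entry ids it refers to
    (hypothetical export format). Each reference counts as one
    correlation, following D1.1.
    """
    if not references:
        return 0.0
    total = sum(len(targets) for targets in references.values())
    return total / len(references)


# Hypothetical instantiation: two instances, one index entry, one BLOB entry.
references = {
    "person:1": {"blob:7"},
    "person:2": set(),
    "index:name": {"person:1", "person:2"},
    "blob:7": set(),
}
print(physical_dcf(references))  # 0.75
```

Three references spread over four entries give a DCF(p) of 0.75 in this example.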

Besides physical correlations, conceptual correlations might be defined within a database. Conceptual correlations are those that are defined in terms of specific rules describing conceptual database consistency. For example, a rule that an employee may never earn more than his boss describes a conceptual correlation between employee and boss. When subset relations are not supported by the DBMS, they might be defined as consistency rules, in which case a conceptual correlation exists between subset and superset.

D1.5: When updating a database entry A creates side effects on a database entry B, or when the current state of A influences updating B, A has a conceptual correlation with B, i.e. a conceptual database entry correlation exists between A and B.
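The employee and boss rule mentioned above can be sketched as a consistency check. The `Employee` class and the function `can_set_salary` are hypothetical names chosen for illustration; a real application would typically enforce such a rule in its functional model or through a constraint mechanism of the DBMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    name: str
    salary: float
    boss: Optional["Employee"] = None


def can_set_salary(employee, new_salary):
    """Consistency rule: an employee may never earn more than the boss.

    The current state of the boss influences whether the employee may be
    updated, which is a conceptual correlation in the sense of D1.5.
    """
    return employee.boss is None or new_salary <= employee.boss.salary


boss = Employee("Ada", 9000.0)
clerk = Employee("Bob", 4000.0, boss)
print(can_set_salary(clerk, 5000.0))  # True
print(can_set_salary(clerk, 9500.0))  # False
```

The sketch covers only one direction of the rule; lowering the salary of the boss would have to be checked against all subordinates in the same way.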

D1.6: The conceptual database correlation factor (DCF(c)) is the average number of conceptual database entry correlations per database entry within a database instantiation.

In many cases, conceptual correlations are based on physical ones. Since two database entries either correlate or do not, a conceptual correlation that exists in addition to a physical one between the same entries will not influence the DCF. Since conceptual correlations are difficult to detect, the physical DCF might be sufficient in many cases for choosing the right versioning or transaction strategy.
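The point that a conceptual correlation on top of a physical one does not change the DCF can be illustrated by treating correlations as a set of entry pairs. This is only a sketch: the pair representation, the entry names and the function name `overall_dcf` are assumptions made for the example.

```python
def overall_dcf(entry_count, physical, conceptual):
    """DCF over the union of physical and conceptual correlations.

    Correlations are (A, B) pairs. A pair that is both physical and
    conceptual is counted once, because two entries either correlate
    or they do not.
    """
    if entry_count == 0:
        return 0.0
    combined = set(physical) | set(conceptual)
    return len(combined) / entry_count


# Hypothetical database with five entries.
physical = {("order:1", "customer:1"), ("order:2", "customer:1")}
conceptual = {("order:1", "customer:1"), ("employee:2", "employee:1")}
print(overall_dcf(5, physical, conceptual))  # 0.6
```

Only the conceptual correlation between the two employee entries adds to the result; the one between `order:1` and `customer:1` is already covered by the physical reference.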

The DCF needs to be determined for each database. Most correlations are defined by indexes, since each index entry defines a correlation between the collection and the related instance. This, however, is not very important when considering versioning or transaction strategies, since managing indexes properly is handled by the DBMS and cannot be controlled by the application. Moreover, correlations between index entries and instances are rather simple and do not cause problems, as we will discuss in the following chapters.

Since database instance entries might correlate indirectly (e.g. via an index) with each other, instance entry correlation is described as follows:

D1.6: Instance entry A and instance entry B correlate with each other when a database entry correlation exists between A and B, or when there is a correlation path A, E1, ..., En, B in which a database entry correlation exists between each pair of adjacent elements, i.e. between A and E1, between Ei and Ei+1 (0 < i < n), and between En and B, and in which no database entry Ei is an instance entry. In this case, a database instance correlation exists between A and B.

Whether CLOB and BLOB entries are counted as instance entries depends on the physical storage model the DBMS is based on. Often, CLOB and BLOB fields are stored in separate database entries, in which case they are considered database instance entries.

D1.7: The database instance correlation factor (DCF(i)) is the average number of database instance correlations per instance entry within a database instantiation.
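The definitions of instance correlation and DCF(i) can be sketched as a graph search that follows correlations through non-instance entries only. Two assumptions of this sketch are not stated in the definitions: correlations are treated as symmetric, and each correlated pair is counted once for each of its two instance entries. The data layout and the function names are hypothetical.

```python
from collections import defaultdict


def instance_correlations(references, instances):
    """Return the set of unordered instance pairs that correlate.

    `references` maps an entry id to the entry ids it refers to;
    `instances` is the set of ids that are instance entries.
    """
    neighbours = defaultdict(set)
    for source, targets in references.items():
        for target in targets:
            neighbours[source].add(target)
            neighbours[target].add(source)  # assumption: symmetric

    pairs = set()
    for start in instances:
        seen = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for nxt in neighbours[current]:
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in instances:
                    # The path ends at an instance entry; do not pass through it.
                    pairs.add(frozenset((start, nxt)))
                else:
                    # Non-instance entry (e.g. an index): keep following the path.
                    stack.append(nxt)
    return pairs


def instance_dcf(references, instances):
    """Average number of instance correlations per instance entry."""
    if not instances:
        return 0.0
    pairs = instance_correlations(references, instances)
    return 2 * len(pairs) / len(instances)


references = {
    "index:name": {"person:1", "person:2"},
    "person:1": {"address:1"},
}
instances = {"person:1", "person:2", "person:3", "address:1"}
print(len(instance_correlations(references, instances)))  # 2
print(instance_dcf(references, instances))  # 1.0
```

Here `person:1` and `person:2` correlate through the index entry, and `person:1` correlates directly with `address:1`. `person:2` and `address:1` do not correlate, because the only path between them passes through the instance entry `person:1`.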

The DCF(i) is a good measure of the complexity of a database instantiation. Independently of a specific database instantiation, a model correlation factor can be determined from the model definition of a database or object model.

D1.8: The database model correlation factor (DCFM) is the average number of potential database instance correlations in a database, i.e. the average number of relationships (or references) defined per complex data type.
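As a rough illustration of D1.8, assume the model definition is available as a mapping from each complex data type to the list of its reference attributes. The model, the type names and the function name `model_dcf` are invented for this example.

```python
def model_dcf(model):
    """Average number of relationships defined per complex data type.

    `model` maps each complex type name to the list of attributes that
    reference another type (hypothetical model description).
    """
    if not model:
        return 0.0
    return sum(len(refs) for refs in model.values()) / len(model)


# Hypothetical object model with three complex types.
model = {
    "Person": ["address", "employer"],
    "Company": ["address"],
    "Address": [],
}
print(model_dcf(model))  # 1.0
```

Because only the model definition is needed, this value can be computed before any database instantiation exists.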

The DCFM can be measured for database models. Similar to the DCF, an instance correlation factor (DCFM(i)) can be calculated for the model as well as a physical correlation factor (DCFM(p)). The conceptual correlation factor for a database model (DCFM(c)) can be estimated for most database models. Sometimes, there is no conceptual dependency at all defined within a database model, which simplifies versioning and transaction strategies.

For a database instantiation, the DCF(i) can be measured easily. The conceptual correlation factor DCF(c) is difficult to evaluate, and thus the overall DCF is hard to calculate. In order to evaluate the DCF(c), conceptual correlations have to be defined as part of the database model. In practice, those relationships are implemented in the functional model, where they are difficult to extract by programs. The DCF(c) can nevertheless be estimated in order to select proper strategies.

Conclusions

Database correlation factors can be determined on the database entry, instance, conceptual or model level. On each level, database correlation factors can be measured or estimated. While an application is being developed, model correlation factors are the only information available. In order to determine the database correlation factors of an instantiation, a representative database instantiation is required.