Handling inter-dimensional members dependency and reducing cube sparsity using reference dimensions in Analysis Services 2005 SP2 : Cube design tip

Traditionally the cubes structure is designed based on either star or snowflake schema. The dimensions in the cube is totally independent of each other and in the results can be obtained for any dimension members to any dimension. This is a classic illusion of an ideal world, i.e. even though the intention is good but the performance suffers since the cube is very sparse.

High cube sparsity adversely affect the MDX query performance. A very sparse cube can take longer to resolve calculated members and calculated cells, and MDX functions involving empty cells, such as CoalesceEmpty or NonEmptyCrossjoin, take slightly longer to process because of the large volume of empty cells that must be considered by such functions.

For example, the following diagram indicates three dimension hierarchies used to construct a cube for tracking orders.

Each customer, product & time dimensions have 5 members each. In the above design, between customer, product and time, theoretically 125 cells are possible ( 5 customer X 5 product X 5 time members).

However, in reality, may times, the existence of a real measure of a given dimensions is dependent on other dimensions. E.g. Sales rep can handle a few given territories, the given customer is handled by a few sales rep and the customer purchase a given set of products, some products are not even sold in other territories. So, up front in the design, we are sure that there would be a lot of dimension combinations for which real data would not exist.

Suppose customer and product dimensions have dependency on each other and the following following 11 valid combinations exists:

And suppose, these 11 combinations have data existing for all the 5 time periods, then the total number of valid combinations are 55 (11 customer-product X 5 time members).

It means that the actual density of the cube is only 44 % (55 real measures / 125 theoretical measures).

This cube is a theoretical example; in reality, many dimensions are much more sparse than indicated in this example. E.g. we have assumed that 11 customer product combinations have purchase history for all the 5 time periods, but if they have on an average 2 purchases out of five, then the actual number of cells would drop to 22 (11 customer-product X 2 time period) and the sparsity would drop further to 17% (22 real measures / 125 theoretical measures).

By themselves, the dimensions do not appear overly sparse; each dimension has members with relevant fact table data. If these dimensions are used together in a cube, however, the sparsity of the cube increases exponentially with each dimension, because the introduction of each dimension exponentially increases the number of cells within the cube. The above diagram, the shaded cells on customer - product face indicate the cells that actually contain data.

Dimensions with unrelated data can also greatly increase cube sparsity, especially if the dimensions are included as part of an associative relationship. For example, a business case is designed to compare the sales from retail customers with the sales from vendors, so a cube with three dimensions representing sales, customers, and vendors is created. The Customers dimension organizes retail customers by location, the Vendors dimension organizes vendors by sales region and vendor type, and the Orders dimension organizes order quantities by date. Both the Customers and Vendors dimension share elements with the Orders dimension, but not with each other. Because customers and vendors do not directly relate, from the viewpoint of the underlying data source, the result is a very sparse cube. In this case, it is easier to construct two cubes, one for vendors and one for customers, which share the Orders dimension.

Conversely, if beforehand, we know that dimensions are dependent on each other, then we can reduce the cube sparsity at the cube design level itself. Analysis Services 2000 did not support "Reference Dimensions" but it is something we can utilize in Analysis Services 2005.

We can design an intermediate dimension which contains the valid combinations of customer and product members at leaf level. This intermediate dimension joins with the fact table. The Customer and product dimensions are joined to fact table as a "Reference Dimension" utilizing the intermediate dimension.

The below diagram illustrates the conversion of above star schema to a "Reference-Intermediate" dimension structure. The shaded region is the valid combination of Customer-Product intermediate dimension and the time dimension, i.e. 55 cells and the sparsity is zero.

The intermediate dimension can be made invisible so that to a external OLAP cube consumer, there is no difference in the cube browsing experience even though the underlying cube structure has been changed drastically. The MDX queries too would run faster since the cube sparsity is reduced drastically by use of intermediate dimension.

7 comments:

Anonymous said...: Pingback. Link to this post was added to our site in the [Articles]/[Design] section:
http://www.ssas-info.com/analysis-services-articles/62-design/872-handling-inter-dimensional-members-dependency-and-reducing-cube-sparsity-using-reference-dimensions-in-analysis-services-2005; 4:26 PM
Unknown said...: Very Good Article. Good advise. Make sense and speed is very important this days.; 1:05 AM
Pedro said...: Very good article and blog!!
Pedro; 5:00 AM
Anonymous said...: Hi Ajit,

I want some help on excel connectivity to AS, can you help?

Thanks
Nitin Murkute
nitin.murkute@clariontechnologies.co.in; 7:47 PM
Darren Gosbell said...: Hi Ajit,

You also don't mention whether you used materialized or non-materialized reference relationships. Materialized reference relationships should not be any different from having the keys in the fact table, as the key from the referenced dimension is stored in the partition at processing time.

And I have my doubts about non-materialized reference relationships as they need to resolve the join between the intermediate and the referenced dimension at runtime, so I would be interested to see the measurements from any testing you have done.

Regards
Darren (http://geekswithblogs.net/darrengosbell); 7:02 AM
Sam Kane said...: Here are this and some other articles on SSAS Reference Dimension

http://ssas-wiki.com/w/Articles#Reference_Dimension; 11:47 PM
Utsav said...: Doesn't it push snow flake schema?? There are lot of issues already discussed with snow flake over internet.; 2:00 PM

Microsoft Analysis Services & MDX - blogs by Ajit Singh