Handling inter-dimensional members dependency and reducing cube sparsity using reference dimensions in Analysis Services 2005 SP2 : Cube design tip

Traditionally the cubes structure is designed based on either star or snowflake schema. The dimensions in the cube is totally independent of each other and in the results can be obtained for any dimension members to any dimension. This is a classic illusion of an ideal world, i.e. even though the intention is good but the performance suffers since the cube is very sparse.

High cube sparsity adversely affect the MDX query performance. A very sparse cube can take longer to resolve calculated members and calculated cells, and MDX functions involving empty cells, such as CoalesceEmpty or NonEmptyCrossjoin, take slightly longer to process because of the large volume of empty cells that must be considered by such functions.

For example, the following diagram indicates three dimension hierarchies used to construct a cube for tracking orders.

StarSchema

Each customer, product & time dimensions have 5 members each. In the above design, between customer, product and time, theoretically 125 cells are possible ( 5 customer X 5 product X 5 time members).

However, in reality, may times, the existence of a real measure of a given dimensions is dependent on other dimensions. E.g. Sales rep can handle a few given territories, the given customer is handled by a few sales rep and the customer purchase a given set of products, some products are not even sold in other territories. So, up front in the design, we are sure that there would be a lot of dimension combinations for which real data would not exist.

Suppose customer and product dimensions have dependency on each other and the following following 11 valid combinations exists:

IntermediateDimension

And suppose, these 11 combinations have data existing for all the 5 time periods, then the total number of valid combinations are 55 (11 customer-product X 5 time members).

It means that the actual density of the cube is only 44 % (55 real measures / 125 theoretical measures).

This cube is a theoretical example; in reality, many dimensions are much more sparse than indicated in this example. E.g. we have assumed that 11 customer product combinations have purchase history for all the 5 time periods, but if they have on an average 2 purchases out of five, then the actual number of cells would drop to 22 (11 customer-product X 2 time period) and the sparsity would drop further to 17% (22 real measures / 125 theoretical measures).

By themselves, the dimensions do not appear overly sparse; each dimension has members with relevant fact table data. If these dimensions are used together in a cube, however, the sparsity of the cube increases exponentially with each dimension, because the introduction of each dimension exponentially increases the number of cells within the cube. The above diagram, the shaded cells on customer - product face indicate the cells that actually contain data.

Dimensions with unrelated data can also greatly increase cube sparsity, especially if the dimensions are included as part of an associative relationship. For example, a business case is designed to compare the sales from retail customers with the sales from vendors, so a cube with three dimensions representing sales, customers, and vendors is created. The Customers dimension organizes retail customers by location, the Vendors dimension organizes vendors by sales region and vendor type, and the Orders dimension organizes order quantities by date. Both the Customers and Vendors dimension share elements with the Orders dimension, but not with each other. Because customers and vendors do not directly relate, from the viewpoint of the underlying data source, the result is a very sparse cube. In this case, it is easier to construct two cubes, one for vendors and one for customers, which share the Orders dimension.

Conversely, if beforehand, we know that dimensions are dependent on each other, then we can reduce the cube sparsity at the cube design level itself. Analysis Services 2000 did not support "Reference Dimensions" but it is something we can utilize in Analysis Services 2005.

We can design an intermediate dimension which contains the valid combinations of customer and product members at leaf level. This intermediate dimension joins with the fact table. The Customer and product dimensions are joined to fact table as a "Reference Dimension" utilizing the intermediate dimension.

The below diagram illustrates the conversion of above star schema to a "Reference-Intermediate" dimension structure. The shaded region is the valid combination of Customer-Product intermediate dimension and the time dimension, i.e. 55 cells and the sparsity is zero.

ReferenceDimensionCube

 

The intermediate dimension can be made invisible so that to a external OLAP cube consumer, there is no difference in the cube browsing experience even though the underlying cube structure has been changed drastically. The MDX queries too would run faster since the cube sparsity is reduced drastically by use of intermediate dimension.

7 comments:

Anonymous said...

Pingback. Link to this post was added to our site in the [Articles]/[Design] section:
http://www.ssas-info.com/analysis-services-articles/62-design/872-handling-inter-dimensional-members-dependency-and-reducing-cube-sparsity-using-reference-dimensions-in-analysis-services-2005

Unknown said...

Very Good Article. Good advise. Make sense and speed is very important this days.

Pedro said...

Very good article and blog!!
Pedro

Anonymous said...

Hi Ajit,

I want some help on excel connectivity to AS, can you help?

Thanks
Nitin Murkute
nitin.murkute@clariontechnologies.co.in

Darren Gosbell said...

Hi Ajit,

You also don't mention whether you used materialized or non-materialized reference relationships. Materialized reference relationships should not be any different from having the keys in the fact table, as the key from the referenced dimension is stored in the partition at processing time.

And I have my doubts about non-materialized reference relationships as they need to resolve the join between the intermediate and the referenced dimension at runtime, so I would be interested to see the measurements from any testing you have done.

Regards
Darren (http://geekswithblogs.net/darrengosbell)

Sam Kane said...

Here are this and some other articles on SSAS Reference Dimension

http://ssas-wiki.com/w/Articles#Reference_Dimension

Utsav said...

Doesn't it push snow flake schema?? There are lot of issues already discussed with snow flake over internet.

My Articles

Design

Cube structure optimization for MDX query performance in Analysis Services 2005 SP2: Tips for Parent Child Hierarchies usage

Fact table design for “State Workflow Analysis”: Analysis Services Dimensional modeling

Handling inter-dimensional members dependency and reducing cube sparsity using reference dimensions in Analysis Services 2005 SP2 : Cube design tip

Identifying intra-dimensional members relationship and reducing cube sparsity in Analysis Services 2005 SP2 : Cube design tip

Leaves() : An example to understand it for both regular hierarchies as well as parent child hierarchies

Aggregation design: useful tips

Level based attribute hierarchy: MDX query performance woes in SQL Server 2005 SP2: Is it fixed in post SP2 hotfix?

Parent child hierarchy to level base hierarchy conversion: hiding placeholder dimension members in client application

Trouble / Troubleshooting

Aggregate(), Sum() functions using calculated members does not work in Analysis Services 2005 SP2 (9.00.3042.00 version) but works in Analysis Services 2000 SP4

Analysis Services 2005 migration tool: Custom member formula issues in migrated database

Cube Partitions: Fact table not listing in Business Intelligence Development Studio in partition wizard

Analysis Services 2005: Many-to-Many relationship does not support unary operators with parent-child dimension

MDX

NextAnalytics and MDX : Part 1 - Swap Cells with Row Labels

Selecting dimension's default member based on a member property

Sorting members on member codes / member properties

Time Dimension: How to set Default Member to Current Month

Setting dynamic default member in dimension X based on the current member of dimensions Y

ADOMD.NET

Code : utility code for converting cellset to a data table

Others

Google specialized search for Analysis Services and MDX web resources integrated in my blog

Art of reading MDX articles

MDX Expression Builder : Need for a tool making it easier for functional users to write MDX expressions, queries.

Blogroll