Cube structure optimization for MDX query performance in Analysis Services 2005 SP2: Tips for Parent Child Hierarchies usage

Note: The below findings have been found on the PC hierarchies. The same may or may not be applicable to level based hierarchies since I have not tested them. I need to convert the same PC hierarchies to level based ones and then retest to find out if the similar issues exists with them too.

The bottom line of this blog, Analysis Servies 2005 does not handle rollup operators and custom members on Parent Child hierarchies properly leading to significant MDX query performance bottlenecks.

There are ample amount of material available dealing with cube optimization to improve MDX query performance. The problem is that most of the materials almost give equal weightings to all the tips and we are not able to extract which tip would provide the biggest bang related to query performance.

If you deal with parent child hierarchies extensively in your cube design, many of the query optimization tips don't even apply since parent child dimensions are specific case.

Here, based on my personal experience, I have listed down the factors in the decreasing order:

1. Parent child dimension, rollup operators and custom members on dimensions are the most lethal combinations to kill the query performance (or kill the cube itself, no MDX queries run).

If there are multiple Parent Child hierarchies on which for each hierarchy, we enable rollup operator and custom members, then even for very small cube (5-6 PC hierarchy few hundred members and fact table of 50k rows), the cube just hangs and MDX query don't run. I have validated this in my previous blog. So, disable rollup operator and custom members as much as possible and find alternate ways of dealing with it.

By observing the profiler, millions of lines are generated with frequent occurrence of "get data from calculation cache" or query dimension and I suspect, there is some issue in calculation algorithm in Analysis Services where Parent Child hierarchies with rollup operator and custom members are involved. When I disable all of the rollup operator and custom members, the queries run in a couple of seconds. Even with I retain the custom members and rollup operator on only one PC hierarchy and disable on rest of the of the PC hierarchies, the query performance is good.

2. In MDX query, provide the default members of the parent child hierarchies which are not used in the query.

Per best practices, we would enable "isAggregatable" property = Yes for all the attributes. In the MDX query, if these hierarchies are not needed in the query result, we would not provide any values for other PC hierarchies since they are supposed to provide values for "All" members.

However, I found that if the other PC hierarchies contain lots of calculated members and if we don't provide the default members for these unused PC hierarchis in the MDX query, the Analysis Services seems to calculate the values of the other custom members even if they are not needed and the query runs longer. Providing the default value for other PC hierarchies significantly hastens the query time.

E.g., an MDX query, which ran just under 4 seconds by providing all the default members on "where" clause, ran eternally which I needed to cancel the query after 5 minutes and by that time, it had taken all of available RAM (2gb) and the CPU processing when I did not specify the default members for other PC hierarchies.

3. Rollup operators seems to degrade query performance much more than the custom members on PC hierarchies.

It was strange to find about it. I always though rollup operators must be faster than the custom members for the similar calculations.

In an cube, I disabled rollup operators and enabled custom members on PC hierarchies. The MDX query ran in under 4 seconds. Now, I disabled custom members and enabled rollup operators on the same PC hierarchies. Now the same MDX query, even after 30 min did not run and I needed to cancel the query. This I experienced multiple times even after I started the Analysis Services.

4. Conversion of PC hierarchy to level based hierarchy does not necessarily mean better query performance.

It was again a surprise. Many would suggest that level based hierarchies are better than PC hierarchies, but that advice needs to be taken with a pinch of salt. Level based hierarchies are certainly better than PC hierarchies. However, certain hierarchies are better represented in a PC fashion only e.g. a big employee organization hierarchy or financial account containing Income Statement or Balance sheet etc. The members in these hierarchies do not reside in any specific levels. In fact, the hierarchy continuously undergoes rapid changes whenever newer members are attached or existing members are deleted or moved under some other parents. When this happens, the number of levels changes dramatically and many member shift from one level to another. These hierarchies have many null members in many levels since the members at the leaf level may end in any level not necessarily at the lowest level.

Hierarchies like above when converted to level based one and then used in the cube affect the query time drastically. Under PC hierarchy, the MDX which took about 9 seconds to execute, after converting to level based hierarchy took about 1 min 33 seconds to execute! This effect was validated multiple times.

PC hierarchies where levels can be identified easily or all the leaf level members have proper parents in the above levels would improve query performance when converted to level based ones.

5. Maintain "Calculated Members" and delete unwanted ones

It has been observed that mere presence of calculated members on dimensions affect the query performance even if they are not referred in the MDX query context. We tend to create a lot of junk calculated members are they just hang out in hierarchy without any use. These kind of members needs to be regularly removed from hierarchy.

6. Watch out for slow running MDX functions while creating calculated members

Functions like StrToMember are inherently slow. In words of Mosha, ".....I really dislike StrToMember function for many reasons. For the havoc it wrecks in the query optimizer, for the unpredictable caching guarantees, for the very dynamic binding by means of reparsing its input..."

Slow running functions like this drastically amplify their inefficiencies if they get evaluated in the context of each and every cell of resultant cellset, e.g. if calculated measure use these slow running functions, they would get evaluated for each and every cell if calculated measure is on page level.

Chris Webb on one of his blog explains how to handle StrToMember properly and avoid its evaluation for multiple cells.

7. Take advantage of "Disable Prefetch Facts=True; cache ratio=1" connection properties

Parent child hierarchies don't store the aggregation for intermediate levels. When we execute the MDX query involving the parent child dimension, the analysis services' formula engine uses a number of heuristics when formulating subcube requests from the storage engine to attempt to optimize overall performance. These include fetching data in one subcube request into cache for use by subsequent queries and the ordering of subcube requests for an individual query to optimize the use of cache (by retrieving as much information as necessary to answer a query by using as few noncached subcube query requests as possible). This may result in excessive unnecessary subcube calculations. Per my experience, when the parent child dimensions are involved, using this connection string property has provided minor to many-many multiple time query time benefits.

8. Use effective partitioning strategy

There are many resources available to understand it.

9. Pre-calculate the formulas outside cube to avoid calculations inside the cube

The key requirement is to avoid simple calculations inside cube as much as possible so that cube can be freed up to do complex calculations.

Below are the two prime candidates which can be calculated outside the cube:

1. In the fact table itself as measure, e.g. using calculated column.

2. The calculated members along the dimension, if their values are resolved at leaf level and then they need to rollup like any other member, then it is prudent to calculate their formula values outside and load it the fact table.

(The article is work in progress & would be regularly updated. If you have any tips, please share with me to post it here.)

7 comments:

Anonymous said...: Pingback. Link to this post was added on our website in [Articles]/[Parent-Child] section:
http://www.ssas-info.com/analysis-services-articles/52-parent-child/868-cube-structure-optimization-for-mdx-query-performance-in-analysis-services-2005-sp2-tips-for-parent-child-hierarchies-usage; 3:22 AM
Anonymous said...: Thanks for this post, it helped me a lot !
(cf http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=3503600&SiteID=1); 8:57 PM
Satish said...: Hi,

I am a newbie in PPS server.

I have a query regarding Cubes.

I have entered data into the assignments using PPS Excel add-in (Report Matrix).

The Matrix data is stored to Planning server when you refresh the System data.

I fired a MDX Query in Analysis service DB, but the result were NULL values for that Cube.

Is this Report Matrix data saved in cubes? Is there any more processes between saving assignments and retrieving data in cubes?

Thanks,
Satish Jadhav
satleo9@gmail.com; 6:10 PM
Anonymous said...: Thanks for discussing some of the complexities.

One tip I would offer is that the Parent-Child Dimension Naturalizer is a quick way to create a level-based hierarchy out of a parent-child hierarchy.
http://www.codeplex.com/bidshelper/Wiki/View.aspx?title=Parent-Child%20Dimension%20Naturalizer; 7:20 PM
Anonymous said...: Can anyone recommend the robust RMM utility for a small IT service company like mine? Does anyone use Kaseya.com or GFI.com? How do they compare to these guys I found recently: [url=http://www.n-able.com] N-able N-central inventory management
[/url] ? What is your best take in cost vs performance among those three? I need a good advice please... Thanks in advance!; 1:04 PM
Pravin said...: Hi I am trying to fetch all the calculated fields on the cube. How can I do that?; 12:02 AM
Sam Kane said...: Here are this and some other articles on Parent-Child Dimension:

http://ssas-wiki.com/w/Articles#Parent-Child_Dimension; 1:20 AM

Microsoft Analysis Services & MDX - blogs by Ajit Singh