NextAnalytics and MDX : Part 1 - Swap Cells with Row Labels

NextAnalytics blog on "Can a business intelligence product be used to answer analytic questions?"  raised some valid questions on the complexity, the business users face while analyzing the business data. It did generate quite of few good responses on how MDX can provide the similar solution. I think both the approaches address different level of needs and both can co-exist.

I just started to look at their online demo site. Being in CPM (Corporate Performance Management) industry for about eight years, I can understand the difficulty the functional users face while venturing outside the realm of predefined reports and designing on their own.

I plan to write a multi-part articles on how NextAnalytics solutions can be replicated using MDX. I  would be using "Adventure Works DW" OLAP database to replicate the similar business case.

I saw a very interesting case in NextAnalytics where the result grid cells distinct values can be swapped with either rows or columns as labels. The row or column members appear in against the swapped cell's. This is a very important and useful feature for performing the basket analysis or outliers in a report.

The goal is not to prove who is superior or complex to use. Just the sheer pleasure of validating that MDX is equally capable of fulfilling similar requirement.

Let me illustrate as what is happening in NextAnalytics' "Swap Cells with Row Labels" Feature:

1. This is the start point. The sales data is presented in the cross tab format. (a limited portion of data is displayed, hence in screens, they might not match)

image

2. Now the variation of Sales data for each day and sales person (column-wise) is calculated and displayed. I guess, the formula would be somewhat at the below line:

Variation = (Sales Amount - Average) / (Standard Deviation for the day)

image

3. Using "Swap cells to Row Labels", the unique values of variation is shifted to rows and the row members are loaded in the corresponding standard deviation cells.

image

I would say that the above feature is simply awesome. I can visually say who are the "Outstanding" sales reps (Standard Deviation of more than 3) and who are the laggards (in above screen, standard deviation of -1) and whether they are  consistent in their performance.

Challenge to replicate the same requirement using MDX:

1. Lets get some sample cross tab data

WITH
SET Emp AS
{
[Employee].[Employee].&[290]
,[Employee].[Employee].&[289]
,[Employee].[Employee].&[284]
,[Employee].[Employee].&[291]
,[Employee].[Employee].&[283]
,[Employee].[Employee].&[288]
,[Employee].[Employee].&[282]
,[Employee].[Employee].&[296]
,[Employee].[Employee].&[281]
,[Employee].[Employee].&[286]
,[Employee].[Employee].&[295]
,[Employee].[Employee].&[292]
,[Employee].[Employee].&[287]
,[Employee].[Employee].&[272]
,[Employee].[Employee].&[294]
,[Employee].[Employee].&[293]
,[Employee].[Employee].&[285]
}
SELECT
NON EMPTY
(EXISTING
{
[Date].[Date].&[915]
,[Date].[Date].&[946]
,[Date].[Date].&[975]
,[Date].[Date].&[1006]
,[Date].[Date].&[1036]
,[Date].[Date].&[1067]
}
*
[Reseller Sales Amount]
) ON COLUMNS
,NonEmpty(emp) ON ROWS
FROM [Adventure Works]
WHERE
[Date].[Calendar Year].&[2004]

image

2. Now, lets calculate the variation. I would convert the variation amounts to the range basket so that we would have few but distinct variation baskets in the cell.

WITH
SET Emp AS
{
[Employee].[Employee].&[290]
,[Employee].[Employee].&[289]
,[Employee].[Employee].&[284]
,[Employee].[Employee].&[291]
,[Employee].[Employee].&[283]
,[Employee].[Employee].&[288]
,[Employee].[Employee].&[282]
,[Employee].[Employee].&[296]
,[Employee].[Employee].&[281]
,[Employee].[Employee].&[286]
,[Employee].[Employee].&[295]
,[Employee].[Employee].&[292]
,[Employee].[Employee].&[287]
,[Employee].[Employee].&[272]
,[Employee].[Employee].&[294]
,[Employee].[Employee].&[293]
,[Employee].[Employee].&[285]
}
MEMBER sales_avg AS
Avg
(
[Date].[Date].CurrentMember * [Emp]
,[Reseller Sales Amount]
)
MEMBER sales_StDev AS
StDev
(
[Date].[Date].CurrentMember * [Emp]
,[Reseller Sales Amount]
)
MEMBER Sales_Variation AS
([Reseller Sales Amount] - sales_avg) / sales_StDev
,format_string = "currency"
MEMBER Sales_Variation_Basket AS
CASE
WHEN
Sales_Variation > 3
THEN
3
WHEN
Sales_Variation > 2
THEN
2
WHEN
Sales_Variation > 1
THEN
1
WHEN
Sales_Variation > 0
THEN 0
WHEN
Sales_Variation > -1
THEN -1
WHEN
Sales_Variation > -2
THEN -2
WHEN
Sales_Variation > -3
THEN -3
END
SELECT
NonEmpty
(
(EXISTING
{
[Date].[Date].&[915]
,[Date].[Date].&[946]
,[Date].[Date].&[975]
,[Date].[Date].&[1006]
,[Date].[Date].&[1036]
,[Date].[Date].&[1067]
}
*
Sales_Variation_Basket
)
,NonEmpty(emp)
) ON COLUMNS
,NonEmpty(emp) ON ROWS
FROM [Adventure Works]
WHERE
[Date].[Calendar Year].&[2004]

image

3. Now, we need to do "Cells to Row Labels" transformation on the above MDX. What i mean is to retain the dats on the column, put 3,2,1,0,-1,-2,-3 on rows and fill in the corresponding employees in the cells.

The result should be somewhat as below grid ( I have filled couple of cells manually for illustration).

image

The MDX in item #1 and item#2 is for illustration of concept and not necessary the accuracy of calculation. The trick needed is to use the item#2 MDX and generate the last item#3 output.

Is anybody up for the challenge to write the third part of the MDX query?

Mosha provided the first cut of the query as below:

WITH
  SET Emp AS
    {
      [Employee].[Employee].&[290]
     ,[Employee].[Employee].&[289]
     ,[Employee].[Employee].&[284]
     ,[Employee].[Employee].&[291]
     ,[Employee].[Employee].&[283]
     ,[Employee].[Employee].&[288]
     ,[Employee].[Employee].&[282]
     ,[Employee].[Employee].&[296]
     ,[Employee].[Employee].&[281]
     ,[Employee].[Employee].&[286]
     ,[Employee].[Employee].&[295]
     ,[Employee].[Employee].&[292]
     ,[Employee].[Employee].&[287]
     ,[Employee].[Employee].&[272]
     ,[Employee].[Employee].&[294]
     ,[Employee].[Employee].&[293]
     ,[Employee].[Employee].&[285]
    }
  MEMBER sales_avg AS
    Avg
    (
      [Date].[Date].CurrentMember * [Emp]
     ,[Reseller Sales Amount]
    )
  MEMBER sales_StDev AS
    StDev
    (
      [Date].[Date].CurrentMember * [Emp]
     ,[Reseller Sales Amount]
    )
  MEMBER Sales_Variation AS
    ([Reseller Sales Amount] - sales_avg) / sales_StDev
   ,format_string = "currency"
  MEMBER Sales_Variation_Basket AS
    CASE
      WHEN
        Sales_Variation > 3
      THEN 3
      WHEN
        Sales_Variation > 2
      THEN 2
      WHEN
        Sales_Variation > 1
      THEN 1
      WHEN
        Sales_Variation > 0
      THEN 0
      WHEN
        Sales_Variation > -1
      THEN
        -1
      WHEN
        Sales_Variation > -2
      THEN
        -2
      WHEN
        Sales_Variation > -3
      THEN
        -3
    END
  MEMBER Measures.[-2] AS
    Generate
    (
      Filter
      (
        Emp
       ,
        Sales_Variation_Basket = -2
      )
     ,
      Employee.Employee.CurrentMember.Name + ","
    )
  MEMBER Measures.[-1] AS
    Generate
    (
      Filter
      (
        Emp
       ,
        Sales_Variation_Basket = -1
      )
     ,
      Employee.Employee.CurrentMember.Name + ","
    )
  MEMBER Measures.[0] AS
    Generate
    (
      Filter
      (
        Emp
       ,
        Sales_Variation_Basket = 0
      )
     ,
      Employee.Employee.CurrentMember.Name + ","
    )
  MEMBER Measures.[1] AS
    Generate
    (
      Filter
      (
        Emp
       ,
        Sales_Variation_Basket = 1
      )
     ,
      Employee.Employee.CurrentMember.Name + ","
    )
  MEMBER Measures.[2] AS
    Generate
    (
      Filter
      (
        Emp
       ,
        Sales_Variation_Basket = 2
      )
     ,
      Employee.Employee.CurrentMember.Name + ","
    )
SELECT
  {
    [Date].[Date].&[915]
   ,[Date].[Date].&[946]
   ,[Date].[Date].&[975]
   ,[Date].[Date].&[1006]
   ,[Date].[Date].&[1036]
   ,[Date].[Date].&[1067]
  } ON COLUMNS
,{
    Measures.[-2]
   ,Measures.[-1]
   ,Measures.[0]
   ,Measures.[1]
   ,Measures.[2]
  } ON ROWS
FROM [Adventure Works]

And the output is as desired (partial screenshot):

image

While the above meets the requirement, there is one significant limitation that we need to overcome.

In the solution, the distinct values of "Sales_Variation_Basket" are manually defined as Measures.[3].....Measures.[-3].


We need to automate it. I mean, using MDX, to read the set of "Sales_Variation_Basket" values, read the distinct values out of it and sort it, and then put the distinct values of "Sales_Variation_Basket" on rows and then put the employees in the result cells.

I am sure we would have the solution soon.

2 comments:

Anonymous said...

"The goal is not to prove who is superior or complex to use. Just the sheer pleasure of validating that MDX is equally capable of fulfilling similar requirement."

Thanks Ajit for spending so much time and effort to demonstrate that nextanalytics makes life much easier for the business analyst (at least in this case) :) Surely the whole point of both MDX and nextanalytics is to enable human beings to perform analytics on a set of data. I believe that most human beings understand a series of operations performed sequentially (as in a nextanalytics script) better than the multi-dimensional query paradigm (as in MDX).

Polaro said...

Dear Ajit,

Thanks for the entry, very interesting. I am trying to accomplish something similar although there are a couple of differences. First, once the calc (weighted average i my case) is defined on the cell level, I will define a basket value 1-10, as in your case. Then I want to create a virtual dimension which when placed on rows will have the available basket values - 1-10 and corresponding aggregated cells - count/sum/avg of the cell occurrences with the same value as the basket. The only complication is that instead of using a hard coded set - emp in your case - I would like to get correct aggregations irregardless of the dimensions selected. So the bucketing happens on the cell level. Possible or not? Thanks a lot.

My Articles

Design

Cube structure optimization for MDX query performance in Analysis Services 2005 SP2: Tips for Parent Child Hierarchies usage

Fact table design for “State Workflow Analysis”: Analysis Services Dimensional modeling

Handling inter-dimensional members dependency and reducing cube sparsity using reference dimensions in Analysis Services 2005 SP2 : Cube design tip

Identifying intra-dimensional members relationship and reducing cube sparsity in Analysis Services 2005 SP2 : Cube design tip

Leaves() : An example to understand it for both regular hierarchies as well as parent child hierarchies

Aggregation design: useful tips

Level based attribute hierarchy: MDX query performance woes in SQL Server 2005 SP2: Is it fixed in post SP2 hotfix?

Parent child hierarchy to level base hierarchy conversion: hiding placeholder dimension members in client application

Trouble / Troubleshooting

Aggregate(), Sum() functions using calculated members does not work in Analysis Services 2005 SP2 (9.00.3042.00 version) but works in Analysis Services 2000 SP4

Analysis Services 2005 migration tool: Custom member formula issues in migrated database

Cube Partitions: Fact table not listing in Business Intelligence Development Studio in partition wizard

Analysis Services 2005: Many-to-Many relationship does not support unary operators with parent-child dimension

MDX

NextAnalytics and MDX : Part 1 - Swap Cells with Row Labels

Selecting dimension's default member based on a member property

Sorting members on member codes / member properties

Time Dimension: How to set Default Member to Current Month

Setting dynamic default member in dimension X based on the current member of dimensions Y

ADOMD.NET

Code : utility code for converting cellset to a data table

Others

Google specialized search for Analysis Services and MDX web resources integrated in my blog

Art of reading MDX articles

MDX Expression Builder : Need for a tool making it easier for functional users to write MDX expressions, queries.

Blogroll