2 views
in BI

I'm trying to calculate a business-logic in DAX which has turned out to be quite resource-heavy and complex. I have a very large PowerPivot model (call it "sales") with numerous dimensions and measures. A simplified view of the sales model:

+-------+--------+---------+------+---------+-------+

| State |  City  |  Store  | Week | Product | Sales |

+-------+--------+---------+------+---------+-------+

| NY    | NYC    | Charlie |    1 | A       | \$5    |

| MA    | Boston | Bravo   |    2 | B       | \$10   |

| -     | D.C.   | Delta   |    1 | A       | \$20   |

+-------+--------+---------+------+---------+-------+

Essentially what I'm trying to do is calculate a DISTINCTCOUNT of product by store and week:

SUMMARIZE(Sales,[Store],[Week],"Distinct Products",DISTINCTCOUNT([Product]))

+---------+------+-------------------+

|  Store  | Week | Distinct Products |

+---------+------+-------------------+

| Charlie |    1 |                15 |

| Charlie |    2 |                 7 |

| Charlie |    3 |                12 |

| Bravo   |    1 |                20 |

| Bravo   |    2 |                14 |

| Bravo   |    3 |                22 |

+---------+------+-------------------+

I then want to calculate the AVERAGE of these Distinct Products at the store level. The way I approached this was by taking the previous calculation, and running a SUMX on top of it and dividing it by distinct weeks:

SUMX(

SUMMARIZE(Sales,[Store],[Week],"Distinct Products",DISTINCTCOUNT([Product]))

,[Distinct Products]

) / DISTINCTCOUNT([Week])

+---------+------------------+

|  Store  | Average Products |

+---------+------------------+

| Charlie | 11.3             |

| Bravo   | 18.7             |

+---------+------------------+

I stored this calculation in a measure and it worked well when the dataset was smaller. But now the dataset is so huge that when I try to use the measure, it hangs until I have to cancel the process.

Is there a more efficient way to do this?

by (47.2k points)
• Since you want the distinct product count calculated independently for each store & for each week, then summed together by store, and then divided by the number of weeks by store,SUMX is appropriate in this case. There's no way around that.

• SUMX is an iterator, and is the likely cause of the slowdown. As we can't eliminate the SUMX entirely, the biggest factor here is the number of combinations of stores/weeks that you have.

• To confirm if the number of combinations of stores/weeks is the source of the slowdown, try filtering or removing 50% from a copy of your data model and see if that speeds things up. If that doesn't time out, add more back in to get a sense of how many combinations are the failing point.

• To make things faster with the full dataset:

• Before dragging on the measure, you may be able to filter to a subset of stores/weeks in your pivot table. This will get faster results than dragging on the measure first, then adding filters. This isn't really a change to your measure, but more of a behavior change for users of your model.

• You might want to consider grouping at a higher level than a week (e.g. month), to reduce the number of combinations it has to iterate over

• If you're running Excel 32-bit, or only have 4GB of RAM, consider 64-bit Excel and/or a more powerful machine (I doubt this is the case, but am including for comprehensiveness - Power Pivot can be a resource hog)

• If you can move your model to Power BI Desktop (I don't believe Calculated Tables are supported in Power Pivot), you could extract out the SUMMARIZE into a calculated table, and then re-write your measure to reference that calculated table instead. This reduces the number of calculations the measure has to perform at run-time, as all the combinations of store/week plus the distinct count of products will be pre-calculated (leaving only the summing & division for your measure to do - a lot less work).

Calculated Table =

SUMMARIZE (

Sales,

[Store],

[Week],

"Distinct Products", DISTINCTCOUNT ( Sales[Product] )

)

Note: The calculated table code above is rudimentary and is mostly designed as a proof of concept. If this is the path you take, you'll want to make sure you have a separate store dimension to join the calculated table to, as this won't join to the source table directly

Measure Using Calc Table =

SUMX (

'Calculated Table',

[Distinct Products] / DISTINCTCOUNT ( 'Calculated Table'[Week] )

)