Wednesday, November 3, 2010

PARTITION BY clause works incorrectly when in nested select over large volumes of data

I have been witnessing a very strange behaviour of ROW_NUMBER() PARTITION BY clause in complex join and nested select or CTE statements over large volumes of data with Max Degree of Parallelism <> 1. The ROW_NUMBER() or RANK() were incorrectly marking records within a partiton, and that behaviour was nondeterministic.

Let's say I have three tables girls, shoppingmall and times

girls and shoppingmall have from and to dates, and times table has a row for every date. shoppingmall has shopid which can appear in different mallid.

ex:


Now, I want to get first mallid where the girl was shopping for every day.

I join girls table with times by from and to. Then I create a nested select from shoppingmall also joined to times by from and to, which would use ROW_NUMBER () PARTITION BY in order to give me first mallid for a given date and given shop.


select * into #resultset from girls
left join times ON times.timeid >= girls.datefrom and times.timeid < girls.dateto
left join
(select mallid, shopid as shopidx, times.timeid as timeidx, ROW_NUMBER() OVER (PARTITION BY shopid, times.timeid ORDER BY ordering ASC) Lenin
FROM shoppingmall
INNER JOIN times ON times.timeid between shoppingmall.datefrom and shoppingmall.dateto
) AS trip ON girls.shopid = trip.shopidx AND times.timeid = trip.timeidx and Lenin = 1

Now, what I should get is only one row per date, per shop from shoppingmall

However, if I run this query on a very large dataset ROW_NUMBER() occasionally marks as 1 two rows for same shopid and timeid!

As soon as I add a very restrictive WHERE condition which limits the dataset, ex: WHERE girls.girlid = "Kathy", the query starts behaving correctly (whereas without strict WHERE and on the same record it was marking two rows PARTITIONED BY shopid and timeid as 1)

Also, if I fist insert result of inner select in the temporary table, and then use it in the join - results are correct.


I am running on SQL Server 2008 10.0.2531.0

I will rewrite all my inner selects to use temporary tables instead of nested selects and CTE, but still I do not like the unexpected and unexplainable behaviour of PARTITION BY. I guess that this behaviour is due to parallelism because when I changed the Max Degree of Paralellism to 1 this behaviour disappeared and PARTITION BY started working correctly?

Tuesday, October 12, 2010

MDX Performance - burdens of IIF

I used to think naively that IIF is an innocent little function that can only do good and help me make a calculated measure evn faster! Wrong! Wrong! Wrong!

I found this very helpful article by Mosha Pasumansky

http://sqlblog.com/blogs/mosha/archive/2007/01/29/performance-of-iif-function-in-mdx.aspx

where he urges MDX developers to avoid using IIF function and showing how bad it could be for the performance.

I have rewritten my calcuilated measures which widely used IIF functions and results in performance gain were very impressive: IIF worsened performance almost in geometric progression:





with IIFwithout IIF
Query 1 5 s2 s
Query 2 18 s8 s
Query 3 87 s25 s


Lesson learned: before writing a calculated member - try to think more about performance...

Thursday, October 7, 2010

SSRS axe work: How to use fields from different Datasets in the same field Expression and Grouping in Report Builder 2.0?

Last week I was puzzled by one of my colleagues - he needed help in creating a Report Builder 2.0 SSRS report based on several OLAP cubes utilizing measures from different cubes in the same field in the expression and grouped on the same logically common dimension - Period.

At first I though - impossible. You need a Linked cube for that! But then I got curious and decided to see if we could hash and slash through the report and make it work on scotch tape and pins.

This article helped me get an idea on how to do it:
http://blogs.msdn.com/b/robertbruckner/archive/2008/07/20/using-group-variables-in-reporting-services-2008-for-custom-aggregation.aspx

To achieve this we must make sure that all datasets from these different cubes return same number of rows and grouped by same logical dimension, e.g.: month-year or Period as it was in my case, and are sorted same way - so that we could line them up side-by-side and pretend they are all grouped on same dimension.

Using Custom Code in SSRS create a Collection and push measures from cube A into it. Next - use those values with cube B measures by accessing the Collection with RowNumber("Tablix") as an index

And that is all there is to it.

Now, in detail:

1. Create your first DataSet DataSet1 from cube A with Period at Rows and Actual amt measure, and Tablix Tablix1 with Period as Row Group and Actual amt as a total. Add sorting by Period





2. Create the second dataset - DataSet2 from another cube: cube B with Period at the Rows and Budget_amt measure
3. Make a copy of Tablix1 and paste it as anew tablix Tablix2 . Connect it to the dataset DataSet2
4. Right-click on the blue background outside of the report body, go to Properties->Code
Add following code into Custom Code window:

Dim denominators As System.Collections.Generic.List(Of Decimal) = New System.Collections.Generic.List(Of Decimal)

Function AddDenominator(ByVal newVal As Decimal)
denominators.Add(newVal)
Return newVal
End Function

Function GetDenominator(ByVal row As Integer) As Decimal
Return denominators(row)
End Function

5. Add following expression to the data field of the Tablix2: =Code.AddDenominator(Fields!Budget_amt.Value)




6. Delete the first column in Tablix2 but leave the associated group:





7. glue Tablix2 to Tablix1 by dragging it to the right side of Tablix1



Now, make sure you see and align by these blue lines - both tablixes have to be aligned with these lines.

8. Make a copy of Tablix2 and paste it - you have Tablix3 now. Attach it to the right of Tablix2



9. Go to properties of Tablix3 and change the biding to DataSet1
10. Change Expression in the data field to =Fields!Actual_amt.Value + Code.GetDenominator(RowNumber("Tablix3") - 1)
11. Change the title of the Tablix3 column to Result
12. Change the title of the Tablix2 column to Budget






And here is the result - you have a Result column which uses totals from two different cubes in the expression

Limitations: the column that pushes the totals from cue=be into the stack cannot be set to invisible but it could be as thin as you can get. Just squish it between Actual amt and result if you don't want to show it. Also, the order is important - pushing the value into the stack has to be before getting it from there, i.e. Budget column cannot be placed after Result column.

Thursday, September 30, 2010

ShellRunas

ShellRunas - http://technet.microsoft.com/en-us/sysinternals/cc300361.aspx

Discovered a great tool that is very helpful for testing security. It lets you open any program under a different user account just like "Run as... " in Windows.

Ex: I want to test if my security works correctly in Excel for users in specific roles - I would copy my Shellrunas.exe into C:\Program Files (x86)\Microsoft Office\Office12

Then go to command prompt and run

cd C:\Program Files (x86)\Microsoft Office\Office12
C:\Program Files (x86)\Microsoft Office\Office12>shellrunas excel.exe

This tool has helped me a lot, because the native Windows "Run as..." functionality that could be achieved by holding Shift down and then right click didn't work on my Windows Vista for Excel and "Run as..." is not available for SSMS and for IE as well.

This tool helps work around this limitation:


(c) http://technet.microsoft.com/en-us/sysinternals/cc300361.aspx

Monday, August 30, 2010

SQL or MDX query trace in SSRS - trick of imagination or a real thing?

I sometimes have ghost memories - I remember some things that had never in fact happened. And I was wondering if my remembering that Reporting Services 2000 had an option in its configuration file that enabled query logging in Reporting Services log files (LogFiles folder) along with the name of the user who executed the statement. I cannot find anything on Google nowadays. It seems that I have lost my ability to correctly state the question to Google to get my answer.

Anyway, I was wondering if that option was ever available in SQL Server Reporting Services 2000?

I tried to see if query logging for each executed report is available in SQL Server 2005, but did not find anything too. There is of course SQL Profiler, but I was looking for something that would put together username, report ID, execution time AND the MDX or SQL query that was executed with the report.

I guess you could do it manually by comparing execution log table in SSRS with SQL Profiler trace data - but if you have a lot concurrent users running the same report - that could become an unpleasant task.

I tried to see if SSRS Web.config options for Trace could help me

<RStrace>

<add name="FileName" value="ReportServer_" />

<add name="FileSizeLimitMb" value="32" />

<add name="KeepFilesForDays" value="14" />

<add name="Prefix" value="tid, time" />

<add name="TraceListeners" value="debugwindow, file" />

<add name="TraceFileMode" value="unique" />

<add name="Components" value="all,RunningJobs:3,SemanticQueryEngine:4,SemanticModelGenerator:2" />

But it seems that query tracing works only for reports created in Report Builder, and not the old style reports done in Visual Studio.

I am still wondering - was query tracing ever available in SSRS?


Tuesday, July 20, 2010

SSRS Subscriptions failure - should we always listen to our DBA's?

During the configuration of one of the SSRS instances I have stumbled upon a problem with SSRS Subscription. Subscriptions were scheduled but were never executed. "Last run ..." stayed unchanged showing that a subscription execution was never even attempted.

I searched everywhere - at the computer clock, at the SSRS log files, and Event logs but could not find the problem. Googling for the answer didn't help too - since there was no error message by which I could narrow down my search...

Looking desperately everywhere, I peeped in the SQL Services Job Activity Monitor and there I noticed that some jobs with ugly uniqueid names were failing. And to my astonishment the jobs that were failing were the SSRS jobs for the subscriptions!

The problem was - SSRS instance did not like the name of the SSRS database, because it started with the digit, ex: 123_ReportServer. When I tried to execute the SSRS job manually I got the error "Incorrect syntax near '123'" - and when I placed [] around the 123_ReportServer - the job successfully worked and subscription was executed. It took a complete reinstall of the SSRS instance with the new name for the SSRS database for the issue to be resolved.

By the way, you cannot just rename the SSRS database and expect your SSRS instance to work with the new database name. When the ReportServer DB name changes - the complete reinstall is required for SSRS instance to work correctly.

Lesson Learned: never name the database starting with the digit or symbol, even if that was a standard enforced by your local DBA team.