All of the topics discussed in this blog come from my real-life encounters. They serve as references for future research. All of the data, content, and information presented in my entries have been altered and edited to protect the confidentiality and privacy of the clients.

Various scenarios of designing RPD and data modeling

Find the easiest and most straightforward way of designing RPD and data models that are dynamic and robust

Countless examples of dashboard and report design cases

Making the dashboard truly interactive

The concept of Business Intelligence

The most important concepts you ever need to understand to implement any successful OBIEE project

Making it easy for beginners and business users

The perfect place for beginners to learn and get started with Oracle Business Intelligence

Tuesday, August 21, 2012

OBIEE Project Case Study 1 ---- Dealing With Performance Issue

Hello there

I once wrote an entry discussing performance tuning in OBIEE; that was a high-level discussion of how to deal with performance issues. In fact, slow performance can be caused by many different things, and it takes effort to diagnose. It is important not to jump the gun whenever you run into this type of issue by proposing indexes, creating aggregate tables, or even caching. Every problem has a specific cause; a lot of the time the cause is reflected in the query that gets generated, while other times it isn't. So today, I will discuss something that I ran into on the project.

Remember I once wrote an entry about how I implemented some filtering logic in a fairly complex way? While that worked great, it later created performance problems. The performance was so bad that it would take 30 minutes or more to return 50 records, and it would hang the entire system. Now what are you going to do about that?

The reason for this sudden slow-down was that we had since loaded more complete data sets, so we were dealing with a more realistic data size than when I first created the report. This suggests that the query was not efficient at handling larger tables, even though they were properly indexed.

The real SQL query was quite long, so I won't post it here. Still, you may remember from the previous design that I had to write some code to get the filtering logic to work:

If you look, you will see that there is quite a bit of code in the expression. Based on the requirement, I was originally thinking of applying conditions before filtering values. In other words, whether the result should be within the date and hour range would depend on conditions like the current date vs. the user-input date, or against history. Nevertheless, having this type of filter adds a huge case statement to the where clause of the SQL that gets generated.

Case statements and 'OR' operators in the where clause tend to cause performance problems because they prevent the indexes from being used, forcing a full table scan. As an experiment, I removed the saved filter and ran the report by itself; it came back within 10 seconds. Therefore, something had to be done to avoid the case statement in the where clause while keeping the filter logic working. That became a headache for me for some time.
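To illustrate the effect, here is a toy demonstration using SQLite rather than the Oracle database behind OBIEE (the table and column names are made up for the example). A plain range predicate on an indexed column can walk the index, while wrapping the column in a CASE hides it from the optimizer and forces a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE txn (start_ts TEXT)")
cur.execute("CREATE INDEX idx_start ON txn (start_ts)")

# A plain range predicate is sargable: the optimizer can use the index.
range_plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM txn "
    "WHERE start_ts < '2012-08-01 08:00:00'"
).fetchall()

# Wrapping the column in a CASE makes the predicate opaque: full table scan.
case_plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM txn WHERE "
    "(CASE WHEN start_ts < '2012-08-01 08:00:00' THEN 1 ELSE 0 END) = 1"
).fetchall()

print(range_plan[0][-1])  # SEARCH using the index
print(case_plan[0][-1])   # SCAN: reads every row
```

Oracle's optimizer is far more sophisticated than SQLite's, but the basic principle, that a function or CASE wrapped around a column defeats a plain index on that column, is the same.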

Luckily, there is always more than one way to solve the same problem. The reason I had to use a case statement was that I was treating the date and hours as separate columns; they were being filtered separately. Based on the requirement, I had to first create the filters on the date columns to make sure they worked, then create the filters on the hour columns. Yet the hour filters wouldn't always work the same way, depending on the conditions from the date and the history flag, which is why I was writing cases.

What if I could combine the date and hours and filter them once? Then the hours would no longer be just numbers between 0 and 24; there would be a date implication behind each 0 and 24. Then the problem should be solved.

I will skip the thought process and steps I took to get to the final stage as it would be very boring. I will just show you what the final filter looks like:

Here the condition is simple:

start date < TIMESTAMPADD(SQL_TSI_DAY, @{endhour}{24}/24, date '@{varConfigHistoryDate}')

On the right side of the inequality, we take the user-entered date value, which is passed through the presentation variable 'varConfigHistoryDate', and add a number of hours to it. The actual number of hours added depends on the user-entered end hour: since TIMESTAMPADD is working in days here, 8/24 of a day is 8 hours. So let's say the user selects Aug 1 and an end hour of 8 from the filter. The right side then becomes Aug 1st 8:00:00 am (Aug 1st midnight plus 8/24 of a day). Now if the start date column has a value of, say, Aug 1st 4:00:00 am, it satisfies the filter because it is less than Aug 1st 8:00:00 am, and the record shows up in the report, which is exactly what we want.

ifnull(end date, TIMESTAMPADD(SQL_TSI_DAY, 10000000, current_date)) >= TIMESTAMPADD(SQL_TSI_DAY, @{starthour}{0}/24, date '@{varConfigHistoryDate}')

The same idea applies here, only now we are evaluating the hour value passed through the start hour prompt. If users enter 8 in the start hour prompt and Aug 1st as the 'varConfigHistoryDate' value, the right side of the inequality becomes Aug 1st 8:00:00 am (Aug 1st midnight plus 8/24 of a day). So a record satisfies this condition when its end date timestamp is greater than or equal to Aug 1st 8:00:00 am; in other words, any record that was still active at or after the selected start hour is returned.

On the left side of the inequality, an 'ifnull' has to be employed to take care of situations where the end date timestamp is blank. In that case, 10,000,000 days are added to the current date, so the record will absolutely satisfy the filter condition because the substituted value is always far into the future.
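To make the combined logic concrete, here is a small Python sketch of the same two conditions (the function and variable names are mine, not part of the actual RPD). Each hour prompt value is turned into a fraction of a day added to the prompt date, just like TIMESTAMPADD(SQL_TSI_DAY, hour/24, date), and a null end date is replaced with a far-future timestamp:

```python
from datetime import datetime, timedelta

def passes_filter(start_ts, end_ts, prompt_date, start_hour, end_hour):
    # end-hour cutoff: prompt date + end_hour/24 of a day
    upper = prompt_date + timedelta(hours=end_hour)
    # start-hour cutoff: prompt date + start_hour/24 of a day
    lower = prompt_date + timedelta(hours=start_hour)
    # ifnull(end date, far future): open-ended records always qualify here
    effective_end = end_ts if end_ts is not None else datetime.max
    # start date < end-hour cutoff AND end date >= start-hour cutoff
    return start_ts < upper and effective_end >= lower

aug1 = datetime(2012, 8, 1)

# Started 4:00 am on Aug 1, still active: inside the 0-8 window, so it shows up
print(passes_filter(datetime(2012, 8, 1, 4), None, aug1, 0, 8))         # True

# Ended at 2:00 am, window is 8-24: gone before hour 8, so it is filtered out
print(passes_filter(datetime(2012, 7, 31, 22),
                    datetime(2012, 8, 1, 2), aug1, 8, 24))              # False
```

The key point is that both conditions are plain range comparisons on the timestamp columns, which is exactly what lets the database use its indexes.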

After testing the result, I confirmed that this query is much simpler and more performance-friendly than the case statements. More importantly, it didn't compromise any of the desired logic.

Thank you

Until next time

Saturday, August 18, 2012

Configuring outer joins using LTS mapping content

Although outer joins should be avoided where possible, they can't be avoided entirely. There is always a need for a query to return results from both tables even when one side doesn't always match the other. Therefore, OBIEE allows us to configure outer joins in the BMM layer.

The simplest way of doing it is by changing the complex join setting from "inner" to another type in the drop-down. There is another way: defining the logical table source (LTS) of the chosen table and mapping it to multiple physical tables by joining them. Think of it as creating a view in the BMM layer that joins multiple physical tables.

Now, the difference between doing it this way and changing the complex join to an outer join will be covered another time. For now, I will just point out that, done this way, the query will always include an outer join in the 'from' clause, because what we are doing is converting a physical table into a logical view that outer-joins two tables.

This may not be desirable: what if I am running a query that only uses one column from table A? The query will still be fired against the result set of table A outer-joined to table B. This can cause unnecessary performance issues, as shown below:

The query generated when selecting LSP Name, History from LSP table will give the following SQL statement even if nothing from the other table is needed:

select distinct T103846.LSP_NAME as c1,
     T103846.HISTORY as c2
from (IS_SMB_LSP_DIM T103846 /* A1_IS_SMB_LSP_DIM */  left outer join IS_SMB_NODE_DIM T106742 /* A2_IS_SMB_NODE_DIM */  On T103846.FROM_NODE_KEY = T106742.NODE_KEY) left outer join IS_SMB_NODE_DIM T106752 /* A3_IS_SMB_NODE_DIM */  On T103846.TO_NODE_KEY = T106752.NODE_KEY
order by c1, c2

So what's the way around that?

Well, to get around that, we need to break up the LTS into two different sources. We keep the first LTS as it is, mapped to join all of the physical tables as intended. Then we create an additional LTS called 'LSP'. In this LTS, we map all of the columns from the physical table it belongs to, without the columns from the other table. In other words, all of the columns that come from the LSP table are mapped accordingly, and all of the columns from the Node Interface dimension are left unmapped. This LTS will be used when users select only columns from the LSP table for their report.

The original LTS is left as it is, still outer-joining to the Node Interface dimension. In it, however, we map only the columns that come from the Node Interface dimension, not the columns from the LSP table. So it is the other way around from the 'LSP' LTS mapping:

So the idea is that when users select only LSP columns in their report, the 'LSP' LTS is used in the query, because LSP columns are mapped only there, and in the 'LSP' LTS no outer joins are configured in the mapping. When users select Node Interface columns, the original LTS is used, because Node Interface columns are mapped only there.

So now, let's go back to Answers and run the same report again to see what query is generated:

As we can see, for the same report, the query generated this time doesn't have an outer join in the 'From' clause:

select distinct T103846.LSP_NAME as c1,
     T103846.HISTORY as c2
from IS_SMB_LSP_DIM T103846 /* A1_IS_SMB_LSP_DIM */ 
order by c1, c2

Now, when I want the outer join to be used, I simply add a column from the Node Interface dimension to my report, and you will see that the query changes:

The query is as the following:

select distinct T103846.LSP_NAME as c1,
     T103846.HISTORY as c2,
     T106742.NODE_NAME as c3  --------- This is the new column from node interface dimension
from (IS_SMB_LSP_DIM T103846 /* A1_IS_SMB_LSP_DIM */  left outer join IS_SMB_NODE_DIM T106742 /* A2_IS_SMB_NODE_DIM */  On T103846.FROM_NODE_KEY = T106742.NODE_KEY) left outer join IS_SMB_NODE_DIM T106752 /* A3_IS_SMB_NODE_DIM */  On T103846.TO_NODE_KEY = T106752.NODE_KEY
order by c1, c2, c3

Until next time

Thank you

Saturday, August 4, 2012

OBIEE Project Case Study 1 ---- Dealing with complex filtering requirement Part 2

In one of my previous posts I discussed a requirement for implementing some fairly complex filtering logic. I later realized that the logic I implemented there wasn't working 100%. I discovered a few issues with it, and I am here to fix them.

Look at the following report as it is. After applying the filters, there are some issues.

Issue 1: Some records shouldn't be there for the hour range selected on that date. In my case, the range is between hour 0 and 2.

Issue 2: Records whose history flag is Y should only show up when the right hour range is selected. In this case, 3:05 am should be considered after 3 am, so if the end hour is 3, these records shouldn't show up.

Now the diagnosis is that the filter expression for the hour fields should change.

The idea is this: when the date value passed by the presentation variable from the prompt matches the date value of the start date, we should display the start date hour as it is. That way, on the particular date that the users select, the start hour stays as it is and gets filtered based on the hours that users select; if it's out of the range, it is filtered out. Then we need another condition: if the end date is null, meaning the record is still active, the start hour should be set equal to the presentation variable's value. This makes sure those records always come through regardless of what hourly range users select, because they haven't been made history yet. With this idea, I had to change the filter expression for the start hour to the following:

Case when start date = date '@presentationvariable' then hour(start date) else case when end date is null then @hour else 1000 end end

For the end date expression, it is slightly more complex.

When the start date equals the value from the prompt, we want the end hour to be as it is, except that we convert 0 to 24 to satisfy the requirement from the previous post. Then, when the end hour is null, the end hour should be set to whatever value is passed from the end hour prompt, making sure the record is returned from the query regardless of what values are actually selected. Last but not least, convert 0 to 24 for those end hours as well.
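A literal Python rendering of the two derived-hour expressions may help (the function names and the example values are mine; only the CASE logic itself comes from the post). The hour-range filter from the prompt is then applied to these derived values:

```python
from datetime import date, datetime

def derived_start_hour(start_date, end_date, prompt_date, prompt_start_hour):
    """CASE WHEN start date = prompt date THEN HOUR(start date)
       ELSE CASE WHEN end date IS NULL THEN @hour ELSE 1000 END END"""
    if start_date.date() == prompt_date:
        return start_date.hour       # restrict by hour on the selected date
    if end_date is None:
        return prompt_start_hour     # active record: mirrors the prompt value
    return 1000                      # historical record from another date

def derived_end_hour(end_hour, prompt_end_hour):
    """End-hour counterpart: substitute the prompt value when the end hour
       is null, and convert 0 to 24 in both cases."""
    h = prompt_end_hour if end_hour is None else end_hour
    return 24 if h == 0 else h

aug1 = date(2012, 8, 1)
# A record starting on the prompt date keeps its real hour:
print(derived_start_hour(datetime(2012, 8, 1, 3, 5), None, aug1, 0))  # 3
# An active record from another date mirrors the prompt value:
print(derived_start_hour(datetime(2012, 7, 30, 9), None, aug1, 2))    # 2
# A null end hour mirrors the prompt, and 0 becomes 24:
print(derived_end_hour(None, 0))                                      # 24
```

Because a record that mirrors the prompt value always falls inside the selected range, still-active records are guaranteed to come through, which is exactly the behavior the requirement called for.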

Having implemented that, we can now retest the reports with the same filtering conditions and see the difference.

Thank you, 

Until next time.

PS: Later on, some performance issues were discovered and I was forced to work more on this solution. I came up with my final solution to the problem, and it worked great.

Please check the next post for the final solution
