Disclaimer

All of the topics discussed here in this blog comes from my real life encounters. They serve as references for future research. All of the data, contents and information presented in my entries have been altered and edited to protect the confidentiality and privacy of the clients.

Various scenarios of designing RPD and data modeling

Find the easiest and most straightforward way of designing RPD and data models that are dynamic and robust

Countless examples of dashboard and report design cases

Making the dashboard truly interactive

The concept of Business Intelligence

The most important concept ever need to understand to implement any successful OBIEE projects

Making it easy for beginners and business users

The perfect place for beginners to learn and get educated with Oracle Business Intelligence

Tuesday, January 22, 2013

OBIEE Lessons learned ---- How to design the data modeling and schema based on requirements

Hello

As you have worked in various OBIEE projects, you probably will have encountered different data models for different business. Oracle has implemented their own BI Apps which contains pre-built schema as well. Unless you started the project from ground zero, chances are your project will already have a list of tables created as you walk in to the project. Looking into all these dimensions and facts or cubes, you will probably be told what was going on before and how to move forward from that. However, little did you know the mind of the person who decided on this list of objects nor did you know his understanding of the requirement. You will probably use your technical backgrounds to build the application based on what's already there.

Some of the designs are really good and some of the designs have limitations. Nonetheless, they all come from trials and errors over and over, and the way they are is probably where they concluded to be the most suitable design at the time. So today, I want to just go over something I learned over the years in terms of how to decide the data model based on the understanding of the requirement. This involves a lot of considerations from a holistic perspective.

Look at the below diagram:



This is a very simple requirement. I want to create a report that shows me all of the entity As in my system, as well as the entity Bs and Cs that each A associates with at any point in time. I may also want to see the counts of Bs and Cs that each As have.

Simple enough right?

So we have the keys, we create a report that joins all of the these 3 entities. Great, job's done. However, the problem comes when you want to look at the report at a specific time. Why is that a problem?

Well, Lets say entity A doesn't change very often. It could be your local retail branch that once it's opened, it will stay active for a while. Or it could be geographical area or sales territory that won't change unless the business change. Entity B changes more often than A, it may change once a day. So it could be some kind of account status, or in network industry, it could be router or Pseudowire configuration data. And entity C may change hourly, this could be certain business transaction, market indicators or it could be the wireless signals that travels through the area. What if I want to see the information of all my As as well as how many Bs and Cs (in counts) that each As have on Jan 1st 2013 between 3pm to 4pm?

The problem comes that, since all of the As are active, they should  be expected to be shown on the list. At this particular hour, not all of the As are associated to any Bs and Cs, their counts should be displayed 0. However, in this particular report, it is not going to happen. Only the As that it's keys are found matching the keys in Bs will return at the particular time, same as for C counts. All of the sudden, users will be scratching their head while looking at this report wondering where are all the other As, because they know they have them in the system..

Let me show you another problem. What if Bs get updated once a day at 12am, then the time key from date dimension will only match when your hour filter is set at 12am for the report to return anything because the time key in your B dimension probably looks like this "201301010000" (for 2013 Jan 1st 12am) and then "201301020000". The only way to query this table without filtering out data is to join B dimension at date level by not looking at the hours. However, what if later on when the business needs to change the data collection frequency from once a day to 4 times a day?

There can be many other issues when it comes to time sensitive information. So knowing all these (trust me, sometimes you wouldn't see the problem unless it hit you, then it could be a big mess!), what should I do?

Seeing that there is a clear hierarchical relationship between these entities (or sometimes there might not be a straightforward hierarchy), we need to find a way to have super set of the data that are pre-computed with all of the entities.

This is where an aggregated fact table should be designed. Imagine there is a table, it is pertaining to entity A, so in this table, all of the As in the system will be preloaded based on it's update frequency. For each As, we will find the associated B keys. Then we have a few columns, like number of Bs, number Cs where the counts of Bs and Cs will be populated hourly. Ideally, you will have each A repeated 24 times per date, on each hours it will show the counts because hour is the granularity of the report. Now you have to be careful, because once you start denormalizing the table, it can result in huge data-sets. We want to really be careful about how many levels of hierarchy you want to report here. Depending on the nature of the business and the data from each entity, only maintain the dimension keys in this table as long as it doesn't expend the data set into too huge volume. 

So let's say the nature of the hierarchy is like the following:




In this case, if we put C keys and B keys both into the same aggregate fact table, we might create a huge data set, which we have to be careful about. One option will be breaking this own diagram into smaller areas for reporting.

You can have 1 table that have keys of As and Bs and another for Bs and Cs. In terms of designing the display of the data, you can always go from summary to detail or from high level to lower level. Kind of like the level 1 and level 2 report shown there. This will be a good reference for deciding how many of these pre-computed tables that needs to be create or how to design the granularity of your fact table, it also determines the update frequency of this table.

Oracle has another option for dealing with this kind of requirement, it is called materialized views. Implementing materialized views can provide many benefits, however, it also can come with some drawbacks. Therefore, it needs to be looked at case by case. The idea, though, is similar.

If you look at the BI Apps and many other out of the box schema from OBIEE product, you will see similar things being implemented. There could be many other reasons for that. Knowing the reason, knowing the pros and cons of the data modeling will eventually sets you apart from others who only takes what's been done and run with it.

Thanks, until next time.

Thursday, January 17, 2013

HOW TO BUILD A REPORT THAT ACCURATELY REFLECTS DATA CORRESPONDING TO DIFFERENT LOCAL TIMES FROM DIFFERENT TIME ZONE --PART 2

Previously, I have stated the issue when it comes to reporting based on different time zone hours, now continue on what the solution I have found.

As previously mentioned "the conversion is giving the correct hour number from 0 to 23. However, when the result exceeds 23 (in the case of +8 time zone when adding 16 (UTC hour) with 8), it becomes 24 and it keeps going on with 8 more numbers by increment of 1; when the result is less than 0 (In the case of -8 time zone when adding 0 (UTC hour) with -8), it becomes -8 and it keeps going on with 8 more numbers by increment of 1."

So I have to add some conditions to take care of both cases when the number exceeds 23 and when it is less than 0.

As an experiment, I created one more column named 'hour conversion' with the following formula:

case when (hour(Date."Hour Date") + @{Offset}) < 0 then hour(TIMESTAMPADD(SQL_TSI_Hour, @{Offset}, Date."Hour Date" )) when (hour(Date."Hour Date") + @{Offset}) > 23 then hour(TIMESTAMPADD(SQL_TSI_Hour, @{Offset}, Date."Hour Date" )) else (hour(Date."Hour Date") + @{Offset}) end

The idea is that, when the converted time stamp is less than 0 and greater than 23, we want to display the converted time stamps in hour number, else, just the time stamp itself as it is in terms of hour number:



Now after done that, I have to create some filters on the report. The filter is going to base on the content of the formula defined in this column.

So, create a set of filter on date and "hour conversion" column:


1. cast(Timestampadd(SQL_TSI_HOUR, @{(Offset}, Date, "Hour Date") as date) is between date from the prompt ---- Here, I first convert the time stamp with Offset value, then convert it to date. Then use this to filter against the date values that user pass through the prompt.

2. case when (hour(Date."Hour Date") + @{Offset}) < 0 then hour(TIMESTAMPADD(SQL_TSI_Hour, @{Offset}, Date."Hour Date" )) when (hour(Date."Hour Date") + @{Offset}) > 23 then hour(TIMESTAMPADD(SQL_TSI_Hour, @{Offset}, Date."Hour Date" )) else (hour(Date."Hour Date") + @{Offset}) end is between the hours from the prompt --------- Here, basically whatever value that's converted by the formula in "Hour Conversion" column that I created will be compared against the hour number that user passes through the prompt:



Now, let's test it out:

This time, the time zone is UTC + 8 (Offset = 8).

As you can see, Hour Conversion field is showing the hour numbers matching Time Zone Local Time, the Hour Day Date column on the far right is matching the date as well


It works just as well with UTC -8 Time Zone


Now, let's change the filter hour to 1 to 3 and test again:

It is correct:

Remember the incorrect report from part 1 of this series? Now, let's compare that report with the new one and see the difference:


So, now I can confidently say that my report is going to give you the correct data at your local time based on whatever hourly criteria you want.

Thanks

Until next time

Tuesday, January 8, 2013

How to build a report that accurately reflects data corresponding to different local times from different time zone --Part 1

Previously,  I have concluded my approach to implement the reports that will intelligently retrieve data corresponds to different time zone even if the displayed time stamp never changes. In other words, when any user changes its account setting to a different time zone, the report should automatically displays the data accordingly even if the values displayed on the dashboard prompt is still "Jan 01, 2001" for instance. The meaning of the report, the displayed time changes completely when the account setting changes. This wasn't easy but it was proven possible.

However, there was an issue with the report I created if you really test it deeply. Let me demonstrate here:


We have the following report that shows date, hour, hour number and the throughput. Each throughput has a corresponding time which is from the Date Dim table. It is a simple report and the time zone setting is default. So nothing fancy here.


Now, let's change the time zone to GMT + 8 Perth time and rerun the report:

As you can see below, the displayed time zone local time has changed to reflect Perth time while hour, hour date and hour number are still reflecting the default time. This issue has been mentioned previously. This is the behavior of OBIEE, it only changes the display of the time stamp based on locale time zone, it doesn't really do the actual time stamp conversion. Therefore, when you convert time stamp to hour number, it is converting the time stamp from "Hour Date" column, rather than "Hour Date Time Zone Local Time" column. Hence the confusion.



So let's change the hour filter to 10 to 12, the report comes back with correct data. How so?

Well, the user wants to know the throughput of 10am to 12pm on Jan 6th 2013 Perth time, and the report is showing these 3 throughput records of 2am to 4am, Jan 06 2013 UTC time. That's why it is good.



Now let's change the filter to between 1 to 3 and rerun it.

Then we are having problem. Because now, the user wants to see the throughput of 1am to 3am Jan 6th 2013 Perth Time, yet the report is displaying the data 5pm to 7pm Jan 6th 2013 UTC. This is wrong. It should've displayed the throughput of around 5pm to 7pm Jan 5th 2013 (the day before).



Now we know there is an issue, but how are we going about it?

Before doing anything, I need to understand how is the conversion taking place and how am I filtering the report. So I then created some dummy columns in the same report with different expression to understand what kind of behavior it is doing for this particular time zone (when Offset value = 8).

Hour date: This is nothing but hour date - 8. This will return the time stamp of the default UTC because the display of hour date is going to be plus 8.

Hour number: This column will give the hour number according to UTC time stamp

Conversation Hour Number: Here I am doing a time stamp plus 16 hours. This will give me the hour number in accordance with the Displayed Perth Time Stamp. It took me a while to figure out it is 8*2. If the offset is 4, then it should be 4*5 to get you the right hour number that matches the display.

Run the report and see what its doing:


So we see the problem with this column. Since the filter on date is Jan 6th 2013, we are getting all of the records of Jan 6th 2013 of UTC time. Therefore, the displayed local time in this case is missing 0 to 7 on the same date, and its adding 0 to 7 of the next date. This is expected behavior.

With the same logic, if I change the local to GMT -8, or Pacific Time in U.S.A, this behavior should occur the same way in reverse as shown:



My first approach is to find a way to convert this hour so it will match the displayed local time stamp both hourly and daily. Without thinking too much, I added "hour(Date, "Hour Date") +@{Offset}" to the expression and named the new column "Hour number with offset" and rerun the report for both +8 and -8 time zone:


Perth Time, +8 to UTC:

Pacific Time, -8 to UTC:



See what's going on? Looking at both cases, the conversion is giving the correct hour number from 0 to 23. However, when the result exceeds 23 (in the case of +8 time zone when adding 16 (UTC hour) with 8), it becomes 24 and it keeps going on with 8 more numbers by increment of 1; when the result is less than 0 (In the case of -8 time zone when adding 0 (UTC hour) with -8), it becomes -8 and it keeps going on with 8 more numbers by increment of 1.

I am close, but not quite there yet. I have realized that I need to do some programming and address the case of less than 0 and greater than 23. Luckily, I have found a good solution and it is working perfect. I will continue here.

Stay tune!
Related Posts Plugin for WordPress, Blogger...