Disclaimer

All of the topics discussed in this blog come from my real-life encounters; they serve as references for future research. All of the data, content, and information presented in my entries have been altered and edited to protect the confidentiality and privacy of the clients.


Tuesday, November 27, 2012

OBIEE Case study --- How to ensure reporting the right data that is time sensitive, part 3

The second approach I talked about earlier was to add history records to the fact tables that are updated at the daily level. That means the model below will have to change




To this:


On the OBIEE side, we remove the join between the Date dimension and the LSP PWE Config fact and snowflake the model in the BMM layer. On the reporting side, we will have two filters:

1. Date = date prompt value and hour = hour prompt value (these values come from presentation variables), or Date is between the date prompt values.

2. LSP PWE Config start date <= date prompt value and end date >= date prompt value. This ensures that only the record from this table that was 'current' at the chosen date, past or present, will be returned.

One thing to keep in mind: when it comes to reporting a historical trend or plotting performance over time, the second filter will have to change. Here is why:

When a user selects a range of days, say Nov 1st to Nov 20th, and we do have data in the Pseudowire fact table for those days, the results will display correctly. Let's say all of the records in the LSP PWE Config fact and the other dimension tables also start on Nov 1st with no end date; the report works fine. However, when the user selects Oct 31st to Nov 20th, they would naturally expect the same result even if nothing occurred on Oct 31st. Instead, the report goes blank, because the second filter doesn't work here: on Oct 31st there is no record, since nothing was 'current' at that time.

To fix it, we have to change the second filter to work against the day range, which means:

2. LSP PWE Config start date >= date prompt (range start) value and end date <= date prompt (range end) value. This returns the configuration records that were active within the selected date range, rather than only those 'current' on a single date. A rough SQL sketch of both filter variants is below.
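Here is a minimal sketch of the two variants of filter 2 as physical SQL. The column names (lsp_name, pwe_name, start_date, end_date) are illustrative, and I am assuming open-ended configuration records carry a null end date; in the actual report these filters live on the presentation columns, so this is only to show the logic:

```sql
-- Point-in-time version of filter 2: only records 'current' at the chosen date
-- (:p_date stands in for the date prompt / presentation variable value).
SELECT cfg.lsp_name, cfg.pwe_name
FROM   lsp_pwe_config_f cfg
WHERE  cfg.start_date <= :p_date
  AND (cfg.end_date   >= :p_date OR cfg.end_date IS NULL);

-- Range version for trend reports over :p_range_start .. :p_range_end,
-- using the operators described above.
SELECT cfg.lsp_name, cfg.pwe_name
FROM   lsp_pwe_config_f cfg
WHERE  cfg.start_date >= :p_range_start
  AND (cfg.end_date   <= :p_range_end OR cfg.end_date IS NULL);
```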

Thanks

Until next time

Friday, November 23, 2012

Invalid state Identifier Error Code: GV3MWW29 on OBIEE Dashboard

Have you ever seen this error message before?



Below is what Oracle says about this error and how to fix it. The original document can be found on Oracle Support (Metalink) under Doc ID 1087513.1.



Applies to:
Business Intelligence Server Enterprise Edition - Version: 10.1.3.4.0 [1900] to 11.1.1.5.0 [1308] - Release: 10g to 11g
Information in this document applies to any platform.

Symptoms
An Oracle Business Intelligence Enterprise Edition (OBIEE) 10g or 11g environment includes Microsoft IIS (version 6) and Single Sign On (SSO) with Windows Active Directory (AD).
In this environment, users login to their Windows PC via AD and the authentication details are passed directly to OBIEE so that the OBIEE login page is not displayed.
Additionally, there are no session timeouts and it is possible for users to remain logged in to their system for long periods of time (i.e. 20 hours).
If a user leaves the sessions in Presentation Services unused for a number of hours and then attempts a dashboard action, the following error occurs:

Invalid state identifier 4i2aukgvatfu0k2ifkgmt5sj3m
Error Details
Error Codes: GV3MWW29

If the browser is refreshed, or a new dashboard page is opened, the error is removed.

When this problem occurred, the saw.log showed the following errors:

Type: Error
Severity: 42
Time: Fri Jan 08 11:20:56 2010
File: project/webbedrock/binaryfile.cpp Line: 355
Properties: ThreadID-3728;HttpCommand-Dashboard;Proxy-;RemoteIP-127.0.0.1;User-;Impersonator-
Location:
saw.xmlDocumentCache.findDocument
saw.views.evc
saw.subsystem.portal
saw.httpserver.request
saw.rpc.server.responder
saw.rpc.server
saw.rpc.server.handleConnection
saw.rpc.server.dispatch
saw.threadPool
saw.threads

Path not found: C:\Oracle\OracleBIData\tmp\sawvc\10a3\4i2aukgvatfu0k2ifkgmt5sj3m.sawxd
The file specified cannot be found

---------------------------------------
Type: Error
Severity: 42
Time: Fri Jan 08 11:20:56 2010
File: project/webreport/xmlstatepool.cpp Line: 97
Properties: ThreadID-3728;HttpCommand-Dashboard;Proxy-;RemoteIP-127.0.0.1;User-;Impersonator-
Location:
saw.views.evc
saw.subsystem.portal
saw.httpserver.request
saw.rpc.server.responder
saw.rpc.server
saw.rpc.server.handleConnection
saw.rpc.server.dispatch
saw.threadPool
saw.threads

Invalid state identifier 4i2aukgvatfu0k2ifkgmt5sj3m

Cause
Invalid state error occurred due to inconsistency between OBIEE server tmp files and SSO environment.
When a user connects to a dashboard, OBIEE creates TEMP files to maintain the state information of the dashboard. These files are stored in the ${OracleBIData}/web/tmp directory.

OBIEE also has a number of internal functions/jobs that clear up what it determines are stale sessions and TEMP files when the server is not busy.

In this situation, the internal job performing the cleanup removed the TEMP files as it determined the sessions were stale as they had been left for so long.
The TMP file being removed is evidenced in the saw.log:

Path not found: C:\Oracle\OracleBIData\tmp\sawvc\10a3\4i2aukgvatfu0k2ifkgmt5sj3m.sawxd
The file specified cannot be found
....
Invalid state identifier 4i2aukgvatfu0k2ifkgmt5sj3m
Solution
Please do the following to prevent this error from occurring:

Set the following parameters in the ${OracleBIData}\web\config\instanceconfig.xml file:

1440
1440
The above parameters (1440 minutes) will prevent the .sawxd files from being removed after 24 hours of idle times.
You may set the values higher (i.e. greater than 24 hours) if needed, but this is not recommended.
Setting the memory expire and disk cleanup interval to high values increases the number of sessions on the server.  If a significant number of idle sessions linger on the server, it will increase resource utilization and may lead to performance problems.
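For reference, the two 1440-minute settings the note refers to are XML entries inside instanceconfig.xml. A sketch of how they might look is below; the element names are only my assumption based on the note's description of the memory expire and disk cleanup interval settings, so verify the exact tags against Doc ID 1087513.1 before applying them:

```xml
<!-- Sketch only: element names assumed from the note's description of the
     memory expire and disk cleanup interval settings (1440 minutes = 24 hours).
     Verify the exact tags against Doc ID 1087513.1 for your OBIEE version. -->
<ServerInstance>
  <XMLDocumentCache>
    <MemoryExpireMinutes>1440</MemoryExpireMinutes>
    <DiskCleanupIntervalMinutes>1440</DiskCleanupIntervalMinutes>
  </XMLDocumentCache>
</ServerInstance>
```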

References
BUG:7244734 - INTERMITTENT INVALID STATE IDENTIFIER ERRORS ON DASHBOARD
BUG:9374127 - ERROR IN BROWSER - ERROR CODES: GV3MWW29
NOTE:563079.1 - Analytics Logout on integrated Env with SSO

Sunday, November 18, 2012

OBIEE Case study --- How to ensure reporting the right data that is time sensitive, part 2

In the previous post, I laid out the scenario and the things we have to look at before making any decisions. By knowing the right questions to ask, we get the answers that matter to our design.

In my case, for the 4 things I listed in the previous post, the answers are below:


1. How often are the dimension tables updated? Do they keep history or not?
-- Dimensions get updated whenever there is a change. Therefore all dimension tables have three columns: start date, end date, and history. History is a flag, either 'Y' or 'N', indicating whether the record is still active as of the current timestamp. History is kept, and each update arrives as a new insert.

2. How often are data loaded into LSP_PWE_Config_Fact?

The configuration data is updated once a day at 11:00 pm. There is a datekey column in the fact table that gives the date and hour. However, because the table is only updated once a day, all of the records will carry a datekey such as '201211112300' until the next update.

3. How often are data loaded into Pseudowire Fact or any performance fact?

The statistics records are at the hourly level. In other words, the fact table datekey increments by one hour: 201211110000, 201211110100, ..., 201211112300, etc.

4. What do the datekeys in the date dimension and the fact tables contain?
As indicated above, the datekey in the Date dimension also contains hours (a quick sketch of this key format follows).
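Just to illustrate the 12-digit key format described above, assuming the keys are derived from load timestamps (this only demonstrates the format, it is not part of the actual ETL):

```sql
-- Sketch: the datekey format is YYYYMMDDHH24MI, so 11:00 pm on Nov 11, 2012
-- becomes '201211112300'.
SELECT TO_CHAR(TIMESTAMP '2012-11-11 23:00:00', 'YYYYMMDDHH24MI') AS datekey
FROM dual;
```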

Remember the two dashboard requirements I mentioned before? I will repeat them here:


Overview report queries: LSP Dim --> LSP_PWE_CONFIG_F --> Pseudowire Dim
In this case, LSP_PWE_CONFIG_F is more of a mapping table that joins the two dimensions, since they have an N:N relationship.

Performance report queries: LSP Dim --> LSP_PWE_CONFIG_F --> Pseudowire Dim --> Pseudowire Fact

Now, how will I go about creating such reports? How should I design the data model for this?

First of all, I have to filter the fact table by date to avoid a full table scan. Moreover, to ensure we are getting the right data at the right time at the hourly or daily level, we have to join the date dimension to the fact table.

What about the configuration fact?

We also need to look at the date in the configuration fact table to ensure that we are picking the dimension objects that are active during the date range passed in from the filter. Does that mean we will have a model like this?


It looks like it will work, but it has an issue. Suppose we have a list of LSPs and Pseudowires that started on Nov 10 at 1 am and have never ended. The performance of these objects is updated in the fact table every hour, every day. Now the user goes into the system and selects Nov 11 from 5 am to 7 am. Fair enough: the user wants to see the performance of these objects between 5 am and 7 am on Nov 11th, knowing that these values are recorded in the fact table.

However, based on the above model, no data will be returned, which is wrong. The reason is that the datekeys from the date dimension in this case will be 201211110500 ... 201211110700, which will be passed to both the LSP PWE Config fact and the Pseudowire fact. The Pseudowire fact will obviously have data for these keys, because its records are updated every hour, but the same is not true for the LSP PWE Config fact. Because that data is updated once a day at 11 pm, the 23rd hour, all of the keys in that table will be date||2300. This means that unless the user's filter values include the 11 pm hour, the report will always be empty.
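To make the failure concrete, here is a rough sketch of the kind of physical query this model would generate; the table and column names are illustrative, not the actual physical names:

```sql
-- Both facts are constrained through the same hourly datekeys. The Pseudowire
-- fact has rows for 201211110500..201211110700, but the config fact only has
-- rows whose keys end in 2300, so the join returns nothing.
SELECT p.pwe_id, p.throughput
FROM   date_dim d
JOIN   lsp_pwe_config_f c ON c.datekey = d.datekey
JOIN   pseudowire_fact  p ON p.datekey = d.datekey
                         AND p.pwe_id  = c.pwe_id
WHERE  d.datekey BETWEEN '201211110500' AND '201211110700';
```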

So how are we going to solve this problem?

There are several options we can think of from a modeling perspective. I will list two here; a rough SQL sketch of both follows the list:

1. Substring the datekey in the LSP PWE Config fact so that the join only looks at the date part, not the hour part. In this case, the report will be correct, since this table is updated once a day.

2. Add start date and end date columns (including timestamps) and a history column to the LSP PWE Config fact, just like in the dimension tables. We remove the join between the LSP PWE Config fact and the date dimension; instead, the start date and end date in this table receive the filter values, including hours. The date is filtered by passing the filter values directly into this table rather than going through the datekey.
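A minimal sketch of the two options as physical SQL, assuming illustrative column names and the 12-digit datekey format described earlier:

```sql
-- Option 1: join on the date portion of the key only, so the once-a-day
-- config rows (keys ending in 2300) still match the hourly date keys.
SELECT f.lsp_id, f.pwe_id
FROM   lsp_pwe_config_f f
JOIN   date_dim d
       ON SUBSTR(f.datekey, 1, 8) = SUBSTR(d.datekey, 1, 8)
WHERE  d.datekey BETWEEN '201211110500' AND '201211110700';

-- Option 2: no join to the date dimension at all; the config fact is filtered
-- directly on its own start/end timestamps (prompt values shown as bind variables).
SELECT f.lsp_id, f.pwe_id
FROM   lsp_pwe_config_f f
WHERE  f.start_date <= :p_selected_date
  AND (f.end_date   >= :p_selected_date OR f.end_date IS NULL);
```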

Now, obviously the first approach sounds much easier, yet it has a drawback. Although it will work after the change to check only the date and not the hours, it has a limitation: if the system decides to change the data collection frequency of this table from once a day to twice a day, this design will not be flexible enough to accommodate that, because now we do need to look at hours. From an architect's point of view, I need to take these potential changes into consideration in every design, so I can't just provide short-term solutions.

Therefore, after some deliberation, I have decided to explore the second approach.

Stay tuned


Thursday, November 15, 2012

OBIEE Case study --- How to ensure reporting the right data that is time sensitive, part 1

Hello again.

Today I want to talk about a scenario we face in our project. In OBIEE, we know we have to create reports and dashboards to show data, but how do we ensure we show the right data? Data means nothing when the concept of time is out the window, especially when it comes to performance-related data. In other words, when you report on sales performance or financial performance, you always have to look at a specific time or time range for the numbers to make sense. Ensuring we are getting the right data is something we always have to guarantee in our design.

Let's take the following scenario:

I have an overview dashboard whose goal is to show the list of objects or entities in my system. In my case: show me a list of network devices, such as LSPs, Pseudowires, and Nodes, that are active within one day's hourly range. Then, from the overview dashboard, we will look at the performance of each device by throughput, utilization, and drops over a range of dates and hours.

In terms of data modeling, we will have models like the following:

Overview report queries: LSP Dim --> LSP_PWE_CONFIG_F --> Pseudowire Dim
In this case, LSP_PWE_CONFIG_F is more of a mapping table that joins the two dimensions, since they have an N:N relationship.

Performance report queries: LSP Dim --> LSP_PWE_CONFIG_F --> Pseudowire Dim --> Pseudowire Fact

In this case there is a little bit of snowflaking, but eventually we get the throughput and utilization of the pseudowires, which roll up to the LSP.

So, with this model, how are we going to check against the date dimension to ensure we have the right data?

From a developer's perspective, you don't have to worry about such things, as you pretty much do what the design document tells you. However, that only gets you so far. If you want to think beyond the developer role, you have to look at it from an architect's perspective, and that is what I am focusing on in this article. This isn't about where to click or what code to write; it is about how to make the design.

So here are a few things we have to take into consideration before making the design.

1. How often are the dimension tables updated? Do they keep history or not?

2. How often are data loaded into LSP_PWE_Config_Fact?

3. How often are data loaded into Pseudowire Fact or any performance fact?

4. What do the datekeys in the date dimension and the fact tables contain?

In other words, you can't just blindly join the date dimension to both fact tables, or to either one of them, and be done with it. You have to think like an architect and really understand how the model will behave when you make that join.

Next time, we will continue with how this scenario is implemented.

Until next time




Monday, November 12, 2012

How to configure session time out time in OBIEE

Today, let's talk about something quick and simple. Have you ever wondered why, after being idle for a while, you find yourself logged out when you resume your tasks in OBIEE? That's because you have wandered off for quite some time and the system logged you out due to inactivity. Knowing this behavior, how do we control how long OBIEE allows a user to be idle before logging them out?

The configuration is done in the instanceconfig.xml file by entering the following entries:
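A minimal sketch of such an entry is below. I believe ClientSessionExpireMinutes is the Presentation Services setting that controls the idle timeout, but treat the element name as an assumption and verify it against the documentation for your OBIEE version:

```xml
<!-- Sketch: idle-timeout entry in instanceconfig.xml, under ServerInstance.
     ClientSessionExpireMinutes is assumed to be the idle-session timeout in minutes;
     60 means a user can stay idle for 60 minutes before being logged out. -->
<ServerInstance>
  <ClientSessionExpireMinutes>60</ClientSessionExpireMinutes>
</ServerInstance>
```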

I have entered 60 minutes here, which means the system will allow me to be idle for 60 minutes before logging me out.

The place you enter these entries is shown below:


Thanks 

Until next time

Friday, November 2, 2012

A general approach to performance tuning in OBIEE

A long time ago, I wrote an article about performance tuning in OBIEE in which I stated the importance of minimizing performance issues through good project direction and a clear understanding of the BI concept. The point is to give the project a clean design and optimized processes so that we don't even run into many of the performance issues later on. Remember, a bad design, a wrong process, and mismanaged expectations are the root cause of a lot of the issues we potentially face along the way. Applying fixes, workarounds, and changes in an attempt to improve performance only temporarily eases the symptoms; it won't help in the long run.

Although I am a strong believer in dealing with performance issues at the design and project-process level, that doesn't mean there is nothing else we can do within OBIEE to optimize performance. When a report is running very slowly, there are a few areas in OBIEE that we can look into to diagnose the issue.

One of the most common places to investigate is the query log, which records the exact physical queries that a particular report generates. When you look into the query log, don't just focus on the physical SQL; also read the steps the log records. Each step has a timestamp telling us when it started, when it completed, and when processing moved on to the next step. Sometimes it is the connection to the DB that takes unusually long, not the query itself. Sometimes we define DB scripts in the connection pool to execute before or after queries, and these can cause extra delay. So we need to look into all of these things before getting to the physical query itself.

To tune a query, some basic understanding of our tables and objects is very important. We know that most queries consist of some dimension and fact tables. Fact tables are generally huge because they hold time-sensitive information: all of the stats, measures, and keys change over time. When you look at a sales fact, you don't just look at the sales total; you look at the sales total based on dimensional information, especially time-related information, for it to make sense. Therefore, fact tables are loaded with a certain frequency, with records at an hourly, daily, or even minute grain, and over time they get huge. A full table scan on a fact table will definitely be costly and slow. Therefore, every query should at least have a date dimension joined to the fact table and filter on a specific time period to avoid a full table scan. Indexes and partitions are always needed on the fact table. But then again, if a user insists on running a report over two entire years of data, there is not much we can do; he should read the myth of performance tuning.

Another thing to look for: if the report is built with conditional expressions that define column values, such as a case statement, make sure the case statement does not end up wrapped around the column in the WHERE clause. For example, you may have a column, call it column x, that returns a date based on different calendars, and you then create a filter on it (like column x = current date). The WHERE clause then looks like this: WHERE CASE WHEN calendar = 'retail' THEN retail_date WHEN calendar = 'fiscal' THEN fiscal_date ELSE calendar_date END = current_date. This type of query defeats the index you created on the fact table, because the value is conditional and the database has to go through the entire table.
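A rough sketch of the difference, with made-up table and column names and assuming calendar_type is never null; the first form disables index use on the date columns, while the second keeps a plain column in each predicate so existing indexes can still be used:

```sql
-- Anti-pattern: the CASE wraps the date columns, so the optimizer cannot use
-- an index on any of them and ends up scanning the whole fact table.
SELECT SUM(f.sales_amount)
FROM   sales_fact f
WHERE  CASE
         WHEN f.calendar_type = 'RETAIL' THEN f.retail_date
         WHEN f.calendar_type = 'FISCAL' THEN f.fiscal_date
         ELSE f.calendar_date
       END = DATE '2012-11-20';

-- Index-friendly rewrite: each branch filters a plain column.
SELECT SUM(f.sales_amount)
FROM   sales_fact f
WHERE  (f.calendar_type = 'RETAIL' AND f.retail_date = DATE '2012-11-20')
    OR (f.calendar_type = 'FISCAL' AND f.fiscal_date = DATE '2012-11-20')
    OR (f.calendar_type NOT IN ('RETAIL', 'FISCAL') AND f.calendar_date = DATE '2012-11-20');
```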

The explain plan is always a good place to look in order to understand what the query is actually doing. However, if the query takes far too long to run, you may not want to run the explain plan and wait for a long time. I'd suggest analyzing the query by looking at how many tables and joins are involved in the report; draw a diagram if you need to. Then start removing columns and joins one after another, rerun the report, and measure the performance. By removing layers of joins, the performance will start to improve, and you will eventually find the one table that, when joined with the others, kills the performance. Then focus on tuning that much simpler version of the query. Chances are you need to set the driving table to be the smaller table in your logical complex join. Understand the sequence in which data is fetched, and make sure the smaller data set is used as the driving set against the much larger fact tables to avoid a full table scan.

An example: You have one report that fetches data from the following tables in a snowflake fashion:

Region <--> Supplier <--> Customer <--> Sales Fact <--> Date

We know that the sales fact is much bigger. Region has to go through Supplier to find the customers, and then go to the sales fact to find the measures for those customers based on date and region. So if I go from regions to suppliers and then customers, I obtain a fairly small list of records, and I only have to search the sales fact for that small list of customers instead of scanning the whole fact table. The other way to get the result would be to fetch all of the sales fact records, then take each record and search into Customer and eventually reach Region. The first way is much faster than the second because the driving data set is much smaller. The explain plan will outline the flow of the joins; analyze it and see whether you are using the wrong data set as the driver.
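A sketch of the idea with illustrative table and column names; the LEADING hint shown here is the plain database-level way to force the small dimension chain to drive, whereas in OBIEE itself the usual lever is the driving table setting on the logical complex join in the BMM layer:

```sql
-- Force the small Region -> Supplier -> Customer chain to drive the join,
-- so the large sales fact is probed last (ideally via its indexes) instead
-- of being scanned first.
SELECT /*+ LEADING(r s c) */
       r.region_name,
       SUM(f.sales_amount) AS total_sales
FROM   region_dim   r
JOIN   supplier_dim s ON s.region_key   = r.region_key
JOIN   customer_dim c ON c.supplier_key = s.supplier_key
JOIN   sales_fact   f ON f.customer_key = c.customer_key
JOIN   date_dim     d ON d.date_key     = f.date_key
WHERE  d.calendar_date BETWEEN DATE '2012-11-01' AND DATE '2012-11-20'
GROUP  BY r.region_name;
```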

Sometimes graphs and charts add system overhead too; putting too many charts and graphs on one dashboard will slow down performance.

Although caching and aggregate tables are available as OBIEE features, don't look at them as magic bullets. Caching in particular has a cost, and it doesn't work well in an ad-hoc environment.

Understanding your hardware resources and system environment is another important aspect of performance tuning. You need to know where your limit is.

In summary, performance tuning is a very common practice in every OBIEE project, because we all face slow performance. Certain issues can be avoided with a clean design; others have to be investigated and tuned at the query and DB level, while some can be addressed with OBIEE features. Still, there will be issues you just can't fix. As a last resort, a deep understanding of what the customer wants is important before you apply a workaround, so that you can manage expectations properly.

Thanks

Until next time


