Disclaimer

All of the topics discussed in this blog come from my real-life encounters. They serve as references for future research. All of the data, contents and information presented in my entries have been altered and edited to protect the confidentiality and privacy of the clients.

Various scenarios of designing RPD and data modeling

Find the easiest and most straightforward way of designing RPD and data models that are dynamic and robust

Countless examples of dashboard and report design cases

Making the dashboard truly interactive

The concept of Business Intelligence

The most important concepts you ever need to understand to implement any successful OBIEE project

Making it easy for beginners and business users

The perfect place for beginners to learn and get educated with Oracle Business Intelligence

Saturday, May 29, 2010

Understanding Complex Join and Physical Join in OBIEE

What is the difference between a complex join and a physical join? The easiest way to understand the basics is to remember that physical joins are used in the Physical layer and complex joins are used in the BMM layer. However, just knowing that isn't enough to build solid skills in OBIEE development. To gain more insight into how they really work in OBIEE, we need to know more about these two types of joins.

First, let's look at complex join:



The diagram shows the window of a typical complex join. Notice that you can't change any of the table columns of either logical table in the join, and the expression pane is grayed out. However, you are able to change the type of join from inner to outer. This behavior tells us that a complex join is a logical join that the OBIEE server looks at to determine the relationship between logical tables; in other words, it is just a placeholder. A complex join cannot tell the server which physical columns are used in the join, but it can tell the server what type of join it is going to be.

To know exactly how the join is defined, we need to look at the physical join in the Physical layer:



Notice that in the physical join window, we are able to change the columns under both tables, and we are also able to define our own expressions. However, we can't change the join type, unlike with a complex join. This behavior tells us that this is where we specify the columns that tell OBIEE how to join the two actual tables. Hence this is what a physical join does.
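For example, a typical physical join expression is nothing more than an equality between the key columns of the two tables (the table and column names here are hypothetical):

DIM_DATE.DATE_KEY = FACT_SALES.DATE_KEY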

Knowing the basics, let's take a step further. Can I use a complex join in the Physical layer, or a physical join in the BMM layer? Yes we can, and the application will not flag any errors for doing so. However, we need to know when to use them and what to expect after using them.

Let's look at a complex join in the Physical layer. Although it doesn't happen frequently, it is sometimes needed. Let's say we have two tables, a promotion fact and a contract date dimension. I want to join these two tables in such a way that only the dates that are still within the contract are returned. Therefore, I can't just use a simple join on the date columns from both tables; conditions need to be applied. In this case, let's use a complex join in the Physical layer:



As shown in the diagram, I enter 'PTS_DATES.COMPANYDATEID >= PTS_STAR_FACTS.CONTRACTSTARTDATEID AND PTS_DATES.COMPANYDATEID <= PTS_STAR_FACTS.CONTRACTENDDATEID' to satisfy the joining condition. At the front end, when you run a report using these tables, this expression will be included in the where clause of the SQL statement:
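As a rough sketch, the physical SQL generated for such a report might look something like the following; the tables are the ones above, but the selected columns (COMPANYDATE, PROMOTION_AMOUNT) are hypothetical:

SELECT D1.companydate, SUM(D2.promotion_amount)
  FROM pts_dates D1, pts_star_facts D2
 WHERE D1.companydateid >= D2.contractstartdateid
   AND D1.companydateid <= D2.contractenddateid
 GROUP BY D1.companydate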



Having a physical join in the BMM layer is also acceptable; however, it is very rare to see that happen. The purpose of having a physical join in the BMM layer is to override the physical join in the Physical layer. It allows users to define more complex joining logic there than they could with the physical join in the Physical layer; in other words, it works similarly to a complex join in the Physical layer. Therefore, if we are already using a complex join in the Physical layer to apply more join conditions, there is no need to follow this up with a physical join in the BMM layer again.

Remember, the best data modeling design in OBIEE is not the most complex and convoluted design; it should be as straightforward as possible. Therefore, use physical joins in the Physical layer and complex joins in the BMM layer as much as you can. Only when the situation calls for a different join should you go for it.

Til next time

Monday, May 24, 2010

How to use Dynamic Variable in OBIEE

What is a dynamic variable? According to the OBIEE Admin guide, it is basically a variable that dynamically holds its value depending on the way it is defined. In this post, I am going to talk about how to create dynamic variables and how to apply them. The goal is for beginners to get a good understanding of dynamic variables so that they can apply them when the situation calls for it.

Let's start with creating a very simple dynamic variable. I want this variable to always hold yesterday's date so that in OBIEE Answers, I can use it in any number of reports.

Let's begin by creating a new initialization block. It is basically the place to define the content of the variable and how it queries the desired value:



I am calling this initialization block "Yesterday", and in 'Edit Data Source' I have entered select to_char(sysdate-1, 'dd-Mon-yyyy') from dual as the initialization string. The format of the SQL statement should be exactly the same as you would enter directly in the database, so a good practice in general is to try the query in your database to make sure it works before entering it in the initialization block. Keep in mind that depending on what type of database you are using, the SQL statement may not be supported. Therefore, let's test the initialization string and see how it works:
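For reference, this is what the test looks like when the query is run directly against an Oracle database (the sample output assumes today is May 14, 2010):

select to_char(sysdate-1, 'dd-Mon-yyyy') from dual;
-- returns 13-May-2010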



And it works. The connection pool should be the same one used for the corresponding physical tables in OBIEE.

Now let's associate this initialization block with a new dynamic variable by clicking 'Edit Data Target':



I will just call this variable 'Yesterday'. So now this variable will always return yesterday's date.

Let's create another dynamic variable with slightly more complex logic. This time I want the variable to give me the current GL month, which lags the current calendar month by 7 months. Moreover, the GL month starts on the 15th day of each calendar month. Therefore, let's write the following select statement:

SELECT DISTINCT a.posted_period gl_dashboard_period
  FROM rd_glmart.v_gl_star_factsdetails a
 WHERE a.posted_period =
       TO_CHAR (ADD_MONTHS (CASE
                               WHEN TO_CHAR (SYSDATE, 'dd') < '15' THEN SYSDATE - 15
                               ELSE SYSDATE
                            END, -7), 'yyyymm')





Test and see the result:



Since today is May 14th and the initialization string brings back 200909 (May 14 is before the 15th, so the query first steps back 15 days into late April and then back another 7 months to September 2009), I'd say it is working as intended.

Now I have two dynamic variables: Yesterday and GL Current Closed Period. Let's apply them in our reports.

Enter the name of the dynamic variable, GL Current Closed Period, in the filter:
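Alternatively, the same repository variable can be referenced with the VALUEOF function, for example inside a column formula or a filter expression. Assuming a hypothetical logical column "GL Facts"."Posted Period", the expression would look roughly like this:

"GL Facts"."Posted Period" = VALUEOF("GL Current Closed Period")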


And the result:



Til next time.

Monday, May 17, 2010

Filter based on the repetition of records in the column

Recently, I came across a requirement that I think is worth blogging about for beginners who are trying to manipulate report data based on the repetition of the data. Take a look at the following report:


As you can see, not every order number has only one corresponding document number. So how can I filter this report so that it only gives me back the order numbers that have more than one document number? Because there is no out-of-the-box filter operator that allows us to easily create a filter based on such a condition, we will have to work around it.
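In plain SQL terms, what we are after is roughly the following, a sketch against a hypothetical FACTS_INFO table rather than anything OBIEE generates for us:

SELECT order_number,
       COUNT(DISTINCT document_number) AS doc_count
  FROM facts_info
 GROUP BY order_number
HAVING COUNT(DISTINCT document_number) > 1;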

Let's first start by creating a report that counts the occurrences of every order number. So bring in a new column and call it 'count'. In the column formula, let's enter: sum(count (distinct "Facts Info"."Order Number") by "Facts Info"."Order Number")




What this formula does is count every occurrence of an order number as 1 and then sum all the 1s by each order number. So for those order numbers that have, say, 2 document numbers, the order number will repeat twice and its corresponding count will be 2 on both lines after the 'Sum' function. See the report below:



Now the hardest part is over. For the final report, we can then create a filter on the count column. Depending on what you want to see, you can get the order numbers that have 2 document numbers, 3 document numbers, fewer than 5, or however you want. For the purpose of this demonstration, I will set this filter to be greater than 1, which brings back all the order numbers that have more than one document number:



Run the report and here is the final result:


Of course, we can always hide the count column so that others don't see it in the report.

Next time.

Wednesday, May 12, 2010

Reporting against multiple transactional datasources

Recently, I came across a requirement where the users wanted to query a transactional database directly using OBIEE. In our case, they literally want to pull data directly (meaning no ETL, no staging area in between) from the cash registers of retail stores nationwide. The databases of these stores all sit on different servers, although they all have the same data structure. The goal is to report on the store data in one subject area in OBIEE, letting users choose to report against a single store's data or several stores' data together for totaling.

So let's explore how this can be implemented.

First, let's say I have successfully imported two sets of tables into my physical layer with different connection pools, and I have defined the join criteria among these tables.



As we can see, there are two groups of tables coming from two different datasources, but their definitions are identical. Therefore, we need to come up with a way to let OBIEE distinguish these two sets of data. I decided to add a new field called 'store number' to both sets and hardcode this field with different values. This field will be used later to allow us to control the querying targets.

Since these datasources sit within the stores and we have no control over them, I decided to add a new physical column to each of the physical tables in OBIEE and change these physical tables into views in order to define the value of the new column 'store number':
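As a sketch, assuming a hypothetical physical table called REGISTER_TRANSACTIONS, the SELECT statement behind the view on the 5th-ave server would look something like this (the Dolphin version hardcodes '001' instead):

SELECT t.*,
       'NY' AS store_number   -- hardcoded literal identifying this datasource
  FROM register_transactions t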


Testing the data in the 'store number' column, it appears to be correct:


Now apply the same configuration to all physical tables in both physical databases. We should have a store number column in every table. The tables from the 5th-ave server are hardcoded 'NY' and the ones from Dolphin are hardcoded '001'.

Now bring both groups of tables into the same BMM layer folder. In each of the logical tables, we should now have two different LTSs. In each LTS, define the fragmentation content using 'store number':
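For example, assuming the business model is called 'Retail Stores' and the logical table 'Transactions' (hypothetical names), the fragmentation content of the LTS coming from the 5th-ave server would be roughly:

"Retail Stores"."Transactions"."Store Number" = 'NY'

and for the LTS coming from the Dolphin server:

"Retail Stores"."Transactions"."Store Number" = '001'

Checking 'This source should be combined with other sources at this level' on both LTSs lets OBIEE query both datasources when both store numbers are requested.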



Do this in all logical tables, then move the wanted columns into the presentation layer. After re-arranging, it looks like this:


Now let's go and test the query. In the subject area, the store number column will now contain both 'NY' and '001'; depending on which value we select, the result will come back accordingly.



So let's just run a simple report of store number and total amount received and see what we get:



If we look at the physical query that's generated, we will see that OBIEE fired two queries against both datasources and brought back the results:
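As a rough sketch (the table and column names are hypothetical), the two physical queries would look something like this, one per connection pool, with the BI server adding the results together afterwards:

-- against the 5th-ave connection pool:
SELECT 'NY' AS store_number, SUM(t.amount_received) AS total_amount
  FROM register_transactions t;

-- against the Dolphin connection pool:
SELECT '001' AS store_number, SUM(t.amount_received) AS total_amount
  FROM register_transactions t;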


It goes without saying that we can also just pick one store number, and the report will come back with data from just one datasource.

One thing to keep in mind is that although there is no hard limit on how many datasources can be defined in OBIEE, meaning that I could have 50+ datasources defined in the same RPD without causing configuration errors (if that were actually required), there are practical limitations on how many datasources OBIEE can handle at the same time. This will depend on a lot of things, such as data size, reporting requirements, physical hardware and network connections. I can say that if the users want to report against 20 stores at the same time and see the grand total, the query will very likely hang for a long time. In that case, a staging area and ETL effort will be a better option.

Nonetheless, for working around the requirement of having a few datasources in a report, it can be done in OBIEE and still be practical.

til next time.

Thursday, May 6, 2010

The myth of OBIEE performance tuning

It is very common for people to think that when reports run for an excessive amount of time, there are configurations or performance tunings that need to be done or enhanced within OBIEE, as if OBIEE held the magic key to faster and better performance. On the surface, the configurations and reports are done in OBIEE, so it is natural for most people to think that the problem needs to be solved in OBIEE as well. That is fair.

However, when looking at reports with poor performance, there can be a lot of reasons behind it. OBIEE in essence creates SQL queries that execute directly against the database; we can ensure the SQL query is free of unnecessary clauses and nested statements by following best practices in OBIEE design. Yet this only takes care of some of the performance issues. It is true that OBIEE has several out-of-the-box features for performance-related needs, such as caching, aggregate tables, changing the log level and so on, but none of these are magic bullets in dealing with performance.

Caching is widely used in OBIEE projects. It helps query response time by storing query results in the cache. We can automate the process of refreshing the cache with event polling tables, or we can pre-run a lot of queries for cache seeding using scheduled iBots. All of these maximize the benefit of caching reports. However, when entering the OBIEE world, I can safely say that we are entering the ad-hoc world, so caching isn't going to help completely.

Aggregate tables allow OBIEE to create separate tables aggregated at a defined dimensional level, hence reducing the size of the record set. When certain queries need data at that level, OBIEE has the option of querying the smaller tables and returning results more quickly. This surely seems a pretty good option for dealing with certain requests (but, as we know, not all of them). However, creating more physical tables means spending more resources and taking up more database space. If going down this path, the business needs to understand their infrastructure and know what they can afford in the long run. Therefore, aggregate tables aren't going to help in all situations.
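To make the idea concrete, an aggregate table is usually just a pre-summarized copy of a fact table at a higher grain. A sketch with hypothetical names, rolling a daily sales fact up to month and product:

CREATE TABLE sales_fact_month_agg AS
SELECT d.month_key,
       f.product_key,
       SUM(f.sales_amount) AS sales_amount
  FROM sales_fact f
  JOIN date_dim d ON f.date_key = d.date_key
 GROUP BY d.month_key, f.product_key;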

Everything else that can be done within OBIEE configuration is just to ensure the BI server generates the best query to be executed. If a request runs forever, we will need to look at the query and understand why it is running slowly. This becomes a question for both the business and the technical people.

What is this query going to give you? What value will this report add to the business? I have seen requests composed of 45+ columns in one report. They would run for 30 minutes and come back with an error: basically, the number of records returned was too large. We can certainly increase the default 65K setting in OBIEE to take care of the error, but that will never prevent the error for good because the data is likely to grow over time. We can definitely spend lots of effort and time tuning the query, or even schedule the report to be cached every day, but that will not help with ad-hoc use. More critically, if we decide to go down this path, we are leading the business in the wrong direction, and I can assure you that there will be more similar queries to tune in the future, and that will never end.

Therefore, the first step in making sure reports run fast is not getting into a situation where we have to deal with performance-killing requests in the first place. This is not an easy thing to do, because it involves ongoing education of the business about BI concepts and practices. The whole point of business intelligence is to avoid having to dump millions of records onto users' computers for them to slice and dice; it is about slicing and dicing the data for them. It is a mindset change that will take time, but it can't happen without clearly conveying the purpose and value of BI to the business. The Internet changed the way people obtain information, but that change didn't happen overnight; the same is true for BI.

From a technical perspective, when a fairly decent-sized request takes a long time to return, it can be caused by a lot of things. How is the server machine's CPU usage? Does your database support the type of query you are running? Are you actually performing multiple requests and then unioning them? What type of table or view is the query running against?

One of the first things I like to do is break the request down into smaller ones that each contain only one join, to see which join is slow. Chances are, if the query takes a long time, it is executing against huge data sets that are joined up. Knowing which join is slow helps us pinpoint the tuning target. For example, I have one report with 12 columns from 4 tables that runs for 40 minutes in Answers. Looking at the SQL statement of this request, I don't see anything particularly wrong with the query. Therefore, I break it down to 3 columns from one dimension and the fact table. That query runs with decent speed. Then I delete the columns from this dimension and replace them with those from the second dimension table in the original request. The query now runs for 25 minutes. So now I can focus on investigating why querying these two tables takes such a long time. If I were to create indexes on the joining columns, I would at least know where to create them. If these two tables are huge, then obviously the join between them will be slow. Worse, if they are actually views across more physical tables in the database I am querying against, then the performance will obviously suffer. Therefore, we need a detailed understanding of the environment in order to determine the best thing to do.
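Once the slow join is identified, one common follow-up is to index the joining column on the larger table. A sketch with hypothetical names:

-- Index the foreign key column used in the slow join
CREATE INDEX idx_sales_fact_customer ON sales_fact (customer_dim_id);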

There is no one simple change in the database or OBIEE that will eliminate performance issues once and for all. When reports are running slowly, we can't just tell the business to change their reports, nor can we just throw it to the DBAs to deal with. Instead, we all need to work together and come up with the best solution for everyone. Creating indexes, hints or materialized views is easier said than done without considering the long-term impact on the infrastructure side. In certain cases, capturing certain data may only be possible using views, to avoid re-writing the ETL process and other jobs. Therefore, the business needs to understand the situation and be willing to compromise on certain things if they want better performance. On the other hand, we also have to thoroughly understand the business requirement to determine whether any workaround is available; only then can the message be accurately communicated and the expectations effectively managed.

In summary, performance tuning in OBIEE is a very broad topic that becomes performance tuning in general. There is no single solution or simple configuration that will take care of all performance issues. It's all about cooperating, understanding, compromising, working around and coming up with the best solutions for the project in the long run. It requires a deep understanding of the project environment, data structures, project flows and the business in order to come up with the best technical process and business practice, and to be on a good project path to start with. No matter how advanced the tool is, if we are dealing with billions of records outer-joined to billions more, we are going to have to wait.