All of the topics discussed here in this blog comes from my real life encounters. They serve as references for future research. All of the data, contents and information presented in my entries have been altered and edited to protect the confidentiality and privacy of the clients.

Various scenarios of designing RPD and data modeling

Find the easiest and most straightforward way of designing RPD and data models that are dynamic and robust

Countless examples of dashboard and report design cases

Making the dashboard truly interactive

The concept of Business Intelligence

The most important concept ever need to understand to implement any successful OBIEE projects

Making it easy for beginners and business users

The perfect place for beginners to learn and get educated with Oracle Business Intelligence

Monday, October 21, 2013

Mathematically calculate the difference between month when the data is numeric 'yyyymm' in OBIEE


I just found something fun and maybe useful to do in OBIEE. Calculating the difference between timestamps is pretty easy by using timestamp diff function. When it comes to calculating the difference between month, I know 'sql_tsi_month' can be used, which will give you results in month. However, you don't always want to put date columns into the equation if your report is aggregating at monthly level, and sometimes the date column is simply not available. So how do you go about calculating the difference in month.

To show the point, let's look at the following report to understand the requirement:

We need to calculate the difference in month between check in date and KPI month and year

So in this case, the report is going against the monthly aggregate fact since the date dimension field is KPI month and year, we don't want to introduce KPI Date into the formula (even if we do have this field) for comparing timestamps. So we need a way to figure out the difference between (in this case) 3/08/2010 and 201101 in months. Now the good new is that, it can be a rounded estimate, meaning that as long as the date is in March, any date between 1 to 31, it is March. So the difference between March 31th and April 1st is 1 month. This is a good news, because it basically means I need to mathematically calculate the difference between something like 201003 and 201101, but the result should be in month.

Now let's figure out how to achieve this mathematically.

We all know that 201101 - 201003 = 98. Obviously this is going to be wrong. In order to do the right calculation, we need to calculate this in two parts: 2011 - 2010 and 01 - 03, then add the result back together to give the final result.

Now let's break it down step by step:

1. Assign KPI Month (201101) is X and Check in Date (201003) is Y.

2. do: X/100 and round off, this will give you something like 2011 and Y/100 will be 2010.

3. Divide the above and multiply by 12 since there are 12 months in a year. In this case, it will give you 1*12 = 12.

4. Do: X- (X/100)*100 and Y - (Y/100)*100. Both will give you the last 2 digits: 03 and 01.

5. Divide the two numbers from step 4, this will give you difference in month, which is -2.

6. Now add the result from step 5 and step 3, which is then -2+12 = 10. This is your answer.

Now the programming of this in OBIEE is very simple. You convert the data from 03/08/2010 into numeric 201003, the formula of achieving such becomes your 'Y'. And the rest are basically the exact carry-out of what has been described above, so I will skip both of the steps and just show you the result.

Defining the OBIEE formula:

Result (and it's working):


Until next time

Tuesday, October 15, 2013

De-mystify OBIEE with Big Data Integration

Hello there

I know over the last few years, the word 'big data' has become all of the tech world. People talk about it, project management talks about, even those that don't know anything about anything talks about it as if when throwing words like 'cloud computing', 'Hadoop', 'Map Reduce' into the conversation makes one person sound that much smarter and sophisticated. Big Data is a concept, which has been done for many years, it's just that it wasn't well-known until much later. The concept is powerful yet simple. This is the product of the ever-evolving modern society where information means money.

Inevitably, this new concept, new framework will have to be integrated with technologies that can display the pre-computed data with business meanings. This is when the integration between Big Data framework and reporting platform happens. In the last few Oracle Open World, I have seen companies with all kinds of products and services tapping into the world of big data and cloud computing as if this is the rocket science. Let me tell this, it's not big deal. The integration between OBIEE and Hadoop Hive isn't as complicated as you think. There is no special configuration nor special coding to make this happen. It is what it is, Hadoop pulls data from various nodes and stores them into a database (Hbase or anything else), in the middle of the process, Hadoop does what it does, group by or sort in Map Reduce, but at the end the output is what OBIEE reports on. It's no big deal, of course I don't mean it's a waste, I mean it's brilliant because it's no big deal yet powerful.

Now that OBIEE allows to pull data directly from Hadoop, Informatica has a new Hadoop plug in, just about every ETL tools are working to recognize Hadoop, it is making the integration easier from configuration point of view. However, the main engineering work isn't the integration part, it is the architecture of such framework that not only has to be cost effective, but also highly scale-able. This is the money.

I have been dealing with Hadoop with OBIEE since its 10G version. I could get that to work, 11G is just becoming simpler.

The idea of Big Data is to handle highly unstructured data, data that are not conventionally defined as attributes or facts. Stuffs people write about on facebook comments, tweets, blog comments are all unstructured data, yet they are highly valuable stuffs. These data are all over the internet, not just in 1 source OLTP like most of the conventional stuffs that BI Projects deal with. These data comes in huge volumes in matters of minutes. Most powerful of all, the concept of cloud computing is to be able to do the job instead of using 1 super computer that costs millions, but using said 2000 ordinary laptops each acting as a node. That way, you can freely plug your laptop in and out of the network without destructing others -- things you can't do with mainframe system. This translations to cost-effectiveness.

While Bid Data is so hot and so Big now, it is more of a concept, a framework. It will still need to stand the test of time to be accepted and standardized. This concept has been more popular among start-up companies, or tech-giants who want to build software products using this concepts. However, it will take more time more big banks and financial giants to accept. The main risk of Hadoop framework in my opinion is the risk of losing data during the transactions. Just like any software, there will be error during the process. Unlike big banks who stores data in their mainframe that are highly secured and protected, Hadoop may lose some data during the distribution. This is why sometimes when your google search turns out empty, or your facebook comment is not found, or whatever cool pictures you uploaded was unsuccessful. If it's not big deal to lose transaction once a while, like in this case, you will just upload again or refresh the search page again, then it's no big deal, enjoy those more benefits that the framework offers. But if you are talking about financial transactions, stock transactions that you absolutely can not afford to miss at specific time, then Big Data's risk way outweighs the gain. Therefore, one big evaluation of the product is the analysis of the nature of your data to determine how much can you afford to miss some data (as known risk in any engineered system).

Despite all these, the integration between reporting and big data framework does have a lot of futures. Most of these projects will be different from the traditional OBIEE projects that handles company's internal reporting, rather, they will be more outward to external consumers and marketing.

This is a typical project I went through using Hadoop integration with OBIEE, which really is no big deal:

Building a software product that does telecom network analysis to diagnosis network performance. Hadoop was used to collect data from routers and devices (500 to 2000 routers), ETL framework was used to transform data into analytic data for trending while Hbase provides more real time data. OBIEE reports on top of them with dashboard and charts that shows network performance including various network elements, towers, stations, pseudowires in the network. Since data that are traveling through the network contents customer information, such as numbers, IP Address, gender etc, marketing analysis was also provided as dashboards. The product is not for internal stuffs, rather it is sold to network carriers as pre-built solutions, similar to BI Apps. Therefore, OBIEE maybe replaced by Pentaho for carriers who don't want to spend too much money.

Anyways, more project in this nature will become available as I see it. Once people have the data, they don't want to lost it, they want to do something about it. Hadoop framework is a good solution. Along with that, there are several DB technologies that handles unstructured data, such as Cassandra, Mongol DB and so on, they all do the job differently, but the idea is comparable. OBIEE on the other hand, really isn't gonna make much difference here. The integration of OBIEE and all that stuff really isn't much different from Integrating with said Oracle DB or Informatica.

The bottom-line is, all of the engineering work will be done to collect data and group by as single output, then OBIEE somehow manage to arrange these schemas into a star to report on. Well, if you are not using OBIEE to report, do you still care about logical Star Schemas? Something for you to think about.


Until next time.

Saturday, October 12, 2013

A comprehensive way to understand Oracle BI Apps

As you spend more time developing reports using OBIEE, you will realize that the needs for better and more optimized data models and ETL frameworks become more important. You realize that most of the time required to spend is on designing database schemas, tables and objects as well as ETL frameworks to load the data into analytic warehouse. Needless to say, the time required to research and understand the specific business process and subject matter is more crucial than the technical implementation. Often, millions of dollars and years of time are invested in this matter, but still things can go wrong. This is the price people pay when it invent the wheel.

What if there is a complete, end-to-end solution for any specific business needs that are pre-engineered and packaged for you, which can be deployed and installed in your system fairly simply, that will save you endless times and risk of mistakes? This is what BI Apps is all about. With a complete end-to-end structure that consists of the groundwork of analytic warehouse where all the required schemas and objects have been created, the most optimized ETL framework that takes care of all the data transformation needs, and most eye-catching reports and dashboards with the most valuable information displayed in highly visual and communicative ways. Most importantly, BI App is a project after years of research and engineering that are designed to solve most of the problems for business. In other words, the wheel is already invented, you just need to put it on your car. This is the reason why BI Apps have become the solution for so many companies, it's all about cost-effectiveness.

In order to understand how BI Apps work and how to study and gain more insight about it, you should first start thinking in terms of problems and how BI Apps solves them. If you were just trying to study and memorize things by reading documents, it is hard.

So start asking yourself these questions:

1. Knowing that BI Apps is an end-to-end solution package that comes with a pre-build data warehouse, Pre-defined Informatica framework, DAC and OBIEE, what does it take to put these things in your business so it can be operating? In other words, should I install all these things on 1 machine or multiple machines?

hint: Informatica and DAC have to be installed on the same machine. OBIEE has to be on different machine. Staging DB and warehouse can be on 1 machine. Since OBIEE's admin tool is a windows based application, a windows machine is needed to install OBIEE client tool so that the pre-build OBIEE objects (rpd) of the BI Apps can be installed. 

2. Once this thing is installed, it is not running yet. Which means, your live data from OLTP is not yet being loaded into your warehouse, so none of the OBIEE reports will work. How does BI Apps pull data from your OLTP source, if your OLTP source is Siebel, Peoplesoft, EBS, JD Edwards or your home grown one?

Hint: Oracle provides adapter for pulling data from various sources. If you open the Informatica repository, you will see a list of Adapters available. These are nothing but folders where different mappings are stored. If your source is EBS, then you will use EBS adapter where all of the informatica mappings are pre-built to pull data from EBS source. All of the source table definitions there will match exactly with the tables in your EBS. If you use home-grown source, then there is universal adapter for you where the input files will have to be customized into CSV format for Informatica to load from. 

3. One you start loading data into your warehouse, how does BI Apps actually do it successfully based on your needs? If you think about it, in any projects you are going to face the same needs such as defining your calendar, time dimensions, your dimensional hierarchies, your fact tables and aggregation tables. How does BI App actually do it?

Hint: This article gives some helpful hints. To configure for the initial full load, you need to go through several configuration steps in Dac and Informatica. Such as defining initial extract date parameter in DAC to avoid loading way too much record, configure DAC Source system parameters for global currencies, exchange rate types, Nullify the refresh date tab in DAC to enable full load to W_Day_D dimension (based table for all calendars in BI Apps) and define $start_date and $end_date parameter for Task SIL_Daydimension so that DAC will load data into Date dimension based on those date ranges. Run PLP tasks to load aggregate tables, configure GL Account hierarchies, etc

4. If your business is international, you are likely to deal with transactions that are not in just 1 currency, but multiple currencies. How does BI Apps handle this?

Hint: Found above

5. Inevitably, any business is going to want to customize the BI App to make it more specific to their business, how does BI App make it easier for such customization? What are the levels of customization?

Hint: There are category 1 and category 2 customization. For cat 1, it is about bringing in additional fields from the source into target. In all of the pre-built mappings, there is an unmapped field 'x_custom' in pretty much every transformation. Your new field should go through the same sequence and flow as x_custom to ensure it's safe arrival to the target. For Cat 2, it is about bringing new tables and dimensions into the mapping. Use your informatica knowledge to do it against the requirement. The best way and the only way Oracle will support your customization is by creating custom folders in your Informatica and Custom container in DAC, then copy the pre-built mappings/workflows into this folder to make your customization, change DAC task to point to these new folders so that the execution plan will run these workflows instead of the original ones. DON'T EVER CUSTOMIZE DIRECTLY IN ORIGINAL MAPPING.

6. Inevitably, there will be records deleted in the OLTP system later on, how does BI App handle this situation so that the deleted records won't show up in its warehouse?

Hint: Informatica has pre-built mappings that handle deletion to the warehouse records. This article gives a good explanation.

7. In any ETL framework, there is going to be needs for maintaining historical records in the tables. How does BI App handle the loading of SCD? What's BI App's way of checking and flagging records that become history?

Hint: There are a lot of approach to this problem. The bottom-line is, how to identify records that have been updated and once it's been identified, what to do about it? Using orahash value has been a traditional way by a lot of DBAs to identify records that have been changed. When any of the key attributes related to this record has changed (such as address, phone number, name etc related to this specific account changes), new orahash key gets generated. There must be fields that stores information such as record start date, end date and history flag in the dimension where the primary key will be attached to timestamp to ensure uniqueness. Knowing the idea, how does BI App handle it there way (no necessarily using Orahash )? This article can be helpful.

8. How does BI App handle incremental load?

Hint: In a lot of the pre-built mapping, you will notice that it has 2 workflow sessions. The suffix will suggest which one is full load and which is incremental. The DAC Task window will also provide setting for both full load and incremental load sessions as mentioned here.

9. When it comes to data modeling in OBIEE, there are going to be a lot of snowflaking of dimensions? If you are dealing with financial analytic, you will see that a lot of the dimensions are at different granularity that won't join to the fact. Fact tables are aggregated at different levels for optimizing performance, how are these being modeled in the rpd?

Hint: In your OBIEE rpd, you will notice that in the BMM layer, there are lots of logical tables being mapped to multiple LTS. This is creating logical star schema out of highly normalized physical snowflake schema. This is almost like creating a logical view that joins multiple physical tables.

10. Any applications will have to deal with upgrades and migrations. How does that work in BI Apps?

Hint: If your customization is done accordingly, the upgrade should be fairly easy.

11. Security is always a big concern for any software, what are the security models provided by BI Apps?

Hint: This document gives good information

12. Last but not least, having some domain knowledge about the business is always important. If you are dealing with Financial analytics, what does business unit means, what about chart of account, line of business, GL stuffs? What business information do dashboards like Payables, receivables, profitability provide for financial analytics? What about HR Analytics?

Hint: This is something you will gain through experience, research and curiosity. 

These are areas that you have to investigate and research to gain better understanding of how BI Apps work, if you want to excel in interviews. Not only you have to be familiar with the topics, the customization and handling of such a vast application, but also start thinking in terms of what is the nature of the business problems and how BI App solves it. If you can relate BI Apps solutions with data warehouse solutions in general, things will start making more sense, because if you have ever designed and implement any solutions with your own resource before, you will realize that a lot of the problems and concerns have already been addressed in BI Apps. Therefore, the engineering of it becomes minimum, the customization and handling of the operation is what BI App is all about.


Until next time
Related Posts Plugin for WordPress, Blogger...