All of the topics discussed in this blog come from my real-life encounters. They serve as references for future research. All of the data, content and information presented in my entries have been altered and edited to protect the confidentiality and privacy of the clients.

Tuesday, October 15, 2013

De-mystify OBIEE with Big Data Integration

Hello there

I know that over the last few years, 'big data' has become the talk of the tech world. People talk about it, project management talks about it, and even those who don't know anything about anything talk about it, as if throwing words like 'cloud computing', 'Hadoop' and 'MapReduce' into the conversation makes a person sound that much smarter and more sophisticated. Big Data is a concept that has been practiced for many years; it just wasn't well known until much later. The concept is powerful yet simple. It is the product of an ever-evolving modern society where information means money.

Inevitably, this new concept, this new framework, will have to be integrated with technologies that can display the pre-computed data with business meaning. This is where the integration between a Big Data framework and a reporting platform happens. At the last few Oracle OpenWorld conferences, I have seen companies with all kinds of products and services tapping into the world of big data and cloud computing as if it were rocket science. Let me tell you this: it's no big deal. The integration between OBIEE and Hadoop Hive isn't as complicated as you think. There is no special configuration or special coding needed to make it happen. It is what it is: Hadoop pulls data from various nodes and stores it in a database (HBase or anything else). In the middle of the process, Hadoop does what it does, grouping or sorting in MapReduce, but in the end the output is what OBIEE reports on. Of course, I don't mean it's a waste. I mean it's brilliant precisely because it's no big deal yet powerful.
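To show how small the "group by in MapReduce" idea really is, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the record layout (a router id and a byte count) and all function names are made up for illustration. It only mimics the three stages: map emits key/value pairs, the shuffle groups them by key, and reduce collapses each group into one row that a reporting tool could query.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs; the key here is a hypothetical router id."""
    for rec in records:
        yield rec["router"], rec["bytes"]


def shuffle(pairs):
    """Group values by key and sort the keys, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Collapse each key's values into a single aggregated output row."""
    return [
        {"router": key, "total_bytes": sum(values), "events": len(values)}
        for key, values in grouped.items()
    ]


def run_job(records):
    """Chain the three stages into one job."""
    return reduce_phase(shuffle(map_phase(records)))


# Made-up sample input, standing in for raw data collected from the nodes.
sample = [
    {"router": "r2", "bytes": 100},
    {"router": "r1", "bytes": 40},
    {"router": "r2", "bytes": 60},
]
rows = run_job(sample)
```

The resulting `rows` list is the kind of flat, grouped output that ends up in a table, and that table is all the reporting layer ever sees.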

Now that OBIEE can pull data directly from Hadoop, Informatica has a new Hadoop plug-in, and just about every ETL tool is working to recognize Hadoop, the integration is getting easier from a configuration point of view. However, the main engineering work isn't the integration part. It is the architecture of a framework that has to be not only cost-effective but also highly scalable. This is where the money is.

I have been working with Hadoop and OBIEE since OBIEE's 10g version. I could get it to work back then; 11g just makes it simpler.

The idea of Big Data is to handle highly unstructured data, data that is not conventionally defined as attributes or facts. The stuff people write in Facebook comments, tweets and blog comments is all unstructured data, yet it is highly valuable. This data is all over the internet, not just in one source OLTP system like most of the conventional stuff that BI projects deal with. It also arrives in huge volumes in a matter of minutes. Most powerful of all, the idea behind cloud computing is to do the job not with one supercomputer that costs millions, but with, say, 2000 ordinary laptops, each acting as a node. That way, you can freely plug your laptop in and out of the network without disrupting the others, which is something you can't do with a mainframe system. This translates to cost-effectiveness.
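Why can a node be unplugged without hurting the others? A rough sketch, assuming each block of data is copied to several nodes (HDFS replicates blocks, three copies by default). The round-robin placement below is a simplification I made up for illustration and is not Hadoop's actual placement policy; the node names and function names are hypothetical too.

```python
def place_blocks(block_ids, nodes, replicas=3):
    """Copy each block to `replicas` distinct nodes, round-robin (simplified)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replicas)]
    return placement


def available_blocks(placement, live_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    live = set(live_nodes)
    return {block for block, holders in placement.items() if live & set(holders)}


nodes = ["n0", "n1", "n2", "n3", "n4"]
placement = place_blocks(range(10), nodes, replicas=3)

# Unplug one node: every block still has a copy somewhere else.
after_one_loss = available_blocks(placement, ["n0", "n1", "n3", "n4"])

# Unplug three nodes at once: blocks whose copies all sat on those nodes are gone.
after_three_losses = available_blocks(placement, ["n3", "n4"])
```

With one node gone, nothing is lost. With three gone at the same time, the blocks that lived only on those three machines become unavailable, which is the trade-off between cheap hardware and the number of copies you keep.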

While Big Data is so hot and so big now, it is more of a concept, a framework. It will still need to stand the test of time to be accepted and standardized. The concept has been more popular among start-up companies and tech giants who want to build software products around it. However, it will take more time for big banks and financial giants to accept it. The main risk of the Hadoop framework, in my opinion, is the risk of losing data during the transactions. Just like any software, there will be errors during the process. Unlike big banks, which store data in mainframes that are highly secured and protected, Hadoop may lose some data during the distribution. In my view, this is why your Google search sometimes turns up empty, or your Facebook comment is not found, or the cool pictures you uploaded didn't go through. If it's no big deal to lose a transaction once in a while, as in these cases where you just upload again or refresh the search page, then enjoy the many benefits that the framework offers. But if you are talking about financial transactions or stock transactions that you absolutely cannot afford to miss at a specific time, then Big Data's risk far outweighs the gain. Therefore, one big part of evaluating the product is analyzing the nature of your data to determine how much of it you can afford to miss (a known risk in any engineered system).

Despite all this, the integration between reporting and big data frameworks does have a bright future. Most of these projects will be different from traditional OBIEE projects that handle a company's internal reporting. Rather, they will face outward, toward external consumers and marketing.

This is a typical project I went through using Hadoop integration with OBIEE, which really is no big deal:

We were building a software product that does telecom network analysis to diagnose network performance. Hadoop was used to collect data from routers and devices (500 to 2000 routers), and an ETL framework was used to transform the data into analytic data for trending, while HBase provided more real-time data. OBIEE reported on top of them with dashboards and charts showing network performance across the various network elements, towers, stations and pseudowires in the network. Since the data traveling through the network contains customer information, such as numbers, IP address, gender and so on, marketing analysis was also provided as dashboards. The product is not for internal use. Rather, it is sold to network carriers as a pre-built solution, similar to BI Apps. Therefore, OBIEE may be replaced by Pentaho for carriers who don't want to spend too much money.

Anyway, more projects of this nature will become available, as I see it. Once people have the data, they don't want to lose it; they want to do something with it. The Hadoop framework is a good solution. Along with that, there are several database technologies that handle unstructured data, such as Cassandra, MongoDB and so on. They all do the job differently, but the idea is comparable. OBIEE, on the other hand, really isn't going to make much difference here. Integrating OBIEE with all that stuff really isn't much different from integrating it with, say, Oracle DB or Informatica.

The bottom line is that all of the engineering work goes into collecting the data and grouping it into a single output; OBIEE then just has to arrange these schemas into a star to report on. Well, if you are not using OBIEE to report, do you still care about logical star schemas? Something for you to think about.
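For anyone who wants to picture the "arrange into a star" step, here is a small Python sketch. It is only an illustration of the idea, not how OBIEE does it internally (OBIEE models the star logically in its repository). The column names (`tower`, `region`, `hour`, `latency_ms`) and the function name are assumptions I made up for the example.

```python
def to_star(flat_rows, dim_key, dim_attrs, measures):
    """Split flat rows into one dimension table and one fact table.

    Each distinct value of `dim_key` gets a surrogate key ("sk"); fact rows
    keep only that surrogate key plus the listed measure columns.
    """
    dimension = {}  # natural key -> dimension row
    facts = []
    for row in flat_rows:
        natural = row[dim_key]
        if natural not in dimension:
            dim_row = {"sk": len(dimension) + 1, dim_key: natural}
            dim_row.update({attr: row[attr] for attr in dim_attrs})
            dimension[natural] = dim_row
        fact_row = {"sk": dimension[natural]["sk"]}
        fact_row.update({m: row[m] for m in measures})
        facts.append(fact_row)
    return list(dimension.values()), facts


# Made-up flat output, as if produced by the grouping step upstream.
flat = [
    {"tower": "T1", "region": "north", "hour": 9, "latency_ms": 20},
    {"tower": "T2", "region": "south", "hour": 9, "latency_ms": 35},
    {"tower": "T1", "region": "north", "hour": 10, "latency_ms": 22},
]
dim_tower, fact_latency = to_star(flat, "tower", ["region"], ["hour", "latency_ms"])
```

The descriptive columns land once in the dimension table, and the fact table carries only keys and numbers. Whether you need that split at all depends on the reporting tool sitting on top.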


Until next time.


SAB said...

Good one , Simple but effective
