I was going to start off this post with another old adage, but I could not find one that fit.
When we talk about the IBM Netezza appliance, we talk about a system that was expertly integrated so that you do not need an army of DBAs to get it running and keep it going. The unique hardware acceleration in the Field Programmable Gate Arrays (FPGAs) lets the system quickly return the results your users want, without complex tuning such as indexes, aggregates, or a partitioning strategy.
But there are times when warehousing and analytic workloads can benefit from exactly these tuning mechanisms, and that is when the workload becomes more operational in nature. What do I mean by operational? Let me explain. Think of a warehouse that keeps a record of every customer interaction, which you want to mine to determine an optimal service strategy, decide how to react to different types of complaints, and/or proactively make offers that keep customers happy. Since you have all of the information on each and every customer interaction stored in the warehouse, would it not make sense to use that information to get a complete view of a customer if and when they call your company? If only a few customer service people work the phones at your company, it probably will not matter how the database retrieves information about the caller. But if you have hundreds, or thousands, of call center staff all looking up different customers, the database will be handling a mix of warehousing and analytic work alongside these more transactional, point lookup queries. This is what we call operational data warehousing.
To optimize the operational workload, and these point lookups in particular, a warehouse system can and should have indexes and/or table partitioning. Although these will definitely increase the maintenance requirements of the system, they also help optimize the operational analytic workload. It is on these types of workloads that IBM’s InfoSphere Warehouse 10 (and the Smart Analytic Systems) really shines. It can handle tens of thousands of concurrent users running a mix of operational analytic queries, and it lets you specify and limit the resources that users and/or groups can consume, so that they do not impact everyone else accessing the system.
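To make the lookup trade-off concrete, here is a minimal Python sketch comparing a point lookup done by full scan with one done through a secondary index. It is only an illustration of the general idea, not how DB2 implements indexes; the record layout and all function names are hypothetical. Note that every insert must also update the index, which is the extra maintenance mentioned above.

```python
from collections import defaultdict

# Hypothetical interaction rows: (customer_id, channel, note)

def full_scan_lookup(rows, customer_id):
    """Point lookup without an index: every row is examined."""
    matches = [r for r in rows if r[0] == customer_id]
    return matches, len(rows)  # (result, rows examined)

def build_index(rows):
    """Secondary index: customer_id -> list of row positions."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[0]].append(pos)
    return index

def indexed_lookup(rows, index, customer_id):
    """Point lookup through the index: only matching rows are touched."""
    positions = index.get(customer_id, [])
    return [rows[p] for p in positions], len(positions)

def insert_row(rows, index, row):
    """Every insert pays an extra cost: the index must be maintained too."""
    rows.append(row)
    index[row[0]].append(len(rows) - 1)
```

With 1,000 customers and 5 interactions each, the full scan examines all 5,000 rows to answer one caller's lookup, while the indexed lookup touches only that caller's 5 rows. Multiply that by thousands of concurrent call center lookups and the value of the index becomes clear.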
And for those large queries or analytic calls, DB2 is a true MPP (massively parallel processing) database system. It breaks a query into pieces and runs all of those pieces in parallel, each against its own subset of the data. DB2 was built from the ground up to use an MPP architecture, rather than having an MPP-like storage subsystem engineered separately and attached to a shared-disk platform to eliminate storage bottlenecks.
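As a rough illustration of that divide-and-merge idea, here is a minimal Python sketch, not DB2's actual executor: rows are hash-partitioned on a distribution key, each partition computes a partial aggregate over only its own subset, and a coordinator merges the partial results. The row layout and function names are hypothetical, and threads stand in for separate database partitions; in CPython they show the shape of the computation rather than true CPU parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows: (customer_id, channel, note)

def hash_partition(rows, n_partitions, key=lambda r: r[0]):
    """Distribute rows across partitions by hashing the distribution key."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(key(row)) % n_partitions].append(row)
    return parts

def local_aggregate(part):
    """Each partition counts interactions per channel over its own rows only."""
    return Counter(row[1] for row in part)

def parallel_count_by_channel(rows, n_partitions=4):
    """Run the local aggregates concurrently, then merge them (coordinator step)."""
    parts = hash_partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(local_aggregate, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each partition only ever reads its own rows, adding partitions (and the hardware behind them) spreads the scan work out, and the coordinator's merge step stays small since it only combines a handful of partial counts.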
I think this last point, as well as my previous blog posts, truly demonstrates why an expertly integrated, purpose-built solution is the right choice, while a system that has extra engineering bolted on to make it faster for a variety of workloads is just not optimal.
Remember to join the live PureSystems broadcast, Expert IT 2012: Accelerate Big Data and Cloud with Expert Integrated Systems, on October 9th at 2pm EDT. Customers and prospects will learn firsthand how to overcome the toughest challenges in IT. Built-for-the-cloud IBM PureSystems solutions make capturing value from their data faster, simpler, and more cost-effective. Only IBM can deliver the built-in expertise and integration-by-design systems that simplify the entire IT experience, from data analysis to cloud computing.
Shawn: you are right, but customers, partners, and others need to harness the power of ALL of their data. The PureData System for Analytics provides the power to analyze ALL of your data, without having to take a sample to get the answer in time.
This blog post I wrote explains why sampling is a bad thing for analytics:
http://dsnowondb2.blogspot.com/2012/07/indexes-do…