I have been thinking about IBM’s PureSystems for the past few months, and a couple of adages come to mind that describe the systems that came before them:
- GOOD at everything but GREAT at nothing
- Jack of all trades, master of none
I think that these adages really sum up the problem that IBM’s PureSystems are meant to solve. Do you want to buy and then build a system that “can do the job”, or do you want to buy a complete system that has been optimized to do the job at hand? Some might say that they can build the same system for all of their applications, and they will all run, and they are right. But they will not run as fast, or with as little hand-holding, as they would on a system designed for the particular application or purpose.
I know that not everyone golfs, but with the interest in Tiger Woods and others over the past few years, I think most of us are familiar enough with the game for this analogy to work. Golfers do not compete with just one club. They carry a bag full of clubs so that they can choose the right club for the conditions. In golf the conditions are the height of the grass (or the sand if you are unlucky enough to be in the bunker), the distance to the hole, the height of ground between the ball and the hole, and the placement of the hole on the green.
Unless the hole is a par 3, and 150 yards or shorter, most golfers will hit a driver/wood off the tee. Then, if they stay on the fairway they can use an iron based on how far they are from the hole. But if they stray from the fairway into a sand trap they will need a sand wedge to lift the ball out of the trap. If there is a tree or hill between the ball and the hole, they would choose a pitching wedge to hit the ball high enough to clear the obstacle. And then once they are on the green, every golfer has their favorite putter.
The same is true for the systems in your data center. And I think it is even more true for the systems you use for your data processing needs.
Data processing today is not the same as it was five or ten years ago. In the past, data processing was all about handling the transaction as quickly as you can. Today enterprises must also harvest their years’ worth of transactional data to gain insight that they can use to help increase sales and drive more profit.
These two workloads are very different. Transaction processing is all about reading and writing one row at a time, as fast as possible, using as few system resources as possible, since there are many other people (or registers) also processing transactions at the same time. Data harvesting, or analytics, is about accessing all of the data to find insight, using all of the resources available to get the job done as fast as possible. Just as you would not get the best results by putting with a sand wedge, you will not get the best results running complex analytics on a system designed and built to handle online transactions.
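To make the difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in database. The `sales` table, its columns, and the helper names are hypothetical and chosen only for illustration; SQLite is not the kind of system discussed in this post, but its query planner shows the same two access patterns: a keyed lookup that touches one row, and an aggregate that has to read every row.

```python
import sqlite3


def build_demo_db(num_rows=10_000):
    """Create an in-memory database with a hypothetical 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (id, store_id, amount) VALUES (?, ?, ?)",
        ((i, i % 50, 1.0) for i in range(1, num_rows + 1)),
    )
    conn.commit()
    return conn


def access_path(conn, sql, params=()):
    """Return the first word of SQLite's query plan for a statement.

    'SEARCH' means a keyed lookup; 'SCAN' means every row is read.
    """
    row = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchone()
    return row[3].split()[0]


# Transaction-style access: fetch exactly one row by its key.
OLTP_QUERY = "SELECT amount FROM sales WHERE id = ?"
# Analytics-style access: aggregate over the whole table.
ANALYTIC_QUERY = "SELECT SUM(amount) FROM sales"

conn = build_demo_db()
oltp_path = access_path(conn, OLTP_QUERY, (4242,))
analytic_path = access_path(conn, ANALYTIC_QUERY)
total = conn.execute(ANALYTIC_QUERY).fetchone()[0]

print("point lookup plan:", oltp_path)
print("aggregate plan:   ", analytic_path)
print("total sales:      ", total)
```

The point lookup wants low latency and a small footprint per request, while the aggregate wants as much bandwidth and CPU as it can get, which is why tuning one system for both tends to shortchange one of them.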
Another database system vendor argues that their system can do both of these workloads well, and at the same time. While I would not argue that they can “run” these workloads, I would argue that they will not run optimally.
This other vendor engineered their system by building an extra storage tier that has 4 main enhancements. Of these 4 enhancements, only one (the incorporation of flash on the storage servers) could actually help transactional performance. The other 3 can help analytical performance if conditions are right.
In optimizing transactional performance, the main things you need to do are:
- Have the data already in the buffer cache/buffer pool (i.e. memory)
- Access only the row you need
- Log the changes as quickly as possible
In a transactional system, the use of flash storage for the tables and indexes is not effective. A read from flash is still an I/O call, and when database software makes an I/O call it releases the CPU to other transactions. In my opinion, a Pure System for transactions would use the flash for the database logs, to get them written as quickly as possible. This would not require as much flash storage, since the logs would be archived to disk or tape once they are full. So, with that “extra” money I would increase the memory (RAM) on the database server(s) so that they can keep more data and index pages in memory, do a lot less I/O, and rarely have to yield the CPU.
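The trade-off between memory and I/O can be sketched with a toy buffer pool simulation. This is only an illustration under stated assumptions: the least-recently-used (LRU) eviction policy, the page counts, and the uniformly random request pattern are all hypothetical, and real database buffer managers are considerably more sophisticated.

```python
from collections import OrderedDict
import random


def count_disk_reads(page_requests, pool_pages):
    """Count how many requests miss a buffer pool that holds `pool_pages` pages.

    Assumes a simple LRU eviction policy (an illustrative choice only).
    """
    pool = OrderedDict()  # page id -> True, ordered from least to most recently used
    disk_reads = 0
    for page in page_requests:
        if page in pool:
            # Hit: the page is already in memory, so no I/O call is needed.
            pool.move_to_end(page)
        else:
            # Miss: the page has to be fetched with an I/O call.
            disk_reads += 1
            pool[page] = True
            if len(pool) > pool_pages:
                pool.popitem(last=False)  # evict the least recently used page
    return disk_reads


rng = random.Random(42)  # fixed seed so the run is repeatable
requests = [rng.randrange(1_000) for _ in range(50_000)]  # hypothetical workload

small_pool_reads = count_disk_reads(requests, pool_pages=100)
large_pool_reads = count_disk_reads(requests, pool_pages=1_000)

print("reads with 100-page pool: ", small_pool_reads)
print("reads with 1000-page pool:", large_pool_reads)
```

Once the pool is large enough to hold every page the workload touches, the only reads left are the first touch of each page, which is the effect of spending the money on RAM instead of flash for tables and indexes.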
With a system designed as I described, you have a system that has been fully optimized for transactional processing, rather than a system built to be good at everything that is merely trying to handle transactional processing.
I’ll be posting 2 more articles this week that will dig more into the Analytics/Op Analytics systems.
Be sure to join the live broadcast PureSystems event beginning October 9th at 2 pm EDT, Expert IT 2012: Accelerate Big Data and Cloud with Expert Integrated Systems. Customers and prospects will learn firsthand how to overcome the toughest challenges in IT. Built-for-the-cloud IBM PureSystems solutions make capturing value from their data faster, simpler and more cost-effective. Only IBM can deliver the built-in expertise and integration-by-design systems that simplify the entire IT experience, from data analysis to cloud computing.