I have been thinking about IBM’s Pure Systems for the past few months, and a couple of adages come to mind that describe the systems that came before them:
- GOOD at everything but GREAT at nothing
- Jack of all trades, master of none
I think that these adages really sum up the problem that IBM’s Pure Systems are meant to solve. Do you want to buy and then build a system that “can do the job”, or do you want to buy a complete system that has been optimized to do the job at hand? Some might say that they can build the same system for all of their applications, and they will all run, and they are right. But they will not run as fast, or with as little hand-holding, as they would on a system designed for the particular application or purpose.
I know that not everyone golfs, but with the interest in Tiger Woods and others over the past few years, I think most of us are familiar enough with the game for this analogy to work. Golfers do not compete with just one club. They carry a bag full of clubs so that they can choose the right club for the conditions. In golf the conditions are the height of the grass (or the sand if you are unlucky enough to be in the bunker), the distance to the hole, the height of ground between the ball and the hole, and the placement of the hole on the green.
Unless the hole is a par 3, and 150 yards or shorter, most golfers will hit a driver/wood off the tee. Then, if they stay on the fairway they can use an iron based on how far they are from the hole. But if they stray from the fairway into a sand trap they will need a sand wedge to lift the ball out of the trap. If there is a tree or hill between the ball and the hole, they would choose a pitching wedge to hit the ball high enough to clear the obstacle. And then once they are on the green, every golfer has their favorite putter.
The same is true for the systems in your data center. And I think it is even more true for the systems you use for your data processing needs.
Data processing today is not the same as it was five or ten years ago. In the past, data processing was all about handling the transaction as quickly as you can. Today enterprises must also harvest their years’ worth of transactional data to gain insight that they can use to help increase sales and drive more profit.
These two workloads are very different. Transaction processing is all about reading and writing one row at a time, as fast as possible, using as few system resources as possible, since there are many other people (or registers) also processing transactions at the same time. Data harvesting, or analytics, is about accessing all of the data to find insight, using all of the resources available to get the job done as fast as possible. Just like you would not get the best results by putting with a sand wedge, you will not get the best results running complex analytics on a system designed and built to handle online transactions.
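To make the contrast between the two workloads concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in; the `sales` table, its columns, and the function names are hypothetical and are not taken from any IBM or Oracle product. The first function touches a single row, the way a register would; the second reads every row to produce a summary.

```python
import sqlite3


def build_sales_db(n_rows=1000):
    """Create a small in-memory table of sales (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, store INTEGER, amount REAL)"
    )
    rows = [(i, i % 10, float(i % 7 + 1)) for i in range(1, n_rows + 1)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def record_sale(conn, sale_id, store, amount):
    """Transaction-style work: write one row, then read it back by key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?)", (sale_id, store, amount)
        )
    return conn.execute(
        "SELECT id, store, amount FROM sales WHERE id = ?", (sale_id,)
    ).fetchone()


def revenue_by_store(conn):
    """Analytics-style work: scan every row to build a summary."""
    return conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()


if __name__ == "__main__":
    db = build_sales_db()
    print(record_sale(db, 1001, 3, 9.5))
    print(revenue_by_store(db))
```

The point lookup does the same small amount of work no matter how large the table grows, while the summary query gets more expensive with every row added. That difference is why the two workloads reward different system designs.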
Another database system vendor argues that their system can do both of these workloads well, and at the same time. While I would not argue that they can “run” these workloads, I would argue that they will not run optimally.
This other vendor engineered their system by building an extra storage tier that has 4 main enhancements. Of these 4 enhancements, only one (the incorporation of flash on the storage servers) could actually help transactional performance. The other 3 can help analytical performance if conditions are right.
In optimizing transactional performance, the main things you need to do are:
- Have the data already in the buffer cache/buffer pool (i.e. memory)
- Access only the row you need
- Log the changes as quickly as possible
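The first point in this list, keeping data in the buffer pool, can be illustrated with a toy cache. This is only a sketch under a loud assumption: it uses a simple least-recently-used (LRU) eviction policy, which is not necessarily what DB2 or any other database uses. The `BufferPool` class and the `read_page` callback are hypothetical names made up for the example.

```python
from collections import OrderedDict


class BufferPool:
    """Toy page cache with LRU eviction (an assumed policy, for illustration)."""

    def __init__(self, capacity, read_page):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.read_page = read_page  # called only on a miss (the "I/O")
        self.pages = OrderedDict()  # page_id -> page data, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        data = self.read_page(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    pool = BufferPool(capacity=2, read_page=lambda p: f"page-{p}")
    for page in (1, 2, 1, 3, 2):
        pool.get(page)
    print(pool.hits, pool.misses, pool.hit_ratio())
```

Every miss stands for a trip to storage, so a larger `capacity` (more RAM) turns more of those trips into memory hits for the same access pattern.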
In a transactional system, the use of flash storage for the tables and indexes is not effective. The flash storage is still an I/O call, and when database software makes an I/O call it releases the CPU to other transactions. In my opinion, a Pure System for transactions would use the flash for the database logs, to get them written as quickly as possible. But this would not require as much flash storage, since the logs would be archived to a disk or tape once they are full. So, with that “extra” money I would increase the memory (RAM) on the database server(s) so that they can keep more data and index pages in memory, and do a lot less I/O, and not have to ever yield the CPU.
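The trade-off between buying flash for the tables and buying more RAM can be framed with a simple weighted average: expected page access time = hit ratio × memory latency + (1 − hit ratio) × storage latency. The sketch below is a back-of-the-envelope model, not a benchmark. The latency figures and the 90% hit ratio are illustrative assumptions chosen only to show the arithmetic, and the function names are hypothetical.

```python
def avg_page_access_us(hit_ratio, memory_us, storage_us):
    """Expected time to fetch one page, in microseconds."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * memory_us + (1.0 - hit_ratio) * storage_us


def break_even_hit_ratio(target_us, memory_us, storage_us):
    """Hit ratio needed on slower storage to average target_us per page."""
    if storage_us <= memory_us:
        raise ValueError("storage must be slower than memory")
    return (storage_us - target_us) / (storage_us - memory_us)


# Illustrative latencies only: assumptions, not measurements.
MEMORY_US = 0.1
FLASH_US = 100.0
DISK_US = 5000.0

if __name__ == "__main__":
    # Option A: tables on flash, smaller buffer pool (assumed 90% hit ratio).
    flash_tables = avg_page_access_us(0.90, MEMORY_US, FLASH_US)
    # Option B: tables on disk; how good must the hit ratio be to match A?
    needed = break_even_hit_ratio(flash_tables, MEMORY_US, DISK_US)
    print(f"flash tables: {flash_tables:.2f} us per page")
    print(f"disk tables need a hit ratio of {needed:.4f} to match")
```

Under these assumed numbers, the extra memory pays off only if it pushes the buffer pool hit ratio very close to 100%. Whether that is achievable depends on the size of the working set, which is something to measure on your own workload.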
With a system designed as I described, you have a system that has been fully optimized for transactional processing, not one that was built to be good at everything and is merely trying to handle transactional processing.
I’ll be posting 2 more articles this week that will dig more into the Analytics/Op Analytics systems.
Be sure to join the live broadcast PureSystems event beginning October 9th at 2pm EDT: Expert IT 2012: Accelerate Big Data and Cloud with Expert Integrated Systems. Customers and prospects will learn firsthand how to overcome the toughest challenges in IT. Built-for-the-cloud IBM PureSystems solutions make capturing value from their data faster, simpler and more cost-effective. Only IBM can deliver the built-in expertise, integration-by-design systems that simplify the entire IT experience, from data analysis to cloud computing.
I worked with DB2 LUW 9.7FP5 (Linux in this case) using another vendor's flash storage to develop some best practices for DB2 in an OLTP environment, and I must say… it screamed… SSD drives are fast… but it is still I/O and has storage overhead, et al… You still have a "drive" plugged into a SAN or whatever storage you are using… Sure… the storage with this other vendor still requires I/O… but it is not SAN technology… it is pure memory storage.
Plugging an SSD drive into storage is not optimal… a true flash memory array can bring huge performance boosts. Sure it costs more… it's the wave of the future… Guess where some of my stocks are coming from…
With appliance SSDs running at or below the cost/GB of enterprise storage, that value option is changing. Love to talk offline with you about it!
I disagree with your comment that FLASH/solid state is not good for transactional processing. With most new SSD appliances reads and writes are equally performant, indeed, in many cases the writes can be faster than the reads. SSDs benefit any system that is IO bound. I look forward to working with you to help you understand how much SSDs can benefit all databases and IO bound applications.
Hi Mike – I guess I could have said cost effective, or optimal. I have worked with some large OLTP systems with DB2, and the one place we saw the biggest bang for the buck with the use of SSD/Flash was for the database logs, not for the database tables themselves. And, if we compare TPC full disclosure reports on DB2 and Oracle, we see that DB2 uses significantly less log space than Oracle. In my experience, rather than adding terabytes of Flash/SSD to a system, it is more cost effective to add a few (hundred) GB for the database logs, and use the rest of the money that would have been spent on Flash/SSD on more memory for the database buffer pool / buffer cache and more CPUs.
Sorry, my reply got cut off somehow. I would change your quote slightly at the end. I would say:
There is no single solution which is optimal for all requirements. Only a set of solutions which are expertly integrated to be optimal for their designated workload.
I don't know much about IBM Pure Systems, or golf for that matter, but I agree with the golfing analogy of Exadata. It's perhaps slightly unfair to call it the "Jack of all trades – master of none" because it really is very good at the particular data warehousing workloads for which it was designed. The golfing analogy works for me because you do indeed need different systems for different workloads, just as you need different golf clubs for different shots (so my golfing friends tell me to my utter apathy).
My biggest concern with Exadata is Oracle's decision to call it "The Strategic platform for all database workloads". So that means it doesn't matter what you are doing, or what your requirements are… Oracle's answer is to position Exadata. As an ex-Oracle employee who used to implement Exadata, I just don't see how that can always be the best answer for the customer.
If you'll allow, I wrote more about it on my blog here: http://flashdba.com/2012/06/13/the-strategic-plat…