Delphi Parallel Programming Library & Memory Managers
In September 2014 Delphi XE7 was launched. I was excited to try the new Parallel Programming Library (PPL). I thought I’d be able to quickly and easily write code which takes advantage of the processors in my Intel i7 machine. To my dismay the performance improvements were virtually non-existent if there was any memory allocation carried out in parallel. I even brought up the subject on Google+ (see here: https://goo.gl/hWc6Z6). The bottleneck seemed to be with FastMM4. When it was first bundled with Delphi 2006, FastMM4 was a breakthrough in single-core memory management. But FastMM4 had not been designed for multi-threaded performance. In my tests the PPL versions of my test routines were slower than the single-core versions. I concluded it was wiser to stick with the simpler single-core routines, and hope for a solution.
A week ago I stumbled across this thread on Google+, “Why is the FastMM4 development stalled?”. I’d never heard about NexusDB’s memory manager. Apparently it scales well in a multi-threaded environment. Eivind Bakkestuen from NexusDB kindly offered to provide a test copy. So I thought I’d give it a try. I went back to my main application and fired up the parallel map rendering routines of my Sales Territory Mapping application. To my amazement I got an instant speedup of 68% when using the NexusDB memory manager. I was ecstatic! Out of pure interest I then tried the parallel rendering using FastMM4. To my surprise FastMM4 performed just as well as NexusDB. What had changed since 2014? FastMM4 hadn’t been updated since May 2013. Was it something new with the PPL included with Delphi 10 Seattle? Or was it a result of Windows 10? I also noticed the memory manager included with Delphi didn’t perform quite as well as explicitly including FastMM4 as the first unit in the uses clause of the “dpr” file.
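For anyone who hasn’t done this before, explicitly including FastMM4 means listing it as the very first unit in the uses clause of the project source, so it replaces the built-in memory manager before anything is allocated. Here is a minimal sketch. It assumes FastMM4.pas has been downloaded and is on the project’s search path, and the program name is made up for illustration:

```delphi
program MemoryManagerFirst; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  FastMM4,         // must be the first unit so it is installed before any allocation
  System.SysUtils; // every other unit follows the memory manager

begin
  // From here on, all GetMem/FreeMem calls (and therefore every object
  // creation) go through FastMM4 rather than the built-in manager.
  Writeln('Running with FastMM4 as the memory manager');
end.
```

A different replacement memory manager would presumably be wired in the same way, with its own unit taking the first position instead.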
So I set out to create a test and investigate further.
I created a test project (you can download it here). It’s nothing special. The small app creates and destroys lists of small and simple objects. I quickly established there was no speed difference between XE7 and Delphi 10 Seattle applications. I then created six Delphi 10 Seattle versions of the application: a 32-bit and a 64-bit version for each of the native memory manager, FastMM4 and NexusDB.
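To give an idea of what such a test looks like, here is a rough sketch of the same kind of workload. This is not the code from the download. The class, the constants and the routine names are all invented for illustration, and the workload sizes are arbitrary. It times the same allocate-and-free routine first in a plain loop and then through the PPL’s TParallel.For:

```delphi
program SpeedTestSketch; // hypothetical name, not the downloadable project

{$APPTYPE CONSOLE}

uses
  // A replacement memory manager unit (e.g. FastMM4) would go first here.
  System.SysUtils,
  System.Diagnostics,
  System.Threading,
  System.Generics.Collections;

type
  // Hypothetical small object; the real test's classes may differ.
  TSmallItem = class
  public
    Id: Integer;
    Value: Double;
  end;

const
  ListCount = 2000;    // arbitrary number of lists to build
  ItemsPerList = 1000; // arbitrary number of objects per list

// Creates a list of small objects and frees it again, so nearly all of the
// time is spent inside the memory manager.
procedure BuildAndFreeList;
var
  List: TObjectList<TSmallItem>;
  Item: TSmallItem;
  I: Integer;
begin
  List := TObjectList<TSmallItem>.Create(True); // True: the list owns and frees its items
  try
    for I := 1 to ItemsPerList do
    begin
      Item := TSmallItem.Create;
      Item.Id := I;
      List.Add(Item);
    end;
  finally
    List.Free;
  end;
end;

procedure RunTests;
var
  Watch: TStopwatch;
  I: Integer;
begin
  // Single-core: every list is built on the main thread.
  Watch := TStopwatch.StartNew;
  for I := 1 to ListCount do
    BuildAndFreeList;
  Writeln('Single-core: ', Watch.ElapsedMilliseconds, ' ms');

  // Multi-core: the PPL spreads the iterations over its thread pool, so
  // many threads hit the memory manager at the same time.
  Watch := TStopwatch.StartNew;
  TParallel.For(1, ListCount,
    procedure(Index: Integer)
    begin
      BuildAndFreeList;
    end);
  Writeln('Multi-core:  ', Watch.ElapsedMilliseconds, ' ms');
end;

begin
  RunTests;
end.
```

Because the work inside each iteration is almost entirely allocation and deallocation, any difference between the two timings should mostly reflect how well the memory manager copes with contention between threads.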
Each executable can run in single-core or multi-core mode. You can download the executables here (SpeedTest.zip). I then ran them on four different laptops:
- Dell XPS 15: Windows 10 2.6 GHz i7-6700HQ
- HP from 2009: Windows 7 2.2 GHz i7-2670QM
- HP from 2014: Windows 8.1 2.4 GHz i7-3630QM
- Surface Book: Windows 10 2.4 GHz i5-6300U
Here are the 32-bit results. Each value is the time in milliseconds required to execute (smaller is better):
And here are the 64-bit results. Each value is the time in milliseconds required to execute (smaller is better):
And here are the key points:
- NexusDB’s memory manager was impressive in every multi-threaded test (in some cases double the speed of FastMM4)
- FastMM4 did much better than I expected. There was a measurable speed improvement in all multi-threaded tests
- The native memory manager (which I thought was FastMM4) was measurably slower than FastMM4 in most multi-threaded tests (e.g. 64-bit multi-threaded). The difference was negligible in single-threaded tests.
- I was amazed at how well the 2009 laptop performed – Moore’s Law is clearly dead for laptops.
My conclusion is that NexusDB’s memory manager is the one to use if multi-threaded performance is an issue.
All of these tests were carried out on laptops. I’d love to see the speeds when run on an eight-core machine.