Delphi Parallel Programming Library & Memory Managers

In September 2014 Delphi XE7 was launched, and I was excited to try the new Parallel Programming Library (PPL). I thought I’d be able to quickly and easily write code that took advantage of all the cores in my Intel i7 machine. To my dismay, the performance improvements were virtually non-existent whenever memory was allocated in parallel. I even brought up the subject on Google+ (see here: https://goo.gl/hWc6Z6). The bottleneck seemed to be FastMM4. When it was first bundled with Delphi 2006, FastMM4 was a breakthrough in single-core memory management, but it had not been designed for multi-threaded performance. In my tests the PPL versions of my routines were slower than the single-core versions. I concluded it was wiser to stick with the simpler single-core routines and hope for a solution.

A week ago I stumbled across this thread on Google+, “Why is the FastMM4 development stalled?”. I’d never heard of NexusDB’s memory manager. Apparently it scales well in a multi-threaded environment. Eivind Bakkestuen from NexusDB kindly offered to provide a test copy, so I thought I’d give it a try. I went back to my main application, a Sales Territory Mapping application, and fired up its parallel map rendering routines. To my amazement I got an instant speedup of 68% when using the NexusDB memory manager. I was ecstatic! Out of pure interest I then tried the parallel rendering using FastMM4. To my surprise FastMM4 performed just as well as NexusDB. What had changed since 2014? FastMM4 hadn’t been updated since May 2013. Was it something new in the PPL included with Delphi 10 Seattle? Or was it a result of Windows 10? I also noticed that the memory manager included with Delphi didn’t perform quite as well as FastMM4 explicitly included at the start of the uses clause in the project’s “dpr” file.
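For readers who have not swapped memory managers before, here is a minimal sketch of what explicitly including FastMM4 looks like. It assumes FastMM4.pas is on the project’s search path; the program name is made up for illustration. The replacement unit has to be the first entry in the uses clause of the project source file, so that it installs itself before any other unit allocates memory.

```pascal
program MemoryManagerFirst; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  FastMM4,          // must be listed first so it is initialised before any other unit
  System.SysUtils;

begin
  // IsMemoryManagerSet (System unit) reports whether the default
  // memory manager has been replaced via SetMemoryManager.
  Writeln('Memory manager replaced: ', IsMemoryManagerSet);
end.
```

Other replacement memory managers are normally installed the same way, with their own unit in place of FastMM4.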

So I set out to create a test and investigate further.

I created a test project (you can download it here). It’s nothing special: the small app creates and destroys lists of small, simple objects. I quickly established there was no speed difference between applications built with XE7 and Delphi 10 Seattle. I then created six Delphi 10 Seattle versions of the application, a 32-bit and a 64-bit build for each of the native memory manager, FastMM4 and NexusDB:

  • Speedtest-Native-32.exe
  • Speedtest-FastMM4-32.exe
  • Speedtest-NexusDB-32.exe
  • Speedtest-Native-64.exe
  • Speedtest-FastMM4-64.exe
  • Speedtest-NexusDB-64.exe

Each executable can run in single-core or multi-core mode. You can download the executables here (SpeedTest.zip). I then ran them on four different laptops:

  • Dell XPS 15: Windows 10, 2.6 GHz i7-6700HQ
  • HP from 2009: Windows 7, 2.2 GHz i7-2670QM
  • HP from 2014: Windows 8.1, 2.4 GHz i7-3630QM
  • Surface Book: Windows 10, 2.4 GHz i5-6300U
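To give a feel for the kind of test described above, here is a rough sketch of a benchmark that creates and destroys lists of small objects, once in a plain loop and once through the PPL’s TParallel.For. It is not the code from the downloadable project: the TSmallItem class, the routine names and the workload sizes are all assumptions chosen for illustration.

```pascal
program SpeedTestSketch; // hypothetical; not the original SpeedTest project

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Diagnostics,
  System.Generics.Collections,
  System.Threading;

type
  // Assumed shape of a "small and simple" object.
  TSmallItem = class
  public
    Id: Integer;
    Name: string;
  end;

const
  ListCount = 200;        // assumed number of lists per run
  ItemsPerList = 10000;   // assumed number of objects per list

// Allocates one list of small objects, then frees the list and its items.
procedure BuildAndFreeList(Seed: Integer);
var
  List: TObjectList<TSmallItem>;
  Item: TSmallItem;
  I: Integer;
begin
  List := TObjectList<TSmallItem>.Create(True); // the list owns its items
  try
    for I := 0 to ItemsPerList - 1 do
    begin
      Item := TSmallItem.Create;
      Item.Id := Seed + I;
      Item.Name := IntToStr(Item.Id); // forces an extra string allocation
      List.Add(Item);
    end;
  finally
    List.Free; // also frees every owned item
  end;
end;

// Runs the whole workload on the calling thread.
function RunSingleCore: Int64;
var
  SW: TStopwatch;
  I: Integer;
begin
  SW := TStopwatch.StartNew;
  for I := 0 to ListCount - 1 do
    BuildAndFreeList(I);
  Result := SW.ElapsedMilliseconds;
end;

// Runs the same workload through the PPL thread pool.
function RunMultiCore: Int64;
var
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;
  TParallel.For(0, ListCount - 1,
    procedure(I: Integer)
    begin
      BuildAndFreeList(I);
    end);
  Result := SW.ElapsedMilliseconds;
end;

begin
  Writeln('Single core: ', RunSingleCore, ' ms');
  Writeln('Multi core:  ', RunMultiCore, ' ms');
end.
```

Because every iteration allocates and frees thousands of small blocks, the multi-core run mostly measures how well the memory manager copes with allocation requests from several threads at once, which is the point of the comparison.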

Here are the 32-bit results – each value is the time in milliseconds required to execute (smaller is better):

[Image: Chart32, chart of the 32-bit results]

[Image: SpeedTest32, table of the 32-bit results]

And here are the 64-bit results – each value is the time in milliseconds required to execute (smaller is better):

[Image: Chart64, chart of the 64-bit results]

[Image: SpeedTest64, table of the 64-bit results]

And here are the key points:

  1. NexusDB’s memory manager was impressive in every multi-threaded test (in some cases double the speed of FastMM4).
  2. FastMM4 did much better than I expected. There was a measurable speed improvement in all multi-threaded tests.
  3. The native memory manager (which I thought was FastMM4) was measurably slower than FastMM4 in most multi-threaded tests (e.g. 64-bit multi-threaded). The difference was negligible in single-threaded tests.
  4. I was amazed at how well the 2009 laptop performed – Moore’s Law is clearly dead for laptops.

My conclusion is that NexusDB’s memory manager is the one to use if multi-threaded performance is an issue.

All of these tests were carried out on laptops. I’d love to see the speeds when run on an eight core machine.

10 replies
  1. Alexandre Machado says:

    My results, single pass, core i7-4770 CPU @ 3.40 GHz, (4 physical cores + 4 hyperthreading), Windows 7 Pro:

    FastMM4 64 – Multicore: 10899
    Native 64 – Multicore: 13604
    Nexus DB 64 – Multicore: 10397

    FastMM4 32 – Multicore: 6399
    Native 32 – Multicore: 9138
    Nexus DB 32 – Multicore: 5539

    FastMM4 64 – single core: 35999
    Native 64 – single core: 36099
    Nexus DB 64 – single core: 37237

    FastMM4 32 – single core: 21088
    Native 32 – single core: 21298
    Nexus DB 32 – single core: 21843

  2. IL says:

    Core2Duo E8500 @ 3.16 GHz, (2 cores), Windows 7 x86, best from 2 tries:

    FastMM4 32 – Multicore: 29769
    Native 32 – Multicore: 32581
    Nexus DB 32 – Multicore: 29196

    FastMM4 32 – single core: 56105
    Native 32 – single core: 55363
    Nexus DB 32 – single core: 57110

  3. Michal Abramczyk says:

    Delphi 10 Seattle on Windows7-64 i7-2600 4C-HT 3,5GHz

    Native
    SC-32 21,5
    MC-32 9,3
    SC-64 39,6
    MC-64 18,1

    ScaleMM2
    SC-32 9,4
    MC-32 2,6
    SC-64 40,1
    MC-64 10,7
