Wednesday, November 9, 2011

Puzzling heap performance in CRT under Visual Studio

So I'm running this app, which does a reasonable amount of allocation using the standard CRT functions. It's compiled for release, but I was running it under Visual Studio. The memory allocator performance was absolutely terrible, especially in free: 89 seconds to free just under a million structures (the app uses just two sizes: around 32,000 2056-byte structures and about 900,000 36-byte structures).

Generating: 1000000
Adding: 951 ms
Port: 10000, total ip entries: 994937, total memory used: 3673124
    Entries at level 1: 128 (blanket: 0)
    Entries at level 2: 32768 (blanket: 0)
    Entries at level 3: 885384 (blanket: 3975)
Testing the ip ranges... 0
Testing ip ranges: 27378 ms
Total ips allowed across all ports: 2012537
Removing: 71355 ms
Port: 10000, total ip entries: 0, total memory used: 0
    Entries at level 1: 0 (blanket: 0)
    Entries at level 2: 0 (blanket: 0)
    Entries at level 3: 0 (blanket: 0)
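
A rough way to check whether a slowdown like this comes from the heap rather than from the app itself is to time free in isolation. The sketch below is only an approximation: the function name timeFreeMs is made up for illustration, and it frees blocks in allocation order, which is not necessarily the order the real app uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from the app): allocates `count` blocks of `size`
// bytes with malloc, frees them all, and returns the time spent in the
// free loop, in milliseconds.
double timeFreeMs(std::size_t count, std::size_t size) {
    assert(size > 0);
    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(size);
        if (p == nullptr) break;  // out of memory: time what we got
        blocks.push_back(p);
    }

    const auto start = std::chrono::steady_clock::now();
    for (void* p : blocks) std::free(p);
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeFreeMs(32000, 2056) and timeFreeMs(900000, 36) and printing the results, once under the debugger and once from the command line, should show whether the two environments treat the heap differently.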

Fine, I replaced the allocator with my own, using fixed memory blocks (described here: http://1-800-magic.blogspot.com/2007/11/guerilla-guide-to-native-memory.html). The object removal time promptly dropped to just above 200 ms. Ok. Then, on a hunch, I ran the app - with the standard CRT allocator - from the command line:

Generating: 1000000
Adding: 281 ms
Port: 10000, total ip entries: 994937, total memory used: 3673124
    Entries at level 1: 128 (blanket: 0)
    Entries at level 2: 32768 (blanket: 0)
    Entries at level 3: 885384 (blanket: 3975)
Testing the ip ranges... 0
Testing ip ranges: 27831 ms
Total ips allowed across all ports: 2012537
Removing: 249 ms
Port: 10000, total ip entries: 0, total memory used: 0
    Entries at level 1: 0 (blanket: 0)
    Entries at level 2: 0 (blanket: 0)
    Entries at level 3: 0 (blanket: 0)

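Come on, Visual Studio guys. Release should mean RELEASE - not a debug heap! The culprit is the Windows debug heap, which is switched on for any process launched under a debugger regardless of build configuration; setting the environment variable _NO_DEBUG_HEAP=1 for the debugged process turns it off.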

Interestingly, I ran it under the profiler after this, and the profiler does the right thing: it uses the fast heap. Otherwise the results would be really, really screwed. That's a good thing, provided you happen to use the profiler, or at least run the app outside the environment, before pursuing a problem that does not even exist!
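
For readers who have not seen one, here is a minimal sketch of the fixed-block idea mentioned above. It is not the allocator from the linked post; the class name FixedBlockPool and all of its members are hypothetical, and it assumes single-threaded use. Free blocks are threaded onto a singly linked list, so both allocation and release are a couple of pointer operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool (illustration only, not thread-safe).
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blocksPerChunk)
        : blockSize_(roundUp(blockSize)), blocksPerChunk_(blocksPerChunk) {
        assert(blocksPerChunk_ > 0);
    }
    ~FixedBlockPool() {
        // All memory goes back to the CRT in a few large frees.
        for (void* chunk : chunks_) std::free(chunk);
    }
    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    // Pops a block off the free list, growing the pool if it is empty.
    void* allocate() {
        if (freeList_ == nullptr && !grow()) return nullptr;
        Node* node = freeList_;
        freeList_ = node->next;
        ++live_;
        return node;
    }

    // Pushes a block back onto the free list; nothing is returned to the CRT.
    void release(void* p) {
        if (p == nullptr) return;
        freeList_ = ::new (p) Node{freeList_};
        --live_;
    }

    std::size_t live() const { return live_; }
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    struct Node { Node* next; };

    // Blocks must hold a Node and stay aligned for any ordinary type.
    static std::size_t roundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    // Grabs one large chunk from malloc and carves it into free blocks.
    bool grow() {
        char* chunk = static_cast<char*>(std::malloc(blockSize_ * blocksPerChunk_));
        if (chunk == nullptr) return false;
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocksPerChunk_; ++i) {
            void* slot = chunk + i * blockSize_;
            freeList_ = ::new (slot) Node{freeList_};
        }
        return true;
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    Node* freeList_ = nullptr;
    std::size_t live_ = 0;
    std::vector<void*> chunks_;
};
```

With two sizes in play, as in this app, one pool per size would do: for example FixedBlockPool small(36, 4096) and FixedBlockPool large(2056, 256), where the chunk sizes are arbitrary picks. Rounding each block up to the maximum alignment wastes some bytes on the 36-byte structures; that is a simplicity trade-off in this sketch, not a requirement of the technique.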

1 comment:

Jon said...

It's too slow even for debug. Are there any other options to speed this up without changing the allocator?