Discussion: Hashmap iteration is too costly
v***@pengaru.com
2017-07-14 10:18:43 UTC
The current hashmap iteration incurs at least one function call per
iteration; in my observations with gcc 6.3 and -g -O2, it's two:

internal_hashmap_iterate()
hashmap_iterate_entry()

for every element in the hashmap.
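
To make the cost concrete, here's a self-contained toy (not systemd code,
just the shape of the problem) contrasting one out-of-line call per element
with a loop the compiler can see straight through:

/* Toy illustration: an opaque, out-of-line call per element versus a
 * direct traversal.  Build e.g.: gcc -O2 -o iter_cost iter_cost.c */
#include <stddef.h>
#include <stdio.h>

#define N 1000000

static int values[N];

/* Opaque iterator: the "container" hides its layout behind a function,
 * so every element costs at least one non-inlinable call. */
__attribute__((noinline))
static int iterate_next(size_t *pos, int *out) {
        if (*pos >= N)
                return 0;
        *out = values[(*pos)++];
        return 1;
}

int main(void) {
        long sum1 = 0, sum2 = 0;
        size_t pos = 0;
        int v;

        for (size_t i = 0; i < N; i++)
                values[i] = (int) i;

        /* Pattern A: one function call per element, as with the current
         * hashmap iterators. */
        while (iterate_next(&pos, &v))
                sum1 += v;

        /* Pattern B: direct traversal the compiler can inline, vectorize
         * and prefetch. */
        for (size_t i = 0; i < N; i++)
                sum2 += values[i];

        printf("%ld %ld\n", sum1, sum2);
        return 0;
}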

In profiles of `journalctl -b --no-pager` with multiple (50) log files
open, the hashmap iteration shows up as ~10% of the CPU time in
callgrind output.

Since it appears effort has been made to keep the hashmap structure opaque,
making these iterators zero-cost seems non-trivial.

As an experiment, I added two new hashmap functions:

hashmap_get_array()
ordered_hashmap_get_array()

These return an allocated array snapshot of the hashmap's contents, in the
same order the normal iterators would visit them. Their implementation has
direct access to the hashmap internals and uses inlined versions of the
type-specific iterator functions, avoiding the per-element function-call
overhead of the usual iterators.
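
To sketch the intended usage pattern, illustrative only (this is not the
branch's actual implementation; the real signatures are in the commits
linked below):

/* Sketch of the snapshot-then-iterate pattern: pay one malloc plus one
 * copy of the element pointers, then walk a flat array with no
 * per-element calls. */
#include <stdio.h>
#include <stdlib.h>

struct entry {
        const char *name;
        struct entry *next;   /* stand-in for whatever the container uses */
};

/* Copy the container's element pointers into one freshly allocated array. */
static struct entry **snapshot(struct entry *head, size_t *n) {
        size_t count = 0;
        for (struct entry *e = head; e; e = e->next)
                count++;

        struct entry **arr = malloc(count * sizeof(*arr));
        if (!arr)
                return NULL;

        size_t i = 0;
        for (struct entry *e = head; e; e = e->next)
                arr[i++] = e;

        *n = count;
        return arr;
}

int main(void) {
        struct entry c = { "c", NULL }, b = { "b", &c }, a = { "a", &b };
        size_t n;

        struct entry **arr = snapshot(&a, &n);
        if (!arr)
                return 1;

        /* The hot loop is now a plain array walk with no per-element call. */
        for (size_t i = 0; i < n; i++)
                printf("%s\n", arr[i]->name);

        free(arr);
        return 0;
}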

You can find the code here:

https://github.com/systemd/systemd/compare/303608...vcaputo:hashmap_get_array

The last commit includes some numbers, which I'll repeat here for
convenience:

Before:

# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.452s
user 0m16.371s
sys 0m0.066s

# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.375s
user 0m16.310s
sys 0m0.057s

# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.390s
user 0m16.329s
sys 0m0.047s

After:

# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.827s
user 0m15.769s
sys 0m0.052s

# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.650s
user 0m15.592s
sys 0m0.050s

# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.646s
user 0m15.574s
sys 0m0.056s


Note that this adds the cost of a malloc, a free, and a copy of the
elements, yet it's still faster than the existing hashmap iterator.

I'm not sold on this being a good solution; it's meant more to shed light
on what I feel is a bit of a wart and, hopefully, to start some discussion.

Regards,
Vito Caputo
Lennart Poettering
2017-07-17 08:12:12 UTC
Post by v***@pengaru.com
The current hashmap iteration incurs at least one function call per
iteration; in my observations with gcc 6.3 and -g -O2, it's two:
internal_hashmap_iterate()
hashmap_iterate_entry()
for every element in the hashmap.
In profiles of `journalctl -b --no-pager` with multiple (50) log files
open, the hashmap iteration shows up as ~10% of the CPU time in
callgrind output.
I wonder if it wouldn't be easiest to simply bypass the hashmap's native
iteration for this case, i.e. add a linked list of the open journal
files that we can traverse directly, with O(1) cost per step?
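
Roughly the shape I have in mind, as an illustrative sketch only (the
struct and field names here are not the actual sd-journal code; systemd's
list.h helpers provide macros for this pattern):

/* Each open file embeds its own next/prev pointers, so keeping the list
 * up to date on open/close is O(1), and traversal walks the files
 * directly with no hash-bucket or iterator machinery. */
#include <stdio.h>

typedef struct JournalFileExample {
        const char *path;
        struct JournalFileExample *files_next, *files_prev;
} JournalFileExample;

static JournalFileExample *files_head = NULL;

static void file_opened(JournalFileExample *f) {
        /* O(1) insertion at the head of the list. */
        f->files_prev = NULL;
        f->files_next = files_head;
        if (files_head)
                files_head->files_prev = f;
        files_head = f;
}

static void file_closed(JournalFileExample *f) {
        /* O(1) unlink, no lookup needed. */
        if (f->files_prev)
                f->files_prev->files_next = f->files_next;
        else
                files_head = f->files_next;
        if (f->files_next)
                f->files_next->files_prev = f->files_prev;
}

int main(void) {
        JournalFileExample a = { .path = "system.journal" };
        JournalFileExample b = { .path = "user-1000.journal" };

        file_opened(&a);
        file_opened(&b);

        /* Traversal: one pointer chase per file, nothing else. */
        for (JournalFileExample *f = files_head; f; f = f->files_next)
                printf("%s\n", f->path);

        file_closed(&a);
        file_closed(&b);
        return 0;
}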

The hashmap implementation has been optimized to require very little
memory, because we have so many hashmap/sets around in PID 1. It's not
optimized to make iteration fast though.

Lennart
--
Lennart Poettering, Red Hat
Zbigniew Jędrzejewski-Szmek
2017-07-17 14:18:54 UTC
Post by Lennart Poettering
Post by v***@pengaru.com
The current hashmap iteration incurs at least one function call per
iteration; in my observations with gcc 6.3 and -g -O2, it's two:
internal_hashmap_iterate()
hashmap_iterate_entry()
for every element in the hashmap.
In profiles of `journalctl -b --no-pager` with multiple (50) log files
open, the hashmap iteration shows up as ~10% of the CPU time in
callgrind output.
I wonder if it wouldn't be easiest to simply bypass the hashmap's native
iteration for this case, i.e. add a linked list of the open journal
files that we can traverse directly, with O(1) cost per step?
The hashmap implementation has been optimized to require very little
memory, because we have so many hashmap/sets around in PID 1. It's not
optimized to make iteration fast though.
See also https://github.com/systemd/systemd/pull/6365:
test-hashmap took more than two minutes on rpi3.

Zbyszek
v***@pengaru.com
2017-07-17 17:33:33 UTC
Post by Lennart Poettering
Post by v***@pengaru.com
The current hashmap iteration incurs at least one function call per
iteration; in my observations with gcc 6.3 and -g -O2, it's two:
internal_hashmap_iterate()
hashmap_iterate_entry()
for every element in the hashmap.
In profiles of `journalctl -b --no-pager` with multiple (50) log files
open, the hashmap iteration shows up as ~10% of the CPU time in
callgrind output.
I wonder if it wouldn't be easiest to simply bypass the hashmap's native
iteration for this case, i.e. add a linked list of the open journal
files that we can traverse directly, with O(1) cost per step?
What part of hashmap_iterate_in_insertion_order() grows in complexity with
increasing N?
Post by Lennart Poettering
The hashmap implementation has been optimized to require very little
memory, because we have so many hashmap/sets around in PID 1. It's not
optimized to make iteration fast though.
I don't believe these are mutually exclusive.

Regards,
Vito Caputo
