ircd-hybrid-7.1beta1 vs ircd-ratbox-2.0.8 hash algorithm comparisons

Sat Apr 23 13:23:31 EDT 2005

I connected several (150,000) fake clients via siege.pl, a lightweight ircd
stress testing script I wrote. Drones were connected with very similar nicknames
(userXXXXXX where XXXXXX is the client number that ranged between 1 and 150000).
The UID's were pretty similar too, sequentially they went from 0AZAAAAAA to
0AZAAINXF, but this is expected behavior. Although nicknames are not expected to
be too close to each other during normal operation, UID's can reasonably assumed
as such.

The main hub (hybrid/bin/ircd) ran ircd-hybrid-7.1beta. The test leaves utilized
ircd-hybrid-7.1beta (hybrid2/bin/ircd) and ircd-ratbox-2.0.8 (ratbox/bin/ircd)

Conclusions:
ircd-hybrid showed the least memory consumption, but was extremely slow in terms
of being able to generate statistics (one server took 351.56 seconds to respond
and the other took 442.25). With such similar UID's, the distribution became
extremely poor, indicating a potential denial of service attack. I don't think
that ircd-hybrid7.1beta is suitable for EFnet at time of writing for this
reason. UID's are similar by their nature, so the creation of such large chains
(the maximum chain length was 237 client pointers), which would have to be
walked when seeking a specific client. Hopefully this will be corrected before
7.1 becomes a release candidate; it's definitely not ready for a stable release
with its current algorithm.

ircd-ratbox's distribution is much better, at the cost of consuming substantial
amounts of memory compared to ircd-hybrid (about 3.5 times more, in fact). Most
nodes end up having only one or two entries. It should be noted that ircd-ratbox
uses the Fowler/Noll/Vo (FNV) hashing algorithm, which is proven to be fast
while maintaining a low collision rate, even with keys that are as similar as
UID's are. With such short chains, the FNV algorithm seems to be working nicely
for EFnet.

cerberus used by far the most memory but presumably had almost no collisions
(with 150,000 clients, 113,000 nodes in use meant that most contained one or two
clients). Perl's hashing algorithm (from 5.8 and on) has an extra shift to avoid
collisions, at the expense of much more memory consumption. As well, the number
in %MEM includes other hashes, like nickname hashes and server hashes, too.

Evidence:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
frequen    767  0.0  7.9 74424 22748 ?       Ss   Apr14   4:13 hybrid/bin/ircd
entries: 150010 buckets: 1565 max chain: 237

frequen    773  0.0  1.1 74280 3400 ?        Ss   Apr14   2:09 hybrid2/bin/ircd
entries: 150007 buckets: 1586 max chain: 212

frequen  31505  0.0 25.7 82288 73248 ?       Ss   Apr22   0:03 ratbox/bin/ircd
Size: 65536 Empty: 5967 (9.105%)
Average depth: 2.000/2.000 Highest depth: 11
Nodes with 0 entries: 5967
Nodes with 1 entries: 15259
Nodes with 2 entries: 18075
Nodes with 3 entries: 13697
Nodes with 4 entries: 7565
Nodes with 5 entries: 3287
Nodes with 6 entries: 1174
Nodes with 7 entries: 364
Nodes with 8 entries: 118
Nodes with 9 entries: 23
Nodes with 10 entries: 7

frequen  31532  0.2 40.1 122468 114460 pts/2 S+   Apr22   2:26
cerberus: cerberus.int
113269 of 262144 buckets in use

Suitability:
ircd-ratbox seems well geared towards TS6-compatible networks such as EFnet,
while ircd-hybrid seems to have several features geared towards smaller IRC
networks, where such a collision rate is acceptable. ircd-hybrid might be
desirable when memory consumption is a concern (public shell providers, etc)

If you have any comments or anything to add (maybe I just happened to
find a particularly bad test case?), feel free to reply back to the
list. I would especially appreciate commentary from the developers of
ircd-hybrid in case this issue has been fixed for new versions.
-- 
Regards,

Rachel Llorenna (frequency)