Wireshark-dev: Re: [Wireshark-dev] [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks
From: Evan Huus <eapache@xxxxxxxxx>
Date: Tue, 22 Apr 2014 08:23:41 -0400
> On Apr 22, 2014, at 7:52 AM, Bálint Réczey <balint@xxxxxxxxxxxxxxx> wrote:
>
> [Bringing the discussion to -dev with Evan's permission]
>
> 2014-04-22 10:15 GMT+02:00 Anders Broman <anders.broman@xxxxxxxxxxxx>:
>>
>> -----Original Message-----
>> From: wireshark-core-bounces@xxxxxxxxxxxxx [mailto:wireshark-core-bounces@xxxxxxxxxxxxx] On Behalf Of Evan Huus
>> Sent: den 22 april 2014 05:36
>> To: Wireshark core developers
>> Subject: Re: [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks
>>
>>> On Mon, Apr 21, 2014 at 6:28 PM, Balint Reczey <rbalint@xxxxxxxxx> wrote:
>>>
>>>> On Apr 22, 2014 12:11 AM, Evan Huus <eapache@xxxxxxxxx> wrote:
>>>>
>>>>> On Mon, Apr 21, 2014 at 3:59 PM, Bálint Réczey <balint@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi Evan,
>>>>>
>>>>> 2014-04-21 18:21 GMT+02:00 Evan Huus <eapache@xxxxxxxxx>:
>>>>>> (sending to -core because of potential security implications)
>>>>>>
>>>>>> I was recently investigating implementing a wmem-based hash table,
>>>>>> since many of the current users of the wmem-tree do not actually
>>>>>> need the predecessor-lookup feature, which is the only advantage it
>>>>>> provides over a hash table (a hash table is otherwise much faster).
>>>>>>
>>>>>> I ended up wandering down the rabbit-hole of hash collisions,
>>>>>> algorithmic complexity attacks, and universal hashing ([1], [2],
>>>>>> [3] and more).
>>>>>>
>>>>>> Then I read the Glib hash table documentation [4]. A quick grep
>>>>>> through the Wireshark source reveals numerous locations where we
>>>>>> use packet data in hash tables with insecure hash functions. As
>>>>>> such, I believe that Wireshark is vulnerable to a whole class of
>>>>>> denial-of-service attacks in this area.
>>>>>>
>>>>>> Has this come up before? Am I overlooking something clever we're doing?
>>>>>
>>>>> It is true that Wireshark is vulnerable to hash collision attacks,
>>>>> and I think it did not come up before because we had enough
>>>>> vulnerabilities of other, simpler classes. ;-)
>>>>
>>>> Makes sense :)
>>>>
>>>>> I think we could create wrappers to provide a random seed to the
>>>>> standard glib hash functions, which is the standard way of handling
>>>>> such vulnerabilities.
>>>>
>>>> Unfortunately (based on my reading) that will not be sufficient.
>>>> There are two vectors for an attack via collisions: the mapping from
>>>> key to guint, and the mapping from guint to bucket index. The first
>>>> (key->guint) is user-settable, so we can provide a random seed. The
>>>> second (guint->bucket) is not user-controllable. From the glib
>>>> documentation:
>>>>
>>>> "The modulus is taken with the hash table size (a prime number) to
>>>> find the 'bucket' to place each key into."
>>>>
>>>> and then a few paragraphs down:
>>>>
>>>> "Even cryptographic hashes are very easy to find collisions for when
>>>> the remainder is taken modulo a somewhat predictable prime number."
>>>>
>>>> So basically, it doesn't matter how strong we make the initial
>>>> mapping, because the secondary bucket mapping is predictable anyways.
>>>> Fortunately there are easy and efficient ways to make the secondary
>>>> mapping unpredictable (it's actually simpler than the initial
>>>> mapping), so I guess that a good secure wmem map implementation is
>>>> actually fairly important to have.
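As a rough illustration of what an unpredictable secondary mapping can
look like (a sketch only, with made-up struct and function names, not
the actual wmem-map code mentioned below): multiply-shift picks a
random odd 64-bit multiplier per table and takes the top bits of the
product as the bucket index.

/* Sketch of an unpredictable secondary (guint -> bucket) mapping
 * using multiply-shift; assumes 1 <= bits <= 32, i.e. the table
 * holds 2^bits buckets. */
#include <glib.h>

typedef struct {
    guint64 multiplier;  /* random odd constant, fixed per table */
    guint   bits;        /* table capacity is 2^bits buckets */
} bucket_map_t;

static void
bucket_map_init(bucket_map_t *bm, guint bits)
{
    bm->bits = bits;
    /* g_random_int() returns 32 bits; combine two calls for 64 and
     * set the low bit so the multiplier is odd. */
    bm->multiplier = (((guint64)g_random_int() << 32) |
                      g_random_int()) | 1;
}

static guint
bucket_index(const bucket_map_t *bm, guint32 hash)
{
    /* Take the top 'bits' bits of the 64-bit product. Which bucket a
     * given hash lands in depends on the run-time multiplier, so an
     * attacker cannot precompute a colliding key set. */
    return (guint)((bm->multiplier * (guint64)hash) >> (64 - bm->bits));
}

Since the multiplier is only chosen at run time, colliding bucket
indices cannot be precomputed offline, and growing the table just means
re-mapping with a larger 'bits' value.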
>>> Luckily ghashtable resizes itself, increasing the number of buckets
>>> when the collision rate is high, so this attack can't be performed
>>> effectively. The described problem is valid only for hash tables
>>> with a fixed bucket count.
>>>
>>> Regarding the wmem hash tables, I think C wrapping C++ STL hash
>>> tables with wmem-based custom allocators would do the job.
>>>
>>> Unfortunately this doesn't work; the free-all operation which wmem
>>> provides doesn't play nice with C++ destructors (besides which we
>>> are still pure-C for libwireshark for the time being).
>
> We lost being C-only with the Qt UI. :-)

Not in libwireshark; there may be third-party C-only apps which build
against that.

> It can be implemented in several ways which work; one is registering
> each hash table with the proper wmem pool and calling their
> destructors when freeing the pool - this way we don't have to play
> with a custom allocator.

But then we lose the fast free-all, which was the entire point.
Otherwise we might as well use smart pointers.

> Another technique would be not calling destructors, just freeing all
> the allocated connected objects. This is not nice, but would work as
> well IMO, since we would not hold refs to the free()-d objects.

That might work in practice, but it is definitely not guaranteed to
work by the standard.

>>> I have whipped up a very basic wmem-map implementation at [1]. It
>>> uses universal multiply-shift hashing [2] for bucket placement, and
>>> provides a wmem_secure_hash function for replacing g_str_hash and
>>> friends, based on work done by the Perl community [3] in hardening
>>> their hash tables. To the best of my knowledge the resulting hash
>>> map is secure against complexity attacks.
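To sketch the seeding idea (an illustration only, with made-up names,
not the actual wmem_secure_hash implementation, which follows the Perl
hardening work in [3]): fold a per-process random seed into an
otherwise ordinary string hash, so the key -> guint mapping changes on
every run.

/* Illustration only: a seeded variant of a g_str_hash-style string
 * hash. The seed is drawn once per process, so a precomputed set of
 * colliding keys does not survive across runs. */
#include <glib.h>

static guint32 hash_seed;  /* set once at startup */

static void
hash_seed_init(void)
{
    hash_seed = g_random_int();
}

static guint
seeded_str_hash(gconstpointer key)  /* matches the GHashFunc signature */
{
    const guchar *p = (const guchar *)key;
    guint32 h = 5381 ^ hash_seed;   /* seed perturbs the initial state */

    while (*p)
        h = (h << 5) + h + *p++;    /* h = h * 33 + c, as in g_str_hash */

    return h;
}

Such a function can be passed anywhere a GHashFunc is expected, e.g.
g_hash_table_new(seeded_str_hash, g_str_equal), though as discussed
above, with glib's own tables the predictable bucket mapping would
still be the weak link.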
>>> It still needs cleanup (comments, tests, etc.). I would also like
>>> to replace one tree somewhere and run benchmarks to make sure it's
>>> actually faster. Suggestions and concerns are welcome.
>
> It would be nice to see how it compares to other implementations:
> http://incise.org/hash-table-benchmarks.html
>
> I prefer using existing building blocks instead of inventing our own.

I agree, I just couldn't find an existing implementation that was
easily adaptable to memory pools.

> Cheers,
> Balint
>
>>> [1] https://code.wireshark.org/review/1272
>>> [2] https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic
>>> [3] http://blog.booking.com/hardening-perls-hash-function.html
>>
>> Would it be good to have an initial hash table size in the API? Then
>> a table could be pre-allocated to a custom size for the cases where
>> we know the size will be larger than the standard default.
>
> I think it would not be good. It would save a constant number of
> resizes (an O(1) gain), but we would reserve more memory for each
> hash table initially.
>
> Cheers,
> Balint
>
>> Regards,
>> Anders
>>
>>> Cheers,
>>> Balint
>>>
>>>>> AFAIK ghashtable does not employ randomization of the hash
>>>>> function seed:
>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655044
>>>>>
>>>>> Cheers,
>>>>> Balint
>>>>>
>>>>>> Evan
>>>>>>
>>>>>> P.S. I believe it's still perfectly OK to use hash tables with
>>>>>> non-packet data such as static value_strings, or as in [5].
>>>>>>
>>>>>> [1] http://lwn.net/Articles/474912/
>>>>>> [2] http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks
>>>>>> [3] https://en.wikipedia.org/wiki/Universal_hashing
>>>>>> [4] https://developer.gnome.org/glib/unstable/glib-Hash-Tables.html#GHashFunc
>>>>>> [5] https://code.wireshark.org/review/gitweb?p=wireshark.git;a=commit;h=5983cda769fa6615dda5fc7b8f87819d40f0a8d5