Sparsehash 2.0.4 Performance Issues: A Deep Dive
Introduction
In this article, we delve into a critical performance regression observed in version 2.0.4 of the Sparsehash library, specifically within the google::dense_hash_map implementation. This issue, which is absent in version 2.0.3, was brought to light during routine benchmarking of high-performance container libraries, including google::dense_hash_map, as part of the development of the TommyDS data structure library (www.tommyds.it). This article aims to thoroughly document the problem, providing insights, reproducible examples, and a detailed analysis for developers and users of the Sparsehash library. Understanding the root cause and potential solutions is crucial for maintaining the efficiency and reliability of applications that rely on this widely-used hash map implementation.
The Problem: Size-Dependent Performance Degradation
The core issue manifests as a dramatic performance slowdown when the number of elements in the hash map approaches a power of two. In these instances, operations that typically take milliseconds can extend to seconds, or even minutes, making the hash map virtually unusable. This pathological behavior is particularly concerning as it affects a fundamental data structure used in numerous performance-critical applications. The slowdown intensifies as the element count nears the power-of-two threshold, creating a sharp performance cliff. This size-dependent performance degradation is the primary focus of our investigation.
Interestingly, the problem disappears as soon as the element count moves past the worst-case value: adding just one more element restores the expected execution time. This abrupt shift suggests that the underlying issue is closely tied to the hash map's internal resizing or rehashing mechanism, which is typically triggered when the number of elements crosses a threshold related to the table's capacity.
Demonstrating the Issue: A Reproducible Test Case
To illustrate this performance regression, consider the following scenario. When using 65530 elements in the google::dense_hash_map, the performance is as expected, with insertion, change, and removal operations completing quickly:
$ time ./tommybench 65530
65530
insert
change
remove
real 0m0.006s
user 0m0.004s // Immediate
sys 0m0.002s
However, reducing the element count by just one, to 65529, results in a significant slowdown:
$ time ./tommybench 65529
65529
insert
change // HERE IS SLOW
remove
real 0m5.057s
user 0m5.056s // Approximately 5 seconds delay
sys 0m0.001s
This example clearly demonstrates the dramatic impact of the element count on performance, with a single element difference causing a slowdown of several orders of magnitude. This behavior is not an isolated incident; similar performance regressions have been observed at other element counts close to powers of two, further highlighting the systemic nature of the issue.
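The other pathological counts listed in the test program below (all of the form 2^n - 7) can be exercised in the same way. Note, however, that according to the comments in that program the larger sizes may run for minutes or hours and typically have to be interrupted manually:
$ time ./tommybench 131065
$ time ./tommybench 262137
$ time ./tommybench 524281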
Code Example: tommybench.c
To further document and track this size-dependent performance degradation, a test program (tommybench.c) has been developed. This program provides a concrete example of the issue and can be used to reproduce the problem on different systems and configurations. The source code is provided below:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>
/* From https://github.com/sparsehash/sparsehash version 2.0.4 */
#include <google/dense_hash_map>
struct KnuthMultiplicativeHash {
    size_t operator()(const int& key) const {
        const uint32_t C = 2654435761U;
        uint32_t h = (uint32_t)key;
        h *= C;
        h ^= (h >> 16);
        return (size_t)h;
    }
};

struct JenkinsMixHash {
    size_t operator()(const int& key) const {
        size_t h = (size_t)key;
        h += (h << 10);
        h ^= (h >> 6);
        h += (h << 3);
        h ^= (h >> 11);
        h += (h << 15);
        return h;
    }
};

/*
 * The hash function used here (KnuthMultiplicativeHash) is arbitrary.
 * Testing has shown that substituting it with JenkinsMixHash, other
 * custom hash functions, or even relying on the default hash function
 * does not resolve the pathological performance issue.
 */
typedef google::dense_hash_map<int, int, KnuthMultiplicativeHash> googledensehash_t;
googledensehash_t* googledensehash;
unsigned the_max; /**< Number of elements in test. */
void test_insert(void)
{
    unsigned i;
    for(i=0;i<the_max;++i) {
        /* insert */
        unsigned key = i * 2;
        unsigned value = key;
        (*googledensehash)[key] = value;
    }
}

void test_change(void)
{
    unsigned i;
    for(i=0;i<the_max;++i) {
        /* find */
        unsigned key = i * 2;
        unsigned value;
        googledensehash_t::iterator ptr;
        ptr = googledensehash->find(key);
        if (ptr == googledensehash->end())
            abort();
        value = ptr->second;
        if (value != key)
            abort();

        /* remove */
        googledensehash->erase(ptr);

        /* reinsert with a different key */
        key = key + 1;
        value = key;
        (*googledensehash)[key] = value;
    }
}

void test_remove(void)
{
    unsigned i;
    for(i=0;i<the_max;++i) {
        /* find */
        unsigned key = i * 2 + 1;
        unsigned value;
        googledensehash_t::iterator ptr = googledensehash->find(key);
        if (ptr == googledensehash->end())
            abort();
        value = ptr->second;
        if (value != key)
            abort();

        /* remove */
        googledensehash->erase(ptr);
    }
}
int main(int argc, char * argv[])
{
    /*
     * The values below are element counts (N) found to exhibit the
     * pathological performance regression in google::dense_hash_map v2.0.4.
     *
     * Each count is slightly smaller than a power of two (2^20 - 7,
     * 2^19 - 7, 2^18 - 7, 2^17 - 7, and 2^16 - 7), suggesting an issue
     * with resizing or load factor calculation at critical power-of-two
     * boundaries.
     *
     * the_max is intentionally set to 65529 as the default test case
     * because, unlike the other counts, this specific pathological size
     * terminates within a reasonable timeframe (seconds) instead of
     * requiring manual termination (minutes/hours).
     */
    /* the_max = 1048569; */
    /* the_max = 524281; */
    /* the_max = 262137; */
    /* the_max = 131065; */
    the_max = 65529;

    if (argc > 1)
        the_max = atoi(argv[1]);

    printf("%u\n", the_max);

    googledensehash = new googledensehash_t;
    googledensehash->set_empty_key(-1);
    googledensehash->set_deleted_key(-2);

    printf("insert\n");
    test_insert();

    printf("change\n");
    test_change();

    printf("remove\n");
    test_remove();

    delete googledensehash;

    return EXIT_SUCCESS;
}
This program demonstrates the issue by performing a series of insert, change, and remove operations on a google::dense_hash_map. The the_max variable controls the number of elements used in the test, and the program highlights the performance slowdown when this value is close to a power of two. The comments within the code provide further context and explain the rationale behind the default test case.
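The benchmark can be built with any recent C++ compiler. The commands below are one way to do it, assuming a Unix-like system with the Sparsehash 2.0.4 headers installed in a standard include path (otherwise, add an -I option pointing at the Sparsehash include directory):
$ g++ -O2 -o tommybench tommybench.c
$ time ./tommybench 65529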
Investigating the Root Cause
The observed performance regression strongly suggests an issue related to the internal resizing or rehashing mechanism of google::dense_hash_map. When a hash map reaches a certain load factor (the ratio of elements to buckets), it typically resizes its internal storage to maintain performance. This resizing process involves allocating a new, larger bucket array and rehashing all existing elements into the new array. If this process is not implemented efficiently, it can lead to significant performance overhead, especially when dealing with large numbers of elements.
The fact that the slowdown occurs specifically near powers of two suggests a potential problem with the way the hash map calculates the new bucket size during resizing. It is possible that the resizing logic is not correctly handling edge cases or is leading to suboptimal bucket sizes, resulting in increased collisions and slower lookups. Another possibility is that the rehashing algorithm itself is inefficient, perhaps due to excessive memory allocations or poor cache utilization.
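To make the suspected mechanism concrete, the sketch below shows, in generic terms, how an open-addressing table that keeps power-of-two bucket counts and leaves tombstones behind on erase might decide when to grow. This is not the google::dense_hash_map source; the names (TableState, needs_rehash, grown_bucket_count) and thresholds are illustrative assumptions only.
#include <cstddef>

/*
 * Illustrative sketch only -- NOT the Sparsehash implementation.
 * A simplified model of the grow decision in an open-addressing table
 * that keeps power-of-two bucket counts and tracks deleted slots.
 */
struct TableState {
    std::size_t bucket_count;  /* always a power of two */
    std::size_t num_elements;  /* live entries */
    std::size_t num_deleted;   /* tombstones left behind by erase() */
};

/* Trigger a rehash when live entries plus tombstones exceed the
 * load-factor threshold for the current bucket count. */
inline bool needs_rehash(const TableState& t, double max_load_factor)
{
    return t.num_elements + t.num_deleted >
           static_cast<std::size_t>(t.bucket_count * max_load_factor);
}

/* Choose the smallest power-of-two bucket count that keeps only the
 * live entries (tombstones are discarded during the rehash) under the
 * threshold. */
inline std::size_t grown_bucket_count(const TableState& t, double max_load_factor)
{
    std::size_t n = t.bucket_count;
    while (t.num_elements >= static_cast<std::size_t>(n * max_load_factor))
        n *= 2;
    return n;
}
In a scheme like this, the rehash trigger counts tombstones while the new-size calculation does not. If the live count sits just under the threshold for the current power-of-two bucket count, each erase-and-reinsert cycle can push the table over the trigger yet hand back the same bucket count, so the table rehashes again and again without ever growing. Whether this is exactly what happens inside dense_hash_map 2.0.4 would need to be confirmed by inspecting its resize code, but it would match the observed symptoms: a severe slowdown just below a power of two that vanishes once a single extra element is added.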
Hash Function Independence
An important finding is that changing the underlying hash function does not mitigate this size-dependent performance degradation. The test program demonstrates this by allowing the use of different hash functions, including KnuthMultiplicativeHash and JenkinsMixHash. However, regardless of the hash function used, the performance slowdown persists when the number of elements approaches a power of two. This observation suggests that the issue is not primarily related to the quality of the hash function or the distribution of hash values, but rather to the resizing or rehashing process itself.
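In the test program, switching hash functions only requires changing the third template argument of the typedef (or dropping it entirely to fall back to the library's default hasher); the slowdown reproduces in every case:
/* Same benchmark, different hasher: the pathological behavior persists. */
typedef google::dense_hash_map<int, int, JenkinsMixHash> googledensehash_t;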
Implications and Impact
This performance regression has significant implications for applications that rely on google::dense_hash_map in Sparsehash 2.0.4. The dramatic slowdown near powers of two can lead to unexpected performance bottlenecks and severely impact the responsiveness of applications. Developers need to be aware of this issue and take steps to mitigate its effects, such as avoiding element counts near powers of two or using alternative hash map implementations. For performance-critical applications, thorough testing and benchmarking are essential to identify and address potential performance regressions.
Potential Solutions and Workarounds
While a comprehensive fix requires a deeper investigation into the google::dense_hash_map implementation, there are several potential solutions and workarounds that developers can consider:
- Avoid element counts near powers of two: If possible, design the application to avoid using hash maps with sizes that are close to powers of two. This may involve padding the number of elements, pre-sizing the table's capacity (a sketch follows this list), or using a different data structure altogether.
- Use a different hash map implementation: Consider using an alternative hash map implementation that does not exhibit this performance regression. There are numerous open-source hash map libraries available, each with its own performance characteristics and trade-offs.
- Revert to Sparsehash 2.0.3: If feasible, reverting to version 2.0.3 of the Sparsehash library may be a viable workaround, as this version does not exhibit the performance regression.
- Custom resizing logic: Developers with in-depth knowledge of the library's internals can patch the resizing logic itself to avoid the pathological case.
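As a concrete illustration of the first workaround, the snippet below pre-sizes the table well past the next resize threshold before the workload runs, using dense_hash_map's resize() member. This is a hypothetical mitigation sketch, not a verified fix; whether it actually sidesteps the regression for a given workload has to be confirmed by benchmarking, since the exact trigger condition has not been pinned down.
/* Hypothetical mitigation sketch (unverified): reserve capacity well
 * beyond the_max up front, so the element count never sits just below
 * a power-of-two resize boundary during the workload. */
googledensehash = new googledensehash_t;
googledensehash->set_empty_key(-1);
googledensehash->set_deleted_key(-2);
googledensehash->resize(2 * the_max);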
Conclusion
The performance regression in Sparsehash 2.0.4's google::dense_hash_map is a serious issue that can significantly impact the performance of applications. The size-dependent nature of the slowdown, particularly near powers of two, points to a problem with the resizing or rehashing mechanism. While the exact root cause requires further investigation, developers can take steps to mitigate the issue by avoiding element counts near powers of two, using alternative hash map implementations, or reverting to Sparsehash 2.0.3. This article has provided a detailed analysis of the problem, along with a reproducible test case, to aid in understanding and addressing this important performance regression. For further information on hash map implementations and performance considerations, visit Wikipedia's article on Hash Tables.