HashMap: JDK 1.8 official documentation, translated and annotated

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.


HashMap is the hash-table-based implementation of the Map interface. It provides all of the optional map operations (operations such as put and remove that a Map implementation is allowed to decline to support; HashMap supports them all) and permits null keys and null values. Apart from being unsynchronized and permitting nulls, HashMap is roughly equivalent to Hashtable.

HashMap also makes no guarantee about the order of its entries; in fact, it does not even guarantee that the order will stay the same over time. In practice this means that when persisting data and reading it back, the iteration order generally will not match the insertion order: for example, rows fetched from MySQL with ORDER BY lose that ordering once they are put into a HashMap. If insertion order must be preserved, use LinkedHashMap instead.
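A minimal sketch of the difference (key names are illustrative): the same three insertions iterate in insertion order from a LinkedHashMap, while a HashMap gives no such guarantee.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class InsertionOrderDemo {
    // Inserts three keys in a fixed order and returns the keys in the
    // map's iteration order, joined with commas.
    static String iterationOrder(Map<String, Integer> map) {
        map.put("banana", 1);
        map.put("apple", 2);
        map.put("cherry", 3);
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        // HashMap order depends on the keys' hash codes and the table
        // size; LinkedHashMap always reports insertion order.
        System.out.println("HashMap:       " + iterationOrder(new HashMap<>()));
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>()));
    }
}
```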

To traverse a map, rather than looking up each key individually with get(), iterate over the Map.Entry<K,V> view returned by HashMap's entrySet() and read each pair with getKey() and getValue(). (A for-each loop over entrySet() still uses an iterator under the hood, but it visits each entry only once.)
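The entrySet() traversal described above can be sketched as follows (names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    // Walks the map once via entrySet(), printing each pair and summing
    // the values; this avoids a separate get() lookup per key.
    static int sumValues(Map<String, Integer> map) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("alice", 90);
        scores.put("bob", 85);
        System.out.println("total=" + sumValues(scores));
    }
}
```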

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Assuming the hash function (the hash() called during get and put) disperses the elements properly among the buckets, HashMap performs get and put in constant time (O(1)). Iteration time is proportional to the capacity (the number of buckets) plus the number of key-value mappings. So if iteration performance matters, it is important not to set the initial capacity too high or the load factor too low.

On first use the class creates a bucket array of length capacity. Each bucket holds an entry node that may reference further nodes, forming a chain; such chains are the result of hash collisions (several keys mapping to the same bucket), and since Java 8 a sufficiently long chain is converted into a tree. Iteration performance depends on the capacity and the load factor. As long as the number of entries does not exceed the threshold (capacity × load factor), HashMap does not call resize(). The default load factor is 0.75, a value that balances lookup performance against memory consumption.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

Two parameters affect a HashMap's performance: 1. the initial capacity and 2. the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the table is created. The load factor measures how "full" the table is allowed to get before its capacity is automatically increased. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structures are rebuilt) into roughly twice as many buckets.

With the default load factor the bucket count therefore stays above the entry count. Per the documentation, entries = 0.75 × buckets appears to be the tipping point: as soon as the number of entries exceeds load factor × bucket count, resize() is called and the capacity (the number of buckets) doubles.
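The threshold arithmetic above is simple enough to spell out; this sketch computes it for the default configuration (16 buckets, 0.75 load factor):

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor; once size exceeds this value,
    // HashMap doubles its bucket count on the next resize().
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default HashMap: 16 buckets * 0.75 = 12, so the table grows
        // to 32 buckets when the 13th entry arrives.
        System.out.println(threshold(16, 0.75f));
        System.out.println(threshold(32, 0.75f));
    }
}
```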

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

As a general rule, the default load factor of 0.75 offers a good trade-off between time and space costs. Higher values reduce the space overhead but increase the lookup cost (reflected in most HashMap operations, including get() and put()). To minimize the number of rehash operations, the expected number of entries and the load factor should be taken into account when setting the map's initial capacity. If **initial capacity > expected entries / load factor**, a rehash will never occur.
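That inequality can be turned around to pre-size a map; a minimal sketch (the helper name is illustrative):

```java
public class CapacityDemo {
    // Smallest initial capacity that can hold `expected` entries with
    // no rehash: it must satisfy expected <= capacity * loadFactor.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        // To hold 100 entries at the default 0.75 load factor, request
        // at least ceil(100 / 0.75) = 134 buckets up front; HashMap
        // then rounds the request up to the next power of two.
        System.out.println(capacityFor(100, 0.75f));
    }
}
```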

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

If a HashMap instance is to store many mappings, creating it with a sufficiently large capacity stores them more efficiently than letting the map grow and rehash automatically as needed. Note that using many keys with the same hashCode() is a sure way to slow down any hash table. To soften the impact, when the keys are Comparable, HashMap may use their comparison order to break ties among colliding keys.
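A sketch of the worst case (the class name is illustrative): a key whose hashCode() is constant forces every entry into one bucket, so lookups degrade to a chain or, since Java 8, a tree search. Implementing Comparable is what lets HashMap build that tree by comparison order.

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeyDemo {
    // A deliberately bad key: every instance hashes to the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // constant: all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey(i), i);
        }
        // Lookups remain correct, just slower than with a well-spread hash.
        System.out.println(map.get(new BadKey(500)));
    }
}
```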

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Note that this implementation is not synchronized (not thread-safe). If multiple threads access a hash map concurrently and at least one of them modifies it structurally, the map must be synchronized externally. (A structural modification is any operation that adds or removes one or more mappings; merely changing the value associated with a key the map already contains is not structural.) This is typically done by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be wrapped with Collections.synchronizedMap(), ideally at creation time to prevent accidental unsynchronized access:

Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after an iterator is created, in any way other than through the iterator's own remove() method, the iterator throws ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly (it simply stops being usable) rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future.
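Even a single thread can trip the fail-fast check; this sketch modifies the map mid-iteration and catches the resulting exception:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Structurally modifying the map during iteration (not via the
    // iterator's own remove()) makes the next iterator step throw
    // ConcurrentModificationException.
    static boolean detectsConcurrentModification() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove("a"); // structural modification mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(detectsConcurrentModification());
    }
}
```

To remove entries safely while iterating, use the iterator's own remove() (or removeIf()) instead.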

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

This class is a member of the Java Collections Framework.

Note that an iterator's fail-fast behavior cannot be guaranteed: generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification (that is, a ConcurrentModificationException is not certain to be thrown). Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It would therefore be wrong to write a program that depends on this exception for its correctness; the fail-fast behavior of iterators should be used only to detect bugs.

HashMap is a member of the Java Collections Framework.
