HashMap

印度阿三17 2019-08-21

To keep things easy to follow, the source analysis below is based mainly on JDK 1.7.

Storage structure

Internally, HashMap stores its data in an array of Entry objects named table.

transient Entry[] table;

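Entry is a static nested class of HashMap (it implements Map.Entry) that stores one key-value pair. It has four fields: key, value, hash, and a next pointer. The next pointer shows that each slot of the table array acts as a bucket, and each bucket holds a linked list of Entry objects. HashMap resolves hash collisions by separate chaining: entries whose hash values give the same result modulo the number of buckets, that is, the same hashCode % table.length, are stored in the same bucket (linked list).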

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    public final K getKey() {
        return key;
    }

    public final V getValue() {
        return value;
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry)o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }

    public final int hashCode() {
        return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }
}

How separate chaining works

HashMap<String, String> map = new HashMap<>();
map.put("K1", "V1");
map.put("K2", "V2");
map.put("K3", "V3");

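Create a HashMap; the default capacity is 16.
Insert the pair <K1,V1>: suppose the hashCode of K1 is 115 (an illustrative value); division-remainder hashing gives bucket index 115 % 16 = 3.
Insert the pair <K2,V2>: suppose the hashCode of K2 is 118; division-remainder hashing gives bucket index 118 % 16 = 6.
Insert the pair <K3,V3>: suppose the hashCode of K3 is also 118; division-remainder hashing gives bucket index 118 % 16 = 6, so it is inserted in front of <K2,V2>.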

Note that the linked list uses head insertion: in the example above, <K3,V3> is not appended after <K2,V2> but is placed at the head of the list.

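A lookup takes two steps:

Compute the bucket that holds the key-value pair;
Search the linked list sequentially, so the time cost is proportional to the length of the list.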
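The insertion and lookup steps above can be sketched as a tiny chained table. This is only an illustration, not the JDK implementation: the class ChainingSketch, its Node type, and the helper keysInBucket are hypothetical names, and the hash values 115 and 118 are passed in by hand to match the example rather than taken from String.hashCode().

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal separate-chaining table; hash values are supplied explicitly for illustration. */
final class ChainingSketch {
    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;

        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    final Node[] table = new Node[16];

    /** Updates an existing key, otherwise inserts the new node at the head of its bucket. */
    void put(int hash, String key, String value) {
        int i = hash % table.length; // assumes a non-negative hash
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        table[i] = new Node(hash, key, value, table[i]);
    }

    /** Two-step lookup: locate the bucket, then scan its chain. */
    String get(int hash, String key) {
        for (Node e = table[hash % table.length]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    /** Keys of one bucket, listed from head to tail. */
    List<String> keysInBucket(int i) {
        List<String> keys = new ArrayList<>();
        for (Node e = table[i]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ChainingSketch t = new ChainingSketch();
        t.put(115, "K1", "V1");
        t.put(118, "K2", "V2");
        t.put(118, "K3", "V3");
        System.out.println(t.keysInBucket(3)); // [K1]
        System.out.println(t.keysInBucket(6)); // [K3, K2]
        System.out.println(t.get(118, "K2"));  // V2
    }
}
```

Bucket 6 lists K3 before K2, which is the head-insertion order described above.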

The put operation

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is handled separately, so HashMap allows null as a key
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    // Determine the bucket index
    int i = indexFor(hash, table.length);
    // First check whether a pair with this key already exists; if so, update its value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // Insert the new key-value pair
    addEntry(hash, key, value, i);
    return null;
}

HashMap allows a key-value pair whose key is null. Because hashCode() cannot be called on null, the bucket index cannot be computed for such a pair, so a fixed bucket has to be assigned instead. HashMap stores the pair with the null key in bucket 0.

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
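The observable effect of putForNullKey can be checked with java.util.HashMap directly. The short demo below only shows the public behaviour (one slot for the null key, later puts replace the value); the class name NullKeyDemo and the helper build are made up for this example, and the demo does not inspect which bucket is used.

```java
import java.util.HashMap;
import java.util.Map;

final class NullKeyDemo {
    /** Builds a map in which the null key is written twice. */
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same key, so the value is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));         // second
        System.out.println(map.size());            // 1
        System.out.println(map.containsKey(null)); // true
    }
}
```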

Insertion uses the head of the linked list: the new key-value pair is placed at the head of the list, not at the tail.

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    // Head insertion: the bucket head now points to the new key-value pair
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}

The get operation

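Compared with put, the get process is very simple.

Compute the hashCode value from the key.
Find the corresponding bucket index: hashCode % capacity.
Walk the linked list in that bucket until a key that is equal (by == or equals) is found.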

public V get(Object key) {
    // As noted earlier, a null key is stored in table[0], so only the list at table[0] has to be scanned
    if (key == null)
        return getForNullKey();
   
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : hash(key);
    // Determine the bucket index, then walk the list from its head until the key is found
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Determining the bucket index

Both put and get first have to determine the bucket index of a key-value pair. The index is computed in two steps, hashing and taking the modulus:

int hash = hash(key);
int i = indexFor(hash, table.length);

1. Computing the hash value

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

public final int hashCode() {
    return Objects.hashCode(key) ^ Objects.hashCode(value);
}
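To see why hash() mixes the bits instead of using hashCode() as is, the sketch below applies only the two shift-and-XOR lines from the method above to two values that differ in their high bits. The class name HashSpreadSketch and the method name spread are made up for this example, and the hashSeed and String special case are deliberately left out.

```java
final class HashSpreadSketch {
    /** The bit-mixing step of the hash() method shown above, without hashSeed handling. */
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Same masking as HashMap.indexFor; length must be a power of 2. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without mixing, only the low 4 bits count, so both land in bucket 0.
        System.out.println(indexFor(a, 16));         // 0
        System.out.println(indexFor(b, 16));         // 0
        // After mixing, the high bits influence the index.
        System.out.println(indexFor(spread(a), 16)); // 1
        System.out.println(indexFor(spread(b), 16)); // 2
    }
}
```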

2. Taking the modulus

Let x = 1<<4, that is, x is 2 to the 4th power. It has the following property:

x   : 00010000
x-1 : 00001111

ANDing a number y with x-1 clears every bit of y from bit 4 upward (counting from bit 0), keeping only the low 4 bits:

y       : 10110010
x-1     : 00001111
y&(x-1) : 00000010

This gives the same result as y modulo x:

y   : 10110010
x   : 00010000
y%x : 00000010

Bitwise operations are much cheaper than the modulo operation, so using a bitwise operation for this computation gives better performance.

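The last step in determining the bucket index is to take the key's hash value modulo the number of buckets: hash % capacity. If capacity is guaranteed to be a power of 2, this operation can be replaced by a bitwise AND.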

static int indexFor(int h, int length) {
    return h & (length-1);
}
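The equivalence between masking and taking the modulus can be checked with a few lines. The class name IndexForSketch is made up for this example; indexFor is copied from above. For negative hash values the mask agrees with Math.floorMod, not with the % operator, which is worth keeping in mind when reading hash % capacity as shorthand.

```java
final class IndexForSketch {
    /** Same masking as HashMap.indexFor; length must be a power of 2. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int y = 0b10110010; // 178, the value used in the bit diagrams above
        System.out.println(indexFor(y, 16));        // 2
        System.out.println(y % 16);                 // 2

        // For a negative hash the mask still yields a valid index.
        System.out.println(indexFor(-7, 16));       // 9
        System.out.println(Math.floorMod(-7, 16));  // 9
        System.out.println(-7 % 16);                // -7
    }
}
```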

Resizing: basic principle

Let M be the length of the HashMap's table and N the number of key-value pairs to store. If the hash function distributes keys uniformly, each linked list holds about N/M entries, so the average lookup cost is O(N/M).

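To keep lookups cheap, N/M should be as small as possible, which means M, the size of table, should be as large as possible. HashMap uses dynamic resizing to adjust M according to the current value of N, so that both space efficiency and time efficiency stay acceptable.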

The main parameters involved in resizing are capacity, size, threshold, and loadFactor.

Parameter	Meaning
capacity	Capacity of table; 16 by default. Note that capacity must always be a power of 2.
size	Number of key-value pairs.
threshold	Critical value of size; once size is greater than or equal to threshold, a resize is required.
loadFactor	Load factor, the fraction of table that may be used; threshold = capacity * loadFactor.
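As a quick worked example of threshold = capacity * loadFactor, the sketch below prints the threshold for a few capacities with the default load factor of 0.75. The class name ThresholdSketch and the helper threshold are made up for this example; the cast to int mirrors how resize() stores the value.

```java
final class ThresholdSketch {
    /** threshold = capacity * loadFactor, truncated to an int. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Each resize doubles the capacity, so the threshold doubles as well.
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
        }
        // prints: 16 -> 12, 32 -> 24, 64 -> 48, 128 -> 96 (one per line)
    }
}
```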

static final int DEFAULT_INITIAL_CAPACITY = 16;  // default capacity

static final int MAXIMUM_CAPACITY = 1 << 30;     // maximum capacity

static final float DEFAULT_LOAD_FACTOR = 0.75f;  // default loadFactor

transient Entry[] table;

transient int size;

int threshold;

final float loadFactor;

transient int modCount;

The simplified element-insertion code below shows that, when a resize is needed, capacity is doubled.

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

Resizing is implemented by resize(). Note that a resize also has to reinsert every key-value pair from oldTable into newTable, so this step is expensive.

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
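One consequence of transfer() reusing head insertion is that entries that stay in the same bucket end up in reverse order. The sketch below reproduces the loop on a stripped-down node type to make that visible. The names TransferSketch, Node, and chain are made up for this example, and the hashes 6, 22, and 38 are chosen by hand so that all three share bucket 6 at capacity 16.

```java
final class TransferSketch {
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Moves every node into a new table, inserting at the head of each new bucket. */
    static Node[] transfer(Node[] src, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (int j = 0; j < src.length; j++) {
            Node e = src[j];
            src[j] = null;
            while (e != null) {
                Node next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    /** Renders one bucket as "a -> b -> c", from head to tail. */
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node[] old = new Node[16];
        for (int h : new int[] {6, 22, 38}) {
            int i = indexFor(h, 16);
            old[i] = new Node(h, old[i]); // head insertion
        }
        System.out.println(chain(old[6]));    // 38 -> 22 -> 6
        Node[] grown = transfer(old, 32);
        System.out.println(chain(grown[6]));  // 6 -> 38
        System.out.println(chain(grown[22])); // 22
    }
}
```

In the old table 38 precedes 6; after the transfer 6 precedes 38.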

Resizing: recomputing the bucket index

During a resize, every key-value pair has to be moved to its new bucket. JDK 1.7 recomputes the bucket index from the hash value; from JDK 1.8 on, HashMap uses a shortcut that reduces the work of recomputing bucket indexes.

Suppose the original array length, capacity, is 16 and the new capacity after resizing is 32:

capacity     : 00010000
new capacity : 00100000

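For a given key:

If the 5th bit of its hash value (counting from 1 at the least significant bit, that is, the bit worth 16) is 0, the modulus gives the same result as before;
If that bit is 1, the result is the old result + 16.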
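The rule above can be written as a one-line test on the bit that the larger mask newly exposes. This is a sketch of the idea only, not the JDK 1.8 resize code; the names ResizeIndexSketch and newIndex are made up for this example.

```java
final class ResizeIndexSketch {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** Index after doubling: unchanged if the oldCapacity bit is 0, otherwise shifted by oldCapacity. */
    static int newIndex(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        return (hash & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        for (int hash : new int[] {5, 21, 118, 134}) {
            System.out.println(hash + ": " + indexFor(hash, 16) + " -> " + newIndex(hash, 16));
        }
        // prints: 5: 5 -> 5, 21: 5 -> 21, 118: 6 -> 22, 134: 6 -> 6 (one per line)
    }
}
```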

Computing the array capacity

The HashMap constructor accepts a capacity that is not a power of 2, because it automatically rounds the requested capacity up to a power of 2.

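First consider how to compute the mask of a number. For 10010000 the mask is 11111111, which can be obtained as follows (mask starts out as the number itself):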

mask |= mask >> 1    11011000
mask |= mask >> 2    11111110
mask |= mask >> 4    11111111

mask+1 is the smallest power of 2 greater than the original number.

num     10010000
mask+1  100000000

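The following is the HashMap code (from JDK 1.8) that computes the array capacity. It starts from cap - 1 so that a capacity that is already a power of 2 is returned unchanged: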

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
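A few sample inputs make the rounding behaviour concrete. The method below is copied from the listing above into a standalone class so it can be run; the class name TableSizeSketch is made up for this example.

```java
final class TableSizeSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Rounds cap up to the nearest power of 2, capped at MAXIMUM_CAPACITY. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 10, 16, 17, 144}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // prints: 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 144 -> 256 (one per line)
    }
}
```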

JDK 1.8: converting linked lists to red-black trees

JDK 1.8 made several changes to HashMap. The biggest difference is the use of red-black trees, so the structure is now array + linked list + red-black tree.

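When JDK 1.7 looks up a key, the hash value quickly locates the array index, but from there the linked list has to be walked node by node until the key is found, so the cost depends on the length of the list: O(n). To reduce this cost, JDK 1.8 converts a linked list into a red-black tree once it holds more than 8 elements, which lowers the lookup time in those buckets to O(log n).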

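Red-black trees have lower time complexity than linked lists, so why not use red-black trees everywhere instead of combining linked lists with red-black trees?
Time complexity keeps only the highest-order term and drops the coefficients. When the amount of data is small, coefficients and constant terms have a large influence. For small amounts of data, searching a linked list is faster than searching a red-black tree, which is why the combined linked list + red-black tree structure is used.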

References

https://cyc2018./CS-Notes/#/notes/Java 容器
https://mp.weixin.qq.com/s/usLEfjU-PJ3RbrdmJ_bw3w
