轉(zhuǎn)載自微信公眾號(hào) 滌生的博客 ,。
以下是原文,。
最近有個(gè)同學(xué)說(shuō)他的服務(wù)剛啟動(dòng)就收到兩次 Full GC 告警, 按道理來(lái)說(shuō)剛啟動(dòng),,對(duì)象應(yīng)該不會(huì)太多,為啥會(huì)觸發(fā) Full GC 呢,?
帶著疑問(wèn),,還是先看看日志,畢竟日志的信息更多,。
2.日志
可以看到,,其實(shí)是兩次 CMS GC(監(jiān)控對(duì) Full GC 和 Old GC 不分)。但是你會(huì)發(fā)現(xiàn)一個(gè)奇怪的現(xiàn)象,,咦,,"CMS-initial-mark: 0K(3222528K)" 怎么 Old Gen 對(duì)象使用空間大小為 0,,細(xì)想服務(wù)剛啟動(dòng),Old Gen 為 0 也算正常,,但是為什么會(huì)觸發(fā)CMS GC 呢,?第一次 CMS GC 日志:
2019-07-16T16:44:56.270+0800: 8.446: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(3222528K)] 1477124K(5152832K), 0.0445445 secs] [Times: user=0.16 sys=0.00, real=0.05 secs] 2019-07-16T16:44:56.315+0800: 8.490: [CMS-concurrent-mark-start] 2019-07-16T16:44:56.337+0800: 8.513: [CMS-concurrent-mark: 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 2019-07-16T16:44:56.337+0800: 8.513: [CMS-concurrent-preclean-start] 2019-07-16T16:44:56.343+0800: 8.518: [CMS-concurrent-preclean: 0.006/0.006 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 2019-07-16T16:44:56.343+0800: 8.518: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 2019-07-16T16:45:01.469+0800: 13.644: [CMS-concurrent-abortable-preclean: 1.189/5.126 secs] [Times: user=4.76 sys=0.14, real=5.12 secs] 2019-07-16T16:45:01.469+0800: 13.645: [GC (CMS Final Remark) [YG occupancy: 389472 K (1930304 K)]13.645: [Rescan (parallel) , 0.0113744 secs]13.656: [weak refs processing, 0.0001506 secs]13.656: [class unloading, 0.0050100 secs]13.661: [scrub symbol table, 0.0027502 secs]13.664: [scrub string table, 0.0005121 secs][1 CMS-remark: 0K(3222528K)] 389472K(5152832K), 0.0210563 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 2019-07-16T16:45:01.490+0800: 13.666: [CMS-concurrent-sweep-start] CMS: Large Block: 0x00000006fb520000; Proximity: 0x0000000000000000 -> 0x00000006fb51fac8 CMS: Large block 0x00000006fb520000 2019-07-16T16:45:01.490+0800: 13.666: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2019-07-16T16:45:01.490+0800: 13.666: [CMS-concurrent-reset-start] 2019-07-16T16:45:01.504+0800: 13.680: [CMS-concurrent-reset: 0.014/0.014 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]
第二次 CMS GC 日志:
2019-07-16T16:45:37.496+0800: 49.672: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(3222528K)] 1451537K(5152832K), 0.0575060 secs] [Times: user=0.22 sys=0.01, real=0.06 secs] 2019-07-16T16:45:37.554+0800: 49.730: [CMS-concurrent-mark-start] 2019-07-16T16:45:37.576+0800: 49.751: [CMS-concurrent-mark: 0.022/0.022 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 2019-07-16T16:45:37.576+0800: 49.752: [CMS-concurrent-preclean-start] 2019-07-16T16:45:37.582+0800: 49.758: [CMS-concurrent-preclean: 0.007/0.007 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 2019-07-16T16:45:37.582+0800: 49.758: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 2019-07-16T16:45:42.639+0800: 54.814: [CMS-concurrent-abortable-preclean: 3.307/5.056 secs] [Times: user=4.63 sys=0.09, real=5.05 secs] 2019-07-16T16:45:42.639+0800: 54.815: [GC (CMS Final Remark) [YG occupancy: 1528656 K (1930304 K)]54.815: [Rescan (parallel) , 0.0710980 secs]54.886: [weak refs processing, 0.0003495 secs]54.887: [class unloading, 0.0158867 secs]54.903: [scrub symbol table, 0.0065509 secs]54.909: [scrub string table, 0.0008214 secs][1 CMS-remark: 0K(3222528K)] 1528656K(5152832K), 0.0980449 secs] [Times: user=0.30 sys=0.01, real=0.10 secs] 2019-07-16T16:45:42.738+0800: 54.913: [CMS-concurrent-sweep-start] CMS: Large Block: 0x00000006fb520000; Proximity: 0x00000006fb51fac8 -> 0x00000006fb51fac8 CMS: Large block 0x00000006fb520000 2019-07-16T16:45:42.738+0800: 54.913: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2019-07-16T16:45:42.738+0800: 54.913: [CMS-concurrent-reset-start] 2019-07-16T16:45:42.745+0800: 54.921: [CMS-concurrent-reset: 0.007/0.007 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
之前我整理過(guò)一些文章,其中提到很多種 CMS GC 觸發(fā)條件,,我們來(lái)一起來(lái)分析下吧,。
3.分析
首先,這必然是 CMS 的 background collector,。因?yàn)?foreground collector 觸發(fā)條件比較簡(jiǎn)單,,一般是遇到對(duì)象分配但空間不夠,就會(huì)直接觸發(fā) GC,。但是 Old Gen 占用為 0,,不可能申請(qǐng)時(shí)空間不足。因此,,可以斷定是 CMS 的 background collector,。另外, foreground collector 是沒(méi)有 Precleaning,、AbortablePreclean 階段的,,因此,從 CMS GC 日志上來(lái)看,,也能看出是 background collector,。
既然是 CMS background collector,我們結(jié)合 JVM 參數(shù)及日志,,再按照 background collector 的五大種情況一個(gè)個(gè)的排除唄,。
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xms5114m -Xmx5114m -XX:MaxNewSize=1967m -XX:NewSize=1967m -XX:SurvivorRatio=22 -XX:PermSize=384m -XX:MaxPermSize=384m -Xss512k -XX:OldPLABSize=16 -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=0 -XX:CMSInitiatingOccupancyFraction=75 -XX:InitialCodeCacheSize=128m -XX:+PrintClassHistogram -XX:PrintFLSStatistics=1 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:ReservedCodeCacheSize=128m -XX:+StartAttachListener -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+DisableExplicitGC
-
根據(jù)是否是并行 Full GC
這種一般是由 System.gc 觸發(fā)的 Full GC,且在配置了 ExplicitGCInvokesConcurrent 參數(shù)的情況下,,進(jìn)而將 Full GC 轉(zhuǎn)成并行的 CMS GC,。可以看到 JVM 配置并無(wú)此參數(shù),,而且還配置了 DisableExplicitGC 參數(shù),,它會(huì)屏蔽 System.gc。因此,,排除這類(lèi)情況,。
-
根據(jù)統(tǒng)計(jì)數(shù)據(jù)動(dòng)態(tài)計(jì)算(僅未配置 UseCMSInitiatingOccupancyOnly 時(shí))
直接排除吧,因?yàn)?JVM 配置了 UseCMSInitiatingOccupancyOnly 參數(shù),。
-
根據(jù) Old Gen 占用情況判斷
這種情況主要是 Old Gen 空間使用占比情況與閾值比較,,從 JVM 配置可以看到 CMSInitiatingOccupancyFraction=75 閾值設(shè)置的是 75%,目前 Old Gen 還是 0 呢,,顯然不符合,。
-
根據(jù)增量 GC 是否可能會(huì)失?。ū^(guān)策略)
這種情況主要說(shuō)的是在兩代的 GC 體系中,如果 Young GC 已經(jīng)失敗或者可能會(huì)失敗,,JVM 就認(rèn)為需要進(jìn)行一次 CMS GC,。我們看日志也知道,并無(wú)此類(lèi)情況發(fā)生,,而且 Old Gen 剩余空間還非常大,,比整個(gè) Young Gen 都大。
-
根據(jù) metaspace 情況判斷
這是最后一種情況了,,這里主要看 metaspace 的 shouldconcurrentcollect 標(biāo)志,,這個(gè)標(biāo)志在 metaspace 進(jìn)行擴(kuò)容前如果配置了 CMSClassUnloadingEnabled 參數(shù)時(shí),會(huì)進(jìn)行設(shè)置,。
咋一看,,JVM 參數(shù)貌似沒(méi)設(shè)置這個(gè)參數(shù)呀,難道跟 metaspace 擴(kuò)容無(wú)關(guān),。其實(shí)不然,,CMSClassUnloadingEnabled 默認(rèn)就是 true 呢。
// Check again if the space is available. Another thread // may have similarly failed a metadata allocation and induced // a GC that freed space for the allocation. if (!MetadataAllocationFailALot) { _result = _loader_data->metaspace_non_null()->allocate(_size, _mdtype); if (UseConcMarkSweepGC) { if (CMSClassUnloadingEnabled) { MetaspaceGC::set_should_concurrent_collect(true); // For CMS expand since the collection is going to be concurrent. _result = _loader_data->metaspace_non_null()->expand_and_allocate(_size, _mdtype);
其實(shí)日志中也是有蛛絲馬跡的,,只是不容易發(fā)現(xiàn),,以下是夾在 CMS GC 過(guò)程中的兩段 Young GC 日志,可以看到日志中,,metaspace 的 capacity 從 32762K 到 60333K,,這也說(shuō)明了,metaspace 在擴(kuò)容,。
Young GC 日志
第一次 Young GC 日志:
{Heap before GC invocations=0 (full 1): par new generation total 1930304K, used 1846400K [0x0000000680600000, 0x00000006fb500000, 0x00000006fb500000) eden space 1846400K, 100% used [0x0000000680600000, 0x00000006f1120000, 0x00000006f1120000) from space 83904K, 0% used [0x00000006f1120000, 0x00000006f1120000, 0x00000006f6310000) to space 83904K, 0% used [0x00000006f6310000, 0x00000006f6310000, 0x00000006fb500000) concurrent mark-sweep generation total 3222528K, used 0K [0x00000006fb500000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 32159K, capacity 32762K, committed 33172K, reserved 1079296K class space used 3979K, capacity 4274K, committed 4312K, reserved 1048576K 2019-07-16T16:45:00.420+0800: 12.596: [GC (Allocation Failure)[ParNew Desired survivor size 42958848 bytes, new threshold 15 (max 15) - age 1: 36618400 bytes, 36618400 total : 1846400K->35858K(1930304K), 0.0408140 secs] 1846400K->35858K(5152832K)[Times: user=0.13 sys=0.02, real=0.05 secs] Heap after GC invocations=1 (full 1): par new generation total 1930304K, used 35858K [0x0000000680600000, 0x00000006fb500000, 0x00000006fb500000) eden space 1846400K, 0% used [0x0000000680600000, 0x0000000680600000, 0x00000006f1120000) from space 83904K, 42% used [0x00000006f6310000, 0x00000006f8614ac0, 0x00000006fb500000) to space 83904K, 0% used [0x00000006f1120000, 0x00000006f1120000, 0x00000006f6310000) concurrent mark-sweep generation total 3222528K, used 0K [0x00000006fb500000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 32159K, capacity 32762K, committed 33172K, reserved 1079296K class space used 3979K, capacity 4274K, committed 4312K, reserved 1048576K
第二次 Young GC:
{Heap before GC invocations=1 (full 2): par new generation total 1930304K, used 1882258K [0x0000000680600000, 0x00000006fb500000, 0x00000006fb500000) eden space 1846400K, 100% used [0x0000000680600000, 0x00000006f1120000, 0x00000006f1120000) from space 83904K, 42% used [0x00000006f6310000, 0x00000006f8614ac0, 0x00000006fb500000) to space 83904K, 0% used [0x00000006f1120000, 0x00000006f1120000, 0x00000006f6310000) concurrent mark-sweep generation total 3222528K, used 0K [0x00000006fb500000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 58703K, capacity 60333K, committed 60588K, reserved 1101824K class space used 7085K, capacity 7578K, committed 7664K, reserved 1048576K 2019-07-16T16:45:53.020+0800: 65.196: [GC (Allocation Failure)[ParNew Desired survivor size 42958848 bytes, new threshold 1 (max 15) - age 1: 62204096 bytes, 62204096 total - age 2: 13370728 bytes, 75574824 total : 1882258K->79755K(1930304K), 0.0500188 secs] 1882258K->79755K(5152832K)[Times: user=0.14 sys=0.01, real=0.05 secs] Heap after GC invocations=2 (full 2): par new generation total 1930304K, used 79755K [0x0000000680600000, 0x00000006fb500000, 0x00000006fb500000) eden space 1846400K, 0% used [0x0000000680600000, 0x0000000680600000, 0x00000006f1120000) from space 83904K, 95% used [0x00000006f1120000, 0x00000006f5f02d90, 0x00000006f6310000) to space 83904K, 0% used [0x00000006f6310000, 0x00000006f6310000, 0x00000006fb500000) concurrent mark-sweep generation total 3222528K, used 0K [0x00000006fb500000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 58703K, capacity 60333K, committed 60588K, reserved 1101824K class space used 7085K, capacity 7578K, committed 7664K, reserved 1048576K
因此,,這是一次因 Metaspace 擴(kuò)容導(dǎo)致的 CMS GC。
4.解決
既然是 Metaspace 擴(kuò)容導(dǎo)致的,,我們應(yīng)該避免這種情況發(fā)生,。那怎么避免呢,?指定個(gè)大小吧,。
大家都知道 jdk8 Metaspace 替代了之前的 Perm Gen,Metaspace 的最大大小,,也就是 MaxMetaspaceSize 默認(rèn)基本是無(wú)窮大,,也就是它會(huì)充分利用操作系統(tǒng)能提供的最大大小。
但是初始大小是多大呢,?主要由 MetaspaceSize 參數(shù)控制,,默認(rèn) 20.8M 左右(x86 下開(kāi)啟 c2 模式),非常小,,它控制 metaspaceGC 發(fā)生的初始閾值,,也是最小閾值,。
關(guān)于初始大小,有興趣的可以計(jì)算下 (16*13/10)M = 20.8M,。???????
define_pd_global(uintx,MetaspaceSize, ScaleForWordSize(16*M)); #define ScaleForWordSize(x) align_size_down_((x) * 13 / 10, HeapWordSize) #define ScaleForWordSize(x) (x)
因此,,最終設(shè)置下這兩個(gè)參數(shù)大小,問(wèn)題就解決啦,。
-XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m
5.總結(jié)
在服務(wù)運(yùn)行過(guò)程中,,總會(huì)遇到奇奇怪怪得 GC 問(wèn)題,關(guān)鍵是理清 GC 的脈絡(luò),,做到成竹在胸,,自然總能找到蛛絲馬跡,從而定位并解決問(wèn)題,。另外,,規(guī)范化 JVM 參數(shù)配置是避免詭異 GC 問(wèn)題一個(gè)重要方法。
|