Optimizing Java Reading Notes (Part 2)



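A gripe to start with: Java really fooled me back then with all that talk of "automatic memory management". In the end you still have to learn all of it. I might as well have learned C++ directly.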

Chapter 6: Understanding Garbage Collection

Mark-Sweep

for each object in allocatedObjectList:
    clear the mark bit
    // all object sizes are multiples of 8 bytes, so a heap walk can advance in 8-byte steps

DFS starting from the GC roots:
    set the mark bit of each reached object

for each object in allocatedObjectList:
    if the mark bit is not set:
        remove it from allocatedObjectList
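The pseudocode above can be turned into a small runnable Java model. It simulates the three passes over a toy object graph; `ToyObject`, `collect` and the list arguments are hypothetical stand-ins, not how HotSpot represents objects. The DFS uses an explicit stack instead of recursion so a deep object graph cannot overflow the Java stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of mark-sweep; ToyObject, roots and allocated are hypothetical stand-ins.
class MarkSweep {
    static final class ToyObject {
        final String name;
        final List<ToyObject> refs = new ArrayList<>();
        boolean marked;

        ToyObject(String name) { this.name = name; }
    }

    // Runs one collection cycle and returns the number of objects freed.
    static int collect(List<ToyObject> allocated, List<ToyObject> roots) {
        // Pass 1: clear every mark bit.
        for (ToyObject o : allocated) {
            o.marked = false;
        }

        // Pass 2: depth-first search from the GC roots, marking reachable objects.
        Deque<ToyObject> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            ToyObject o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            for (ToyObject child : o.refs) {
                if (!child.marked) stack.push(child);
            }
        }

        // Pass 3: sweep, removing every object whose mark bit is still clear.
        int freed = 0;
        Iterator<ToyObject> it = allocated.iterator();
        while (it.hasNext()) {
            if (!it.next().marked) {
                it.remove();
                freed++;
            }
        }
        return freed;
    }
}
```

For example, with a root `a` that references `b`, plus an unreferenced `c`, one call to `collect` frees `c` and leaves `a` and `b` in the allocated list. Unreachable cycles are freed as well, since nothing ever marks them.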

[Figure: memory layout]

jmap -histo [pid]

 num     #instances         #bytes  class name
 ----------------------------------------------
   1:         20839       14983608  [B
   2:        118743       12370760  [C
   3:         14528        9385360  [I
   4:           282        6461584  [D
   5:        115231        3687392  java.util.HashMap$Node
   6:        102237        2453688  java.lang.String
   7:         68388        2188416  java.util.Hashtable$Entry
   8:          8708        1764328  [Ljava.util.HashMap$Node;
   9:         39047        1561880  jdk.nashorn.internal.runtime.CompiledFunction
  10:         23688        1516032  com.mysql.jdbc.ConnectionPropertiesImpl$BooleanConnectionProperty
  11:         24217        1356152  jdk.nashorn.internal.runtime.ScriptFunction
  12:         27344        1301896  [Ljava.lang.Object;
  13:         10040        1107896  java.lang.Class
  14:         44090        1058160  java.util.LinkedList$Node
  15:         29375         940000  java.util.LinkedList
  16:         25944         830208  jdk.nashorn.internal.runtime.FinalScriptFunctionData
  17:            20         655680  [Lscala.concurrent.forkjoin.ForkJoinTask;
  18:         19943         638176  java.util.concurrent.ConcurrentHashMap$Node
  19:           730         614744  [Ljava.util.Hashtable$Entry;
  20:         24022         578560  [Ljava.lang.Class;
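Dividing `#bytes` by `#instances` gives the average size of one instance. For fixed-size classes the result is exact and a multiple of 8, consistent with HotSpot's default 8-byte object alignment; for arrays such as `[B` it is only an average, because array lengths vary. The rows below are copied from the histogram; `HistoRows` and `bytesPerInstance` are illustrative names.

```java
// Average bytes per instance for a few rows of the jmap -histo output.
class HistoRows {
    // Returns the average size of one instance, in bytes.
    static double bytesPerInstance(long bytes, long instances) {
        return (double) bytes / instances;
    }

    public static void main(String[] args) {
        String[] names = {"java.util.HashMap$Node", "java.lang.String",
                          "java.util.LinkedList$Node", "java.util.LinkedList", "[B"};
        // {#instances, #bytes} copied from the histogram
        long[][] rows = {
            {115231, 3687392},
            {102237, 2453688},
            {44090, 1058160},
            {29375, 940000},
            {20839, 14983608},
        };
        for (int i = 0; i < rows.length; i++) {
            double avg = bytesPerInstance(rows[i][1], rows[i][0]);
            System.out.printf("%-28s %8.1f bytes/instance%n", names[i], avg);
        }
    }
}
```

The fixed-size classes come out at exactly 24 or 32 bytes, while `[B` (byte arrays of varying length) averages about 719 bytes.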

HotSpot Runtime

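Ordinary Object Pointer (OOP): the representation of a Java object inside the JVM. It begins with an object header two machine words long: the mark word carries instance-specific metadata (such as the hash code), and the klass word points to class-level metadata (stored in PermGen in Java 7 and earlier, in Metaspace from Java 8).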

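-XX:+UseCompressedOops compresses references and so shrinks the object header. It is enabled by default from Java 7 onward (on 64-bit JVMs with heaps smaller than about 32 GB).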
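A rough way to see what the header costs: on 64-bit HotSpot the mark word is 8 bytes, the klass word is 8 bytes (4 when compressed), and every object is padded to a multiple of 8. The sketch below is a simplified model under two assumptions: references are 4 bytes with compression and 8 bytes without, and field bytes are simply summed (real field layout can leave extra gaps). With compression on, it reproduces the 24-byte `java.lang.String` (JDK 8: one `char[]` reference plus an `int` hash) and the 32-byte `HashMap$Node` (an `int` hash plus three references) from the jmap histogram. `ShallowSize.estimate` is a hypothetical helper.

```java
// Simplified shallow-size model for a plain (non-array) object on 64-bit HotSpot.
// Assumption: field bytes are simply summed, with padding only at the end.
class ShallowSize {
    static final int MARK_WORD = 8;

    static long estimate(boolean compressed, int intFields, int refFields) {
        int klassWord = compressed ? 4 : 8;  // compressed klass pointer vs full pointer
        int refSize = compressed ? 4 : 8;    // compressed oop vs full 64-bit reference
        long raw = MARK_WORD + klassWord + 4L * intFields + (long) refSize * refFields;
        return (raw + 7) & ~7L;              // round up to the 8-byte object alignment
    }
}
```

So `estimate(true, 1, 1)` gives 24 and `estimate(true, 1, 3)` gives 32, while a bare `Object` (12 bytes of header) is padded to 16.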

KlassOops and Class Objects


The Oop inheritance hierarchy

oop (abstract base)
 |-instanceOop (instance objects)
 |-methodOop (representations of methods)
 |-arrayOop (array abstract base)
 |-symbolOop (internal symbol / string class)
 |-klassOop (klass Header) (Java 7 and before only)
 |-markOop

GC Roots

  • Stack frames
  • JNI
  • Registers
  • Code roots (from the JVM code cache)
  • Globals
  • Class metadata from loaded classes

GC In HotSpot

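Weak Generational Hypothesis: the vast majority of objects are very short-lived, and only a small fraction live much longer. HotSpot's heap design follows from this:

  • Each object's age is tracked (the number of GCs it has survived)
  • New objects are allocated in Eden; any that survive a collection are moved out of Eden into a survivor space
  • A separate memory area, the old generation, holds long-lived objects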
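The rules above can be simulated in a few lines: objects start in Eden, each young collection copies live objects from Eden and the "from" survivor space into the empty "to" space and increments their age, and objects whose age reaches a tenuring threshold are promoted to the old generation. The `Obj` class, its `alive` flag and the threshold are hypothetical; HotSpot adjusts the real threshold dynamically, up to `-XX:MaxTenuringThreshold`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of generational aging; all names and the threshold are illustrative.
class Generations {
    static final class Obj {
        int age;          // number of young collections survived
        boolean alive;    // stand-in for "still reachable"
        Obj(boolean alive) { this.alive = alive; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    List<Obj> toSurvivor = new ArrayList<>();
    final List<Obj> oldGen = new ArrayList<>();
    final int tenuringThreshold;

    Generations(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    void allocate(Obj o) { eden.add(o); }  // new objects always start in Eden

    // One young GC: evacuate live objects out of Eden and the "from" survivor space.
    void youngGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.alive) continue;             // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold) {
                oldGen.add(o);                  // old enough: promote (tenure)
            } else {
                toSurvivor.add(o);              // otherwise copy into the empty survivor space
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // Swap roles: the space just filled becomes "from" for the next collection.
        List<Obj> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }
}
```

With a threshold of 2, a live object survives one young GC in a survivor space and is promoted on the second; a dead object disappears at the first GC without ever being copied.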


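To speed up marking during young collections, HotSpot maintains a data structure called the card table, which records which old-generation objects may point to young-generation objects, so a young GC does not have to scan the whole old generation. Each entry in the table corresponds to a 512-byte region of the old generation: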

cards[*instanceOop >> 9] = 0;
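The line above is the card-marking write barrier: after a reference field of an old-generation object is updated, the card covering that object's address is set to 0 ("dirty"), and a young GC then only scans old-gen memory under dirty cards. A minimal sketch of the index arithmetic, assuming addresses are plain `long` offsets into the old generation and that clean cards hold -1; `CardTable` and its methods are illustrative.

```java
import java.util.Arrays;

// Sketch of a card table: one byte per 512-byte "card" of old-generation memory.
class CardTable {
    static final int CARD_SHIFT = 9;          // 2^9 = 512 bytes per card
    static final byte CLEAN = -1;
    static final byte DIRTY = 0;              // matches the "= 0" in the line above

    final byte[] cards;

    CardTable(long oldGenBytes) {
        cards = new byte[(int) ((oldGenBytes + 511) >>> CARD_SHIFT)];
        Arrays.fill(cards, CLEAN);
    }

    // Write barrier: called after storing a reference into the object at this address.
    void onReferenceStore(long objectAddress) {
        cards[(int) (objectAddress >>> CARD_SHIFT)] = DIRTY;
    }

    // A young GC only needs to scan old-gen memory covered by dirty cards.
    int countDirty() {
        int n = 0;
        for (byte c : cards) {
            if (c == DIRTY) n++;
        }
        return n;
    }
}
```

At 512 bytes per card, 1 GB of old generation needs 2 M card entries, which is the figure the Chapter 8 card-table benchmark is built on.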

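TLABs (thread-local allocation buffers): each thread allocates new objects in its own private buffer, so the common allocation path needs no synchronization with other threads.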
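Allocation inside a TLAB is essentially a pointer bump: the owning thread hands out space between `top` and `end` without locking, and only when the buffer is exhausted does it take a slower path (a new TLAB, or allocation in shared Eden). A minimal sketch with a hypothetical `Tlab` class that returns offsets rather than real memory:

```java
// Hypothetical bump-pointer allocator modelling one thread's TLAB.
// It returns offsets into the buffer; -1 means "TLAB exhausted, refill needed".
class Tlab {
    private final long start;
    private final long end;
    private long top;

    Tlab(long start, long sizeBytes) {
        this.start = start;
        this.end = start + sizeBytes;
        this.top = start;
    }

    long allocate(long sizeBytes) {
        long aligned = (sizeBytes + 7) & ~7L;   // keep objects 8-byte aligned
        if (top + aligned > end) {
            return -1;                          // slow path: get a new TLAB or allocate in Eden
        }
        long result = top;
        top += aligned;                         // fast path: no lock, no CAS
        return result;
    }

    long used() { return top - start; }
}
```

No synchronization is needed because, by construction, only the owning thread ever touches its TLAB.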


Parallel Collectors

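Up to and including Java 8, the default collectors are the parallel collectors, so both YGC and FGC are fully stop-the-world (STW). The parallel collectors are designed for throughput: once the application has been stopped, they use multiple GC threads to finish reclaiming memory as quickly as possible.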

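  • ParallelGC: the simplest young-generation collector
  • ParNew: differs only slightly from ParallelGC; it exists mainly to work together with CMS
  • ParallelOld: the parallel collector for the old generation (including PermGen)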

Young parallel collection: when an allocation in Eden fails, the JVM stops the application threads and runs a young collection.


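Old parallel collection: unlike the young generation, the old generation is itself the guarantee space that absorbs promotions from the young generation, and it is one contiguous region with no spare space to temporarily evacuate objects into. ParallelOld therefore uses a mark-compact algorithm instead of copying.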

Copying vs compacting

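The difference can be shown on a toy heap modelled as an array of slots, where `null` is free space or garbage (garbage is assumed to have been identified already) and a non-null string is a live object. Copying evacuates live objects into a second, empty space; compaction slides them toward the start of the same space. `CopyVsCompact` is an illustrative class, and real collectors must also update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;

// Toy heaps: each array element is one slot; null marks free space or garbage.
class CopyVsCompact {
    // Copying: live objects are evacuated into a fresh to-space; from-space is then discarded.
    static String[] copy(String[] fromSpace) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (String obj : fromSpace) {
            if (obj != null) toSpace[next++] = obj;
        }
        return toSpace;   // needs a second, equally large space (cf. the survivor spaces)
    }

    // Compacting: live objects slide down in place, so no extra space is needed.
    static void compactInPlace(String[] space) {
        int next = 0;
        for (int i = 0; i < space.length; i++) {
            if (space[i] != null) {
                space[next] = space[i];
                if (next != i) space[i] = null;
                next++;
            }
        }
    }

    public static void main(String[] args) {
        String[] heap = {"a", null, "b", null, null, "c"};
        System.out.println(Arrays.toString(copy(heap)));
        compactInPlace(heap);
        System.out.println(Arrays.toString(heap));
    }
}
```

Both lines print `[a, b, c, null, null, null]`; the difference is that copying needed a whole second array, which is exactly what the old generation does not have.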

Worked example: JVM memory allocation

Heap sizing

Heap area    Size
Overall      2 GB
Old Gen      1.5 GB
Young Gen    500 MB
Eden         400 MB
S1           50 MB
S2           50 MB

GC data

Metric             Value
Allocation rate    100 MB/s
YGC time           2 ms
FGC time           100 ms
Object lifetime    200 ms

With an allocation rate of 100 MB/s, the 400 MB Eden fills up in 4 s, so a YGC occurs every 4 s.
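The same arithmetic, plus one step the text leaves implicit: assuming allocation is steady and every object lives exactly its 200 ms lifetime, the data still live when a YGC starts is roughly allocation rate × lifetime, which is where the 20 MB copied into a survivor space in the table below comes from. The figures are those from the tables above; `YoungGcMath` and its methods are illustrative.

```java
// Back-of-the-envelope model of the example above (MB, seconds).
class YoungGcMath {
    static final double EDEN_MB = 400;
    static final double ALLOC_MB_PER_S = 100;
    static final double LIFETIME_S = 0.200;
    static final double YGC_PAUSE_S = 0.002;

    // Time for the allocation rate to fill Eden once.
    static double secondsBetweenYgcs() { return EDEN_MB / ALLOC_MB_PER_S; }

    // Data allocated during the last object lifetime is still live when the YGC starts.
    static double liveAtGcMb() { return ALLOC_MB_PER_S * LIFETIME_S; }

    // Start time of GC number n: n+1 Eden fills plus the n pauses that came before it.
    static double gcStart(int n) { return (n + 1) * secondsBetweenYgcs() + n * YGC_PAUSE_S; }

    public static void main(String[] args) {
        for (int n = 0; n < 3; n++) {
            System.out.printf("GC%d at %.3f s, ~%.0f MB copied to a survivor space%n",
                    n, gcStart(n), liveAtGcMb());
        }
    }
}
```

This prints start times of 4.000, 8.002 and 12.004 s, matching the table, and the ~20 MB of survivors fits comfortably in a 50 MB survivor space.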

GC     Time        What happens
GC0    4 s         20 MB Eden -> S1 (20 MB)
GC1    8.002 s     20 MB Eden -> S2 (20 MB)
GC2    12.004 s    20 MB Eden -> S1 (20 MB)
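
A model allocator that reproduces this kind of workload (mostly short-lived objects, with a small chance of an object living longer):

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class ModelAllocator implements Runnable {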
    private volatile boolean shutdown = false;

    private double chanceOfLongLived = 0.02;
    private int multiplierForLongLived = 20;
    private int x = 1024;
    private int y = 1024;
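    // Allocations per second. Note: each int[1024][1024] is about 4 MB (4-byte ints),
    // so the actual allocation rate is roughly 4 * mbPerSec MB/s.
    private int mbPerSec = 50;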
    private int shortLivedMs = 100;
    private int nThreads = 8;
    private Executor exec = Executors.newFixedThreadPool(nThreads);

    public void run() {
        final int mainSleep = (int) (1000.0 / mbPerSec);

        while (!shutdown) {
            for (int i = 0; i < mbPerSec; i++) {
                ModelObjectAllocation to = new ModelObjectAllocation(x, y, lifetime());
                exec.execute(to);
                try {
                    Thread.sleep(mainSleep);
                } catch (InterruptedException ex) {
                    shutdown = true;
                }
            }
        }
    }

    // Simple function to model Weak Generational Hypothesis
    // Returns the expected lifetime of an object - usually this
    // is very short, but there is a small chance of an object
    // being "long-lived"
    public int lifetime() {
        if (Math.random() < chanceOfLongLived) {
            return multiplierForLongLived * shortLivedMs;
        }

        return shortLivedMs;
    }

    static class ModelObjectAllocation implements Runnable {
        private final int[][] allocated;
        private final int lifeTime;

        public ModelObjectAllocation(final int x, final int y, final int liveFor) {
            allocated = new int[x][y];
            lifeTime = liveFor;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(lifeTime);
                System.err.println(System.currentTimeMillis() +": "+ allocated.length);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();   // preserve the interrupt status
            }
        }
    }
}

Chapter 7: Advanced Garbage Collection

Criteria for choosing a GC

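  • Pause time
  • Throughput (share of total run time spent in the application rather than in GC)
  • Pause frequency
  • Reclamation efficiency (how much memory one GC cycle can free)
  • Pause consistency (whether pauses are all of similar length)

Big data applications should care more about throughput than about pause time. For some batch jobs even a 10 s pause does not matter; what counts is how efficiently the GC algorithm uses the CPU and how much throughput it delivers.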
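Throughput is commonly expressed as the share of wall-clock time spent in application code, i.e. 1 - GC time / total time. Plugging in the numbers from the Chapter 6 example (a 2 ms YGC every 4 s of running, full GCs ignored) gives a sense of scale; `Throughput.throughput` is an illustrative helper.

```java
class Throughput {
    // Fraction of time the application runs, given one GC pause per interval of running.
    static double throughput(double pauseMs, double intervalMs) {
        return intervalMs / (intervalMs + pauseMs);
    }

    public static void main(String[] args) {
        double t = throughput(2, 4000);
        System.out.printf("throughput = %.4f%%, GC overhead = %.4f%%%n", t * 100, (1 - t) * 100);
    }
}
```

With 4000 ms of running per 2 ms pause, throughput is 4000/4002, about 99.95%, so young-GC overhead is roughly 0.05% in that example.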

Concurrent GC theory

Safepoint: a point at which an application thread is paused so that the JVM can start GC work

  • The JVM cannot force a thread into a safepoint
  • The JVM can prevent a thread from leaving a safepoint

How threads reach a safepoint

  • The JVM sets a global "time to safepoint" flag
  • Application threads poll this flag
  • Each thread pauses at its next poll and waits to be woken again
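The cooperative protocol can be imitated in plain Java: a volatile field plays the role of the "time to safepoint" flag, worker threads poll it between units of work, and a coordinator waits until every worker has parked before running its stop-the-world operation. This is only an analogy built from `java.util.concurrent` and monitor primitives (`SafepointDemo`, `poll` and `stopTheWorld` are hypothetical names); HotSpot itself uses polls compiled into the generated code. It also shows why the JVM cannot force a thread to a safepoint: a thread that never polls is never counted.

```java
import java.util.concurrent.CountDownLatch;

// Analogy of safepoint polling; class and method names are illustrative.
class SafepointDemo {
    private static volatile boolean timeToSafepoint = false;   // the global flag
    private static final Object lock = new Object();
    private static CountDownLatch parked;                      // counts threads that reached the safepoint

    // Called by application threads between units of work (a "safepoint poll").
    static void poll() throws InterruptedException {
        if (!timeToSafepoint) return;                          // fast path: flag not set
        synchronized (lock) {
            parked.countDown();                                // report arrival at the safepoint
            while (timeToSafepoint) lock.wait();               // stay parked until released
        }
    }

    // Coordinator: raise the flag, wait for every worker, do the STW work, release them.
    static void stopTheWorld(int workers, Runnable vmOperation) throws InterruptedException {
        parked = new CountDownLatch(workers);                  // published by the volatile write below
        timeToSafepoint = true;
        parked.await();
        vmOperation.run();                                     // e.g. a GC, while no worker is running
        synchronized (lock) {
            timeToSafepoint = false;
            lock.notifyAll();
        }
    }
}
```

While `vmOperation` runs, every worker is blocked inside `poll`, so no application work makes progress; once the flag is cleared, they all resume.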

When is a thread at a safepoint?

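  • A thread is automatically at a safepoint if it is blocked on a monitor (lock)
  • A thread is automatically at a safepoint if it is executing JNI code
  • A thread is not necessarily at a safepoint if it has been interrupted by the OS
  • A thread is not necessarily at a safepoint if it is partway through executing a bytecode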

Tri-color marking

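  • GC roots are colored gray
  • All other objects are colored white
  • A marking thread takes a gray node and colors its white children gray
  • Once a gray node has no white children left, it is colored black
  • Repeat until no gray nodes remain
  • All black objects are reachable; all remaining white objects are collected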
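  • The problem: a marking thread may already have colored an object black when a mutator thread (an application thread, so called because it mutates the object graph) stores into it a reference to a white object. Mutator threads can therefore invalidate the results of concurrent marking.
  • Hence the invariant: during concurrent marking, no black object may hold a reference to a white object.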
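A single-threaded sketch of the coloring steps above on a toy graph, where nodes are numbered 0..n-1 and `edges.get(i)` lists the nodes that node i references (both hypothetical). A real concurrent collector runs this loop while mutator threads keep changing the edges, which is exactly what the black-to-white invariant guards against.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Tri-color marking over nodes 0..n-1; edges.get(i) lists the nodes that i references.
class TriColor {
    enum Color { WHITE, GRAY, BLACK }

    static Color[] mark(List<List<Integer>> edges, int[] roots) {
        Color[] color = new Color[edges.size()];
        Arrays.fill(color, Color.WHITE);               // everything starts white
        Deque<Integer> gray = new ArrayDeque<>();
        for (int r : roots) {
            if (color[r] == Color.WHITE) {
                color[r] = Color.GRAY;                 // roots are gray
                gray.push(r);
            }
        }
        while (!gray.isEmpty()) {                      // repeat until no gray nodes remain
            int node = gray.pop();
            for (int child : edges.get(node)) {
                if (color[child] == Color.WHITE) {
                    color[child] = Color.GRAY;         // shade white children gray first
                    gray.push(child);
                }
            }
            color[node] = Color.BLACK;                 // no white children left: node becomes black
        }
        return color;                                  // nodes still WHITE are garbage
    }
}
```

For a chain 0 -> 1 -> 2 plus a node 3 that points at 0 but is not itself referenced, marking from root 0 leaves nodes 0 to 2 black and node 3 white.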

CMS

Phases

  • Initial Mark (STW)
  • Concurrent Mark
  • Concurrent Preclean
  • Remark (STW)
  • Concurrent Sweep
  • Concurrent Reset

CMF (concurrent mode failure)


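  • If the old generation already holds too many objects and too much is being promoted from the young generation (faster than CMS can free space),
  • the JVM falls back to ParallelOld, which means a fully STW collection.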


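  • CMS starts collecting when the old generation is 75% full (by default)
  • CMS does not compact the old generation when it collects, so free space becomes fragmented
  • If the old generation has no contiguous free block large enough for a promotion, the JVM also falls back to ParallelOld
  • CMS is enabled with -XX:+UseConcMarkSweepGC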
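Why a non-compacting old generation can fail even when plenty of memory is free: CMS keeps free space in free lists, and a promoted object needs one chunk that is large enough for it. The sketch below is a hypothetical first-fit free list with sizes in KB (`FreeListOldGen` is an illustrative class, far simpler than CMS's real allocator); it shows a case where total free space is ample but no single chunk fits, which is when the fallback to ParallelOld described above kicks in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical first-fit free list for a non-compacting old generation (sizes in KB).
class FreeListOldGen {
    private final List<Integer> freeChunks = new ArrayList<>();

    FreeListOldGen(int... chunks) {
        for (int c : chunks) freeChunks.add(c);
    }

    int totalFree() {
        int sum = 0;
        for (int c : freeChunks) sum += c;
        return sum;
    }

    // Try to place a promoted object; false means no single chunk is large enough.
    boolean promote(int sizeKb) {
        for (int i = 0; i < freeChunks.size(); i++) {
            int chunk = freeChunks.get(i);
            if (chunk >= sizeKb) {
                freeChunks.set(i, chunk - sizeKb);   // split the chunk, keep the remainder
                return true;
            }
        }
        return false;   // fragmentation: a compacting, fully STW collection would be needed
    }
}
```

With four free 64 KB chunks there are 256 KB free in total, yet a 100 KB promotion fails, while a 48 KB one succeeds.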

Chapter 8: GC Logging, Monitoring, Tuning, and Tools

Introduction to GC logs

-Xloggc:gc.log -XX:+PrintGCDetails 
-XX:+PrintTenuringDistribution
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
Flag                              Effect
-Xloggc:gc.log                    Controls which file GC events are logged to
-XX:+PrintGCDetails               Logs GC event details
-XX:+PrintGCDateStamps            Prints the wall-clock time at which GC events occurred
-XX:+PrintGCTimeStamps            Prints the time (in seconds since VM start) at which GC events occurred
-XX:+PrintTenuringDistribution    Adds extra GC event detail that is vital for tooling
-XX:+UseGCLogFileRotation         Switches on log file rotation
-XX:NumberOfGCLogFiles=<n>        Sets the maximum number of log files to keep
-XX:GCLogFileSize=<size>          Sets the maximum size of each file before rotation

Log analysis tools

  • Censum
  • GCViewer

Basic tuning

Table 8-3. GC heap sizing flags

Flag                              Effect
-Xms<size>                        Set the minimum size reserved for the heap
-Xmx<size>                        Set the maximum size reserved for the heap
-XX:MaxPermSize=<size>            Set the maximum size permitted for PermGen (Java 7)
-XX:MaxMetaspaceSize=<size>       Set the maximum size permitted for Metaspace (Java 8)
-XX:PretenureSizeThreshold=<n>    Objects larger than this size are allocated directly in the old generation
-XX:MinTLABSize=<n>               Minimum TLAB size

GC benchmark: scanning a card table (JMH)

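import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)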
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(1)
public class SimulateCardTable {

    // OldGen is 3/4 of heap, 2M of card table is required for 1G of old gen
    private static final int SIZE_FOR_20_GIG_HEAP = 15 * 2 * 1024 * 1024;

    private static final byte[] cards = new byte[SIZE_FOR_20_GIG_HEAP];

    @Setup
    public static final void setup() {
        final Random r = new Random(System.nanoTime());
        for (int i=0; i<100_000; i++) {
            cards[r.nextInt(SIZE_FOR_20_GIG_HEAP)] = 1;
        }
    }


    @Benchmark
    public int scanCardTable() {
        int found = 0;
        for (int i=0; i<SIZE_FOR_20_GIG_HEAP; i++) {
            if (cards[i] > 0)
                found++;
        }
        return found;
    }

}

/*
Result "scanCardTable":
  108.904 ±(99.9%) 16.147 ops/s [Average]
  (min, avg, max) = (102.915, 108.904, 114.266), stdev = 4.193
  CI (99.9%): [92.757, 125.051] (assumes normal distribution)


# Run complete. Total time: 00:01:46

Benchmark                         Mode  Cnt    Score    Error  Units
SimulateCardTable.scanCardTable  thrpt    5  108.904 ± 16.147  ops/s
*/
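Reading the result: each benchmark operation scans the whole simulated card table once, so the average time per scan is the reciprocal of the score. The arithmetic below uses only the numbers above (15 GB of old generation at 2 M cards per GB, and the 108.904 ops/s score); `CardScanCost` is an illustrative class.

```java
class CardScanCost {
    static final long CARDS = 15L * 2 * 1024 * 1024;   // same as SIZE_FOR_20_GIG_HEAP
    static final double OPS_PER_SEC = 108.904;          // JMH score above

    static double msPerScan() { return 1000.0 / OPS_PER_SEC; }

    static double nsPerCard() { return msPerScan() * 1_000_000 / CARDS; }

    public static void main(String[] args) {
        System.out.printf("cards scanned per op: %d%n", CARDS);
        System.out.printf("time per full scan:   %.2f ms%n", msPerScan());
        System.out.printf("time per card:        %.3f ns%n", nsPerCard());
    }
}
```

That comes to about 31.5 M cards and roughly 9.2 ms per full scan of the card table for a 20 GB heap, on the machine that produced these numbers.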

Tuning Parallel GC

Flag                      Effect
-XX:NewRatio=N            (Old flag) Set the ratio of OldGen to YoungGen
-XX:SurvivorRatio=N       (Old flag) Set the ratio of Eden to each survivor space
-XX:NewSize=N             (Old flag) Set the minimum size of YoungGen
-XX:MaxNewSize=N          (Old flag) Set the maximum size of YoungGen
-XX:MinHeapFreeRatio      (Old flag) Set the minimum % of heap free after GC to avoid expanding
-XX:MaxHeapFreeRatio      (Old flag) Set the maximum % of heap free after GC to avoid shrinking
Flags set:

-XX:NewRatio=N
-XX:SurvivorRatio=K

YoungGen = 1 / (N+1) of heap
OldGen = N / (N+1) of heap

Eden = K / (K+2) of YoungGen
Survivor1 = 1 / (K+2) of YoungGen
Survivor2 = 1 / (K+2) of YoungGen
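A worked version of the sizing rules, following HotSpot's documented definitions: `-XX:NewRatio=N` is the ratio of old to young generation, and `-XX:SurvivorRatio=K` is the ratio of Eden to one survivor space, so the young generation consists of K + 2 equal parts. Sizes are in MB and the method names are illustrative. With N = 3 and K = 8 this gives layouts close to the Chapter 6 example.

```java
// Generation sizes from -XX:NewRatio=N and -XX:SurvivorRatio=K (sizes in MB).
class GenSizing {
    static double youngGen(double heap, int n)  { return heap / (n + 1); }
    static double oldGen(double heap, int n)    { return heap * n / (n + 1); }

    // SurvivorRatio = Eden : one survivor space, so young = K + 1 + 1 parts.
    static double eden(double young, int k)     { return young * k / (k + 2); }
    static double survivor(double young, int k) { return young / (k + 2); }

    public static void main(String[] args) {
        double heap = 2048;
        System.out.printf("young=%.0f old=%.0f%n", youngGen(heap, 3), oldGen(heap, 3));
        double young = 500;
        System.out.printf("eden=%.0f s1=%.0f s2=%.0f%n",
                eden(young, 8), survivor(young, 8), survivor(young, 8));
    }
}
```

A 2048 MB heap with N = 3 splits into 512 MB young and 1536 MB old, and a 500 MB young generation with K = 8 splits into 400 MB Eden plus two 50 MB survivor spaces.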

Chapter 9: Code Execution on the JVM

......

Chapter 10: Understanding JIT Compilation

JITWatch

https://github.com/AdoptOpenJDK/jitwatch/

-XX:+UnlockDiagnosticVMOptions 
-XX:+TraceClassLoading 
-XX:+LogCompilation

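hsdis

hsdis is the HotSpot disassembler plugin; with it installed, JIT-compiled native code can be printed with:

-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly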

Inlining

Switch                  Default (JDK 8, Linux x86_64)                                   Explanation
-XX:MaxInlineSize=n     35 bytes of bytecode                                            Inline methods up to this size
-XX:FreqInlineSize=n    325 bytes of bytecode                                           Inline "hot" (frequently called) methods up to this size
-XX:InlineSmallCode=n   1000 bytes of native code (non-tiered), 2000 bytes (tiered)    Do not inline methods that already have a final-tier compilation occupying more than this much space in the code cache
-XX:MaxInlineLevel=n    9                                                               Maximum number of call frames to inline
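A deliberately simplified model of how the four limits in this table interact, using the JDK 8 defaults (tiered value for InlineSmallCode). HotSpot's real inlining policy has many more rules, such as call-site profiling and intrinsics; `InlineModel.shouldInline` and its parameters are hypothetical and only encode the thresholds listed above.

```java
// Simplified inlining decision using the four thresholds from the table (JDK 8 defaults).
class InlineModel {
    static final int MAX_INLINE_SIZE = 35;        // bytes of bytecode, any call site
    static final int FREQ_INLINE_SIZE = 325;      // bytes of bytecode, hot call sites
    static final int INLINE_SMALL_CODE = 2000;    // bytes of native code (tiered)
    static final int MAX_INLINE_LEVEL = 9;        // maximum nesting depth

    /**
     * @param bytecodeSize     size of the callee's bytecode
     * @param hot              whether the call site is frequently executed
     * @param compiledCodeSize size of an existing final-tier compilation of the callee, or -1 if none
     * @param depth            current inlining depth
     */
    static boolean shouldInline(int bytecodeSize, boolean hot, int compiledCodeSize, int depth) {
        if (depth >= MAX_INLINE_LEVEL) return false;              // already too deep
        if (compiledCodeSize > INLINE_SMALL_CODE) return false;   // already big as native code
        int limit = hot ? FREQ_INLINE_SIZE : MAX_INLINE_SIZE;
        return bytecodeSize <= limit;
    }
}
```

In this model a 100-byte method is inlined only at a hot call site, and never if an existing compiled version of it is larger than 2000 bytes.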
