My Monday at JavaOne 2013

Posted by haroldcarr on September 24, 2013 at 11:51 AM PDT


Last Modified : 2013 Sep 24 (Tue) 11:59:38 by carr.

Monday September 23, 2013

10:00am - Looking into the JVM Crystal Ball

  • Mikael Vidstedt, JVM Architect, Oracle


JVM Convergence

  • HotSpot + JRockit + CDC (embedded)
  • merge around HotSpot
  • from JRockit
    • serviceability : Java Flight Recorder
  • from embedded
    • scalability
  • goal: same JVM from small devices to large hardware

In the past year, most time was spent on security.

Java Flight Recorder - JDK 7u40 (from JRockit)

  • Event-based tracer and profiler
    • vm internal info
    • jdk level events (e.g., I/O)
  • cyclic buffer in memory or optional store-to-disk
  • overhead: ~2-3%

Java Mission Control - JDK 7u40

  • $JAVA_HOME/bin/jmc
  • monitoring/mgmt
  • java heap/GC, hot methods, …
  • visualization of Java Flight Recorder data

Misc - JDK7u40

  • rewrite of invokedynamic impl
    • from assembly language to Java impl
    • still needs work
  • G1 (garbage first) GC improvements
    • tune default settings (based on feedback JDK-8001425)
    • humongous allocations prevent mixed GCs (JDK-8020155)
  • too many GC impls
    • plan to converge around G1
  • String.intern() table performance
    • bumped to 60013 entries on 64 bit systems (was: 1009 entries)
    • goal is to make size dynamic
  • 300+ bug fixes in HotSpot


Removing Permanent Generation (JEP 122)


| Java Heap | PermGen | Native Memory |
  • PermGen
    • where class metadata stored
    • set at startup - not dynamic
  • PermGen contents moved to Java Heap and Native Memory
    • class metadata now lives in native memory
    • perm gen setting now ignored

Tiered Compilation

  • compiler convergence
    • interpreter
    • client compiler (C1)
      • faster startup
    • server compiler (C2)
      • top performance (but compile cost)
  • tiered
    • collect profiling info using C1
      • used to happen in interpreter
    • then use info in C2
    • leads to faster startup

Memory Footprint

  • optimizations of common data types
    • no unused bits
  • class, method structures
    • Class ~30-40 bytes
    • Method ~ 32 bytes
  • Example: 10k classes, 100k methods -> ~3.5MB/process
    • critical for embedded
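
The example figure can be checked with a little arithmetic. This is a minimal sketch, assuming the per-class and per-method numbers above are bytes per structure and taking 35 bytes as the midpoint of the 30-40 range; the class name FootprintEstimate is hypothetical.

```java
class FootprintEstimate {
    // Assumed midpoint of the "~30-40 bytes" per-class figure from the notes.
    static final long BYTES_PER_CLASS = 35;
    // Per-method figure from the notes.
    static final long BYTES_PER_METHOD = 32;

    // Total bytes for the given number of classes and methods.
    static long totalBytes(long classes, long methods) {
        return classes * BYTES_PER_CLASS + methods * BYTES_PER_METHOD;
    }

    public static void main(String[] args) {
        // 10k classes * 35 + 100k methods * 32 = 350,000 + 3,200,000
        System.out.println(totalBytes(10_000, 100_000) + " bytes"); // 3550000 bytes, ~3.5MB
    }
}
```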

JVM Future



  • e.g.,: thousands of JVMs running (almost) the same app on same
  • manage resources carefully
    • memory/cpu varies significantly
    • virtualization adds to this
  • adapt to resource changes between JVMs
    • maximum density
  • maintain high availability and isolation

Manageability and Observability

  • ergonomics
    • good enough default settings for majority of workloads
  • provide visibility into Java processes
    • low-level data + high-level aggregation


  • JSR292 / invokedynamic
    • performance*
    • lambdas rely heavily on it
    • nashorn - javascript on JVM (JDK 8)
      • instead of rhino
  • improved Java <-> native
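
For readers who have not touched JSR 292 directly: the java.lang.invoke API that sits underneath invokedynamic can be used from plain Java. The following is only an illustrative sketch (the class and method names MethodHandleDemo and lengthViaHandle are made up here), not something shown in the talk.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleDemo {
    // Looks up String.length() as a method handle and invokes it on s.
    static int lengthViaHandle(String s) {
        try {
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the call site to match (String)int exactly.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            // Lookup and invocation declare checked exceptions; wrap them for brevity.
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("JavaOne")); // 7
    }
}
```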


  • multi-core + data parallelism
    • lambdas + fork/join -> .parallelStream()
    • synchronization/locks
  • memory
    • huge heaps (40GB++)
    • NUMA
  • footprint + embedded
    • streamline Java for small devices
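
As a rough illustration of the "lambdas + fork/join" point: in Java SE 8, calling parallelStream() on a collection runs the pipeline on the common fork/join pool. The sketch below is my own example (ParallelSumDemo and sumOfSquares are hypothetical names), not code from the session.

```java
import java.util.Arrays;
import java.util.List;

class ParallelSumDemo {
    // Sums the squares of the inputs; parallelStream() lets the runtime
    // split the work across the common fork/join pool.
    static long sumOfSquares(List<Integer> values) {
        return values.parallelStream()
                     .mapToLong(v -> v.longValue() * v.longValue())
                     .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4))); // 30
    }
}
```

For a list this small the parallel version is unlikely to be faster than a sequential stream; the point is only the shape of the API.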

JVM Components


  • Sumatra : Java on GPUs
  • code memory management
  • compiler manageability and observability
    • why/control over compiler decisions
  • compile time and warm-up time
    • AOT?


  • G1 (-XX:+UseG1GC)
    • works by dividing heap into many small regions
    • regions individually GCed
  • focus: big heaps, low/consistent pause times
    • without needing excessive tuning
  • settings
    • regions selected, sizes of generations, number of GC threads
  • goal: deprecate/remove CMS
  • feedback:


  • modularization/jigsaw
  • dynamic resizing of string and symbol tables
  • class data sharing (CDS)
  • contended locking improvements


  • Java Flight Recorder
    • additional events, enable dynamically
    • event sampling
    • auto analysis in Java Mission Control
  • jcmd continued
    • goal: deprecate other j* serviceability commands over time
      • jstack, jinfo, jmap, …
  • JMX
    • annotations for defining MBeans, REST protocol, batched operations
  • deprecate/remove
    • JConsole : move functionality to Java Mission Control/VisualVM
    • hprof agent
      • what parts are people using - replace its functionality elsewhere


  • improved testability
    • unit testing of JVM internals
  • clean up HotSpot OS code and Makefiles
    • unify copy/pasted logic


11:30am - Purely Functional Data Structures

  • Dan Rosen, Twitter



  • unstable identity (as viewed by containing objects)
    • if internal data changes and is used by hashCode then identity changes
  • difficult to satisfy superclass behavior contracts in subclasses
  • 3: prevent container contents from changing
    • Collections.unmodifiableSet
    • sync + copy (yuck)
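
The unstable identity problem is easy to reproduce. A minimal sketch, using a made-up MutablePoint class whose hashCode depends on a mutable field:

```java
import java.util.HashSet;
import java.util.Set;

class MutablePoint {
    int x; // mutable state that feeds hashCode

    MutablePoint(int x) { this.x = x; }

    @Override public boolean equals(Object o) {
        return o instanceof MutablePoint && ((MutablePoint) o).x == x;
    }

    @Override public int hashCode() { return x; }
}

class UnstableIdentityDemo {
    // Adds a point to a HashSet, mutates it, then asks the set for it again.
    static boolean foundAfterMutation() {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1);
        set.add(p);
        p.x = 2;                // hash code changes while p sits in the set
        return set.contains(p); // the set now looks in the wrong bucket
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```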

Solution for 3: Persistent data structures

  • final fields in objects
  • setters return new object with copies of fields
  • common substructure between copies
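
In Java the three bullets above might look like the following sketch. The Account class and its withBalance method are hypothetical names chosen for illustration.

```java
final class Account {
    private final String owner;  // final fields: never change after construction
    private final long balance;

    Account(String owner, long balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String owner() { return owner; }
    long balance() { return balance; }

    // "Setter" that leaves this object untouched and returns a modified copy.
    // The owner String is shared between the old and the new object.
    Account withBalance(long newBalance) {
        return new Account(owner, newBalance);
    }

    public static void main(String[] args) {
        Account a = new Account("alice", 10);
        Account b = a.withBalance(20);
        System.out.println(a.balance() + " " + b.balance()); // 10 20
    }
}
```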

Example: Stack

  • implemented as singly-linked list
  • shares all substructure
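
A minimal sketch of such a stack, assuming a hypothetical PStack class (not the speaker's code). Pushing allocates one node that points at the existing stack, so every version shares its tail with the version it came from.

```java
final class PStack<T> {
    private final T head;          // unused in the empty stack
    private final PStack<T> tail;  // the rest of the stack, shared between versions
    private final int size;

    private PStack(T head, PStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <T> PStack<T> empty() { return new PStack<>(null, null, 0); }

    boolean isEmpty() { return size == 0; }
    int size() { return size; }

    // O(1): the new stack reuses this stack as its tail.
    PStack<T> push(T value) { return new PStack<>(value, this, size + 1); }

    T peek() {
        if (isEmpty()) throw new IllegalStateException("empty stack");
        return head;
    }

    // O(1): popping simply returns the shared tail.
    PStack<T> pop() {
        if (isEmpty()) throw new IllegalStateException("empty stack");
        return tail;
    }
}
```

Two stacks built from the same base, e.g. base.push("b") and base.push("c"), both return the very same base object from pop().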

Example: Queue

  • implemented as doubly-linked list
  • bad: no common substructure -> deep copy
  • solution: use two stacks: incoming/outgoing
    • analysis
      • enqueue: O(1)
      • dequeue: O(1), or O(n) if outgoing stack is empty (rotation)
    • amortized analysis
      • assume each enqueued element will eventually be rotated
      • pay for cost of rotation for each element when enqueuing
      • enqueue: O(1) with 1 credit
      • dequeue: O(1), debiting when needed to rotate
      • but not suitable for real-time apps
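
The two-stack queue can be sketched as follows. PQueue and its nested Node class are hypothetical names, and this is only one straightforward way to write it. The rotation reverses the incoming stack into the outgoing stack when the latter is empty.

```java
import java.util.NoSuchElementException;

final class PQueue<T> {
    // Immutable singly-linked stack node; null stands for the empty stack.
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final Node<T> incoming; // newest element on top
    private final Node<T> outgoing; // oldest element on top

    private PQueue(Node<T> incoming, Node<T> outgoing) {
        this.incoming = incoming;
        this.outgoing = outgoing;
    }

    static <T> PQueue<T> empty() { return new PQueue<>(null, null); }

    boolean isEmpty() { return incoming == null && outgoing == null; }

    // O(1): push onto the incoming stack.
    PQueue<T> enqueue(T value) {
        return new PQueue<>(new Node<>(value, incoming), outgoing);
    }

    // O(n) rotation, only when the outgoing stack is empty.
    private PQueue<T> rotated() {
        if (outgoing != null) return this;
        Node<T> out = null;
        for (Node<T> n = incoming; n != null; n = n.next) out = new Node<>(n.value, out);
        return new PQueue<>(null, out);
    }

    T peek() {
        PQueue<T> q = rotated();
        if (q.outgoing == null) throw new NoSuchElementException("empty queue");
        return q.outgoing.value;
    }

    PQueue<T> dequeue() {
        PQueue<T> q = rotated();
        if (q.outgoing == null) throw new NoSuchElementException("empty queue");
        return new PQueue<>(q.incoming, q.outgoing.next);
    }
}
```

Note that the amortized O(1) argument assumes each version of the queue is dequeued once; dequeuing repeatedly from the same old version repeats the rotation each time.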

New invariant: stack can be at most 3 elements long

  • rotation now constant time
  • longer stacks then nested

1pm - Performance Tuning Where Java Meets the Hardware

  • Darryl Gove - Senior Principal Engineer, Oracle
  • Charlie Hunt - Architect Performance Engineering,

Compiling -> File.class             -> JVM (optimization)
File.c    -> File.o (optimization)  -> File.exe

Optimization at runtime

  • use runtime info
  • finer-grained hardware info
  • optimize used code
  • super optimize important code paths

JIT compilation

  • sometimes see time move from expected location to different location
    • because of inlining

Watching for GC

  • after Interpreter/JIT warmup then GC is majority of system time
  • -XX:+PrintGC -XX:+PrintGCDetails -Xloggc:gc.log
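
The flags above write a log file. As a complement (my addition, not from the talk), the same collection counts and times can be read from inside the process through the standard java.lang.management API. GcWatch is a hypothetical class name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcWatch {
    // Total accumulated collection time in milliseconds across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total: " + totalGcMillis() + " ms");
    }
}
```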

What hardware can tell you about software

Hardware performance counters

  • events
    • instructions executed
    • cycles taken
    • loads from memory

instructions per cycle (IPC)

  • typical measure of performance
  • seemingly : low IPC/bad, high IPC/good
  • but high might mean doing something unnecessary


  • cputrack -ef -c instr_retired.any,cpu_clk_unhalted.core java shapelist

cause of low IPC

  • long latency instructions (e.g., divide)
  • fetch data from memory
  • "bad things" happening to processor

collect -h PAPI_l2_tcm -p on -j on shapelist

pointers and memory

  • pointers are virtual address
  • processor uses TLB to get physical address
  • hardware uses physical address to access memory
  • memory returns value back to processor

memory and caches

  • cpu checks cache for data
  • if not in cache, cpu fetches from memory
  • memory returns value
    • fetching memory is costly

object layout in memory

  • every link in a data structure is a potential cache miss
  • avoid pointers - colocate data
  • threads, cache lines and data
  • to update memory, thread needs exclusive access to data
  • data is shared at cache line boundaries
    • two threads cannot simultaneously update same cache line
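
The cache line point can be tried out with a small experiment. The sketch below is my own, under stated assumptions: a 64-byte cache line, so that slots 0 and 1 of the array most likely share a line while slots 0 and 16 do not. FalseSharingDemo and timeIncrements are hypothetical names, and the timings depend heavily on the machine.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class FalseSharingDemo {
    // Two threads each increment their own slot; returns elapsed nanoseconds.
    static long timeIncrements(AtomicLongArray counters, int slotA, int slotB, int iterations) {
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) counters.incrementAndGet(slotA);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) counters.incrementAndGet(slotB);
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000_000;
        // Adjacent slots: likely on the same cache line, so the threads contend.
        long near = timeIncrements(new AtomicLongArray(32), 0, 1, n);
        // Slots 128 bytes apart: likely on different cache lines.
        long far = timeIncrements(new AtomicLongArray(32), 0, 16, n);
        System.out.println("adjacent: " + near / 1_000_000 + " ms, apart: " + far / 1_000_000 + " ms");
    }
}
```

Both runs do the same amount of logical work and never touch each other's counter; any difference in time comes from where the data sits.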

data proximity

  • doing timing on low core and/or low thread machine may give
    dramatically different numbers than high core/thread machine

memory access rules

  • increase useful data fetched from memory
    • group used together…

impact of polymorphism

  • do not have to test type
  • just call virtual method
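
A small sketch of what "just call virtual method" means in code. The Shape, Circle and Square types are hypothetical and only loosely inspired by the shapelist program named in the commands above.

```java
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public double area() { return Math.PI * radius * radius; }
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    @Override public double area() { return side * side; }
}

class ShapeDemo {
    // No instanceof checks: each element dispatches to its own area().
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) total += s.area();
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3), new Circle(1) };
        System.out.println(totalArea(shapes));
    }
}
```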

3pm - Type Inference in Java SE 8

  • Daniel Smith - Java Language Designer, Oracle

7:30pm - SOAP over WebSocket and InfiniBand with JAX-WS Pluggable Transports

  • Harold Carr - Software Architect, Oracle

The JAX-WS standard includes APIs for using POJOs or XML for remote
messages, but it does not include APIs that enable the user to control
the transport. This BOF discusses adding pluggable transport APIs to
the JAX-WS standard. It shows a candidate pluggable transport
mechanism for JAX-WS that enables you to use other transports besides
HTTP. In particular, it shows the benefits of using WebSocket and
InfiniBand transports for SOAP message exchanges.

NOTE: I will post my slides soon.