
OSCON Monday

Posted by haroldcarr on July 26, 2011 at 5:29 PM PDT

Monday, 07/25/2011

1 10:40am - Playful Explorations of Public and Personal Data

attendance : 70 people

first map with "America" on it

power of place

  • real world, land usage, elevation, parcels, streets, people
  • maps reflect and define reality

Andrew demoed : data analysis/visualization for the masses.

For use by: citizens, organization, media, science, developers, enterprise.


2 11:30am Monday - Testing in Scala

  • Java: JVM
  • Tags: jvm, scalatest, tdd, java, scalacheck, testing, specs, test_driven_development
  • Daniel Hinojosa

attendance : 20 people

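ScalaTest with TestNG

import org.scalatest.testng.TestNGSuite
import org.scalatest.matchers.MustMatchers
import org.testng.annotations.Test

class X extends TestNGSuite with MustMatchers {
    // TestNG only runs methods annotated with @Test
    @Test def testFoo() {
        ... must equal ("Lex")
    }
}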

ScalaTest with WordSpec

import org.scalatest.{WordSpec, Spec}
import org.scalatest.matchers.MustMatchers

class X extends WordSpec with MustMatchers {
    "An Employee" should {
        "return the ..." in {
            ...
        }
    }
}

ScalaTest with FreeSpec

import org.scalatest.FreeSpec
import org.scalatest.matchers.MustMatchers

class X extends FreeSpec with MustMatchers {
    "An Employee" - {
        // leaf tests are registered with `in`; `-` only opens a nested scope
        "is a " in {
             ... must equal("lex")
        }
    }
}
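
ScalaTest with FeatureSpec

import org.scalatest.{FeatureSpec, GivenWhenThen}
import org.scalatest.matchers.ShouldMatchers

class X extends FeatureSpec with GivenWhenThen with ShouldMatchers {
    feature("blah") {
        info("when ...")
        scenario("...") {
            val firstName = "Dan"
            val lastName = "Hinojosa"
        }
    }
}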

Also covered:

  • Specs2
  • Specs2 DataTables
  • Specs2 Acceptance Specification
  • Specs2 Given-When-Then Acceptance Specification
  • ScalaCheck
  • Borachio

3 1:30pm - Above the Clouds: Introducing Akka

  • Java: Cloud
  • Tags: akka, scala, replication, high-availability, jvm, cloud, failover, java, scalability, grid, concurrency
  • Martin Odersky (Typesafe)

attendance : 35 people

Problem : hard to build

  • highly concurrent, scalable, fault-tolerant, self-healing systems

Akka is name of mountain in Sweden

spring | guice | camel | AMQP | REST | ....

fault tolerance : local actor supervision | remote actor super ...

scalability : client managed remote actors | server ma... | cluster mem


Used in

  • finance, betting and gaming, telecom, simulation, e-commerce

3.1 basic Actor

3.1.1 Actor

  • event-driven thread that applies behavior to state
  • Messages arrive async on queues
  • fire-and-forget
  • send and get future
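
A minimal sketch of the actor idea in the bullets above, written against the plain JVM rather than Akka: one thread drains a mailbox and is the only code that touches the state. CounterActor, tell, and ask are hypothetical names for illustration and are not Akka's API.

```scala
import java.util.concurrent.{CompletableFuture, LinkedBlockingQueue}

object CounterActor {
  sealed trait Msg
  final case class Add(n: Int) extends Msg                        // fire-and-forget
  final case class Get(reply: CompletableFuture[Int]) extends Msg // send and get a future
  case object Stop extends Msg
}

// Toy actor (illustrative only): private state plus a mailbox drained by one thread.
final class CounterActor {
  import CounterActor._

  private val mailbox = new LinkedBlockingQueue[Msg]()
  private var total = 0 // only ever touched by the worker thread

  private val worker = new Thread(() => loop())
  worker.setDaemon(true)
  worker.start()

  // Messages are handled one at a time, in arrival order.
  private def loop(): Unit = {
    var running = true
    while (running) {
      mailbox.take() match {
        case Add(n)     => total += n
        case Get(reply) => reply.complete(total)
        case Stop       => running = false
      }
    }
  }

  // Fire-and-forget: enqueue and return immediately.
  def tell(n: Int): Unit = mailbox.put(Add(n))

  // Send and get a future that the actor completes later.
  def ask(): CompletableFuture[Int] = {
    val reply = new CompletableFuture[Int]()
    mailbox.put(Get(reply))
    reply
  }

  def stop(): Unit = mailbox.put(Stop)
}
```

Because the mailbox is FIFO, a call to ask() made after two tell() calls observes both additions.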

3.1.2 Dataflow

  • more expressive than functional

3.1.3 HotSwap

  • define new behavior to be incorporated into an Actor
  • can unbecome
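
One way to picture hot swapping is a stack of behaviors where the top of the stack handles the next message. The sketch below assumes exactly that; the class name Swappable and the method signatures are made up for illustration and are not Akka's.

```scala
// Toy hot swap (illustrative only): the current behavior is the head of a list.
final class Swappable(initial: String => String) {
  private var behaviors: List[String => String] = List(initial)

  // Install a new behavior on top of the current one.
  def become(next: String => String): Unit =
    behaviors = next :: behaviors

  // Revert to the previous behavior; the initial behavior is never removed.
  def unbecome(): Unit =
    if (behaviors.tail.nonEmpty) behaviors = behaviors.tail

  // Handle a message with whatever behavior is current.
  def receive(msg: String): String = behaviors.head(msg)
}
```

In a real actor the swap would happen on the actor's own thread, so no locking is shown here.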

3.2 Remote Actor

  • NIO (Netty) and Protobuf
  • client initiated/managed
  • server initiated/managed
  • akka 2.0 decouples actor address from deployment

3.3 Fault-Tolerance

  • erlang "let it crash" model
  • have robust failure recovery
  • classification of state/data
    • scratch/transient
    • state (supplied at boot or other components)
    • dynamic
      • can recompute
      • critical: data from sources that is impossible to recompute
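
The "let it crash" idea can be sketched as a supervisor that throws away scratch state and restarts the work from a fresh start instead of trying to repair it in place. This is a simplified illustration under that assumption; Supervisor and supervise are hypothetical names, not part of Akka or Erlang/OTP.

```scala
import scala.util.{Failure, Success, Try}

object Supervisor {
  // Runs `work` on fresh state from `init`. On a crash the scratch state is
  // discarded and the work is restarted, at most `maxRestarts` times.
  def supervise[S, A](maxRestarts: Int)(init: () => S)(work: S => A): Try[A] = {
    @annotation.tailrec
    def attempt(restartsLeft: Int): Try[A] =
      Try(work(init())) match {
        case ok @ Success(_)                => ok
        case Failure(_) if restartsLeft > 0 => attempt(restartsLeft - 1)
        case failed                         => failed
      }
    attempt(maxRestarts)
  }
}
```

Only state that `init` can rebuild is safe to lose this way; data that cannot be recomputed has to live somewhere the crash does not reach.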

Question: Should we use akka actors instead of Scala actors?

  • Answer: long term: YES
  • they do plan to merge the two libraries in the future

4 2:20pm Monday - The Ghost in the Virtual Machine: A Reference to References

  • Java: Trends
  • Bob Lee (Square Inc.)

attendance : 65


  • take mystery out of GC
  • perform manual cleanup in right way


  • an object is reachable if a live thread can access it
  • heap roots: system classes, thread stacks, in-flight exceptions, JNI globals,
    finalizer queue, interned String pool, …

Manual cleanup

  • listeners, file descriptors, native mem, external state
  • tools: finally, Object.finalize, references and reference queues
  • pros
    • easy, handle exception in same thread, ensure cleanup keeps pace
  • cons
    • more work for programmer, error prone, cleanup in same thread
  • ARM (automatic resource management) will help
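
A minimal sketch of manual cleanup with finally, wrapped in a helper so the pattern is written once. Cleanup and withResource are hypothetical names; the helper only assumes the resource implements java.io.Closeable.

```scala
import java.io.Closeable

object Cleanup {
  // Runs `body` and always closes the resource on the same thread,
  // whether `body` returns normally or throws.
  def withResource[R <: Closeable, A](resource: R)(body: R => A): A =
    try body(resource)
    finally resource.close()
}
```

This shows both sides of the trade-off listed above: cleanup is prompt and exceptions surface in the calling thread, but the caller has to remember to use the helper.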


Finalizers (Object.finalize)

  • GC invokes the finalizer when reclaiming that instance
  • but: not guaranteed to run, undefined threading model, exceptions
    are ignored, keeps objects alive longer than necessary, can
    resurrect references, can make reclamation SLOW
  • mess up the reference API
  • good for one thing: log warnings - but still SLOW

Reference API

  • strong
  • soft : for caching
    • cleared when VM runs low on memory (LRU)
    • for quick and dirty caching only
    • no notion of "weight" : memory usage, computational time, CPU usage
    • can exacerbate low memory conditions
    • good example: cache reflection results
  • weak : for fast cleanup (pre-finalizer)
    • cleared as soon as no strong or soft refs remain
    • cleared ASAP - before finalizer
    • Not for caching: use soft
  • phantom : for safe cleanup (post-finalizer)
    • enqueued after no other refs remain (post finalizer)
    • can suffer similar problems to finalizers
    • must be cleared manually, for no good reason (because of patent issue)
    • get() always returns null
      • so must use a reference queue
    • use-case : for memory-mapped file
    • note: ease-of-use via Guava libraries (cleanup in background thread)
  • reference queues : for notifications
  • java.util.WeakHashMap
    • useful for emulating additional fields
    • keeps weak refs to keys, strong refs to values
    • not concurrent
    • uses equals (but should use ==)
  • Guava MapMaker
    • near drop-in replacement for WeakHashMap
    • strong, soft, weak keys and/or value refs
    • concurrent
    • uses ==
    • supports on-demand computation
    • supports size limiting
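
A few of the points above can be checked with the JDK's java.lang.ref classes alone. The sketch below avoids depending on GC timing: it calls enqueue() by hand where the collector would normally do it. ReferenceDemo and its method names are made up for illustration.

```scala
import java.lang.ref.{PhantomReference, ReferenceQueue, WeakReference}
import java.util.WeakHashMap

object ReferenceDemo {
  // A weak reference still returns its referent while a strong reference exists.
  def weakWhileStronglyHeld(): Boolean = {
    val strong = new Object
    val weak = new WeakReference[Object](strong)
    weak.get() eq strong
  }

  // get() on a phantom reference is always null, so a queue is the only
  // way to learn what happened to the referent.
  def phantomGetIsNull(): Boolean = {
    val queue = new ReferenceQueue[Object]
    val phantom = new PhantomReference[Object](new Object, queue)
    phantom.get() == null
  }

  // Notification through a reference queue. The GC normally enqueues the
  // reference; here enqueue() is called manually to keep the result deterministic.
  def queueNotification(): Boolean = {
    val queue = new ReferenceQueue[Object]
    val referent = new Object
    val ref = new WeakReference[Object](referent, queue)
    val notified = ref.enqueue() && (queue.poll() eq ref)
    notified && (referent ne null) // keeps the referent strongly reachable until here
  }

  // WeakHashMap compares keys with equals(), so an equal but distinct key
  // object finds the entry.
  def weakHashMapUsesEquals(): Boolean = {
    val key = new String("employee-1")    // kept strongly reachable below
    val map = new WeakHashMap[String, String]()
    map.put(key, "extra field")
    val lookup = new String("employee-1") // equal to key, but a different object
    (lookup ne key) && map.get(lookup) == "extra field" && map.containsKey(key)
  }
}
```

In real code a background thread would typically block on the queue's remove() and run the cleanup for each reference it receives.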

5 3:30pm Monday - Future-proofing Collections: From Mutable to Persistent to Parallel

  • Java: JVM
  • Tags: parallel_programming, scala, jvm, craftsmanship, emerging_languages, collections
  • Martin Odersky (Typesafe)

attendance: 60


  • 2003: scala 1.0: no common organization
  • 2005: scala 2.0: generic collection framework
  • 2005-2009: bit rot
  • 2009: scala 2.8: same API, but internally more composition/abstraction


  • de-emphasize destructive updates
  • focus on transformers that map collections to collections
  • persistent collections
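
A small illustration of the three points above using the standard immutable List: transformers return new collections, and a persistent "update" shares structure with the original instead of copying it. PersistentDemo is a made-up name.

```scala
object PersistentDemo {
  val original: List[Int] = List(1, 2, 3)

  // Transformers map collections to collections; `original` is never modified.
  val doubled: List[Int] = original.map(_ * 2)
  val evens: List[Int] = original.filter(_ % 2 == 0)

  // Prepending builds one new cell and reuses the existing list as its tail.
  val extended: List[Int] = 0 :: original
  val sharesTail: Boolean = extended.tail eq original
}
```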

Note: runs a scala REPL in a shell inside emacs (23) - yes!


  • good for one or two small transformers


  • more readable for more complex transformation

Demo of An Empirical Comparison of Seven Programming Languages
(IEEE Computer 33(10):23-29 (2000)) using Eclipse.

  • 20 loc scala
  • also see Joshua Bloch's Java solution

Scala 2.9 parallel collections

  • split work by number of processors
  • each thread has work queue
  • See: Cascade: J. Suereth, D. Mahler @ Google
  • implemented in library (compiler does not know about it)
  • extensible
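
The "split work by number of processors" idea can be sketched by hand with Futures. This is a deliberately simplified illustration, not how the Scala 2.9 parallel collections are implemented: it has no per-thread work queues, and ChunkedSum and parallelSum are hypothetical names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ChunkedSum {
  // Splits `xs` into roughly one chunk per available processor, sums each
  // chunk in its own Future, then combines the partial sums.
  def parallelSum(xs: Vector[Long]): Long = {
    val processors = Runtime.getRuntime.availableProcessors()
    val chunkSize = math.max(1, math.ceil(xs.size.toDouble / processors).toInt)
    // toList forces every Future to be started before waiting on any of them.
    val partials = xs.grouped(chunkSize).map(chunk => Future(chunk.sum)).toList
    Await.result(Future.sequence(partials), 30.seconds).sum
  }
}
```

Like the library approach in the notes, this needs no compiler support; it is ordinary library code.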

Future of persistent collections

  • easy to use, concise, safe, fast, scalable

User of a persistent library

  • easy, intuitive

Creator of a persistent library

  • hard

6 4:20pm Monday - QYZ: LaTeX, R and Redis for Beautiful Analytics

  • Data: Analytics and Visualization
  • Tags: latex, postgresql, analytics, data_scientists, r-lang, business, reporting, redis, statistics
  • Noah Pepper (Lucky Sort), Homer Strong (Qmedtrix Systems)

attendance: 40

Goal: automate reporting on medical billing data

OSS stack: PostgreSQL, R, Redis, LaTeX

See: Hadley Wickham

Considerations for reporting graphics

  • vector graphics preferable to raster
    • exception: many maps in single report
  • ggplot2
    • perfect for obsession with DRY
    • encourages simplicity; discourages pie charts
    • good legends
    • but looking forward to ggplot3
      • aligning different plots on the same page

Considerations for reporting speed

  • distributed task queue
  • fast, simple
  • most report components produced in parallel
  • cache individual component outputs
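
The last two bullets (parallel components, cached outputs) can be sketched as follows. The talk's stack was R with Redis; this Scala version swaps in an in-memory map for the cache and Futures for the task queue, purely to show the shape of the idea. ReportBuilder, component, and build are hypothetical names.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ReportBuilder {
  // In-memory stand-in for an external cache; keys are component names.
  private val cache = TrieMap.empty[String, String]

  // Renders one component, reusing the cached output when there is one.
  def component(name: String)(render: String => String): String =
    cache.getOrElseUpdate(name, render(name))

  // Produces all components in parallel and returns them in the requested order.
  def build(names: List[String])(render: String => String): List[String] = {
    val jobs = names.map(n => Future(component(n)(render)))
    Await.result(Future.sequence(jobs), 30.seconds)
  }
}
```

A second build of the same report then costs only cache lookups, which is the point of caching per component rather than per report.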


  • flexible
  • but poor package management
  • latex/r : Sweave, brew, xtable
  • rolled their own

Tables and Graphics

  • different views of same model
  • formatting tables is frustrating - no grammar of tables
  • captions can be generated to explain the view


  • No ORM for R

Sweave/brew/xtable are good to use if you do not need LaTeX or do not
need huge flexibility


  • used testthat (by Hadley)
  • testing forces you to encode your methodology somewhere