OSCON Monday

Posted by haroldcarr on July 26, 2011 at 5:29 PM PDT

Monday, 07/25/2011


1 10:40am - Playful Explorations of Public and Personal Data

attendance : 70 people

http://upload.wikimedia.org/wikipedia/commons/thumb/d/db/Waldseemuller_map_closeup_with_America.jpg/547px-Waldseemuller_map_closeup_with_America.jpg

first map with "America" on it

power of place

  • real world, land usage, elevation, parcels, streets, people
  • maps reflect and define reality

Andrew demoed http://geocommons.com/ : data
analysis/visualization for the masses.

For use by: citizens, organizations, media, science, developers, enterprise.

Also see: http://www.virtualvehicle.com


2 11:30am Monday

  • Testing in Scala
  • Java: JVM
  • Tags: jvm, scalatest, tdd, java, scalacheck, testing, specs, test_driven_development
  • Daniel Hinojosa (evolutionnext.com)

attendance : 20 people

http://www.scala-lang.org/sites/default/files/newsflash_logo.png

https://github.com/dhinojosa/testing-scala

ScalaTest

import org.scalatest.testng.TestNGSuite
import org.scalatest.matchers.MustMatchers
import org.testng.annotations.Test

// TestNG-style suite: test methods are marked with @Test
class X extends TestNGSuite with MustMatchers {
    @Test
    def testFoo() {
        ... must equal ("Lex")
    }
}

ScalaTest with WordSpec

import org.scalatest.WordSpec
import org.scalatest.matchers.MustMatchers

class X extends WordSpec with MustMatchers {
    // subject, then the verb "should", then the individual tests
    "An Employee" should {
        "return the ..." in {
        }
    }
}

ScalaTest with FreeSpec

import org.scalatest.FreeSpec
import org.scalatest.matchers.MustMatchers

class X extends FreeSpec with MustMatchers {
    // "-" opens a nesting scope; "in" registers the actual test
    "An Employee" - {
        "is a " in {
             ... must equal("lex")
        }
    }
}

ScalaTest with FeatureSpec

import org.scalatest.{FeatureSpec, GivenWhenThen}
import org.scalatest.matchers.ShouldMatchers

class X extends FeatureSpec with GivenWhenThen with ShouldMatchers {
    feature("blah") {
        info("when ...")
        scenario("...") {
            given("...")
            val firstName = "Dan"
            val lastName = "Hinojosa"
            ...
        }
    }
}

Also covered:

  • Specs2
  • Specs2 DataTables
  • Specs2 Acceptance Specification
  • Specs2 Given-When-Then Acceptance Specification
  • ScalaCheck
  • Borachio

3 1:30pm - Above the Clouds: Introducing Akka

  • Java: Cloud
  • Tags: akka, scala, replication, high-availability, jvm, cloud, failover, java, scalability, grid, concurrency
  • Martin Odersky (Typesafe)

attendance : 35 people

http://akka.io/images/akka-logo-159h.png

Problem : hard to build

  • highly concurrent, scalable, fault-tolerant, self-healing systems

Akka is the name of a mountain in Sweden

spring | guice | camel | AMQP | REST | ....

fault tolerance : local actor supervision | remote actor super ...

scalability : client managed remote actors | server ma... | cluster mem

Concurrency : ACTORS STM AGENTS DATAFLOW

Used in

  • finance, betting and gaming, telecom, simulation, e-commerce

3.1 basic Actor

3.1.1 Actor

  • event driven thread that does behavior on state.
  • Messages arrive async on queues
  • fire-and-forget
  • send and get future
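
A minimal sketch of the actor idea above, in plain Java rather than Akka. CounterActor and its methods are hypothetical names, not Akka's API. A single-threaded executor stands in for the mailbox, so messages queue up and are handled one at a time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical minimal actor: NOT the Akka API, only the idea behind it.
final class CounterActor {
    // The single-threaded executor is the mailbox: messages are queued
    // and processed one at a time, so `count` needs no locking.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int count = 0; // state touched only by the mailbox thread

    // Fire-and-forget: enqueue the message and return immediately.
    void increment() {
        mailbox.execute(() -> count++);
    }

    // Send and get a future: the reply is produced asynchronously.
    CompletableFuture<Integer> getCount() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    void stop() {
        mailbox.shutdown();
    }
}

final class CounterActorDemo {
    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        for (int i = 0; i < 3; i++) {
            actor.increment();
        }
        // The mailbox is FIFO, so the query runs after the three increments.
        System.out.println(actor.getCount().join()); // prints 3
        actor.stop();
    }
}
```

The caller never touches the state directly; it only sends messages, which is what makes the state safe without locks.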

3.1.2 Dataflow

  • more expressive than functional

3.1.3 HotSwap

  • define new behavior to be incorporated in Actor
  • can unbecome
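
One way to picture become/unbecome is a stack of behaviors. This is a sketch in plain Java under that assumption. SwappableActor, become, unbecome and receive are illustrative names here, not Akka's actual API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Hypothetical sketch of hot-swappable behavior, modeled as a stack.
final class SwappableActor {
    private final Deque<Function<String, String>> behaviors = new ArrayDeque<>();

    SwappableActor(Function<String, String> initial) {
        behaviors.push(initial);
    }

    // Install a new behavior on top of the current one.
    void become(Function<String, String> next) {
        behaviors.push(next);
    }

    // Revert to the previous behavior; the initial one is never removed.
    void unbecome() {
        if (behaviors.size() > 1) {
            behaviors.pop();
        }
    }

    // Handle a message with whatever behavior is current.
    String receive(String message) {
        return behaviors.peek().apply(message);
    }
}

final class SwappableActorDemo {
    public static void main(String[] args) {
        SwappableActor actor = new SwappableActor(m -> "hello " + m);
        System.out.println(actor.receive("dan")); // prints hello dan
        actor.become(m -> m.toUpperCase());
        System.out.println(actor.receive("dan")); // prints DAN
        actor.unbecome();
        System.out.println(actor.receive("dan")); // prints hello dan
    }
}
```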

3.2 Remote Actor

  • NIO (Netty) and Protobuf
  • client initiated/managed
  • server initiated/managed
  • akka 2.0 decouples actor address from deployment

3.3 Fault-Tolerance

  • erlang "let it crash" model
  • have robust failure recovery
  • classification of state/data
    • scratch/transient
    • state (supplied at boot or other components)
    • dynamic
      • can recompute
      • critical: data from sources that is impossible to recompute

http://akka.io

typesafe.com

Question: Should we use akka actors instead of Scala actors?

  • Answer: long term: YES
  • they do plan to merge the two libraries in the future

4 2:20pm Monday - The Ghost in the Virtual Machine: A Reference to References

  • Java: Trends
  • Bob Lee (Square Inc.)

attendance : 65

Goals

  • take mystery out of GC
  • perform manual cleanup in right way

Reachable

  • if a live thread can access it
  • heap roots: system classes, thread stacks, in-flight exceptions, JNI globals,
    finalizer Q, interned String pool, …

Manual cleanup

  • listeners, file descriptors, native mem, external state
  • tools: finally, Object.finalize, references and reference queues

finally

  • pros
    • easy, handle exception in same thread, ensure cleanup keeps pace
  • cons
    • more work for programmer, error prone, cleanup in same thread
  • ARM (automatic resource management) will help
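
A small sketch comparing the two styles. ARM shipped in Java 7 as try-with-resources. TrackedResource and CleanupDemo are hypothetical names made up for this example; the resource only records what happens to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that logs when it is opened, used and closed.
final class TrackedResource implements AutoCloseable {
    private final List<String> log;

    TrackedResource(List<String> log) {
        this.log = log;
        log.add("open");
    }

    void use() {
        log.add("use");
        throw new IllegalStateException("failure while using the resource");
    }

    @Override
    public void close() {
        log.add("close");
    }
}

final class CleanupDemo {
    // Manual cleanup: the programmer must remember the finally block.
    static List<String> withFinally() {
        List<String> log = new ArrayList<>();
        TrackedResource r = new TrackedResource(log);
        try {
            r.use();
        } catch (IllegalStateException e) {
            log.add("caught");
        } finally {
            r.close(); // same thread, runs whether or not use() threw
        }
        return log;
    }

    // ARM / try-with-resources: close() is called automatically,
    // before the catch block runs.
    static List<String> withArm() {
        List<String> log = new ArrayList<>();
        try (TrackedResource r = new TrackedResource(log)) {
            r.use();
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(withFinally()); // prints [open, use, caught, close]
        System.out.println(withArm());     // prints [open, use, close, caught]
    }
}
```

Both versions clean up in the calling thread; the ARM version just removes the chance of forgetting the close() call.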

finalizer

  • GC will invoke when reclaiming that instance
  • but: not guaranteed to run, undefined threading model, exceptions
    are ignored, keeps objects alive longer than necessary, can
    resurrect references, can make reclamation SLOW
  • messes up the Reference API
  • good for one thing: log warnings - but still SLOW

Reference API

  • strong
  • soft : for caching
    • cleared when VM runs low on memory (LRU)
    • for quick and dirty caching only
    • no notion of "weight" : memory usage, computational time, CPU usage
    • can exacerbate low memory conditions
    • good example: cache reflection results
  • weak : for fast cleanup (pre-finalizer)
    • cleared as soon as no strong or soft refs remain
    • cleared ASAP - before finalizer
    • Not for caching: use soft
  • phantom : for safe cleanup (post-finalizer)
    • enqueued after no other refs remain (post finalizer)
    • can suffer similar problems to finalizers
    • must be cleared manually, for no good reason (because of patent issue)
    • get() always returns null
      • so must use a reference queue
    • use-case : for memory-mapped file
    • note: ease-of-use via Guava libraries (cleanup in background thread)
  • reference queues : for notifications
  • java.util.WeakHashMap
    • useful for emulating additional fields
    • keeps weak refs to keys, strong refs to values
    • not concurrent
    • uses equals (but should use ==)
  • Guava MapMaker
    • near drop-in replacement for WeakHashMap
    • strong, soft, weak keys and/or value refs
    • concurrent
    • uses ==
    • supports on-demand computation
    • supports size limiting
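
A few of the points above can be checked with the JDK alone. This is a sketch; ReferenceDemo is a hypothetical class name. To stay deterministic it never waits for the garbage collector: the reference is enqueued by hand, which the GC would normally do once the referent is unreachable. IdentityHashMap holds its keys strongly and is used here only to show what == key comparison looks like.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

final class ReferenceDemo {
    // A weak reference still returns its referent while a strong one exists.
    static boolean weakRefSeesLiveObject() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    // A phantom reference never exposes its referent, hence the queue.
    static boolean phantomGetIsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null;
    }

    // Notification arrives as the reference object itself on the queue.
    static boolean queueDeliversReference() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(strong, queue);
        weak.enqueue(); // done by hand here; normally the GC does this
        return queue.poll() == weak;
    }

    // Returns {hit in WeakHashMap, hit in IdentityHashMap} for a key that
    // is equal to the stored key but is a different object.
    static boolean[] equalsVersusIdentity() {
        String key = new String("employee");
        String equalKey = new String("employee");
        Map<String, String> weakMap = new WeakHashMap<>();
        Map<String, String> identityMap = new IdentityHashMap<>();
        weakMap.put(key, "Lex");
        identityMap.put(key, "Lex"); // also keeps `key` strongly reachable
        boolean weakHit = weakMap.containsKey(equalKey);         // equals()
        boolean identityHit = identityMap.containsKey(equalKey); // ==
        return new boolean[] { weakHit, identityHit };
    }

    public static void main(String[] args) {
        System.out.println(weakRefSeesLiveObject());  // prints true
        System.out.println(phantomGetIsNull());       // prints true
        System.out.println(queueDeliversReference()); // prints true
        boolean[] hits = equalsVersusIdentity();
        System.out.println(hits[0] + " " + hits[1]);  // prints true false
    }
}
```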

5 3:30pm Monday - Future-proofing Collections: From Mutable to Persistent to Parallel

  • Java: JVM
  • Tags: parallel_programming, scala, jvm, craftsmanship, emerging_languages, collections
  • Martin Odersky (Typesafe)

attendance: 60

http://typesafe.com/public/images/logo.png

history

  • 2003: scala 1.0: no common organization
  • 2005: scala 2.0: generic collection framework
  • 2005-2009: bit rot
  • 2009: scala 2.8: same API, but internally more composition/abstraction

collections

  • de-emphasize destructive updates
  • focus on transformers that map collections to collections
  • persistent collections

Note: runs a scala REPL in a shell inside emacs (23) - yes!

map

  • good for one or two small transformers

for

  • more readable for more complex transformation

Demo of An Empirical Comparison of Seven Programming Languages
(IEEE Computer 33(10):23-29 (2000)) using Eclipse.

  • 20 loc scala
  • also see Joshua Bloch's Java solution

Scala 2.9 parallel collections

  • split work by number of processors
  • each thread has work queue
  • See: Cascade: J. Suereth, D. Mahler @ Google
  • implemented in library (compiler does not know about it)
  • extensible
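
The splitting idea can be sketched with Java's fork/join framework, where each worker thread has its own queue and idle workers steal from others. This is an illustration in Java, not the Scala library's code. ParallelSum and the THRESHOLD value are assumptions made up for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch: divide-and-conquer summation on a work-stealing pool.
final class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // hypothetical cut-off
    private final long[] data;
    private final int from;
    private final int to;

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork(); // queue the left half; an idle worker may steal it
        return right.compute() + left.join();
    }

    // One worker per available processor.
    static long sum(long[] data) {
        ForkJoinPool pool =
            new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            return pool.invoke(new ParallelSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 50005000
    }
}
```

Like the Scala version described in the talk, this lives entirely in a library; the compiler knows nothing about it.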

Future of persistent collections

  • easy to use, concise, safe, fast, scalable

User of a persistent library

  • easy, intuitive

Creator of a persistent library

  • hard

6 4:20pm Monday - QYZ: LaTeX, R and Redis for Beautiful Analytics

  • Data: Analytics and Visualization
  • Tags: latex, postgresql, analytics, data_scientists, r-lang, business, reporting, redis, statistics
  • Noah Pepper (Lucky Sort), Homer Strong (Qmedtrix Systems)

attendance: 40

http://had.co.nz/ggplot2/geom_area.png

Goal: automate reporting on medical billing data

OSS stack: PostgreSQL, R, Redis, LaTeX

See: Hadley Wickham

Considerations for reporting graphics

  • vector graphics preferable to raster
    • exception: many maps in single report
  • ggplot2
    • perfect for obsession with DRY
    • encourages simplicity; discourages pie charts
    • good legends
    • but looking forward to ggplot3
      • aligning different plots on the same page

Considerations for reporting speed

  • distributed task queue
  • fast, simple
  • most report components produced in parallel
  • cache individual component outputs

LaTeX

  • flexible
  • but poor package management
  • LaTeX/R : Sweave, brew, xtable
  • rolled their own

Tables and Graphics

  • different views of same model
  • formatting tables is frustrating - no grammar of tables
  • captions can be generated to explain the view

R + SQL

  • No ORM for R

Sweave/brew/xtable are good to use if you do not need LaTeX or
do not need huge flexibility

Testing

  • used testthat (by Hadley)
  • testing forces you to encode your methodology somewhere