OSCON Monday
Monday, 07/25/2011
Table of Contents
- 1 10:40am - Playful Explorations of Public and Personal Data
- 2 11:30am Monday
- 3 1:30pm - Above the Clouds: Introducing Akka
- 4 2:20pm Monday - The Ghost in the Virtual Machine: A Reference to References
- 5 3:30pm Monday - Future-proofing Collections: From Mutable to Persistent to Parallel
- 6 4:20pm Monday - QYZ: LaTeX, R and Redis for Beautiful Analytics
1 10:40am - Playful Explorations of Public and Personal Data
- Data: Roulette
- Tags: polymaps, mapping, realtime, analysis, hbase
- Andrew Turner (GeoIQ)
attendance : 70 people
first map with "America" on it
power of place
- real world, land usage, elevation, parcels, streets, people
- maps reflect and define reality
Andrew demoed http://geocommons.com/ : data
analysis/visualization for the masses.
For use by: citizens, organizations, media, science, developers, enterprise.
Also see: http://www.virtualvehicle.com
2 11:30am Monday
- Testing in Scala
- Java: JVM
- Tags: jvm, scalatest, tdd, java, scalacheck, testing, specs, test_driven_development
- Daniel Hinojosa (evolutionnext.com)
attendance : 20 people
https://github.com/dhinojosa/testing-scala
ScalaTest
- created by Bill Venners
- JUnit, TestNG integration
- http://www.scalatest.org/
- Runs on Ant, sbt, Maven
import org.scalatest.testng.TestNGSuite
import org.scalatest.matchers.MustMatchers
import org.testng.annotations.Test
class X extends TestNGSuite with MustMatchers {
  @Test
  def testFoo() {
    ... must equal ("Lex")
  }
}
ScalaTest with WordSpec
import org.scalatest.WordSpec
import org.scalatest.matchers.MustMatchers
class X extends WordSpec with MustMatchers {
  "An Employee" should {
    "return the ..." in {
    }
  }
}
ScalaTest with FreeSpec
import org.scalatest.FreeSpec
import org.scalatest.matchers.MustMatchers
class X extends FreeSpec with MustMatchers {
  "An Employee" - {
    "is a " in {
      ... must equal("lex")
    }
  }
}
ScalaTest with FeatureSpec
import org.scalatest.{FeatureSpec, GivenWhenThen}
import org.scalatest.matchers.ShouldMatchers
class X extends FeatureSpec with GivenWhenThen with ShouldMatchers {
  feature("blah") {
    info("when ...")
    scenario("...") {
      given("...")
      val firstName = "Dan"
      val lastName = "Hinojosa"
      ...
    }
  }
}
Specs2
- Created by Eric Torreborre
- http://etorreborre.github.com/specs2/
Specs2 DataTables
Specs2 Acceptance Specification
Specs2 Given-When-Then Acceptance Specification
ScalaCheck
- created by Rickard Nilsson
- inspired by QuickCheck (Haskell)
- http://code.google.com/p/scalacheck/
- run by itself or with ScalaTest or …
- generates data and tests
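To make "generates data and tests" concrete, here is a minimal hand-rolled sketch of the property-based idea in plain Java. It is not the ScalaCheck API; the class name `PropertySketch`, the helper `firstCounterexample`, and the two sample properties are all made up for illustration. The checker generates random integer lists and reports the first one that breaks a property.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hand-rolled illustration of property-based testing; NOT the ScalaCheck API.
class PropertySketch {
    // Generates `runs` random integer lists and returns the first one that
    // violates the property, or null if every generated list satisfies it.
    static List<Integer> firstCounterexample(Predicate<List<Integer>> property,
                                             int runs, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < runs; i++) {
            int size = random.nextInt(20);
            List<Integer> sample = new ArrayList<>();
            for (int j = 0; j < size; j++) {
                sample.add(random.nextInt(201) - 100); // values in [-100, 100]
            }
            if (!property.test(sample)) {
                return sample;
            }
        }
        return null;
    }

    // Property that should hold: reversing twice gives back the original list.
    static boolean reverseTwiceIsIdentity(List<Integer> xs) {
        List<Integer> copy = new ArrayList<>(xs);
        Collections.reverse(copy);
        Collections.reverse(copy);
        return copy.equals(xs);
    }

    // Deliberately false property: every list is already sorted.
    static boolean isSorted(List<Integer> xs) {
        List<Integer> sorted = new ArrayList<>(xs);
        Collections.sort(sorted);
        return sorted.equals(xs);
    }
}
```

A real tool in this family also shrinks counterexamples to a minimal failing case; this sketch skips that.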
Borachio
- created by Paul Butcher
- mocking framework
- test traits and functions
- http://borachio.com
3 1:30pm - Above the Clouds: Introducing Akka
- Java: Cloud
- Tags: akka, scala, replication, high-availability, jvm, cloud, failover, java, scalability, grid, concurrency
- Martin Odersky (Typesafe)
attendance : 35 people
Problem : hard to build
- highly concurrent, scalable, fault-tolerant, self-healing systems
Akka is the name of a mountain in Sweden
spring | guice | camel | AMQP | REST | ....
fault tolerance : local actor supervision | remote actor super ...
scalability : client managed remote actors | server ma... | cluster mem
Concurrency : ACTORS STM AGENTS DATAFLOW
Used in
- finance, betting and gaming, telecom, simulation, e-commerce
3.1 basic Actor
3.1.1 Actor
- event driven thread that does behavior on state.
- Messages arrive async on queues
- fire-and-forget
- send and get future
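A rough sketch of the actor idea in plain Java, assuming a single-threaded executor is an acceptable stand-in for a mailbox. This is not the Akka API; `CounterActor`, `tellIncrement`, and `askCount` are invented names. It shows the two send styles from the notes: fire-and-forget, and send-and-get-a-future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy actor: private state plus a single-threaded mailbox. NOT the Akka API.
class CounterActor {
    // Messages are queued here and processed one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int count = 0; // only ever touched by the mailbox thread

    // Fire-and-forget: enqueue the message and return immediately.
    void tellIncrement() {
        mailbox.execute(() -> count++);
    }

    // Send and get a future: the reply completes once the message is processed.
    CompletableFuture<Integer> askCount() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    // Stops accepting messages; already queued ones still run.
    void stop() {
        mailbox.shutdown();
    }
}
```

Because only the mailbox thread touches `count`, no locks are needed, which is the main point of the model.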
3.1.2 Dataflow
- more expressive than functional
3.1.3 HotSwap
- define new behavior to be incorporated into an Actor
- can unbecome
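One way to picture become/unbecome is a stack of behaviors, where the top of the stack handles the next message. The sketch below is plain Java under that assumption, not Akka's implementation; `SwappableBehavior` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Toy model of hot-swappable behavior; names are illustrative, NOT the Akka API.
class SwappableBehavior {
    private final Deque<Function<String, String>> behaviors = new ArrayDeque<>();

    SwappableBehavior(Function<String, String> initial) {
        behaviors.push(initial);
    }

    // Install a new behavior on top of the current one.
    void become(Function<String, String> next) {
        behaviors.push(next);
    }

    // Revert to the previous behavior; the initial one is never removed.
    void unbecome() {
        if (behaviors.size() > 1) {
            behaviors.pop();
        }
    }

    // Handle a message with whatever behavior is currently on top.
    String receive(String message) {
        return behaviors.peek().apply(message);
    }
}
```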
3.2 Remote Actor
- NIO (Netty) and Protobuf
- client initiated/managed
- server initiated/managed
- akka 2.0 decouples actor address from deployment
3.3 Fault-Tolerance
- erlang "let it crash" model
- have robust failure recovery
- classification of state/data
- scratch/transient
- state (supplied at boot or other components)
- dynamic
- can recompute
- critical: data from sources that is impossible to recompute
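A minimal sketch of the "let it crash" idea, assuming a supervisor that simply throws away a failed worker and builds a fresh one. This is illustrative plain Java, not Akka's supervision API; `RestartingSupervisor` and the restart budget are invented for the example. The factory plays the role of "state supplied at boot": every restart starts from a clean worker rather than trying to repair the crashed one.

```java
import java.util.function.Supplier;

// Toy "let it crash" supervisor; illustrative only, NOT Akka's supervision API.
class RestartingSupervisor {
    private int restarts = 0;

    // Runs a fresh worker from the factory. If the worker throws, it is
    // discarded and a new one is created, up to maxRestarts times.
    <T> T supervise(Supplier<Supplier<T>> workerFactory, int maxRestarts) {
        while (true) {
            Supplier<T> worker = workerFactory.get(); // fresh transient state
            try {
                return worker.get();
            } catch (RuntimeException crash) {
                if (restarts >= maxRestarts) {
                    throw crash; // restart budget exhausted: escalate
                }
                restarts++;
            }
        }
    }

    int restarts() {
        return restarts;
    }
}
```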
Question: Should we use akka actors instead of Scala actors?
- Answer: long term: YES
- they do plan to merge the two libraries in the future
4 2:20pm Monday - The Ghost in the Virtual Machine: A Reference to References
- Java: Trends
- Bob Lee (Square Inc.)
attendance : 65
Goals
- take mystery out of GC
- perform manual cleanup in right way
Reachable
- if a live thread can access it
- heap roots: system classes, thread stacks, in-flight exceptions, JNI globals, finalizer Q, interned String pool, …
Manual cleanup
- listeners, file descriptors, native mem, external state
- tools: finally, Object.finalize, references and reference queues
finally
- pros
- easy, handle exception in same thread, ensure cleanup keeps pace
- cons
- more work for programmer, error prone, cleanup in same thread
- ARM (automatic resource management) will help
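For comparison, here is the same cleanup written both ways: a manual finally block, and ARM in the form that shipped in Java 7 (try-with-resources). `TrackedResource` is a hypothetical resource invented for the example; its `read()` always fails so that the cleanup path is exercised.

```java
// A hypothetical resource that records whether it was closed.
class TrackedResource implements AutoCloseable {
    boolean closed = false;

    String read() {
        throw new IllegalStateException("read failed");
    }

    @Override
    public void close() {
        closed = true;
    }
}

class CleanupSketch {
    // Manual cleanup: the finally block runs whether or not read() throws.
    static boolean closedWithFinally() {
        TrackedResource resource = new TrackedResource();
        try {
            resource.read();
        } catch (IllegalStateException e) {
            // handled in the same thread that did the work
        } finally {
            resource.close();
        }
        return resource.closed;
    }

    // ARM (try-with-resources, Java 7+): close() is called automatically.
    static boolean closedWithArm() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.read();
        } catch (IllegalStateException e) {
            // the resource has already been closed by the time this runs
        }
        return resource.closed;
    }
}
```

The ARM form removes the "error prone" part: there is no close call to forget.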
finalizer
- GC will invoke when reclaiming that instance
- but: not guaranteed to run, undefined threading model, exceptions
are ignored, keeps objects alive longer than necessary, can
resurrect references, can make reclamation SLOW - mess up the reference API
- good for one thing: log warnings - but still SLOW
Reference API
- strong
- soft : for caching
- cleared when VM runs low on memory (LRU)
- for quick and dirty caching only
- no notion of "weight" : memory usage, computational time, CPU usage
- can exacerbate low memory conditions
- good example: cache reflection results
- weak : for fast cleanup (pre-finalizer)
- cleared as soon as no strong or soft refs remain
- cleared ASAP - before finalizer
- Not for caching: use soft
- phantom : for safe cleanup (post-finalizer)
- enqueued after no other refs remain (post finalizer)
- can suffer similar problems to finalizers
- must be cleared manually, for no good reason (because of patent issue)
- get() always returns null, so must use a reference queue
- use-case : for memory-mapped file
- note: ease-of-use via Guava libraries (cleanup in background thread)
- reference queues : for notifications
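A small sketch of the `java.lang.ref` behavior described above, restricted to the parts that do not depend on GC timing. `ReferenceSketch` and its method names are made up for illustration. Normally the collector enqueues a reference once the referent is unreachable; here `enqueue()` is called by hand so the result is deterministic.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceSketch {
    // While a strong reference exists, a weak reference still returns its referent.
    static boolean weakSeesStronglyHeldObject() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    // PhantomReference.get() returns null even while the referent is alive,
    // which is why a phantom reference is only useful with a queue.
    static boolean phantomGetIsAlwaysNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null && strong != null;
    }

    // Notification goes through the queue. The GC normally does the enqueueing;
    // enqueue() is called manually here to avoid depending on GC timing.
    static boolean queueDeliversReference() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(strong, queue);
        weak.enqueue();
        Reference<?> polled = queue.poll();
        return polled == weak;
    }
}
```

In real cleanup code a background thread would block on `queue.remove()` and release the external resource tied to each reference it receives.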
- java.util.WeakHashMap
- useful for emulating additional fields
- keeps weak refs to keys, strong refs to values
- not concurrent
- uses equals (but should use ==)
- Guava MapMaker
- near drop-in replacement for WeakHashMap
- strong, soft, weak keys and/or value refs
- concurrent
- uses ==
- supports on-demand computation
- supports size limiting
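The equals-versus-== point can be seen with two keys that are equal but are distinct objects. The sketch below uses only the JDK: `WeakHashMap` for the equals behavior and `IdentityHashMap` purely as a stand-in for == comparison (it holds strong references, so it is not a substitute for Guava's weak-keyed maps). `KeyComparisonSketch` is a name invented for the example.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

class KeyComparisonSketch {
    // Puts two equal-but-distinct keys into the map and returns the entry count.
    static int entriesForEqualKeys(Map<String, String> map) {
        String first = new String("key");  // distinct objects on purpose
        String second = new String("key"); // first.equals(second), first != second
        map.put(first, "a");
        map.put(second, "b");
        int size = map.size();
        // Touch both keys after size() so they stay strongly reachable until here.
        if (first.isEmpty() || second.isEmpty()) {
            throw new IllegalStateException("unreachable");
        }
        return size;
    }

    // equals-based: the second put overwrites the first entry.
    static int withWeakHashMap() {
        return entriesForEqualKeys(new WeakHashMap<>());
    }

    // identity-based: the two objects are two different keys.
    static int withIdentityHashMap() {
        return entriesForEqualKeys(new IdentityHashMap<>());
    }
}
```

When the map is emulating an extra field on an object, identity is what is wanted: two separate objects should not share a "field" just because they happen to be equal.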
5 3:30pm Monday - Future-proofing Collections: From Mutable to Persistent to Parallel
- Java: JVM
- Tags: parallel_programming, scala, jvm, craftsmanship, emerging_languages, collections
- Martin Odersky (Typesafe)
attendance: 60
history
- 2003: scala 1.0: no common organization
- 2005: scala 2.0: generic collection framework
- 2005-2009: bit rot
- 2009: scala 2.8: same API, but internally more composition/abstraction
collections
- de-emphasize destructive updates
- focus on transformers that map collections to collections
- persistent collections
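A minimal sketch of what "persistent" means here, written in Java for illustration and not taken from the Scala library: an immutable linked list where every update returns a new list and the old one is shared as the tail instead of being copied or destroyed. `PersistentList` and its members are hypothetical names.

```java
import java.util.function.Function;

// Minimal persistent (immutable) singly linked list; a sketch, not Scala's List.
final class PersistentList<T> {
    final T head;                 // null only for the empty list
    final PersistentList<T> tail; // null only for the empty list
    final int size;

    private PersistentList(T head, PersistentList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <T> PersistentList<T> empty() {
        return new PersistentList<>(null, null, 0);
    }

    // Returns a NEW list; the receiver is untouched and becomes the shared tail.
    PersistentList<T> prepend(T value) {
        return new PersistentList<>(value, this, size + 1);
    }

    // Transformer: maps this collection to a new collection, element by element.
    <R> PersistentList<R> map(Function<T, R> f) {
        if (size == 0) {
            return empty();
        }
        return tail.map(f).prepend(f.apply(head));
    }
}
```

Since nothing is ever mutated, old versions stay valid and can be shared freely between threads.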
Note: runs a scala REPL in a shell inside emacs (23) - yes!
map
- good for one or two small transformers
for
- more readable for more complex transformation
Demo of An Empirical Comparison of Seven Programming Languages
(IEEE Computer 33(10):23-29 (2000)) using Eclipse.
- 20 loc scala
- also see Joshua Bloch's Java solution
Scala 2.9 parallel collections
- split work by number of processors
- each thread has work queue
- …
- See: Cascade: J. Suereth, D. Mahler @ Google
- implemented in library (compiler does not know about it)
- extensible
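As a rough analogue only, the JDK's parallel streams follow a similar split-the-work approach. The sketch below uses them as a stand-in and is not Scala's parallel collections API; `ParallelSumSketch` is an invented name. The same pipeline is run sequentially and in parallel, and the only difference in the code is one call.

```java
import java.util.stream.LongStream;

// JDK parallel streams as a stand-in; NOT Scala's parallel collections API.
class ParallelSumSketch {
    static long sumOfSquaresSequential(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    // The runtime splits the range into chunks and hands them to a pool of
    // worker threads, by default sized from the number of available processors.
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }
}
```

This only works safely because the transformation has no side effects, which ties back to de-emphasizing destructive updates.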
Future of persistent collections
- easy to use, concise, safe, fast, scalable
User of a persistent library
- easy, intuitive
Creator of a persistent library
- hard
6 4:20pm Monday - QYZ: LaTeX, R and Redis for Beautiful Analytics
- Data: Analytics and Visualization
- Tags: latex, postgresql, analytics, data_scientists, r-lang, business, reporting, redis, statistics
- Noah Pepper (Lucky Sort), Homer Strong (Qmedtrix Systems)
attendance: 40
Goal: automate reporting on medical billing data
OSS stack: PostgreSQL, R, Redis, LaTeX
See: Hadley Wickham
Considerations for reporting graphics
- vector graphics preferable to raster
- exception: many maps in single report
- ggplot2
- perfect for obsession with DRY
- encourages simplicity; discourages pie charts
- good legends
- but looking forward to ggplot3
- aligning different plots of same page
Considerations for reporting speed
- distributed task queue
- fast, simple
- most report components produced in parallel
- cache individual component outputs
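The talk's stack used a distributed task queue; the sketch below is a single-process approximation in Java, meant only to illustrate the two ideas of producing components in parallel and caching each component's output. `ReportBuilderSketch`, `render`, and the component names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// In-process approximation of "parallel components + cached outputs".
class ReportBuilderSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger renderCount = new AtomicInteger();

    // Hypothetical stand-in for an expensive step such as drawing a plot.
    private String render(String component) {
        renderCount.incrementAndGet();
        return "<" + component + ">";
    }

    // Renders components in parallel, reusing cached output when available,
    // and joins the results in the original order.
    String build(List<String> components) {
        List<CompletableFuture<String>> futures = components.stream()
            .map(c -> CompletableFuture.supplyAsync(
                () -> cache.computeIfAbsent(c, this::render)))
            .collect(Collectors.toList());
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.joining());
    }
}
```

Building the same report twice renders each component only once; the second build is served entirely from the cache.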
LaTeX
- flexible
- but poor package management
- LaTeX/R: Sweave, brew, xtable
- rolled their own
Tables and Graphics
- different views of same model
- formatting tables is frustrating - no grammar of tables
- captions can be generated to explain the view
R + SQL
- No ORM for R
Sweave/brew/xtable are good to use if you do not need LaTeX or
do not need huge flexibility
Testing
- used testthat (by Hadley)
- testing forces you to encode your methodology somewhere





