OSCON Tuesday

Posted by haroldcarr on July 26, 2011 at 5:37 PM PDT

Tuesday, 07/26/2011

1 9:00am - Git Foundations

  • Tools and Techniques
  • Tags: master_class, version_control, source_code_control, github, open_source, git, dvcs, vcs
  • Tim Berglund (August Technology Group, LLC), Matthew McCullough
    (Ambient Ideas, LLC)

attendance : 110

# List the current config
git config --global --list

# new config
git config --global user.name "Your Name"
git config --global user.email "foo@bar.baz"

# see ~/.gitconfig
# can sync that file to other machines

# force files to be LF in the repo, even on Mac/Linux
git config --global core.autocrlf input

# force windows to convert to platform on checkout
# and to LF on commit
git config --global core.autocrlf true

# editor
git config --global core.editor "emacs"

config levels (in precedence order - top highest):

  • local : config a setting in a .git repo
  • global : in the user's home dir
  • system : all users on the system

working    staging    repo
    <- checkout --------

# new git working directory
cd <somewhere>
git init project1
cd project1
git status

# add a file to repo
echo "foo" > bar.txt
git status
git add bar.txt
git status
git commit -m "commit comment"

# edit bar.txt, then:
git status
# "add" does not mean "add a new file"; it stages any activity: change, move, delete, ...
git add *.txt
git status
git commit -m "update"

# edit bar.txt, then (see: content addressable file system)
git add -p bar.txt
# verbose commit (give you lots of info)
git commit -v
# skip the staging area: commit modified tracked files directly
git commit -a
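
A minimal Python sketch of the "content addressable file system" idea noted above, assuming Git's default SHA-1 object format: a blob's id is the hash of a short header plus the file's bytes, so identical content always gets the same id. The function name git_blob_id is made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # `echo "foo" > bar.txt` writes the four bytes "foo\n".
    print(git_blob_id(b"foo\n"))
```

The printed id should match what `git hash-object bar.txt` reports for the same file.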

# view what is modified but not staged
git diff
# view what is staged but not committed
git diff --staged
# view what is modified or staged but not committed
git diff HEAD

# history
git log
git log --stat
git log -p
git log --diff-filter=A
git log --pretty=raw
git log -3

Use magit for emacs + git

ignored  untracked  tracked  tracked  tracked
                    unmod    moded    staged
                      <--- checkout-----

# to ignore - this file anywhere in tree applies downward
# ! means to keep
# <foo>/ means directory
emacs .gitignore


2 10:40am - Taming the Big Data Fire Hose

  • Data: Real-Time and Streaming
  • Tags: real_time_traveler
  • John Hugg (VoltDB)

attendance: 25

Big data is defined by:

  • velocity
  • volume
  • variety

This talk is about velocity:

  • lot of independent things happening at high frequency
  • want to update some state based on those events
  • want to query that state in real time - usually pre-defined queries
  • you want to record into persistent store after analysis
  • usually on a budget


  • finance trade, telco calls, micro tx, geo
-> ... events -> velocity --cooked events--> analytic store/engine
                  engine                           (TB+)

Velocity engine

  • validate
  • respond
  • count/aggregate
  • enrich

Started with H-Store (rethinking the RDBMS for the 21st century), which led to VoltDB.

Workshop Tuesday at 5pm at Hilton.

Tables partitioned. Partitions put on machines. Stored procedures
ordered. Serialized to machines.

Concurrency by scheduling, not locking.
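
A rough Python sketch of "concurrency by scheduling, not locking", under the assumption that each partition is owned by exactly one worker that runs procedures one at a time in arrival order. This is not VoltDB's API: Partition, Cluster, and count_event are hypothetical names, and the queue used for hand-off does its own internal synchronization. The point is only that the partition's data is never guarded by a lock.

```python
import queue
import threading
import zlib


class Partition:
    """Owns one slice of the data and runs procedures on it one at a time."""

    def __init__(self):
        self.counts = {}            # touched only by this partition's thread
        self.inbox = queue.Queue()  # arrival order defines execution order
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            work = self.inbox.get()
            if work is None:        # shutdown sentinel
                break
            procedure, args = work
            procedure(self.counts, *args)

    def close(self):
        self.inbox.put(None)
        self.thread.join()


def count_event(counts, key):
    """Hypothetical stored procedure: bump a per-key counter."""
    counts[key] = counts.get(key, 0) + 1


class Cluster:
    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def submit(self, procedure, key):
        # Route by a stable hash of the partition key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].inbox.put((procedure, (key,)))

    def close(self):
        for partition in self.partitions:
            partition.close()

    def totals(self):
        merged = {}
        for partition in self.partitions:
            merged.update(partition.counts)  # each key lives in one partition
        return merged


def demo():
    cluster = Cluster(n_partitions=4)
    for i in range(1000):
        cluster.submit(count_event, "caller-%d" % (i % 10))
    cluster.close()  # drains every inbox before returning
    return cluster.totals()
```

Because every key is routed to a single partition, two updates to the same key can never run at the same time, so the counters stay exact without any locking in count_event.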

3 11:30am - Managing Thousands of Cloud Instances with Java

  • Java: Cloud
  • Tags: cloud_computing, java, server, database
  • Patrick Lightbody (wrote books -> jive/selenium -> gomez -> BrowserMob -> Neustar Webmetrics)

attendance: 25

example use case: load test: Spin up N browsers in cloud and hit your service

  • AWS SDK for Java
  • Typica
    • supports EC2, SNS, SQS, SimpleDB, FPS, CloudWatch
  • jclouds
    • abstraction in front of lots of cloud vendors
    • API based on Google Guice


  • architect with pricing structure in mind
    • e.g., no data transfer charge between EC2 and other AWS in same region
  • minimize the number of machine images
  • use user-data to self-configure
  • understand how EBS volumes work
  • use spot instances when you can
  • pick smart inputs and boundaries for autoscaling
    • use Twilio
  • use IAM and consolidated billing
  • boot faster with EBS-backed instances
  • detect dead instances
    • telnet to port 22
  • the cloud is not infinite
  • be a good citizen
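
A small Python sketch of the "telnet to port 22" check from the list above, written with the standard library instead of a Java SDK. It assumes that an instance whose SSH port refuses or times out a TCP connection can be treated as dead. The function names and the 3 second timeout are illustrative choices, not anything from the talk.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def find_dead(hosts, port=22, timeout=3.0):
    """Return the hosts whose port did not accept a connection."""
    return [host for host in hosts if not port_is_open(host, port, timeout)]
```

A single failed probe can be a network blip, so a real monitor would probably retry a few times before terminating an instance.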

4 1:30pm - Google App Engine Workshop

  • Cloud Computing
  • Tags: chun, appengine, app_engine, core_python_, development, google_appengine, hosting, cloud, cloud_computing, computing, datastore, java, platform, nosql, scalability, business, enterprise, django, google, wesley_chun, google_app_engine, python
  • wesley chun (Google)

attendance: 110

  • SaaS
    • Google docs/spreadsheet, netsuite, IBM LotusLive
  • PaaS
    • Rollbase, GAE, Azure
  • IaaS
    • rackspace, joyent, vmware, AWS


  • build and test app locally
  • upload to GAE
  • GAE runs it - no need to worry about machines, network, storage, scalability,…
  • "we wear pagers so you don't have to"

DIY hosting

  • idle capacity, patches/upgrades, license, maintenance, traffic, …


  • scalable infrastructure
    • Linux, GFS, Bigtable, Hardware
  • language runtimes
    • python, java (Scala, JRuby, Groovy, Rhino/JavaScript, Jython, Quercus/PHP), go
    • java
      • servlet (web app container), JDO/JPA (datastore API), javax.mail, javax.cache (memcache)
  • web-based admin
    • logs, quota, data store, billing, health
  • SDK
    • run locally, deploy, versioning, …


  • BestBuy, ebay, Forbes, SocialWok, BuddyPoke, gigya, webFilings, …


  • Memcache
  • Datastore
  • URL Fetch
  • Mail
  • XMPP
  • Task Queue
  • Images
  • Blobstore
  • Users Service

5 2:20pm - Open Source Compiler Construction for the JVM

attendance: 20


  • stack-based architecture-independent VM
  • impls: Oracle/HotSpot, Apache/Harmony, OpenJDK, …


  • scala
  • has own standard library
  • runs on JVM (and .NET)

Apache BCEL

  • Emit JVM bytecode via API
  • could use other libraries besides BCEL

Compiler Architecture

  • scanner : tokenizer
  • parser : organizes tokens into Abstract Syntax Tree (AST)
  • semantic checks
  • code gen : traverse AST to produce target code

Parsing with Scala parser combinators

  • combine small functions to describe a language in pseudo-EBNF

Example: calculator BNF
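
The talk used Scala parser combinators and BCEL. As a stand-in, here is a Python sketch of the same four stages (scanner, parser, AST, code gen) for a calculator. It swaps in a hand-written recursive descent parser and emits tuples named after JVM instructions (ldc, iadd, ...) rather than real class files. The grammar and all function names are assumptions for illustration.

```python
# Assumed grammar:
#   expr   ::= term (('+' | '-') term)*
#   term   ::= factor (('*' | '/') factor)*
#   factor ::= NUMBER | '(' expr ')'

def scan(text):
    """Scanner: turn the source string into (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("num", int(text[i:j])))
            i = j
        elif ch in "+-*/()":
            tokens.append(("op", ch))
            i += 1
        else:
            raise SyntaxError("unexpected character %r" % ch)
    return tokens


class Parser:
    """Parser: organize tokens into a tuple-based AST."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take()[1]
            node = ("bin", op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.take()[1]
            node = ("bin", op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.take()
        if kind == "num":
            return ("num", value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError("unexpected token %r" % (value,))


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree


OPCODES = {"+": "iadd", "-": "isub", "*": "imul", "/": "idiv"}


def generate(node):
    """Code gen: post-order walk of the AST, emitting stack-machine code."""
    if node[0] == "num":
        return [("ldc", node[1])]
    _, op, left, right = node
    return generate(left) + generate(right) + [(OPCODES[op],)]


def run(code):
    """Tiny stack interpreter standing in for the JVM."""
    stack = []
    for instr in code:
        if instr[0] == "ldc":
            stack.append(instr[1])
            continue
        right, left = stack.pop(), stack.pop()
        if instr[0] == "iadd":
            stack.append(left + right)
        elif instr[0] == "isub":
            stack.append(left - right)
        elif instr[0] == "imul":
            stack.append(left * right)
        else:  # idiv truncates toward zero, as on the JVM
            stack.append(int(left / right))
    return stack.pop()
```

Usage: `run(generate(parse(scan("1 + 2 * 3"))))` chains all four stages. There are no semantic checks here because the language has a single integer type.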


  • VIM
  • Apache Buildr (less horrible maven)

6 3:30pm - Using jQuery with Node.js

  • Node Day
  • Elijah Insua (None)

attendance: 30


  • scraping web sites

Isn't jQuery a browser library?

  • yes, but use jsdom 0.2.1 on the server


  • tree of nodes for representing/manipulating a document
  • level 1: foundation: document node, attribute, element, append/removeChild, getElementsByTagName
  • level 2 core: namespaces, getElementById, getElementsByTagNameNS
  • level 2 events: react to events; mutation events
  • level 2 html: a, form, div, img, …
  • level 3: normalize; compareDocumentPosition; get/setUserData, lookupNamespace

DOMWindow (aka window)

  • global context for javascript
  • location, self, frames, navigator, screen, getComputedStyle, …

jsdom.jQueryify or jsdom.env

Current impl has a memory leak.

7 4:20pm - Lumberyard: Time Series Indexing at Scale

  • Data: Analytics and Visualization, Data: Hadoop, Data: NoSQL Databases
  • Tags: index, hadoop, data_scientists, hbase, nosql_nerd, timeseries, scaling_geek, search
  • Josh Patterson (Cloudera)

attendance: 40


  • time series iSAX indexing stored in HBase for persistent, scalable
    index storage

Original Motivation

  • 120 sensors, 30 samples/second, 4.3B samples/day
  • needed to find "unbounded oscillations"
  • found "SAX" by Keogh for time series data

Time Series Data : time stamp + floating point value

Speed at scale is the killer app

Unstructured data explosion

HBase: BigTable-like storage for Hadoop

  • leverages HDFS as BigTable leveraged GFS

How to query time series with SQL?

iSAX and Time Series Data

  • Indexable Symbolic Aggregate approXimation
  • discretizes curves
  • Modifies SAX to allow extensible hashing and multi-level resolution
  • similar to a b-tree
    • nodes represent iSAX words
    • internal nodes and leaf nodes
    • leaf nodes fill up until reaching a threshold
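
To make "discretizes curves" concrete, here is a Python sketch of plain SAX, the step that iSAX builds on: z-normalize the series, average it over equal-width frames (piecewise aggregate approximation), then map each average to a letter using breakpoints that split a standard normal into equally likely bands. It does not cover the iSAX extensions (per-symbol cardinality, the tree index) and it is not the jmotif or lumberyard API. The function name and parameters are illustrative.

```python
import string
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def sax_word(series, word_length, alphabet_size):
    """Return the SAX word (a short string) for a numeric time series."""
    if len(series) % word_length != 0:
        raise ValueError("series length must be a multiple of word_length")
    if not 2 <= alphabet_size <= len(string.ascii_lowercase):
        raise ValueError("alphabet_size must be between 2 and 26")

    # 1. z-normalize so the curve's shape matters, not its offset or scale.
    mu, sigma = mean(series), pstdev(series)
    normalized = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. Piecewise aggregate approximation: one average per frame.
    frame = len(series) // word_length
    paa = [mean(normalized[i * frame:(i + 1) * frame])
           for i in range(word_length)]

    # 3. Breakpoints cut the standard normal into equiprobable bands.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # 4. Each frame average becomes the letter of the band it falls in.
    return "".join(string.ascii_lowercase[bisect_left(breakpoints, value)]
                   for value in paa)
```

Two series with similar shapes tend to produce the same or nearby words, which is what makes the words usable as index keys.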

100 million samples indexed, 1/2 TB of data

  • linear scan: 1800 minutes
  • exact iSAX: 90 minutes
  • approximate iSAX: 1.1 seconds


  • jmotif implements core iSAX
  • lumberyard implements storage backend in HBase
  • index size now scales up to TB

For fast fuzzy query lookups that don't need an exact match

