
@jmgimeno
Forked from philandstuff/euroclojure2014.org
Created June 26, 2014 17:26
Revisions

  1. @philandstuff philandstuff revised this gist Jun 26, 2014. 1 changed file with 456 additions and 4 deletions.
    460 changes: 456 additions & 4 deletions euroclojure2014.org
    @@ -436,9 +436,9 @@
    - *you should embed /deeply/ into clojure*
    ** links
    - http://twitter.com/otfrom
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.core
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.bruce
    - http://www.complexityexplorer.org/
    - http://cljsfiddle.net/fiddle/thattommyhall.ants.core
    - http://ccl.northwestern.edu/tortoise/2013-10-25/Ants.html
    @@ -817,10 +817,462 @@
    - in general, if you leave it running over time and let the
    evidence build, it should be fine in the long run
    - but that is definitely a flaw
    * Tommi Reiman, Schema and Swagger to improve your web APIs
    ** super simple web api in clojure
    - just using compojure
    - "sausage" as example data
    - ~PUT /foo/sausage/:id~
    - example:
    - in Java: immutable value object
    - in Scala: case class
    - in Clojure:
    - free-form map?
    - constructor fn with bunch of validation?
    - prismatic/schema!
    ** prismatic schema
    - define structure of sausage
    - then call ~s/validate~ to validate
    - schema can define functions
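
A minimal sketch of that pattern (the exact `Sausage` shape here is an assumption, not the talk's actual model):

#+begin_src clojure
(require '[schema.core :as s])

;; hypothetical shape for the example data ("sausage")
(def Sausage
  {:id   Long
   :name String})

(s/validate Sausage {:id 1 :name "Bratwurst"})  ;; returns the value unchanged
;; (s/validate Sausage {:id "x"}) would throw an ExceptionInfo
#+end_src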

    #+begin_src clojure
(s/defn get-sausage :- (s/maybe Sausage)
  [id :- Long]
  (@sausages id))

(s/defn ^:always-validate get-sausage2 :- Sausage
  [id :- Long]
  (@sausages id))
    #+end_src

    *** schema coercion

    #+begin_src clojure
(defmodel Pizza
  {:id Long
   :name String
   :price Double
   :hot Boolean
   (s/optional-key :description) String
   :toppings #{(s/enum :cheese :olives :ham :pepperoni :habanero)}})
    #+end_src
    - allows slurping JSON data, but imposing extra types
    - eg above we can slurp toppings from a JSON array into a Clojure
    set rather than a vector
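
A hedged sketch of that coercion step, reusing the ~Pizza~ model above (it assumes ~schema.coerce~'s JSON matcher covers the vector→set and string→enum-keyword coercions):

#+begin_src clojure
(require '[schema.coerce :as coerce])

(def json->pizza (coerce/coercer Pizza coerce/json-coercion-matcher))

;; a JSON array of strings becomes a Clojure set of keywords
(json->pizza {:id 1 :name "Quattro" :price 8.5 :hot false
              :toppings ["cheese" "ham"]})
#+end_src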

    *** double schema

    - loose schema for first input
    - ~(def Customer {...})~
    - tighter schema for validated input
    - ~(def ValidCustomer (merge Customer {...}))~

    *** schema selectors

    - accept but remove unrecognised params with ~select-schema~

    *** generative schema

    - generate random orders for test data
    - davegolland/generative-schema.clj

    *** contribs

    - sfx/schema-contrib
    - cddr/integrity
    ** swagger
    - a specification for describing, producing, consuming, visualising RESTful web services
    - https://helloreverb.com/developers/swagger
    - existing adapters
    - clojure options:
    - octohipster
    - swag
    - ring-swagger
    - compojure-api
    - fnhouse-swagger
    - endpoint definitions in JSON
    - data models as a JSON Schema
    - swagger UI
    - visualises the API
    - code gen
    - no clojure support yet (anyone?)
    - swagger-socket
    - run it all on top of websockets
    ** ring-swagger
    - https://github.com/metosin/ring-swagger
- JSON Schema has date formats
- but prismatic/schema will never support dates directly, as it's
more generic
    - higher level abstractions on top of swagger, but nothing for the
    web developer
    ** compojure-api
    - an extendable web api lib on top of compojure
    - macros & middleware with good defaults
    - schema-based models & coercion
    - ~GET*~ macro to define input and output schemas
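
A sketch of what such a route can look like (compojure-api 0.x-era syntax; the ~Pizza~ model and the ~get-pizza~ handler are assumptions):

#+begin_src clojure
(require '[compojure.api.sweet :refer [GET*]]
         '[ring.util.http-response :refer [ok]])

(GET* "/pizzas/:id" []
  :return      Pizza            ;; output schema, used for swagger docs & coercion
  :path-params [id :- Long]     ;; input schema for the path parameter
  :summary     "fetch one pizza"
  (ok (get-pizza id)))
#+end_src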
    ** fnhouse-swagger
    - prismatic/fnhouse
    - launched at clojure/west
    - ~defnk~ with metadata → annotated handler
    - fnhouse-swagger
    - metosin/fnhouse-swagger
    ** summary
    - schema is an awesome tool
    - describe, validate, coerce your data
    - building on top of ring-swagger
    - compojure-api → declarative web apis
- fnhouse-swagger → meta-data done right
    - or do your own!
    - kekkonen.io
    - CQRS-lib
    * Renzo Borgatti, The Compiler, the Runtime and other interesting beasts from the clojure codebase
    - http://twitter.com/reborg
    ** an amazing growth:
    - mar 2006: first commit
- oct 2006: 30k loc (7 months old)
    - oct 2007: clojure announced!
    - oct 2008: invited to Lisp50 to celebrate 50 years of lisp
    - May 2009: 1.0 + book!
    - now: almost 90k loc
    ** initial milestones
    - apr 06: lisp2java sources
    - may 06: boot.clj appears
    - may 06: STM first cut
    - june 06: first persistent data structure
    - sep 06: java2java sources
    - aug 07: java2bytecode started
    - right after: almost all the rest: refs, lockingtx
    ** drew on lots of sources of knowledge
    - collection of papers
    ** high-level view:
    - ~(def lister (fn [& args] args))~
- read → analyse → emit → eval/compile
    - although the lines between the stages get blurred at times
    ** reader
    - takes stream, returns data structures
    - PersistentList, Symbol, etc
    ** analyser
    - input: data structure
    - output: exprs
    - DefExpr
    - Var
    - FnExpr
    - Sym
    - PersistentList
    - FnMethod
    - LocalBinding(Sym("args")),
    - BodyExpr
    - PersistentVector
    - LocalBindingExpr
    ** Emission
    - bytecode generation for Exprs
    - prerequisite for evaluation
    - emit() method in Expr interface
    - Notable exception: called over ??
    ** Evaluation
    - transform Exprs into their "usable form"
    - eg
    - new object
    - a var
    - namespace
    - FnExpr is just getCompiledClass().newInstance
    ** Compilation
    - Usually coordination for emit
    - Compiler.compile namespace -> file
    - ...
    ** Emit
    - input: Exprs
    - output: bytecode
    ** monsters!
    *** RT
    - this is how the RT class gets initialised: the first time it gets
    referenced:
    #+begin_src java
    final static private Var REQUIRE = RT.var("clojure.core", "require");
    #+end_src
    - simply referring to it here causes the static initializers to run
    - RT has a *lot* of behaviour in static initializers
    - inside it is the ~doInit();~ call
    - which loads all of ~clojure.core~
    - all just from referring to RT in some otherwise unrelated class!
    *** Compiler
    - inner classes for each Expr type
    *** LispReader
    - inner classes for each token you might encounter
    - ~<clinit>~
    - sets up reader macros
    - ~macros~ and ~dispatchMacros~ (latter for ~#{~ ~#(~ ~#_~ ~#^~ etc)
    *** analyze()
    - not a class, but a family of methods
    - ~analyzeSeq~
    - ~new ConstantExpr~
    - ~MapExpr.parse~
    - FnExpr.parse
    - invokes the compiling phase during parsing phase
    *** emission
    - ASM lib used to generate bytecode
    - FnExpr.emitMethods()
    - generate a method for each of the arities of the function
    *** other beasts
    - LockingTransaction and Ref
    ** DynamicClassLoader
    - ~clojure.lang.DynamicClassLoader.findClass(String)~
    - ~RT.classForName()~
    - ~Compiler$HostExpr.maybeClass()~
    - Class.forName() goes up the hierarchy of classloaders and asks
    each what they know
    - an instance of DynamicClassloader is created for each namespace
    - and also for each form
    - (this is true for the bootstrap phase; not always true eg in
    AOT (ahead-of-time) compilation)
    - supporting dynamicity
    - in defineClass:
    - ~classCache.put(name, new SoftReference(c,rq));~
    - in findClass:
    - ~Reference<Class> cr = classCache.get(name);~
    - SoftReferences are used to save PermGen, since if we redef a
    var we don't want it to keep consuming PermGen
    ** Bonus: clojure was initially implemented in lisp
    - ~1600 loc to implement read, analyse, compile, eval
    - although emitting Java code, not bytecode
    - was also generating C♯
    ** Q: some things in bytecode can't be expressed in java
    - is there anything which clojure generates which can't be
    decompiled back to Java?
    - I'm pretty sure yes, but not sure exactly what
    - Rich:
    - locals-clearing
    - constructs which use goto (which exists in bytecode but not
    Java)
    * Rich Hickey, the insides of core.async channels
    ** aside: here's what clojure looks like in a good IDE
    - (ie IntelliJ)
    - yes, Compiler.java is massive
    - but if your IDE has a structure editor, you can navigate them
    all easily
    - it's all in one file because I don't want 300 files
    ** aside2: the classloader has a cache in a branch
    - fast-load branch
    ** warning! implementation details ahead
    - subject to change!
    - informational only
    ** the problems
    - single channel implementation
    - for use from both dedicated threads and go threads
    - simultaneously, on same channel
    - alt and atomicity
    - Java CSP libraries often didn't support alt well
    - it's tricky to do atomically
    - multi-reader/multi-writer
    - concurrency
    - construct deals with the ick of threads and mutexes
    - (this talk: focus on JVM impl; JS version has less of these
    issues)
    ** API
    - ~>!~ ~>!!~ ~put!~ ~alt!~ → channel → ~<!~ ~<!!~ ~take!~ ~alt!~
    - it's not an RPC mechanism, it's just a conveyor belt
    ** SPI (service provider interface)
    - ~>!~ ~>!!~ ~put!~ ~alt!~ → ~impl/put! [val handler]~ → channel →
    ~impl/take! [handler]~ → ~<!~ ~<!!~ ~take!~ ~alt!~
    ** anatomy
    - channel has:
    - pending puts (fifo)
    - a buffer (optional) in the middle
    - contains data
    - pending takes (fifo)
    - flag indicating if channel is closed
    - fifos implemented as linked queues
    - important to distinguish queues of operations from buffer of data
    ** invariants
- never pending puts and pending takes simultaneously
- never pending takes while there's anything in the buffer
- never pending puts while there's room in the buffer
    - take! and put! use channel mutex
    - no global mutex
    - or even multi-channel mutex
    ** put! scenarios
    1. one or more waiting take! operations
    - gets paired up, takes handler gets completed
    2. stuff in the buffer, but with room in buffer
    - puts its stuff in the buffer, succeeds and immediately
    completes
    3. buffer full (or no buffer)
    - enter puts queue, block
    - results in backpressure
    4. full buffer, but windowed
    - sliding buffer: latest information takes priority, drop head
    of buffer (oldest item in fifo), put! completes immediately
    and enters buffer
    - dropping buffer: drop put! on floor, but completes immediately
    - could have more sophisticated policies in future
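
The windowed-buffer behaviour above can be seen directly at the REPL (a small illustration, not from the talk):

#+begin_src clojure
(require '[clojure.core.async :as a])

;; sliding buffer of size 2: oldest value is dropped, put never blocks
(let [c (a/chan (a/sliding-buffer 2))]
  (a/>!! c 1) (a/>!! c 2) (a/>!! c 3)
  [(a/<!! c) (a/<!! c)])
;; → [2 3]

;; dropping buffer of size 2: the newest put is dropped on the floor
(let [c (a/chan (a/dropping-buffer 2))]
  (a/>!! c 1) (a/>!! c 2) (a/>!! c 3)
  [(a/<!! c) (a/<!! c)])
;; → [1 2]
#+end_src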
    ** take! scenarios
    1. nothing in buffer
    - enqueued
    2. buffer has stuff, but no puts waiting
    - get data, immediately complete
    3. buffer full (or no buffer), puts pending
    - get something (either head of buffer or get paired with first
    put!)
    - first waiting put! completes (either enters buffer or hands
    directly to take!)
    ** close! scenario
    - all pending takes complete with nil (closed)
- subsequent puts complete with false (already closed) (relatively
new)
    - subsequent takes consume ordinarily until empty
    - any pending puts complete with true
    - takes then complete with nil
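
The close! rules above, at the REPL (an illustration, not from the talk; note that a put on an already-closed channel completes with false):

#+begin_src clojure
(require '[clojure.core.async :as a])

(let [c (a/chan 2)]
  (a/>!! c :x)          ;; buffered before close
  (a/close! c)
  [(a/<!! c)            ;; :x    — drains ordinarily
   (a/<!! c)            ;; nil   — empty and closed
   (a/>!! c :y)])       ;; false — put on a closed channel
;; → [:x nil false]
#+end_src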
    ** queue limits
    - puts and takes queues are not unbounded either
    - 1024 pending ops limit
    - somewhat arbitrary, might change
    - will throw if exceeded
    - if you're seeing this, it's an architecture smell
    - most likely if you use put! on the edge of your system
    ** alt(s!!)
    - attempts more than one op
    - on more than one channel
    - without global mutex
    - nor multi-channel locks
    - exactly one op can succeed
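
A small ~alts!!~ illustration (not from the talk): it returns a ~[value port]~ pair for whichever single op completed:

#+begin_src clojure
(require '[clojure.core.async :as a])

(let [c1 (a/chan 1)
      c2 (a/chan 1)]
  (a/>!! c2 :hello)               ;; only c2 has a value ready
  (let [[v p] (a/alts!! [c1 c2])]
    [v (= p c2)]))
;; → [:hello true]
#+end_src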
    *** implications
    - registration of handlers is *not* atomic
    - completion might occur before registrations are finished, or any
    time thereafter
    - completion of one alternative must 'disable' the others
    atomically
    - cleanup
    ** handlers
    - wrapper around a callback
    - callbacks are icky, so we want to hide them
    - SPI
    - active?
    - commit → callback-fn
    - lock-id → unique-id
    - ~java.util.concurrent.locks.Lock~: lock, unlock
    ** take/put handlers
    - simple wrapper on callback
    - lock is no-op
    - lock-id is 0
    - active? always true
    - commit → the callback
    ** alt handlers
    - each op handler wraps its own callback, but delegates rest to
    shared "flag" handler
    - flag handler has lock
    - a boolean active? flag that starts true and makes one-time
    atomic transition
    - commit transitions shared flag and returns callback
    - must be called under lock
    ** alt concurrency
    - no global or multi-channel locking
    - but channel does multi-handler locking
    - some ops commit both a put and a take
    - lock-ids used to ensure consistent lock acquisition order
    - (avoids deadlock)
    ** alt cleanup
    - "disabled" handlers will still be in queues
    - channel ops purge
    ** SPI revisited
    - handler callback only invoked on async completion
    - only 2 scenarios
    - when not "parked", op happens immediately
    - /callback is not used/
    - /non-nil return value is op return/
    - only time ops park
    - put! when it gets blocked on full buffer
    - take! when it gets blocked on empty buffer
    - only time ops complete asynchronously
    - take! with pending puts
    - put! with pending takes
    ** wiring !/!!
    - blocking ops (!!)
    - create promise
    - callback delivers
    - only deref promise on nil return from op
    - non-nil indicates immediate success (and so callback never
    gets called)
    - parking go ops (!)
    - IOC state machine code is callback
    ** summary
    - you don't need to know any of this
    - but understanding the "machine" can help you make good decisions
    ** Q: why use alt! for putting? what's rationale?
    - taking multiple channels is like a select(2)
    - when you have consumers of different capabilities
    - I want to try to write to everyone, but whenever the first one
    is ready, I give it to them
    - Q: what's the difference between that and having four consumers
    on a single channel?
    - you might have a priority metric, or a cost metric
    - though yes sometimes you can achieve same result two
    different ways
    ** Q: why is global or multi-channel mutex not good enough?
    - well it would be easy! :)
    - a global mutex could make registration atomic
    - you'd have to make disabling other alts atomic
    - you'd have to make rendezvous atomic
    - you could have two unrelated sets of channel operations, why
    should they contend?
    - people hate global locks
- ruled out by my aesthetic sense :)
    ** Q: David Nolen had an example of 10000 go blocks updating a textarea, did he hit the 1024 limit?
    - no I don't think so, but not sure exactly
    ** Q: are buffer & queue sizes useful metrics to monitor?
    - that would be great, and making them monitorable is on the TODO
    list
    ** Q: other possible extensions?
    - buffer policies
    - you might have logic about priority
    - core.async has proven its utility and it's become important
    - ~go~ macro is a great PoC of what you can do with a macro with
    several kLoC behind it
    - has its own subcompiler inside it
    - kind of implements a subset of clojure
    - maybe build async support into the compiler?
    - move locals from the stack to fields on the method object
    - I don't need the stack anymore
    - I can be paused and resumed on another thread
    - declare a fn as async
    - comply with this SPI
    - could build other things like generators & yield
    - the pride moment of "look you can do this with a macro" is not
    dominated by the desire to make this performant and more solid
    - Q: continuations? how do they differ?
    - continuations are more general
    - this won't use continuation-passing-style
    - it's related
    - it won't be like call/cc
    - it won't be first-class
    - you won't be able to resume it more than once
    - for a specific set of use-cases
- Oleg gave a talk arguing that generators alone are enough to do
things people think need a lot more
    ** Q: is there something planned for dynamic binding and the ~go~ macro?
    - there are fns which allow you to do the conveyance
    - don't know if ~go~ allows all of them to work
    ** Q: channels on the network?
    - it's easy to have something you call a channel and put over a wire
    - pretty hard to have all the semantics of these channels over the
    wire
    - already have queues and all sorts of interfaces to do similar
    things
    - atomic alt! over more than one wire not going to happen
    - maybe semantics for ports
    - or limitations on alt!
    - the wire has its own semantics, this is the key thing here
    - failure, queueing, delays
    - really easy to just take something from the wire and call put!
    ** Q: is there a typical way to monitor a go block?
    - what kind of monitoring?
    - /see that it's still working, still alive?/
    - if the channels were monitorable, you could see if things were
    producing/consuming properly
    ** Q: what other options did you consider & reject in the design of core.async
    - something other than CSP?
    - the generators stuff
    - continuations
    - I liked what golang did
    - they made a good choice
    - there's a java csp lib that impls the same kinds of ops
    - it's difficult to get the semantics correct
    - wanted ~alts!~ to be a regular fn, not syntax
    - which feels like an enhancement over go
    - what we're putting on these channels is immutable
    - which gives extra robustness
  2. @philandstuff philandstuff revised this gist Jun 26, 2014. 1 changed file with 369 additions and 0 deletions.
    369 changes: 369 additions & 0 deletions euroclojure2014.org
    @@ -455,3 +455,372 @@
    - otherwise this is all a recent exploration
    - errors in cljsfiddle are not reported well
    - again problematic for day zero
    * Mathieu Gauthron, JVM-breakglass
    - using a clojure REPL to troubleshoot live java/JVM processes
    - http://slides-euroclojure2014.matlux.net
    - when you see fire, you break glass
    - when your jvm process is on fire, you use JVM-breakglass
    ** troubleshooting a java application
    - debugger
    - only powerful when you can narrow down the problem to a series
    of breakpoints
    - when the problem is a race condition, it will change the nature
    of the problem you're studying
    - log/print statements
    - you need to plan before compilation
    - when the problem is in production, it might be too late
    - jmx
    - again, you need to plan for it in advance
    - ad-hoc interactive mechanism
    ** what is jvm-breakglass
    - open source
    - integrates with any jvm process
    - console onto a jvm process
    ** main features
    - interactive prompt
    - see inside private members
    - call arbitrary methods
    - create new object instances
    - create new classes
    - monitor object state
    - no need to use clojure to develop the app
    ** how does it work?
    - jvm-breakglass runs inside the JVM and starts an nrepl server
    - you can then connect using an nrepl client (eg lein)
    ** how to use it?
    - add it to your maven dependencies
    - add an entry point (as a ~<bean>~ or in java code)
    - connect with ~lein repl :connect localhost:1112~
    ** demo (enterprise application)
    - tomcat JVM
    - employee/dept data structure
    - report generation
    - java/spring mvc webapp
    - jvm-breakglass
    - spring data
    - in XML, naturally
    *** homepage
    - oh no! one of the reports isn't working?
    - "list employees in london" is empty
    - but we know that employee Mick Jagger lives in london
    - what's going on?
    *** breakglass to the rescue
    - view environment:
    - current directory, System/getProperties
    - view conf directory
    - list all loaded Spring beans
- introspect into object private members
    - ~bean~ builtin fn
    - ~to-tree~ to do so recursively
    - view methods or fields for a given object
    - redefine a class
    - in this case, ~(proxy [Address] ["1 Mayfair", "SW1", "London"]
    (getCity [] "London"))~ to define the new version, overriding
    a method
    - ~(.setAddress (:Mick employees) address)~ to inject it into
    the live data
    ** take a step back
    - remember what it's like to be a java programmer?
    - working with jmx beans and suchlike to try to understand why
    production is down
    - this stuff looks like magic
    ** Q: how do you convince production people to put nrepl server in place?
    - short answer: impossible
    - that's not how you present it
- either you do it sneakily (that's bad), and only pull the trump
card when the team is desperate
    - or you convince the team that it would be useful in the UAT
    environment, and "of course it's never going to be used in
    production"
    ** Q: have you considered a high-level switch that would prevent you mutating anything in the host application?
    - don't know how you'd be able to do that
    - have been thinking about it
    - maybe using clojail
    - kind of defeats the point
    ** Q: have you tested this with a scala app?
    - haven't tried
    - I've reverse-engineered the java bytecode, and it's readable
    - as long as you know how it compiles, it seems reasonable
    ** Q: you were using methods like get-obj and passing string name. how does breakglass know which object to get?
    - eg if you have multiple instances of Department, how does it know *which* department?
    - in Spring it's a Spring bean which is named
    - if you're not using Spring, what's your entry point?
    - when you create your NreplServer to enable jvm-breakglass,
    you can add your entry points there
- ~new NreplServer(port).put("department", myObject);~
    - static methods & fields can be used too
    * Gary Crawford, Using Clojure for Sentiment Analysis of the Twittersphere
- /Leiningen Versus the Ants/, Carl Stephenson
    - leiningen versus apache ant?
    - clojure versus java?
    - FP versus OO?
    ** stratified medicine
    - determine the best treatment for someone based on their genetic
    makeup to manage their chronic disease
    ** sentiment analysis
    - Paper: "Twitter mood predicts the stock market"
    - predicted Dow Jones average through monitoring tweets
- people who suffer chronic disease tend to be immunocompromised
    - what would normally be a minor illness can prove fatal
    - can we use twitter to predict spread of disease?
    ** so we tried
    - score tweets for flu symptoms
    - the data science wasn't very difficult
    - but scaling it was
    - 30 million geo-tagged tweets sent from UK
    - couldn't scale, even with
    - HDFS/hadoop
    - mongo/aggregation
    - mongo/mapreduce
    - postgres
    ** how can we do fast, real-time analytics of social media?
    - application: how do people feel about Scotland's independence
    referendum?
    - data increases in value as we analyse it
    - tweets
    - analytically prepared data
    - analysis
    - insight
    - predictions
    - the raw data isn't what you care about
    - don't store the raw tweets, only store the analytically prepared
    data
    - stored in redis using ptaoussanis/carmine
    - it has great support for bitmaps
    ** example
    - ~(car/setbit sentiment tweet-id 1)~
    - ~(car/bitcount "SCOTLAND")~ -- tells me how many tweets have
    mentioned Scotland
    - how many people in england are happy?
    #+begin_src clojure
(wcar*
  (car/bitop "AND" "ENGLAND&JOVIALITY" "ENGLAND" "JOVIALITY")
  (car/expire "ENGLAND&JOVIALITY" 10) ;; don't keep the derived data longer than 10 seconds
  (car/bitcount "ENGLAND&JOVIALITY"))
    #+end_src

    - further: "how many people in Scotland are tired or grumpy?"
    ** getting the data in
    - adamwynne/twitter-api
    - you can specify you only want tweets from a certain geographical
    locality with a bounding box
    - but this is literally a rectangle
    - need it around Europe
    - LMAX-Exchange/disruptor to communicate
    - journaling
    - syncing
    - business logic
    *** what sentiment?
    - this is hard!
    - "I'm loving #EuroClojure! :D"
    - Positive Affect: enthusiastic, active, alert
    - Negative Affect: subjective distress
    - actually two separate dimensions, not opposites
    - Watson et al, 1988
    - PANAS
    - then PANAS-x
    - then PANAS-t
    - accounts for bias on social media
    - outlines sanitisation
    - validate against 10 real events
    **** sanitisation
    - https://github.com/dakrone/clojure-opennlp
    - get rid of spam
    - account for text speak
    - account for emoticons and emoji
    - word stemming (or lemmatisation)
    - part of speech tagging
    *** where? reverse geocoding
    - don't want to rely on external services
    - don't want heavy IO
    - don't want round trips to database
    - accuracy not too much of a concern
    - we already lose accuracy in interpreting the sentiment of the
    tweet
    - convert a map of the uk to colours:
    - look up geocode coords in map
    - check colour → get country code
    - problem: the world is a sphere
    - projecting a sphere onto a rectangle
    - prior art in d3.js
    - use JavaFX to exploit it
    *** when?
    - there's a lot of seconds in a day
    - and even more seconds in a year
    - really not interested in seconds anyway
    - want to group tweets by minute
    - and also group by hour
    - and also group by day, and month, and year
    ** why?
    - why are we doing this?
    - online social media are surveillance
    - the line between public and private is becoming blurred
    - if we don't need data, we shouldn't collect it
    - in this example:
    - we're never more granular than country
    - we're never more granular than overall sentiment
    - we're never more granular than minute
    - hopefully this is enough to prevent anyone being identified
    - Datensparsamkeit
    ** Q: have you used Storm for this?
    - no
    ** Q: any preliminary results on the Scotland referendum analysis?
    - I've had more luck with tech than data science?
    ** Q: which way should we vote?
    - haha
    ** Q: how do you verify your results?
    - it's very crude at the moment?
    * Paul Ingles, Multi-armed Bandit Optimisation in Clojure
    - @pingles
    ** problem statement
    - product optimisation cycles are long, complex, and inefficient
    - the multi-armed bandit model shows lots of things we're getting
    wrong
    - eg: online newspapers
    - fundamentally human-led, editorially-led
    - people behave irrationally
- Dan Ariely & Daniel Kahneman
    - (@philandstuff suggestion: Stuart Sutherland, Irrationality)
    - economist subscription options
    1. online $59
    2. print $125
    3. print & online $125
- the ridiculousness of option 2 makes option 3 seem more
reasonable
    - need machines to optimise at scale; but need humans to provide
    stuff only they can
    - running RCTs to optimise sites
    - doing so on a continuing basis
- measuring big effects works with small numbers of participants
    - but measuring small effects requires ever larger numbers
    - to the extent that you can only run ~12 experiments a year
    - which is not really good enough
    ** Bandit strategies can help
    - a product for procrastinators by a procrastinator
    - Product: Notflix!
    - video website
    - http://notflix.herokuapp.com/
    - shows 3 different videos
    - show good videos at top of page, and less good at bottom
    - show best possible thumbnail for each video
    - optimising with multi-armed bandits
    - optimising order and thumbnails
    ** multi-armed bandit problem
    - slot machine = one-armed bandit
    - problem: you have a bunch of money you want to "invest" in a
    casino
    - you have a number of different machines to play
    - each machine has a different probability of reward
    - you don't know what that probability is up front
    - need to balance "exploration" and "exploitation"
    - ie learning about the world vs using that knowledge to maximise
    income
    - analogy: trying new foods out vs sticking to what you like
    ** bandit model
    - number of *arms* {1, 2, ..., /K/ }
    - number of *trials*: 1, 2, ..., /T/
    - *rewards*: {0,1}
    - /K/-headlines
    - options of different text
    - /K/-buttons
    - options of button text, colour, etc
    - /K/-pages
    - whole page redesigns
    - explore this space with notflix

    ** bandit strategy

    #+begin_src clojure
    ;; choose which arm to pull
    (defn select-arm [arms]
    ...)

    ;; update arm with feedback
    (defn pulled [arm]
    ...)
    (defn reward [arm x]
    ...)

    (defrecord Arm [name pulls value])
    #+end_src

    *** ε-greedy
    - "hello world" algorithm
    - generally exploit
    - ε (epsilon) is the rate of exploration
    - eg if ε = 0.1, your strategy is:
    - with probability 10%, try a random arm with equal
    probability
    - with probability 90%, try the best arm based on current
    knowledge
    - if ε = 0, always exploit; if ε = 1, always explore
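
The strategy above can be sketched in a few lines (a hedged sketch against the talk's ~Arm~ record; using ~:value~ as the running reward estimate is an assumption):

#+begin_src clojure
;; ε-greedy arm selection: explore with probability ε, else exploit
(defn select-arm [epsilon arms]
  (if (< (rand) epsilon)
    (rand-nth (vec arms))            ;; explore: uniform random arm
    (apply max-key :value arms)))    ;; exploit: best current estimate
#+end_src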
    - example with bernoulli-bandit
    #+begin_src clojure
    (bernoulli-bandit {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9})
    #+end_src

- with ε=0.2, you converge faster on the best arm
- but with ε=0.1, you exploit it more once you've found it
    - once you've found the best arm, you should be able to double down
    - ie explore more at the beginning (when you have least
    knowledge) and less at the end
    - lots of extensions to ε-greedy to factor things like this in

    *** Thompson sampling

    - Arm model
    - Θ_k: Arm k's hidden true probability of reward (in range
    [0,1])
    - can build a distribution for Θ_k based on current knowledge
    - small number of pulls means wide distribution; large number
    means narrow distribution
    - captures uncertainty in value of Θ_k
    - each iteration, take a random sample from each distribution,
    take the largest sample
    - algorithm naturally balances exploration/exploitation
    trade-off
    - the more it learns, the narrower the distributions get, and so
    the more likely it is to choose an arm with a higher expected
    value
    - incanter example
    - Thompson-sampling example with same Bernoulli-bandit from above
    - compared with ε-greedy, explores much more much earlier, and
    exploits much more later on
    - considered optimal convergence
    - we can use it to rank things (not just select)
    - take a sample from each arm distribution, then order arms by
    that value
    - in notflix, can use for ordering the videos we show
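
A hedged sketch of Thompson sampling for Bernoulli arms (Beta posteriors; the per-arm success/failure bookkeeping and the incanter API usage here are assumptions, not the talk's code):

#+begin_src clojure
(require '[incanter.distributions :as dist])

;; each arm keeps success/failure counts; its posterior is Beta(s+1, f+1);
;; sample once from each posterior and pull the arm with the largest sample
(defn select-arm [arms]
  (apply max-key
         (fn [{:keys [successes failures]}]
           (dist/draw (dist/beta-distribution (inc successes)
                                              (inc failures))))
         arms))
#+end_src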
    ** applied to notflix
    - video rank bandit
    - for each video, a thumbnail bandit
    - at the end, the best video should be at the top
    - and each video should show the best thumbnail
    ** results
    - videos, worst to best
    - "hero of the coconut pain"
    - "100 Danes eat 1000 chillies"
    - "3 year-old with a portal gun"
    - thumbnail bandit data
    - "we built a fictional but /amazing/ product"
    ** links
    - [bandit/bandit-core "0.2.1-SNAPSHOT"]
    - https://github.com/pingles/bandit
    ** Q: this model assumes arms keep the same reward probability over time
    - can it readapt?
    - Thompson sampling does adapt
    - it won't change back as quickly
    ** Q: isn't there an interaction between the two bandits?
    - if the thumbnail is crappy, they might not click the video
    - made an assumption about this
    - in general, if you leave it running over time and let the
    evidence build, it should be fine in the long run
    - but that is definitely a flaw
    * Tommi Reiman, Schema and Swagger to improve your web APIs
    ** super simple web api in clojure
    ** prismatic schema
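    - e.g. (a sketch, not the talk's actual code: field names are
      invented, and it needs the prismatic/schema dependency)
    #+begin_src clojure
    (require '[schema.core :as s])

    (s/defschema Sausage
      {:id     s/Int
       :name   s/Str
       :origin {:country (s/enum :FI :DE)}})

    ;; returns the value when it validates, throws ExceptionInfo when not
    (s/validate Sausage {:id 1 :name "bratwurst" :origin {:country :DE}})
    #+end_src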
    ** swagger
    ** ring-swagger
    ** compojure-api
    ** fnhouse-swagger
  3. @philandstuff philandstuff created this gist Jun 26, 2014.
    #+TITLE: EuroClojure 2014, Krakow

    * Fergal Byrne, Clortex: Machine Intelligence based on Jeff Hawkins' HTM Theory
    - @fergbyrne
    - HTM = Hierarchical Temporal Memory
    ** big data
    - big data is like teenage sex
    - noone knows how to do it
    - everyone thinks everyone else is doing it
    - so everyone claims to be doing it
    - (Dan Ariely)
    ** machine learning is important
    - people don't trust other people
    - they have their own agendas
    - so they place too much trust in machines
    ** asimov's take
    - we gain knowledge faster than we gain wisdom
    - applies to human knowledge
    - applies to data: gathering data is easy, drawing conclusions is
    not
    ** a problem in neuroscience
    - rate of papers published is growing exponentially
    - 2013: 1 every 32 minutes
    - 2014 so far: 1 every 17 minutes
    ** can AI learn from neuroscience?
    ** Jeff Hawkins' goals in HTM
    - Study the neocortex and establish its principles
    - open sourced NuPIC in 2013
    ** neocortex
    - the wrinkly part at the surface of the brain
    - grey matter: processing
    - white matter: wiring
    - about 2mm thick, 10cm^2 in area
    - 30-50MM neurons
    - 1G connections
    - *hierarchical*
    - *uniform*
    - ie all looks physically the same
    - all regions have the same algorithm
    ** 6 key principles
    *** on-line learning from streaming data
    - up to 10 million senses feed the brain
    - we don't (can't) store this data
    - we build models from live data
    - models constantly updated
    *** hierarchy of regions
    - sensory data enters at the bottom
    - models are built in every region
    - things change more slowly as you go up
    - hierarchy enables sequences of sequences
    - seq of waves
    - seq of phonemes
    - seq of words
    - seq of sentences
    - hierarchy works upwards and downwards
    *** sequence memory
    - all sensory data involves time
    - sequence memory allows predictions
    - structure in data elaborated over time
    - sequences can be c
    *** sparse distributed representations
    - in each region, many neurons, few active
    - SDRs represent spatial patterns
    - fault-tolerant, semantic ops, high-capacity
    - key to understanding & building intelligent systems
    *** all regions are both sensory and motor
    - behaviour provides context for sensory data
    - structure in model navigated via behaviour
    *** attention
    - use attention to manage the neocortex
    - planning and previsualisation
    - whole subhierarchies can be switched on and off
    ** layers of neocortex
    - from molecular upwards
    - around 5 or 6
    *** neurons
    - distal dendrites detect coincidence of incoming activity from
    neighbouring cells
    - you don't just see what you're seeing now, you predict what
    you're going to see next
    - (reality is much more complicated, but this algorithm is
    sufficient to explain a lot)
    ** clortex
    *** background: numenta's nupic
    - in dev since 2005
    - partially implements HTM/CLA
    - python/c++
    - open source
    **** strengths
    - skilled dev team
    - eat their own dog food (grok uses nupic)
    - operates on subset of HTM/CLA principles
    - tunable using swarming on your data
    - works well on streaming scalar data (eg machine-generated)
    - great community -- http://numenta.org
    **** limitations
    - codebase has evolved as theory has developed
    - difficult/scary to rewrite for flexibility
    - OO with large, coupled, classes (~1500 LoC per class)
    - need to swarm to find parameters, no real-time control
    - not easy to extend beyond streaming scalar use case
    *** clortex requirements
    - directly analogous to HTM/CLA theory
    - transparently understandable source code
    - a neuroscientist should be able to read & review code
    - directly observable data
    - sufficiently performant
    - useful metrics
    - appropriate platform
    - portability
    - scalability
    *** architectural simplicity
    - first role: be useful!
    - best software is that which is not needed at all
    - human comprehension is king
    - if people can't understand your code, your code is not
    finished
    - unit tests are not sufficient in themselves
    - machine sympathy is queen
    - software is a process of R&D
    - software development is challenging & intellectual
    - more science than engineering
    - engineering: you have a good model already, you just have to
    plug in the particular parameters
    - science: there are a bunch of unknowns which you have to
    learn & understand
    *** #1: Just use data!
    - maps, vectors, sets
    - all done in a one-page datomic schema
    *** #2: Clojure & its ecosystem
    - clojure data not domain objects
    *** #3: russ miles' life preserver
    - everything either "core" or "integration"
    - core: a datomic database for the neocortex
    - core: each "patch" of neurons is a graph
    - integration: algorithms, encoders, classifiers, SDRs
    *** key clj libs & tools
    - datomic (+adi)
    - quil/processing
    - incanter
    - lein-midje-doc for literate documentation
    - hoplon-reveal-js for presentations
    - lighttable
    ** review
    - Big Data isn't just a Machine Intelligence problem
    - HTM is exciting
    ** links
    - http://numenta.org
    - http://inbits.com
    - https://github.com/fergalbyrne/clortex
    - writing a leanpub book
    * Logan Campbell, Clojure at a Post Office
    ** history:
    - was at clojure user group
    - a guy turns up and says he's hiring a team of clojure developers
    - he was at Australia Post
    - a million lines of Java worked on by a team in India
    - wanted to bring it back in-house
    ** project: digital mailbox
    - big companies spend a lot of money sending out bills & junk mail
    - product to seamlessly replace that workflow
    - switch from physical mail to cheaper model
    - consumer can sign up to receive water bill online
    - I was brought on as the "clojure expert"
    - (I'd been playing with it for a couple of years)
    - drama:
    - the people they could hire:
    - really experienced java devs
    - keen on FP
    - they said as they were hiring "you might be doing clojure or
    you might be doing scala"
    - first few people were scala fans
    - scala v clojure battles
    - "we need static typing"
    - "we need OO for domain modelling"
    - "clojure is slow" (?)
    - "what framework do you use?"
    - "we need static typing? okay, we'll use core.typed"
    - domain modelling:
    - when people are used to domain modelling in OO, telling them to
    just use maps feels like a cop-out
    - records + protocols kind of feel like classes
    - wasn't til I showed them code I'd written and comparing it with
    their code that they realized that you can just use maps
    - online scala course
    - we did it as a team
    - I also did the exercises in clojure
    - did one exercise three different ways in clojure
    - conditional
    - match
    - stream processing
    - showed them my solutions
    - they already understood the problems because they'd solved
    them themselves
    - clojure performance was a surprise, because I'd come from ruby (!)
    - clojure is /fast/
    - there was an underlying feeling that "we need scala for
    performance"
    - I'm a consultant, so was happy for the team to make the language
    decisions
    - "if you're keen on scala, let's find out a way to pitch it to
    management"
    - web stack: kept hearing "async async async"
    - felt like premature optimization
    - but still we used http-kit
    - benchmark started to allay fears that clojure was slow
    ** feature: make a payment on a bill
    - not necessarily a full payment

    : POST /bills/:bill-id/payments
    : Session: user-id
    : Post Data: amount

    - GET credit card token for user
    - POST request to payment gateway
    - GET how much left to be paid
    - if payment succeeds: display amount remaining
    - if payment fails: display error
    ** candidate solutions
    - synchronous promises
    - promise monad
    - lamina
    - etc etc
    ** solution 0: synchronous
    - http-kit's requests return a promise
    - just @deref the promise (blocks the thread)
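    - the shape of that code, with plain delivered promises standing in
      for the http-kit calls (stub names and URLs-free flow invented here
      to make it self-contained)
    #+begin_src clojure
    ;; stand-ins for the three http-kit requests, each returning a promise
    (defn fetch-card-token [user-id]      (doto (promise) (deliver "tok-123")))
    (defn make-payment     [token amount] (doto (promise) (deliver {:ok true})))
    (defn fetch-balance    [bill-id]      (doto (promise) (deliver 40.0)))

    (defn pay-bill-sync [user-id bill-id amount]
      (let [token   @(fetch-card-token user-id) ; each @ blocks this thread
            payment @(make-payment token amount)
            balance @(fetch-balance bill-id)]
        (if (:ok payment)
          {:remaining balance}
          {:error payment})))

    (pay-bill-sync 1 42 10.0) ;; => {:remaining 40.0}
    #+end_src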
    ** solution 1.1: promise monad
    - ~do~ is aware of promises
    - doesn't block thread, but waits for promise to be executed
    before continuing
    - felt natural way to write with promises
    - but incorrect: too much waiting, no concurrency
    ** solution 1.2: promise monad let/do
    - ~let~ to define promises
    - ~do~ to pseudo-block on them
    - introduces correctness but reduces readability
    ** solution 1.3: let/do/do
    - okay, let's step away from monads
    ** solution 2: (?)
    ** solution 3: raw promises
    - ~when~ to explicitly wait for a particular promise
    ** solution 4: raw callbacks
    - not viable
    - would have just written a hacky little promise library
    ** solution 5: core.async:
    - great! same shape as synchronous code, but correct concurrency
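    - roughly like this (a sketch with invented stub channels; needs the
      org.clojure/core.async dependency) -- the independent requests are
      launched before parking, which is where the concurrency comes from
    #+begin_src clojure
    (require '[clojure.core.async :as a :refer [go <! <!!]])

    ;; stand-ins for http-kit requests wrapped as channels
    (defn fetch-card-token [user-id]      (go "tok-123"))
    (defn make-payment     [token amount] (go {:ok true}))
    (defn fetch-balance    [bill-id]      (go 40.0))

    (defn pay-bill-async [user-id bill-id amount]
      (go (let [token-ch   (fetch-card-token user-id) ; both start immediately
                balance-ch (fetch-balance bill-id)
                payment    (<! (make-payment (<! token-ch) amount))]
            (if (:ok payment)
              {:remaining (<! balance-ch)}
              {:error payment}))))

    (<!! (pay-bill-async 1 42 10.0)) ;; => {:remaining 40.0}
    #+end_src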
    ** solution 6: lamina
    - didn't feel totally suited to the situation
    ** solution 7: meltdown (LMAX disruptor based)
    - not appropriate
    ** solution 8: pulsar promises
    - looks exactly the same as the synchronous code, except for one
    character
    - pulsar rearranges your code at the bytecode level
    - uses JVM agents (normally used for tracing/debugging)
    - pass a fn to one of pulsar's functions
    - turns synchronous code to async code
    ** solution 9: pulsar actors
    - not appropriate
    ** winners
    - 0: synchronous
    - 5: core.async
    - 8: pulsar
    ** scala solution, for comparison
    - scala futures (basically promises)
    - all monadic
    - I don't understand it entirely
    - concise
    - battle of the benchmarks, fastest first
    - pulsar-async
    - pulsar-sync
    - core-async
    - raw-callback
    - scala-play-future (significantly less than others)
    ** CQRS (command-query responsibility segregation)
    - want fast reads
    - reduce number of queries
    - don't want to have to update write code every time we add a new
    reader
    ** structure
    - service A → cassandra → service B
    - custom triggers in cassandra in clojure (just drop in the .jar!)
    - publish to rabbitmq
    - notify index maintainer
    - write index to cassandra
    - service B reads from cassandra
    ** cassandra triggers
    - can just throw the clojure jar in there
    - everything is byte buffers
    - you need to know the type of all the fields out-of-band
    - not self-describing data at all
    ** microservices
    - I thought we would have a user service and a provider service and
    a mail service
    - but this gets tricky when you want data about users and providers
    - you need to split things much more fine grained
    - user service →
    - authentication
    - multi-factor auth
    - authorization
    - user profile
    - password reset
    - does it belong in user profile?
    - there's a bit of workflow here
    - send out email
    - get user to click link
    - enough to warrant its own service
    - drama: needed to talk to systems team to deploy
    - I did things badly
    - I didn't get anything into production in my 6 months there
    - systems team: we need monitoring and config and stuff
    - if we'd had something early on which had gone through these
    barriers, we would have had much less stress
    - benchmarks end petty arguments
    ** Q&A
    *** can you share some experience with monitoring & resilience?
    - appdynamics
    - classnames are expected to be java-style class names
    - clojure ones are close enough
    - clj-metrics to expose more high-level metrics
    - requests/second from ring
    - number of bills paid
    - appdynamics could pick it up from jmx
    - nomad for configuration
    *** with http-kit+core.async, what happens when server dies and there's loads of threads?
    - bottleneck was amount of memory
    - when server runs out, it slows down a lot
    - way to get around that is to monitor resources on your machine
    and ideally have autoscaling
    *** were the scala guys finally writing clojure in the end?
    - we have one person still hardcore for scala, but sees the merits
    of clojure
    - a few who did the online scala courses are clojure folks now
    - people who come from the java world of static typing feel they
    need that
    - but now they've written code that actually works, they're more
    comfortable with that now
    * Tom Hall, Escaping DSL Hell by having parens all the way down
    - @thattommyhall
    ** DSLs
    - languages made for specific purposes
    - config mgmt
    - science
    - learning
    - distinction between:
    - internal DSLs: embedded in another language
    - external DSLs: implemented in another language
    ** problems with puppet
    - zen of python:
    - namespaces are a honking great idea, let's do more of them!
    *** puppet namespaces
    - Exec['install'] in two different modules will result in a
    naming collision
    - fail :(
    - end up with Exec['tom::install'] but this is a hack
    *** iteration
    - file type lets you pass in an array
    - nagios_host doesn't
    - iteration is responsibility of type, not language
    - as far as I know
    *** but you need to know ruby anyway
    - if you want to extend puppet, you need ruby
    - if you need to know ruby, why do we bother with the puppet DSL
    in the first place?
    *** experimental features: lambdas and iteration
    - any language where lambdas arrive late is not a good language
    ** ansible
    - just YAML
    - oh wait, I might want to iterate
    - oh wait, I've got embedded Jinja templates in my YAML strings
    - what's the scope of names in my templates?
    ** if you give people a "language" they will expect loops
    - maybe lambdas
    - probably namespaces
    - this has been done before
    ** chef gets it right
    - it's embedded in ruby
    - you get iteration and namespaces from ruby
    ** teaching people to program
    - if you design a language:
    - you need a parser, which is hard
    - you need an interpreter/compiler, which is hard
    - if you embed it, you get that stuff for free
    ** geomlab
    - minimal language for teaching
    - talks about pictures
    - intro to FP
    - gets you into recursion early on
    - ~man $ woman~ - "next to"
    - ~man & man~ - "on top of"
    - ~(man $ woman) $ tree~ = ~man $ (woman $ tree)~
    - ~man $ (woman & tree)~ -- scales nicely to get a nice aspect ratio
    - learn about operator precedence
    - de morgan's laws
    - although not always held, due to scale
    - define functions
    : define manrow(n) = manrow(n-1) $ man when n>1
    : ~ manrow(1) = man
    - builds up to an escher tiling
    - but once you've done that, where do we go?
    - only exists in this sim
    - if you want to extend it, you need java
    - "I'm really excited about FP now, but I've got nowhere to go"
    ** what if we did it in clojurescript?
    - let's use 'below and 'beside instead of $ and &
    - ~(below man woman)~
    - ~(beside tree star)~
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
    - let's say I want to change ~man~ -- what does it mean?
    - it's implemented in the same sort of language
    - I can see there's a url in there where I fetch an image from
    the internet
    - I know recursion, because I learned that from the geomlab
    exercises
    - I can extend the language itself
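    - e.g. the manrow exercise translates directly (a sketch: ~man~ and
      ~beside~ are stubbed with plain data so it runs outside the fiddle)
    #+begin_src clojure
    ;; stand-ins for the fiddle's picture value and combinator
    (def man :man)
    (defn beside [a b] [:beside a b])

    ;; geomlab: define manrow(n) = manrow(n-1) $ man when n>1 ~ manrow(1) = man
    (defn manrow [n]
      (if (= n 1)
        man
        (beside (manrow (dec n)) man)))

    (manrow 3) ;; => [:beside [:beside :man :man] :man]
    #+end_src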
    ** science languages
    - R
    - wolfram alpha
    - maple
    - matlab
    - these things just aren't very good languages, even if they are
    good at their domain
    ** another problem with DSLs
    - netlogo
    - http://ccl.northwestern.edu/tortoise/2013-10-25/Ants.html
    - If you're based on applets, and Oracle drops applet support, you
    find you need to port your whole language to a new platform (in
    this case javascript)
    - again, reimplement in clojurescript?
    - anyone interested in hacking on this with me?
    ** conclusion
    - you probably don't need to make a new language
    - if you do it will probably be rubbish
    - at least for a while
    - think about power and reach
    - *you should embed /deeply/ into clojure*
    ** links
    - http://twitter.com/otfrom
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.core
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
    - http://cljsfiddle.net/fiddle/thattommyhall.geomlab.bruce
    - http://www.complexityexplorer.org/
    - http://cljsfiddle.net/fiddle/thattommyhall.ants.core
    - http://ccl.northwestern.edu/tortoise/2013-10-25/Ants.html
    ** Q&A
    *** what makes a good first language?
    - clojure needs a better day 0 story
    - at some coder dojos where I've taught kids, some don't even know
    about files and folders
    - so if you say "open a terminal, cd into a directory" you've
    lost them
    - and it's not their fault
    *** have you had any kids look at your examples here?
    - I've done the geomlab example
    - otherwise this is all a recent exploration
    - errors in cljsfiddle are not reported well
    - again problematic for day zero