Revisions
philandstuff revised this gist
Jun 26, 2014 . 1 changed file with 456 additions and 4 deletions.
  - *you should embed /deeply/ into clojure*
** links
- http://twitter.com/otfrom
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.core
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.bruce
- http://www.complexityexplorer.org/
- http://cljsfiddle.net/fiddle/thattommyhall.ants.core
- http://ccl.northwestern.edu/tortoise/2013-10-25/Ants.html
* Tommi Reiman, Schema and Swagger to improve your web APIs
** super simple web api in clojure
- just using compojure
- "sausage" as example data
- ~PUT /foo/sausage/:id~
- example:
  - in Java: immutable value object
  - in Scala: case class
  - in Clojure:
    - free-form map?
    - constructor fn with bunch of validation?
    - prismatic/schema!
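- a minimal sketch of the "constructor fn with bunch of validation" option, in plain Clojure
  - ~make-sausage~ and the ~:id~ / ~:name~ fields are hypothetical, not from the talk
  - it only illustrates the hand-rolled checking that a schema library is meant to replace
#+begin_src clojure
  ;; Hypothetical constructor: returns a sausage map or throws ex-info.
  ;; The field names are assumptions made for illustration only.
  (defn make-sausage
    [{:keys [id name] :as m}]
    (when-not (integer? id)
      (throw (ex-info "id must be an integer" {:id id})))
    (when-not (string? name)
      (throw (ex-info "name must be a string" {:name name})))
    ;; drop unrecognised keys so only validated data gets through
    (select-keys m [:id :name]))

  (make-sausage {:id 1 :name "bratwurst" :extra "dropped"})
  ;; => {:id 1, :name "bratwurst"}
#+end_src
- every new field means another hand-written check, which is the cost the schema approach avoids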
** prismatic schema - define structure of sausage - then call ~s/validate~ to validate - schema can define functions #+begin_src clojure (s/defn get-sausage :- (s/maybe Sausage) [id :- Long] (@sausages id)) (s/defn ^:always-validate get-sausage2 :- Sausage [id :- Long] (@sausages id)) #+end_src *** schema coercion #+begin_src clojure (defmodel Pizza {:id Long :name String :price Double :hot Boolean (s/optional-key :description) String :toppings #{(s/enum :cheese :olives :ham :pepperoni :habanero)}}) #+end_src - allows slurping JSON data, but imposing extra types - eg above we can slurp toppings from a JSON array into a Clojure set rather than a vector *** double schema - loose schema for first input - ~(def Customer {...})~ - tighter schema for validated input - ~(def ValidCustomer (merge Customer {...}))~ *** schema selectors - accept but remove unrecognised params with ~select-schema~ *** generative schema - generate random orders for test data - davegolland/generative-schema.clj *** contribs - sfx/schema-contrib - cddr/integrity ** swagger - a specification for describing, producing, consuming, visualising RESTful web services - https://helloreverb.com/developers/swagger - existing adapters - clojure options: - octohipster - swag - ring-swagger - compojure-api - fnhouse-swagger - endpoint definitions in JSON - data models as a JSON Schema - swagger UI - visualises the API - code gen - no clojure support yet (anyone?) 
- swagger-socket - run it all on top of websockets ** ring-swagger - https://github.com/metosin/ring-swagger - JSON-Schema has some dates - but prismatic/schema will never support dates, as it's more generic - higher level abstractions on top of swagger, but nothing for the web developer ** compojure-api - an extendable web api lib on top of compojure - macros & middleware with good defaults - schema-based models & coercion - ~GET*~ macro to define input and output schemas ** fnhouse-swagger - prismatic/fnhouse - launched at clojure/west - ~defnk~ with metadata → annotated handler - fnhouse-swagger - metosin/fnhouse-swagger ** summary - schema is an awesome tool - describe, validate, coerce your data - building on top of ring-swagger - compojure-api → declarative web apis - fn-swagger → meta-data done right - or do your own! - kekkonen.io - CQRS-lib * Renzo Borgatti, The Compiler, the Runtime and other interesting beasts from the clojure codebase - http://twitter.com/reborg ** an amazing growth: - mar 2006: first commit - oct 2006: 30k loc (7 month old) - oct 2007: clojure announced! - oct 2008: invited to Lisp50 to celebrate 50 years of lisp - May 2009: 1.0 + book! 
- now: almost 90k loc ** initial milestones - apr 06: lisp2java sources - may 06: boot.clj appears - may 06: STM first cut - june 06: first persistent data structure - sep 06: java2java sources - aug 07: java2bytecode started - right after: almost all the rest: refs, lockingtx ** drew on lots of sources of knowledge - collection of papers ** high-level view: - ~(def lister (fn [& args] args))~ - read → analyse → emit/compile → compile - although the lines between the stages get blurred at times ** reader - takes stream, returns data structures - PersistentList, Symbol, etc ** analyser - input: data structure - output: exprs - DefExpr - Var - FnExpr - Sym - PersistentList - FnMethod - LocalBinding(Sym("args")), - BodyExpr - PersistentVector - LocalBindingExpr ** Emission - bytecode generation for Exprs - prerequisite for evaluation - emit() method in Expr interface - Notable exception: called over ?? ** Evaluation - transform Exprs into their "usable form" - eg - new object - a var - namespace - FnExpr is just getCompiledClass().newInstance ** Compilation - Usually coordination for emit - Compiler.compile namespace -> file - ... ** Emit - input: Exprs - output: bytecode ** monsters! *** RT - this is how the RT class gets initialised: the first time it gets referenced: #+begin_src java final static private Var REQUIRE = RT.var("clojure.core", "require"); #+end_src - simply referring to it here causes the static initializers to run - RT has a *lot* of behaviour in static initializers - inside it is the ~doInit();~ call - which loads all of ~clojure.core~ - all just from referring to RT in some otherwise unrelated class! 
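- a small sketch of the same ~RT.var~ lookup, called from Clojure instead of Java
  - from a running REPL, RT is already initialised, so this shows only the lookup, not the static-initializer load described above
#+begin_src clojure
  ;; RT/var is the public static method used in the Java snippet above
  (def require-var (clojure.lang.RT/var "clojure.core" "require"))

  ;; vars are interned per namespace, so the lookup returns the existing var
  (identical? require-var #'clojure.core/require)
  ;; => true

  ;; a Var implements IFn, so it can be invoked directly once looked up
  ((clojure.lang.RT/var "clojure.core" "inc") 41)
  ;; => 42
#+end_src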
*** Compiler - inner classes for each Expr type *** LispReader - inner classes for each token you might encounter - ~<clinit>~ - sets up reader macros - ~macros~ and ~dispatchMacros~ (latter for ~#{~ ~#(~ ~#_~ ~#^~ etc) *** analyze() - not a class, but a family of methods - ~analyzeSeq~ - ~new ConstantExpr~ - ~MapExpr.parse~ - FnExpr.parse - invokes the compiling phase during parsing phase *** emission - ASM lib used to generate bytecode - FnExpr.emitMethods() - generate a method for each of the arities of the function *** other beasts - LockingTransaction and Ref ** DynamicClassLoader - ~clojure.lang.DynamicClassLoader.findClass(String)~ - ~RT.classForName()~ - ~Compiler$HostExpr.maybeClass()~ - Class.forName() goes up the hierarchy of classloaders and asks each what they know - an instance of DynamicClassloader is created for each namespace - and also for each form - (this is true for the bootstrap phase; not always true eg in AOT (ahead-of-time) compilation) - supporting dynamicity - in defineClass: - ~classCache.put(name, new SoftReference(c,rq));~ - in findClass: - ~Reference<Class> cr = classCache.get(name);~ - SoftReferences are used to save PermGen, since if we redef a var we don't want it to keep consuming PermGen ** Bonus: clojure was initially implemented in lisp - ~1600 loc to implement read, analyse, compile, eval - although emitting Java code, not bytecode - was also generating C♯ ** Q: some things in bytecode can't be expressed in java - is there anything which clojure generates which can't be decompiled back to Java? 
- I'm pretty sure yes, but not sure exactly what - Rich: - locals-clearing - constructs which use goto (which exists in bytecode but not Java) * Rich Hickey, the insides of core.async channels ** aside: here's what clojure looks like in a good IDE - (ie IntelliJ) - yes, Compiler.java is massive - but if your IDE has a structure editor, you can navigate them all easily - it's all in one file because I don't want 300 files ** aside2: the classloader has a cache in a branch - fast-load branch ** warning! implementation details ahead - subject to change! - informational only ** the problems - single channel implementation - for use from both dedicated threads and go threads - simultaneously, on same channel - alt and atomicity - Java CSP libraries often didn't support alt well - it's tricky to do atomically - multi-reader/multi-writer - concurrency - construct deals with the ick of threads and mutexes - (this talk: focus on JVM impl; JS version has less of these issues) ** API - ~>!~ ~>!!~ ~put!~ ~alt!~ → channel → ~<!~ ~<!!~ ~take!~ ~alt!~ - it's not an RPC mechanism, it's just a conveyor belt ** SPI (service provider interface) - ~>!~ ~>!!~ ~put!~ ~alt!~ → ~impl/put! [val handler]~ → channel → ~impl/take! [handler]~ → ~<!~ ~<!!~ ~take!~ ~alt!~ ** anatomy - channel has: - pending puts (fifo) - a buffer (optional) in the middle - contains data - pending takes (fifo) - flag indicating if channel is closed - fifos implemented as linked queues - important to distinguish queues of operations from buffer of data ** invariants - never pending puts and takes simultaneously - never takes and anything in buffer - never puts and room in buffer - take! and put! use channel mutex - no global mutex - or even multi-channel mutex ** put! scenarios 1. one or more waiting take! operations - gets paired up, takes handler gets completed 2. stuff in the buffer, but with room in buffer - puts its stuff in the buffer, succeeds and immediately completes 3. 
buffer full (or no buffer) - enter puts queue, block - results in backpressure 4. full buffer, but windowed - sliding buffer: latest information takes priority, drop head of buffer (oldest item in fifo), put! completes immediately and enters buffer - dropping buffer: drop put! on floor, but completes immediately - could have more sophisticated policies in future ** take! scenarios 1. nothing in buffer - enqueued 2. buffer has stuff, but no puts waiting - get data, immediately complete 3. buffer full (or no buffer), puts pending - get something (either head of buffer or get paired with first put!) - first waiting put! completes (either enters buffer or hands directly to take!) ** close! scenario - all pending takes complete with nil (closed) - subsequent puts complete with nil (already closed) (relatively new) - subsequent takes consume ordinarily until empty - any pending puts complete with true - takes then complete with nil ** queue limits - puts and takes queues are not unbounded either - 1024 pending ops limit - somewhat arbitrary, might change - will throw if exceeded - if you're seeing this, it's an architecture smell - most likely if you use put! on the edge of your system ** alt(s!!) - attempts more than one op - on more than one channel - without global mutex - nor multi-channel locks - exactly one op can succeed *** implications - registration of handlers is *not* atomic - completion might occur before registrations are finished, or any time thereafter - completion of one alternative must 'disable' the others atomically - cleanup ** handlers - wrapper around a callback - callbacks are icky, so we want to hide them - SPI - active? - commit → callback-fn - lock-id → unique-id - ~java.util.concurrent.locks.Lock~: lock, unlock ** take/put handlers - simple wrapper on callback - lock is no-op - lock-id is 0 - active? 
always true - commit → the callback ** alt handlers - each op handler wraps its own callback, but delegates rest to shared "flag" handler - flag handler has lock - a boolean active? flag that starts true and makes one-time atomic transition - commit transitions shared flag and returns callback - must be called under lock ** alt concurrency - no global or multi-channel locking - but channel does multi-handler locking - some ops commit both a put and a take - lock-ids used to ensure consistent lock acquisition order - (avoids deadlock) ** alt cleanup - "disabled" handlers will still be in queues - channel ops purge ** SPI revisited - handler callback only invoked on async completion - only 2 scenarios - when not "parked", op happens immediately - /callback is not used/ - /non-nil return value is op return/ - only time ops park - put! when it gets blocked on full buffer - take! when it gets blocked on empty buffer - only time ops complete asynchronously - take! with pending puts - put! with pending takes ** wiring !/!! - blocking ops (!!) - create promise - callback delivers - only deref promise on nil return from op - non-nil indicates immediate success (and so callback never gets called) - parking go ops (!) - IOC state machine code is callback ** summary - you don't need to know any of this - but understanding the "machine" can help you make good decisions ** Q: why use alt! for putting? what's rationale? - taking multiple channels is like a select(2) - when you have consumers of different capabilities - I want to try to write to everyone, but whenever the first one is ready, I give it to them - Q: what's the difference between that and having four consumers on a single channel? - you might have a priority metric, or a cost metric - though yes sometimes you can achieve same result two different ways ** Q: why is global or multi-channel mutex not good enough? - well it would be easy! 
:) - a global mutex could make registration atomic - you'd have to make disabling other alts atomic - you'd have to make rendezvous atomic - you could have two unrelated sets of channel operations, why should they contend? - people hate global locks - rules out by my aesthetic sense :) ** Q: David Nolen had an example of 10000 go blocks updating a textarea, did he hit the 1024 limit? - no I don't think so, but not sure exactly ** Q: are buffer & queue sizes useful metrics to monitor? - that would be great, and making them monitorable is on the TODO list ** Q: other possible extensions? - buffer policies - you might have logic about priority - core.async has proven its utility and it's become important - ~go~ macro is a great PoC of what you can do with a macro with several kLoC behind it - has its own subcompiler inside it - kind of implements a subset of clojure - maybe build async support into the compiler? - move locals from the stack to fields on the method object - I don't need the stack anymore - I can be paused and resumed on another thread - declare a fn as async - comply with this SPI - could build other things like generators & yield - the pride moment of "look you can do this with a macro" is not dominated by the desire to make this performant and more solid - Q: continuations? how do they differ? - continuations are more general - this won't use continuation-passing-style - it's related - it won't be like call/cc - it won't be first-class - you won't be able to resume it more than once - for a specific set of use-cases - Oleg did a talk that just generators are enough to do stuff that people think you need a lot more for ** Q: is there something planned for dynamic binding and the ~go~ macro? - there are fns which allow you to do the conveyance - don't know if ~go~ allows all of them to work ** Q: channels on the network? 
- it's easy to have something you call a channel and put over a wire - pretty hard to have all the semantics of these channels over the wire - already have queues and all sorts of interfaces to do similar things - atomic alt! over more than one wire not going to happen - maybe semantics for ports - or limitations on alt! - the wire has its own semantics, this is the key thing here - failure, queueing, delays - really easy to just take something from the wire and call put! ** Q: is there a typical way to monitor a go block? - what kind of monitoring? - /see that it's still working, still alive?/ - if the channels were monitorable, you could see if things were producing/consuming properly ** Q: what other options did you consider & reject in the design of core.async - something other than CSP? - the generators stuff - continuations - I liked what golang did - they made a good choice - there's a java csp lib that impls the same kinds of ops - it's difficult to get the semantics correct - wanted ~alts!~ to be a regular fn, not syntax - which feels like an enhancement over go - what we're putting on these channels is immutable - which gives extra robustness -
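- a toy, single-threaded model of the put!/take! scenarios and invariants from the notes above
  - this is *not* core.async's implementation: no handlers, locks, callbacks, alt, close!, or windowed buffers
  - ~make-chan~, ~put-op~ and ~take-op~ are hypothetical names; each op returns ~[new-channel outcome]~ as plain data
#+begin_src clojure
  (def empty-queue clojure.lang.PersistentQueue/EMPTY)

  (defn make-chan
    "Toy channel: a fixed-capacity buffer plus FIFO queues of pending ops."
    [capacity]
    {:capacity capacity
     :buf   empty-queue    ; buffered data
     :puts  empty-queue    ; parked puts (values waiting to enter)
     :takes empty-queue})  ; parked takes (ids of waiting takers)

  (defn put-op
    "Returns [new-chan outcome] for putting v."
    [{:keys [capacity buf takes] :as ch} v]
    (cond
      ;; 1. a take is waiting: pair up, the value bypasses the buffer
      (seq takes)
      [(update ch :takes pop) [:handed-to (peek takes) v]]
      ;; 2. room in the buffer: complete immediately
      (< (count buf) capacity)
      [(update ch :buf conj v) [:buffered v]]
      ;; 3. buffer full (or capacity 0): park the put, which is the backpressure
      :else
      [(update ch :puts conj v) [:parked v]]))

  (defn take-op
    "Returns [new-chan outcome] for a take by taker."
    [{:keys [buf puts] :as ch} taker]
    (cond
      ;; data in the buffer: take its head; a parked put (if any)
      ;; moves into the freed slot
      (seq buf)
      (let [rest-buf (pop buf)]
        [(if (seq puts)
           (assoc ch :buf (conj rest-buf (peek puts)) :puts (pop puts))
           (assoc ch :buf rest-buf))
         [:took (peek buf)]])
      ;; no buffered data but a parked put: hand over directly
      (seq puts)
      [(update ch :puts pop) [:took (peek puts)]]
      ;; nothing available: park the take
      :else
      [(update ch :takes conj taker) [:parked taker]]))

  ;; walk a capacity-1 channel through the scenarios
  (let [ch0      (make-chan 1)
        [ch1 r1] (put-op ch0 :a)
        [ch2 r2] (put-op ch1 :b)
        [ch3 r3] (take-op ch2 :t1)
        [ch4 r4] (take-op ch3 :t2)
        [_   r5] (take-op ch4 :t3)]
    [r1 r2 r3 r4 r5])
  ;; => [[:buffered :a] [:parked :b] [:took :a] [:took :b] [:parked :t3]]
#+end_src
- a put only parks when no take is waiting, and a take only parks when there is neither data nor a parked put, so the model never holds pending puts and pending takes at the same time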
philandstuff revised this gist
Jun 26, 2014 . 1 changed file with 369 additions and 0 deletions.
- otherwise this is all a recent exploration
- errors in cljsfiddle are not reported well
  - again problematic for day zero
* Mathieu Gauthron, JVM-breakglass
- using a clojure REPL to troubleshoot live java/JVM processes
- http://slides-euroclojure2014.matlux.net
- when you see fire, you break glass
  - when your jvm process is on fire, you use JVM-breakglass
** troubleshooting a java application
- debugger
  - only powerful when you can narrow down the problem to a series of breakpoints
  - when the problem is a race condition, it will change the nature of the problem you're studying
- log/print statements
  - you need to plan before compilation
  - when the problem is in production, it might be too late
- jmx
  - again, you need to plan for it in advance
- ad-hoc interactive mechanism
** what is jvm-breakglass
- open source
- integrates with any jvm process
- console onto a jvm process
** main features
- interactive prompt
- see inside private members
- call arbitrary methods
- create new object instances
- create new classes
- monitor object state
- no need to use clojure to develop the app
** how does it work?
- jvm-breakglass runs inside the JVM and starts an nrepl server
- you can then connect using an nrepl client (eg lein)
** how to use it?
- add it to your maven dependencies
- add an entry point (as a ~<bean>~ or in java code)
- connect with ~lein repl :connect localhost:1112~
** demo (enterprise application)
- tomcat JVM
- employee/dept data structure
- report generation
- java/spring mvc webapp
- jvm-breakglass
- spring data
  - in XML, naturally
*** homepage
- oh no! one of the reports isn't working?
- "list employees in london" is empty
- but we know that employee Mick Jagger lives in london
- what's going on?
*** breakglass to the rescue
- view environment:
  - current directory, System/getProperties
  - view conf directory
- list all loaded Spring beans
- introspect into object private members
  - ~bean~ builtin fn
  - ~to-tree~ to do so recursively
- view methods or fields for a given object
- redefine a class
  - in this case, ~(proxy [Address] ["1 Mayfair", "SW1", "London"] (getCity [] "London"))~ to define the new version, overriding a method
  - ~(.setAddress (:Mick employees) address)~ to inject it into the live data
** take a step back
- remember what it's like to be a java programmer?
  - working with jmx beans and suchlike to try to understand why production is down
- this stuff looks like magic
** Q: how do you convince production people to put nrepl server in place?
- short answer: impossible
- that's not how you present it
- either you do it sneakily (that's bad), and only pull the trump card when the team is desperate
- or you convince the team that it would be useful in the UAT environment, and "of course it's never going to be used in production"
** Q: have you considered a high-level switch that would prevent you mutating anything in the host application?
- don't know how you'd be able to do that
- have been thinking about it
- maybe using clojail
- kind of defeats the point
** Q: have you tested this with a scala app?
- haven't tried
- I've reverse-engineered the java bytecode, and it's readable
- as long as you know how it compiles, it seems reasonable
** Q: you were using methods like get-obj and passing string name. how does breakglass know which object to get?
- eg if you have multiple instances of Department, how does it know *which* department?
- in Spring it's a Spring bean which is named
- if you're not using Spring, what's your entry point?
- when you create your NreplServer to enable jvm-breakglass, you can add your entry points there
- ~new NreplServer(port).put("department", myObject);~
- static methods & fields can be used too
* Gary Crawford, Using Clojure for Sentiment Analysis of the Twittersphere
- leiningen versus the ants, carl stephenson
  - leiningen versus apache ant?
  - clojure versus java?
  - FP versus OO?
** stratified medicine
- determine the best treatment for someone based on their genetic makeup to manage their chronic disease
** sentiment analysis
- Paper: "Twitter mood predicts the stock market"
  - predicted Dow Jones average through monitoring tweets
- people who suffer chronic disease tend to be neurocompromised
  - what would normally be a minor illness can prove fatal
- can we use twitter to predict spread of disease?
** so we tried
- score tweets for flu symptoms
- the data science wasn't very difficult
  - but scaling it was
- 30 million geo-tagged tweets sent from UK
- couldn't scale, even with
  - HDFS/hadoop
  - mongo/aggregation
  - mongo/mapreduce
  - postgres
** how can we do fast, real-time analytics of social media?
- application: how do people feel about Scotland's independence referendum?
- data increases in value as we analyse it
  - tweets
  - analytically prepared data
  - analysis
  - insight
  - predictions
- the raw data isn't what you care about
- don't store the raw tweets, only store the analytically prepared data
- stored in redis using ptaoussanis/carmine
  - it has great support for bitmaps
** example
- ~(car/setbit sentiment tweet-id 1)~
- ~(car/bitcount "SCOTLAND")~ -- tells me how many tweets have mentioned Scotland
- how many people in england are happy?
#+begin_src clojure
  (wcar*
    (car/bitop "AND" "ENGLAND&JOVIALITY" "ENGLAND" "JOVIALITY")
    (car/expire "ENGLAND&JOVIALITY" 10) ;; don't keep the data longer than 10 seconds
    (car/bitcount "ENGLAND&JOVIALITY"))
#+end_src
- further: "how many people in Scotland are tired or grumpy?"
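- a sketch of the bitmap logic behind that last query, using ~java.util.BitSet~ as a stand-in for Redis so it runs without a server
  - the tweet ids and the ~bitmap~ / ~bitmap-or~ / ~bitmap-and~ helpers are made up for illustration
  - in Redis the same shape would presumably be a BITOP OR, then a BITOP AND, then a BITCOUNT
#+begin_src clojure
  (import 'java.util.BitSet)

  (defn bitmap
    "BitSet with one bit set per tweet id (stand-in for SETBIT key id 1)."
    [ids]
    (let [bs (BitSet.)]
      (doseq [id ids]
        (.set bs (int id)))
      bs))

  (defn bitmap-or
    "New BitSet holding a OR b; the inputs are left untouched."
    [^BitSet a ^BitSet b]
    (let [^BitSet c (.clone a)]
      (.or c b)
      c))

  (defn bitmap-and
    "New BitSet holding a AND b; the inputs are left untouched."
    [^BitSet a ^BitSet b]
    (let [^BitSet c (.clone a)]
      (.and c b)
      c))

  ;; hypothetical data: which tweet ids carry which tag
  (def scotland (bitmap [1 2 3 5]))
  (def tired    (bitmap [2 7]))
  (def grumpy   (bitmap [3 9]))

  ;; SCOTLAND AND (TIRED OR GRUMPY), then count the set bits
  (.cardinality ^BitSet (bitmap-and scotland (bitmap-or tired grumpy)))
  ;; => 2
#+end_src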
** getting the data in - adamwynne/twitter-api - you can specify you only want tweets from a certain geographical locality with a bounding box - but this is literally a rectangle - need it around Europe - LMAX-Exchange/disruptor to communicate - journaling - syncing - business logic *** what sentiment? - this is hard! - "I'm loving #EuroClojure! :D" - Positive Affect: enthusiastic, active, alert - Negative Affect: subjective distress - actually two separate dimensions, not opposites - Watson et al, 1988 - PANAS - then PANAS-x - then PANAS-t - accounts for bias on social media - outlines sanitisation - validate against 10 real events **** sanitisation - https://github.com/dakrone/clojure-opennlp - get rid of spam - account for text speak - account for emoticons and emoji - word stemming (or lemmatisation) - part of speech tagging *** where? reverse geocoding - don't want to rely on external services - don't want heavy IO - don't want round trips to database - accuracy not too much of a concern - we already lose accuracy in interpreting the sentiment of the tweet - convert a map of the uk to colours: - look up geocode coords in map - check colour → get country code - problem: the world is a sphere - projecting a sphere onto a rectangle - prior art in d3.js - use JavaFX to exploit it *** when? - there's a lot of seconds in a day - and even more seconds in a year - really not interested in seconds anyway - want to group tweets by minute - and also group by hour - and also group by day, and month, and year ** why? - why are we doing this? - online social media are surveillance - the line between public and private is becoming blurred - if we don't need data, we shouldn't collect it - in this example: - we're never more granular than country - we're never more granular than overall sentiment - we're never more granular than minute - hopefully this is enough to prevent anyone being identified - Datensparsamkeit ** Q: have you used Storm for this? 
- no ** Q: any preliminary results on the Scotland referendum analysis? - I've had more luck with tech than data science? ** Q: which way should we vote? - haha ** Q: how do you verify your results? - it's very crude at the moment? * Paul Ingles, Multi-armed Bandit Optimisation in Clojure - @pingles ** problem statement - product optimisation cycles are long, complex, and inefficient - the multi-armed bandit model shows lots of things we're getting wrong - eg: online newspapers - fundamentally human-led, editorially-led - people behave irrationally - Dan Ariely & Daniel Kahnemann - (@philandstuff suggestion: Stuart Sutherland, Irrationality) - economist subscription options 1. online $59 2. print $125 3. print & online $125 4. the ridiculousness of option 2. makes option 3. seem more reasonable - need machines to optimise at scale; but need humans to provide stuff only they can - running RCTs to optimise sites - doing so on a continuing basis - measuring big effects work with small numbers of participants - but measuring small effects requires ever larger numbers - to the extent that you can only run ~12 experiments a year - which is not really good enough ** Bandit strategies can help - a product for procrastinators by a procrastinator - Product: Notflix! 
- video website - http://notflix.herokuapp.com/ - shows 3 different videos - show good videos at top of page, and less good at bottom - show best possible thumbnail for each video - optimising with multi-armed bandits - optimising order and thumbnails ** multi-armed bandit problem - slot machine = one-armed bandit - problem: you have a bunch of money you want to "invest" in a casino - you have a number of different machines to play - each machine has a different probability of reward - you don't know what that probability is up front - need to balance "exploration" and "exploitation" - ie learning about the world vs using that knowledge to maximise income - analogy: trying new foods out vs sticking to what you like ** bandit model - number of *arms* {1, 2, ..., /K/ } - number of *trials*: 1, 2, ..., /T/ - *rewards*: {0,1} - /K/-headlines - options of different text - /K/-buttons - options of button text, colour, etc - /K/-pages - whole page redesigns - explore this space with notflix ** bandit strategy #+begin_src clojure ;; choose which arm to pull (defn select-arm [arms] ...) ;; update arm with feedback (defn pulled [arm] ...) (defn reward [arm x] ...) 
(defrecord Arm [name pulls value]) #+end_src *** ε-greedy - "hello world" algorithm - generally exploit - ε (epsilon) is the rate of exploration - eg if ε = 0.1, your strategy is: - with probability 10%, try a random arm with equal probability - with probability 90%, try the best arm based on current knowledge - if ε = 0, always exploit; if ε = 1, always explore - example with bernoulli-bandit #+begin_src clojure (bernoulli-bandit {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9}) #+end_src - with ε=0.2, you converge faster on the best arm - but ε=0.1, you exploit it more when you find it - once you've found the best arm, you should be able to double down - ie explore more at the beginning (when you have least knowledge) and less at the end - lots of extensions to ε-greedy to factor things like this in *** Thompson sampling - Arm model - Θ_k: Arm k's hidden true probability of reward (in range [0,1]) - can build a distribution for Θ_k based on current knowledge - small number of pulls means wide distribution; large number means narrow distribution - captures uncertainty in value of Θ_k - each iteration, take a random sample from each distribution, take the largest sample - algorithm naturally balances exploration/exploitation trade-off - the more it learns, the narrower the distributions get, and so the more likely it is to choose an arm with a higher expected value - incanter example - Thompson-sampling example with same Bernoulli-bandit from above - compared with ε-greedy, explores much more much earlier, and exploits much more later on - considered optimal convergence - we can use it to rank things (not just select) - take a sample from each arm distribution, then order arms by that value - in notflix, can use for ordering the videos we show ** applied to notflix - video rank bandit - for each video, a thumbnail bandit - at the end, the best video should be at the top - and each video should show the best thumbnail ** results - videos, worst to best - "hero of 
the coconut pain" - "100 Danes eat 1000 chillies" - "3 year-old with a portal gun" - thumbnail bandit data - "we built a fictional but /amazing/ product" ** links - [bandit/bandit-core "0.2.1-SNAPSHOT"] - https://github.com/pingles/bandit ** Q: this model assume bandits have same probability through time - can it readapt? - Thompson sampling does adapt - it won't change back as quickly ** Q: isn't there an interaction between the two bandits? - if the thumbnail is crappy, they might not click the video - made an assumption about this - in general, if you leave it running over time and let the evidence build, it should be fine in the long run - but that is definitely a flaw * Tommi ?, Schema and Swagger to improve your web APIs ** super simple web api in clojure ** prismatic schema ** swagger ** ring-swagger ** compojure-api ** fnhouse-swagger -
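- a minimal sketch of the ε-greedy strategy from Paul Ingles's notes above, filling in the ~select-arm~ / ~pulled~ / ~reward~ outline
  - this is an assumption about how the pieces could fit, not the code from pingles/bandit
  - the ~epsilon~ argument to ~select-arm~ and the running-mean update in ~reward~ are choices made here for illustration
#+begin_src clojure
  (defrecord Arm [name pulls value])

  (defn select-arm
    "With probability epsilon explore (uniform random arm);
     otherwise exploit the arm with the highest estimated value."
    [epsilon arms]
    (if (< (rand) epsilon)
      (rand-nth (vec arms))
      (apply max-key :value arms)))

  (defn pulled
    "Record that the arm was played once."
    [arm]
    (update arm :pulls inc))

  (defn reward
    "Fold reward x (0 or 1) into the arm's running mean.
     Assumes pulled has already been applied for this play."
    [arm x]
    (let [n (max 1 (:pulls arm))]
      (update arm :value #(+ % (/ (- x %) n)))))

  ;; one success then one failure gives a mean reward of 0.5
  (-> (->Arm :arm1 0 0.0) pulled (reward 1) pulled (reward 0) :value)
  ;; => 0.5

  ;; with epsilon = 0 the strategy always exploits
  (:name (select-arm 0.0 [(->Arm :arm1 10 0.1) (->Arm :arm5 10 0.9)]))
  ;; => :arm5
#+end_src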
philandstuff created this gist
Jun 26, 2014
#+TITLE: EuroClojure 2014, Krakow
* Fergal Byrne, Clortex: Machine Intelligence based on Jeff Hawkins' HTM Theory
- @fergbyrne
- HTM = Hierarchical Temporal Memory
** big data
- big data is like teenage sex
  - no one knows how to do it
  - everyone thinks everyone else is doing it
  - so everyone claims to be doing it
  - (Dan Ariely)
** machine learning is important
- people don't trust other people
  - they have their own agendas
- so they place too much trust in machines
** asimov's take
- we gain knowledge faster than we gain wisdom
- applies to human knowledge
- applies to data: gathering data is easy, drawing conclusions is not
** a problem in neuroscience
- rate of papers published is growing exponentially
- 2013: 1 every 32 minutes
- 2014 so far: 1 every 17 minutes
** can AI learn from neuroscience?
** Jeff Hawkins' goals in HTM
- Study the neocortex and establish its principles
- open sourced NuPIC in 2013
** neocortex
- the wrinkly part at the surface of the brain
- grey matter: processing
- white matter: wiring
- about 2mm thick, 10cm^2 in area
- 30-50MM neurons
- 1G connections
- *hierarchical*
- *uniform*
  - ie all looks physically the same
  - all regions have the same algorithm
** 6 key principles
*** on-line learning from streaming data
- up to 10 million senses feed the brain
- we don't (can't) store this data
- we build models from live data
- models constantly updated
*** hierarchy of regions
- sensory data enters at the bottom
- models are built in every region
- things change more slowly as you go up
- hierarchy enables sequences of sequences
  - seq of waves
  - seq of phonemes
  - seq of words
  - seq of sentences
- hierarchy works upwards and downwards
*** sequence memory
- all sensory data involves time
- sequence memory allows predictions
- structure in data elaborated over time
- sequences can be c
*** sparse distributed representations
- in each region, many neurons, few active
- SDRs represent spatial patterns
- fault-tolerant, semantic ops, high-capacity
- key to understanding & building intelligent systems
*** all regions are both sensory and motor
- behaviour provides context for sensory data
- structure in model navigated via behaviour
*** attention
- use attention to manage the neocortex
- planning and previsualisation
- whole subhierarchies can be switched on and off
** layers of neocortex
- from molecular upwards
- around 5 or 6
*** neurons
- distal dendrites detect coincidence of incoming activity from neighbouring cells
- you don't just see what you're seeing now, you predict what you're going to see next
- (reality is much more complicated, but this algorithm is sufficient to explain a lot)
** clortex
*** background: numenta's nupic
- in dev since 2005
- partially implements HTM/CLA
- python/c++
- open source
**** strengths
- skilled dev team
- eat their own dog food (grok uses nupic)
- operates on subset of HTM/CLA principles
- tunable using swarming on your data
- works well on streaming scalar data (eg machine-generated)
- great community -- http://numenta.org
**** limitations
- codebase has evolved as theory has developed
- difficult/scary to rewrite for flexibility
- OO with large, coupled, classes (~1500 LoC per class)
- need to swarm to find parameters, no real-time control
- not easy to extend beyond streaming scalar use case
*** clortex requirements
- directly analogous to HTM/CLA theory
- transparently understandable source code
  - a neuroscientist should be able to read & review code
- directly observable data
- sufficiently performant
- useful metrics
- appropriate platform
  - portability
  - scalability
*** architectural simplicity
- first role: be useful!
- best software is that which is not needed at all
- human comprehension is king
  - if people can't understand your code, your code is not finished
  - unit tests are not sufficient in themselves
- machine sympathy is queen
- software is a process of R&D
- software development is challenging & intellectual
- more science than engineering
  - engineering: you have a good model already, you just have to plug in the particular parameters
  - science: there are a bunch of unknowns which you have to learn & understand
*** #1: Just use data!
- maps, vectors, sets
- all done in a one-page datomic schema
*** #2: Clojure & its ecosystem
- clojure data not domain objects
*** #3: russ miles' life preserver
- everything either "core" or "integration"
- core: a datomic database for the neocortex
- core: each "patch" of neurons is a graph
- integration: algorithms, encoders, classifiers, SDRs
*** key clj libs & tools
- datomic (+adi)
- quil/processing
- incanter
- lein-midje-doc for literate documentation
- hoplon-reveal-js for presentations
- lighttable
** review
- Big Data isn't just a Machine Intelligence problem
- HTM is exciting
** links
- http://numenta.org
- http://inbits.com
- https://github.com/fergalbyrne/clortex
- writing a leanpub book
* Logan Campbell, Clojure at a Post Office
** history:
- was at clojure user group
- a guy turns up and says he's hiring a team of clojure developers
  - he was at Australia Post
- a million lines of Java worked on by a team in India
- wanted to bring it back in-house
** project: digital mailbox
- big companies spend a lot of money sending out bills & junk mail
- product to seamlessly replace that workflow
- switch from physical mail to cheaper model
- consumer can sign up to receive water bill online
- I was brought on as the "clojure expert"
  - (I'd been playing with it for a couple of years)
- drama:
  - the people they could hire:
    - really experienced java devs
    - keen on FP
  - they said as they were hiring "you might be doing clojure or you might be doing scala"
  - first few people were scala fans
- scala v clojure battles
  - "we need static typing"
  - "we need OO for domain modelling"
  - "clojure is slow" (?)
  - "what framework do you use?"
- "we need static typing? okay, we'll use core.typed"
- domain modelling:
  - when people are used to domain modelling in OO, telling them to just use maps feels like a cop-out
  - records + protocols kind of feel like classes
  - wasn't til I showed them code I'd written and comparing it with their code that they realized that you can just use maps
- online scala course
  - we did it as a team
  - I also did the exercises in clojure
  - did one exercise three different ways in clojure
    - conditional
    - match
    - stream processing
  - showed them my solutions
    - they already understood the problems because they'd solved them themselves
- clojure performance was a surprise, because I'd come from ruby (!)
  - clojure is /fast/
  - there was an underlying feeling that "we need scala for performance"
- I'm a consultant, so was happy for the team to make the language decisions
  - "if you're keen on scala, let's find out a way to pitch it to management"
- web stack: kept hearing "async async async"
  - felt like premature optimization
  - but still we used http-kit
- benchmark started to allay fears that clojure was slow
** feature: make a payment on a bill
- not necessarily a full payment
  : POST /bills/:bill-id/payments
  : Session: user-id
  : Post Data: amount
- GET credit card token for user
- POST request to payment gateway
- GET how much left to be paid
- if payment succeeds: display amount remaining
- if payment fails: display error
** candidate solutions
- synchronous promises
- promise monad
- lamina
- etc etc
** solution 0: synchronous
- http-kit's requests return a promise
- just @deref the promise (blocks the thread)
** solution 1.1: promise monad
- ~do~ is aware of promises
- doesn't block thread, but waits for promise to be executed before continuing
- felt natural way to write with promises
- but incorrect: too much waiting, no concurrency
** solution 1.2: promise monad let/do
- ~let~ to define promises
- ~do~ to pseudo-block on them
- introduces correctness but reduces readability
** solution 1.3: let/do/do
- okay, let's step away from monads
** solution 2: (?)
** solution 3: raw promises
- ~when~ to explicitly wait for a particular promise
** solution 4: raw callbacks
- not viable
- would have just written a hacky little promise library
** solution 5: core.async:
- great! same shape as synchronous code, but correct concurrency
** solution 6: lamina
- didn't feel totally suited to the situation
** solution 7: meltdown (LMAX disruptor based)
- not appropriate
** solution 8: pulsar promises
- looks exactly the same as the synchronous code, except for one character
- pulsar rearranges your code at the bytecode level
  - uses JVM agents (normally used for tracing/debugging)
- pass a fn to one of pulsar's functions
- turns synchronous code to async code
** solution 9: pulsar actors
- not appropriate
** winners
- 0: synchronous
- 5: core.async
- 8: pulsar
** scala solution, for comparison
- scala futures (basically promises)
- all monadic
- I don't understand it entirely
- concise
- battle of the benchmarks, fastest first
  - pulsar-async
  - pulsar-sync
  - core-async
  - raw-callback
  - scala-play-future (significantly less than others)
** CQRS (command-query responsibility segregation)
- want fast reads
- reduce number of queries
- don't want to have to update write code every time we add a new reader
** structure
- service A → cassandra → service B
- custom triggers in cassandra in clojure (just drop in the .jar!)
- publish to rabbitmq
- notify index maintainer
- write index to cassandra
- service B reads from cassandra
** cassandra triggers
- can just throw the clojure jar in there
- everything is byte buffers
- you need to know the type of all the fields out-of-band
- not self-describing data at all
** microservices
- I thought we would have a user service and a provider service and a mail service
- but this gets tricky when you want data about users and providers
- you need to split things much more fine grained
- user service →
  - authentication
  - multi-factor auth
  - authorization
  - user profile
  - password reset
    - does it belong in user profile?
    - there's a bit of workflow here
      - send out email
      - get user to click link
    - enough to warrant its own service
- drama: needed to talk to systems team to deploy
  - I did things badly
  - I didn't get anything into production in my 6 months there
  - systems team: we need monitoring and config and stuff
  - if we'd had something early on which had gone through these barriers, we would have had much less stress
- benchmarks end petty arguments
** Q&A
*** can you share some experience with monitoring & resilience?
- appdynamics
  - classnames are expected to be java-style class names
  - clojure ones are close enough
- clj-metrics to expose more high-level metrics
  - requests/second from ring
  - number of bills paid
  - appdynamics could pick it up from jmx
- nomad for configuration
*** with http-kit+core.async, what happens when server dies and there's loads of threads?
- bottleneck was amount of memory
- when server runs out, it slows down a lot
- way to get around that is to monitor resources on your machine and ideally have autoscaling
*** were the scala guys finally writing clojure in the end?
- we have one person still hardcore for scala, but sees the merits of clojure
- a few who did the online scala courses are clojure folks now
- people who come from the java world of static typing feel they need that
  - but now they've written code that actually works, they're more comfortable with that now
* Tom Hall, Escaping DSL Hell by having parens all the way down
- @thattommyhall
** DSLs
- languages made for specific purposes
  - config mgmt
  - science
  - learning
- distinction between:
  - internal DSLs: embedded in another language
  - external DSLs: implemented in another language
** problems with puppet
- zen of python:
  - namespaces are a honking great idea, let's do more of them!
*** puppet namespaces
- Exec['install'] in two different modules will result in a naming collision
  - fail :(
- end up with Exec['tom::install'] but this is a hack
*** iteration
- file type lets you pass in an array
- nagios_host doesn't
- iteration is responsibility of type, not language
  - as far as I know
*** but you need to know ruby anyway
- if you want to extend puppet, you need ruby
- if you need to know ruby, why do we bother with the puppet DSL in the first place?
*** experimental features: lambdas and iteration
- any language where lambdas arrive late is not a good language
** ansible
- just YAML
- oh wait, I might want to iterate
- oh wait, I've got embedded jinja templates in my YAML strings
- what's the scope of names in my templates?
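For contrast with the namespace and iteration problems noted for puppet and ansible, here is a minimal sketch of what an embedded approach might look like in Clojure. It is not from the talk: ~exec~ and ~nagios-host~ are hypothetical constructors invented for illustration (they are not part of any real library), and resources are assumed to be plain maps.

#+begin_src clojure
(ns tom)

;; Hypothetical resource constructors (assumptions for this sketch):
;; a resource is just a map, so no special DSL machinery is needed.
(defn exec
  [id command]
  {:type :exec, :id id, :command command})

(defn nagios-host
  [hostname]
  {:type :nagios-host, :id hostname})

;; ::install is read as :tom/install in this namespace, so it cannot
;; collide with an :install resource defined in another namespace.
(def install (exec ::install "make install"))

;; Iteration is ordinary Clojure and works for every resource type,
;; rather than being the responsibility of each type.
(def hosts (mapv nagios-host ["web1" "web2" "db1"]))
#+end_src

The point of the sketch is only that namespacing and iteration come from the host language rather than being added to the DSL later.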
** if you give people a "language" they will expect loops
- maybe lambdas
- probably namespaces
- this has been done before
** chef gets it right
- it's embedded in ruby
- you get iteration and namespaces from ruby
** teaching people to program
- if you design a language:
  - you need a parser, which is hard
  - you need an interpreter/compiler, which is hard
- if you embed it, you get that stuff for free
** geomlab
- minimal language for teaching
- talks about pictures
- intro to FP
- gets you into recursion early on
- ~man $ woman~ - "next to"
- ~man & man~ - "on top of"
- ~(man $ woman) $ tree~ = ~man $ (woman $ tree)~
- ~man $ (woman & tree)~ -- scales nicely to get a nice aspect ratio
- learn about operator precedence
- de morgan's laws
  - although not always held, due to scale
- define functions
  : define manrow(n) = manrow(n-1) $ man when n>1
  : ~ manrow(1) = man
- builds up to an escher tiling
- but once you've done that, where do we go?
  - only exists in this sim
  - if you want to extend it, you need java
  - "I'm really excited about FP now, but I've got nowhere to go"
** what if we did it in clojurescript?
- let's use 'below and 'beside instead of $ and &
- ~(below man woman)~
- ~(beside tree star)~
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
- let's say I want to change ~man~ -- what does it mean?
  - it's implemented in the same sort of language
  - I can see there's a url in there where I fetch an image from the internet
  - I know recursion, because I learned that from the geomlab exercises
  - I can extend the language itself
** science languages
- R
- wolfram alpha
- maple
- matlab
- these things just aren't very good languages, even if they are good at their domain
** another problem with DSLs
- netlogo
- http://ccl.northwestern.edu/tortoise/2013-10-25/Ants.html
- If you're based on applets, and Oracle drops applet support, you find you need to port your whole language to a new platform (in this case javascript)
- again, reimplement in clojurescript?
- anyone interested in hacking on this with me?
** conclusion
- you probably don't need to make a new language
- if you do it will probably be rubbish
  - at least for a while
- think about power and reach
- *you should embed /deeply/ into clojure*
** links
- http://twitter.com/otfrom
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.core
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
- http://cljsfiddle.net/fiddle/thattommyhall.geomlab.bruce
- http://www.complexityexplorer.org/
- http://cljsfiddle.net/fiddle/thattommyhall.ants.core
- http://ccl.northwestern.edu/tortoise/2013-10-25/Ants.html
** Q&A
*** what makes a good first language?
- clojure needs a better day 0 story
- at some coder dojos where I've taught kids, some don't even know about files and folders
  - so if you say "open a terminal, cd into a directory" you've lost them
  - and it's not their fault
*** have you had any kids look at your examples here?
- I've done the geomlab example
- otherwise this is all a recent exploration
- errors in cljsfiddle are not reported well
  - again problematic for day zero