sourcec0de · November 1, 2016 23:28 · Nov 1, 2016
diff --git a/Session_Analytics.md b/Session_Analytics.md
@@ -0,0 +1,210 @@
+session_analytics
+=================
+
+This PostgreSQL extension provides some functions for querying hstore arrays.
+
+Functions
+---------
+ * **count\_elements(elements hstore[], pattern text)**. returns the number of
+   elements that match the given pattern.
+ * **multi\_count\_elements(elements hstore[], patterns text[])**. returns an array
+   of integers which contains number of elements that match the corresponding
+   pattern. This is equivalent of multiple calls of count\_elements, but is usually
+   faster.
+ * **sum(int[])**. aggregate function for summing vectors. This is useful for
+   aggregating result of multi\_count\_elements.
+ * **filter\_elements(elements hstore[], pattern text)**. returns the subsequence
+   of elements that match the given pattern.
+ * **contains\_elements(elements hstore[], patterns text[])**. returns 1 if we
+   can find a subsequence of elements such that they match the corresponding
+   patterns. Otherwise, it returns 0.
+ * **funnel\_events(elements hstore[], patterns text[])**. returns an array with
+   length equal to length of patterns. if patterns[1..i] matches a subesequence of
+   patterns, then result[i] is 1, otherwise result[i] is 0.
+ * **group\_elements(elements hstore[], fields text[])**. groups the elements
+   by the value of the given fields, and returns a row per each group. Each group
+   is tuple (keys text[], elements hstore[]).
+ * **group\_over\_time(elements hstore[], fields text[], time\_field text,
+   truncation\_unit text, client\_offset interval)**. this function is similar to
+   group\_elements, but also adds the value of the given time field to group
+   keys. The time field is shifted by the client offset and truncated by the given
+   unit before being add to the group keys.
+ * **filter\_groups(group group_elements_result, pattern text)**. filters out the
+   given group if it doesn't match the given pattern.
+
+
+Deprecated Functions
+--------------------
+ * **cohort\_over\_time(elements hstore[], fields text[], time_field text,
+   truncation\_unit text, client\_offset interval, patterns text[],
+   min\_sizes int[])**. similar to group\_over\_time, but filters out the groups
+   who doesn't have minimum count of elements matching the given patterns.
+   This function now can be replaced with a nested call to filter\_groups and
+   group\_over\_time.
+
+
+Pattern Language
+---------------
+The pattern language below is used to match hstore objects using field based expressions 
+that can be combined using AND (",") and OR ("|") operators to create complex 
+expressions:
+ * **field\_name => value**. Is value of field\_name equal to the given value?
+ * **field\_name =>not value**. Is value of field\_name not equal to the given value?
+ * **field\_name =>gt value**, **field\_name =>gte value**,
+   **field\_name =>lt value**, **field\_name =>lte value**. int64 comparison between
+   value of field\_name and given value.
+ * **field\_name =>contains value**. Is given value substring of the value of
+   field\_name?
+ * **field\_name =>not_contains value**. Opposite of =>contains.
+ * **field\_name =>regex pattern**. Does value of field\_name match the given regex
+   pattern?
+ * **field\_name =>like pattern**. Does value of field\_name match the given like
+   pattern?
+ * **field\_name =>ilike pattern**. Similar to =>like, but case insensitive.
+ * **A , B**. Does both A and B hold?
+ * **A | B**. Does A or B hold?
+ * **(A)**. Parentheses for arbitrary mixing AND and OR operators.
+ * values can be escaped and quoted.
+
+The above pattern language is extended for filter\_groups to support aggregate
+functions:
+
+ * You can use **function(pattern)** in place of field\_name in any of the above
+   patterns.
+ * **count(pattern)**. number of elements matching the pattern.
+ * **sum(pattern;field\_name)**. sum of field\_name in elements matching the pattern.
+ * **max(pattern;field\_name)**. max of field\_name in elements matching the pattern.
+ * **min(pattern;field\_name)**. min of field\_name in elements matching the pattern.
+ * **avg(pattern;field\_name)**. integer part of average of field\_name in elements
+   matching the pattern.
+
+Examples
+--------
+These examples are based on a database where you have a schema like the following:
+
+```
+┌─────────────┬──────────┬───────────┐
+│   Column    │   Type   │ Modifiers │
+├─────────────┼──────────┼───────────┤
+│ cust_id     │ bigint   │           │
+│ cust_events │ hstore[] │           │
+└─────────────┴──────────┴───────────┘
+```
+
+You can think of element in the cust_events array as representing a single event that a
+customer generated. An example event might look like:
+
+```
+{
+    client      => iphone-32.2.1a,
+    page_class  => message,
+    action      => send,
+    time        => 1398363313
+}
+```
+
+## Count occurences of certain types of events
+Here is an example where you are counting the total number of message page views:
+
+```SQL
+SELECT
+    sum(count_elements(cust_events, 'action=>view,page_class=>message'))
+FROM
+    cust_events;
+```
+
+## Count occurences of certain types of events grouped by additional event metadata
+Here is an example of counting the types of clients cucstomers use to view or list
+messages. Note the use of the '|' and ',' operators to do OR and AND operations and 
+also the use of '(...)' to group expressions.
+
+```SQL
+SELECT
+    count(*),
+    unnest(
+        filter_elements(cust_events, '(action=>view,page_class=>message)|(action=>list,page_class=>message)')
+    )->'client' as client,
+FROM
+    cust_events
+GROUP BY
+    client;
+```
+
+Another way to ask the same question illustrating how filters can be chained:
+```SQL
+SELECT
+    count(*),
+    unnest(
+        filter_elements(
+            filter_elements(cust_events, 'page_class=>message'),
+            'action=>list|action=>view'
+        )
+    )->'client' as client,
+FROM
+    cust_events
+GROUP BY
+    client;
+
+```
+
+## Count customers who performaned certain actions over their lifetime
+Here's an example illustrating the use of the count_elements function to count the number
+of users who repeatedly viewed a message:
+
+```SQL
+SELECT
+    count(*)
+FROM
+    cust_events
+WHERE
+    count_elements(cust_events, 'action=>view,page_class=>message') > 2;
+```
+
+## Listing customers who performed a sequence of actions
+Say you run a retail application and want to list users who searched for a product, viewed
+it, and went on to purchase:
+```SQL
+SELECT
+    user_id
+FROM
+    cust_events
+WHERE
+    contains_elements(cust_events, 
+        ARRAY['action=>search', 'action=>view,page_class=>product', 'action=>purchase']
+    );
+```
+
+## Generating funnel queries
+Now imagine an application similar to the one above but instead of a list of users you want
+to know the counts of users that moved through various stages of your funnel:
+```SQL
+SELECT
+    sum(contains_elements(cust_events, ARRAY['action=>search']) as search,
+    sum(contains_elements(cust_events, ARRAY['action=>search', 
+                                             'action=>view,page_class=>product']) as view,
+    sum(contains_elements(cust_events, ARRAY['action=>search', 
+                                             'action=>view,page_class=>product', 
+                                             'action=>purchase']) as purchase
+FROM
+    cust_events;
+```
+
+Adding New Operators
+--------------------
+To add new operators:
+
+1. add the operator to the TOKENIZATION_REGEX,
+2. add it to the list of token string pound-definitions,
+3. add it to the OperatorCode enum,
+4. add it to the OperatorTokens mapping,
+5. For string operators, modify StringOperator()/StringOperatorResult()
+   functions.
+6. For numeric operators, modify NumericOperator()/NumericOperatorResult()
+   functions.
+
+
+Copyright
+---------
+Copyright 2014-2016 Citus Data.
+
+Licensed under the Apache License, Version 2.0.