@@ -0,0 +1,210 @@
session_analytics
=================
This PostgreSQL extension provides some functions for querying hstore arrays.
Functions
---------
* ** count\_ elements(elements hstore[ ] , pattern text)** . returns the number of
elements that match the given pattern.
* ** multi\_ count\_ elements(elements hstore[ ] , patterns text[ ] )** . returns an array
of integers which contains number of elements that match the corresponding
pattern. This is equivalent of multiple calls of count\_ elements, but is usually
faster.
* ** sum(int[ ] )** . aggregate function for summing vectors. This is useful for
aggregating result of multi\_ count\_ elements.
* ** filter\_ elements(elements hstore[ ] , pattern text)** . returns the subsequence
of elements that match the given pattern.
* ** contains\_ elements(elements hstore[ ] , patterns text[ ] )** . returns 1 if we
can find a subsequence of elements such that they match the corresponding
patterns. Otherwise, it returns 0.
* ** funnel\_ events(elements hstore[ ] , patterns text[ ] )** . returns an array with
length equal to length of patterns. if patterns[ 1..i] matches a subesequence of
patterns, then result[ i] is 1, otherwise result[ i] is 0.
* ** group\_ elements(elements hstore[ ] , fields text[ ] )** . groups the elements
by the value of the given fields, and returns a row per each group. Each group
is tuple (keys text[ ] , elements hstore[ ] ).
* ** group\_ over\_ time(elements hstore[ ] , fields text[ ] , time\_ field text,
truncation\_ unit text, client\_ offset interval)** . this function is similar to
group\_ elements, but also adds the value of the given time field to group
keys. The time field is shifted by the client offset and truncated by the given
unit before being add to the group keys.
* ** filter\_ groups(group group_elements_result, pattern text)** . filters out the
given group if it doesn't match the given pattern.
Deprecated Functions
--------------------
* ** cohort\_ over\_ time(elements hstore[ ] , fields text[ ] , time_field text,
truncation\_ unit text, client\_ offset interval, patterns text[ ] ,
min\_ sizes int[ ] )** . similar to group\_ over\_ time, but filters out the groups
who doesn't have minimum count of elements matching the given patterns.
This function now can be replaced with a nested call to filter\_ groups and
group\_ over\_ time.
Pattern Language
---------------
The pattern language below is used to match hstore objects using field based expressions
that can be combined using AND (",") and OR ("|") operators to create complex
expressions:
* ** field\_ name => value** . Is value of field\_ name equal to the given value?
* ** field\_ name =>not value** . Is value of field\_ name not equal to the given value?
* ** field\_ name =>gt value** , ** field\_ name =>gte value** ,
** field\_ name =>lt value** , ** field\_ name =>lte value** . int64 comparison between
value of field\_ name and given value.
* ** field\_ name =>contains value** . Is given value substring of the value of
field\_ name?
* ** field\_ name =>not_contains value** . Opposite of =>contains.
* ** field\_ name =>regex pattern** . Does value of field\_ name match the given regex
pattern?
* ** field\_ name =>like pattern** . Does value of field\_ name match the given like
pattern?
* ** field\_ name =>ilike pattern** . Similar to =>like, but case insensitive.
* ** A , B** . Does both A and B hold?
* ** A | B** . Does A or B hold?
* ** (A)** . Parentheses for arbitrary mixing AND and OR operators.
* values can be escaped and quoted.
The above pattern language is extended for filter\_ groups to support aggregate
functions:
* You can use ** function(pattern)** in place of field\_ name in any of the above
patterns.
* ** count(pattern)** . number of elements matching the pattern.
* ** sum(pattern;field\_ name)** . sum of field\_ name in elements matching the pattern.
* ** max(pattern;field\_ name)** . max of field\_ name in elements matching the pattern.
* ** min(pattern;field\_ name)** . min of field\_ name in elements matching the pattern.
* ** avg(pattern;field\_ name)** . integer part of average of field\_ name in elements
matching the pattern.
Examples
--------
These examples are based on a database where you have a schema like the following:
```
┌─────────────┬──────────┬───────────┐
│ Column │ Type │ Modifiers │
├─────────────┼──────────┼───────────┤
│ cust_id │ bigint │ │
│ cust_events │ hstore[] │ │
└─────────────┴──────────┴───────────┘
```
You can think of element in the cust_events array as representing a single event that a
customer generated. An example event might look like:
```
{
client => iphone-32.2.1a,
page_class => message,
action => send,
time => 1398363313
}
```
## Count occurences of certain types of events
Here is an example where you are counting the total number of message page views:
``` SQL
SELECT
sum (count_elements(cust_events, ' action=>view,page_class=>message' ))
FROM
cust_events;
```
## Count occurences of certain types of events grouped by additional event metadata
Here is an example of counting the types of clients cucstomers use to view or list
messages. Note the use of the '|' and ',' operators to do OR and AND operations and
also the use of '(...)' to group expressions.
``` SQL
SELECT
count (* ),
unnest(
filter_elements(cust_events, ' (action=>view,page_class=>message)|(action=>list,page_class=>message)' )
)- > ' client' as client,
FROM
cust_events
GROUP BY
client;
```
Another way to ask the same question illustrating how filters can be chained:
``` SQL
SELECT
count (* ),
unnest(
filter_elements(
filter_elements(cust_events, ' page_class=>message' ),
' action=>list|action=>view'
)
)- > ' client' as client,
FROM
cust_events
GROUP BY
client;
```
## Count customers who performaned certain actions over their lifetime
Here's an example illustrating the use of the count_elements function to count the number
of users who repeatedly viewed a message:
``` SQL
SELECT
count (* )
FROM
cust_events
WHERE
count_elements(cust_events, ' action=>view,page_class=>message' ) > 2 ;
```
## Listing customers who performed a sequence of actions
Say you run a retail application and want to list users who searched for a product, viewed
it, and went on to purchase:
``` SQL
SELECT
user_id
FROM
cust_events
WHERE
contains_elements(cust_events,
ARRAY[' action=>search' , ' action=>view,page_class=>product' , ' action=>purchase' ]
);
```
## Generating funnel queries
Now imagine an application similar to the one above but instead of a list of users you want
to know the counts of users that moved through various stages of your funnel:
``` SQL
SELECT
sum (contains_elements(cust_events, ARRAY[' action=>search' ]) as search,
sum (contains_elements(cust_events, ARRAY[' action=>search' ,
' action=>view,page_class=>product' ]) as view,
sum (contains_elements(cust_events, ARRAY[' action=>search' ,
' action=>view,page_class=>product' ,
' action=>purchase' ]) as purchase
FROM
cust_events;
```
Adding New Operators
--------------------
To add new operators:
1 . add the operator to the TOKENIZATION_REGEX,
2 . add it to the list of token string pound-definitions,
3 . add it to the OperatorCode enum,
4 . add it to the OperatorTokens mapping,
5 . For string operators, modify StringOperator()/StringOperatorResult()
functions.
6 . For numeric operators, modify NumericOperator()/NumericOperatorResult()
functions.
Copyright
---------
Copyright 2014-2016 Citus Data.
Licensed under the Apache License, Version 2.0.