rvanbruggen · December 6, 2021 20:12 · Dec 6, 2021
diff --git a/transactionbatching.mdx b/transactionbatching.mdx
@@ -0,0 +1,129 @@
+# Revisiting contact tracing with Neo4j 4.4's transaction batching capabilities
+
+<img src="https://icon2.cleanpng.com/20180419/vsw/kisspng-sinterklaas-netherlands-zwarte-piet-surprise-dutch-st-vector-5ad81a547e09d4.5770211115241119565163.jpg" align="right" width="150"></img> 
+
+Yes! It's been a few months, but Saint Nicholas just brought us a brand new and shiny release of [Neo4j 4.4](https://neo4j.com/blog/neo4j-4-4-the-fastest-path-to-graph-database-productivity-generally-available/) to play with. One of the key features is a _generic transaction batching_ capability, similar to what we have been using in `apoc.periodic.iterate` but now built right into the core of the database. It is referred to as the [`CALL {} IN TRANSACTIONS`](https://neo4j.com/docs/cypher-manual/current/introduction/transactions/) capability - and of course it is a really interesting feature.
+
+So in this article I will be revisiting [this blogpost](http://blog.bruggen.com/2021/06/revisiting-covid-19-contact-tracing.html), but without the need for [APOC's `apoc.periodic.iterate` feature](https://neo4j.com/labs/apoc/4.2/overview/apoc.periodic/apoc.periodic.iterate/). Let's see how that goes.
+
+---
+
+## Create a synthetic contact tracing graph - size of Antwerp
+
+The first step of course is going to be similar to, if not exactly the same as, the work I did in 2020 on contact tracing. Take a look at [that post](http://blog.bruggen.com/2020/06/what-recommender-systems-and-contact.html) to see how that went. The key thing to recall there is that I was using the fantastic `faker` plugin. You can download it yourself from the [github page](https://github.com/neo4j-contrib/neo4j-faker). Installation is super easy. You just need to make sure the config is updated too - and that you have whitelisted `fkr.*` just like you do with `gds.*` and `apoc.*`.
+
+As with the previous post, I will be pushing the scale up to the size of my home city of [Antwerp](https://www.antwerpen.be), Belgium. And critically, we will not even use APOC - we will use the transaction batching instead.
+
+---
+
+## Create 500000 `(Person)` nodes
+Previously we did this in one transaction - which is probably at the limits of what I would normally do. But since we now have this _transaction batching_ mechanism in Cypher, let's use it:
+
+```cypher
+:auto UNWIND range(1,500000) as id
+    CALL {
+        WITH id
+            CREATE (p:Person {id: id})
+            SET p += fkr.person('1950-01-01','2021-12-01')
+            SET p.healthstatus = fkr.stringElement("Sick,Healthy")
+            SET p.confirmedtime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
+            SET p.birthDate = datetime(p.birthDate)
+            SET p.addresslocation = point({x: toFloat(51.210197+rand()/100), y: toFloat(4.402771+rand()/100)})
+            SET p.name = p.fullName
+            REMOVE p.fullName
+        } IN transactions of 25000 ROWS;
+```
+This commits the 500000 rows in 20 transactions of 25000 rows each. It returns a little more slowly than a single-shot transaction would, but that is to be expected. Here's the result:
+
+![](https://drive.google.com/uc?id=11ad66omLXh0yLNTMH2tLLEhx98RjIMhA)
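+
+To double-check that all twenty batches were committed, a quick count along these lines should do. This is only a minimal sketch, and the expected total of 500000 assumes the database held no `Person` nodes beforehand:
+
+```cypher
+// Count the Person nodes created by the batched statement above
+MATCH (p:Person)
+RETURN count(p) AS persons;
+```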
+
+Then, we will create the `(Place)` nodes.
+
+---
+
+## Create 10000 `(Place)` nodes
+Adding the places is instantaneous, even with two batches of 5000:
+```cypher
+:auto UNWIND range(1,10000) as id
+CALL {
+    WITH id
+    CREATE (p:Place { id: id, name: "Place nr "+id})
+    SET p.type = fkr.stringElement("Grocery shop,Theater,Restaurant,School,Hospital,Mall,Bar,Park")
+    SET p.location = point({x: toFloat(51.210197+rand()/100), y: toFloat(4.402771+rand()/100)})
+    } IN transactions of 5000 rows;
+```
+The result looks like this:
+![](https://drive.google.com/uc?id=11cmuG7dUv9fVDfYr60JExpXAUmPCM0A9)
+
+---
+
+## Put in place some indexes on the NODES and future RELATIONSHIPS
+We don't really need them for this demo - but they could be useful for other queries. Note that we are using the relationship-centric model here - as we proved in the last blogpost that this is at least as capable as, and much simpler than, the reified model that used `(Visit)` nodes.
+
+So here we add the node indexes:
+```cypher
+CREATE INDEX placenodeid FOR (p:Place) ON (p.id);
+CREATE INDEX placenodelocation FOR (p:Place) ON (p.location);
+CREATE INDEX placenodename FOR (p:Place) ON (p.name);
+CREATE INDEX personnodeid FOR (p:Person) ON (p.id);
+CREATE INDEX personnodename FOR (p:Person) ON (p.name);
+CREATE INDEX personnodehealthstatus FOR (p:Person) ON (p.healthstatus);
+CREATE INDEX personnodeconfirmedtime FOR (p:Person) ON (p.confirmedtime);
+```
+And we also add an index on the `-[:VISITS]->` relationship property:
+```cypher
+CREATE INDEX visitrelstarttime FOR ()-[v:VISITS]->() ON (v.starttime);
+```
+
+![](https://drive.google.com/uc?id=11knLKRCXebDwfjoKuEx42gy0RaspX7hq)
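+
+If you want to confirm that the indexes have finished populating before loading the relationships, something like the following should work. Treat it as a sketch: the column names are the ones I would expect `SHOW INDEXES` to expose on 4.4, so adjust them if your version differs:
+
+```cypher
+// List every index with the entity it covers and its current state
+// (a state of ONLINE means the index is ready to be used)
+SHOW INDEXES YIELD name, entityType, labelsOrTypes, properties, state;
+```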
+
+Now we can add the 1.5M relationships - the real test of the new transaction batching functionality.
+
+## Add 1500000 random visits to places
+It's pretty straightforward and similar to the previous examples, so let's just dive in:
+
+```cypher
+:auto UNWIND range(1,1500000) as iteration 
+CALL {
+    WITH iteration
+    MATCH (p:Person {id: toInteger(rand()*500000)+1}), (pl:Place {id:toInteger(rand()*10000)+1 })
+        create (p)-[virel:VISITS]->(pl)
+        set virel.starttime = datetime()-duration("P"+toInteger(round(rand()*100))+"DT"+toInteger(round(rand()*10))+"H")
+        set virel.endtime = virel.starttime + duration("PT"+toInteger(round(rand()*10))+"H"+toInteger(round(rand()*60))+"M")
+        set virel.visittime = duration.between(virel.starttime,virel.endtime)
+        set virel.visittimeinseconds = virel.visittime.seconds
+} IN TRANSACTIONS of 25000 rows;
+```
+The result was pretty quick: not even 75 seconds!
+
+![](https://drive.google.com/uc?id=11lSa8gocymgvI_l5_pFy0c9U-SAZVJo8)
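+
+Since the person and place ids are drawn at random for every row, it is worth checking how many relationships actually landed in the graph. A minimal sketch of such a check:
+
+```cypher
+// Count the VISITS relationships created by the batched statement above
+MATCH ()-[v:VISITS]->()
+RETURN count(v) AS visits;
+```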
+
+---
+
+## Query on VISITS relationships
+Just for completeness, I will revisit the main query that we explored in the previous blogpost here as well. This is what that query looks like:
+
+```cypher
+match (p:Person)-[v:VISITS]->(pl:Place)
+where v.starttime > datetime()-duration("P20DT17H")
+and v.starttime < datetime()-duration("P20DT10H")
+return p.name, sum(v.visittime) as totalvisittime, sum(v.visittimeinseconds) as totalvisittimeinseconds
+order by totalvisittime desc
+limit 10;
+```
+
+![](https://drive.google.com/uc?id=11mmOgfGkZYok-1NSPMfgK8Zcb03kWRcg)
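+
+To see whether the planner picks up the `visitrelstarttime` relationship index for this time window, you could prefix the same query with `EXPLAIN` (or with `PROFILE`, which also runs it). This is just a sketch; the plan you get will depend on your version and on the statistics of your data:
+
+```cypher
+// Show the execution plan without running the query
+EXPLAIN
+match (p:Person)-[v:VISITS]->(pl:Place)
+where v.starttime > datetime()-duration("P20DT17H")
+and v.starttime < datetime()-duration("P20DT10H")
+return p.name, sum(v.visittime) as totalvisittime, sum(v.visittimeinseconds) as totalvisittimeinseconds
+order by totalvisittime desc
+limit 10;
+```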
+
+---
+
+## Conclusion:
+The new transaction batching functionality makes for a great addition to our toolbox - and clearly performs quite well. I am already looking forward to using it in other use cases!
+
+Cheers
+
+Rik Van Bruggen 
+- [Twitter](https://twitter.com/rvanbruggen) 
+- [Blog](http://blog.bruggen.com/)
+- [LinkedIn](https://www.linkedin.com/in/rikvanbruggen/)
+
+