@DDmitrid
Forked from federico-garcia/redis.md
Created March 9, 2018 08:34

Redis

Redis stands for REmote DIctionary Server. By default, Redis stores all data in memory. It's a key-structure database: every key maps to a data structure. redis-server is the actual datastore, and redis-cli is the command-line interface that can run any Redis command. By default, Redis listens on port 6379.

Starting the redis server

redis-server

While you can build a complete system using Redis only, I think most people will find that it supplements their more generic data solution - whether that be a traditional relational database, a document-oriented system, or something else. It’s the kind of solution you use to implement specific features.

Starting a redis container with persistent storage

docker run --name redis-test -d -v $(pwd)/data:/data redis:alpine redis-server --appendonly yes

Starting a redis container with a custom config

docker run --name redis-test -d -v $(pwd)/data:/data -v $(pwd)/config/redis.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf

Connecting to the redis server using redis-cli

docker run -it --link redis-test:redis-server --rm redis:alpine redis-cli -h redis-server -p 6379

Data structures

Redis exposes different data structures. Each one comes with a set of commands that run on the server to manipulate the data. This is very powerful: you don't have to read a value, change it in the client, and then send the altered value back to the server. You just tell the server what you want to do and everything happens on the server, which avoids extra round trips. This is what sets Redis apart from other cache systems.

  • Strings
  • Lists
  • Hashes
  • Sets
  • Sorted Sets
  • Bitmaps
  • HyperLogLog

Strings

A String can store any type of data: text, integers, floats, or binary data (video, image, or audio). A String value cannot exceed 512 MB.

use cases

  • cache mechanisms. SET, GET, MSET and MGET
  • cache with automatic expiration. Very useful to cache DB queries that take a long time to run for a short period of time. SETEX, EXPIRE and EXPIREAT
  • Counting. e.g page views, likes, metrics. INCR, INCRBY, DECR, DECRBY and INCRBYFLOAT.
$ redis-cli
127.0.0.1:6379> MSET first "First Key value" second "Second Key value"
OK
127.0.0.1:6379> MGET first second
1) "First Key value"
2) "Second Key value"
127.0.0.1:6379> SET current_chapter "Chapter 1"
OK
127.0.0.1:6379> EXPIRE current_chapter 10
(integer) 1
127.0.0.1:6379> GET current_chapter
"Chapter 1"
127.0.0.1:6379> TTL current_chapter
(integer) 3
127.0.0.1:6379> SET counter 100
OK
127.0.0.1:6379> INCR counter
(integer) 101
127.0.0.1:6379> INCRBY counter 5 
(integer) 106
127.0.0.1:6379> DECR counter
(integer) 105
127.0.0.1:6379> DECRBY counter 100
(integer) 5
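
The "cache with automatic expiration" use case is usually written as a cache-aside helper. Below is a minimal Python sketch, assuming a client object that exposes get(name) and setex(name, seconds, value) the way the redis-py client does. FakeRedis, cached_query, slow_report, and the report:daily key are hypothetical names, used only so the example runs without a server.

```python
import json
import time


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only get() and setex()."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def setex(self, name, seconds, value):
        # Store the value together with its absolute expiry time.
        self._data[name] = (value, self._clock() + seconds)
        return True

    def get(self, name):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired keys behave as if they never existed.
            del self._data[name]
            return None
        return value


def cached_query(client, key, ttl_seconds, run_query):
    """Return the cached result for key, or run the query and cache it."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    result = run_query()
    client.setex(key, ttl_seconds, json.dumps(result))
    return result


calls = []


def slow_report():
    # Stands in for a database query that takes a long time to run.
    calls.append(1)
    return {"rows": 42}


client = FakeRedis()
print(cached_query(client, "report:daily", 60, slow_report))
print(cached_query(client, "report:daily", 60, slow_report))
print("query executed", len(calls), "time(s)")
```

The second call is served from the cache, so the slow query runs only once until the key expires.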

Lists

Lists are linked lists, so inserts and deletes at the beginning or the end run in constant time, O(1), regardless of the length of the list. A List can be memory optimized if it has fewer elements than list-max-ziplist-entries and each value is smaller than list-max-ziplist-value (in bytes). The maximum number of entries is 2^32 - 1, more than 4 billion elements. List indices are zero-based and can be positive or negative; negative indices count from the end of the list, so -1 is the last element.
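
To make the index rules concrete, here is a small Python sketch that mimics how LRANGE treats its arguments: the stop index is inclusive, negative indices count from the tail, and out-of-range values are clamped. The lrange function is a hypothetical helper that only models the semantics; it does not talk to Redis.

```python
def lrange(items, start, stop):
    """Model LRANGE: zero-based, inclusive stop, negative indices from the tail."""
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    if start > stop or start >= n:
        return []
    # Clamp stop to the last valid index, then convert to a Python slice.
    return items[start:min(stop, n - 1) + 1]


books = ["Peopleware", "Clean Code", "Code Complete"]
print(lrange(books, 0, 1))    # ['Peopleware', 'Clean Code']
print(lrange(books, -2, -1))  # ['Clean Code', 'Code Complete']
print(lrange(books, 0, -1))   # the whole list
print(lrange(books, 5, 10))   # []
```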

use cases

  • Event queue. e.g Kue.js
  • Storing most recent "something". e.g most recent user posts, news, user activity, etc. Commands: LPUSH, RPUSH, LLEN, LINDEX, LRANGE, LPOP, RPOP and RPOPLPUSH.
$ redis-cli
127.0.0.1:6379> LPUSH books "Clean Code"
(integer) 1
127.0.0.1:6379> RPUSH books "Code Complete"
(integer) 2
127.0.0.1:6379> LPUSH books "Peopleware"
(integer) 3
127.0.0.1:6379> LLEN books
(integer) 3
127.0.0.1:6379> LINDEX books 1
"Clean Code"
127.0.0.1:6379> LRANGE books 0 1
1) "Peopleware"
2) "Clean Code"
127.0.0.1:6379> LPOP books
"Peopleware"
127.0.0.1:6379> RPOP books
"Code Complete"

Hashes

Hashes are a great data structure for storing objects because you can map fields to values. A Hash can be memory optimized if it has fewer fields than hash-max-ziplist-entries and each value is smaller than hash-max-ziplist-value (in bytes). Internally, a Hash can be a ziplist or a hash table. A ziplist is a dually linked list designed to be memory efficient; in a ziplist, integers are stored as real integers rather than as a sequence of characters. Although a ziplist has memory optimizations, lookups are not performed in constant time. A hash table, on the other hand, has constant-time lookup but is not memory optimized.

$ redis-cli
127.0.0.1:6379> HSET movie "title" "The Godfather"
(integer) 1
127.0.0.1:6379> HMSET movie "year" 1972 "rating" 9.2 "watchers" 10000000
OK
127.0.0.1:6379> HINCRBY movie "watchers" 3
(integer) 10000003
127.0.0.1:6379> HGET movie "title"
"The Godfather"
127.0.0.1:6379> HMGET movie "title" "watchers"
1) "The Godfather"
2) "10000003"
127.0.0.1:6379> HDEL movie "watchers"
(integer) 1
127.0.0.1:6379> HGETALL movie
1) "title"
2) "The Godfather"
3) "year"
4) "1972"
5) "rating"
6) "9.2"

It is possible to retrieve only the field names or field values of a Hash with the commands HKEYS and HVALS respectively.
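
HGETALL replies with a flat sequence of alternating field names and values, as the session above shows. A client typically folds that reply into a map. The sketch below does this in plain Python; pairs_to_dict is a hypothetical helper, and the reply list is copied from the session rather than fetched from a server.

```python
def pairs_to_dict(flat_reply):
    """Turn [field1, value1, field2, value2, ...] into a dict."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of items")
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


reply = ["title", "The Godfather", "year", "1972", "rating", "9.2"]
movie = pairs_to_dict(reply)

print(movie["title"])        # The Godfather
print(list(movie.keys()))    # what HKEYS would list
print(list(movie.values()))  # what HVALS would list
```

Note that every value comes back as a string, so numeric fields such as year need an explicit conversion in the client.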

Sets

A Set in Redis is an unordered collection of distinct Strings—it's not possible to add repeated elements to a Set. Internally, a Set is implemented as a hash table. The maximum number of elements that a Set can hold is 2^32 - 1, which means that there can be more than 4 billion elements per Set.

use cases

  • Data filtering
  • Data grouping
  • Membership checking
$ redis-cli
127.0.0.1:6379> SADD user:max:favorite_artists "Arcade Fire" "Arctic Monkeys" "Belle & Sebastian" "Lenine"
(integer) 4
127.0.0.1:6379> SADD user:hugo:favorite_artists "Daft Punk" "The Kooks" "Arctic Monkeys"
(integer) 3
127.0.0.1:6379> SINTER user:max:favorite_artists user:hugo:favorite_artists
1) "Arctic Monkeys"
127.0.0.1:6379> SDIFF user:max:favorite_artists user:hugo:favorite_artists
1) "Belle & Sebastian"
2) "Arcade Fire"
3) "Lenine"
127.0.0.1:6379> SUNION user:max:favorite_artists user:hugo:favorite_artists
1) "Lenine"
2) "Daft Punk"
3) "Belle & Sebastian"
4) "Arctic Monkeys"
5) "Arcade Fire"
6) "The Kooks"
127.0.0.1:6379> SRANDMEMBER user:max:favorite_artists
"Arcade Fire"
127.0.0.1:6379> SRANDMEMBER user:max:favorite_artists
"Lenine"
127.0.0.1:6379> SISMEMBER user:max:favorite_artists "Arctic Monkeys"
(integer) 1
127.0.0.1:6379> SREM user:max:favorite_artists "Arctic Monkeys"
(integer) 1
127.0.0.1:6379> SISMEMBER user:max:favorite_artists "Arctic Monkeys"
(integer) 0
127.0.0.1:6379> SCARD user:max:favorite_artists
(integer) 3
127.0.0.1:6379> SMEMBERS user:max:favorite_artists
1) "Belle & Sebastian"
2) "Arcade Fire"
3) "Lenine"
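
For readers more familiar with client-side sets, the commands above map closely onto Python's built-in set operations. This is only an analogy sketched in plain Python; with Redis the same work happens on the server and only the result travels over the network.

```python
max_artists = {"Arcade Fire", "Arctic Monkeys", "Belle & Sebastian", "Lenine"}
hugo_artists = {"Daft Punk", "The Kooks", "Arctic Monkeys"}

common = max_artists & hugo_artists    # like SINTER
only_max = max_artists - hugo_artists  # like SDIFF (argument order matters)
either = max_artists | hugo_artists    # like SUNION

print(sorted(common))                  # ['Arctic Monkeys']
print(sorted(only_max))                # ['Arcade Fire', 'Belle & Sebastian', 'Lenine']
print(len(either))                     # 6, like SCARD on the union
print("Arctic Monkeys" in max_artists) # True, like SISMEMBER
```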

Sorted sets

A Sorted Set is a collection of nonrepeating Strings sorted by score. Elements may share the same score; in that case, the tied elements are ordered lexicographically (in alphabetical order).

use cases

  • Build a real time waiting list for customer service
  • Show a leaderboard of a massive online game that displays the top players, users with similar scores, or the scores of your friends
  • Build an autocomplete system using millions of words
$ redis-cli
127.0.0.1:6379> ZADD leaders 100 "Alice"
(integer) 1
127.0.0.1:6379> ZADD leaders 100 "Zed"
(integer) 1
127.0.0.1:6379> ZADD leaders 102 "Hugo"
(integer) 1
127.0.0.1:6379> ZADD leaders 101 "Max"
(integer) 1

There is a family of commands that can fetch ranges in a Sorted Set: ZRANGE, ZRANGEBYLEX, ZRANGEBYSCORE, ZREVRANGE, ZREVRANGEBYLEX, and ZREVRANGEBYSCORE.

  • ZRANGE returns elements from the lowest to the highest score, and it uses ascending lexicographical order if a score tie exists
  • ZREVRANGE returns elements from the highest to the lowest score, and it uses descending lexicographical order if a score tie exists

Both of these commands expect a key name, a start index, and an end index.
127.0.0.1:6379> ZREVRANGE leaders 0 -1
1) "Hugo"
2) "Max"
3) "Zed"
4) "Alice"
127.0.0.1:6379> ZREVRANGE leaders 0 -1 WITHSCORES
1) "Hugo"
2) "102"
3) "Max"
4) "101"
5) "Zed"
6) "100"
7) "Alice"
8) "100"
127.0.0.1:6379> ZREM leaders "Hugo"
(integer) 1
127.0.0.1:6379> ZSCORE leaders "Max"
"101"
127.0.0.1:6379> ZRANK leaders "Max"
(integer) 2
127.0.0.1:6379> ZREVRANK leaders "Max"
(integer) 0
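
The ordering rule (score first, then member name to break ties) can be modelled in a few lines of Python. The zrange and zrank functions below are hypothetical helpers that reproduce the results of the session above; they sort Python strings, which matches Redis's byte-wise comparison for plain ASCII names like these.

```python
def zrange(scores, reverse=False):
    """Members ordered by (score, member); reverse flips the whole order."""
    ordered = sorted(scores, key=lambda member: (scores[member], member))
    return ordered[::-1] if reverse else ordered


def zrank(scores, member, reverse=False):
    """Zero-based position of member in the chosen order."""
    return zrange(scores, reverse).index(member)


leaders = {"Alice": 100, "Zed": 100, "Hugo": 102, "Max": 101}
print(zrange(leaders, reverse=True))  # ['Hugo', 'Max', 'Zed', 'Alice']

del leaders["Hugo"]                   # like ZREM leaders "Hugo"
print(zrank(leaders, "Max"))                # 2
print(zrank(leaders, "Max", reverse=True))  # 0
```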

Bitmaps

A Bitmap is not a real data type in Redis. Under the hood, a Bitmap is a String. We can also say that a Bitmap is a set of bit operations on a String. A Bitmap is a sequence of bits where each bit can store 0 or 1. You can think of a Bitmap as an array of ones and zeroes. Bitmaps are memory efficient, support fast data lookups, and can store up to 2^32 bits (more than 4 billion bits).

use cases

Bitmaps are a great match for applications that involve real-time analytics, because they can tell whether a user performed an action (for example, "Did user X perform action Y today?") or how many times an event occurred (for example, "How many users performed action Y this week?"). Each user is identified by an ID, which is a sequential integer, and each Bitmap offset represents a user: user 1 is offset 1, user 30 is offset 30, and so on.

127.0.0.1:6379> SETBIT visits:2015-01-01 10 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-01 15 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-02 10 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-02 11 1
(integer) 0
127.0.0.1:6379> GETBIT visits:2015-01-01 10
(integer) 1
127.0.0.1:6379> GETBIT visits:2015-01-02 15
(integer) 0
127.0.0.1:6379> BITCOUNT visits:2015-01-01
(integer) 2
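
Since a Bitmap is just a String, the commands above amount to bit arithmetic on a byte array. The following Python sketch models that, assuming the layout Redis uses for SETBIT, where offset 0 is the most significant bit of the first byte. The setbit, getbit, and bitcount functions are hypothetical local helpers, not client calls.

```python
def setbit(bitmap, offset, value):
    """Set the bit at offset, growing the bytearray as needed; return the old bit."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(bitmap):
        bitmap.extend(b"\x00" * (byte_index - len(bitmap) + 1))
    mask = 0x80 >> bit_index  # offset 0 is the most significant bit of byte 0
    old = 1 if bitmap[byte_index] & mask else 0
    if value:
        bitmap[byte_index] |= mask
    else:
        bitmap[byte_index] &= ~mask & 0xFF
    return old


def getbit(bitmap, offset):
    """Read the bit at offset; bits beyond the end read as 0."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(bitmap):
        return 0
    return 1 if bitmap[byte_index] & (0x80 >> bit_index) else 0


def bitcount(bitmap):
    """Count the bits set to 1."""
    return sum(bin(byte).count("1") for byte in bitmap)


day1, day2 = bytearray(), bytearray()
for user_id in (10, 15):
    setbit(day1, user_id, 1)
for user_id in (10, 11):
    setbit(day2, user_id, 1)

print(getbit(day1, 10), getbit(day2, 15))  # 1 0
print(bitcount(day1))                      # 2

# Users seen on both days: a byte-wise AND, comparable to BITOP AND on the server.
both_days = bytes(a & b for a, b in zip(day1, day2))
print(bitcount(both_days))                 # 1
```

Two bytes are enough to track 16 users per day here, which is where the memory efficiency comes from.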

HyperLogLog

Conceptually, a HyperLogLog is an algorithm that uses randomization in order to provide a very good approximation of the number of unique elements that exist in a Set. The Redis implementation of the HyperLogLog has a standard error of 0.81 percent.

use cases

  • Counting the number of unique users who visited a website
  • Counting the number of distinct terms that were searched for on your website on a specific date or time
  • Counting the number of distinct hashtags that were used by a user
  • Counting the number of distinct words that appear in a book

A HyperLogLog uses up to 12 kB to store 100,000 unique visits (or any cardinality). On the other hand, a Set uses 3.2 MB to store 100,000 UUIDs that are 32 bytes each.

$ redis-cli
127.0.0.1:6379> PFADD visits:2015-01-01 "carl" "max" "hugo" "arthur"
(integer) 1
127.0.0.1:6379> PFADD visits:2015-01-01 "max" "hugo"
(integer) 0
127.0.0.1:6379> PFADD visits:2015-01-02 "max" "kc" "hugo" "renata"
(integer) 1
127.0.0.1:6379> PFCOUNT visits:2015-01-01
(integer) 4
127.0.0.1:6379> PFCOUNT visits:2015-01-02
(integer) 4
127.0.0.1:6379> PFCOUNT visits:2015-01-01 visits:2015-01-02
(integer) 6
127.0.0.1:6379> PFMERGE visits:total visits:2015-01-01 visits:2015-01-02
OK
127.0.0.1:6379> PFCOUNT visits:total
(integer) 6
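
The 12 kB and 0.81 percent figures follow from the dense HyperLogLog layout that Redis uses: 16384 registers of 6 bits each. The Python arithmetic below is only a back-of-the-envelope check; the Set figure counts the raw UUID payload and ignores per-element overhead, so treat it as a lower bound.

```python
import math

REGISTERS = 16384        # 2**14 registers in the dense encoding
BITS_PER_REGISTER = 6

hll_bytes = REGISTERS * BITS_PER_REGISTER // 8
standard_error = 1.04 / math.sqrt(REGISTERS)

uuid_count = 100_000
uuid_size_bytes = 32
set_payload_bytes = uuid_count * uuid_size_bytes

print(hll_bytes)                        # 12288 bytes, i.e. 12 kB
print(round(standard_error * 100, 2))   # 0.81 (percent)
print(set_payload_bytes / 1_000_000)    # 3.2 (MB)
```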

Replication

Replication means that while you write to one Redis instance, the master, Redis ensures that one or more other instances, the slaves, become exact copies of the master.

Starting the redis master instance

docker run --name redis-master -d redis:alpine redis-server

Starting 2 slave instances pointing to the master instance. By default, slave instances are read-only. There are three ways to set up replication:

  1. config file. slaveof <IP> <PORT>
  2. When starting the server. redis-server --slaveof <IP> <PORT>
  3. Redis CLI. slaveof <IP> <PORT>
docker run --name redis-slave-1 --link redis-master:redis-master -d -v $(pwd)/data/slave-1:/data redis:alpine redis-server --appendonly yes --slaveof redis-master 6379
docker run --name redis-slave-2 --link redis-master:redis-master -d -v $(pwd)/data/slave-2:/data redis:alpine redis-server --appendonly yes --slaveof redis-master 6379

Connect to the master and make some changes to the dataset

docker run -it --link redis-master:redis-master --rm redis:alpine redis-cli -h redis-master -p 6379

Connect to the slave instances and double check you can read keys you created in the master instance

docker run -it --link redis-slave-1:redis-slave-1 --rm redis:alpine redis-cli -h redis-slave-1 -p 6379
docker run -it --link redis-slave-2:redis-slave-2 --rm redis:alpine redis-cli -h redis-slave-2 -p 6379

In case of a master failure, a slave instance can be promoted to master. All clients should then connect to the new master instance. Execute this on the slave instance being promoted:

slaveof no one

Sentinel

Redis Sentinel is a distributed system designed to automatically promote a Redis slave to master if the existing master fails. Run one Sentinel for each Redis server. Sentinel is a separate process and listens on its own port.

Partitioning

Partitioning is a general term for breaking up data and distributing it across different hosts. In the case of Redis, this means distributing keys across different Redis instances (horizontal partitioning, or sharding). This is useful when the total data to be stored is larger than the memory available in a single Redis instance.

Partitioning types:

  1. Range. Data is distributed based on a range of keys.
  2. Hash. The instance that receives a command is found by applying a hash function to the Redis key.
  3. Consistent hashing. Best option.
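
To illustrate why consistent hashing is attractive, here is a minimal Python sketch of a hash ring with virtual nodes. It is one common way to implement the idea, not the scheme any particular Redis client or proxy uses. HashRing, the node names redis-a to redis-d, and the user:N keys are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node is placed on the ring many times to even out the load.
        for i in range(self._replicas):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # Walk clockwise to the first point after the key's hash, wrapping around.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[index]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["redis-a", "redis-b", "redis-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("redis-d")
moved = [key for key in keys if ring.node_for(key) != before[key]]
print(f"consistent hashing: {len(moved)} of {len(keys)} keys moved")

# Plain hash partitioning (hash mod N) for comparison, going from 3 to 4 nodes.
naive_moved = sum(1 for key in keys if HashRing._hash(key) % 3 != HashRing._hash(key) % 4)
print(f"hash mod N: {naive_moved} of {len(keys)} keys moved")
```

With the ring, the only keys that change owner are the ones taken over by the new node; with hash mod N, most keys are remapped whenever the number of instances changes.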

Different ways to implement partitioning:

  • The client layer. Your own implementation.
  • The proxy layer. It's an extra layer that proxies all Redis queries and performs partitioning for applications. e.g twemproxy
  • The query router layer. It's implemented in the data store itself. e.g Redis Cluster

Tagging

It's a technique of ensuring that keys are stored on the same server. The convention is to add a tag to a key name with the tag name inside curly braces.

users:1{users}
users:3{users}

Redis Cluster

Official documentation

It was designed to automatically shard data across different Redis instances and perform automatic failover if any problem happens to a master instance. It uses two ports: a lower one for client connections and a higher one (by default, the client port plus 10000) for node-to-node communication.

It requires at least 3 master instances. It's recommended that you have at least one replica per master.

When connecting to a Redis cluster using the redis-cli, the -c parameter is required to enable cluster mode.

redis-cli -c -h <hostname or IP> -p <port-number>

The data partitioning method used is called hash slot. Each master in a cluster owns a portion of the 16384 slots. A master without any slots won't be able to store any data. You need to manually assign x number of slots to each master.

HASH_SLOT = CRC16(key) mod 16384

Hash tags restrict the hash function to the part of the key name inside curly braces, which ensures that different key names end up in the same hash slot. In the following example, all keys would be stored in the same slot based on the hash tag {user123}.

SADD {user123}:friends:usa "John" "Bob"
SADD {user123}:friends:brazil "Max" "Hugo"
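
The slot formula and the hash tag rule can be reproduced outside Redis. The sketch below is a plain Python version, assuming the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) described in the Redis Cluster specification. crc16_xmodem and hash_slot are hypothetical helper names.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384, honoring a non-empty {hash tag}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(hash_slot("foo"))  # 12182
print(hash_slot("{user123}:friends:usa") == hash_slot("{user123}:friends:brazil"))  # True
print(hash_slot("{user123}:friends:usa") == hash_slot("user123"))                   # True
```

Because only user123 is hashed, both friend lists land in the same slot and can be used together in multi-key commands such as SINTER.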

Creating a cluster

Since the redis instances need to be able to connect to each other, we should create a docker network they can join

docker network create redis-cluster-network

Creating 3 redis instances in cluster mode

docker run --name redis-master-1 --network redis-cluster-network -d -v $(pwd)/data/master-1:/data -v $(pwd)/config/redis-cluster-master-1.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf

docker run --name redis-master-2 --network redis-cluster-network -d -v $(pwd)/data/master-2:/data -v $(pwd)/config/redis-cluster-master-2.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf

docker run --name redis-master-3 --network redis-cluster-network -d -v $(pwd)/data/master-3:/data -v $(pwd)/config/redis-cluster-master-3.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf

Listing all redis master nodes

docker container ps

Connecting to a master node and getting information about the cluster. It should report the cluster state as fail since we're not done setting up the cluster.

docker container exec -it redis-master-3 /bin/sh 
redis-cli -c
cluster info

Next, we should distribute the 16384 slots evenly across all 3 Redis instances. The cluster addslots command tells a node which slots it should own.

Note: The slot ranges below use bash brace expansion, e.g. {0..5460}, which is a bash feature. If you need to install bash on Alpine Linux, do the following:

apk update
apk add bash
bash

Assigning the slots each redis instance should own. Slots are where keys will be stored based on the key's hash. In order to allow redis cluster to start in a safe way, we should manually change the configuration epoch. Note: don't do this again, this is the only time when you need to change the configuration epoch.

docker container exec -it redis-master-1 /bin/sh 
redis-cli -c cluster addslots {0..5460}
redis-cli -c cluster set-config-epoch 1

docker container exec -it redis-master-2 /bin/sh 
redis-cli -c cluster addslots {5461..10922}
redis-cli -c cluster set-config-epoch 2

docker container exec -it redis-master-3 /bin/sh 
redis-cli -c cluster addslots {10923..16383}
redis-cli -c cluster set-config-epoch 3
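
The three ranges above come from splitting the 16384 slots as evenly as possible. The Python sketch below shows one way to compute such a split for any number of masters; slot_ranges is a hypothetical helper, and the rounding rule is chosen so that three masters yield exactly the ranges used in this walkthrough.

```python
TOTAL_SLOTS = 16384


def slot_ranges(master_count):
    """Split the slot space into contiguous, near-equal inclusive ranges."""
    # Boundaries are i * TOTAL_SLOTS / master_count, rounded to the nearest integer.
    bounds = [(i * TOTAL_SLOTS + master_count // 2) // master_count
              for i in range(master_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(master_count)]


for index, (first, last) in enumerate(slot_ranges(3), start=1):
    print(f"redis-master-{index}: cluster addslots {{{first}..{last}}}")
```

For three masters this prints the ranges {0..5460}, {5461..10922}, and {10923..16383}.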

Making all redis instances aware of each other so they can exchange information. e.g on redis-master-1 execute:

redis-cli -c cluster meet <redis-master-2 IP> 6379
redis-cli -c cluster meet <redis-master-3 IP> 6379

Double-checking the cluster is up and running:

redis-cli -c cluster info

cluster_state:ok                                                                                                       
cluster_slots_assigned:16384                                                                                           
cluster_slots_ok:16384                                                                                                 
cluster_slots_pfail:0                                                                                                  
cluster_slots_fail:0                                                                                                   
cluster_known_nodes:3                                                                                                  
cluster_size:3                                                                                                         
cluster_current_epoch:3     
cluster_my_epoch:1                                                                                                    
cluster_stats_messages_sent:191
cluster_stats_messages_received:191

Now that the cluster is up and running, let's add a key for testing's sake:

  1. Connect to any redis instance in the cluster
  2. Create a key e.g set cebroker:dev:test-cluster "Yay!"
  3. Connect to all redis instances and try to get the newly created key. e.g get cebroker:dev:test-cluster
redis-cli -c
set cebroker:dev:test-cluster "Yay!"
get cebroker:dev:test-cluster

Adding replicas to the master Redis instances. So far, we have 3 Redis masters but no slaves. We should have at least one slave per master, and having one or two extra slaves above the minimum required (cluster-migration-barrier) is recommended.

  1. Create a new Redis instance in cluster mode
docker run --name redis-slave-1 --network redis-cluster-network -d -v $(pwd)/data/slave-1:/data -v $(pwd)/config/redis-cluster-slave-1.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf

docker run --name redis-slave-2 --network redis-cluster-network -d -v $(pwd)/data/slave-2:/data -v $(pwd)/config/redis-cluster-slave-2.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf

docker run --name redis-slave-3 --network redis-cluster-network -d -v $(pwd)/data/slave-3:/data -v $(pwd)/config/redis-cluster-slave-3.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
  2. Add the new Redis instance to the cluster using cluster meet
docker container exec -it redis-slave-1 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379

docker container exec -it redis-slave-2 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379

docker container exec -it redis-slave-3 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
  3. Get the node ID of the master that will be replicated. cluster nodes outputs a list of all the nodes that belong to the cluster, along with their properties. The node ID is the first string displayed in each row.
redis-cli -c cluster nodes
  4. Start the replication by using the command cluster replicate <master-node-id>
-- Slave 1
redis-cli -c cluster replicate 7e78c9a76ee462350a064694683fae266b1afc3a

-- Slave 2
redis-cli -c cluster replicate 2eb1abc6c8ad9a98333eeb1dafe088748ecf97d5

-- Slave 3
redis-cli -c cluster replicate b749483152945869cdd062cb29a0f780b6f0ce29

Avoiding traps (best practices)

  • Use benchmarks to decide what data type works best for your case. FLUSHALL + create keys + INFO memory.
  • Instead of using multiple redis DBs, you should run multiple redis servers. Since redis is single threaded, a redis server with multiple DBs will only use one CPU.
  • Use namespaces for your keys. e.g namespace:key-name, music-online:album:10001:songs
  • Choose an appropriate persistence strategy. If your application doesn't need persistence, disable RDB and AOF. If your application can tolerate some data loss, use RDB. If your application requires fully durable persistence, use both RDB and AOF.
  • Enable authentication e.g requirepass password-in-plain-text
  • Disable critical commands. e.g FLUSHDB, FLUSHALL, CONFIG, KEYS, DEBUG and SAVE. You do this by including a renamed-commands.conf into the redis.conf file.
  • Encrypt client to server communication using stunnel.
  • Send read operations to slave instances and write operations to the master instance.
  • Persistence can be moved to the slaves so the master doesn't have to write to disk. In that setup, don't restart the master; otherwise it will lose all its data and replicate its empty dataset to the slaves.