@JonCole
Last active January 6, 2020 17:52

Revisions

  1. JonCole revised this gist Jan 6, 2020. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -1,6 +1,7 @@
    Diagnosing Redis errors on the *client side*
    ---------------

    The content from this GIST has been moved to official Azure Cache for Redis Documentation. The new location is [https://docs.microsoft.com/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting](https://docs.microsoft.com/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting)
    # Content moved
    The content from this GIST has been moved to official Azure Cache for Redis Documentation. The new location is [https://docs.microsoft.com/azure/azure-cache-for-redis/cache-troubleshoot-client](https://docs.microsoft.com/azure/azure-cache-for-redis/cache-troubleshoot-client)

    Please update your bookmarks.
  2. JonCole revised this gist Jun 27, 2019. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -1,6 +1,6 @@
    Diagnosing Redis errors on the *client side*
    ---------------

    The content from this GIST has been moved to official Azure Cache for Redis Documentation. The new location is [https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting](https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting)
    The content from this GIST has been moved to official Azure Cache for Redis Documentation. The new location is [https://docs.microsoft.com/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting](https://docs.microsoft.com/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting)

    Please update your bookmarks.
  3. JonCole revised this gist Mar 26, 2019. 1 changed file with 2 additions and 79 deletions.
    81 changes: 2 additions & 79 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -1,83 +1,6 @@
    Diagnosing Redis errors on the *client side*
    ---------------
    Customers periodically ask, "Why am I getting errors when talking to Redis?" The answer is complicated: it could be a client-side or a server-side problem. In this article, I am going to talk about client-side issues. For server-side issues, [see here](https://gist.github.com/JonCole/9225f783a40564c9879d)

    Clients can see connectivity issues or timeouts for several reasons; here are some of the common ones I see:
    The content from this GIST has been moved to official Azure Cache for Redis Documentation. The new location is [https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting](https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-how-to-troubleshoot#client-side-troubleshooting)

    ---------------

    ### Memory pressure

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data the Redis instance sent without any delay. When memory pressure hits, the system typically has to page data from physical memory to virtual memory, which is on disk. This *page faulting* causes the system to slow down significantly.

    `Measurement:`

    1. Monitor memory usage on the machine to make sure that it does not exceed available memory.
    2. Monitor the *Page Faults/Sec* perf counter (see the sketch below). Most systems will have some page faults even during normal operation, so watch for spikes in this page faults perf counter that correspond with timeouts.
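
    As a rough illustration, here is a minimal sketch of sampling that counter from .NET (assumes a Windows host; on .NET Core it needs the System.Diagnostics.PerformanceCounter package):

    ```csharp
    // Minimal sketch: poll the system-wide "Page Faults/sec" counter once per second.
    using System;
    using System.Diagnostics;
    using System.Threading;

    class PageFaultMonitor
    {
        static void Main()
        {
            using var counter = new PerformanceCounter("Memory", "Page Faults/sec");
            counter.NextValue(); // the first sample always reads 0, so prime the counter
            while (true)
            {
                Thread.Sleep(1000);
                Console.WriteLine($"{DateTime.UtcNow:o}  Page Faults/sec: {counter.NextValue():F0}");
            }
        }
    }
    ```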

    `Resolution:` Upgrade to a larger client VM size with more memory, or dig into your memory usage patterns to reduce memory consumption.

    ----------

    ### Burst of traffic

    `Problem:` Bursts of requests on a given client machine can cause client-side spikes in CPU, thread-creation delays, bandwidth limits being hit, network I/O limits being hit, and other problems that lead to delays in processing responses sent quickly by Redis but consumed slowly by the client application. For instance, entire responses from Redis can sit idle in the client's underlying socket kernel buffer because the CPU is overwhelmed or the I/O system is waiting for a thread to be available to process the data.

    `Measurement:` Watch for sudden spikes in CPU, I/O, thread counts, etc. In .NET, monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can also look at the TimeoutException message from StackExchange.Redis. Here is an example:

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
    IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)

    In the above message, there are several issues that are interesting:

    1. Notice that in the "IOCP" section and the "WORKER" section you have a "Busy" value that is greater than the "Min" value. This means that your threadpool settings need adjusting.
    2. You can also see "in: 64221". This indicates that 64221 bytes have been received at the kernel socket layer but haven't yet been read by the application (e.g. StackExchange.Redis). This typically means that your application isn't reading data from the network as quickly as the server is sending it to you.

    `Resolution:` Scale up your client VM size to handle bursts, find ways to smooth out concurrent calls on a given machine, investigate what is causing CPU spikes, etc. In .NET, configure your [ThreadPool Settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your threadpool will scale up quickly under burst scenarios, as in the sketch below.
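
    For example, a minimal sketch of raising the ThreadPool minimums at application startup (the value 200 is purely illustrative; choose one based on your own measurements):

    ```csharp
    // Minimal sketch: raise the worker and IOCP minimums so the ThreadPool can
    // grow immediately during a burst instead of waiting for gradual thread injection.
    using System;
    using System.Threading;

    static class ThreadPoolConfig
    {
        public static void Apply()
        {
            ThreadPool.GetMinThreads(out int worker, out int iocp);
            // Only ever raise the minimums; never set them below the current values.
            ThreadPool.SetMinThreads(Math.Max(worker, 200), Math.Max(iocp, 200));
        }
    }
    ```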

    ----------

    ### High CPU usage

    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work it has been asked to perform. The response from Redis can come very quickly, but because the CPU isn't keeping up with the workload, the response sits in the socket's kernel buffer waiting to be processed. If the delay is long enough, a timeout occurs in spite of the requested data having already arrived from the server.

    `Measurement:` Monitor the system-wide CPU usage through the Azure portal or through the associated perf counter. Be careful not to monitor *process* CPU, because a single process can have low CPU usage at the same time that overall system CPU is high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages, as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter and the *ConnectionMultiplexer.IncludePerformanceCountersInExceptions* property has been set to true.
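
    To opt in, set the property once at startup (a one-line sketch, assuming a client build that exposes this property):

    ```csharp
    using StackExchange.Redis;

    // Include client-side CPU usage in timeout exception messages,
    // where the environment allows reading the CPU perf counter.
    ConnectionMultiplexer.IncludePerformanceCountersInExceptions = true;
    ```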

    `Note:` If you are looking at the Azure portal to determine whether or not you are seeing spikes, keep in mind that the metrics in the portal are gathered at some sampling rate (e.g. every 30 seconds). We have seen many cases where a CPU spike happens between samples and does not show up in the portal. StackExchange.Redis version 1.1.603 (or newer) now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance. However, in some environments, like Azure App Services, access to system performance counters is blocked. In such cases, you will see "local-cpu: unavailable". Also, when debugging possible performance problems in an app, it is typically recommended that you look at the MAX CPU usage as opposed to AVG CPU, because AVG can hide shorter-lived CPU spikes that could explain issues like timeouts.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

    ----------

    ### Client Side Bandwidth Exceeded

    `Problem:` Different-sized client machines have limitations on how much network bandwidth they have available. If the client exceeds the available bandwidth, then data will not be processed on the client side as quickly as the server is sending it. This can lead to timeouts.

    `Measurement:` Monitor how your bandwidth usage changes over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/BandWidthMonitor/BandwidthLogger.cs) or the sketch below. Note that this code may not run successfully in some environments with restricted permissions (like Azure WebSites).
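
    A rough, self-contained sketch of the same idea, assuming an environment that allows reading network interface statistics:

    ```csharp
    // Minimal sketch: log cumulative bytes sent/received per network interface.
    // Diff successive samples to estimate throughput; sandboxed hosts may block this.
    using System;
    using System.Net.NetworkInformation;
    using System.Threading;

    class BandwidthMonitor
    {
        static void Main()
        {
            while (true)
            {
                foreach (NetworkInterface nic in NetworkInterface.GetAllNetworkInterfaces())
                {
                    IPInterfaceStatistics stats = nic.GetIPStatistics();
                    Console.WriteLine($"{nic.Name}: sent={stats.BytesSent:N0} received={stats.BytesReceived:N0}");
                }
                Thread.Sleep(5000);
            }
        }
    }
    ```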

    `Resolution:` Increase Client VM size or reduce network bandwidth consumption.

    ----------

    ### Large Request/Response Size

    `Problem:` A large request/response can cause timeouts. As an example, suppose your configured timeout value is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time using the same physical network connection. Most clients support "pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough, it can eat up most of the timeout for subsequent requests.

    Below, I will try to demonstrate this. In this scenario, requests 'A' and 'B' are sent quickly, the server starts sending responses 'A' and 'B' quickly, but because of data transfer times, 'B' gets stuck behind the other request and times out even though the server responded quickly. As a purely illustrative calculation: on a 100 Mbps link, a 20 MB response by itself takes about 1.6 seconds to transfer, which alone exceeds the 1-second timeout.

    |-------- 1 Second Timeout (A)----------|
    |-Request A-|
    |-------- 1 Second Timeout (B) ----------|
    |-Request B-|
    |- Read Response A --------|
    |- Read Response B-| (**TIMEOUT**)



    `Measurement:` This is a difficult one to measure. You basically have to instrument your client code to track large requests and responses.

    `Resolution:`

    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values. [See here](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details around why smaller values are recommended.
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections (e.g. use a connection pool, as sketched below). If you go this route, make sure that you don't create a brand new ConnectionMultiplexer for each request, as the overhead of creating the new connection will kill your performance. Also, you may want to consider having different connections for different purposes - e.g. large requests/responses use one set of connections and smaller requests/responses use a different set of connections. This would allow you to have different timeout values for each pool of connections.
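
    A minimal sketch of such a pool (illustrative only - the class shape and pool size are assumptions, not a StackExchange.Redis API):

    ```csharp
    // Minimal sketch: a fixed-size pool of ConnectionMultiplexers created once at
    // startup and reused for the process lifetime, with round-robin selection.
    using System.Threading;
    using StackExchange.Redis;

    class RedisConnectionPool
    {
        private readonly ConnectionMultiplexer[] _connections;
        private int _next = -1;

        public RedisConnectionPool(string configuration, int poolSize)
        {
            _connections = new ConnectionMultiplexer[poolSize];
            for (int i = 0; i < poolSize; i++)
                _connections[i] = ConnectionMultiplexer.Connect(configuration);
        }

        public IDatabase GetDatabase()
        {
            // Interlocked keeps the round-robin index safe under concurrency.
            int index = (int)((uint)Interlocked.Increment(ref _next) % (uint)_connections.Length);
            return _connections[index].GetDatabase();
        }
    }
    ```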
    Please update your bookmarks.
  4. JonCole revised this gist Jul 26, 2018. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -41,9 +41,9 @@ In the above message, there are several issues that are interesting:

    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. High CPU is a problem because the CPU is busy and it can't keep up with the work the application is asking it to do. The response from Redis can come very quickly, but because the CPU isn't keeping up with the workload, the response sits in the socket's kernel buffer waiting to be processed. If the delay is long enough, a timeout occurs in spite of the requested data having already arrived from the server.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter.
    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter and if the *ConnectionMultiplexer.IncludePerformanceCountersInExceptions* property has been set to true.

    `Note:`If you are looking at the Azure portal to determine whether or not you are seeing spikes, please keep in mind that the metrics in the portal are gathered at some sampling rate (e.g. every 30 seconds). We have seen many cases where a CPU spike happens between samples and does not show up in the portal. StackExchange.Redis version 1.1.603 (or newer) now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance. However, some environments like Azure App Services, access to system performance counters has been blocked. In such cases, you will see "local-cpu: unavailable".
    `Note:`If you are looking at the Azure portal to determine whether or not you are seeing spikes, please keep in mind that the metrics in the portal are gathered at some sampling rate (e.g. every 30 seconds). We have seen many cases where a CPU spike happens between samples and does not show up in the portal. StackExchange.Redis version 1.1.603 (or newer) now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance. However, some environments like Azure App Services, access to system performance counters has been blocked. In such cases, you will see "local-cpu: unavailable". Also, when debugging possible performance problems in an app, it is typically recommended that you look at the MAX CPU usage as opposed to AVG CPU. The reason is that AVG can hide shorter lived CPU spikes that could explain issues like Timeouts.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

  5. JonCole revised this gist Oct 13, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -21,7 +21,7 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ### Burst of traffic

    `Problem:` Bursts of requests on a given client machine can cause client side spikes in CPU, threads creation delays, bandwidth limits being hit, Network I/O limits being hit and other problems that lead to delays in processing responses sent by Redis but consumed slowly by the client application. For instance, entire responses from Redis can sit idle in the client's underlying socket kernel buffer because the CPU is overwhelmed or the I/O system is waiting for a thread to be available to process the data.
    `Problem:` Bursts of requests on a given client machine can cause client side spikes in CPU, threads creation delays, bandwidth limits being hit, Network I/O limits being hit and other problems that lead to delays in processing responses sent by Redis quickly but consumed slowly by the client application. For instance, entire responses from Redis can sit idle in the client's underlying socket kernel buffer because the CPU is overwhelmed or the I/O system is waiting for a thread to be available to process the data.

    `Measurement:` Watch for sudden spikes in CPU, I/O, thread counts, etc. In .NET, monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can also look at the TimeoutException message from StackExchange.Redis. Here is an example:

  6. JonCole revised this gist Oct 13, 2017. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -21,9 +21,9 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ### Burst of traffic

    `Problem:` Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
    `Problem:` Bursts of requests on a given client machine can cause client side spikes in CPU, threads creation delays, bandwidth limits being hit, Network I/O limits being hit and other problems that lead to delays in processing responses sent by Redis but consumed slowly by the client application. For instance, entire responses from Redis can sit idle in the client's underlying socket kernel buffer because the CPU is overwhelmed or the I/O system is waiting for a thread to be available to process the data.

    `Measurement:` Monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can also look at the TimeoutException message from StackExchange.Redis. Here is an example :
    `Measurement:` Watch for sudden spikes in CPU, I/O, thread counts, etc. In .NET, monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can also look at the TimeoutException message from StackExchange.Redis. Here is an example:

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
    IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
    @@ -33,7 +33,7 @@ In the above message, there are several issues that are interesting:
    1. Notice that in the "IOCP" section and the "WORKER" section you have a "Busy" value that is greater than the "Min" value. This means that your threadpool settings need adjusting.
    2. You can also see "in: 64221". This indicates that 64221 bytes have been received at the kernel socket layer but haven't yet been read by the application (e.g. StackExchange.Redis). This typically means that your application isn't reading data from the network as quickly as the server is sending it to you.

    `Resolution:` Configure your [ThreadPool Settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your threadpool will scale up quickly under burst scenarios.
    `Resolution:` Scale up your client VM size to handle bursts, find ways to smooth out concurrent calls on a given machine, investigate what is causing CPU spikes, etc. In .NET, configure your [ThreadPool Settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your threadpool will scale up quickly under burst scenarios.

    ----------

  7. JonCole revised this gist Apr 3, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -43,7 +43,7 @@ In the above message, there are several issues that are interesting:

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter.

    `Note:`If you are looking at the Azure portal to determine whether or not you are seeing spikes, please keep in mind that the metrics in the portal are gathered at some sampling rate (e.g. every 30 seconds). We have seen many cases where a CPU spike happens between samples and does not show up in the portal. StackExchange.Redis version 1.1.603 (or newer) now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance.
    `Note:`If you are looking at the Azure portal to determine whether or not you are seeing spikes, please keep in mind that the metrics in the portal are gathered at some sampling rate (e.g. every 30 seconds). We have seen many cases where a CPU spike happens between samples and does not show up in the portal. StackExchange.Redis version 1.1.603 (or newer) now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance. However, some environments like Azure App Services, access to system performance counters has been blocked. In such cases, you will see "local-cpu: unavailable".

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

  8. JonCole revised this gist Mar 21, 2017. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -6,7 +6,7 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ---------------

    ###Memory pressure
    ### Memory pressure

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that was sent by the Redis instance without any delay. When memory pressure hits, the system typically has to page data from physical memory to virtual memory which is on disk. This *page faulting* causes the system to slow down significantly.

    @@ -19,7 +19,7 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ----------

    ###Burst of traffic
    ### Burst of traffic

    `Problem:` Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.

    @@ -37,7 +37,7 @@ In the above message, there are several issues that are interesting:

    ----------

    ###High CPU usage
    ### High CPU usage

    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. High CPU is a problem because the CPU is busy and it can't keep up with the work the application is asking it to do. The response from Redis can come very quickly, but because the CPU isn't keeping up with the workload, the response sits in the socket's kernel buffer waiting to be processed. If the delay is long enough, a timeout occurs in spite of the requested data having already arrived from the server.

    @@ -49,7 +49,7 @@ In the above message, there are several issues that are interesting:

    ----------

    ###Client Side Bandwidth Exceeded
    ### Client Side Bandwidth Exceeded

    `Problem:` Different sized client machines have limitations on how much network bandwidth they have available. If the client exceeds the available bandwidth, then data will not be processed on the client side as quickly as the server is sending it. This can lead to timeouts.

    @@ -59,7 +59,7 @@ In the above message, there are several issues that are interesting:

    ----------

    ###Large Request/Response Size
    ### Large Request/Response Size

    `Problem:` A large request/response can cause timeouts. As an example, suppose your timeout value configured is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time using the same physical network connection. Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.

  9. JonCole revised this gist Mar 17, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -43,7 +43,7 @@ In the above message, there are several issues that are interesting:

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter.

    Note: StackExchange.Redis version 1.1.603 or later now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance.
    `Note:`If you are looking at the Azure portal to determine whether or not you are seeing spikes, please keep in mind that the metrics in the portal are gathered at some sampling rate (e.g. every 30 seconds). We have seen many cases where a CPU spike happens between samples and does not show up in the portal. StackExchange.Redis version 1.1.603 (or newer) now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

  10. JonCole revised this gist Feb 7, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -80,4 +80,4 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar

    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values. [See here](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details around why smaller values are recommended.
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections (e.g. use a connection pool). If you go this route, make sure that you don't create a brand new ConnectionMultiplexer for each request as the overhead of creating the new connection will kill your performance.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections (e.g. use a connection pool). If you go this route, make sure that you don't create a brand new ConnectionMultiplexer for each request as the overhead of creating the new connection will kill your performance. Also, you may want to consider having different connections for different purposes - e.g. large requests/responses use one set of connections and smaller requests/responses use a different set of connections. This would allow you to have different timeout values for each pool of connections.
  11. JonCole revised this gist Oct 20, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -39,7 +39,7 @@ In the above message, there are several issues that are interesting:

    ###High CPU usage

    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. This means that the client may fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.
    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. High CPU is a problem because the CPU is busy and it can't keep up with the work the application is asking it to do. The response from Redis can come very quickly, but because the CPU isn't keeping up with the workload, the response sits in the socket's kernel buffer waiting to be processed. If the delay is long enough, a timeout occurs in spite of the requested data having already arrived from the server.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter.

  12. JonCole revised this gist Aug 24, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -41,7 +41,7 @@ In the above message, there are several issues that are interesting:

    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. This means that the client may fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section.
    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section. Note that in newer builds of StackExchange.Redis, the client-side CPU will be printed out in the timeout error message as long as the environment doesn't block access to the CPU perf counter.

    Note: StackExchange.Redis version 1.1.603 or later now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance.

  13. JonCole revised this gist Jul 22, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -61,7 +61,7 @@ Note: StackExchange.Redis version 1.1.603 or later now prints out "local-cpu" us

    ###Large Request/Response Size

    `Problem:` A large request/response can cause timeouts. As an example, Suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection). Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.
    `Problem:` A large request/response can cause timeouts. As an example, suppose your timeout value configured is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time using the same physical network connection. Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.

    Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' are sent quickly, the server starts sending responses 'A' and 'B' quickly, but because of data transfer times, 'B' get stuck behind the other request and times out even though the server responded quickly.

  14. JonCole revised this gist Jul 5, 2016. 1 changed file with 5 additions and 8 deletions.
    13 changes: 5 additions & 8 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -6,7 +6,7 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ---------------

    ***Memory pressure***
    ###Memory pressure

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that was sent by the Redis instance without any delay. When memory pressure hits, the system typically has to page data from physical memory to virtual memory which is on disk. This *page faulting* causes the system to slow down significantly.

    @@ -19,7 +19,7 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ----------

    ***Burst of traffic***
    ###Burst of traffic

    `Problem:` Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.

    @@ -37,7 +37,7 @@ In the above message, there are several issues that are interesting:

    ----------

    ***High CPU usage***
    ###High CPU usage

    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. This means that the client may fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    @@ -49,7 +49,7 @@ Note: StackExchange.Redis version 1.1.603 or later now prints out "local-cpu" us

    ----------

    ***Client Side Bandwidth Exceeded***
    ###Client Side Bandwidth Exceeded

    `Problem:` Different sized client machines have limitations on how much network bandwidth they have available. If the client exceeds the available bandwidth, then data will not be processed on the client side as quickly as the server is sending it. This can lead to timeouts.

    @@ -59,7 +59,7 @@ Note: StackExchange.Redis version 1.1.603 or later now prints out "local-cpu" us

    ----------

    ***Large Request/Response Size***
    ###Large Request/Response Size

    `Problem:` A large request/response can cause timeouts. As an example, Suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection). Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.

    @@ -81,6 +81,3 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar
    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values. [See here](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details around why smaller values are recommended.
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections (e.g. use a connection pool). If you go this route, make sure that you don't create a brand new ConnectionMultiplexer for each request as the overhead of creating the new connection will kill your performance.



  15. JonCole revised this gist Jun 9, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -80,7 +80,7 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar

    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values. [See here](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details around why smaller values are recommended.
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections (e.g. use a connection pool). If you go this route, make sure that you don't create a brand new ConnectionMultiplexer for each request as the overhead of creating the new connection will kill your performance.



  16. JonCole revised this gist May 20, 2016. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -43,6 +43,8 @@ In the above message, there are several issues that are interesting:

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section.

    Note: StackExchange.Redis version 1.1.603 or later now prints out "local-cpu" usage when a timeout occurs to help understand when client-side CPU usage may be affecting performance.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

    ----------
  17. JonCole revised this gist Apr 12, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -39,7 +39,7 @@ In the above message, there are several issues that are interesting:

    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.
    `Problem:` High CPU usage on the client is an indication that the system cannot keep up with the work that it has been asked to perform. This means that the client may fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section.

  18. JonCole revised this gist Mar 18, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -76,7 +76,7 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar

    `Resolution:`

    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values.
    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values. [See here](https://groups.google.com/forum/#!searchin/redis-db/size/redis-db/n7aa2A4DZDs/3OeEPHSQBAAJ) for details around why smaller values are recommended.
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections.

  19. JonCole revised this gist Mar 16, 2016. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -76,8 +76,9 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar

    `Resolution:`

    1. Redis is optimized for a large number of small values, rather than a few large values. Some things you Break up your data into related smaller values. This is the preferred solution...
    1. Redis is optimized for a large number of small values, rather than a few large values. The preferred solution is to break up your data into related smaller values.
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
    3. Increase the number of ConnectionMultiplexer objects you use and round-robin requests over different connections.



  20. JonCole revised this gist Mar 15, 2016. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -25,7 +25,8 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    `Measurement:` Monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can also look at the TimeoutException message from StackExchange.Redis. Here is an example :

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0, IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
    IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)

    In the above message, there are several issues that are interesting:

  21. JonCole revised this gist Mar 15, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -40,7 +40,7 @@ In the above message, there are several issues that are interesting:

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described below in the "Burst of traffic" section below.
    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described above in the "Burst of traffic" section.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

  22. JonCole revised this gist Mar 15, 2016. 1 changed file with 11 additions and 17 deletions.
    28 changes: 11 additions & 17 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -4,6 +4,8 @@ Customers periodically ask "Why am I getting errors when talking to Redis". The

    Clients can see connectivity issues or timeouts for several reason, here are some of the common ones I see:

    ---------------

    ***Memory pressure***

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that was sent by the Redis instance without any delay. When memory pressure hits, the system typically has to page data from physical memory to virtual memory which is on disk. This *page faulting* causes the system to slow down significantly.
    @@ -15,22 +17,8 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    `Resolution:` Upgrade to a larger client VM size with more memory or dig into your memory usage patterns to reduce memory consumption.


    ----------


    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.


    ----------


    ***Burst of traffic***

    `Problem:` Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
    @@ -46,9 +34,17 @@ In the above message, there are several issues that are interesting:

    `Resolution:` Configure your [ThreadPool Settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your threadpool will scale up quickly under burst scenarios.


    ----------

    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    `Measurement:` Monitor the System Wide CPU usage through the azure portal or through the associated perf counter. Be careful not to monitor *process* CPU because a single process can have low CPU usage at the same time that overall system CPU can be high. Watch for spikes in CPU usage that correspond with timeouts. As a result of high CPU, you may also see high "in: XXX" values in TimeoutException error messages as described below in the "Burst of traffic" section below.

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.

    ----------

    ***Client Side Bandwidth Exceeded***

    @@ -58,10 +54,8 @@ In the above message, there are several issues that are interesting:

    `Resolution:` Increase Client VM size or reduce network bandwidth consumption.


    ----------


    ***Large Request/Response Size***

    `Problem:` A large request/response can cause timeouts. As an example, Suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection). Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.
  23. JonCole revised this gist Mar 15, 2016. 1 changed file with 25 additions and 1 deletion.
    26 changes: 25 additions & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -15,6 +15,10 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    `Resolution:` Upgrade to a larger client VM size with more memory or dig into your memory usage patterns to reduce memory consumption.


    ----------


    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.
    @@ -23,14 +27,29 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.


    ----------


    ***Burst of traffic***

    `Problem:` Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.

    `Measurement:` Monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs).
    `Measurement:` Monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs). You can also look at the TimeoutException message from StackExchange.Redis. Here is an example :

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0, IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)

    In the above message, there are several issues that are interesting:

    1. Notice that in the "IOCP" section and the "WORKER" section you have a "Busy" value that is greater than the "Min" value. This means that your threadpool settings need adjusting.
    2. You can also see "in: 64221". This indicates that 64221 bytes have been received at the kernel socket layer but haven't yet been read by the application (e.g. StackExchange.Redis). This typically means that your application isn't reading data from the network as quickly as the server is sending it to you.

    `Resolution:` Configure your [ThreadPool Settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your threadpool will scale up quickly under burst scenarios.


    ----------


    ***Client Side Bandwidth Exceeded***

    `Problem:` Different sized client machines have limitations on how much network bandwidth they have available. If the client exceeds the available bandwidth, then data will not be processed on the client side as quickly as the server is sending it. This can lead to timeouts.
    @@ -39,6 +58,10 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    `Resolution:` Increase Client VM size or reduce network bandwidth consumption.


    ----------


    ***Large Request/Response Size***

    `Problem:` A large request/response can cause timeouts. As an example, Suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection). Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.
    @@ -62,3 +85,4 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.



  24. JonCole revised this gist Mar 15, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -11,7 +11,7 @@ Clients can see connectivity issues or timeouts for several reason, here are som
    `Measurement:`

    1. Monitor memory usage on the machine to make sure that it does not exceed available memory.
    2. Monitor the *Page Faults/Sec* perf counter. Most systems will have some page faults even during normal operation, so watch for spikes in CPU usage that correspond with timeouts.
    2. Monitor the *Page Faults/Sec* perf counter. Most systems will have some page faults even during normal operation, so watch for spikes in this perf counter that correspond with timeouts.

    `Resolution:` Upgrade to a larger client VM size with more memory or dig into your memory usage patterns to reduce memory consumption.

  25. JonCole revised this gist Mar 8, 2016. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion DiagnoseRedisErrors-ClientSide.md
    @@ -50,7 +50,7 @@ Below, I will try to demonstrate this. In this scenario, Request 'A' and 'B' ar
    |-------- 1 Second Timeout (B) ----------|
    |-Request B-|
    |- Read Response A --------|
    |- Read Response B-| (**TIMEOUT**)
    |- Read Response B-| (**TIMEOUT**)



  26. JonCole revised this gist Mar 8, 2016. 1 changed file with 2 additions and 3 deletions.
    5 changes: 2 additions & 3 deletions DiagnoseRedisErrors-ClientSide.md
    Original file line number Diff line number Diff line change
    @@ -41,17 +41,16 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ***Large Request/Response Size***

    `Problem:` A large request/response can cause timeouts. As an example, suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection. Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.
    `Problem:` A large request/response can cause timeouts. As an example, suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection). Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.

    Below, I will try to demonstrate this. In this scenario, requests 'A' and 'B' are sent quickly, the server starts sending responses 'A' and 'B' quickly, but because of data transfer times, 'B' gets stuck behind the other request and times out even though the server responded quickly.

    |-------- 1 Second Timeout (A)----------|
    |-Request A-|
    |-------- 1 Second Timeout (B) ----------|
    |-Request B-|
    |- Read Response A --------|
    |- Read Response A --------|
    |- Read Response B-| (**TIMEOUT**)



  27. JonCole revised this gist Mar 2, 2016. 1 changed file with 25 additions and 0 deletions.
    25 changes: 25 additions & 0 deletions DiagnoseRedisErrors-ClientSide.md
    Original file line number Diff line number Diff line change
    @@ -38,3 +38,28 @@ Clients can see connectivity issues or timeouts for several reason, here are som
    `Measurement:` Monitor how your bandwidth usage changes over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/BandWidthMonitor/BandwidthLogger.cs). Note that this code may not run successfully in some environments with restricted permissions (like Azure WebSites).

    `Resolution:` Increase Client VM size or reduce network bandwidth consumption.

    ***Large Request/Response Size***

    `Problem:` A large request/response can cause timeouts. As an example, suppose your timeout value configured on your client is 1 second. Your application requests two keys (e.g. 'A' and 'B') at the same time (using the same physical network connection. Most clients support "Pipelining" of requests, such that both requests 'A' and 'B' are sent on the wire to the server one after the other without waiting for the responses. The server will send the responses back in the same order. If response 'A' is large enough it can eat up most of the timeout for subsequent requests.

    Below, I will try to demonstrate this. In this scenario, requests 'A' and 'B' are sent quickly, the server starts sending responses 'A' and 'B' quickly, but because of data transfer times, 'B' gets stuck behind the other request and times out even though the server responded quickly.

    |-------- 1 Second Timeout (A)----------|
    |-Request A-|
    |-------- 1 Second Timeout (B) ----------|
    |-Request B-|
    |- Read Response A --------|
    |- Read Response B-| (**TIMEOUT**)



    `Measurement:` This is a difficult one to measure. You will need to instrument your client code to track large requests and responses.

    `Resolution:`

    1. Redis is optimized for a large number of small values, rather than a few large values. Break up your data into related smaller values. This is the preferred solution...
    2. Increase the size of your VM (for client and Redis Cache Server), to get higher bandwidth capabilities, reducing data transfer times for larger responses. Note that getting more bandwidth on just the server or just on the client may not be enough. Measure your bandwidth usage and compare it to the capabilities of the size of VM you currently have.
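    For the measurement step, a hedged sketch of such instrumentation might look like this wrapper around StringGetAsync; the size and latency thresholds are illustrative assumptions, not recommendations:

        using System;
        using System.Diagnostics;
        using System.Threading.Tasks;
        using StackExchange.Redis;

        static class InstrumentedGet
        {
            // Both thresholds are assumptions for the example; pick values that
            // match your own timeout configuration and payload sizes.
            const int LargeResponseBytes = 100 * 1024;
            const long SlowCallMs = 500;

            public static async Task<RedisValue> StringGetTracked(IDatabase db, string key)
            {
                var sw = Stopwatch.StartNew();
                RedisValue value = await db.StringGetAsync(key);
                sw.Stop();

                // Log any call that returned a large payload or ran long, so large
                // request/response pairs can be correlated with timeouts later.
                long size = value.IsNull ? 0 : ((byte[])value).Length;
                if (size > LargeResponseBytes || sw.ElapsedMilliseconds > SlowCallMs)
                {
                    Console.WriteLine("GET {0}: {1} bytes in {2} ms", key, size, sw.ElapsedMilliseconds);
                }
                return value;
            }
        }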


  28. JonCole revised this gist Feb 22, 2016. 1 changed file with 18 additions and 8 deletions.
    26 changes: 18 additions & 8 deletions DiagnoseRedisErrors-ClientSide.md
    @@ -6,25 +6,35 @@ Clients can see connectivity issues or timeouts for several reason, here are som

    ***Memory pressure***

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that Redis sent immediately.
    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that was sent by the Redis instance without any delay. When memory pressure hits, the system typically has to page data from physical memory to virtual memory which is on disk. This *page faulting* causes the system to slow down significantly.

    `Measurement:` coming soon...
    `Measurement:`

    `Resolution:` coming soon...
    1. Monitor memory usage on the machine to make sure that it does not exceed available memory.
    2. Monitor the *Page Faults/Sec* perf counter. Most systems will have some page faults even during normal operation, so watch for spikes in CPU usage that correspond with timeouts.

    `Resolution:` Upgrade to a larger client VM size with more memory or dig into your memory usage patterns to reduce memory consumption.

    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.

    `Measurement:` coming soon...
    `Measurement:` Monitor system-wide CPU usage through the Azure portal or through the associated perf counter. Be careful not to monitor *process* CPU, because a single process can have low CPU usage while overall system CPU is high. Watch for spikes in CPU usage that correspond with timeouts.

    `Resolution:` coming soon...
    `Resolution:` Upgrade to a larger VM size with more CPU capacity or investigate what is causing CPU spikes.
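    For the two measurements above (page faults and system-wide CPU), a minimal perf-counter sketch might look like the following; it assumes a Windows client where the standard counters are readable:

        using System;
        using System.Diagnostics;
        using System.Threading;

        class ClientHealthSample
        {
            static void Main()
            {
                // System-wide counters matching the measurements described above.
                var pageFaults = new PerformanceCounter("Memory", "Page Faults/sec");
                var totalCpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");

                pageFaults.NextValue();  // first sample of a rate counter is always 0
                totalCpu.NextValue();

                while (true)
                {
                    Thread.Sleep(1000);
                    Console.WriteLine("faults/sec={0:F0} cpu={1:F1}%",
                        pageFaults.NextValue(), totalCpu.NextValue());
                }
            }
        }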

    ***Burst of traffic***

    `Problem:` Bursts of traffic combined with poor [ThreadPool settings](https://gist.github.com/JonCole/e65411214030f0d823cb) can result in
    `Problem:` Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.

    `Measurement:` Monitor how your ThreadPool statistics change over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs).

    `Resolution:` Configure your [ThreadPool Settings](https://gist.github.com/JonCole/e65411214030f0d823cb) to make sure that your threadpool will scale up quickly under burst scenarios.

    ***Client Side Bandwidth Exceeded***

    `Measurement:` coming soon...
    `Problem:` Client machines of different sizes have limits on how much network bandwidth is available. If the client exceeds the available bandwidth, data will not be processed on the client side as quickly as the server is sending it. This can lead to timeouts.

    `Resolution:` coming soon...
    `Measurement:` Monitor how your bandwidth usage changes over time using code [like this](https://github.com/JonCole/SampleCode/blob/master/BandWidthMonitor/BandwidthLogger.cs). Note that this code may not run successfully in some environments with restricted permissions (like Azure WebSites).

    `Resolution:` Increase Client VM size or reduce network bandwidth consumption.
  29. JonCole revised this gist Nov 14, 2015. 1 changed file with 7 additions and 1 deletion.
    8 changes: 7 additions & 1 deletion DiagnoseRedisErrors-ClientSide.md
    Original file line number Diff line number Diff line change
    @@ -7,18 +7,24 @@ Clients can see connectivity issues or timeouts for several reason, here are som
    ***Memory pressure***

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that Redis sent immediately.

    `Measurement:` coming soon...

    `Resolution:` coming soon...

    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.
    `Measurement:` coming soon...

    `Measurement:` coming soon...

    `Resolution:` coming soon...

    ***Burst of traffic***

    `Problem:` Bursts of traffic combined with poor [ThreadPool settings](https://gist.github.com/JonCole/e65411214030f0d823cb) can result in

    `Measurement:` coming soon...

    `Resolution:` coming soon...

  30. JonCole revised this gist Nov 14, 2015. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions DiagnoseRedisErrors-ClientSide.md
    Original file line number Diff line number Diff line change
    @@ -5,16 +5,19 @@ Customers periodically ask "Why am I getting errors when talking to Redis". The
    Clients can see connectivity issues or timeouts for several reason, here are some of the common ones I see:

    ***Memory pressure***

    `Problem:` Memory pressure on the client machine leads to all kinds of performance problems that can delay processing of data that Redis sent immediately.
    `Measurement:` coming soon...
    `Resolution:` coming soon...

    ***High CPU usage***

    `Problem:` High CPU usage can mean that the client side can fail to process a response from Redis in a timely fashion even though Redis sent the response very quickly.
    `Measurement:` coming soon...
    `Resolution:` coming soon...

    ***Burst of traffic***

    `Problem:` Bursts of traffic combined with poor [ThreadPool settings](https://gist.github.com/JonCole/e65411214030f0d823cb) can result in
    `Measurement:` coming soon...
    `Resolution:` coming soon...