## TODO in production

    ### Elasticsearch
- select a large-memory instance
    - A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing many, many small machines), and greater than 64 GB has problems.
    - In general, it is better to prefer medium-to-large boxes.
- create swap using an instance store disk, not EBS (a sketch follows).
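
A minimal sketch, assuming the instance store volume appears as `/dev/xvdb` (the device name varies by instance type, so check `lsblk` first):

```sh
# turn the ephemeral (instance store) volume into swap space
# /dev/xvdb is an assumed device name; verify with lsblk before running
sudo mkswap /dev/xvdb
sudo swapon /dev/xvdb
```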
- Disks should be SSDs, or volumes with provisioned IOPS.
- `cfq` (the default I/O scheduler on most *nix systems) is inefficient for SSDs, since there are no spinning platters involved; use `deadline` or `noop` instead. The deadline scheduler optimizes based on how long writes have been pending, while noop is a simple FIFO queue. A sketch of switching schedulers follows.
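
A sketch of inspecting and switching the scheduler at runtime, assuming the data disk is `sda` (substitute your device; to persist the change, use a udev rule or the `elevator=` kernel parameter):

```sh
# the active scheduler is shown in [brackets]
cat /sys/block/sda/queue/scheduler
# switch to deadline (run as root); takes effect immediately, lost on reboot
echo deadline > /sys/block/sda/queue/scheduler
```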
    - always run the most recent version of the Java Virtual Machine (JVM).
    - Java 8 is preferred over Java 7. Java 6 is no longer supported.
    - Please Do Not Tweak JVM Settings.
    - Use configuration management software for deployment.
- Elasticsearch ships with very good defaults, especially when it comes to performance-related settings and options. When in doubt, just leave the settings alone.

    ```yml
    cluster.name: elk_production
    ```
    ```yml
node.name: es_001_data
```

    ```yml
    path.data: /es_data #/path/to/data1,/path/to/data2
    # Path to log files:
    path.logs: /path/to/logs #will use defaults
    # Path to where plugins are installed:
path.plugins: /path/to/plugins #will use defaults
```

    ```yml
    discovery.zen.minimum_master_nodes: 2
    ```
- This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is `(number of master-eligible nodes / 2) + 1`; with three master-eligible nodes, for example, the quorum is 2, matching the setting above.
    - It would be extremely irritating if you had to push new configurations to each node and restart your whole cluster just to change the setting.
    - For this reason, minimum_master_nodes (and other settings) can be configured via a dynamic API call. You can change the setting while your cluster is online:

    ```
    PUT /_cluster/settings
{
  "persistent" : {
    "discovery.zen.minimum_master_nodes" : 2
  }
}
    ```
    ```yml
gateway.recover_after_nodes: 2  # don't start recovery until this many nodes are up
gateway.expected_nodes: 3       # recover immediately once all expected nodes have joined
gateway.recover_after_time: 5m  # otherwise wait this long after recover_after_nodes is met
    ```
    ```yml
# use unicast discovery; list your master-eligible nodes here
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
    ```
- Do not change the default garbage collector! The official recommendation is to use Concurrent Mark-Sweep (CMS).
    - The default threadpool settings in Elasticsearch are very sensible. So. Next time you want to tweak a threadpool, please don’t.

    ```sh
export ES_HEAP_SIZE=8g #memory/2
# Give (less than) Half Your Memory to Lucene
# Don’t Cross 32 GB!
# or set it in /etc/default/elasticsearch instead:
ES_HEAP_SIZE=8g
    ```
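
The 32 GB ceiling comes from compressed object pointers: up to roughly that heap size the JVM can address objects with 32-bit offsets, and beyond it every pointer doubles in size. A quick way to confirm that a given heap size still gets compressed oops on a HotSpot JVM:

```sh
# prints whether UseCompressedOops is enabled for a 31 GB heap
java -Xmx31g -XX:+PrintFlagsFinal -version | grep -i usecompressedoops
```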
    - Heap: Sizing and Swapping

    ```sh
# lower swappiness via sysctl (set in /etc/sysctl.conf to persist)
    vm.swappiness = 1
    # A swappiness of 1 is better than 0, since on some kernel versions a swappiness of 0 can invoke the OOM-killer.
    ```
    ```yml
    bootstrap.mlockall: true
# This allows the JVM to lock its memory and prevents it from being swapped out by the OS.
    ```
    - File Descriptors and MMap

    ```sh
    # You should increase your file descriptor count to something very large, such as 64,000.
    # /etc/default/elasticsearch
    MAX_OPEN_FILES=131070
    # /etc/security/limits.conf
    * soft nofile 64000
    * hard nofile 64000
    root soft nofile 64000
    root hard nofile 64000
    # /etc/pam.d/common-session
    session required pam_limits.so
    # /etc/pam.d/common-session-noninteractive
    session required pam_limits.so
    ```
    ```
    sysctl -w vm.max_map_count=262144
# Or set it permanently via the `vm.max_map_count` setting in /etc/sysctl.conf.
    ```
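
Once the limits, `mlockall`, and mmap settings are in place, verify them from Elasticsearch itself rather than trusting the OS configuration; assuming a node on localhost:9200, the nodes API reports what the process actually got:

```sh
# look for "max_file_descriptors" and "mlockall" in the process section
curl 'localhost:9200/_nodes/process?pretty'
```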
    - eliminate the possibility of an accidental mass-deletion of indices
    ```yml
    action.destructive_requires_name: true
    ```
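
With this setting, wildcard and `_all` deletions are rejected and an index must be deleted by its explicit name, for example (index names here are illustrative):

```sh
# rejected: destructive operations may no longer use wildcards or _all
curl -XDELETE 'localhost:9200/_all'
curl -XDELETE 'localhost:9200/logstash-*'
# allowed: a single, explicitly named index
curl -XDELETE 'localhost:9200/logstash-2014-09'
```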
    - use index aliases

    ```
    curl -XPOST 'localhost:9200/_aliases?pretty' -d'
{
  "actions": [
    { "remove": { "index": "my_index_v1", "alias": "my_index" }},
    { "add":    { "index": "my_index_v2", "alias": "my_index" }}
  ]
}'
    ```
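
Because both actions are applied atomically, there is no moment when `my_index` points at neither version; applications keep talking to the alias and never need to know the underlying index name:

```sh
# clients always search through the alias, never a versioned index name
curl 'localhost:9200/my_index/_search?pretty'
```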
    - Perhaps you are using Elasticsearch to index millions of log files, and you would prefer to optimize for index speed rather than near real-time search. You can reduce the frequency of refreshes on a per-index basis by setting the refresh_interval:

    ```
    PUT /my_logs
{
  "settings": {
    "refresh_interval": "30s"
  }
}
    ```
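
Because `refresh_interval` is a dynamic setting, it can also be changed on a live index. A common pattern for a large bulk import is to disable refreshes entirely and restore them afterwards:

```sh
# disable refreshes while bulk indexing
curl -XPUT 'localhost:9200/my_logs/_settings' -d '{ "refresh_interval": -1 }'
# ...run the bulk import...
# re-enable refreshes once the import completes
curl -XPUT 'localhost:9200/my_logs/_settings' -d '{ "refresh_interval": "1s" }'
```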
    - Shards are flushed automatically every 30 minutes, or when the translog becomes too big. That said, it is beneficial to flush your indices before restarting a node or closing an index. When Elasticsearch tries to recover or reopen an index, it has to replay all of the operations in the translog, so the shorter the log, the faster the recovery.

    ```
    POST /blogs/_flush
POST /_flush?wait_for_ongoing # Flush all indices and wait until all flushes have completed before returning.
    ```
    - The optimize API is best described as the forced merge API. It forces a shard to be merged down to the number of segments specified in the max_num_segments parameter. The intention is to reduce the number of segments (usually to one) in order to speed up search performance. The typical use case is for logging, where logs are stored in an index per day, week, or month. Older indices are essentially read-only; they are unlikely to change. In this case, it can be useful to optimize the shards of an old index down to a single segment each; it will use fewer resources and searches will be quicker:

    ```
    POST /logstash-2014-10/_optimize?max_num_segments=1
    # Be aware that merges triggered by the optimize API are not throttled at all. They can consume all of the I/O on your nodes, leaving nothing for search and potentially making your cluster unresponsive.
    ```

    - [Retiring data](https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html)