##TUNING##

#### Configuration ####

System:
set file descriptors to 32K or 64K

vim **/etc/security/limit.conf**

```
elasticsearch - nofile 65535
elasticsearch - memlock unlimited
```

use following command to check

```
curl localhost:9200/_nodes/process?pretty

"process" : {
     "refresh_interval_in_millis" : 1000,
     "id" : 2697,
     "max_file_descriptors" : 65535,
     "mlockall" : true
 }
```

To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf

```
sysctl -w vm.max_map_count=262144
#If you installed Elasticsearch using a package (.deb, .rpm) this setting 
#will be changed automatically. To verify, run sysctl vm.max_map_count.
```

Disable swap
```
vm.swappiness to 0
```

#### Disk Performance ####

For SSDs in r3, maybe it's better to mount with `discard` option since it supports TRIM:

vim **/etc/fstab/**

```
/dev/xvdb /mnt ext4 defaults,noatime,nodiratime,discard 0 0
```

Use **noop** scheduler for SSD:

```
echo noop | sudo tee /sys/block/xvdc/queue/scheduler
```

#### ES Settings ####

vim /etc/default/elasticsearch

use **half** of machine memory for JVM or not excess **32g**

```
ES_HEAP_SIZE=15g
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
```

vim /etc/elasticsearch/elasticsearch.yaml

never swaping

```
bootstrap.mlockall: true
```

indexing performance

```
"indices.memory.index_buffer_size": "30%",    #10%
"index.translog.flush_threshold_ops": 50000,  #1000
"index.refresh_interval": "5s",               #1s
#"index.store.type": "mmapfs"
```
adjust thoughput from 20mb to 100mb

```
PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}
```

#### Mapping #####

1. elasticsearch 會儲存原始檔案在 _source 欄位, 如果不需要可以關閉
2. elasticsearch 會把所有欄位的資料處理好放在 _all 欄位, 如果不需要也可以關閉
   
   ```
   { 
     '_id': 1
     'title': 'this is first blog', 
     'author': 'kakashi', 
     'content': 'test 123'
   }
   存到ES後會變成
   {
     '_id': 1,
     '_all': 'this, is, first, blog, kakashi, test, 123',
     'title': 'this, is, first, blog',
     'author': 'kakashi',
     'content': 'test, 123',
     '_source': {
         'title': 'this is first blog', 
         'author': 'kakashi', 
         'content': 'test 123'
     }
   ```
3. 如果把 _source 關閉, 可以利用 _store 決定是否要儲存此field
   
   ```
   {
      "tweet" : {
        "properties" : {
            "message" : {
                "type" : "string",
                "store" : true,
                "index" : "analyzed",
            },
   ```
4. 使用 _source 和 _store 的最大差別, 用 _source 可以利用 update API 去更新值
5. 在 analyze field 時, 如果不需要算出score (相關性), 可以把norms關閉, 會節省大量memory
6. index_options 可以決定要不要存term frequencies 還有 positions
7. 不需要index的欄位請使用no, 該欄位不需要切詞可以用not_analyzed

#### 建立mapping的方式 ####

1. 利用template

   ```
   PUT _template/blog-template
   {  
     "template": "db*",  <--- index(db) name
     "mappings": { 
        "blog": {        <---- type (table) name
           "properties": {
             "author": {
               "type": "string",
               "index": "not_analyzed"
             },
             "content": {
               "type": "string"
            }
         }
      }
   }
   ```
2. 取得mapping `GET db/_mapping/`
3. 直接修改db的mapping `PUT db/_mapping`

#### Indexing ####

1. 利用Bulk indexing的方式, 最好控制在1MB~5MB間
2. 重要性較低的資料可以用bulk UDP indexing （可以忍受掉資料)
3. reindexing時可以將refresh_interval設成-1, Bulk indexing時手動做refresh
4. 可以利用index warmer增加搜索速度 (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html)


#### Sharding & Replica ###

1. 增加Sharding & 機器 -> 增加indexing能力
2. 增加Replica & 機器 -> 增加Read能力


#### Reference#### 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html