## TUNING ##

#### Configuration ####

System: set file descriptors to 32K or 64K

vim **/etc/security/limits.conf**

```
elasticsearch - nofile 65535
elasticsearch - memlock unlimited
```

Use the following command to check:

```
curl 'localhost:9200/_nodes/process?pretty'

"process" : {
  "refresh_interval_in_millis" : 1000,
  "id" : 2697,
  "max_file_descriptors" : 65535,
  "mlockall" : true
}
```

Raise the mmap count limit. `sysctl -w` applies the value immediately; to set it permanently, update the vm.max_map_count setting in /etc/sysctl.conf

```
sysctl -w vm.max_map_count=262144
# If you installed Elasticsearch using a package (.deb, .rpm) this setting
# will be changed automatically. To verify, run sysctl vm.max_map_count.
```

Avoid swapping: set vm.swappiness to 0

```
sysctl -w vm.swappiness=0
```

#### Disk Performance ####

For the SSDs on r3 instances, it may be better to mount with the `discard` option, since they support TRIM:

vim **/etc/fstab**

```
/dev/xvdb /mnt ext4 defaults,noatime,nodiratime,discard 0 0
```

Use the **noop** scheduler for SSDs (adjust the device name to match your disk):

```
echo noop | sudo tee /sys/block/xvdc/queue/scheduler
```

#### ES Settings ####

vim /etc/default/elasticsearch

Give the JVM heap **half** of the machine's memory, but do not exceed **32g**

```
ES_HEAP_SIZE=15g
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
```

vim /etc/elasticsearch/elasticsearch.yml

Never swap:

```
bootstrap.mlockall: true
```

Indexing performance (trailing comments show the values being replaced):

```
indices.memory.index_buffer_size: "30%"      # 10%
index.translog.flush_threshold_ops: 50000    # 1000
index.refresh_interval: "5s"                 # 1s
# index.store.type: "mmapfs"
```

Raise the store throttle from 20mb to 100mb per second:

```
PUT /_cluster/settings
{
  "persistent" : {
    "indices.store.throttle.max_bytes_per_sec" : "100mb"
  }
}
```

#### Mapping ####

1. Elasticsearch stores the original document in the `_source` field; it can be disabled if you do not need it.
2. Elasticsearch also puts the processed data of all fields into the `_all` field; it can be disabled as well if you do not need it.

   ```
   {
     "_id": 1,
     "title": "this is first blog",
     "author": "kakashi",
     "content": "test 123"
   }

   After being stored in ES, it becomes:

   {
     "_id": 1,
     "_all": "this, is, first, blog, kakashi, test, 123",
     "title": "this, is, first, blog",
     "author": "kakashi",
     "content": "test, 123",
     "_source": {
       "title": "this is first blog",
       "author": "kakashi",
       "content": "test 123"
     }
   }
   ```

3. If `_source` is disabled, you can use `store` to decide per field whether that field is stored.

   ```
   {
     "tweet" : {
       "properties" : {
         "message" : {
           "type" : "string",
           "store" : true,
           "index" : "analyzed"
         }
       }
     }
   }
   ```

4. The biggest difference between `_source` and `store`: with `_source` you can update values through the update API.
5. For analyzed fields, if you do not need a score (relevance), you can disable norms; this saves a lot of memory.
6. `index_options` controls whether term frequencies and positions are stored.
7. For fields that do not need to be indexed, use `no`; for fields that do not need to be tokenized, use `not_analyzed`.

#### Ways to create a mapping ####

1. Use a template. In the example below, `template` is the index (db) name pattern and `blog` is the type (table) name.

   ```
   PUT _template/blog-template
   {
     "template": "db*",
     "mappings": {
       "blog": {
         "properties": {
           "author": {
             "type": "string",
             "index": "not_analyzed"
           },
           "content": {
             "type": "string"
           }
         }
       }
     }
   }
   ```

2. Get the mapping: `GET db/_mapping/`
3. Modify the mapping of db directly: `PUT db/_mapping`

#### Indexing ####

1. Use bulk indexing, ideally keeping each bulk request between 1MB and 5MB.
2. For less important data (where losing some documents is tolerable), you can use bulk UDP indexing.
3. When reindexing, you can set refresh_interval to -1 and refresh manually during bulk indexing.
4. You can use index warmers to speed up searches (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html)

#### Sharding & Replica ####

1. More shards & machines -> more indexing capacity
2. More replicas & machines -> more read capacity

#### Reference ####

- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
- https://blog.codecentric.de/en/2014/05/elasticsearch-indexing-performance-cheatsheet/
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
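
The bulk-indexing advice in the Indexing section (requests of roughly 1MB to 5MB) can be sketched in Python as follows. This is a minimal sketch, not tooling from these notes: the function name `bulk_bodies`, the `max_bytes` parameter, and the sample index/type names `db` and `blog` (borrowed from the template example) are assumptions for illustration. It only builds newline-delimited request bodies in the format the `_bulk` endpoint expects and does not send anything.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def bulk_bodies(
    docs: Iterable[Dict[str, Any]],
    index: str,
    doc_type: str,
    max_bytes: int = 5 * 1024 * 1024,  # assumed upper bound, per the 1MB~5MB advice
) -> Iterator[str]:
    """Yield bulk request bodies, each at most max_bytes when UTF-8 encoded.

    A single document larger than max_bytes is still emitted, alone in its own body.
    """
    entries: List[str] = []
    size = 0
    for doc in docs:
        # Each document becomes two lines: an action line and a source line.
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        source = json.dumps(doc, ensure_ascii=False)
        entry = action + "\n" + source + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Start a new body when adding this entry would exceed the limit.
        if entries and size + entry_size > max_bytes:
            yield "".join(entries)
            entries, size = [], 0
        entries.append(entry)
        size += entry_size
    if entries:
        yield "".join(entries)


if __name__ == "__main__":
    sample = [{"title": "this is first blog", "author": "kakashi", "content": "test 123"}]
    for body in bulk_bodies(sample, "db", "blog"):
        print(body, end="")
```

Each yielded body could then be sent as the payload of a `POST /_bulk` request with whatever HTTP client you already use. Splitting on encoded byte size rather than document count keeps request sizes predictable when documents vary a lot in length.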