Last active
March 11, 2025 03:37
Revisions
JasonTrue revised this gist
Sep 6, 2021 . 1 changed file with 1 addition and 1 deletion.

@@ -21,7 +21,7 @@ When you change the search_data hash structure, you'll need to reindex that mode

# Searching

In all recent versions of Elasticsearch, you need to explicitly specify the fields you'll search.

Your search should look something like this:
JasonTrue revised this gist
Apr 2, 2019 . 1 changed file with 1 addition and 1 deletion.

@@ -159,7 +159,7 @@ And may require using an explicit search body.

However, one way to avoid this complexity is to use the exact matching above, index the field as lowercase, and perhaps lowercase any query strings that look like email addresses before searching.

## Case sensitivity

By default, searches are case insensitive. To override that for everything, you can alter the searchkick call `searchkick case_sensitive: [:field :list]`, or use exact matching:

    UserCourse.search(query,
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 52 additions and 3 deletions.

@@ -21,18 +21,67 @@ When you change the search_data hash structure, you'll need to reindex that mode

# Searching

In all recent versions of Elasticsearch, you need to explicitly specify the fields you'll search.

Your search should look something like this:

```ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])```

# Common Indexing challenges, common solutions

## We want to be able to search by ID (in full-text queries)

By default, an integer field can only be searched as an integer, but if you coerce the field to a string, it becomes searchable with full-text search.

    def search_data
      {
        id: id,
        stringified_id: id.to_s
      }
    end

Your search should look something like this:

```ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])```

It's worth noting that because you can use non-string types (including arrays of non-string types), it sometimes comes in handy to do more of your searching/filtering in Elastic than in Postgres. You can combine a full-text query with filters on specific fields.

    def search_data
      {
        blog_id: blog_id,
        author: user.name,
        author_id: user.id,
        publish_year: publish_at.year,
        publish_month: publish_at.month,
        publish_day: publish_at.day,
        publish_at: publish_at,
        created_at: created_at,
        updated_at: updated_at,
        tags: tag_list,
        story: story,
        title: title,
        approved: approved
      }
    end

Then a flexible, type-aware search that still does full-text search on some fields, like title and story:

    search_params = { approved: true, publish_at: { lte: 'now/m' } }
    search_params = search_params.merge(blog_id: @blog.id) if @blog.present?
    search_params = search_params.merge(publish_year: @year) if @year.present?
    search_params = search_params.merge(publish_month: @month) if @month.present?
    search_params = search_params.merge(publish_day: @day) if @day.present?
    search_params = search_params.merge(tags: {all: @tags}) if @tags.present?
    if @query.present?
      @posts = Post.search(@query, fields: [:title, :story], where: search_params, page: params[:page])
    else
      @posts = Post.search(fields: [:title, :story], where: search_params, page: params[:page], per_page: 20, order: [{publish_at: :desc}])
    end
    logger.info({ query: @query, params: search_params })

## We want to eager load associations so that it's not so expensive to update the index.
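
The filter-building step in this revision can also be sketched outside Rails. This is only an illustration in plain Ruby: `build_search_params` is a hypothetical helper name that does not appear in the gist, and it uses `nil`/empty checks in place of Rails' `present?`. The resulting hash is what would be passed as `where:` to `Post.search`.

```ruby
# Hypothetical helper (not from the gist): collects the optional filters
# into a single hash suitable for Searchkick's `where:` option.
# Plain Ruby, so nil/empty checks stand in for Rails' `present?`.
def build_search_params(blog_id: nil, year: nil, month: nil, day: nil, tags: nil)
  # Filters that always apply, copied from the example above.
  params = { approved: true, publish_at: { lte: 'now/m' } }

  optional = {
    blog_id: blog_id,
    publish_year: year,
    publish_month: month,
    publish_day: day,
    # An empty tag list means "no tag filter", same as nil.
    tags: (tags.nil? || tags.empty?) ? nil : { all: tags }
  }

  # Hash#compact drops the nil entries, so only supplied filters remain.
  params.merge(optional.compact)
end

where = build_search_params(blog_id: 7, tags: %w[ruby search])
```

Building the hash in one place keeps the controller action short and makes the filter logic easy to test without a running Elasticsearch instance.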
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 1 addition and 1 deletion.

@@ -19,7 +19,7 @@ In practice, you'll need to customize what gets indexed. This is done by definin

When you change the search_data hash structure, you'll need to reindex that model. You can do that in the Rails console by typing `Model.reindex`, but you can also use the rake task `searchkick:reindex:all`, or index just one specific model.

# Searching

Your search should look something like this:
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 2 additions and 0 deletions.

@@ -107,6 +107,8 @@ This will require something like:

    searchkick merge_mappings: true, mappings: {...}

And may require using an explicit search body.

However, one way to avoid this complexity is to use the exact matching above, index the field as lowercase, and perhaps lowercase any query strings that look like email addresses before searching.

## with case insensitive

By default, searches are case insensitive. To override that for everything, you can alter the searchkick call `searchkick case_sensitive: [:field :list]`, or use exact matching:
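
The email pre-filtering idea added in this revision (lowercase any query that looks like an email address before searching a lowercase-indexed field) might look roughly like the sketch below. `EMAIL_LIKE` and `normalize_query` are hypothetical names, and the pattern is deliberately loose; it is not a full address validator.

```ruby
# Hypothetical pre-filter (names are not from the gist).
# Loose pattern: "something@something.tld" with no whitespace or extra "@".
EMAIL_LIKE = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Returns the query with surrounding whitespace removed, and lowercased
# only when the whole query looks like an email address.
def normalize_query(query)
  stripped = query.to_s.strip
  EMAIL_LIKE.match?(stripped) ? stripped.downcase : stripped
end

email_query = normalize_query("  Jason@Example.COM ")
name_query  = normalize_query("Reallyenglish")
```

The normalized string would then be passed to the exact-match search on the lowercase-indexed email field, while ordinary queries keep their original casing.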
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 5 additions and 5 deletions.

@@ -108,14 +108,14 @@ This will require something like:

And may require using an explicit search body.

## with case insensitive

By default, searches are case insensitive. To override that for everything, you can alter the searchkick call `searchkick case_sensitive: [:field :list]`, or use exact matching:

    UserCourse.search(query,
      fields: [{my_field: :exact}, :other_field])

## Japanese-aware indexing

While there's reasonable support out of the box for Japanese search, you can get additional features with the Elasticsearch analysis-kuromoji plugin.

    searchkick language: "japanese"

@@ -124,14 +124,14 @@ If you go down this route, and want to support multiple analyzers, you need to u

See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

## Any notice of combinations above

Generally, combinations are supported by choosing the right field to query. Most of the parameters that normally take a symbol can be replaced with a hash from that symbol to various options. (https://github.com/ankane/searchkick will have better examples than I can provide.)

## Not compatible with each other

In principle, you can create several fields that have their own analyzers and behaviors. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations.

So, for example, if you have a dilemma about how to search something, you could use very dissimilar pseudo-fields with different search rules and include all of them, potentially with different boosting rules, in your search call.

## Some parameters update frequently or require a lot of CPU time to reindex

In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method by that name that returns only the fields that need special treatment.

    def custom_reindexer
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 3 additions and 3 deletions.

@@ -125,11 +125,11 @@ If you go down this route, and want to support multiple analyzers, you need to u

See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

# Any notice of combinations above

Generally, combinations are supported by choosing the right field to query. Most of the parameters that normally take a symbol can be replaced with a hash from that symbol to various options. (https://github.com/ankane/searchkick will have better examples than I can provide.)

# Not compatible with each other

In principle, you can create several fields that have their own analyzers and behaviors. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations.

So, for example, if you have a dilemma about how to search something, you could use very dissimilar pseudo-fields with different search rules and include all of them, potentially with different boosting rules, in your search call.

# Some parameters update frequently or require a lot of CPU time to reindex

In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method by that name that returns only the fields that need special treatment.
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 5 additions and 1 deletion.

@@ -109,8 +109,12 @@ This will require something like:

And may require using an explicit search body.

# with case insensitive

By default, searches are case insensitive. To override that for everything, you can alter the searchkick call `searchkick case_sensitive: [:field :list]`, or use exact matching:

    UserCourse.search(query,
      fields: [{my_field: :exact}, :other_field])

# Japanese-aware indexing

While there's reasonable support out of the box for Japanese search, you can get additional features with the Elasticsearch analysis-kuromoji plugin.
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 10 additions and 1 deletion.

@@ -19,11 +19,20 @@ In practice, you'll need to customize what gets indexed. This is done by definin

When you change the search_data hash structure, you'll need to reindex that model. You can do that in the Rails console by typing `Model.reindex`, but you can also use the rake task `searchkick:reindex:all`, or index just one specific model.

# Searching

Your search should look something like this:

```ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])```

In all recent versions of Elasticsearch, you need to explicitly specify the fields you'll search.

# Common Indexing challenges, common solutions

## We want to be able to search by ID (in full-text queries).

When you conduct a search with Elasticsearch, you specify which fields you want to query.

## We want to eager load associations so that it's not so expensive to update the index.
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 24 additions and 9 deletions.

@@ -53,9 +53,13 @@ UserCourse.search(params[:query], {

      fields: ["name^5", "id"],
      misspellings: {below: 5}

Alternatively, `MyModel.search query, body_options: {min_score: 1}` tunes out a lot of noise.

## Match "Reallyenglish, Co., Ltd." organization with "really"

Make sure the index includes this configuration for the field you want:

    searchkick word_start: [:name]

Match word start on a specific search:

@@ -64,11 +68,13 @@ Match word start on a specific search:

      match: :word_start
    )

## Match "Test Reallyenglish Program" program with "english" (In the middle of name)

Make sure the index includes this configuration for the field you want:

    searchkick word_middle: [:name]

Then for the search:

    UserCourse.search(query,
      fields: ['stringified_id', 'name', 'description', ...]

@@ -83,8 +89,18 @@ Don't match "新潟大学" organization with "新人" (Disabling ambiguity)

      match: :word_middle
    )

This is a case-sensitive search, however, and probably not exactly what you want. More likely you'll want a tokenizer that treats an email address as a single word, which is a little more complicated. The article below covers this, but it requires a custom mapping and a reconfigured analyzer.

https://medium.com/linagora-engineering/searching-email-address-in-elasticsearch-3b09a11e3c2b

This will require something like:

    searchkick merge_mappings: true, mappings: {...}

And may require using an explicit search body.

# with case insensitive

By default, searches are case insensitive. To override that, you can alter the Settings

# Japanese-aware indexing

While there's reasonable support out of the box for Japanese search, you can get additional features with the Elasticsearch analysis-kuromoji plugin.

@@ -96,12 +112,11 @@ If you go down this route, and want to support multiple analyzers, you need to u

See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

# Any notice of combinations above

Generally combinations

# Not compatible with each other

In principle, you can create several fields that have their own analyzers. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations.

# Some parameters update frequently or require a lot of CPU time to reindex

In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method by that name that returns only the fields that need special treatment.
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 62 additions and 8 deletions.

@@ -44,18 +44,72 @@ However, this scope is used only for batch import. When an individual entity is

      !deleted
    end
    ```

## Avoid short query strings (single or two character searches) returning lots of results.

By default, misspelling-tolerant search is turned on in searchkick. So the two ways to reduce unwanted search results are to turn off or adjust the misspelling-friendly feature, or to query with a relevancy score filter.

For example,

    UserCourse.search(params[:query], {
      fields: ["name^5", "id"],
      misspellings: {below: 5}
    })

## Match "Reallyenglish, Co., Ltd." organization with "really"

Depending on the goals, this can be solved a few different ways.

Match word start on a specific search:

    UserCourse.search(query,
      fields: ['stringified_id', 'name', 'description', ...],
      match: :word_start
    )

Index a specific field to always support word_start searches:

    searchkick word_start: [:name]

## Match "Test Reallyenglish Program" program with "english" (In the middle of name)

    UserCourse.search(query,
      fields: ['stringified_id', 'name', 'description', ...],
      match: :word_middle
    )

Don't match "新潟大学" organization with "新人" (Disabling ambiguity)

## Exact match with User Email

    UserCourse.search(query,
      fields: [{email: :exact}, :name],
      match: :word_middle
    )

# with case insensitive

By default, searches are case insensitive. To override that, you can

# Japanese-aware indexing

While there's reasonable support out of the box for Japanese search, you can get additional features with the Elasticsearch analysis-kuromoji plugin.

    searchkick language: "japanese"

If you go down this route, and want to support multiple analyzers, you need to use the searchkick mappings feature and multiple fields. It's not terribly hard, but it's more involved than a quick FAQ can handle.

See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

# Any notice of combinations above

# Not compatible with each other

In principle, you can create several indexes and

Benchmark ILIKE vs partial match with elasticsearch using more than 1 million records of users table

Others

# Some parameters update frequently or require a lot of CPU time to reindex

In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method by that name that returns only the fields that need special treatment.

    def custom_reindexer
      {
        just_the_field_that_matters: calculation_method
      }
    end
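
To make the relationship between `search_data` and a partial reindex method concrete, here is a small plain-Ruby sketch. `Course` is a hypothetical stand-in for an ActiveRecord model, and its fields and pass-rate arithmetic are assumptions for illustration only; with the real gem, the scheduled job would call `reindex(:custom_reindexer)` on the model as described above.

```ruby
# Hypothetical stand-in for a model; fields and maths are assumptions.
Course = Struct.new(:id, :name, :passed, :attempts) do
  # The expensive or frequently changing calculation.
  def calculate_pass_rate
    attempts.zero? ? 0.0 : passed.to_f / attempts
  end

  # Full document, used for a complete reindex.
  def search_data
    {
      id: id,
      stringified_id: id.to_s,
      name: name,
      pass_rate: calculate_pass_rate
    }
  end

  # Partial document: only the field that needs special treatment.
  def custom_reindexer
    { pass_rate: calculate_pass_rate }
  end
end

course  = Course.new(1, "Test Reallyenglish Program", 3, 4)
partial = course.custom_reindexer
```

Keeping the keys of the partial method a subset of the `search_data` keys means a partial update never introduces fields that a full reindex would not also produce.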
JasonTrue revised this gist
Sep 27, 2018 . 1 changed file with 52 additions and 10 deletions.

@@ -5,15 +5,57 @@ https://github.com/ankane/searchkick

By default, simply adding the call 'searchkick' to a model will do a naive indexing of all fields (but not has_many or belongs_to attributes).

In practice, you'll need to customize what gets indexed. This is done by defining a method on your model called `search_data`

    def search_data
      {
        id: id,
        stringified_id: id.to_s,
        tags: tags.join(" "),
        user: user.full_name,
        pass_rate: calculate_pass_rate
      }
    end

When you change the search_data hash structure, you'll need to reindex that model. You can do that in the Rails console by typing `Model.reindex`, but you can also use the rake task `searchkick:reindex:all`, or index just one specific model.

# Common Indexing challenges, common solutions

## We want to be able to search by ID (in full-text queries).

When you conduct a search with Elasticsearch, you specify which fields you want to query. (T

## We want to eager load associations so that it's not so expensive to update the index.

Define a scope by this name, and invoke the appropriate #joins or #includes.

`scope :search_import, -> { includes(study_tracking: :study_tracking_details) }`

## We have soft-deleted records and want to exclude them from indexing

Similar to the above solution,

`scope :search_import, -> { where(deleted: false) }`

However, this scope is used only for batch import. When an individual entity is saved, it is updated separately, so you'll also want to implement:

```
def should_index?
  !deleted
end
```

# Match "Reallyenglish, Co., Ltd." organization with "really"

This can be solved by prefix

Match "Test Reallyenglish Program" program with "english" (In the middle of name)

Don't match "新潟大学" organization with "新人" (Disabling ambiguity)

Exact match with User Email

with case insensitive

If you can finish the above in a short time, the next issues are:

Any notice of combinations above

Not compatible with each other

Benchmark ILIKE vs partial match with elasticsearch using more than 1 million records of users table

Others
JasonTrue created this gist
Sep 27, 2018

@@ -0,0 +1,19 @@

# Resources:

https://github.com/ankane/searchkick

# Indexing

By default, simply adding the call 'searchkick' to a model will do a naive indexing of all fields (but not has_many or belongs_to attributes).

In practice, you'll need to customize what gets indexed. This is done by defining a method called `search_data`

    def search_data
      {
        name: name,
        department_name: department.name,
        on_sale: sale_price.present?
      }
    end

For example,

When you change the struct