@JasonTrue
Last active March 11, 2025 03:37

Revisions

  1. JasonTrue revised this gist Sep 6, 2021. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion searchkick_and_elasticsearch_guidance.md
    @@ -21,7 +21,7 @@ When you change the search_data hash structure, you'll need to reindex that mode

    # Searching

    In all recent versions of Postgres, you need to explicitly specify the fields you'll search.
    In all recent versions of Elasticsearch, you need to explicitly specify the fields you'll search.

    Your search should look something like this:

  2. JasonTrue revised this gist Apr 2, 2019. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion searchkick_and_elasticsearch_guidance.md
    @@ -159,7 +159,7 @@ And may require using an explicit search body.

    However, one way to avoid this complexity is to use the exact matching above, index the field in lowercase, and lowercase query strings that look like email addresses before searching.

    ## with case insensitive
    ## Case sensitivity
    By default, searches are case insensitive. To override that for everything, you can change the searchkick call to `searchkick case_sensitive: true`, or use exact matching:

    UserCourse.search(query,
  3. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 52 additions and 3 deletions.
    55 changes: 52 additions & 3 deletions searchkick_and_elasticsearch_guidance.md
    @@ -21,18 +21,67 @@ When you change the search_data hash structure, you'll need to reindex that mode

    # Searching

    In all recent versions of Postgres, you need to explicitly specify the fields you'll search.

    Your search should look something like this:

    ```ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])```

    In all recent versions of Postgres, you need to explicitly specify the fields you'll search.

    # Common Indexing challenges, common solutions

    ## We want to be able to search by ID (in full-text queries).
    ## We want to be able to search by ID (in full-text queries)

    When you conduct a search with ElasticSearch, you specify which fields you want to query.
    By default, an integer field can only be searched as an integer, but if you coerce the field to be a string it's searchable with full text search.

    def search_data
      {
        id: id,
        stringified_id: id.to_s
      }
    end

    Your search should look something like this:

    ```ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])```

    It's worth noting that because you can index non-string types (including arrays of non-string types), it sometimes comes in handy to do more of your searching and filtering in Elasticsearch than in Postgres. You can combine a full-text query with filters on specific fields.

    def search_data
      {
        blog_id: blog_id,
        author: user.name,
        author_id: user.id,
        publish_year: publish_at.year,
        publish_month: publish_at.month,
        publish_day: publish_at.day,
        publish_at: publish_at,
        created_at: created_at,
        updated_at: updated_at,
        tags: tag_list,
        story: story,
        title: title,
        approved: approved
      }
    end

    Then you can write a flexible, type-aware search that still does full-text search on some fields, like title and story:

    search_params = { approved: true, publish_at: { lte: 'now/m' } }
    search_params = search_params.merge(blog_id: @blog.id) if @blog.present?
    search_params = search_params.merge(publish_year: @year) if @year.present?
    search_params = search_params.merge(publish_month: @month) if @month.present?
    search_params = search_params.merge(publish_day: @day) if @day.present?
    search_params = search_params.merge(tags: {all: @tags}) if @tags.present?

    if @query.present?
      @posts = Post.search(@query, fields: [:title, :story], where: search_params, page: params[:page])
    else
      @posts = Post.search(fields: [:title, :story], where: search_params, page: params[:page], per_page: 20,
                           order: [{publish_at: :desc}])
    end
    logger.info({ query: @query, params: search_params })
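    The conditional `where` hash above can be exercised outside Rails. This is a minimal plain-Ruby sketch that assumes ActiveSupport is not loaded. `blank_value?` is a hypothetical stand-in for `present?`, and the keyword arguments of the hypothetical `build_post_filters` helper mirror `@blog.id`, `@year`, `@month`, `@day`, and `@tags`:

    ```ruby
    # Hypothetical stand-in for ActiveSupport's blank?/present? (no Rails loaded)
    def blank_value?(value)
      value.nil? || (value.respond_to?(:empty?) && value.empty?)
    end

    # Hypothetical helper: keyword arguments mirror @blog.id, @year, @month, @day, @tags
    def build_post_filters(blog_id: nil, year: nil, month: nil, day: nil, tags: nil)
      # Always-on filters: approved posts published no later than now (rounded to the minute)
      filters = { approved: true, publish_at: { lte: "now/m" } }
      filters[:blog_id] = blog_id unless blank_value?(blog_id)
      filters[:publish_year] = year unless blank_value?(year)
      filters[:publish_month] = month unless blank_value?(month)
      filters[:publish_day] = day unless blank_value?(day)
      # `all:` asks for posts tagged with every listed tag
      filters[:tags] = { all: tags } unless blank_value?(tags)
      filters
    end

    filters = build_post_filters(year: 2018, tags: ["ruby"])
    puts filters.inspect
    ```

    The returned hash would then be passed as `where:` to `Post.search`. Skipping blank values, rather than merging them, keeps unused filters out of the Elasticsearch query.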

    ## We want to eager load associations so that it's not so expensive to update the index.

  4. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion searchkick_and_elasticsearch_guidance.md
    @@ -19,7 +19,7 @@ In practice, you'll need to customize what gets indexed. This is done by definin

    When you change the search_data hash structure, you'll need to reindex that model. You can do that in the Rails console by typing `Model.reindex`, but you can also use the rake task `searchkick:reindex:all`, or reindex just one specific model (`rake searchkick:reindex CLASS=Model`).

    #Searching
    # Searching

    Your search should look something like this:

  5. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions searchkick_and_elasticsearch_guidance.md
    @@ -107,6 +107,8 @@ This will require something like:
    searchkick merge_mappings: true, mappings: {...}

    And may require using an explicit search body.

    However, one way to avoid this complexity is to use the exact matching above, index the field in lowercase, and lowercase query strings that look like email addresses before searching.
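    Here is a minimal plain-Ruby sketch of that lowercase approach. `EMAIL_LIKE`, `normalize_query`, and `user_search_data` are hypothetical names. The regex is a deliberately loose assumption about what "looks like an email address"; it is not a validator:

    ```ruby
    # Loose, hypothetical pattern for "looks like an email address" (not a validator)
    EMAIL_LIKE = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

    # Lowercase the query only when it looks like an email address
    def normalize_query(query)
      stripped = query.strip
      stripped.match?(EMAIL_LIKE) ? stripped.downcase : query
    end

    # Stand-in for the model's search_data: store the email in lowercase
    def user_search_data(name:, email:)
      { name: name, email: email.downcase }
    end
    ```

    In the model, the lowercased value would go into `search_data`. The query would pass through `normalize_query` before a call such as `UserCourse.search(normalize_query(query), fields: [{email: :exact}, :name])`.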

    ## with case insensitive
    By default, searches are case insensitive. To override that for everything, you can change the searchkick call to `searchkick case_sensitive: true`, or use exact matching:
  6. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 5 additions and 5 deletions.
    10 changes: 5 additions & 5 deletions searchkick_and_elasticsearch_guidance.md
    @@ -108,14 +108,14 @@ This will require something like:

    And may require using an explicit search body.

    # with case insensitive
    ## with case insensitive
    By default, searches are case insensitive. To override that for everything, you can change the searchkick call to `searchkick case_sensitive: true`, or use exact matching:

    UserCourse.search(query,
      fields: [{my_field: :exact}, :other_field]
    )
    # Japanese-aware indexing
    ## Japanese-aware indexing
    While there's reasonable support out of the box for Japanese search, you can get additional features with the elasticsearch analysis-kuromoji plugin.

    searchkick language: "japanese"
    @@ -124,14 +124,14 @@ If you go down this route, and want to support multiple analyzers, you need to u

    See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

    # Any notice of combinations above
    ## Any notice of combinations above
    Generally, combinations are supported by choosing the right field to query. Most of the parameters that normally take a symbol can be replaced with a hash from that symbol to various options (https://github.com/ankane/searchkick has better examples than I can provide).

    # Not compatible with each other
    ## Not compatible with each other
    In principle, you can create several fields that have their own analyzers and behaviors. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations. For example, if you're unsure how best to search something, you could index several very dissimilar pseudo-fields with different search rules and include all of them, with different boosts, in your search call.
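    One way to keep several pseudo-fields and their boosts readable is a small helper that builds the `"field^boost"` strings used in the `fields:` option (as in `"name^5"`). `boosted_fields` is a hypothetical helper, not part of searchkick:

    ```ruby
    # Build "field^boost" strings; a boost of 1 is the default, so leave it off
    def boosted_fields(boosts)
      boosts.map { |field, boost| boost == 1 ? field.to_s : "#{field}^#{boost}" }
    end
    ```

    For example, `Post.search(query, fields: boosted_fields(title: 5, story: 1))` would search `title` with a boost of 5 and leave `story` unboosted.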

    # Some parameters update frequently or require a lot of CPU time to reindex
    ## Some parameters update frequently or require a lot of CPU time to reindex
    In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method with that name that returns only the fields that need special treatment.

    def custom_reindexer
  7. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 3 additions and 3 deletions.
    6 changes: 3 additions & 3 deletions searchkick_and_elasticsearch_guidance.md
    @@ -125,11 +125,11 @@ If you go down this route, and want to support multiple analyzers, you need to u
    See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

    # Any notice of combinations above
    Generally combinations

    Generally, combinations are supported by choosing the right field to query. Most of the parameters that normally take a symbol can be replaced with a hash from that symbol to various options (https://github.com/ankane/searchkick has better examples than I can provide).

    # Not compatible with each other
    In principle, you can create several fields that have their own analyzers. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations.
    In principle, you can create several fields that have their own analyzers and behaviors. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations. For example, if you're unsure how best to search something, you could index several very dissimilar pseudo-fields with different search rules and include all of them, with different boosts, in your search call.

    # Some parameters update frequently or require a lot of CPU time to reindex
    In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method with that name that returns only the fields that need special treatment.
  8. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 5 additions and 1 deletion.
    6 changes: 5 additions & 1 deletion searchkick_and_elasticsearch_guidance.md
    @@ -109,8 +109,12 @@ This will require something like:
    And may require using an explicit search body.

    # with case insensitive
    By default searches are case insensitive. To override that, you can alter the Settings
    By default, searches are case insensitive. To override that for everything, you can change the searchkick call to `searchkick case_sensitive: true`, or use exact matching:

    UserCourse.search(query,
      fields: [{my_field: :exact}, :other_field]
    )
    # Japanese-aware indexing
    While there's reasonable support out of the box for Japanese search, you can get additional features with the elasticsearch analysis-kuromoji plugin.

  9. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 10 additions and 1 deletion.
    11 changes: 10 additions & 1 deletion searchkick_and_elasticsearch_guidance.md
    @@ -19,11 +19,20 @@ In practice, you'll need to customize what gets indexed. This is done by definin

    When you change the search_data hash structure, you'll need to reindex that model. You can do that in the Rails console by typing `Model.reindex`, but you can also use the rake task `searchkick:reindex:all`, or reindex just one specific model (`rake searchkick:reindex CLASS=Model`).

    #Searching

    Your search should look something like this:

    ```ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])```

    In all recent versions of Postgres, you need to explicitly specify the fields you'll search.

    # Common Indexing challenges, common solutions

    ## We want to be able to search by ID (in full-text queries).

    When you conduct a search with ElasticSearch, you specify which fields you want to query. (T
    When you conduct a search with ElasticSearch, you specify which fields you want to query.


    ## We want to eager load associations so that it's not so expensive to update the index.

  10. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 24 additions and 9 deletions.
    33 changes: 24 additions & 9 deletions searchkick_and_elasticsearch_guidance.md
    @@ -53,9 +53,13 @@ UserCourse.search(params[:query], {
    fields: ["name^5", "id"],
    misspellings: {below: 5}

    Alternatively, `MyModel.search query, body_options: {min_score: 1}` tunes out a lot of noise.

    ## Match "Reallyenglish, Co., Ltd." organization with "really"
    Depending on the goals, this can be solved a few different ways.

    Make sure the index includes this configuration for the field you want:

    searchkick word_start: [:name]

    Match word start on a specific search:

    @@ -64,11 +68,13 @@ Match word start on a specific search:
    match: :word_start
    )

    Index a specific field to always support word_start searches:
    ## Match "Test Reallyenglish Program" program with "english" (In the middle of name)

    searchkick word_start: [:name]
    Make sure the index includes this configuration for the field you want:

    ## Match "Test Reallyenglish Program" program with "english" (In the middle of name)
    searchkick word_middle: [:name]

    Then for the search:

    UserCourse.search(query,
      fields: ['stringified_id', 'name', 'description', ...],
    @@ -83,8 +89,18 @@ Don't match "新潟大学" organization with "新人" (Disabling ambiguity)
    match: :word_middle
    )

    This is a case-sensitive search, however, and probably not exactly what you want. More likely, you'll want a tokenizer that treats an email address as a single word, which is a little more complicated. The article below covers this, but it requires a custom mapping and a reconfigured analyzer.

    https://medium.com/linagora-engineering/searching-email-address-in-elasticsearch-3b09a11e3c2b

    This will require something like:

    searchkick merge_mappings: true, mappings: {...}

    And may require using an explicit search body.

    # with case insensitive
    By default searches are case insensitive. To override that, you can
    By default searches are case insensitive. To override that, you can alter the Settings

    # Japanese-aware indexing
    While there's reasonable support out of the box for Japanese search, you can get additional features with the elasticsearch analysis-kuromoji plugin.
    @@ -96,12 +112,11 @@ If you go down this route, and want to support multiple analyzers, you need to u
    See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

    # Any notice of combinations above
    Generally combinations

    # Not compatible with each other
    In principle, you can create several indexes and

    Benchmark ILIKE vs partial match with elasticsearch using more than 1 million records of users table
    Others
    # Not compatible with each other
    In principle, you can create several fields that have their own analyzers. When you build up the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may appear better or worse depending on the user's expectations.

    # Some parameters update frequently or require a lot of CPU time to reindex
    In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method with that name that returns only the fields that need special treatment.
  11. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 62 additions and 8 deletions.
    70 changes: 62 additions & 8 deletions searchkick_and_elasticsearch_guidance.md
    @@ -44,18 +44,72 @@ However, this scope is used only for batch import. When an individual entity is
    !deleted
    end
    ```
    ## Avoid short query strings (one- or two-character searches) returning lots of results

    By default, misspelling-tolerant search is turned on in searchkick, so there are two ways to reduce unwanted results: turn off or adjust the misspellings feature, or filter results by relevancy score.

    For example,
    UserCourse.search(params[:query],
      fields: ["name^5", "id"],
      misspellings: {below: 5}
    )
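    This sketch assumes `misspellings: {below: 5}` behaves as the searchkick README describes it: a second, misspelling-tolerant search runs only when the first returns fewer than 5 results. That behavior is why the option helps with short queries that already match a lot. The plain-Ruby model of the fallback below uses hypothetical `strict` and `fuzzy` lambdas in place of the two searches:

    ```ruby
    # Run the strict search; fall back to the misspelling-tolerant one
    # only when the strict search returns fewer than `below` hits.
    def search_with_fallback(query, strict:, fuzzy:, below:)
      hits = strict.call(query)
      hits.size < below ? fuzzy.call(query) : hits
    end

    # Hypothetical data and searches for illustration
    names = ["Reallyenglish", "Really Good", "Realty"]
    strict = ->(q) { names.select { |n| n.downcase.include?(q.downcase) } }
    fuzzy = ->(_q) { names } # pretend every name is "close enough"

    puts search_with_fallback("really", strict: strict, fuzzy: fuzzy, below: 2).inspect
    ```

    With `below: 2`, the two strict hits are enough, so the fuzzy search never runs. With `below: 5`, the fallback kicks in.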


    ## Match "Reallyenglish, Co., Ltd." organization with "really"
    Depending on the goals, this can be solved a few different ways.

    Match word start on a specific search:

    UserCourse.search(query,
      fields: ['stringified_id', 'name', 'description', ...],
      match: :word_start
    )

    Index a specific field to always support word_start searches:

    searchkick word_start: [:name]

    ## Match "Test Reallyenglish Program" program with "english" (In the middle of name)

    UserCourse.search(query,
      fields: ['stringified_id', 'name', 'description', ...],
      match: :word_middle
    )
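    Here is a rough plain-Ruby approximation of the difference between the two match modes. Searchkick does this inside Elasticsearch with its own analyzers, so this sketch only models which names are expected to match. `tokens`, `word_start_match?`, and `word_middle_match?` are hypothetical helpers:

    ```ruby
    # Split a name into lowercase words, dropping punctuation such as "," and "."
    def tokens(text)
      text.downcase.scan(/[[:alnum:]]+/)
    end

    # word_start: some word begins with the query
    def word_start_match?(text, query)
      q = query.downcase
      tokens(text).any? { |word| word.start_with?(q) }
    end

    # word_middle: the query appears anywhere inside some word
    def word_middle_match?(text, query)
      q = query.downcase
      tokens(text).any? { |word| word.include?(q) }
    end
    ```

    Under these rules, "really" matches "Reallyenglish, Co., Ltd." at the start of a word. "english" matches "Test Reallyenglish Program" only under word_middle, which is why that case needs `searchkick word_middle: [:name]`.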

    # Match "Reallyenglish, Co., Ltd." organization with "really"
    This can be solved by prefix
    Match "Test Reallyenglish Program" program with "english" (In the middle of name)
    Don't match "新潟大学" organization with "新人" (Disabling ambiguity)
    Exact match with User Email
    with case insensitive
    If you can finish above in short time, next issues are

    Any notice of combinations above
    Not compatible with each other
    ## Exact match with User Email
    UserCourse.search(query,
      fields: [{email: :exact}, :name],
      match: :word_middle
    )

    # with case insensitive
    By default searches are case insensitive. To override that, you can

    # Japanese-aware indexing
    While there's reasonable support out of the box for Japanese search, you can get additional features with the elasticsearch analysis-kuromoji plugin.

    searchkick language: "japanese"

    If you go down this route, and want to support multiple analyzers, you need to use the searchkick mappings feature and multiple fields. It's not terribly hard, but it's more involved than a quick FAQ can handle.

    See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.

    # Any notice of combinations above

    # Not compatible with each other
    In principle, you can create several indexes and

    Benchmark ILIKE vs partial match with elasticsearch using more than 1 million records of users table
    Others

    # Some parameters update frequently or require a lot of CPU time to reindex
    In conjunction with a scheduled background job, you can call `ModelName.reindex(:custom_reindexer)` and define a method with that name that returns only the fields that need special treatment.

    def custom_reindexer
      {
        just_the_field_that_matters: calculation_method
      }
    end
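    This plain-Ruby stand-in shows how the partial method relates to the full `search_data`. The partial hash contains only the expensive field, so a scheduled `reindex(:custom_reindexer)` can refresh that field without recomputing the others. `CourseRecord`, its attributes, and the pass-rate rule (a score of 60 or more counts as passing) are all hypothetical:

    ```ruby
    # Hypothetical stand-in for a model with full and partial index data
    class CourseRecord
      def initialize(id:, name:, scores:)
        @id = id
        @name = name
        @scores = scores
      end

      # Expensive calculation we only want to refresh on a schedule
      def calculate_pass_rate
        return 0.0 if @scores.empty?
        @scores.count { |s| s >= 60 }.fdiv(@scores.size)
      end

      # Full document, used by a normal reindex
      def search_data
        { id: @id, stringified_id: @id.to_s, name: @name, pass_rate: calculate_pass_rate }
      end

      # Partial document: only the field that needs special treatment
      def custom_reindexer
        { pass_rate: calculate_pass_rate }
      end
    end

    course = CourseRecord.new(id: 3, name: "Intro", scores: [50, 70, 90, 60])
    puts course.custom_reindexer.inspect
    ```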


  12. JasonTrue revised this gist Sep 27, 2018. 1 changed file with 52 additions and 10 deletions.
    62 changes: 52 additions & 10 deletions searchkick_and_elasticsearch_guidance.md
    @@ -5,15 +5,57 @@ https://github.com/ankane/searchkick

    By default, simply adding the `searchkick` call to a model will naively index all of its fields (but not has_many or belongs_to associations).

    In practice, you'll need to customize what gets indexed. This is done by defining a method called `search_data`

    def search_data
    {
    name: name,
    department_name: department.name,
    on_sale: sale_price.present?
    }
    In practice, you'll need to customize what gets indexed. This is done by defining a method on your model called `search_data`

    def search_data
      {
        id: id,
        stringified_id: id.to_s,
        tags: tags.join(" "),
        user: user.full_name,
        pass_rate: calculate_pass_rate
      }
    end

    When you change the search_data hash structure, you'll need to reindex that model. You can do that in the Rails console by typing `Model.reindex`, but you can also use the rake task `searchkick:reindex:all`, or reindex just one specific model (`rake searchkick:reindex CLASS=Model`).

    # Common Indexing challenges, common solutions

    ## We want to be able to search by ID (in full-text queries).

    When you conduct a search with ElasticSearch, you specify which fields you want to query. (T

    ## We want to eager load associations so that it's not so expensive to update the index.

    Define a scope named `search_import`, and call the appropriate `#joins` or `#includes` in it.

    `scope :search_import, -> { includes(study_tracking: :study_tracking_details) }`

    ## We have soft-deleted records and want to exclude them from indexing

    Similar to the above solution, define the `search_import` scope to exclude them:

    `scope :search_import, -> { where(deleted: false) }`

    However, this scope is used only for batch import. When an individual entity is saved, it is updated separately, so you'll also want to implement:

    ```
    def should_index?
      !deleted
    end
    ```
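    Here is a plain-Ruby sketch of the two filters working together. A hypothetical `PostRecord` struct takes the place of the ActiveRecord model. The `reject` stands in for the `search_import` scope used in batch imports, and `should_index?` is the per-record check:

    ```ruby
    # Hypothetical stand-in for a soft-deletable model
    PostRecord = Struct.new(:id, :deleted) do
      # Same predicate as the model method above
      def should_index?
        !deleted
      end
    end

    records = [PostRecord.new(1, false), PostRecord.new(2, true), PostRecord.new(3, false)]

    # Roughly what `where(deleted: false)` selects for a batch import
    batch = records.reject(&:deleted)

    # Records that pass the per-record check
    indexable = records.select(&:should_index?)

    puts batch.map(&:id).inspect
    ```

    Keeping both filters in sync matters because they run on different paths. If they disagree, a record could be skipped by one path but indexed by the other.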

    # Match "Reallyenglish, Co., Ltd." organization with "really"
    This can be solved by prefix
    Match "Test Reallyenglish Program" program with "english" (In the middle of name)
    Don't match "新潟大学" organization with "新人" (Disabling ambiguity)
    Exact match with User Email
    with case insensitive
    If you can finish above in short time, next issues are

    Any notice of combinations above
    Not compatible with each other
    Benchmark ILIKE vs partial match with elasticsearch using more than 1 million records of users table
    Others


    For example,
    When you change the struct
  13. JasonTrue created this gist Sep 27, 2018.
    19 changes: 19 additions & 0 deletions searchkick_and_elasticsearch_guidance.md
    @@ -0,0 +1,19 @@
    # Resources:
    https://github.com/ankane/searchkick

    # Indexing

    By default, simply adding the `searchkick` call to a model will naively index all of its fields (but not has_many or belongs_to associations).

    In practice, you'll need to customize what gets indexed. This is done by defining a method called `search_data`

    def search_data
    {
    name: name,
    department_name: department.name,
    on_sale: sale_price.present?
    }
    end

    For example,
    When you change the struct