
@dbu
Last active November 5, 2018 17:02

Created and revised Sep 17, 2018.

_README.md
    We had a problem where invalid data got stored in Elasticsearch: an array of objects contained some objects that were missing a mandatory field. After fixing the mistake, we wanted to update all offending entries, so we needed the IDs of the affected items.

    The "obvious" query would be `_exists_:general_information AND !(_exists_:general_information.value)`. However, with the default (non-`nested`) object mapping, Elasticsearch flattens an array of objects, so `general_information.value` counts as existing as soon as any element of the array has a value. A document that contains at least one valid entry alongside an invalid one is therefore not matched.
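To see why the `_exists_` check is fooled, here is a small Python model of that flattening, applied to the document from data.json. It is a simplified sketch of the idea, not Elasticsearch's actual indexing code, and `flatten_object_array` is a hypothetical helper name.

```python
from typing import Any


def flatten_object_array(field: str, items: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Simplified model of a default (non-nested) object array: every
    sub-field becomes one multi-valued field, and the grouping of
    key/value pairs per array element is lost."""
    flat: dict[str, list[Any]] = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


doc = {
    "id": "ABCD",
    "general_information": [
        {"key": "foo", "value": "bar"},
        {"key": "invalid"},  # missing the mandatory "value"
    ],
}

flat = flatten_object_array("general_information", doc["general_information"])
# The flattened view only records that *some* element has a value, so an
# exists check on general_information.value succeeds for this document.
value_exists = bool(flat.get("general_information.value"))

print(flat)  # {'general_information.key': ['foo', 'invalid'], 'general_information.value': ['bar']}
print("general_information.value exists:", value_exists)
```

The information that the second element lacks a value is gone after flattening, which is why the check has to look at the original array elements instead.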

    The solution we found was a Painless script query that loops over the `general_information` elements in the document source and returns 1 as soon as it finds one without a `value`. To our pleasant surprise, running this on an index with over 1M entries took only a couple of seconds. It is not something for a routine query, but the time is acceptable for a one-off query to fix a problem.
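Before running the script against the cluster, its per-document logic can be checked locally. The following is a minimal Python port of that logic, a sketch only: `has_entry_without_value` is a hypothetical name, and the `EFGH` and `IJKL` documents are invented to exercise the other branches.

```python
from typing import Any


def has_entry_without_value(source: dict[str, Any]) -> int:
    """Port of the script logic: 0 if the field is absent, 1 as soon as
    one array element lacks a "value" key, otherwise 0."""
    if "general_information" not in source:
        return 0
    for info in source["general_information"]:
        if "value" not in info:
            return 1
    return 0


docs = [
    # The document from data.json: one valid and one invalid entry.
    {"id": "ABCD", "general_information": [{"key": "foo", "value": "bar"}, {"key": "invalid"}]},
    # Hypothetical document whose entries are all valid.
    {"id": "EFGH", "general_information": [{"key": "foo", "value": "bar"}]},
    # Hypothetical document without the field at all.
    {"id": "IJKL"},
]

offending_ids = [d["id"] for d in docs if has_entry_without_value(d)]
print(offending_ids)  # ['ABCD']
```

Unlike the flattened `_exists_` check, this inspects each array element on its own, so the mixed document `ABCD` is reported.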

data.json
    {
        "id": "ABCD",
        "general_information": [
            {
                "key": "foo",
                "value": "bar"
            },
            {
                "key": "invalid"
            }
        ]
    }

query.json
    {
        "_source": ["id"],
        "query": {
            "bool": {
                "must": {
                    "script": {
                        "script": {
                            "source": "if (!params._source.containsKey('general_information')) { return 0; } for (def info : params._source.general_information) { if (!info.containsKey('value')) { return 1; } } return 0;",
                            "lang": "painless"
                        }
                    }
                }
            }
        }
    }
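To turn the search result into the list of IDs to fix, the hits have to be read out of the response. Below is a sketch that assumes the standard search response shape (`hits.hits`, each hit carrying `_id` and, if returned, `_source`). `collect_ids` is a hypothetical helper, and the sample response is invented for illustration. Note that a search returns only the first 10 hits by default, so with many matches you would need a larger `size` or the scroll API and would call this once per page.

```python
from typing import Any


def collect_ids(response: dict[str, Any]) -> list[str]:
    """Collect IDs from one page of a search response, preferring the
    "id" field from _source and falling back to the _id metadata field
    when _source was not returned."""
    ids = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        ids.append(source.get("id", hit["_id"]))
    return ids


# Hypothetical response page for the query above.
response = {
    "hits": {
        "hits": [
            {"_id": "ABCD", "_source": {"id": "ABCD"}},
        ]
    }
}

print(collect_ids(response))  # ['ABCD']
```

Falling back to `_id` keeps the helper usable whether the request filters `_source` down to `id` or asks only for stored fields.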