@malli1983
Last active February 3, 2020 08:10

Root Cause Analysis for AES EDI | JIRA Issue (AESEDI-53447)

Date: 2020-02-03
Authors: Mallikarjunarao, Operations Team
Status: Complete, resolved

Summary

Customer data was reported as not sent from AES EDI. The investigation showed that the file containing the data had been sent, but it was not processed because of an issue with the AES CIS service.

Impact

About 486,000 records were affected, as was the EDI-to-CIS monitoring service.

Root Causes

Files with a large number of records were sent while the patching script was running, which disrupted data processing. The resulting issue with the AES CIS service escalated until data stopped being processed, leaving records missing.
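
The overlap between patching and data processing could be ruled out mechanically. The following is a minimal sketch, not the team's actual tooling: it assumes both the patching script and the data-processing job can be wrapped in Python and can see the same lock-file path. The names `exclusive_operation` and `PipelineBusy`, and the lock path itself, are hypothetical.

```python
import os
from contextlib import contextmanager


class PipelineBusy(RuntimeError):
    """Raised when another operation already holds the lock."""


@contextmanager
def exclusive_operation(lock_path, name):
    """Run the wrapped block only if no other operation holds lock_path."""
    # O_CREAT | O_EXCL fails atomically if the lock file already exists,
    # so two operations cannot both acquire the lock.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise PipelineBusy(
            f"{name} refused: {lock_path} is held by another operation"
        ) from None
    # Record who holds the lock so an operator can inspect it.
    with os.fdopen(fd, "w") as handle:
        handle.write(name)
    try:
        yield
    finally:
        # Release the lock even if the wrapped operation fails.
        os.remove(lock_path)
```

With this in place, a patching run started while `with exclusive_operation(path, "data-processing"):` is active raises `PipelineBusy` instead of proceeding. A lock file left behind by a crashed process would need manual cleanup, which this sketch does not handle.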

Resolution

Reloading the AES CIS monitoring service allowed us to spot the missed records that had not been discovered automatically. The data file was then resent, which resolved the issue.

Detection

The customer created a Jira ticket to alert us to this failure. Please refer to JIRA Issue (AESEDI-53447).

Action Items

Action Item | Type | Owner | Status
Write a monitoring policy to detect missing records | prevent | Mallikarjunarao | DONE
Monitor the data ingesters and processors (ETL) | prevent | Mallikarjunarao | TODO (Jira Issue No: AESCIS-38263)

  1. Add more monitoring plugins and modules to watch this critical part of our infrastructure.
  2. Add Slack notifications that alert the team whenever a data discrepancy is detected, so that similar incidents are prevented in future.
  3. Do not run patching operations while data processing is in progress at AES EDI.
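
The missing-record detection and alerting items above can be sketched as a reconciliation check. This is an illustration under stated assumptions, not the monitoring policy that was actually written: it assumes each record has a unique ID and that the IDs sent by AES EDI and the IDs processed by AES CIS can both be listed. The function names and the batch name are hypothetical. The alert is built as a JSON body with a `text` field, which is the shape Slack incoming webhooks accept; posting it to a webhook is left out.

```python
import json


def find_missing_records(sent_ids, processed_ids):
    """Return the sorted IDs that were sent but never processed."""
    return sorted(set(sent_ids) - set(processed_ids))


def build_alert(batch_name, sent_ids, processed_ids):
    """Return a Slack-style JSON payload, or None when nothing is missing."""
    missing = find_missing_records(sent_ids, processed_ids)
    if not missing:
        return None
    text = (
        f"{batch_name}: {len(missing)} of {len(set(sent_ids))} records "
        f"were sent but not processed (first missing: {missing[0]})"
    )
    return json.dumps({"text": text})
```

Running such a check on a schedule after each file transfer would surface a discrepancy without waiting for a customer ticket.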

Timeline

2019-06-07 (all times UTC)

Time | Description
11:56 | Missing files discovered
12:00 | AES CIS monitoring module restarted
12:15 | Data processing of the record files started
13:00 | Data processing of all 486,000 records completed
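
The timeline entries give rough recovery figures. The short calculation below is a sketch that assumes all 486,000 records were reprocessed between 12:15 and 13:00. The helper name `minutes_between` is hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"


def minutes_between(start, end):
    """Return the minutes elapsed between two same-day HH:MM timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 60


# Discovery of the missing files to completion of reprocessing.
time_to_recover = minutes_between("11:56", "13:00")
# Duration of the reprocessing run itself.
reprocessing = minutes_between("12:15", "13:00")
# Average throughput, assuming all records were handled in that window.
records_per_minute = 486_000 / reprocessing

print(time_to_recover, reprocessing, records_per_minute)
# prints: 64.0 45.0 10800.0
```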
