Finding the Root Cause of a Failed Job
Step 1: Log in to the AWS Console
Find the AWS Account where the action failed. Most actions are performed directly in an Environment, which is linked to a single AWS Account. The user can find the account ID on the environment status page.
Other actions are performed in the Management AWS Account, which uses CloudFormation StackSets to deploy stacks into the member accounts of the AWS Organization.
Step 2: Go to CloudFormation
Go to CloudFormation and select the Region that the Citadel Environment uses. The user can find the region on the environment status page.
When troubleshooting the Management Account, always set the region to N. Virginia (us-east-1).
Step 3: Find the failed stack
Citadel automatically deletes failed stacks, so they do not appear in the default stack list. To find the root cause of the failure, the user first needs to change the stack filter to “Deleted”.
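For users who prefer scripting over the console, the same lookup can be sketched with boto3. This is a minimal sketch, not a Citadel tool: the function names and the sample data are hypothetical, and it assumes boto3 is installed and credentials for the right AWS Account are configured. `list_stacks` with `StackStatusFilter=["DELETE_COMPLETE"]` is the CloudFormation API call that corresponds to the “Deleted” filter.

```python
from datetime import datetime, timezone


def newest_deleted_stacks(summaries, name_contains=""):
    """Return deleted stacks, most recently deleted first.

    `summaries` is a list of dicts shaped like the `StackSummaries`
    entries returned by the CloudFormation ListStacks API.
    """
    deleted = [
        s for s in summaries
        if s["StackStatus"] == "DELETE_COMPLETE"
        and name_contains in s["StackName"]
    ]
    return sorted(deleted, key=lambda s: s["DeletionTime"], reverse=True)


def fetch_deleted_stacks(region):
    """Fetch deleted stack summaries from AWS (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("cloudformation", region_name=region)
    paginator = client.get_paginator("list_stacks")
    summaries = []
    for page in paginator.paginate(StackStatusFilter=["DELETE_COMPLETE"]):
        summaries.extend(page["StackSummaries"])
    return summaries


# Made-up sample data standing in for a real ListStacks response.
SAMPLE_SUMMARIES = [
    {"StackName": "example-baseline", "StackStatus": "DELETE_COMPLETE",
     "DeletionTime": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)},
    {"StackName": "example-network", "StackStatus": "DELETE_COMPLETE",
     "DeletionTime": datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc)},
    {"StackName": "example-active", "StackStatus": "CREATE_COMPLETE"},
]

if __name__ == "__main__":
    for stack in newest_deleted_stacks(SAMPLE_SUMMARIES):
        print(stack["DeletionTime"].isoformat(), stack["StackName"])
```

Against a real account, `newest_deleted_stacks(fetch_deleted_stacks("us-east-1"))` would replace the sample data, with the region taken from the environment status page. The `StackId` of each summary is worth keeping, because CloudFormation requires the stack ID rather than the name when querying a deleted stack.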
Once the user sees the stack, click on it and select the “Events” tab.
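Find the first (chronologically) failed event in the list. The Events tab shows the newest events first, so this is usually the failed event closest to the bottom of the list.
The status reason of that event gives the user the reason for the stack failure.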
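The "first failed event" rule can also be expressed in a few lines of Python. The sketch below is illustrative only: `first_failed_event`, `fetch_events` and the sample events are hypothetical names and data, while `describe_stack_events` is the boto3 call that returns the same events the console shows. For a deleted stack, the stack ID (not the stack name) has to be passed as `StackName`.

```python
from datetime import datetime, timezone


def first_failed_event(events):
    """Return the chronologically first *_FAILED event, or None if there is none.

    `events` is a list of dicts shaped like the `StackEvents` entries
    returned by the CloudFormation DescribeStackEvents API.
    """
    failed = [e for e in events if e["ResourceStatus"].endswith("_FAILED")]
    return min(failed, key=lambda e: e["Timestamp"], default=None)


def fetch_events(stack_id, region):
    """Fetch all events of one stack (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("cloudformation", region_name=region)
    paginator = client.get_paginator("describe_stack_events")
    events = []
    for page in paginator.paginate(StackName=stack_id):
        events.extend(page["StackEvents"])
    return events


# Made-up sample events, newest first, as the API would return them.
SAMPLE_EVENTS = [
    {"LogicalResourceId": "ExampleStack", "ResourceStatus": "DELETE_COMPLETE",
     "Timestamp": datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)},
    {"LogicalResourceId": "ExampleRole", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Resource creation cancelled",
     "Timestamp": datetime(2024, 1, 1, 10, 2, tzinfo=timezone.utc)},
    {"LogicalResourceId": "ExampleBucket", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "example-bucket already exists",
     "Timestamp": datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc)},
    {"LogicalResourceId": "ExampleBucket", "ResourceStatus": "CREATE_IN_PROGRESS",
     "Timestamp": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)},
]

if __name__ == "__main__":
    event = first_failed_event(SAMPLE_EVENTS)
    print(event["LogicalResourceId"], "-", event["ResourceStatusReason"])
```

Taking the minimum by `Timestamp` instead of relying on list order keeps the helper correct even if the events were collected from several pages or re-sorted along the way.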
One common reason, “Resource creation canceled”, usually appears when multiple stacks are being deployed at once and one of them fails, which triggers a cancellation of all the other stacks in that set. A canceled stack is therefore not the root cause itself. In this case, the best approach is to look for other deleted stacks in the Management, Log Archive and Audit AWS Accounts.
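When checking several stacks, it can help to separate cancellations from genuine failures. The following is a hedged sketch under the assumption that events have already been collected as dicts in the shape of the CloudFormation `StackEvents` entries; `root_cause_events` and the sample data are hypothetical. It matches on the substring "cancel" so that both spellings of the message are covered.

```python
from datetime import datetime, timezone


def root_cause_events(events):
    """Return failed events that are not mere cancellations, oldest first."""
    causes = [
        e for e in events
        if e["ResourceStatus"].endswith("_FAILED")
        and "cancel" not in e.get("ResourceStatusReason", "").lower()
    ]
    return sorted(causes, key=lambda e: e["Timestamp"])


# Made-up events from a stack that was only canceled by a sibling failure.
CANCELED_ONLY = [
    {"LogicalResourceId": "ExampleTrail", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Resource creation cancelled",
     "Timestamp": datetime(2024, 1, 1, 10, 2, tzinfo=timezone.utc)},
]

# Made-up events from the stack that actually failed.
WITH_ROOT_CAUSE = CANCELED_ONLY + [
    {"LogicalResourceId": "ExampleBucket", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "example-bucket already exists",
     "Timestamp": datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc)},
]

if __name__ == "__main__":
    print(len(root_cause_events(CANCELED_ONLY)))
    for event in root_cause_events(WITH_ROOT_CAUSE):
        print(event["LogicalResourceId"], "-", event["ResourceStatusReason"])
```

An empty result for a stack suggests it was canceled because of a failure elsewhere, which is the signal to repeat the search in the Management, Log Archive and Audit AWS Accounts.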
Step 4: Collect the data and contact support
Once the root failed event is found, please send all the information collected from the AWS Console, together with a description of which Citadel action triggered the issue, to our support team at [email protected].