The Global Query

Logtopus

When I was oncall in a previous life, I found myself repeatedly needing to run AWS CloudWatch Logs Insights queries across multiple accounts and regions. I banged together a script one afternoon and kept finding new use cases for it. In this post, I've recreated it and brought it to life with Speedrun.

Why did I need this?

  1. Answering a question myself. Aka: a customer reports an issue without following the template and provides partial information like a log snippet (sometimes as a screenshot 😳). Instead of asking and waiting for them to provide their app id and region, I could just search all regions for a snippet of their log to get that information myself.
  2. Determining blast radius. Aka: is this isolated to one region or customer, or is it more widespread? For example, if I noticed an error while rolling out a deployment, I could quickly determine whether it was isolated to the new deployment or was a pre-existing issue. I could also gauge the extent if it was spread across multiple regions.

Show me the script!

Before I do, I'll highlight a few things about this script because it is not exactly normal.

  1. It's short. It does little more than get credentials for a role, find the log group, start a query, and print the results.
  2. It's not meant to be run directly; it requires the caller to loop over the accounts and regions and pass in the appropriately escaped command line arguments.

A sane person appreciates 1, but not 2. No one likes building loops, escaping strings, and passing arguments in shell, especially when there are 7 of them! We'll address that in the next section.

globalQuery.sh
#!/bin/bash

ACCOUNT_LINE=( $1 )
ACCOUNT=${ACCOUNT_LINE[0]}
REGION=${ACCOUNT_LINE[1]:-us-east-1}
ROLE=$2
LOG_GROUP_PATTERN=$3
START_TIME=$4
END_TIME=$5
QUERY=$6
RUN_ID=$7

# Get credentials for the specified role and account
credentials=$(curl -s -S -b ~/.speedrun/cookie -L -X POST -H "Content-Type: application/json; charset=UTF-8" -d '{"role": "'$ROLE'", "duration":3600}' https://speedrun-api.us-west-2.nobackspacecrew.com/v1/credentials/${ACCOUNT})
if [[ $credentials != *"AccessKeyId"* ]]; then
  echo -e "\033[31mUnable to get credentials: ${credentials}\033[m"
  exit 1;
fi
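# Parse the credential fields out of the JSON response and export them,
# along with the region, for the aws CLI calls below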
export $(printf "AWS_DEFAULT_REGION=$REGION AWS_REGION=$REGION AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s AWS_CREDENTIAL_EXPIRATION=%s" $(echo $credentials | cut -d\" -f4,8,12,16 | tr '"' '\n'))

# Get account alias for account if there is one
ACCOUNT_ALIAS=$(aws iam list-account-aliases --query "AccountAliases[0]" --output text)
if [[ $ACCOUNT_ALIAS == 'None' ]]; then
  ACCOUNT_ALIAS=$ACCOUNT
fi

# Find the first log group that matches the search pattern
LOG_GROUP_NAME=$(aws logs describe-log-groups --log-group-name-pattern $LOG_GROUP_PATTERN --query "logGroups[0].logGroupName" --output json | sed s/\"//g)
if [[ $LOG_GROUP_NAME == "null" ]]; then
    echo -e "\033[31mNo matching logGroup for pattern: '$LOG_GROUP_PATTERN'\033[m";
    exit 2;
fi

# start query
QUERY_ID=$(aws logs start-query --log-group-name $LOG_GROUP_NAME --start-time $START_TIME --end-time $END_TIME --query-string "$QUERY" --query "queryId" --output text)

# print details about query
printf "ACCOUNT: $ACCOUNT\n"\
"ACCOUNT_ALIAS: $ACCOUNT_ALIAS\n"\
"REGION: $REGION\n"\
"QUERY: $QUERY\n"\
"START: %s\n"\
"END: %s\n"\
"QUERY_ID: $QUERY_ID\n" "$(date -u -r $START_TIME)" "$(date -u -r $END_TIME)" | tee -a $ACCOUNT_ALIAS-$REGION-$RUN_ID.txt;

# poll for query completion
while [[ $(aws logs get-query-results --query-id=$QUERY_ID --query "status" --output text) =~ ^(Scheduled|Running)$ ]]
do
    sleep 2;
done
status=$(aws logs get-query-results --query-id=$QUERY_ID --query "status" --output text)
if [[ $status != 'Complete' ]]; then
    echo -e "\033[31mInvalid query status: '$status'\033[m";
    exit 3;
fi

# on completion, print header and results
aws logs get-query-results --query-id $QUERY_ID \
  --query "[@][?status=='Complete'].results[*][?field!='@ptr']. [field,value]|[][*][0] | [0]" \
  --output text --no-paginate | tee -a $ACCOUNT_ALIAS-$REGION-$RUN_ID.txt
aws logs get-query-results --query-id $QUERY_ID \
  --query "[@][?status=='Complete'].results[*][?field!='@ptr'].[field,value]|[][*][1]" \
  --output text --no-paginate | tee -a $ACCOUNT_ALIAS-$REGION-$RUN_ID.txt
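
To appreciate what the wrapper in the next section saves you, here's roughly what a single by-hand invocation looks like. This is only a sketch: the account, role name, log group pattern, and query are placeholders, and you'd still need your own loop over accounts and regions.

```
# Illustrative only: run globalQuery.sh by hand for one account/region.
# Times are epoch seconds; this covers the hour ending now.
END=$(date +%s)
START=$((END - 3600))
./globalQuery.sh "012345678901 us-west-2" "speedrun-SomeRole" "MyService" \
  "$START" "$END" \
  'filter @message like /ERROR/ | stats count(*) by bin(5m)' \
  "run$RANDOM"
```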

Making the script sane to run

Assuming you've copied the script above into your home directory as globalQuery.sh and made it executable with chmod +x globalQuery.sh, the next step is to create a list of accounts and regions to run it on. To do this, create a file called accounts.txt in your home directory with an account and an optional region per line. If the region is omitted, it defaults to us-east-1.

accounts.txt
012345678901 us-west-2
112345678901

Now we need to loop over the accounts and wrap the script with a UI. Paste the following Speedrun block into GitHub markdown in a repository you've enabled Speedrun on to build the command you need to run. I've also put it here.

```
#copy
while read account
  do
      ./globalQuery.sh "$account" "${role}" "~~~Log Group Pattern~~~" ${endTime-~~~Lookback {type:'select', options:{'1 minute': 60,'1 hour':3600, '1 day': 86400, '3 days':259200, '1 week': 604800, '1 month':2678400}}~~~} "~~~endTime=EndTime {transform:"dayjs(value).valueOf()/1000", default:'${dayjs().format("YYYY-MM-DD HH:mm")}'}~~~" $'~~~globalQuery=Query {type:'textarea', transform:'bashEscape(value)'}~~~' ${Math.random().toString(36).substr(2)} &
  done < ~/accounts.txt
```

Some notes about this block:

  1. It is assumed that you have followed the instructions to create Speedrun roles in your accounts and taken the necessary steps to authenticate your command line for use with Speedrun.
  2. ${role} is replaced by the current role in your dropdown. If you want to use a different role, you can hardcode it, or set it for this block only by modifying the #copy line to #copy {role:'speedrun-DifferentRole'} where speedrun-DifferentRole is the role you want to use.
  3. It will prompt you for Log Group Pattern, which uses the LogGroupNamePattern syntax.
  4. It will prompt you for the End Time of the query; it uses any-date-parser to parse the date, and these are the allowed formats.
  5. The lookback is relative to your End Time.
  6. The query is a multi-line text area; you'll likely want to test it in the CloudWatch Logs Insights console and paste it in once it's working. The most applicable queries for summarizing are aggregation queries (an illustrative example follows this list).
  7. The last argument is a random string that is used to name the output file. This gives each run a unique id so you can run the script multiple times in the same directory and keep the results separate.
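
For illustration, here's the shape of aggregation query that summarizes well across many accounts and regions. It's only an example; adjust the filter and fields to match your own log format.

```
# Illustrative aggregation query: count ERROR lines in 5-minute buckets.
filter @message like /ERROR/
| stats count(*) as errorCount by bin(5m)
| sort errorCount desc
```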

Here's an example of what it looks like in action:

Global Query In Action
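
Because the wrapper backgrounds each account/region job, terminal output from different jobs can interleave; the per-file copies written by tee are the clean record. Each file is named with the account alias, region, and run id, so once the jobs finish you can view one run's combined results with something like this (where <runId> is the random suffix shared by that run's files):

```
# Hypothetical: concatenate one run's output files; <runId> is a placeholder.
cat *-<runId>.txt | less
```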

Conclusion

One of the many use cases for Speedrun is to spend the minimum amount of time writing a scrappy script and then use Speedrun to make it safe and sane to run. It's often easier to do something in the browser than on the command line, and Speedrun allows you to mix and match the best of both worlds without overhead. I hope this post gets you thinking about how you can leverage Speedrun in your own work. You'll find links below to the Discord and Twitter if you want to join the community. Now go build, but faster!