What’s the worst place to leave your secrets? – Research into what happens to AWS credentials that are left in public places

TL;DR

I deployed canary tokens in various public places on the Internet, logged all access attempts, and discovered intriguing patterns in how threat actors find exposed credentials and how they attack with them.

Canary Tokens Primer

Canary tokens are a type of digital tripwire designed to detect unauthorized access or activity within a system. They work by embedding seemingly valuable but false information, such as login credentials, API keys, or other sensitive data, into various parts of a network, code base, or application. When an attacker attempts to use these canary tokens, an alert is triggered, notifying the owner of the token of a potential breach.
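The tripwire idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how any particular canary service is implemented: the `AccessRecord` type, the `find_tripwire_hits` function and the sample records are all hypothetical, and the key ID is the placeholder that AWS uses in its own documentation.

```python
from dataclasses import dataclass

# Placeholder key ID taken from AWS's public documentation examples; a real
# canary would use a key generated by the token service.
CANARY_KEY_ID = "AKIAIOSFODNN7EXAMPLE"


@dataclass
class AccessRecord:
    """One credential use, as a (hypothetical) access log might record it."""
    key_id: str
    source_ip: str
    action: str


def find_tripwire_hits(records, canary_key_id=CANARY_KEY_ID):
    """Return every record that used the canary key. Nobody has a legitimate
    reason to use it, so each hit is worth an alert."""
    return [r for r in records if r.key_id == canary_key_id]


# Made-up sample data: one ordinary key and one use of the canary.
records = [
    AccessRecord("AKIAXXXXXXXXXXXXXXXX", "198.51.100.10", "ListBuckets"),
    AccessRecord(CANARY_KEY_ID, "203.0.113.7", "GetCallerIdentity"),
]

hits = find_tripwire_hits(records)
for hit in hits:
    print(f"ALERT: canary key used from {hit.source_ip} ({hit.action})")
```

The point of the design is that the canary key has no legitimate users, so there is no threshold to tune: a single use is already a signal.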

Research Method

In this research, I used AWS API credentials as canary tokens, strategically placing them in publicly accessible locations on the internet. To generate them, I used “Canarytokens” by Thinkst, a great free service that provides multiple canary token types, such as AWS credentials, DNS tokens, executable tokens and more.

Using this service, you define which canary token you want and where you want to be alerted, and the service generates the token for you. If someone trips the alarm, you are notified via email.

Creating a new token on CanaryTokens.org

I chose to place my canary tokens in the following locations:

  • Public Code/Docker Repositories: GitHub, GitLab, BitBucket, DockerHub
  • Self-Managed Public Services: FTP server, web server, blog
  • SaaS Services: Pastebin, JSFiddle
  • Package Managers: NPMJS, PyPi
  • Cloud Storage Buckets: AWS S3, GCP Google Cloud Storage

By tracking who attempted to use my AWS canary tokens, I was able to gather valuable data on unauthorized access attempts, including information about the IP addresses, user agents, timestamps, and methods used by the intruders. This approach provided insights into the behavior of threat actors and highlighted the effectiveness of canary tokens as an early warning system for detecting security threats.

I chose AWS credentials as the canary token for this research because I wanted data on actual illegal activity. Other token types, like DNS tokens, executable tokens, or image tokens, trigger if someone touches or executes these resources, which isn’t necessarily illegal. Attempting to gain access to an AWS account using credentials you found on the Internet, on the other hand, is (probably) illegal, since it’s an attempt to gain access to a restricted resource.

Using the Canarytokens service, I generated several AWS tokens, a single token for each public resource. Every time someone attempted to use a token, the service sent me an email containing information about the attempt.

Email message indicating that a canary has been triggered

Motivation For Research

My motivation for conducting this research stems from a combination of personal curiosity and a desire to enhance the cyber security community’s awareness of an often underutilized resource: canary tokens. I find the concept of canary tokens both intriguing and cool. The idea of using a seemingly innocent piece of data as a digital tripwire to detect unauthorized access fascinates me, and I wanted to explore its practical application in real-world scenarios. I was also intrigued by the question of how fast and how often threat actors scan public resources and breach targets using exposed credentials.

Moreover, I believe that canary tokens are an undervalued resource among security engineers. Despite their simplicity and effectiveness, they are not as widely adopted as other security measures. By showcasing the insights and findings from my study, I hope to bring greater awareness to the usefulness of canary tokens and encourage more security professionals to incorporate them into their defensive strategies.


Results

Code/Image Repositories

Code repositories are the most common place for people to leave their credentials lying around, and GitHub is obviously the most popular service in this category. For the three code repository services (GitHub, GitLab and Bitbucket), I cloned the Prowler project (an open source cloud audit tool), added a config file containing the canary token, and pushed the modified code into new repositories (the repos were configured to be publicly accessible).

The GitHub canary token

For DockerHub, I created a public Docker image with a NodeJS web application that has hardcoded credentials in its source code. Anyone who pulls the Docker image can easily see the tokens in the source code. I gave the image a juicy name to make it extra enticing for malicious actors.

The following graph shows the number of access attempts made to GitHub and DockerHub per hour since the canary tokens were published. BitBucket and GitLab are not shown here because, surprisingly, there were 0 attempts to access the canary tokens published on those platforms.

Access Attempts Per Hour on GitHub and DockerHub

By contrast, attackers tried to use the canary tokens published on GitHub within seconds of their release.

For DockerHub, it took 170 hours (~7 days) until the first access attempt, after which there was an access attempt every few days.
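Both measurements used here (attempts per hour since publication, and the delay before the first attempt) are simple to derive from alert timestamps. The following is a minimal sketch, assuming each alert carries a timezone-aware timestamp; the function names and the sample timestamps are hypothetical and are not the data collected in this research.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def attempts_per_hour(published_at, attempt_times):
    """Count attempts in each whole hour elapsed since publication."""
    buckets = Counter()
    for t in attempt_times:
        hours_elapsed = int((t - published_at).total_seconds() // 3600)
        buckets[hours_elapsed] += 1
    return dict(sorted(buckets.items()))


def time_to_first_attempt(published_at, attempt_times):
    """Return the delay before the earliest attempt, or None if there were none."""
    if not attempt_times:
        return None
    return min(attempt_times) - published_at


# Made-up timestamps for illustration only.
published = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
attempts = [
    published + timedelta(minutes=2),
    published + timedelta(minutes=50),
    published + timedelta(hours=170, minutes=5),
]

print(attempts_per_hour(published, attempts))
print(time_to_first_attempt(published, attempts))
```

Bucketing by elapsed hours rather than by wall-clock time makes tokens published at different moments directly comparable on one chart.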

The following pie chart shows the distribution of IP addresses that attempted to use the GitHub canary tokens within the first 500 hours.

IPs attempting to access the GitHub token

Self-managed Public Services

For this category, I spun up an EC2 instance on AWS, installed a few services and opened it to the Internet:

  • FTP server with anonymous access – I installed an open-source FTP server, configured it to allow anonymous access and placed a file with a canary token in it.
  • Web Server – First I set up a web server on port 80, added a robots.txt file and placed the token on the path /aws.config. The robots.txt file was supposed to lead automatic scrapers to the aws.config file.
The canary token on the path /aws.config

Unfortunately, there were no takers: no access attempts on either the FTP server or the web server. So I decided to make it easier for the attackers and moved the canary token to the root of the web application. After a day, I started to get some interesting results.

  • On this Blog – I created a fake blog post on my website, pretending to be a guide on how to connect to AWS using a CLI. The blog post had examples of how to connect to AWS that were in fact the canary token itself.

Here again, much to my surprise, some of the services didn’t get any access attempts. In fact, only when the token on /aws.config was moved to the root of the web server did results start to come in.

It took nearly 50 hours for the scrapers to get to my website and start using the token.

The following diagram shows the number of access attempts since the release of the canary token, grouped into buckets of hours. I’m comparing the canary token in the web server’s root directory to the Pastebin canary (explained in the next section), since they had a similar number of access attempts.

Access attempts per hour on Pastebin and on a website

SaaS Services

1) Pastebin – Pastebin is an online service that allows users to store and share text documents, such as code snippets, configuration files, and logs. Users can create a “paste” by submitting text, which is then stored on the Pastebin server and assigned a unique URL that can be shared with others.

For Pastebin, I tested two tokens: one on a password-protected paste with an easy-to-crack password (123456), and a second on a paste without a password.

Canary token on Pastebin with a password

2) JSFiddle – JSFiddle is an online tool and collaborative web development environment that allows users to write, test, and share HTML, CSS, and JavaScript code snippets. I created a new fiddle with a hardcoded canary token in it, written to mimic a service that lists S3 buckets.

JSFiddle token

The results show that Pastebin is a really bad place to put anything that is sensitive, without at least password protecting it first, as the canary token was immediately picked up and used. It also seems like pastes with a password aren’t being cracked as I got 0 hits on that token.

JSFiddle doesn’t seem to be as bad a place, as I got 0 hits from it as well. I assume the reason is that, since it’s used for client-side code, few developers submit hardcoded secrets there, and as a result it isn’t monitored by hackers.

I then wondered what would happen if I published my fiddle link on a passwordless paste in Pastebin, but even then there were no takers.

Package Managers

Package managers are tools that automate the process of installing, updating, configuring, and managing software packages. These packages can be libraries, frameworks, or applications and most importantly, many packages are public and accessible by anyone.

Finding secrets in public packages stored on package managers is a very realistic scenario, as developers occasionally publish packages with passwords or keys in them by accident, or publish a package as public instead of private. As the following data shows, making this mistake will likely mean that your secrets are stolen within minutes.

For this section of the research, I chose two popular package managers: PyPI and NPMJS. For each of them, I created an application with a hardcoded canary token in it and pushed the package to the respective package manager.

The Python package on Pypi containing the canary token
The NodeJS package on npmjs containing the canary token

The following diagram shows the number of access attempts per hour on NPMJS and PyPI after releasing the packages to the public.

npmjs and pypi access attempts per hour after release

Caveat – I think there are some legitimate services online that routinely download and run any newly published package, so some of the results in this section may not indicate an illegitimate access attempt, but rather a service that auto-executes code. In retrospect, I probably shouldn’t have built these packages to call AWS automatically when executed, since that triggers an alert every time the package is run. On the other hand, I have some data showing that the same IP tried various AWS API calls using my PyPI and NPMJS canary tokens. The attacker tried to list vaults and secrets (which isn’t what the code in the public package automatically tries).

An access attempt using the pypi token that attempted to list all vaults
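One way to separate sandbox-style auto-execution from hands-on use is to compare the API calls each IP made against the single call the package makes by itself. The following is a minimal sketch under stated assumptions: the article does not say which call the packages made, so `GetCallerIdentity` is a stand-in, and the function name and sample events are hypothetical. `ListVaults` and `ListSecrets` are the AWS API action names for listing Glacier vaults and Secrets Manager secrets.

```python
from collections import defaultdict

# Assumption: the single call the published package makes on its own.
PACKAGE_SELF_CALL = "GetCallerIdentity"


def split_by_behavior(events, self_call=PACKAGE_SELF_CALL):
    """Group event names by source IP, then separate IPs that only ever made
    the package's own call from IPs that tried anything else."""
    actions_by_ip = defaultdict(set)
    for ip, event_name in events:
        actions_by_ip[ip].add(event_name)
    auto_only = sorted(ip for ip, acts in actions_by_ip.items() if acts == {self_call})
    hands_on = sorted(ip for ip, acts in actions_by_ip.items() if acts != {self_call})
    return auto_only, hands_on


# Made-up events as (source IP, API event name) pairs.
events = [
    ("198.51.100.10", "GetCallerIdentity"),
    ("203.0.113.7", "GetCallerIdentity"),
    ("203.0.113.7", "ListVaults"),
    ("203.0.113.7", "ListSecrets"),
]

auto_only, hands_on = split_by_behavior(events)
print("only ran the package:", auto_only)
print("tried other API calls:", hands_on)
```

An IP in the first group may still be malicious; the split only identifies which alerts could be explained by the package running unattended.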

Buckets

Buckets in AWS (Amazon Web Services) and GCP (Google Cloud Platform) are storage containers used to store and manage data objects, such as files, images, and backups. Leaky buckets sometimes expose credentials as people use buckets for storing backups and configuration files, without realising that the bucket is configured to be public.

For this section, I used AWS’s S3 service and GCP’s Google Cloud Storage and stored a different canary token on each one.

Canary token on a public S3 bucket

My main motivation for adding these buckets to the research was my somewhat conspiratorial idea that there are some malicious parties out there in the world who have a method of enumerating all public buckets. My hope was that I might uncover these lizard people and expose them for their lizardling ways! Alas, after I configured the buckets to be publicly accessible, neither bucket generated any hits.

It was only after I published the buckets’ addresses on Pastebin, GitHub and my website that I got several hits, which look like someone from the United States attempting a bunch of API functions on AWS (probably not a lizard, though).

Fastest Access Time

As I mentioned, one of the things I was curious about was exactly how fast hacking bots grab and use stolen credentials. Well, the answer is… pretty damn fast for some services:

Attack Patterns

One of the interesting things that is logged whenever you receive a canary token alert is the function or event that was invoked using the canary token. Since the canary tokens are AWS API credentials, I could see which AWS API method the attacker tried to invoke. Unfortunately, this is where the canarytokens.org service disappoints a little, since it retains this information for only the last few events and does not include it in the email alerts, so some information was lost during the research. I managed to obtain about 70% of the events, and here are the results for all access attempts across all canary tokens:

Count of AWS API events across all canary tokens

The InvokeModel event on AWS, in case you were wondering, is the Amazon Bedrock runtime call that sends input to a hosted foundation model and returns the model’s response. (Models deployed on Amazon SageMaker are invoked through a different call, InvokeEndpoint.) I have a few ideas as to why this was a popular choice for attackers, but I’ll leave it up to you to speculate on that.

All canary token access attempts

The following diagram shows the number of access attempts per canary token for all the services used in this research (including the ones that didn’t have any hits)

IP analysis

In total there were 45 unique IP addresses found in the logs across all canary tokens. Here are their details:

Access attempts per IP address across all canary tokens
Access attempts per country across all canary tokens

The distribution of threat actor IP addresses by country resembles the distribution we typically see in other research into cyber activity. North America is the leading source, followed by a few countries in Asia. The only surprise for me here is the lack of Chinese IP addresses, but I wouldn’t put too much importance on the source of the IP addresses, since a lot of the threat actors are probably using some offshore automated cloud service for their misdeeds. Canarytokens sometimes reports AWS Internal and SNS as the source of the IP addresses, which is why you see them here.

Malicious IP analysis

I was also wondering whether these source IP addresses are flagged as malicious by any of the IP reputation services out there. Luckily, VirusTotal provides a free IP scan that checks an IP address’s classification using 92 different engines. Each engine returns a result of Clean, Unrated, Malicious or Suspicious.

So I ran all 45 unique IP addresses against all 92 engines (4140 results in total) and got back these results:

Clean: 1283 (31.0%)

Unrated: 2848 (68.8%)

Suspicious: 2 (0.05%)

Malicious: 7 (0.17%)

The results of this analysis show that malicious-IP classification is basically useless for identifying this type of attack, due to its very high false negative rate.
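The tally can be re-derived from the counts alone. The following is a short worked calculation in which only the four counts, the 45 IP addresses and the 92 engines come from the text; the variable names are illustrative.

```python
ip_count = 45
engine_count = 92
total_results = ip_count * engine_count  # one verdict per (IP, engine) pair

verdict_counts = {"Clean": 1283, "Unrated": 2848, "Suspicious": 2, "Malicious": 7}

# Sanity check: the four verdict counts should account for every result.
assert sum(verdict_counts.values()) == total_results

# Share of each verdict as a percentage, rounded to two decimal places.
shares = {
    verdict: round(100 * count / total_results, 2)
    for verdict, count in verdict_counts.items()
}

# Share of results that raised any flag at all (Suspicious or Malicious).
flagged = verdict_counts["Suspicious"] + verdict_counts["Malicious"]
flagged_share = round(100 * flagged / total_results, 2)

for verdict, share in shares.items():
    print(f"{verdict}: {verdict_counts[verdict]} ({share}%)")
print(f"Flagged in any way: {flagged} ({flagged_share}%)")
```

Only 9 of the 4140 verdicts raised any flag, which is the basis for the false negative conclusion.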

User Agent Analysis

User agent data tells us a little about how the bots are accessing AWS. It isn’t too revealing, since it can be easily forged, but it can serve as a way of fingerprinting attackers, because you can potentially track the version numbers of the software being used to access AWS.

The following diagram shows the number of access attempts per User Agent.

The majority of requests were made using some version of botocore (the low-level library underlying boto3, the official AWS SDK for Python). There were also a large number of requests from well-known HTTP libraries such as python-requests, axios and aiohttp, which suggests that the access attempts are being performed automatically, using custom-made tools (as opposed to being made manually using the AWS CLI).

Access attempts per user agent across all canary tokens
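Grouping raw user agent strings into families like the ones in the chart can be done with simple substring matching. The following is a minimal sketch: the family labels, the `classify` function and the sample strings are all hypothetical, and the sample strings are only modeled on common formats rather than taken from the logs. As far as I know, the AWS CLI also reports botocore in its user agent, so the sketch checks for `aws-cli` first to keep manual CLI use separate.

```python
from collections import Counter

# (substring, family) pairs, checked in order; the first match wins.
# "aws-cli" comes before "botocore" because CLI user agents mention both.
FAMILIES = [
    ("aws-cli", "AWS CLI"),
    ("botocore", "boto3/botocore"),
    ("python-requests", "python-requests"),
    ("axios", "axios"),
    ("aiohttp", "aiohttp"),
]


def classify(user_agent):
    """Map a raw user agent string to a coarse client family."""
    ua = user_agent.lower()
    for needle, family in FAMILIES:
        if needle in ua:
            return family
    return "other"


# Made-up user agent strings for illustration only.
sample_user_agents = [
    "Boto3/1.26.0 Python/3.9.6 Linux/5.10 Botocore/1.29.0",
    "aws-cli/1.22.34 Python/3.8.10 Linux/5.4.0 botocore/1.23.34",
    "python-requests/2.31.0",
    "axios/1.6.2",
    "Python/3.11 aiohttp/3.9.1",
    "curl/8.4.0",
]

family_counts = Counter(classify(ua) for ua in sample_user_agents)
for family, count in family_counts.most_common():
    print(f"{family}: {count}")
```

Because user agents are trivially forged, counts like these describe the tooling attackers chose to reveal, not necessarily the tooling they used.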

Final Thoughts

Things that surprised me

A number of things surprised me during the analysis of the data:

  1. No access attempts on BitBucket and GitLab. Going into this, I was sure the tokens on these services would be grabbed pretty quickly, probably not as fast as on GitHub, but I still didn’t expect 0 hits. I’m still not sure of the reason for this; perhaps it’s because they are less popular, or maybe they are harder to scrape automatically.
  2. Grab Quickness – I find it pretty amazing that some tokens were grabbed and used after a few seconds. The NPMJS token was grabbed in less than a minute, which includes the overhead time of logging and detecting the access attempt. GitHub and PyPI were close behind, with about 2 minutes until the access attempt.

Key Take-homes from the results

The key takeaway from this research for people in the cyber security industry is that there are some pretty efficient threat actor groups out there that have optimized the scraping of public online services for secrets, and they will likely grab your token in a matter of minutes to hours, depending on the service. This means that if you discover that one of your organization’s secrets has leaked to one of these services, you should immediately roll the key and conduct a forensic investigation into any malicious usage of the key.

Thoughts on Canary Tokens

As I mentioned in the beginning, canary tokens are an underutilized tool that offers a quick and cost-effective layer of security for various IT and application products. Despite their simplicity, they can play a crucial role in detecting unauthorized access and potential threats. By strategically placing canary tokens within your systems, you can gain early warnings of malicious activity, allowing for swift response and mitigation. Their implementation requires minimal effort and expense, yet they can significantly enhance your security posture.

If you are interested in implementing your own brew of canary tokens, you are welcome to read about ways to do it in this blog post I wrote.

How to mitigate the risk of secret leakage

Mitigating the risk of having your organization’s secrets stolen remains a well-known but insufficiently practiced set of activities, which includes:

  • Enforce strict access controls – Ensure only authorized personnel have access to sensitive tokens.
  • Adopt least privilege principles – Limit access rights for users to the bare minimum necessary.
  • Regularly rotate tokens – Implement automated systems for managing and updating tokens periodically.
  • Use environment-specific tokens – Limit the impact of a potential breach by using tokens specific to different environments (Dev, Prod, Test, etc.).
  • Integrate monitoring and logging systems – Detect and respond swiftly to unauthorized access attempts (this includes canary tokens).
  • Conduct regular security audits – Perform frequent audits to identify and address vulnerabilities.
  • Educate employees – Train staff on best practices for handling and securing secret tokens.
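One audit that is cheap to automate is scanning files for strings shaped like AWS access key IDs before they are pushed or published. The following is a minimal sketch of that idea, not a replacement for a dedicated secret scanner: the pattern is a common heuristic (20 characters, starting with `AKIA` for long-term keys or `ASIA` for temporary ones), the `find_key_ids` function is hypothetical, and the key in the sample is the placeholder from AWS's own documentation.

```python
import re

# Heuristic: AWS access key IDs are 20 characters, starting with "AKIA"
# (long-term keys) or "ASIA" (temporary keys). Other prefixes are not covered.
KEY_ID_PATTERN = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_key_ids(text):
    """Return (line_number, match) pairs for every string shaped like a key ID."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in KEY_ID_PATTERN.findall(line):
            findings.append((line_number, match))
    return findings


# The key below is the placeholder from AWS's documentation, not a real one.
sample = "region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
for line_number, key_id in find_key_ids(sample):
    print(f"line {line_number}: possible AWS access key ID {key_id}")
```

A check like this catches only one credential format and can produce false positives, so it complements rather than replaces key rotation and monitoring.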