
Unmasking Deception: Navigating Red Herrings and Honeypots

 

Introduction

Here at Censys, our mission is to craft the ultimate blueprint of the web, map all the strange anomalies, and unearth where the wild things roam. We scan the internet indiscriminately and do an excellent job, too. And when you look at this data all day, like us, you tend to become accustomed to the strange little quirks (like commercial honeypots) you often encounter and become desensitized to the extremely odd things.

One reality that quickly emerges from this data is the internet’s abundance of deception. I’m not referring to the realm of social media and the discussions of the earth’s roundness or the existence of non-existent birds, but to the sudden understanding that the things running the internet aren’t always what they claim to be. There are no laws or regulatory mechanisms to compel internet-connected hosts to disclose their true identity or purpose:

You can generate an SSL certificate for google.com, and virtually no one will stand in your way when you deploy it. You can create a reverse DNS entry for your IP address that resolves to facebook.com, and nobody will bat an eye. Even tweaking your Apache web server to make it claim it’s running Nginx won’t trigger a raid by the internet police.

When you aim to paint a picture of what the internet looks like, you must let go of preconceived truths and approach everything you encounter with a healthy dose of skepticism. However, most hosts are generally unlikely to deceive us about their true nature, as crafting and maintaining these falsehoods requires some effort.

What is a honeypot and how does it work in cybersecurity?

But there are particular classes of hosts where the actual goal is to deceive, either for security through obscurity or for the analysis of potentially malicious network traffic. These hosts are called Honeypots: hosts and networks deployed to gain insight into the types of attacks happening on networks, usually used in conjunction with IDSs and firewalls to refine an organization’s security posture.

These systems purposely lie about what type of service and software is running to trick would-be hackers into attempting to exploit the server. Even Amazon has recently jumped into the honeypot game with their MadPot project, and companies like GreyNoise have been operating in the commercial honeypot sector for years. A quick GitHub topic search for honeypots yields over 500 projects, some with thousands of stars; honeypots have been a popular security mechanism and hobby for decades.

Some honeypots are better than others, but usually, each one has a specific scenario in which it excels. The reality is that poorly designed honeypots can be very noisy and easy to spot, decently designed honeypots can often be found with a bit of scrutiny, and the best-designed honeypots may never be spotted at all.

For example, the specialized honeypot software GasPot attempts to emulate a legitimate Automated Tank Gauging service (ATG) (used for monitoring fuel levels) but is easily unmasked with little scrutiny.


(An actual ATG service)

(A GasPot (Fake) ATG service)

Three indicators differentiate GasPot and an actual ATG device:

  • GasPot has a limited number of diagnostic codes that it will accept, and for any code it does not understand, it will return the error code “9999FF1B”.
  • GasPot formats the timestamps in the payloads differently than real ATG devices. For example, GasPot formats them as “MM/DD/YYYY HH:MM”, whereas an actual ATG device formats its timestamps like this: “Nov  8, 2022 15:45”.
  • Real ATG devices use CRLF (“\r\n”), while GasPot primarily uses bare newlines (“\n\n”) due to the code in the following screenshot.

The GasPot code that generates newlines instead of CRLF
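The three indicators above can be combined into a quick banner check. The following is a minimal sketch, not Censys’ detection logic: the function name, the regular expressions, and the sample banners are assumptions made for illustration, built only from the error code, timestamp formats, and line endings described above.

```python
import re

# Assumed patterns derived from the two timestamp formats quoted above.
GASPOT_TIMESTAMP = re.compile(r"\b\d{2}/\d{2}/\d{4} \d{2}:\d{2}\b")
REAL_ATG_TIMESTAMP = re.compile(r"\b[A-Z][a-z]{2} {1,2}\d{1,2}, \d{4} \d{1,2}:\d{2}\b")


def gaspot_indicators(banner: str) -> list[str]:
    """Return the names of the GasPot indicators present in an ATG banner."""
    hits = []
    # Indicator 1: the error code returned for unsupported commands.
    if "9999FF1B" in banner:
        hits.append("error_code")
    # Indicator 2: MM/DD/YYYY HH:MM timestamps and no "Nov  8, 2022 15:45" style.
    if GASPOT_TIMESTAMP.search(banner) and not REAL_ATG_TIMESTAMP.search(banner):
        hits.append("timestamp_format")
    # Indicator 3: consecutive bare newlines and no CRLF anywhere.
    if "\n\n" in banner and "\r\n" not in banner:
        hits.append("bare_newlines")
    return hits


# Hypothetical banners, for illustration only.
real_banner = "\r\nNov  8, 2022 15:45\r\n\r\nIN-TANK INVENTORY\r\n"
fake_banner = "\n\n11/08/2022 15:45\n\n9999FF1B\n\n"
print(gaspot_indicators(real_banner))
print(gaspot_indicators(fake_banner))
```

Any single indicator is a weak signal on its own; a banner that trips several of them is a much stronger candidate for closer inspection.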

With that known, it’s reasonably easy to use Censys to find hosts running this GasPot honeypot server simply by searching for ATG services that use bare newlines instead of CRLF: services: (service_name=ATG and banner="*\n\n\n\n*")

And when you search for these GasPot services, you will notice that the majority of the resulting hosts have all sorts of “interesting” and uncommon features and combinations of services that are rarely found running together in the real world.

In the GasPot result screenshot above, many hosts have four or five different database technologies that are functionally identical (MSSQL, MySQL, Postgres, etc.). We also see services commonly associated with everyday web applications running alongside IoT and SCADA services. To top it all off, many of these ATG servers live in AWS, which, to my knowledge, doesn’t have direct access to physical tanks of gasoline. It’s not the best representation of reality.
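The reasoning above (several interchangeable databases next to an ICS service, on a cloud host) can be written down as a rough heuristic. This is only a sketch under stated assumptions: the service-name labels, the `cloud_hosted` flag, and the `min_databases` threshold are all hypothetical and do not reflect how Censys classifies hosts.

```python
# Hypothetical service labels; real scan data may name these differently.
DATABASE_SERVICES = {"MSSQL", "MYSQL", "POSTGRES"}
ICS_SERVICES = {"ATG"}


def implausible_mix_reasons(
    service_names: list[str], cloud_hosted: bool, min_databases: int = 3
) -> list[str]:
    """List the reasons a host's service mix looks unrealistic for an ICS device."""
    names = {name.upper() for name in service_names}
    # Only hosts that claim to run an ICS service are of interest here.
    if not names & ICS_SERVICES:
        return []
    reasons = []
    # Several functionally identical databases next to a tank gauge is odd.
    if len(names & DATABASE_SERVICES) >= min_databases:
        reasons.append("many_databases_next_to_ics")
    # A fuel tank gauge is unlikely to live inside a public cloud.
    if cloud_hosted:
        reasons.append("ics_in_public_cloud")
    return reasons


print(implausible_mix_reasons(["ATG", "MySQL", "MSSQL", "Postgres"], cloud_hosted=True))
```

A host that returns an empty list is not necessarily genuine; the heuristic only surfaces combinations that deserve a second look.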

What are red herrings?

Network scanners like Censys will record information from a service exactly as presented by the host, and on top of the raw data, we will augment the host details with information about the running services and software using labels and CPEs. The logic behind finding and applying these software and service labels is, for the most part, a simple process involving regular expressions and pattern matching using both internal and open-source data. And for most hosts on the internet, this works perfectly fine.

So when we were made aware of a new set of hosts being discussed on social media, hosts that not only lied about who they were but seemingly tried to overload network scanners with false positives, I wasn’t surprised, as we’ve witnessed similar things before.

On September 20, 2023, Censys started observing around 50 hosts with a unique and chaotic characteristic: in the HTTP response, these hosts included a 37,213-byte Server header (customarily used to identify the running server) with hundreds of different software names.
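A Server header of this size is unusual enough that its length alone is a useful signal. The sketch below assumes response headers are available as a plain dictionary; the function names and the 1,024-byte threshold are arbitrary choices for illustration, not values used by Censys.

```python
def server_header_size(headers: dict[str, str]) -> int:
    """Return the size in bytes of the Server header, or 0 if it is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "server":
            return len(value.encode("utf-8"))
    return 0


def is_stuffed_server_header(headers: dict[str, str], max_bytes: int = 1024) -> bool:
    """Flag responses whose Server header exceeds an assumed size threshold."""
    return server_header_size(headers) > max_bytes


print(is_stuffed_server_header({"Server": "nginx"}))
print(is_stuffed_server_header({"server": "a" * 37213}))
```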

Over the following days, we saw the number of hosts with this data increase dramatically, growing by three to six thousand hosts daily. By September 30th, we saw 27,252 unique hosts presenting this huge and obnoxious server header.

More interesting is where these hosts were located (geographically and AS-wise). At the time of writing, all hosts exist in the autonomous system AMAZON-02, one of the largest AWS networks. But, at the start of this event, two other ASs were seen with these hosts: AMAZON-AES (AS14618) and BJ-GUANGHUAN-AP (AS55960), both of which stopped these specific services nine days later, on September 29th. Below is a (log scaled) graph of each AS found with this disruptive service and the number of hosts.

Beijing Guanghuan Xinwang Digital is the Internet Service Provider that Amazon partners with to legally operate AWS in China, meaning the IP blocks this AS announces are owned and operated by Amazon, but the data centers and transit are controlled by Xinwang Digital.

Given that you must have a valid Chinese business license to open an account in AWS China, the mere existence of these hosts points to a China-based operation. Interestingly, they attempted to mask this by moving everything to AMAZON-02, located primarily in the United States, late last month. The sheer scale of this incident points to a very well-planned experiment with foreign, corporate (possibly state-based or even educational) backing. Since deploying services into AWS provides anonymity, and every one of these servers seems to be structurally similar without any other indication of who owns them, it is hard to tell precisely who is behind this.

Targets

We wondered whether Censys was being explicitly targeted or if there were other reasons these hosts existed. Ultimately, these hosts “tricked” our scanner into applying many software labels that were technically correct, because that’s what the server told us. Still, from a user standpoint, they were annoying and potentially deceptive.

We started by looking at the data that triggered these red herrings. At a cursory glance, all of the data seems to be just random server names that may have been aggregated from random hosts on the internet, along with defining multiple HTML <title> tags in the response body:

The reality is that these data points are not random, nor were they gleaned from host data. Each server header value seen in the response header and each HTML title found in the body directly corresponds to a rule in Nuclei, ProjectDiscovery’s open-source vulnerability scanner.

NUCLEI

Nuclei templates are YAML-encoded rules that describe how to look for, validate, and run exploits on internet-connected hosts. Many of these rules will define where to look for something and regular expressions to determine if an exploit should be attempted and whether the exploit succeeded.

In this case, the creator of these noisy services parsed out specific sections of every web-focused Nuclei template to create a single service that would trigger every Nuclei-known web vulnerability.

For example, one of the HTML titles found in the data is “<title>Flowchart Maker & Online Diagram Software</title>”, which directly corresponds to the “matchers” configuration section of the Nuclei template “http/cves/2022/CVE-2022-1713.yaml”:

Nuclei’s matcher section contains the keyword “part”, which defines in what part of an HTTP response this data can be found and then defines a set of rules to match within that part of the HTTP response. So if the Nuclei rule defines a “part: body”, the matching text will be placed into the HTML body on the server. The same general rule applies to “part: title” and “part: header”.

Furthermore, Server headers like “compaqhttpserver” can be found in the Nuclei template configuration file “http/technologies/fingerprinthub-web-fingerprints.yaml”.

What’s funny about this data is that all the generated noise (even if it’s a Cookie header “matcher” definition) is stuffed as one giant line in the HTTP response Server header.
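To make the matcher mechanics concrete, here is a heavily simplified emulation of a word matcher. It is a sketch, not Nuclei’s implementation: the function name and the dictionary shapes are assumptions, only the “body” and “header” parts are modeled, and the example matchers are hypothetical rather than copied from the templates named above.

```python
def word_matcher_hits(matcher: dict, response: dict) -> bool:
    """Check whether a simplified word matcher fires against a response."""
    # The "part" keyword selects which section of the response is searched.
    haystack = response.get(matcher.get("part", "body"), "")
    words = matcher.get("words", [])
    if not words:
        return False
    if matcher.get("condition", "or") == "and":
        return all(word in haystack for word in words)
    return any(word in haystack for word in words)


# Hypothetical matchers modeled on the examples in the text.
title_matcher = {
    "type": "word",
    "part": "body",
    "words": ["<title>Flowchart Maker & Online Diagram Software</title>"],
}
server_matcher = {"type": "word", "part": "header", "words": ["compaqhttpserver"]}

# A response stuffed with matcher keywords satisfies both rules at once.
noisy_response = {
    "header": "Server: compaqhttpserver",
    "body": "<html><title>Flowchart Maker & Online Diagram Software</title></html>",
}
print(word_matcher_hits(title_matcher, noisy_response))
print(word_matcher_hits(server_matcher, noisy_response))
```

Because each rule only asks whether a string is present, a single response carrying every keyword can satisfy a very large number of unrelated rules.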

Even more amusing is that Nuclei uses Mustache-style templating, which can substitute strings encased between “{{” and “}}” with dynamic data. For example, in the Nuclei definition for CVE-2012-0394, we see that it attempts to match data based on a template-defined variable:

    matchers-condition: and
    matchers:
      - type: word
        words:
          - '{{result}}'

Since the creator of this noisy service did not consider this, we can see these raw variables on the hosts. This confirms the source of all this data.
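Leftover placeholders like this are easy to look for in a response. The following is a minimal sketch; the regular expression and function name are assumptions, and the sample header is made up for illustration.

```python
import re

# Matches text wrapped in double curly braces, such as {{result}}.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def unrendered_placeholders(text: str) -> list[str]:
    """Return the names of template variables that were never substituted."""
    return PLACEHOLDER.findall(text)


# Hypothetical Server header fragment containing a raw template variable.
print(unrendered_placeholders("Apache {{result}} nginx"))
```

A raw variable name appearing in a live response is a strong hint that the content was generated mechanically from template files.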

RECOG

While the services running the Nuclei-derived 37k byte server header were shut down, the hosts continued operating with a different data set across thousands of ports. Below is a graph depicting the total service count on the top 10 hosts in AS55960 that initially had the 37k server header. Here, we can see that two hosts, 54.223.45.228 and 52.80.98.93, deviated significantly in the number of services created between September 20th and October 2nd, 2023, the former doubling its services overnight from 4,000 to 8,000 services.

These new services had a completely different fingerprint than the originals.

Given that these specific services were attempting to trigger rules in known security software, defining these as “honeypots” seemed like a stretch, as they do not require any interaction to function correctly. Instead, we have given these hosts the label “tarpit”, as they can slow down the analysis of a set of hosts once encountered. At the time of writing, over 47,000 hosts with over 190,000 services fell within this “tarpit” category.

Looking at the host 52.81.84.146 back on September 23rd, 2023, we can see that many services had the data described in this post. But if we look at a more recent day (October 5th, 2023), the number of services on this host has drastically reduced.

September 23rd, 2023

October 5th, 2023

These leftover services (above screenshot, on the right) do not match the indicators described earlier in this post. Instead, they are running a similar technique, but this time, seemingly using a different data source than before to generate these huge payloads. Instead of using Nuclei to create this false data, this other data seems to be derived from Rapid7’s Recog fingerprinting framework, which Metasploit uses to fetch software information from services:

Every entry in these server headers can correlate with an example field in Rapid7’s Recog fingerprint database. For example, the string “Agranat-EmWeb/R5_2_4”, as seen in the above screenshot, is ripped right out of an example in http_servers.xml.

So it looks like the developer of these services simply parsed out the example sections from web-based Recog fingerprint definitions and stuffed them all into a server header on the host. The same can be said for the multitudes of HTML titles found in the body of the responses, all seemingly derived from examples in the fingerprint definitions.
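The correlation described above can be checked by extracting the example strings from a Recog fingerprint file and looking for them in a Server header. This is a sketch under assumptions: the inline XML is a simplified stand-in for an entry in http_servers.xml (its pattern and description are illustrative, not copied from the repository), and the function names and sample header are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative stand-in for a Recog fingerprint file.
SAMPLE_RECOG_XML = """
<fingerprints matches="http_header.server">
  <fingerprint pattern="^Agranat-EmWeb/">
    <description>Agranat EmWeb (illustrative entry)</description>
    <example>Agranat-EmWeb/R5_2_4</example>
  </fingerprint>
</fingerprints>
"""


def recog_examples(xml_text: str) -> set[str]:
    """Collect the text of every <example> element in a fingerprint file."""
    root = ET.fromstring(xml_text.strip())
    return {ex.text.strip() for ex in root.iter("example") if ex.text}


def examples_found_in(header: str, examples: set[str]) -> list[str]:
    """Return the known example strings that appear verbatim in a header."""
    return sorted(ex for ex in examples if ex in header)


examples = recog_examples(SAMPLE_RECOG_XML)
print(examples_found_in("Apache Agranat-EmWeb/R5_2_4 nginx", examples))
```

A genuine server would be expected to match at most a handful of examples; a header that contains a large share of them was almost certainly generated from the fingerprint database itself.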

However, these Recog-derived servers have existed within our data for ages, so the working theory is that an organization in China that has been doing these types of setups for a while modified their service generator to include Nuclei-derived data. And since those services came and went within a few weeks, one might surmise that the results weren’t cutting it, so the Nuclei-sourced services were discarded, but the Recog method was kept running.

Unveiling Honeypot Deception: Insights into Network Security Tactics

After looking at these two (Nuclei- and Recog-based) derivatives, we discovered several similar hosts that were not particularly tied to this most recent event but had comparable attributes: many different keywords in an HTML body that attempt to trigger different pattern-based rule engines. These other variants didn’t seem to source the data from any popular repository we could find; instead, we found a single GitHub repository that included several datasets strikingly similar to what we found running on some hosts.

This repository, with a name we cannot say out loud, includes several directories named in Chinese that contain data that looks much like these hosts. Whether this is the true source of this data is unknown, but it does seem to reference another product called “NSFOCUS Advanced Threat Hunting System”:

“NSFOCUS Advanced Threat Hunting System (“ATH” for short) uses deception defense technology (next-generation honeypot technology) to accurately trap attack behaviors and provide clues to attackers’ intrusion activities.”

It even contains a more suspicious directory that, when translated from Chinese, says the following:

“The sample contains a large number of search keywords that contaminate cyberspace asset search engines.”

And when you compare the contents of ywkXErYz.html to the response from 175.178.174.46 on October 5th, they match up.

 

ywkXErYz.html

HTTP Response body

So, we can confidently say that in this specific case, we are looking at a set of data crafted specifically for messing with online scanning databases like Censys. However, only around seven of these hosts are online at the time of writing, and they do not seem to have anything to do with this latest event, as they all live in completely different autonomous systems.

Conclusion

Was this incident an intentional “attack” against Censys? It appears to be more complex than that; it could be attributed to a few different possibilities. Firstly, these services could be viewed as somewhat naïve and straightforward honeypots. Alternatively, they might be services deliberately designed to trigger known vulnerability scanners into perceiving everything as vulnerable.

Considering that Metasploit, the widely utilized open-source penetration testing and exploitation tool, relies on Recog for software fingerprinting, and given the recent trend of researchers employing Nuclei for vulnerability testing, the latter scenario seems increasingly plausible. It suggests that someone, or something, may be attempting to deceive vulnerability scanners into generating alerts for a wide array of targets.

Another plausible theory is that an entity is systematically assessing the capabilities of online network scanners such as Censys. This could include identifying limitations, timing, and timeout mechanisms, and even exploring potential methods to overload backend servers or undermine the reliability of the data that scanners like Censys obtain and store.

But our money is on many of these hosts being very basic DIY honeypots entrapping the lowest common denominators. No matter the reason, we will closely monitor servers like these and tag them as necessary.

 

About the Author

The Censys Research Team