Hunting Botnets With CursorAI, GreyNoise, Censys, and Censeye


Note: The code in this post can be found in the Censys Research GitHub repository.

Introduction

Not everyone is a fan of AI, but we’ve been known to use it here as a bootstrapping tool, and it can be beneficial. As researchers, we sometimes find that getting meaningful data quickly matters more than solid code quality. But if you can get both, that is usually for the best. When we were recently granted access to CursorAI here at work, we all wanted to try it out and, in our case, use it for some general threat hunting.

Automating Ideas

This morning, we discussed Operational Relay Box (ORB) networks and how we could better identify them in Censys. One idea was to correlate scan data with GreyNoise data to find potential fingerprints for hosts that may be active in an ORB network (i.e., have the hosts we’re scanning been seen by GreyNoise’s honeypots?). 

Who is GreyNoise? They are a fantastic company that specializes in network honeypots. Censys knows what ports are open, and GreyNoise knows who is looking at those ports.

We can start defining our problem by understanding the protocols that an ORB network may run, which are proxy-like protocols such as SOCKS. However, when we search for that protocol in Censys, we see that it is very common: over two million hosts are running it.

But one of the most defining characteristics of an ORB is not that it runs a proxy but that it runs that proxy in a specific type of network: a residential network. Traffic from these networks is the least likely to be treated as malicious, because residential address space is incredibly hard to both firewall and trace if you still want everyday people to be able to use your server resources.

Writing with CursorAI and Prepping Data

While Censys does not currently have a method of filtering for only residential networks, we can use two labels to narrow things down a little: “SOHO” and “IOT,” where “SOHO” means “Small Office/Home Office” and “IOT” means “Internet of Things.” In other words, these are hosts that are more likely to be running in a residential network based on the types of software we find. If we combine these two labels with the SOCKS protocol, we are now looking at under 10,000 host results in Censys, which is a little better than our original 2.5 million.

But if we look over some of those results, we will notice a bunch of autonomous systems in there that we know (for a fact) are not residential networks, such as Amazon and Akamai, which means we can filter these out, too.

And after some manual pruning of ASNs in our query, we end up with a query that returns just under 4,000 hosts, which is much more manageable than our original two queries!

So, now that we have a starting query, our next move is to create a tool to pull this data from the Censys API and join those results with GreyNoise data. The general idea here is that if a host runs a SOCKS proxy in a residential network, and GreyNoise has seen these hosts making malicious or suspicious requests to their sensors, we may be looking at an active participant in an ORB network. 

Our job, now, is to quickly develop a method for pulling data from Censys, feeding it to GreyNoise, and showing us the results. That’s where CursorAI comes into the picture.

Note: At the time of writing, the new Censys Platform Golang SDK is still under development and has not been publicly released. Still, I will use the development version for this post, so my code may look different from yours once it’s public.

We started by reviewing the API documentation at GreyNoise. Before this morning, we had only used GreyNoise here and there for small lookups, so we had to get acquainted with the API and terminology. We found that you could do bulk IP lookups using their “Multi-Quick API Endpoint,” which just gives us booleans on whether the IP is within one of their datasets. For hosts they had data for, we could use their “IP Context API Endpoint.”

Luckily, the GreyNoise documentation included examples of these two endpoints using the Python “requests” library. Since we were using Golang (not Python), we pasted these examples into CursorAI and said, “Give me a nice Golang package using these Python examples.”

At first, the AI-generated code didn’t know the response structure, so we made some requests in Python to receive legitimate responses. We then pasted the output JSON data back into CursorAI, saying, “This is the format of the different responses.” Surprisingly, CursorAI returned the _exact_ Golang structs I needed to unmarshal these responses. It also generated an example main() function that we could use to query multiple IP addresses in GreyNoise, which was helpful.
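To make the struct step concrete, here is a minimal sketch of decoding a bulk-lookup response with Go's standard library. It is not the code CursorAI produced: the type and function names are hypothetical, the `ip` and `noise` field names are assumptions modeled on the "booleans" described above, and the sample JSON is made up using documentation-range addresses. Check the field names against a real response before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QuickResult is one entry of a bulk lookup response.
// The JSON field names are assumptions for illustration only.
type QuickResult struct {
	IP    string `json:"ip"`
	Noise bool   `json:"noise"`
}

// parseQuick decodes a JSON array of quick-lookup entries.
func parseQuick(body []byte) ([]QuickResult, error) {
	var out []QuickResult
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, fmt.Errorf("decoding quick response: %w", err)
	}
	return out, nil
}

// seenIPs returns only the addresses whose boolean flag is set,
// i.e. the hosts worth a follow-up detailed lookup.
func seenIPs(results []QuickResult) []string {
	var ips []string
	for _, r := range results {
		if r.Noise {
			ips = append(ips, r.IP)
		}
	}
	return ips
}

func main() {
	// Made-up sample using documentation-range addresses (RFC 5737).
	sample := []byte(`[{"ip":"192.0.2.10","noise":true},{"ip":"192.0.2.11","noise":false}]`)

	results, err := parseQuick(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(seenIPs(results))
}
```

Keeping the decode step separate from the filtering step makes it easy to swap in the real response shape once it is known.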

Next, we needed to define the Censys functionality. Since we had already been working with the new Golang SDK for the Censys Platform, we fed CursorAI a little command-line utility we had previously written to search hosts in the terminal so it could get a general idea of how to use it. Along with the code, we informed Cursor about some of the intricacies, such as setting up a search query and defining which fields we want returned in the search results. In our case, we only needed the host.ip field.

Once we saw that Cursor had figured out how to use the Censys SDK properly, we gave Cursor the general idea:

“We want a tool that takes a Censys search query, generates a list of IP addresses that matched, sends those in 1,000 host chunks to the GreyNoise ‘Multi-Quick API Endpoint’, and for each of those responses that had a match, fetches the detailed host data from the GreyNoise IP Context API. Oh yeah, and make it so we can get the data in JSON or a ‘pretty table’ format.” 
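The batching step in that prompt can be sketched in a few lines of Go. This is a minimal illustration rather than the tool's actual code; the function name `chunk` and the placeholder host strings are hypothetical.

```go
package main

import "fmt"

// chunk splits ips into consecutive batches of at most size elements.
// It returns nil when size is not positive.
func chunk(ips []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ips); start += size {
		end := start + size
		if end > len(ips) {
			end = len(ips)
		}
		batches = append(batches, ips[start:end])
	}
	return batches
}

func main() {
	// Placeholder entries standing in for IPs returned by a search.
	ips := make([]string, 2500)
	for i := range ips {
		ips[i] = fmt.Sprintf("host-%d", i)
	}

	// Each batch would be sent as one bulk lookup request.
	for i, batch := range chunk(ips, 1000) {
		fmt.Printf("batch %d: %d hosts\n", i+1, len(batch))
	}
}
```

With 2,500 inputs and a batch size of 1,000, this yields two full batches and a final batch of 500.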

We also told CursorAI about our affinity for the Logrus logging library and our desire to use Cobra for argument parsing, and the AI happily obliged by updating all of our code.

We went back and forth with CursorAI on a few prompts, with little updates such as “please validate IP addresses,” “need a flag to only output matches,” “GreyNoise only supports 1,000 host batch requests,” and “Can you be a little less messy with your code structure?” Overall, we spent about 20 minutes on this, and we got what we wanted.
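As an illustration of the “please validate IP addresses” request, the sketch below filters a list down to unique, syntactically valid addresses using the standard `net/netip` package. The function name `cleanIPs` is hypothetical, and this is one plausible way to do it, not necessarily what the generated tool does.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// cleanIPs keeps only syntactically valid, unique IP addresses,
// preserving the order in which they first appear.
func cleanIPs(raw []string) []string {
	seen := make(map[netip.Addr]bool)
	var out []string
	for _, s := range raw {
		addr, err := netip.ParseAddr(strings.TrimSpace(s))
		if err != nil {
			continue // skip anything that is not an IP literal
		}
		if seen[addr] {
			continue // skip duplicates
		}
		seen[addr] = true
		out = append(out, addr.String())
	}
	return out
}

func main() {
	// Documentation-range addresses plus some deliberately bad input.
	raw := []string{" 192.0.2.1 ", "not-an-ip", "192.0.2.1", "2001:db8::1", "999.1.1.1"}
	fmt.Println(cleanIPs(raw))
}
```

Validating and deduplicating before batching avoids spending lookup quota on entries that can never match.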

Analyzing, Censeye, and Pivoting

Out of the 4,000 SOCKS hosts in our results, 46 had been seen by a GreyNoise Sensor and are great targets for further study. But that’s not that great. We sought something more definitive, so we took a slightly different approach.

This time, we started by looking for all hosts (over all networks) running SOCKS services in very high port ranges. We then looked for results that gave us a reasonable number of hosts. For example, this query looking for SOCKS services on ports between 50,000 and 60,000 resulted in just under 1,000 hosts.

When this query was run through this little tool, it was disappointing to see that out of those 1,000 hosts, only 50 were seen by GreyNoise. But then we took those 50 malicious hosts, fed them to our cool open-source utility called Censeye, and let it run. The idea was to iterate over these (known malicious) hosts and try to extract search terms that could be used to find even more hosts. Censeye identified 42 “interesting” queries from these hosts.

An Interesting Pattern

While most of these Censeye results only had two or three other hosts per match, a few queries stood out.

Many of those 50 malicious hosts had what looked like an unauthenticated BusyBox shell. I tweaked the query slightly to be more generic and ran it in Censys search to see that 2,469 hosts matched.

So, we ran this BusyBox Censys query with this AI-generated tool to look in GreyNoise for any host with this BusyBox telnet banner. What we found was much more interesting than the previous results.

We found that GreyNoise had identified 750 of the roughly 2,500 BusyBox hosts (about 30%) as members of botnets or as having been involved in brute-force and exploitation attacks!

This was more than an anomaly; it made us think there was something here. It could mean that all of these hosts are compromised, or it could mean that only the ones that matched in GreyNoise are. Either way, the presence of this BusyBox shell meant one of two things:

  1. A specific device out there with vulnerable software that runs a BusyBox shell is getting popped by a botnet.
  2. This BusyBox shell is installed post-compromise by some botnet variant.

It is likely the former: Mirai targets many devices built around a BusyBox system, so it is highly likely that some BusyBox shell is running in some manner on some crappy old insecure IoT device. Still, we wanted to see if there were any common artifacts between these GreyNoise-identified BusyBox shells and the types of devices they run on.

When looking at the port distribution of these BusyBox shells, a clear pattern emerges: almost 80% of them (around 600 out of the 750) are listening on TCP port 45634.
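A port distribution like this one is a simple tally. The sketch below counts how often each port appears and computes a share of the total; the type and function names are hypothetical, and the input slice is a tiny made-up sample, not our data. Only the 600-of-750 figure comes from the numbers above.

```go
package main

import (
	"fmt"
	"sort"
)

// PortCount pairs a port with the number of hosts seen on it.
type PortCount struct {
	Port  int
	Count int
}

// portDistribution tallies ports and sorts them by descending count,
// breaking ties by ascending port number.
func portDistribution(ports []int) []PortCount {
	counts := make(map[int]int)
	for _, p := range ports {
		counts[p]++
	}
	dist := make([]PortCount, 0, len(counts))
	for p, c := range counts {
		dist = append(dist, PortCount{Port: p, Count: c})
	}
	sort.Slice(dist, func(i, j int) bool {
		if dist[i].Count != dist[j].Count {
			return dist[i].Count > dist[j].Count
		}
		return dist[i].Port < dist[j].Port
	})
	return dist
}

// share returns count as a percentage of total (0 when total is 0).
func share(count, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(count) / float64(total)
}

func main() {
	// Tiny made-up sample of observed shell ports.
	sample := []int{23, 45634, 45634, 2323, 45634}
	for _, pc := range portDistribution(sample) {
		fmt.Printf("port %d: %d hosts\n", pc.Port, pc.Count)
	}

	// The figure quoted in the text: roughly 600 of 750 hosts.
	fmt.Printf("%.0f%%\n", share(600, 750))
}
```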

This was all very strange, so we started going through these hosts on port 45634 and manually looking at a small subset to see if they had any commonalities in the software/firmware they were running. Many were running TP-Link routers, based on observations of TP-Link web administration interfaces, a few Hikvision devices, and a handful of Cambium Networks ePMP servers (all of which have significant vulnerabilities). 

…or, should we say, they _were_ running web administration interfaces. When we manually checked these hosts, we found that many were now running only the BusyBox shell telnet service and nothing else. We would then look at the host’s history and find a similar pattern: they all used to run some vulnerable service, but then those services would disappear, and over time, the BusyBox shell would appear.

For some devices that have been online for a long time, this pattern keeps repeating: a (vulnerable) web administration interface comes online, then disappears; after a while, a BusyBox shell is seen. Then the shell vanishes, and the web interface reappears, only to disappear again. It’s almost like some poor soul is rebooting their little routers out of frustration, completely unaware that they’re exposed to the internet and getting compromised repeatedly.
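The repeating pattern described above can be expressed as a small check over a host's observation history. This is a hypothetical sketch: the labels `"web"`, `"shell"`, and `"none"` and the function name are invented for illustration, and real scan history would need to be mapped onto them first.

```go
package main

import "fmt"

// countCycles counts how many times a shell sighting follows an
// earlier web-interface sighting in a chronological timeline.
// Labels are hypothetical: "web", "shell", or anything else for
// "nothing relevant observed".
func countCycles(timeline []string) int {
	cycles := 0
	sawWeb := false
	for _, svc := range timeline {
		switch svc {
		case "web":
			sawWeb = true
		case "shell":
			if sawWeb {
				cycles++
				sawWeb = false // wait for the web interface to return
			}
		}
	}
	return cycles
}

func main() {
	// Made-up history: web interface, gap, shell, gap, and repeat.
	history := []string{"web", "none", "shell", "none", "web", "none", "shell"}
	fmt.Println(countCycles(history))
}
```

A host with a count of two or more would match the "rebooted and compromised again" behavior, though a count alone says nothing about why the services changed.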

Our opinion is that this botnet variant automatically scans for and hacks various old network devices, including TP-Link, Hikvision, Cambium, Dahua, and GrandStream. Sometimes, this results in the web server crashing, but a backdoor is installed on port 45634.

If we look at the top five ports with this BusyBox shell and do not limit ourselves to just the hosts in GreyNoise, there are currently 1,139 hosts (which may be compromised) online.

There is a chance that these shells appear automatically in a crashed state (as in, if the system crashes, a debug port is started), but I can’t confirm this either way. 

Censys is a network scanner; at the end of the day, we can only observe what’s visible. We can enrich our data by joining it with other sources to uncover patterns, but it’s rare that we can say something is definitively malicious. The patterns we found during this exploration strongly suggest an infection, but there are always unknown variables outside our visibility. That said, Censys is unique in that it can detect these services on those ports. Most scanners only hit a narrow set of ports and services, whereas Censys can intelligently predict where services run, regardless of the port.

All in all, we’re living in weird times. CursorAI let us quickly prototype this whole idea, pulling from both Censys and GreyNoise to get insights into what both products knew without too much effort. And I can’t be mad about that. By combining Censys, Censeye, GreyNoise, and CursorAI, we were able to find what looks like a potential botnet fingerprint. The tool we built during this process has become a permanent addition to our toolboxes.

The code referenced in this post can be found in the Censys Research GitHub repository and Censeye.

AUTHOR
The Censys ARC Research Team

Censys ARC is a team of elite security and threat researchers dedicated to identifying, analyzing, and shedding light on Internet phenomena that impact our world. Using Censys’ Map of the Internet — the world’s most comprehensive, accurate, and up-to-date source for Internet infrastructure — ARC investigates and measures the entirety of the public Internet to share critical and emerging threat intelligence and insights with organizations around the world.