
Automated Hunting

Summary

  • Censys data is incredibly rich with details that often go unnoticed without a trained eye. This guide highlights the value of this highly structured data and provides insights into how we use it internally to find suspicious infrastructure.
  • We are releasing a free utility called Censeye, which can discover useful pivots in Censys host data and (optionally) crawl related hosts using data from those discoveries.

Pivot for Profit

After years of working with Censys data, you start to notice patterns in how you approach internet analysis. Over many investigations, your toolbox grows, filling with new utilities, terminology, and techniques that help you move smoothly from one clue to the next. Whether you are identifying a piece of software never seen before or proactively tracking the internet-connected infrastructure of a suspected criminal, it often starts with a single thread: one clue that, when pulled, begins to unravel a broader story.

You will often see extremely generic-looking hosts with only a smattering of services, and they are not always what they seem; when contrasted with the entirety of the internet, many of these hosts are, in fact, reasonably unique. Take the following HTTP response:

HTTP/1.1 200 OK
Date: <REDACTED>
Server: Apache/2.4.41 (Ubuntu)
Content-Length: 0
Content-Type: text/html; charset=UTF-8

At first glance, it is a typical HTTP response from an Apache web server running on Ubuntu. If we filter hosts by that exact Server header value, we find over 420,000 hosts with the same setup—not very unique. However, when we analyze the entire response using a SHA-256 hash and limit our search to port 80, those 420,000 matches are narrowed down to just 1,961 results.
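To make the idea concrete, here is a minimal sketch of hashing an entire response rather than a single header. It is not how Censys computes its hashes; the function name, the canonical form, and the choice to ignore the Date header are all assumptions made for illustration.

```python
import hashlib

# Assumption: headers that change on every request are dropped before hashing.
VOLATILE_HEADERS = {"date"}


def response_fingerprint(status_line, headers, body=b""):
    """Return a SHA-256 hex digest over the status line, stable headers, and body."""
    lines = [status_line]
    for name, value in headers.items():
        if name.lower() not in VOLATILE_HEADERS:
            lines.append(f"{name}: {value}")
    canonical = "\r\n".join(lines).encode("utf-8") + b"\r\n\r\n" + body
    return hashlib.sha256(canonical).hexdigest()


base = {
    "Date": "<first scan>",
    "Server": "Apache/2.4.41 (Ubuntu)",
    "Content-Length": "0",
    "Content-Type": "text/html; charset=UTF-8",
}
# Same server scanned at another time: only the Date header differs.
rescanned = {**base, "Date": "<second scan>"}
# Same Server header, but a slightly different Content-Type.
other_setup = {**base, "Content-Type": "text/html"}

fp_base = response_fingerprint("HTTP/1.1 200 OK", base)
fp_rescanned = response_fingerprint("HTTP/1.1 200 OK", rescanned)
fp_other = response_fingerprint("HTTP/1.1 200 OK", other_setup)
```

Two hosts that share a Server header but differ anywhere else in the response end up with different digests, which is why the whole-response hash narrows the result set so sharply.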

This also goes the other way around; something that looks unique is often quite generic. JARM fingerprints are frequently thrown around as indicators for specific types of malware, but the reality is those fingerprints don’t represent the malware itself; they represent the underlying TLS API the malware runs on top of. So when you’re handed a JARM fingerprint of an alleged malicious server and see it matches tens of thousands of running hosts (like Metasploit JARM), you should take it with a grain of salt.

The overarching point is that the devil is in the details regarding internet scan data. When something very specific is found on a limited number of hosts, it often (but not always) means a connection can be made. But identifying those very specific things can be challenging, and it’s easy to overlook some things our brains are used to seeing.

An example of something very unique that I could easily see myself looking past with no other context is this TLS certificate:

Without any other information, I’d say this is a certificate for Microsoft Bing, a very large organization that would have many services on many hosts. But if I take the time to search for hosts with this TLS subject, I can see that there are only 58 hosts with that exact organizational unit (“Microsoft IT”) and only 11 hosts with that specific subject in its entirety:

If we pull back the curtain a little further, we’ll see that the “Microsoft IT” organization isn’t found anywhere near a Microsoft-owned network, and there are even four verified Cobalt Strike services (on two hosts) presenting a “Microsoft IT” certificate.

So then we look at one of the hosts presenting a “Microsoft IT” certificate alongside a Cobalt Strike service, and we’re greeted with ten different services, one of which is an HTTP server on port 80 with the HTML title “nmps error”:

“That looks interesting,” you may say to yourself, before clicking into the host details and pivoting to other hosts with the same HTML title, only to find two matching hosts in the same ASN but on different subnets:

So now we’re left wondering what the heck “nmps” is and why it is found on only two hosts, one of which is obviously malicious. So again, we open the host details page and look at the entire response body to figure out what this thing is, since searching the web for “nmps” only tells us that “New Mexico Professional Surveyors” is apparently a thing:

Still, nothing here helps me conclude what “nmps” is. The HREF seems to be truncated or maybe corrupted, so we take a snippet of the body, specifically the “404 not found,power by” text, and paste it into GitHub to see if it’s part of an open-source project:

And now we have an answer. “nmps” is a fork of “nps”, which is (I quote) a “lightweight, high-performance, powerful intranet penetration proxy server” and, judging by the repo stats, a very well-known one.

With this new information, the one “nmps” host that is not running Cobalt Strike, 47.108.57.1, is starting to look a lot more suspicious. If we head over to VirusTotal and search for the IP, we see that 14 vendors have flagged this IP as malicious:

Over in the community tab of the VT result, multiple users reported that there was a Cobalt Strike beacon found on port 80 only 19 days ago:

To verify, we look at the historical data associated with that host and find that, yes, until around October 26th, 2024, a Cobalt Strike beacon did, in fact, exist on this host, just as one currently does on the other server with the “nmps error” title.

Unfortunately, internet scan data cannot tell us for sure whether these two hosts are related, but we do know that, with a few pivots, we were able to identify previously unknown malicious infrastructure.

Reporting & Automation

This is a task that we end up doing a lot here at Censys: using one suspicious input to find even more things that look equally suspicious. If you don’t know which specific fields in our data are suitable to pivot on, this task can be pretty cumbersome when done manually. But pivoting is king. So we set out to automate some of these simple tasks for internal use, and the result turned out to be so useful that we’ve decided to make this tooling accessible to the broader internet community.

Censys scan data is what I would consider “highly structured” in that every bit of information is broken out into its own individual fields – one thing branches to another into a hierarchical tree, which lets you understand the ownership of one element to the next. This structured format lends itself very well to pivoting from one datum to another.

Introducing “Censeye”: an auto-pivoting reporting tool

Censeye is a terminal-based tool we hacked out over the course of a few weeks, which started with a straightforward premise: take a single structured host result, and for each field, tell me how many other hosts on the internet have the same thing. To achieve this, we simply parsed a scan result and generated a CenQL expression for all of the key/value pairs. Then, for each generated query, we ran an aggregate report where the breakdown field is the number of IPs. The goal was to find interesting pivots we may have overlooked when manually looking at a host.
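As a rough sketch of that first step (not the actual Censeye code), the snippet below flattens a nested host record into dotted field paths and turns each key/value pair into a query string. The helper names, the sample record, and the exact query syntax are assumptions for illustration.

```python
def flatten(node, prefix=""):
    """Yield (dotted_field_path, value) pairs from nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # List items (such as services) share the parent's field path.
        for item in node:
            yield from flatten(item, prefix)
    else:
        yield prefix, node


def to_query(field, value):
    """Build one search expression for a key/value pair (syntax is illustrative)."""
    if isinstance(value, bool):
        return f"{field}={str(value).lower()}"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}="{escaped}"'
    return f"{field}={value}"


# Hypothetical, heavily trimmed host record.
host = {
    "services": [
        {"port": 80, "http": {"response": {"html_title": "nmps error"}}},
        {"port": 22, "ssh": {"server_host_key": {"fingerprint_sha256": "f95812cb"}}},
    ]
}

queries = [to_query(field, value) for field, value in flatten(host)]
```

Each entry in `queries` would then be the input to one aggregate report.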

We very quickly realized that not all search terms were good for finding pivots, so we started creating a list of permitted terms for report generation. We also added some simple logic that would start highlighting specific search terms that look like they may lead to more promising things. For example, if a key/value is found on multiple hosts but is less than some configurable max, we wanted those search terms to be front and center. Below is the default output of a report generated for the IP address “114.55.250.233”.

Here, we see a table with three columns: the first is the number of unique IP addresses that matched the key in the second column with the value in the third. The bolded rows show search terms that had more than one but fewer than 120 (the default) matches. Finally, the report lists all of the “interesting” search terms that were found along the way. If the terminal supports it, all the displayed data is interactive and will navigate you to the Censys search result for that specific element.
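The highlighting rule is simple enough to sketch in a few lines. The function name, the row layout, and the sample counts below are illustrative assumptions; only the "more than one, less than 120" rule comes from the description above.

```python
DEFAULT_MAX_HOSTS = 120  # the default upper bound described above


def is_interesting(host_count, max_hosts=DEFAULT_MAX_HOSTS):
    """A term is worth a closer look if it is shared, but not widely shared."""
    return 1 < host_count < max_hosts


# Hypothetical report rows: (host count, field, value).
report = [
    (420000, "services.http.response.headers.server", "Apache/2.4.41 (Ubuntu)"),
    (45, "services.ssh.server_host_key.fingerprint_sha256", "f95812cb"),
    (1, "services.tls.certificates.leaf_data.fingerprint", "only-on-this-host"),
]

interesting = [row for row in report if is_interesting(row[0])]
```

A count of one means the value exists only on the host being viewed, and a very large count means the value is too generic to connect anything, so both ends are left out.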

There is some additional manipulation that is done on the backend for each input; for example, some of the TLS results will actually generate several reports in different ways:

  • A report on the exact TLS key/value: services.tls.certificates.leaf_data.subject.common_name:"example.com"
  • Another report looking for (not services.tls.certificates.leaf_data.subject.common_name:"example.com") and "example.com"

The point is to take a hostname from the certificate and see if it shows up anywhere else on the Internet.
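A small sketch of how those two queries could be produced from a single common name follows. The function name is hypothetical, and the query strings simply mirror the two forms listed above.

```python
CN_FIELD = "services.tls.certificates.leaf_data.subject.common_name"


def common_name_queries(common_name):
    """Return the exact-match query and the 'seen anywhere else' query."""
    exact = f'{CN_FIELD}:"{common_name}"'
    # Hosts that mention the name somewhere, but not as the certificate's common name.
    elsewhere = f'(not {exact}) and "{common_name}"'
    return [exact, elsewhere]


exact_query, elsewhere_query = common_name_queries("example.com")
```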

In short, the Censeye workflow consists of a six-step process. Starting with an input IP address, the system retrieves host data from Censys, extracting and refining key-value pairs from service details to retain only the most relevant fields. An aggregate report is generated from Censys for each selected key-value pair, where IP address count serves as the breakdown metric. A semi-unique threshold is then applied to identify “interesting search terms.” If a depth parameter is specified, the utility fetches a list of matching hosts, extracts their IP addresses, and reuses them as input in the initial step. This cycle repeats until the depth counter reaches zero. Nothing fancy.
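The loop described above can be sketched as a small recursive function. This is not the Censeye implementation: the two callables stand in for the Censys lookups, and the addresses (from the documentation range) and term names are made up so the example runs without network access.

```python
def crawl(ip, depth, interesting_terms, matching_hosts, seen=None):
    """Return (parent, term, child) edges found within `depth` pivots of `ip`."""
    seen = {ip} if seen is None else seen
    # Report phase: fetch the host, report on its fields, keep the interesting terms.
    terms = interesting_terms(ip)
    edges = []
    if depth <= 0:
        return edges
    for term in terms:
        # Crawl phase: list hosts matching the term and feed them back in.
        for other in matching_hosts(term):
            if other in seen:
                continue
            seen.add(other)
            edges.append((ip, term, other))
            edges.extend(
                crawl(other, depth - 1, interesting_terms, matching_hosts, seen)
            )
    return edges


# Toy data standing in for API results (hypothetical).
TERMS = {
    "192.0.2.1": ["term-a"],
    "192.0.2.2": ["term-a", "term-b"],
    "192.0.2.3": ["term-b"],
}
HOSTS = {
    "term-a": ["192.0.2.1", "192.0.2.2"],
    "term-b": ["192.0.2.2", "192.0.2.3"],
}


def terms_for(ip):
    return TERMS.get(ip, [])


def hosts_for(term):
    return HOSTS.get(term, [])


edges_depth_0 = crawl("192.0.2.1", 0, terms_for, hosts_for)
edges_depth_2 = crawl("192.0.2.1", 2, terms_for, hosts_for)
```

Tracking already-visited hosts in `seen` keeps the crawl from looping back to its parent, since every child shares at least one term with the host that led to it.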

The no-nonsense reporting mechanism has proven incredibly useful on its own, as we have found many new things we may have overlooked without it. In fact, an everyday use case for us is to feed in hosts that have already been labeled as command-and-control (C2) servers to find connections to other hosts that are not labeled “c2.” In other words, we ask whether there are known C2 servers that have links to unknown infrastructure.

By default, Censeye will read from stdin, so it’s easy to use other tools, like the Censys CLI, to seed it with hosts. In this case, we want to look at hosts that are already labeled as a “c2”, but when generating the host reports for each field, exclude any hosts labeled as “c2” from the totals. This is done with the “--query-prefix” flag:

~$ censys search 'labels=c2' | \
jq '.[].ip' | \
python censeye.py --query-prefix 'not labels=c2'

For each of the input IPs, the same tabular report is generated:

You may notice that several of these rows have a host count of zero; this is because of the query prefix argument we set above – this tells us that those rows were only found on hosts that already have the “c2” label applied, leaving us with three “interesting” search terms that are found on non-c2 hosts:

While we’re on the topic of certificates – when Censeye sees a TLS certificate fingerprint that is found on only a single host (meaning the certificate was seen only on the host you are viewing), it will attempt to determine if that certificate has been seen on any other host in the past, and if it has, that information will be made available. Take this alleged MoonBounce C2 server as an example:

This one certificate is currently observed only on this host, but it has popped up on over 17 different hosts since 2020. The main report also highlights this by showing the number of (unique) historical hosts in parentheses. This is just a quick way to determine if a lone certificate on a host has been seen before somewhere else. It should be noted that history, and how far into the past you can see, is bound to the permission levels of your account.

Since we already indicate what we consider “interesting” search terms and have a strict list of fields that it will follow, with limitations on how much data we can pull, we have enough guardrails to allow the tool to work independently. So, instead of fetching a single report, looking at the results of found search terms, and then running more reports based on those searches, you can tell Censeye to do this for you.

To illustrate this, let’s look at a single IP address (5.188.87.38) that has recently been seen acting as a C2 server for the Stealc info-stealer. If we view the host on Censys, we see that there doesn’t appear to be much to go on:

Despite this, when viewed under Censeye, we can immediately see that the SSH service on port 22 has a fingerprint that matches 45 other hosts.

We can either click into this manually and look for ourselves, or we can supply a new flag that tells the tool to generate reports on all of the results from the “interesting search terms.” Censeye can act like a crawler of sorts: at a depth of zero (the default), it will only show the reports on the queried host. However, if we increase this depth, the tool will fetch matching hosts for the “interesting search terms” it found on the original host and run the same reporting process on those.

Here, we add “--depth 1” to our original arguments. For any “interesting search terms” found on 5.188.87.38, the tool will fetch a list of hosts matching that and generate the same report for each. Since we’ve already seen that the only shared search term on this host is the SSH fingerprint, at this depth, the tool will only show information about the other hosts running this same fingerprint.

The output now is a little different; we’re greeted with many more tabular reports for all of the matched hosts, but this time, we have a new result displaying the “pivot tree,” which is a visual representation of the hosts we discovered and how the tool arrived there.

In the above output, we see several new IP addresses, all running the same SSH fingerprint as our original IP, 5.188.87.38. We will also notice that we now have two “interesting search terms” displayed at the top, one being the original SSH fingerprint used to discover these hosts and a second (semi) unique SSH fingerprint, 6278464b, running on seven different hosts, one of which is 179.60.149.209.

If we look at this new host in Censys, we’ll see that the original SSH fingerprint is running on port 22, while the new SSH fingerprint (the one also seen on seven other hosts) is bound to port 2222.

So, let’s fast-forward a little and set our “--depth” value to 3. This means the tool will look for interesting Censys search terms over three iterations. Every host it finds has one or more connections to its parent; in this case, a whole bunch of shared SSH fingerprints.

In the above example, 193.29.13.183 ran the SSH fingerprint bd613b3b, the same as port 22 on 185.232.67.15. The host 185.232.67.15 also had 6278464b running, the same fingerprint found on 179.60.149.209 port 2222. That host, in turn, ran f95812cb on port 22, the same fingerprint found on our original (Stealc C2) host, 5.188.87.38.

Note: the “(via: …)” part of the pivot tree is the CenQL term used to find that host.

In our final search (at depth 3), several new search terms were found that could provide more information:

It should be noted that all of this works with historical host data, too; for example, if you know there was a Cobalt Strike server on a host that has since been taken down or had the service removed, and you know the date it was last seen on the host, you can supply the “--at-time” flag like so:

% python censeye.py 103.234.98.97 --at-time 2024-09-25

Suppose the tool finds a historical certificate (as described previously, a certificate that is not seen running currently but has been in the past), and the “--depth” flag is supplied. In that case, Censeye will use the historical host data where that certificate was found to find potential pivots in the current day. Both the host report and the pivot tree will let you know when this happens by displaying the date that these results were fetched:

WARNING!

  • This tool can use up many queries, meaning looking at a single host could drain your entire monthly quota.
  • The tool is also not very fast; a report is generated for each (configured) field on a host, meaning several API calls happen in the background. When you utilize the depth flag, you exponentially increase the number of API calls made.
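To get a feel for how quickly the calls add up, here is a back-of-the-envelope estimate. The cost model is an assumption, not a description of Censeye's internals: one lookup per host, one aggregate report per configured field, and one search per interesting term to list matching hosts. It ignores caching and hosts reached more than once, so treat it as a rough upper bound.

```python
def estimated_api_calls(fields_per_host, pivots_per_host, hosts_per_pivot, depth):
    """Rough upper bound on API calls for a crawl of the given depth."""
    total = 0
    hosts = 1  # hosts processed at the current level
    for level in range(depth + 1):
        # One host lookup plus one aggregate report per field.
        total += hosts * (1 + fields_per_host)
        if level < depth:
            # One search per interesting term to list the hosts to visit next.
            total += hosts * pivots_per_host
            hosts *= pivots_per_host * hosts_per_pivot
    return total


# Hypothetical numbers: 20 reported fields, 2 interesting terms per host,
# 10 matching hosts per term.
calls_by_depth = [estimated_api_calls(20, 2, 10, d) for d in range(3)]
```

With these made-up inputs, the estimate grows from 21 calls at depth 0 to 443 at depth 1 and 8,883 at depth 2.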

However, we have implemented several caching layers to reduce the number of API calls over a single session. These are all stored within a user-defined “workspace” (which can be redefined with the “--workspace” argument). My (personal) workflow is to keep a different workspace for each “thing” I am investigating. This way, I have a local version of the data I originally fetched and can review it later and incrementally jump into different pivots the tool finds.
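The general idea of a per-investigation cache can be sketched like this. It is a simplified stand-in rather than Censeye's actual caching code: the class name, the one-JSON-file-per-query layout, and the fake fetch function are all assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class WorkspaceCache:
    """Store one JSON file per query inside a workspace directory."""

    def __init__(self, workspace):
        self.root = Path(workspace)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, query):
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.json"

    def get_or_fetch(self, query, fetch):
        """Return the cached result for `query`, calling `fetch` only on a miss."""
        path = self._path(query)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = fetch(query)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result


api_calls = []


def fake_fetch(query):
    # Stand-in for a real API request; records each call so we can count them.
    api_calls.append(query)
    return {"query": query, "hosts": 45}


with tempfile.TemporaryDirectory() as tmp:
    cache = WorkspaceCache(Path(tmp) / "stealc-investigation")
    first = cache.get_or_fetch("labels=c2", fake_fetch)
    second = cache.get_or_fetch("labels=c2", fake_fetch)
```

The second lookup is served from disk, so repeating a report in the same workspace costs no additional queries.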

Outro

A work-in-progress README on the Censeye GitHub repository has more information; the tool is still under active development.
