These servers should not exist!
Now Generally Available for all Censys customers is a new asset type, Web Entities. A “Web Entity” in Censys allows users to treat their web-based assets as a single high-level commodity, grouping hosts and services as part of an organization’s web service ecosystem.
In addition to introducing this new data classification, we have developed novel methods for tracking potential Shadow IT assets within an organization. Shadow IT refers to technology, systems, and applications that employees use without the knowledge or approval of relevant IT departments. Such systems are deployed without proper authorization, either by mistake or in the name of productivity. Shadow IT can take various forms, including:
- The use of personal cloud storage services like Dropbox
- Teams that use project management tools without IT oversight
- Personal mobile devices or laptops that are not approved
- Using personal email addresses for work-related interactions
- Unauthorized use of cloud and web-hosting services
Deploying a server without the oversight of an IT organization will likely result in the server lacking access to the benefits of standard IT automation, such as monitoring and telemetry capabilities, regular system updates, and proper configuration hygiene. This lack of support can leave the system vulnerable to security risks and inefficiencies.
In this post, we will explain a simple technical method we utilize to pinpoint web servers that, by definition, should not exist. The absence of historical context at both the host and DNS layer renders these assets entirely imperceptible to most tools. However, we can gain further insight into potential data exposures by adding historical context to an organization’s attack surface.
But before we can get to the good stuff, we must have a basic understanding of the different views that Censys has of the internet.
The Views of Censys
Censys has two unique but similar views of the internet: the unnamed and the named.
(An unnamed host in Censys)
The “unnamed internet” view encompasses the hosts and services that respond directly from an IP address and react the same whether you ask for them via a hostname or an IP address. Many internet services do not have the means for a client to specify the hostname in the request. For example, nothing in the SSH protocol can inform the remote server that you are interested in a particular hostname, so the response will be the same whether you connect by IP or by name.
(A named host in Censys)
On the other hand, the “named internet” view is the hosts and services that Censys can view independently of the physical IP and that are instead referred to by name. For services to respond differently to a specific hostname, an exchange between the client and server must specify the name after establishing the connection. This process means there must be some method in the underlying protocol that initiates such an exchange.
Fortunately, two of the most common protocols found on the internet support such a mechanism, albeit for slightly different reasons:
- The HTTP protocol (starting in version 1.1) specifies that a “Host” header must be included with each client request, informing the server of the specific hostname and resource being requested. Without this header, every domain name would need its own dedicated IP address.
- TLS SNI (Server Name Indication) is an extension of the Transport Layer Security (TLS) protocol that allows a client to “indicate” the hostname of the server it is trying to connect to before establishing a secure connection. Without SNI, the server could not determine the correct hostname and associated underlying certificate and would return whatever default certificate the server had configured — this would mean that every SSL certificate would need its own dedicated IP address to function securely.
To summarize, the webserver utilizes SNI to reply with a certificate specific to the hostname, and the HTTP Host header assigns the request to a distinct backend entity, like a file-system directory. People usually refer to this entire process as “Virtual Hosting.”
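The two mechanisms above can be exercised together from a client. The following is a minimal sketch, not Censys's scanner: the function names `build_request` and `fetch_by_name` are hypothetical, and certificate verification is switched off only because the goal is to observe whatever the server presents for a given name.

```python
import socket
import ssl


def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the given Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_by_name(ip: str, hostname: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Connect to `ip`, presenting `hostname` in both the TLS SNI and the Host header."""
    context = ssl.create_default_context()
    # Verification is disabled purely for observation (the certificate may not
    # match the name being requested); do not do this in a production client.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((ip, port), timeout=timeout) as raw:
        # server_hostname sets the SNI value sent in the TLS ClientHello.
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            tls.sendall(build_request(hostname))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch_by_name` with the same IP but different hostnames is one way to see virtual hosting in action: the server picks both the certificate and the content from the name, not from the address.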
Many modern web servers like Nginx have advanced configuration options where you can not only serve different directories based on the incoming client headers, but these requests can transparently route to separate listeners and applications, which can vastly change the view of a single host.
Given that most web servers will respond differently based on the client’s request, if Censys only scanned the world using the bare IP address of hosts, we would have a minimal picture of what the internet actually looks like, and our data would be wholly incomplete. This is why we introduced name-based scanning a few years back, which does exactly what is described above for both HTTP- and TLS-based protocols.
These name-based scans can answer some unique questions that someone may have. For example, with our data, we can easily fetch a report on the number of IPs per name: “last Tuesday, there were 325,484,066 hostnames with only a single IP address, and on the other side of the scale, there was ONE hostname that mapped to 10,733 IP addresses.”
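As a rough illustration of that kind of report, the sketch below counts distinct IPs per hostname from a list of `(hostname, ip)` pairs. The input shape and the function names are assumptions made for this example; they do not reflect how Censys stores or queries its scan data.

```python
from collections import defaultdict


def ips_per_name(records):
    """Map each hostname to the set of distinct IPs it was observed on."""
    seen = defaultdict(set)
    for hostname, ip in records:
        # Normalize case and any trailing dot so equivalent names are merged.
        seen[hostname.lower().rstrip(".")].add(ip)
    return seen


def summarize(records):
    """Return (count of names with exactly one IP, (widest name, its IP count))."""
    mapping = ips_per_name(records)
    single = sum(1 for ips in mapping.values() if len(ips) == 1)
    widest = max(mapping.items(), key=lambda kv: len(kv[1]), default=(None, set()))
    return single, (widest[0], len(widest[1]))
```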
And when you mix this named scan data with historical context, things get even more interesting.
Dead Hosts and Virtual Ghosts
When analyzing an attack surface, it is often a misstep to narrow our focus solely on observable aspects of the present moment, such as the existing state of DNS. Overlooking artifacts of the past can lead to a significant underestimation of the current hidden state of an attack surface.
Both the HTTP Host header and the TLS SNI value are arbitrary strings. Nothing in the protocol specifications states that these values must have a function or even be legitimate. A client can request google.com from a twitter.com IP, and nothing would stop it from doing so. Sure, you might get an error from the server stating that the host could not find data for the requested hostname, but nothing stopped that exchange.
In the same vein, if an administrator removes a DNS record that points to an IP address but maintains the web server configuration that mapped that name to a local resource, then someone with prior knowledge of that hostname belonging to that IP could still access the data as if the DNS record still existed.
Alternatively, if an administrator purchased a new server for their ticketing platform and modified the DNS entry to point to the new server’s IP address, never removing the configuration from the old web server, and then patched a critical vulnerability in the software on the new server, but not the old, an attacker could use historical information to exploit the software on the old host.
In summary, when the DNS record for a hostname is altered to point to a new IP address or removed altogether from DNS, but the previous IP address is still operational with a valid virtual-host configuration for the same hostname, an attacker with historical knowledge of the host can still gain access to the old data on the old server.
In the absence of an established term to describe this method of host analysis, we have informally dubbed such hosts “Virtual Ghosts,” a play on the term “Virtual Host.”
Virtual Ghost Busters
This “Virtual Ghost” concept might seem similar to dangling DNS or subdomain takeovers at first glance, but it is vastly different in execution. First, attackers do not take an active role beyond targeting a host using prior knowledge of its old hostname. Second, this is not strictly a DNS issue but a DNS-adjacent one: the problem exists solely because of poor server configuration hygiene.
Censys wanted to measure the scope of this “Virtual Ghosting” effect on the internet. To do so, we sampled two name-based Censys scan snapshots taken a week apart and analyzed all the IP-based hosts that are still serving content for a DNS record that no longer exists (i.e., the DNS record now returns an NXDOMAIN).
We conducted these tests using two datasets, one random sample and one with a more targeted approach.
Our first test input was 50,000 random hosts that had an entry in our named scan database on February 02, 2023, but by February 14, they were gone. This removal could mean two things:
- The host went down, or an administrator stopped the network services.
- Censys could not resolve the DNS name to an IP address.
The hosts we’re most interested in are the ones where the DNS names no longer resolve to an IP address, so we first ran through every hostname from our sample and validated that the authoritative nameserver for that hostname returned an NXDOMAIN. This process reduced the number of potential targets to 8,227 hosts, meaning 41,773 hosts in our sample data still had valid DNS records (but no longer had services associated with them).
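The selection step can be sketched as a set difference followed by a resolution check. This is an illustration under stated assumptions: snapshots are modeled as collections of `(hostname, ip)` pairs, all function names are hypothetical, and `system_resolver_fails` is only a rough stand-in because it uses the system resolver and cannot distinguish an authoritative NXDOMAIN from other lookup failures.

```python
import socket


def disappeared(earlier, later):
    """(hostname, ip) pairs present in the earlier snapshot but not the later one."""
    return set(earlier) - set(later)


def system_resolver_fails(hostname: str) -> bool:
    """Rough stand-in for an NXDOMAIN check (assumption, see lead-in).

    Returns True whenever the system resolver cannot resolve the name,
    which also covers timeouts and other lookup errors.
    """
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return True
    return False


def ghost_candidates(earlier, later, is_unresolvable=system_resolver_fails):
    """Pairs that vanished between snapshots and whose name no longer resolves."""
    return sorted(
        (name, ip)
        for name, ip in disappeared(earlier, later)
        if is_unresolvable(name)
    )
```

Passing the check in as `is_unresolvable` keeps the selection logic separate from the DNS lookup, so a stricter check that queries the authoritative nameserver could be swapped in.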
For each hostname in our sample that was verified to return an NXDOMAIN, we grab a historical copy of the associated host details and make a note of the following information:
- The IP address that the DNS record used to resolve to
- The old SHA-256 hash of the HTTP response body
- The old HTML title from the HTTP response
- The old SHA-256 hash of the TLS server certificate
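A record of these four values might look like the sketch below. The `Fingerprint` type and helper names are hypothetical, the title is pulled out with a simple regular expression rather than a full HTML parser, and hashing the DER-encoded certificate bytes is an assumption about how the certificate hash is computed.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional

_TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


@dataclass(frozen=True)
class Fingerprint:
    ip: str
    body_sha256: str
    html_title: Optional[str]
    cert_sha256: Optional[str]


def extract_title(body: bytes) -> Optional[str]:
    """Return the first <title> text with whitespace collapsed, or None."""
    match = _TITLE_RE.search(body)
    if not match:
        return None
    return " ".join(match.group(1).decode("utf-8", errors="replace").split())


def fingerprint(ip: str, body: bytes, cert_der: Optional[bytes] = None) -> Fingerprint:
    """Summarize one HTTP response (and optional certificate) for later comparison."""
    return Fingerprint(
        ip=ip,
        body_sha256=hashlib.sha256(body).hexdigest(),
        html_title=extract_title(body),
        # Assumption: the certificate hash is taken over the DER-encoded bytes.
        cert_sha256=hashlib.sha256(cert_der).hexdigest() if cert_der is not None else None,
    )
```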
We then connect to the IP of the host that used to serve content for that expired DNS record and issue three functionally separate HTTP queries:
- A GET request to the old IP with the HTTP Host header and TLS SNI field set to the DNS name that no longer exists.
- A GET request to the old IP with a random value added to the beginning of the hostname. For example, if the hostname we target is “search.censys.io,” we would issue a GET request for “Host: $random_string.search.censys.io”. The output is then used to test for potential domain name parking sites or default HTTP handlers.
- A GET request to the old IP without the Host header or the TLS SNI field set. We want to ensure that the response to our Host header request differs from the response we get by hitting the bare IP.
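The three queries can be described as data before anything is sent. In this sketch the `Probe` type and `build_probes` function are hypothetical names, and sending the randomized name in the SNI field as well as the Host header for the second query is an assumption, since the text only specifies the Host header for that step.

```python
import secrets
from typing import List, NamedTuple, Optional


class Probe(NamedTuple):
    label: str
    sni: Optional[str]          # None means no SNI extension is sent
    host_header: Optional[str]  # None means no Host header is sent


def build_probes(hostname: str, random_label: Optional[str] = None) -> List[Probe]:
    """Describe the named query plus the two control queries for one hostname."""
    label = random_label or secrets.token_hex(8)
    randomized = f"{label}.{hostname}"
    return [
        Probe("named", hostname, hostname),
        # Assumption: the random name is used for SNI as well as the Host header.
        Probe("random-subdomain", randomized, randomized),
        Probe("bare-ip", None, None),
    ]
```

A random label makes it very unlikely that the second name matches a real virtual host, so any content returned for it most likely comes from a wildcard or default handler.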
Our goal with this specific study was to analyze only “real” sites that don’t just return the default HTTP handlers or wildcard domain parking webpages, so we used the two control queries in steps 2 and 3 as a baseline for the query we made in step 1. For example, if the responses from queries 2 and 3 look similar to the response from query 1, we don’t consider it a “real” website.
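The post does not say how “looks similar” was measured, so the following is only one possible reading. It assumes a character-level similarity ratio from Python's `difflib` with an arbitrary 0.9 threshold, compares only the first 4,096 bytes of each body, and treats a match against either control query as disqualifying.

```python
from difflib import SequenceMatcher

COMPARE_BYTES = 4096  # assumption: compare only the start of each body to bound cost


def similar(a: bytes, b: bytes, threshold: float = 0.9) -> bool:
    """Return True when two response bodies are identical or nearly so."""
    a, b = a[:COMPARE_BYTES], b[:COMPARE_BYTES]
    if a == b:
        return True
    # autojunk=False keeps the ratio stable for long, repetitive HTML.
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold


def is_real_site(named: bytes, random_sub: bytes, bare_ip: bytes,
                 threshold: float = 0.9) -> bool:
    """A site counts as "real" only if the named response differs from both controls."""
    return not (
        similar(named, random_sub, threshold) or similar(named, bare_ip, threshold)
    )
```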
If the hostname and IP pass the validation check above (meaning we consider it a “real” website), we then check the response from the first query, looking for the following criteria:
- Does the response body have the same SHA-256 hash as what was in our historical data?
- Does the response HTML title match what was in our historical data?
- Does the server certificate match what was in our historical data?
If all three of these criteria are true, it’s highly likely that the host is still serving content for a DNS record that no longer exists. If only two of the three criteria are true, there is a medium to high chance, and if only one criterion is met, there is a low to medium chance that this host is serving content for expired DNS records.
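The scoring described above reduces to counting matches. In this sketch the dictionary keys and function names are hypothetical, and a value that is missing on either side is treated as a non-match, which is an assumption rather than something the text specifies.

```python
CRITERIA = ("body_sha256", "html_title", "cert_sha256")

LIKELIHOOD = {3: "high", 2: "medium to high", 1: "low to medium", 0: "none"}


def matched_criteria(historical: dict, current: dict) -> list:
    """List the criteria whose historical and current values are equal.

    Assumption: a value missing (None) on either side never counts as a match.
    """
    return [
        key
        for key in CRITERIA
        if historical.get(key) is not None and historical.get(key) == current.get(key)
    ]


def classify(historical: dict, current: dict) -> str:
    """Translate the number of matching criteria into the likelihood labels used above."""
    return LIKELIHOOD[len(matched_criteria(historical, current))]
```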
Out of the 50,000 random hosts we sampled, a total of 694 servers met at least one of these three criteria and served what we consider a “real” website; 112 of those hosts (16.1%) met all three criteria, while 280 (about 40%) matched only one (99 on the certificate alone and 181 on the HTML title alone).
The table below breaks down the number of hosts for each combination of the three criteria.
| HTTP Body Matches Old Data | HTML Title Matches Old Data | Server Certificate Matches Old Data | Host Count | % |
| --- | --- | --- | --- | --- |
| YES | NO | YES | 1 | 0.1 |
| YES | YES | NO | 52 | 7.5 |
| NO | NO | YES | 99 | 14.3 |
| YES | YES | YES | 112 | 16.1 |
| NO | YES | NO | 181 | 26.1 |
| NO | YES | YES | 249 | 35.9 |
| | | Total | 694 | |
For our second test, we wanted to run the same analysis against a targeted dataset rather than a random sample. So we grabbed a list of all hostnames and IP addresses seen on February 02, 2023, but not on February 14, where the word “admin” appeared somewhere within the subdomain of the DNS name.
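A filter of this kind could look like the sketch below. Deciding where the subdomain ends requires knowing the registrable domain; this example assumes it is simply the last two labels, which is wrong for suffixes such as `co.uk`, and a real implementation would consult the Public Suffix List. The function name and the `registrable_labels` parameter are hypothetical.

```python
def has_admin_subdomain(hostname: str, registrable_labels: int = 2) -> bool:
    """True if "admin" appears in any label to the left of the registrable domain.

    Assumption: the registrable domain is the last `registrable_labels` labels.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) <= registrable_labels:
        return False  # no subdomain portion at all
    subdomain_labels = labels[:-registrable_labels]
    return any("admin" in label for label in subdomain_labels)
```

With the default setting, `admin.example.com` and `sysadmin-portal.eu.example.com` match, while `www.admin.com` does not, because there the word appears only in the registrable domain.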
Our goal with this test was to find web administration interfaces that someone may have mistakenly put online and given a valid DNS name but have since been removed only from DNS but not from the web server’s virtual-host configuration. And while not directly accessible, these “virtual ghosts” could still pose a problem when an attacker has prior knowledge of that old hostname and IP address.
Between February 02 and the 14th, we saw 21,502 IP and hostname combinations that were potential “virtual ghosts” (hostnames seen on the 2nd but not on the 14th). After running the NXDOMAIN validation, that number dropped to only 5,101 possibilities. Finally, by running those 5,101 hosts through the same process we ran for the sampled data, we found 223 hosts that met one or more of the “virtual ghost” criteria. Below is the complete mapping:
| HTTP Body Matches Old Data | HTML Title Matches Old Data | Server Certificate Matches Old Data | Host Count | % |
| --- | --- | --- | --- | --- |
| NO | NO | YES | 16 | 7.2 |
| NO | YES | NO | 18 | 8.1 |
| YES | YES | NO | 24 | 10.8 |
| NO | YES | YES | 73 | 32.7 |
| YES | YES | YES | 92 | 41.3 |
| | | Total | 223 | |
Subsequently, we established a local DNS server and integrated all the non-existent DNS names into their corresponding zones, mapping the forgotten DNS names to their original IP addresses. The DNS server allowed us to browse these “virtual ghosts” using any HTTP-capable interface, such as Chromium, and examine whether the process had uncovered any noteworthy findings. Below are some interesting examples of websites that should not be accessible, or as we refer to them, “Virtual Ghosts.”
Expanding the Attack Surface
The data presented indicates that “Virtual Ghosts” are not uncommon. Discovering them is a simple task if one has access to information on every host, service, and website that has ever existed. This reinforces the notion that not all attack surfaces are visible, and historical context is crucial in providing an accurate overview of an organization’s digital footprint.
In light of these findings, we are proud to introduce our new Web Entities feature in The Censys Platform, which automatically tracks and alerts on assets that qualify as “Virtual Ghosts”: hosts that have been disassociated from their DNS names but still host content on the web server. We strongly encourage organizations to incorporate this feature into their security protocols to safeguard against potential threats.
Take the first step in securing your digital assets by incorporating our Web Entities feature into your security strategy today. Contact us to schedule a demo.