- Spoofed domains and brand impersonators remain a prevalent problem, and one of the main difficulties is finding the impersonators quickly.
- By using “fuzzy matching” with Censys data and BigQuery, organizations can proactively find and block domain impersonators, thus protecting their users.
The Internet is a vast place, and it holds plenty of pitfalls for users. Technology has made it easy for malicious actors to spin up fraudulent websites quickly, and threat actors can use these spoofed domains or brand impersonations to trick users into handing over sensitive information. Threat actors also often target specific organizations by pretending to be the organization itself, tricking unsuspecting employees and gaining access to internal infrastructure.
Organizations often have tools that enable them to proactively protect their employees from this sort of attack, but that requires both knowing the domains and then blocking them as quickly as possible. In other words, time and knowledge are both critical to reducing harm from domain/brand impersonators.
However, with Censys, BigQuery, and a bit of help from Levenshtein distance, finding impersonators becomes as simple as running a query, which lets you blocklist suspicious domains faster.
At Censys we constantly scan the Internet, which means we are able to find a lot of information quickly, including potential impersonators. Examining all of this data through Search can be challenging, though, especially if you are trying to filter on multiple different data fields. As such, for this use case we’ll utilize BigQuery, Google’s serverless data warehouse, to find suspicious domains. A primer on how to search through Censys data via BigQuery is linked here.
Using Levenshtein Distance to Examine Different Aspects of a URL in BigQuery
Since Levenshtein distance is sensitive to small changes in the strings, we’ll tokenize and examine different parts of the URL, specifically the full URL and the domain. It is possible to break down these queries even further to look at subdomains specifically, but we only look at these two iterations of the URL for simplicity. Moreover, we remove the TLD information, because it is trivial for an attacker to purchase an alternate TLD, and keeping it in the comparison can drastically change the results.
Thus, the query retrieves the IPv4/IPv6 address of a host and all of its dns.names, partitions the dns.names into a URL without a TLD and a domain with a TLD, and then computes a Levenshtein-based similarity score on a scale of 0 to 1 (0 is no match, 1 is an exact match). The query keeps only results that score 0.8 or higher but are not exactly 1, and this threshold can be adjusted for your use.
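To make the tokenization step concrete, here is a minimal Python sketch of one way to split a DNS name into the two strings that get compared. It is not the BigQuery query itself. The function name `split_dns_name` is hypothetical, and the sketch assumes the TLD is simply the last dot-separated label, so multi-label suffixes such as "co.uk" would need a public suffix list instead.

```python
def split_dns_name(name: str) -> tuple[str, str]:
    """Return (name without its TLD, domain label) for a DNS name.

    Assumption: the TLD is the final dot-separated label. This is a
    simplification and does not handle suffixes like "co.uk".
    """
    labels = name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        # Nothing to strip; treat the whole name as the domain.
        return labels[0], labels[0]
    without_tld = ".".join(labels[:-1])  # e.g. "login.bankofamerica"
    domain = labels[-2]                  # e.g. "bankofamerica"
    return without_tld, domain
```

Comparing both strings separately matters because a long subdomain prefix can dilute an otherwise close match on the domain label alone.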
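The scoring and thresholding logic described above can be sketched outside of BigQuery as well. The Python below is an illustration, not the query used in this post: it assumes the 0 to 1 score is computed as 1 minus the edit distance divided by the longer string’s length, which is one common normalization, and the names `levenshtein`, `similarity`, and `near_matches` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 is an exact match, 0.0 is no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def near_matches(target: str, candidates: list[str], low: float = 0.8) -> list[str]:
    """Keep candidates scoring at least `low` but not exactly 1 (the real name)."""
    return [c for c in candidates if low <= similarity(target, c) < 1.0]
```

With this normalization, a lookalike such as "bankofarnerica" (where "rn" stands in for "m") stays above the 0.8 threshold against "bankofamerica", while the exact name is excluded so legitimate hosts do not flood the results.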
The results include a number of interesting URLs/domains that are worth further investigation or blocking. However, the analysis does not need to stop here.
We can append additional Censys data about these hosts to help filter even further. A slight modification to this query will append the autonomous system name, location data, and certificate issuer to the results, which could allow faster identification of suspicious infrastructure. For example, the output of this query shows a number of hosts located in the Proofpoint ASN, which may not be notable, as Bank of America could be a Proofpoint customer. However, there are also a number of other results in different ASNs whose certificate issuers differ from the one on Bank of America’s homepage (Entrust, Inc.). These results could be worth blocking or digging into further.
This screenshot shows how additional metadata from Censys can be added to more quickly filter out legitimate use cases.
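The triage reasoning above can also be expressed as a small filter over exported results. This is a rough sketch under stated assumptions: the row keys `dns_name`, `as_name`, and `cert_issuer` are hypothetical and do not reflect the actual Censys schema, and the function `triage` is illustrative only.

```python
def triage(rows: list[dict], expected_issuer: str, known_as_names: set[str]) -> list[str]:
    """Return DNS names that sit outside known ASNs AND use an unexpected issuer.

    Hypothetical row shape: {"dns_name": ..., "as_name": ..., "cert_issuer": ...}
    """
    review = []
    for row in rows:
        issuer_differs = row["cert_issuer"] != expected_issuer
        unknown_as = row["as_name"] not in known_as_names
        if issuer_differs and unknown_as:
            review.append(row["dns_name"])
    return review
```

Requiring both signals keeps hosts run by a known vendor out of the review list, at the cost of missing impersonators that happen to share an ASN with that vendor, so the known set should stay small.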
This write-up is meant to be a jumping-off point for your own investigations, and the queries can be further modified to fit your organization’s own needs. To find out more about how to use BigQuery with Censys, check out our help docs, and also check out more about BigQuery. We hope that by showing how to combine BigQuery and Censys to fuzzy match the phishers, we can empower your organization to protect users more quickly!