We are excited to be launching our new Universal Internet DataSet based on new scanning technology that we have been developing over the past two years. We have made several fundamental changes to how we approach Internet scanning, resulting in the best visibility of the Internet. Our new scanning capability sees more than 33% more services than we did previously and 97% more services than competitors.
The best visibility means even better Attack Surface Management. The Censys Attack Surface Management (ASM) platform has been using this data internally for about a year, and current ASM customers do not need to take any action. However, because we have made significant changes to our scanning approach, we are releasing a newly structured dataset to Enterprise customers. In this post, we explain why we’ve changed how we scan, the impact on our data, the benefits to our customers, and how Enterprise customers can access the new dataset.
We’ve made several fundamental changes to how we perform Internet scans, based on both peer-reviewed research and our own scanning experience.
- Automatic Protocol Discovery. Recent research at USENIX Security shows that most services do not live on their assigned ports. Shockingly, only 3.0% of HTTP and 6.4% of TLS services run on ports 80 and 443, respectively. Further, Izhikevich et al. have shown that the services on non-standard ports are typically less secure.
Izhikevich et al. recently showed that protocol deployment is significantly more diffuse than previously realized. Most protocols run across thousands or tens of thousands of ports rather than on their assigned port.
Despite this, most scanners only look for the IANA-assigned protocol on each port.
We’ve added automatic protocol detection for every port that we scan, which nearly always lets us determine which protocol is running based on the response we receive. Once we’ve identified the protocol, we complete a full protocol handshake with the service to collect full service details. Across all the services we observe, Censys can identify 97% regardless of which port they run on. This provides Censys customers with full details of the protocols and services running on non-standard ports. Today, more than 66% of our scan results come from unexpected services on non-standard ports.
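The idea of identifying a protocol from the response rather than from the port number can be sketched in a few lines of Python. This is only an illustrative sketch, not Censys's scanner or the method from the research cited above: the `SIGNATURES` table and the `guess_protocol` function are hypothetical names, and the three byte-level checks cover only a tiny fraction of real protocols.

```python
from typing import Callable, List, Optional, Tuple


def _looks_like_tls(data: bytes) -> bool:
    # A TLS record begins with a content type (0x16 handshake, 0x15 alert)
    # followed by a major version byte of 0x03.
    return len(data) >= 3 and data[0] in (0x15, 0x16) and data[1] == 0x03


# Hypothetical signature table: (protocol name, predicate on the response bytes).
# Real fingerprinting needs many more protocols and more careful checks.
SIGNATURES: List[Tuple[str, Callable[[bytes], bool]]] = [
    ("http", lambda d: d.startswith(b"HTTP/")),  # HTTP status line
    ("ssh", lambda d: d.startswith(b"SSH-")),    # SSH identification string
    ("tls", _looks_like_tls),
]


def guess_protocol(response: bytes) -> Optional[str]:
    """Return the first protocol whose signature matches, or None if none do."""
    for name, matches in SIGNATURES:
        if matches(response):
            return name
    return None
```

The port number never appears in the function: a service that answers with an `SSH-` banner is labeled `ssh` whether it listens on port 22 or port 8080. A follow-up handshake for the detected protocol would then collect the full service details.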
- Multi-Perspective Scanning. Scanning from a single perspective limits a scanner’s visibility. We’ve recently begun scanning from three service providers in the U.S., Europe, and Asia. Recent research that we helped perform showed that three geographic perspectives provide over 99% visibility of the Internet.
Based on recent research by Wan et al., we’ve begun to scan from three perspectives, providing us with 99% coverage of Internet hosts.
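To make the effect of adding perspectives concrete, the sketch below computes coverage as the share of known hosts seen from a chosen set of vantage points. It is a toy illustration with assumptions: the `coverage` function is a hypothetical helper, the union of all perspectives stands in for "all reachable hosts", and the sample addresses are made-up values from a documentation range rather than scan data.

```python
from typing import Dict, List, Set


def coverage(perspectives: Dict[str, Set[str]], chosen: List[str]) -> float:
    """Fraction of all observed hosts that the chosen perspectives can see.

    Assumption: the union over every perspective approximates the full
    set of reachable hosts.
    """
    all_hosts: Set[str] = set().union(*perspectives.values())
    seen: Set[str] = set().union(*(perspectives[name] for name in chosen))
    return len(seen) / len(all_hosts) if all_hosts else 0.0


# Made-up observations: each perspective misses a different host.
observed = {
    "us": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "europe": {"192.0.2.2", "192.0.2.3", "192.0.2.4"},
    "asia": {"192.0.2.1", "192.0.2.4"},
}

single = coverage(observed, ["us"])                    # one vantage point
combined = coverage(observed, ["us", "europe", "asia"])  # all three
```

In this toy data, a single perspective sees three of the four hosts, while the three perspectives together see all of them, because a host blocked or unreachable from one location is often visible from another.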
- Continuous Refresh and Increased Scan Frequency. Our scan data provides up-to-date information about the Internet so that our customers do not waste time conducting investigations on stale data. This is particularly important in cloud environments, where IP addresses change hands frequently and week-old data means outdated ownership. While our closest competitor refreshes services every 10 days on average, our new dataset refreshes services in less than 48 hours on average.
- Improved Service and Device Context. We have improved our detection of software and operating systems, and we are working on IoT device detection to provide more context about the devices and services in our scan results. We have also adopted the standard Common Platform Enumeration (CPE) format for software and operating systems, which makes it easier to correlate our data with other datasets that use the CPE standard. We’ve also switched to using Recog for service identification, and we will contribute device fingerprints to it going forward.
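For readers unfamiliar with CPE, the sketch below builds a CPE 2.3 formatted string for a detected piece of software. It assumes the 2.3 formatted-string binding (the post does not say which CPE version the dataset uses), `cpe23` is a hypothetical helper, and it skips the escaping of special characters that the specification requires.

```python
def cpe23(part: str, vendor: str, product: str, version: str = "*") -> str:
    """Build a CPE 2.3 formatted string, leaving unspecified fields as '*'.

    part is 'a' (application), 'o' (operating system), or 'h' (hardware).
    Simplification: values are not escaped, so they must not contain ':'
    or other special characters.
    """
    if part not in ("a", "o", "h"):
        raise ValueError("part must be 'a', 'o', or 'h'")
    # After version come seven more fields: update, edition, language,
    # sw_edition, target_sw, target_hw, other.
    fields = [part, vendor, product, version] + ["*"] * 7
    return "cpe:2.3:" + ":".join(fields)


example = cpe23("a", "apache", "http_server", "2.4.46")
# example == "cpe:2.3:a:apache:http_server:2.4.46:*:*:*:*:*:*:*"
```

Because the vendor, product, and version sit in fixed positions, two datasets that both emit CPE strings can be joined on that string, or on a prefix of it, without custom name matching.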
What does this mean for Censys Data?
Let us look at our competitive benchmarking statistics, which show how we compare against our closest competitors in breadth, depth, and frequency of scanning:
How can enterprise customers access the new dataset?
We have already been using the new dataset internally for our SolarWinds investigation, and we are excited to share the dataset via download and Google BigQuery with our Enterprise customers. We are planning to make the dataset available to everyone in the Search UI and API in Q2 of this year. We will continue to add features and functionality throughout the year. In particular, we are planning to provide new ways to access our historical data via the API and UI.
Stand by for an exciting year in which we will advance the state of Internet scanning technology and provide significant new functionality to our community and customers!
On the Origin of Scanning: The Impact of Location on Internet-Wide Scans
Gerry Wan, Liz Izhikevich, David Adrian, Katsunari Yoshioka, Ralph Holz, Christian Rossow, Zakir Durumeric; ACM Internet Measurement Conference (IMC), October 2020
LZR: Identifying Unexpected Internet Services
Liz Izhikevich, Renata Teixeira, Zakir Durumeric; USENIX Security Symposium, August 2021