Data Protection Teams Have an Internet Visibility Problem
Data protection programs have gotten good at watching data move. You classify it, set retention on it, answer DSARs about it, and run DLP across email, SaaS, endpoints, and cloud storage to catch it leaving through a sanctioned channel. Every one of those controls answers the same question: where did our sensitive data go?
None of them answer a different question that regulators care about just as much: where is our sensitive data already reachable, right now, with little to no movement required?
A misconfigured database, a forgotten FTP server, an unauthenticated message broker, or an exposed AI integration doesn’t trip a DLP alert. No one emailed a file. No sharing policy was violated. No endpoint agent saw a copy. The data just sits on an Internet-facing service that your DLP stack was never pointed at.
Some examples found by Censys ARC:
- Exposed pub/sub message queues containing financial data
- Publicly accessible DICOM servers, which store and retrieve medical images
- Open FTP servers containing gigabytes of corporate data
Breach notification law rarely cares how exposure happened, only that unauthorized parties could reach regulated data. That exposure is in scope for you whether or not you own the app or infrastructure.
This is the gap Censys fills. DLP monitors movement; Censys discovers exposure. You need both, and below we show you exactly how to find the second, complete with queries you can run today.
Modern DLP focuses on control points, user behavior, movement, and sharing. It’s time to expand that focus.
Building a Partnership Between Data Protection and Exposure Management
“Hold on!” you may say. “This simply doesn’t belong to me; this is for exposure management, vulnerability management, AppSec, and [five other teams].”
In many ways, you’re right. They open the tickets, they close the services.
But they answer “what is exposed.” Only data protection can answer “what regulated data does this exposure put at risk, and does it trigger a notification obligation?” An exposed Redis instance isn’t your host to patch, but if it holds session tokens or customer identifiers tied to a regulated workflow, the consequences are yours to own. The partnership is the point: Exposure/Vuln Management finds the door; you decide whether what’s behind it is reportable.
The goal is collaboration. Together you can answer questions neither team can answer alone:
- Which Internet-facing assets are most likely to contain regulated data?
- Which exposures create the greatest privacy risk?
- Which findings should be prioritized for remediation?
- Which exposures could trigger breach notification obligations if compromised?
What follows is a starter kit: five exposure classes, each with a Censys query to make the abstract concrete. Emphasis on “starter”: shape these queries into your own! Start by scoping each one to your organization by appending your ASN, netblocks, certificate names, DNS names, and so on. Example additions include:
- and host.autonomous_system.name: "YOUR_ASN"
- and host.ip: "125.8.0.0/13"
- and host.services.cert.names: "google.com"
- and host.dns.names: "google.com"
When a base query joins terms with or, wrap it in parentheses before appending a scope clause so the scope applies to every branch.
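Scoping is easy to get subtly wrong once a query contains or, so a tiny helper can keep it consistent. The sketch below is a hypothetical helper: it makes no API calls and only assembles a query string. It assumes Censys query syntax accepts parentheses for grouping, as the nested host.services: (...) queries in this article do. It treats your identifiers as alternatives, so an asset counts as yours if any one of them matches.

```python
# Hypothetical helper: restrict a Censys query string to your organization.
# Assumption: the query language supports parentheses for grouping.
# This only builds a string; it does not call any Censys API.


def scope_query(base_query: str, scope_clauses: list[str]) -> str:
    """Return base_query restricted to assets matching ANY scope clause.

    Both sides are parenthesized so the result does not depend on how the
    query language ranks `and` against `or`.
    """
    if not scope_clauses:
        return base_query
    if len(scope_clauses) == 1:
        scope = scope_clauses[0]
    else:
        # An asset is "ours" if any identifier (netblock, domain, ...) matches.
        scope = " or ".join(f"({clause})" for clause in scope_clauses)
    return f"({base_query}) and ({scope})"


if __name__ == "__main__":
    base = ('host.services.labels.value = "MEDICAL" or '
            'host.services.labels.value = "MEDICAL_DEVICE"')
    ours = [
        'host.ip: "125.8.0.0/13"',        # example netblock from the list above
        'host.dns.names: "example.com"',  # placeholder: substitute your own domain
    ]
    print(scope_query(base, ours))
```

Joining the identifiers with or rather than and is a design choice. Requiring every identifier to match at once would silently drop assets that appear in your netblock but not under your DNS names.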
1. Exposed file transfer: HR records on a server everyone forgot
FTP is old, boring, and still online at thousands of organizations because the server behind a payroll or benefits hand-off five years ago was never decommissioned. The directory still holds onboarding packets, tax IDs, and benefits exports. DLP will never see it.
Find FTP services that don’t use implicit TLS, the ones most likely to be legacy and leaking in cleartext. This filter does not rule out servers that upgrade to TLS after connecting (explicit TLS via AUTH TLS), so treat hits as candidates to verify:
host.services.protocol = "FTP" and not host.services.ftp.implicit_tls = true

Pivot to the higher-fidelity finding: file shares that expose their contents to anyone, with the filenames visible:
host.services.labels.value = "OPEN_DIRECTORY" and host.services.endpoints.open_directory.files.name: "payroll"
Swap “payroll” for the terms that map to your regulated data, such as tax, benefits, export, backup, .sql. Censys surfaces the file names in an open directory, so a data protection analyst can judge sensitivity before anyone touches the host. That’s the whole workflow: classify the exposure, attribute it to your org or a processor, and escalate with context (Is encryption observed? Is it still needed? Does it create a notification obligation?).
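The first pass over an open-directory listing is the kind of check an analyst can script. The sketch below is a hypothetical helper: the term list mirrors the examples above, and the file names are assumed to come from whatever export of Censys results you already use. It flags names containing any regulated-data term, case-insensitively.

```python
# Hypothetical triage helper: flag open-directory file names that contain
# terms mapped to regulated data. Replace REGULATED_TERMS with your own
# classification vocabulary.

REGULATED_TERMS = ("payroll", "tax", "benefits", "export", "backup", ".sql")


def flag_sensitive_files(file_names, terms=REGULATED_TERMS):
    """Return {file_name: [matched terms]} for names containing any term."""
    flagged = {}
    for name in file_names:
        lowered = name.lower()
        hits = [term for term in terms if term in lowered]
        if hits:
            flagged[name] = hits
    return flagged


if __name__ == "__main__":
    listing = ["Payroll_2021.xlsx", "readme.txt", "customers_backup.sql", "logo.png"]
    for name, hits in flag_sensitive_files(listing).items():
        print(f"{name}: {', '.join(hits)}")
```

Substring matching is deliberately crude. "syntax.txt" matches "tax", for example. Treat the output as a queue for human review, not a verdict.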
Further reading: FTP Exposure Brief: Examining the 55-Year-Old Protocol Used by Millions, Censys ARC’s definitive measurement of FTP on the Internet.
2. Internet-facing databases: the classic breach, still happening
Many of the largest accidental data exposures on record have been open databases. The risk isn’t theoretical, and it isn’t dated.
The naive query finds every instance of a database type. The useful query finds the ones answering to strangers. MongoDB returns an isMaster handshake to an unauthenticated probe, and that boolean is a high-signal lead. Because MongoDB permits the handshake itself before login, confirm that access control is actually off before escalating:
host.services.mongodb.is_master.is_master = true

For Redis, it’s the same logic, different protocol. An instance that answers PONG to an unauthenticated PING took the command without credentials:
host.services.redis.ping_response = "PONG"
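The PONG signal is easier to interpret next to what a locked-down server says instead. The sketch below assumes you already hold the raw RESP reply to a PING (it does no network I/O). It relies on the typical Redis behavior of rejecting unauthenticated commands with a NOAUTH error when a password or ACL is set, and with a DENIED error when protected mode refuses a remote client.

```python
# Sketch: interpret the raw RESP reply a Redis server sent to an
# unauthenticated PING. No network I/O; the reply is assumed to be captured.


def classify_ping_reply(reply: bytes) -> str:
    """Map the first line of a PING reply to an exposure category."""
    line = reply.split(b"\r\n", 1)[0]
    if line == b"+PONG":
        return "unauthenticated"  # command accepted without credentials
    if line.startswith(b"-NOAUTH"):
        return "auth_required"    # password/ACL rejected the anonymous client
    if line.startswith(b"-DENIED"):
        return "protected_mode"   # remote client refused by protected mode
    return "unknown"


if __name__ == "__main__":
    print(classify_ping_reply(b"+PONG\r\n"))
    print(classify_ping_reply(b"-NOAUTH Authentication required.\r\n"))
```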
Broaden to every database-role asset Censys recognizes, then layer on your scope and a “recently appeared” review cadence to catch instances stood up during cloud migrations:
host.services.labels.value = "DATABASE"
Treat each hit that answers without credentials as a high-priority privacy finding, regardless of whether DLP ever alerted. Pivot from the host to what else runs on it (host.service_count, co-located services) to understand the blast radius.
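A “recently appeared” cadence amounts to diffing two snapshots of the same scoped query. The sketch below is hypothetical throughout: the DatabaseHit record and its fields are stand-ins for however you export results. The IP addresses are documentation ranges, and service_count is assumed to be copied from host.service_count. It returns services absent from the previous review, with hosts running the most services first as a rough proxy for blast radius.

```python
# Hypothetical review-cadence helper: surface database services that appeared
# since the last review, busiest hosts first. Record fields are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseHit:
    ip: str
    port: int
    protocol: str
    service_count: int  # total services on the host (cf. host.service_count)


def newly_appeared(previous, current):
    """Return hits in `current` whose (ip, port) was absent from `previous`,
    sorted so hosts with the most services come first."""
    seen = {(hit.ip, hit.port) for hit in previous}
    fresh = [hit for hit in current if (hit.ip, hit.port) not in seen]
    return sorted(fresh, key=lambda hit: (-hit.service_count, hit.ip, hit.port))


if __name__ == "__main__":
    last_review = [DatabaseHit("198.51.100.7", 6379, "REDIS", 3)]
    this_review = last_review + [
        DatabaseHit("203.0.113.22", 3306, "MYSQL", 2),
        DatabaseHit("203.0.113.10", 27017, "MONGODB", 12),
    ]
    for hit in newly_appeared(last_review, this_review):
        print(hit)
```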
3. Unauthenticated message queues: the live event stream nobody secured
Message brokers don’t look like data stores, which is exactly why they get exposed. But a pub/sub layer in a financial or healthcare app can carry transaction events, account identifiers, fraud alerts, and session telemetry. Live! An external subscriber can watch the operational heartbeat of the business.
NATS advertises whether it requires auth in the INFO message it sends to every client on connect, so the exposure is unambiguous: auth_required = false is a definitive unauthenticated finding, not an inference:
host.services.nats_io.auth_required = false
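The NATS signal comes straight from the server’s greeting, which is a single INFO line followed by a JSON object. The sketch below parses that line. It makes one labeled assumption: a missing auth_required field is treated as false, because servers may omit false-valued fields. Verify this against the server versions you see.

```python
import json


def nats_auth_required(info_line: str) -> bool:
    """Report auth_required from the INFO line a NATS server sends on connect.

    Assumption: an absent auth_required field means False, since servers may
    omit false-valued fields from the INFO JSON.
    """
    prefix = "INFO "
    line = info_line.strip()
    if not line.startswith(prefix):
        raise ValueError("not a NATS INFO message")
    info = json.loads(line[len(prefix):])
    return bool(info.get("auth_required", False))


if __name__ == "__main__":
    print(nats_auth_required('INFO {"server_id":"abc","auth_required":true}\r\n'))
```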
For MQTT, a broker that returns an accepted connection status to an anonymous probe took the connection without credentials:
host.services: (protocol = "MQTT" and mqtt.connection_ack_return.return_value: "Accepted")
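“Accepted” is the broker’s answer in its CONNACK packet to a CONNECT that carried no username or password. The sketch below decodes an MQTT 3.1.1 CONNACK using the return codes defined in that version of the specification. MQTT 5 replaces these with reason codes and adds properties, and this sketch does not handle that.

```python
# Sketch: decode an MQTT 3.1.1 CONNACK packet (fixed header 0x20, remaining
# length 2, acknowledge flags, return code). MQTT 5 is out of scope.

CONNACK_RETURN_CODES = {
    0: "Connection Accepted",
    1: "Connection Refused: unacceptable protocol version",
    2: "Connection Refused: identifier rejected",
    3: "Connection Refused: server unavailable",
    4: "Connection Refused: bad user name or password",
    5: "Connection Refused: not authorized",
}


def decode_connack(packet: bytes) -> tuple[bool, str]:
    """Return (accepted, description) for an MQTT 3.1.1 CONNACK."""
    if len(packet) < 4 or packet[0] != 0x20 or packet[1] != 0x02:
        raise ValueError("not an MQTT 3.1.1 CONNACK packet")
    return_code = packet[3]
    description = CONNACK_RETURN_CODES.get(return_code, f"reserved code {return_code}")
    return return_code == 0, description


if __name__ == "__main__":
    # An anonymous CONNECT answered with return code 0 means the broker let it in.
    print(decode_connack(bytes([0x20, 0x02, 0x00, 0x00])))
```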
ZeroMQ exposes its socket handshake; a publisher socket (PUB) reachable on the open Internet means anyone can subscribe to whatever it’s streaming:
host.services: (protocol = "ZEROMQ" and zeromq.handshake.socket_type = "PUB")
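In ZMTP 3.x, peers exchange a READY command after the greeting. Its metadata includes a Socket-Type property, and under the NULL security mechanism it arrives without any credentials. The sketch below parses a READY command frame into its properties, following the ZMTP 3.0 layout (command flag and size, name-prefixed command, then name/value pairs with 4-octet big-endian value sizes). Treat it as an illustration of the wire format, not a hardened parser.

```python
import struct


def parse_ready_metadata(frame: bytes) -> dict[str, bytes]:
    """Parse a ZMTP 3.x READY command frame into {property name: value}."""
    flags = frame[0]
    if flags == 0x04:    # short command: 1-octet size
        size, offset = frame[1], 2
    elif flags == 0x06:  # long command: 8-octet big-endian size
        size, offset = struct.unpack(">Q", frame[1:9])[0], 9
    else:
        raise ValueError("not a ZMTP command frame")
    body = frame[offset:offset + size]
    name_len = body[0]
    if body[1:1 + name_len] != b"READY":
        raise ValueError("not a READY command")
    pos = 1 + name_len
    properties = {}
    while pos < len(body):
        key_len = body[pos]  # property name size: 1 octet
        key = body[pos + 1:pos + 1 + key_len].decode("ascii")
        pos += 1 + key_len
        (value_len,) = struct.unpack(">I", body[pos:pos + 4])  # 4-octet size
        pos += 4
        properties[key] = body[pos:pos + value_len]
        pos += value_len
    return properties


if __name__ == "__main__":
    frame = b"\x04\x19\x05READY\x0bSocket-Type\x00\x00\x00\x03PUB"
    print(parse_ready_metadata(frame))
```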
The data protection question for each: which business process owns the stream, and could those messages contain personal, financial, or regulated data? Engineering owns the broker; you own the consequence.
Further reading: Unauthenticated Message Queues are a Problem, in which Censys ARC investigates unauthenticated message queues and finds chaos.
4. Exposed MCP servers: AI integrations that advertise a path to your data
This is the newest and least-watched class, and it’s the strongest argument for putting data protection on the Internet-exposure map. Model Context Protocol servers connect AI assistants to tools and data: databases, file stores, ticketing, CRM, finance, and more.

Crucially, the protocol doesn’t require authentication by default. As of late April 2026, Censys ARC counted 12,520 Internet-accessible MCP services across 8,758 IPs. The number has skyrocketed since, to over 2.5 million MCP web endpoints.
An exposed MCP server advertises its own capabilities. Censys parses the tool and resource metadata, so you can read what an external client could discover:
host.services.endpoints.mcp.tools.name: *
Now make it a data protection query. Find MCP servers advertising tools or resources that name sensitive data stores:
host.services.endpoints: (mcp.tools.name: "database" or mcp.tools.name: "customer" or mcp.resources.uri: "file://")
Run the same hunt against web properties to catch MCP exposed over HTTP(S) front ends:
web.endpoints.mcp.tools.name: "query" or web.endpoints.mcp.resources.content: *
The metadata alone (tool names, resource URIs, even embedded prompts) can reveal which regulated systems an AI integration can reach. A DLP policy can’t protect data when a newly shipped AI tool publishes a direct route to it. This is a data governance finding, and it’s the kind of exposure that didn’t exist for your program eighteen months ago.
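The same keyword logic can be applied offline to capability listings you have collected. The sketch below assumes result shapes like those of MCP’s tools/list and resources/list methods (a tools array of objects with name and description, and a resources array of objects with uri). The keyword and prefix lists mirror the queries above and are illustrative only.

```python
# Sketch: flag MCP tools and resources that hint at access to sensitive data.
# Input dicts are assumed to follow the tools/list and resources/list result
# shapes; the keyword lists are illustrative and should be tuned.

SENSITIVE_TOOL_TERMS = ("database", "customer", "query")
SENSITIVE_URI_PREFIXES = ("file://",)


def sensitive_mcp_capabilities(tools_result: dict, resources_result: dict) -> list[str]:
    """Return 'tool:<name>' and 'resource:<uri>' entries worth a closer look."""
    findings = []
    for tool in tools_result.get("tools", []):
        text = f"{tool.get('name', '')} {tool.get('description', '')}".lower()
        if any(term in text for term in SENSITIVE_TOOL_TERMS):
            findings.append(f"tool:{tool.get('name', '')}")
    for resource in resources_result.get("resources", []):
        uri = resource.get("uri", "")
        if uri.startswith(SENSITIVE_URI_PREFIXES):
            findings.append(f"resource:{uri}")
    return findings


if __name__ == "__main__":
    tools = {"tools": [{"name": "run_query", "description": "Run SQL against the customer database"}]}
    resources = {"resources": [{"uri": "file:///srv/exports/customers.csv", "name": "customers"}]}
    print(sensitive_mcp_capabilities(tools, resources))
```

Matching on descriptions as well as names catches tools with bland names whose descriptions spell out what they reach.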
Finally, if you really want to cast a wide net, search for anything carrying the Censys label AI and do the hunting yourself:
host.services.labels.value = "AI" or web.labels.value = "AI"
Further reading: MCP Servers on the Internet, in which Censys ARC surveys over 2.5 million MCP web endpoints.
5. Healthcare and other regulated-data systems
Some exposures map straight to a regulated data category, which makes them automatic critical findings. Medical imaging and clinical systems are the headline example. Censys provides convenient labels for identifying these.
host.services.labels.value = "MEDICAL" or host.services.labels.value = "MEDICAL_DEVICE"

Pair that with a hunt for the lightweight web viewers that imaging teams stand up “temporarily” and forget. Take the same query, wrap it in parentheses, and add the label for web login pages on healthcare-scoped infrastructure:
(host.services.labels.value = "MEDICAL" or host.services.labels.value = "MEDICAL_DEVICE") and host.services.labels.value = "LOGIN_PAGE"
Censys ARC has identified everything from unauthenticated access to medical images (DICOM) to exposed patient record logins (EMR/EHR).
Further reading: The Global State of Internet of Healthcare Things (IoHT) Exposures on Public-Facing Networks, Censys ARC’s State of Internet of Healthcare Things (IoHT) report.
What These Five Examples Have in Common
File transfer, databases, message queues, AI integrations, medical systems.
Wildly different technologies, one truth: sensitive data doesn’t only leak through the channels DLP was built to watch. It leaks through forgotten infrastructure, vendor-managed systems, operational telemetry, and brand-new AI tooling. Your DLP stack is pointed inward at movement. Not outward, at the vast open Internet.
Anthropic released MCP in late 2024, and there are already over 2.5 million public MCP web endpoints. The boom in AI-assisted building and tools isn’t slowing down.
Attack surface management (ASM) answers what is exposed. Data protection answers what regulated data that exposure affects, and whether it’s reportable. Neither answer is complete alone. The queries above are how you start producing your half.
The shift is small and the consequence is large. Stop asking only “do we have a policy that says this shouldn’t be public?” Start asking “is there an Internet-accessible system exposing this right now?”
And then go run the query.

