
Transferring large binary data with web services

Posted by adhirmehta on June 11, 2010 at 4:14 AM PDT
 
 
Prerequisites
1)      Basic understanding of web services
2)      Knowledge of Base64 encoding
3)      Knowledge of MTOM
 
Refer to the resource section at the end of the article for information on these topics.
 
Introduction
Web services have evolved from a simple request-response mechanism to support for object-oriented styles and, more recently, large data transfer. Large data may take various forms, such as binary files, images, or database records exported as XML or another format. Large data transfers over a web service are normally avoided, but when one is unavoidable you have a few options, which are discussed later in the article.
 
There are a couple of big challenges when transferring large data through a web service:

1)      Performance: how do you make sure the transfer does not degrade the performance of the application?
2)      Memory constraints: huge files should not be held in memory in their entirety.
 
We’ll discuss these challenges later in the article. Now, let’s look at the various strategies for transferring binary data.
 
[Note: This article only provides conceptual knowledge.]
 
Binary data transfer strategies
 
In the following sections I’ll discuss the various data transfer strategies with their advantages and disadvantages.
 
1) Embedding the binary data in the SOAP envelope
The straightforward option is to convert the binary data to Base64 and embed the Base64 text in the SOAP envelope itself. This is also called the “by value” approach, because the binary data is embedded in the XML document itself. It has the following advantages and disadvantages.
 
Advantages:
  • Simple to use
  • Applications can process and describe the data based only on the XML component of the data
  • It is a reasonable choice for transferring small binary data
 
Drawbacks of embedding large binary data in the SOAP envelope
1)      Because the binary data is part of the SOAP envelope, a large amount of memory is needed to hold it.
2)      Base64 encoding increases the size of the binary data to roughly 1.33 times the original size (every 3 bytes become 4 characters).
3)      The encoding and decoding work and the larger message slow down the application.
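
The size overhead and the memory cost listed above can be seen in a minimal sketch. It uses the JDK's java.util.Base64 so that it runs without extra libraries (an assumption made for this example; the Commons Codec library listed in the resource section is an alternative), and the <data> element name is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Sketch only: "by value" embedding. The <data> element is a hypothetical name.
class Base64EmbedSketch {

    // Encodes the payload as Base64 text and wraps it in the carrying element.
    static String embed(byte[] payload) {
        return "<data>" + Base64.getEncoder().encodeToString(payload) + "</data>";
    }

    // Reverses embed(): strips the element and decodes the Base64 text.
    static byte[] extract(String element) {
        String text = element.substring("<data>".length(),
                element.length() - "</data>".length());
        return Base64.getDecoder().decode(text);
    }

    // Base64 emits 4 characters for every 3 input bytes, padding the last group.
    static int encodedLength(int rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[3_000_000]; // stand-in for a real file's bytes
        String element = embed(payload);      // payload and encoded copy are both in memory here
        System.out.println("raw bytes:     " + payload.length);
        System.out.println("base64 chars:  " + encodedLength(payload.length));
        System.out.println("round trip ok: " + Arrays.equals(payload, extract(element)));
    }
}
```

Note that embed() needs the whole payload in memory and then creates a second, larger copy of it as text, which is exactly the memory drawback described above.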
 
Here is an API that can be used for the implementation.
 
Implementation API:
 
The following sections discuss other approaches, also called “by reference” approaches, that overcome these issues. Binary data is sent by reference by attaching the pure binary data as external unparsed general entities outside the XML document and then embedding reference URIs to those entities as element or attribute values. This avoids the unnecessary bloating of the data and the wasted processing power.
 
 
2) FTP or network file system
A good way to transfer large data via a web service is to move the large binary data out of the SOAP envelope and keep only a reference to it in the SOAP message.
To implement this approach, place the data either on an FTP server or in a shared location on the network, and give the path to the data (file) in the web service request or response.
 
The following section discusses the advantages and disadvantages of this approach.
 
Advantages
1)      The data is not part of the SOAP envelope, so no large amount of memory is needed to hold it.
2)      No encoding and decoding is required, as it is with Base64, which improves application performance.
3)      You can dedicate a separate network line to FTP, which frees up the normal application network line and improves performance.
 
Disadvantages
1)      It requires an FTP server.
2)      The FTP server needs additional maintenance.
3)      It may not be well suited to small applications.
 
Here is an API that can be used for the implementation.
 
Implementation API:
·        Apache Commons Net (FTPClient) - http://commons.apache.org/net/api-release/org/apache/commons/net/ftp/FTPClient.html
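
As a rough illustration of the by-reference idea, the sketch below uses a temporary local directory as a stand-in for the FTP server or network share, so that it runs with the JDK alone. The class, the method names and the <fileLocation> element are all hypothetical; a real FTP-based version would upload and download through an FTP client instead of java.nio.file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch only: the message carries a reference to the data, not the data itself.
class ByReferenceSketch {
    static final String OPEN = "<fileLocation>";   // hypothetical element name
    static final String CLOSE = "</fileLocation>";

    // Producer side: stream the data to the shared location and return a
    // small message fragment that contains only the location.
    static String publish(InputStream data, Path sharedDir, String fileName) {
        try {
            Path target = sharedDir.resolve(fileName);
            Files.copy(data, target, StandardCopyOption.REPLACE_EXISTING);
            return OPEN + target.toUri() + CLOSE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Consumer side: read the location from the message and open a stream on it.
    static InputStream open(String message) {
        String location = message.substring(OPEN.length(), message.length() - CLOSE.length());
        try {
            return Files.newInputStream(Paths.get(URI.create(location)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path share = Files.createTempDirectory("share"); // stand-in for the FTP server or share
        String message = publish(new ByteArrayInputStream(new byte[1_000_000]), share, "export.bin");
        System.out.println("message sent: " + message);
        try (InputStream in = open(message)) {
            long count = 0;
            byte[] buffer = new byte[8192];
            for (int n; (n = in.read(buffer)) != -1; ) {
                count += n;
            }
            System.out.println("bytes read by consumer: " + count);
        }
    }
}
```

The message stays a few dozen characters long regardless of the file size, and both sides work with streams, so neither has to hold the whole file in memory.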
 
3) Message Transmission Optimization Mechanism (MTOM)
 
MTOM (SOAP Message Transmission Optimization Mechanism) is another specification that focuses on solving the "attachments" problem. MTOM is actually a "by reference" method. The wire format of an MTOM-optimized message is the same as that of a SOAP with Attachments (SwA) message. The most noticeable feature of MTOM is the use of the xop:Include element to reference the binary attachments (external unparsed general entities) of the message. With this element, the attached binary content logically becomes inline (by value) in the SOAP document even though it is actually attached separately. This merges the two realms by making it possible to work with a single data model. In effect, MTOM has standardized the referencing mechanism of SwA. The following is an extract from the XOP specification.
 
At the conceptual level, this binary data can be thought of as being base64-encoded in the XML Document. As this conceptual form might be needed during some processing of the XML document (e.g., for signing the XML document), it is necessary to have a one-to-one correspondence between XML Infosets and XOP Packages. Therefore, the conceptual representation of such binary data is as if it were base64-encoded, using the canonical lexical form of the XML Schema base64Binary datatype. In the reverse direction, XOP is capable of optimizing only base64-encoded Infoset data that is in the canonical lexical form.
 
The client application sends a SOAP message that contains complex data in Base64Binary-encoded format. The Base64Binary data type represents arbitrary data (e.g., images, PDF files, Word documents) as text using 65 characters (64 symbols plus the = padding character). A sample SOAP body with a Base64Binary-encoded element is as follows:
 

 <mtom:ByteEcho>
  <mtom:data>AVBERi0xLjYNJeLjz9MNCjE+DQpzdGFyNCjEx0YNCg==</mtom:data>
 </mtom:ByteEcho>

An MTOM-aware web services engine detects the presence of Base64Binary data (<mtom:data> in the example) and decides to convert it to MIME data with an XML-binary Optimized Packaging (XOP) content type. The conversion replaces the Base64Binary data with an element that references the original raw bytes of the document being transmitted. The raw bytes are appended to the SOAP message and separated by a MIME boundary, as shown below:
 

 
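  <mtom:ByteEcho>
   <mtom:data><xop:Include href="cid:1.633335845875937500@java.net" xmlns:xop="http://www.w3.org/2004/08/xop/include"/></mtom:data>
  </mtom:ByteEcho>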
 

--MIMEBoundary000000
content-id: <1.633335845875937500@java.net>
content-type: application/octet-stream
content-transfer-encoding: binary
The raw binary data, along with the SOAP message and the MIME boundary, is sent over the wire to the Producer. The Producer then transforms the raw binary data back to Base64Binary encoding for further processing. This approach has a couple of advantages:
 
Advantages
  1. Efficient transmission: Base64Binary-encoded data is ~33% larger than the same data sent as raw bytes using MIME, so MTOM reduces the data size by converting the Base64Binary encoding to raw bytes for transmission.
  2. Data is transferred using a streaming approach.
Disadvantages/Constraints
  1. HTTP-specific (although it could easily be adapted to other MIME-based or MIME-like transports)
  2. Detecting MTOM messages for dispatch adds overhead.
  3. You may not be able to use MTOM if not all the parties involved in the data transfer support the MTOM specification.
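
To make the size trade-off concrete, here is a hand-rolled sketch that lays out the same payload both ways: inline as Base64, and as a XOP-style MIME package with an xop:Include reference. It is an illustration of the wire layout only, not a conformant MTOM implementation; the SOAP envelope, the namespace declaration for the mtom prefix and the outer HTTP headers are left out, the part headers are simplified, and the class and method names are hypothetical. In practice an MTOM-aware web services stack produces this packaging for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: compares "by value" Base64 with a XOP-style MIME package.
class XopPackageSketch {
    static final String BOUNDARY = "MIMEBoundary000000";
    static final String CONTENT_ID = "1.633335845875937500@java.net";

    // By value: the payload travels as Base64 text inside the element.
    static byte[] inline(byte[] payload) {
        String xml = "<mtom:ByteEcho><mtom:data>"
                + Base64.getEncoder().encodeToString(payload)
                + "</mtom:data></mtom:ByteEcho>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // By reference: the element holds an xop:Include pointing at a MIME part
    // that carries the raw bytes. Built in memory only to keep the sketch
    // short; a real stack would stream the binary part.
    static byte[] optimized(byte[] payload) {
        String head = "--" + BOUNDARY + "\r\n"
                + "content-type: application/xop+xml; type=\"text/xml\"\r\n"
                + "\r\n"
                + "<mtom:ByteEcho><mtom:data>"
                + "<xop:Include href=\"cid:" + CONTENT_ID + "\""
                + " xmlns:xop=\"http://www.w3.org/2004/08/xop/include\"/>"
                + "</mtom:data></mtom:ByteEcho>\r\n"
                + "--" + BOUNDARY + "\r\n"
                + "content-id: <" + CONTENT_ID + ">\r\n"
                + "content-type: application/octet-stream\r\n"
                + "content-transfer-encoding: binary\r\n"
                + "\r\n";
        String tail = "\r\n--" + BOUNDARY + "--\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, head.getBytes(StandardCharsets.US_ASCII));
        write(out, payload); // raw bytes, no Base64 expansion
        write(out, tail.getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 30_000}) {
            byte[] payload = new byte[size];
            System.out.println(size + " raw bytes: inline=" + inline(payload).length
                    + " bytes, optimized=" + optimized(payload).length + " bytes");
        }
    }
}
```

For a payload of a few bytes the MIME headers outweigh the Base64 saving, so the inline form is smaller; for large payloads the package approaches the raw size while the inline form stays about a third larger.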
Here is an API that can be used for the implementation.
 
Implementation API:
 
Plain HTTP
An obvious question is: why not just use HTTP to post and stream the data? It doesn’t require a SOAP envelope or anything similar. The answer is yes, this can be done, provided you are not looking for any web service related benefits.
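
A minimal sketch of such an upload, using only JDK classes: the client streams the body with chunked transfer encoding through HttpURLConnection, and a throwaway local HttpServer counts the bytes so the example can run on its own. The /upload path, the class name and the method names are assumptions made for this illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: plain HTTP POST that streams the body, no SOAP envelope involved.
class PlainHttpUploadSketch {

    // Client side: sends the stream in 8 KB chunks so that the whole payload
    // is never buffered in memory. Returns the HTTP status code.
    static int upload(String url, InputStream data) {
        try {
            HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setChunkedStreamingMode(8192); // stream instead of buffering the body
            con.setRequestProperty("Content-Type", "application/octet-stream");
            try (OutputStream out = con.getOutputStream()) {
                byte[] buffer = new byte[8192];
                for (int n; (n = data.read(buffer)) != -1; ) {
                    out.write(buffer, 0, n);
                }
            }
            return con.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo-only receiver: reads the request body in chunks and counts the bytes.
    static HttpServer startCountingServer(AtomicLong received) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/upload", exchange -> {
                try (InputStream in = exchange.getRequestBody()) {
                    byte[] buffer = new byte[8192];
                    for (int n; (n = in.read(buffer)) != -1; ) {
                        received.addAndGet(n);
                    }
                }
                exchange.sendResponseHeaders(204, -1); // 204 No Content, no response body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AtomicLong received = new AtomicLong();
        HttpServer server = startCountingServer(received);
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/upload";
            // The in-memory stream is a stand-in; a real upload would pass a file stream.
            int status = upload(url, new ByteArrayInputStream(new byte[5_000_000]));
            System.out.println("status: " + status + ", bytes received: " + received.get());
        } finally {
            server.stop(0);
        }
    }
}
```

Both ends process the data one buffer at a time, which addresses the memory constraint raised in the introduction, at the cost of giving up whatever the web service layer would otherwise provide.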
 
Conclusion:
We’ve seen different methods of transferring large binary data through a web service, each with its own advantages and disadvantages. So which one is better? The answer is that it depends on your application environment, and you will have to choose one of the options accordingly.
However, in my opinion, here is the order of precedence for selection:
1)      Use Base64 encoding if the binary data is very small and it does not substantially affect performance.
2)      Use FTP or a network share if the binary data is large.
3)      Prefer MTOM if all the parties involved in the data transfer support it, as it is one of the best ways to stream the content.
4)      Use plain HTTP if it is just a file upload/download.
 
 
Resources and references
 
Apache commons codec - http://commons.apache.org/codec/

 

Comments

Sounds Great

This looks great, and I totally agree. Just a few inputs from my side.
  • 1) Instead of using base64 for large or small binary data, why not use XOP, which keeps the generated XML small?
  • 2) We can also use SAAJ
  • 3) We can also create a programmable chunked-download servlet, which works over HTTP and gives the benefit of streaming download instead of HTTP progressive download. This is again an optimization technique in case of very large binary data.

  • Otherwise the article is very good, and the author conveys his message very beautifully. I like it...
    Keep up the good work.. :)

Thanks for your input. Here are my thoughts on them:

1) XOP - Yes, it can be used provided all the involved parties support it. MTOM is also based on XOP.

2) SAAJ - SAAJ had a few disadvantages, and MTOM overcomes them.

3) Chunk download - As I mentioned in the HTTP section in the article, you can certainly go for it provided you are not required to have web service features/advantages in your application.