
Apache Thrift’s Role in Distributed Applications

Posted by manning_pubs on August 1, 2013 at 1:56 PM PDT







By Randy Abernethy, author of The Programmer's Guide to Apache Thrift


Apache Thrift helps programmers build high-performance, cross-language services to address the growing need for multilanguage integration. In this article, based on chapter 1 of The Programmer’s Guide to Apache Thrift, author Randy Abernethy shows you how Apache Thrift fits into the overall landscape of distributed applications. Save 42% on The Programmer's Guide to Apache Thrift with Promotional Code mlthriftjn, only at manning.com.

Distributed applications are applications that have been broken down into subsystems that can be deployed on separate computers yet still collaborate to accomplish the purpose of the application. When subsystems are autonomous and export flexible APIs, they are often called services. Compared to large monolithic systems, distributed applications benefit from smaller, more focused processes that are easier to scale out, reuse, maintain, and test, particularly when each subsystem is written in a language well suited to its scope. Distributed applications generally use three key types of inter-process communications:

  • Streaming: Communications characterized by an ongoing flow of bytes from a server to one or more clients.
      • Example: An Internet radio broadcast where the client receives bytes over time, transmitted by the server in an ongoing sequence of small packets.

  • Messaging: Message passing involves one-way, asynchronous, often queued communications, producing loosely coupled systems.
      • Example: Sending an email message where you may or may not get a response, and if you do get one, you don’t know exactly when it will arrive.

  • Remote Procedure Call (RPC): RPC systems allow function calls to be made between processes on different computers.
      • Example: An iPhone app calling a service on the Internet that returns the weather forecast.

Scalability
Scalability describes a system’s ability to increase, or scale, its workload. There are two common ways to scale a system: vertically and horizontally. Vertical scaling is often referred to as scaling up, and horizontal scaling is often referred to as scaling out.
Vertical scaling, in simple terms, involves buying a faster computer. Vertical scaling places little burden on the application and was the traditional way to increase capacity. Modern CPUs are no longer increasing in performance at the rates they once did, making it expensive, or impossible in many cases, to scale vertically.

Horizontal scaling involves adding more computers to a pool or cluster of systems which run an application collectively. Horizontally scaled applications take advantage of multiple CPUs and/or multiple systems to grow performance. Applications must be designed for distribution across multiple computers to take advantage of horizontal scaling. Extreme examples of horizontal scaling allow applications to harness thousands of CPUs to perform demanding tasks in very short periods of time. Apache Thrift is a tool particularly well suited to building horizontally scaled distributed applications.

The three communications paradigms listed above (streaming, messaging, and RPC) can be used to tackle just about any inter-process communication task. Let’s look at how Apache Thrift fits into each model.

Streaming

Streaming systems deal in continuous flows of bytes. Streaming servers may transmit streams to one or more clients. Some streams have temporal requirements, for instance streaming movies that require frames to arrive at least as fast as they are viewed. Some streams are more batch oriented, for example, a background file transfer. Streaming systems are typically designed for types of communications where data transfers flow in one direction and are of large or undefined size.

Figure 1 Streaming systems are often purpose built to meet performance needs

Streaming systems are frequently low overhead in nature. They tend to be large-bandwidth consumers and, therefore, strive for efficiency over ease of use. In many cases, multicasting is used to allow the server to send a single message to multiple clients. Streaming systems may use data compression mechanisms to reduce network impact as well.
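To make the idea of small packets plus compression concrete, here is a minimal sketch in Python using only the standard library. It is not Apache Thrift code; the function names `stream_chunks` and `receive_stream` are hypothetical, and the in-memory list of packets stands in for a real network connection.

```python
import zlib
from typing import Iterable, Iterator


def stream_chunks(payload: bytes, chunk_size: int = 1024) -> Iterator[bytes]:
    """Server side: compress a payload and emit it as a sequence of small packets."""
    compressor = zlib.compressobj()
    for start in range(0, len(payload), chunk_size):
        packet = compressor.compress(payload[start:start + chunk_size])
        if packet:  # the compressor may buffer input and emit nothing yet
            yield packet
    yield compressor.flush()  # emit whatever is still buffered


def receive_stream(packets: Iterable[bytes]) -> bytes:
    """Client side: decompress packets incrementally as they arrive."""
    decompressor = zlib.decompressobj()
    received = bytearray()
    for packet in packets:
        received += decompressor.decompress(packet)
    received += decompressor.flush()
    return bytes(received)


audio = b"la " * 10_000  # stand-in for a media stream (assumption)
packets = list(stream_chunks(audio))
assert receive_stream(packets) == audio
```

Because the data flows in one direction and the client never answers, the server is free to optimize purely for throughput, which is the trade-off described above.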

Apache Thrift does not typically play a role in streaming data services. However, control APIs used to subscribe to streams and perform other setup and configuration tasks may be a good fit for Apache Thrift RPC. Apache Thrift also supports one-way messages, which may suffice to stream information from a client to a server in some applications. Apache Thrift serialization may also be useful in streaming solutions that require cross-language support.

Messaging

Messaging is a purely asynchronous communications model allowing queued communications to take place independently of the speed of the producer or consumer. Full-service messaging systems support reliable communications over unreliable links with features such as store and forward, transactions, multicast, and publish/subscribe. Systems such as WebSphere MQ, ActiveMQ, and RabbitMQ, along with messaging APIs such as JMS, fit into this category.

Lightweight messaging systems are more appropriate when high data rates and minimum latency are design imperatives. Open source systems such as MIT’s LCM and ZeroMQ, and commercial systems such as TIBCO Rendezvous, implement a lightweight framework supporting many standard messaging features with performance as an overriding design goal. Such high-speed messaging systems strike a balance between the performance of streaming systems and the features of heavier-weight messaging systems.


Figure 2 Messaging systems can make use of Apache Thrift serialization

Apache Thrift is not a message-queuing platform, but it can fulfill the serialization responsibilities associated with cross-language messaging. For example, if you are interested in using RabbitMQ to send messages between a C++ and a Java application, you may need a common serialization format. User-defined message types can be described in Apache Thrift IDL and then compiled to support serialization in any Apache Thrift language. For example, a C# program could serialize a C# object and then send it as a message through the messaging system, whereupon an Objective-C application could receive the message and deserialize it into a native Objective-C object.
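The sketch below illustrates what a common serialization format buys you, using only the Python standard library. It is hand-written for illustration: the `WeatherReport` type and its byte layout are assumptions, not Apache Thrift's generated code or wire format. With Apache Thrift, the type would be described once in IDL and the equivalent serialization code generated for each language.

```python
import struct
from dataclasses import dataclass

HEADER = "!idH"  # big-endian: 4-byte int, 8-byte double, 2-byte string length


@dataclass
class WeatherReport:
    """Hypothetical message type shared by sender and receiver."""
    station_id: int
    temperature_c: float
    summary: str

    def serialize(self) -> bytes:
        """Turn the object into a language-neutral sequence of bytes."""
        text = self.summary.encode("utf-8")
        return struct.pack(f"{HEADER}{len(text)}s", self.station_id,
                           self.temperature_c, len(text), text)

    @classmethod
    def deserialize(cls, data: bytes) -> "WeatherReport":
        """Rebuild a native object from bytes received off the message bus."""
        station_id, temperature_c, length = struct.unpack_from(HEADER, data, 0)
        offset = struct.calcsize(HEADER)
        summary = data[offset:offset + length].decode("utf-8")
        return cls(station_id, temperature_c, summary)


wire_bytes = WeatherReport(42, 21.5, "sunny").serialize()
assert WeatherReport.deserialize(wire_bytes) == WeatherReport(42, 21.5, "sunny")
```

Any program that agrees on the layout can read `wire_bytes`, whatever language it is written in. Maintaining that agreement by hand across many types and languages is the chore an IDL compiler takes over.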

RPC systems, under the covers, send messages between clients and servers to make function calls and return results. For this reason it is possible to implement RPC systems on top of messaging systems. For example, Apache Thrift offers an experimental transport which layers on top of ZeroMQ, allowing Apache Thrift RPC to operate over the ZeroMQ messaging platform.
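As a rough illustration of RPC layered on messaging, the following standard-library Python sketch uses two in-process queues as a stand-in for a messaging platform. The names `HANDLERS`, `serve_one`, and `call` are hypothetical, and none of this is the Apache Thrift ZeroMQ transport itself.

```python
import queue
import threading
import uuid

requests = queue.Queue()  # client -> server messages
replies = queue.Queue()   # server -> client messages

HANDLERS = {"add": lambda a, b: a + b}  # hypothetical service functions


def serve_one() -> None:
    """Server: take one request message, run the function, send a reply message."""
    msg = requests.get()
    result = HANDLERS[msg["method"]](*msg["args"])
    replies.put({"id": msg["id"], "result": result})


def call(method: str, *args):
    """Client stub: looks like a function call, but is two messages underneath."""
    call_id = uuid.uuid4().hex
    requests.put({"id": call_id, "method": method, "args": args})
    reply = replies.get(timeout=5)
    assert reply["id"] == call_id  # match the reply to the request
    return reply["result"]


server = threading.Thread(target=serve_one)
server.start()
print(call("add", 2, 3))
server.join()
```

The caller sees an ordinary function, `call("add", 2, 3)`, while the request and the result each travel as a separate message tagged with the same identifier.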

Remote procedure calls

Making function calls to complete the work of a program is fairly natural in most languages. Remote Procedure Call systems allow function callers and function implementers to live in different processes, as demonstrated by the sample RPC application earlier in the book chapter this article is drawn from. Systems such as Apache Thrift, Java RMI, COM, DCE, SOAP, and CORBA provide RPC-style functionality.


Figure 3 Apache Thrift allows clients to call functions hosted in remote servers

Unlike messaging systems, the client and the server in an RPC exchange must be up and running at the same time. In many RPC environments the client waits for the server’s response, just as if it were calling a local function. This couples the client to the server more tightly than a messaging system would. However, SOA platforms such as Apache Thrift lend flexibility to the client and server relationship in several ways.

Some Apache Thrift languages support asynchronous client interfaces. This allows the client to call the server and then go about other business, checking back later to see if the response is available. This is similar to the way a client and server would interact over a messaging platform.
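The call-now, check-back-later pattern can be sketched with Python's standard `concurrent.futures` module. This is only an analogy for an asynchronous client interface: `get_forecast` is a hypothetical stand-in for a blocking remote call, and the code does not use Apache Thrift's own asynchronous client classes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_forecast(city: str) -> str:
    """Stand-in for a blocking remote call (hypothetical service function)."""
    time.sleep(0.2)  # simulated network and server latency
    return f"{city}: sunny"


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(get_forecast, "Oslo")  # returns immediately

other_work = sum(range(1000))        # the client goes about other business
forecast = future.result(timeout=5)  # check back, waiting only if needed
executor.shutdown()
print(forecast)
```

The `future` object plays the role of the pending response: the client can poll it with `future.done()` or block on `future.result()` once it has nothing else to do.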

Apache Thrift also supports one-way messages. One-way messages are fire-and-forget style communications: the client calls the one-way function with the appropriate parameters and then goes about its business. The server receives the message but does not reply. This is similar to the way single-direction messages are sent in a messaging environment, but without the queuing.
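A minimal fire-and-forget sketch, again in standard-library Python rather than Apache Thrift code: a connected socket pair stands in for the network, and `log_event`, `read_exact`, and `receive_event` are hypothetical names. The length-prefixed framing is an assumption chosen for simplicity.

```python
import socket
import struct

client_sock, server_sock = socket.socketpair()  # stand-in for a network connection


def log_event(text: str) -> None:
    """One-way call: send the message and return without waiting for a reply."""
    body = text.encode("utf-8")
    client_sock.sendall(struct.pack("!I", len(body)) + body)


def read_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, since recv may return fewer than requested."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def receive_event() -> str:
    """Server: read one length-prefixed message and send nothing back."""
    (length,) = struct.unpack("!I", read_exact(server_sock, 4))
    return read_exact(server_sock, length).decode("utf-8")


log_event("user logged in")  # the client is free to continue immediately
print(receive_event())
```

Since no reply is ever sent, the client cannot tell whether the server processed the message, which is the price of not waiting.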

Summary

Choosing the right communications platform often involves a combination of RPC, messaging, and streaming-style solutions. Thrift is well suited to such hybrid environments, easily adapting to an assortment of languages and communications platforms. Thrift provides a rich RPC framework and can fulfill the serialization needs associated with messaging and streaming applications as well.



