Thrift is a software library and a set of code generation tools developed at
Facebook's office in Palo Alto, California, to expedite the development and
implementation of scalable and efficient backend services. The primary goal of
Thrift is to enable efficient and reliable communication across programming
languages by abstracting the portions of each language that tend to require the
most customization into a common library that is implemented in each language.
Users define their data types and service interfaces in a single
language-neutral Interface Definition Language (IDL) file, and Thrift generates
all the code needed to build remote procedure call (RPC) clients and servers.
This report explains the design choices and implementation-level details and
also tries to demonstrate a sample Thrift service.
The idea for Thrift stemmed from the need for a new approach to the resource
demands of many of Facebook's on-site applications, which could not be met by
staying within the LAMP framework. LAMP is an acronym for Linux, Apache, MySQL
and PHP. Facebook was originally built from the ground up on this LAMP stack.
By 2006 Facebook was widely accepted around the world as a social networking
site, and its network traffic grew accordingly, creating the need to scale the
infrastructure behind many of its on-site applications, such as search, ad
selection and delivery, and event logging.
Scaling these operations to match the resource demands was not possible within
the LAMP framework. In implementing services such as search and event logging,
various programming languages had been selected to optimize for the right
combination of performance, ease and speed of development, availability of
existing libraries, and so on. Facebook's engineering culture has also
generally preferred choosing the best tools and implementations over
standardizing on any one programming language and begrudgingly accepting its
inherent limitations. Most individual programming languages either suffered
from subpar performance or constrained data type freedom. Given these technical
challenges and design choices, the engineers at Facebook faced the herculean
task of building a scalable, transparent and high-performance bridge across
many programming languages.
Thrift Design Features
The primary idea behind Thrift is that it consists of a language-neutral
software stack, implemented across various programming languages, and an
associated code generation engine that transforms a simple interface and data
definition language into client and server remote procedure call libraries.
Thrift is designed to be as simple as possible for developers, who can define
all the necessary data structures and interfaces for a complex service in a
single short file. This file is called the Thrift Interface Definition Language
file, or Thrift IDL file. The developers identified some important features
while evaluating the technical challenges of cross-language interactions in a
networked environment.
Types:
A common type system should exist across all the programming languages, without
requiring developers to write their own serialization code. Serialization is
the process of converting an in-memory object into a format, such as a stream
of bytes, that can be stored or transmitted and later reconstructed. For
example, a C++ programmer should be able to transparently exchange a strongly
typed STL map for a dynamic Python dictionary. Neither programmer should be
forced to write any code below the application layer to achieve this. A
dictionary is a Python data type that stores a collection of items indexed by
keys. It is very similar to an associative array.
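To make the idea of a shared type system more concrete, here is a minimal
Python sketch of the kind of serialization code that developers would otherwise
have to write by hand for a map of strings to 32-bit integers. The function
names and the byte layout are assumptions made for this illustration; they are
not Thrift's generated code or its actual wire format.

```python
import struct


def encode_map(mapping: dict) -> bytes:
    """Encode a dict of str -> int as: entry count, then (key length, key, i32 value)."""
    parts = [struct.pack(">i", len(mapping))]
    for key, value in mapping.items():
        raw_key = key.encode("utf-8")
        parts.append(struct.pack(">i", len(raw_key)) + raw_key + struct.pack(">i", value))
    return b"".join(parts)


def decode_map(data: bytes) -> dict:
    """Rebuild the dict from the bytes produced by encode_map."""
    (count,) = struct.unpack_from(">i", data, 0)
    offset = 4
    result = {}
    for _ in range(count):
        (key_length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        key = data[offset:offset + key_length].decode("utf-8")
        offset += key_length
        (value,) = struct.unpack_from(">i", data, offset)
        offset += 4
        result[key] = value
    return result
```

A reader written in another language could rebuild its own native map type from
the same bytes, which is the kind of exchange a common type system is meant to
make automatic.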
Transport:
Each language must have a common interface to bidirectional raw data transport.
Consider a scenario with two servers, one implemented in Java and the other in
Python. A service written in Java should be able to send raw data through a
common interface that the Python server understands, and vice versa. The
transport layer should be able to move the raw data between the two ends. The
specifics of how the transport is implemented should not matter to the service
developer. The same application code should be able to run against TCP stream
sockets, raw data in memory or files on disk.
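As a rough illustration of this point, the following Python sketch defines a
hypothetical Transport interface with an in-memory and a file-backed
implementation. All class and function names here are invented for the example
and are not Thrift's actual API. The round_trip function stands in for
application code and runs unchanged against either transport.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical minimal transport: it only moves raw bytes."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read(self, size: int) -> bytes: ...


class MemoryTransport(Transport):
    """Keeps the bytes in a buffer in memory."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._pos = 0

    def write(self, data: bytes) -> None:
        self._buffer += data

    def read(self, size: int) -> bytes:
        chunk = bytes(self._buffer[self._pos:self._pos + size])
        self._pos += len(chunk)
        return chunk


class FileTransport(Transport):
    """Stores the bytes in a file on disk."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._pos = 0

    def write(self, data: bytes) -> None:
        with open(self._path, "ab") as f:
            f.write(data)

    def read(self, size: int) -> bytes:
        with open(self._path, "rb") as f:
            f.seek(self._pos)
            chunk = f.read(size)
        self._pos += len(chunk)
        return chunk


def round_trip(transport: Transport, payload: bytes) -> bytes:
    """Application code: it sees only the Transport interface."""
    transport.write(payload)
    return transport.read(len(payload))


def demo() -> tuple:
    """Run the same application code against both transports."""
    path = os.path.join(tempfile.mkdtemp(), "transport.bin")
    return (round_trip(MemoryTransport(), b"hello"),
            round_trip(FileTransport(path), b"hello"))
```

A socket-backed class with the same two methods could be added without touching
round_trip.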
Protocol:
To transport the data, it has to be encoded in a particular format, such as
binary or XML. The transport layer therefore uses a protocol to encode and
decode the data. Again, the application developer need not be concerned with
this. The developer only needs to know that the data can be read and written in
some deterministic manner.
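The sketch below shows what a simple binary encoding of this kind might look
like in Python: a 32-bit integer is written as four big-endian bytes, and a
string is written as its length followed by its UTF-8 bytes. The function names
are hypothetical and the layout is an assumption chosen for illustration, not a
specification of any Thrift protocol.

```python
import io
import struct


def write_i32(out: io.BytesIO, value: int) -> None:
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    out.write(struct.pack(">i", value))


def read_i32(inp: io.BytesIO) -> int:
    """Decode a signed 32-bit integer from the next 4 bytes."""
    return struct.unpack(">i", inp.read(4))[0]


def write_string(out: io.BytesIO, text: str) -> None:
    """Encode a string as a 4-byte length followed by its UTF-8 bytes."""
    data = text.encode("utf-8")
    write_i32(out, len(data))
    out.write(data)


def read_string(inp: io.BytesIO) -> str:
    """Decode a length-prefixed UTF-8 string."""
    length = read_i32(inp)
    return inp.read(length).decode("utf-8")
```

Because every value is written and read in a fixed, agreed order and layout,
the reader recovers exactly what the writer sent, which is the deterministic
behaviour the application developer relies on.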
Versioning:
For services to be robust, they must be able to evolve from their present
version and incorporate new features. To allow this, the data types involved in
the service should provide a mechanism to add or remove fields in an object, or
alter the argument list of a function, without any interruption in service.
This is called versioning.
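One way to get this property is to tag every field on the wire with a numeric
identifier, so that a reader can skip fields it does not recognize. The Python
sketch below illustrates the idea under simplifying assumptions: each field is
written as (field id, payload length, payload), and field id 0 marks the end.
The function names and this exact layout are invented for the example; Thrift's
own field headers carry a type as well as an id.

```python
import io
import struct


def encode_fields(fields: dict) -> bytes:
    """Encode {field_id: payload bytes} as repeated (id, length, payload) records."""
    out = io.BytesIO()
    for field_id, payload in fields.items():
        out.write(struct.pack(">hi", field_id, len(payload)))
        out.write(payload)
    out.write(struct.pack(">h", 0))  # field id 0 marks the end of the object
    return out.getvalue()


def decode_fields(data: bytes, known_ids: set) -> dict:
    """Decode the records, skipping any field id the reader does not know."""
    inp = io.BytesIO(data)
    result = {}
    while True:
        (field_id,) = struct.unpack(">h", inp.read(2))
        if field_id == 0:
            break
        (length,) = struct.unpack(">i", inp.read(4))
        payload = inp.read(length)  # always consumed, so unknown fields are skipped
        if field_id in known_ids:
            result[field_id] = payload
    return result
```

With this scheme, an old reader that knows only fields 1 and 2 can still decode
a message from a newer writer that added field 3, and a new reader simply finds
field 3 absent in messages from an old writer.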
Processors:
Processors process the data streams and carry out remote procedure calls.
Thrift allows programmers to develop entirely with the native data types of
their language rather than with wrapper objects or special dynamic types. It
also does not require the developer to write any serialization code for
transport. The developer logically annotates the data structures in the Thrift
Interface Definition Language (IDL) file with the minimal amount of extra
information necessary to tell the code generator how to safely transport the
objects across languages.
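The role of a processor can be sketched in a few lines of Python. In this
hypothetical example, a Processor reads one call from an input stream, looks up
the matching method on a handler object written by the application developer,
and writes the reply to an output stream. JSON is used here purely as a
stand-in encoding to keep the example short, and every name is an assumption
made for illustration rather than part of Thrift's API.

```python
import io
import json


class CalculatorHandler:
    """Hypothetical service implementation written by the application developer."""

    def add(self, a: int, b: int) -> int:
        return a + b


class Processor:
    """Reads one call from the input stream, invokes the handler, writes the reply."""

    def __init__(self, handler) -> None:
        self._handler = handler

    def process(self, inp: io.BytesIO, out: io.BytesIO) -> None:
        request = json.loads(inp.readline())
        method = getattr(self._handler, request["name"], None)
        if method is None:
            reply = {"error": "unknown function " + request["name"]}
        else:
            reply = {"result": method(*request["args"])}
        out.write(json.dumps(reply).encode("utf-8") + b"\n")


def call(processor: Processor, name: str, *args):
    """Simulate a client: encode the call, run the processor, decode the reply."""
    request = json.dumps({"name": name, "args": list(args)}).encode("utf-8") + b"\n"
    inp, out = io.BytesIO(request), io.BytesIO()
    processor.process(inp, out)
    return json.loads(out.getvalue())
```

The handler contains only application logic; everything about reading,
dispatching and replying lives in the processor.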
Structs:
A Thrift struct defines a common object to be used across languages. A struct
is essentially similar to a class in object-oriented programming languages. A
Thrift struct has a set of strongly typed, named fields. The basic syntax is
very similar to that of structs in C. The fields may be annotated with integer
field identifiers, unique within the scope of the struct, and with optional
default values. Field identifiers can be omitted; they were introduced strictly
for versioning purposes.
A Thrift struct looks like this:
struct Example
{
  1: i32 number = 10,
  2: i64 bignumber,
  3: double decimals,
  4: string name = "NB"
}
As you can see, the fields inside the Thrift struct are labeled with unique
field identifiers.
Facebook Thrift Services
Thrift has been employed in a large number of
applications at Facebook, including search, logging, mobile, ads and the
developer platform. Two specific usages are discussed below.
Search
Thrift is used as the underlying protocol and
transport layer for the Facebook Search service. The multi-language code
generation is well suited for search because it allows for application
development in an efficient server side language (C++) and allows the Facebook
PHP-based web application to make calls to the search service using Thrift PHP
libraries. There is also a large variety of search stats, deployment and
testing functionality that is built on top of generated Python code.
Additionally, the Thrift log file format is used as a redo log for providing
real-time search index updates. Thrift has allowed the search team to leverage
each language for its strengths and to develop code at a rapid pace.
Logging
The Thrift TFileTransport functionality is used for
structured logging. Each service function definition along with its parameters
can be considered to be a structured log entry identified by the function name.
This log can then be used for a variety of purposes, including online and
offline processing, stats aggregation and as a redo log.
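The idea of treating each function call as a structured log entry can be
illustrated with a short Python sketch. Here every call is appended to a file
as its function name plus parameters, and the file can later be replayed in
order like a redo log. The JSON-lines format and all names below are
assumptions made for this example; they do not describe the actual
TFileTransport file format.

```python
import json
import os
import tempfile


def log_call(path: str, name: str, **params) -> None:
    """Append one function call as a structured log entry."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"name": name, "params": params}) + "\n")


def replay(path: str, handlers: dict) -> None:
    """Re-apply every logged call in order, like a redo log."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            handlers[entry["name"]](**entry["params"])


def demo() -> dict:
    """Log three hypothetical index updates, then rebuild the index by replaying them."""
    index = {}

    def add_document(doc_id: str, text: str) -> None:
        index[doc_id] = text

    def remove_document(doc_id: str) -> None:
        index.pop(doc_id, None)

    path = os.path.join(tempfile.mkdtemp(), "calls.log")
    log_call(path, "add_document", doc_id="a", text="hello")
    log_call(path, "add_document", doc_id="b", text="world")
    log_call(path, "remove_document", doc_id="a")

    replay(path, {"add_document": add_document, "remove_document": remove_document})
    return index
```

The same log file could equally be scanned offline to count calls per function
name for stats aggregation.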
Thrift has enabled Facebook to build scalable backend services efficiently by
enabling engineers to divide and conquer. Application developers can focus on
application code without worrying about the sockets layer. Duplicated work is
avoided by writing buffering and I/O logic in one place, rather than
interspersing it in each application. Thrift has been employed in a wide
variety of applications at Facebook, including search, logging, mobile, ads,
and the developer platform. Facebook has found that the marginal performance
cost incurred by an extra layer of software abstraction is far eclipsed by the
gains in developer efficiency and systems reliability. Finally, Thrift has been
contributed to the Apache Software Foundation as the Apache Thrift project,
making it an open source framework for cross-language service implementation.