A primary challenge of parallel programming is to find
better abstractions for expressing parallel computation and for writing
parallel programs. Parallel programming encompasses all of the difficulties of
sequential programming, but also introduces the hard problem of coordinating
interactions among concurrently executing tasks. Today, most parallel programs
employ low-level programming constructs that are just a thin veneer over the
underlying hardware. These constructs consist of threads, which are an abstract
processor, and explicit synchronization (for example, locks, semaphores, and
monitors) to coordinate thread execution. Parallel programs written with these
constructs are difficult to design, program, debug and maintain. Transactional
Memory was created to simplify parallel programming and relieve software
developers from the difficulties associated with lockbased parallel
programming. With TM, programmers simply mark code segments as transactions
that should execute atomically and in isolation with respect to other code, and
the TM system manages the concurrency control for them. All TM systems use
either hardware-based or software-based approaches to implement the two basic
TM mechanisms: data versioning and conflict detection.
What is Transaction Memory?
A transaction is a form of program execution borrowed
from the database community. Concurrent queries conflict when they read and
write an item in a database, and a conflict can produce an erroneous result
that could not arise from a sequential execution of the queries. Transactions
ensure that all queries produce the same result as if 5 they executed serially
(a property known as “serializability”). Decomposing the semantics of a
transaction yields four requirements, usually called the “ACID” properties—atomicity,
consistency, isolation, and durability. TM provides lightweight transactions
for threads running in a shared address space. TM ensures the atomicity and
isolation of concurrently executing tasks.
In general, TM does not provide consistency or
durability. Atomicity ensures program state changes effected by code executing
in a transaction are indivisible from the perspective of other, concurrently
executing transactions. In other words, although the code in a transaction may
modify individual variables through a series of assignments, another
computation can only observe the program state immediately before or
immediately after the transaction executes. Isolation ensures that concurrently
executing tasks cannot affect the result of a transaction, so a transaction
produces the same answer as when no other task was executing. Transactions
provide a basis to construct parallel abstractions, which are building blocks
that can be combined without knowledge of their internal details, much as
procedures and objects provide composable abstractions for sequential code.
DYNAMIC STM:
Herlihy et al.’s Dynamic STM (DSTM) was the first STM
system that did not require a program to declare the memory locations accessed
by a transaction. DSTM is an object-granularity, deferred-update STM system,
which means that a transaction modifies a private copy of an object and only
makes its changes visible to other transactions when it commits. The
transaction exclusively accesses the copy without synchronization. However,
another transaction can access the original, underlying object while the first
transaction is still running, which causes a logical conflict that the STM
system detects and resolves by aborting one of the two transactions.
An STM system can detect a conflict when a transaction
first accesses an object (early detection) or when the transaction attempts to
commit (late detection). Both approaches yield the same results, but may
perform differently and, unfortunately, neither is consistently superior. Early
detection prevents a transaction from performing unnecessary computation that a
subsequent abort will discard. Late detection can avoid unnecessary aborts, as
when the conflicting transaction itself aborts because of a conflict with a
third transaction. Another complication is a conflict between a transaction
that only reads an object and another that modifies the object. Since reads are
more common than writes, STM systems only clone objects that are modified. To
reduce overhead, a transaction tracks the objects it reads and, before it
commits, ensures that no other transaction modified them.
DSTM is a library. An object manipulated in a
transaction is first registered with the DSTM system, which returns a TMObject
wrapper for the object (as illustrated in the accompanying figure).
Subsequently, the code executing the transaction can open the TMObject for
read-only or read write access, which returns a pointer to the original or
cloned object, respectively. Either way, the transaction manipulates the object
directly, without further synchronization. A transaction ends when the program
attempts to commit the transaction’s changes. If the transaction successfully
commits, the DSTM system atomically replaces, for all modified objects, the old
object in a Locator structure with its modified version.
A transaction T can commit successfully if it meets
two conditions. The first is that no concurrently executing transaction
modified an object read by T. DSTM tracks the objects a transaction opened for
reading and validates the entries in this read set when the transaction
attempts to commit. An object in the read set is valid if its version is
unchanged since transaction T first opened it. DSTM also validates the read set
every time it opens an object, to avoid allowing a transaction to continue
executing in an erroneous program state in which some objects changed after the
transaction started execution.
The second condition is that transaction T is not
modifying an object that another transaction is also modifying. DSTM prevents
this type of conflict by only allowing one 16 transaction to open an object for
modification. When a write-write conflict occurs, DSTM aborts one of the two
conflicting transactions and allows the other to proceed. DSTM rolls the
aborted transaction back to its initial state and then allow it to reexecute.
The policy used to select which transaction to abort can affect system
performance, including liveness, but it should have no effect on the semantics
of the STM system.
The performance of DSTM, like other STM systems,
depends on the details of the workload. In general, the large overheads of STM
systems are more expensive than locking on a small number of processors.
However, as the number of processors increases, so does the contention for a
lock and the cost of locking. When this occurs and conflicts are rare, STMs have
been shown to outperform locks on small benchmarks.
Transactional Memory Today:
Before everybody gets too excited about the prospects
of TM, we should remember that it is still very much a topic of research. First
implementations are becoming available, but we still have much to learn. The
VELOX project (http://www.velox-project.eu/), for example, has as its goal a
comprehensive analysis of all the places in an operating system where TM
technology can be used. This extends from lock-free data structures in the
operating-system kernel to high-level uses in the application server.
The analysis includes TM with and without hardware
support. The VELOX project will also research the most useful semantics of the
TM primitives that should be added to higher-level programming languages. In
the previous example it was a simple tm_atomic keyword. This does not
necessarily have to be the correct form; nor do the semantics described need to
be optimal. A number of self-contained STM implementations are available today.
One possible choice for people to get experience with is TinySTM
(http://tinystm.org). It provides all the primitives needed for TM while being
portable, small, and depending on only a few services, which are available on
the host system.
Based on TinySTM and similar implementations, we will
soon see language extensions such as tm_atomic appear in compilers. Several
proprietary compilers have support, and the first patches for the GNU compilers
are also available (http://www.hipeac.net/node/2419). With these changes it
will be possible to collect experience with the use of TM in real-world
situations to find solutions to the remaining issues.
Conclusion
Transactional memory by itself is unlikely to make
Multicore computers readily programmable. Many other improvements to
programming languages, tools, runtime systems, and computer architecture are
also necessary. TM, however, does provide a time tested model for isolating
concurrent computations from each other. This model raises the level of
abstraction for reasoning about concurrent tasks and helps avoid many insidious
parallel programming errors. However, many aspects of the semantics and
implementation of TM are still the subject of active research. If these
difficulties can be resolved in a timely fashion, TM will likely become a
central pillar of parallel programming.
Comments
Post a Comment