Dryad and DryadLINQ

  • Version: 1.0.1411.2 Academic
  • Publisher: www.microsoft.com
  • File Size: 28.62 MB
  • Date: Mar 04, 2010
  • License: Freeware

Dryad is a high-performance, general-purpose distributed computing engine that simplifies the task of implementing distributed applications on clusters of computers running a Windows operating system. DryadLINQ allows developers to implement Dryad applications in managed code by using an extended version of the LINQ programming model and API. This paper is a general introduction to Dryad and DryadLINQ. Although Dryad can run on top of various cluster technologies, this paper is limited to Dryad on a Windows HPC Server 2008 cluster.

Distributed computing is an increasingly important part of software development. At the single-system level, manufacturers are improving performance by adding increasing numbers of CPU cores to the motherboard, and several technologies take advantage of the parallel processing abilities of modern GPUs. At the other end of the spectrum, large-scale Internet services use parallel processing technology to distribute computations involving petabytes of data across clusters of thousands of computers.
A distributed application must be implemented so that different parts can execute concurrently on different processors. Depending on the application, the different processors could be on the same motherboard or on multiple computers organized in a cluster. In general, dividing an application into concurrently executing parts is a difficult problem. There are two basic approaches:
* Task-parallel computing is a variant of multi-threaded computing that assigns different tasks to different processors.
* Data-parallel computing distributes the data for a task across the available processors and operates on the data concurrently.
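
The difference between the two approaches can be illustrated with a minimal Python sketch. This is not Dryad code: the function names (`partition`, `data_parallel_sum_of_squares`, `task_parallel_min_max`) are hypothetical, and a thread pool on one machine stands in for the processors of a cluster. Python threads do not give true CPU parallelism, so the sketch shows only the structure of each approach, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The single operation that every worker applies to its own chunk."""
    return sum(x * x for x in chunk)


def data_parallel_sum_of_squares(data, workers=4):
    """Data-parallel: one operation, with the data spread across workers."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # Combine the per-chunk results into the final answer.
    return sum(partials)


def task_parallel_min_max(data):
    """Task-parallel: different operations run concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lowest = pool.submit(min, data)
        highest = pool.submit(max, data)
        return lowest.result(), highest.result()
```

In the data-parallel function the amount of concurrency grows with the number of chunks, which is why this style suits large data sets; in the task-parallel function it is fixed by the number of distinct tasks.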

The Dryad distributed computing engine from Microsoft is designed to support data-parallel computing for use in many important types of applications, including a variety of data-mining applications, image and stream processing, and some scientific computations. For these applications, distributing the data and the computations across a cluster is an effective way to process large volumes of data.
However, data-parallel computing on a cluster of computers poses a number of challenges. For example:
* Cluster-based computations must manage thousands of processes and allocate resources across thousands of individual computers.
* Members of the cluster are commodity computers, some of which can be expected to fail during the course of the computation.
* Programming models that most developers are familiar with, and that have the best tools and documentation, are designed for applications that run sequentially on a single computer, not as distributed applications on a cluster.

Dryad is a high-performance general-purpose distributed computing engine for running distributed applications on various cluster technologies, including Windows HPC Server 2008. Dryad simplifies the task of implementing distributed applications by addressing the issues in the preceding list, as follows:
* The Dryad engine handles some of the most difficult aspects of large-scale distributed applications, including delivering data to the appropriate location, resource scheduling, optimization, and failure detection and recovery.
* Dryad supports an expressive programming model that is designed to support cluster-based computation.
* Dryad can handle large-scale data-parallel computations.
Some groups at Microsoft routinely use Dryad applications to process petabytes of data on clusters of several thousand computers. However, Dryad can be used effectively even on clusters of only a few computers.
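
To make the idea of failure detection and recovery concrete, here is a minimal Python sketch of one common strategy: re-running the work for a partition when the attempt fails. It is an illustration only and makes no claim about how Dryad implements recovery internally. The names `run_partition_with_retry` and `make_flaky_sum` are hypothetical, and a `RuntimeError` stands in for a failed cluster node.

```python
def run_partition_with_retry(task, chunk, max_attempts=3):
    """Run task(chunk), retrying on failure; return (result, attempts used)."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk), attempt
        except RuntimeError as err:
            # Failure detected: remember it and try the same partition again.
            last_error = err
    raise RuntimeError(
        f"partition failed after {max_attempts} attempts"
    ) from last_error


def make_flaky_sum(failures):
    """Build a sum task that fails `failures` times before succeeding."""
    remaining = {"count": failures}

    def flaky_sum(chunk):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("simulated node failure")
        return sum(chunk)

    return flaky_sum
```

Because each partition is processed independently, only the failed partition has to be repeated; the results already computed for other partitions remain valid.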

Implementing a native Dryad application is still a demanding task for most programmers. DryadLINQ, created at Microsoft Research, simplifies the development process. With DryadLINQ, developers can implement Dryad applications in managed code by using an extended version of the LINQ programming model and API. Developers use Microsoft Visual Studio to implement their applications. Much of the code in a typical DryadLINQ application is similar to that used by LINQ-to-objects applications. The DryadLINQ provider converts the LINQ queries into a Dryad job and then executes the job on a cluster.
Note: The native Dryad API has been used by researchers at Microsoft Research to implement higher-level abstractions such as DryadLINQ and SCOPE. Although this paper refers to native Dryad applications, the native Dryad API is not public. It can be accessed only indirectly, through DryadLINQ.
For more information on Dryad applications, see the "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks" whitepaper.
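
The general pattern of a query provider, in which a query is first described and only later converted into work that runs over partitioned data, can be sketched in Python. This is an analogy, not the DryadLINQ API, which is a .NET library. The class `DeferredQuery` and its methods are hypothetical, and a local thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class DeferredQuery:
    """Records query steps; nothing runs until execute() is called."""

    def __init__(self, partitions, steps=()):
        self._partitions = partitions
        self._steps = tuple(steps)

    def where(self, predicate):
        """Return a new query that also filters records by `predicate`."""
        step = ("where", predicate)
        return DeferredQuery(self._partitions, self._steps + (step,))

    def select(self, projection):
        """Return a new query that also transforms records by `projection`."""
        step = ("select", projection)
        return DeferredQuery(self._partitions, self._steps + (step,))

    def _run_on_partition(self, records):
        # Apply every recorded step, in order, to one partition.
        for kind, fn in self._steps:
            if kind == "where":
                records = [r for r in records if fn(r)]
            else:
                records = [fn(r) for r in records]
        return records

    def execute(self):
        """Run the recorded steps on all partitions and merge the results."""
        with ThreadPoolExecutor() as pool:
            results = pool.map(self._run_on_partition, self._partitions)
            return [record for part in results for record in part]
```

A query such as `DeferredQuery([[1, 2, 3], [4, 5, 6]]).where(lambda x: x % 2 == 0).select(lambda x: x * 10)` only builds a description of the work. Calling `execute()` on it processes each partition separately and merges the results in partition order.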
This paper is a general introduction to Dryad and DryadLINQ for the current public release of Dryad, which runs on Windows HPC Server 2008. For more information, see these topics in the DryadLINQ documentation package:
* "DryadLINQ Installation and Configuration Guide"
* "DryadLINQ Programming Guide," which provides a detailed guide with code samples

For links to these papers, and other related information, see "Resources" at the end of this paper.
Note: This paper assumes that you are familiar with the basics of LINQ programming, and so focuses on how to use the DryadLINQ extensions to LINQ. If you are new to LINQ programming, see the "LINQ: .NET Language Integrated Query" or "Language-Integrated Query (LINQ)" topics in the MSDN library, as listed in "Resources" later in this paper. There are also numerous books on LINQ available from a variety of publishers.

Please Note: Non-Commercial Use Only. Dryad/DryadLINQ (Academic Use)
