With the ever-growing volume of data in the digital world, and with hardware becoming cheaper and more capable, solving complex problems across the sciences with computational methods and software is becoming easier and faster by the day.
Fields such as parallel computing, high performance computing (HPC), grid computing, and cloud computing emerged to help solve such problems. Parallel programming, and running a single program simultaneously on several computers, is a gateway to these technologies.
Below is a list of programming languages and environments that can be used to program for such settings. Through a set of built-in constructs, they let programmers turn an algorithm, once it has been redesigned with parallel execution in mind, into a working application. MPI is one of the best-known and most widely used frameworks for writing and running parallel programs.
The following is a rough draft of a list of the most important parallel programming environments.
ACE (Adaptive Communication Environment): This C++ threads environment is portable between UNIX and Win32 platforms. It integrates into the threads a range of IPC mechanisms such as RPC, sockets, and System V IPC. The overall driving force behind ACE is its use of many core design patterns for concurrent communication software. ACE provides a rich set of reusable C++ wrappers and framework components that perform common communication software tasks across a range of operating system platforms.
BSP (Bulk Synchronous Parallel model): The developers plan to support NT, but I don't know when such a system will be available. Currently, it runs on a number of systems including PCs under Linux.
Calypso: An implementation of the BSP model (though the authors don't describe it this way). A Calypso program is an SPMD program that consists of master and worker processes that are distributed about a network of UNIX or NT workstations (homogeneous OS only). The master process takes care of sequential operations and serves as a task and memory server for the worker processes. Processes dynamically participate in computing parallel sections. The system provides dynamic load balancing and a degree of fault tolerance.
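The master/worker arrangement with dynamic load balancing described for Calypso can be sketched as a simple task farm. This is only an illustration using Python threads, not Calypso's actual interface: the function name `run_task_farm` and the squaring "work" are hypothetical, and fault tolerance is not modeled.

```python
import queue
import threading


def run_task_farm(tasks, num_workers=4):
    """Master fills a task queue; idle workers pull the next task on demand."""
    task_queue = queue.Queue()
    for index, task in enumerate(tasks):
        task_queue.put((index, task))

    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, value = task_queue.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker retires
            # Stand-in for real work (hypothetical): square the value.
            results[index] = value * value

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


squares = run_task_farm(range(10))
```

Because each worker asks for a new task only when it finishes the previous one, faster workers naturally take on more tasks, which is the essence of dynamic load balancing.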
CC++ (Compositional C++, Caltech): Parallel programming language based on C++. Doesn't support NT, but it does support Solaris (and hence it is not unreasonable to get it to work on a PC).
Charm (Charm/Charm++, University of Illinois at Urbana-Champaign): The Converse tools: the CHARM message driven Programming environment.
Chameleon: A message-passing library from Bill Gropp at Argonne National Laboratory. Chameleon is a low-level stable interface to p4, PICL, PVM, and vendor-specific message-passing environments. Chameleon can be used in a development mode where it provides a wide variety of debugging information or in production mode where it emphasizes parallel efficiency.
Cilk: Efficient execution of multithreaded computations. Cilk is an algorithmic multithreaded language. The philosophy behind Cilk is that programmers should concentrate on structuring their programs to expose parallelism and exploit locality, leaving the runtime system with the responsibility of scheduling the computation to run efficiently on a given platform. Thus, the Cilk runtime system takes care of details like load balancing, paging, and communication protocols. Unlike other multithreaded languages, however, Cilk is algorithmic in that the runtime system guarantees efficient and predictable performance. Cilk runs on many different SMP systems including PCs running Linux.
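The programming style Cilk encourages, where the programmer exposes parallelism and leaves scheduling to the runtime, is usually expressed with its spawn and sync constructs. The sketch below imitates that fork-join structure with Python threads. It shows only the shape of such a program: Python threads provide none of Cilk's scheduling or performance guarantees, and the `depth` cutoff is an assumption added here to limit thread creation.

```python
import threading


def parallel_fib(n, depth=3):
    """Fork-join Fibonacci: 'spawn' one branch, compute the other, then 'sync'."""
    if n < 2:
        return n
    if depth == 0:
        # Below the cutoff, fall back to plain serial recursion.
        return parallel_fib(n - 1, 0) + parallel_fib(n - 2, 0)

    box = {}

    def child():
        box["left"] = parallel_fib(n - 1, depth - 1)

    t = threading.Thread(target=child)
    t.start()  # analogous to spawning the first recursive call
    right = parallel_fib(n - 2, depth - 1)
    t.join()   # analogous to sync: wait for the spawned child
    return box["left"] + right
```

The serial algorithm is unchanged; the only additions are the point where a child computation is forked and the point where its result is awaited.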
CM-Fortran: The Fortran environment on the Connection Machine. Implements a strict data parallel programming model.
Code: Visual parallel programming system.
Concurrent C: Concurrent C [Gehani85] extends the CSP model used in Occam to provide a more general (and complex) concurrent programming model. Concurrent C uses a synchronous message-passing scheme to facilitate parallelism, with processes proceeding independently except when communication takes place. Concurrent C introduces a fairly complex set of new operators and program structures to support asynchronous parallel programming. Through the use of transaction pointers (a transaction is a structured two-way communication mechanism) and process variables, Concurrent C processes are able to communicate directly with other processes, regardless of the physical position of the processes in the system. Concurrent C maps well onto distributed memory machines. (Actually, Concurrent C can be implemented on a shared memory system, but constructs for managing shared memory were deliberately left out of the language to ensure portability.)
Crystal: Marina Chen's functional programming language [Szymanski]. This language is based on familiar mathematical notation and lambda calculus.
Data Parallel C: Quinn and Hatcher's MIMD version of TMC's C* [Hatcher91].
DOLIB: Distributed Object Library [D'Azevedo94a] provides a distributed array much like GA. DOLIB is more general in that its fundamental data type is a one-dimensional array. Paging is used to try to automatically provide higher performance. It is also integrated with an I/O library called DONIO [D'Azevedo94b]. The system was used to create a scalable molecular dynamics program [D'Azevedo94c]. GA and DOLIB are compared and contrasted in [Mattson95a].
Fortran D: The experimental compiler system from Ken Kennedy's group at Rice University. The syntax and programming model of Fortran D are essentially the same as for HPF.
Fortran M: Fortran M from Argonne National Laboratory. Works w/on: Heterogeneous computers. Languages: Fortran. A small set of extensions to Fortran 77 that supports development of modular, deterministic message-passing programs. [Foster93b]
GA: GA or Global Arrays [Nieplocha94] from Pacific Northwest National Laboratory. Shared-memory programming interface for distributed-memory computers. GA provides a library interface to a distributed two-dimensional array data type. GA is compared to DOLIB and NX in [Mattson95a].
GLU: Granular Lucid. Programming system for constructing parallel and distributed applications. Runs on a number of systems including PCs running Linux.
Haskell: A functional programming language [Hudak92]. Haskell is a higher-order functional language with a rich polymorphic type system and non-strict semantics (i.e., "lazy evaluation"). Its author describes Haskell as a para-functional language to convey that it is an extension of a pure functional language and that it includes constructs to represent explicit parallelism. It uses a meta-language to keep the functional semantics (what is computed) distinct from the operational semantics (how the computation is carried out). "para-Haskell" supports two kinds of parallel annotations: scheduling constraints and mapping expressions.
HPC++ (High Performance C++): A standard model for parallel programming using C++. A programming environment from Dennis Gannon's group at Indiana University. It combines his past system pC++ and the Caltech system CC++. It supports a basic data-parallel approach, thereby providing compatible interaction with HPF distribution directives.
HPF: High Performance Fortran, from the HPF Forum. HPF is a data parallel dialect of Fortran 90. Extensions have recently emerged to support task-level parallelism, but the core of the language and its historical roots are with data-parallel programming. Most of HPF is directives and language constructs to partition and distribute arrays among the nodes of a parallel computer. HPF has not been very successful to date since algorithms that are not strictly data-parallel are hard to implement with HPF.
JADA: JADA is a Linda-like system that mixes Linda with Java. Multiple tuple spaces are supported. These can be local (for coordinating between threads) or remote (for coordinating between distinct applets potentially distributed over the WWW). JADA was created as part of the PageSpace (an ESPRIT funded project).
Java Threads.
Legion: Legion is a metasystems project at the University of Virginia. It provides the illusion of a single virtual machine to users; a virtual machine that provides secure shared object and shared name spaces, application-adjustable fault tolerance, improved response time, and greater throughput. The physical systems can be supercomputers, workstations, PCs, or even nontraditional computing devices.
Linda: The best-known coordination language is Linda [Carriero91]. In Linda, coordination takes place through a small set of operations that manipulate objects within a distinct shared memory. The shared memory supports algorithms that use high-level constructs such as distributed data structures and anonymous communication (i.e., the sender and receiver don't know the identity of one another). The commercial providers of Linda (Scientific Computing Associates) provide Fortran and C versions of the system. Also see JADA, WWWinda, ISETL-Linda, ParLin, Eilean, P4-Linda, Glenda, POSYBL, and Objective-Linda.
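To make the idea of coordinating through a shared tuple memory concrete, here is a minimal in-process sketch. It assumes the classic Linda operation names `out`, `in`, and `rd` (`in` is spelled `in_` because it is a Python keyword), uses `None` as a wildcard in patterns, and is in no way the API of any commercial Linda product.

```python
import threading


class TupleSpace:
    """A tiny thread-safe tuple space with blocking pattern matching."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a tuple and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # None in a pattern position matches any value (an assumption here).
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def rd(self, *pattern):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            while True:
                match = self._find(pattern)
                if match is not None:
                    return match
                self._cond.wait()

    def in_(self, *pattern):
        """Block until a matching tuple exists; remove and return it."""
        with self._cond:
            while True:
                match = self._find(pattern)
                if match is not None:
                    self._tuples.remove(match)
                    return match
                self._cond.wait()


# Anonymous communication: the consumer never learns who produced the tuple.
space = TupleSpace()
received = []
consumer = threading.Thread(target=lambda: received.append(space.in_("job", None)))
consumer.start()
space.out("job", 42)
consumer.join()
```

The producer and consumer share nothing except the tuple space, which is what the entry means by anonymous communication.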
Lucid: A parallel functional language based on intensional logic. Lucid is an implicitly parallel language. Lucid permits data structures such as arrays, lists, and trees to be implemented in a manner that is easily distributable. Lucid is simple and elegant. Lucid is not committed to any particular model of computation so the writer of a Lucid compiler has considerable freedom to implement language features in a manner that cannot be interfered with from the user program. [Szymanski].
Mentat (University of Virginia): First, the good folks at UVA would probably want to add that Mentat follows a large-grain dataflow approach and not a data-parallel one. This is an outstanding and very flexible model. Mentat is an object-oriented parallel processing system designed to directly address the difficulty of developing architecture-independent parallel programs. The system consists of a runtime system, a programming language, and a monitor. Works on a cluster of Linux PCs. Also take a look at Legion.
MPI (Message-Passing Interface, Argonne National Laboratory (CRPC)): Works w/on: Chameleon. Languages: C and Fortran. Implementations include WinMPI (MPI for MS Windows 3.1) and MPICH (A Portable Implementation of MPI). Contact: William Gropp. Also available here.
Multiblock Parti: A programming environment from Joel Saltz's group at the University of Maryland.
NESL (CMU): NESL is a nested data parallel language. NESL programs are built around a data type called a "sequence". Each element of a sequence can be any of the atomic types in the language or another sequence. Parallelism enters the picture through an "apply-to-each" form over the elements of a sequence and through parallel operations on sequences. The NESL program is compiled into a stack-based intermediate vector code (VCODE). The VCODE program is run on the target hardware through the VCODE interpreter. [Hardwick96].
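The "apply-to-each" form over nested sequences can be imitated in Python as follows. This is a rough sketch of the idea only, not NESL syntax or its VCODE execution model; the helper names `apply_to_each`, `row_sums`, and `nested_squares` are hypothetical, and a thread pool stands in for the data-parallel runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_each(func, sequence):
    """Apply func to every element of a sequence, conceptually in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, sequence))


def row_sums(nested):
    # Outer apply-to-each over the sub-sequences; each sum is a reduction
    # over one inner sequence.
    return apply_to_each(sum, nested)


def nested_squares(nested):
    # Nested parallelism: an apply-to-each inside an apply-to-each.
    return apply_to_each(
        lambda inner: apply_to_each(lambda x: x * x, inner), nested
    )
```

Note that the inner sequences may have different lengths, which is what distinguishes nested data parallelism from flat array parallelism.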
NOW (Network Of Workstations): Using a network of workstations to act as a distributed supercomputer.
Occam (Oxford Univ.): Occam is one of the first languages created explicitly for parallel computing. An Occam program is a collection of processes that are composed either sequentially or in parallel. The processes interact through explicit communication channels. See also KROC (Occam for all) (Univ. of Kent) and [Pountain86] and my notes on channels in parallel programming environments.
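The Occam picture of processes composed in parallel and interacting only through explicit channels can be approximated with threads and queues. The sketch below is an assumption-laden analogy: a bounded `queue.Queue` is not a true unbuffered synchronous channel, `None` is used here as an end-of-stream marker, and all function names are hypothetical.

```python
import queue
import threading


def producer(channel, values):
    """Send each value down the channel, then signal end of stream."""
    for v in values:
        channel.put(v)
    channel.put(None)  # sentinel chosen for this sketch


def doubler(inp, out):
    """Read from one channel, write doubled values to another."""
    while True:
        v = inp.get()
        if v is None:
            out.put(None)
            return
        out.put(2 * v)


def run_pipeline(values):
    # Two channels connect three processes: producer -> doubler -> collector.
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    procs = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for p in procs:
        p.start()  # the processes now run in parallel
    results = []
    while True:
        v = b.get()
        if v is None:
            break
        results.append(v)
    for p in procs:
        p.join()
    return results
```

No process touches another's variables; everything that crosses a process boundary goes through a channel.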
OpenMP.
p4 (Portable Programs for Parallel Processors, Argonne National Laboratory): Works w/on: Heterogeneous computers. Languages: C and Fortran. Like PVM. Unlike PVM, however, monitors can be used in shared-memory systems. Also available here.
PAMS: A commercially supported programming environment from Myrias. PAMS is a compiler-driven system. The program identifies parallel loops and uses directives to make them execute in parallel.
Papers: A coordination library designed to emphasize low-latency communication. It includes synchronized aggregate communication, reduction operations, a scan operation, and some support for parallel I/O. It supports a variation of the BSP model in that communication occurs aggregately at a barrier. It breaks with BSP in that only subsets of processors must participate in the barrier. See "A parallel processing support library based on synchronized aggregate communication" in the book Languages and Compilers for Parallel Computing, edited by Huang, et al.
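The notion of communication happening in aggregate at a barrier can be illustrated with one BSP-style superstep. This sketch uses Python's `threading.Barrier` with all processes participating, so it shows the plain BSP case rather than the subset barriers that Papers allows; the function name `superstep_sum` and the shared `mailbox` list are hypothetical.

```python
import threading


def superstep_sum(local_values):
    """One superstep: post local data, synchronize, then read the aggregate."""
    n = len(local_values)
    mailbox = [None] * n   # slot r is written only by process r
    totals = [None] * n
    barrier = threading.Barrier(n)

    def process(rank):
        mailbox[rank] = local_values[rank]  # local work, then post a value
        barrier.wait()                      # all communication completes here
        totals[rank] = sum(mailbox)         # reduction over everyone's data

    threads = [threading.Thread(target=process, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

After the barrier every process holds the same reduced value, which is the behavior a reduction operation provides.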
Parmacs: The parmacs macros package [Lusk87] from Argonne National Laboratory is a coordination library specialized to shared memory systems. This is the environment used within the SPLASH project.
pC: A shared memory abstraction of message-passing from Ridg Scott's group at the University of Houston. See the comments about pFortran.
PCN (Argonne National Laboratory/California Institute of Technology): Works w/on: Homogeneous computers. Languages: C and Fortran can be incorporated. Also available here.
PETSc (Portable, Extensible Toolkit for Scientific Computation): A programming environment to support writing large-scale scientific applications. It supports C primarily but it can also be mixed with C++ and F77. PETSc comes from William Gropp's group at Argonne National Laboratory. It incorporates a variety of parallel data structures including index sets, vectors, matrices (not merely arrays, but parallel data structures), and distributed arrays. It also includes libraries of solvers, preconditioners, ODE solvers, and simple X-window graphics systems. At the simplest level, a programmer can use PETSc by creating distributed data structures and getting parallelism from the parallel libraries. I need to look into it further and see how they make the data distribution visible to the user. Typically, this is made opaque to the user (which is smart), but if one needs to write one's own parallel routines that must work with the PETSc libraries, this distribution must be visible. This is an impressive package that deserves careful study. For more information see [Curfman96]. It supports an impressive range of systems including Windows NT/95.
pFortran: pFortran is a member of the "P" family of languages (pC, pC++, and pFortran) [Bagheri91]. All of these languages provide a high-level, shared-memory abstraction for message-passing systems. pFortran programs use an SPMD model. Any node can access data on another node using an "@" notation. For example, a node can access data on node "J" as D@J.
PICL (Oak Ridge National Laboratory): Works on a variety of multiprocessors (and workstations?). Languages: C (and Fortran-to-C interface routines). Contact: Patrick H. Worley. A subroutine library that implements a generic message-passing interface for a variety of multiprocessors. It also provides time-stamped trace data, if requested.
POET: An object-oriented framework from Sandia National Laboratories in California [Armstrong96]. The POET framework views the data in terms of Cells: the smallest unit of data that POET concerns itself with. The size of the Cell must be large enough to amortize the overhead added by the framework itself. However, it needs to be small enough to support good load balancing. These Cells are distributed among the nodes of the parallel system and that distribution is documented in a partitionMAP object (which is replicated on each node and kept up to date). The partitionMAP contains information about where each cell is mapped and how cells (or parts of cells) are communicated between processors. In terms of execution, the POET framework provides an Exec component. This is a pure virtual class that has one important method: "exec(void)". This method means "do something, it's your turn". All components in the application will inherit from Exec and will override the exec(void) method. Using this approach, a programmer creates a parallel application by defining the cells and how they communicate, and then creating a nested collection of Exec objects. The framework then executes by running the topmost exec and then the other execs in the nest. Parallel algorithms are encapsulated as components in a Smalltalk-like C++ framework. POET looks at a scientific parallel program as a collection of such components linked and orchestrated by the framework. Supports PCs under Linux. This approach is very interesting and deserves further study.
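The Exec pattern described above, where every component overrides a single "it's your turn" method and components are nested, can be sketched as follows. POET itself is a C++ framework; this Python rendering only mirrors the structure, and the `Step` and `Nest` classes are hypothetical names invented for the example.

```python
from abc import ABC, abstractmethod


class Exec(ABC):
    """Base component: subclasses must say what to do when it is their turn."""

    @abstractmethod
    def exec(self):
        ...


class Step(Exec):
    """A leaf component that records that it ran (stand-in for real work)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def exec(self):
        self.log.append(self.name)


class Nest(Exec):
    """A composite component that runs its children in order."""

    def __init__(self, children):
        self.children = list(children)

    def exec(self):
        for child in self.children:
            child.exec()


# Running the topmost component drives everything nested beneath it.
log = []
app = Nest([Step("init", log), Nest([Step("compute", log), Step("exchange", log)])])
app.exec()
```

The framework only ever calls `exec` on the top-level object; the nesting determines the order in which the remaining components get their turn.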
POOMA: The Parallel Object Oriented Methods and Applications Framework from Los Alamos National Laboratory (John Reynders's group). See [Atlas96]. This is a narrowly defined framework for developing simulations of physical systems. It includes physically motivated parallel objects such as Particles and Fields as well as canonical mathematical methods which can be applied to these parallel objects (e.g. gather/scatter of Particles onto a Grid, Fourier transforms). POOMA is a layered system of objects. Each object in the Framework is composed of or utilizes objects from lower layers. Upper layers contain global data objects that are abstractions of scientific problem domains. Objects lower in the framework capture the abstractions relevant to parallelism and efficient node-level computation (e.g., communication, domain decomposition, load balancing, etc.). An important abstraction in POOMA is the virtual node (or vnode). When a distributed object is created, it distributes itself among a collection of vnodes. A map of the vnodes and which processors they are mapped to is maintained by the VnodeManager.
POSYBL (Programming System for Distributed Applications, University of Crete): A simple implementation of a Linda-like system. The system consists of a daemon that runs on every workstation in a cluster, and a C library of Linda-like operations. Works w/on: Heterogeneous computers. Languages: C. Contact: Sxoinas Ioannis.
pSather: See Sather.
Pthreads: This is the standard way to do threads in the UNIX world. This group of threads libraries includes DCE threads and Solaris threads (which are also known as UI or Unix International threads).
PVM (Parallel Virtual Machine, Oak Ridge National Laboratory): A software system that enables a collection of heterogeneous computers to be used in parallel. It includes libraries of user-callable functions and a daemon program which coordinates inter-machine activity. Works w/on: Heterogeneous computers. Languages: C and Fortran. Also available here. Bob Daniel mentioned a very fast PVM that runs on top of NT (aside: Bob works for a company called Dash that sells optimization software for PCs, including SMP boxes).
Sather: An object-oriented language with parameterized classes, object-oriented dispatch, statically-checked strong typing, separate implementation and type inheritance, multiple inheritance, garbage collection, iteration abstraction, higher-order routines and iters, exception handling, assertions, preconditions, postconditions, and class invariants. pSather is a parallel extension of Sather. It extends Sather by adding threads, synchronization, and data distribution. Unlike actor languages, multiple threads can execute in one object. It offers several synchronization mechanisms like futures, gates, mutex, reader/writer locks, barrier synchronization, rendezvous, and a disjunctive lock statement.
SDDA: Scalable Distributed Dynamic Array from the University of Texas at Austin. See [Edwards96] for more information. SDDA is a software infrastructure for developing complex dynamic data structures on distributed memory multiprocessor systems. SDDA provides all functions required to manage distributed dynamic data structures. The central idea is an index space. The index space defines a uniform global address space for an application's distributed objects. Each object is associated with a unique index into the SDDA. The object's index uniquely defines the location of the object within the distributed memory environment. An application creates, accesses, updates, and deletes objects in the SDDA via the associated SDDA index. Location of the object is transparent, and the access API is the same for both remote and local objects. SDDA uses a hashing technique over the indices to preserve locality and optimize object access. It sits on top of MPI.
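The index-space idea, where an object's index alone determines which node holds it and the same calls work for local and remote objects, can be sketched as below. The block-cyclic mapping used here is an assumption chosen because it keeps neighboring indices on the same node; the source does not say which hashing technique SDDA actually uses, and the class and method names are hypothetical.

```python
class DistributedArray:
    """Toy index space: each 'node' is a local dict; the index picks the node."""

    def __init__(self, num_nodes, block_size=4):
        self.num_nodes = num_nodes
        self.block_size = block_size
        self.nodes = [dict() for _ in range(num_nodes)]  # per-node storage

    def owner(self, index):
        # Assumed mapping: consecutive indices share a block, and blocks are
        # dealt out to nodes in round-robin order.
        return (index // self.block_size) % self.num_nodes

    def put(self, index, obj):
        """Create or update the object stored at this index."""
        self.nodes[self.owner(index)][index] = obj

    def get(self, index):
        """Fetch the object at this index, wherever it lives."""
        return self.nodes[self.owner(index)][index]

    def delete(self, index):
        """Remove the object at this index."""
        del self.nodes[self.owner(index)][index]
```

Callers never name a node: `put`, `get`, and `delete` take only an index, which is what makes object location transparent.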
Sisal: Sisal [Feo90] is a functional programming language. It has been heavily used in shared-memory environments. There has been work to move it to distributed-memory environments, but this work hasn't led to a robust distributed-memory implementation. [Cann92]. Sisal extracts parallelism from a program using a data-dependence analysis. The language has no explicit parallel constructs. Sisal guarantees repeatable results in a multiprocessor environment.
Split-C: Parallel extension to C with global address space for distributed-memory multiprocessors. Split-C is an extension of the C programming language offering a global address space. It assumes a single program multiple data (SPMD) model in which each of the CPUs has a single thread of control and the memory model is a two-dimensional array, where one dimension is the set of CPUs and the other dimension is each processor's local address space. Accesses to memory locations on a remote node are compiled to code fetching from or putting data to that remote processor. Split-C allows one to overlap communication and computation by using split-phase operations (called gets, puts, and stores). Split-C is available on a variety of supercomputers, including the TMC CM-5, the Meiko CS-2, Intel Paragon, and IBM SP-2 machines, as well as for networks of workstations. Most implementations, including our SCI implementation, are based on Active Messages.
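The two-dimensional memory model (processor by local address) and the overlap of communication with computation through split-phase operations can be imitated as follows. This is a loose analogy built on Python futures, not Split-C syntax: the class `GlobalMemory`, its methods, and `overlapped_sum` are hypothetical, and a thread pool stands in for the network.

```python
from concurrent.futures import ThreadPoolExecutor


class GlobalMemory:
    """Global address space viewed as mem[processor][local_offset]."""

    def __init__(self, num_procs, local_size):
        self.mem = [[0] * local_size for _ in range(num_procs)]
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get(self, proc, offset):
        """Start a remote read and return a handle immediately."""
        return self._pool.submit(lambda: self.mem[proc][offset])

    def put(self, proc, offset, value):
        """Start a remote write and return a handle immediately."""
        def store():
            self.mem[proc][offset] = value
        return self._pool.submit(store)

    def close(self):
        self._pool.shutdown()


def overlapped_sum(gm, proc, offset, local_values):
    pending = gm.get(proc, offset)   # phase 1: initiate the transfer
    local = sum(local_values)        # useful local work overlaps the transfer
    return local + pending.result()  # phase 2: wait for the data to arrive


gm = GlobalMemory(num_procs=2, local_size=4)
gm.put(1, 2, 7).result()             # write 7 into processor 1, offset 2
total = overlapped_sum(gm, 1, 2, [1, 2, 3])
gm.close()
```

Splitting each access into an initiation and a later completion is what lets the local summation proceed while the remote value is still in flight.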
SR (Synchronizing Resources): Concurrent programming language. The SR language is a public-domain language that runs on Unix multiprocessor machines and on workstations connected over a LAN. It appears to be a whole new language as opposed to a language extension. See [Olsson92] and [Andrews93].
Sthreads: A threads-based programming environment from Caltech. The system consists of a pragma and a collection of synchronization primitives. If used according to some narrowly defined rules, an Sthreads program is guaranteed to execute and to produce the same result in sequential and multi-threaded modes. The underlying library used to implement the pragmas is also provided with Sthreads and made visible to the user.
Strand: Strand is a parallel language based on concurrent logic programming [Foster90]. It is very similar to flat concurrent Parlog. A discussion of its use in scientific programming can be found in [Mattson90]. The language was developed for commercial distribution, but it is currently freely available.
TCGMSG (Theoretical Chemistry Group Message Passing System, Argonne National Laboratory): Works w/on: Heterogeneous computers. Languages: C and Fortran. Contact: Robert J. Harrison. Like PVM and p4. PARMACS inspired the independent implementation of TCGMSG, a much simpler but much more robust package ... at that time the authors of PARMACS were not interested in doing more work in that area. p4 was subsequently and independently written largely by two of the original authors of PARMACS. The current version (4.05) of the TCGMSG message-passing library is part of the Global Arrays toolkit distribution (which provides a shared-memory programming model for most major parallel architectures). It is located on an anonymous ftp server: ftp.pnl.gov (192.35.193.200), file: /pub/global/global1.2.tar.Z. TCGMSG will be upgraded soon to version 5, which will include asynchronous communication for networks of workstations and ports to the SGI Power Challenge and Cray T3D.
Threads.h++: A commercially supported product from Rogue Wave Software. It provides a moderately high-level interface for writing portable multithreaded programs. It includes basic synchronization primitives (monitors, mutexes, etc.), futures (which they call IOUs), thread creation, and a slick way to easily take procedures and turn them into threads. It's a large, rather complete, and impressive package. While higher level than NT or Java threads, it still may be too high level for our needs.
TreadMarks: A user-level software-based Distributed Shared Memory system [Amza96]. Provides a global name space on top of physically distributed memory. Synchronization is managed with barriers and mutex locks. Shared data resides in Fortran Common blocks. TreadMarks uses a relaxed consistency model for the shared memory. TreadMarks is commercially supported [Amza96]. It supports networks of PCs (currently only Unix environments, but soon on NT as well). Interfaces exist for C, C++, Fortran, and Java.
Vienna Fortran (VFCS): A data-parallel Fortran dialect that had a major impact on the formation of HPF. It is from the Zima group in Vienna.
Win32 threads.
WinPar: A message-passing (MPI and PVM) based environment to support parallel computing on Intel Architecture workstations. NT is the platform of choice for WINPAR. WINPAR is an integrated software development environment for parallel computing targeting personal computers interconnected by local area networks running Windows NT. The technical objectives of WINPAR are to provide a message-passing layer including MPI and PVM, to provide a framework of basic functionality needed for parallel computing and to provide a set of tools for code development, simulation, performance prediction, graphical high-level debugging, monitoring, and visualization of parallel applications. The commercial objectives of WINPAR are to offer an affordable parallel development environment for training and education at universities, research organizations, and industry, to be compliant to existing standards in the HPCN market, and thus to extend this currently only UNIX- and MPP-based market to networked Windows NT computers. The WINPAR environment will be developed using existing state-of-the-art tools with easy-to-use graphical user interfaces which are already available for UNIX. These tools, including AUGUR, MOD ARCH, ParadiseC++, TRAPPER, WPVM, and WMPI, will be enhanced, integrated, and ported to Windows NT. Some of these tools contain large modules dealing with user interaction and visualization. As the experience with previous ports from UNIX to Windows NT showed, it is often quicker to completely re-engineer the graphical user interfaces. In this process overlap areas between different tools will be eliminated by introducing common data structures and modules. Modern integration techniques like OLE automation will be used to achieve a tight integration of the tools and at the same time to provide open interfaces for future extensions of the environment. 
The usage of a commercially available multi-platform C++ object library for graphical user interfaces will ensure that the WINPAR environment is available for both UNIX and Windows NT and shares a common look and feel.
ZPL: ZPL is a data-parallel language based on the phase abstractions programming model [Lin]. ZPL is a subset of ORCA-C specialized to solving data-parallel computations. It is based on the phase abstraction model and the CTA (candidate type architecture). ZPL provides a global view of the computation so the programmer sees a single address space, and all parallelism is implicitly specified. ZPL uses the concept of a region at the core of its parallelism. A region is a set of indices. Once defined, one can specify array operations by referencing the involved arrays and the index region. Offsets into regions can be specified to allow relative referencing of array elements. This is combined with special operations to handle array boundaries plus array reduction and scan operations. The ZPL compiler is targeted to the CTA machine model.
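The region concept, a set of indices over which whole-array operations with relative offsets and reductions are expressed, can be sketched in one dimension as below. This is plain Python, not ZPL syntax; the helper names `make_region`, `shifted_sum`, and `reduce_over` are hypothetical, and boundary handling is reduced to simply choosing an interior region.

```python
def make_region(lo, hi):
    """A region is a set of indices; here, the inclusive 1-D range lo..hi."""
    return range(lo, hi + 1)


def shifted_sum(region, a, offset_left, offset_right):
    """For each index i in the region, add two elements at relative offsets."""
    return {i: a[i + offset_left] + a[i + offset_right] for i in region}


def reduce_over(region, a):
    """Sum-reduce the elements of a selected by the region."""
    return sum(a[i] for i in region)


a = [1, 2, 3, 4, 5]
interior = make_region(1, 3)            # skips both array boundaries
neighbors = shifted_sum(interior, a, -1, 1)
interior_total = reduce_over(interior, a)
```

The computation is stated once for the whole region, with no explicit loop over processors or per-element communication, which reflects the global view the entry describes.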
(Links are not currently available for all environments.)