Earth Simulator (ES1) System Overview


High Performance Fortran (HPF)

[ Features ]
The principal users of ES are expected to be natural scientists who are not necessarily familiar with parallel programming, or who may even dislike it. Accordingly, a higher-level parallel language is in great demand.
HPF/ES provides easy and efficient parallel programming on ES to meet this demand. It supports the HPF 2.0 specification, its approved extensions, HPF/JA, and some extensions unique to ES (Figure 10).

Figure 10: Features of HPF/ES
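As a rough illustration of the programming style HPF/ES is aimed at, the following is a minimal sketch using only standard HPF 2.0 directives; the program, array names, and processor count are arbitrary and chosen for this example only.

    ! Minimal HPF 2.0 sketch: distribute an array over a processor
    ! arrangement and mark the loop as parallel. Names and sizes are
    ! arbitrary.
    PROGRAM SMOOTH
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 1024
      REAL(8) :: U(N), UNEW(N)
      INTEGER :: I
    !HPF$ PROCESSORS P(8)
    !HPF$ DISTRIBUTE U(BLOCK) ONTO P
    !HPF$ ALIGN UNEW(I) WITH U(I)
      U = 0.0D0
      U(N/2) = 1.0D0
    !HPF$ INDEPENDENT
      DO I = 2, N - 1
         ! The reads of U(I-1) and U(I+1) may refer to data owned by a
         ! neighbouring processor; the compiler generates the required
         ! communication automatically.
         UNEW(I) = U(I) + 0.25D0*(U(I-1) - 2.0D0*U(I) + U(I+1))
      END DO
    END PROGRAM SMOOTH

In this style the programmer only describes the data distribution and the parallelism of the loop; the compiler and runtime system take care of the message passing.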
The ES-unique extensions include the following:

* HALO
HALO addresses irregular problems such as the finite element method, which have been considered difficult to parallelize with HPF [2]. In irregular problems, data is accessed and communicated in an irregular manner; that is, data located at scattered places in memory is processed in sequence. Such accesses and communications cannot be handled efficiently by the conventional features of HPF.
HALO lets you specify explicitly the array elements to be accessed or communicated irregularly (Figure 11). The HPF/ES compiler and runtime system process such user-specified irregular accesses and communications efficiently in a specialized manner [3]; a small sketch of the underlying idea follows Figure 11.

Figure 11: Example of HALO in FEM: A node on the distribution boundary is allocated as a real object on one processor and as HALO objects on the others, according to the user's declaration. HALO objects can be referenced in the same way as real objects after they are updated by REFLECT directives. It is also possible to execute reduction operations more efficiently using HALO.
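To make the mechanism concrete, the sketch below shows the regular-grid counterpart of the idea, written with the SHADOW directive (an HPF 2.0 approved extension) and a REFLECT directive in the HPF/JA style; HALO plays the same role when the neighbouring elements are given by an irregular index list, as in a finite-element mesh, rather than by a fixed stencil width. The names and directive sentinels are illustrative only, and the ES-specific spelling of the HALO directive itself is not reproduced here.

    ! Regular-grid analogue of the HALO idea: SHADOW reserves ghost
    ! (halo) elements on each processor, and REFLECT refreshes them
    ! from the owning processors before they are read. HALO generalizes
    ! this to irregular index sets such as the nodes adjacent to a
    ! finite-element partition boundary.
    PROGRAM STENCIL
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 1000
      REAL(8) :: X(N), Y(N)
      INTEGER :: I
    !HPF$ PROCESSORS P(4)
    !HPF$ DISTRIBUTE X(BLOCK) ONTO P
    !HPF$ SHADOW X(1:1)
    !HPF$ ALIGN Y(I) WITH X(I)
      X = 1.0D0
    !HPFJ REFLECT X
    !HPF$ INDEPENDENT
      DO I = 2, N - 1
         ! The neighbour reads can be satisfied by the refreshed ghost
         ! copies rather than by extra communication inside the loop.
         Y(I) = X(I-1) + X(I) + X(I+1)
      END DO
    END PROGRAM STENCIL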

* parallel I/O
HPF/ES provides a simple interface to parallel I/O, in which each processor independently reads or writes the region of the array it owns from or to its local disk (Figure 12). This parallel I/O generates parallel files, each of which contains the mapping information managed by the HPF/ES runtime system as well as the array data itself.


Figure 12: Parallel I/O of HPF/ES

The parallel files can be read only by HPF programs executed on the same number of processors as wrote them. To remove this restriction, a tool is available for re-partitioning the parallel files among any number of processors; it can also convert the parallel files into a conventional Fortran file.
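From the program's point of view, such output is just an ordinary Fortran write of a distributed array, as in the minimal sketch below. The mechanism by which HPF/ES switches this write into the per-processor local-file mode, and the naming of the resulting files, are not reproduced here, so the file name and the comments should be read as assumptions.

    ! Sketch: writing a distributed array. Under the parallel I/O mode
    ! of HPF/ES (the selection mechanism is not shown here), each
    ! processor would write only the block it owns to its local disk,
    ! together with the mapping information kept by the runtime system.
    PROGRAM DUMP
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 100000
      REAL(8) :: A(N)
    !HPF$ DISTRIBUTE A(BLOCK)
      A = 42.0D0
      OPEN(10, FILE='a.dat', FORM='UNFORMATTED')   ! file name is arbitrary
      WRITE(10) A
      CLOSE(10)
    END PROGRAM DUMP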

* interface to MPI subroutines
A subroutine written with the MPI library can be invoked from HPF programs. This means that you can replace a performance-bottleneck part of your program with faster MPI subroutines to improve its performance; a sketch of the usual pattern follows.
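HPF's usual vehicle for this kind of mixing is an EXTRINSIC(HPF_LOCAL) procedure, in which every processor works on its local portion of a distributed array and may call MPI directly. Whether HPF/ES uses exactly this interface or an ES-specific variant is not stated in the text above, so the sketch below only illustrates the general pattern; the routine and variable names are arbitrary, and the caller must declare a matching EXTRINSIC interface.

    ! Sketch: an HPF_LOCAL routine receives the local piece of a
    ! distributed array on each processor and calls MPI itself.
    ! Names are arbitrary; the exact interface offered by HPF/ES
    ! may differ from this general HPF pattern.
    EXTRINSIC(HPF_LOCAL) SUBROUTINE LOCAL_SUM(A, S)
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      REAL(8), DIMENSION(:), INTENT(IN) :: A   ! local portion only
      REAL(8), INTENT(OUT) :: S
      REAL(8) :: PARTIAL
      INTEGER :: IERR
      PARTIAL = SUM(A)
      ! Combine the per-processor partial sums with an MPI reduction;
      ! the HPF runtime is assumed to have initialized MPI already.
      CALL MPI_ALLREDUCE(PARTIAL, S, 1, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, MPI_COMM_WORLD, IERR)
    END SUBROUTINE LOCAL_SUM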

* vectorization/microtasking directives
HPF/ES also accepts vectorization and microtasking directives, which can be specified to obtain more efficient vectorization and microtasking.
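As a small illustration, the sketch below uses a vectorization directive in the NEC comment-directive style (!CDIR NODEP) to assert that an indirectly indexed loop carries no dependence; the routine and array names are arbitrary, and the full set of directives accepted by HPF/ES is not listed in the text above.

    ! Sketch: NODEP asserts the indirect store carries no dependence,
    ! so the compiler may vectorize the loop. The assertion is valid
    ! only when IDX contains no repeated indices.
    SUBROUTINE SCATTER_ADD(N, IDX, SRC, DST)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N, IDX(N)
      REAL(8), INTENT(IN) :: SRC(N)
      REAL(8), INTENT(INOUT) :: DST(*)
      INTEGER :: I
    !CDIR NODEP
      DO I = 1, N
         DST(IDX(I)) = DST(IDX(I)) + SRC(I)
      END DO
    END SUBROUTINE SCATTER_ADD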

* etc.

[ Evaluation ]
We parallelized the plasma simulation code IMPACT-3D with HPF/ES and obtained a performance of 14.9 Tflops, that is, 45% of the peak, in a 512-node execution on ES [4]. It is a great honor that this achievement was chosen for the Gordon Bell Award for Language at SC2002. Figure 13 displays an example of the results of IMPACT-3D.

Figure 13: Visualized Result of IMPACT-3D: In the stagnation phase of imploding targets in laser fusion, a perturbation at the pusher-fuel contact surface is Rayleigh-Taylor unstable, and investigating this instability is one of the most important subjects in this research field. This figure shows the isosurfaces of the density, corresponding to the pusher-fuel contact surface at maximum compression, indicating the geometrical appearance of the nonlinear bubble-spike structures. The initial perturbation is induced as a summation of two spherical harmonic modes, (n, m) = (6, 3) and (12, 6), and the time evolution is simulated with a three-dimensional fluid code.

We also parallelized PFES, an ocean simulation code based on the Princeton Ocean Model, which achieved 10.5 Tflops in a 376-node execution. These results show that HPF/ES scales well and can readily be used to develop real simulation programs.



