Earth Simulator (ES2) System Overview

Job Scheduling

ES is basically a batch-job system. Network Queuing System II (NQSII) is used to manage the batch jobs.
Fig.1 shows the queue configuration of ES.

Fig.1: Queue configuration of the Earth Simulator. ES has two types of queues. The S batch queue is designed for single-node batch jobs and the L batch queue is for multi-node batch jobs.

There are two types of queues: the L batch queue and the S batch queue. The S batch queue is intended for pre-runs and post-runs of large-scale batch jobs (making initial data, processing the results of a simulation, and other such processes), and the L batch queue is for production runs. Users choose the appropriate queue for their jobs.

(1) The nodes allocated to a batch job are used exclusively for that batch job.
(2) The batch job is scheduled based on elapsed time instead of CPU time.

Strategy (1) makes it possible to estimate the job termination time and makes it easy to allocate nodes for the next batch jobs in advance. Strategy (2) contributes to efficient job execution. The job can use the nodes exclusively, and the processes in each node can be executed simultaneously. As a result, large-scale parallel programs can be executed efficiently.
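The effect of the two strategies on planning can be illustrated with a minimal Python sketch. It is not the NQSII algorithm; it only assumes that each node is held by one job at a time and that a job's termination time is its start time plus its declared elapsed time. The names `Reservation`, `reserve` and `node_free_at` are hypothetical and chosen for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Reservation:
    start: float   # estimated start time (hours from now)
    end: float     # estimated termination = start + declared elapsed time
    nodes: list    # node ids reserved exclusively for this job


def reserve(node_free_at, nodes_needed, declared_elapsed):
    """Reserve the nodes that become free earliest (illustrative policy only).

    node_free_at maps a node id to the time at which that node becomes free.
    The mapping is updated in place so that later jobs see the reservation.
    """
    if nodes_needed > len(node_free_at):
        raise ValueError("job requests more nodes than the system has")
    # Pick the nodes with the smallest free times.
    chosen = heapq.nsmallest(nodes_needed, node_free_at, key=node_free_at.get)
    # The job can start only when the last of its nodes is free.
    start = max(node_free_at[n] for n in chosen)
    # Elapsed-time scheduling: the end time is known as soon as the start is.
    end = start + declared_elapsed
    for n in chosen:
        node_free_at[n] = end
    return Reservation(start, end, sorted(chosen))
```

Because the end time of every reservation is known in advance, the free time of each node is also known, which is what allows nodes to be allocated to the next batch jobs ahead of time.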

The PNs of the L-system are prohibited from accessing the user disk in order to ensure sufficient disk I/O performance. Therefore, the files used by a batch job are copied from the user disk to the work disk before the job is executed. This process is called "stage-in." It is important for job scheduling that this staging time is hidden.
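As a rough illustration of what staging amounts to, the following Python sketch copies a list of files from one directory tree to another. On ES this is done automatically by the system; the function name `stage` and the directory arguments are hypothetical, and the sketch says nothing about how the real staging is implemented.

```python
import shutil
from pathlib import Path


def stage(files, src_root, dst_root):
    """Copy each relative path in `files` from src_root to dst_root.

    Returns the list of destination paths that were written.
    """
    copied = []
    for rel in files:
        src = Path(src_root) / rel
        dst = Path(dst_root) / rel
        # Create the destination directory if it does not exist yet.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)   # copies file contents and metadata
        copied.append(dst)
    return copied
```

Under these assumptions, stage-in would correspond to `stage(inputs, user_disk, work_disk)`, and the same helper with the two directories swapped would model the copy back to the user disk after the job.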
The main steps of job scheduling are summarized as follows:

1. Node Allocation
2. Stage-in (copies files from the user disk to the work disk automatically)
3. Job Escalation (rescheduling for the earlier estimated start time if possible)
4. Job Execution
5. Stage-out (copies files from the work disk to the user disk automatically)

When a new batch job is submitted, the scheduler searches for available nodes (Step 1). After the nodes and the estimated start time are allocated to the batch job, the stage-in process starts (Step 2). Once the stage-in process has finished, the job waits until the estimated start time. If the scheduler finds a start time earlier than the estimated start time, it allocates the new start time to the batch job. This process is called "Job Escalation" (Step 3). When the estimated start time arrives, the scheduler executes the batch job (Step 4). When the job execution has finished or the declared elapsed time has run out, the scheduler terminates the batch job and starts the stage-out process (Step 5).
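Job Escalation and the elapsed-time limit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheduler's actual logic: the names `Job`, `escalate` and `deadline` are hypothetical, and the sketch assumes that a job is never started before its stage-in has finished.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    estimated_start: float    # assigned at node allocation (Step 1)
    stage_in_done_at: float   # time at which stage-in finishes (Step 2)
    declared_elapsed: float   # elapsed-time limit declared by the user


def escalate(job, nodes_free_at):
    """Step 3: move the start earlier if the nodes free up sooner than planned.

    Assumption: the new start is never earlier than the end of stage-in.
    Returns True if the estimated start time was changed.
    """
    candidate = max(nodes_free_at, job.stage_in_done_at)
    if candidate < job.estimated_start:
        job.estimated_start = candidate
        return True
    return False


def deadline(job):
    """Step 5: latest time at which the job is terminated and stage-out begins."""
    return job.estimated_start + job.declared_elapsed
```

For example, a job planned to start at hour 10 whose nodes become free at hour 6 would be escalated to hour 6, and its deadline would move earlier by the same amount.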

How to execute the batch job

A user must write a batch script to execute a batch job. Directives, which describe the number of nodes, the disk space, the declared elapsed time and other system resources needed to execute the batch job, are written in the batch script.

Fig.3: Example of a batch script. A line that begins with #PBS is a directive for NQSII.

To execute a batch job, the user logs in to the login server and submits the batch script to ES. The user then waits until the job execution is done. During that time, the user can check the state of the batch job using a conventional web browser or user commands. Node scheduling, file staging and other processing are handled automatically by the system according to the batch script.
