
Job Scheduling
ES is essentially a batch-job system. Network Queuing System
II (NQSII) was introduced to manage the batch jobs.
Fig.1 shows the queue configuration of ES.
Fig.1: Queue configuration of the Earth Simulator.
ES has two types of queues. The S batch queue is designed for
single-node batch jobs and the L batch queue is for multi-node
batch jobs. The S batch queue is intended for a pre-run or a
post-run for large-scale batch jobs (making initial data,
processing results of a simulation and other processes), and
the L batch queue is for a production run. Users choose the
appropriate queue for their jobs.
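The queue choice described above can be written down as a small
rule. This is only an illustrative sketch: the function name
choose_queue is hypothetical and is not part of NQSII, and the
rule uses nothing but the single-node / multi-node distinction
stated in the text.

```python
def choose_queue(num_nodes: int) -> str:
    """Return "S" for a single-node job and "L" for a multi-node job.

    Hypothetical helper for illustration; NQSII itself does not
    provide this function.
    """
    if num_nodes < 1:
        raise ValueError("a batch job needs at least one node")
    return "S" if num_nodes == 1 else "L"
```

For example, a single-node pre-run that builds initial data would
go to the S batch queue, while a production run on several nodes
would go to the L batch queue.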
Job scheduling on ES follows two strategies:
- (1) The nodes allocated to a batch job are used exclusively
for that batch job.
- (2) The batch job is scheduled based on elapsed time instead
of CPU time.
Strategy (1) makes it possible to estimate the job termination
time and makes it easy to allocate nodes for the next batch jobs
in advance. Strategy (2) contributes to efficient job execution.
The job can use the nodes exclusively and the processes in each
node can be executed simultaneously. As a result, a large-scale
parallel program can be executed efficiently.
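To illustrate why exclusive nodes and a declared elapsed time
make advance allocation easy, the following minimal sketch
computes the earliest start time of a new job from the times at
which nodes become free. It is not the NQSII algorithm: the names
Reservation, earliest_start and reserve are hypothetical, times
are plain integers in seconds, and the greedy choice of the
soonest-free nodes is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reservation:
    start: int    # start time in seconds from an arbitrary origin
    elapsed: int  # declared elapsed (wall-clock) limit in seconds

    @property
    def end(self) -> int:
        # With exclusive nodes and an elapsed-time limit, the job
        # cannot hold its nodes past this time.
        return self.start + self.elapsed


def earliest_start(node_free_at: List[int], nodes_needed: int,
                   now: int = 0) -> Tuple[int, List[int]]:
    """Pick the nodes that become free soonest (greedy assumption).

    node_free_at[i] is the time at which node i is released.
    Returns (start_time, sorted indices of the chosen nodes).
    """
    if nodes_needed < 1 or nodes_needed > len(node_free_at):
        raise ValueError("invalid number of nodes requested")
    order = sorted(range(len(node_free_at)), key=lambda i: node_free_at[i])
    chosen = order[:nodes_needed]
    # The job can start only when the last chosen node is free.
    start = max(now, max(node_free_at[i] for i in chosen))
    return start, sorted(chosen)


def reserve(node_free_at: List[int], nodes_needed: int, elapsed: int,
            now: int = 0) -> Tuple[Reservation, List[int]]:
    """Reserve nodes and record when they will be released."""
    start, chosen = earliest_start(node_free_at, nodes_needed, now)
    for i in chosen:
        node_free_at[i] = start + elapsed
    return Reservation(start, elapsed), chosen
```

Because every reservation has a known end time, the release time
of each node is known as soon as a job is accepted, so the next
job can be given its nodes and an estimated start time without
waiting for running jobs to finish.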
PNs of the L-system are prohibited from accessing the user disk
to ensure sufficient disk I/O performance. Therefore the files
used by the batch job are copied from the user disk to the work
disk before the job execution. This process is called "stage-in."
It is important for the job scheduling to hide this staging time.
The main steps of the job scheduling are summarized as follows:
- Step 1: Node Allocation
- Step 2: Stage-in (copies files from the user disk to the work
disk automatically)
- Step 3: Job Escalation (rescheduling for the earlier estimated
start time if possible)
- Step 4: Job Execution
- Step 5: Stage-out (copies files from the work disk to the user
disk automatically)
When a new batch job is submitted, the scheduler searches for
available nodes (Step 1). After the nodes and the estimated start
time are allocated to the batch job, the stage-in process starts
(Step 2). Once the stage-in process is finished, the job waits
until the estimated start time. If the scheduler finds a start
time earlier than the estimated start time, it allocates the new
start time to the batch job. This process is called "Job
Escalation" (Step 3). When the estimated start time arrives, the
scheduler executes the batch job (Step 4). The scheduler
terminates the batch job and starts the stage-out process after
the job execution is finished or the declared elapsed time is
over (Step 5).
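The five steps above can be sketched as a few small functions.
This is a simplified model, not the NQSII implementation: the
names BatchJob, allocate, stage_in, escalate and finish_time are
hypothetical, times are integers in seconds, and the rule that an
escalated start time may not precede the end of stage-in is an
assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchJob:
    declared_elapsed: int                  # declared elapsed-time limit
    estimated_start: Optional[int] = None  # set in Step 1
    stage_in_done: Optional[int] = None    # set in Step 2


def allocate(job: BatchJob, start: int) -> None:
    """Step 1: record the estimated start time given with the nodes."""
    job.estimated_start = start


def stage_in(job: BatchJob, begin: int, copy_time: int) -> bool:
    """Step 2: copy files to the work disk.

    Returns True if staging ends by the estimated start time,
    i.e. the staging time is hidden behind the waiting time.
    """
    job.stage_in_done = begin + copy_time
    return job.stage_in_done <= job.estimated_start


def escalate(job: BatchJob, candidate_start: int) -> bool:
    """Step 3: move the job to an earlier start time if possible."""
    if job.estimated_start is None or job.stage_in_done is None:
        return False
    # Assumption: the job cannot start before its files are staged.
    if job.stage_in_done <= candidate_start < job.estimated_start:
        job.estimated_start = candidate_start
        return True
    return False


def finish_time(job: BatchJob, actual_runtime: int) -> int:
    """Steps 4-5: the job ends when it finishes or when the declared
    elapsed time is over, whichever comes first; stage-out follows."""
    return job.estimated_start + min(actual_runtime, job.declared_elapsed)
```

In this model a job allocated to start at t=1000 whose stage-in
ends at t=300 can be escalated to any free slot between those two
times, and it is terminated at the declared elapsed time even if
the program has not finished.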
How to execute the batch job
A user must write a batch script to execute a batch job.
Directives, which describe the number of nodes, the disk space,
the declared elapsed time and other system resources needed to
execute the batch job, are written in the batch script.
Fig.3: Example of a batch script. A line
that begins with #PBS is a directive for NQSII.
To execute the batch job, the user logs into the login server
and submits the batch script to ES, then waits until the job
execution is done. During that time, the user can check the
state of the batch job using a conventional web browser or user
commands. Node scheduling, file staging and other processing
are handled automatically by the system according to the batch
script.