Number of Failures
Description
- If a job is aborted because of a PN failure, it is automatically restarted on another node.
- The storage system uses a redundant configuration, so most storage failures do not affect the execution of user programs.
Number of components
ES (Compute Nodes + InfiniBand)
- CPU Nodes: 720 nodes
- VE-equipped Nodes: 684 nodes
- GPU-equipped Nodes: 8 nodes
- InfiniBand: internode network switches
Storage
- HOME disks (about 120 TB)
- DATA disks (about 60 PB)
- WORK disks (about 1.3 PB)
Login server, etc.
- Login servers
- Various other servers and network devices