# Hardware Technology of the Earth Simulator

## By Jun INASAKA,\* Rikikazu IKEDA,\* Kazuhiko UMEZAWA,\* Ko YOSHIKAWA,\* Shitaka YAMADA† and Shigemune KITAWAKI‡

**ABSTRACT** This paper describes the hardware technologies of the supercomputer system "The Earth Simulator," which has the highest performance in the world and was developed by ESRDC (the Earth Simulator Research and Development Center)/NEC. The Earth Simulator adopts NEC's leading-edge technologies, including its most advanced device and process technologies, to develop a high-speed, highly integrated LSI, resulting in a one-chip vector processor. By combining this LSI technology with technology for handling high heat dissipation and high-density cabling technology for the internal-node shared memory system, ESRDC/NEC has succeeded in implementing a supercomputer system with a Linpack benchmark performance of 35.86TFLOPS, the world's highest (http://www.top500.org/ : the 20th TOP500 list of the world's fastest supercomputers).

**KEYWORDS** The Earth Simulator, Supercomputer, CMOS, LSI, Memory, Packaging, Build-up Printed Wiring Board (PWB), Connector, Cooling, Cable, Power supply

## 1. INTRODUCTION

The Earth Simulator has adopted NEC's most advanced CMOS technologies to integrate vector and parallel processing functions for realizing a supercomputer. It tightly combines 5,120 one-chip processors and 10TB (Tera Bytes) main memory using the internal-node shared memory system and inter-nodal high-speed network, operating the total system with a clock cycle of 2ns.

We also developed a high-performance, highly integrated full-custom CMOS LSI using a $0.15\mu$m process and 60 million transistors to realize a scalar and vector processor on one chip. Moreover, to maximize system performance, we developed many sophisticated technologies, as listed below:

- Dedicated gate array
- FPLRAM (full-pipeline memory), which reduces the random cycle time to about a third of that of general-purpose DRAM
- High-speed circuit technology to increase performance, and
- The most advanced fine buildup wiring board, to support the maximum capability of ultra-high-speed, high-pin-count CMOS LSIs.

<sup>†</sup>System ULSI Development Division

The AP (Arithmetic Processor) packages and MMU (Main Memory Unit) packages are all interconnected with fine coaxial cables to minimize the distances between them and to maximize the system packaging density, as a high-performance system requires. To cope with the increased heat density of highly integrated LSIs, we developed a high-efficiency air-cooling system. The Earth Simulator optimally merges these advanced technologies so that 8 processors form one node sharing 16GB of memory. The system consists of 640 processor nodes interconnected by a high-speed network. It occupies a floor area as wide as four tennis courts (i.e. $65 \times 50$m), in which 640 processor nodes in 320 cabinets are arranged concentrically around the 65 cabinets of the interconnection network, which links the nodes.
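The configuration figures quoted in this introduction can be cross-checked with a few lines of arithmetic. The following is only a sketch: the constant and function names are illustrative, the values are those stated in the text, and 1TB is taken as 1,024GB.

```python
# Figures quoted in the text; names are illustrative, not from the paper.
NODES = 640
PROCESSORS_PER_NODE = 8
MEMORY_PER_NODE_GB = 16
PN_CABINETS = 320
CLOCK_CYCLE_NS = 2.0


def system_totals():
    """Derive system-level totals from the per-node figures."""
    return {
        "processors": NODES * PROCESSORS_PER_NODE,
        # Assumption: 1 TB = 1,024 GB.
        "memory_tb": NODES * MEMORY_PER_NODE_GB / 1024,
        "nodes_per_cabinet": NODES // PN_CABINETS,
        # A clock cycle in ns converts to MHz as 1000 / cycle.
        "clock_mhz": 1000.0 / CLOCK_CYCLE_NS,
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value}")
```

The derived totals agree with the 5,120 processors, 10TB of main memory and 500MHz operation stated elsewhere in the paper.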

## 2. LSI TECHNOLOGY

Until the SX-5, NEC's Supercomputer SX series used multiple LSIs to realize the processor functions. In the Earth Simulator we realized a one-chip processor (CPU) for the first time. As shown in Fig. 1, a clock cycle of 2ns (500MHz) was obtained, which is the fastest in the series, including the bipolar generations. To realize this in a full-custom LSI, NEC combined its most advanced CMOS technologies, namely a $0.15\mu$m process with eight-layer copper wiring, with high-performance circuits designed using advanced design tools. We describe the elements of the LSI technology, including common specifications, high-speed LSI design technology, CAD technology, and high-speed and large-capacity memory technology.

<sup>\*</sup>Computers Division

<sup>‡</sup>Earth Simulator Center, Japan Marine Science and Technology Center

### 2.1 LSI Common Specifications

Table I shows a comparison of LSI specifications and Photo 1 shows the microphotograph of CPU chip used in the Earth Simulator.

This LSI adopts a fine process with a $0.15 \mu m$ design rule, reduces wiring delay through the copper wiring process, and raises packing density using a high-density memory cell. Two supply voltages are used: 1.8V for the internal logic and 1.8/2.5V for the external interface section. Through the most advanced CMOS technology and careful circuit design, high-speed operation is achieved together with reduced power consumption. In designing the chip layout, floor planning at the upstream design stage proceeded in parallel with the block design stage. In addition, automatic and manual design were combined efficiently, so that up to 60 million transistors are integrated and a high operating frequency of 500MHz is realized, providing four times the integration and twice the operating frequency of the SX-5, the previous-generation product.

Fig. 1 Clock cycle of supercomputer. (Axes: clock cycle in ns versus year, 1980 to 2005; labeled points: SX-2, SX-3, SX-4, SX-5, The Earth Simulator; series: Bipolar, CMOS.)

### 2.2 LSI Design

Two-stage clock distribution is used: the main clock driver drives locally allocated clock drivers, while the local clock drivers distribute clock signals with low skew to about 200,000 F/Fs (flip-flops). The main clock distribution uses a dedicated clock wiring layer of low-resistance thick film to avoid waveform rounding caused by wiring resistance, and reduces crosstalk noise by shielding with power and ground lines. In addition, the clock driver power lines are separated from other power lines to reduce clock jitter.

The LSI has several types of built-in RAM circuits, including a high-capacity cache memory and a multiport register file. These RAM circuits are designed as dedicated circuits to exploit the device performance fully. To improve packing density, the memory cell is a high-density cell whose area is reduced by applying dedicated design rules such as common contacts and local wiring. In addition, copper wiring decreases the wiring capacitance of the bit lines and other lines, reducing wiring delay for high-speed processing. To cancel the effects of environmental conditions such as temperature and voltage, and of process variation, a timing generation circuit is used for high-speed and stable operation.

Photo 1 Microphotograph of CPU chip.

| Item                      | SX-5        | The Earth Simulator (CPU) | The Earth Simulator (other LSI) |
|---------------------------|-------------|---------------------------|---------------------------------|
| LSI design                | Full-custom | Full-custom               | Full-custom                     |
| Design rule ($\mu$m)      | 0.25        | 0.15                      | 0.15                            |
| Die size (mm, square)     | 17.5        | 20.79                     | 17.62 for MMC                   |
| Number of transistors     | 15 million  | 60 million                | ~50 million                     |
| Operating frequency (MHz) | 250         | 500                       | 500                             |
| Metal layer               | Aluminum: 5 | Copper: 8                 | Copper: 8                       |
| Number of I/O (Sig.)      | 1,468 (564) | 5,185 (1,986)             | 4,189 (1,763) for MMC           |
| I/O pitch ($\mu$m)        | 250         | 200                       | 200                             |
| Power supply voltage (V)  | 3.3, 1.2    | 1.8                       | 2.5, 1.8                        |
| Package                   | FC-BGA      | FC-bare                   | FC-bare, FC-BGA                 |

Table I LSI specifications.

For internal testing, the F/Fs of the LSI have a scan path function, which supports full scan over the LSI as well as partial scan tests to improve the fault detection rate. To shorten the test time for the high-capacity memory mounted on the LSI, all built-in RAMs use BIST (Built-In Self Test). In addition to basic RAM operations, multiple pattern-sensitive tests are available, so that highly reliable testing screens out production defects.

### 2.3 CAD

In the LSI design, the most advanced CAD technologies are applied to both logic design and layout design.

#### (1) Logic Design

In logic design, a hardware description language called FDL (Functional Description Language) was used. We have developed and used various tools for FDL including logic simulators, logic synthesis tools, and formal verification tools. Logic emulation was also applied to the SPU (Scalar Processor Unit) on the CPU chip. This technology realizes functions written in FDL as a prototype using FPGA (Field Programmable Gate Array) so that the verification can be executed 10,000× faster than normal software simulation. The operation from hardware initialization to OS boot was executed and verified by the logic emulation.

#### (2) Layout Design

A hierarchical layout method is essential for today's most advanced VLSI designs. We developed and used a top-down hierarchical layout method (Fig. 2). First, the top-level layout is performed, in which wires and repeater cells for delay optimization are allowed to overlap the lower hierarchical areas. After the top-level layout, the top-level wires and repeater cells that overlap the lower hierarchical areas are pushed down into the lower hierarchical level. Then the lower-level layout is performed, including the wires and repeater cells inherited from the top-level layout. After the lower-level layout, a full chip layout is generated by merging the lower-level layout and the top-level layout. At the full chip level, layout verification and logical equivalence verification are performed to verify the correctness of the hierarchical design method.

To design very high performance LSI, we developed and used a clock distribution method to minimize clock skew and a timing driven layout method for delay optimization. The clock distribution method consists of mesh-based global clock distributions and local clock distributions generated with two-level clock tree synthesis from the global clock distributions. For local clock distributions, delay adjustment by gate sizing and load adjustment by dummy load insertion are performed. With such adjustments, the clock skew is successfully minimized to below the target value. As a timing driven layout method, we developed and used improved timing driven placement, gate sizing, repeater insertion, and timing driven routing.
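The idea of load adjustment by dummy load insertion can be illustrated with a toy model. This is a minimal sketch under strong assumptions, not the tool described above: each local clock branch is modeled as a single lumped RC delay with identical drivers, gate sizing and clock tree synthesis are not modeled, and all capacitance and resistance values are hypothetical.

```python
def rc_delay_ps(driver_res_kohm, load_cap_ff):
    """Lumped RC delay; kilo-ohm times femtofarad gives picoseconds."""
    return driver_res_kohm * load_cap_ff


def dummy_loads_ff(branch_caps_ff):
    """Dummy capacitance to add to each branch so all loads match the largest."""
    target = max(branch_caps_ff)
    return [target - cap for cap in branch_caps_ff]


def skew_ps(driver_res_kohm, branch_caps_ff):
    """Skew is the spread between the slowest and fastest branch delays."""
    delays = [rc_delay_ps(driver_res_kohm, cap) for cap in branch_caps_ff]
    return max(delays) - min(delays)


# Hypothetical branch loads (fF) driven by identical 1 kOhm drivers.
caps = [40.0, 55.0, 48.0]
padding = dummy_loads_ff(caps)
balanced = [cap + pad for cap, pad in zip(caps, padding)]

if __name__ == "__main__":
    print("skew before:", skew_ps(1.0, caps), "ps")
    print("dummy loads:", padding, "fF")
    print("skew after:", skew_ps(1.0, balanced), "ps")
```

In this simplified model the lighter branches are padded up to the heaviest one, trading extra load (and power) for matched delays.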

Fig. 2 Top-down hierarchical layout method.

### 2.4 Memory LSI

Memory specifications used in the main memory unit are shown in Table II. Since memory throughput is the key to performance in scientific computing systems such as the Earth Simulator, reducing the random cycle time of the memory LSI is especially important. The memory LSI pipelines all stages of its internal operation (hence "full-pipeline memory"), so that the random cycle time is reduced to about a third of that of general-purpose DRAM (e.g. Synchronous DRAM, Rambus DRAM). In addition, the memory LSI has 8 internal banks, twice the number in general-purpose Synchronous DRAM (four banks), which halves the probability of consecutive accesses hitting the same bank. As a result, the effective memory performance of the device improves by up to six times or, in the worst case of continuous access to the same bank, by about three times. The memory cell array is divided into sub-areas, which lowers the load capacitance of the word lines and bit lines to keep data access fast, and the narrower active area in the memory LSI reduces power consumption. Together with the lower supply voltage ($3.3V \rightarrow 2.55V$), this offset the increase in power consumption that higher performance and larger capacity would otherwise bring, keeping power consumption at the level of the previous-generation Synchronous DRAM. The interface is a small-amplitude 1.8V CMOS interface that supports high-speed operation. Using a chip scale package ($\mu$BGA), we realized a high-density, high-speed interface thanks to shorter signal wiring, lower capacitance and inductance, and lower thermal resistance.

Table II Memory specifications.

| Item                   | SX-5             | The Earth Simulator  |
|------------------------|------------------|----------------------|
| Type                   | Synchronous DRAM | Full-PipeLine Memory |
| Capacity (bits)        | 64M              | 128M                 |
| Number of banks        | 4                | 8                    |
| Clock frequency (MHz)  | 125              | 133                  |
| Random cycle time (ns) | 72               | 21.6                 |
| Access time (ns)       | 45.5             | 30                   |
| Supply voltage (V)     | 3.3              | 2.55 (I/O: 1.8)      |
| Package                | 54-pin TSOP      | 100-pin $\mu$BGA     |
| Power (W)              | 1                | 1                    |
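The "about three times" and "up to six times" figures follow from the Table II values. The sketch below reproduces the arithmetic; treating the best case as the product of the cycle-time gain and the bank-count ratio, and assuming uniformly random bank accesses, are simplifications made here for illustration and are not taken from the paper.

```python
# Values from Table II; names are illustrative.
SDRAM_RANDOM_CYCLE_NS = 72.0
FPL_RANDOM_CYCLE_NS = 21.6
SDRAM_BANKS = 4
FPL_BANKS = 8


def cycle_time_gain():
    """Worst case: every access hits the same bank, so only cycle time helps."""
    return SDRAM_RANDOM_CYCLE_NS / FPL_RANDOM_CYCLE_NS


def same_bank_probability(banks):
    """Assumption: accesses are spread uniformly and independently over banks."""
    return 1.0 / banks


def best_case_gain():
    """Simplified upper bound: cycle-time gain times the bank-count ratio."""
    return cycle_time_gain() * (FPL_BANKS / SDRAM_BANKS)


if __name__ == "__main__":
    print(f"worst-case gain: {cycle_time_gain():.2f}x")
    print(f"same-bank probability: {same_bank_probability(SDRAM_BANKS)}"
          f" -> {same_bank_probability(FPL_BANKS)}")
    print(f"best-case gain: {best_case_gain():.2f}x")
```

With the exact table values the gains come out slightly above three and six; the text rounds the cycle-time ratio to "a third".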

## 3. HIGH-SPEED CIRCUIT TECHNOLOGY

To enhance system performance, signals must be transmitted at higher speed both within an LSI and between LSIs. To realize high-speed signal transmission of up to 500Mbps/pin between LSIs in the Earth Simulator, we employed high-speed clock distribution and suppressed the waveform distortion and noise that such high-speed circuits produce.

### 3.1 High-Speed Signal Transmission

To realize reliable high-speed data transfer, we developed a dedicated high-speed driver/receiver circuit for the external interface. On the packaging side, we developed a high-density wiring board to shorten wiring within a card and, for signal transmission between cards, a fine coaxial cable with low propagation delay. By shortening transmission distances in this way, higher-speed data transfer between LSIs was realized.

The driver circuit has a built-in output impedance adjustment circuit that matches the driver to the transmission line and keeps the signal waveform optimal despite LSI production variation and environmental changes such as supply voltage and temperature, realizing high-speed data transfer. The characteristic impedance of the transmission line is determined by the wiring board, cables and connectors, and is set to $50\Omega$ throughout the transmission system, taking into account the impact on component productivity (e.g. wiring board, cable, connector), LSI driver capability, cooling and power supply.
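Why a uniform $50\Omega$ matters can be seen from the standard transmission-line reflection coefficient, $\Gamma = (Z - Z_0)/(Z + Z_0)$. The sketch below evaluates it for a few terminating impedances; the impedance values other than $50\Omega$ are hypothetical examples, not measurements from the system.

```python
Z0_OHM = 50.0  # characteristic impedance stated in the text


def reflection_coefficient(z_term_ohm, z0_ohm=Z0_OHM):
    """Fraction of the incident voltage wave reflected at an impedance step."""
    return (z_term_ohm - z0_ohm) / (z_term_ohm + z0_ohm)


if __name__ == "__main__":
    # Hypothetical impedances a signal might meet along the path.
    for z in (40.0, 50.0, 60.0, 75.0):
        gamma = reflection_coefficient(z)
        print(f"Z = {z:5.1f} ohm -> reflection coefficient {gamma:+.3f}")
```

A matched $50\Omega$ termination reflects nothing, while a mismatch reflects a proportionate share of the wave, which is the distortion the output impedance adjustment circuit is intended to limit.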

### 3.2 Noise Reduction

For the high-density wiring board, we applied a strip line structure to keep the characteristic impedance uniform. Blind vias reduce capacitance, which also reduces reflection noise and so avoids waveform distortion. In the card connector, pins are made as short as possible and ground pins are allocated appropriately to suppress impedance mismatch. The LSI output impedance is adjusted to the optimal value, taking into account production variation and environmental changes such as supply voltage and temperature.

In a high-speed system, reducing power supply noise is another key factor. The LSIs are mounted as bare chips, and decoupling capacitors are placed close to them. On the cards, an optimal layer configuration lowers the impedance of the power supply system and so suppresses power supply noise. As a further measure, the response time of the power supply module was shortened. To reduce crosstalk noise in the wiring board, we developed a proprietary check tool that focuses on the timing at which such noise is generated and on overlapping multiple wirings; it proved useful for optimizing wiring and reducing noise.

### 3.3 Clock Distribution

Clock distribution requires high-speed and precise waveform transmission, so differential signaling is adopted. The characteristic impedance is kept uniform throughout the transmission line. In addition, to reduce the effect on the signal waveform of reflection noise caused by vias in the wiring board and by input pin capacitance in the receiving LSI, several measures were taken, including optimal selection of signal wiring length. As a result, LSIs on multiple logic packages are fed high-speed clock signals with low jitter.

### 3.4 Inter-Node Signal Transmission

The Earth Simulator is realized with multiple processing nodes that are interconnected by the Interconnection Network. The data transfer capability between the processing nodes and the Interconnection Network is key to Earth Simulator performance and requires a high-speed, high-capacity connection circuit. The electrical serial transmission system performs parallel/serial conversion of the data from a device into 1.25Gbps serial data before transmission, and the receiving side performs serial/parallel conversion of the received serial data to restore the original parallel data.
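The parallel/serial round trip described above can be sketched in software. This is only an illustration of the conversion itself: the word width and MSB-first bit order are assumptions, and any line coding, framing or clock recovery used on the real link is not described in the text and is not modeled.

```python
LINE_RATE_GBPS = 1.25                  # serial rate stated in the text
BIT_TIME_NS = 1.0 / LINE_RATE_GBPS     # duration of one bit on the wire


def serialize(words, width):
    """Parallel-to-serial: emit each word MSB first (assumed bit order)."""
    bits = []
    for word in words:
        for position in reversed(range(width)):
            bits.append((word >> position) & 1)
    return bits


def deserialize(bits, width):
    """Serial-to-parallel: regroup the bit stream into words of `width` bits."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length is not a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


if __name__ == "__main__":
    data = [0xA5, 0x3C]                # hypothetical 8-bit parallel words
    stream = serialize(data, 8)
    print(stream)
    print(deserialize(stream, 8) == data)
    print(f"bit time: {BIT_TIME_NS} ns")
```

At 1.25Gbps each bit occupies 0.8ns on the cable, which is why cable skew and waveform quality are evaluated so carefully for the longer runs.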

The installation environment of the Earth Simulator requires a connection distance of up to 40m between the processing nodes and the Interconnection Network. We evaluated various cable characteristics and transmission waveforms to realize stable signal transmission.

## 4. HIGH-DENSITY PACKAGING TECHNOLOGY

The AP and MMU packages, which must operate at very high speed, use bare chip mounting and a high-density buildup wiring board to obtain the maximum LSI performance. The AP and MMU packages are shown in Photos 2 and 3.

The AP package uses a one-chip processor mounted on one side of the wiring board with bare chip mounting technology, and high density connectors on the other side. The MMU package has both memory controller LSI and high-density connectors, and also has 48  $\mu$ -BGA type RAMs mounted on both sides. Specifications of AP and MMU packages are shown in Table III.

Photo 2 AP package.

### 4.1 LSI Mounting

For the LSIs, bare chip mounting technology is used to reduce noise and signal delays. The wiring board has more than 5,000 solder bumps on a 0.2mm-pitch square grid, which are joined to high-temperature solder bumps formed on the LSI. Underfill resin is then filled between the LSI and the wiring board and sealed, so that the bare chip connections remain reliable. This bare chip mounting technology was realized by developing several technologies: fine solder supply, precision parts mounting, cleaning, and resin sealing. A cross section of the bare chip mounting is shown in Photo 4.



Photo 3 MMU package.

Table III Specifications of AP & MMU packages.

| Item                          | AP                                             | MMU                                                          |
|-------------------------------|------------------------------------------------|--------------------------------------------------------------|
| Substrate                     | Build-up printed circuit board                 | Build-up printed circuit board                               |
| Size (mm)                     | $100 \times 115$                               | $120 \times 105$                                             |
| Thickness (mm)                | 1.57                                           | 1.57                                                         |
| Number of layers              | 4 build-up layers on both sides, 6 core layers | 4 build-up layers on both sides, 6 core layers               |
| Line/Space ($\mu$m)           | 25 / 25                                        | 25 / 25                                                      |
| Via/Land ($\mu$m)             | 50 / 75                                        | 50 / 75                                                      |
| Wiring length (m)             | 175                                            | 120                                                          |
| Device                        | CPU LSI × 1 (Bare chip)                        | MMC LSI × 1 (Bare chip)<br>128Mb FPLRAM × 48 ($\mu$-BGA)     |
| Number of I/O terminal (Sig.) | 3,960 (1,980)                                  | 1,200 (600)                                                  |
| I/O terminal pitch (mm)       | 0.5                                            | 0.5                                                          |
| Power dissipation (W)         | 140                                            | 60                                                           |
| Cooling                       | Forced air cooling                             | Forced air cooling                                           |

### 4.2 Wiring Board

A high-density buildup wiring board realizing fine wiring with four buildup layers on both sides, a via hole pitch of  $75\mu$ m and a wiring pattern pitch of  $50\mu$ m was newly developed to support the above-mentioned bare chip implementation. This wiring board accepts more than 5,000 bare chip input/output pins along with higher wiring density.

The fine pattern on the buildup layers is formed by the additive method and the via holes by the laser method. Photo 5 shows a cross section of the buildup layers, and Photo 6 shows the external appearance of the wiring. On the AP module, about 2,000 LSI signals connected to the buildup board are routed through the buildup layers on the board surface and then connected to the connector on the opposite surface. To reduce skew, every signal between the LSI and the connector is routed with the same pattern length. Figure 3 shows a cross section of the AP package.

Photo 4 Cross section of solder joint between substrate and bare chip.

Photo 5 Cross section of build-up layer.

### 4.3 Wiring Board Design

The wiring board is designed to support high-density packaging that dramatically outperforms traditional printed circuit boards. This was achieved through functional enhancements to the existing design system together with improved efficiency and quality. The four main functional enhancements are described below.

1) Via generation on the buildup layer

With this function, the special vias required in the buildup board (e.g. spiral vias, staggered vias) can be generated in any shape, so that efficient wiring design is possible.

2) Pattern correction for preventing crack

This function detects specific pattern shapes that easily generate cracks and automatically corrects them to pattern shapes with better crack resistance.

3) Mesh generation

To realize the strip line structure, supply power efficiently, strengthen adhesion between buildup layers, and improve yield through uniform inner-layer conductor density, the design system supports generation of mesh-shaped copper foil planes over the buildup layer.

4) Mask data check and verification

For the manufacturability check, functions related to the fine buildup board were improved, including checking of inter-layer via overlap and checking of gaps using different clearance values for each layer, area and pattern. In addition, electrical connection verification time was remarkably reduced by speeding up extraction of the reversed net-list (connection information extracted from the original drawing data) and by revising the algorithm that compares the net-list (logical connection information from the design) with the reversed net-list.

Photo 6 External view of signal layer.

Fig. 3 Cross section of AP package.
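The comparison between the design net-list and the reversed net-list can be pictured as a set comparison of connected pin groups. The sketch below is a simplified illustration and not the authors' algorithm: the data structures, the function name `compare_netlists` and the pin names are all hypothetical.

```python
def compare_netlists(design_nets, extracted_nets):
    """Compare named design nets with unnamed nets extracted from mask data.

    design_nets: dict mapping net name -> iterable of pin identifiers.
    extracted_nets: iterable of pin groups found electrically connected.
    Returns (missing, unexpected) as sets of frozensets of pins.
    """
    design = {frozenset(pins) for pins in design_nets.values()}
    extracted = {frozenset(pins) for pins in extracted_nets}
    missing = design - extracted      # designed nets not found in the artwork
    unexpected = extracted - design   # shorts, opens or stray connections
    return missing, unexpected


# Hypothetical two-net design between an LSI (U1) and a connector (J1).
design = {"CLK": ["U1.A1", "J1.3"], "D0": ["U1.B2", "J1.4"]}
clean_artwork = [["J1.3", "U1.A1"], ["U1.B2", "J1.4"]]
shorted_artwork = [["U1.A1", "J1.3", "U1.B2", "J1.4"]]

if __name__ == "__main__":
    print(compare_netlists(design, clean_artwork))
    print(compare_netlists(design, shorted_artwork))
```

Hashing each net as an unordered pin set keeps the comparison close to linear in the number of pins, which hints at why the choice of comparison algorithm affects verification time on a board with thousands of signals.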

### 4.4 Interconnection

Intra-node interconnection between the AP and MMU packages uses a newly developed 60-pole fine-diameter coaxial cable with reduced propagation delay (3.8ns/m) (see Photo 7). For the actual interconnection, about 300 of these cables are housed in a cable box (approximately W: 50cm, H: 65cm, D: 20cm), one side of which accommodates 8 AP packages and 1 RCA (Remote access Control Adapter) package while the other side accommodates 32 MMU packages (see Photo 8). The cable connectors mate directly with the high-density SMT (Surface Mount Technology) connectors mounted on each module. This structure reduces impedance variation and crosstalk in the connector section, providing excellent signal transmission characteristics.

Photo 7 Coaxial cable (intra node).
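To see why cable length matters at a 2ns clock, the quoted propagation delay can be converted into clock cycles. In this sketch the delay per meter and the clock cycle come from the text, while the cable lengths are hypothetical values chosen only to be of the same order as the cable box dimensions.

```python
PROPAGATION_DELAY_NS_PER_M = 3.8   # coaxial cable figure from the text
CLOCK_CYCLE_NS = 2.0               # system clock cycle from the text


def cable_delay_ns(length_m):
    """One-way flight time of a signal along the cable."""
    return PROPAGATION_DELAY_NS_PER_M * length_m


def delay_in_cycles(length_m):
    """The same flight time expressed in system clock cycles."""
    return cable_delay_ns(length_m) / CLOCK_CYCLE_NS


if __name__ == "__main__":
    for length in (0.25, 0.5, 1.0):    # hypothetical cable lengths in meters
        print(f"{length:4.2f} m: {cable_delay_ns(length):.2f} ns, "
              f"{delay_in_cycles(length):.2f} cycles")
```

Under these assumptions roughly half a meter of cable already costs about one clock cycle, which motivates keeping the AP and MMU packages as close together as possible.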

Internode connection uses shielded electric cables for high-speed serial transmission. Each cable has two differential pairs of twisted wires, so one cable consists of four core wires in total (a differential four-core quad) that are shielded and covered with a jacket. Its superior high-frequency characteristics successfully suppress the inner-pair skew. As a result, 1.25Gbps signals can be transmitted over up to 29m with an AWG#26 core conductor (0.51mm diameter) or up to 40m with AWG#24 (0.61mm). The internode cable connector uses a slide lock mechanism for easier insertion and removal, shortening maintenance time.

### 4.5 Cooling

LSIs for the Earth Simulator generate a considerable amount of heat. Moreover, because high-density packaging technologies are fully employed to make the entire system as compact as possible, the density of heat generated throughout the system increases. The development of advanced cooling technology is therefore a key factor for the high-speed system.

To cool the AP package, the device with the largest heat generation, we developed a new high-performance heat sink that uses the heat siphon principle. Photo 9 shows the AP package equipped with the heat sink. To transfer heat efficiently from the LSI chip to the heat sink, a thermal compound was also newly developed. These developments give the AP a low thermal resistance of 0.29°C/W from junction to air. Figure 4 shows the breakdown of the thermal resistance by AP package section.

Photo 8 Cable box.

Photo 9 AP package with heat sink.
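The junction-to-air thermal resistance translates directly into a temperature rise. The sketch below combines the 0.29°C/W figure with the 140W AP power dissipation from Table III, assuming for simplicity that all of that power flows through the junction-to-air path; the inlet air temperature is a hypothetical value.

```python
THERMAL_RESISTANCE_C_PER_W = 0.29   # junction to air, from the text
AP_POWER_W = 140.0                  # AP package dissipation, Table III


def junction_rise_c(power_w=AP_POWER_W, theta=THERMAL_RESISTANCE_C_PER_W):
    """Temperature rise of the junction above the cooling air."""
    return theta * power_w


def junction_temperature_c(inlet_air_c, power_w=AP_POWER_W):
    """Junction temperature for a given (assumed) inlet air temperature."""
    return inlet_air_c + junction_rise_c(power_w)


if __name__ == "__main__":
    print(f"rise above air: {junction_rise_c():.1f} C")
    # 25 C inlet air is a hypothetical value, not taken from the paper.
    print(f"junction at 25 C inlet: {junction_temperature_c(25.0):.1f} C")
```

Under these assumptions the junction sits about 41°C above the cooling air, so each 0.01°C/W saved in the heat path lowers the junction temperature by roughly 1.4°C.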

The MMU package generates less heat than the AP package, but a high-performance heat sink was also developed for it to permit high-density packaging. Photo 10 shows the external appearance of the MMU package equipped with the heat sink.

To cool the overall system with its higher packaging density, a fan with higher static pressure and large airflow was developed and used in the AP package and MMU package sections. For the critical points inside the cabinets, thermal simulations were performed, together with experiments and evaluations using a mockup similar to the actual cabinet.

Fig. 4 Thermal resistance of AP package.

Photo 10 MMU package with heat sink.

### 4.6 Power Supply

Because the LSIs are highly integrated, operate at high speed, and use lower voltages than in the previous system, we had to reduce the voltage drop and inductance in the power supply and improve the response speed to load variation. To resolve these issues, we developed a DC-DC converter with a DC48V input and employed a distributed power supply system in which each AP/MMU package is powered one-to-one via a plug-in connector.

The DC-DC converter uses a synchronous rectification circuit with an active clamp scheme, so that switching loss, rectification loss and transformer core loss were remarkably reduced, giving a high efficiency of 80% or higher. The secondary rectification section employs a two-phase output, shifting the oscillation frequency at the output filter to a higher band. By optimizing the circuit constants of the output feedback accordingly, a response time of $30\mu$s to load variation (twice as fast as the previous power module) was obtained. To connect the power module to the AP/MMU package, a compact plug-in connector with low resistance and high capacity (100A) was developed. As circuit components, a surface-mount, high-output (50A) planar transformer and a high-output (100A) planar choke coil were developed to realize a compact DC-DC converter. For packaging, the main output circuit uses a metal-base printed circuit board so that heat generated by the devices is radiated efficiently, which eliminated the heat sinks from the DC-DC converter for the MMU package. Photo 11 shows the DC-DC converter.
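The current levels involved can be estimated from figures given elsewhere in the paper. This is a rough sketch under labeled assumptions: the whole 140W AP package load (Table III) is assumed to be drawn at the 1.8V CPU supply voltage (Table I), and the converter is assumed to run at exactly the 80% lower bound on efficiency.

```python
AP_POWER_W = 140.0          # Table III
CPU_SUPPLY_V = 1.8          # Table I
INPUT_V = 48.0              # DC48V converter input
EFFICIENCY = 0.80           # lower bound stated in the text
CONNECTOR_RATING_A = 100.0  # plug-in connector capacity


def output_current_a(power_w, supply_v):
    """Load current, assuming all power is delivered at one voltage."""
    return power_w / supply_v


def input_power_w(power_w, efficiency):
    """Power drawn from the 48V bus for a given output power."""
    return power_w / efficiency


def input_current_a(power_w, efficiency, input_v):
    """Current drawn from the 48V bus."""
    return input_power_w(power_w, efficiency) / input_v


if __name__ == "__main__":
    load = output_current_a(AP_POWER_W, CPU_SUPPLY_V)
    print(f"load current: {load:.1f} A (connector rated {CONNECTOR_RATING_A} A)")
    print(f"input power: {input_power_w(AP_POWER_W, EFFICIENCY):.1f} W")
    print(f"input current: {input_current_a(AP_POWER_W, EFFICIENCY, INPUT_V):.2f} A")
```

Under these assumptions the load current is close to 80A on the low-voltage side but under 4A on the 48V side, which illustrates why conversion is placed next to each package.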

Photo 11 DC-DC converter (left: for MMU, right: for AP).

### 4.7 System Packaging

The Earth Simulator is much larger than other systems. In designing the cabinets, the compactness of the overall system and the allowable size of each cabinet were considered. Two nodes are housed in each PN (Processor Node) cabinet, and two data switch units are housed in each IN (Interconnection Network) cabinet. Photo 12 shows the implementation structure of the PN cabinets, and Photo 13 that of the IN cabinets.

Photo 12 PN cabinet

Photo 13 IN cabinet

Interconnection among 640 nodes uses about 83,000 cables (about 2,400km in total). To realize the cable installation, we developed dedicated cable installation simulator software, which was remodeled from the automatic router for the wiring board. Using this simulator, cable length and the height of installed cables were simulated to select an optimal cable route and order of cabling. Internode cables installed are shown in Photo 14.
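The scale of the cabling task follows from the totals above. The sketch below derives averages only; the real cable lengths vary with cabinet position, and the variable names are illustrative.

```python
TOTAL_CABLES = 83_000        # approximate count from the text
TOTAL_LENGTH_KM = 2_400.0    # approximate total length from the text
NODES = 640


def average_cable_length_m():
    """Mean internode cable length implied by the quoted totals."""
    return TOTAL_LENGTH_KM * 1000.0 / TOTAL_CABLES


def average_cables_per_node():
    """Mean number of internode cables attached to each node."""
    return TOTAL_CABLES / NODES


if __name__ == "__main__":
    print(f"average cable length: {average_cable_length_m():.1f} m")
    print(f"average cables per node: {average_cables_per_node():.1f}")
```

The implied average of just under 29m sits within the 29m to 40m transmission distances quoted for the internode cable in Section 4.4.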

As shown in Fig. 5, the cooling air is supplied from air conditioners placed on the 1st floor of the Earth Simulator Building. The air flows into the free access floor through the ducts between the 1st floor and the free access floor, and is then drawn into the PN/IN cabinets to cool them. After cooling the PN/IN cabinets, the heated air returns to the air conditioners through the air return duct.



Photo 14 Internode cable in free access floor.



Fig. 5 Cross-sectional view of the Earth Simulator building.

## 5. CONCLUSION

We have described the hardware technologies of the Earth Simulator developed by ESRDC/NEC. The Earth Simulator, the world's top-performance supercomputer, was realized by integrating NEC's most advanced technology and highest design capability, including highly integrated LSI technology, high-speed circuit technology and high-density implementation technology (wiring board technology, high-efficiency cooling technology, cabling technology, cabinet implementation technology and high-speed power technology). We hope that this most advanced large-scale system will be used effectively for scientific development in various fields, such as protection against the earth's greenhouse effect.


Received October 21, 2002



Jun INASAKA received his B.E. and M.E. degrees in mechanical engineering from Kyushu University in 1983 and 1985, respectively. He joined NEC Corporation in 1985, and engaged in the development of packaging technology for large-scale computers at the packaging engineering department. He is now Manager of the Circuit Engineering Department, Computers Division, where he is engaged in developing high-speed circuit technology for supercomputers.



Rikikazu IKEDA received his B.E. and M.E. degrees in electronic engineering from the College of Science and Technology, Nihon University in 1982 and 1984, respectively. In 1984, he joined NEC Corporation. From 1984 to 2002, he was engaged in the development of CMOS full-custom LSI at the System ULSI Development Division. He is currently Expert Engineer of the Circuit Engineering Department, Computers Division, where he is engaged in the development of supercomputers and mainframes.



Kazuhiko UMEZAWA received his B.E. degree from the University of Tokyo in 1985, and joined NEC Corporation. He is now Manager of Packaging Engineering Department, Computers Division. He has been engaged in developing packaging technology for HPC and high end servers.



Ko YOSHIKAWA received his B.E. degree from Waseda University in 1985. He joined NEC Corporation in 1985 and is currently Engineering Manager of the CAD Engineering Department, Computers Division. He is engaged in the development of CAD Systems for supercomputers and high end servers.

Mr. Yoshikawa is a member of the IEEE and the Information Processing Society of Japan.



Shitaka YAMADA received his B.E. degree in information engineering from Shinshu University in 1980. He joined NEC Corporation in 1980 and is now Senior Design Engineer of the 1st Custom LSI Division, NEC Electronics. He is engaged in the development of System ULSI for supercomputers and mainframes.



Shigemune KITAWAKI received his B.S. and M.S. degrees in mathematical engineering from Kyoto University in 1966 and 1968, respectively. He joined NEC Corporation in 1968. Since then he has been involved in developing compilers for NEC's computers. He joined the Earth Simulator project in 1998, and is now Group Leader of the "User Support Group" of the Earth Simulator Center, Japan Marine Science and Technology Center.

Mr. Kitawaki belongs to the Information Processing Society of Japan and the Association for Computing Machinery.