NERSC: Powering Scientific Discovery Since 1974

Interconnect

Edison employs the "Dragonfly" topology for the interconnection network. In this topology, groups of interconnected local routers are connected to other similar router groups by high-speed global links. The groups are arranged such that data transfer from one group to another needs to traverse only one global link.

This topology is built from circuit boards, copper cables, and optical cables. Routers (represented by the Aries ASIC) are connected to other routers in the chassis via a backplane. Chassis are connected together to form a two-cabinet group (a total of six chassis) using copper cables. Network connections outside the two-cabinet group require a global link. The system uses optical cables for all global links. All two-cabinet groups are directly connected to each other with these cables. See the figure below.

In the figure above, each router (Rx) is connected to four processor nodes (P). Sixteen blades, each with one router, are connected together at the chassis level by circuit board links (Rank-1 Subtree). Six chassis are connected together to form the two-cabinet group by using copper cabling at the cabinet level (Rank-2 Subtree). Finally, the two-cabinet groups are connected to each other by using optical cables for the global links (Rank-3 Subtree). Rank-1 routing is characterized by one electrical link between routers. Rank-2 routing is characterized by three electrical links, and Rank-3 routing by two optical links between routers.

Rank-1 Details

Within a chassis, the internal wiring of the backplane connects every Aries ASIC in the chassis to every other one. As many as 16 Aries ASICs reside in each chassis (one per base board), and there is one link between each pair of ASICs. The interconnections of the chassis-level ASICs require no cables. This set of interconnections is called the intra-chassis network. See the figure below for the "Intra-chassis Connections (Rank-1)".

Rank-2 Details

Copper cables connect the chassis in the two-cabinet group to each other. Each cable contains three links that together comprise 18 differential pairs (36 wires in total). Each cable connects a blade router to a blade router in another chassis, in the same slot location. For example, the router in Slot 1, Chassis 0, Cabinet 0 is connected to the five blades in the Slot 1 position in the five other chassis (two in the same cabinet and three in the neighboring cabinet). Fully connecting a two-cabinet group requires 240 cables.
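The figure of 240 cables follows from the wiring rule above: each of the 16 slot positions forms its own all-to-all network across the six chassis. The short Python sketch below reproduces that count; the function and constant names are illustrative only and are not part of any Cray or NERSC tool.

```python
from itertools import combinations

CHASSIS_PER_GROUP = 6   # three chassis per cabinet, two cabinets per group
SLOTS_PER_CHASSIS = 16  # one Aries router per blade slot


def rank2_cables(chassis=CHASSIS_PER_GROUP, slots=SLOTS_PER_CHASSIS):
    """Count the copper cables needed to fully connect one two-cabinet group."""
    # Routers in the same slot position are connected pairwise across chassis,
    # so each slot position needs one cable per unordered pair of chassis.
    chassis_pairs = len(list(combinations(range(chassis), 2)))
    return slots * chassis_pairs


print(rank2_cables())  # 16 slot positions * 15 chassis pairs = 240
```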

Rank-3 Details

The Rank-3 network is used to interconnect two-cabinet groups.  This level of the topology utilizes optical cables that are housed in integrated cable trays above the system cabinets.

The optical connection uses a 24-channel optical cable: 12 channels for transmit and 12 channels for receive. Each cable handles four links (six channels per link), two from each of two Aries ASICs. There are up to five optical cables associated with every pair of Aries ASICs, for a total of 40 possible optical connections per chassis. Thus, a complete two-cabinet group has up to 240 optical connections.
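The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that one "optical connection" corresponds to one optical cable, which is how the numbers above work out; the constant names are made up for illustration.

```python
CHANNELS_PER_CABLE = 24      # 12 transmit + 12 receive
CHANNELS_PER_LINK = 6
ARIES_PER_CHASSIS = 16       # one Aries ASIC per blade
CABLES_PER_ARIES_PAIR = 5    # upper bound stated in the text
CHASSIS_PER_GROUP = 6        # two cabinets, three chassis each

# Four links fit in one 24-channel cable.
links_per_cable = CHANNELS_PER_CABLE // CHANNELS_PER_LINK

# Eight Aries pairs per chassis, up to five cables per pair.
connections_per_chassis = (ARIES_PER_CHASSIS // 2) * CABLES_PER_ARIES_PAIR

# Six chassis per two-cabinet group.
connections_per_group = connections_per_chassis * CHASSIS_PER_GROUP

print(links_per_cable, connections_per_chassis, connections_per_group)
```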

The Rank-3 connections must form an all-to-all network between the two-cabinet groups. The width of these connections is variable and can be as few as one optical cable between two-cabinet groups and as many as INT(240/(N-1)), where N is the number of two-cabinet groups. Therefore Edison, with 30 cabinets (15 two-cabinet groups), can use up to 17 optical cables (INT(240/(15-1)) = 17) between each pair of two-cabinet groups.
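As a worked example of the INT(240/(N-1)) bound, the following Python sketch evaluates it for a given number of two-cabinet groups. The function name is hypothetical; the value 240 is the per-group optical connection limit quoted above.

```python
MAX_OPTICAL_PER_GROUP = 240  # optical connections available per two-cabinet group


def max_cables_between_groups(n_groups):
    """Upper bound on optical cables between each pair of two-cabinet groups."""
    if n_groups < 2:
        raise ValueError("an all-to-all network needs at least two groups")
    # Each group must reach the other n_groups - 1 groups with its 240
    # connections; integer division plays the role of INT().
    return MAX_OPTICAL_PER_GROUP // (n_groups - 1)


print(max_cables_between_groups(15))  # Edison: 240 // 14 = 17
```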

Edison's Dragonfly Topology

Edison has 30 cabinets, arranged in 8 columns and 4 rows. However, the cabinets in columns 0 and 1 of row 3 do not exist (8 * 4 - 2 = 30). Note that there are three chassis (or "cages") in a cabinet, sixteen blades in a chassis, and four compute nodes in a blade (if it is a compute blade).

These cabinet "coordinates" are succinctly represented by 'c#-#c#s#n#', where the first # is the column number (0,1,...,7), the second # the row number (0,1,2,3), the third # the cage number (0,1,2), the fourth # the slot (or blade) number (0,1,...,15), and the last # the node number (0,1,2,3 for the compute node type).
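To illustrate the 'c#-#c#s#n#' format, here is a minimal Python sketch that splits such a name into its five numbers. The helper name parse_node_name and the dictionary keys are assumptions made for this example, and the sketch does not check that the numbers fall within the ranges listed above.

```python
import re

# column-row, then cage, slot, and node, e.g. "c7-3c2s15n3"
NAME_RE = re.compile(r"^c(\d+)-(\d+)c(\d+)s(\d+)n(\d+)$")


def parse_node_name(name):
    """Split a 'c#-#c#s#n#' name into its numeric fields."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a c#-#c#s#n# name: {name!r}")
    column, row, cage, slot, node = (int(g) for g in match.groups())
    return {"column": column, "row": row, "cage": cage, "slot": slot, "node": node}


print(parse_node_name("c7-3c2s15n3"))
```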

The cabinet coordinates and the corresponding dragonfly topology coordinates can be found by running the 'xtdb2proc' command on a MOM node. One easy way to get this information is to put the following command in your batch script:

xtdb2proc -f edisontopo.out

where edisontopo.out is the output file name. As the following shows, it contains all the information about the coordinates:

$ cat edisontopo.out
#
# This file lists tables currently in the system database,
#
# Each line contains a record of comma-delineated pairs of the form field1=val1, field2=val2, etc.
#
# Note: this file is automatically generated from the system database.
#
cpu=1,slot=0,cage=0,cabinet=null,cab_position=0,cab_row=0,x_coord=0,y_coord=0,z_coord=0,process_slots=4,process_slots_free=4,processor_status='up',processor_type='service',alloc_mode='batch',processor_id=1,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=2,slot=0,cage=0,cabinet=null,cab_position=0,cab_row=0,x_coord=0,y_coord=0,z_coord=0,process_slots=4,process_slots_free=4,processor_status='up',processor_type='service',alloc_mode='batch',processor_id=2,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=0,slot=1,cage=0,cabinet=null,cab_position=0,cab_row=0,x_coord=0,y_coord=0,z_coord=1,process_slots=4,process_slots_free=4,processor_status='up',processor_type='service',alloc_mode='other',processor_id=4,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=1,slot=1,cage=0,cabinet=null,cab_position=0,cab_row=0,x_coord=0,y_coord=0,z_coord=1,process_slots=4,process_slots_free=4,processor_status='up',processor_type='service',alloc_mode='other',processor_id=5,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=2,slot=1,cage=0,cabinet=null,cab_position=0,cab_row=0,x_coord=0,y_coord=0,z_coord=1,process_slots=4,process_slots_free=4,processor_status='up',processor_type='service',alloc_mode='other',processor_id=6,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=3,slot=1,cage=0,cabinet=null,cab_position=0,cab_row=0,x_coord=0,y_coord=0,z_coord=1,process_slots=4,process_slots_free=4,processor_status='up',processor_type='service',alloc_mode='other',processor_id=7,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
...
cpu=0,slot=15,cage=2,cabinet=null,cab_position=7,cab_row=3,x_coord=15,y_coord=5,z_coord=15,process_slots=4,process_slots_free=4,processor_status='up',processor_type='compute',alloc_mode='batch',processor_id=6140,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=1,slot=15,cage=2,cabinet=null,cab_position=7,cab_row=3,x_coord=15,y_coord=5,z_coord=15,process_slots=4,process_slots_free=4,processor_status='up',processor_type='compute',alloc_mode='batch',processor_id=6141,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=2,slot=15,cage=2,cabinet=null,cab_position=7,cab_row=3,x_coord=15,y_coord=5,z_coord=15,process_slots=4,process_slots_free=4,processor_status='up',processor_type='compute',alloc_mode='batch',processor_id=6142,od_allocator_id=0,next_red_black_switch=null,processor_spec=null
cpu=3,slot=15,cage=2,cabinet=null,cab_position=7,cab_row=3,x_coord=15,y_coord=5,z_coord=15,process_slots=4,process_slots_free=4,processor_status='up',processor_type='compute',alloc_mode='batch',processor_id=6143,od_allocator_id=0,next_red_black_switch=null,processor_spec=null

A brief description of some of the fields is given below:

  • cpu: node number in a blade (0,1,2,3 for the 'compute' processor type)
  • slot: slot (blade) number (0,1,...,15)
  • cage: cage (chassis) number (0,1,2)
  • cab_position: cabinet column number (0,1,...,7)
  • cab_row: cabinet row number (0,1,2,3)
  • x_coord, y_coord, z_coord: the dragonfly topology coordinates
  • processor_type: 'compute' or 'service'
  • processor_id: node ID (or NID)
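For post-processing, each record line can be turned into a dictionary keyed by the field names above. The Python sketch below assumes the layout seen in the sample output (comma-separated field=value pairs, single-quoted strings, a bare null, and no commas inside values); parse_record and read_records are hypothetical helper names, not part of xtdb2proc.

```python
def parse_record(line):
    """Parse one 'field1=val1,field2=val2,...' record into a dict."""
    record = {}
    for pair in line.strip().split(","):
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # skip anything that is not a field=value pair
        value = value.strip()
        if value == "null":
            value = None
        elif len(value) >= 2 and value[0] == "'" and value[-1] == "'":
            value = value[1:-1]  # strip the single quotes
        else:
            try:
                value = int(value)
            except ValueError:
                pass  # leave unexpected values as strings
        record[key.strip()] = value
    return record


def read_records(path):
    """Read all records from a file, skipping '#' comments and blank lines."""
    with open(path) as handle:
        return [parse_record(line) for line in handle
                if line.strip() and not line.startswith("#")]


# Abbreviated record in the format of the sample output.
sample = ("cpu=0,slot=15,cage=2,cabinet=null,cab_position=7,cab_row=3,"
          "x_coord=15,y_coord=5,z_coord=15,processor_type='compute',"
          "processor_id=6140")
rec = parse_record(sample)
print(rec["processor_id"], rec["processor_type"], rec["cabinet"])
```

With the file produced by the command above, read_records("edisontopo.out") would return one dictionary per node.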

The dragonfly topology coordinates are given by an (X,Y,Z) triplet. The compute nodes in a blade are connected to the same Aries ASIC on that blade, and therefore share the same coordinate values.

The X coordinate is for the 2-cabinet group number for the rank-3 network, and the value and its cabinets are given as follows:

  • X=0 (group 0): c0-0 and c1-0
  • X=1 (group 1): c2-0 and c3-0
  • X=2 (group 2): c4-0 and c5-0
  • X=3 (group 3): c6-0 and c7-0
  • X=4 (group 4): c0-1 and c1-1
  • X=5 (group 5): c2-1 and c3-1
  • X=6 (group 6): c4-1 and c5-1
  • X=7 (group 7): c6-1 and c7-1
  • X=8 (group 8): c0-2 and c1-2
  • X=9 (group 9): c2-2 and c3-2
  • X=10 (group 10): c4-2 and c5-2
  • X=11 (group 11): c6-2 and c7-2
  • X=12 (group 12) doesn't exist as this is for non-existent c0-3 and c1-3
  • X=13 (group 13): c2-3 and c3-3
  • X=14 (group 14): c4-3 and c5-3
  • X=15 (group 15): c6-3 and c7-3

The Y coordinate is given as follows:

  • Y=0: cage 0 of the first cabinet in the group
  • Y=1: cage 1 of the first cabinet in the group
  • Y=2: cage 2 of the first cabinet in the group
  • Y=3: cage 0 of the second cabinet in the group
  • Y=4: cage 1 of the second cabinet in the group
  • Y=5: cage 2 of the second cabinet in the group

The first cabinet refers to the first cabinet mentioned in the above group listing (for example, c6-0 in the case of group 3).

The Z value is for the slot number in a cage.
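The X, Y and Z rules above can be condensed into a small function. The formulas in this Python sketch are inferred from the group listing and the sample xtdb2proc output (four groups per row, the even-numbered column being the first cabinet of each group); they are an assumption that matches those examples rather than an official definition, and the function name is made up.

```python
def dragonfly_coords(cab_position, cab_row, cage, slot):
    """Map cabinet column/row, cage, and slot to dragonfly (X, Y, Z)."""
    # X: two-cabinet group number; each row of 8 cabinets holds 4 groups.
    x = cab_row * 4 + cab_position // 2
    # Y: cages 0-2 of the first cabinet, then cages 0-2 of the second.
    y = (cab_position % 2) * 3 + cage
    # Z: slot number within the cage.
    z = slot
    return (x, y, z)


# c7-3c2s15: last record in the sample output (x=15, y=5, z=15)
print(dragonfly_coords(cab_position=7, cab_row=3, cage=2, slot=15))
```

Note that the function also returns X=12 for the non-existent cabinets c0-3 and c1-3, so callers should filter those out if needed.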