Cray T3E (cont)
Memory Latency
- Local access takes 13 to 38 clock cycles
- Remote access takes 150 to 300 clock cycles
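To get a feel for what these ranges mean in practice, the short sketch below computes a weighted average access latency. The cycle counts are the ones listed above; the 10% remote fraction and the function name `average_latency` are illustrative assumptions, not measured T3E figures.

```python
# Latency ranges in clock cycles, taken from the bullets above.
LOCAL_CYCLES = (13, 38)
REMOTE_CYCLES = (150, 300)


def average_latency(remote_fraction, local_cycles, remote_cycles):
    """Weighted average latency for a given share of remote accesses."""
    return (1 - remote_fraction) * local_cycles + remote_fraction * remote_cycles


# Hypothetical workload: 10% of memory accesses go to remote memory.
REMOTE_FRACTION = 0.10
best = average_latency(REMOTE_FRACTION, LOCAL_CYCLES[0], REMOTE_CYCLES[0])
worst = average_latency(REMOTE_FRACTION, LOCAL_CYCLES[1], REMOTE_CYCLES[1])
print(f"average latency: {best:.1f} to {worst:.1f} cycles")
```

Under that assumed mix, the average comes out at roughly 27 to 64 cycles, so even a small share of remote accesses accounts for about half of the average latency.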
Processor synchronization is built on barrier hardware: a logical AND tree that enables multiple barriers to be pipelined
- When a processor reaches a barrier, it sets a bit in the AND tree
- Processor cannot proceed until all bits in the barrier have been set
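The barrier rule above can be sketched in software as follows. This is a minimal model of a single barrier, not the T3E hardware: the class name `AndTreeBarrier`, its methods, and the binary tree shape are assumptions made for illustration, and pipelining of multiple barriers is not modeled.

```python
class AndTreeBarrier:
    """Toy model of one barrier: one bit per processor, combined by an AND tree."""

    def __init__(self, num_processors):
        # Assumes at least one processor; every bit starts cleared.
        self.bits = [False] * num_processors

    def arrive(self, pid):
        # A processor reaching the barrier sets its bit (a leaf of the tree).
        self.bits[pid] = True

    def root(self):
        # Combine the leaves pairwise, level by level, like a binary AND tree.
        level = list(self.bits)
        while len(level) > 1:
            level = [all(level[i:i + 2]) for i in range(0, len(level), 2)]
        return level[0]

    def can_proceed(self):
        # Processors may continue only when the root of the tree is set.
        return self.root()


barrier = AndTreeBarrier(4)
for pid in (0, 1, 2):
    barrier.arrive(pid)
blocked = not barrier.can_proceed()  # processor 3 has not arrived yet
barrier.arrive(3)
released = barrier.can_proceed()     # every bit is set, so the root is set
```

Here `blocked` is true while processor 3 is still missing, and `released` becomes true once the last bit is set. In hardware the AND is evaluated by the tree itself, so no processor has to poll the other processors' bits.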