Deadlock during first touch of upc_alloc'd remote memory when target is in upc_barrier <p><strong>Status: This has been reported to Cray (811537) and a workaround is available.</strong></p> <p>When a UPC thread first accesses a shared memory location that was allocated with upc_alloc() and whose pointer was stored in a shared variable, it must first send an active message to the target thread requesting the DMAPP descriptor for that location. However, if the target is waiting in upc_barrier, there is a risk that this active message will not be serviced, leaving the origin thread waiting indefinitely. Because the origin thread then never reaches the same barrier, the program deadlocks. The likelihood of deadlock is higher at greater levels of concurrency.</p> <p>This is due to the behavior of the optimized barrier implementation in Cray's DMAPP library. A potential workaround that appears to be effective on Hopper is to disable the optimized barrier by setting the PGAS_USE_DMAPP_COLLECTIVES environment variable to "0" in your job submission script (or in your environment for an interactive session). This has been verified to work with cce/8.3.2; under the current default version, 8.2.1, it does not appear to work. Note also that this setting may have a performance impact, particularly on the latency of barriers and certain reductions.</p> <p>We will provide further information on this issue as we learn more from Cray.</p> Thu, 16 Oct 2014 16:17:57 -0700
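<p>As a rough illustration of the workaround described above, the variable can be set near the top of a job submission script, before the application is launched. This is only a sketch: it assumes a bash-style shell, and the commented launch line uses a hypothetical binary name and task count that are not taken from the original report.</p>

```shell
#!/bin/bash
# Disable the DMAPP-optimized collectives, which include the optimized
# barrier implicated in the deadlock described above.
export PGAS_USE_DMAPP_COLLECTIVES=0

# Launch the application as usual afterwards. The line below is a
# placeholder (hypothetical binary name and task count), left commented out:
# aprun -n 48 ./my_upc_app
```

<p>For an interactive session, the same export line can be typed at the prompt before launching. csh-style shells would need the setenv equivalent instead.</p>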