2024-03-29T06:26:19Z
https://www.tdx.cat/oai/request
oai:www.tdx.cat:10803/134958
2017-09-23T20:56:47Z
com_10803_183
col_10803_196
TDX (Tesis Doctorals en Xarxa)
author
Alvanos, Michail
authoremail
MALVANOS@BSC.ES
authoremailshow
false
director
Martorell Bofill, Xavier
codirector
Amaral, José Nelson
codirector
Farreras Esclusa, Montse
authorsendemail
true
2014-05-28T08:50:38Z
2014-05-28T08:50:38Z
2013-12-10
http://hdl.handle.net/10803/134958
B 15971-2014
Partitioned Global Address Space (PGAS) languages promise to deliver improved programmer productivity and good performance on large-scale parallel machines. However, adequate performance for applications that rely on fine-grained communication is difficult to achieve without compromising their programmability. Manual or compiler-assisted code optimization is required to avoid fine-grained accesses. The downside of manually applying code transformations is increased program complexity, which hinders programmer productivity. On the other hand, compiler optimizations of fine-grained accesses require knowledge of the physical data mapping and the use of parallel loop constructs.
This thesis presents optimizations that address the three main challenges of fine-grained communication: (i) low network communication efficiency; (ii) a large number of runtime calls; and (iii) the creation of network hotspots due to the non-uniform distribution of network communication. To solve these problems, the dissertation presents three approaches. First, it presents an improved inspector-executor transformation that increases network efficiency through runtime aggregation. Second, it presents incremental optimizations to the inspector-executor loop transformation that automatically remove the runtime calls. Finally, the thesis presents a loop scheduling transformation for avoiding network hotspots and the oversubscription of nodes. In contrast to previous work that uses static coalescing, prefetching, limited privatization, and caching, the solutions presented in this thesis cover all aspects of fine-grained communication, including reducing the number of calls generated by the compiler and minimizing the overhead of the inspector-executor optimization.
A performance evaluation with various microbenchmarks and benchmarks, aimed at predicting scaling and absolute performance numbers of a Power 775 machine, indicates that applications with regular accesses can achieve up to 180% of the performance of hand-optimized versions, while in applications with irregular accesses the transformations are expected to yield speedups from 1.12X up to 6.3X. The loop scheduling shows performance gains of 3% to 25% for the NAS FT and bucket-sort benchmarks, and up to a 3.4X speedup for the microbenchmarks.
eng
Optimization techniques for fine-grained communication in PGAS environments
info:eu-repo/semantics/doctoralThesis info:eu-repo/semantics/publishedVersion
URL
https://www.tdx.cat/bitstream/10803/134958/1/TMA1de1.pdf
File
MD5
261d64e04457e11daa846994c240553c
1755599
application/pdf
TMA1de1.pdf
URL
https://www.tdx.cat/bitstream/10803/134958/5/TMA1de1.pdf.txt
File
MD5
d281fc3c5c3d7e75bade88cce17965f4
280438
text/plain
TMA1de1.pdf.txt