Optimization techniques for fine-grained communication in PGAS environments

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.author
Alvanos, Michail
dc.date.accessioned
2014-05-28T08:50:38Z
dc.date.available
2014-05-28T08:50:38Z
dc.date.issued
2013-12-10
dc.identifier.uri
http://hdl.handle.net/10803/134958
dc.description.abstract
Partitioned Global Address Space (PGAS) languages promise to deliver improved programmer productivity and good performance in large-scale parallel machines. However, adequate performance for applications that rely on fine-grained communication without compromising their programmability is difficult to achieve. Manual or compiler assistance code optimization is required to avoid fine-grained accesses. The downside of manually applying code transformations is the increased program complexity and hindering of the programmer productivity. On the other hand, compiler optimizations of fine-grained accesses require knowledge of physical data mapping and the use of parallel loop constructs. This thesis presents optimizations for solving the three main challenges of the fine-grain communication: (i) low network communication efficiency; (ii) large number of runtime calls; and (iii) network hotspot creation for the non-uniform distribution of network communication, To solve this problems, the dissertation presents three approaches. First, it presents an improved inspector-executor transformation to improve the network efficiency through runtime aggregation. Second, it presents incremental optimizations to the inspector-executor loop transformation to automatically remove the runtime calls. Finally, the thesis presents a loop scheduling loop transformation for avoiding network hotspots and the oversubscription of nodes. In contrast to previous work that use static coalescing, prefetching, limited privatization, and caching, the solutions presented in this thesis focus cover all the aspect of fine-grained communication, including reducing the number of calls generated by the compiler and minimizing the overhead of the inspector-executor optimization. A performance evaluation with various microbenchmarks and benchmarks, aiming at predicting scaling and absolute performance numbers of a Power 775 machine, indicates that applications with regular accesses can achieve up to 180% of the performance of hand-optimized versions, while in applications with irregular accesses the transformations are expected to yield from 1.12X up to 6.3X speedup. The loop scheduling shows performance gains from 3-25% for NAS FT and bucket-sort benchmarks, and up to 3.4X speedup for the microbenchmarks.
eng
dc.format.extent
147 p.
dc.format.mimetype
application/pdf
dc.language.iso
eng
dc.publisher
Universitat Politècnica de Catalunya
dc.rights.license
L'accés als continguts d'aquesta tesi queda condicionat a l'acceptació de les condicions d'ús establertes per la següent llicència Creative Commons: http://creativecommons.org/licenses/by/3.0/es/
dc.rights.uri
http://creativecommons.org/licenses/by/3.0/es/
*
dc.source
TDX (Tesis Doctorals en Xarxa)
dc.title
Optimization techniques for fine-grained communication in PGAS environments
dc.type
info:eu-repo/semantics/doctoralThesis
dc.type
info:eu-repo/semantics/publishedVersion
dc.subject.udc
004
cat
dc.contributor.director
Martorell Bofill, Xavier
dc.contributor.codirector
Amaral, José Nelson
dc.contributor.codirector
Farreras Esclusa, Montse
dc.embargo.terms
cap
dc.rights.accessLevel
info:eu-repo/semantics/openAccess
dc.identifier.dl
B 15971-2014


Documents

TMA1de1.pdf

1.674Mb PDF

This item appears in the following Collection(s)