Optimization techniques for fine-grained communication in PGAS environments

Alvanos, Michail

dc.contributor

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

dc.contributor.author

Alvanos, Michail

dc.date.accessioned

2014-05-28T08:50:38Z

dc.date.available

2014-05-28T08:50:38Z

dc.date.issued

2013-12-10

dc.identifier.uri

http://hdl.handle.net/10803/134958

dc.description.abstract

Partitioned Global Address Space (PGAS) languages promise to deliver improved programmer productivity and good performance on large-scale parallel machines. However, adequate performance for applications that rely on fine-grained communication is difficult to achieve without compromising their programmability. Manual or compiler-assisted code optimization is required to avoid fine-grained accesses. The downside of manually applying code transformations is increased program complexity, which hinders programmer productivity. On the other hand, compiler optimizations of fine-grained accesses require knowledge of the physical data mapping and the use of parallel loop constructs. This thesis presents optimizations that address the three main challenges of fine-grained communication: (i) low network communication efficiency; (ii) a large number of runtime calls; and (iii) the creation of network hotspots caused by non-uniform distribution of network communication. To solve these problems, the dissertation presents three approaches. First, it presents an improved inspector-executor transformation that increases network efficiency through runtime aggregation. Second, it presents incremental optimizations to the inspector-executor loop transformation that automatically remove the runtime calls. Finally, the thesis presents a loop scheduling transformation that avoids network hotspots and the oversubscription of nodes. In contrast to previous work that uses static coalescing, prefetching, limited privatization, and caching, the solutions presented in this thesis cover all aspects of fine-grained communication, including reducing the number of calls generated by the compiler and minimizing the overhead of the inspector-executor optimization.
A performance evaluation with various microbenchmarks and benchmarks, aimed at predicting scaling and absolute performance numbers on a Power 775 machine, indicates that applications with regular accesses can achieve up to 180% of the performance of hand-optimized versions, while in applications with irregular accesses the transformations are expected to yield speedups from 1.12X up to 6.3X. The loop scheduling shows performance gains of 3% to 25% for the NAS FT and bucket-sort benchmarks, and up to 3.4X speedup for the microbenchmarks.

eng

dc.format.extent

147 p.

dc.format.mimetype

application/pdf

dc.language.iso

eng

dc.publisher

Universitat Politècnica de Catalunya

dc.rights.license

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by/3.0/es/

dc.rights.uri

http://creativecommons.org/licenses/by/3.0/es/

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.title

Optimization techniques for fine-grained communication in PGAS environments

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

004

cat

dc.contributor.director

Martorell Bofill, Xavier

dc.contributor.codirector

Amaral, José Nelson

dc.contributor.codirector

Farreras Esclusa, Montse

dc.embargo.terms

none

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

dc.identifier.doi

https://dx.doi.org/10.5821/dissertation-2117-95212

dc.identifier.dl

B 15971-2014

dc.description.degree

DOCTORAT EN ARQUITECTURA DE COMPUTADORS (Pla 2007)

Documents

TMA1de1.pdf

1.674Mb PDF

This item appears in the following collection(s)

Programa de Doctorat en Arquitectura de Computadors [272]
