2024-03-28T12:54:19Zhttps://www.tdx.cat/oai/requestoai:www.tdx.cat:10803/6742242022-05-11T13:12:02Zcom_10803_183col_10803_196
nam a 5i 4500
Adaptable register file organization for vector processors
[Barcelona] :
Universitat Politècnica de Catalunya,
2022
Accés lliure
http://hdl.handle.net/10803/674224
cr |||||||||||
AAMMDDs2022 sp ||||fsm||||0|| 0 eng|c
Ramírez Lazo, Cristóbal,
autor
Arquitectura de Computadors,
degree
1 recurs en línia (158 pàgines)
Tesi
Doctorat
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
2022
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Tesis i dissertacions electròniques
Cristal Kestelman, Adrián,
supervisor acadèmic
Ramírez Salinas, Marco Antonio,
supervisor acadèmic
TDX
Today there are two main vector processors design trends. On the one hand, we have vector processors designed for long vectors lengths such as the SX-Aurora TSUBASA which implements vector lengths of 256 elements (16384-bits). On the other hand, we have vector processors designed for short vectors such as the Fujitsu A64FX that implements vector lengths of 8 elements (512-bit) ARM SVE. However, short vector designs are the most widely adopted in modern chips. This is because, to achieve high-performance with a very high-efficiency, applications executed on long vector designs must feature abundant DLP, then limiting the range of applications. On the contrary, short vector designs are compatible with a larger range of applications. In fact, in the beginnings, long vector length implementations were focused on the HPC market, while short vector length implementations were conceived to improve performance in multimedia tasks. However, those short vector length extensions have evolved to better fit the needs of modern applications. In that sense, we believe that this compatibility with a large range of applications featuring high, medium and low DLP is one of the main reasons behind the trend of building parallel machines with short vectors. Short vector designs are area efficient and are "compatible" with applications having long vectors; however, these short vector architectures are not as efficient as longer vector designs when executing high DLP code.
In this thesis, we propose a novel vector architecture that combines the area and resource efficiency characterizing short vector processors with the ability to handle large DLP applications, as allowed in long vector architectures. In this context, we present AVA, an Adaptable Vector Architecture designed for short vectors (MVL = 16 elements), capable of reconfiguring the MVL when executing applications with abundant DLP, achieving performance comparable to designs for long vectors. The design is based on three complementary concepts. First, a two-stage renaming unit based on a new type of registers termed as Virtual Vector Registers (VVRs), which are an intermediate mapping between the conventional logical and the physical and memory registers. In the first stage, logical registers are renamed to VVRs, while in the second stage, VVRs are renamed to physical registers. Second, a two-level VRF, that supports 64 VVRs whose MVL can be configured from 16 to 128 elements. The first level corresponds to the VVRs mapped in the physical registers held in the 8KB Physical Vector Register File (P-VRF), while the second level represents the VVRs mapped in memory registers held in the Memory Vector Register File (M-VRF). While the baseline configuration (MVL=16 elements) holds all the VVRs in the P-VRF, larger MVL configurations hold a subset of the total VVRs in the P-VRF, and map the remaining part in the M-VRF. Third, we propose a novel two-stage vector issue unit. In the first stage, the second level of mapping between the VVRs and physical registers is performed, while issuing to execute is managed in the second stage.
This thesis also presents a set of tools for designing and evaluating vector architectures. First, a parameterizable vector architecture model implemented on the gem5 simulator to evaluate novel ideas on vector architectures. Second, a Vector Architecture model implemented on the McPAT framework to evaluate power and area metrics. Finally, the RiVEC benchmark suite, a collection of ten vectorized applications from different domains focusing on benchmarking vector microarchitectures.
p
ES-BaCBU
cat
rda
ES-BaCBU
text
txt
rdacontent
informàtic
c
rdamedia
recurs en línia
cr
rdacarrier