Optimizing VLIW architectures for multimedia applications

Author

Salamí San Juan, Esther

Director

Valero Cortés, Mateo

Date of defense

2007-06-01

ISBN

9788469139622

Legal Deposit

B.31930-2008



Department/Institute

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

Abstract

The growing interest that multimedia processing has experimented during the last decade is motivating processor designers to reconsider which execution paradigms are the most appropriate for general-purpose processors. On the other hand, as the size of transistors decreases, power dissipation has become a relevant limitation to increases in the frequency of operation. Thus, the efficient exploitation of the different sources of parallelism is a key point to investigate in order to sustain the performance improvement rate of processors and face the growing requirements of future multimedia applications. We belief that a promising option arises from the combination of the Very Long Instruction Word (VLIW) and the vector processing paradigms together with other ways of exploiting coarser grain parallelism, such as Chip MultiProcessing (CMP). <br/><br/>As part of this thesis, we analyze the problem of memory disambiguation in multimedia applications, as it represents a serious restriction for exploiting Instruction Level Parallelism (ILP) in VLIW architectures. We state that the real handicap for memory disambiguation in multimedia is the extensive use of pointers and indirect references usually found in those codes, together with the limited static information available to the compiler on certain occasions. Based on the observation that the input and output multimedia streams are commonly disjointed memory regions, we propose and implement a memory disambiguation technique that dynamically analyzes the region domain of every load and store before entering a loop, evaluates whether or not the full loop is disambiguated and executes the corresponding loop version. This mechanism does not require any additional hardware or instructions and has negligible effects over compilation time and code size. The performance achieved is comparable to that of advanced interprocedural pointer analysis techniques, with considerably less software complexity. We also demonstrate that both techniques can be combined to improve performance.<br/><br/>In order to deal with the inherent Data Level Parallelism (DLP) of multimedia kernels without disrupting the existing core designs, major processor manufacturers have chosen to include MMX-like µSIMD extensions. By analyzing the scalability of the DLP and non-DLP regions of code separately in VLIW processors with µSIMD extensions, we observe that the performance of the overall application is dominated by the performance of the non-DLP regions, which in fact exhibit only modest amounts of ILP. As a result, the performance achieved by very wide issue configurations does not compensate for the related cost. To exploit the DLP of the vector regions in a more efficient way, we propose enhancing the µSIMD -VLIW core with conventional vector processing capabilities. The combination of conventional and sub-word level vector processing results in a 2-dimensional extension that combines the best of each one, including a reduction in the number of operations, lower fetch bandwidth requirements, simplicity of the control unit, power efficiency, scalability, and support for multimedia specific features such as saturation or reduction. This enhancement has a minimal impact on the VLIW core and reaches more parallelism than wider issue µSIMD implementations at a lower cost. Similar proposals have been successfully evaluated for superscalar cores. In this thesis, we demonstrate that 2-dimensional Vector-µSIMD extensions are also effective with static scheduling, allowing for high-performance cost-effective implementations.

Keywords

memory disambiguation; VLIW; SIMD extensions; ILP; vector processing; DLP; multimedia; computer architecture

Subjects

004 - Computer science and technology. Computing. Data processing

Documents

01_salamiSanJuan.pdf

1.114Mb

 

Rights

ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs.

This item appears in the following Collection(s)