http://www.ktuonline.com

# APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

SECOND SEMESTER M.TECH DEGREE EXAMINATION, MAY 2016

Branch: Computer Science & Information Technology (All Specializations)

#### 01CS6102: Parallel Computer Architecture

Time: 3 Hrs

Max. Marks: 60

#### Part A (Each Question Carries 9 Marks)

- 1. a) Discuss the architectures in Flynn's taxonomy, specifying the application of each one. (5)
- b) We have a program of 1000 instructions in the format "lw, add, lw, add, ...". Each add instruction depends (and only depends) on the lw instruction right before it. Each lw instruction depends (and only depends) on the add instruction right before it. The program is executed on a pipelined datapath with 5 stages (IF-ID&DR-EXE-MEM-WB).
  - (1) What would be the actual CPI if operand forwarding is permitted?
  - (2) Without forwarding, what would be the actual CPI?

  Format: LOAD Rdest, #constant(Rx); ADD Rdest, Rsrc1, Rsrc2 (4)
- 2. a) Suppose we are designing an instruction set architecture with 28-bit instructions and 44 different opcodes. Immediate operands can be in the range of ±512. How many registers can this data path have? Assume we would like to support an R-type and an I-type instruction format with the same number and types of operands as in the MIPS format. Fill in the following:
  - (1) For I-type: Opcode: ? Dest. Register: ? Source1 Reg: ? Immediate: ?
  - (2) For R-type: Opcode: ? Dest. Register: ? Source1 Reg: ? Source2 Reg: ? (3)
- b) Explain some basic compiler techniques for exposing Instruction Level Parallelism.
- 3. a) Suppose you have the following instruction sequence to be executed:

  `lw $1, 0($7)`
  `addi $1, $1, 1`
  `sw $10, 10($7)`
  `lw $2, 0($8)`
  `addi $2, $2, 1`
  `sw $20, 10($8)`

  Rearrange the instruction sequence so that it achieves the same functionality but the best performance (shortest execution time). You are only allowed to change the order of the six instructions. Do not modify or add new instructions.
  Calculate the execution time of the instruction sequence you rearranged. (3)
- b) What do you understand by branch prediction? Explain the correlating branch predictor.

#### Part B (Each Question Carries 9 Marks)

- 4. a) Complete the following table using Tomasulo's algorithm with reservation stations and a Reorder Buffer.
  - i) Assume the following information about functional units.

    | Functional unit type | Cycles in EX |
    |----------------------|--------------|
    | Integer Mul | 2 |
    | Integer Div | 10 |
    | Integer Add | 1 |

  - ii) Assume the processor can issue into the reservation stations and reorder buffer only one instruction per cycle.
  - iii) Assume you have unlimited reservation stations, functional units, reorder buffer entries and CDB.
  - iv) The functional units are not pipelined.
  - v) Fill in the cycle numbers in each pipeline stage for each instruction. For each instruction, indicate where its source operands are read from (use RF for register file, CDB for common data bus and ROB for Reorder Buffer).
  - vi) For simplicity, when an operand is waiting for an execution unit's result, just indicate it as waiting on CDB instead of giving the number of the execution unit.
  - vii) An instruction waiting for data on the CDB can move to its EX stage in the cycle after the CDB broadcast.
  - viii) Assume that integer instructions also follow Tomasulo's algorithm, so the result from the integer functional unit is also broadcast on the CDB and forwarded to dependent instructions through the CDB.

  Some of the entries for the instructions and the issue stage are already filled in.

| No. | Instruction issued | Issue | Operand 1 source | Operand 2 source | EX | WB | Commit |
|-----|--------------------|-------|------------------|------------------|----|----|--------|
| 1 | MUL R2, R6, R12 | 1 | RF | RF | 2 | 4 | 5 |
| 2 | DIV R1, R1, R2 | 2 | RF | CDB | | | |
| 3 | ADD R5, R1, R3 | 3 | | | | | |
| 4 | ADDI R7, R5, 4 | 4 | | | | | |
| 5 | ADD R5, R6, R8 | 5 | | | | | |
| 6 | ADDI R8, R8, 2 | 6 | | | | | |
| 7 | ADD R9, R6, R9 | 7 | | | | | |
| 8 | ADD R5, R5, R10 | 8 | | | | | |
| 9 | ADD R6, R8, R5 | 9 | | | | | |

(6)

- b) Illustrate the use of Vector Mask Registers in Vector architecture. (3)
- 5. a) Explain the concept of the Branch Target Buffer. (3)
- b) Write notes on the following:
  - i) ISA of Intel Core i7
  - ii) Pipelining in Intel Core i7 (6)
- 6. a) Discuss the ILP Wall. (5)
- b) Compare Vector architecture and GPU. (4)

#### Part C (Each Question Carries 12 Marks)

- 7. a) Draw the switch settings of an 8x8 Omega network built with 2x2 switches for the permutation (0,6,4,7,3)(1,5)(2). Explain the routing of a message from input 110 to output 100. (6)
- b) Discuss the directory based cache coherence protocol. (6)
- 8. a) Compare C-Access and S-Access memory schemes. (6)
- b) Explain the snooping based cache coherence protocol. (6)
- 9. a) What do you understand by blocking and non-blocking switching networks? (6)
- b) Discuss distributed and shared memory systems in Multiprocessor architecture. (6)
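
Note on Question 7(a): the routing part can be checked with a small sketch of destination-tag routing. This is only an illustration, not part of the paper, and it rests on the usual textbook assumptions: each of the three stages is preceded by a perfect shuffle of the line addresses, and a switch sends the message to its upper output on destination bit 0 and to its lower output on bit 1, taking the destination bits most significant first. The function name `omega_route` is hypothetical.

```python
def omega_route(src: int, dst: int, n_bits: int = 3):
    """Trace destination-tag routing through a 2**n_bits-port Omega network.

    Assumed conventions (not stated in the paper): a perfect shuffle
    precedes every stage, and destination bit 0/1 selects the
    upper/lower output of the 2x2 switch.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line address left by one.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Destination bit used at this stage, most significant first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        # The switch output replaces the least significant address bit.
        pos = (pos & ~1) | bit
        port = "lower" if bit else "upper"
        path.append((stage, port, format(pos, f"0{n_bits}b")))
    return path


if __name__ == "__main__":
    for stage, port, line in omega_route(0b110, 0b100):
        print(f"stage {stage}: {port} output, now on line {line}")
```

Under these assumptions the message from input 110 leaves the three stages on lines 101, 010 and 100, using the lower, upper and upper switch outputs in turn.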