No. of Pages: 3

A

# APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

SECOND SEMESTER M.TECH DEGREE EXAMINATION, APRIL/MAY 2018

#### Branch: COMPUTER SCIENCE & ENGINEERING

Stream(s): Computer Science & Engineering

#### 01CS6102: PARALLEL COMPUTER ARCHITECTURE

Answer any two full questions from each part. Limit answers to the required points.

Max. Marks: 60

Duration: 3 hours

http://www.ktuonline.com

#### PART A

| Q. No. | | Question | Marks |
|----|----|----|----|
| 1. | a. | Determine the number of clock cycles required to process a program with 300 instructions in a five-stage pipeline. | |
| | b. | | |
| | c. | | |
| 2. | a. | Explain the three data hazards with proper examples. | |
| | b. | Illustrate the concept of loop unrolling. | [3] |
| | c. | How many bits are there in a (2,2) branch predictor with 8K entries? How many entries are there in a (1,2) predictor with the same number of bits? | |
| 3. | a. | State Amdahl's law and derive an expression for overall speedup. | [4] |
| | b. | Find the CPU time needed for executing a 100K program on a processor of speed 200 MIPS. | |
| | c. | Write the dependences in the following code.<br>`ADD.D F1,F2,F4`<br>`MUL.D F6,F1,F8`<br>`S.D F6,0(R1)`<br>`SUB.D F8,F10,F14`<br>`MUL.D F14,F1,F10` | [3] |

#### PART B

| Q. No. | | Question | Marks |
|----|----|----|----|
| 4. | a. | Explain how the three hazards are avoided in Tomasulo's approach. | |
| | b. | Write the difference in how stores are handled in a speculative processor versus in Tomasulo's approach. | |
| | c. | What are the [...] in instruction execution in a system which supports speculation? | [4] |
| 5. | a. | Describe the Intel Core i7 pipeline structure. | |
| | b. | [...] copy of each vector functional unit [...] sequence take?<br>`LV V1,Rx` ;load vector [...]<br>`MULVS.D V2,V1,F0` ;vector-scalar multiply<br>`LV V3,Ry`<br>`ADDVV.D V4,V2,V3`<br>`SV V4,Ry` ;store the [...] | |
| | c. | [...] chaining in vector processing? | [2] |
| 6. | a. | What are the two different approaches used to issue multiple instructions per clock in a dynamically scheduled processor? | |
| | b. | The largest configuration of a Cray T90 has 32 processors, each capable of generating 4 loads and 2 stores per clock cycle. The processor clock cycle is 2.167 ns, while the cycle time of the SRAMs used in the memory system is 15 ns. Calculate the minimum number of memory banks required to allow all processors to run at full memory bandwidth. | |
| | c. | Explain the [...] | [3] |

#### PART C

| Q. No. | | Question | Marks |
|----|----|----|----|
| 7. | a. | Explain the concept of multiport memory organization for a multiprocessor system. | [5] |
| | b. | | |
| | c. | Draw an 8x8 Omega network built with 2x2 switches. Check whether the permutation (0,6,4,7,3)(1,5)(2) is blocking or non-blocking. Explain the routing of a message from 111 to 011. | |
| 8. | a. | [...] coherence? Point out the reasons which cause the cache inconsistencies. | [6] |
| | b. | Assume that words x1 and x2 are in the same cache block, which is in the shared state in the caches of both P1 and P2. Assuming the following sequence of events, identify each miss as a true sharing miss, a false sharing miss, or a hit. Any miss that would occur if the block size were one word is designated a true sharing miss. | [6] |

Sequence of events for Question 8(b):

| Time | P1 | P2 |
|----|----|----|
| 1 | Write x1 | |
| 2 | | Read x2 |
| 3 | Write x1 | |
| 4 | | Write x2 |
| 5 | Read x2 | |

| Q. No. | | Question | Marks |
|----|----|----|----|
| 9. | a. | Explain the snooping coherence protocols. | |
| | b. | Explain the different types of directory structures used in a directory-based cache coherence scheme. | [6] |