# CS 61C Spring 2024

PRINT Your Name: \_\_\_\_

PRINT Your Student ID: \_\_\_\_\_

You have 170 minutes. There are 10 questions of varying credit and difficulty (100 points total).

| Question: | 1  | 2  | 3  | 4  | 5  | 6  | 7 | 8 | 9  | 10 | Total |
|-----------|----|----|----|----|----|----|---|---|----|----|-------|
| Points:   | 12 | 12 | 13 | 10 | 13 | 10 | 9 | 8 | 13 | 0  | 100   |

For questions with **circular bubbles**, you may select only one choice.

- **O** Unselected option (completely unfilled)
- Only one selected option (completely filled)
- On't do this (it will be graded as incorrect)

Anything you write outside the answer boxes or you <del>cross out</del> will not be graded. If you write multiple answers, your answer is ambiguous, or the bubble/checkbox is not entirely filled in, we will grade the worst interpretation. For coding questions with blanks, you may write at most one statement per blank and you may not use more blanks than provided.

If an answer requires hex input, you must only use capitalized letters (OxDEADBEEF instead of Oxdeadbeef). For hex and binary, please include prefixes in your answers unless otherwise specified, and do not truncate any leading 0's. For all other bases, do not add any prefixes or suffixes.

## Write the statement below in the same handwriting you will use on the rest of the exam.

I have neither given nor received help on this exam (or quiz), and have rejected any attempt to cheat; if these answers are not my own work, I may be deducted up to 0x0123 4567 89AB CDEF points.

SIGN your name: \_\_\_\_\_

☐ You can select

- multiple squares
- (completely filled)

For questions with square checkboxes,

you may select one or more choices.



Yan, Yokota Final

# Q1 61Collection (Potpourri)

Q1.1 (1 point) Convert the **8-bit** unsigned binary **0b0001 1011** to decimal, treating it as an unsigned integer.



Q1.2 (1 point) Convert the decimal -510 to a **12-bit** two's complement hexadecimal. If it cannot be represented, write "None".



Q1.3 (1.5 points) What value does 0x61C0 0000 represent, if interpreted as an IEEE-754 single precision floating point number? Express your answer as  $x \times 2^y$ , where x is a base-10 number such that  $1 \le |x| < 2$  and y is a base-10 integer. For NaN, infinities, or zeros, please leave the boxes blank and fill in the corresponding bubble. Otherwise, leave the bubbles blank.



Q1.4 (0.5 points) A user program can request an operating system service, such as printing a string or reading a file, by issuing a system call.

O True O False

For Q1.5 – Q1.7 indicate the stage of CALL that...

Q1.5 (0.5 points) ...produces pseudoinstructions.

| O Compiler | O Assembler | O Linker | O Loader |
|------------|-------------|----------|----------|
|------------|-------------|----------|----------|

Q1.6 (0.5 points) ...produces machine code for the RISC-V instruction bne x0 x0 8.

| Ο | Compiler | 🔿 Assembler 🛛 🕻 | 🔿 Linker | O Loader |
|---|----------|-----------------|----------|----------|
|---|----------|-----------------|----------|----------|

- Q1.7 (0.5 points) ...produces machine code for the RISC-V instruction la t0 magic\_num, assuming that magic\_num is a label that points to the data segment.
  - O Compiler O Assembler O Linker O Loader
- Q1.8 (1.5 points) Suppose we have two 16-bit integer arrays, each with 73 elements, and we want to compute their element-wise product. If our RISC-V CPU has 128-bit registers and supports **vmul**, an instruction that performs 16-bit elementwise vector multiplication, what is the minimum number of **mul** and **vmul** instructions required to multiply these two vectors?

Instructions

Q1.9 (1.5 points) Suppose we have a program that takes 20 minutes to complete on a system with one core and takes 10 minutes to complete on a system with four cores. What fraction of this program is parallelizable? Express your answer as a simplified fraction.



Q1.10 (1.5 points) Given a cache with a hit rate of 80%, a hit time of 5 cycles, and a miss penalty of 20 cycles, what is the Average Memory Access Time (AMAT) for the system? Express your answer as an integer, rounding up if necessary.

For Q1.11 – Q1.14, choose which section of memory the value would live in. Assume this program compiles successfully.



Q1.11 (0.5 points) facade

| O Stack                           | O Heap | O Static | O Code |
|-----------------------------------|--------|----------|--------|
| Q1.12 (0.5 points) <b>*facade</b> |        |          |        |
| O Stack                           | O Heap | O Static | O Code |
| Q1.13 (0.5 points) <b>*brun</b>   |        |          |        |
| O Stack                           | O Heap | O Static | O Code |
| Q1.14 (0.5 points) adam           |        |          |        |
| O Stack                           | O Heap | O Static | O Code |

# Q2 61C16 (RISCV)

#### (12 points)

A palindrome is a sequence of characters that reads the same backward and forward. For example, "civic" and "redder" are palindromes, but "wave" and "canal" are not palindromes.

Implement the RISC-V function find\_palindrome that takes as input a nonempty null-terminated string in a0 and its length (excluding the null-terminator) in a1. The function should return 1 in a0 if the string is a palindrome and 0 in a0 otherwise. Assume the input string contains only lowercase letters.



You may only use registers a0, a1, t5 and t6.

## Q3 61Control Logic

(13 points)

Consider the standard 5-stage pipeline included in the CS 61C Reference Card. For each of the control logic signals below, indicate which stage the control signal would be used.

| Q3.1 (1 point) BrU | n    |      |     |      |
|--------------------|------|------|-----|------|
| O IF               | O ID | O EX | O M | O WB |
| Q3.2 (1 point) Reg | WEn  |      |     |      |
| O IF               | O ID | O EX | О М | O WB |
| Q3.3 (1 point) Mem | RW   |      |     |      |
| O IF               | O ID | O EX | O M | O WB |
| Q3.4 (1 point) WBS | el   |      |     |      |
| O IF               | O ID | O EX | О М | O WB |

Q3.5 (3 points) Consider the following RISC-V code:

| 1 | addi t3 x0 8    |
|---|-----------------|
| 2 | addi t4 x0 6    |
| 3 | lw t1 0(t3)     |
| 4 | bne t3 t4 label |
| 5 | addi t3 t3 4    |
| 6 | label:          |
| 7 | addi x0 x0 0    |

Identify the **first four hazards** that occur when the above program is run on the standard 5stage pipeline included in the CS 61C Reference Card. Assume that the register file implements write-then-read (i.e. it allows reading out the data being written in the same cycle).

For each hazard, indicate the type of hazard and the line number(s) of the instruction(s) involved. If there are fewer than four hazards, select "None" for the unused rows.



Final (Question 3 continues...)

We can mitigate some hazards by adding a forwarding path from the ALU result directly back into the execute stage.

For Q3.6 – Q3.12 on the following page, assume we are working with the standard 5-stage pipeline included in the CS 61C Reference Card with the modifications shown below to implement this forwarding path. The diagram shows a subset of the datapath (specifically the EX stage), and modifications are shown in dotted lines.



Q3.6 (1 point) What type of hazard does this forwarding path attempt to mitigate?

O Structural O Data O Control

The described implementation introduces two new control signals: AFwd and BFwd. These control signals determine which, if any, forwarding path should be used. If a stall is unavoidable, AFwd and BFwd may be either 0 or 1. Assume instruction bits are zero-indexed.

For the two boxed statements below, fill in the blanks such that they correctly describe AFwd and BFwd.

| AFwd should be                                                                                                                     | AFwd should be 1 only when                                           |                                      |                                                             |       |  |  |
|------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|--------------------------------------|-------------------------------------------------------------|-------|--|--|
| - bits $\underline{Q3.7}$ (inclusive) of the instruction in EX are nonzero and equal to bits $\underline{Q3.8}$ (inclusive) of the |                                                                      |                                      |                                                             |       |  |  |
| instruction                                                                                                                        | in M and                                                             |                                      |                                                             |       |  |  |
| - all of the se                                                                                                                    |                                                                      | $\frac{Q_{5.9}}{M}$ are true.        |                                                             |       |  |  |
|                                                                                                                                    |                                                                      |                                      |                                                             |       |  |  |
| Q3.7 (0.5 points)                                                                                                                  | bits                                                                 | to                                   |                                                             |       |  |  |
|                                                                                                                                    |                                                                      |                                      |                                                             |       |  |  |
|                                                                                                                                    | 1                                                                    |                                      |                                                             |       |  |  |
| Q3.8 (0.5 points)                                                                                                                  | bits                                                                 | to                                   |                                                             |       |  |  |
| Q3.9 (1.5 points)                                                                                                                  | Select as few c                                                      | conditions as possible               | e to maintain the correct behavior.                         |       |  |  |
| $\Box$ The in                                                                                                                      | ☐ The instruction in M writes to rd ☐ The instruction in EX uses rs1 |                                      |                                                             |       |  |  |
| ☐ The in                                                                                                                           | struction in M                                                       | l is not a load                      | ☐ The instruction in EX uses <b>rs2</b>                     |       |  |  |
| ☐ The in                                                                                                                           | struction in E                                                       | X is not a store                     | O None of the above                                         |       |  |  |
|                                                                                                                                    |                                                                      |                                      | -                                                           |       |  |  |
| BFwd should be                                                                                                                     | 1 only when                                                          |                                      |                                                             |       |  |  |
| - bits <u>Q3.10</u> (i                                                                                                             | inclusive) of th                                                     | ne instruction in EX a               | are nonzero and equal to bits $\underline{Q3.11}$ (inclusiv | e) of |  |  |
| the instruct                                                                                                                       | ion in M <b>and</b>                                                  |                                      |                                                             |       |  |  |
| - all of the se                                                                                                                    | lected conditio                                                      | ons in $\underline{Q3.12}$ are true. |                                                             |       |  |  |
|                                                                                                                                    |                                                                      |                                      |                                                             |       |  |  |
| Q3.10 (0.5 points)                                                                                                                 | bits                                                                 | to                                   |                                                             |       |  |  |
|                                                                                                                                    |                                                                      |                                      |                                                             |       |  |  |
|                                                                                                                                    | 1                                                                    |                                      |                                                             |       |  |  |
| Q3.11 (0.5 points)                                                                                                                 | bits                                                                 | to                                   |                                                             |       |  |  |
| 23.12 (1.5 points) Select as few conditions as possible to maintain the correct behavior.                                          |                                                                      |                                      |                                                             |       |  |  |
| The instruction in M writes to <b>rd</b>                                                                                           |                                                                      |                                      | ☐ The instruction in EX uses <b>rs1</b>                     |       |  |  |
| ☐ The instruction in M is not a load                                                                                               |                                                                      |                                      | ☐ The instruction in EX uses <b>rs2</b>                     |       |  |  |
| $\square$ The instruction in FX is not a store                                                                                     |                                                                      |                                      | igcap None of the above                                     |       |  |  |
| The instruction in EX is not a store O None of the above                                                                           |                                                                      |                                      |                                                             |       |  |  |
|                                                                                                                                    |                                                                      |                                      | 0                                                           |       |  |  |

# Q4 61CO (FSM)

Q4.1 (4 points) We are designing an FSM for a carbon monoxide (CO) detector that receives input every half-second. A 0 indicates normal CO levels, and a 1 indicates dangerous levels. When at least two of the last three inputs are 1, the detector activates and outputs 1's indefinitely. When not yet activated, it outputs 0's. The FSM starts in state 00. **Complete the transition table below to match this behavior, filling in the next state and output for each row.** Some boxes are pre-filled for you.

For example, if the input to this FSM was 0b01 0010 1011 1000 1001,

the output should be 0b00 0000 1111 1111 1111.

- CS and NS represent the Current State and Next State of the FSM respectively
- In represents the input to the FSM (whether or not the CO level is dangerous)
- **Out** represents the output of the FSM (whether or not the detector is activated)

| CS | In | NS | Out |
|----|----|----|-----|
| 00 | 0  | 00 | 0   |
| 00 | 1  |    |     |
| 01 | 0  | 10 |     |
| 01 | 1  |    |     |
| 10 | 0  |    |     |
| 10 | 1  |    |     |
| 11 | 0  | 11 |     |
| 11 | 1  |    |     |

Q4.2 (2 points) We want to make our detector more sensitive. Now, the detector should output 1's indefinitely if **two of the last 12 samples** are 1's. What is the minimum number of states required to implement this updated detector as an FSM?

Q4.3 (4 points) A state transition table for a **different** FSM is shown below. Complete the circuit below to implement this FSM using only AND, OR, and NOT gates. Your implementation should minimize the critical path delay. Assume wires have negligible delay and each logic gate has the same delay.

| CS | In | NS | Out |
|----|----|----|-----|
| 00 | 0  | 00 | 0   |
| 00 | 1  | 01 | 0   |
| 01 | 0  | 00 | 0   |
| 01 | 1  | 11 | 0   |
| 10 | 0  | 00 | 1   |
| 10 | 1  | 01 | 1   |
| 11 | 0  | 00 | 1   |
| 11 | 1  | 11 | 0   |



#### Q5 61Caches

(13 points)

Suppose our computer has 64KiB of memory and a 32B direct-mapped cache with 8B blocks.

Q5.1 (1.5 points) How many tag, index, and offset bits are in each address?



For Q5.2 – Q5.10, consider running the following program on our computer, assuming the cache starts off cold.

1 #define NUM\_INTS 24
2 int32\_t arr[NUM\_INTS + 2]; // arr is located at address 0x2000
3 for (register int32\_t i = 0; i < NUM\_INTS; i += 1) {
4 arr[i + 2] = arr[i + 1] + arr[i]; // arr[i + 1] is accessed before arr[i]
5 }</pre>

Q5.2 (1 point) For each iteration of the for loop, how many memory accesses are there?

Q5.3 (1 point) For the **first** iteration of the **for** loop (i = 0), how many **hits** are there?



Q5.4 (1.5 points) Which of the following iterations would have the same number of hits as the **first** iteration of the **for** loop (i = 0)?

| □ i = 3 | □ i = 6  | □ i = 22                     |
|---------|----------|------------------------------|
| □ i = 4 | □ i = 7  | □ i = 23                     |
| 🗌 i = 5 | 🗌 i = 21 | $\bigcirc$ None of the above |

Q5.5 (1 point) For the **second** iteration of the **for** loop (**i** = 1), how many **hits** are there?

Q5.6 (1.5 points) Which of the following iterations would have the same number of hits as the **second** iteration of the for loop (i = 1)?

| □ i = 3 | □ i = 6  | □ i = 22                   |
|---------|----------|----------------------------|
| □ i = 4 | 🗌 i = 7  | □ i = 23                   |
| □ i = 5 | □ i = 21 | <b>O</b> None of the above |

Q5.7 (2.5 points) Considering all memory accesses for this program, how many compulsory misses, non-compulsory misses, and hits are there?

Compulsory: Non-compulsory: Hits:

(Question 5 continued...)

The program on the previous page has been copied below for your convenience:

| 1 | #define NUM_INTS 24                                                                  |
|---|--------------------------------------------------------------------------------------|
| 2 | <pre>int32_t arr[NUM_INTS + 2]; // arr is located at address 0x2000</pre>            |
| 3 | <pre>for (register int32_t i = 0; i &lt; NUM_INTS; i += 1) {</pre>                   |
| 4 | <pre>arr[i + 2] = arr[i + 1] + arr[i]; // arr[i + 1] is accessed before arr[i]</pre> |
| 5 | }                                                                                    |

For Q5.8 – Q5.10, assume each subpart is independent. As a reminder, our original cache is a **32B direct-mapped cache with 8B blocks**.

- Q5.8 (1 point) If we change our original cache to a **24B** direct-mapped cache with 8B blocks, the hit rate for this program...
  - O increases. O remains the same. O decreases.
- Q5.9 (1 point) If we change our original cache to a 32B direct-mapped cache with **16B** blocks, the hit rate for this program...

| O increases. | O remains the same. | O decreases. |
|--------------|---------------------|--------------|
|--------------|---------------------|--------------|

Q5.10 (1 point) If we change our original cache to a 32B **fully associative** cache with 8B blocks **and a LRU replacement policy**, the hit rate for this program...

| O increases. | O remains the same. | O decreases. |
|--------------|---------------------|--------------|
|--------------|---------------------|--------------|

#### **Q6** 61Confusing (Virtual Memory)

(10 points)

For Q6.1 to Q6.3, suppose we have a 4GiB virtual memory space, a 64KiB physical memory space, 128B pages, and a 4-entry fully associative TLB with a most-recently used (MRU) replacement policy.

Q6.1 (2.5 points) How many bits are in the offset, virtual page number (VPN), and physical page number (PPN)?



Q6.2 (1.5 points) How many entries are in the page table? Write your answer as a power of 2 (e.g.  $2^5$ ).



Q6.3 (1.5 points) Consider the following program executed by a particular process.

```
1 #define NUM_INTS 256
2 int32_t arr[NUM_INTS];
3 arr[0] = 0;
4 for (register int32_t i = 16; i < NUM_INTS; i += 16) {
5 arr[i] = i;
6 }</pre>
```

Suppose the address of **arr** is **0x1000** 0000 and the execution of line 3 results in a TLB hit. When executing the **for** loop, what is the **minimum** number of TLB hits that may occur?

For Q6.4 – Q6.6, suppose we have a system as follows:

- 32-bit virtual addresses and 16-bit physical addresses
- 8-bit offsets
- One free physical page with PPN 0x49
- 4-entry fully associative TLB

The current state of the TLB and first 5 entries of the page table are shown below.

| TLB   |      |      |      |  |  |  |  |  |
|-------|------|------|------|--|--|--|--|--|
| Valid | VI   | PPN  |      |  |  |  |  |  |
| 1     | 0x00 | 0005 | 0x12 |  |  |  |  |  |
| 0     | 0x00 | 0001 | 0x18 |  |  |  |  |  |
| 1     | 0x00 | 0002 | 0x45 |  |  |  |  |  |
| 1     | 0x00 | 0003 | 0x78 |  |  |  |  |  |

| Page table        |      |  |  |  |  |  |
|-------------------|------|--|--|--|--|--|
| (first 5 entries) |      |  |  |  |  |  |
| PTE               |      |  |  |  |  |  |
| 0xB256            | 0670 |  |  |  |  |  |
| 0xC490            | 8924 |  |  |  |  |  |
| 0x9845            | 6745 |  |  |  |  |  |
| 0xA158            | 9078 |  |  |  |  |  |
| 0x7256            | 0645 |  |  |  |  |  |
| • • •             |      |  |  |  |  |  |

Each page table entry (PTE) is 4 bytes:

| 31    | 30 7              | 0   |
|-------|-------------------|-----|
| Valid | Other status bits | PPN |

For each of the following virtual addresses, translate it into its corresponding physical address and determine what will happen during address translation. Assume each access occurs independently (not sequentially).

Q6.4 (1.5 points) 0x0000 01C9

0x

Q6.5 (1.5 points) 0x0000 0340

0x

Q6.6 (1.5 points) 0x0000 0424

0x

- O TLB hit
- **O** TLB miss and page table hit
- O Page fault
- O TLB hit
- **O** TLB miss and page table hit
- O Page fault
- O TLB hit
- O TLB miss and page table hit
- O Page fault

#### **Q7** 61Calculating $\pi$ (OpenMP)

#### (9 points)

To estimate the value of  $\pi$  in C, we can generate random points within a unit square  $[0, 1] \times [0, 1]$ , count how many are within the unit circle, and use the following equation:

$$\pi \approx 4 \times \frac{\text{Points in Circle}}{\text{Total Points}}$$

A working implementation of estimate\_pi that uses exactly tot\_points points to estimate  $\pi$  is shown below. You may assume that random\_01 is a function that returns a number uniformly at random from the range [0, 1) and can be called by multiple threads without changing its behavior. Note that a commented out OpenMP directive does nothing.

```
1 double estimate_pi(int tot_points) {
     int points_inside = 0;
 2
3
     // #pragma omp parallel
4
     {
5
       // #pragma omp parallel for
6
       for (int i = 0; i < tot_points; i++) {</pre>
7
         double x = random_01();
8
         double y = random_01();
         if (x * x + y * y \le 1) \{ // Check if point is inside \}
9
10
           // #pragma omp critical
           points_inside++;
11
12
         }
       }
13
14
     }
15
     return 4 * ((double) points_inside / (double) tot_points);
16 }
```

For Q7.1 – Q7.4, select the below behavior that **best** describes the new behavior of the code if we include OpenMP directives by uncommenting the specified line(s). Each subpart is independent.

**Behavior A:** Incorrectly estimates  $\pi$ , or does not generate tot\_points points.

**Behavior B:** Correctly estimates  $\pi$  using tot\_points points faster than the original implementation. Behavior C: Correctly estimates  $\pi$  using tot\_points points slower than the original implementation.

Q7.1 (1 point) Uncomment line 3.

| <b>O</b> Behavior A           | O Behavior B     | <b>O</b> Behavior C |
|-------------------------------|------------------|---------------------|
| Q7.2 (1 point) Uncomment line | e 5.             |                     |
| O Behavior A                  | O Behavior B     | <b>O</b> Behavior C |
| Q7.3 (1 point) Uncomment line | e 3 and line 10. |                     |
| O Behavior A                  | O Behavior B     | <b>O</b> Behavior C |
| Q7.4 (1 point) Uncomment line | e 5 and line 10. |                     |
| O Behavior A                  | O Behavior B     | O Behavior C        |

Implement the function  $estimate_pi_fixed$  such that it correctly estimates  $\pi$  using  $exactly tot_points$  points, and is faster than the original  $estimate_pi$  above (i.e. without any lines commented out). If you do not need a blank, then leave it empty.



# Q8 61Cool C Question (SIMD)

#### (8 points)

Implement **abs\_sum**, a function that takes in a large array of integers **arr** of length **arr\_size** and calculates the sum of the absolute values of each element.

For example, the abs\_sum of the array [1, -2, -3, 4, 5, -6, 7, -8, 9] is 45.

For full credit, your implementation must run as fast as possible. You may assume **abs** is a function that correctly returns the absolute value of the argument.

- vector vec\_load(uint32\_t \*A): Loads 4 integers from memory address A into a vector
- vector vec\_store(vector \*mem\_addr, vector A): Stores vector A to mem\_addr
- vector vec\_setnum(uint32\_t num): Returns a vector where every element is equal to num
- vector vec\_and(vector A, vector B): Computes the bitwise AND between each pair of corresponding vector elements in A and B, and returns a new vector with the result
- vector vec\_or(vector A, vector B): Computes the bitwise OR between each pair of corresponding vector elements in A and B, and returns a new vector with the result
- vector vec\_xor(vector A, vector B): Computes the bitwise XOR between each pair of corresponding vector elements in A and B, and returns a new vector with the result
- vector vec\_sra(vector A, vector count): Shifts each pair of corresponding vector elements in A to the right by count with sign extension, and returns a new vector with the result
- vector vec\_add(vector A, vector B): Adds A and B together elementwise, and returns a new vector with the result
- vector vec\_sub(vector A, vector B): Subtracts B from A elementwise, and returns a new vector with the result

```
1 int abs_sum(const int *arr, int arr_size) {
2
    vector sum = vec_setnum(0);
    vector shift = vec_setnum(_____);
3
    for (int i = 0; i < _____; ____) {
4
      vector vec = vec_load((vector *)(arr + i));
5
6
      vector mask = vec_sra(vec, shift);
7
      vec = vec_xor(vec, mask);
                            _____(vec, mask);
8
      vec = _____
                      08.4
                      _____(sum, vec);
      sum = _____
9
10
    }
11
    int result[4];
                      ____((vector *)result, sum);
12
               08 6
13
    for (int i = /* Blank Q8.2 */; i < arr_size; i += 1) {</pre>
      result[0] += abs(arr[i]);
14
15
    }
    return result[0] + result[1] + result[2] + result[3];
16
17 }
```

### Q9 61Core of the Multi Variety Matrix Multiplication (PLP)

(13 points)

Note: This question was inspired by the game TIS-100, written by Zach Barth.

Multiprocess programming is great, but communication is so expensive... Taking inspiration from neural networks, we create new *micro-CPUs*, which work as follows:

- They have a reduced instruction set of RISC-V, where there are no load or store instructions. Assume DMEM does not exist in our micro-CPUs because data memory accesses are "too slow".
- They also have built-in communication systems to "connect" (i.e. wire) each micro-CPU to up to four other micro-CPUs:
  - For each of the four directions (up, down, left, and right), at most one micro-CPU is connected.
  - All micro-CPUs share a clock but have separate IMEMs, PCs, and RegFiles.
  - To communicate, two additional 32-bit RISC-V instructions are added:
    - send rs1 direction starts a "send operation" to the micro-CPU in the specified direction. This operation will stall until the destination micro-CPU runs a corresponding recv. Once both sender and receiver are ready, sends the value in rs1 to the receiver.
    - recv rd direction starts a "receive operation" from the micro-CPU in the specified direction. This operation will stall until the source micro-CPU runs a corresponding send. Once both sender and receiver are ready, saves the received value in rd.

Warmup: Suppose we connect two micro-CPUs that run programs **A** and **B** as follows:

#### **Micro-CPU Layout**



- In the layout above, each cell represents one micro-CPU, and adjacent cells represent a connection.
   For example, the micro-CPU labeled A has right connected to the micro-CPU labeled B and left, up, and down connected to nothing.
- The label on each micro-CPU represents the program it will execute. For example, the micro-CPU labeled A will execute the program named **A**.

We would like to pass the value stored in the left micro-CPU's a0 register to the right micro-CPU's a0 register. If the value in a0 of the left CPU is 10, then the following pair of diagrams shows each micro-CPU's a0 registers before and after running their respective programs. A value of ? indicates that it can be any value.

| a0 Before Program Execution |   | Execution | a0 After P | rogi | am | Execution |
|-----------------------------|---|-----------|------------|------|----|-----------|
| 10                          | ? |           |            | ?    | 10 |           |

**Implement programs A and B** such that the micro-CPUs match the expected behavior described above. After each program is done, it should jump (or branch) to the label exit.

| A:      | B:      |
|---------|---------|
| send a0 | recv a0 |
| Q9.1    | Q9.2    |
| j exit  | j exit  |

We now decide to implement parallel matrix multiplication by running seven programs, **A** through **G**, on micro-CPUs as follows:

- Input in a0 of the micro-CPUs
  - Micro-CPUs labeled B contain elements of the matrix W.
  - Micro-CPUs labeled D contain elements of the matrix X.
  - Micro-CPUs labeled A and C contain the value 0.
  - All other micro-CPUs may contain any value.
- Output in **a0** of the micro-CPUs
  - Micro-CPUs labeled **E** should contain elements of the output matrix.
  - All other micro-CPUs may contain any value.

For example, let's consider the following matrix multiplication operation along with the corresponding micro-CPU layout. The values in the **a0** registers in each of the micro-CPUs before execution and the expected values after execution are also shown below.

#### **Example: Matrix Multiplication Operation**

|   | Μ    | atrix | W    |          | ۸/     | [at: | iv ' | v  |  |
|---|------|-------|------|----------|--------|------|------|----|--|
|   | Γ1   | 10    | 1007 | г        | 1V.    | 1ati |      | ^  |  |
|   | 10   | 100   | 1    |          | 1      | 5    | 6    | 7  |  |
|   | 100  | 1     | 10   | $\times$ | კ<br>ი | 9    | 8    | 3  |  |
|   | 1    | 100   | 10   | Ľ        | 2      | Э    | 4    | 8] |  |
|   | 0    | utput | Matr | ix       |        |      |      |    |  |
|   | [231 | 595   | 486  | 837      | 1      |      |      |    |  |
|   | 312  | 955   | 864  | 378      |        |      |      |    |  |
| = | 123  | 559   | 648  | 783      |        |      |      |    |  |
|   | 321  | 955   | 846  | 387_     |        |      |      |    |  |

#### **Example:** a0 Before Program Execution

|   | - |     |     |     |   |   |   |   |   |
|---|---|-----|-----|-----|---|---|---|---|---|
|   |   |     |     |     | 0 | 0 | 0 | 0 |   |
|   |   |     |     |     | 1 | 5 | 6 | 7 |   |
|   |   |     |     |     | 3 | 9 | 8 | 3 | n |
|   |   |     |     |     | 2 | 5 | 4 | 8 | • |
|   | 0 | 1   | 10  | 100 | ? | ? | ? | ? | ? |
| Ì | 0 | 10  | 100 | 1   | ? | ? | ? | ? | ? |
|   | 0 | 100 | 1   | 10  | ? | ? | ? | ? | ? |
|   | 0 | 1   | 100 | 10  | ? | ? | ? | ? | ? |
|   |   |     |     |     | ? | ? | ? | ? |   |
|   |   |     |     |     |   |   |   |   |   |

# Micro-CPU Layout

|   |             |                         | C                           | •••                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | C                                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                         |                                                      |
|---|-------------|-------------------------|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
|   |             |                         | D                           | •••                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | D                                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                         |                                                      |
|   |             |                         | ÷                           | ·.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | :                                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                         |                                                      |
|   |             |                         | D                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | D                                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                         |                                                      |
| В |             | В                       | Е                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | Е                                                                                                                                                                                                                                                                                                                                           | F                                                                                                                                                                                                                       |                                                      |
| : | ·           | :                       | ÷                           | ·.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | ÷                                                                                                                                                                                                                                                                                                                                           | ÷                                                                                                                                                                                                                       |                                                      |
| В |             | В                       | Е                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | Е                                                                                                                                                                                                                                                                                                                                           | F                                                                                                                                                                                                                       |                                                      |
|   |             |                         | G                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | G                                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                         |                                                      |
|   | B<br>:<br>B | B ····<br>∃ ∵.<br>B ··· | B ··· B<br>: ∴ :<br>B ··· B | C         D         :         D         :         D         :         D         :         D         :         D         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         :         : <t< td=""><td>C          D          E          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          C      </td><td>C     ····     C       D     ····     D       E     ···     E       D     ···     D       E     ···     E       E     ···     E</td><td><math display="block">\begin{array}{cccccccccccccccccccccccccccccccccccc</math></td></t<> | C          D          E          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          B          C | C     ····     C       D     ····     D       E     ···     E       D     ···     D       E     ···     E       E     ···     E | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ |

#### **Example: Micro-CPU Layout**

|   | 1 |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|---|
|   | С | С | С | С |   |   |   |   |
|   | D | D | D | D |   |   |   |   |
|   | D | D | D | D |   |   |   |   |
|   | D | D | D | D |   |   |   |   |
| F | Е | E | E | E | В | В | В | A |
| F | Е | Е | Е | Е | В | В | В | A |
| F | Е | Е | Е | Е | В | В | В | A |
| F | E | Е | Е | Е | В | В | В | A |
|   | G | G | G | G |   |   |   |   |

#### **Example:** a0 After Program Execution

|   |   |   |   | ?   | ?   | ?   | ?   |   |
|---|---|---|---|-----|-----|-----|-----|---|
|   |   |   |   | ?   | ?   | ?   | ?   |   |
|   |   |   |   | ?   | ?   | ?   | ?   |   |
|   |   |   |   | ?   | ?   | ?   | ?   |   |
| ? | ? | ? | ? | 231 | 595 | 486 | 837 | ? |
| ? | ? | ? | ? | 312 | 955 | 864 | 378 | ? |
| ? | ? | ? | ? | 123 | 559 | 648 | 783 | ? |
| ? | ? | ? | ? | 321 | 955 | 846 | 387 | ? |
|   |   |   |   | ?   | ?   | ?   | ?   |   |

Final (Question 9 continues...)

**Implement programs A through G** such that the micro-CPUs match the expected behavior described above and follows calling convention. After each program is done, it should jump (or branch) to the label exit.



Page 19 of 20

# Q10 61Colorless Epilogue

(0 points)

These questions will not be assigned credit; feel free to leave them blank.

Q10.1 The Coppersmith-Winograd algorithm to perform matrix multiplication has a time complexity of  $O(n^{2.3737})$ . What is the runtime of the matrix multiplication performed in Q9?



Q10.2 Decorate the 61Car.



Q10.3 If there's anything else you want us to know, or you feel like there was an ambiguity in the exam, please put it in the box below.

For ambiguities, you must qualify your answer and provide an answer for both interpretations. For example, "if the question is asking about A, then my answer is X, but if the question is asking about B, then my answer is Y". You will only receive credit if it is a genuine ambiguity and both of your answers are correct. We will only look at ambiguities if you request a regrade.