Hi community,
We are very new to this field. We are trying to use NVBit to inject faults into HMMA instructions, but we find it difficult to understand exactly what is going on. When NVBit scans the code, it reports instructions such as: HMMA.1688.F32.BF16 R0, R184, R200.reuse, R0. In this case R0 is the register used for accumulation. We want to capture the accumulation register before the instruction executes, emulate the HMMA computation ourselves, and write the result back to the destination register, so basically R0 = AB + R0_before, where AB is emulated. Eventually, we want to inject faults into this emulation of HMMA. However, we are currently unable to capture the value of R0 before the HMMA in isolation. We are clearly misunderstanding something. Here are some code snippets that may help explain what we are doing:
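// Per-thread snapshot of register values, indexed by flat thread id and register number.
__device__ uint32_t c_buffer[MAX_THREADS][MAX_REGS];

// Flatten (block, thread) coordinates into a single global thread index.
__inline__ __device__ long long get_flat_tid() {
    long long tid_b = threadIdx.x + (blockDim.x * (threadIdx.y + (threadIdx.z * blockDim.y)));
    long long bid = blockIdx.x + (gridDim.x * (blockIdx.y + (blockIdx.z * gridDim.y)));
    long long tid = tid_b + (bid * blockDim.x * blockDim.y * blockDim.z);
    return tid;
}

// Injected before the HMMA: save the four consecutive registers starting at `reg`.
extern "C" __device__ __noinline__ void capture_inputs(int pred, int reg) {
    if (!pred) return;  // skip threads whose guard predicate is false
    long long tid = get_flat_tid();
    // Guard both dimensions of c_buffer against out-of-range indices.
    if (tid < MAX_THREADS && reg + 3 < MAX_REGS) {
        c_buffer[tid][reg + 0] = nvbit_read_reg(reg + 0);
        c_buffer[tid][reg + 1] = nvbit_read_reg(reg + 1);
        c_buffer[tid][reg + 2] = nvbit_read_reg(reg + 2);
        c_buffer[tid][reg + 3] = nvbit_read_reg(reg + 3);
    }
}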
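// Host side: call capture_inputs before the HMMA, passing the guard predicate
// and the number of the first accumulator (C) register.
nvbit_insert_call(instr, "capture_inputs", IPOINT_BEFORE);
nvbit_add_call_arg_guard_pred_val(instr);
nvbit_add_call_arg_const_val32(instr, instr->getOperand(3)->u.reg.num); // C reg
// Host side: call insert_fault after the HMMA.
nvbit_insert_call(instr, "insert_fault", IPOINT_AFTER);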
...
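To make the goal concrete, here is a minimal host-side sketch of the emulation we have in mind, D = A*B + C with BF16 inputs and an FP32 accumulator. It rests on several assumptions that are ours and not taken from NVBit or the SASS documentation: that "1688" denotes a 16x8x8 tile (A is 16x8, B is 8x8, C and D are 16x8), that the matrices are stored row-major, and that plain sequential FP32 accumulation is an acceptable reference. The helper names (bf16_to_float, float_to_bf16_trunc, emulate_mma_tile) are hypothetical. The sketch does not model how tile elements are distributed across the registers of the 32 threads in a warp, and the hardware's accumulation order and rounding may differ.

```cpp
#include <cstdint>
#include <cstring>

// Assumed tile shape for HMMA.1688: A is 16x8, B is 8x8, C and D are 16x8.
constexpr int kTileM = 16;
constexpr int kTileN = 8;
constexpr int kTileK = 8;

// Hypothetical helper: widen a BF16 bit pattern to FP32.
// BF16 has the same layout as the upper 16 bits of an FP32 value.
static float bf16_to_float(uint16_t bits) {
    uint32_t wide = static_cast<uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));
    return out;
}

// Hypothetical helper: truncate FP32 to BF16 (no rounding), used to build inputs.
static uint16_t float_to_bf16_trunc(float value) {
    uint32_t wide;
    std::memcpy(&wide, &value, sizeof(wide));
    return static_cast<uint16_t>(wide >> 16);
}

// Reference D = A*B + C for one tile. All matrices are row-major.
// c holds the accumulator values captured before the instruction (R0_before).
static void emulate_mma_tile(const uint16_t* a, const uint16_t* b,
                             const float* c, float* d) {
    for (int i = 0; i < kTileM; ++i) {
        for (int j = 0; j < kTileN; ++j) {
            float acc = c[i * kTileN + j];  // start from the old accumulator
            for (int k = 0; k < kTileK; ++k) {
                acc += bf16_to_float(a[i * kTileK + k]) *
                       bf16_to_float(b[k * kTileN + j]);
            }
            d[i * kTileN + j] = acc;
        }
    }
}
```

A fault could then be injected by flipping bits in acc (or in one of the widened inputs) before the result is stored, which is why the sketch keeps the accumulation as an explicit scalar loop.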
We appreciate your help!