|
| 1 | + |
| 2 | +BUFFERED General Ufunc explanation: |
| 3 | + |
| 4 | +We need to optimize the section of ufunc code that handles mixed-type |
| 5 | +and misbehaved arrays. In particular, we need to fix it so that items |
| 6 | +are not copied into the buffer if they don't have to be. |
| 7 | + |
| 8 | +Right now, all data is copied into the buffers (even scalars are copied |
| 9 | +into the buffers multiple times, whether or not they are going to be cast). |
| 10 | + |
| 11 | +Some benchmarks show that this results in a significant slow-down |
| 12 | +(factor of 4) over similar numarray code. |
| 13 | + |
| 14 | +The approach is therefore to loop over the largest dimension (just like |
| 15 | +the NO_BUFFER portion of the code). All arrays will have either N or |
| 16 | +1 in this last dimension (or there would be a mismatch error). The |
| 17 | +buffer size is B. |
| 18 | + |
| 19 | +If N <= B (and only if needed), we copy the entire last-dimension into |
| 20 | +the buffer as fast as possible using the single-stride information. |
| 21 | + |
| 22 | +We also copy into the output arrays only if needed (otherwise the |
| 23 | +output arrays are used directly in the ufunc code). |
| 24 | + |
| 25 | +Call the function using the appropriate strides information from all the input |
| 26 | +arrays. Only set the strides to the element-size for arrays that will be copied. |
| 27 | + |
| 28 | +If N > B, then we have to do the above operation in a loop (with an extra loop |
| 29 | +at the end with a different buffer size). |
| 30 | + |
| 31 | +Both of these cases are handled with the following code: |
| 32 | + |
| 33 | +Compute N = quotient * B + remainder. |
| 34 | + quotient = N / B # integer math |
| 35 | + (store quotient + 1) as the number of innerloops |
| 36 | + remainder = N % B # integer remainder |
| 37 | + |
| 38 | +On the inner dimension we will have (quotient + 1) loops, where |
| 39 | +the size of the inner function call is B for all but the last, when the niter size is |
| 40 | +remainder. |
| 41 | + |
| 42 | +So, the code looks very similar to NOBUFFER_LOOP except the inner loop is |
| 43 | +replaced with... |
| 44 | + |
| 45 | +for(k=0; k<quotient+1; k++) { |
| 46 | + if (k==quotient) make itersize remainder size |
| 47 | + copy only needed items to buffer. |
| 48 | + swap input buffers if needed |
| 49 | + cast input buffers if needed |
| 50 | + call function() |
| 51 | + cast outputs in buffers if needed |
| 52 | + swap outputs in buffers if needed |
| 53 | + copy only needed items back to output arrays. |
| 54 | + update all data-pointers by strides*niter |
| 55 | +} |
| 56 | + |
| 57 | + |
| 58 | +Reference counting for OBJECT arrays: |
| 59 | + |
| 60 | +If there are object arrays involved then loop->obj gets set to 1. Then there are two cases: |
| 61 | + |
| 62 | +1) The loop function is an object loop: |
| 63 | + |
| 64 | + Inputs: |
| 65 | + - castbuf starts as NULL and then gets filled with new references. |
| 66 | + - function gets called and doesn't alter the reference count in castbuf |
| 67 | + - on the next iteration (next value of k), the casting function will |
| 68 | + DECREF what is present in castbuf already and place a new object. |
| 69 | + |
| 70 | + - At the end of the inner loop (for loop over k), the final new-references |
| 71 | + in castbuf must be DECREF'd. If it's a scalar then a single DECREF suffices. |
| 72 | + Otherwise, "bufsize" DECREF's are needed (unless there was only one |
| 73 | + loop, then "remainder" DECREF's are needed). |
| 74 | + |
| 75 | + Outputs: |
| 76 | + - castbuf contains a new reference as the result of the function call. This |
| 77 | + gets converted to the type of interest. This new reference in castbuf |
| 78 | + will be DECREF'd by later calls to the function. Thus, only after the |
| 79 | + innermost loop do we need to DECREF the remaining references in castbuf. |
| 80 | + |
| 81 | +2) The loop function is of a different type: |
| 82 | + |
| 83 | + Inputs: |
| 84 | + |
| 85 | + - The PyObject input is copied over to buffer which receives a "borrowed" |
| 86 | + reference. This reference is then used but not altered by the cast |
| 87 | + call. Nothing needs to be done. |
| 88 | + |
| 89 | + Outputs: |
| 90 | + |
| 91 | + - The buffer[i] memory receives the PyObject input after the cast. This is |
| 92 | + a new reference which will be "stolen" as it is copied over into memory. |
| 93 | + The only problem is that what is presently in memory must be DECREF'd first. |
| 94 | + |
| 95 | + |
| 96 | + |
| 97 | + |
| 98 | + |
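The chunked inner loop described in the note above (N = quotient * B + remainder, with the last pass using the remainder as its size) can be sketched in C roughly as follows. This is only an illustration under simplifying assumptions, not the actual ufunc code: BUFSIZE, inner_add_one and buffered_loop are hypothetical names, strides are counted in elements rather than bytes, there is a single input that is always copied into the buffer, the output is written directly without buffering, and no swapping or casting is done.

```c
#include <assert.h>
#include <stddef.h>

#define BUFSIZE 4   /* B: hypothetical buffer size chosen for this sketch */

/* Hypothetical inner function: adds 1.0 to n contiguous doubles. */
static void inner_add_one(const double *in, double *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = in[i] + 1.0;
    }
}

/* Process n items of a strided input in chunks of at most BUFSIZE.
 * in_stride is in elements (an assumption of this sketch; real array
 * strides are in bytes). Returns the number of inner-function calls. */
static size_t buffered_loop(const double *in, size_t in_stride,
                            double *out, size_t n)
{
    double buffer[BUFSIZE];
    size_t quotient = n / BUFSIZE;    /* number of full-size passes */
    size_t remainder = n % BUFSIZE;   /* size of the final pass */
    size_t k, i, calls = 0;

    assert(in_stride >= 1);

    for (k = 0; k < quotient + 1; k++) {
        /* Every pass uses BUFSIZE items except the last, which uses remainder. */
        size_t niter = (k == quotient) ? remainder : BUFSIZE;
        if (niter == 0) {
            break;  /* n is an exact multiple of BUFSIZE: nothing left */
        }
        /* Copy the needed items into the contiguous buffer. */
        for (i = 0; i < niter; i++) {
            buffer[i] = in[i * in_stride];
        }
        /* The output is contiguous here, so it is used directly. */
        inner_add_one(buffer, out, niter);
        /* Update the data pointers by strides * niter. */
        in += niter * in_stride;
        out += niter;
        calls++;
    }
    return calls;
}
```

With N = 10 and B = 4 this makes three inner calls, of sizes 4, 4 and 2. The early exit when the remainder is zero is a choice made for this sketch; the note itself does not say how that case is handled.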