Performance issues with Raspberry Pi #18

Closed
ilikecake opened this issue Feb 19, 2022 · 5 comments · Fixed by #22

Comments

@ilikecake
Contributor

Hi All,

I was having issues with performance using this display on a Raspberry Pi. The details of my issue can be found here. It basically boiled down to two issues with the code:

  • The nested for loops in the framebuffer image() function take a very long time to convert a PIL frame into a framebuffer.
  • Sending SPI data is slow, which I speculate is caused by the underlying Linux OS on the Pi limiting the rate of SPI transactions.

I was able to fix both of these issues and significantly speed up the device (frame generation time went from ~6 s per frame to 0.011 s). I would like to submit this code back to the repo so that others can use it, but I am not sure how it would affect the code running on other devices. I am looking for help/guidance on how this code can be implemented in a way that does not break things for other devices.

  • The solution uses numpy. I think this should be available for any device running CircuitPython, but I am not sure.
  • Part of the speedup comes from precomputing the static headers that need to be added to the framebuffer data as it is sent to the device. These precomputed headers are 2 bytes per line, for a total of ~480 bytes of extra memory (2 bytes × 240 lines). This is not a problem on the Raspberry Pi, but it may be a problem on more memory-constrained processors.
  • In addition, part of the speedup comes from sending all of the data to the display as a single SPI transaction. This requires combining the static headers with the frame data. This can be done in several ways, but it may also have implications for memory usage on small static-memory-mapped devices.
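A quick sketch of the header precomputation and the sizes involved, assuming the 400×240 panel discussed in this thread and the same header layout as the numpy class posted later in the thread: for each line, a 0x00 byte that doubles as the previous line's trailer, followed by the bit-reversed 1-based line address, plus a 2-byte zero tail at the end. `build_line_headers` is a hypothetical helper name.

```python
WIDTH, HEIGHT = 400, 240
LINE_BYTES = WIDTH // 8  # 50 bytes of pixel data per line


def reverse_bit(num):
    """Reverse the bit order of one byte (the panel expects LSB-first addresses)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (num & 1)
        num >>= 1
    return result


def build_line_headers(height):
    """One 2-byte header per line: [previous line's trailer (0x00), bit-reversed line address].

    Line addresses are 1-based, matching reverse_bit(i + 1) in the driver code.
    """
    return [bytes((0x00, reverse_bit(line + 1))) for line in range(height)]


headers = build_line_headers(HEIGHT)
header_bytes = sum(len(h) for h in headers)  # payload of the precomputed headers
packet_len = HEIGHT * (2 + LINE_BYTES) + 2   # headers + pixel data + 2-byte tail
print(header_bytes, packet_len)
```

The 480-byte figure is the payload only; on CPython each small `bytes` object adds its own per-object overhead, so storing the headers in one flat `bytearray` keeps the real cost closer to 480 bytes.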

Given freedom to change any of the code, the best way to implement this would probably be to:

  • Change the display buffer in the SharpMemoryDisplay class from a buffer that only holds the image data to one that holds both the image data and the headers. This would involve changing how that buffer is accessed in the FrameBuffer class, either by changing the functions in MHMSBFormat or by making a new format for this.
  • Change the show function in the SharpMemoryDisplay class to send this buffer all at once.
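One possible shape for such a combined buffer, sketched under the assumption that each line is stored as a 2-byte header followed by its MSB-first pixel bytes, with a 2-byte zero tail at the end (the same layout as the numpy class later in the thread). `SharpPacketBuffer` is a hypothetical class, not part of adafruit_framebuf; a real MHMSB-style format would need the same stride/offset arithmetic in its fill and rect functions as well.

```python
_SHARPMEM_BIT_WRITECMD = 0x80  # in lsb, as in the driver
_SHARPMEM_BIT_VCOM = 0x40      # in lsb, as in the driver


def reverse_bit(num):
    """Reverse the bit order of one byte (line addresses are sent LSB-first)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (num & 1)
        num >>= 1
    return result


class SharpPacketBuffer:
    """Hypothetical format: per-line headers and pixel data in one bytearray."""

    HEADER = 2  # previous line's 0x00 trailer + this line's address

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.stride = self.HEADER + width // 8  # bytes per line, header included
        # The extra 2 zero bytes are the last line's trailer plus the final 0x00.
        self.buf = bytearray(self.stride * height + 2)
        for line in range(height):
            self.buf[line * self.stride + 1] = reverse_bit(line + 1)

    def _offset(self, x, y):
        # Skip whole lines (headers included), then this line's header, then x // 8.
        return y * self.stride + self.HEADER + x // 8

    def pixel(self, x, y, color=None):
        """Get (color=None) or set one pixel, MSB-first within each byte."""
        index = self._offset(x, y)
        mask = 0x80 >> (x % 8)
        if color is None:
            return 1 if self.buf[index] & mask else 0
        if color:
            self.buf[index] |= mask
        else:
            self.buf[index] &= ~mask & 0xFF
        return None

    def packet(self, vcom):
        """Write the command byte over line 0's leading 0x00 and return the whole buffer."""
        self.buf[0] = _SHARPMEM_BIT_WRITECMD | (_SHARPMEM_BIT_VCOM if vcom else 0)
        return self.buf


fb = SharpPacketBuffer(16, 2)
fb.pixel(0, 0, 1)
fb.pixel(9, 1, 1)
packet = fb.packet(vcom=True)
# spi.write(packet)  # one transaction; SPI locking and SCS handling as in the existing driver
```

With this layout, drawing never changes the buffer length, and show() reduces to toggling VCOM and issuing a single write of `packet`.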

Does this make sense? I worry that this is too big of a change and it would affect other devices too much to be allowable.

@ladyada
Member

ladyada commented Feb 19, 2022

numpy isn't available for CircuitPython microcontroller boards, so ideally the library would not require it. However, you can write the library to support it as long as it fails nicely. 480 bytes extra is fine.
you can see how we do it for RGB TFT displays here
https://github.com/adafruit/Adafruit_CircuitPython_RGB_Display/blob/main/adafruit_rgb_display/rgb.py
:)
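A minimal sketch of an optional-numpy guard, assuming rows of 0/1 pixels whose width is a multiple of 8: numpy is tried once at import time, and a pure-Python loop (the slow path from the original report) is used when it is missing, so the library still works on microcontroller boards. `pack_rows` is a hypothetical helper; the linked rgb.py appears to use a similar `try`/`except ImportError` guard.

```python
try:
    import numpy as np  # available on Linux/Blinka, not on CircuitPython microcontrollers
except ImportError:
    np = None


def pack_rows(rows, width):
    """Pack rows of 0/1 pixels MSB-first into bytes, using numpy when it is available."""
    if np is not None:
        # np.packbits packs 8 pixels per byte, most significant bit first by default.
        return bytearray(np.packbits(np.asarray(rows, dtype=bool), axis=1).tobytes())
    # Fallback: one iteration per pixel, slower but works anywhere.
    line_bytes = width // 8
    out = bytearray(len(rows) * line_bytes)
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            if value:
                out[y * line_bytes + x // 8] |= 0x80 >> (x % 8)
    return out


print(pack_rows([[1, 1, 1, 1, 0, 0, 0, 0]], 8))
```

Both branches produce the same bytes, so callers do not need to know which one ran.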

@dhalbert
Contributor

You could look at using ulab, which is like a mini-numpy, and has a very similar API.
https://circuitpython.readthedocs.io/en/latest/shared-bindings/ulab/index.html
You may be able to use the same code just by altering the imports.

@ilikecake
Contributor Author

Thanks for the responses. It looks like ulab does not have the functions I use that do the magic conversion:

I can look at modifying the code to check for numpy and fall back to the old functions if numpy is not present. Can you give any pointers on the right way to do this? Looking at the code, it looks like I would need to:

  1. Modify adafruit_framebuf.py to add in a new format similar to MHMSB, but using numpy array functions. This would involve adding a check to the framebuffer.py file for numpy.

  2. Modify adafruit_sharpmemorydisplay.py to check for numpy and switch to the different framebuffer format and functions if it is present.

This is doable, but seems like a rather large change to your code, and I don't know if that would be okay.

For reference, below is a modified sharpdisplay class. I started out with your code and modified it to work with numpy the way I outlined in the previous thread. This class is not dependent on the FrameBuffer class, and only displays PIL images. This code should illustrate what I am trying to do, but it is not really compatible with your libraries as-is.

Code is in here
import numpy as np
from micropython import const
#TODO: I probably need more includes here for standalone operation (SPI bus stuff, etc...)
#TODO: Combine the spi send stuff into a different function?

_SHARPMEM_BIT_WRITECMD = const(0x80)  # in lsb
_SHARPMEM_BIT_VCOM = const(0x40)  # in lsb
_SHARPMEM_BIT_CLEAR = const(0x20)  # in lsb

def reverse_bit(num):
    """Turn an LSB byte to an MSB byte, and vice versa. Used for SPI as
    it is LSB for the SHARP, but 99% of SPI implementations are MSB only!"""
    result = 0
    for _ in range(8):
        result <<= 1
        result += num & 1
        num >>= 1
    return result

class SharpDisplay:
    """A driver for sharp memory displays, you can use any size but the
    full display must be buffered in memory!"""

    def __init__(self, spi, scs_pin, width, height, *, baudrate=2000000):
        self._scs_pin = scs_pin
        scs_pin.switch_to_output(value=True)
        self._baudrate = baudrate
        self._displaywidth = width
        self._displayheight = height
        # The SCS pin is active HIGH so we can't use bus_device. exciting!
        self._spi = spi

        #Numpy array for the frame data. TODO: Not sure if there is a reason to preallocate this...
        self.buffer = np.zeros((height, (width // 8)), dtype=np.uint8)
        
        #Precompute headers and tail arrays
        self._HeaderVals = np.zeros((self._displayheight, 2), dtype=np.uint8)
        for i in range (0,self._displayheight):
            self._HeaderVals[i,1] = reverse_bit(i+1)

        self._TailVals = np.array([0, 0], dtype=np.uint8)

        # Set the vcom bit to a defined state
        self._vcom = True

    def ShowFrame(self, img):
        """write out the frame buffer via SPI, we use MSB SPI only so some
        bit-swapping is required. The display also uses inverted CS for some
        reason so we can't use bus_device"""
    
        #Import image and convert
        self.buffer = np.packbits(np.asarray(img), axis=1)
        DisplayBuffer = np.append(np.hstack((self._HeaderVals, self.buffer)),self._TailVals).tolist()

        # CS pin is inverted so we have to do this all by hand
        while not self._spi.try_lock():
            pass
        self._spi.configure(baudrate=self._baudrate)
        self._scs_pin.value = True

        # toggle the VCOM bit
        CommandByte = _SHARPMEM_BIT_WRITECMD
        if self._vcom:
            CommandByte |= _SHARPMEM_BIT_VCOM
        self._vcom = not self._vcom
        DisplayBuffer[0] = CommandByte
        self._spi.write(DisplayBuffer)

        self._scs_pin.value = False
        self._spi.unlock()
        
    def UpdateDisplay(self):
        """This function should be called approx once per second whenever the display is not updated
        This function switches the polarity of the display. This is supposed to happen about once a second"""   
        CommandByte = bytearray(1)
        
        # CS pin is inverted so we have to do this all by hand
        while not self._spi.try_lock():
            pass
        self._spi.configure(baudrate=self._baudrate)
        self._scs_pin.value = True

        # toggle the VCOM bit
        CommandByte[0] = 0x00
        if self._vcom:
            CommandByte[0] |= _SHARPMEM_BIT_VCOM
        self._vcom = not self._vcom
        self._spi.write(CommandByte)

        self._scs_pin.value = False
        self._spi.unlock()
        
    def BlankDisplay(self):
        """Call to clear the memory and write all white to the display"""
        CommandByte = bytearray(1)        
        # CS pin is inverted so we have to do this all by hand
        while not self._spi.try_lock():
            pass
        self._spi.configure(baudrate=self._baudrate)
        self._scs_pin.value = True

        # toggle the VCOM bit
        CommandByte[0] = _SHARPMEM_BIT_CLEAR
        if self._vcom:
            CommandByte[0] |= _SHARPMEM_BIT_VCOM
        self._vcom = not self._vcom
        self._spi.write(CommandByte)

        self._scs_pin.value = False
        self._spi.unlock()
        

@ladyada
Member

ladyada commented Feb 20, 2022

can you shove the numpy part just into the PIL/image generation part? if PIL is there, you can assume numpy is too (unless I'm not understanding something)

@ilikecake
Contributor Author

Sorry for the slow response, your reply sent me down a bit of a rabbit hole. There are two interrelated issues here:

  1. The conversion between the PIL image and the framebuffer is very slow.
  2. The multiple SPI transactions to write the framebuffer to the screen are slow.

Both of these issues involve manipulation of (relatively) large arrays, so I initially used numpy for both. However, issue 1 involves iterating over a matrix of all the pixels in the display (240*400=96000), while issue 2 only involves iterating over the height of the display (240). I figured I would try separating the two solutions and see how fast the screen can update with no changes to the framebuffer format.

TL;DR: I can get a frame time of 0.1 s (~10 fps) using numpy for frame generation and native Python list manipulation for array formatting. However, I am unsure of the memory usage implications of this.

For all three methods below, I generate a frame in PIL and convert it to a bytearray using numpy. In the code snippets below, DisplayBuffer starts off as the standard framebuffer format, and ends as the final packet to be sent to the display.

  • Method 1: calculate and append headers/tails on the fly.
    • Frame time: .1066s (9.4fps)
Code
        #Convert a PIL image to the adafruit framebuffer format
        DisplayBuffer = np.packbits(np.asarray(img), axis=1).flatten().tolist()
        
        HeaderBytes = bytearray([0x00,0x00])
        addressToSend = self._displayheight+1
        for x in range(((self._displaywidth // 8) * self._displayheight), 0, -1*(self._displaywidth // 8)):
            HeaderBytes[1] = reverse_bit(addressToSend)
            DisplayBuffer[x:x] = HeaderBytes
            addressToSend = addressToSend - 1
        
        HeaderBytes[1] = reverse_bit(addressToSend)
        DisplayBuffer[0:0] = HeaderBytes
  • Method 2: Precompute the headers/tails as a list of bytearrays.
    • Frame time: .09467s (10.5fps)
Code
        #Convert a PIL image to the adafruit framebuffer format
        DisplayBuffer = np.packbits(np.asarray(img), axis=1).flatten().tolist()
        
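        # self._HeaderVals_ada: precomputed list of 2-byte headers, one per insertion point (definition not shown)
        pc_index = len(self._HeaderVals_ada)-1  #start at the top of the list, count down.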
        for x in range(((self._displaywidth // 8) * self._displayheight), 0, -1*(self._displaywidth // 8)):
            DisplayBuffer[x:x] = self._HeaderVals_ada[pc_index]
            pc_index = pc_index - 1
        
        DisplayBuffer[0:0] = self._HeaderVals_ada[pc_index]
  • Method 3: Use numpy for both frame conversion and buffer formatting. This method is not compatible with your framebuffer format.
    • Frame time: .09516s (10.5fps)
Code
        #Import image and convert
        self.buffer = np.packbits(np.asarray(img), axis=1)
        DisplayBuffer = np.append(np.hstack((self._HeaderVals, self.buffer)),self._TailVals).tolist()   #Note: append also flattens the matrix

To my surprise, numpy does not really seem to give any benefits for the array formatting. Given that, I could modify only the sharpmemorydisplay.py file to address both of these issues.

However, I am wary of the implications on memory usage. Methods 1 and 2 above both insert the headers into the framebuffer before sending it. Since the framebuffer persists between frames, this probably means that the length of the framebuffer will grow by ~480 bytes every time a frame is sent. Is there a way to reset the framebuffer length, or at least bound it somehow?
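In Methods 1 and 2 as written, `DisplayBuffer` is rebuilt by `.tolist()` on every frame, so the inserted headers should not accumulate; growth would only happen if the inserts were applied to the persistent framebuffer object itself. One way to bound memory in either case is to allocate the full packet once and copy each line into it with equal-length slice assignment, which never changes its length. A sketch, assuming an unmodified MHMSB framebuffer of `height * width // 8` bytes; `PacketBuilder` is a hypothetical name.

```python
def reverse_bit(num):
    """Reverse the bit order of one byte (line addresses are sent LSB-first)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (num & 1)
        num >>= 1
    return result


class PacketBuilder:
    """Builds the SPI packet in one preallocated bytearray whose length never changes."""

    def __init__(self, width, height):
        self.line_bytes = width // 8
        self.height = height
        self.stride = 2 + self.line_bytes
        # Headers and the 2-byte zero tail are written once; only pixel bytes change per frame.
        self.packet = bytearray(self.stride * height + 2)
        for line in range(height):
            self.packet[line * self.stride + 1] = reverse_bit(line + 1)

    def fill(self, framebuffer, command):
        """Copy an unmodified framebuffer line by line into the packet and set the command byte."""
        fb = memoryview(framebuffer)
        n = self.line_bytes
        for line in range(self.height):
            src = line * n
            dst = line * self.stride + 2
            # Equal-length slice assignment: overwrites bytes in place, no resizing.
            self.packet[dst:dst + n] = fb[src:src + n]
        self.packet[0] = command  # overwrites line 0's leading 0x00
        return self.packet


builder = PacketBuilder(16, 2)
framebuffer = bytearray(b"\x80\x00\x00\x40")  # 2 lines x 2 bytes, MHMSB
first = bytes(builder.fill(framebuffer, 0xC0))
builder.fill(framebuffer, 0x80)  # next frame reuses the same bytearray
```

The trade-off is one extra copy of the pixel data (12,482 bytes for the 400×240 panel) held permanently, in exchange for a fixed memory footprint and a single SPI write per frame.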
