Pico External RAM

The document discusses the challenges and solutions for using 8MB of external QSPI RAM with the RP2040 microcontroller in a memory-mapped manner, despite its limitations in write access. It outlines a method to boot from flash, load applications into RAM, and handle write operations through a custom HardFault handler that emulates write instructions. Additionally, it addresses performance considerations, memory protection, and potential multi-CPU support for the ROMRAM setup.

Uploaded by

Ricardo Casson

Taken from: https://dmitry.gr/index.php?r=06.%20Thoughts&proj=10.%20RomRam

Using QSPI RAM with RP2040's SSI in read-write mode

Table of Contents

The problem

Towards a solution

Read-only RAM is not much use

Let the nasty hacks begin

The horror

Emulators all the way down

Polishing it to perfection

There are always hardware bugs

Memory protection

Multi-CPU

Performance

Download


The problem

Can you use 8MB of external RAM with RP2040, memory mapped, like real memory? I call this ROMRAM.

RP2040 is a rather versatile chip. One of its most convenient features is support for flash XIP via SSI. SSI is
quite configurable and can support all sorts of flash chips. It is, of course, not entirely bug-free (try to
configure it for SPI commands and QPI addresses, for example, and see how that goes), but a large memory
with a fast cache is super nice. There is only one issue: RP2040's XIP mode only supports read and
execute accesses, not writes. This makes sense given its purpose and what it was designed for, but who
cares about that? COULD we attach a RAM to it? Well, actually this is not too hard. QSPI SRAM chips
exist, made by ISSI, APMEMORY, and (my favourite) VilsionTech. They talk more or less the same
protocol, and getting SSI to talk to them is trivial. On its own, that is useless: you can indeed manually
issue read and write accesses to the chip, but it is not memory-mapped. Could it be? Sure! Enabling XIP
and configuring it properly will work: the RAM will support read and execute, but not write. This is still
not all that useful.

Towards a solution

First of all, how would you boot without persistent memory? I solved this by having both a flash and a
RAM onboard. RP2040 only has a single nCS pin for SSI and only a single memory mapped address range,
so we'll not be able to use them both. The idea is to boot from flash, copy flash to the start of RAM, and
continue running from RAM. How do we make all of this work? It does not take much: two OR gates and
a NOT gate will do. In my design I used a tiny SMD dual-OR gate IC and a tiny SMD NAND gate as an
inverter. We'll also need two resistors. RP2040's SSI nCS output is pulled up and feeds one input of
each OR gate. A GPIO pin called RAM/nROM, pulled down by default, is the other
part of the equation. It goes to the second input of one OR gate (gate A) and to the inverter. The output
of the inverter goes to the second input of the other OR gate (gate B). Gate A's output goes to the flash
chip's nCS input, and gate B's output goes to the RAM's nCS.
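
The gate wiring above boils down to a small truth table. Here is a minimal sketch of it in C, purely as a model of the logic (the struct and function names are made up for illustration, and nothing here touches hardware):

```c
#include <stdbool.h>

/* Outputs of the two OR gates. Both chip selects are active-low. */
struct chip_selects {
    bool flash_ncs; /* gate A output: low selects the flash */
    bool ram_ncs;   /* gate B output: low selects the RAM */
};

/* ssi_ncs:  the RP2040 SSI chip-select output (active-low).
 * ram_nrom: the steering GPIO (low = flash, high = RAM). */
static struct chip_selects route_chip_select(bool ssi_ncs, bool ram_nrom)
{
    struct chip_selects out;
    out.flash_ncs = ssi_ncs || ram_nrom;   /* gate A: nCS OR RAM/nROM */
    out.ram_ncs   = ssi_ncs || !ram_nrom;  /* gate B: nCS OR NOT(RAM/nROM) */
    return out;
}
```

With RAM/nROM low, gate A passes nCS through to the flash and gate B holds the RAM deselected. With RAM/nROM high, the roles swap. When the SSI is idle (nCS high), neither chip is selected.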

What does this accomplish? When we boot, the GPIO is floating and the pulldown provides a logic low.
This means that RP2040's SSI accesses flash (via gate A), and we can boot. The first stage loader can load
a larger second stage loader to internal RAM. That loader can copy the entire application (in my case
2MB) from flash to RAM using almost all the internal memory as temporary space (in my case 256KB). It
can toggle the RAM/nROM pin and reconfigure SSI as needed to access flash and RAM. Then XIP can be
enabled and, with proper SSI config, the RAM/nROM pin can be left in the high state, causing all accesses
to go to RAM from now on.

This will almost work. If you actually try this, you'll find a fun bug. If you attempt to reset the RP2040
using its RUN pin, you'll note that the manual is wrong, and the GPIO module does NOT get reset, the pin
does NOT go back to floating, and you are still accessing RAM and not flash. Oopsie... Not sure how this
was not noticed. In my case this was not a problem since when I ran out of pins, I moved RAM/nROM to
an i2c io expander, and its nRST input does work. If you plan to use this without an io expander, keep this
annoyance in mind.

Read-only RAM is not much use

OK, so our RP2040 now has a memory-mapped RAM. This is quite useless since we cannot write to it
directly. Oh, sure, we can issue SSI commands, but this is (1) annoying, (2) boring, and (3) will not allow
unmodified software that needs a few megabytes of RAM to run. How do we make this better? With
nasty hacks, of course! The RP2040 has a few features (and misfeatures) that we can glue together to
improve the situation. The XIP cache allows us to flush lines in it, which will be important since the cache
has no idea that the backing store is writeable and can change. There is also an MPU which we can
[ab]use.

Let the nasty hacks begin

By default, a write anywhere to 0x10xxxxxx (normal cached access to XIP) will be treated as a command
to flush a cache line. That means that any write attempt in normal code will be silently ignored. No fun!
Let's use the MPU to write-protect the region. Now a write attempt will trigger a HardFault. Ok, that's
better! Our HardFault handler can now ... quickly interpret the faulting instruction, emulate the write,
flush the cache line, and resume. This sounds easy ... NOT.
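
To make the write-protect step concrete, here is a rough sketch of how the region attribute word for an ARMv6-M MPU could be computed. The bit positions (ENABLE at bit 0, SIZE at bits 5:1, AP at bits 26:24, region size of 2^(SIZE+1) bytes) are the architectural MPU_RASR layout. The function names and the choice of an 8MB region are assumptions for illustration, not taken from the downloadable code.

```c
#include <stdint.h>

#define MPU_RASR_ENABLE      (1u << 0)
#define MPU_RASR_SIZE_SHIFT  1
#define MPU_RASR_AP_SHIFT    24
#define MPU_AP_READ_ONLY     6u  /* AP=0b110: read-only, privileged and unprivileged */

/* Returns the RASR SIZE field for a power-of-two region (region = 2^(SIZE+1)
 * bytes), or 0 if the size is not a power of two or is below the 256-byte
 * minimum. */
static uint32_t mpu_size_field(uint32_t region_bytes)
{
    uint32_t log2 = 0;

    if (region_bytes < 256u || (region_bytes & (region_bytes - 1u)) != 0u)
        return 0;
    while ((1u << log2) != region_bytes)
        log2++;
    return log2 - 1u;
}

/* Builds a RASR value for an enabled, read-only region of the given size.
 * Returns 0 (region disabled) for an invalid size. */
static uint32_t mpu_rasr_read_only(uint32_t region_bytes)
{
    uint32_t size = mpu_size_field(region_bytes);

    if (size == 0)
        return 0;
    return (MPU_AP_READ_ONLY << MPU_RASR_AP_SHIFT) |
           (size << MPU_RASR_SIZE_SHIFT) |
           MPU_RASR_ENABLE;
}
```

On real hardware this value would be written to MPU_RASR after selecting a region and its base address (0x10000000 here), and the MPU would then be enabled. ARMv6-M has no separate memory-management fault, so a write that violates the region lands in HardFault, which is exactly what the handler needs.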

The horror

Let's consider the concept. Clearly, this HardFault handler cannot itself live in XIP memory, since we do
not want the XIP cache attempting a read while we're trying to issue a write. There will also be some
other limits. We can only emulate accesses we can understand. What other kinds are there? There are
two more sources of writes in the system besides code. One is DMA. The answer here is simple: we're
targeting running unmodified code from elsewhere. Such code would not be relying on RP2040's DMA,
so no issue here. And if you use DMA, be careful to not attempt to DMA to our ROMRAM (reads are OK).
The second source of writes we cannot understand and emulate is the Cortex-M0 CPU itself. The CPU will
push 8 words to the current stack on any interrupt or fault. If the current stack lives in our ROMRAM,
these writes will fail (caught by the MPU) and we'll have lost the info we need to resume the current
code. The answer is, more or less, the same as before. Most likely "existing code" does not directly
manipulate the stack pointer, so this should be avoidable. If you are writing new code and relying on
ROMRAM, keep your stack in an internal memory of some sort. Easy.
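
For reference, the eight words pushed on exception entry have a fixed layout on ARMv6-M, and the handler finds the faulting instruction through the stacked PC. The sketch below shows that layout plus a hypothetical helper that checks whether a stack pointer would put the frame inside the XIP window. The helper, its name, and the 16MB window size are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words an ARMv6-M core pushes on exception entry, lowest address
 * first. */
struct exception_frame {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;   /* for a write fault: address of the faulting instruction */
    uint32_t xpsr;
};

_Static_assert(sizeof(struct exception_frame) == 32, "frame is 8 words");
_Static_assert(offsetof(struct exception_frame, pc) == 24, "pc is the 7th word");

/* Assumed window: the cached XIP alias, 16MB starting at 0x10000000. */
#define XIP_BASE_ADDR 0x10000000u
#define XIP_WINDOW    0x01000000u

/* True if a frame pushed below 'sp' would touch the XIP window, i.e. the
 * stack lives somewhere the hardware push would fault and cannot be
 * emulated. The frame occupies [sp - 32, sp). */
static bool stack_frame_in_xip(uint32_t sp)
{
    uint32_t first = sp - (uint32_t)sizeof(struct exception_frame);
    uint32_t last  = sp - 1u;

    /* Unsigned subtraction wraps, so addresses below the base fail the test. */
    return (first - XIP_BASE_ADDR) < XIP_WINDOW ||
           (last - XIP_BASE_ADDR) < XIP_WINDOW;
}
```

A check like this is only useful as a debug aid in new code, for example when setting up a thread's stack. By the time a fault is taken with the stack in ROMRAM, it is already too late.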

Emulators all the way down

How easy is it to write a super fast partial ARMv6M emulator that can properly emulate any write
instruction, including complex ones like STMIA? It is actually not too hard, especially if you throw some
RAM at the problem. The simplest way to dispatch on the instruction type is to use the top 7 bits of it.
That implies a table of 128 entries. That is 256 bytes of just jump instructions. This is not too hard to
justify, really. So, as we take the HardFault, and after we assume that PC and xPSR.T are set right
(checking for this would take more cycles), we can read the faulting instruction. Shift it right, add this to
PC, and then come the 128 jump instructions to dispatch based on all the possible 128 cases. Most of
them will go to a "some other fault happened" label since they do not decode to a valid instruction that
could have caused a write. There are 2 variants of each: STRB, STRH, and STR that we need to handle.
Since ARMv6M requires all writes to be properly aligned, we need not worry about any QSPI RAM page-
crossing limitations here. We get the value from the proper register (decoding this using a few more
jumptables), byteswap it (SPI is BE, CPU is LE), and issue the write directly to the SSI hardware.
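
The decode step can be sketched in plain C. The real handler is a jump table in assembly, so the version below swaps that for a switch and is only meant to show which top-7-bit values correspond to which Thumb store encodings. The enum and function names are invented for this example. SP-relative STR and PUSH are deliberately left out, on the assumption stated earlier that the stack stays in internal memory.

```c
#include <stdint.h>

enum store_kind {
    STORE_NONE,       /* not a store we emulate: "some other fault happened" */
    STORE_STR_REG,  STORE_STRH_REG, STORE_STRB_REG,  /* [Rn, Rm] forms */
    STORE_STR_IMM,  STORE_STRB_IMM, STORE_STRH_IMM,  /* [Rn, #imm5] forms */
    STORE_STMIA
};

/* Classifies a 16-bit Thumb instruction by its top 7 bits. */
static enum store_kind classify_store(uint16_t insn)
{
    uint32_t top7 = (uint32_t)insn >> 9;

    if (top7 == 0x28) return STORE_STR_REG;   /* 0101 000 */
    if (top7 == 0x29) return STORE_STRH_REG;  /* 0101 001 */
    if (top7 == 0x2A) return STORE_STRB_REG;  /* 0101 010 */

    switch (top7 >> 2) {                      /* top 5 bits */
    case 0x0C: return STORE_STR_IMM;          /* 01100 */
    case 0x0E: return STORE_STRB_IMM;         /* 01110 */
    case 0x10: return STORE_STRH_IMM;         /* 10000 */
    case 0x18: return STORE_STMIA;            /* 11000 */
    default:   return STORE_NONE;
    }
}

/* Target address of STR Rt, [Rn, #imm5*4], given the low registers r0..r7. */
static uint32_t str_imm_address(uint16_t insn, const uint32_t regs[8])
{
    uint32_t imm5 = ((uint32_t)insn >> 6) & 0x1Fu;
    uint32_t rn   = ((uint32_t)insn >> 3) & 0x7u;

    return regs[rn] + (imm5 << 2);
}

/* Byte swap: the CPU is little-endian, the SPI data goes out big-endian. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For example, `str r1, [r0, #4]` encodes as 0x6041, which classifies as STORE_STR_IMM and resolves to r0 + 4.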

And then there is STMIA... This is a complex beast that can write up to 8 words to RAM at any word-
aligned address. There are three ways I could have handled this. The first is to issue each write as a word
write to QSPI. This will work for all QSPI RAMs. The second is to issue it as one long write. This is the
fastest option, but it will only work on Vilsion RAMs since both ISSI and APMEMORY chips wrap all
accesses to a 1KB-address-window. The third option is to detect crossing a 1KB boundary, and switch
between the above options. This is the most complex option, and the checking itself may cost more
than it is worth. My code uses option two, since I use chips from VilsionTech. With that, emulating STMIA
is just a matter of sending the proper words to write in a row fast enough.
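
For anyone stuck with ISSI or APMEMORY parts, the check behind option three might look something like the sketch below. It is not part of my code. The function names are made up, and the 1KB window is the wrap size mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_WRAP_BYTES 1024u  /* burst wrap window of the ISSI/APMEMORY parts */

/* Number of words an STMIA writes: one per bit set in its 8-bit register list. */
static uint32_t stmia_word_count(uint16_t insn)
{
    uint32_t list = insn & 0xFFu;
    uint32_t n = 0;

    while (list) {
        n += list & 1u;
        list >>= 1;
    }
    return n;
}

/* True if a burst of 'words' words starting at 'addr' spans two 1KB windows,
 * in which case it has to be split into per-word writes. */
static bool burst_crosses_wrap(uint32_t addr, uint32_t words)
{
    uint32_t last;

    if (words == 0)
        return false;
    last = addr + words * 4u - 1u;
    return (addr / RAM_WRAP_BYTES) != (last / RAM_WRAP_BYTES);
}
```

A handler using this would take the single long write when `burst_crosses_wrap` returns false, and fall back to word-by-word writes otherwise.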

Polishing it to perfection

There are always hardware bugs

Fast enough? What!? Yes... RP2040's SSI seems to ignore the programmed "NDF" value for write-only
transactions. Once it has started a write, it will raise nCS anytime the TX FIFO is empty. This means that
you need to fill it just fast enough to keep it busy. This, in turn, means that you should carefully watch
your SSI clock divisor... There was also an issue I found with writing to the SSI FIFO too fast (even when it
is empty) and a NOP was needed. Do not ask... There were more bugs in the SSI. For example,
sometimes, requesting a cache flush would trigger a XIP read. As you can imagine, this completely breaks
things if we're in the middle of issuing a write command. The solution there was to delay all cache
flushing till after the writing is done. This was only an issue for STMIA, of course, since all other writes
are simple already. It might be reasonable to ask whether interrupts could cause any issues to this
requirement of precise timing. The answer is no, since this code runs in HardFault context - interrupts
will wait for it to be finished. This prioritization is important, since it also allows the interrupt handlers to
easily write to ROMRAM.

Memory protection

This brings us to another interesting topic. I mentioned that the MPU is used to catch the writes. But I
did not mention disabling the MPU. One might ask how it is that I flush the proper cache lines without
re-triggering it (since cache flushes are done via writes). The answer is HFNMIENA. This bit in the MPU
control register needs to be left clear. That tells the CPU core to ignore the MPU while running in HardFault and NMI
contexts. Not having to wrangle the MPU for each write saves valuable cycles in the handler, allowing it
to be faster. But what if you do not want the entire ROMRAM region to be writeable? This is supported.
Two global variables exist. One (mRomRamStart) records the address of the first writeable ROMRAM
address, the other (mRomRamLen) records the writeable area length. They may be modified anytime to
adjust the writeable region. In the rePalm project, I use them to split ROMRAM into three regions, for
example. Region A is always below mRomRamStart and is always read-only (the copy of the code we're
running that the second stage loader copied to RAM). Region B is next and is writeable or not based on
an API call to protect it or not (PalmOS is weird). Region C is always writeable. This is pretty easy to do
with the provided knobs.
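
The bounds check itself is simple. A minimal C rendering of the idea follows. The variable names come from the description above, but their types and the function are assumptions, and the real check lives in the assembly handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* First writeable ROMRAM address and the length of the writeable area. */
static uint32_t mRomRamStart;
static uint32_t mRomRamLen;

/* True if a write of 'size' bytes at 'addr' lies entirely inside the
 * writeable area. */
static bool romram_write_allowed(uint32_t addr, uint32_t size)
{
    /* Unsigned subtraction wraps to a huge value when addr < mRomRamStart,
     * so one comparison rejects addresses on both sides of the area. */
    uint32_t offset = addr - mRomRamStart;

    return size <= mRomRamLen && offset <= mRomRamLen - size;
}
```

Moving the boundary between a read-only region and a writeable one is then just a matter of updating the two variables.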

Multi-CPU

What would multi-CPU support look like for ROMRAM? You'd need to simply add use of one of the
hardware mutexes to make sure two cores do not try to write at the same time. I leave this as an
exercise to the reader. The rest of it will work. Just point the HardFault vectors from both CPUs to the
same ROMRAM HardFault handler and you're done. Cool, right?

Performance

Ok, the million dollar question: how fast is it? Well, reads and execute are native speed, since they work
via the usual pathways and cache. Write speeds depend on how writes are done. Each write instruction
is emulated, and thus write instructions that write more produce higher throughput. This is good news
for things like memcpy, since that kind of code usually uses STMIA. To put some hard numbers on it, I see
memcpy to ROMRAM hitting 36Mbit/s at stock clock rates, which is not too terrible. This works well for
cases when memory is more read than written (which is common). We can approximate the actual cost
of a write by looking at the instructions of the handler. The actual math differs based on the registers
used. Let's check on a simple STR(immediate). The exception entry and exit take 12 and 10 cycles
respectively. Exception entry code to handle various entry modes and getting the proper exception frame
pointer takes 6 or 7 cycles. Dispatching based on instruction type takes 9 cycles. Getting the address
calculated takes around 17 cycles. The math to verify that we're within writeable bounds takes 11 cycles
or so. Getting the value to write takes around 10 cycles. Issuing the write command takes 28 cycles. Then
we wait for the SSI to finish. At DIV of 4, it will need 256 cycles to finish issuing the write command. But
we overlap with the first 6 of those in code, so effectively it takes 250 cycles of waiting for us to continue.
Cleanup takes 10 more cycles. So all-in-all, a single word write took us
12+10+6+9+17+11+10+28+250+10 = 363 cycles. Some of this could be cut a little with some creative
work (e.g. by overlapping more of the SSI write and the data-getting). This optimization is also left as an
exercise to the reader.
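
To put the per-word figure in throughput terms, the arithmetic can be sketched as below. The cycle costs are the ones listed above. The 125MHz clock in the usage note is an assumption about what "stock" means, and the helper names are invented for the example.

```c
#include <stdint.h>

/* Per-step cycle costs of one emulated STR (immediate), in the order listed:
 * entry, exit, frame setup, dispatch, address, bounds check, value fetch,
 * issuing the write, waiting on SSI, cleanup. */
static const uint32_t str_cycle_costs[] = {12, 10, 6, 9, 17, 11, 10, 28, 250, 10};

/* Sums a list of cycle costs. */
static uint32_t total_cycles(const uint32_t *costs, unsigned count)
{
    uint32_t sum = 0;
    unsigned i;

    for (i = 0; i < count; i++)
        sum += costs[i];
    return sum;
}

/* Bits per second achieved by back-to-back single-word writes. */
static uint32_t word_write_bits_per_second(uint32_t cpu_hz, uint32_t cycles_per_word)
{
    return (uint32_t)(((uint64_t)cpu_hz * 32u) / cycles_per_word);
}
```

At an assumed 125MHz, 363 cycles per word works out to roughly 11Mbit/s for a stream of single STR instructions. STMIA does better because the fixed entry, exit, and dispatch overhead is shared across up to 8 words.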

Download

The code download for the second stage loader and the HardFault handler is [HERE]. License is BSD 2-clause.
I am too lazy (and disgusted) to turn this into some sort of an Arduino or a MicroPython plugin,
but I am sure someone else will. My provided code will build standalone with no dependency on
anything. Enjoy!
