In an earlier post Correctly Matching Xilinx Native FIFOs to Streaming AXI FIFOs, I mentioned the advantages of interconnecting Xilinx's Native FIFOs to AXI-Streaming FIFOs and some special considerations on how to do it. Now, I would like to show a simple architecture to use the DDR3 memory as a Native FIFO.
The intention of this post is to show a possible implementation for creating a DDR3 Memory Controller which effectively remove from the end user the address handling for read and write operations and simply delivers a FIFO behaviour to the end user. There are many advantages on using a large DDR3 Memory (in the range of 1 to 4 GB) as regular FIFO. RAM Memory blocks on FPGA devices are very limited, therefore it is often required to use external devices for such purpose.
FIFOs are tipically used in FPGA designs as temporal buffers to accomodate temporal waiting states from the different logic blocks. It provides a great oportunity to cross clock domains as well. But in general they are very useful when we would like to buffer detector data which later will be used as part of an algorithm calculation when all calculations have concluded and then it is required to substract the original value from the calculated data several clock cycles after the original value arrived.
In this scenario, the limiting factor comes from the size of the FIFO that could be implemented entirely inside an FPGA. Internal storage is very small (~tens of MB) and therefore the need of designing a simple comtroller for encapsulating DDR3 transactions into input and output FIFO ports.
Using the Xilinx IP integrator tool, the previours design was implemented along with a custom FSM (represented by the DataMover Controller Block in the image) to control the data flow written in Verilog and integrated with the rest of the block design. The Verilog file contains two simple state machines for writing and reading operations of the DDR considering the current status of the input and output FIFOs.
The transfer of data from the input FIFO to the S2MM AXI-S port of the DataMover is managed by the state machine mentioned above. When the input FIFO is empty the FSM remains in the IDLE state. Once some data has arrived it moves to the FILLING FIFO state where it waits until one of two conditions are met. First the data size contained in the input FIFO has to be greater or equal to 4 kB or second the Timer signal has reached its maximum value. If one of those two conditions are met, then the FSM proceeds to the SEND CMD and SEND DATA states. The timer signal allows the system to not remain locked in case that less than 4 kB of data have arrived. The filling level of the FIFO is measured using the "More Accurate Data Counts" feature available in the "First Word Fall Through" FIFO. In this way there is an accurate representation of the data contained inside and therefore the size of the data to transfer can be calculated appropriately.
In the SEND DATA state the FSM issues read requests to the input FIFO and sends the data to the AXI-S interface of the DataMover following the special considerations mentioned in Correctly Matching Xilinx Native FIFOs to Streaming AXI FIFOs. The FSM takes care that the exact size of data declared in the SEND CMD state is read from the FIFO and transfered to the DataMover. Once the data has been transfered in its totality the state machine goes back to the initial IDLE state. It is also worth mentioning that the FIFO can receive more data at the same time it is reading and transferring to the DataMover. The state machine allows an accurate data size management so that no data is repeated or lost during the transmission, accurate data counters also provide a higher level of reliability to the overall transfer.
The state machine will remain in the IDLE state until the writing address pointer is different from the reading address pointer. When the FSM is in the SEND CMD state it sends a read request with the appropriate size, but if the data to be read is larger than 8 MB, it is necessary to issue multiple commands with maximum "bytes to transfer" (bbt) being 8 MB. This limitation comes from the DataMover itself, the command have only 23 bits for the bbt parameter.