Nios II is a RISC instruction set architecture for 32-bit embedded processors (specifically for Altera FPGAs), with 32 general-purpose registers. On the DE1-SoC, Nios II is instantiated in the FPGA fabric as a soft-core processor.

It’s little-endian¹ and byte-addressable: its memory space supports words (32 bits), half-words (16 bits), and bytes, and each byte has its own address.
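To make the byte ordering concrete, here is a small C model (not Nios II code; the function names are hypothetical) of where each byte of the word 0x11223344 lands when it is stored at some address A on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Return the byte a little-endian machine stores at address (A + offset)
   when the 32-bit word is stored at address A.
   Little-endian: the least significant byte goes at the lowest address. */
static uint8_t byte_at(uint32_t word, unsigned offset) {
    assert(offset < 4);                 /* a word covers exactly 4 byte addresses */
    return (uint8_t)(word >> (8u * offset));
}

/* Copy the word into a 4-byte buffer in little-endian order,
   mimicking what a word store does to memory. */
static void store_word_le(uint8_t mem[4], uint32_t word) {
    for (unsigned i = 0; i < 4; i++) {
        mem[i] = byte_at(word, i);
    }
}

/* Usage: after storing 0x11223344, address A holds 0x44 and A + 3 holds 0x11. */
static void run_examples(void) {
    uint8_t mem[4];
    store_word_le(mem, 0x11223344u);
    assert(mem[0] == 0x44 && mem[1] == 0x33);
    assert(mem[2] == 0x22 && mem[3] == 0x11);
}
```

So an ldbu from address A would return 0x44, the least significant byte.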

Instructions

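Instructions have two basic modes. One is immediate mode, where one operand is a 16-bit number encoded in the instruction itself. These instructions are usually suffixed by i (e.g. addi), although the upper-half forms such as movhi and orhi are exceptions. The other is register mode, where all operands are already held in registers.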

Basic register operations:

  • Moving data
    • mov rX, rY moves the data of rY to rX.
    • movi rX, Imm16 to “move immediate” a 16-bit number into rX. There will be a sign extension.
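    • movhi rX, Imm16 to move a 16-bit number into the upper 16 bits of rX. The lower 16 bits are set to 0.
    • movia rX, Imm32, for a 32-bit number (often an address). This is a macro (pseudo-instruction), i.e., it expands to two instructions: a movhi of the upper 16 bits, then an addi of the lower 16 bits. Because addi sign-extends its immediate, the assembler adds 1 to the upper half whenever bit 15 of the lower half is set (the %hiadj operator).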
  • Loading data. Note that this requires the base register to already hold a memory address (via a move instruction).
    • ldw rX, n(rN) loads into rX the word at address rN + n, where n is a signed 16-bit byte offset.
      • This is basically pointer arithmetic and dereferencing.
      • It is typically preceded by movia rN, addr32.
        • We first load a 32-bit address into rN; ldw then loads from the address stored in rN. This follows from the load-store architecture: only loads and stores access memory, and they take their address from a register plus an offset.
        • We can alternatively refer to a word by a label declared with a directive:
          • list: .word 10 declares a word with value 10. Wherever the assembler chooses to place it in memory, the label list refers to that address.
          • Variable positions in memory will change as the program changes! So it makes the most sense to load addresses by label (e.g. movia rN, list) rather than hard-coding them.
      • The w in ldw means a word (4 bytes) is loaded.
    • ldwio rX, n(rN) is the same load but bypasses the data cache. Use it when rN stores the address of a memory-mapped I/O device.
    • ldb, ldbu, ldh, ldhu load narrower data into the 32-bit register.
      • u means unsigned: the value is zero-extended, i.e., the upper bits are filled with 0. Without u, the value is sign-extended.
      • h means half-word (16 bits). ldh fills the upper 16 bits with copies of the sign bit. The address must be half-word aligned, i.e., even (its lowest bit is 0).
      • b means byte (8 bits). ldb fills the upper 24 bits with copies of the sign bit.
      • Each also has an io variant (ldbio, ldbuio, ldhio, ldhuio) for I/O devices.
  • Storing data, same address restriction as above
    • stw rX, n(rN) stores the value in rX at address rN + n.
    • stwio rX, n(rN) is the cache-bypassing variant for I/O devices. Narrower stores (stb, sth) also exist.
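Two subtle points in these instructions, how movia splits a 32-bit constant and how the narrower loads fill the rest of the register, can be modelled in a few lines of C. This is a sketch of the behaviour, not Nios II code: the function names are hypothetical, memory is a plain byte array, and the upper-half adjustment mirrors the assembler's %hiadj operator, which is needed because addi sign-extends its immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit pattern to 32 bits, as addi does with its immediate
   and as ldh does with the loaded half-word. */
static uint32_t sext16(uint16_t v) {
    return (uint32_t)(((int32_t)v ^ 0x8000) - 0x8000);
}

/* Upper half used by movia (like %hiadj): bumped by one when bit 15 is set,
   because the following addi will then add a negative number. */
static uint16_t hiadj(uint32_t x) {
    return (uint16_t)((x >> 16) + ((x >> 15) & 1u));
}

/* movia rX, x  is modelled as  movhi rX, %hiadj(x) ; addi rX, rX, %lo(x) */
static uint32_t movia_model(uint32_t x) {
    uint32_t reg = (uint32_t)hiadj(x) << 16;        /* movhi: lower 16 bits cleared */
    return reg + sext16((uint16_t)(x & 0xFFFFu));   /* addi; wraps modulo 2^32 */
}

/* ldb / ldbu: load one byte, then sign- or zero-extend it. */
static uint32_t ldb_model(const uint8_t *mem, uint32_t addr, int is_unsigned) {
    uint8_t b = mem[addr];
    return is_unsigned ? b : (uint32_t)(((int32_t)b ^ 0x80) - 0x80);
}

/* ldh / ldhu: load a little-endian half-word from an even address. */
static uint32_t ldh_model(const uint8_t *mem, uint32_t addr, int is_unsigned) {
    assert((addr & 1u) == 0);                       /* half-word accesses must be aligned */
    uint16_t h = (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
    return is_unsigned ? h : sext16(h);
}

/* Usage */
static void run_examples(void) {
    assert(hiadj(0x12348000u) == 0x1235);           /* bit 15 set: upper half adjusted */
    assert(movia_model(0x12348000u) == 0x12348000u);
    uint8_t mem[2] = {0x01, 0x80};                  /* half-word 0x8001 at address 0 */
    assert(ldh_model(mem, 0, 0) == 0xFFFF8001u);    /* ldh: sign-extended */
    assert(ldh_model(mem, 0, 1) == 0x00008001u);    /* ldhu: zero-extended */
    assert(ldb_model(mem, 1, 0) == 0xFFFFFF80u);    /* ldb of 0x80 */
    assert(ldb_model(mem, 1, 1) == 0x00000080u);    /* ldbu of 0x80 */
}
```

Without the adjustment, 0x12348000 would come out as 0x12340000 + 0xFFFF8000 = 0x12338000, which is why the upper half cannot simply be copied.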

Arithmetic, logic, and bitwise operations:

  • Arithmetic operations
    • add, addi
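    • sub, subi (subi is a pseudo-instruction: addi with the negated immediate)
    • mul, muli for the lower 32 bits of the product; mulxss and mulxuu give the upper 32 bits for signed and unsigned operands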
    • div, divu, no immediate variant
  • Logic operations
    • and, andi
    • or, ori
    • xor, xori
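    • Note that these operate bit-by-bit. The immediate forms zero-extend their 16-bit operand, so ori and xori leave the upper 16 bits unchanged while andi clears them. The andhi, orhi, and xorhi variants operate on the upper 16 bits instead.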
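  • Bit shifting, where bits shifted out are lost and vacated positions are set to 0 (except for arithmetic right shifts).
    • Logical shift, this is an unsigned shift
      • srl, srli rX, rY, Imm5
      • sll, slli
    • Arithmetic shift, this is a signed shift where the sign bit is duplicated into the vacated positions
      • sra, srai. There is no arithmetic left shift: shifting left is the same for signed and unsigned values, so use sll/slli.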
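  • Bit rotating, where bits shifted out of one end re-enter at the other. Only the lowest 5 bits of the rotation amount matter, since registers are only 32 bits wide.
    • Rotate right: ror. There is no rori; roli by 32 - n gives the same result.
    • Rotate left: rol, roli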
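The shift, rotate, and immediate-logic rules above can be checked with a short C model. This is a sketch, not Nios II code: the function names simply reuse the mnemonics, the amount is masked to its low 5 bits, and ror is modelled as a left rotation by 32 - n.

```c
#include <assert.h>
#include <stdint.h>

/* srl / srli: logical right shift, vacated high bits become 0. */
static uint32_t srl(uint32_t x, unsigned n) { return x >> (n & 31u); }

/* sll / slli: left shift, vacated low bits become 0 (same for signed values). */
static uint32_t sll(uint32_t x, unsigned n) { return x << (n & 31u); }

/* sra / srai: arithmetic right shift, vacated high bits copy the sign bit.
   Built from unsigned operations, since >> on a negative int is
   implementation-defined in C. */
static uint32_t sra(uint32_t x, unsigned n) {
    n &= 31u;
    uint32_t result = x >> n;
    if (x & 0x80000000u) {
        result |= ~(0xFFFFFFFFu >> n);    /* set the top n bits */
    }
    return result;
}

/* rol / roli: bits leaving the top re-enter at the bottom. */
static uint32_t rol(uint32_t x, unsigned n) {
    n &= 31u;                                     /* only the low 5 bits matter */
    return n ? (x << n) | (x >> (32u - n)) : x;   /* avoid shifting by 32 */
}

/* ror: rotating right by n equals rotating left by 32 - n. */
static uint32_t ror(uint32_t x, unsigned n) { return rol(x, 32u - (n & 31u)); }

/* andi / ori: the 16-bit immediate is zero-extended before the operation. */
static uint32_t andi(uint32_t x, uint16_t imm) { return x & (uint32_t)imm; }
static uint32_t ori(uint32_t x, uint16_t imm)  { return x | (uint32_t)imm; }

static void run_examples(void) {
    assert(srl(0x80000000u, 4) == 0x08000000u);
    assert(sra(0x80000000u, 4) == 0xF8000000u);
    assert(sll(0x00000001u, 33) == 0x00000002u);      /* 33 & 31 == 1 */
    assert(rol(0x80000001u, 1) == 0x00000003u);
    assert(ror(0x00000001u, 1) == 0x80000000u);
    assert(andi(0x12345678u, 0xFFFF) == 0x00005678u); /* upper half cleared */
    assert(ori(0x12345678u, 0x000F) == 0x1234567Fu);  /* upper half unchanged */
}
```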

Flow control and functions:

  • Branches
    • br BRANCH jumps unconditionally to BRANCH.
    • For conditional branches, we have the general form bXX rX, rY, BRANCH:
      • beq, for x == y
      • bne, for x != y
      • bge, for a signed comparison x >= y
      • bgeu, for an unsigned comparison x >= y
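      • blt, for a signed comparison x < y (bltu for unsigned)
      • bgt, for x > y. This is a pseudo-instruction: the assembler emits blt with the operands swapped.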
      • and so on.
  • Subroutines:
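    • call NAME calls a subroutine NAME: it saves the return address (the address of the instruction after the call) in ra, then jumps to NAME.
    • ret returns from a subroutine. It copies ra into pc, so execution resumes at the instruction after the call.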
  • Interrupts:
    • eret returns from an interrupt. It copies ea into pc (and restores the status register from estatus).
    • rdctl reads control registers.
    • wrctl writes to control registers.
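Because a register only holds a raw 32-bit pattern, the branch instruction is what decides between a signed and an unsigned reading. The C sketch below models when each branch is taken; the function names reuse the mnemonics and are hypothetical, and bgt is written as blt with swapped operands to mirror the pseudo-instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read a 32-bit pattern as a two's-complement value without relying on
   implementation-defined casts. */
static int64_t as_signed(uint32_t v) {
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Conditions under which each branch is taken (x = first operand, y = second). */
static bool beq(uint32_t x, uint32_t y)  { return x == y; }
static bool bne(uint32_t x, uint32_t y)  { return x != y; }
static bool bge(uint32_t x, uint32_t y)  { return as_signed(x) >= as_signed(y); }
static bool bgeu(uint32_t x, uint32_t y) { return x >= y; }
static bool blt(uint32_t x, uint32_t y)  { return as_signed(x) < as_signed(y); }
static bool bltu(uint32_t x, uint32_t y) { return x < y; }

/* bgt as a pseudo-instruction: blt with x and y swapped. */
static bool bgt(uint32_t x, uint32_t y)  { return blt(y, x); }

static void run_examples(void) {
    uint32_t minus_one = 0xFFFFFFFFu;   /* -1 signed, 4294967295 unsigned */
    assert(!bge(minus_one, 1));         /* signed: -1 >= 1 is false */
    assert(bgeu(minus_one, 1));         /* unsigned: 4294967295 >= 1 is true */
    assert(!bltu(minus_one, 1));        /* unsigned: not less than 1 */
    assert(bgt(1, minus_one));          /* signed: 1 > -1 */
    assert(beq(minus_one, 0xFFFFFFFFu) && bne(minus_one, 0));
}
```

The same register contents can therefore take opposite paths through bge and bgeu, so the choice of branch must match how the data is meant to be interpreted.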

Directives

This is not a full list.

  • .data to specify a section, for variables in memory.
  • .align n, to specify the “memory alignment”. This is used when specifying words of data: the next item is placed at the next available address divisible by 2^n, so .align 2 gives 4-byte (word) alignment. For example:
      .data
      .align 2
      a: .word 0
      b: .word 0x11223344
      c: .word 0x55667788
  • .text, to specify instructions.
  • .global to make a symbol (such as a label) visible to the linker, e.g. .global _start.
  • .section to specify code sections.
  • .word/.byte/.hword specifies the format of the stored data.
    • .word and .hword will be aligned automatically.
    • .byte is not padded, so data declared after bytes may need to be aligned manually with .align. Alternatively, we can pad with the .skip directive.
  • .skip n skips n bytes before assigning the next memory address.
  • .equ SYMBOL, VALUE to set SYMBOL as an alias for a constant, such as the address of a device.
  • .section .exceptions, "ax" specifies code that executes when interrupts (and other exceptions) occur.
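The rounding that .align n performs can be sketched in C as below. This models the address arithmetic only; align_up is a hypothetical name, and the example addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ".align n": advance addr to the next address divisible by 2^n
   (addr itself if it already is). */
static uint32_t align_up(uint32_t addr, unsigned n) {
    assert(n < 32);
    uint32_t mask = (1u << n) - 1u;     /* the low n bits must end up zero */
    return (addr + mask) & ~mask;
}

static void run_examples(void) {
    /* e.g. a .byte placed at 0x1000 leaves the next free address at 0x1001;
       ".align 2" then moves the following .word to 0x1004. */
    assert(align_up(0x1001u, 2) == 0x1004u);
    assert(align_up(0x1004u, 2) == 0x1004u);   /* already aligned: unchanged */
    assert(align_up(0x1001u, 1) == 0x1002u);   /* half-word alignment */
}
```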

Registers

By convention, we reserve registers for subroutines:

  • r2 is the function return value. This is for a single word. If more than one value comes back from the callee, then this information must be put on the stack and popped by the caller.
  • r4 to r7 are for function arguments from the caller to the callee. Any more parameters must be put on the stack.
  • r8 to r15 are the caller saved registers, i.e., the responsibility of the caller. They must be saved on the stack by the caller if it wants to preserve them, i.e., we save, then call a subroutine, then the caller restores them when returned.
  • r16 to r23 are the callee saved registers, i.e., the responsibility of the callee. If it wants to use these registers, their contents must be saved before being changed, and restored before returning.

Some other specialised registers:

  • r0 always holds 0; writes to it are ignored.
  • r1/at is the assembler temporary, which the assembler may use when expanding pseudo-instructions.
  • r24/et is the exception temporary register, reserved for exception handlers.
  • r27/sp is the stack pointer, which points to the word at the top of the stack. The stack grows towards lower addresses.
    • We must initialise sp at the beginning of any program. By convention, this is at 0x20000.
  • r29/ea is the exception return address.
  • r31/ra is the return address register. It stores the address to go back to when a subroutine returns.
    • If the subroutine calls other subroutines, we need to save ra on the stack, since each call overwrites it.
  • pc is the program counter, not one of the 32 registers. It stores the address of the next instruction to be executed, and is normally incremented by 4, since every instruction is one 32-bit word.
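Saving ra and callee-saved registers on the stack can be pictured with the toy C model below. It is a sketch under stated assumptions, not Nios II code: memory is a plain word array, sp starts at 0x20000 as stated above, push and pop are hypothetical names for the subi/stw and ldw/addi pairs, and the register values are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: memory is an array of words, and sp starts at 0x20000.
   The stack grows towards lower addresses. */
#define STACK_TOP 0x20000u
static uint32_t mem[STACK_TOP / 4];   /* word i lives at byte address 4 * i */
static uint32_t sp = STACK_TOP;

/* Push, i.e.  subi sp, sp, 4  then  stw rX, 0(sp) */
static void push(uint32_t value) {
    assert(sp >= 4);                  /* no room left below address 0 */
    sp -= 4;
    mem[sp / 4] = value;
}

/* Pop, i.e.  ldw rX, 0(sp)  then  addi sp, sp, 4 */
static uint32_t pop(void) {
    assert(sp < STACK_TOP);           /* popping an empty stack */
    uint32_t value = mem[sp / 4];
    sp += 4;
    return value;
}

static void run_examples(void) {
    uint32_t ra = 0x00000120u;        /* hypothetical return address */
    uint32_t r16 = 42u;               /* hypothetical callee-saved value */
    push(ra);                         /* a non-leaf subroutine saves ra ... */
    push(r16);                        /* ... and any callee-saved register it uses */
    ra = 0;                           /* overwritten by a nested call */
    r16 = 0;                          /* overwritten by local use */
    r16 = pop();                      /* restore in reverse order */
    ra = pop();
    assert(ra == 0x120u && r16 == 42u && sp == STACK_TOP);
}
```

Restoring in reverse order matters because the stack is last-in, first-out; popping ra first here would hand it the saved r16 value instead.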

Programming

We can download Nios II programs onto FPGA boards using the Quartus Monitor Program. This uses a JTAG interface.

If Quartus reports "Could not query JTAG Instance IDs", this is often because the board has been turned off and on again. Follow these steps:

  • Actions > Download System to load the .sof file onto the board.
  • Actions > Configure HPS
  • Actions > Connect to System
  • Then things should be fine.

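During compilation, it may also throw multiple definition errors, even when you’ve been careful to use include guards. One cause is defining global structs (or any global variables) in header files. Include guards only stop a header from being included twice within the same .c file; every .c file that includes the header still gets its own definition, and the linker rejects the duplicates. Declaring the variable extern in the header and defining it in exactly one .c file avoids this.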

Addendums

Prof Moshovos says the Nios II architecture is “ridiculously close to MIPS and RISC-V”. Nios II has since been replaced by Nios V, a RISC-V based architecture.

Footnotes

  1. “Why? I don’t care.” - Prof Moshovos