
Kernel

Eventually, this chapter will describe the ARC kernel in detail. However, at this time it is still under construction, and is sketchy in many places. Over time it should improve.

Kernel Overview

The ARC kernel is the part of the ARC development system which runs on the embedded processor. The kernel boots the system and provides a user-interaction console, a remote downloading and debugging interface, a multitasker, and libraries which can be shared by user programs. The full system currently runs on Motorola 68332 processors. A partial port (not including a multitasker or boot code) also runs on Intel 386, 486, and Pentium processors.

In creating the ARC kernel, I sought to provide powerful debugging support and multitasking control with a minimum of system overhead in order to facilitate real-time applications. I also sought to allow multiple processes to use shared resources, like the serial port, in a clear and efficient manner.

The kernel performs system services and facilitates the loading and running of programs, which are key features of any operating system. However, unlike most other operating systems, it does not put up any firewalls that isolate user programs from the hardware or the kernel internals. When a user program is compiled, it is linked to an image of the kernel. The user program therefore has direct access to the kernel globals and functions.

This system has the advantages that it is more efficient at runtime and allows more complete code sharing and lower memory usage than the standard way operating systems are implemented. However, it also has some disadvantages. Because a user program has complete access to the operating system, it can more easily corrupt the operating system's state, and the problems of version skew become worse.

The standard way for an operating system to provide services to client programs is for there to be some library code linked into the user program which sets up and performs a trap (a type of interrupt). The trap allows the operating system to take control of the processor in a well defined manner and provides a way for the user code to interact with the operating system without having access to its internals.

However, the overhead for performing a trap is much greater than the overhead for a normal function call. I believe that the added security of using traps is not worth the extra time expense when the goal is to make good real-time systems, as was my goal in designing the ARC kernel.

Also, because the ARC kernel is multitasking and the user interaction and debugging processes are independent of any user processes, even if user processes crash, the supervisory processes are still there to help you debug your crashed process. Without having hardware memory protection, you could not really do better than that (unless you store the interrupt table in ROM) even if the only interaction between the user processes and the operating system were through traps. Without hardware memory protection, there is no way to stop a user from overwriting the interrupt vector table, which could bring the operating system in either method to its knees.

On reset, the kernel sets up the necessary system resources. It initializes the chip selects, globals, and serial port; checks the checksums on the memory map and persistent table; sets up multitasking, the TPU, the memory manager, the console process, and the gdb process; then it checks the memory map for programs which wish to be run on reset. After this, the console, gdb, and user programs are free to be used.

Memory Management

Because ARC is intended to primarily run on embedded controllers, "memory management" refers only to the way the kernel manages the contents of the RAM. Blocks cannot be swapped out to disk because there is no disk. Because of this, the memory management facilities are fine-grained. They are also geared towards allowing programs to persist across reset and power cycling so long as the RAM is not unduly corrupted.

The memory management consists of two modules -- the memory map, which is relatively coarse-grained and persists across resets, and malloc, which is fine-grained and does not persist across resets.

Memory Map Module

The memory map keeps track of the name and type of data contained in the memory. It can manage multiple devices, and keep track of whether each device is RAM or ROM. It is not currently possible for users to add their own devices to the memory map, but I'm working on it. However, the current kernel does automatically detect the size of the RAM and ROM in the system and bounds the memory map appropriately.

When the board is reset, the kernel checks whether the memory map is still valid. It determines this by comparing the stored checksum against a freshly computed one. If the stored memory map is valid, it uses that. If it is not valid, the kernel creates a new memory map with default sizing for the system tables and heap. It also searches through ROM and RAM to find programs which are still there and valid, and adds those to the map.
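The validity check can be sketched in C as follows. This is only an illustration: the structure layout and the names `map_image`, `byte_sum`, `map_seal`, and `map_is_valid` are hypothetical, and the sketch assumes the memory map uses the same add-every-byte-together checksum that the checksum console command is described as using, which this chapter does not confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a stored checksum followed by the table contents.
   The real ARC memory map format is not shown in this chapter. */
struct map_image {
    uint32_t stored_sum;   /* checksum saved when the map was last updated */
    uint8_t  table[64];    /* stand-in for the memory map entries */
};

/* Add every byte together (assumed checksum function). */
uint32_t byte_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Record the checksum after the table has been modified. */
void map_seal(struct map_image *m)
{
    m->stored_sum = byte_sum(m->table, sizeof m->table);
}

/* Returns 1 if the stored checksum matches a freshly computed one. */
int map_is_valid(const struct map_image *m)
{
    return m->stored_sum == byte_sum(m->table, sizeof m->table);
}
```

A kernel following this pattern would call something like `map_seal()` whenever the map changes and `map_is_valid()` on reset. Note that a purely additive checksum catches most random corruption but cannot detect bytes that have merely been reordered.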

Because the kernel and the user programs will be potentially sharing a small amount of memory, blocks may be specified to start at any address and may be of any size. Use the memmap console command to view the memory map.

A typical memory map looks like this:

Start   End     Size    Type         Name          Device
000000  012393  012394  Code         vestaboot.919 ROM
012394  0125c3  000230  Data         Kernel inits  ROM
100000  1003ff  000400  Vector Table Vector Table  RAM
100400  100dff  000a00  Memory Map   Memory Map    RAM
100e00  102603  001804  Data         Persistents   RAM
102f00  105a9f  002ba0  Data         Kernel data   RAM
105aa0  105e9f  000400  Data         GDB Block     RAM
106000  1062ef  0002f0  User Code    mobot-vision  RAM
1310a8  13f7ff  00e758  Heap         Heap          RAM
13f800  13ffff  000800  Stacks       Root stack    RAM

Blocks which are not being used for programs, data, or system tables are used by malloc. Therefore, if you want to use a block of data, you should first reserve it by using the memblock console command (see section Viewing memory usage).

Malloc Module

The malloc module manages blocks from the memory map, and partitions them up when malloc(), realloc(), or free() are called.

The "Heap" block in the memory map is always reserved for use by the malloc module and you are prevented from downloading a program in the block reserved for it. However, to achieve maximum usage of a potentially small amount of memory, the malloc module incorporates unused blocks in the memory map for temporary use. Temporary use in this case refers to memory which does not persist across downloads.

The output of the `malloc' console command (done during the same session as the memory map in the previous section) shows the temporary heaps:

Heap 0x1062f0, size 175544 (0x2adb8)
  0x1062f8:     Free  real size=175536
Total free = 175536, Total used = 0
Heap 0x102604, size 2300 (0x8fc)
  0x10260c:     Free  real size= 2292
Total free = 2292, Total used = 0
Heap 0x1310a8, size 59224 (0xe758)
  0x1310b0:  Alloced  real size=   38   PID=0
  0x1310d6:  Alloced  real size=  270   PID=0
  0x1311e4:  Alloced  real size=   38   PID=0
  0x13120a:  Alloced  real size=  270   PID=0
  0x131318:  Alloced  real size=   42   PID=0
  0x131342:  Alloced  real size=   28   PID=0
  0x13135e:  Alloced  real size= 2062   PID=0
  0x131b6c:  Alloced  real size=   38   PID=0
  0x131b92:  Alloced  real size=  270   PID=0
  0x131ca0:  Alloced  real size=   42   PID=0
  0x131cca:  Alloced  real size=   30   PID=0
  0x131ce8:  Alloced  real size= 1038   PID=0
  0x1320f6:  Alloced  real size=   38   PID=33
  0x13211c:  Alloced  real size=   54   PID=33
  0x132152:  Alloced  real size=   98   PID=33
  0x1321b4:     Free  real size=   98
  0x132216:     Free  real size=54762
Total free = 54860, Total used = 4356
-----------------------------------------------------
Total free = 232688, Total used = 4356

The first two heaps are temporary and were allocated in holes in the memory map. The last heap, and the only one which has been used in this example, is the "Heap" block. In this example, the board has a total of 256K of memory, of which less than 30K is being used by system overhead and a small user program. Malloc is managing 230K of free memory, most of which has been recovered from blanks in the memory map.
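To make the relationship between the two listings concrete, the following C sketch computes the gaps between the RAM blocks of the example memory map. The block addresses are copied from the table above; the types and the `find_holes` function are hypothetical and are not the kernel's actual code.

```c
#include <stddef.h>
#include <stdint.h>

struct block { uint32_t start, end; };   /* end is inclusive, as in memmap */

/* RAM rows of the example memory map, in address order. */
const struct block ram_blocks[] = {
    {0x100000, 0x1003ff},  /* Vector Table */
    {0x100400, 0x100dff},  /* Memory Map   */
    {0x100e00, 0x102603},  /* Persistents  */
    {0x102f00, 0x105a9f},  /* Kernel data  */
    {0x105aa0, 0x105e9f},  /* GDB Block    */
    {0x106000, 0x1062ef},  /* mobot-vision */
    {0x1310a8, 0x13f7ff},  /* Heap         */
    {0x13f800, 0x13ffff},  /* Root stack   */
};

struct hole { uint32_t start, size; };

/* Write up to max holes into out; return the number written.
   Assumes the blocks are sorted by address and do not overlap. */
size_t find_holes(const struct block *b, size_t n,
                  struct hole *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i + 1 < n && found < max; i++) {
        uint32_t gap_start = b[i].end + 1;
        if (b[i + 1].start > gap_start) {
            out[found].start = gap_start;
            out[found].size  = b[i + 1].start - gap_start;
            found++;
        }
    }
    return found;
}
```

Run over the table, this finds a 2300-byte gap at 0x102604 and a 175544-byte gap at 0x1062f0, which match the two temporary heaps in the malloc output. It also finds a 352-byte gap at 0x105ea0 that the malloc output does not list; the reason is not stated here, so treat any explanation (such as a minimum heap size) as a guess.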

These optimizations have the following repercussions:

Polling vs. Interrupts

Responding to asynchronous events is one of the most basic and necessary functions of a computer system. There are two fundamental approaches to dealing with asynchronous events: polling and interrupts.

In a polling scheme, you typically have a flag which is set by an asynchronous event and polled by the processor. The processor periodically checks the status of the flag. When the processor sees that the flag is set, it takes the appropriate actions (reads the data from a latch, writes data to a latch, turns on a motor, etc.) and clears the status flag. Polling works, but can take a lot of processor time if you want to respond to an event quickly.
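A minimal polling sketch in C, written so it can be run on a host machine. The flag and latch here are ordinary variables standing in for hardware registers, and all names (`event_flag`, `data_latch`, `simulate_event`, `poll_once`) are hypothetical.

```c
#include <stdint.h>

/* Stand-ins for a hardware status flag and data latch (hypothetical).
   volatile tells the compiler the values may change behind its back. */
volatile int     event_flag = 0;
volatile uint8_t data_latch = 0;

int     events_handled = 0;
uint8_t last_value     = 0;

/* What the asynchronous source would do: latch data and raise the flag. */
void simulate_event(uint8_t value)
{
    data_latch = value;
    event_flag = 1;
}

/* One polling pass: returns 1 if an event was serviced, 0 otherwise. */
int poll_once(void)
{
    if (!event_flag)
        return 0;
    last_value = data_latch;   /* take the appropriate action */
    event_flag = 0;            /* clear the status flag */
    events_handled++;
    return 1;
}
```

The main loop of a polled program would call `poll_once()` between other pieces of work. The worst-case response time is the longest interval between two such calls, which is why fast response costs so much processor time.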

Interrupts take advantage of special hardware in a processor which allows normal processing to be preempted by asynchronous events. Using interrupts to process asynchronous events has the advantage that the processor only spends time servicing the event when it actually happens. Therefore, for the same desired response time, the CPU overhead for interrupt servicing can be much lower.

The disadvantages of using interrupts are that there is more setup involved and it is typically harder to debug interrupt routines. Therefore, it is often a good idea to start out with a polling scheme and, when you feel confident that the servicing routines work, install them on interrupts.

Typically there will be a range of possible interrupt sources, each of which is assigned a number. An interrupt vector table, located in memory, stores the address of a handler for each possible interrupt. When an interrupt occurs, the processor saves some of the current processing state (which typically includes the status flags, program counter, and interrupt source, and may include other registers or data), looks up the address of the correct interrupt handler, and resumes execution in the handler.

The handler must save any registers it will use before it starts. The handler may also need to perform some action to clear the interrupt request (this is usually the case with hardware interrupts). When the handler is done, it restores the registers to their original condition and performs a return-from-interrupt instruction to return to the processing that was preempted.
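The lookup step can be modeled in plain C with an array of function pointers. This is a host-side illustration only: on real hardware the processor performs the lookup and state saving itself, and the handler ends with a return-from-interrupt instruction rather than an ordinary return. The table size and all names below are hypothetical.

```c
#include <stddef.h>

#define NUM_VECTORS 8          /* illustrative; real tables are larger */

typedef void (*handler_fn)(void);

handler_fn vector_table[NUM_VECTORS];
int spurious_count = 0;
int timer_count    = 0;

/* Catch-all for interrupt sources nobody has claimed. */
void default_handler(void) { spurious_count++; }

/* Example handler for one interrupt source. */
void timer_handler(void) { timer_count++; }

/* Point every vector at the default handler. */
void vectors_init(void)
{
    for (size_t i = 0; i < NUM_VECTORS; i++)
        vector_table[i] = default_handler;
}

/* Store a handler address in the table; returns 0 on success, -1 on error. */
int install_handler(unsigned vec, handler_fn fn)
{
    if (vec >= NUM_VECTORS || fn == NULL)
        return -1;
    vector_table[vec] = fn;
    return 0;
}

/* Models the processor looking up and entering the handler for vec. */
void dispatch(unsigned vec)
{
    if (vec < NUM_VECTORS)
        vector_table[vec]();
}
```

Filling the table with a default handler first means that an unexpected interrupt is counted instead of jumping through an uninitialized pointer.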

User console

The console is the default interaction mode of the root process. The model for using the console is that of a Unix shell. At the console you can view and modify the contents of memory, run programs, view and kill processes, view and modify the value of persistents, view the memory map, list and unload programs, change the baud and clock rates, and much more. Commands are typed at the console prompt and those commands are executed by the root process. If any console command hangs, pressing the abort button (an active low button attached to the IRQ7 pin) will return you to the interaction prompt.

If not otherwise specified, assume that all numerical arguments are in hexadecimal.

Command summary

`help'
Print help information
`run'
Run program: run <name> [<stack>] [<ticks>]
`go'
Run code at addr: go <addr> [<stack>] [<ticks>]
`resetrun'
Run reset programs: resetrun
`memmap'
Display memory map
`devmap'
Display device map
`memfree'
Display free memory
`memblock'
Create memory block: memblock <name> <addr> <len>
`memrm'
Remove memory block: memrm <name>
`clearmem'
Reinitialize memory map
`checksum'
Checksum: <name> or <start> <size>
`list'
List programs currently loaded
`ls'
List programs currently loaded
`unload'
Unload program: unload <name>
`remove'
Remove program from memmap: remove <name>
`restore'
Restore program: restore <name>
`malloc'
Print memory allocation history
`ps'
Display process table
`kill'
Kill process: kill <ps>
`fix'
Fix process: fix <ps>
`files'
Display active files table
`streams'
Display active streams table
`md'
Memory Display: md [<addr>] [<length>]
`mm'
Memory Modify: mm [<addr>]
`rd'
Display registers: rd [<ps>]
`set'
Set values: set <var> [<val>], default val = 1
`show'
Show values: show <var>
`persist'
Show/set values of persistents
`time'
Display value of time counter
`quit'
Quit
`baud'
Set baud rate: baud <baudrate>
`clock'
Set clock rate: clock <hertz>
`tickperiod'
Set tick period: tickperiod <ns>
`tpu_interrupts'
Enable/disable tpu_interrupts

Running programs

`run <name>'
Run program by name. The program is checksummed and its dependencies are checked before running. If the checksum fails, you are not allowed to run the program. If the dependencies fail, you are prompted whether to run it or not. The program is run as a new process. The amount of stack and number of ticks are taken from the startup_stack and startup_ticks variables in the user program. (See section Changing program startup parameters for details.)

Viewing memory usage

`memmap'
Display memory map. This is your only real way to know how the memory of your 332 is being used. Unfortunately, you have to manually know where the free blocks in your memory are and set up the -start option to the link script accordingly. Therefore, knowing the memory map on your board is critical, and this is the command that gives it to you.
`memfree'
Display free memory blocks. Free memory blocks are used as heap between downloads, but this will show you where you can download programs to or reserve data blocks with the memblock command.
`memblock <name> <addr> <len>'
Create a reserved data block in the memmap with name, starting address, and length as specified. The address and length are interpreted as being in hex. This command causes all processes to be killed and the process heaps to be reallocated to clear the reserved memory block from being changed by malloc.
`memrm <name>'
Remove named memory block. If there is more than one block with the same name, the first one will be removed. This can be used for executables, but is really intended for data blocks since executables can be deleted with unload or remove.
`checksum <name>'
`checksum <start> <size>'
If given a name argument, checksums the named block. Otherwise, checksums the region beginning at the start address for the size given. The checksum is a simple add-every-byte-together checksum. The result is printed.
`list'
List programs currently loaded. The checksum/version number is also shown for convenience in checking dependencies.
`unload <name>'
Unload program and delete it from the memory map. If there is more than one program in memory with the same name (a situation which should really be avoided), it will unload the first one in memory.
`remove <name>'
Remove a program from the memory map, but do not destroy it. So long as the memory block is not overwritten, subsequent calls to clearmem will restore it.
`restore <name>'
Attempts to restore the executable of the name specified. If there is a backup in the memory map named <name.bak>, that program will be copied to the location where it is supposed to be. This is mainly useful for when you want to recover a version of a program that is in ROM. Programs can be stored in ROM, but they can currently only be run out of RAM. In fact, they can only be run if they are in the place in RAM where they were linked to be. The reason that code is not relocatable once it is downloaded to the board is that only the actual executable and globals are downloaded to the board. The symbol table and the other information used to relocate code exist only on the host and most likely would not fit on the board anyway.
`malloc'
Print memory allocation history. The pointer, size, and creating process are printed for each currently mallocced entry. This could be useful for debugging.

Process management

`ps'
Display process table. Use this a lot. It's extremely useful. The pid, status, stack usage, current pc, and name are all printed.
`kill <ps>|<name>'
Kill process. The process can be specified either by its pid or its name.

Viewing and modifying memory

`md [<addr>] [<length>]'
Memory display. Address and length are specified in hex and are optional. If no address is specified, the addresses following the last ones displayed by the md command are used. If no length is specified, 16 bytes are displayed. It is not possible to specify a length and not an address, as the first argument is assumed to be the length. Immediately after an md command is issued, subsequent blank returns are interpreted as md.
`mm [<addr>]'
Memory Modify. If no address is specified, the address after the last one modified is used. Memory is modified on a word by word basis, and the location is read back to confirm that the data was stored properly. Entering a blank line means that the specified word should not be altered. Entering a '.' on a line by itself exits mm mode.
`rd [<ps>]'
Register display. Displays the current registers for the specified process id. If no process is specified, the root process' registers are displayed.
`persist [<persistname>] [<value>]'
Show/set values of persistents. If no arguments are given, all persistents are printed. If just a name is given, that persistent only is printed. If a value is given, the value of the persistent is set to that value. If the persistent does not exist, you are given the option to create it.
`set <varname> [<val>]'
Set debugging values or persistents.
`show [<varname>]'
Show values. This is for kernel debugging flags. If no arguments are specified, all possible values are printed. Otherwise, just the requested one is printed.
`time'
Display time. The time is persistent and increments from the moment your persistent table was last initialized. This is how you look at it. Your programs can access the time by getting its pointer by a call to lookup_persistent("time") and dereferencing the pointer.
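As an illustration of reading the time from a program, here is a host-runnable C sketch. Only the name lookup_persistent and the "time" key come from the description above. The return type (`long *`), the NULL-on-failure behavior, and the helpers `mock_advance_time` and `read_time` are assumptions, and the function body is a mock standing in for the kernel's own.

```c
#include <stddef.h>
#include <string.h>

/* Mock of the kernel's persistent table so the sketch runs on a host.
   The real signature of lookup_persistent is not given in this chapter;
   a long * return value and NULL on a miss are assumptions. */
static long mock_time_ticks = 0;

long *lookup_persistent(const char *name)
{
    if (strcmp(name, "time") == 0)
        return &mock_time_ticks;
    return NULL;
}

/* Test helper: pretend some ticks have elapsed (hypothetical). */
void mock_advance_time(long ticks)
{
    mock_time_ticks += ticks;
}

/* Read the current time by dereferencing the persistent's pointer.
   Returns -1 if the persistent cannot be found. */
long read_time(void)
{
    long *t = lookup_persistent("time");
    return t ? *t : -1;
}
```

On the board the value changes underneath the program, so real code would probably want to read it through a volatile-qualified pointer and look the pointer up once rather than on every read.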

Setting system parameters

`baud <baudrate>'
Set serial baud rate. The baudrate argument is in decimal. Using multiterm's baud command causes multiterm to call this command on the board and change its own baud rate simultaneously. Otherwise, you will have to call this command, then manually change the host baud rate before being able to talk to the board. The baud rate is persistent, so on reset or power up it returns to the value it had before. If you set the board to a bad or unknown baud rate, there is no really good way to recover aside from removing power from your RAM, which will cause all your programs and persistents to be wiped. If anybody has a good suggestion for an override on this, please tell me.
`clock <hertz>'
Set clock rate. The frequency argument is in units of hertz and read as a decimal number. The 332 will by default be run at 16777216 hertz. However, the clock rate is controlled by a programmable VCO which can produce a wide range of clock frequencies down into the kHz range. Running the clock faster than 16777216 hertz can be done, but this is the specified maximum and bad things can happen at faster speeds. The baud rate is automatically adjusted to compensate for the clock rate, but be aware that as the clock rate decreases, the maximum supportable baud rate decreases. If you change the clock and lose serial, reset and try again at a lower baud rate. Also be aware that the scheduler tick period should be increased when the clock frequency is decreased.
`tickperiod <ns>'
Set scheduler tick period. The period argument is in nanoseconds and read as a decimal number. As a benchmark, as of March 1994 the scheduling interrupt takes 120us during a swap at a clock rate of 16MHz, and 6us when a process has not yet run out of ticks. Therefore, with a 1ms tick period, giving only 1 tick per process, and using no defers, 12% of the CPU is taken up by context switching. This percentage decreases as the tick period increases, at the expense of scheduling granularity. It also decreases as the average number of ticks per process increases, because fewer of the scheduling interrupts result in a swap. It increases as the clock rate decreases, because the same amount of computation takes longer. Therefore, I would suggest a tick period of 1000000ns (1ms), which is the default, or longer. The system "time" is in units of scheduler ticks, so changing this parameter also affects timing granularity.
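The overhead arithmetic can be written out as a small C function. The two cost figures are the benchmark numbers quoted above. The model itself is a simplifying assumption: every process uses its full allotment of ticks, no defers occur, and each time slice costs one swapping interrupt plus non-swapping interrupts for the remaining ticks. The function name is hypothetical.

```c
/* Benchmark figures quoted above (March 1994, 16 MHz clock). */
#define SWAP_COST_US    120.0   /* scheduling interrupt that swaps processes */
#define NO_SWAP_COST_US   6.0   /* scheduling interrupt without a swap */

/* Percentage of CPU time spent in the scheduling interrupt.
   Returns -1.0 for invalid input. */
double sched_overhead_percent(double tick_period_us, unsigned ticks_per_process)
{
    if (tick_period_us <= 0.0 || ticks_per_process == 0)
        return -1.0;
    /* Per time slice: one swap plus (ticks - 1) non-swapping interrupts. */
    double cost  = SWAP_COST_US + (ticks_per_process - 1) * NO_SWAP_COST_US;
    double slice = tick_period_us * ticks_per_process;
    return 100.0 * cost / slice;
}
```

With a 1000us tick and 1 tick per process this gives the 12% figure from the text. Under the same model, 4 ticks per process gives 138us of overhead per 4000us slice, or 3.45%, and doubling the tick period to 2000us at 1 tick per process gives 6%.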

Miscellaneous

`help'
Print help information.
`quit'
Quits the console. This is only really useful if you are running a RAM kernel and you want to escape back to the ROM kernel.
