Ken Thompson's main.c Explained

The Boot Sequence
Data Structures
- Clock Addresses
- The Init Program
main() - Line by Line
sureg() — Set Up Registers
estabur() — Establish User Regions
nseg() — Number of Segments
What's Remarkable
Pre-ANSI C Syntax
Source

This is the kernel initialization code from Unix Fourth Edition (1973). 189 lines that boot an operating system.

The Boot Sequence

When the PDP-11 loads the Unix kernel, execution begins at main(). This function does exactly five things:

Discover and free all available memory
Find the system clock
Create process 0 (the scheduler)
Initialize the filesystem
Fork to create process 1 (init), then become the scheduler

That's it. After main() returns, Unix is running.

Data Structures

Clock Addresses

int     lksp[]
{
        0177546,   /* KW11-L line clock */
        0172540,   /* KW11-P programmable clock */
        0          /* sentinel */
};

The PDP-11 could have different clock hardware. This array lists possible clock register addresses (in octal). The kernel probes each until one responds.

The Init Program

int     icode[]
{
        0104413,   /* sys exec; initp; initarg */
        0000014,
        0000010,
        0000777,
        0000014,
        0000000,
        0062457,   /* /etc/init\0 */
        0061564,
        0064457,
        0064556,
        0000164,
};

This is PDP-11 machine code that calls exec("/etc/init"). It's copied into the address space of process 1. When process 1 starts executing, it runs this code, which replaces itself with /etc/init.

The string "/etc/init" is encoded in the octal words at the end:

0062457 = "/e"
0061564 = "tc"
0064457 = "/i"
0064556 = "ni"
0000164 = "t\0"

main() - Line by Line

main()
{
        extern schar;
        register i1, *p;

No return type (defaults to int). Variables i1 and p go in CPU registers for speed. Pre-ANSI C allowed register without a type.

Phase 1: Memory Discovery

        updlock = 0;
        UISA->r[0] = KISA->r[6] + USIZE;
        UISD->r[0] = 077406;
        for(; fubyte(0) >= 0; UISA->r[0]++) {
                clearseg(UISA->r[0]);
                maxmem++;
                mfree(coremap, 1, UISA->r[0]);
        }
        printf("mem = %l\n", maxmem*10/32);

This probes memory by trying to read from increasingly higher addresses. fubyte(0) returns -1 when it hits non-existent memory.

UISA/UISD = User Instruction Space Address/Descriptor registers (MMU)
KISA = Kernel Instruction Space Address registers
clearseg() = zero out a 64-byte segment
mfree() = add segment to free memory pool

The printf shows something like "mem = 64530" — memory in 32-byte units.

Phase 2: Clock Detection

        UISA->r[7] = KISA->r[7];
        UISD->r[7] = 077406;
        for(p=lksp;; p++) {
                if(*p == 0)
                        panic("no clock");
                if(fuword(*p) != -1) {
                        lks = *p;
                        break;
                }
        }

Probe the clock addresses in lksp[]. fuword() reads a word from user space; returns -1 if the address doesn't exist. First valid address becomes lks.

If no clock is found: panic("no clock") — the system halts.

Phase 3: Process 0 Setup

        proc[0].p_addr = KISA->r[6];
        proc[0].p_size = USIZE;
        proc[0].p_stat = SRUN;
        proc[0].p_flag =| SLOAD|SSYS;
        u.u_procp = &proc[0];

Create the first process entry. Note =|= — this is pre-ANSI compound assignment (modern C uses |=).

proc[0] = first slot in process table
SRUN = runnable state
SLOAD = in memory (not swapped)
SSYS = system process (can't be swapped)
u = the "user structure" — per-process kernel data

Phase 4: Filesystem Init

        sureg();
        *lks = 0115;
        cinit();
        binit();
        iinit();
        rootdir = iget(rootdev, ROOTINO);
        rootdir->i_flag =& ~ILOCK;
        u.u_cdir = iget(rootdev, ROOTINO);
        u.u_cdir->i_flag =& ~ILOCK;

sureg() = set up user-space memory mapping
*lks = 0115 = start the clock (magic constant enables interrupts)
cinit() = character device init
binit() = block device buffer init
iinit() = inode/superblock init
iget(rootdev, ROOTINO) = get the root directory inode

The =&= is another pre-ANSI operator (modern: &=).

Phase 5: Fork and Schedule

        if(newproc()) {
                expand(USIZE+1);
                u.u_uisa[0] = USIZE;
                u.u_uisd[0] = 6;
                sureg();
                copyout(icode, 0, 30);
                return;
        }
        sched();
}

newproc() is fork. It returns:

0 in the parent (process 0)
1 in the child (process 1)

Child path (process 1):

expand() = allocate more memory
copyout(icode, 0, 30) = copy the init program to address 0
return = start executing at address 0 → runs /etc/init

Parent path (process 0):

sched() = become the scheduler, never returns

sureg() — Set Up Registers

sureg()
{
        register *up, *rp, a;

        a = u.u_procp->p_addr;
        up = &u.u_uisa[0];
        rp = &UISA->r[0];
        while(rp < &UISA->r[8])
                *rp++ = *up++ + a;
        ...
}

Copies the process's memory map from the user structure to the MMU hardware. The PDP-11/45 has 8 segment registers; this sets all 8.

estabur() — Establish User Regions

estabur(nt, nd, ns)
{
        register a, *ap, *dp;

        if(nseg(nt)+nseg(nd)+nseg(ns) > 8 || nt+nd+ns+USIZE > maxmem) {
                u.u_error = ENOMEM;
                return(-1);
        }
        ...
}

Sets up the three segments of a Unix process:

nt = text (code) size
nd = data size
ns = stack size

Returns -1 with ENOMEM if the process won't fit.

The implementation fills MMU segment descriptors:

Text segments get RO (read-only)
Data segments get RW (read-write)
Stack grows down, gets ED (expand down) flag

nseg() — Number of Segments

nseg(n)
{
        return((n+127)>>7);
}

Converts size in 64-byte blocks to number of 8KB segments. (n+127)>>7 is integer division by 128, rounding up.

What's Remarkable

189 lines total. Modern Linux's start_kernel() is thousands of lines.
No error handling complexity. If something fails, panic().
Hardware-specific but clear. The PDP-11 MMU details are exposed, but the logic is readable.
The entire boot is synchronous. No async init, no module loading, no device tree parsing.
Process 1 is just fork + exec. The same pattern every Unix uses today.

Pre-ANSI C Syntax

1973 Syntax	Modern Equivalent
`a =+ b`	`a +` b=
`a =-b`	`a -` b=
`a =\vert b`	`a \vert` b=
`a =& b`	`a &` b=
`main()`	`int main(void)`
`register i`	`register int i`

The old operators were ambiguous: a-b= could mean a = -b or a - b=. ANSI C (1989) fixed this.

Source

From the Unix V4 tape recovered at University of Utah, December 2025. File: /usr/sys/ken/main.c