Ken Thompson's main.c Explained

Table of Contents

This is the kernel initialization code from Unix Fourth Edition (1973). 189 lines that boot an operating system.

The Boot Sequence

When the PDP-11 loads the Unix kernel, execution begins at main(). This function does exactly five things:

  1. Discover and free all available memory
  2. Find the system clock
  3. Create process 0 (the scheduler)
  4. Initialize the filesystem
  5. Fork to create process 1 (init), then become the scheduler

That's it. After main() returns, Unix is running.

Data Structures

Clock Addresses

int     lksp[]
{
        0177546,   /* KW11-L line clock */
        0172540,   /* KW11-P programmable clock */
        0          /* sentinel */
};

The PDP-11 could have different clock hardware. This array lists possible clock register addresses (in octal). The kernel probes each until one responds.

The Init Program

int     icode[]
{
        0104413,   /* sys exec; initp; initarg */
        0000014,
        0000010,
        0000777,
        0000014,
        0000000,
        0062457,   /* /etc/init\0 */
        0061564,
        0064457,
        0064556,
        0000164,
};

This is PDP-11 machine code that calls exec("/etc/init"). It's copied into the address space of process 1. When process 1 starts executing, it runs this code, which replaces itself with /etc/init.

The string "/etc/init" is encoded in the octal words at the end:

  • 0062457 = "/e"
  • 0061564 = "tc"
  • 0064457 = "/i"
  • 0064556 = "ni"
  • 0000164 = "t\0"

main() - Line by Line

main()
{
        extern schar;
        register i1, *p;

No return type (defaults to int). Variables i1 and p go in CPU registers for speed. Pre-ANSI C allowed register without a type.

Phase 1: Memory Discovery

        updlock = 0;
        UISA->r[0] = KISA->r[6] + USIZE;
        UISD->r[0] = 077406;
        for(; fubyte(0) >= 0; UISA->r[0]++) {
                clearseg(UISA->r[0]);
                maxmem++;
                mfree(coremap, 1, UISA->r[0]);
        }
        printf("mem = %l\n", maxmem*10/32);

This probes memory by trying to read from increasingly higher addresses. fubyte(0) returns -1 when it hits non-existent memory.

  • UISA/UISD = User Instruction Space Address/Descriptor registers (MMU)
  • KISA = Kernel Instruction Space Address registers
  • clearseg() = zero out a 64-byte segment
  • mfree() = add segment to free memory pool

The printf shows something like "mem = 64530" — memory in 32-byte units.

Phase 2: Clock Detection

        UISA->r[7] = KISA->r[7];
        UISD->r[7] = 077406;
        for(p=lksp;; p++) {
                if(*p == 0)
                        panic("no clock");
                if(fuword(*p) != -1) {
                        lks = *p;
                        break;
                }
        }

Probe the clock addresses in lksp[]. fuword() reads a word from user space; returns -1 if the address doesn't exist. First valid address becomes lks.

If no clock is found: panic("no clock") — the system halts.

Phase 3: Process 0 Setup

        proc[0].p_addr = KISA->r[6];
        proc[0].p_size = USIZE;
        proc[0].p_stat = SRUN;
        proc[0].p_flag =| SLOAD|SSYS;
        u.u_procp = &proc[0];

Create the first process entry. Note =|= — this is pre-ANSI compound assignment (modern C uses |=).

  • proc[0] = first slot in process table
  • SRUN = runnable state
  • SLOAD = in memory (not swapped)
  • SSYS = system process (can't be swapped)
  • u = the "user structure" — per-process kernel data

Phase 4: Filesystem Init

        sureg();
        *lks = 0115;
        cinit();
        binit();
        iinit();
        rootdir = iget(rootdev, ROOTINO);
        rootdir->i_flag =& ~ILOCK;
        u.u_cdir = iget(rootdev, ROOTINO);
        u.u_cdir->i_flag =& ~ILOCK;
  • sureg() = set up user-space memory mapping
  • *lks = 0115 = start the clock (magic constant enables interrupts)
  • cinit() = character device init
  • binit() = block device buffer init
  • iinit() = inode/superblock init
  • iget(rootdev, ROOTINO) = get the root directory inode

The =&= is another pre-ANSI operator (modern: &=).

Phase 5: Fork and Schedule

        if(newproc()) {
                expand(USIZE+1);
                u.u_uisa[0] = USIZE;
                u.u_uisd[0] = 6;
                sureg();
                copyout(icode, 0, 30);
                return;
        }
        sched();
}

newproc() is fork. It returns:

  • 0 in the parent (process 0)
  • 1 in the child (process 1)

Child path (process 1):

  • expand() = allocate more memory
  • copyout(icode, 0, 30) = copy the init program to address 0
  • return = start executing at address 0 → runs /etc/init

Parent path (process 0):

  • sched() = become the scheduler, never returns

sureg() — Set Up Registers

sureg()
{
        register *up, *rp, a;

        a = u.u_procp->p_addr;
        up = &u.u_uisa[0];
        rp = &UISA->r[0];
        while(rp < &UISA->r[8])
                *rp++ = *up++ + a;
        ...
}

Copies the process's memory map from the user structure to the MMU hardware. The PDP-11/45 has 8 segment registers; this sets all 8.

estabur() — Establish User Regions

estabur(nt, nd, ns)
{
        register a, *ap, *dp;

        if(nseg(nt)+nseg(nd)+nseg(ns) > 8 || nt+nd+ns+USIZE > maxmem) {
                u.u_error = ENOMEM;
                return(-1);
        }
        ...
}

Sets up the three segments of a Unix process:

  • nt = text (code) size
  • nd = data size
  • ns = stack size

Returns -1 with ENOMEM if the process won't fit.

The implementation fills MMU segment descriptors:

  • Text segments get RO (read-only)
  • Data segments get RW (read-write)
  • Stack grows down, gets ED (expand down) flag

nseg() — Number of Segments

nseg(n)
{
        return((n+127)>>7);
}

Converts size in 64-byte blocks to number of 8KB segments. (n+127)>>7 is integer division by 128, rounding up.

What's Remarkable

  1. 189 lines total. Modern Linux's start_kernel() is thousands of lines.
  2. No error handling complexity. If something fails, panic().
  3. Hardware-specific but clear. The PDP-11 MMU details are exposed, but the logic is readable.
  4. The entire boot is synchronous. No async init, no module loading, no device tree parsing.
  5. Process 1 is just fork + exec. The same pattern every Unix uses today.

Pre-ANSI C Syntax

1973 Syntax Modern Equivalent
a =+ b a + b=
a =-b a - b=
a =\vert b a \vert b=
a =& b a & b=
main() int main(void)
register i register int i

The old operators were ambiguous: a-b= could mean a = -b or a - b=. ANSI C (1989) fixed this.

Source

From the Unix V4 tape recovered at University of Utah, December 2025. File: /usr/sys/ken/main.c

Author: Jason Walsh

jwalsh@nexus

Last Updated: 2025-12-25 00:44:48

build: 2025-12-25 00:48 | sha: b1eb356