Ken Thompson's main.c Explained
Table of Contents
This is the kernel initialization code from Unix Fourth Edition (1973). 189 lines that boot an operating system.
The Boot Sequence
When the PDP-11 loads the Unix kernel, execution begins at main().
This function does exactly five things:
- Discover and free all available memory
- Find the system clock
- Create process 0 (the scheduler)
- Initialize the filesystem
- Fork to create process 1 (init), then become the scheduler
That's it. After main() returns, Unix is running.
Data Structures
Clock Addresses
int lksp[] { 0177546, /* KW11-L line clock */ 0172540, /* KW11-P programmable clock */ 0 /* sentinel */ };
The PDP-11 could have different clock hardware. This array lists possible clock register addresses (in octal). The kernel probes each until one responds.
The Init Program
int icode[] { 0104413, /* sys exec; initp; initarg */ 0000014, 0000010, 0000777, 0000014, 0000000, 0062457, /* /etc/init\0 */ 0061564, 0064457, 0064556, 0000164, };
This is PDP-11 machine code that calls exec("/etc/init"). It's copied into
the address space of process 1. When process 1 starts executing, it runs
this code, which replaces itself with /etc/init.
The string "/etc/init" is encoded in the octal words at the end:
0062457= "/e"0061564= "tc"0064457= "/i"0064556= "ni"0000164= "t\0"
main() - Line by Line
main() { extern schar; register i1, *p;
No return type (defaults to int). Variables i1 and p go in CPU registers
for speed. Pre-ANSI C allowed register without a type.
Phase 1: Memory Discovery
updlock = 0;
UISA->r[0] = KISA->r[6] + USIZE;
UISD->r[0] = 077406;
for(; fubyte(0) >= 0; UISA->r[0]++) {
clearseg(UISA->r[0]);
maxmem++;
mfree(coremap, 1, UISA->r[0]);
}
printf("mem = %l\n", maxmem*10/32);
This probes memory by trying to read from increasingly higher addresses.
fubyte(0) returns -1 when it hits non-existent memory.
UISA/UISD= User Instruction Space Address/Descriptor registers (MMU)KISA= Kernel Instruction Space Address registersclearseg()= zero out a 64-byte segmentmfree()= add segment to free memory pool
The printf shows something like "mem = 64530" — memory in 32-byte units.
Phase 2: Clock Detection
UISA->r[7] = KISA->r[7];
UISD->r[7] = 077406;
for(p=lksp;; p++) {
if(*p == 0)
panic("no clock");
if(fuword(*p) != -1) {
lks = *p;
break;
}
}
Probe the clock addresses in lksp[]. fuword() reads a word from user space;
returns -1 if the address doesn't exist. First valid address becomes lks.
If no clock is found: panic("no clock") — the system halts.
Phase 3: Process 0 Setup
proc[0].p_addr = KISA->r[6]; proc[0].p_size = USIZE; proc[0].p_stat = SRUN; proc[0].p_flag =| SLOAD|SSYS; u.u_procp = &proc[0];
Create the first process entry. Note =|= — this is pre-ANSI compound
assignment (modern C uses |=).
proc[0]= first slot in process tableSRUN= runnable stateSLOAD= in memory (not swapped)SSYS= system process (can't be swapped)u= the "user structure" — per-process kernel data
Phase 4: Filesystem Init
sureg(); *lks = 0115; cinit(); binit(); iinit(); rootdir = iget(rootdev, ROOTINO); rootdir->i_flag =& ~ILOCK; u.u_cdir = iget(rootdev, ROOTINO); u.u_cdir->i_flag =& ~ILOCK;
sureg()= set up user-space memory mapping*lks = 0115= start the clock (magic constant enables interrupts)cinit()= character device initbinit()= block device buffer initiinit()= inode/superblock initiget(rootdev, ROOTINO)= get the root directory inode
The =&= is another pre-ANSI operator (modern: &=).
Phase 5: Fork and Schedule
if(newproc()) { expand(USIZE+1); u.u_uisa[0] = USIZE; u.u_uisd[0] = 6; sureg(); copyout(icode, 0, 30); return; } sched(); }
newproc() is fork. It returns:
- 0 in the parent (process 0)
- 1 in the child (process 1)
Child path (process 1):
expand()= allocate more memorycopyout(icode, 0, 30)= copy the init program to address 0return= start executing at address 0 → runs/etc/init
Parent path (process 0):
sched()= become the scheduler, never returns
sureg() — Set Up Registers
sureg() { register *up, *rp, a; a = u.u_procp->p_addr; up = &u.u_uisa[0]; rp = &UISA->r[0]; while(rp < &UISA->r[8]) *rp++ = *up++ + a; ... }
Copies the process's memory map from the user structure to the MMU hardware. The PDP-11/45 has 8 segment registers; this sets all 8.
estabur() — Establish User Regions
estabur(nt, nd, ns) { register a, *ap, *dp; if(nseg(nt)+nseg(nd)+nseg(ns) > 8 || nt+nd+ns+USIZE > maxmem) { u.u_error = ENOMEM; return(-1); } ... }
Sets up the three segments of a Unix process:
nt= text (code) sizend= data sizens= stack size
Returns -1 with ENOMEM if the process won't fit.
The implementation fills MMU segment descriptors:
- Text segments get
RO(read-only) - Data segments get
RW(read-write) - Stack grows down, gets
ED(expand down) flag
nseg() — Number of Segments
nseg(n) { return((n+127)>>7); }
Converts size in 64-byte blocks to number of 8KB segments.
(n+127)>>7 is integer division by 128, rounding up.
What's Remarkable
- 189 lines total. Modern Linux's
start_kernel()is thousands of lines. - No error handling complexity. If something fails,
panic(). - Hardware-specific but clear. The PDP-11 MMU details are exposed, but the logic is readable.
- The entire boot is synchronous. No async init, no module loading, no device tree parsing.
- Process 1 is just fork + exec. The same pattern every Unix uses today.
Pre-ANSI C Syntax
| 1973 Syntax | Modern Equivalent |
|---|---|
a =+ b |
a + b= |
a =-b |
a - b= |
a =\vert b |
a \vert b= |
a =& b |
a & b= |
main() |
int main(void) |
register i |
register int i |
The old operators were ambiguous: a-b= could mean a = -b or a - b=.
ANSI C (1989) fixed this.
Source
From the Unix V4 tape recovered at University of Utah, December 2025.
File: /usr/sys/ken/main.c