Video Conferencing Systems
Overview
Video conferencing systems enable real-time audio and video communication between multiple participants over IP networks. Modern browser-based implementations rely primarily on WebRTC (Web Real-Time Communication), an open standard providing peer-to-peer media streaming without plugins. The architecture involves complex interplay between signaling servers (for session establishment), STUN/TURN servers (for NAT traversal), and Selective Forwarding Units (SFUs) or Multipoint Control Units (MCUs) for scaling beyond peer-to-peer limits. Key challenges include network adaptation, echo cancellation, bandwidth estimation, and maintaining quality of experience across heterogeneous network conditions.
Background
- Traditional video conferencing: H.323, SIP protocols (1990s-2000s)
- Flash-based solutions: Adobe Connect, early browser video
- WebRTC standardization began 2011, W3C Recommendation 2021
- Major implementations: Google Meet, Zoom (partial WebRTC), Jitsi, Daily.co
- COVID-19 pandemic (2020) dramatically accelerated adoption and development
- Current focus: E2E encryption, AI features, spatial audio, virtual backgrounds
Key Concepts
WebRTC Core APIs
| API | Purpose |
|---|---|
| `getUserMedia()` | Capture camera/microphone streams |
| `RTCPeerConnection` | Manage peer-to-peer media connections |
| `RTCDataChannel` | Arbitrary data transfer between peers |
| `MediaRecorder` | Record media streams |
| `getDisplayMedia()` | Screen sharing capture |
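An application would normally confirm that the browser exposes these APIs before using them. A minimal sketch, assuming a hypothetical helper `detectWebRtcSupport` that takes the global object as a parameter so it can also be exercised outside a browser:

```javascript
// Hypothetical helper (not part of WebRTC): reports which of the core APIs
// from the table are exposed on a given global object.
function detectWebRtcSupport(globalLike) {
  const mediaDevices = globalLike.navigator?.mediaDevices;
  return {
    getUserMedia: typeof mediaDevices?.getUserMedia === 'function',
    getDisplayMedia: typeof mediaDevices?.getDisplayMedia === 'function',
    RTCPeerConnection: typeof globalLike.RTCPeerConnection === 'function',
    RTCDataChannel: typeof globalLike.RTCDataChannel === 'function',
    MediaRecorder: typeof globalLike.MediaRecorder === 'function',
  };
}

// In a browser this would be called as detectWebRtcSupport(window).
// With an empty stub object, every entry is false.
const support = detectWebRtcSupport({});
console.log(support);
```

Passing the global object in, rather than reading `window` directly, keeps the check testable and avoids a reference error in non-browser environments.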
Signaling and Session Establishment
WebRTC requires an external signaling channel; the standard does not specify one. Peers use it to exchange:
- Offer/Answer: SDP (Session Description Protocol) exchange
- ICE Candidates: Network endpoint discovery
- Trickle ICE: Incremental candidate exchange for faster connection
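Each ICE candidate travels over the signaling channel as a text line in SDP attribute syntax. A minimal sketch of parsing one, assuming a hypothetical helper `parseCandidate` and an example candidate string that uses documentation-range addresses:

```javascript
// Hypothetical helper: splits the leading fields of an ICE candidate attribute.
// Field order: foundation, component, transport, priority, address, port, "typ", type.
function parseCandidate(line) {
  const fields = line.replace(/^a=/, '').replace(/^candidate:/, '').trim().split(/\s+/);
  if (fields.length < 8 || fields[6] !== 'typ') {
    throw new Error('Not a valid ICE candidate line');
  }
  const [foundation, component, transport, priority, address, port, , type] = fields;
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // host, srflx, prflx, or relay
  };
}

const parsed = parseCandidate(
  'candidate:842163049 1 udp 1694498815 203.0.113.5 46154 typ srflx raddr 192.0.2.10 rport 46154'
);
console.log(parsed.type, parsed.address, parsed.port); // srflx 203.0.113.5 46154
```

The `type` field is what distinguishes a direct path (`host`, `srflx`) from a relayed one (`relay`), which is useful when logging how a call actually connected.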
           Signaling Server
                   |
       +-----------+-----------+
       |                       |
    Peer A                  Peer B
       |                       |
       +--- STUN/TURN Server --+
                   |
             Media Streams
NAT Traversal
- STUN (Session Traversal Utilities for NAT): Discover public IP/port
- TURN (Traversal Using Relays around NAT): Relay when direct fails
- ICE (Interactive Connectivity Establishment): Framework combining both
- Roughly 85% of connections succeed with STUN alone (a commonly cited estimate; the share depends on the network mix)
- TURN required for symmetric NATs, enterprise firewalls
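ICE ranks candidates by a numeric priority so that direct paths are tried before relays. The formula and recommended type preferences below follow RFC 8445; the helper name `candidatePriority` and its default arguments are assumptions for illustration.

```javascript
// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type, componentId = 1, localPreference = 65535) {
  const typePreference = TYPE_PREFERENCE[type];
  if (typePreference === undefined) {
    throw new Error(`Unknown candidate type: ${type}`);
  }
  return 2 ** 24 * typePreference + 2 ** 8 * localPreference + (256 - componentId);
}

const ranked = ['relay', 'host', 'srflx']
  .map(type => ({ type, priority: candidatePriority(type) }))
  .sort((a, b) => b.priority - a.priority);

console.log(ranked.map(c => c.type).join(' > ')); // host > srflx > relay
```

Multiplication is used instead of bit shifts because the results can approach the limit of JavaScript's 32-bit shift operators.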
Scaling Architectures
| Architecture | Description | Use Case |
|---|---|---|
| Mesh | All peers connect to all peers | 2-4 participants |
| SFU | Server forwards streams selectively | 5-50 participants |
| MCU | Server mixes into single stream | Legacy endpoints |
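The participant ranges in the table follow from how many streams each client must send and receive. A rough sketch, assuming one encoding per sender (no simulcast), a hypothetical helper `streamsPerClient`, and an illustrative 500 kbps per stream:

```javascript
// Hypothetical helper: video streams each client uploads and downloads in a call
// with `participants` people (assumes at least 2 and one encoding per sender).
function streamsPerClient(architecture, participants) {
  const others = participants - 1;
  switch (architecture) {
    case 'mesh': return { up: others, down: others }; // one copy per remote peer
    case 'sfu':  return { up: 1, down: others };      // server fans out the single upload
    case 'mcu':  return { up: 1, down: 1 };           // server mixes everything into one stream
    default: throw new Error(`Unknown architecture: ${architecture}`);
  }
}

const KBPS_PER_STREAM = 500; // assumed bitrate for illustration
for (const arch of ['mesh', 'sfu', 'mcu']) {
  const { up, down } = streamsPerClient(arch, 6);
  console.log(`${arch}: ${up} up (${up * KBPS_PER_STREAM} kbps), ${down} down (${down * KBPS_PER_STREAM} kbps)`);
}
// mesh: 5 up (2500 kbps), 5 down (2500 kbps)
// sfu: 1 up (500 kbps), 5 down (2500 kbps)
// mcu: 1 up (500 kbps), 1 down (500 kbps)
```

Mesh uplink grows with every added participant, which is why it stops being practical after a handful of peers, while an SFU keeps the upload constant.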
Media Processing
- Codec negotiation: VP8, VP9, H.264, AV1 for video; Opus for audio
- Simulcast: Send multiple quality layers, SFU selects per recipient
- SVC (Scalable Video Coding): Single stream with extractable layers
- Bandwidth estimation: REMB, Transport-CC for congestion control
- Jitter buffer: Smooth out network timing variations
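A jitter buffer is usually sized from a running estimate of how much packet arrival times vary. The sketch below uses the interarrival jitter formula from RFC 3550; the factory name `createJitterEstimator` and the millisecond timestamps are assumptions for illustration, and real receivers work in RTP timestamp units.

```javascript
// Interarrival jitter estimator from RFC 3550:
//   D = (R_j - R_i) - (S_j - S_i)
//   J = J + (|D| - J) / 16
// All times must be in the same units (milliseconds here).
function createJitterEstimator() {
  let jitter = 0;
  let previous = null; // { sent, received } of the last packet seen
  return function update(sent, received) {
    if (previous !== null) {
      const d = (received - previous.received) - (sent - previous.sent);
      jitter += (Math.abs(d) - jitter) / 16;
    }
    previous = { sent, received };
    return jitter;
  };
}

const update = createJitterEstimator();
update(0, 50);                  // first packet: nothing to compare against yet
update(20, 70);                 // arrived exactly on schedule, jitter stays 0
const jitterMs = update(40, 106); // arrived 16 ms late: jitter becomes 16 / 16 = 1
console.log(jitterMs); // 1
```

The division by 16 makes the estimate an exponentially weighted average, so a single late packet nudges the value rather than replacing it.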
Implementation
Basic WebRTC Connection
    // Run inside an ES module or async function (uses await).
    // sendToSignalingServer() and remoteVideo are supplied by the application.

    // Get user media
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { width: 1280, height: 720 },
      audio: { echoCancellation: true, noiseSuppression: true }
    });

    // Create peer connection with STUN and TURN servers
    const pc = new RTCPeerConnection({
      iceServers: [
        { urls: 'stun:stun.l.google.com:19302' },
        { urls: 'turn:turn.example.com', username: 'user', credential: 'pass' }
      ]
    });

    // Add local tracks
    stream.getTracks().forEach(track => pc.addTrack(track, stream));

    // Handle ICE candidates (a null candidate marks the end of gathering)
    pc.onicecandidate = ({ candidate }) => {
      if (candidate) sendToSignalingServer({ type: 'candidate', candidate });
    };

    // Handle remote stream
    pc.ontrack = ({ streams }) => {
      remoteVideo.srcObject = streams[0];
    };

    // Create and send offer
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendToSignalingServer({ type: 'offer', sdp: offer });
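The code above covers only the offering peer. A sketch of the receiving side follows, assuming a hypothetical helper `handleSignal` and the same message shapes (`{type, sdp}` and `{type, candidate}`) used above; how messages actually arrive depends on the application's signaling channel.

```javascript
// Hypothetical helper for incoming signaling messages. `pc` is an
// RTCPeerConnection and `send` is the application's signaling function.
async function handleSignal(pc, message, send) {
  switch (message.type) {
    case 'offer': {
      // Answering peer: apply the remote offer, then create and return an answer
      await pc.setRemoteDescription(message.sdp);
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      send({ type: 'answer', sdp: answer });
      break;
    }
    case 'answer':
      // Offering peer: the answer completes the offer/answer exchange
      await pc.setRemoteDescription(message.sdp);
      break;
    case 'candidate':
      // Either peer: add a trickled ICE candidate from the other side
      await pc.addIceCandidate(message.candidate);
      break;
    default:
      throw new Error(`Unknown signaling message type: ${message.type}`);
  }
}
```

Taking `pc` and `send` as parameters keeps the handler independent of any particular signaling transport.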
Signaling Server (Node.js/Socket.io)
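    // Assumes the socket.io package is installed; port 3000 is arbitrary.
    const { Server } = require('socket.io');
    const io = new Server(3000);

    io.on('connection', socket => {
      // Join a room and announce the newcomer to its existing members
      socket.on('join-room', roomId => {
        socket.join(roomId);
        socket.to(roomId).emit('user-joined', socket.id);
      });

      // Relay offers, answers, and ICE candidates to the addressed peer
      socket.on('offer', ({ to, sdp }) => {
        io.to(to).emit('offer', { from: socket.id, sdp });
      });
      socket.on('answer', ({ to, sdp }) => {
        io.to(to).emit('answer', { from: socket.id, sdp });
      });
      socket.on('candidate', ({ to, candidate }) => {
        io.to(to).emit('candidate', { from: socket.id, candidate });
      });
    });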
Screen Sharing
    const screenStream = await navigator.mediaDevices.getDisplayMedia({
      video: { cursor: 'always' },
      audio: true // System audio (browser support varies)
    });

    // Replace video track in existing connection (skip if no video is being sent)
    const videoSender = pc.getSenders().find(s => s.track?.kind === 'video');
    if (videoSender) {
      await videoSender.replaceTrack(screenStream.getVideoTracks()[0]);
    }
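When the user stops sharing through the browser's own controls, the screen track fires an `ended` event and the sender is left with a dead track. A sketch of switching back to the camera, assuming a hypothetical helper `shareScreenUntilEnded` and a camera track the application has kept alive:

```javascript
// Hypothetical helper: puts the screen track on `sender` (an RTCRtpSender) and
// switches back to `cameraTrack` when the user stops sharing.
async function shareScreenUntilEnded(sender, screenTrack, cameraTrack) {
  await sender.replaceTrack(screenTrack);
  screenTrack.addEventListener('ended', () => {
    // replaceTrack returns a promise; log failures instead of leaving them unhandled
    sender.replaceTrack(cameraTrack).catch(err => {
      console.error('Could not restore camera:', err);
    });
  }, { once: true });
}
```

The `ended` event does not fire when the application itself calls `stop()` on the track, so an in-app "stop sharing" button would need to restore the camera track directly.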
Notes
- WebRTC requires HTTPS (except localhost) for `getUserMedia`
- Mobile browser support varies; native SDKs often preferred
- End-to-end encryption: Insertable Streams API (experimental; being standardized as WebRTC Encoded Transform)
- Virtual backgrounds: TensorFlow.js BodyPix, MediaPipe
- Recording: Server-side via SFU or client-side MediaRecorder
- Common TURN providers: Twilio, Xirsys, Daily, self-hosted coturn
- Bandwidth typically: 250-1000 kbps per video stream
- Latency target: <150ms for interactive conversation
- Quality metrics: MOS (Mean Opinion Score), SRTT, jitter, packet loss
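MOS is often estimated from network measurements rather than collected from listeners. One common route maps an R-factor (0 to 100) to MOS using the conversion defined in the ITU-T G.107 E-model. The sketch below covers only that mapping; the function name `rFactorToMos` is an assumption, and deriving R from delay, jitter, and packet loss is left out.

```javascript
// R-factor to MOS conversion from the ITU-T G.107 E-model.
function rFactorToMos(r) {
  if (r <= 0) return 1;     // floor of the MOS scale
  if (r >= 100) return 4.5; // ceiling reachable by the model
  return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r);
}

// 93.2 is the value commonly quoted for the model's default conditions.
console.log(rFactorToMos(93.2).toFixed(2)); // 4.41
```

Scores above roughly 4 are generally read as good quality, so tracking the estimated MOS over time gives a single number to alert on.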