Code Generation

1. Code Generation

1.1. Overview

Code generation is the automated creation of source code from higher-level specifications, models, or templates. It reduces manual coding effort, keeps repeated structures consistent, and removes a class of copy-and-paste errors. Modern code generation ranges from simple template expansion to AI-driven synthesis.

1.2. Fundamental Approaches

1.2.1. Template-Based Generation

Template engines transform parameterized templates into concrete code. This approach is common in configuration-heavy domains, where the output is mostly fixed text with a few substituted values.

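# Jinja2 template example
from jinja2 import Template

# The "{%-" form strips the preceding newline so the loop emits no blank lines
template = Template("""\
class {{ class_name }}:
    def __init__(self{% for field in fields %}, {{ field.name }}: {{ field.type }}{% endfor %}):
{%- for field in fields %}
        self.{{ field.name }} = {{ field.name }}
{%- endfor %}
""")

# Generates a Python class from a specification
print(template.render(
    class_name="User",
    fields=[{"name": "id", "type": "int"}, {"name": "email", "type": "str"}],
))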

Advantages: Simple, predictable, debuggable
Limitations: Poor for complex logic, limited adaptability

1.2.2. Abstract Syntax Tree (AST) Manipulation

Direct manipulation of language parse trees enables precise, semantically aware code generation.

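import ast

# Generate a getter function AST programmatically
def create_getter(field_name):
    func = ast.FunctionDef(
        name=f'get_{field_name}',
        args=ast.arguments(
            posonlyargs=[], args=[ast.arg(arg='self')],
            kwonlyargs=[], kw_defaults=[], defaults=[],
        ),
        body=[ast.Return(value=ast.Attribute(
            value=ast.Name(id='self', ctx=ast.Load()),
            attr=field_name,
            ctx=ast.Load(),
        ))],
        decorator_list=[],
    )
    # Fill in the line/column info that compile() requires
    return ast.fix_missing_locations(func)

# ast.unparse (Python 3.9+) turns the node back into source text
print(ast.unparse(create_getter('name')))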

Advantages: Type-safe, language-aware, enables complex transformations
Limitations: Steep learning curve, language-specific
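
The "complex transformations" mentioned above usually mean rewriting an existing tree rather than building one from scratch. The following is a minimal sketch, assuming Python 3.9+ for ast.unparse; the class name RenameFunction, the helper rename_function, and the sample source are hypothetical and chosen only for illustration.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename one function and the names that refer to it (illustrative only)."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)  # keep walking into the function body
        if node.name == self.old_name:
            node.name = self.new_name
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Renames every Name with this identifier, including any variable
        # that happens to share it; a real tool would do scope analysis.
        if node.id == self.old_name:
            node.id = self.new_name
        return node


def rename_function(source: str, old_name: str, new_name: str) -> str:
    tree = ast.parse(source)
    tree = RenameFunction(old_name, new_name).visit(tree)
    return ast.unparse(tree)


SAMPLE = "def area(w, h):\n    return w * h\n\nprint(area(2, 3))\n"
renamed = rename_function(SAMPLE, "area", "rect_area")
print(renamed)
```

Because the rewrite operates on nodes rather than text, a string literal or comment that happens to contain "area" is left untouched, which a search-and-replace could not guarantee.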

1.2.3. Large Language Model (LLM) Generation

Modern AI models generate code from natural language descriptions or existing code context.

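# Example using Claude API
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=1024,  # required by the Messages API
    messages=[{
        "role": "user",
        "content": "Generate a Python function to parse ISO8601 dates with timezone support"
    }]
)
print(response.content[0].text)  # the generated code, as text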

Advantages: Natural specification, handles ambiguity, context-aware
Limitations: Non-deterministic, requires validation, can introduce vulnerabilities
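
Because the output is non-deterministic, generated code is typically gated before it is used. The following is a minimal sketch of a static check, assuming the model's reply has already been extracted into a string. The names validate_generated_code and FORBIDDEN_CALLS are hypothetical, and the check is a first filter, not a security boundary.

```python
import ast

# Hypothetical deny-list; a real project would tailor this to its own policy
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__"}


def validate_generated_code(source: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        # Flag direct calls such as eval(...) by name
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FORBIDDEN_CALLS
        ):
            problems.append(f"line {node.lineno}: call to {node.func.id}()")
    return problems


good = "def double(x):\n    return x * 2\n"
bad = "def run(cmd):\n    return eval(cmd)\n"
broken = "def oops(:\n    pass\n"

print(validate_generated_code(good))
print(validate_generated_code(bad))
print(validate_generated_code(broken))
```

A check like this only rejects code that fails to parse or calls a listed builtin directly; it says nothing about whether the code is correct, so tests and human review are still needed.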

1.3. Traditional Code Generators

1.3.1. Protocol Buffers

Google's Protocol Buffers compiler (protoc) generates serialization code from .proto schemas for many languages, through built-in generators and plugins.

// person.proto
syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
  repeated string emails = 3;
}

protoc --python_out=. person.proto  # Generates person_pb2.py

1.3.2. OpenAPI/Swagger Codegen

Generates REST API clients, servers, and documentation from OpenAPI specifications.

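# openapi.yaml
openapi: 3.0.3
info:
  title: Example API   # placeholder title
  version: 1.0.0
paths:
  /users/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: integer
      responses:
        '200':
          description: The requested user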
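
What a generator does with such a specification can be sketched in a few lines. The following is a toy illustration, not how OpenAPI Generator or Swagger Codegen is implemented. It assumes the specification has already been loaded into a Python dict; generate_client_function, the get_users_by_id naming scheme, and the base_url parameter are all hypothetical.

```python
import re

# OpenAPI primitive types mapped to Python annotations (partial, for illustration)
TYPE_MAP = {"integer": "int", "string": "str", "number": "float", "boolean": "bool"}


def generate_client_function(path: str, method: str, operation: dict) -> str:
    """Emit Python source for one operation as a URL-building stub."""
    # /users/{id} -> get_users_by_id
    static_parts = [p for p in path.strip("/").split("/") if not p.startswith("{")]
    path_params = re.findall(r"\{(\w+)\}", path)
    name = "_".join([method, *static_parts])
    if path_params:
        name += "_by_" + "_and_".join(path_params)

    # Build the typed argument list from the declared path parameters
    args = ["base_url: str"]
    for param in operation.get("parameters", []):
        if param.get("in") == "path":
            py_type = TYPE_MAP.get(param.get("schema", {}).get("type"), "str")
            args.append(f"{param['name']}: {py_type}")

    return (
        f"def {name}({', '.join(args)}) -> str:\n"
        f'    """{method.upper()} {path}"""\n'
        f'    return f"{{base_url}}{path}"\n'
    )


spec_operation = {
    "parameters": [
        {"name": "id", "in": "path", "required": True, "schema": {"type": "integer"}}
    ]
}
source = generate_client_function("/users/{id}", "get", spec_operation)
print(source)

# exec is used here only to show that the emitted source is valid Python
namespace = {}
exec(source, namespace)
print(namespace["get_users_by_id"]("https://api.example.com", 42))
```

Real generators add request execution, response models, and error handling on top of this, but the core step is the same: walk the specification and emit typed code for each operation.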

1.3.3. GraphQL Codegen

Generates type-safe client code from GraphQL schemas and queries.

graphql-codegen --config codegen.yml  # Generates TypeScript types

1.4. AI-Assisted Code Generation

1.4.1. GitHub Copilot

Context-aware autocomplete, originally powered by OpenAI Codex, a model trained on public repositories.

  • Inline suggestions during coding
  • Multi-line completions
  • Test generation
  • Best for: Boilerplate, common patterns, test scaffolding

1.4.2. Claude Code (Anthropic)

Full-context code generation, refactoring, and debugging assistant.

  • Entire file/project awareness
  • Multi-step reasoning
  • Code review and explanation
  • Best for: Complex refactoring, architecture design, debugging

1.4.3. Cursor/Continue

IDE-integrated AI coding assistants with codebase indexing.

1.5. Best Practices

1.5.1. When to Use Code Generation

USE for:

  • Repetitive boilerplate (getters/setters, constructors)
  • API bindings from specifications
  • Database models from schemas
  • Serialization/deserialization
  • Type definitions across language boundaries

AVOID for:

  • Business logic (requires human judgment)
  • Security-critical code (needs manual review)
  • Creative algorithms (AI can introduce subtle bugs)
  • One-off implementations (overhead not justified)
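
The "database models from schemas" case shows why generation suits boilerplate: the output is fully determined by the input. The following is a minimal sketch, assuming the schema has already been extracted as (column, SQL type, nullable) tuples. The names generate_model and SQL_TO_PY and the users table are hypothetical.

```python
# Partial SQL-to-Python type mapping; unknown types fall back to str
SQL_TO_PY = {"INTEGER": "int", "TEXT": "str", "REAL": "float", "BOOLEAN": "bool"}


def generate_model(table: str, columns: list[tuple[str, str, bool]]) -> str:
    """Emit dataclass source for a table given (name, sql_type, nullable) columns."""
    class_name = "".join(part.capitalize() for part in table.split("_"))
    lines = [
        "from dataclasses import dataclass",
        "from typing import Optional",
        "",
        "",
        "@dataclass",
        f"class {class_name}:",
    ]
    if not columns:
        lines.append("    pass")
    # Required fields first: dataclasses reject a non-default after a default
    for name, sql_type, nullable in sorted(columns, key=lambda col: col[2]):
        py_type = SQL_TO_PY.get(sql_type.upper(), "str")
        if nullable:
            lines.append(f"    {name}: Optional[{py_type}] = None")
        else:
            lines.append(f"    {name}: {py_type}")
    return "\n".join(lines) + "\n"


users_schema = [
    ("id", "INTEGER", False),
    ("nickname", "TEXT", True),
    ("email", "TEXT", False),
]
model_source = generate_model("users", users_schema)
print(model_source)
```

Nothing in this output requires judgment, which is the distinguishing feature of the "USE for" list; the items in the "AVOID for" list are those where the right output cannot be derived mechanically from a schema.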

1.5.2. Quality Assurance

  1. Always review generated code - Never blindly commit
  2. Test thoroughly - Generated code needs validation
  3. Version control generators - Track .proto files, not just output
  4. Document generation process - Make builds reproducible
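
One way to make the "track the source, not just the output" rule checkable is to stamp each generated file with a hash of the specification it came from. The following is a minimal sketch using only the standard library. The header format and the names stamp and is_stale are hypothetical conventions, not a feature of protoc or any other generator.

```python
import hashlib

HEADER_PREFIX = "# GENERATED FILE - DO NOT EDIT. source-sha256: "


def spec_digest(spec_text: str) -> str:
    """Hex SHA-256 of the specification text."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def stamp(generated_code: str, spec_text: str) -> str:
    """Prepend a header recording which spec version produced this code."""
    return f"{HEADER_PREFIX}{spec_digest(spec_text)}\n{generated_code}"


def is_stale(stamped_code: str, spec_text: str) -> bool:
    """True if the header is missing or does not match the current spec."""
    first_line = stamped_code.split("\n", 1)[0]
    if not first_line.startswith(HEADER_PREFIX):
        return True
    return first_line[len(HEADER_PREFIX):] != spec_digest(spec_text)


spec_v1 = "message Person { string name = 1; }"
spec_v2 = "message Person { string name = 1; int32 age = 2; }"
output = stamp("class Person: ...\n", spec_v1)

print(is_stale(output, spec_v1))
print(is_stale(output, spec_v2))
```

A check like is_stale can run in CI without invoking the generator itself, which is useful when the generator toolchain is slow or not installed on every machine.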

1.5.3. Tradeoffs

Approach      Speed      Quality    Maintainability  Learning Curve
Templates     Fast       High       Good             Low
AST           Medium     Very High  Excellent        High
LLM           Very Fast  Variable   Poor             Low
Traditional   Fast       High       Excellent        Medium

1.6. Integration Strategies

1.6.1. Build System Integration

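# Makefile (recipe lines must begin with a tab character, not spaces)
proto-gen:
	mkdir -p gen
	# --go_out requires the protoc-gen-go plugin on PATH
	protoc --python_out=gen/ --go_out=gen/ api/*.proto

openapi-gen:
	openapi-generator-cli generate -i api.yaml -g python -o client/

.PHONY: codegen proto-gen openapi-gen
codegen: proto-gen openapi-gen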

1.6.2. Pre-commit Hooks

Validate that generated code is up to date:

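#!/bin/bash
# .git/hooks/pre-commit
make codegen || exit 1
# Fail if regeneration changed any tracked generated file
git diff --exit-code gen/ client/ || {
    echo "Generated code out of sync. Run 'make codegen' and stage the result." >&2
    exit 1
}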

1.7. Future Directions

  • Formal verification - Proving generated code correctness
  • Multi-modal generation - From diagrams, specifications, examples
  • Incremental generation - Updating code as specs evolve
  • Domain-specific languages - Higher-level abstractions
