Parsing Org-mode Files with Python

1. Overview

Org-mode files are plain text, but their semantics are not. A heading carries TODO state, priority, tags, scheduled/deadline timestamps, properties, and a position in the document tree. Parsing this structure manually is error-prone. The orgparse library exposes org documents as a traversable Python object model.

This note demonstrates orgparse's API through working examples: installation, reading files, extracting metadata, navigating the tree, and querying TODO items. The examples use test data from the orgparse suite, available in orgparse-examples/.

2. Installation

orgparse is a pure-Python library with no dependencies.

pip install orgparse

Verify the install:

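import orgparse
# Not every release defines orgparse.__version__, so fall back gracefully
print(getattr(orgparse, "__version__", "installed (no __version__ attribute)"))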

The library works on Python 3.7+. No Emacs installation required.

3. Loading a document

orgparse.load() parses an org file and returns a root node.

import orgparse

# Load from file path
root = orgparse.load('site/research/orgparse-examples/00_simple.org')

# The root node represents the entire document
print(f"Root level: {root.level}")  # 0
print(f"Children: {len(root.children)}")

Each node in the tree has:

  • .level — heading depth (0 for root, 1 for *, 2 for **, etc.)
  • .heading — heading text without markup
  • .children — list of child nodes
  • .parent — parent node reference

The root node's level is always 0. Top-level headings are its children.

4. Accessing heading metadata

Headings carry TODO state, priority, and tags. These attributes are accessible as node properties.

root = orgparse.load('site/research/orgparse-examples/00_simple.org')

# Traverse all nodes
for node in root[1:]:  # Skip root
    print(f"{'*' * node.level} {node.heading}")
    if node.todo:
        print(f"  TODO: {node.todo}")
    if node.priority:
        print(f"  Priority: {node.priority}")
    if node.tags:
        print(f"  Tags: {node.tags}")

Expected output (excerpt):

* Heading 0
  TODO: TODO1
  Tags: {'TAG1'}
** Heading 1
  TODO: TODO2
  Tags: {'TAG1', 'TAG2'}

Tags are a set. In orgparse, node.tags already includes tags inherited from ancestor headings, so no manual walk up the tree is needed. To get only the tags attached directly to a heading, use node.shallow_tags.
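
Because tags are plain sets, building a tag index takes a few lines. The sketch below is illustrative only: index_by_tag is a hypothetical helper, and the SimpleNamespace objects are stand-ins for orgparse nodes so the snippet runs without an org file. It relies on nothing beyond the .heading and .tags attributes.

```python
from collections import defaultdict
from types import SimpleNamespace


def index_by_tag(nodes):
    """Map each tag to the headings of the nodes that carry it."""
    index = defaultdict(list)
    for node in nodes:
        for tag in node.tags:
            index[tag].append(node.heading)
    return dict(index)


# Hypothetical stand-ins for orgparse nodes (only .heading and .tags are used).
demo_nodes = [
    SimpleNamespace(heading="Heading 0", tags={"TAG1"}),
    SimpleNamespace(heading="Heading 1", tags={"TAG1", "TAG2"}),
]
tag_index = index_by_tag(demo_nodes)
print(tag_index["TAG1"])  # ['Heading 0', 'Heading 1']
```

With a real document, pass root[1:] instead of demo_nodes.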

6. Working with timestamps

Org timestamps appear in scheduling metadata (SCHEDULED, DEADLINE, CLOSED) and inline in body text. orgparse exposes scheduling timestamps as OrgDate objects whose .start attribute (and .end, for ranges) holds a Python date or datetime value.

import orgparse
from datetime import datetime

root = orgparse.load('site/research/orgparse-examples/01_attributes.org')

for node in root[1:]:
    if node.scheduled:
        print(f"{node.heading}:")
        print(f"  Scheduled: {node.scheduled.start}")
    if node.deadline:
        print(f"  Deadline: {node.deadline.start}")
    if node.closed:
        print(f"  Closed: {node.closed.start}")

The .start attribute returns a Python datetime.datetime or datetime.date object depending on whether the timestamp includes time-of-day.
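
Mixing the two types in a comparison raises TypeError, so it helps to normalize before filtering. A minimal sketch, assuming midnight is an acceptable stand-in for a missing time of day; as_datetime is a hypothetical helper, not part of orgparse.

```python
from datetime import date, datetime, time


def as_datetime(value):
    """Return a datetime; plain dates are promoted to midnight."""
    if isinstance(value, datetime):
        return value
    return datetime.combine(value, time.min)


print(as_datetime(date(2024, 3, 1)))             # 2024-03-01 00:00:00
print(as_datetime(datetime(2024, 3, 1, 9, 30)))  # 2024-03-01 09:30:00
```

Call it on node.scheduled.start, node.deadline.start, or node.closed.start before comparing against datetime.now().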

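For inline timestamps in the body, use node.get_timestamps(). Its filters (active, inactive, point, range) all default to False, so a call with no arguments returns an empty list; pass the ones you want, for example node.get_timestamps(active=True, inactive=True, point=True, range=True). The results are OrgDate objects. SCHEDULED, DEADLINE, and CLOSED stamps are not part of this list; read them from .scheduled, .deadline, and .closed.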

7. Extracting properties

Org properties live in a :PROPERTIES: drawer under a heading. orgparse exposes them as a dict.

root = orgparse.load('site/research/orgparse-examples/01_attributes.org')

for node in root[1:]:
    if node.properties:
        print(f"{node.heading}:")
        for key, value in node.properties.items():
            print(f"  {key}: {value}")

Expected output (excerpt):

Heading with attributes:
  Effort: 1:20

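Most property values are strings; parse them as needed (e.g., numbers, custom duration fields). One exception worth checking against your installed version: orgparse converts the Effort property from a duration string to a number of minutes, so a drawer entry of 1:20 may print as 80 rather than 1:20.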
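
For duration-style values stored as H:MM strings, the conversion is short. This is a sketch under stated assumptions: duration_to_minutes is a hypothetical helper, it handles only the H:MM form, and it passes through values that are already numeric.

```python
def duration_to_minutes(value):
    """Convert an 'H:MM' string to minutes; pass numbers through unchanged."""
    if isinstance(value, (int, float)):
        return value
    hours, sep, minutes = value.strip().partition(":")
    if not sep:
        raise ValueError(f"expected H:MM, got {value!r}")
    return int(hours) * 60 + int(minutes)


print(duration_to_minutes("1:20"))  # 80
print(duration_to_minutes(45))      # 45
```

Richer Org duration syntax (days, units such as "2h") is out of scope for this helper.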

8. Navigating the tree

The document is a tree. Each node has .parent and .children properties, a .get_parent() method, a .root reference, and .next_same_level / .previous_same_level for sibling navigation.

root = orgparse.load('site/research/orgparse-examples/02_tree_struct.org')

# Find a specific node by heading
def find_heading(root, heading_text):
    for node in root[1:]:
        if node.heading == heading_text:
            return node
    return None

node = find_heading(root, "Heading 2")
if node:
    print(f"Found: {node.heading} (level {node.level})")
    print(f"Parent: {node.parent.heading if node.parent else 'None'}")
    print(f"Children: {[c.heading for c in node.children]}")

root[1:] iterates depth-first over all nodes except the root. This is the primary traversal mechanism.
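
Since every node keeps a reference to its parent, a breadcrumb-style path for any heading can be rebuilt by walking upward until the root (level 0) is reached. A minimal sketch: heading_path is a hypothetical helper, and the SimpleNamespace objects stand in for orgparse nodes so the snippet runs on its own.

```python
from types import SimpleNamespace


def heading_path(node, sep=" / "):
    """Join the headings from the top-level ancestor down to node."""
    parts = []
    while node is not None and node.level > 0:
        parts.append(node.heading)
        node = node.parent
    return sep.join(reversed(parts))


# Hypothetical stand-in tree: root -> Heading 1 -> Heading 2
demo_root = SimpleNamespace(level=0, heading="", parent=None)
top = SimpleNamespace(level=1, heading="Heading 1", parent=demo_root)
sub = SimpleNamespace(level=2, heading="Heading 2", parent=top)

print(heading_path(sub))  # Heading 1 / Heading 2
```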

For breadth-first or level-filtered traversal, write the traversal yourself. Filtering by level, for example, is a single comprehension over root[1:]:

def nodes_at_level(root, target_level):
    return [n for n in root[1:] if n.level == target_level]

root = orgparse.load('site/research/orgparse-examples/02_tree_struct.org')
level2 = nodes_at_level(root, 2)
print(f"Level 2 headings: {[n.heading for n in level2]}")
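
Breadth-first order needs a queue rather than a filter. The sketch below uses only the .children attribute; breadth_first is a hypothetical helper, and the SimpleNamespace tree is a stand-in for a parsed document.

```python
from collections import deque
from types import SimpleNamespace


def breadth_first(root):
    """Yield descendants of root level by level, using only .children."""
    queue = deque(root.children)
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


# Hypothetical stand-in tree: root -> (A -> A.1), B
leaf = SimpleNamespace(heading="A.1", children=[])
a = SimpleNamespace(heading="A", children=[leaf])
b = SimpleNamespace(heading="B", children=[])
demo_root = SimpleNamespace(heading="", children=[a, b])

print([n.heading for n in breadth_first(demo_root)])  # ['A', 'B', 'A.1']
```

Depth-first iteration over the same tree would visit A, A.1, B instead.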

9. Querying TODO items

A common use case: extract all TODO items with deadlines approaching.

import orgparse
from datetime import date, datetime, timedelta

root = orgparse.load('site/research/orgparse-examples/03_repeated_tasks.org')

today = date.today()
week_ahead = today + timedelta(days=7)

for node in root[1:]:
    # Only open TODO items
    if not node.todo or node.todo == "DONE":
        continue

    # Skip nodes without a deadline
    if not node.deadline or not node.deadline.start:
        continue

    # .start may be a date or a datetime; reduce both to a date
    deadline = node.deadline.start
    if isinstance(deadline, datetime):
        deadline = deadline.date()

    # Keep deadlines that fall within the next seven days
    if today <= deadline <= week_ahead:
        print(f"TODO: {node.heading}")
        print(f"  State: {node.todo}")
        print(f"  Deadline: {node.deadline.start}")

This pattern generalizes: filter nodes by TODO state, tags, properties, or timestamp predicates.
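
One way to make that generalization concrete is to compose small predicate functions. This is a sketch, not an orgparse feature: select, is_open, and has_tag are hypothetical names, and the SimpleNamespace objects stand in for parsed nodes.

```python
from types import SimpleNamespace


def select(nodes, *predicates):
    """Yield the nodes for which every predicate returns a truthy value."""
    for node in nodes:
        if all(pred(node) for pred in predicates):
            yield node


def is_open(node):
    """True for nodes with a TODO keyword other than DONE."""
    return bool(node.todo) and node.todo != "DONE"


def has_tag(tag):
    """Build a predicate that checks for a single tag."""
    return lambda node: tag in node.tags


# Hypothetical stand-ins for orgparse nodes.
demo_nodes = [
    SimpleNamespace(heading="Write report", todo="TODO", tags={"work"}),
    SimpleNamespace(heading="Buy milk", todo="TODO", tags={"home"}),
    SimpleNamespace(heading="File taxes", todo="DONE", tags={"home"}),
]
open_home = [n.heading for n in select(demo_nodes, is_open, has_tag("home"))]
print(open_home)  # ['Buy milk']
```

With a real file, the call becomes select(root[1:], is_open, has_tag("home")).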

10. Converting to structured data

Export org documents as JSON for downstream processing.

import orgparse
import json

def node_to_dict(node):
    """Convert an org node to a dict."""
    return {
        'heading': node.heading,
        'level': node.level,
        'todo': node.todo,
        'tags': sorted(node.tags),  # sets are not JSON-serializable; sort for stable output
        'properties': node.properties,
        'body': node.body,
        'children': [node_to_dict(c) for c in node.children]
    }

root = orgparse.load('site/research/orgparse-examples/00_simple.org')
doc = {
    'title': root.heading or 'Untitled',
    'children': [node_to_dict(c) for c in root.children]
}

print(json.dumps(doc, indent=2)[:500])  # First 500 chars

This exports the full tree. For flat output (e.g., CSV of TODO items), iterate root[1:] and collect fields directly.
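
A flat export might look like the following sketch. The column choice and the todos_to_csv name are assumptions made for illustration, and the SimpleNamespace objects stand in for orgparse nodes; the function writes to an in-memory buffer so it can be inspected or saved afterwards.

```python
import csv
import io
from types import SimpleNamespace


def todos_to_csv(nodes):
    """Return CSV text with one row per node that has a TODO keyword."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["level", "todo", "heading", "tags"])
    for node in nodes:
        if node.todo:
            tags = " ".join(sorted(node.tags))
            writer.writerow([node.level, node.todo, node.heading, tags])
    return buffer.getvalue()


# Hypothetical stand-ins for orgparse nodes.
demo_nodes = [
    SimpleNamespace(level=1, todo="TODO", heading="Write report", tags={"work", "q3"}),
    SimpleNamespace(level=2, todo=None, heading="Notes", tags=set()),
]
csv_text = todos_to_csv(demo_nodes)
print(csv_text)
```

The csv module handles quoting, so headings that contain commas stay in a single column.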

11. Practical use cases

11.1. Automating weekly reviews

Extract completed tasks from the past week:

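import orgparse
from datetime import datetime, time, timedelta

root = orgparse.load('weekly-tasks.org')
week_ago = datetime.now() - timedelta(days=7)

completed = []
for node in root[1:]:
    if node.todo == "DONE" and node.closed:
        closed = node.closed.start
        # A CLOSED stamp without a time of day is a plain date; promote it
        # to a datetime so it can be compared with week_ago
        if not isinstance(closed, datetime):
            closed = datetime.combine(closed, time.min)
        if closed >= week_ago:
            completed.append(node.heading)

print(f"Completed this week: {len(completed)}")
for task in completed:
    print(f"  - {task}")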

11.2. Generating issue trackers from org files

Convert org TODO items to GitHub issues:

import orgparse
import subprocess

DRY_RUN = True  # set to False to actually invoke the gh CLI

root = orgparse.load('project-tasks.org')

for node in root[1:]:
    if node.todo == "TODO":
        title = node.heading
        body = node.body or "(no description)"

        # Build the gh invocation, adding one --label flag per tag
        cmd = ['gh', 'issue', 'create', '--title', title, '--body', body]
        for tag in sorted(node.tags):
            cmd += ['--label', tag]

        if DRY_RUN:
            print(f"Would create issue: {title}")
        else:
            subprocess.run(cmd, check=True)

11.3. Publishing org to static sites

Extract frontmatter and convert body to Markdown:

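import orgparse

root = orgparse.load('post.org')

# Extract document-level metadata, assuming it is stored as file keywords
# ("#+TITLE:", "#+DATE:"). get_file_property() returns None when the
# keyword is absent. If the values live in a property drawer instead,
# read them with get_property().
title = root.get_file_property('TITLE') or 'Untitled'
date = root.get_file_property('DATE') or ''

print("---")
print(f"title: {title}")
print(f"date: {date}")
print("---")
print()
print("(body conversion would go here)")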

For full Markdown conversion, use pandoc or implement org-to-markdown manually. orgparse handles the parsing; conversion is a separate concern.
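
If you do hand-roll the conversion, the heading structure is the mechanical part: the node level maps directly to the number of # characters. The sketch below covers only that step and passes body text through unchanged, so inline Org markup (links, emphasis, source blocks) is left as-is. node_to_markdown and render_markdown are hypothetical helpers, and the SimpleNamespace objects stand in for orgparse nodes.

```python
from types import SimpleNamespace


def node_to_markdown(node):
    """Render one heading and its body; body markup is not converted."""
    lines = ["#" * node.level + " " + node.heading]
    body = node.body.strip()
    if body:
        lines += ["", body]
    return "\n".join(lines)


def render_markdown(nodes):
    """Join rendered nodes with blank lines, in the order given."""
    return "\n\n".join(node_to_markdown(n) for n in nodes)


# Hypothetical stand-ins for orgparse nodes in document order.
demo_nodes = [
    SimpleNamespace(level=1, heading="Intro", body="Hello.\n"),
    SimpleNamespace(level=2, heading="Details", body=""),
]
markdown = render_markdown(demo_nodes)
print(markdown)
```

With a parsed file, render_markdown(root[1:]) would emit the headings in document order.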

12. Limitations

orgparse is read-only. It does not write org files or preserve formatting. For round-trip editing, use Emacs or a dedicated org writer library.

The library parses structure, not evaluation. Babel source blocks are accessible as strings, but orgparse does not execute them or expand macros.

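Tag inheritance follows the heading hierarchy only. node.tags includes tags inherited from ancestor headings and node.shallow_tags gives the per-node tags, but Emacs variables that tune inheritance, such as org-use-tag-inheritance, live in your Emacs configuration and are not visible to orgparse.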

13. See also