Parsing Org-mode Files with Python
1. Overview
Org-mode files are plain text, but their semantics are not. A heading carries TODO state, priority, tags, scheduled/deadline timestamps, properties, and a position in the document tree. Parsing this structure manually is error-prone. The orgparse library exposes org documents as a traversable Python object model.
This note demonstrates orgparse's API through working examples: installation, reading files, extracting metadata, navigating the tree, and querying TODO items. The examples use test data from the orgparse suite, available in orgparse-examples/.
2. Installation
orgparse is a pure-Python library with no dependencies.
```shell
pip install orgparse
```
Verify the install:
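```python
import orgparse  # raises ImportError if the install failed
from importlib.metadata import version  # stdlib, Python 3.8+

# Read the version from package metadata; orgparse.__version__
# is not guaranteed to exist in every release
print(version("orgparse"))
```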
The library works on Python 3.7+. No Emacs installation is required.
3. Loading a document
orgparse.load() parses an org file and returns a root node.
```python
import orgparse

# Load from a file path
root = orgparse.load('site/research/orgparse-examples/00_simple.org')

# The root node represents the entire document
print(f"Root level: {root.level}")  # 0
print(f"Children: {len(root.children)}")
```
Each node in the tree has:
- .level: heading depth (0 for the root, 1 for *, 2 for **, and so on)
- .heading: heading text, without the TODO keyword, priority, or tags
- .children: list of child nodes
- .parent: reference to the parent node
The root node's level is always 0. Top-level headings are its children.
4. Accessing heading metadata
Headings carry TODO state, priority, and tags. These attributes are accessible as node properties.
```python
import orgparse

root = orgparse.load('site/research/orgparse-examples/00_simple.org')

# Traverse all nodes depth-first, skipping the root
for node in root[1:]:
    print(f"{'*' * node.level} {node.heading}")
    if node.todo:
        print(f"  TODO: {node.todo}")
    if node.priority:
        print(f"  Priority: {node.priority}")
    if node.tags:
        print(f"  Tags: {node.tags}")
```
Expected output (excerpt):
```
* Heading 0
  TODO: TODO1
  Tags: {'TAG1'}
** Heading 1
  TODO: TODO2
  Tags: {'TAG1', 'TAG2'}
```
Tags are a set. node.tags includes tags inherited from ancestor headings,
which is why Heading 1 lists TAG1 as well as TAG2. To get only the tags
written on a heading itself, use node.shallow_tags.
6. Working with timestamps
Org timestamps appear in scheduling metadata (SCHEDULED, DEADLINE,
CLOSED) and inline in body text. orgparse exposes scheduling timestamps
as datetime objects.
```python
import orgparse

root = orgparse.load('site/research/orgparse-examples/01_attributes.org')

for node in root[1:]:
    # scheduled/deadline/closed are date objects that are falsy when absent
    if not (node.scheduled or node.deadline or node.closed):
        continue
    print(f"{node.heading}:")
    if node.scheduled:
        print(f"  Scheduled: {node.scheduled.start}")
    if node.deadline:
        print(f"  Deadline: {node.deadline.start}")
    if node.closed:
        print(f"  Closed: {node.closed.start}")
```
The .start attribute returns a Python datetime.datetime or datetime.date
object depending on whether the timestamp includes time-of-day.
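Because .start can be either type, comparisons need care: Python raises TypeError when ordering a datetime.date against a datetime.datetime. One way to normalize, sketched here with a hypothetical helper name, is to promote bare dates to midnight:

```python
from datetime import date, datetime


def as_datetime(value):
    """Return value as a datetime; a bare date becomes midnight (hypothetical helper)."""
    # datetime is a subclass of date, so check for datetime first
    if isinstance(value, datetime):
        return value
    return datetime.combine(value, datetime.min.time())


print(as_datetime(date(2024, 3, 4)))             # 2024-03-04 00:00:00
print(as_datetime(datetime(2024, 3, 4, 9, 30)))  # 2024-03-04 09:30:00
```

When day granularity is enough, converting the other way with datetime.date() works just as well.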
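For timestamps inside the heading or body text, use node.get_timestamps()
or parse node.body manually. get_timestamps() returns OrgDate objects, but
only those matching its keyword flags (active, inactive, range, point); all
flags default to False, so a bare call returns an empty list. SCHEDULED,
DEADLINE, and CLOSED are not included; read those from the attributes above.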
7. Extracting properties
Org properties live in a :PROPERTIES: drawer under a heading. orgparse
exposes them as a dict.
```python
import orgparse

root = orgparse.load('site/research/orgparse-examples/01_attributes.org')

for node in root[1:]:
    if node.properties:
        print(f"{node.heading}:")
        for key, value in node.properties.items():
            print(f"  {key}: {value}")
```
Expected output (excerpt):
```
Heading with attributes:
  Effort: 1:20
```
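Property values are generally plain strings; parse them as needed (e.g., duration strings, numbers). One exception worth checking against your installed version: orgparse converts the Effort property to a number of minutes, in which case the value above prints as 80 rather than 1:20.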
8. Navigating the tree
The document is a tree. Each node has .parent, .children, and
.get_parent() / .get_children() accessors.
```python
import orgparse

root = orgparse.load('site/research/orgparse-examples/02_tree_struct.org')


# Find the first node whose heading matches exactly
def find_heading(root, heading_text):
    for node in root[1:]:
        if node.heading == heading_text:
            return node
    return None


node = find_heading(root, "Heading 2")
if node:
    print(f"Found: {node.heading} (level {node.level})")
    print(f"Parent: {node.parent.heading if node.parent else 'None'}")
    print(f"Children: {[c.heading for c in node.children]}")
```
root[1:] iterates depth-first over all nodes except the root. This is
the primary traversal mechanism.
For breadth-first or level-filtered traversal, write a small helper. Filtering by level only needs a comprehension over root[1:]:
```python
import orgparse


def nodes_at_level(root, target_level):
    return [n for n in root[1:] if n.level == target_level]


root = orgparse.load('site/research/orgparse-examples/02_tree_struct.org')
level2 = nodes_at_level(root, 2)
print(f"Level 2 headings: {[n.heading for n in level2]}")
```
9. Querying TODO items
A common use case: extract all TODO items with deadlines approaching.
```python
import orgparse
from datetime import date, datetime, timedelta

root = orgparse.load('site/research/orgparse-examples/03_repeated_tasks.org')
today = date.today()
week_ahead = today + timedelta(days=7)

for node in root[1:]:
    # Only open TODO items
    if not node.todo or node.todo == "DONE":
        continue
    # Only items that carry a deadline
    if not node.deadline:
        continue
    # .start is a date or a datetime; compare at day granularity
    deadline = node.deadline.start
    if isinstance(deadline, datetime):
        deadline = deadline.date()
    if today <= deadline <= week_ahead:
        print(f"TODO: {node.heading}")
        print(f"  State: {node.todo}")
        print(f"  Deadline: {node.deadline.start}")
```
This pattern generalizes: filter nodes by TODO state, tags, properties, or timestamp predicates.
10. Converting to structured data
Export org documents as JSON for downstream processing.
```python
import json

import orgparse


def node_to_dict(node):
    """Convert an org node (and its subtree) to a JSON-serializable dict."""
    return {
        'heading': node.heading,
        'level': node.level,
        'todo': node.todo,
        'tags': sorted(node.tags),  # sets are not JSON-serializable
        'properties': node.properties,
        'body': node.body,
        'children': [node_to_dict(c) for c in node.children],
    }


root = orgparse.load('site/research/orgparse-examples/00_simple.org')
doc = {
    'title': root.heading or 'Untitled',
    'children': [node_to_dict(c) for c in root.children],
}
print(json.dumps(doc, indent=2)[:500])  # first 500 characters
```
This exports the full tree. For flat output (e.g., CSV of TODO items),
iterate root[1:] and collect fields directly.
11. Practical use cases
11.1. Automating weekly reviews
Extract completed tasks from the past week:
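```python
import orgparse
from datetime import datetime, timedelta

root = orgparse.load('weekly-tasks.org')
week_ago = datetime.now() - timedelta(days=7)

completed = []
for node in root[1:]:
    if node.todo == "DONE" and node.closed:
        closed = node.closed.start
        # CLOSED usually carries a time of day; promote a bare date to midnight
        if not isinstance(closed, datetime):
            closed = datetime.combine(closed, datetime.min.time())
        if closed >= week_ago:
            completed.append(node.heading)

print(f"Completed this week: {len(completed)}")
for task in completed:
    print(f"  - {task}")
```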
11.2. Generating issue trackers from org files
Convert org TODO items to GitHub issues:
```python
import orgparse

root = orgparse.load('project-tasks.org')

for node in root[1:]:
    if node.todo == "TODO":
        title = node.heading
        body = node.body or "(no description)"
        labels = ",".join(sorted(node.tags))
        # Dry run. To create issues for real, call the GitHub CLI, e.g.:
        #   subprocess.run(['gh', 'issue', 'create',
        #                   '-t', title, '-b', body, '-l', labels], check=True)
        print(f"Would create issue: {title}")
```
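Building the gh command as an argument list, rather than a shell string, sidesteps quoting problems with headings and bodies that contain spaces or quotes. A minimal sketch with a hypothetical helper that only builds the list; it omits --label when a heading has no tags, and gh may still reject labels that do not already exist in the repository:

```python
import shlex


def gh_issue_command(title, body, tags):
    """Build (but do not run) a `gh issue create` argument list (hypothetical helper)."""
    cmd = ["gh", "issue", "create", "--title", title, "--body", body]
    if tags:
        cmd += ["--label", ",".join(sorted(tags))]
    return cmd


cmd = gh_issue_command("Fix login bug", "Users see a 500 error.", {"bug", "web"})
# Show the command as it would look in a shell; pass cmd itself to subprocess.run
print(shlex.join(cmd))
```

To execute it, pass the list straight to subprocess.run(cmd, check=True).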
11.3. Publishing org to static sites
Extract frontmatter and convert body to Markdown:
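```python
import orgparse

root = orgparse.load('post.org')

# #+TITLE: and #+DATE: are file-level keywords. Recent orgparse releases read
# them with get_file_property(); get_property() only reads :PROPERTIES: drawers.
title = root.get_file_property('TITLE') or 'Untitled'
date = root.get_file_property('DATE') or ''

print("---")
print(f"title: {title}")
print(f"date: {date}")
print("---")
print()
print("(body conversion would go here)")
```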
For full Markdown conversion, use pandoc or implement org-to-markdown
manually. orgparse handles the parsing; conversion is a separate concern.
12. Limitations
orgparse is read-only. It does not write org files or preserve formatting. For round-trip editing, use Emacs or a dedicated org writer library.
The library parses structure; it does not evaluate anything. Babel source blocks come through as plain text in node.body, and orgparse neither executes them nor expands macros.
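Since source blocks are just text in the body, pulling them out is a string-processing job. A rough regex sketch (hypothetical helper; it keeps only the language from the header line and does not handle nested or malformed blocks):

```python
import re

# Matches #+begin_src <lang> ... #+end_src, case-insensitively, across lines
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]*(\S*)[^\n]*\n(.*?)^[ \t]*#\+end_src",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def src_blocks(body):
    """Return (language, code) pairs for source blocks in a body string."""
    return [(m.group(1), m.group(2)) for m in SRC_BLOCK.finditer(body)]


body = (
    "Intro text.\n"
    "#+begin_src python :results output\nprint(1)\n#+end_src\n"
    "More.\n"
    "#+BEGIN_SRC sh\necho hi\n#+END_SRC\n"
)
print(src_blocks(body))
```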
Tag inheritance follows the heading tree only: node.tags merges in tags from ancestor headings and node.shallow_tags gives a heading's own tags, but Emacs settings that customize inheritance are not consulted.
13. See also
- Annotated examples — test data from the orgparse suite
- orgparse repository — source and documentation
- Org syntax specification — the grammar orgparse implements