ExpertRefresh

SQL / Parent id  

1) What pattern uses parent_id?




2) What's the main problem when using parent_id?




3) When is legitimate to use parent_id?







Context

A modern website, where readers can contribute comments and even reply to each other.

Antipattern

You choose a simple solution to track these reply: each comment references the comment to which it replies. It soon becomes clear, however, that it's hard to retrieve a long chain of replies in a single SQL query. The threads can have an unlimited depth. CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, parent_id BIGINT UNSIGNED, bug_id BIGINT UNSIGNED NOT NULL, author BIGINT UNSIGNED NOT NULL, comment_date DATETIME NOT NULL, comment TEXT NOT NULL, FOREIGN KEY (parent_id) REFERENCES Comments(comment_id), FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (author) REFERENCES Accounts(account_id) ); This design is called Adjacent List. It’s probably the most common design software developers use to store hierarchical data.

Problems

Adjacency List can be an antipattern when it’s the default choice of so many developers yet it fails to be a solution for one of the most common tasks you need to do with a tree: query all descendants. You can retrieve a comment and its immediate children using a relatively simple query: SELECT c1.*, c2.* FROM Comments c1 LEFT OUTER JOIN Comments c2 ON c2.parent_id = c1.comment_id; The following query retrieves a tree of depth up to four but cannot retrieve the tree beyond that depth: SELECT c1.*, c2.*, c3.*, c4.* FROM Comments c1 -- 1st level LEFT OUTER JOIN Comments c2 ON c2.parent_id = c1.comment_id -- 2nd level LEFT OUTER JOIN Comments c3 ON c3.parent_id = c2.comment_id -- 3rd level LEFT OUTER JOIN Comments c4 ON c4.parent_id = c3.comment_id; -- 4th level This makes it hard to compute an aggregate such as COUNT( ).

How to Recognize the Antipattern

How many levels do we need to support in trees?” “I dread ever having to touch the code that manages the tree data structures.” “I need to run a script periodically to clean up the orphaned rows in the trees.”

Legitimate Uses of the Antipattern

An inventory application never needed to create a grouping of equipment with a tree deeper than a single parent-child relationship. Don’t Over-Engineer.

Solution: Use Alternative Tree Models

Path Enumeration

A UNIX path like /usr/local/lib/ is a Path Enumeration of the filesystem. CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, path VARCHAR(1000), bug_id BIGINT UNSIGNED NOT NULL, author BIGINT UNSIGNED NOT NULL, comment_date DATETIME NOT NULL, comment TEXT NOT NULL, FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (author) REFERENCES Accounts(account_id) ); For example, to find ancestors of comment #7, whose path is 1/4/6/7/, do this: SELECT * FROM Comments AS c WHERE '1/4/6/7/' LIKE c.path || '%';

Closure Table

Create another table TreePaths, with two columns, each of which is a foreign key to the Comments table. CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, bug_id BIGINT UNSIGNED NOT NULL, author BIGINT UNSIGNED NOT NULL, comment_date DATETIME NOT NULL, comment TEXT NOT NULL, FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (author) REFERENCES Accounts(account_id) ); CREATE TABLE TreePaths ( ancestor BIGINT UNSIGNED NOT NULL, descendant BIGINT UNSIGNED NOT NULL, PRIMARY KEY(ancestor, descendant), FOREIGN KEY (ancestor) REFERENCES Comments(comment_id), FOREIGN KEY (descendant) REFERENCES Comments(comment_id) ); Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. Also add a row for each node to reference itself.

Queries

To retrieve descendants of comment #4, match rows in TreePaths where the ancestor is 4: SELECT c.* FROM Comments AS c JOIN TreePaths AS t ON c.comment_id = t.descendant WHERE t.ancestor = 4; To retrieve ancestors of comment #6, match rows in TreePaths where the descendant is 6: SELECT c.* FROM Comments AS c JOIN TreePaths AS t ON c.comment_id = t.ancestor WHERE t.descendant = 6;

Insert

To insert a new leaf node, for instance a new child of comment #5, first insert the self-referencing row. INSERT INTO TreePaths (ancestor, descendant) SELECT t.ancestor, 8 FROM TreePaths AS t WHERE t.descendant = 5 UNION ALL SELECT 8, 8;

Delete

To delete a complete subtree, for instance comment #4 and its descendants: DELETE FROM TreePaths WHERE descendant IN (SELECT descendant FROM TreePaths WHERE ancestor = 4);


References