Memory Refresh
SQLAntipatterns



Parent ID tree





Context: Readers can add comments and even reply to each other

You choose a simple solution to track these reply: parent_id. This design is called Adjacent List. It soon becomes clear, however, that it's hard to retrieve a long chain of replies in a single SQL query. CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, parent_id BIGINT UNSIGNED, bug_id BIGINT UNSIGNED NOT NULL, author BIGINT UNSIGNED NOT NULL, comment_date DATETIME NOT NULL, comment TEXT NOT NULL, FOREIGN KEY (parent_id) REFERENCES Comments(comment_id), FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (author) REFERENCES Accounts(account_id) );

Antipattern: Can't get all descendants

Adjacency List can be an antipattern when it’s the default choice of so many developers yet it fails to be a solution for one of the most common tasks you need to do with a tree: query all descendants. You can retrieve a comment and its immediate children using a relatively simple query: SELECT c1.*, c2.* FROM Comments c1 LEFT OUTER JOIN Comments c2 ON c2.parent_id = c1.comment_id; The following query retrieves a tree of depth up to four but cannot retrieve the tree beyond that depth: SELECT c1.*, c2.*, c3.*, c4.* FROM Comments c1 -- 1st level LEFT OUTER JOIN Comments c2 ON c2.parent_id = c1.comment_id -- 2nd level LEFT OUTER JOIN Comments c3 ON c3.parent_id = c2.comment_id -- 3rd level LEFT OUTER JOIN Comments c4 ON c4.parent_id = c3.comment_id; -- 4th level This makes it hard to compute an aggregate such as COUNT( ). An inventory application never needed to create a grouping of equipment with a tree deeper than a single parent-child relationship. Don’t Over-Engineer.

Solution: Path Enumeration

A UNIX path like /usr/local/lib/ is a Path Enumeration of the filesystem. CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, path VARCHAR(1000), bug_id BIGINT UNSIGNED NOT NULL, author BIGINT UNSIGNED NOT NULL, comment_date DATETIME NOT NULL, comment TEXT NOT NULL, FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (author) REFERENCES Accounts(account_id) ); For example, to find ancestors of comment #7, whose path is 1/4/6/7/, do this: SELECT * FROM Comments AS c WHERE '1/4/6/7/' LIKE c.path || '%';

Solution 2: Closure Table

Create another table TreePaths, with two columns, each of which is a foreign key to the Comments table. CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, bug_id BIGINT UNSIGNED NOT NULL, author BIGINT UNSIGNED NOT NULL, comment_date DATETIME NOT NULL, comment TEXT NOT NULL, FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (author) REFERENCES Accounts(account_id) ); CREATE TABLE TreePaths ( ancestor BIGINT UNSIGNED NOT NULL, descendant BIGINT UNSIGNED NOT NULL, PRIMARY KEY(ancestor, descendant), FOREIGN KEY (ancestor) REFERENCES Comments(comment_id), FOREIGN KEY (descendant) REFERENCES Comments(comment_id) ); Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. Also add a row for each node to reference itself. To retrieve descendants of comment #4, match rows in TreePaths where the ancestor is 4: SELECT c.* FROM Comments AS c JOIN TreePaths AS t ON c.comment_id = t.descendant WHERE t.ancestor = 4; To retrieve ancestors of comment #6, match rows in TreePaths where the descendant is 6: SELECT c.* FROM Comments AS c JOIN TreePaths AS t ON c.comment_id = t.ancestor WHERE t.descendant = 6; To insert a new leaf node, for instance a new child of comment #5, first insert the self-referencing row. INSERT INTO TreePaths (ancestor, descendant) SELECT t.ancestor, 8 FROM TreePaths AS t WHERE t.descendant = 5 UNION ALL SELECT 8, 8; To delete a complete subtree, for instance comment #4 and its descendants: DELETE FROM TreePaths WHERE descendant IN (SELECT descendant FROM TreePaths WHERE ancestor = 4);