Comments
Context
Readers can add comments and even reply to each other. You choose a simple solution to track these reply: parent_id. This design is called Adjacent List. But it's hard to retrieve a long chain of replies in a single SQL query.
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
parent_id BIGINT UNSIGNED,
bug_id BIGINT UNSIGNED NOT NULL,
author BIGINT UNSIGNED NOT NULL,
comment_date DATETIME NOT NULL,
comment TEXT NOT NULL,
FOREIGN KEY (parent_id) REFERENCES Comments(comment_id),
FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id),
FOREIGN KEY (author) REFERENCES Accounts(account_id)
);
Antipattern
In Adjacent List design you cannot get all descendants. It can't be an antipattern when it’s the default choice of so many developers. Yet it fails to be a solution for one of the most common tasks. You cannot query all descendants. You can retrieve a comment and its immediate children using.
SELECT c1.*, c2.*
FROM Comments c1 LEFT OUTER JOIN Comments c2
ON c2.parent_id = c1.comment_id;
The following query retrieves a tree of depth up to four.
This makes it hard to compute an aggregate such as COUNT( ).
SELECT c1.*, c2.*, c3.*, c4.*
FROM Comments c1 -- 1st level
LEFT OUTER JOIN Comments c2
ON c2.parent_id = c1.comment_id -- 2nd level
LEFT OUTER JOIN Comments c3
ON c3.parent_id = c2.comment_id -- 3rd level
LEFT OUTER JOIN Comments c4
ON c4.parent_id = c3.comment_id; -- 4th level
Don’t over-engineer
For example, an inventory application. It never needed to create a grouping of equipment with a tree deeper than one.Adjacent List
Solution 1
A UNIX path like /usr/local/lib/ is a Path Enumeration of the filesystem.
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
path VARCHAR(1000),
bug_id BIGINT UNSIGNED NOT NULL,
author BIGINT UNSIGNED NOT NULL,
comment_date DATETIME NOT NULL,
comment TEXT NOT NULL,
FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id),
FOREIGN KEY (author) REFERENCES Accounts(account_id)
);
For example, to find ancestors of comment #7, whose path is 1/4/6/7/, do this:
SELECT *
FROM Comments AS c
WHERE '1/4/6/7/' LIKE c.path || '%';
Closure Table
Solution 2
Table TreePaths, with two columns, each of which is a foreign key to the Comments.
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
bug_id BIGINT UNSIGNED NOT NULL,
author BIGINT UNSIGNED NOT NULL,
comment_date DATETIME NOT NULL,
comment TEXT NOT NULL,
FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id),
FOREIGN KEY (author) REFERENCES Accounts(account_id)
);
CREATE TABLE TreePaths (
ancestor BIGINT UNSIGNED NOT NULL,
descendant BIGINT UNSIGNED NOT NULL,
PRIMARY KEY(ancestor, descendant),
FOREIGN KEY (ancestor) REFERENCES Comments(comment_id),
FOREIGN KEY (descendant) REFERENCES Comments(comment_id)
);
Store one row in this table for each pair of nodes.
Also add a row for each node to reference itself.
To retrieve descendants of comment #4, match rows in TreePaths.
SELECT c.*
FROM Comments AS c
JOIN TreePaths AS t ON c.comment_id = t.descendant
WHERE t.ancestor = 4;
To insert a new leaf node, first insert the self-referencing row.
INSERT INTO TreePaths (ancestor, descendant)
SELECT t.ancestor, 8
FROM TreePaths AS t
WHERE t.descendant = 5
UNION ALL
SELECT 8, 8;
To delete a complete subtree.
DELETE FROM TreePaths
WHERE descendant IN (SELECT descendant
FROM TreePaths
WHERE ancestor = 4);
Questions and answers:
Clink on Option to Answer
1. What antipattern uses parent_id?
- a) Adjacent List
- b) Path Enumeration
2. What's the main problem when using parent_id?
- a) You can't retrieve all descendants
- b) It's hard to retrieve its immediate children
3. Closure table solution
- a) uses another table with ancestor / descendant
- b) uses ancestor within the current table
4. In Closure table design
- a) you need to add a row for each node to reference itself
- b) you need to add a row for each ancestor and descendant