ContextIn his content management database, he stored articles for publishing on a website. He used an intersection table for a many-to-many association between a table of articles and a table of tags.
CREATE TABLE ArticleTags ( id SERIAL PRIMARY KEY, article_id BIGINT UNSIGNED NOT NULL, tag_id BIGINT UNSIGNED NOT NULL, FOREIGN KEY (article_id) REFERENCES Articles (id), FOREIGN KEY (tag_id) REFERENCES Tags (id) );He was getting incorrect results from queries when counting the number of articles with a given tag. He knew that there were only five articles with the 'economy' tag, but the query was telling him there were seven.
AntipatternThis table had a primary key, but that primary key was't preventing duplicates in the columns that mattered. One remedy might be to create a UNIQUE constraint over the other two columns, but given that, why is the id column needed at all? The objective is to make sure every table has a primary key, but confusion about the nature of a primary key has resulted in an antipattern. Everyone who has been introduced to database design knows that a primary key is an important, even mandatory, part of a table. A primary key is guaranteed to be unique over all rows in the table. The tricky part is choosing a column to serve as the primary key.
ConventionBooks, articles, and programming frameworks have established a cultural convention that every database table must have a primary key column with the following characteristics: - The primary key's column name is ID - Its data type is a 32-bit or 64-bit integer - Unique values are generated automatically The presence of a column named id in every table is so common that this has become synonymous with a primary key. You might see an id column defined as the primary key simply for the sake of tradition, even when another column in the same table could be used as the natural primary key.
CREATE TABLE Bugs ( id SERIAL PRIMARY KEY, bug_id VARCHAR(10) UNIQUE, description VARCHAR(1000) );The bug_id column in the previous example has similar usage to the id, in that it serves to identify each row uniquely.
MeaningThe name id is so generic that it holds no meaning. This is especially important when you join two tables and they have the same primary key column name.
SELECT b.id, a.id FROM Bugs b JOIN Accounts a ON (b.assigned_to = a.id) WHERE b.status = 'OPEN';This is a problem especially in dynamic languages like PHP, when a query result is an associative array: one column overwrites the other unless you specify column aliases in your query.
Legitimate Uses of the AntipatternSome object-relational frameworks simplify development by assuming convention over configuration. They expect every table to define its primary key in the same way: as an integer pseudokey column named id. If you use such a framework, you may want to conform to its conventions, because this gives you access to other desirable features of the framework.
SolutionTo prevent duplicates, you could declare a UNIQUE constraint over the two columns besides id:
CREATE TABLE BugsProducts ( id SERIAL PRIMARY KEY, bug_id BIGINT UNSIGNED NOT NULL, product_id BIGINT UNSIGNED NOT NULL, UNIQUE KEY (bug_id, product_id), FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (product_id) REFERENCES Products(product_id) );But if you need a unique constraint over those two columns anyway, the id column is superfluous. A primary key is a constraint, not a data type. You can declare a primary key on any column or set of columns, as long as the data types support indexing. You should also be able to define a column as an auto-incrementing integer without making it the primary key of the table. Choose sensible names for your primary key. For example, the primary key of the Bugs table should be bug_id.