Memory Refresh
SQLAntipatterns



Mandatory ID





Context: Article tags with incorrect agregate results

CREATE TABLE ArticleTags ( id SERIAL PRIMARY KEY, article_id BIGINT UNSIGNED NOT NULL, tag_id BIGINT UNSIGNED NOT NULL, FOREIGN KEY (article_id) REFERENCES Articles (id), FOREIGN KEY (tag_id) REFERENCES Tags (id) ); This table had a primary key, but that primary key was't preventing duplicates. The confusion about the nature of a primary key has resulted in an antipattern. The tricky part is choosing a column to serve as the primary key.

Antipattern: Db tables with id primary key, but not on key that matters

Books, articles, and programming frameworks have established a cultural convention: - The primary key's column name is ID - Its data type is a 32-bit or 64-bit integer - Unique values are generated automatically The presence of a column named id in every table is so common that this has become synonymous with a primary key. You might even see an id column defined as the primary key simply for the sake of tradition. The bug_id column has similar usage to the id, in that it serves to identify each row uniquely. CREATE TABLE Bugs ( id SERIAL PRIMARY KEY, bug_id VARCHAR(10) UNIQUE, description VARCHAR(1000) ); The name id is so generic that it holds no meaning. This is especially important when you join two tables and they have the same primary key column name. SELECT b.id, a.id FROM Bugs b JOIN Accounts a ON (b.assigned_to = a.id) WHERE b.status = 'OPEN'; This is a problem especially in dynamic languages like PHP, when a query result is an associative array: one column overwrites the other unless you specify column aliases in your query. -- Legitimate Uses of the Antipattern Some object-relational frameworks simplify development by assuming convention over configuration. They expect every table to define its primary key in the same way: as an integer pseudokey column named id. If you use such a framework, you may want to conform to its conventions, because this gives you access to other desirable features of the framework.

Solution: Unique contraints on the key that matters

CREATE TABLE BugsProducts ( id SERIAL PRIMARY KEY, bug_id BIGINT UNSIGNED NOT NULL, product_id BIGINT UNSIGNED NOT NULL, UNIQUE KEY (bug_id, product_id), FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id), FOREIGN KEY (product_id) REFERENCES Products(product_id) ); To prevent duplicates, you could declare a UNIQUE constraint over the two columns besides id: But if you need a unique constraint over those two columns anyway, the id column is superfluous. A primary key is a constraint, not a data type. You can declare a primary key on any column or set of columns, as long as the data types support indexing. You should also be able to define a column as an auto-incrementing integer without making it the primary key of the table. Choose sensible names for your primary key. For example, the primary key of the Bugs table should be bug_id.