ExpertRefresh

Multicolumn attributes

1) In multicolumn attributes antipattern




2) Legitimate use of this antipattern






I can't count the number of times I have created a table to store people's contact information. Phone numbers are a little trickier. People use multiple numbers: a home number, a work number, a fax number, etc. There could be other unforeseen categories. How many columns is enough? Another example is if we want the bugs database to allow tags so we can categorize bugs. Some bugs may be categorized by the software subsystem (printing, reports, email). Other bugs may be categorized by the nature of the defect (crash, performance). A bad color choice could be tagged with cosmetic.

Antipattern: Create Multiple Columns

CREATE TABLE Bugs ( bug_id SERIAL PRIMARY KEY, description VARCHAR(1000), tag1 VARCHAR(20), tag2 VARCHAR(20), tag3 VARCHAR(20) ); As you assign tags to a given bug, you’d put values in one of these three columns. Unused columns remain null.

Problems

You probably don't want the same value to appear in multiple columns, but when you use the Multicolumn Attributes antipattern, the {{database can't prevent this}}. INSERT INTO Bugs (description, tag1, tag2, tag3) VALUES ('printing is slow', 'printing', 'performance', 'performance'); Another weakness of this design is that three columns might not be enough. One tactic is to guess at a moderate number of columns and expand later. However, this change is costly. ALTER TABLE Bugs ADD COLUMN tag4 VARCHAR(20); Database table that already contains data may require locking the entire table. Also, you must revisit every SQL statement in every application that uses this table}. If you {{miss any queries that need edits, it can lead to bugs that are difficult to detect.

Legitimate Uses of the Antipattern

In some cases, an attribute may have a fixed number of choices (reported_by, assign_to, tested_by).

Solution: Create Dependent Table

Best solution is to create a dependent table with one column for the multivalue attribute. Store the multiple values in multiple rows instead of multiple columns. CREATE TABLE Tags ( bug_id BIGINT UNSIGNED NOT NULL tag VARCHAR(20), PRIMARY KEY (bug_id, tag), FOREIGN KEY (bug_id) REFERENCES Bugs(bug_id) ); INSERT INTO Tags (bug_id, tag) VALUES (1234, 'crash'), (3456, 'printing'), (3456, 'performance'); Searching for bugs with a given tag is more straightforward. SELECT * FROM Bugs JOIN Tags USING (bug_id) WHERE tag = 'performance'; Even more complex searches, such as a bug that relates to two specific tags, is easy to read. SELECT * FROM Bugs JOIN Tags AS t1 USING (bug_id) JOIN Tags AS t2 USING (bug_id) WHERE t1.tag = 'printing' AND t2.tag = 'performance'; Simply insert or delete a row from the dependent table. The PRIMARY KEY constraint ensures that no duplication is allowed. You are not limited to three tags per bug.