Expert Refresh

Comma separated

1) Programmers commonly use comma-separated lists as values in database:




2) What's the problem when using comma-separated lists?






Context

Bug-tracking application. Your original design allowed only one user to be the contact for each product. CREATE TABLE Products ( product_id SERIAL PRIMARY KEY, product_name VARCHAR(1000), account_id BIGINT UNSIGNED, -- many-to-one relationship FOREIGN KEY (account_id) REFERENCES Accounts(account_id) ); Next, you were requested to support assigning multiple users. Simple, change the database to store a list of user account identifiers separated by commas. Soon your boss approaches: "They tell me they can add five people only. If they try to add more, they get an error."

Antipattern

Programmers commonly use comma-separated lists to avoid creating an intersection table for a many-to-many relationship. CREATE TABLE Products ( product_id SERIAL PRIMARY KEY, product_name VARCHAR(1000), account_id VARCHAR(100), -- comma-separated list );

Problems

Querying Products for a Specific Account SELECT * FROM Products WHERE account_id REGEXP '[[:<:]]12[[:>:]]'; Pattern-matching expressions may return false matches and can’t benefit from indexes.

How to Recognize the Antipattern

“What is the greatest number of entries this list must support?” “Do you know how to match a word boundary in SQL?” “What character will never appear in any list entry?”

Legitimate Uses of the Antipattern

You might improve performance for some kinds of queries by applying denormalization to your database organization. Be conservative if you decide to employ denormalization. Start by using a normalized database organization.

Solution

Create an Intersection Table. Store acount_id it in a separate table (Contacts). CREATE TABLE Contacts ( product_id BIGINT UNSIGNED NOT NULL, account_id BIGINT UNSIGNED NOT NULL, PRIMARY KEY (product_id, account_id), FOREIGN KEY (product_id) REFERENCES Products(product_id), FOREIGN KEY (account_id) REFERENCES Accounts(account_id) ); When the table has foreign keys referencing two tables, it's called an intersection table. Querying Products by Account SELECT p.* FROM Products AS p JOIN Contacts AS c ON (p.account_id = c.account_id) WHERE c.account_id = 34; Some people resist queries that contain a join, thinking that they perform poorly. However, this query uses indexes much better. Validating Product IDs You can use a foreign key to validate the entries against a set of legitimate values in another table. You can also use SQL data types to restrict entries (to be sure all entries are legal values of that type, not nonsense entries like banana)


References