This is an example of how a database design that was intended to simplify work instead created more work.
CREATE TABLE Customers ( customer_id NUMBER(9) PRIMARY KEY, contact_info VARCHAR(255), business_type VARCHAR(20), revenue NUMBER(9,2) );The Sales division needed to break down the revenue by year so they could track recently active customers. They decided to add a series of new columns, each column's name indicating the year it covered.
ALTER TABLE Customers ADD (revenue2002 NUMBER(9,2)); ALTER TABLE Customers ADD (revenue2003 NUMBER(9,2)); ALTER TABLE Customers ADD (revenue2004 NUMBER(9,2));Each year, they needed to add one more column. A database administrator was responsible for managing Oracle’s tablespaces. So each year, they had to have a series of meetings, schedule a data migration to restructure the tablespace, and add the new column. Ultimately they wasted a lot of time and money. Performance degrades for any database query as the volume of data goes up. Using indexes intelligently helps, but nevertheless the tables grow.
Antipattern: Clone Tables or ColumnsIn the television series Star Trek, tribbles are small furry animals kept as pets. Tribbles are very appealing at first, but soon they reveal their tendency to reproduce out of control, and managing the overpopulation of tribbles becomes a serious problem. We know from experience that querying a table with few rows is quicker than querying a table with many rows, all other things being equal. This leads to a common error that we must make every table contain fewer rows, no matter what.
ProblemsYou discover that some 2010 bugs were entered in the Bugs_2009 by mistake. There’s no way to limit the data relative to the name of its table automatically, but you can declare a CHECK constraint in each of your tables.
CREATE TABLE Bugs_2009 ( -- other columns date_reported DATE CHECK (EXTRACT(YEAR FROM date_reported) = 2009) );If a dependent table like Comments references Bugs, the dependent table cannot declare a foreign key. A foreign key must specify a single table, but in this case the parent table is split into many.
Legitimate Uses of the AntipatternOne good use of manually splitting tables is for archiving. Archiving keeps the data in a compatible table structure for occasional analysis but allows queries against current data to run with greater performance. On database level, splitting the database in multiple databases, sensibly makes database administration tasks easier after the database size passes a certain threshold.
Solution: Partition and NormalizeThere are better ways to improve performance if a table gets too large, instead of splitting the table manually. These include horizontal partitioning, vertical partitioning, and using dependent tables.
Horizontal PartitioningPhysically, the table is split, but you can still execute SQL statements against the table as though it were whole.
CREATE TABLE Bugs ( bug_id SERIAL PRIMARY KEY, -- other columns date_reported DATE ) PARTITION BY HASH ( YEAR(date_reported) ) PARTITIONS 4;Advantages over splitting the table manually are that rows are never placed in the wrong split table, even if the value of date_reported column is updated, and you can run queries against the Bugs table without the need to reference individual split tables. When you have rows spanning more than four years, one of the partitions will be used to store more than one year's worth of data. This will continue as the years go on. Partitioning is not defined in the SQL standard. Nevertheless, some form of partitioning is now supported by every major brand of database.