ExpertRefresh

Sql / Clone tables  

1) Quering a table with few rows




2) When do you have clones antipattern?




3) When is legitim to use this antipattern?






This is an example of how a database design that was intended to simplify work instead created more work. CREATE TABLE Customers ( customer_id NUMBER(9) PRIMARY KEY, contact_info VARCHAR(255), business_type VARCHAR(20), revenue NUMBER(9,2) ); The Sales division needed to break down the revenue by year so they could track recently active customers. They decided to add a series of new columns, each column's name indicating the year it covered. ALTER TABLE Customers ADD (revenue2002 NUMBER(9,2)); ALTER TABLE Customers ADD (revenue2003 NUMBER(9,2)); ALTER TABLE Customers ADD (revenue2004 NUMBER(9,2)); Each year, they needed to add one more column. A database administrator was responsible for managing Oracle’s tablespaces. So each year, they had to have a series of meetings, schedule a data migration to restructure the tablespace, and add the new column. Ultimately they wasted a lot of time and money. Performance degrades for any database query as the volume of data goes up. Using indexes intelligently helps, but nevertheless the tables grow.

Antipattern: Clone Tables or Columns

In the television series Star Trek, tribbles are small furry animals kept as pets. Tribbles are very appealing at first, but soon they reveal their tendency to reproduce out of control, and managing the overpopulation of tribbles becomes a serious problem. We know from experience that querying a table with few rows is quicker than querying a table with many rows, all other things being equal. This leads to a common error that we must make every table contain fewer rows, no matter what.

Problems

You discover that some 2010 bugs were entered in the Bugs_2009 by mistake. There’s no way to limit the data relative to the name of its table automatically, but you can declare a CHECK constraint in each of your tables. CREATE TABLE Bugs_2009 ( -- other columns date_reported DATE CHECK (EXTRACT(YEAR FROM date_reported) = 2009) ); If a dependent table like Comments references Bugs, the dependent table cannot declare a foreign key. A foreign key must specify a single table, but in this case the parent table is split into many.

Legitimate Uses of the Antipattern

One good use of manually splitting tables is for archiving. Archiving keeps the data in a compatible table structure for occasional analysis but allows queries against current data to run with greater performance. On database level, splitting the database in multiple databases, sensibly makes database administration tasks easier after the database size passes a certain threshold.

Solution: Partition and Normalize

There are better ways to improve performance if a table gets too large, instead of splitting the table manually. These include horizontal partitioning, vertical partitioning, and using dependent tables.

Horizontal Partitioning

Physically, the table is split, but you can still execute SQL statements against the table as though it were whole. CREATE TABLE Bugs ( bug_id SERIAL PRIMARY KEY, -- other columns date_reported DATE ) PARTITION BY HASH ( YEAR(date_reported) ) PARTITIONS 4; Advantages over splitting the table manually are that rows are never placed in the wrong split table, even if the value of date_reported column is updated, and you can run queries against the Bugs table without the need to reference individual split tables. When you have rows spanning more than four years, one of the partitions will be used to store more than one year's worth of data. This will continue as the years go on. Partitioning is not defined in the SQL standard. Nevertheless, some form of partitioning is now supported by every major brand of database.

Vertical Partitioning

Whereas horizontal partitioning splits a table by rows, vertical partitioning splits a table by columns. Splitting a table by columns can have advantages when some columns are bulky or seldom needed. BLOB and TEXT columns have variable size, and they may be very large. Many database brands automatically store columns with these data types separately from the other columns of a given row. If you run a query without referencing any BLOB or TEXT columns of a table, you can access the other columns more efficiently. But if you use the column wildcard * in your query, the database retrieves all columns from that table, including any BLOB or TEXT columns. We might store a copy of the installation file for the respective product. This file is typically a self-extracting archive with an extension such as .exe. The files are usually very large, but a BLOB column can store binary data of enormous size. In most queries against that table, you wouldn’t need the installer. This could lead to performance problems if you're in the habit of retrieving all columns using the * wildcard.

Mysql varchar

For example, in MySQL's MyISAM storage engine, querying a table is most efficient when the rows are of fixed size. VARCHAR is a variable-length data type, so the presence of a single column with that data type in a table prevents the table from gaining that advantage. Store all variablelength columns in a separate table, then queries against the primary table can benefit.


References