Database sharding and table partitioning rules

##Sub database and sub table

Taken from: http://rdc.taobao.com/team/jm/archives/590

There are usually several goals:

Data discreteness, data distributed across multiple databases, mods in multiple tables, round robin, dayofweek, dayofMonth
Avoid hot data on a database table
Large data migration during expansion

The following solution is based on whole table migration but does not involve row level data migration within the table

Horizontal scaling out scheme mode one

Stage 1: A library DB0 is divided into 4 tables, with id% 4 dbRule: “DB0” tbRule: “t”+(id% 4)

Phase 2: Add db1 library, relocate the entire table of t2 and t3 to db1 dbRule: “DB”+(id% 4)/2) tbRule: “t”+(id% 4)

Phase 3: Increase the DB2 and DB3 libraries, migrate the entire table from t1 to DB2, and migrate the entire table from t3 to DB3. For rule expression, we swap the names and positions of DB1 and DB2 through internal name mapping or other methods to obtain the following dbRule: “DB”+(id% 4) tbRule: “t”+(id% 4)

If further database design and deployment can achieve one hard drive for each table, then the expansion process only needs to remove a certain hard drive from the original machine and insert it into the new machine to complete the entire table migration! It can greatly shorten downtime. Specifically, on MySQL, the database can be used as a table. Start 4 database instances on a physical machine, multiply the machine each time, and directly migrate the database to the new machine. This way, the rules do not need to be changed from start to finish, always: dbRule: “DB”+(id% 4) tbRule: “t”+(id% 4)

The drawback is that logically, it always maintains 4 libraries and 4 tables, with one library for each table. During the process from one database to four databases, the data volume of a single table has been increasing. When the data volume of a single table exceeds a certain range, it may cause performance issues. For example, issues with indexing and cleaning historical data. In addition, when the number of reserved tables is exhausted and reaches the stage of 1 table per 4 physical libraries, it is inevitable to start expanding from the tables.

Horizontal scaling out scheme mode 2
Phase 1: One database, two table partitioning rules dbRule: “DB0” partitioning rule tbRule: “t”+(ID% 2)

Phase 2: When the data volume of a single database approaches 10 million and the data volume of a single table approaches 5 million, expansion is carried out (the data volume is only an example, and the specific expansion capacity depends on the database and actual pressure conditions): Add a database DB1 and migrate the entire DB0.t1 table to the new database DB1. Add one table to each database, and write the future 10M-20M data mod2 into these two tables: t0_1, t1_1: database rule dbRule: “DB”+(id% 2) table rule tbRule:

if(id < 1千万){
        return "t"+ (id % 2);   //1千万之前的数据，仍然放在t0和t1表。t1表从DB0搬迁到DB1库
    }else if(id < 2千万){
        return "t"+ (id % 2) +"_1"; //1千万之后的数据，各放到两个库的两个表中: t0_1,t1_1
    }else{
        throw new IllegalArgumentException("id outof range[20000000]:" + id);
    }

In this way, the new data after 10M will be evenly distributed in DB0 and DB1; Inserting updates and querying hotspots can still be evenly distributed across each library. Each database contains both old data and constantly growing new data. The data of each table is still controlled below 5 million.

Stage 3: When the capacity of two libraries approaches the upper limit and continues to expand horizontally, perform the following operation: add two new libraries: DB2 and DB3. Divide the libraries by ID% 4. The remainders 0, 1, 2, and 3 correspond to the subscripts of DB, respectively T0 and t1 remain unchanged, migrate the entire DB0. t0_1 table to DB2; Transfer the entire DB1.t1_1 table to DB3 20M-40M data mod 4 and divide it into four tables: t0_2, t1_2, t2_2, t3_2, and place them in four separate libraries: database partitioning rule dbRule:

if(id < 2千万){
     //2千万之前的数据，4个表分别放到4个库
     if(id < 1千万){
         return "db"+  (id % 2);     //原t0表仍在db0, t1表仍在db1
     }else{
         return "db"+ ((id % 2) +2); //原t0_1表从db0搬迁到db2; t1_1表从db1搬迁到db3
     }
 }else if(id < 4千万){
     return "db"+ (id % 4);          //超过2千万的数据，平均分到4个库
 }else{
     throw new IllegalArgumentException("id out of range. id:"+id);
 }

分表规则tbRule:

   if(id < 2千万){        //2千万之前的数据，表规则和原先完全一样，参见阶段二
    if(id < 1千万){
        return "t"+ (id % 2);       //1千万之前的数据，仍然放在t0和t1表
    }else{
        return "t"+ (id % 2) +"_1"; //1千万之后的数据，仍然放在t0_1和t1_1表
    }
}else if(id < 4千万){
    return "t"+ (id % 4)+"_2";      //超过2千万的数据分为4个表t0_2，t1_2，t2_2，t3_2
}else{
    throw new IllegalArgumentException("id out of range. id:"+id);
}
     dbRule: “DB” + (id % 4) tbRule: “t”  + (id % 4)