Database sharding and table partitioning rules
##Sub database and sub table
Taken from: http://rdc.taobao.com/team/jm/archives/590
There are usually several goals:
- Data discreteness, data distributed across multiple databases, mods in multiple tables, round robin, dayofweek, dayofMonth
- Avoid hot data on a database table
- Large data migration during expansion
The following solution is based on whole table migration but does not involve row level data migration within the table
Horizontal scaling out scheme mode one
Stage 1: A library DB0 is divided into 4 tables, with id% 4 dbRule: “DB0” tbRule: “t”+(id% 4)
Phase 2: Add db1 library, relocate the entire table of t2 and t3 to db1 dbRule: “DB”+(id% 4)/2) tbRule: “t”+(id% 4)
Phase 3: Increase the DB2 and DB3 libraries, migrate the entire table from t1 to DB2, and migrate the entire table from t3 to DB3. For rule expression, we swap the names and positions of DB1 and DB2 through internal name mapping or other methods to obtain the following dbRule: “DB”+(id% 4) tbRule: “t”+(id% 4)
If further database design and deployment can achieve one hard drive for each table, then the expansion process only needs to remove a certain hard drive from the original machine and insert it into the new machine to complete the entire table migration! It can greatly shorten downtime. Specifically, on MySQL, the database can be used as a table. Start 4 database instances on a physical machine, multiply the machine each time, and directly migrate the database to the new machine. This way, the rules do not need to be changed from start to finish, always: dbRule: “DB”+(id% 4) tbRule: “t”+(id% 4)
The drawback is that logically, it always maintains 4 libraries and 4 tables, with one library for each table. During the process from one database to four databases, the data volume of a single table has been increasing. When the data volume of a single table exceeds a certain range, it may cause performance issues. For example, issues with indexing and cleaning historical data. In addition, when the number of reserved tables is exhausted and reaches the stage of 1 table per 4 physical libraries, it is inevitable to start expanding from the tables.
Horizontal scaling out scheme mode 2
Phase 1: One database, two table partitioning rules dbRule: “DB0” partitioning rule tbRule: “t”+(ID% 2)
Phase 2: When the data volume of a single database approaches 10 million and the data volume of a single table approaches 5 million, expansion is carried out (the data volume is only an example, and the specific expansion capacity depends on the database and actual pressure conditions): Add a database DB1 and migrate the entire DB0.t1 table to the new database DB1. Add one table to each database, and write the future 10M-20M data mod2 into these two tables: t0_1, t1_1: database rule dbRule: “DB”+(id% 2) table rule tbRule:
1 | if(id < 1千万){ |
In this way, the new data after 10M will be evenly distributed in DB0 and DB1; Inserting updates and querying hotspots can still be evenly distributed across each library. Each database contains both old data and constantly growing new data. The data of each table is still controlled below 5 million.
Stage 3: When the capacity of two libraries approaches the upper limit and continues to expand horizontally, perform the following operation: add two new libraries: DB2 and DB3. Divide the libraries by ID% 4. The remainders 0, 1, 2, and 3 correspond to the subscripts of DB, respectively T0 and t1 remain unchanged, migrate the entire DB0. t0_1 table to DB2; Transfer the entire DB1.t1_1 table to DB3 20M-40M data mod 4 and divide it into four tables: t0_2, t1_2, t2_2, t3_2, and place them in four separate libraries: database partitioning rule dbRule:
1 | if(id < 2千万){ |
分表规则tbRule:
1 | if(id < 2千万){ //2千万之前的数据,表规则和原先完全一样,参见阶段二 |