前言在發(fā)生故障切換后,,經(jīng)常遇到的問題就是同步報(bào)錯(cuò),,下面是最近收集的報(bào)錯(cuò)信息。
記錄刪除失敗在master上刪除一條記錄,,而slave上找不到 Last_SQL_Error: Could not execute Delete_rows event on table hcy.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000006, end_log_pos 254 解決方法:master要刪除一條記錄,,而slave上找不到報(bào)錯(cuò),這種情況主都已經(jīng)刪除了,,那么從機(jī)可以直接跳過,。 stop slave; set global sql_slave_skip_counter=1; start slave;
如果這種情況很多,需要針對這種錯(cuò)誤專門寫相關(guān)腳本。
主鍵重復(fù)在slave已經(jīng)有該記錄,,又在master上插入了同一條記錄,。 Last_SQL_Error: Could not execute Write_rows event on table hcy.t1; Duplicate entry '2' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000006, end_log_pos 924
解決方法: 在slave上用desc hcy.t1; 先看下表結(jié)構(gòu): mysql> desc hcy.t1; +-------+---------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------+------+-----+---------+-------+ | id | int(11) | NO | PRI | 0 | | | name | char(4) | YES | | NULL | | +-------+---------+------+-----+---------+-------+
刪除重復(fù)的主鍵 mysql> delete from t1 where id=2; Query OK, 1 row affected (0.00 sec)
mysql> start slave; Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G; …… Slave_IO_Running: Yes Slave_SQL_Running: Yes …… mysql> select * from t1 where id=2;
在master上和slave上再分別確認(rèn)一下。
更新丟失在master上更新一條記錄,,而slave上找不到,,丟失了數(shù)據(jù)。 Last_SQL_Error: Could not execute Update_rows event on table hcy.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 794
解決方法: 在master上,,用mysqlbinlog 分析下出錯(cuò)的binlog日志在干什么,。 /usr/local/mysql/bin/mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS mysql-bin.000010 | grep -A '10' 794
#120302 12:08:36 server id 22 end_log_pos 794 Update_rows: table id 33 flags: STMT_END_F ### UPDATE hcy.t1 ### WHERE ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2='bbc' /* STRING(4) meta=65028 nullable=1 is_null=0 */ ### SET ### @1=2 /* INT meta=0 nullable=0 is_null=0 */ ### @2='BTV' /* STRING(4) meta=65028 nullable=1 is_null=0 */ # at 794 #120302 12:08:36 server id 22 end_log_pos 821 Xid = 60 COMMIT/*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
在slave上,查找下更新后的那條記錄,,應(yīng)該是不存在的,。 mysql> select * from t1 where id=2; Empty set (0.00 sec)
然后再到master查看 mysql> select * from t1 where id=2; +----+------+ | id | name | +----+------+ | 2 | BTV | +----+------+ 1 row in set (0.00 sec)
把丟失的數(shù)據(jù)在slave上填補(bǔ),然后跳過報(bào)錯(cuò)即可,。 mysql> insert into t1 values (2,'BTV'); Query OK, 1 row affected (0.00 sec)
mysql> select * from t1 where id=2; +----+------+ | id | name | +----+------+ | 2 | BTV | +----+------+ 1 row in set (0.00 sec)
mysql> stop slave ;set global sql_slave_skip_counter=1;start slave; Query OK, 0 rows affected (0.01 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G; …… Slave_IO_Running: Yes Slave_SQL_Running: Yes ……
1236錯(cuò)誤, 二進(jìn)制文件缺失誤刪二進(jìn)制文件等各種原因,,導(dǎo)致主庫mysql-bin.000012文件丟失,從庫同步失敗,。 Master_Log_File: mysql-bin.000012 Slave_IO_Running: No Slave_SQL_Running: Yes Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
首先停止從庫同步 slave stop;
查看主庫日志文件和位置 mysql> show master logs; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000013 | 154 | +------------------+-----------+
回從庫,,使日志文件和位置對應(yīng)主庫 CHANGE MASTER TO MASTER_LOG_FILE='log-bin.000013',MASTER_LOG_POS=154;
最后,啟動從庫:
```bash slave start;
show slave status\G;
Master_Log_File: mysql-bin.000013 Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_IO_Error: ```
中繼日志損壞slave的中繼日志relay-bin損壞,。 Last_SQL_Error: Error initializing relay log position: I/O error reading the header from the binary log Last_SQL_Error: Error initializing relay log position: Binlog has bad magic number; It's not a binary log file that can be used by this version of MySQL
1,、手工修復(fù) 解決方法:找到同步的binlog和POS點(diǎn),然后重新做同步,,這樣就可以有新的中繼日值了,。 例子: mysql> show slave status\G; *************************** 1. row *************************** Master_Log_File: mysql-bin.000010 Read_Master_Log_Pos: 1191 Relay_Log_File: vm02-relay-bin.000005 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000010 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1593 Last_Error: Error initializing relay log position: I/O error reading the header from the binary log Skip_Counter: 1 Exec_Master_Log_Pos: 821
Slave_IO_Running :接收master的binlog信息 Master_Log_File Read_Master_Log_Pos
Slave_SQL_Running:執(zhí)行寫操作 Relay_Master_Log_File Exec_Master_Log_Pos
以執(zhí)行寫的binlog和POS點(diǎn)為準(zhǔn)。 Relay_Master_Log_File: mysql-bin.000010 Exec_Master_Log_Pos: 821
mysql> stop slave; Query OK, 0 rows affected (0.01 sec)
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=821; Query OK, 0 rows affected (0.01 sec)
mysql> start slave; Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.8.22 Master_User: repl Master_Port: 3306 Connect_Retry: 10 Master_Log_File: mysql-bin.000010 Read_Master_Log_Pos: 1191 Relay_Log_File: vm02-relay-bin.000002 Relay_Log_Pos: 623 Relay_Master_Log_File: mysql-bin.000010 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1191 Relay_Log_Space: 778 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error:
2,、Ibbackup 各種大招都用上了,,無奈slave數(shù)據(jù)丟失過多,ibbackup(需要銀子)該你登場了,。 Ibbackup熱備份工具,,是付費(fèi)的。xtrabackup是免費(fèi)的,,功能上一樣,。 Ibbackup備份期間不鎖表,備份時(shí)開啟一個(gè)事務(wù)(相當(dāng)于做一個(gè)快照),,然后會記錄一個(gè)點(diǎn),,之后數(shù)據(jù)的更改保存在ibbackup_logfile文件里,恢復(fù)時(shí)把ibbackup_logfile 變化的數(shù)據(jù)再寫入到ibdata里,。 Ibbackup 只備份數(shù)據(jù)( ibdata,、.ibd ),,表結(jié)構(gòu).frm不備份,。
MySQL主從復(fù)制常見故障及解決方法,?
1.1.1故障1:從庫數(shù)據(jù)與主庫沖突
show slave status; 報(bào)錯(cuò):且show slave status\G
Slave_I /O_Running :Yes
Slave_SQL_Running:No
Seconds_Behind_Master:NULL
Last_error:Error 'Can' t create database 'xiaoliu' ; database exists' on query. Default
database: 'xiaoliu' .query: 'create database xiaoliu'
解決方法1:
stop slave;
set global sql_slave_skip_counter = 1; #將同步指針向下移動一個(gè),如果多次不同步可以重復(fù)操作
解決方法2:在從庫配置文件中配置,,直接跳過不影響業(yè)務(wù)的錯(cuò)誤號
grep slave-skip /etc/my .cnf
slave-skip-errors = 1032,1062,1007
1.1.2故障2:MySQL主從復(fù)制延遲問題原因和解決方案
問題1:主庫的從庫太多,,導(dǎo)致復(fù)制延遲 從庫數(shù)量一般 3—5個(gè)為宜,要復(fù)制的節(jié)點(diǎn)過多,,導(dǎo)致復(fù)制延遲 問題2:從庫硬件配置比主庫差,,導(dǎo)致延遲 查看Master和Slave的配置,可能因?yàn)榕渲貌划?dāng)導(dǎo)致復(fù)制的延遲 問題3:慢SQL語句過多 假如一條語句執(zhí)行時(shí)間超過2秒,, 就需要進(jìn)行優(yōu)化進(jìn)行調(diào)整 問題4:主從復(fù)制設(shè)計(jì)問題 主從復(fù)制單線程,,如果主庫的寫入并發(fā)太大,來不及傳送到從庫就會導(dǎo)致延遲 更高版本的MySQL可以支持多線程復(fù)制,,門戶網(wǎng)站則會自己開發(fā)多線程同步功能 問題5:主從庫之間的網(wǎng)絡(luò)延遲 主從庫網(wǎng)卡,、網(wǎng)線、連接的交換機(jī)等網(wǎng)絡(luò)設(shè)備都可能成為復(fù)制的瓶頸 導(dǎo)致復(fù)制延遲,,另外跨公網(wǎng)主從復(fù)制很容易導(dǎo)致主從復(fù)制延遲 問題6:主庫讀寫壓力大,,導(dǎo)致復(fù)制延遲 主庫硬件要好一些,架構(gòu)前端要加buffer緩存層
|