[SOLVED] Random peaks in Mysql load slowing all users

Issue

We have a site that has been working pretty well for the past 2 years. But we are actually seeing random peaks in the database load that make the site very slow for a few seconds.

These peaks only appear from a certain load on the server and are impossible to predict. More users = more peaks. Everything run very smoothly outside of those peaks (page load is < 300ms). CPU and RAM are not impacted by those peaks.

Spikes are especially visible in db connections were it can go from 100 connections to 1000 connections for 2 or 3 seconds. Then back to normal.

We have nothing in the PHP log, nothing in the slow query log (long_query_time = 0.1).

Server : Debian / MariaDB 10.3.31, Apache 2.4.38, PHP 7.3.31 All tables are InnoDB with primary keys. Connection by socket. Codeigniter 4.1.7. Redis cache.

What we already try :

Reboot the server / Restart Mysql

Slow query log with long_query_time = 0 for 24h then pt-query-digest on the result. Everything is ok.

General log for 3h when heavy traffic then pt-query-digest on the result. Everything is ok.

Explain on each request of the logs. Everything looks fine.

We no longer know where to look to find the source of the problem.

Additional info:

Environment : VMware virtual machine | CPU : 16x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz | RAM : 31.39 GiB | Disks : SSDs via SAN network

SHOW VARIABLES : https://pastebin.com/fx99mrdt

SHOW GLOBAL STATUTS : https://pastebin.com/NY1PKqpp

SHOW ENGINE INNODB STATUS : https://pastebin.com/bNcKKTYN

MYSQL TUNNER : https://pastebin.com/8gx9Qp1j

EDIT 1:

EXPLAIN UPDATE *******.eleves_tchat
     SET lu = 'true'
     WHERE id_etablissement = '266'
       AND id_eleve = '512385'
       AND auteur = 'enseignant'
       AND lu = 'false';

 id | select_type | table        | type        | possible_keys                                                      | key                                                             | key_len | ref  | rows | Extra
----|-------------|--------------|-------------|--------------------------------------------------------------------|-----------------------------------------------------------------|---------|------|------|-----------------------------------------------------------------------------------------------
 1  | SIMPLE      | eleves_tchat | index_merge | fk_te_eleves_tchat_id_etablissement,lu,fk_te_eleves_tchat_id_eleve | fk_te_eleves_tchat_id_eleve,fk_te_eleves_tchat_id_etablissement | 4,4     | NULL | 1    | Using intersect(fk_te_eleves_tchat_id_eleve,fk_te_eleves_tchat_id_etablissement); Using where

EDIT 2:

SHOW CREATE TABLE eleves_tchat;

CREATE TABLE `eleves_tchat` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `date` datetime NOT NULL,
 `id_etablissement` int(11) NOT NULL,
 `id_eleve` int(11) NOT NULL,
 `auteur` enum('eleve','enseignant') NOT NULL,
 `message` text NOT NULL,
 `lu` enum('false','true') NOT NULL,
 PRIMARY KEY (`id`),
 KEY `fk_te_eleves_tchat_id_etablissement` (`id_etablissement`),
 KEY `lu` (`lu`),
 KEY `fk_te_eleves_tchat_id_eleve` (`id_eleve`),
 CONSTRAINT `fk_te_eleves_tchat_id_eleve` FOREIGN KEY (`id_eleve`) REFERENCES `eleves` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `fk_te_eleves_tchat_id_etablissement` FOREIGN KEY (`id_etablissement`) REFERENCES `mydomain_common`.`etablissements` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=610190 DEFAULT CHARSET=latin1


SHOW TABLE STATUS FROM mydomain_enseignant WHERE name = 'eleves_tchat'

+--------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+------------------+------------------+------------+-------------------+----------+----------------+---------+------------------+-----------+
|     Name     | Engine | Version | Row_format | Rows  | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |   Create_time    |   Update_time    | Check_time |     Collation     | Checksum | Create_options | Comment | Max_index_length | Temporary |
+--------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+------------------+------------------+------------+-------------------+----------+----------------+---------+------------------+-----------+
| eleves_tchat | InnoDB |      10 | Dynamic    | 64917 |            105 |     6832128 |               0 |      4374528 |   4194304 |         610189 | 17/01/2022 10:10 | 07/03/2022 10:29 | NULL       | latin1_swedish_ci | NULL     |                |         |                0 | N         |
+--------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+------------------+------------------+------------+-------------------+----------+----------------+---------+------------------+-----------+

EDIT 3:

With composite index and OPTIMIZE’d eleves_tchat table.

EXPLAIN UPDATE mydomain_enseignant.eleves_tchat SET lu = 'true' WHERE id_etablissement = '266' AND id_eleve = '512385' AND auteur = 'enseignant' AND lu = 'false';

+----+-------------+--------------+-------+-------------------------------------------------------+-------------------------------------+---------+------+------+---------------------------+
| id | select_type |    table     | type  |                     possible_keys                     |                 key                 | key_len | ref  | rows |           Extra           |
+----+-------------+--------------+-------+-------------------------------------------------------+-------------------------------------+---------+------+------+---------------------------+
|  1 | SIMPLE      | eleves_tchat | range | fk_te_eleves_tchat_id_eleve,id_etablissement_id_el... | id_etablissement_id_eleve_auteur_lu |      10 | NULL |    1 | Using where; Using buffer |
+----+-------------+--------------+-------+-------------------------------------------------------+-------------------------------------+---------+------+------+---------------------------+

SHOW TABLE STATUS FROM mydomain_enseignant WHERE name = 'eleves_tchat'

+--------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+------------------+------------------+------------+-------------------+----------+----------------+---------+------------------+-----------+
|     Name     | Engine | Version | Row_format | Rows  | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |   Create_time    |   Update_time    | Check_time |     Collation     | Checksum | Create_options | Comment | Max_index_length | Temporary |
+--------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+------------------+------------------+------------+-------------------+----------+----------------+---------+------------------+-----------+
| eleves_tchat | InnoDB |      10 | Dynamic    | 62117 |            126 |     7880704 |               0 |      3178496 |   4194304 |         611015 | 07/03/2022 18:09 | 07/03/2022 18:20 | NULL       | latin1_swedish_ci | NULL     |                |         |                0 | N         |
+--------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+------------------+------------------+------------+-------------------+----------+----------------+---------+------------------+-----------+

EDIT 4:

We are not using transactions on our app. ROLLBACKs are caused by the check_mysql_health plugin for Nagios that monitors the database by sending regular requests like this:

35211159 Query  SET autocommit=0
35211159 Query  SHOW VARIABLES LIKE 'version'
35211159 Query  SHOW STATUS LIKE 'Uptime'
35211159 Query  SHOW VARIABLES LIKE 'have_innodb'
35211159 Query  SHOW /*!50000 global */ STATUS LIKE 'Innodb_buffer_pool_wait_free'
35211159 Query  ROLLBACK
35211159 Quit

EDIT 5:

Since the initial question, we update our config with :

innodb_io_capacity = 1000
innodb_flush_neighbors = 0

Spikes are always here, so here is the pt-query-digest of a one day slow.log with long_query_time = 0. Some queries are taking 30s+, but only during our daily sql dump backup.

pt_query_slow_log : https://pastebin.com/hKvz37ca

EDIT 6:

SHOW CREATE TABLE PASSATIONS;

CREATE TABLE `passations` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `id_eleve` int(11) NOT NULL,
 `id_module` tinyint(2) NOT NULL,
 `type_seance` enum('normale','inversee','tutore') NOT NULL,
 `id_item` int(11) NOT NULL,
 `date` date NOT NULL,
 `heure` time NOT NULL,
 `format_question` enum('qcm','ouvert') NOT NULL,
 `type_question` enum('preparatoire','principale') NOT NULL,
 `duree` smallint(5) unsigned NOT NULL,
 `score` tinyint(1) NOT NULL,
 `correction` enum('0','1') NOT NULL,
 `premier` tinyint(1) NOT NULL,
 `num_reponse` tinyint(1) NOT NULL,
 `reponses` varchar(255) NOT NULL,
 `justification` varchar(2000) NOT NULL,
 `age` float NOT NULL,
 `methode` enum('evaluation','entrainement') NOT NULL,
 `niveau` tinyint(2) NOT NULL,
 `num_eval` tinyint(1) NOT NULL,
 `timestamp_entrainement` bigint(20) unsigned NOT NULL,
 `mots_bleus` enum('false','true') NOT NULL,
 `question_preparatoire` enum('false','true') NOT NULL,
 `lecture_audio` enum('false','true') NOT NULL,
 `id_binome` int(11) NOT NULL,
 `font_size` enum('14','16','18','20') NOT NULL,
 `line_height` enum('25','30','35') NOT NULL,
 `letter_spacing` enum('0','1','2') NOT NULL,
 `word_spacing` enum('0','5','10','15') NOT NULL,
 `font_family` enum('1','2','3','4') NOT NULL,
 `lire_couleur` enum('normal','syllabes','phonemes','muettes') NOT NULL,
 `phonemes` text NOT NULL,
 `expe` varchar(10) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `fk_te_passations_id_item` (`id_item`),
 KEY `fk_te_passations_id_module` (`id_module`),
 KEY `fk_te_passations_id_eleve` (`id_eleve`),
 KEY `id_eleve_id_module_premier_methode_type_seance_type_question` (`id_eleve`,`id_module`,`premier`,`methode`,`type_seance`,`type_question`) USING BTREE,
 CONSTRAINT `fk_te_passations_id_eleve` FOREIGN KEY (`id_eleve`) REFERENCES `eleves` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
 CONSTRAINT `fk_te_passations_id_item` FOREIGN KEY (`id_item`) REFERENCES `my_domain_common`.`items` (`id`) ON UPDATE CASCADE,
 CONSTRAINT `fk_te_passations_id_module` FOREIGN KEY (`id_module`) REFERENCES `my_domain_common`.`modules` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=81067688 DEFAULT CHARSET=latin1

EXPLAIN SELECT MAX(`id`) AS `id` FROM my_domain_enseignant.passations WHERE `id_eleve` IN ('499613','499611','499612','499614', '499615','499616','499617','499618','499619','499620', '499621','499622','499623','499624','499625','499626', '499627','499628','499629','499630','499631','499632', '499633','499634' ) AND `id_module` = '1' AND `type_seance` = 'normale' AND `type_question` = 'principale' AND `methode` = 'entrainement' AND `premier` = 1
  
+----+-------------+------------+-------+-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------+---------+------+------+--------------------------+
| id | select_type |   table    | type  |                                                   possible_keys                                                   |                             key                              | key_len | ref  | rows |          Extra           |
+----+-------------+------------+-------+-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | passations | range | fk_te_passations_id_module,fk_te_passations_id_eleve,id_eleve_id_module_premier_methode_type_seance_type_question | id_eleve_id_module_premier_methode_type_seance_type_question |       9 | NULL |  910 | Using where; Using index |
+----+-------------+------------+-------+-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------+---------+------+------+--------------------------+

EDIT 7:

I wanted to illustrate what’s happenning when we have ~500 online users. On image 1, everything is ok, average page load is under 300ms. And then you will see a big spike in image 2, slowing down every users and making page load over 3 seconds.

Everything is ok ——-> Then BOOM!

Solution

We have probably found the cause of our problem. This seems to be related to the locking mechanism of PHP sessions when using the Redis Handler in Codeigniter 4 with concurrent Ajax requests.

Since we moved from Redis to File sessions, the spikes no longer appears and everything run smoothly.

For more information, see this opened issue on Codeigniter 4 github : https://github.com/codeigniter4/CodeIgniter4/issues/4391

Many thanks to everyone who helped!

Answered By – matrox445

Answer Checked By – Terry (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published.