
MariaDB - utf8mb4 collation specifier in select query results in long query time #41367

@jojen-cz

Description

Steps to reproduce

  1. Use an ownCloud server for a few years, updating it from time to time and using file versions and the trash bin
  2. Upgrade to the utf8mb4 charset, then let it run for some time
  3. Observe a long run time and heavy CPU load during occ system:cron

Expected behaviour

occ system:cron should run lightning fast

Actual behaviour

Long-running operation with heavy CPU load from the MariaDB processes

According to the guide ( https://doc.owncloud.com/server/next/admin_manual/configuration/database/linux_database_configuration.html ), the database tables end up with the utf8mb4 charset and the utf8mb4_bin collation.

But running occ system:cron took a very long time...

> SHOW FULL PROCESSLIST;
| Id  | User     | Host             | db       | Command | Time | State        | Info | Progress |
...
| 105 | owncloud | 172.23.0.1:59388 | owncloud | Query   |   17 | Sending data | SELECT `fileid`, `storage`, `path`, `parent`, `name`,
                                `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`,
                                `etag`, `permissions`, `checksum`
                        FROM `oc_filecache`
                        WHERE `storage` = '3' AND `name` COLLATE utf8mb4_general_ci LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685'

As shown, this query had already been running for 17 seconds, and it sometimes takes longer, around 40 seconds. It also forces the collation to utf8mb4_general_ci.

The table has around 500k records. I tried adding indexes on the name and path columns to speed things up. The resulting table structure is:

> show create table oc_filecache;
| Table        | Create Table |
| oc_filecache | CREATE TABLE `oc_filecache` (
  `fileid` bigint(20) NOT NULL AUTO_INCREMENT,
  `storage` int(11) NOT NULL DEFAULT 0,
  `path` varchar(4000) COLLATE utf8mb4_bin DEFAULT NULL,
  `path_hash` varchar(32) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `parent` bigint(20) NOT NULL DEFAULT 0,
  `name` varchar(250) COLLATE utf8mb4_bin DEFAULT NULL,
  `mimetype` int(11) NOT NULL DEFAULT 0,
  `mimepart` int(11) NOT NULL DEFAULT 0,
  `size` bigint(20) NOT NULL DEFAULT 0,
  `mtime` bigint(20) NOT NULL DEFAULT 0,
  `storage_mtime` bigint(20) NOT NULL DEFAULT 0,
  `encrypted` int(11) NOT NULL DEFAULT 0,
  `unencrypted_size` bigint(20) NOT NULL DEFAULT 0,
  `etag` varchar(40) COLLATE utf8mb4_bin DEFAULT NULL,
  `permissions` int(11) DEFAULT 0,
  `checksum` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`fileid`),
  UNIQUE KEY `fs_storage_path_hash` (`storage`,`path_hash`),
  KEY `fs_parent_name_hash` (`parent`,`name`),
  KEY `fs_storage_mimetype` (`storage`,`mimetype`),
  KEY `fs_storage_mimepart` (`storage`,`mimepart`),
  KEY `fs_storage_size` (`storage`,`size`,`fileid`),
  KEY `fs_parent_storage_size` (`parent`,`storage`,`size`),
  KEY `path_index` (`path`(512)),
  KEY `path_hash_index` (`path`(750)) USING HASH,
  KEY `name_index` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=2413871 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED |

Then I experimented with the query itself. Requesting a collation different from the column's own looked odd:

SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum` FROM `oc_filecache` WHERE `storage` = '3' AND `name` COLLATE utf8mb4_general_ci LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685'

Poor results: long query time and high database load.

Set the collation to match the column:

SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum` FROM `oc_filecache` WHERE `storage` = '3' AND `name` COLLATE utf8mb4_bin LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685'

Same poor results: long query time and high database load.

Without forcing a collation:

SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum` FROM `oc_filecache` WHERE `storage` = '3' AND `name` LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685'

Fast response with little load.
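Why can the uncollated query be fast? Below is a simplified Python model (not MariaDB's implementation) of how a B-tree index ordered like utf8mb4_bin can serve LIKE 'prefix%suffix': jump to the literal prefix, walk only the entries that start with it, then check the full pattern on those. Python's code-point string ordering stands in for utf8mb4_bin, and the sample names are made up. The model also suggests why forcing utf8mb4_general_ci defeats the index: a case-insensitive prefix match would also have to find entries such as 'mathnet...' that sit elsewhere in the binary order. It does not explain why COLLATE utf8mb4_bin is equally slow; one guess, unverified, is that the optimizer does not look through an explicit COLLATE clause.

```python
import bisect
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a case-sensitive regex.

    % matches any run of characters, _ matches exactly one character.
    Backslash escapes are not modelled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def literal_prefix(pattern):
    """Return the part of a LIKE pattern before the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "%_":
            return pattern[:i]
    return pattern


def index_range_scan(sorted_names, pattern):
    """Simplified range scan over names sorted in code-point order.

    Only entries starting with the literal prefix are visited; the full
    pattern is then checked on each of them. Returns (matches, visited).
    """
    prefix = literal_prefix(pattern)
    regex = like_to_regex(pattern)
    matches, visited = [], 0
    i = bisect.bisect_left(sorted_names, prefix)
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        visited += 1
        if regex.fullmatch(sorted_names[i]):
            matches.append(sorted_names[i])
        i += 1
    return matches, visited


def full_scan(names, pattern):
    """Check every row, as when no index is usable for the predicate."""
    regex = like_to_regex(pattern)
    return [n for n in names if regex.fullmatch(n)], len(names)


# Hypothetical sample data, sorted in code-point order like a _bin index.
names = sorted([
    "MathNet.Numerics.5.0.0.v1.d1717775685",
    "MathNet.Numerics.5.0.0.v2.d1700000000",
    "mathnet.numerics.5.0.0.v3.d1717775685",
    "README.md",
    "photo.jpg",
])
pattern = "MathNet.Numerics.5.0.0.v%.d1717775685"

print(index_range_scan(names, pattern))  # visits 2 entries
print(full_scan(names, pattern))         # visits all 5 entries
```

Both scans return the same row; the difference is how many entries are examined, which on a 500k-row table is what separates the fast and slow queries.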

EXPLAIN shows the difference: only the last query uses name_index. The first two use just the storage column of fs_storage_path_hash and examine about 316,000 rows:

MariaDB [owncloud]> explain SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum` FROM `oc_filecache` WHERE `storage` = '3' AND `name` COLLATE utf8mb4_general_ci LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685';
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+--------+-------------+
| id   | select_type | table        | type | possible_keys                                                                | key                  | key_len | ref   | rows   | Extra       |
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+--------+-------------+
|    1 | SIMPLE      | oc_filecache | ref  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_path_hash | 4       | const | 316334 | Using where |
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+--------+-------------+
1 row in set (0.007 sec)

MariaDB [owncloud]> explain SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum` FROM `oc_filecache` WHERE `storage` = '3' AND `name` COLLATE utf8mb4_bin LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685';
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+--------+-------------+
| id   | select_type | table        | type | possible_keys                                                                | key                  | key_len | ref   | rows   | Extra       |
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+--------+-------------+
|    1 | SIMPLE      | oc_filecache | ref  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_path_hash | 4       | const | 316334 | Using where |
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+--------+-------------+
1 row in set (0.007 sec)

MariaDB [owncloud]> explain SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum` FROM `oc_filecache` WHERE `storage` = '3' AND `name` LIKE 'MathNet.Numerics.5.0.0.v%.d1717775685';
+------+-------------+--------------+-------+-----------------------------------------------------------------------------------------+------------+---------+------+------+------------------------------------+
| id   | select_type | table        | type  | possible_keys                                                                           | key        | key_len | ref  | rows | Extra                              |
+------+-------------+--------------+-------+-----------------------------------------------------------------------------------------+------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | oc_filecache | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,name_index | name_index | 1003    | NULL |    1 | Using index condition; Using where |
+------+-------------+--------------+-------+-----------------------------------------------------------------------------------------+------------+---------+------+------+------------------------------------+
1 row in set (0.011 sec)
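The key_len column helps read these plans. Assuming the usual MariaDB/MySQL sizing rule (maximum bytes of the column, plus 2 bytes for a VARCHAR length prefix, plus 1 byte if the column is nullable), the numbers match the table definition above. key_len 4 means the first two plans use only the storage column of fs_storage_path_hash, so every row with storage = 3 is read and then filtered. key_len 1003 means the third plan uses the whole name column of name_index. The helper below is a hypothetical illustration of that arithmetic, not a MariaDB API.

```python
def key_len(data_bytes, variable_length=False, nullable=False):
    """Rough key_len estimate for one index column (assumed sizing rule).

    Maximum data size in bytes, plus 2 bytes for a VARCHAR length prefix,
    plus 1 byte for a NULL flag.
    """
    return data_bytes + (2 if variable_length else 0) + (1 if nullable else 0)


# `storage` int(11) NOT NULL: a 4-byte integer, no NULL flag.
storage_len = key_len(4)

# `name` varchar(250) DEFAULT NULL in utf8mb4: up to 4 bytes per character.
name_len = key_len(250 * 4, variable_length=True, nullable=True)

print(storage_len, name_len)  # 4 1003
```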

I downloaded the current source package ( https://download.owncloud.com/server/stable/owncloud-complete-20250311.zip ) and searched it for collation handling. The following files use the utf8mb4_bin collation:

\lib\private\DB\ConnectionFactory.php
\lib\private\Repair\Collation.php
\lib\private\Setup\MySQL.php

... while these files use the utf8mb4_general_ci collation:

\lib\private\DB\AdapterMySQL.php
\lib\private\DB\QueryBuilder\ExpressionBuilder\MySqlExpressionBuilder.php

So...

  • there is a mix of collations in the code, which I don't understand
  • according to the experiment above, MariaDB seems not to like a collation specified in the query (in the LIKE clause) at all
    • as a result, the engine seems to ignore the existing index
    • whether or not the requested collation matches the column collation makes no difference
  • I don't know whether the current collations in my DB are correct, but they look OK according to the guide for converting to 4-byte Unicode
  • I didn't find any config option that changes this behavior
  • I didn't find any other reports of this issue
  • Upgrading the app (which is planned) is not expected to fix this, as the mix of collations still appears in the current source package.
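One possible reading of the collation mix (an assumption, not verified against the files listed above): utf8mb4_bin compares code points exactly, so LIKE on a _bin column is case-sensitive, whereas utf8mb4_general_ci ignores case. Code that offers a case-insensitive LIKE on a _bin column would therefore have to force a _ci collation. If that is the intent, the fast uncollated query from the experiment is not an exact drop-in replacement, because it can miss names that differ only in case. The sketch below models both behaviours with Python regexes; the function name and sample names are hypothetical, and general_ci's extra folding rules beyond case are not modelled.

```python
import re


def like_matches(value, pattern, case_insensitive):
    """Simplified SQL LIKE: % is any run of characters, _ is one character.

    case_insensitive=False stands in for utf8mb4_bin; True stands in for
    utf8mb4_general_ci (only case folding is modelled).
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch("".join(parts), value, flags) is not None


names = [
    "MathNet.Numerics.5.0.0.v1.d1717775685",
    "mathnet.numerics.5.0.0.v3.d1717775685",
    "README.md",
]
pattern = "MathNet.Numerics.5.0.0.v%.d1717775685"

bin_hits = [n for n in names if like_matches(n, pattern, case_insensitive=False)]
ci_hits = [n for n in names if like_matches(n, pattern, case_insensitive=True)]
print(bin_hits)  # only the exact-case name
print(ci_hits)   # both MathNet names
```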

Server configuration

Operating system:
Linux
Web server:
Apache (docker image, 10.13.4), but the code seems to be the same in the current package (https://download.owncloud.com/server/stable/owncloud-complete-20250311.zip)
Database:
MariaDB (10.3.10-MariaDB-log)
PHP version:
7.4.3 (docker image, 10.13.4)
ownCloud version: (see ownCloud admin page)
10.13.4 (docker image, 10.13.4)
Updated from an older ownCloud or fresh install:
updated
Where did you install ownCloud from:
docker
Signing status (ownCloud 9.0 and above):
No errors have been found.

Are you using external storage, if yes which one: local/smb/sftp/...
no
Are you using encryption: yes/no
no
Are you using an external user-backend, if yes which one: LDAP/ActiveDirectory/Webdav/...
no
