Independent Testing of CSD 2000 PostgreSQL Performance by Tang Cheng

About the Author

Tang Cheng, also known online as osdba, is the author of Guide to Expert, The Pragmatic PostgreSQL, and the co-founder of CSU DATA. He has over 20 years of industry experience, has been engaged in the PostgreSQL ecosystem for over 10 years, and has more than 10 years of experience in databases, operating systems, and storage. He worked as a technical expert at NetEase Research and as a senior database expert at Alibaba, where he was engaged in the architectural design, operation, and maintenance of Alibaba's PostgreSQL and Greenplum databases. He maintained and expanded several Greenplum clusters of over 100 TB and solved many difficult problems in PostgreSQL and Greenplum.

1. Key Article Points

  • With the PostgreSQL fillfactor parameter set to 70%, the ScaleFlux CSD 2000 Series' transparent compression held the growth in actual occupied space to about 8%, while pgbench performance improved by nearly 40%.
  • In a separate capacity test, the CSD 2000's transparent compression reduced the actual occupied space to about one-sixth of the data size, a huge storage saving.

2. Test Scenario

I previously used the CSS 1000 Series, whose key feature is bypass compression. It offloads compression to hardware, greatly saving CPU, but it usually required code changes: the gzip compression library originally called in the code had to be replaced with the compression library provided by ScaleFlux. The end result was useful but not widely applicable.

Recently, I heard that the new CSD 2000 offers transparent compression. This is exactly the capability we want for our open-source databases, so I immediately applied for a sample card from the company for testing.

I installed the card in my Dell R730xd test machine and followed the official instructions to install the driver (the operating system is CentOS 7.6). Here is the lsblk output:

[[email protected] ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 558.4G 0 disk
├─sdb2 8:18 0 20G 0 part /
├─sdb3 8:19 0 16G 0 part [SWAP]
└─sdb1 8:17 0 50G 0 part
sfdv0n1 250:0 0 8.7T 0 disk /data

The /dev/sfdv0n1 device is the CSD 2000 card. To compare the effects of transparent compression, I installed the previously borrowed CSS 1000 card in another Dell R730xd machine as a baseline. The CSD 2000's physical capacity is 3.2 TB, which is exposed as about 9.6 TB of logical capacity, three times the physical size. I formatted and mounted the device to prepare for the test:

mkfs -t ext4 /dev/sfdv0n1
mount -o discard /dev/sfdv0n1 /data
chown postgres:postgres /data
su - postgres
mkdir /data/pgdata

The database is PostgreSQL 11.6, initialized with initdb. The main non-default parameters in postgresql.conf are:

listen_addresses = '*'
max_connections = 1000
unix_socket_directories = '/tmp'
shared_buffers = 128MB
max_wal_size = 10GB
min_wal_size = 8GB
vacuum_cost_limit = 10000
vacuum_cost_delay = 1

The other parameters keep their default values. I created the database data directory under /data/pgdata and used pgbench to load 1 billion rows (scale factor 10000). The estimated data size is over 150 GB.
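As a sanity check on that size estimate: pgbench_accounts holds 100,000 rows per scale unit, and a commonly cited rule of thumb (an assumption here, not an official figure) is roughly 16 MB of table-plus-index data per scale unit:

```shell
# Rough pgbench dataset size estimate.
# Assumption: ~16 MB per scale unit (rule of thumb, not an exact figure).
scale=10000
rows=$((scale * 100000))          # rows in pgbench_accounts
approx_gb=$((scale * 16 / 1024))  # ~16 MB per scale unit, converted to GB
echo "rows=${rows} approx_size=${approx_gb}GB"
```

This lands around 156 GB, consistent with the "over 150 GB" estimate and the 158 GB measured below.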

3. Actual Test

3.1 Baseline test

3.1.1 Test of a CSD 2000 card

[[email protected] ~]$ time pgbench -i -s 10000
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 1000000000 tuples (0%) done (elapsed 0.09 s, remaining 878.83 s)
200000 of 1000000000 tuples (0%) done (elapsed 0.18 s, remaining 884.39 s)
300000 of 1000000000 tuples (0%) done (elapsed 0.27 s, remaining 908.15 s)
400000 of 1000000000 tuples (0%) done (elapsed 0.38 s, remaining 959.26 s)


999700000 of 1000000000 tuples (99%) done (elapsed 927.94 s, remaining 0.28 s)
999800000 of 1000000000 tuples (99%) done (elapsed 928.02 s, remaining 0.19 s)
999900000 of 1000000000 tuples (99%) done (elapsed 928.12 s, remaining 0.09 s)
1000000000 of 1000000000 tuples (100%) done (elapsed 928.19 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

real 30m13.212s
user 3m49.150s
sys 0m5.648s

It takes 30 minutes to create data.

The space occupied by database layers:

[[email protected] data]# du -sh pgdata
158G pgdata

The csd-size.sh tool provided by ScaleFlux is used to check the actual occupied space:

[[email protected] ~]# ./csd-size.sh
Usage:
csd-size.sh csd_dev_name
Example:
csd-size.sh sfdv0n1
No device name given, default to use name "sfdv0n1".

Device - /dev/sfdv0n1
Total capacity - 2977 GiB
Used space - 25 GiB
Free space - 2952 GiB
Logical data size - 227 GiB
Compression ratio - 8.94 (logical data size / used space)

The actual occupied space is only 25 GB; that is to say, 158 GB of data has been compressed to 25 GB. The compression ratio at the file-system level is about 6.3x, which is quite considerable.
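The file-system-level ratio quoted above is simply the du figure divided by the card's reported used space; a quick check with the numbers from this test:

```shell
# File-system-level compression ratio from the figures above:
# 158 GB (du -sh pgdata) / 25 GB (csd-size.sh "Used space")
du_gb=158
used_gb=25
awk -v d="$du_gb" -v u="$used_gb" 'BEGIN { printf "ratio ~ %.1fx\n", d/u }'
```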

Next, I ran a pgbench performance test on the CSD 2000 card:

[[email protected] ~]$ pgbench -j 128 -c 128 -T 300
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: simple
number of clients: 128
number of threads: 128
duration: 300 s
number of transactions actually processed: 9059584
latency average = 4.239 ms
tps = 30196.497967 (including connections establishing)
tps = 30198.786610 (excluding connections establishing)

3.1.2 Test of a CSS 1000 card

I also tested a CSS 1000 card on another machine:


[[email protected] pgdata]$ time pgbench -i -s 10000
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
creating tables...
generating data...
100000 of 1000000000 tuples (0%) done (elapsed 0.07 s, remaining 658.03 s)
200000 of 1000000000 tuples (0%) done (elapsed 0.16 s, remaining 796.76 s)



999700000 of 1000000000 tuples (99%) done (elapsed 917.29 s, remaining 0.28 s)
999800000 of 1000000000 tuples (99%) done (elapsed 917.38 s, remaining 0.18 s)
999900000 of 1000000000 tuples (99%) done (elapsed 917.48 s, remaining 0.09 s)
1000000000 of 1000000000 tuples (100%) done (elapsed 917.57 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

real 24m39.065s
user 3m45.821s
sys 0m5.443s

It takes 24 minutes to create the data on the CSS 1000 card, faster than on the CSD 2000 card.

Next, I ran the same pgbench performance test on the CSS 1000 card:

[[email protected] data]$ time pgbench -j 128 -c 128 -T 300
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: simple
number of clients: 128
number of threads: 128
duration: 300 s
number of transactions actually processed: 8590121
latency average = 4.471 ms
tps = 28629.717270 (including connections establishing)
tps = 28631.618435 (excluding connections establishing)

real 5m0.382s
user 8m59.223s
sys 11m13.547s

The pgbench performance of the CSD 2000 card (about 30,000 TPS) turned out to be better than that of the CSS 1000 card (about 29,000 TPS).

3.2 Adjusting fillfactor

We have thus seen that huge space savings come with no noticeable loss in performance. If more performance is desired, ScaleFlux engineers suggested that lowering the fillfactor parameter on PostgreSQL tables can greatly improve TPS without significantly increasing space occupancy.

Fillfactor is 100% by default; lowering it to 70% improves PostgreSQL's write performance, because the free space reserved in each page lets updated row versions stay on the same page (enabling HOT updates and reducing index maintenance). Without the CSD 2000 card, turning fillfactor down increases space usage correspondingly. With the CSD 2000's transparent compression, however, the mostly empty reserved space compresses extremely well, so the actual occupied space barely grows even when fillfactor is turned down.
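For reference, pgbench's -F flag sets fillfactor on the tables it creates; for your own tables you would set it per table. The database and table names below are hypothetical, and note that ALTER TABLE only affects pages written afterwards, so existing data needs a rewrite:

```shell
# Hypothetical example: set fillfactor = 70 on an existing table.
psql -d mydb -c "ALTER TABLE accounts SET (fillfactor = 70);"
# Existing pages keep their old layout; rewrite the table to apply it:
psql -d mydb -c "VACUUM FULL accounts;"
```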

Theory is one thing; let's run the actual test to see the real effect:

[[email protected] ~]$ time pgbench -i -s 10000 -F 70
dropping old tables...
999900000 of 1000000000 tuples (99%) done (elapsed 1027.00 s, remaining 0.10 s)
1000000000 of 1000000000 tuples (100%) done (elapsed 1027.10 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

real 35m10.789s
user 3m51.888s
sys 0m5.489s

It takes 35 minutes to create the data, a little longer than the previous 30 minutes. Let’s check the space occupancy:

[[email protected] ~]# ./csd-size.sh
Usage:
csd-size.sh csd_dev_name
Example:
csd-size.sh sfdv0n1

No device name given, default to use name "sfdv0n1".
Device - /dev/sfdv0n1
Total capacity - 2977 GiB
Used space - 27 GiB
Free space - 2950 GiB
Logical data size - 277 GiB
Compression ratio - 10.08 (logical data size / used space)

The used space is 27 GB, only 2 GB (about 8%) more than the original 25 GB, far less than the 22% growth in logical data size (227 GiB to 277 GiB), which validates the theory. Now let's look at the pgbench performance:
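The percentages can be checked directly from the two csd-size.sh readings above (227 GiB to 277 GiB logical, 25 GiB to 27 GiB used):

```shell
# Growth in logical vs. physical space between the two fillfactor runs.
awk 'BEGIN {
  printf "logical growth:  %.0f%%\n", (277 - 227) / 227 * 100
  printf "physical growth: %.0f%%\n", (27 - 25) / 25 * 100
}'
```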

[[email protected] ~]$ pgbench -j 128 -c 128 -T 60
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: simple
number of clients: 128
number of threads: 128
duration: 60 s
number of transactions actually processed: 2508991
latency average = 3.063 ms
tps = 41787.598119 (including connections establishing)
tps = 41803.617591 (excluding connections establishing)

TPS skyrocketed from 30,000 to 42,000, up nearly 40%.

Let’s make another brave move by reducing fillfactor to 50%:

[[email protected] ~]$ time pgbench -i -s 10000 -F 50
dropping old tables...
creating tables...
generating data...
100000 of 1000000000 tuples (0%) done (elapsed 2.71 s, remaining 27112.63 s)
200000 of 1000000000 tuples (0%) done (elapsed 2.84 s, remaining 14181.00 s)


999700000 of 1000000000 tuples (99%) done (elapsed 1168.05 s, remaining 0.35 s)
999800000 of 1000000000 tuples (99%) done (elapsed 1168.16 s, remaining 0.23 s)
999900000 of 1000000000 tuples (99%) done (elapsed 1168.27 s, remaining 0.12 s)
1000000000 of 1000000000 tuples (100%) done (elapsed 1168.39 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

real 42m19.784s
user 3m53.464s
sys 0m5.717s

Let’s check the space occupancy after compression:

[[email protected] ~]# ./csd-size.sh
Usage:
csd-size.sh csd_dev_name
Example:
csd-size.sh sfdv0n1

No device name given, default to use name "sfdv0n1".
Device - /dev/sfdv0n1
Total capacity - 2977 GiB
Used space - 30 GiB
Free space - 2947 GiB
Logical data size - 355 GiB
Compression ratio - 11.80 (logical data size / used space)

The actual occupied space rose to 30 GB. Let’s view the space size on the file system:

[[email protected] data]# du -sh pgdata
286G pgdata

The space size is 286 GB.

Now let's look at the pgbench results:

[[email protected] ~]$ pgbench -j 128 -c 128 -T 300
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: simple
number of clients: 128
number of threads: 128
duration: 300 s
number of transactions actually processed: 13208072
latency average = 2.908 ms
tps = 44012.763817 (including connections establishing)
tps = 44016.173486 (excluding connections establishing)

You can see that TPS hits 44,000, up slightly from fillfactor = 70% but not by much. So fillfactor = 70% seems the appropriate setting; 50% shows diminishing returns.
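Putting the three pgbench results side by side (TPS figures from the runs above; note the fillfactor = 70% run lasted only 60 s versus 300 s for the others, so the comparison is indicative rather than exact):

```shell
# TPS improvement relative to the fillfactor = 100 baseline run.
awk 'BEGIN {
  base = 30196; f70 = 41788; f50 = 44013
  printf "fillfactor 70: +%.0f%%\n", (f70 - base) / base * 100
  printf "fillfactor 50: +%.0f%%\n", (f50 - base) / base * 100
}'
```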

4. Test Conclusions

Using the CSD 2000 for PostgreSQL not only saves space (the pgbench data compressed about 6x) but also improves performance (TPC-B results show 30-40% more TPS).

5. Monitoring Used Space

Someone asked, “If a card is used as 9.6 TB when the actual physical capacity is 3.2 TB, would there be a situation where the actual space is not available because the compression ratio is not as high as three times, but the card shows that there is still plenty of free space available, leading to serious problems?”

Thankfully, this should not come up, because ScaleFlux has anticipated the issue. The card driver includes an internal capacity management program, Space Balancer, that keeps the space usage reported by "df -h" accurate: when the actual physical usage reaches 80%, "df -h" will also show 80%, so a monitoring program can raise an accurate alarm. Note, however, that the CSD 2000 device must not be managed by LVM, or the capacity management program may not work properly. In addition, be sure to add the "discard" option when mounting the disk, as follows:

mount -o discard /dev/sfdv0n1 /data

Otherwise the actual occupied space will not be released when a large file is deleted, which will lead to serious problems.
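Since "df -h" reflects the true physical usage, a simple cron-driven check is enough to raise the alarm. A minimal sketch (the /data mount point and 80% threshold are assumptions for illustration):

```shell
#!/bin/sh
# Minimal capacity-alarm sketch for the CSD device's mount point.
# Assumptions: mount point /data, alarm threshold 80%.
check_usage() {  # $1 = used percent, $2 = threshold percent
  if [ "$1" -ge "$2" ]; then
    echo "WARNING: ${1}% used"
  else
    echo "OK: ${1}% used"
  fi
}
# Extract the usage percentage for /data (GNU coreutils df).
usage=$(df --output=pcent /data 2>/dev/null | tail -n 1 | tr -dc '0-9')
check_usage "${usage:-0}" 80
```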

This independent test of the CSD 2000 Series was originally published by Tang Cheng on WeChat on 2/6/2020. This document has been translated to English by ScaleFlux staff; all examples of bolded emphasis have been added by the editing team.