
PostgreSQL
Table Partitioning / Sharding
AmirReza Hashemi
PostgreSQL DataBase
Why PSQL?
● Open Source / Cross platform
● Reliability and Stability
● Extensible
● Designed for high volume environments
● Inherited tables (table inheritance), a feature few other relational databases offer
● …..
Here’s a classic scenario.
You work on a project that stores data in a
relational database.
The application gets deployed to production
and early on the performance is great:
selecting data from the database is snappy and
insert latency goes unnoticed.
What's the problem?
Over a period of days, weeks, or months the
database gets bigger and queries slow
down.
Potential solutions
- A Database Administrator (DBA) will
take a look and check that the database is
tuned.
- They suggest adding certain
indexes,
- moving logging to separate disk partitions,
- adjusting database engine parameters, and
verifying that the database is healthy.
This will buy you more time and may resolve
these issues to a degree.
At a certain point you realize the
data in the database is the
bottleneck.
There are various approaches that can help you
make your application and database run faster.
Let’s take a look at two of them:
- Table partitioning
- Sharding
Table Partitioning
The main idea:
You take one MASTER TABLE and split it
into many smaller tables.
These smaller tables are called partitions or
child tables.
Master Table:
Also referred to as a Master Partition Table, this table is the template from which child tables are created. It is a normal
table, but it doesn’t contain any data and requires a trigger.
Child Table:
These tables inherit their structure from the master table, and each belongs to a single master table. The child tables
contain all of the data. These tables are also referred to as Table Partitions.
Partition Function:
A partition function is a stored procedure that determines which child table should accept a new record. The
master table has a trigger that calls the partition function.
Here’s a summary of what should be done:
- Create a master table
- Create a partition function
- Create a table trigger
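The three steps above can be sketched as follows. This is a minimal illustration, not the deck's actual schema: only the names hashvalue_pt and hashtime appear in the slides, and the other columns, the monthly child tables, and the function and trigger names are assumptions made for the example.

```sql
-- Step 1: master table (hypothetical columns; stays empty).
CREATE TABLE hashvalue_pt (
    id       bigserial,
    hashtime date NOT NULL,
    payload  text
);

-- Child tables: one per month, each with a CHECK constraint
-- that describes the range of rows it is allowed to hold.
CREATE TABLE hashvalue_pt_2008_08 (
    CHECK (hashtime >= DATE '2008-08-01' AND hashtime < DATE '2008-09-01')
) INHERITS (hashvalue_pt);

CREATE TABLE hashvalue_pt_2008_09 (
    CHECK (hashtime >= DATE '2008-09-01' AND hashtime < DATE '2008-10-01')
) INHERITS (hashvalue_pt);

-- Indexes are not inherited, so each child gets its own.
CREATE INDEX ON hashvalue_pt_2008_08 (hashtime);
CREATE INDEX ON hashvalue_pt_2008_09 (hashtime);

-- Step 2: partition function that routes a new row to the matching child.
CREATE OR REPLACE FUNCTION hashvalue_pt_insert_router()
RETURNS trigger AS $$
BEGIN
    IF NEW.hashtime >= DATE '2008-08-01' AND NEW.hashtime < DATE '2008-09-01' THEN
        INSERT INTO hashvalue_pt_2008_08 VALUES (NEW.*);
    ELSIF NEW.hashtime >= DATE '2008-09-01' AND NEW.hashtime < DATE '2008-10-01' THEN
        INSERT INTO hashvalue_pt_2008_09 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'hashtime % has no matching partition', NEW.hashtime;
    END IF;
    RETURN NULL;  -- the row is already stored in a child; skip the master
END;
$$ LANGUAGE plpgsql;

-- Step 3: trigger on the master table that calls the partition function.
CREATE TRIGGER hashvalue_pt_insert_trigger
    BEFORE INSERT ON hashvalue_pt
    FOR EACH ROW EXECUTE PROCEDURE hashvalue_pt_insert_router();
```

With this in place, an INSERT INTO hashvalue_pt lands in the child table whose range covers its hashtime, and a SELECT on hashvalue_pt still sees the rows of all children.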
Implementation
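Constraint exclusion is a query optimization technique that improves performance for partitioned
tables. With it enabled, the planner compares each child table's CHECK constraints with the query's
WHERE clause and skips children that cannot contain matching rows:
SET constraint_exclusion = partition;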
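One way to see constraint exclusion at work is to compare query plans with the setting on and off. The sketch below uses hypothetical tables (events and its two children are invented for this example); the exact plan text depends on the PostgreSQL version, so no output is shown.

```sql
-- Hypothetical parent and children, named only for this illustration.
CREATE TABLE events (event_day date NOT NULL, note text);

CREATE TABLE events_2008_08 (
    CHECK (event_day >= DATE '2008-08-01' AND event_day < DATE '2008-09-01')
) INHERITS (events);

CREATE TABLE events_2008_09 (
    CHECK (event_day >= DATE '2008-09-01' AND event_day < DATE '2008-10-01')
) INHERITS (events);

-- With exclusion enabled for inheritance children, the plan should scan
-- the (empty) parent and events_2008_08, but not events_2008_09.
SET constraint_exclusion = partition;
EXPLAIN SELECT * FROM events WHERE event_day = DATE '2008-08-01';

-- With exclusion switched off, every child appears in the plan.
SET constraint_exclusion = off;
EXPLAIN SELECT * FROM events WHERE event_day = DATE '2008-08-01';

-- Restore the setting used in the slides.
SET constraint_exclusion = partition;
```

Exclusion only works when the WHERE clause compares the partitioning column with constants the planner can evaluate at plan time, which is why the query uses a literal date.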
Performance Testing On Specified Date
-- partitioned table
SELECT * FROM hashvalue_PT
WHERE hashtime = DATE '2008-08-01';
-- non-partitioned table
SELECT * FROM hashvalue
WHERE hashtime = DATE '2008-08-01';
With 200 million rows in each table, a search
on a specific date is about 144.45% faster on the
partitioned table than on the non-partitioned
table (roughly 2.4 times as fast).
Search on specified date “2008-08-01”:
- Records retrieved = 741825
- Partition table = 359.61 seconds
- Non-partition table = 879.062 seconds
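The 144.45% figure follows directly from the two timings. As a quick check, the arithmetic can be run in psql; the column aliases are just labels chosen for this example.

```sql
-- Relative improvement: (slow - fast) / fast, expressed as a percentage,
-- and the plain speedup factor: slow / fast.
SELECT round((879.062 - 359.61) / 359.61 * 100, 2) AS percent_faster,  -- 144.45
       round(879.062 / 359.61, 2)                  AS speedup_factor;  -- 2.44
```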
Sharding
Sharding is like partitioning. The
difference is that with traditional
partitioning, partitions are stored in
the same database, while with sharding,
shards (partitions) are stored on
different servers.
PostgreSQL does not provide a built-in tool for sharding. We will use Citus, which extends PostgreSQL
with sharding and replication.
Sharding Installation
DB Server1: 192.168.56.10 (Master)
DB Server2: 192.168.56.11 (Worker)
- pkg install pg_citus
- root@DB:~ # grep shared_preload_libraries /var/db/postgres/data96/postgresql.conf
shared_preload_libraries = 'citus' # (change requires restart)
- root@DB:~ # grep listen_addresses /var/db/postgres/data96/postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on;
- echo "host all all 192.168.56.0/24 trust" >> /var/db/postgres/data96/pg_hba.conf
- service postgresql restart
- ONLY ON MASTER: root@DB:/var/db/postgres/data96 # cat pg_worker_list.conf
192.168.56.11 5432
- service postgresql reload
- postgres=# create extension citus;
CREATE EXTENSION
Verify that the master is ready:
postgres=# SELECT * FROM master_get_active_worker_nodes();
node_name | node_port
---------------+-----------
192.168.56.11 | 5432
(1 row)
Everything is fine so far, so we can create on the master the
table to be sharded.
CREATE TABLE sales
(deptno int not null,
deptname varchar(20),
total_amount int,
CONSTRAINT pk_sales PRIMARY KEY (deptno)) ;
We need to inform Citus that the data of table sales will be distributed
among the MASTER and WORKER:
SELECT master_create_distributed_table('sales', 'deptno', 'hash');
In our example we are going to create one shard on each worker. We will
specify:
the table name: sales
total shard count: 2
replication factor: 1 (no replication)
SELECT master_create_worker_shards('sales', 2, 1);
Sharding is done.
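The two master_* calls above belong to the older Citus API that this deck uses. Later Citus releases, as far as I know, express the same setup with a single create_distributed_table call plus two settings. The sketch below is an alternative to the two calls, not something to run in addition to them, and it assumes a Citus version that ships create_distributed_table, citus.shard_count, and citus.shard_replication_factor.

```sql
-- Same table definition as in the slides.
CREATE TABLE sales
(deptno int not null,
 deptname varchar(20),
 total_amount int,
 CONSTRAINT pk_sales PRIMARY KEY (deptno));

-- Assumption: newer Citus API. These settings play the role of the
-- shard count (2) and replication factor (1) arguments used above.
SET citus.shard_count = 2;
SET citus.shard_replication_factor = 1;

-- Hash-distribute sales on deptno and create its shards in one step.
SELECT create_distributed_table('sales', 'deptno');
```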
Sharding result
insert into sales (deptno,deptname,total_amount) values (1,'french_dept',10000);
insert into sales (deptno,deptname,total_amount) values (2,'german_dept',15000);
insert into sales (deptno,deptname,total_amount) values (3,'china_dept',21000);
insert into sales (deptno,deptname,total_amount) values (4,'gambia_dept',8750);
insert into sales (deptno,deptname,total_amount) values (5,'japan_dept',12010);
insert into sales (deptno,deptname,total_amount) values (6,'china_dept',35000);
insert into sales (deptno,deptname,total_amount) values (7,'nigeria_dept',10000);
insert into sales (deptno,deptname,total_amount) values (8,'senegal_dept',33000);
Sharding Checking
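The original slide showed the check as an image that did not survive. A rough sketch of how the result could be inspected on the master follows; it assumes the sales table and the eight inserts above, and it relies on the Citus metadata relations pg_dist_shard and pg_dist_shard_placement, so it is not a reconstruction of what the slide displayed.

```sql
-- Total rows visible through the distributed table on the master.
SELECT count(*) FROM sales;

-- Shards that Citus created for sales, with the hash range each covers.
SELECT shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'sales'::regclass
ORDER BY shardid;

-- Which node holds each shard.
SELECT s.shardid, p.nodename, p.nodeport
FROM pg_dist_shard AS s
JOIN pg_dist_shard_placement AS p USING (shardid)
WHERE s.logicalrelid = 'sales'::regclass
ORDER BY s.shardid;
```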
Conclusion
Note that not all SQL commands are able to work on inheritance hierarchies. Commands that
are used for data querying, data modification, or schema modification (e.g., SELECT, UPDATE,
DELETE, most variants of ALTER TABLE, but not INSERT or ALTER TABLE ... RENAME) typically
default to including child tables and support the ONLY notation to exclude them. Commands
that do database maintenance and tuning (e.g., REINDEX, VACUUM) typically only work on
individual, physical tables and do not support recursing over inheritance hierarchies. The
respective behavior of each individual command is documented in its reference page (Reference
I, SQL Commands).
A serious limitation of the inheritance feature is that indexes (including unique constraints) and
foreign key constraints only apply to single tables, not to their inheritance children. This is true
on both the referencing and referenced sides of a foreign key constraint.
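Both points can be seen in a small experiment. The table names below are hypothetical and exist only for this illustration: the primary key on the parent does not stop two children from storing the same id, and ONLY restricts a query to the parent itself.

```sql
-- Hypothetical parent with a primary key, and two children.
CREATE TABLE parent_demo (id int PRIMARY KEY, note text);
CREATE TABLE child_demo_a () INHERITS (parent_demo);
CREATE TABLE child_demo_b () INHERITS (parent_demo);

-- The parent's primary key is not inherited, so the same id
-- can be stored in both children without an error.
INSERT INTO child_demo_a VALUES (1, 'stored in child a');
INSERT INTO child_demo_b VALUES (1, 'stored in child b');

-- SELECT includes child tables by default.
SELECT count(*) FROM parent_demo;       -- 2

-- ONLY excludes the children; the parent itself holds no rows.
SELECT count(*) FROM ONLY parent_demo;  -- 0
```

If uniqueness across all partitions matters, it has to be enforced by the partitioning scheme itself, for example by making sure each key value can belong to only one child.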
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
- Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single
partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the
heavily-used parts of the indexes fit in memory.
- When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that
partition instead of using an index and random access reads scattered across the whole table.
- Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO
INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
- Seldom-used data can be migrated to cheaper and slower storage media.
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning
depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
As of PostgreSQL 9.6 (the data96 installation used in this deck), PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table
itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see Section 5.9 of the PostgreSQL documentation) before attempting to set up
partitioning.
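The bulk-delete benefit mentioned above can be sketched as follows. The measurements tables are hypothetical names chosen for this example; the point is that an old month is removed by detaching and dropping its child table instead of running a large DELETE.

```sql
-- Hypothetical parent with two monthly children.
CREATE TABLE measurements (logdate date NOT NULL, reading int);

CREATE TABLE measurements_2008_07 (
    CHECK (logdate >= DATE '2008-07-01' AND logdate < DATE '2008-08-01')
) INHERITS (measurements);

CREATE TABLE measurements_2008_08 (
    CHECK (logdate >= DATE '2008-08-01' AND logdate < DATE '2008-09-01')
) INHERITS (measurements);

INSERT INTO measurements_2008_07 VALUES (DATE '2008-07-15', 1);
INSERT INTO measurements_2008_08 VALUES (DATE '2008-08-15', 2);

-- Detach the old partition. Its rows disappear from queries on the
-- parent, but the table still exists and could be archived first.
ALTER TABLE measurements_2008_07 NO INHERIT measurements;

-- Drop it once it is no longer needed: no bulk DELETE, no VACUUM debt.
DROP TABLE measurements_2008_07;

SELECT count(*) FROM measurements;  -- 1
```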
END
