"WARNING: Mismatch found between sl_table and pg_class." in Slony-I

Published on Tue, Apr 29, 2014

WARNING: Mismatch found between sl_table and pg_class. Slonik command REPAIR CONFIG may be useful to rectify this.
2014-04-26 07:32:54 PDT FATAL slon_node_health_check() returned false - fatal health problem!
REPAIR CONFIG may be helpful to rectify this problem

You see this WARNING message in logs and immediate stop of replication, if Slony has observed a mismatch of pg_class.oid and sl_table.tabreloid of a replicating table in a node. Because, by architecture slony holds all replicating objects OID information in its catalogs captured at configure time from pg_class.oid.

In which case pg_class.oid != sl_table.tabreloid ?
***
***Most cases, a node moved its place using pg_dump/pg_restore by causing objects OID to change.

To mimic the above WARNING message, I have used two node replication setup between two database on same cluster[5432] for few tables. (Refer here on how to setup Slony replication). Here’s the current OID information on slave node(demo database) for one of the object ‘dtest’:

demo=# select oid,relfilenode,relname from pg_class where relname='dtest';
  oid  | relfilenode | relname
-------+-------------+---------
 26119 |       26119 | detest
(1 row)
demo=# select tab_id,tab_reloid,tab_relname from _rf.sl_table where tab_relname='dtest';
 tab_id | tab_reloid | tab_relname
--------+------------+-------------
      2 |      26119 | dtest
(1 row)

Ok, ‘dtest’ OID 26119 stored in slony catalog in sl_table.tabreloid.(Slony schema _rf). Take the logical backup and restore of same demo database simply to change the object OID like below: (Remember, Slon process are stopped at this moment)

-bash-4.1$ pg_dump -Fc -p 5432 -U postgres demo >/tmp/demo93.dmp
-bash-4.1$ psql -c "alter database demo rename to demo_bk;"
-bash-4.1$ psql -c "create database demo;"
-bash-4.1$ pg_restore -Fc -p 5432 -U postgres -d demo /tmp/demo93.dmp
-bash-4.1$ psql -c "select oid,relfilenode,relname from pg_class where relname='dtest';"
  oid  | relfilenode | relname
-------+-------------+---------
 26640 |       26640 | dtest
(1 row)
-bash-4.1$ psql -c "select tab_id,tab_reloid,tab_relname from _rf.sl_table where tab_relname='dtest';"
 tab_id | tab_reloid | tab_relname
--------+------------+-------------
      2 |      26119 | dtest
(1 row)

Now, pg_class.oid of ‘dtest’ has changed to 26640 whereas sl_table.tab_reloid still reflects the old OID 26119. At this stage if we start slon process it essentially stops with WARNING message on mismatch of OID by running a query pg_class.oid = sl_table.tabreloid. On returning false result it won’t move ahead until its fixed. We can also call the function slon_node_health_check() explicitly for verification :

demo=# select _rf.slon_node_health_check();
WARNING:  table [id,nsp,name]=[1,a,public] - sl_table does not match pg_class/pg_namespace
WARNING:  table [id,nsp,name]=[2,dtest,public] - sl_table does not match pg_class/pg_namespace
WARNING:  table [id,nsp,name]=[3,movepage,public] - sl_table does not match pg_class/pg_namespace
WARNING:  Mismatch found between sl_table and pg_class.  Slonik command REPAIR CONFIG may be useful to rectify this.
 slon_node_health_check
------------------------
 f
(1 row)

We can fix this in two ways.

  1. Using Slonik command line utility with preamble script REPAIR CONFIGorĀ 
  2. Using Slony catalog function updatereloid() within psql terminal.

Method 1: Create preamble script as below and execute with slonik command. I would be using second method, its just for reference.

demo=# \o /tmp/repair_conf.slonik
demo=# select 'REPAIR CONFIG ( SET ID = '||set_id||', EVENT NODE = 1 );' FROM _rf.sl_set;
demo=# \o

Add nodes information at the beginning of the file "/tmp/repair_conf.slonik"

cluster name = rf;
node 1 admin conninfo = 'host=localhost dbname=postgres user=postgres port=5432 password=postgres';
node 2 admin conninfo = 'host=localhost dbname=demo  user=postgres port=5432 password=postgres';

 REPAIR CONFIG ( SET ID = 1, EVENT NODE = 2 );
 REPAIR CONFIG ( SET ID = 2, EVENT NODE = 2 );
 REPAIR CONFIG ( SET ID = 3, EVENT NODE = 2 );

-bash-4.1$ slonik /tmp/repair_conf.slonik

Method 2: Pass the table-set id and node information to a function:

demo=# select _rf.updatereloid(tab_set,2) from _rf.sl_table ;   
 updatereloid
--------------
            1
            1
            1
(3 rows)

Cool, lets check the OID information now on slave node (demo database) from pg_class and _slonycatalog.sl_table

-bash-4.1$  psql -d demo -c "select oid,relfilenode,relname from pg_class where relname='dtest';"
  oid  | relfilenode | relname
-------+-------------+---------
 26119 |       26119 | dtest
(1 row)

-bash-4.1$ psql -d demo -c "select tab_id,tab_reloid,tab_relname from _rf.sl_table where tab_relname='dtest';"
 tab_id | tab_reloid | tab_relname
--------+------------+-------------
      2 |      26119 | dtest
(1 row)

After the update, slony will begin syncing without any issues.
Thanks to Slony-I team.

–Raghav