Same problem, same WAL file:

2016-08-16 02:58:28.984 CDT replicator@[unknown][17378:idle] 10.100.128.210(49089) walreceiver : ERROR: requested WAL segment 00000002000022F400000096 has already been removed

The oldest WAL on the master is dated yesterday, August 15, at 3:17pm. The backup started yesterday at 8:40pm, over 5 hours after the oldest WAL on the master, as shown below:

-rw-r--r-- 1 postgres postgres       33 May 12  2015 00000002.history
-rw------- 1 postgres postgres 16777216 Aug 15 15:17 000000020000230F000000E2
-rw------- 1 postgres postgres 16777216 Aug 15 15:17 000000020000230F000000E3
-rw------- 1 postgres postgres 16777216 Aug 15 15:18 000000020000230F000000E4
-rw------- 1 postgres postgres 16777216 Aug 15 15:18 000000020000230F000000E5
-rw------- 1 postgres postgres 16777216 Aug 15 15:18 000000020000230F000000E6
-rw------- 1 postgres postgres 16777216 Aug 15 15:18 000000020000230F000000E7
-rw------- 1 postgres postgres 16777216 Aug 15 15:18 000000020000230F000000E8
-rw------- 1 postgres postgres 16777216 Aug 15 15:19 000000020000230F000000E9
-rw------- 1 postgres postgres 16777216 Aug 15 15:19 000000020000230F000000EA
-rw------- 1 postgres postgres 16777216 Aug 15 15:19 000000020000230F000000EB
-rw------- 1 postgres postgres 16777216 Aug 15 15:20 000000020000230F000000EC
-rw------- 1 postgres postgres 16777216 Aug 15 15:20 000000020000230F000000ED
-rw------- 1 postgres postgres 16777216 Aug 15 15:20 000000020000230F000000EE
-rw------- 1 postgres postgres 16777216 Aug 15 15:20 000000020000230F000000EF
-rw------- 1 postgres postgres 16777216 Aug 15 15:20 000000020000230F000000F0
-rw------- 1 postgres postgres 16777216 Aug 15 15:21 000000020000230F000000F1
-rw------- 1 postgres postgres 16777216 Aug 15 15:21 000000020000230F000000F2
-rw------- 1 postgres postgres 16777216 Aug 15 15:21 000000020000230F000000F3
-rw------- 1 postgres postgres 16777216 Aug 15 15:22 000000020000230F000000F4
-rw------- 1 postgres postgres 16777216 Aug 15 15:22 000000020000230F000000F5
-rw------- 1 postgres postgres 16777216 Aug 15 15:22 000000020000230F000000F6
--More--

After starting the slave, it went through all the WALs, starting with:

2016-08-16 02:50:38.727 T00 INFO: get WAL segment 000000020000230E00000015
2016-08-16 02:50:38.898 CDT [unknown]@[unknown][15758:] [local] [unknown] : LOG: connection received: host=[local]
2016-08-16 02:50:38.898 CDT postgres@postgres[15758:] [local] [unknown] : FATAL: the database system is starting up
2016-08-16 02:50:39.053 T00 INFO: archive-get stop
2016-08-16 02:50:39.095 CDT @[15744:] : LOG: restored log file "000000020000230E00000015" from archive
2016-08-16 02:50:39.165 CDT @[15744:] : LOG: redo starts at 230E/15002B38

Here is the snippet at the end where the error first occurs:

2016-08-16 02:56:02.908 T00 INFO: archive-get start: --buffer-size=8388608 --compress --db-path=/pgdata/lenderx --log-level-console=info --log-level-file=info --log-path=/pgarchive/pgbackrest/log/lenderx --repo-path=/pgarchive/pgbackrest --stanza=lenderx
2016-08-16 02:56:02.908 T00 INFO: get WAL segment 000000020000230F0000008E
2016-08-16 02:56:03.350 T00 INFO: archive-get stop
2016-08-16 02:56:03.393 CDT @[15744:] : LOG: restored log file "000000020000230F0000008E" from archive
2016-08-16 02:56:03.521 CDT [unknown]@[unknown][17049:] 10.100.128.210(49031) [unknown] : LOG: connection received: host=10.100.128.210 port=49031
2016-08-16 02:56:03.650 CDT replicator@[unknown][17049:authentication] 10.100.128.210(49031) [unknown] : LOG: replication connection authorized: user=replicator SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=on)
2016-08-16 02:56:03.708 CDT replicator@[unknown][17049:idle] 10.100.128.210(49031) walreceiver : ERROR: requested WAL segment 00000002000022F400000096 has already been removed
2016-08-16 02:56:03.736 CDT replicator@[unknown][17049:idle] 10.100.128.210(49031) walreceiver : LOG: disconnection: session time: 0:00:00.214 user=replicator database= host=10.100.128.210 port=49031
2016-08-16 02:56:03.838 T00 INFO: archive-get start: --buffer-size=8388608 --compress --db-path=/pgdata/lenderx --log-level-console=info --log-level-file=info --log-path=/pgarchive/pgbackrest/log/lenderx --repo-path=/pgarchive/pgbackrest --stanza=lenderx
2016-08-16 02:56:03.839 T00 INFO: get WAL segment 000000020000230F0000008F
2016-08-16 02:56:04.198 T00 INFO: archive-get stop
2016-08-16 02:56:04.235 CDT @[15744:] : LOG: restored log file "000000020000230F0000008F" from archive
2016-08-16 02:56:04.735 T00 INFO: archive-get start: --buffer-size=8388608 --compress --db-path=/pgdata/lenderx --log-level-console=info --log-level-file=info --log-path=/pgarchive/pgbackrest/log/lenderx --repo-path=/pgarchive/pgbackrest --stanza=lenderx
2016-08-16 02:56:04.736 T00 INFO: get WAL segment 000000020000230F00000090
2016-08-16 02:56:05.069 T00 INFO: archive-get stop
2016-08-16 02:56:05.110 CDT @[15744:] : LOG: restored log file "000000020000230F00000090" from archive
2016-08-16 02:56:05.562 T00 INFO: archive-get start: --buffer-size=8388608 --compress --db-path=/pgdata/lenderx --log-level-console=info --log-level-file=info --log-path=/pgarchive/pgbackrest/log/lenderx --repo-path=/pgarchive/pgbackrest --stanza=lenderx
2016-08-16 02:56:05.562 T00 INFO: get WAL segment 000000020000230F00000091
2016-08-16 02:56:05.903 T00 INFO: archive-get stop

Then it continues to get WALs up to the timeline streaming message:

2016-08-16 02:57:07.438 T00 INFO: archive-get start: --buffer-size=8388608 --compress --db-path=/pgdata/lenderx --log-level-console=info --log-level-file=info --log-path=/pgarchive/pgbackrest/log/lenderx --repo-path=/pgarchive/pgbackrest --stanza=lenderx
2016-08-16 02:57:07.438 T00 INFO: get WAL segment 000000020000230F000000DF
2016-08-16 02:57:07.443 T00 INFO: unable to find 000000020000230F000000DF in the archive
2016-08-16 02:57:07.444 T00 INFO: archive-get stop
2016-08-16 02:57:07.515 CDT @[17284:] : LOG: fetching timeline history file for timeline 2 from primary server
2016-08-16 02:57:07.524 CDT @[17284:] : LOG: started streaming WAL from primary at 230F/DE000000 on timeline 2
2016-08-16 02:57:08.944 CDT [unknown]@[unknown][17285:] 10.100.128.210(49064) [unknown] : LOG: connection received: host=10.100.128.210 port=49064
2016-08-16 02:57:09.066 CDT replicator@[unknown][17285:authentication] 10.100.128.210(49064) [unknown] : LOG: replication connection authorized: user=replicator SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=on)
2016-08-16 02:57:09.120 CDT replicator@[unknown][17285:idle] 10.100.128.210(49064) walreceiver : ERROR: requested WAL segment 00000002000022F400000096 has already been removed

There is no problem with space for pg_xlog on the master or the slave:

/dev/mapper/vg02-pgxloglv   60G   24G   37G  39% /pgxlog
/dev/mapper/vg01-pgxloglv   60G  7.6G   53G  13% /pgxlog

There is nothing marked .ready on the master.
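In case it helps line the numbers up, here is the sanity check I would run on the master (a minimal sketch only, assuming 9.x function names, consistent with the pg_xlog_* functions used further down); the two LSNs are simply taken from the log above:

-- Map LSNs from the log to segment file names on the master's current timeline (2).
-- '230F/DE000000' is where streaming started per the log; '22F4/96000000' is the
-- start of the segment named in the "already been removed" error.
select pg_xlogfile_name('230F/DE000000');
select pg_xlogfile_name('22F4/96000000');

-- How many old segments the master is configured to keep around for streaming standbys:
show wal_keep_segments;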
There is another slave pulling from the master (mine is 10.80.129.46):

postgres=# select * from pg_stat_replication;
  pid  | usesysid |  usename   | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+------------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 12874 | 30315692 | replicator | walreceiver      | 10.80.129.46   |                 |       57843 | 2016-08-16 02:57:07.494827-05 |              | streaming | 230F/FB4994C8 | 230F/FB4994C8  | 230F/FB4994C8  | 230F/FB4992A8   |             0 | async
  5075 | 30315692 | replicator | walreceiver      | 10.100.128.209 |                 |       46847 | 2016-08-13 13:12:48.218157-05 |              | streaming | 230F/FB4994C8 | 230F/FB4994C8  | 230F/FB4994C8  | 230F/FB4994C8   |             0 | async
(2 rows)

On the slave I run:

postgres=# select now() as now,
             now() - pg_last_xact_replay_timestamp() as lagtime,
             pg_last_xact_replay_timestamp() as replay_timestamp,
             pg_last_xlog_receive_location() as receive_loc,
             pg_last_xlog_replay_location() as replay_loc,
             CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
                  THEN 0
                  ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
             END AS log_delay,
             (select pg_xlog_location_diff(write_location, replay_location) as log_cnt_diff
                from pg_stat_replication);

              now              |     lagtime     |       replay_timestamp        |  receive_loc  |  replay_loc   | log_delay | log_cnt_diff
-------------------------------+-----------------+-------------------------------+---------------+---------------+-----------+--------------
 2016-08-16 03:54:36.145632-05 | 00:00:00.040109 | 2016-08-16 03:54:36.105523-05 | 230F/FC184DD0 | 230F/FC184DD0 |         0 |
(1 row)
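As a rough gauge (again just a sketch, using the same 9.x functions as the queries above), the gap between the master's current write position and the start of the segment the walreceiver keeps requesting can be computed directly on the master:

-- '22F4/96000000' is the start of segment 00000002000022F400000096 from the error;
-- the result is the number of bytes between it and the master's current xlog position.
select pg_xlog_location_diff(pg_current_xlog_location(), '22F4/96000000') as bytes_past_requested_segment;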