Commit f815de7

ajkr authored and Connor1996 committed
Ensure writes to WAL tail during FlushWAL(true /* sync */) will be synced (facebook#10560)
Summary: WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it. Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced.

This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced.

Pull Request resolved: facebook#10560

Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found

Reviewed By: riversand963

Differential Revision: D38957391

Pulled By: ajkr

fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26
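The race described in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `TrackedWal`, `WalModel`, `Policy`, and `RunScenario` are hypothetical names invented for illustration, byte counts stand in for real WAL records, and none of this is RocksDB code. It replays the sequence "append, sync, append + switch, mark synced, append, sync again" and reports how many appended bytes were never covered by any sync.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for per-WAL bookkeeping (not RocksDB's type).
struct TrackedWal {
  explicit TrackedWal(uint64_t n) : number(n) {}
  uint64_t number;             // WAL file number
  uint64_t flushed_size = 0;   // bytes appended to this WAL so far
  uint64_t pre_sync_size = 0;  // bytes covered by the most recent sync
  uint64_t synced_size = 0;    // bytes already counted as durable
};

enum class Policy { kBeforeFix, kAfterFix };

class WalModel {
 public:
  WalModel() { logs_.push_back(TrackedWal(1)); }

  void Append(uint64_t bytes) {
    logs_.back().flushed_size += bytes;
    appended_ += bytes;
  }

  // The current WAL becomes inactive; a new active WAL is tracked.
  void SwitchWal() { logs_.push_back(TrackedWal(logs_.back().number + 1)); }

  // Sync phase: remember how much of each tracked WAL the sync covers.
  uint64_t BeginSync() {
    for (auto& wal : logs_) wal.pre_sync_size = wal.flushed_size;
    return logs_.back().number;
  }

  // Mark phase: runs after the sync, possibly after concurrent writes.
  void MarkSynced(uint64_t up_to, Policy policy) {
    for (auto it = logs_.begin(); it != logs_.end() && it->number <= up_to;) {
      durable_ += it->pre_sync_size - it->synced_size;
      it->synced_size = it->pre_sync_size;
      const bool inactive = it->number < logs_.back().number;
      const bool fully_synced = it->pre_sync_size == it->flushed_size;
      // Before the fix: every inactive WAL is untracked.
      // After the fix: only inactive WALs with no unsynced tail are untracked.
      if (inactive && (policy == Policy::kBeforeFix || fully_synced)) {
        it = logs_.erase(it);
      } else {
        ++it;
      }
    }
  }

  uint64_t UnsyncedBytes() const { return appended_ - durable_; }

 private:
  std::vector<TrackedWal> logs_;  // back() is the active WAL
  uint64_t appended_ = 0;
  uint64_t durable_ = 0;
};

// Replays the scenario from the summary with 10-byte writes.
uint64_t RunScenario(Policy policy) {
  WalModel db;
  db.Append(10);                    // "key1"
  uint64_t up_to = db.BeginSync();  // first sync covers 10 bytes of WAL 1
  db.Append(10);                    // "key2" lands in WAL 1 after the sync
  db.SwitchWal();                   // WAL 1 is now inactive
  db.MarkSynced(up_to, policy);
  db.Append(10);                    // "key3" goes to WAL 2
  up_to = db.BeginSync();           // second sync
  db.MarkSynced(up_to, policy);
  return db.UnsyncedBytes();
}
```

In this model, `RunScenario(Policy::kBeforeFix)` returns 10 (the "key2" bytes are never synced because WAL 1 was untracked too early), while `RunScenario(Policy::kAfterFix)` returns 0.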
1 parent 3dba9fa commit f815de7

3 files changed

Lines changed: 60 additions & 5 deletions


HISTORY.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # Rocksdb Change Log

 ## Unreleased
+### Bug Fixes
+* Fixed bug where `FlushWAL(true /* sync */)` (used by `GetLiveFilesStorageInfo()`, which is used by checkpoint and backup) could cause parallel writes at the tail of a WAL file to never be synced.

 ### Bug Fixes

db/db_impl/db_impl.cc

Lines changed: 13 additions & 5 deletions
@@ -1488,20 +1488,28 @@ void DBImpl::MarkLogsSynced(uint64_t up_to, bool synced_dir,
     auto& wal = *it;
     assert(wal.IsSyncing());

-    if (logs_.size() > 1) {
+    if (wal.number < logs_.back().number) {
+      // Inactive WAL
       if (immutable_db_options_.track_and_verify_wals_in_manifest &&
           wal.GetPreSyncSize() > 0) {
         synced_wals->AddWal(wal.number, WalMetadata(wal.GetPreSyncSize()));
       }
-      logs_to_free_.push_back(wal.ReleaseWriter());
-      it = logs_.erase(it);
+      if (wal.GetPreSyncSize() == wal.writer->file()->GetFlushedSize()) {
+        // Fully synced
+        logs_to_free_.push_back(wal.ReleaseWriter());
+        it = logs_.erase(it);
+      } else {
+        assert(wal.GetPreSyncSize() < wal.writer->file()->GetFlushedSize());
+        wal.FinishSync();
+        ++it;
+      }
     } else {
+      assert(wal.number == logs_.back().number);
+      // Active WAL
       wal.FinishSync();
       ++it;
     }
   }
-  assert(logs_.empty() || logs_[0].number > up_to ||
-         (logs_.size() == 1 && !logs_[0].IsSyncing()));
   log_sync_cv_.SignalAll();
 }
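The new branch structure in `MarkLogsSynced()` amounts to a three-way decision per WAL. The following is a minimal sketch of that decision as a pure function, assuming the sizes and WAL numbers are passed in directly; `WalAction` and `DecideAfterSync` are hypothetical names used only for illustration and do not exist in RocksDB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome labels for one WAL after a sync completes.
enum class WalAction {
  kKeepActive,           // active WAL: finish the sync, keep tracking it
  kUntrack,              // inactive and fully synced: release writer, erase
  kKeepInactivePartial,  // inactive with an unsynced tail: keep tracking it
};

// Mirrors the shape of the branches in the diff above: a WAL is "inactive"
// when a newer WAL exists, and "fully synced" when the bytes covered by the
// sync equal the bytes flushed to the file.
WalAction DecideAfterSync(uint64_t wal_number, uint64_t newest_wal_number,
                          uint64_t pre_sync_size, uint64_t flushed_size) {
  assert(wal_number <= newest_wal_number);
  assert(pre_sync_size <= flushed_size);
  if (wal_number == newest_wal_number) {
    return WalAction::kKeepActive;
  }
  if (pre_sync_size == flushed_size) {
    return WalAction::kUntrack;
  }
  return WalAction::kKeepInactivePartial;
}
```

The third outcome is the case the old `logs_.size() > 1` check missed: it treated every WAL as untrackable whenever more than one was tracked.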

db/db_write_test.cc

Lines changed: 45 additions & 0 deletions
@@ -386,6 +386,51 @@ TEST_P(DBWriteTest, UnflushedPutRaceWithTrackedWalSync) {
   Close();
 }

+TEST_P(DBWriteTest, InactiveWalFullySyncedBeforeUntracked) {
+  // Repro bug where a WAL is appended and switched after
+  // `FlushWAL(true /* sync */)`'s sync finishes and before it untracks fully
+  // synced inactive logs. Previously such a WAL would be wrongly untracked
+  // so the final append would never be synced.
+  Options options = GetOptions();
+  std::unique_ptr<FaultInjectionTestEnv> fault_env(
+      new FaultInjectionTestEnv(env_));
+  options.env = fault_env.get();
+  Reopen(options);
+
+  ASSERT_OK(Put("key1", "val1"));
+
+  SyncPoint::GetInstance()->SetCallBack(
+      "DBImpl::SyncWAL:BeforeMarkLogsSynced:1", [this](void* /* arg */) {
+        ASSERT_OK(Put("key2", "val2"));
+        ASSERT_OK(dbfull()->TEST_SwitchMemtable());
+      });
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
+
+  ASSERT_OK(db_->FlushWAL(true /* sync */));
+
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
+
+  ASSERT_OK(Put("key3", "val3"));
+
+  ASSERT_OK(db_->FlushWAL(true /* sync */));
+
+  Close();
+
+  // Simulate full loss of unsynced data. This should drop nothing since we did
+  // `FlushWAL(true /* sync */)` before `Close()`.
+  fault_env->DropUnsyncedFileData();
+
+  Reopen(options);
+
+  ASSERT_EQ("val1", Get("key1"));
+  ASSERT_EQ("val2", Get("key2"));
+  ASSERT_EQ("val3", Get("key3"));
+
+  // Need to close before `fault_env` goes out of scope.
+  Close();
+}
+
 TEST_P(DBWriteTest, IOErrorOnWALWriteTriggersReadOnlyMode) {
   std::unique_ptr<FaultInjectionTestEnv> mock_env(
       new FaultInjectionTestEnv(env_));
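The repro test relies on dropping unsynced file data to simulate a crash. A toy model of that idea, assuming a file is just a byte string plus a "last synced" offset, might look like the sketch below. `ToyFile` is a hypothetical class written for illustration; it is not `FaultInjectionTestEnv` and does not reflect how that class is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy file: tracks contents and how much of them is durable.
class ToyFile {
 public:
  // Buffered write: visible to readers but not yet durable.
  void Append(const std::string& data) { contents_ += data; }

  // Everything appended so far becomes durable.
  void Sync() { synced_len_ = contents_.size(); }

  // Simulated crash: bytes past the last synced offset are lost.
  void DropUnsyncedData() { contents_.resize(synced_len_); }

  const std::string& contents() const { return contents_; }

 private:
  std::string contents_;
  std::size_t synced_len_ = 0;
};
```

With this model, appending "key1", calling `Sync()`, appending "key2", and then calling `DropUnsyncedData()` leaves only "key1" in `contents()`, which is the same kind of loss the test reports as "key2" not found when the fix is absent.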
