To cater to the paranoid, a new datastore-wide setting "verify-new" is
introduced. When set, a verify job will be spawned right after a new
backup is added to the store (only verifying the added snapshot).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Force consumers to use the lookup_datastore method instead of
potentially opening a datastore twice, and pass the config we have
already loaded into open_with_path, removing the need for open(1).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Avoid races when updating manifest data by flocking a lock file.
update_manifest is used to ensure updates always happen with the lock
held.
Snapshot deletion also acquires the lock, so it cannot interfere with an
outstanding manifest write.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Removing a snapshot has some more safety checks which we don't want to
ignore when removing an entire group (i.e. locking the manifest and
notifying GC).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
There's no point in having that as a seperate method, just parse the
thing into a struct and write it back out correctly.
Also makes further changes to the method simpler.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
If we can't acquire a lock (either because the snapshot disappeared, it
is about to be forgotten/pruned, or it is currently still running) skip
the snapshot. Hold the lock during verification, so that it cannot be
deleted while we are still verifying.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
...to avoid it being forgotten or pruned while in use.
Update lock error message for deletions to be consistent.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
To untangle the server code from the actual backup
implementation.
It would be ideal if the whole backup/ dir could become its
own crate with minimal dependencies, certainly without
depending on the actual api server. That would then also be
used more easily to create forensic tools for all the data
file types we have in the backup repositories.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This is only acquired in those two methods, both as shared. So it has
no use.
It seems, that it was planned in the past that the index deletion
should take the exclusive, while read and write takes the shared
flock on the index, as one can guess from the lock comments in commit
0465218953
But then later, in commit c8ec450e37)
the documented semantics where changed to use a temp file and do an
atomic rename instead for atomicity.
The reader shared flock on the index file was done inbetween,
probably as preparatory step, but was not removed again when strategy
was changed to using the file rename instead.
Do so now, to avoid confusion of readers and a useless flock.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In theory, one can do std::mem::forget, and ignore the drop handler. With
the lifetime hack, this could result in a crash.
So we simply require 'static lifetime now (futures also needs that).
- remove chrono dependency
- depend on proxmox 0.3.8
- remove epoch_now, epoch_now_u64 and epoch_now_f64
- remove tm_editor (moved to proxmox crate)
- use new helpers from proxmox 0.3.8
* epoch_i64 and epoch_f64
* parse_rfc3339
* epoch_to_rfc3339_utc
* strftime_local
- BackupDir changes:
* store epoch and rfc3339 string instead of DateTime
* backup_time_to_string now return a Result
* remove unnecessary TryFrom<(BackupGroup, i64)> for BackupDir
- DynamicIndexHeader: change ctime to i64
- FixedIndexHeader: change ctime to i64
since converting from i64 epoch timestamp to DateTime is not always
possible. previously, passing invalid backup-time from client to server
(or vice-versa) panicked the corresponding tokio task. now we get proper
error messages including the invalid timestamp.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by either printing the original, out-of-range timestamp as-is, or
bailing with a proper error message instead of panicking.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
even if it can't be handled by chrono. silently replacing it with epoch
0 is confusing..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
We need to update the atime of chunk files if they already exist,
otherwise a concurrently running GC could sweep them away.
This is protected with ChunkStore.mutex, so the fstat/unlink does not
race with touching.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The iterator of get_chunk_iterator is extended with a third parameter
indicating whether the current file is a chunk (false) or a .bad file
(true).
Count their sizes to the total of removed bytes, since it also frees
disk space.
.bad files are only deleted if the corresponding chunk exists, i.e. has
been rewritten. Otherwise we might delete data only marked bad because
of transient errors.
While at it, also clean up and use nix::unistd::unlinkat instead of
unsafe libc calls.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This ensures that following backups will always upload the chunk,
thereby replacing it with a correct version again.
Format for renaming is <digest>.<counter>.bad where <counter> is used if
a chunk is found to be bad again before a GC cleans it up.
Care has been taken to deliberately only rename a chunk in conditions
where it is guaranteed to be an error in the chunk itself. Otherwise a
broken index file could lead to an unwanted mass-rename of chunks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Save the state ("ok" or "failed") and the UPID of the respective
verify task. With this we can easily allow to open the relevant task
log and show when the last verify happened.
As we already load the manifest when listing the snapshots, just add
it there directly.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>