//! This module implements the proxmox backup data storage
//!
//! Proxmox backup splits large files into chunks, and stores them
//! deduplicated using a content addressable storage format.
//!
//! A chunk is simply defined as a binary blob, which is stored inside a
//! `ChunkStore`, addressed by the SHA256 digest of the binary blob.
//!
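/*!
A minimal sketch of the content-addressable idea, assuming the caller
has already computed the SHA256 digest of the data. `MemoryChunkStore`,
`insert_chunk`, `chunk_count` and `digest_to_hex` are hypothetical names
used for illustration only; they are not the real `ChunkStore` API.

```rust
use std::collections::HashMap;

/// A 32-byte digest, as produced by SHA256.
type Digest = [u8; 32];

/// Hypothetical in-memory stand-in for a chunk store.
#[derive(Default)]
struct MemoryChunkStore {
    chunks: HashMap<Digest, Vec<u8>>,
}

impl MemoryChunkStore {
    /// Stores `data` under `digest`. Returns `true` if the chunk was
    /// already present, in which case nothing new is stored.
    ///
    /// Assumption: `digest` is the SHA256 digest of `data`.
    fn insert_chunk(&mut self, digest: Digest, data: &[u8]) -> bool {
        if self.chunks.contains_key(&digest) {
            return true;
        }
        self.chunks.insert(digest, data.to_vec());
        false
    }

    /// Number of distinct chunks currently stored.
    fn chunk_count(&self) -> usize {
        self.chunks.len()
    }
}

/// Renders a digest as lowercase hex, e.g. for use as a chunk name.
fn digest_to_hex(digest: &Digest) -> String {
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Because the key is derived from the content, inserting the same data
twice leaves a single stored chunk.
*/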
//! Index files are used to reconstruct the original file. They
//! basically contain a list of SHA256 checksums. The `DynamicIndex*`
//! format is able to deal with dynamic chunk sizes, whereas the
//! `FixedIndex*` format is an optimization to store a list of equally
//! sized chunks.
//!
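/*!
The difference between the two index flavours can be sketched as
follows. The types and field layouts here are assumptions made for
illustration, not the on-disk `FixedIndex*` or `DynamicIndex*` formats.
The sketch also assumes `chunk_size` is greater than zero.

```rust
type Digest = [u8; 32];

/// Hypothetical fixed-size index: chunk `i` covers the byte range
/// `[i * chunk_size, (i + 1) * chunk_size)`.
struct FixedIndexSketch {
    chunk_size: u64,
    digests: Vec<Digest>,
}

impl FixedIndexSketch {
    /// Position of the chunk containing `offset`, found by division.
    fn chunk_for_offset(&self, offset: u64) -> Option<usize> {
        let idx = (offset / self.chunk_size) as usize;
        if idx < self.digests.len() { Some(idx) } else { None }
    }
}

/// Hypothetical dynamic index: each entry records where its chunk ends.
struct DynamicIndexSketch {
    /// `(end_offset, digest)` pairs, sorted by `end_offset`.
    entries: Vec<(u64, Digest)>,
}

impl DynamicIndexSketch {
    /// Position of the chunk containing `offset`, found by binary search.
    fn chunk_for_offset(&self, offset: u64) -> Option<usize> {
        // First entry whose end offset lies beyond `offset`.
        let idx = self.entries.partition_point(|(end, _)| *end <= offset);
        if idx < self.entries.len() { Some(idx) } else { None }
    }
}
```

With equally sized chunks no offsets need to be stored at all, which
is what makes the fixed variant the cheaper special case.
*/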
//! # ChunkStore Locking
//!
//! We need to be able to restart the proxmox-backup service daemons,
//! so that we can update the software without rebooting the host. But
//! such restarts must not abort running backup jobs, so we need to
//! keep the old service running until those jobs are finished. This
//! implies that we need some kind of locking for the
//! ChunkStore. Please note that it is perfectly valid to have
//! multiple parallel ChunkStore writers, even when they write the
//! same chunk (because the chunk would have the same name and the
//! same data). The only real problem is garbage collection, because
//! we need to avoid deleting chunks which are still referenced.
//!
//! * Read Index Files:
//!
//!   Acquire shared lock for .idx files.
//!
//!
//! * Delete Index Files:
//!
//!   Acquire exclusive lock for .idx files. This makes sure that we do
//!   not delete index files while they are still in use.
//!
//!
//! * Create Index Files:
//!
//!   Acquire shared lock for ChunkStore (process wide).
//!
//!   Note: When creating .idx files, we create a temporary (.tmp) file,
//!   then do an atomic rename ...
//!
//!
//! * Garbage Collect:
//!
//!   Acquire exclusive lock for ChunkStore (process wide). If we
//!   already have a shared lock for ChunkStore, try to upgrade that
//!   lock.
//!
//!
//! * Server Restart:
//!
//!   Try to abort running garbage collection to release the exclusive
//!   ChunkStore lock asap. Start new service with existing listening
//!   socket.
//!
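/*!
The shared versus exclusive rules above can be illustrated with a
small in-process sketch. It uses `std::sync::RwLock` purely as a
stand-in: a lock that has to coordinate an old and a new service
process needs a different mechanism, which is not shown here.
`ChunkStoreLockSketch` and its methods are hypothetical names.

```rust
use std::sync::RwLock;

/// Hypothetical in-process stand-in for the ChunkStore lock.
struct ChunkStoreLockSketch {
    lock: RwLock<()>,
}

impl ChunkStoreLockSketch {
    fn new() -> Self {
        Self { lock: RwLock::new(()) }
    }

    /// Runs `job` (e.g. creating an index file) under a shared lock.
    /// Any number of such jobs may run at the same time.
    fn with_shared<T>(&self, job: impl FnOnce() -> T) -> T {
        let _guard = self.lock.read().unwrap();
        job()
    }

    /// Runs `gc` under an exclusive lock. Returns `None` without
    /// running `gc` if the lock is currently held by anyone else.
    fn try_garbage_collect<T>(&self, gc: impl FnOnce() -> T) -> Option<T> {
        let _guard = self.lock.try_write().ok()?;
        Some(gc())
    }
}
```

The non-blocking `try_write` mirrors the intent that garbage
collection must never run while a writer still holds the shared lock.
*/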
//!
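/*!
The "temporary file, then atomic rename" note for index creation can
be sketched with the standard library alone. The helper name
`write_file_atomically` and the choice to append `.tmp` to the full
target name are assumptions; the rename is only atomic when both
paths are on the same filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `data` to `<target>.tmp` and then renames it to `target`,
/// so readers see either the old file or the complete new one.
fn write_file_atomically(target: &Path, data: &[u8]) -> io::Result<()> {
    // Build the temporary name by appending ".tmp" (an assumption).
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(data)?;
    // Flush file contents to disk before making the file visible.
    file.sync_all()?;
    drop(file);

    fs::rename(&tmp_path, target)
}
```

A reader that opens the target path therefore never observes a
partially written index file.
*/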
//! # Garbage Collection (GC)
//!
//! Deleting backups is as easy as deleting the corresponding .idx
//! files. Unfortunately, this does not free up any storage, because
//! those files just contain references to chunks.
//!
//! To free up some storage, we run a garbage collection process at
//! regular intervals. The collector uses a mark and sweep
//! approach. In the first phase, it scans all .idx files to mark used
//! chunks. The second phase then removes all unmarked chunks from the
//! store.
//!
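/*!
The two phases can be sketched over in-memory data. This is only an
illustration: indexes are modelled as plain digest lists, the store as
a `HashMap`, and the marks are kept in a `HashSet` for brevity. The
function names `mark` and `sweep` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

type Digest = [u8; 32];

/// Phase 1: collect every digest referenced by any index.
fn mark(indexes: &[Vec<Digest>]) -> HashSet<Digest> {
    indexes.iter().flatten().copied().collect()
}

/// Phase 2: remove all chunks that were not marked.
/// Returns the number of removed chunks.
fn sweep(store: &mut HashMap<Digest, Vec<u8>>, marked: &HashSet<Digest>) -> usize {
    let before = store.len();
    store.retain(|digest, _| marked.contains(digest));
    before - store.len()
}
```

A chunk referenced by several indexes is marked once and survives
until the last index referencing it has been deleted.
*/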
//! The above locking mechanism makes sure that we are the only
//! process running GC. But we still want to be able to create backups
//! during GC, so there may be multiple backup threads/tasks
//! running, either started before GC started, or started while GC is
//! running.
//!
//! ## `atime` based GC
//!
//! The idea here is to mark chunks by updating the `atime` (access
//! timestamp) on the chunk file. This is quite simple and does not
//! need additional RAM.
//!
//! One minor problem is that recent Linux versions use the `relatime`
//! mount flag by default for performance reasons (yes, we want
//! that). When enabled, `atime` data is written to the disk only if
//! the file has been modified since the `atime` data was last updated
//! (`mtime`), or if the file was last accessed more than a certain
//! amount of time ago (by default 24h). So we may only delete chunks
//! with `atime` older than 24 hours.
//!
//! Another problem arises from running backups. The mark phase does
//! not find any chunks from those backups, because there is no .idx
//! file for them yet (it is created after the backup). Chunks created
//! or touched by those backups may have an `atime` as old as the start
//! time of those backups. Please note that the backup start time may
//! predate the GC start time. So we may only delete chunks older than
//! the start time of those running backup jobs.
//!
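/*!
A literal reading of the two rules above can be written as a cutoff
computation. This is a sketch under assumptions: timestamps are Unix
seconds, the 24 hour window is measured back from the GC start, and
the names `deletion_cutoff` and `may_delete` are hypothetical. A real
implementation may want an additional safety margin.

```rust
/// The default `relatime` window of 24 hours, in seconds.
const RELATIME_WINDOW_SECS: i64 = 24 * 3600;

/// Returns the timestamp before which an unmarked chunk may be
/// deleted. Chunks with an `atime` at or after it must be kept.
fn deletion_cutoff(gc_start: i64, oldest_running_backup_start: Option<i64>) -> i64 {
    // Rule 1: only chunks with `atime` older than 24 hours.
    let relatime_cutoff = gc_start - RELATIME_WINDOW_SECS;
    // Rule 2: only chunks older than the oldest running backup.
    match oldest_running_backup_start {
        Some(backup_start) => relatime_cutoff.min(backup_start),
        None => relatime_cutoff,
    }
}

/// Whether a chunk with the given `atime` is old enough to delete.
fn may_delete(chunk_atime: i64, cutoff: i64) -> bool {
    chunk_atime < cutoff
}
```

Taking the minimum means the stricter of the two rules always wins,
so a long-running backup pushes the cutoff further into the past.
*/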
//!
//! ## Store `marks` in RAM using a HASH
//!
//! Not sure if this is better. TODO

/// Protocol identifier of the backup protocol, version 1.
///
/// This is a macro rather than a constant, so it expands to a string
/// literal and can be used where a literal is required (e.g. `concat!`).
#[macro_export]
macro_rules! PROXMOX_BACKUP_PROTOCOL_ID_V1 {
    () => { "proxmox-backup-protocol-v1" }
}

/// Protocol identifier of the backup reader protocol, version 1.
///
/// Expands to a string literal, like `PROXMOX_BACKUP_PROTOCOL_ID_V1`.
#[macro_export]
macro_rules! PROXMOX_BACKUP_READER_PROTOCOL_ID_V1 {
    () => { "proxmox-backup-reader-protocol-v1" }
}

mod file_formats;
pub use file_formats::*;

mod crypt_config;
pub use crypt_config::*;

mod key_derivation;
pub use key_derivation::*;

mod crypt_reader;
pub use crypt_reader::*;

mod crypt_writer;
pub use crypt_writer::*;

mod checksum_reader;
pub use checksum_reader::*;

mod checksum_writer;
pub use checksum_writer::*;

mod chunker;
pub use chunker::*;

mod data_chunk;
pub use data_chunk::*;

mod data_blob;
pub use data_blob::*;

mod data_blob_reader;
pub use data_blob_reader::*;

mod data_blob_writer;
pub use data_blob_writer::*;

mod catalog_blob;
pub use catalog_blob::*;

mod chunk_stream;
pub use chunk_stream::*;

mod chunk_stat;
pub use chunk_stat::*;

mod read_chunk;
pub use read_chunk::*;

mod chunk_store;
pub use chunk_store::*;

mod index;
pub use index::*;

mod fixed_index;
pub use fixed_index::*;

mod dynamic_index;
pub use dynamic_index::*;

mod backup_info;
pub use backup_info::*;

mod datastore;
pub use datastore::*;