docs: tech overview: avoid 'we' and other small style fixes/additions

"we" should be avoided, it's never quite clear who is "we" in the
context here and it leads to some technical wrong meanings, e.g., we
(here assumed to be "we developers") do not read any backup data, the
Proxmox Backup client does.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht 2021-02-04 12:27:13 +01:00
parent 3253d8a2e4
commit efc09f63cc


@@ -59,11 +59,11 @@ blocks, and changing existing files changes only their own blocks.

 As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps', which
 can track the changed blocks of an image. Since these bitmap are also a
-representation of the image split into chunks, we have a direct relation
-between dirty blocks of the image and chunks we have to upload, so only
-modified chunks of the disk have to be uploaded for a backup.
+representation of the image split into chunks, there is a direct relation
+between dirty blocks of the image and chunks which need to get uploaded, so
+only modified chunks of the disk have to be uploaded for a backup.

-Since we always split the image into chunks of the same size, unchanged blocks
+Since the image is always split into chunks of the same size, unchanged blocks
 will result in identical checksums for those chunks, so such chunks do not need
 to be backed up again. This way storage snapshots are not needed to find the
 changed blocks.
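
The paragraph in this hunk says that fixed-size chunking makes unchanged blocks hash to identical checksums, so they need not be uploaded again. A minimal Python sketch of that idea follows. It is an illustration only, not the Proxmox Backup implementation: the function names, the in-memory `bytes` image, and the 4 MiB default chunk size are all assumptions made for the example.

```python
import hashlib

# Assumed chunk size for illustration; not taken from the real implementation.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_digests(image: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split an image into fixed-size chunks; return one SHA-256 hex digest per chunk."""
    return [
        hashlib.sha256(image[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(image), chunk_size)
    ]


def chunks_to_upload(old_digests: list[str], new_digests: list[str]) -> list[int]:
    """Return the indexes of chunks whose digest was not present in the previous backup."""
    known = set(old_digests)
    return [i for i, digest in enumerate(new_digests) if digest not in known]


# Tiny demonstration with 4-byte chunks: only the second chunk was modified.
previous = chunk_digests(b"aaaabbbbccccdddd", chunk_size=4)
current = chunk_digests(b"aaaaXXXXccccdddd", chunk_size=4)
changed = chunks_to_upload(previous, current)
```

Because the chunk boundaries are fixed, a modified block changes only the digest of the chunk containing it, which is why no storage snapshot is needed to locate the changes.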
@@ -126,21 +126,22 @@ approximation:

    p(n, d) = 1 - e^{-n^2/(2d)}

-Where `n` is the number of tries, and `d` is the number of possibilities. So
-for example, if we assume a large datastore of 1 PiB, and an average chunk size
-of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
-possibilities. Using the above formula we get that the probability of a
-collision in that scenario is:
+Where `n` is the number of tries, and `d` is the number of possibilities.
+For a concrete example lets assume a large datastore of 1 PiB, and an average
+chunk size of 4 MiB. That means :math:`n = 268435456` tries, and :math:`d =
+2^{256}` possibilities. Inserting those values in the formula from earlier you
+will see that the probability of a collision in that scenario is:

 .. math::

    3.1115 * 10^{-61}

-For context, in a lottery game of 6 of 45, the chance to correctly guess all 6
-numbers is only :math:`1.2277 * 10^{-7}`.
+For context, in a lottery game of guessing 6 out of 45, the chance to correctly
+guess all 6 numbers is only :math:`1.2277 * 10^{-7}`, that means the chance of
+collission is about the same as winning 13 such lotto games *in a row*.

-So it is extremely unlikely that such a collision would occur by accident in a
-normal datastore.
+In conclusion, it is extremely unlikely that such a collision would occur by
+accident in a normal datastore.

 Additionally, SHA-256 is prone to length extension attacks, but since there is
 an upper limit for how big the chunk are, this is not a problem, since a
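
The numbers in this hunk can be reproduced with a few lines of Python. This is a sketch under the stated assumptions (1 PiB datastore, 4 MiB average chunk size, 256-bit digests); the variable and function names are made up for the example. It uses `math.expm1` because evaluating `1 - exp(-x)` directly rounds to zero in floating point for values this small.

```python
import math


def collision_probability(n: int, d: int) -> float:
    """Birthday-bound approximation p(n, d) = 1 - e^{-n^2 / (2d)}."""
    x = n * n / (2 * d)
    # -expm1(-x) equals 1 - e^{-x} but stays accurate for very small x.
    return -math.expm1(-x)


n = (1 << 50) // (4 << 20)  # chunks in 1 PiB at 4 MiB each
d = 2 ** 256                # possible SHA-256 digests

p = collision_probability(n, d)

# Chance of guessing 6 out of 45 numbers correctly in a single game.
lottery = 1 / math.comb(45, 6)

# Number of consecutive lottery wins that would be about as unlikely as a collision.
equivalent_wins = math.log(p) / math.log(lottery)
```

With these inputs, `n` is 268435456, `p` is roughly 3.1115e-61, and `lottery` is roughly 1.2277e-7, matching the text. `equivalent_wins` comes out near 8.8 under this computation, so the "13" in the added sentence may be worth re-checking.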
@@ -152,9 +153,10 @@ File-based Backup

 Since dynamically sized chunks (for file-based backups) are created on a custom
 archive format (pxar) and not over the files directly, there is no relation
-between files and the chunks. This means we have to read all files again for
-every backup, otherwise it would not be possible to generate a consistent pxar
-archive where the original chunks can be reused.
+between files and the chunks. This means that the Proxmox Backup client has to
+read all files again for every backup, otherwise it would not be possible to
+generate a consistent independent pxar archive where the original chunks can be
+reused. Note that there will be still only new or change chunks be uploaded.

 Verification of encrypted chunks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^