docs: tech overview: avoid 'we' and other small style fixes/additions

"we" should be avoided, it's never quite clear who is "we" in the context here and it leads to some technical wrong meanings, e.g., we (here assumed to be "we developers") do not read any backup data, the Proxmox Backup client does.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
parent 3253d8a2e4
commit efc09f63cc
@@ -59,11 +59,11 @@ blocks, and changing existing files changes only their own blocks.
 
 As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps', which
 can track the changed blocks of an image. Since these bitmap are also a
-representation of the image split into chunks, we have a direct relation
-between dirty blocks of the image and chunks we have to upload, so only
-modified chunks of the disk have to be uploaded for a backup.
+representation of the image split into chunks, there is a direct relation
+between dirty blocks of the image and chunks which need to get uploaded, so
+only modified chunks of the disk have to be uploaded for a backup.
 
-Since we always split the image into chunks of the same size, unchanged blocks
+Since the image is always split into chunks of the same size, unchanged blocks
 will result in identical checksums for those chunks, so such chunks do not need
 to be backed up again. This way storage snapshots are not needed to find the
 changed blocks.
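The fixed-size chunking described in this hunk can be illustrated with a minimal sketch. It is not the Proxmox Backup implementation: the function names `chunk_digests` and `chunks_to_upload` are hypothetical, and the tiny chunk size of 4 bytes is chosen only to keep the example readable. It shows that unchanged blocks hash to already-known SHA-256 digests, so only the modified chunk needs to be uploaded.

```python
import hashlib


def chunk_digests(image: bytes, chunk_size: int) -> list[str]:
    """Split the image into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(image[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(image), chunk_size)
    ]


def chunks_to_upload(old_image: bytes, new_image: bytes, chunk_size: int) -> list[int]:
    """Return the indices of chunks in new_image whose digest is not already known."""
    known = set(chunk_digests(old_image, chunk_size))
    return [
        index
        for index, digest in enumerate(chunk_digests(new_image, chunk_size))
        if digest not in known
    ]


# Illustrative data: a 16-byte "disk image" and a copy with one modified byte.
old_image = bytes(16)
modified = bytearray(old_image)
modified[9] = 0xFF  # byte 9 lies in chunk 2 (bytes 8..11) for chunk_size = 4
new_image = bytes(modified)

print(chunks_to_upload(old_image, new_image, chunk_size=4))  # [2]
```

A dirty bitmap would let the client skip even the hashing of untouched chunks. The sketch only covers the checksum comparison that works without one.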
@@ -126,21 +126,22 @@ approximation:
 
 p(n, d) = 1 - e^{-n^2/(2d)}
 
-Where `n` is the number of tries, and `d` is the number of possibilities. So
-for example, if we assume a large datastore of 1 PiB, and an average chunk size
-of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
-possibilities. Using the above formula we get that the probability of a
-collision in that scenario is:
+Where `n` is the number of tries, and `d` is the number of possibilities.
+For a concrete example lets assume a large datastore of 1 PiB, and an average
+chunk size of 4 MiB. That means :math:`n = 268435456` tries, and :math:`d =
+2^{256}` possibilities. Inserting those values in the formula from earlier you
+will see that the probability of a collision in that scenario is:
 
 .. math::
 
 3.1115 * 10^{-61}
 
-For context, in a lottery game of 6 of 45, the chance to correctly guess all 6
-numbers is only :math:`1.2277 * 10^{-7}`.
+For context, in a lottery game of guessing 6 out of 45, the chance to correctly
+guess all 6 numbers is only :math:`1.2277 * 10^{-7}`, that means the chance of
+collission is about the same as winning 13 such lotto games *in a row*.
 
-So it is extremely unlikely that such a collision would occur by accident in a
-normal datastore.
+In conclusion, it is extremely unlikely that such a collision would occur by
+accident in a normal datastore.
 
 Additionally, SHA-256 is prone to length extension attacks, but since there is
 an upper limit for how big the chunk are, this is not a problem, since a
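The numbers in this hunk can be checked with a short sketch. The function name `collision_probability` is hypothetical. `math.expm1` is used because evaluating `1 - e^{-x}` directly would round to zero for such a small `x`. The last step estimates how many consecutive 6-out-of-45 wins are as unlikely as one collision. With these inputs the ratio of logarithms comes out near 8.8, so the "13 games" in the added line appears to be on the high side.

```python
import math


def collision_probability(n: int, d: int) -> float:
    """Birthday-bound approximation p(n, d) = 1 - e^{-n^2 / (2d)}."""
    x = n * n / (2 * d)
    # -expm1(-x) equals 1 - e^{-x} but stays accurate when x is tiny.
    return -math.expm1(-x)


# 1 PiB datastore with an average chunk size of 4 MiB, as in the text.
n = (1 << 50) // (4 << 20)  # 268435456 chunks
d = 1 << 256                # number of possible SHA-256 digests

p_collision = collision_probability(n, d)
p_lottery = 1 / math.comb(45, 6)  # chance to guess 6 out of 45 correctly

# Number of lottery wins in a row with the same probability as one collision.
equivalent_wins = math.log(p_collision) / math.log(p_lottery)

print(f"n = {n}")
print(f"p(collision) = {p_collision:.4e}")
print(f"p(lottery)   = {p_lottery:.4e}")
print(f"equivalent consecutive wins = {equivalent_wins:.2f}")
```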
@@ -152,9 +153,10 @@ File-based Backup
 
 Since dynamically sized chunks (for file-based backups) are created on a custom
 archive format (pxar) and not over the files directly, there is no relation
-between files and the chunks. This means we have to read all files again for
-every backup, otherwise it would not be possible to generate a consistent pxar
-archive where the original chunks can be reused.
+between files and the chunks. This means that the Proxmox Backup client has to
+read all files again for every backup, otherwise it would not be possible to
+generate a consistent independent pxar archive where the original chunks can be
+reused. Note that there will be still only new or change chunks be uploaded.
 
 Verification of encrypted chunks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
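The file-based case from this hunk can be sketched in the same spirit. All of it is an assumption made for illustration: `build_archive` is a hypothetical stand-in for a pxar stream, and the boundary rule (cut after every newline byte) is a toy replacement for a real content-defined chunker. The point is only that the whole archive has to be regenerated from all files, while just the chunks with unknown digests would be uploaded.

```python
import hashlib


def build_archive(files: dict[str, bytes]) -> bytes:
    """Hypothetical stand-in for an archive stream: names and contents, concatenated."""
    out = bytearray()
    for name in sorted(files):
        out += name.encode() + b"\n" + files[name] + b"\n"
    return bytes(out)


def dynamic_chunks(stream: bytes) -> list[bytes]:
    """Toy content-defined chunking: end a chunk after every newline byte."""
    chunks, start = [], 0
    for index, byte in enumerate(stream):
        if byte == 0x0A:
            chunks.append(stream[start:index + 1])
            start = index + 1
    if start < len(stream):
        chunks.append(stream[start:])
    return chunks


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


first_backup = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
second_backup = {**first_backup, "b.txt": b"beta, now longer"}

# Digests already present on the server after the first backup.
known = {digest(chunk) for chunk in dynamic_chunks(build_archive(first_backup))}

# The second run reads every file again to rebuild the full stream ...
second_chunks = dynamic_chunks(build_archive(second_backup))
# ... but only chunks with unknown digests would need to be uploaded.
to_upload = [chunk for chunk in second_chunks if digest(chunk) not in known]

print(to_upload)  # [b'beta, now longer\n']
```

Because the boundaries depend on content rather than on byte offsets, the chunks after the grown file still match their old digests even though their positions in the stream shifted.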