docs: add more thoughts about chunk size
Modern SSDs are much faster; let's assume the following::

  MAX(64KB) = 354 MB/s;
  MAX(4KB) = 67 MB/s;
  MAX(1KB) = 18 MB/s;
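As a sketch, each throughput figure above is just random-read operations per second multiplied by the request size. The IOPS values below are back-computed from the table and are assumptions for illustration, not benchmark results:

```python
def throughput_mb_s(iops, chunk_size_bytes):
    # Random-read throughput = operations per second * bytes per operation.
    return iops * chunk_size_bytes / (1024 * 1024)

# IOPS figures back-computed from the table above (hypothetical):
print(round(throughput_mb_s(5664, 64 * 1024)))   # 64KB requests -> 354 MB/s
print(round(throughput_mb_s(17152, 4 * 1024)))   # 4KB requests  -> 67 MB/s
print(round(throughput_mb_s(18432, 1024)))       # 1KB requests  -> 18 MB/s
```

Note how the achievable bandwidth collapses as the request size shrinks, even though the drive performs *more* operations per second.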
Also, the average chunk size (ACS) directly relates to the number of
chunks produced by a backup::

  CHUNK_COUNT = BACKUP_SIZE / ACS
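For example, plugging hypothetical numbers into the formula (a 1 TB backup with a 4MB average chunk size; both figures are made up for illustration):

```python
# CHUNK_COUNT = BACKUP_SIZE / ACS, with hypothetical inputs.
BACKUP_SIZE = 1024**4        # 1 TB backup, in bytes
ACS = 4 * 1024**2            # 4 MB average chunk size, in bytes

CHUNK_COUNT = BACKUP_SIZE // ACS
print(CHUNK_COUNT)  # 262144
```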
Here are some statistics from my developer workstation::

  Disk Usage: 65 GB
  Directories: 58971
  Files: 726314
  Files < 64KB: 617541
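Statistics like these can be gathered with a simple directory walk. The sketch below (`fs_stats` is a made-up helper, not part of any project code) counts directories, files, files under 64KB, and total bytes, and demonstrates itself on a throwaway directory:

```python
import os
import tempfile

def fs_stats(root):
    """Count directories, files, files < 64KB, and total bytes under root."""
    dirs = files = small = usage = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            files += 1
            usage += size
            if size < 64 * 1024:
                small += 1
    return dirs, files, small, usage

# Demo on a temporary directory with one small and one large file.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "small.txt"), "wb") as f:
        f.write(b"x" * 100)            # 100 bytes -> counts as < 64KB
    with open(os.path.join(root, "big.bin"), "wb") as f:
        f.write(b"x" * (128 * 1024))   # 128 KB -> not a small file
    print(fs_stats(root))  # (1, 2, 1, 131172)
```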
As you see, there are very many small files. If we did file-level
deduplication, i.e. generated one chunk per file, we would end up with
more than 700000 chunks.

Instead, our current algorithm produces only large chunks, with an
average chunk size of 4MB. With the above data, this produces about
15000 chunks (a factor of 50 fewer chunks).
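The comparison can be reproduced as a quick back-of-the-envelope calculation from the statistics above. The exact ratio depends on how much of the 65 GB actually ends up in chunks, so the README's rounder figures (about 15000 chunks, factor 50) land in the same ballpark:

```python
FILES = 726314               # files on the workstation (statistics above)
DISK_USAGE = 65 * 1024**3    # 65 GB, in bytes
ACS = 4 * 1024**2            # 4MB average chunk size

file_level_chunks = FILES                # one chunk per file
content_chunks = DISK_USAGE // ACS       # chunks at 4MB average size
ratio = file_level_chunks / content_chunks
print(content_chunks, round(ratio))      # 16640 44
```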