docs: add more thoughts about chunk size

Dietmar Maurer 2020-12-01 10:28:06 +01:00
parent 60e6ee46de
commit 37f1b7dd8d


@@ -112,3 +112,24 @@ Modern SSD are much faster, lets assume the following::
MAX(64KB) = 354 MB/s;
MAX(4KB) = 67 MB/s;
MAX(1KB) = 18 MB/s;
Also, the average chunk size (ACS) directly relates to the number of chunks produced by
a backup::
CHUNK_COUNT = BACKUP_SIZE / ACS
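A minimal sketch of this relation in Rust (both sizes in bytes; the
``chunk_count`` helper is purely illustrative and not part of the backup
code)::

  /// CHUNK_COUNT = BACKUP_SIZE / ACS, with both sizes given in bytes.
  fn chunk_count(backup_size: u64, average_chunk_size: u64) -> u64 {
      backup_size / average_chunk_size
  }

  fn main() {
      // For example, a 1 GiB backup with a 4 MiB average chunk size
      // yields 256 chunks.
      println!("{}", chunk_count(1024 * 1024 * 1024, 4 * 1024 * 1024));
  }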
Here are some statistics from my developer workstation::
Disk Usage: 65 GB
Directories: 58971
Files: 726314
Files < 64KB: 617541
As you can see, there are a great many small files. If we did file-level
deduplication, i.e. generated one chunk per file, we would end up with
more than 700000 chunks.
Instead, our current algorithm produces only large chunks, with an
average chunk size of 4 MB. With the data above, this results in about
15000 chunks (a factor of 50 fewer chunks).
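To cross-check these numbers, here is a small standalone Rust sketch (an
illustration only, not code from the backup tool, assuming binary units,
i.e. 65 GiB of data and a 4 MiB average chunk size)::

  fn main() {
      let disk_usage: u64 = 65 * 1024 * 1024 * 1024;  // 65 GiB from the stats above
      let file_count: u64 = 726_314;                  // files on the workstation
      let average_chunk_size: u64 = 4 * 1024 * 1024;  // 4 MiB ACS

      // File-level deduplication: roughly one chunk per file.
      println!("file-level chunks: {}", file_count);

      // CHUNK_COUNT = BACKUP_SIZE / ACS
      let chunk_count = disk_usage / average_chunk_size;
      println!("4 MiB ACS chunks:  {}", chunk_count);

      // Reduction factor between the two approaches.
      println!("reduction factor:  {}", file_count / chunk_count);
  }

With these assumptions the sketch prints 16640 chunks and a reduction
factor of 43, in line with the rough figures quoted above.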