human byte: add proper unit type and support base-10

The new SizeUnit type takes over the auto-scaling logic and could be
used on its own too.

Switch the internal type of HumanByte from u64 to f64. This slightly
reduces the range of sizes we can represent (there's no unsigned float
type after all), but we now support pebibytes with quite some precision
and exbibytes should also work out OK, and that really should have us
covered for a while.

Partially adapted from Dietmar's version, but split up and changed so
that:

* there's no None type, for a SizeUnit that does not make much sense

* print the unit for byte too, better consistency and one can still
  use as_u64() or as_f64() if they do not want/need the unit rendered

* left the "From usize/u64" impls intact, they are just convenient to
  have and avoid changes all over the tree to adapt to losing them

* move auto-scaling into SizeUnit, good fit there and I could see some
  re-use potential for non-human-byte users in the future

* impl Display for SizeUnit instead of the separate unit_str method,
  better usability as it can be used directly in format (with zero
  alloc/copy) and I saw no real reason for not having it this way

* move the auto-scaling in HumanByte into the new_X helpers, which
  allows for slightly less code and a simpler implementation where
  possible

* use rounding for the precision limit algorithm. This is a stupid
  problem, as in practice there are cases requiring every variant:
  - flooring would be good for limits, better less than too much
  - ceiling would be good for file sizes, too little can mean ENOSPC
    and users getting angry if their working value is messed with
  - rounding can be good for rendering benchmark results, closer to
    reality and no real impact
  So always going for rounding is really not the best solution.

Some of those changes were naturally opinionated, if there's a good
practical reason we can switch back (or to something completely
different).

The single thing I kept and am not _that_ happy with is being able to
have fractional bytes (1.1 B or even 0.01 B), which just does not make
much sense as most of those values cannot exist at all in reality. I
say most, as multiples of 1/8 byte can exist, those are bits.

Note, the precision also changed from fixed 2 to max 3 (trailing zeros
stripped). While that can be nice, we should see if we can get a better
precision limiting algorithm, e.g., directly in the printer. Rust sadly
does not support "limit to precision of 3 but avoid trailing zeros", so
we'd need to adapt their Grisu-based algorithm on our own, which is way
too much complexity for this though.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
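
To make the floor/ceil/round trade-off from the message a bit more tangible, here is a minimal sketch. It is not part of the commit: `limit_precision` is a made-up helper name, and passing the rounding direction as a function pointer is just one assumed way to make the mode selectable.

```rust
/// Hypothetical helper, not part of the commit: limit `value` to `digits`
/// fractional digits, with the rounding direction passed in as `mode`.
fn limit_precision(value: f64, digits: i32, mode: fn(f64) -> f64) -> f64 {
    let factor = 10.0_f64.powi(digits);
    // Scale up, apply the chosen rounding direction, scale back down.
    mode(value * factor) / factor
}

#[test]
fn rounding_modes_differ() {
    let size = 1.1816; // e.g. a size already scaled to GB
    // Flooring never reports more than there is (good for limits).
    assert_eq!(limit_precision(size, 3, f64::floor), 1.181);
    // Ceiling never reports less than there is (good for file sizes).
    assert_eq!(limit_precision(size, 3, f64::ceil), 1.182);
    // Rounding picks the closest value, which is what the commit settles on.
    assert_eq!(limit_precision(size, 3, f64::round), 1.182);
    assert_eq!(limit_precision(1.1812, 3, f64::round), 1.181);
}
```

With three fractional digits the variants differ by at most one step in the last digit, so which one is "right" depends purely on what the value is used for.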
2021-11-20 17:32:16 +01:00
parent a58a5cf795
commit 930a71460f
1 changed files with 189 additions and 33 deletions
@@ -1,50 +1,206 @@
-pub struct HumanByte {
-    b: usize,
+use anyhow::{bail, Error};
+
+/// Size units for byte sizes
+#[derive(Debug, Copy, Clone, PartialEq)]
+pub enum SizeUnit {
+    Byte,
+    // SI (base 10)
+    KByte,
+    MByte,
+    GByte,
+    TByte,
+    PByte,
+    // IEC (base 2)
+    Kibi,
+    Mebi,
+    Gibi,
+    Tebi,
+    Pebi,
 }
-impl std::fmt::Display for HumanByte {
+
+impl SizeUnit {
+    /// Returns the scaling factor
+    pub fn factor(&self) -> f64 {
+        match self {
+            SizeUnit::Byte => 1.0,
+            // SI (base 10)
+            SizeUnit::KByte => 1_000.0,
+            SizeUnit::MByte => 1_000_000.0,
+            SizeUnit::GByte => 1_000_000_000.0,
+            SizeUnit::TByte => 1_000_000_000_000.0,
+            SizeUnit::PByte => 1_000_000_000_000_000.0,
+            // IEC (base 2)
+            SizeUnit::Kibi => 1024.0,
+            SizeUnit::Mebi => 1024.0 * 1024.0,
+            SizeUnit::Gibi => 1024.0 * 1024.0 * 1024.0,
+            SizeUnit::Tebi => 1024.0 * 1024.0 * 1024.0 * 1024.0,
+            SizeUnit::Pebi => 1024.0 * 1024.0 * 1024.0 * 1024.0 * 1024.0,
+        }
+    }
+
+    /// Gets the biggest possible unit that still leaves a value greater than zero before the
+    /// decimal point. `binary` specifies if IEC (base 2) units should be used or SI (base 10) ones.
+    pub fn auto_scale(size: f64, binary: bool) -> SizeUnit {
+        if binary {
+            let bits = 63u32.saturating_sub((size as u64).leading_zeros()); // no underflow for size < 1
+            match bits {
+                50.. => SizeUnit::Pebi,
+                40..=49 => SizeUnit::Tebi,
+                30..=39 => SizeUnit::Gibi,
+                20..=29 => SizeUnit::Mebi,
+                10..=19 => SizeUnit::Kibi,
+                _ => SizeUnit::Byte,
+            }
+        } else {
+            if size >= 1_000_000_000_000_000.0 {
+                SizeUnit::PByte
+            } else if size >= 1_000_000_000_000.0 {
+                SizeUnit::TByte
+            } else if size >= 1_000_000_000.0 {
+                SizeUnit::GByte
+            } else if size >= 1_000_000.0 {
+                SizeUnit::MByte
+            } else if size >= 1_000.0 {
+                SizeUnit::KByte
+            } else {
+                SizeUnit::Byte
+            }
+        }
+    }
+}
+
+/// Returns the string representation
+impl std::fmt::Display for SizeUnit {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        if self.b < 1024 {
-            return write!(f, "{} B", self.b);
+        match self {
+            SizeUnit::Byte => write!(f, "B"),
+            // SI (base 10)
+            SizeUnit::KByte => write!(f, "KB"),
+            SizeUnit::MByte => write!(f, "MB"),
+            SizeUnit::GByte => write!(f, "GB"),
+            SizeUnit::TByte => write!(f, "TB"),
+            SizeUnit::PByte => write!(f, "PB"),
+            // IEC (base 2)
+            SizeUnit::Kibi => write!(f, "KiB"),
+            SizeUnit::Mebi => write!(f, "MiB"),
+            SizeUnit::Gibi => write!(f, "GiB"),
+            SizeUnit::Tebi => write!(f, "TiB"),
+            SizeUnit::Pebi => write!(f, "PiB"),
        }
-        let kb: f64 = self.b as f64 / 1024.0;
-        if kb < 1024.0 {
-            return write!(f, "{:.2} KiB", kb);
    }
-        let mb: f64 = kb / 1024.0;
-        if mb < 1024.0 {
-            return write!(f, "{:.2} MiB", mb);
+}
+
+/// Byte size which can be displayed in a human friendly way
+pub struct HumanByte {
+    /// The significant value, it does not include any factor of the `unit`
+    size: f64,
+    /// The scale/unit of the value
+    unit: SizeUnit,
+}
+
+impl HumanByte {
+    /// Create instance with size and unit (size must not be negative)
+    pub fn with_unit(size: f64, unit: SizeUnit) -> Result<Self, Error> {
+        if size < 0.0 {
+            bail!("byte size may not be negative");
        }
-        let gb: f64 = mb / 1024.0;
-        if gb < 1024.0 {
-            return write!(f, "{:.2} GiB", gb);
+        Ok(HumanByte { size, unit })
    }
-        let tb: f64 = gb / 1024.0;
-        if tb < 1024.0 {
-            return write!(f, "{:.2} TiB", tb);
+
+    /// Create a new instance with optimal binary unit computed
+    pub fn new_binary(size: f64) -> Self {
+        let unit = SizeUnit::auto_scale(size, true);
+        HumanByte { size: size / unit.factor(), unit }
    }
-        let pb: f64 = tb / 1024.0;
-        return write!(f, "{:.2} PiB", pb);
+
+    /// Create a new instance with optimal decimal unit computed
+    pub fn new_decimal(size: f64) -> Self {
+        let unit = SizeUnit::auto_scale(size, false);
+        HumanByte { size: size / unit.factor(), unit }
+    }
+
+    /// Returns the size as u64 number of bytes
+    pub fn as_u64(&self) -> u64 {
+        self.as_f64() as u64
+    }
+
+    /// Returns the size as f64 number of bytes
+    pub fn as_f64(&self) -> f64 {
+        self.size * self.unit.factor()
+    }
+
+    /// Returns a copy with optimal binary unit computed
+    pub fn auto_scale_binary(self) -> Self {
+        HumanByte::new_binary(self.as_f64())
+    }
+
+    /// Returns a copy with optimal decimal unit computed
+    pub fn auto_scale_decimal(self) -> Self {
+        HumanByte::new_decimal(self.as_f64())
+    }
+}
+
+impl From<u64> for HumanByte {
+    fn from(v: u64) -> Self {
+        HumanByte::new_binary(v as f64)
    }
 }
 impl From<usize> for HumanByte {
    fn from(v: usize) -> Self {
-        HumanByte { b: v }
+        HumanByte::new_binary(v as f64)
    }
 }
-impl From<u64> for HumanByte {
-    fn from(v: u64) -> Self {
-        HumanByte { b: v as usize }
+
+impl std::fmt::Display for HumanByte {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        let precision = f.precision().unwrap_or(3) as f64;
+        let precision_factor = 1.0 * 10.0_f64.powf(precision);
+        // this could cause loss of information, rust has sadly no shortest-max-X flt2dec fmt yet
+        let size = ((self.size * precision_factor).round()) / precision_factor;
+        write!(f, "{} {}", size, self.unit)
    }
 }

 #[test]
-fn correct_byte_convert() {
-    fn convert(b: usize) -> String {
+fn test_human_byte_auto_unit_decimal() {
+    fn convert(b: u64) -> String {
+        HumanByte::new_decimal(b as f64).to_string()
+    }
+    assert_eq!(convert(987), "987 B");
+    assert_eq!(convert(1022), "1.022 KB");
+    assert_eq!(convert(9_000), "9 KB");
+    assert_eq!(convert(1_000), "1 KB");
+    assert_eq!(convert(1_000_000), "1 MB");
+    assert_eq!(convert(1_000_000_000), "1 GB");
+    assert_eq!(convert(1_000_000_000_000), "1 TB");
+    assert_eq!(convert(1_000_000_000_000_000), "1 PB");
+
+    assert_eq!(convert((1 << 30) + 103 * (1 << 20)), "1.182 GB");
+    assert_eq!(convert((1 << 30) + 128 * (1 << 20)), "1.208 GB");
+    assert_eq!(convert((2 << 50) + 500 * (1 << 40)), "2.802 PB");
+}
+
+#[test]
+fn test_human_byte_auto_unit_binary() {
+    fn convert(b: u64) -> String {
        HumanByte::from(b).to_string()
    }
-    assert_eq!(convert(1023), "1023 B");
-    assert_eq!(convert(1 << 10), "1.00 KiB");
-    assert_eq!(convert(1 << 20), "1.00 MiB");
-    assert_eq!(convert((1 << 30) + 103 * (1 << 20)), "1.10 GiB");
-    assert_eq!(convert((2 << 50) + 500 * (1 << 40)), "2.49 PiB");
+    assert_eq!(convert(987), "987 B");
+    assert_eq!(convert(1022), "1022 B");
+    assert_eq!(convert(9_000), "8.789 KiB");
+    assert_eq!(convert(10_000_000), "9.537 MiB");
+    assert_eq!(convert(10_000_000_000), "9.313 GiB");
+    assert_eq!(convert(10_000_000_000_000), "9.095 TiB");
+
+    assert_eq!(convert(1 << 10), "1 KiB");
+    assert_eq!(convert((1 << 10) * 10), "10 KiB");
+    assert_eq!(convert(1 << 20), "1 MiB");
+    assert_eq!(convert(1 << 30), "1 GiB");
+    assert_eq!(convert(1 << 40), "1 TiB");
+    assert_eq!(convert(1 << 50), "1 PiB");
+
+    assert_eq!(convert((1 << 30) + 103 * (1 << 20)), "1.101 GiB");
+    assert_eq!(convert((1 << 30) + 128 * (1 << 20)), "1.125 GiB");
+    assert_eq!(convert((1 << 40) + 128 * (1 << 30)), "1.125 TiB");
+    assert_eq!(convert((2 << 50) + 512 * (1 << 40)), "2.5 PiB");
 }
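
The binary branch of `SizeUnit::auto_scale` boils down to looking at the position of the highest set bit of the byte count. The stand-alone sketch below illustrates that idea only; it is not the commit's code. `binary_unit` is a made-up name, it returns a plain suffix string instead of a `SizeUnit`, and it works on the bit length (`u64::BITS - leading_zeros()`), so the thresholds are shifted by one compared to the match arms in the diff and a size of zero needs no special case.

```rust
/// Hypothetical stand-alone helper (not part of the commit): pick the IEC
/// unit suffix for a byte count from the position of its highest set bit.
fn binary_unit(size: u64) -> &'static str {
    // Number of bits needed to represent `size`; this is 0 for size == 0.
    let bits = u64::BITS - size.leading_zeros();
    match bits {
        51.. => "PiB",    // size >= 2^50
        41..=50 => "TiB", // 2^40 <= size < 2^50
        31..=40 => "GiB", // 2^30 <= size < 2^40
        21..=30 => "MiB", // 2^20 <= size < 2^30
        11..=20 => "KiB", // 2^10 <= size < 2^20
        _ => "B",         // size < 2^10, including zero
    }
}

#[test]
fn binary_unit_boundaries() {
    assert_eq!(binary_unit(0), "B");
    assert_eq!(binary_unit(1023), "B");
    assert_eq!(binary_unit(1 << 10), "KiB");
    assert_eq!(binary_unit((1 << 20) - 1), "KiB");
    assert_eq!(binary_unit(1 << 20), "MiB");
    assert_eq!(binary_unit(1 << 50), "PiB");
}
```

Compared to a chain of comparisons like the one in the decimal branch, this needs a single `leading_zeros()` call, which only works because the IEC units are powers of two.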