human byte: add proper unit type and support base-10

The new SizeUnit type takes over the auto scaling logic and could be
used on its own too.

Switch the internal type of HumanByte from u64 to f64, this results
in a slight reduce of usable sizes we can represent (there's no
unsigned float type after all) but we support pebibyte now with quite
the precision and ebibytes should be also work out ok, and that
really should us have covered for a while..

Partially adapted by Dietmar's version, but split up and change so:
* there's no None type, for a SizeUnit that does not makes much sense
* print the unit for byte too, better consistency and one can still
  use as_u64() or as_f64() if they do not want/need the unit rendered
* left the "From usize/u64" impls intact, just convenient to have and
  avoids all over the tree changes to adapt to loosing that
* move auto-scaling into SizeUnit, good fit there and I could see
  some re-use potential for non-human-byte users in the future
* impl Display for SizeUnit instead of the separate unit_str method,
  better usability as it can be used directly in format (with zero
  alloc/copy) and saw no real reason of not having that this way
* switch the place where we auto-scale in HumanByte's to the new_X
  helpers which allows for slightly reduced code usage and simplify
  implementation where possible
* use rounding for the precision limit algorithm. This is a stupid
  problem as in practices there are cases for requiring every variant:
  - flooring would be good for limits, better less than to much
  - ceiling would be good for file sizes, to less can mean ENOSPACE
    and user getting angry if their working value is messed with
  - rounding can be good for rendering benchmark, closer to reality
    and no real impact
  So going always for rounding is really not the best solution..

Some of those changes where naturally opinionated, if there's a good
practical reason we can switch back (or to something completely
different).

The single thing I kept and am not _that_ happy with is being able to
have fractional bytes (1.1 B or even 0.01 B), which just does not
makes much sense as most of those values cannot exist at all in
reality - I say most as multiple of 1/8 Byte can exists, those are
bits.o

Note, the precission also changed from fixed 2 to max 3 (trailing
zeros stripped), while that can be nice we should see if we get
a better precision limiting algorithm, e.g., directly in the printer.
Rust sadly does not supports "limit to precision of 3 but avoid
trailing zeros" so we'd need to adapt their Grisu based algorithm our
own - way to much complexity for this though..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This commit is contained in:
Thomas Lamprecht 2021-11-20 17:32:16 +01:00
parent a58a5cf795
commit 930a71460f

View File

@ -1,50 +1,206 @@
pub struct HumanByte {
b: usize,
use anyhow::{bail, Error};
/// Size units for byte sizes
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum SizeUnit {
Byte,
// SI (base 10)
KByte,
MByte,
GByte,
TByte,
PByte,
// IEC (base 2)
Kibi,
Mebi,
Gibi,
Tebi,
Pebi,
}
impl std::fmt::Display for HumanByte {
impl SizeUnit {
/// Returns the scaling factor
pub fn factor(&self) -> f64 {
match self {
SizeUnit::Byte => 1.0,
// SI (base 10)
SizeUnit::KByte => 1_000.0,
SizeUnit::MByte => 1_000_000.0,
SizeUnit::GByte => 1_000_000_000.0,
SizeUnit::TByte => 1_000_000_000_000.0,
SizeUnit::PByte => 1_000_000_000_000_000.0,
// IEC (base 2)
SizeUnit::Kibi => 1024.0,
SizeUnit::Mebi => 1024.0 * 1024.0,
SizeUnit::Gibi => 1024.0 * 1024.0 * 1024.0,
SizeUnit::Tebi => 1024.0 * 1024.0 * 1024.0 * 1024.0,
SizeUnit::Pebi => 1024.0 * 1024.0 * 1024.0 * 1024.0 * 1024.0,
}
}
/// gets the biggest possible unit still having a value greater zero before the decimal point
/// 'binary' specifies if IEC (base 2) units should be used or SI (base 10) ones
pub fn auto_scale(size: f64, binary: bool) -> SizeUnit {
if binary {
let bits = 63 - (size as u64).leading_zeros();
match bits {
50.. => SizeUnit::Pebi,
40..=49 => SizeUnit::Tebi,
30..=39 => SizeUnit::Gibi,
20..=29 => SizeUnit::Mebi,
10..=19 => SizeUnit::Kibi,
_ => SizeUnit::Byte,
}
} else {
if size >= 1_000_000_000_000_000.0 {
SizeUnit::PByte
} else if size >= 1_000_000_000_000.0 {
SizeUnit::TByte
} else if size >= 1_000_000_000.0 {
SizeUnit::GByte
} else if size >= 1_000_000.0 {
SizeUnit::MByte
} else if size >= 1_000.0 {
SizeUnit::KByte
} else {
SizeUnit::Byte
}
}
}
}
/// Returns the string repesentation
impl std::fmt::Display for SizeUnit {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if self.b < 1024 {
return write!(f, "{} B", self.b);
match self {
SizeUnit::Byte => write!(f, "B"),
// SI (base 10)
SizeUnit::KByte => write!(f, "KB"),
SizeUnit::MByte => write!(f, "MB"),
SizeUnit::GByte => write!(f, "GB"),
SizeUnit::TByte => write!(f, "TB"),
SizeUnit::PByte => write!(f, "PB"),
// IEC (base 2)
SizeUnit::Kibi => write!(f, "KiB"),
SizeUnit::Mebi => write!(f, "MiB"),
SizeUnit::Gibi => write!(f, "GiB"),
SizeUnit::Tebi => write!(f, "TiB"),
SizeUnit::Pebi => write!(f, "PiB"),
}
let kb: f64 = self.b as f64 / 1024.0;
if kb < 1024.0 {
return write!(f, "{:.2} KiB", kb);
}
}
/// Byte size which can be displayed in a human friendly way
pub struct HumanByte {
/// The siginficant value, it does not includes any factor of the `unit`
size: f64,
/// The scale/unit of the value
unit: SizeUnit,
}
impl HumanByte {
/// Create instance with size and unit (size must be positive)
pub fn with_unit(size: f64, unit: SizeUnit) -> Result<Self, Error> {
if size < 0.0 {
bail!("byte size may not be negative");
}
let mb: f64 = kb / 1024.0;
if mb < 1024.0 {
return write!(f, "{:.2} MiB", mb);
}
let gb: f64 = mb / 1024.0;
if gb < 1024.0 {
return write!(f, "{:.2} GiB", gb);
}
let tb: f64 = gb / 1024.0;
if tb < 1024.0 {
return write!(f, "{:.2} TiB", tb);
}
let pb: f64 = tb / 1024.0;
return write!(f, "{:.2} PiB", pb);
Ok(HumanByte { size, unit })
}
/// Create a new instance with optimal binary unit computed
pub fn new_binary(size: f64) -> Self {
let unit = SizeUnit::auto_scale(size, true);
HumanByte { size: size / unit.factor(), unit }
}
/// Create a new instance with optimal decimal unit computed
pub fn new_decimal(size: f64) -> Self {
let unit = SizeUnit::auto_scale(size, false);
HumanByte { size: size / unit.factor(), unit }
}
/// Returns the size as u64 number of bytes
pub fn as_u64(&self) -> u64 {
self.as_f64() as u64
}
/// Returns the size as f64 number of bytes
pub fn as_f64(&self) -> f64 {
self.size * self.unit.factor()
}
/// Returns a copy with optimal binary unit computed
pub fn auto_scale_binary(self) -> Self {
HumanByte::new_binary(self.as_f64())
}
/// Returns a copy with optimal decimal unit computed
pub fn auto_scale_decimal(self) -> Self {
HumanByte::new_decimal(self.as_f64())
}
}
impl From<u64> for HumanByte {
fn from(v: u64) -> Self {
HumanByte::new_binary(v as f64)
}
}
impl From<usize> for HumanByte {
fn from(v: usize) -> Self {
HumanByte { b: v }
HumanByte::new_binary(v as f64)
}
}
impl From<u64> for HumanByte {
fn from(v: u64) -> Self {
HumanByte { b: v as usize }
impl std::fmt::Display for HumanByte {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let precision = f.precision().unwrap_or(3) as f64;
let precision_factor = 1.0 * 10.0_f64.powf(precision);
// this could cause loss of information, rust has sadly no shortest-max-X flt2dec fmt yet
let size = ((self.size * precision_factor).round()) / precision_factor;
write!(f, "{} {}", size, self.unit)
}
}
#[test]
fn correct_byte_convert() {
fn convert(b: usize) -> String {
fn test_human_byte_auto_unit_decimal() {
fn convert(b: u64) -> String {
HumanByte::new_decimal(b as f64).to_string()
}
assert_eq!(convert(987), "987 B");
assert_eq!(convert(1022), "1.022 KB");
assert_eq!(convert(9_000), "9 KB");
assert_eq!(convert(1_000), "1 KB");
assert_eq!(convert(1_000_000), "1 MB");
assert_eq!(convert(1_000_000_000), "1 GB");
assert_eq!(convert(1_000_000_000_000), "1 TB");
assert_eq!(convert(1_000_000_000_000_000), "1 PB");
assert_eq!(convert((1 << 30) + 103 * (1 << 20)), "1.182 GB");
assert_eq!(convert((1 << 30) + 128 * (1 << 20)), "1.208 GB");
assert_eq!(convert((2 << 50) + 500 * (1 << 40)), "2.802 PB");
}
#[test]
fn test_human_byte_auto_unit_binary() {
fn convert(b: u64) -> String {
HumanByte::from(b).to_string()
}
assert_eq!(convert(1023), "1023 B");
assert_eq!(convert(1 << 10), "1.00 KiB");
assert_eq!(convert(1 << 20), "1.00 MiB");
assert_eq!(convert((1 << 30) + 103 * (1 << 20)), "1.10 GiB");
assert_eq!(convert((2 << 50) + 500 * (1 << 40)), "2.49 PiB");
assert_eq!(convert(987), "987 B");
assert_eq!(convert(1022), "1022 B");
assert_eq!(convert(9_000), "8.789 KiB");
assert_eq!(convert(10_000_000), "9.537 MiB");
assert_eq!(convert(10_000_000_000), "9.313 GiB");
assert_eq!(convert(10_000_000_000_000), "9.095 TiB");
assert_eq!(convert(1 << 10), "1 KiB");
assert_eq!(convert((1 << 10) * 10), "10 KiB");
assert_eq!(convert(1 << 20), "1 MiB");
assert_eq!(convert(1 << 30), "1 GiB");
assert_eq!(convert(1 << 40), "1 TiB");
assert_eq!(convert(1 << 50), "1 PiB");
assert_eq!(convert((1 << 30) + 103 * (1 << 20)), "1.101 GiB");
assert_eq!(convert((1 << 30) + 128 * (1 << 20)), "1.125 GiB");
assert_eq!(convert((1 << 40) + 128 * (1 << 30)), "1.125 TiB");
assert_eq!(convert((2 << 50) + 512 * (1 << 40)), "2.5 PiB");
}