also changes:
* correct comment about reset (replace 'sync' with 'action')
* check schedule change correctly (only when it is actually changed)
with this changes, we can drop the 'lookup_last_worker' method
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we rely on the jobstate handling to write the error of the worker
into its state file, but we used '?' here in a block which does not
return the error to the block, but to the function/closure instead
so if a prune job failed because of such an '?', we did not write
into the statefile and got a wrong state there
instead use our try_block! macro that wraps the code in a closure
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
like the sync jobs, so that if an admin configures a schedule it
really starts the next time that time is reached not immediately
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
listing, updating or deleting a user is now possible for the user
itself, in addition to higher-privileged users that have appropriate
privileges on '/access/users'.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
filtered by those they are privileged enough to read individually. this
allows such users to configure prune/GC schedules via the GUI (the API
already allowed it previously).
permission-wise, a user with this privilege can already:
- list all stores they have access to (returns just name/comment)
- read the config of each store they have access to individually
(returns full config of that datastore + digest of whole config)
but combines them to
- read configs of all datastores they have access to (returns full
config of those datastores + digest of whole config)
user that have AUDIT on just /datastore without propagate can now no
longer read all configurations (but this could be added it back, it just
seems to make little sense to me).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by packing the auth into a RwLock and starting a background
future that renews the ticket every 15 minutes
we still use the BroadcastFuture for the first ticket and only
if that is finished we start the scheduled future
we have to store an abort handle for the renewal future and abort it when
the http client is dropped, so we do not request new tickets forever
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
like we do for PVE. this is visible on the dashboard, and caused 403 on
each update which bothers me when looking at the dev console.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
A client can omit uploading chunks in the "known_chunks" list, those
then also won't be written on the server side. Check all those chunks
mentioned in the index but not uploaded for existance and report an
error if they don't exist instead of marking a potentially broken backup
as "successful".
This is only important if the base snapshot references corrupted chunks,
but has not been negatively verified. Also, it is important to only
verify this at the end, *after* all index writers are closed, since only
then can it be guaranteed that no GC will sweep referenced chunks away.
If a chunk is found missing, also mark the previous backup with a
verification failure, since we know the missing chunk has to referenced
in it (only way it could have been inserted to known_chunks with
checked=false). This has the benefit of automatically doing a
full-upload backup if the user attempts to retry after seeing the new
error, instead of requiring a manual verify or forget.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Do not allow clients to reuse chunks from the previous backup if it has
a failed validation result. This would result in a new "successful"
backup that potentially references broken chunks.
If the previous backup has not been verified, assume it is fine and
continue on.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
- remove chrono dependency
- depend on proxmox 0.3.8
- remove epoch_now, epoch_now_u64 and epoch_now_f64
- remove tm_editor (moved to proxmox crate)
- use new helpers from proxmox 0.3.8
* epoch_i64 and epoch_f64
* parse_rfc3339
* epoch_to_rfc3339_utc
* strftime_local
- BackupDir changes:
* store epoch and rfc3339 string instead of DateTime
* backup_time_to_string now return a Result
* remove unnecessary TryFrom<(BackupGroup, i64)> for BackupDir
- DynamicIndexHeader: change ctime to i64
- FixedIndexHeader: change ctime to i64
since converting from i64 epoch timestamp to DateTime is not always
possible. previously, passing invalid backup-time from client to server
(or vice-versa) panicked the corresponding tokio task. now we get proper
error messages including the invalid timestamp.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
otherwise operations like catalog shell panic when viewing pxar archives
containing such entries, e.g. with mtime very far ahead into the future.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by either printing the original, out-of-range timestamp as-is, or
bailing with a proper error message instead of panicking.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
even if it can't be handled by chrono. silently replacing it with epoch
0 is confusing..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
else we get the default of 16k, which is quite low for our use case.
this improves the TLS upload benchmark speed by about 30-40% for me.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
fixes the error, "manifest does not contain
file 'X.pxar'", that occurs when trying to mount
a pxar archive with 'proxmox-backup-client mount':
Signed-off-by: Dylan Whyte <d.whyte@proxmox.com>
similar to the other fix, if we do not set the buffer size manually,
we get better performance for high latency connections
restore benchmark from f.gruenbicher:
no delay, without patch: ~50MB/s
no delay, with patch: ~50MB/s
25ms delay, without patch: ~11MB/s
25ms delay, with path: ~50MB/s
my own restore benchmark:
no delay, without patch: ~1.5GiB/s
no delay, with patch: ~1.5GiB/s
25ms delay, without patch: 30MiB/s
25ms delay, with patch: ~950MiB/s
for some more details about those benchmarks see
https://lists.proxmox.com/pipermail/pbs-devel/2020-September/000600.html
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
by leaving the buffer sizes on default, we get much better tcp performance
for high latency links
throughput is still impacted by latency, but much less so when
leaving the sizes at default.
the disadvantage is slightly higher memory usage of the server
(details below)
my local benchmarks (proxmox-backup-client benchmark):
pbs client:
PVE Host
Epyc 7351P (16core/32thread)
64GB Memory
pbs server:
VM on Host
1 Socket, 4 Cores (Host CPU type)
4GB Memory
average of 3 runs, rounded to MB/s
| no delay | 1ms | 5ms | 10ms | 25ms |
without this patch | 230MB/s | 55MB/s | 13MB/s | 7MB/s | 3MB/s |
with this patch | 293MB/s | 293MB/s | 249MB/s | 241MB/s | 104MB/s |
memory usage (resident memory) of proxmox-backup-proxy:
| peak during benchmarks | after benchmarks |
without this patch | 144MB | 100MB |
with this patch | 145MB | 130MB |
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
We need to update the atime of chunk files if they already exist,
otherwise a concurrently running GC could sweep them away.
This is protected with ChunkStore.mutex, so the fstat/unlink does not
race with touching.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The iterator of get_chunk_iterator is extended with a third parameter
indicating whether the current file is a chunk (false) or a .bad file
(true).
Count their sizes to the total of removed bytes, since it also frees
disk space.
.bad files are only deleted if the corresponding chunk exists, i.e. has
been rewritten. Otherwise we might delete data only marked bad because
of transient errors.
While at it, also clean up and use nix::unistd::unlinkat instead of
unsafe libc calls.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This ensures that following backups will always upload the chunk,
thereby replacing it with a correct version again.
Format for renaming is <digest>.<counter>.bad where <counter> is used if
a chunk is found to be bad again before a GC cleans it up.
Care has been taken to deliberately only rename a chunk in conditions
where it is guaranteed to be an error in the chunk itself. Otherwise a
broken index file could lead to an unwanted mass-rename of chunks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
a range from high to low in rust results in an empty range
(see std::ops::Range documentation)
so we need to generate the range from 0..data.len() and then reverse it
also, the task log contains a newline at the end, so we have to remove
that (should it exist)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this implements parsing and calculating calendarevents that have a
basic date component (year-mon-day) with the usual syntax options
(*, ranges, lists)
and some special events:
monthly
yearly/annually (like systemd)
quarterly
semiannually,semi-annually (like systemd)
includes some regression tests
the ~ syntax for days (the last x days of the month) is not yet
implemented
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
instead of using 'as' and silently converting wrong,
use the TryInto trait and raise an error if we cannot convert
this should only happen if we have a negative year,
but this is expected (we do not want schedules from before the year 0)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
add_* are modeled after add_days
subtract one for set_mon to have a consistent interface for all fields
(i.e. getter/setter return/expect the 'real' number, not the ones
in the tm struct)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
the tm struct contains the year - 1900 but we added that
if we want to use the libc normalization correctly, the tm struct
must have the correct year in it, else the computations for timezones,
etc. fail
instead add a getter that adds the years and a setter that subtracts it again
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if we give multiple options/ranges for a value, e.g.
2,4,8
we always choose the biggest, instead of the smallest that is next
this happens because in DateTimeValue::find_next(value)
'next' can be set multiple times and we set it when the new
value was *bigger* than the last found 'next' value, when in reality
we have to choose the *smallest* next we can find
reverse the comparison operator to fix this
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we never passed 'false' to it anyway so remove it
(we can add it again if we should ever need it)
also remove the adding of wday (gets normalized anyway)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we want to use dates for the calendarspec, and with that there are some
impossible combinations that cannot be detected during parsing
(e.g. some datetimes do not exist in some timezones, and the timezone
can change after setting the schedule)
so finding no timestamp is not an error anymore but a valid result
we omit logging in that case (since it is not an error anymore)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
mktime/gmtime can normalize time and even can handle special timezone
cases like the fact that the time 2:30 on specific day/timezone combos
do not exists
we have to convert the signature of all functions that use
normalize_time since mktime/gmtime can return an EOVERFLOW
but if this happens there is no way we can find a good time anyway
since normalize_time will always set wday according to the rest of the
time, remove set_wday
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
while it was correct, there was no measurable speed gain
(a benchmark yielded 2.8 ms for a spec that did not find a timestamp either way)
so remove it for simpler code
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
when trying to parse the task status, we seek 8k from the end
which may be into the middle of a line, so the datetime parsing
can fail (when the log message contains ': ')
This patch does a fast search for the last line, and avoid the
'lines' iterator.
for datastores where the requesting user has read or write permissions,
since the API method itself filters by that already. this is the same
permission setting and filtering that the datastore list API endpoint
does.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Because if not, the backups it creates have bogus permissions and may
seem like they got broken once the daemon is started again with the
correct user/group.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Save the state ("ok" or "failed") and the UPID of the respective
verify task. With this we can easily allow to open the relevant task
log and show when the last verify happened.
As we already load the manifest when listing the snapshots, just add
it there directly.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
It's a string-type.
Implement Serialize via Display, Deserialize via FromStr and
add an API_SCHEMA so that it can be used as a type within
the #[api] macro.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
to avoid `map_struct` which is actually unsafe because it
does not verify alignment constraints at all
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
There is no requirement to have at least
a blank line, attribute or comment in between two
interface definitions, e.g.
iface lo inet loopback
iface lo inet6 loopback
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
it really is not necessary, since the only time we are interested in
loading the state from the file is when we list it, and there
we use JobState::load directly to avoid the lock
we still need to create the file on syncjob creation though, so
that we have the correct time for the schedule
to do this we add a new create_state_file that overwrites it on creation
of a syncjob
for safety, we subtract 30 seconds from the in-memory state in case
the statefile is missing
since we call create_state_file from proxmox-backup-api,
we have to chown the lock file after creating to the backup user,
else the sync job scheduling cannot aquire the lock
also we remove the lock file on statefile removal
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
so that we can log if triggered by a schedule, and writing to a jobstatefile
also correctly polls now the abort_future of the worker, so that
users can stop a sync
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
and move the pull parameters into the worker, so that the task log
contains the error if there is one
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this is intended to be a generic helper to (de)serialize job states
(e.g., sync, verify, and so on)
writes a json file into '/var/lib/proxmox-backup/jobstates/TYPE-ID.json'
the api creates the directory with the correct permissions, like
the rrd directory
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
the endtime should be the timestamp of the last log line
or if there is no log at all, the starttime
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
representing a state via an enum makes more sense in this case
we also implement FromStr and Display to make it easy to convet from/to
a string
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
To prevent forgetting the base snapshot of a running backup, and catch
the case when it still happens (e.g. via manual rm) to at least error
out instead of storing a potentially invalid backup.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This reverts commit d53fbe2474.
The HashSet and "register" function are unnecessary, as we already know
which backup is the one we need to check: the last one, stored as
'last_backup'.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
An flock on the snapshot dir itself is used in addition to the group dir
lock. The lock is used to avoid races with forget and prune, while
having more granularity than the group lock (i.e. the group lock is
necessary to prevent more than one backup per group, but the snapshot
lock still allows backups unrelated to the currently running to be
forgotten/pruned).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Attempt to lock the backup directory to be deleted, if it works keep the
lock until the deletion is complete. This way we ensure that no other
locking operation (e.g. using a snapshot as base for another backup) can
happen concurrently.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>