Commit Graph

3911 Commits

Author SHA1 Message Date
Stefan Reiter
ba50f57e93 file-restore: increase lock timeout on QEMU map
This lock is held during VM startup, so that multiple calls will not
start VMs twice. But this means that the timeout needs to incorporate
the time it might take a VM to boot, so increase it quite a bit.

This could previously lead to "interrupted system call" errors when
accessing backups with many disks.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
(cherry picked from commit 66501529a2)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-23 12:30:09 +02:00
Thomas Lamprecht
7d79f3d5f7 file restore daemon: log about basic steps
to make the log more useful..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 9a06eb1618)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:08:15 +02:00
Thomas Lamprecht
fa3fdea590 file restore daemon: reword warning about manual execution
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 309e14ebb7)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:08:10 +02:00
Thomas Lamprecht
aa2cd76c58 restore daemon: use millisecond log resolution
During startup most of the stuff is happening in milliseconds (or
less), so the timestamp granularity of seconds made it hard to tell
if the previous command required 990ms or 1ms, which is quite the
difference in the restore daemon context.

Using micros seems not to bring too much additional information, a
millisecond is already an ok lower time resolution for logging, so
switch only to millis for now.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit ecd66ecaf6)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:08:00 +02:00
Thomas Lamprecht
e2d82c7d4d restore daemon: create /run/proxmox-backup on startup
fixes file restore again.

The new Memcom tracking file lives in `/run/proxmox-backup` and is
always created on REST interaction, as CachedUserInfo uses it to
efficiently track config changes, and such a cache is used in each
REST handle_request.

Further, the Memcom infra expects the base run PBS dir to exists
already, which is an OK assumption to have, but in the file-restore
daemon we have a significantly more minimal environment, and the run
dir was simply not required there, even /run isn't a tmpfs yet.

Fixes fda19dcc6f ("fix CachedUserInfo by using a shared memory version counter")
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 33d7292f29)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:08:00 +02:00
Thomas Lamprecht
e9c2a34def REST: set error message extenesion for bad-request response log
We send it already to the user via the response body, but the
log_response does not has, nor wants to have FWIW, access to the
async body stream, so pass it through the ErrorMessageExtension
mechanism like we do else where.

Note that this is not only useful for PBS API proxy/daemon but also
the REST server of the file-restore daemon running inside the restore
VM, and it really is *very* helpful to debug things there..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit f4d371d2d2)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:07:41 +02:00
Thomas Lamprecht
0fad95f032 REST: rust fmt
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 2d48533378)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:07:41 +02:00
Stoiko Ivanov
683595940b fix #3496: acme: plugin: add sleep for dns propagation
the dns plugin config allow for a specified amount of time to wait for
the TXT record to be set and propagated through DNS.

This patch adds a sleep for this amount of time.
The log message was taken from the perl implementation in proxmox-acme
for consistency.

Tested with the powerdns plugin in my test setup.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
(cherry picked from commit 3f84541412)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:06:29 +02:00
Stoiko Ivanov
40060c1fed config: acme: make validation_delay crate public
we need the setting in acme::plugin.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
(cherry picked from commit 4d8bd03668)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:06:29 +02:00
Stoiko Ivanov
2abee30fdd acme: plugin: fix error message
extract_challenge is used by both dns-01 and http-01 challenges.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
(cherry picked from commit f9bd5e1691)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:06:29 +02:00
Fabian Ebner
dac877252b api: disk list: sort by name
So callers get more stable results. Most noticeable, the disk list in
the web UI doesn't jump around upon reloading, and while sorting could
be done directly there, like this other callers get the benefit too.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
(cherry picked from commit bbff317aa7)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:04:57 +02:00
Fabian Ebner
dd749b0e47 disks: also check for file systems with lsblk
Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
(cherry picked from commit 20429238e0)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:04:57 +02:00
Fabian Ebner
f98c02cbc6 disks: refactor partition type handling
in preparation to also get the file system type from lsblk.

Co-developed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
(cherry picked from commit 364299740f)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:04:57 +02:00
Thomas Lamprecht
218d7e3ec6 rest: log response: avoid unnecessary mut on variable
a match expresses the fallback slightly nicer and needs no mut,
which is always nice to avoid.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 6b5013edb3)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:02:47 +02:00
Stefan Reiter
acefa2bb6e auth: 'crypt' is not thread safe
According to crypt(3):
"crypt places its result in a static storage area, which will be
overwritten by subsequent calls to crypt. It is not safe to call crypt
from multiple threads simultaneously."

This means that multiple login calls as a PBS-realm user can collide and
produce intermittent authentication failures. A visible case is for
file-restore, where VMs with many disks lead to just as many auth-calls
at the same time, as the GUI tries to expand each tree element on load.

Instead, use the thread-safe variant 'crypt_r', which places the result
into a pre-allocated buffer of type 'crypt_data'. The C struct is laid
out according to 'lib/crypt.h.in' and the man page mentioned above.

Use the opportunity and make both arguments to the rust 'crypt' function
take a &[u8].

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
(cherry picked from commit c4c4b5a3ef)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 10:02:01 +02:00
Dietmar Maurer
36551172f3 depend on proxmox 0.11.6 (changed make_tmp_file() return type)
(cherry picked from commit bfd357c5a1)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-20 09:45:28 +02:00
Thomas Lamprecht
f2aeb13c68 subscription: set higher-level error to message instead of bailing
While the PVE one "bails" too, it has an eval around those and moves
the error to the message property, so lets do so too to ensure a user
can force an update on a too old subscription

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit b81818b6ad)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-07-09 12:48:03 +02:00
Dominik Csapak
44b9d6f162 tape/drive: fix logging when requesting media
we try to load the correct media in a loop until we find the correct tape.
when encountering an error or wrong tape, we want to log that (and send
an email if one is set) that requests the correct tape.

while trying to avoid printing the same errors more than once in a row,
we had at least one case (starting with an empty tape in the drive)
which would not print/send any tape request.

reworking that code to use a custom 'TapeRequest' enum, which contains
the state + error message, and a helper that prints and sends an email
when the state changes

this reduces the change check/log to a single variable, instead of 4
(tried, last_media_uuid, last_error, failure_reason)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2021-06-30 11:22:04 +02:00
Dietmar Maurer
53e80e8aa2 tape: fix LTO locate_file for HP drives
Add test code to the first locate_file command, compute locate_offset.
Subsequent locate_file commands use that offset.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2021-06-30 11:22:04 +02:00
Dominik Csapak
f94aa5ceb1 fix #3393 (again): pxar/create: try to read xattrs/fcaps/acls by default
we have a static list of filesystems and their capabilities regarding
file attributes and fs features (e.g. sockets/fifos/etc) which also
includes xattrs,acls and fcaps

if we did not know a filesystem by its magic number (for example cephfs),
we did not even attempt to read xattrs, etc.

this patch adds those flags by default to unknown filesystems, and
removes them when we encounter EOPNOTSUPP (to remove the number
of syscalls)

with this, we should be able to catch xattrs/acls/fcaps on all
(unknown) fs types that support them

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-06-30 11:22:04 +02:00
Dominik Csapak
3e4b9868a0 proxmox-backup-manager: show task log on datastore create
since the output:
Result: "<UPID>"
is not really interesting, show instead the task log while
the datastore is creating, since it is now run in a worker

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-30 11:22:04 +02:00
Dietmar Maurer
2165f0d450 api: define and use REALM_ID_SCHEMA 2021-06-10 11:10:00 +02:00
Wolfgang Bumiller
1e7639bfc4 fixup minimum lru capacity
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-06-08 10:13:46 +02:00
Stefan Reiter
4121628d99 tools/lru_cache: make minimum capacity 1
Setting this to 0 is not just useless, but breaks the logic horribly
enough to cause random segfaults - better forbid this, to avoid someone
else having to debug it again ;)

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:55 +02:00
Stefan Reiter
da78b90f9c backup: remove AsyncIndexReader
superseded by CachedChunkReader, with less code and more speed

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:46 +02:00
Stefan Reiter
1ef6e8b6a7 replace AsyncIndexReader with SeekableCachedChunkReader
admin/datastore reads linearly only, so no need for cache (capacity of 1
basically means no cache except for the currently active chunk).
mount can do random access too, so cache last 8 chunks for possibly a
mild performance improvement.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:44 +02:00
Stefan Reiter
10351f7075 backup: add AsyncRead/Seek to CachedChunkReader
Implemented as a seperate struct SeekableCachedChunkReader that contains
the original as an Arc, since the read_at future captures the
CachedChunkReader, which would otherwise not work with the lifetimes
required by AsyncRead. This is also the reason we cannot use a shared
read buffer and have to allocate a new one for every read. It also means
that the struct items required for AsyncRead/Seek do not need to be
included in a regular CachedChunkReader.

This is intended as a replacement for AsyncIndexReader, so we have less
code duplication and can utilize the LRU cache there too (even though
actual request concurrency is not supported in these traits).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:40 +02:00
Stefan Reiter
70a152deb7 backup: add CachedChunkReader utilizing AsyncLruCache
Provides a fast arbitrary read implementation with full async and
concurrency support.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:37 +02:00
Stefan Reiter
5446bfbba8 tools: add AsyncLruCache as a wrapper around sync LruCache
Supports concurrent 'access' calls to the same key via a
BroadcastFuture. These are stored in a seperate HashMap, the LruCache
underneath is only modified once a valid value has been retrieved.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:34 +02:00
Stefan Reiter
400885e620 tools/BroadcastFuture: add testcase for better understanding
Explicitly test that data will stay available and can be retrieved
immediately via listen(), even if the future producing the data and
notifying the consumers was already run in the past.

Wasn't broken or anything, but helps with understanding IMO.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-08 09:42:29 +02:00
Dominik Csapak
f960fc3b6f fix #3433: use PVE's wearout logic in PBS
in PVE, the logic how wearout gets read from the smartctl output was
changed from a vendor -> id map to a sorted list of specific
attribute field names.

copy that list to pbs (in the same order), and use that to get the
wearout

in the future we might want to split the disk logic into its own crate
and reuse it in pve

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-08 08:31:37 +02:00
Dominik Csapak
d2354a16cd client/pull: log snapshots that are skipped because of time
we skip snapshots that are older than the newest snapshot of the group in
the target datastore, log it so the user can know why it is not synced

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-07 10:51:25 +02:00
Dominik Csapak
2de4dc3a81 backup/chunk_store: optionally log progress on creation
and enable it for the worker variants

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-04 09:32:09 +02:00
Dietmar Maurer
b90036dadd cleanup: factor out config::datastore::lock_config() 2021-06-04 09:04:14 +02:00
Dominik Csapak
4708f4fc21 api2/config/datastore: change create datastore api call to a worker
so that longer running creates (e.g. a slow storage), does not
run in a timeout and we can follow its creation

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2021-06-04 09:02:05 +02:00
Dominik Csapak
062cf75cdf proxmox-backup-proxy: fix leftover references on datastore removal
when we remove a datastore via api/cli, the proxy
has sometimes leftover references to that datastore in its
DATASTORE_MAP which includes an open filehandle on the
'.lock' file

this prevents unmounting/exporting the datastore even after removal,
only a reload/restart of the proxy did help

add a command to our command socket, which removes all non
configured datastores from the map, dropping the open filehandle

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2021-06-04 08:22:53 +02:00
Dominik Csapak
e5950360ca tape/drive: improve tape device locking behaviour
by implementing a custom error type that is either 'TimeOut' or
'Other'.

In the api, check in the worker loop for exactly 'TimeOut' errors and continue only
then. All other errors lead to a aborted task.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-02 17:08:00 +02:00
Dominik Csapak
5b358ff0b1 server/prune_job: fix locking during prune jobs
removing the backup dir must acquire the snapshot lock, else it can
happen that we remove a snapshot while it is being restored
or backed up to tape

the original commit that adds the force flag
(c9756b40d1)
mentions that the prune checks itself if the snapshot is in use,
but i could not find such code, so simply set force to false

to avoid failing and aborting the prune job, warn if it could not
and continue

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-02 17:04:49 +02:00
Fabian Grünbichler
3420029b5e Revert "file-restore-daemon: work around tokio DuplexStream bug"
This reverts commit 75f9f40922, which is
no longer needed now that we use tokio >= 1.6 which contains the proper
fix.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-06-01 10:31:19 +02:00
Fabian Grünbichler
3e3b505cc8 reorder serde usage/derive
this is deprecated with rustc 1.52+, and will become a hard error at
some point:

https://github.com/rust-lang/rust/issues/79202

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-05-31 14:53:08 +02:00
Dietmar Maurer
0bca966ec5 fix typo: s/dies/does/ 2021-05-31 11:01:15 +02:00
Dominik Csapak
84737fb33f lto/sg_tape/encryption: remove non lto-4 supported byte
from the SspDataEncryptionCapabilityPage

it seems we do not need it, since the EXTDECC flag is only used for
determining if the drive is capable to be configured via
ADI (Automation/Drive Interface) which we do not use at all.

this makes the call work with LTO-4 again

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-05-31 10:58:38 +02:00
Dominik Csapak
03380db560 api2/tape: add api call to list media sets
we want a 'media-set' selector in the gui, this makes it
very easy to do and is not as costly as reusing the media list,
since we do not need to iterate over all media (e.g. unassigned)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-05-26 18:10:57 +02:00
Dominik Csapak
c24cb13382 api: node/journal: fix parameter extraction of /nodes/node/journal
by extracting them via the api macro into the function signature

this fixes an issue, where giving 'since' and 'until' where not
used since we tried to extract them as 'str' while they were numbers.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-25 13:26:51 +02:00
Stefan Reiter
3a804a8a20 file-restore-daemon: limit concurrent download calls
While the issue with vsock packets starving kernel memory is mostly
worked around by the '64k -> 4k buffer' patch in
'proxmox-backup-restore-image', let's be safe and also limit the number
of concurrent transfers. 8 downloads per VM seems like a fair value.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-25 11:56:43 +02:00
Stefan Reiter
1fde4167ea file-restore-daemon: watchdog: add inhibit for long downloads
The extract API call may be active for more than the watchdog timeout,
so a simple ping is not enough.

This adds an "inhibit" API, which will stop the watchdog from completing
as long as at least one WatchdogInhibitor instance is alive. Keep one in
the download task, so it will be dropped once it completes (or errors).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-25 11:56:43 +02:00
Stefan Reiter
75f9f40922 file-restore-daemon: work around tokio DuplexStream bug
See this PR for more info: https://github.com/tokio-rs/tokio/pull/3756

As a workaround use a pair of connected unix sockets - this obviously
incurs some overhead, albeit not measureable on my machine. Once tokio
includes the fix we can go back to a DuplexStream for performance and
simplicity.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-25 11:56:43 +02:00
Thomas Lamprecht
e9c2638f90 apt: fix removal of non-existant http-proxy config
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-25 11:54:46 +02:00
Oguz Bektas
338c545f85 tasks: fix typos in API description
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
2021-05-25 07:54:57 +02:00
Stefan Reiter
e379b4a31c file-restore-daemon: disk: add RawFs bucket type
Used to specify a filesystem placed directly on a disk, without a
partition table inbetween. Detected by simply attempting to mount the
disk itself.

A helper "make_dev_node" is extracted to avoid code duplication.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-25 07:53:22 +02:00