Pacific

v16.2.7 Pacific

This is the seventh backport release in the Pacific series.

Notable Changes

  • A critical bug in the OMAP format upgrade has been fixed. The bug could cause data corruption (improperly formatted OMAP keys) after upgrading a pre-Pacific cluster if the bluestore_fsck_quick_fix_on_mount parameter was set to true or if ceph-bluestore-tool's quick-fix/repair commands were invoked. Relevant tracker: https://tracker.ceph.com/issues/53062. bluestore_fsck_quick_fix_on_mount continues to default to false.

  • CephFS: If you are not using cephadm, you must disable FSMap sanity checks before starting the upgrade:

    ceph config set mon mon_mds_skip_sanity true
    

    After the upgrade has finished and the cluster is stable, please remove that setting:

    ceph config rm mon mon_mds_skip_sanity
    

    Clusters managed by and upgraded using cephadm take care of this step automatically.

  • MGR: The pg_autoscaler will use the 'scale-up' profile as the default profile. 16.2.6 changed the default profile to 'scale-down' but we ran into issues with the device_health_metrics pool consuming too many PGs, which is not ideal for performance. So we will continue to use the 'scale-up' profile by default, until we implement a limit on the number of PGs default pools should consume, in combination with the 'scale-down' profile.

  • Cephadm & Ceph Dashboard: NFS management has been completely reworked to ensure that NFS exports are managed consistently across the different Ceph components. Prior to this, there were 3 incompatible implementations for configuring the NFS exports: Ceph-Ansible/OpenStack Manila, Ceph Dashboard and 'mgr/nfs' module. With this release the 'mgr/nfs' way becomes the official interface, and the remaining components (Cephadm and Ceph Dashboard) adhere to it. While this might require manually migrating from the deprecated implementations, it will simplify the user experience for those heavily relying on NFS exports.

  • Dashboard: "Cluster Expansion Wizard". After the 'cephadm bootstrap' step, users that log into the Ceph Dashboard will be presented with a welcome screen. If they choose to follow the installation wizard, they will be guided through a set of steps to help them configure their Ceph cluster: expanding the cluster by adding more hosts, detecting and defining their storage devices, and finally deploying and configuring the different Ceph services.

  • OSD: When using mclock_scheduler for QoS, there is no longer a need to run any manual benchmark. The OSD now automatically sets an appropriate value for osd_mclock_max_capacity_iops by running a simple benchmark during initialization.

  • MGR: The global recovery event in the progress module has been optimized and a sleep_interval of 5 seconds has been added between stats collection, to reduce the impact of the progress module on the MGR, especially in large clusters.
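
As an illustration of the mclock change above, the following is a minimal sketch for inspecting the capacity value an OSD ended up with after its startup benchmark. It assumes the per-device-class option names osd_mclock_max_capacity_iops_hdd and osd_mclock_max_capacity_iops_ssd (the note above only names the osd_mclock_max_capacity_iops prefix); the function name is hypothetical.

```shell
# Sketch only: print the mclock capacity values reported by one running OSD.
# Assumption: the option exists in _hdd and _ssd variants.
show_mclock_capacity() {
    osd_id="$1"
    for dev in hdd ssd; do
        # "ceph config show" reports the value the daemon is actually using.
        ceph config show "osd.${osd_id}" "osd_mclock_max_capacity_iops_${dev}"
    done
}

# Example call (needs a running cluster): show_mclock_capacity 0
```

Only the value matching the OSD's underlying device class is expected to be relevant for scheduling.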

Changelog

  • rpm, debian: move smartmontools and nvme-cli to ceph-base (pr#44164, Yaarit Hatuka)

  • qa: miscellaneous perf suite fixes (pr#44154, Neha Ojha)

  • qa/suites/orch/cephadm: mgr-nfs-upgrade: add missing 0-distro dir (pr#44201, Sebastian Wagner)

  • *: s/virtualenv/python -m venv/ (pr#43002, Kefu Chai, Ken Dreyer)

  • admin/doc-requirements.txt: pin Sphinx at 3.5.4 (pr#43748, Kefu Chai)

  • backport mgr/nfs bits (pr#43811, Sage Weil, Michael Fritch)

  • ceph-volume: get_first_lv() refactor (pr#43960, Guillaume Abrioux)

  • ceph-volume: fix a typo causing AttributeError (pr#43949, Taha Jahangir)

  • ceph-volume: fix bug with miscalculation of required db/wal slot size for VGs with multiple PVs (pr#43948, Guillaume Abrioux, Cory Snyder)

  • ceph-volume: fix lvm activate --all --no-systemd (pr#43267, Dimitri Savineau)

  • ceph-volume: util/prepare fix osd_id_available() (pr#43708, Guillaume Abrioux)

  • ceph.spec: selinux scripts respect CEPH_AUTO_RESTART_ON_UPGRADE (pr#43235, Dan van der Ster)

  • cephadm: November batch (pr#43906, Sebastian Wagner, Sage Weil, Daniel Pivonka, Andrew Sharapov, Paul Cuzner, Adam King, Melissa Li)

  • cephadm: October batch (pr#43728, Patrick Donnelly, Sage Weil, Cory Snyder, Sebastian Wagner, Paul Cuzner, Joao Eduardo Luis, Zac Dover, Dmitry Kvashnin, Daniel Pivonka, Adam King, jianglong01, Guillaume Abrioux, Melissa Li, Roaa Sakr, Kefu Chai, Brad Hubbard, Michael Fritch, Javier Cacheiro)

  • cephfs-mirror, test: add thrasher for cephfs mirror daemon, HA test yamls (issue#50372, pr#43924, Venky Shankar)

  • cephfs-mirror: shutdown ClusterWatcher on termination (pr#43198, Willem Jan Withagen, Venky Shankar)

  • cmake: link Threads::Threads instead of CMAKE_THREAD_LIBS_INIT (pr#43167, Ken Dreyer)

  • cmake: s/Python_EXECUTABLE/Python3_EXECUTABLE/ (pr#43264, Michael Fritch)

  • crush: cancel upmaps with up set size != pool size (pr#43415, huangjun)

  • doc/radosgw/nfs: add note about NFSv3 deprecation (pr#43941, Michael Fritch)

  • doc: document subvolume (group) pins (pr#43925, Patrick Donnelly)

  • github: add dashboard PRs to Dashboard project (pr#43610, Ernesto Puerta)

  • librbd/cache/pwl: persistant cache backports (pr#43772, Kefu Chai, Yingxin Cheng, Yin Congmin, Feng Hualong, Jianpeng Ma, Ilya Dryomov, Hualong Feng)

  • librbd/cache/pwl: SSD caching backports (pr#43918, Yin Congmin, Jianpeng Ma)

  • librbd/object_map: rbd diff between two snapshots lists entire image content (pr#43805, Sunny Kumar)

  • librbd: fix pool validation lockup (pr#43113, Ilya Dryomov)

  • mds/FSMap: do not assert allow_standby_replay on old FSMaps (pr#43614, Patrick Donnelly)

  • mds: Add new flag to MClientSession (pr#43251, Kotresh HR)

  • mds: do not trim stray dentries during opening the root (pr#43815, Xiubo Li)

  • mds: skip journaling blocklisted clients when in replay state (pr#43841, Venky Shankar)

  • mds: switch mds_lock to fair mutex to fix the slow performance issue (pr#43148, Xiubo Li, Kefu Chai)

  • MDSMonitor: assertion during upgrade to v16.2.5+ (pr#43890, Patrick Donnelly)

  • MDSMonitor: handle damaged state from standby-replay (pr#43200, Patrick Donnelly)

  • MDSMonitor: no active MDS after cluster deployment (pr#43891, Patrick Donnelly)

  • mgr/dashboard,prometheus: fix handling of server_addr (issue#52002, pr#43631, Scott Shambarger)

  • mgr/dashboard: all pyfakefs must be pinned on same version (pr#43930, Rishabh Dave)

  • mgr/dashboard: BATCH incl.: NFS integration, Cluster Expansion Workflow, and Angular 11 upgrade (pr#43682, Alfonso Martínez, Avan Thakkar, Aashish Sharma, Nizamudeen A, Pere Diaz Bou, Varsha Rao, Ramana Raja, Sage Weil, Kefu Chai)

  • mgr/dashboard: cephfs MDS Workload to use rate for counter type metric (pr#43190, Jan Horacek)

  • mgr/dashboard: clean-up controllers and API backward versioning compatibility (pr#43543, Ernesto Puerta, Avan Thakkar)

  • mgr/dashboard: Daemon Events listing using bootstrap class (pr#44057, Nizamudeen A)

  • mgr/dashboard: deprecated variable usage in Grafana dashboards (pr#43188, Patrick Seidensal)

  • mgr/dashboard: Device health status is not getting listed under hosts section (pr#44053, Aashish Sharma)

  • mgr/dashboard: Edit a service feature (pr#43939, Nizamudeen A)

  • mgr/dashboard: Fix failing config dashboard e2e check (pr#43238, Nizamudeen A)

  • mgr/dashboard: fix flaky inventory e2e test (pr#44056, Nizamudeen A)

  • mgr/dashboard: fix missing alert rule details (pr#43812, Ernesto Puerta)

  • mgr/dashboard: Fix orchestrator/01-hosts.e2e-spec.ts failure (pr#43541, Nizamudeen A)

  • mgr/dashboard: include mfa_ids in rgw user-details section (pr#43893, Avan Thakkar)

  • mgr/dashboard: Incorrect MTU mismatch warning (pr#43185, Aashish Sharma)

  • mgr/dashboard: monitoring: grafonnet refactoring for radosgw dashboards (pr#43644, Aashish Sharma)

  • mgr/dashboard: Move force maintenance test to the workflow test suite (pr#43347, Nizamudeen A)

  • mgr/dashboard: pin a version for autopep8 and pyfakefs (pr#43646, Nizamudeen A)

  • mgr/dashboard: Predefine labels in create host form (pr#44077, Nizamudeen A)

  • mgr/dashboard: provisioned values is misleading in RBD image table (pr#44051, Avan Thakkar)

  • mgr/dashboard: replace "Ceph-cluster" Client connections with active-standby MGRs (pr#43523, Avan Thakkar)

  • mgr/dashboard: rgw daemon list: add realm column (pr#44047, Alfonso Martínez)

  • mgr/dashboard: Spelling mistake in host-form Network address field (pr#43973, Avan Thakkar)

  • mgr/dashboard: Visual regression tests for ceph dashboard (pr#42678, Aaryan Porwal)

  • mgr/dashboard: visual tests: Add more ignore regions for dashboard component (pr#43240, Aaryan Porwal)

  • mgr/influx: use "N/A" for unknown hostname (pr#43368, Kefu Chai)

  • mgr/mirroring: remove unnecessary fs_name arg from daemon status command (issue#51989, pr#43199, Venky Shankar)

  • mgr/nfs: nfs-rgw batch backport (pr#43075, Sebastian Wagner, Sage Weil, Varsha Rao, Ramana Raja)

  • mgr/progress: optimize global recovery && introduce 5 seconds interval (pr#43353, Kamoltat, Neha Ojha)

  • mgr/prometheus: offer ability to disable cache (pr#43931, Patrick Seidensal)

  • mgr/volumes: Fix permission during subvol creation with mode (pr#43223, Kotresh HR)

  • mgr: Add check to prevent mgr from crashing (pr#43445, Aswin Toni)

  • mon,auth: fix proposal (and mon db rebuild) of rotating secrets (pr#43697, Sage Weil)

  • mon/MDSMonitor: avoid crash when decoding old FSMap epochs (pr#43615, Patrick Donnelly)

  • mon: Allow specifying new tiebreaker monitors (pr#43457, Greg Farnum)

  • mon: MonMap: display disallowed_leaders whenever they're set (pr#43972, Greg Farnum)

  • mon: MonMap: do not increase mon_info_t's compatv in stretch mode, really (pr#43971, Greg Farnum)

  • monitoring: ethernet bonding filter in Network Load (pr#43694, Pere Diaz Bou)

  • msg/async/ProtocolV2: Set the recv_stamp at the beginning of receiving a message (pr#43511, dongdong tao)

  • msgr/async: fix unsafe access in unregister_conn() (pr#43548, Sage Weil, Radoslaw Zarzynski)

  • os/bluestore: _do_write_small fix head_pad (pr#43756, dheart)

  • os/bluestore: do not select absent device in volume selector (pr#43970, Igor Fedotov)

  • os/bluestore: fix invalid omap name conversion when upgrading to per-pg (pr#43793, Igor Fedotov)

  • os/bluestore: list obj which equals to pend (pr#43512, Mykola Golub, Kefu Chai)

  • os/bluestore: multiple repair fixes (pr#43731, Igor Fedotov)

  • osd/OSD: mkfs need wait for transcation completely finish (pr#43417, Chen Fan)

  • osd: fix partial recovery become whole object recovery after restart osd (pr#43513, Jianwei Zhang)

  • osd: fix to allow inc manifest leaked (pr#43306, Myoungwon Oh)

  • osd: fix to recover adjacent clone when set_chunk is called (pr#43099, Myoungwon Oh)

  • osd: handle inconsistent hash info during backfill and deep scrub gracefully (pr#43544, Ronen Friedman, Mykola Golub)

  • osd: re-cache peer_bytes on every peering state activate (pr#43437, Mykola Golub)

  • osd: Run osd bench test to override default max osd capacity for mclock (pr#41731, Sridhar Seshasayee)

  • Pacific: BlueStore: Omap upgrade to per-pg fix fix (pr#43922, Adam Kupczyk)

  • Pacific: client: do not defer releasing caps when revoking (pr#43782, Xiubo Li)

  • Pacific: mds: add read/write io size metrics support (pr#43784, Xiubo Li)

  • Pacific: test/libcephfs: put inodes after lookup (pr#43562, Patrick Donnelly)

  • pybind/mgr/cephadm: set allow_standby_replay during CephFS upgrade (pr#43559, Patrick Donnelly)

  • pybind/mgr/CMakeLists.txt: exclude files not used at runtime (pr#43787, Duncan Bellamy)

  • pybind/mgr/pg_autoscale: revert to default profile scale-up (pr#44032, Kamoltat)

  • qa/mgr/dashboard/test_pool: don't check HEALTH_OK (pr#43440, Ernesto Puerta)

  • qa/mgr/dashboard: add extra wait to test (pr#43351, Ernesto Puerta)

  • qa/rgw: pacific branch targets ceph-pacific branch of java_s3tests (pr#43809, Casey Bodley)

  • qa/tasks/kubeadm: force docker cgroup engine to systemd (pr#43937, Sage Weil)

  • qa/tasks/mgr: skip test_diskprediction_local on python>=3.8 (pr#43421, Kefu Chai)

  • qa/tests: advanced version to reflect the latest 16.2.6 release (pr#43242, Yuri Weinstein)

  • qa: disable metrics on kernel client during upgrade (pr#44034, Patrick Donnelly)

  • qa: lengthen grace for fs map showing dead MDS (pr#43702, Patrick Donnelly)

  • qa: reduce frag split confs for dir_split counter test (pr#43828, Patrick Donnelly)

  • rbd-mirror: fix mirror image removal (pr#43662, Arthur Outhenin-Chalandre)

  • rbd-mirror: unbreak one-way snapshot-based mirroring (pr#43315, Ilya Dryomov)

  • rgw/notification: make notifications agnostic of bucket reshard (pr#42946, Yuval Lifshitz)

  • rgw/notifications: cache object size to avoid accessing invalid memory (pr#42949, Yuval Lifshitz)

  • rgw/notifications: send correct size in case of delete marker creation (pr#42643, Yuval Lifshitz)

  • rgw/notifications: support v4 auth for topics and notifications (pr#42947, Yuval Lifshitz)

  • rgw/rgw_rados: make RGW request IDs non-deterministic (pr#43695, Cory Snyder)

  • rgw/sts: fix for copy object operation using sts (pr#43703, Pritha Srivastava)

  • rgw/tracing: unify SO version numbers within librgw2 package (pr#43619, Nathan Cutler)

  • rgw: add abstraction for ops log destination and add file logger (pr#43740, Casey Bodley, Cory Snyder)

  • rgw: Ensure buckets too old to decode a layout have layout logs (pr#43823, Adam C. Emerson)

  • rgw: fix bucket purge incomplete multipart uploads (pr#43862, J. Eric Ivancich)

  • rgw: fix spelling of eTag in S3 message structure (pr#42945, Tom Schoonjans)

  • rgw: fix sts memory leak (pr#43348, yuliyang_yewu)

  • rgw: remove prefix & delim params for bucket removal & mp upload abort (pr#43975, J. Eric Ivancich)

  • rgw: use existing s->bucket in s3 website retarget() (pr#43777, Casey Bodley)

  • snap-schedule: count retained snapshots per retention policy (pr#43434, Jan Fajerski)

  • test: shutdown the mounter after test finishes (pr#43475, Xiubo Li)

v16.2.6 Pacific

Danger

DATE: 01 NOV 2021.

DO NOT UPGRADE TO CEPH PACIFIC FROM AN OLDER VERSION.

A recently-discovered bug (https://tracker.ceph.com/issues/53062) can cause data corruption. This bug occurs during OMAP format conversion for clusters that are updated to Pacific. New clusters are not affected by this bug.

The trigger for this bug is BlueStore's repair/quick-fix functionality. This bug can be triggered in two known ways:

  1. manually via the ceph-bluestore-tool, or

  2. automatically, by the OSD if bluestore_fsck_quick_fix_on_mount is set to true.

The fix for this bug is expected to be available in Ceph v16.2.7.

DO NOT set bluestore_fsck_quick_fix_on_mount to true. If it is currently set to true in your configuration, immediately set it to false.

DO NOT run ceph-bluestore-tool's repair/quick-fix commands.
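
A minimal sketch of how the setting could be checked and forced to false through the centralized configuration database. It uses the option name given in the list of triggers above (bluestore_fsck_quick_fix_on_mount); the function name is hypothetical, and the sketch does not cover values set in a local ceph.conf, which would have to be checked separately.

```shell
# Sketch only: inspect and disable the BlueStore quick-fix-on-mount option
# for all OSDs in the monitor configuration database.
disable_quick_fix_on_mount() {
    # Show the value currently stored for the osd section.
    ceph config get osd bluestore_fsck_quick_fix_on_mount
    # Explicitly set it to false so OSDs never run the quick-fix at mount time.
    ceph config set osd bluestore_fsck_quick_fix_on_mount false
}

# Example call (needs admin access to the cluster): disable_quick_fix_on_mount
```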

This is the sixth backport release in the Pacific series.

Notable Changes

  • MGR: The pg_autoscaler has a new default 'scale-down' profile which provides more performance from the start for new pools (for newly created clusters). Existing clusters will retain the old behavior, now called the 'scale-up' profile. For more details, see: https://docs.ceph.com/en/latest/rados/operations/placement-groups/

  • CephFS: the upgrade procedure for CephFS is now simpler. It is no longer necessary to stop all MDS before upgrading the sole active MDS. After disabling standby-replay, reducing max_mds to 1, and waiting for the file systems to become stable (each fs with 1 active and 0 stopping daemons), a rolling upgrade of all MDS daemons can be performed.

  • Dashboard: now allows users to set up and display a custom message (MOTD, warning, etc.) in a sticky banner at the top of the page. For more details, see: https://docs.ceph.com/en/pacific/mgr/dashboard/#message-of-the-day-motd

  • Several fixes in BlueStore, including a fix for the deferred write regression, which led to excessive RocksDB flushes and compactions. Previously, when bluestore_prefer_deferred_size_hdd was equal to or more than bluestore_max_blob_size_hdd (both set to 64K), all the data was deferred, which led to increased consumption of the column family used to store deferred writes in RocksDB. Now, the bluestore_prefer_deferred_size parameter independently controls deferred writes, and only writes smaller than this size use the deferred write path.

  • The default value of osd_client_message_cap has been set to 256, to provide better flow control by limiting the maximum number of in-flight client requests.

  • PGs no longer show an active+clean+scrubbing+deep+repair state when osd_scrub_auto_repair is set to true, for regular deep-scrubs with no repair required.

  • The ceph-mgr-modules-core Debian package no longer recommends ceph-mgr-rook, because the latter depends on python3-numpy, which cannot be imported multiple times in different Python sub-interpreters if the version of python3-numpy is older than 1.19. Since apt-get installs Recommends packages by default, ceph-mgr-rook was always installed along with the ceph-mgr Debian package as an indirect dependency. If your workflow depends on this behavior, you might want to install ceph-mgr-rook separately.

  • This is the first release built for Debian Bullseye.
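
The preparation steps for the simplified CephFS upgrade described above can be sketched roughly as follows. This is an illustration, not the official procedure: the function name and the file system name passed to it are hypothetical, and waiting for the file system to become stable is left to the operator.

```shell
# Sketch only: prepare one file system for a rolling MDS upgrade.
prepare_fs_for_mds_upgrade() {
    fs_name="$1"
    # Disable standby-replay daemons for this file system.
    ceph fs set "$fs_name" allow_standby_replay false
    # Reduce the number of active MDS ranks to 1.
    ceph fs set "$fs_name" max_mds 1
    # Print the current state; repeat until 1 active and 0 stopping daemons remain.
    ceph fs status "$fs_name"
}

# Example call (needs a running cluster): prepare_fs_for_mds_upgrade cephfs
```

After the upgrade, max_mds and standby-replay would be restored to their previous values by hand.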

Changelog

  • bind on loopback address if no other addresses are available (pr#42477, Kefu Chai)

  • ceph-monstore-tool: use a large enough paxos/{first,last}_committed (issue#38219, pr#42411, Kefu Chai)

  • ceph-volume/tests: retry when destroying osd (pr#42546, Guillaume Abrioux)

  • ceph-volume/tests: update ansible environment variables in tox (pr#42490, Dimitri Savineau)

  • ceph-volume: Consider /dev/root as mounted (pr#42755, David Caro)

  • ceph-volume: fix lvm activate arguments (pr#43116, Dimitri Savineau)

  • ceph-volume: fix lvm migrate without args (pr#43110, Dimitri Savineau)

  • ceph-volume: fix raw list with logical partition (pr#43087, Guillaume Abrioux, Dimitri Savineau)

  • ceph-volume: implement bluefs volume migration (pr#42219, Kefu Chai, Igor Fedotov)

  • ceph-volume: lvm batch: fast_allocations(): avoid ZeroDivisionError (pr#42493, Jonas Zeiger)

  • ceph-volume: pvs --noheadings replace pvs --no-heading (pr#43076, FengJiankui)

  • ceph-volume: remove --all ref from deactivate help (pr#43098, Dimitri Savineau)

  • ceph-volume: support no_systemd with lvm migrate (pr#43091, Dimitri Savineau)

  • ceph-volume: work around phantom atari partitions (pr#42753, Blaine Gardner)

  • ceph.spec.in: drop gdbm from build deps (pr#43000, Kefu Chai)

  • cephadm: August batch 1 (pr#42736, Sage Weil, Dimitri Savineau, Guillaume Abrioux, Sebastian Wagner, Varsha Rao, Zac Dover, Adam King, Cory Snyder, Michael Fritch, Asbjørn Sannes, "Wang,Fei", Javier Cacheiro, 胡玮文, Daniel Pivonka)

  • cephadm: September batch 1 (issue#52038, pr#43029, Sebastian Wagner, Dimitri Savineau, Paul Cuzner, Oleander Reis, Adam King, Yuxiang Zhu, Zac Dover, Alfonso Martínez, Sage Weil, Daniel Pivonka)

  • cephadm: use quay, not docker (pr#42534, Sage Weil)

  • cephfs-mirror: record directory path cancel in DirRegistry (issue#51666, pr#42458, Venky Shankar)

  • client: flush the mdlog in unsafe requests' relevant and auth MDSes only (pr#42925, Xiubo Li)

  • client: make sure only to update dir dist from auth mds (pr#42937, Xue Yantao)

  • cls/cmpomap: empty values are 0 in U64 comparisons (pr#42908, Casey Bodley)

  • cmake, ceph.spec.in: build with header only fmt on RHEL (pr#42472, Kefu Chai)

  • cmake: build static libs if they are internal ones (pr#39902, Kefu Chai)

  • cmake: exclude "grafonnet-lib" target from "all" (pr#42898, Kefu Chai)

  • cmake: link bundled fmt statically (pr#42692, Kefu Chai)

  • cmake: Replace boost download url (pr#42693, Rafał Wądołowski)

  • common/buffer: fix SIGABRT in rebuild_aligned_size_and_memory (pr#42976, Yin Congmin)

  • common/Formatter: include used header (pr#42233, Kefu Chai)

  • common/options: Set osd_client_message_cap to 256 (pr#42615, Mark Nelson)

  • compression/snappy: use uint32_t to be compatible with 1.1.9 (pr#42542, Kefu Chai, Nathan Cutler)

  • debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-roo… (pr#42300, Kefu Chai)

  • debian/control: dh-systemd is part of debhelper now (pr#43151, David Galloway)

  • debian/control: remove cython from Build-Depends (pr#43131, Kefu Chai)

  • doc/ceph-volume: add lvm migrate/new-db/new-wal (pr#43089, Dimitri Savineau)

  • doc/rados/operations: s/max_misplaced/target_max_misplaced_ratio/ (pr#42250, Paul Reece, Kefu Chai)

  • doc/releases/pacific.rst: remove notes about autoscaler (pr#42265, Neha Ojha)

  • Don't persist report data (pr#42888, Brad Hubbard)

  • krbd: escape udev_enumerate_add_match_sysattr values (pr#42969, Ilya Dryomov)

  • kv/RocksDBStore: Add handling of block_cache option for resharding (pr#42844, Adam Kupczyk)

  • kv/RocksDBStore: enrich debug message (pr#42544, Toshikuni Fukaya, Satoru Takeuchi)

  • librgw/notifications: initialize kafka and amqp (pr#42648, Yuval Lifshitz)

  • mds: add debugging when rejecting mksnap with EPERM (pr#42935, Patrick Donnelly)

  • mds: create file system with specific ID (pr#42900, Ramana Raja)

  • mds: MDCache.cc:5319 FAILED ceph_assert(rejoin_ack_gather.count(mds->get_nodeid())) (pr#42938, chencan)

  • mds: META_POP_READDIR, META_POP_FETCH, META_POP_STORE, and cache_hit_rate are not updated (pr#42939, Yongseok Oh)

  • mds: to print the unknow type value (pr#42088, Xiubo Li, Jos Collin)

  • MDSMonitor: monitor crash after upgrade from ceph 15.2.13 to 16.2.4 (pr#42536, Patrick Donnelly)

  • mgr/DaemonServer: skip redundant update of pgp_num_actual (pr#42223, Dan van der Ster)

  • mgr/dashboard/api: set a UTF-8 locale when running pip (pr#42829, Kefu Chai)

  • mgr/dashboard: Add configurable MOTD or wall notification (pr#42414, Volker Theile)

  • mgr/dashboard: cephadm e2e start script: add --expanded option (pr#42789, Alfonso Martínez)

  • mgr/dashboard: cephadm-e2e job script: improvements (pr#42585, Alfonso Martínez)

  • mgr/dashboard: disable create snapshot with subvolumes (pr#42819, Pere Diaz Bou)

  • mgr/dashboard: don't notify for suppressed alerts (pr#42974, Tatjana Dehler)

  • mgr/dashboard: fix Accept-Language header parsing (pr#42297, 胡玮文)

  • mgr/dashboard: fix rename inventory to disks (pr#42810, Navin Barnwal)

  • mgr/dashboard: fix ssl cert validation for rgw service creation (pr#42628, Avan Thakkar)

  • mgr/dashboard: Fix test_error force maintenance dashboard check (pr#42354, Nizamudeen A)

  • mgr/dashboard: monitoring: replace Grafana JSON with Grafonnet based code (pr#42812, Aashish Sharma)

  • mgr/dashboard: Refresh button on the iscsi targets page (pr#42817, Nizamudeen A)

  • mgr/dashboard: remove usage of 'rgw_frontend_ssl_key' (pr#42316, Avan Thakkar)

  • mgr/dashboard: show perf. counters for rgw svc. on Cluster > Hosts (pr#42629, Alfonso Martínez)

  • mgr/dashboard: stats=false not working when listing buckets (pr#42889, Avan Thakkar)

  • mgr/dashboard: tox.ini: delete useless env. 'apidocs' (pr#42788, Alfonso Martínez)

  • mgr/dashboard: update translations for pacific (pr#42606, Tatjana Dehler)

  • mgr/mgr_util: switch using unshared cephfs connections whenever possible (issue#51256, pr#42083, Venky Shankar)

  • mgr/pg_autoscaler: Introduce autoscaler scale-down feature (pr#42428, Kamoltat, Kefu Chai)

  • mgr/rook: Add timezone info (pr#39834, Varsha Rao, Sebastian Wagner)

  • mgr/telemetry: pass leaderboard flag even w/o ident (pr#42228, Sage Weil)

  • mgr/volumes: Add config to insert delay at the beginning of the clone (pr#42086, Kotresh HR)

  • mgr/volumes: use dedicated libcephfs handles for subvolume calls and … (issue#51271, pr#42914, Venky Shankar)

  • mgr: set debug_mgr=2/5 (so INFO goes to mgr log by default) (pr#42225, Sage Weil)

  • mon/MDSMonitor: do not pointlessly kill standbys that are incompatible with current CompatSet (pr#42578, Patrick Donnelly, Zhi Zhang)

  • mon/OSDMonitor: resize oversized Lec::epoch_by_pg, after PG merging, preventing osdmap trimming (pr#42224, Dan van der Ster)

  • mon/PGMap: remove DIRTY field in ceph df detail when cache tiering is not in use (pr#42860, Deepika Upadhyay)

  • mon: return -EINVAL when handling unknown option in 'ceph osd pool get' (pr#42229, Zhao Cuicui)

  • mon: Sanely set the default CRUSH rule when creating pools in stretch… (pr#42909, Greg Farnum)

  • monitoring/grafana/build/Makefile: revamp for arm64 builds, pushes to docker and quay, jenkins (pr#42211, Dan Mick)

  • monitoring/grafana/cluster: use per-unit max and limit values (pr#42679, David Caro)

  • monitoring: Clean up Grafana dashboards (pr#42299, Patrick Seidensal)

  • monitoring: fix Physical Device Latency unit (pr#42298, Seena Fallah)

  • msg: active_connections regression (pr#42936, Sage Weil)

  • nfs backport June (pr#42096, Varsha Rao)

  • os/bluestore: accept undecodable multi-block bluefs transactions on log (pr#43023, Igor Fedotov)

  • os/bluestore: cap omap naming scheme upgrade transaction (pr#42956, Igor Fedotov)

  • os/bluestore: compact db after bulk omap naming upgrade (pr#42426, Igor Fedotov)

  • os/bluestore: fix bluefs migrate command (pr#43100, Igor Fedotov)

  • os/bluestore: fix erroneous SharedBlob record removal during repair (pr#42423, Igor Fedotov)

  • os/bluestore: fix using incomplete bluefs log when dumping it (pr#43007, Igor Fedotov)

  • os/bluestore: make deferred writes less aggressive for large writes (pr#42773, Igor Fedotov, Adam Kupczyk)

  • os/bluestore: Remove possibility of replay log and file inconsistency (pr#42424, Adam Kupczyk)

  • os/bluestore: respect bluestore_warn_on_spurious_read_errors setting (pr#42897, Igor Fedotov)

  • osd/scrub: separate between PG state flags and internal scrubber operation (pr#42398, Ronen Friedman)

  • osd: log snaptrim message to dout (pr#42482, Arthur Outhenin-Chalandre)

  • osd: move down peers out from peer_purged (pr#42238, Mykola Golub)

  • pybind/mgr/stats: validate cmdtag (pr#42702, Jos Collin)

  • pybind/mgr: Fix IPv6 url generation (pr#42990, Sebastian Wagner)

  • pybind/rbd: fix mirror_image_get_status (pr#42972, Ilya Dryomov, Will Smith)

  • qa/*/test_envlibrados_for_rocksdb.sh: install libarchive-3.3.3 (pr#42344, Neha Ojha)

  • qa/cephadm: centos_8.x_container_tools_3.0.yaml (pr#42868, Sebastian Wagner)

  • qa/rgw: move ignore-pg-availability.yaml out of suites/rgw (pr#40694, Casey Bodley)

  • qa/standalone: Add missing cleanups after completion of a subset of osd and scrub tests (pr#42258, Sridhar Seshasayee)

  • qa/tests: advanced pacific version to reflect the latest 16.2.5 point (pr#42264, Yuri Weinstein)

  • qa/workunits/mon/test_mon_config_key: use subprocess.run() instead of proc.communicate() (pr#42221, Kefu Chai)

  • qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions' (pr#42165, Patrick Donnelly)

  • qa: increase the pg_num for cephfs_data/metadata pools (pr#42923, Xiubo Li)

  • qa: test_ls_H_prints_human_readable_file_size failure (pr#42166, Patrick Donnelly)

  • radosgw-admin: skip GC init on read-only admin ops (pr#42655, Mark Kogan)

  • radosgw: include realm_{id,name} in service map (pr#42213, Sage Weil)

  • rbd-mirror: add perf counters to snapshot replayer (pr#42987, Arthur Outhenin-Chalandre)

  • rbd-mirror: fix potential async op tracker leak in start_image_replayers (pr#42979, Mykola Golub)

  • rbd: fix default pool handling for nbd map/unmap (pr#42980, Sunny Kumar)

  • Remove dependency on lsb_release (pr#43001, Ken Dreyer)

  • RGW - Bucket Remove Op: Pass in user (pr#42135, Daniel Gryniewicz)

  • RGW - Don't move attrs before setting them (pr#42320, Daniel Gryniewicz)

  • rgw : add check empty for sync url (pr#42653, caolei)

  • rgw : add check for tenant provided in RGWCreateRole (pr#42637, caolei)

  • rgw : modfiy error XML for deleterole (pr#42639, caolei)

  • rgw multisite: metadata sync treats all errors as 'transient' for retry (pr#42656, Casey Bodley)

  • RGW Zipper - Make sure bucket list progresses (pr#42625, Daniel Gryniewicz)

  • rgw/amqp/test: fix mock prototype for librabbitmq-0.11.0 (pr#42649, Yuval Lifshitz)

  • rgw/http/notifications: support content type in HTTP POST messages (pr#42644, Yuval Lifshitz)

  • rgw/multisite: return correct error code when op fails (pr#42646, Yuval Lifshitz)

  • rgw/notification: add exception handling for persistent notification thread (pr#42647, Yuval Lifshitz)

  • rgw/notification: fix persistent notification hang when ack-levl=none (pr#40696, Yuval Lifshitz)

  • rgw/notification: fixing the "persistent=false" flag (pr#40695, Yuval Lifshitz)

  • rgw/notifications: delete bucket notification object when empty (pr#42631, Yuval Lifshitz)

  • rgw/notifications: support metadata filter in CompleteMultipartUpload and Copy events (pr#42321, Yuval Lifshitz)

  • rgw/notifications: support metadata filter in CompleteMultipartUploa… (pr#42566, Yuval Lifshitz)

  • rgw/rgw_file: Fix the return value of read() and readlink() (pr#42654, Dai zhiwei, luo rixin)

  • rgw/sts: correcting the evaluation of session policies (pr#42632, Pritha Srivastava)

  • rgw/sts: read_obj_policy() consults iam_user_policies on ENOENT (pr#42650, Casey Bodley)

  • rgw: allow rgw-orphan-list to process multiple data pools (pr#42635, J. Eric Ivancich)

  • rgw: allow to set ssl options and ciphers for beast frontend (pr#42363, Mykola Golub)

  • rgw: avoid infinite loop when deleting a bucket (issue#49206, pr#42230, Jeegn Chen)

  • rgw: avoid occuring radosgw daemon crash when access a conditionally … (pr#42626, xiangrui meng, yupeng chen)

  • rgw: Backport of 51674 to Pacific (pr#42346, Adam C. Emerson)

  • rgw: deprecate the civetweb frontend (pr#41367, Casey Bodley)

  • rgw: Don't segfault on datalog trim (pr#42336, Adam C. Emerson)

  • rgw: during reshard lock contention, adjust logging (pr#42641, J. Eric Ivancich)

  • rgw: extending existing ssl support for vault KMS (pr#42093, Jiffin Tony Thottan)

  • rgw: fail as expected when set/delete-bucket-website attempted on a non-exis… (pr#42642, xiangrui meng)

  • rgw: fix bucket object listing when marker matches prefix (pr#42638, J. Eric Ivancich)

  • rgw: fix for mfa resync crash when supplied with only one totp_pin (pr#42652, Pritha Srivastava)

  • rgw: fix segfault related to explicit object manifest handling (pr#42633, Mark Kogan)

  • rgw: Improve error message on email id reuse (pr#41783, Ponnuvel Palaniyappan)

  • rgw: objectlock: improve client error messages (pr#40693, Matt Benjamin)

  • rgw: parse tenant name out of rgwx-bucket-instance (pr#42231, Casey Bodley)

  • rgw: radosgw-admin errors if marker not specified on data/mdlog trim (pr#42640, Adam C. Emerson)

  • rgw: remove quota soft threshold (pr#42634, Zulai Wang)

  • rgw: require bucket name in bucket chown (pr#42323, Zulai Wang)

  • rgw: when deleted obj removed in versioned bucket, extra del-marker added (pr#42645, J. Eric Ivancich)

  • rpm/luarocks: simplify conditional and support Leap 15.3 (pr#42561, Nathan Cutler)

  • rpm: drop use of $FIRST_ARG in ceph-immutable-object-cache (pr#42480, Nathan Cutler)

  • run-make-check.sh: Increase failure output log size (pr#42850, David Galloway)

  • SimpleRADOSStriper: use debug_cephsqlite (pr#42659, Patrick Donnelly)

  • src/pybind/mgr/mirroring/fs/snapshot_mirror.py: do not assume a cephf… (pr#42226, Sébastien Han)

  • test/rgw: fix use of poll() with timers in unittest_rgw_dmclock_scheduler (pr#42651, Casey Bodley)

  • Warning Cleanup and Clang Compile Fix (pr#40692, Adam C. Emerson)

  • workunits/rgw: semicolon terminates perl statements (pr#43168, Matt Benjamin)

v16.2.5 Pacific

This is the fifth backport release in the Pacific series. We recommend all users update to this release.

Notable Changes

  • The ceph-mgr-modules-core Debian package no longer recommends ceph-mgr-rook, because the latter depends on python3-numpy, which cannot be imported multiple times in different Python sub-interpreters if its version is older than 1.19. Since apt-get installs Recommends packages by default, ceph-mgr-rook was always installed along with the ceph-mgr Debian package as an indirect dependency. If your workflow depends on this behavior, you might want to install ceph-mgr-rook separately.

  • mgr/nfs: The nfs module has been moved out of the volumes plugin. Before using the ceph nfs commands, the nfs mgr module must be enabled.

  • volumes/nfs: The cephfs cluster type has been removed from the nfs cluster create subcommand. Clusters deployed by cephadm can support an NFS export of both rgw and cephfs from a single NFS cluster instance.

  • The nfs cluster update command has been removed. You can modify the placement of an existing NFS service (and/or its associated ingress service) using orch ls --export and orch apply -i ....

  • The orch apply nfs command no longer requires a pool or namespace argument. We strongly encourage users to use the defaults so that the nfs cluster ls and related commands will work properly.

  • The nfs cluster delete and nfs export delete commands are deprecated and will be removed in a future release. Please use nfs cluster rm and nfs export rm instead.

  • A long-standing bug that prevented 32-bit and 64-bit client/server interoperability under msgr v2 has been fixed. In particular, mixing armv7l (armhf) and x86_64 or aarch64 servers in the same cluster now works.
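
The NFS-related changes above can be sketched as a few shell helpers. This is only an illustration: the function names, the CEPH override variable, and the spec file name are assumptions made for the example, and the argument passed to nfs cluster rm is assumed to be the NFS cluster ID. The ceph subcommands themselves are the ones named in the notes above.

```shell
# Sketch only. CEPH can be overridden (for example CEPH="echo ceph") to
# print the commands instead of running them against a live cluster.
CEPH="${CEPH:-ceph}"

# The nfs module is now separate from the volumes plugin and must be
# enabled before any "ceph nfs ..." command is used.
enable_nfs_module() {
    $CEPH mgr module enable nfs
}

# "nfs cluster update" has been removed: dump the service specs, edit the
# placement in the file, then re-apply it. $1 is a file name chosen by the
# caller.
export_service_specs() {
    $CEPH orch ls --export > "$1"
}

apply_service_specs() {
    $CEPH orch apply -i "$1"
}

# Use "rm" rather than the deprecated "delete" subcommand.
# $1 is assumed to be the NFS cluster ID.
remove_nfs_cluster() {
    $CEPH nfs cluster rm "$1"
}
```

A typical sequence would be enable_nfs_module, then export_service_specs specs.yaml, an edit of the placement section, and finally apply_service_specs specs.yaml.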

Changelog

  • .github/labeler: add api-change label (pr#41818, Ernesto Puerta)

  • Improve mon location handling for stretch clusters (pr#40484, Greg Farnum)

  • MDS heartbeat timed out between during executing MDCache::start_files_to_recover() (pr#42061, Yongseok Oh)

  • MDS slow request lookupino #0x100 on rank 1 block forever on dispatched (pr#40856, Xiubo Li, Patrick Donnelly)

  • MDSMonitor: crash when attempting to mount cephfs (pr#42068, Patrick Donnelly)

  • Pacific stretch mon state [Merge after 40484] (pr#41130, Greg Farnum)

  • Pacific: Add DoutPrefixProvider for RGW Log Messages in Pacfic (pr#40054, Ali Maredia, Kalpesh Pandya, Casey Bodley)

  • Pacific: Direct MMonJoin messages to leader, not first rank [Merge after 41130] (pr#41131, Greg Farnum)

  • Revert "pacific: mgr/dashboard: Generate NPM dependencies manifest" (pr#41549, Nizamudeen A)

  • Update boost url, fixing windows build (pr#41259, Lucian Petrut)

  • bluestore: use string_view and strip trailing slash for dir listing (pr#41755, Jonas Jelten, Kefu Chai)

  • build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /src/pybind/mgr/dashboard/frontend (pr#40813, Ernesto Puerta, dependabot[bot])

  • ceph-volume: fix batch report and respect ceph.conf config values (pr#41714, Andrew Schoen)

  • ceph_test_rados_api_service: more retries for servicemkap (pr#41182, Sage Weil)

  • cephadm june final batch (pr#42117, Kefu Chai, Sage Weil, Zac Dover, Sebastian Wagner, Varsha Rao, Sandro Bonazzola, Juan Miguel Olmo Martínez)

  • cephadm: batch backport for May (2) (pr#41219, Adam King, Sage Weil, Zac Dover, Dennis Körner, jianglong01, Avan Thakkar, Juan Miguel Olmo Martínez)

  • cephadm: june batch 1 (pr#41684, Sage Weil, Paul Cuzner, Juan Miguel Olmo Martínez, VasishtaShastry, Zac Dover, Sebastian Wagner, Adam King, Michael Fritch, Daniel Pivonka, sunilkumarn417)

  • cephadm: june batch 2 (pr#41815, Sebastian Wagner, Daniel Pivonka, Zac Dover, Michael Fritch)

  • cephadm: june batch 3 (pr#41913, Zac Dover, Adam King, Michael Fritch, Patrick Donnelly, Sage Weil, Juan Miguel Olmo Martínez, jianglong01)

  • cephadm: may batch 1 (pr#41151, Juan Miguel Olmo Martínez, Sage Weil, Zac Dover, Daniel Pivonka, Adam King, Stanislav Datskevych, jianglong01, Kefu Chai, Deepika Upadhyay, Joao Eduardo Luis)

  • cephadm: may batch 3 (pr#41463, Sage Weil, Michael Fritch, Adam King, Patrick Seidensal, Juan Miguel Olmo Martínez, Dimitri Savineau, Zac Dover, Sebastian Wagner)

  • cephfs-mirror backports (issue#50523, issue#50035, issue#50266, issue#50442, issue#50581, issue#50229, issue#49939, issue#50224, issue#50298, pr#41475, Venky Shankar, Lucian Petrut)

  • cephfs-mirror: backports (issue#50447, issue#50867, issue#51204, pr#41947, Venky Shankar)

  • cephfs-mirror: reopen logs on SIGHUP (issue#51413, issue#51318, pr#42097, Venky Shankar)

  • cephfs-top: self-adapt the display according the window size (pr#41053, Xiubo Li)

  • client: Fix executeable access check for the root user (pr#41294, Kotresh HR)

  • client: fix the opened inodes counter increasing (pr#40685, Xiubo Li)

  • client: make Inode to inherit from RefCountedObject (pr#41052, Xiubo Li)

  • cls/rgw: look for plain entries in non-ascii plain namespace too (pr#41774, Mykola Golub)

  • common/buffer: adjust align before calling posix_memalign() (pr#41249, Ilya Dryomov)

  • common/mempool: only fail tests if sharding is very bad (pr#40566, singuliere)

  • common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned (pr#40918, Neha Ojha)

  • crush/crush: ensure alignof(crush_work_bucket) is 1 (pr#41983, Kefu Chai)

  • debian,cmake,cephsqlite: hide non-public symbols (pr#40689, Kefu Chai)

  • debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-rook (pr#41877, Kefu Chai)

  • doc: pacific updates (pr#42066, Patrick Donnelly)

  • librbd/cache/pwl: fix parsing of cache_type in create_image_cache_state() (pr#41244, Ilya Dryomov)

  • librbd/mirror/snapshot: avoid UnlinkPeerRequest with a unlinked peer (pr#41304, Arthur Outhenin-Chalandre)

  • librbd: don't stop at the first unremovable image when purging (pr#41664, Ilya Dryomov)

  • make-dist: refuse to run if script path contains a colon (pr#41086, Nathan Cutler)

  • mds: "FAILED ceph_assert(r == 0 || r == -2)" (pr#42072, Xiubo Li)

  • mds: "cluster [ERR] Error recovering journal 0x203: (2) No such file or directory" in cluster log" (pr#42059, Xiubo Li)

  • mds: Add full caps to avoid osd full check (pr#41691, Patrick Donnelly, Kotresh HR)

  • mds: CephFS kclient gets stuck when getattr() on a certain file (pr#42062, "Yan, Zheng", Xiubo Li)

  • mds: Error ENOSYS: mds.a started profiler (pr#42056, Xiubo Li)

  • mds: MDSLog::journaler pointer maybe crash with use-after-free (pr#42060, Xiubo Li)

  • mds: avoid journaling overhead for setxattr("ceph.dir.subvolume") for no-op case (pr#41995, Patrick Donnelly)

  • mds: do not assert when receiving a unknow metric type (pr#41596, Patrick Donnelly, Xiubo Li)

  • mds: journal recovery thread is possibly asserting with mds_lock not locked (pr#42058, Xiubo Li)

  • mds: mkdir on ephemerally pinned directory sometimes blocked on journal flush (pr#42071, Xiubo Li)

  • mds: scrub error on inode 0x1 (pr#41685, Milind Changire)

  • mds: standby-replay only trims cache when it reaches the end of the replay log (pr#40855, Xiubo Li, Patrick Donnelly)

  • mgr/DaemonServer.cc: prevent mgr crashes caused by integer underflow that is triggered by large increases to pg_num/pgp_num (pr#41862, Cory Snyder)

  • mgr/Dashboard: Remove erroneous elements in hosts-overview Grafana dashboard (pr#40982, Malcolm Holmes)

  • mgr/dashboard: API Version changes do not apply to pre-defined methods (list, create etc.) (pr#41675, Aashish Sharma)

  • mgr/dashboard: Alertmanager fails to POST alerts (pr#41987, Avan Thakkar)

  • mgr/dashboard: Fix 500 error while exiting out of maintenance (pr#41915, Nizamudeen A)

  • mgr/dashboard: Fix bucket name input allowing space in the value (pr#42119, Nizamudeen A)

  • mgr/dashboard: Fix for query params resetting on change-password (pr#41440, Nizamudeen A)

  • mgr/dashboard: Generate NPM dependencies manifest (pr#41204, Nizamudeen A)

  • mgr/dashboard: Host Maintenance Follow ups (pr#41056, Nizamudeen A)

  • mgr/dashboard: Include Network address and labels on Host Creation form (pr#42027, Nizamudeen A)

  • mgr/dashboard: OSDs placement text is unreadable (pr#41096, Aashish Sharma)

  • mgr/dashboard: RGW buckets async validator performance enhancement and name constraints (pr#41296, Nizamudeen A)

  • mgr/dashboard: User database migration has been cut out (pr#42140, Volker Theile)

  • mgr/dashboard: avoid data processing in crush-map component (pr#41203, Avan Thakkar)

  • mgr/dashboard: bucket details: show lock retention period only in days (pr#41948, Alfonso Martínez)

  • mgr/dashboard: crushmap tree doesn't display crush type other than root (pr#42007, Kefu Chai, Avan Thakkar)

  • mgr/dashboard: disable NFSv3 support in dashboard (pr#41200, Volker Theile)

  • mgr/dashboard: drop container image name and id from services list (pr#41505, Avan Thakkar)

  • mgr/dashboard: fix API docs link (pr#41507, Avan Thakkar)

  • mgr/dashboard: fix ESOCKETTIMEDOUT E2E failure (pr#41427, Avan Thakkar)

  • mgr/dashboard: fix HAProxy (now called ingress) (pr#41298, Avan Thakkar)

  • mgr/dashboard: fix OSD out count (pr#42153, 胡玮文)

  • mgr/dashboard: fix OSDs Host details/overview grafana graphs (issue#49769, pr#41324, Alfonso Martínez, Michael Wodniok)

  • mgr/dashboard: fix base-href (pr#41634, Avan Thakkar)

  • mgr/dashboard: fix base-href: revert it to previous approach (pr#41251, Avan Thakkar)

  • mgr/dashboard: fix bucket objects and size calculations (pr#41646, Avan Thakkar)

  • mgr/dashboard: fix bucket versioning when locking is enabled (pr#41197, Avan Thakkar)

  • mgr/dashboard: fix for right sidebar nav icon not clickable (pr#42008, Aaryan Porwal)

  • mgr/dashboard: fix set-ssl-certificate{,-key} commands (pr#41170, Alfonso Martínez)

  • mgr/dashboard: fix typo: Filesystems to File Systems (pr#42016, Navin Barnwal)

  • mgr/dashboard: ingress service creation follow-up (pr#41428, Avan Thakkar)

  • mgr/dashboard: pass Grafana datasource in URL (pr#41633, Ernesto Puerta)

  • mgr/dashboard: provide the service events when showing a service in the UI (pr#41494, Aashish Sharma)

  • mgr/dashboard: run cephadm-backend e2e tests with KCLI (pr#42156, Alfonso Martínez)

  • mgr/dashboard: set required env. variables in run-backend-api-tests.sh (pr#41069, Alfonso Martínez)

  • mgr/dashboard: show RGW tenant user id correctly in 'NFS create export' form (pr#41528, Alfonso Martínez)

  • mgr/dashboard: show partially deleted RBDs (pr#41891, Tatjana Dehler)

  • mgr/dashboard: simplify object locking fields in 'Bucket Creation' form (pr#41777, Alfonso Martínez)

  • mgr/dashboard: update frontend deps due to security vulnerabilities (pr#41402, Alfonso Martínez)

  • mgr/dashboard:include compression stats on pool dashboard (pr#41577, Ernesto Puerta, Paul Cuzner)

  • mgr/nfs: do not depend on cephadm.utils (pr#41842, Sage Weil)

  • mgr/progress: ensure progress stays between [0,1] (pr#41312, Dan van der Ster)

  • mgr/prometheus:Improve the pool metadata (pr#40804, Paul Cuzner)

  • mgr/pybind/snap_schedule: do not fail when no fs snapshots are available (pr#41044, Sébastien Han)

  • mgr/volumes/nfs: drop type param during cluster create (pr#41005, Michael Fritch)

  • mon,doc: deprecate min_compat_client (pr#41468, Patrick Donnelly)

  • mon/MonClient: reset authenticate_err in _reopen_session() (pr#41019, Ilya Dryomov)

  • mon/MonClient: tolerate a rotating key that is slightly out of date (pr#41450, Ilya Dryomov)

  • mon/OSDMonitor: drop stale failure_info after a grace period (pr#41090, Kefu Chai)

  • mon/OSDMonitor: drop stale failure_info even if can_mark_down() (pr#41982, Kefu Chai)

  • mon: load stashed map before mkfs monmap (pr#41768, Dan van der Ster)

  • nfs backport May (pr#41389, Varsha Rao)

  • os/FileStore: fix to handle readdir error correctly (pr#41236, Misono Tomohiro)

  • os/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators (pr#41655, Igor Fedotov, Neha Ojha)

  • os/bluestore: introduce multithreading sync for bluestore's repairer (pr#41752, Igor Fedotov)

  • os/bluestore: tolerate zero length for allocators' init_[add/rm]_free() (pr#41753, Igor Fedotov)

  • osd/PG.cc: handle removal of pgmeta object (pr#41680, Neha Ojha)

  • osd/osd_type: use f->dump_unsigned() when appropriate (pr#42045, Kefu Chai)

  • osd/scrub: replace a ceph_assert() with a test (pr#41944, Ronen Friedman)

  • osd: Override recovery, backfill and sleep related config options during OSD and mclock scheduler initialization (pr#41125, Sridhar Seshasayee, Zac Dover)

  • osd: clear data digest when write_trunc (pr#42019, Zengran Zhang)

  • osd: compute OSD's space usage ratio via raw space utilization (pr#41113, Igor Fedotov)

  • osd: don't assert in-flight backfill is always in recovery list (pr#41320, Mykola Golub)

  • osd: fix scrub reschedule bug (pr#41971, wencong wan)

  • pacific: client: abort after MDS blocklist (issue#50530, pr#42070, Venky Shankar)

  • pybind/ceph_volume_client: use cephfs mkdirs api (pr#42159, Patrick Donnelly)

  • pybind/mgr/devicehealth: scrape-health-metrics command accidentally renamed to scrape-daemon-health-metrics (pr#41089, Patrick Donnelly)

  • pybind/mgr/progress: Disregard unreported pgs (pr#41872, Kamoltat)

  • pybind/mgr/snap_schedule: Invalid command: Unexpected argument 'fs=cephfs' (pr#42064, Patrick Donnelly)

  • qa/config/rados: add dispatch delay testing params (pr#41136, Deepika Upadhyay)

  • qa/distros/podman: preserve registries.conf (pr#40729, Sage Weil)

  • qa/suites/rados/standalone: remove mon_election symlink (pr#41212, Neha Ojha)

  • qa/suites/rados: add simultaneous scrubs to the thrasher (pr#42120, Ronen Friedman)

  • qa/tasks/qemu: precise repos have been archived (pr#41643, Ilya Dryomov)

  • qa/tests: corrected point versions to reflect latest releases (pr#41313, Yuri Weinstein)

  • qa/tests: initial checkin for pacific-p2p suite (2) (pr#41208, Yuri Weinstein)

  • qa/tests: replaced ubuntu_latest.yaml with ubuntu 20.04 (pr#41460, Patrick Donnelly, Kefu Chai)

  • qa/upgrade: conditionally disable update_features tests (pr#41629, Deepika)

  • qa/workunits/rbd: use bionic version of qemu-iotests for focal (pr#41195, Ilya Dryomov)

  • qa: AttributeError: 'RemoteProcess' object has no attribute 'split' (pr#41811, Patrick Donnelly)

  • qa: add async dirops testing (pr#41823, Patrick Donnelly)

  • qa: check mounts attribute in ctx (pr#40634, Jos Collin)

  • qa: convert some legacy Filesystem.rados calls (pr#40996, Patrick Donnelly)

  • qa: drop the distro~HEAD directory from the fs suite (pr#41169, Radoslaw Zarzynski)

  • qa: fs:bugs does not specify distro (pr#42063, Patrick Donnelly)

  • qa: fs:upgrade uses teuthology default distro (pr#42067, Patrick Donnelly)

  • qa: scrub code does not join scrubopts with comma (pr#42065, Kefu Chai, Patrick Donnelly)

  • qa: test_data_scan.TestDataScan.test_pg_files AssertionError: Items in the second set but not the first (pr#42069, Xiubo Li)

  • qa: test_ephemeral_pin_distribution failure (pr#41659, Patrick Donnelly)

  • qa: update RHEL to 8.4 (pr#41822, Patrick Donnelly)

  • rbd-mirror: fix segfault in snapshot replayer shutdown (pr#41503, Arthur Outhenin-Chalandre)

  • rbd: --source-spec-file should be --source-spec-path (pr#41122, Ilya Dryomov)

  • rbd: don't attempt to interpret image cache state json (pr#41281, Ilya Dryomov)

  • rgw: Simplify log shard probing and err on the side of omap (pr#41576, Adam C. Emerson)

  • rgw: completion of multipart upload leaves delete marker (pr#41769, J. Eric Ivancich)

  • rgw: crash on multipart upload to bucket with policy (pr#41893, Or Friedmann)

  • rgw: radosgw_admin remove bucket not purging past 1,000 objects (pr#41863, J. Eric Ivancich)

  • rgw: radoslist incomplete multipart parts marker (pr#40819, J. Eric Ivancich)

  • rocksdb: pickup fix to detect PMULL instruction (pr#41079, Kefu Chai)

  • session dump includes completed_requests twice, once as an integer and once as a list (pr#42057, Dan van der Ster)

  • systemd: remove ProtectClock=true for ceph-osd@.service (pr#41232, Wong Hoi Sing Edison)

  • test/librbd: use really invalid domain (pr#42010, Mykola Golub)

  • win32*.sh: disable libcephsqlite when targeting Windows (pr#40557, Lucian Petrut)

v16.2.4 Pacific

This is a hotfix release addressing a number of security issues and regressions. We recommend all users update to this release.

Changelog

v16.2.3 Pacific

This is the third backport release in the Pacific series. We recommend all users update to this release.

Notable Changes

  • This release fixes a cephadm upgrade bug that caused some systems to get stuck in a loop restarting the first mgr daemon.

v16.2.2 Pacific

This is the second backport release in the Pacific series. We recommend all users update to this release.

Notable Changes

  • Cephadm now supports an ingress service type that provides load balancing and HA (via haproxy and keepalived on a virtual IP) for RGW service (see High availability service for RGW). (The experimental rgw-ha service has been removed.)
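
As a rough illustration of the ingress service type, the helper below writes a minimal service spec to a file. Every value in it (the service names, the virtual IP, the ports, and the daemon count) is a placeholder, and the function name is hypothetical; the field layout follows the cephadm service spec format as we understand it and should be checked against the cephadm documentation for your release.

```shell
# Sketch only: write a hypothetical ingress spec for an existing RGW service.
# All values below are placeholders, not recommendations.
write_ingress_spec() {
    # $1: output file
    cat > "$1" <<'EOF'
service_type: ingress
service_id: rgw.example
placement:
  count: 2
spec:
  backend_service: rgw.example
  virtual_ip: 192.0.2.10/24
  frontend_port: 8080
  monitor_port: 1967
EOF
}

# On a cephadm-managed cluster the spec would then be applied with:
#   write_ingress_spec ingress.yaml && ceph orch apply -i ingress.yaml
```

Here haproxy would listen on the virtual IP and frontend port and forward requests to the daemons of the backend RGW service, while keepalived moves the virtual IP between hosts.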

Changelog

  • ceph-fuse: src/include/buffer.h: 1187: FAILED ceph_assert(_num <= 1024) (pr#40628, Yanhu Cao)

  • ceph-volume: fix "device" output (pr#41054, Sébastien Han)

  • ceph-volume: fix raw listing when finding OSDs from different clusters (pr#40985, Sébastien Han)

  • ceph.spec.in: Enable tcmalloc on IBM Power and Z (pr#39488, Nathan Cutler, Yaakov Selkowitz)

  • cephadm april batch 3 (issue#49737, pr#40922, Adam King, Sage Weil, Daniel Pivonka, Shreyaa Sharma, Sebastian Wagner, Juan Miguel Olmo Martínez, Zac Dover, Jeff Layton, Guillaume Abrioux, 胡玮文, Melissa Li, Nathan Cutler, Yaakov Selkowitz)

  • cephadm: april batch 1 (pr#40544, Sage Weil, Daniel Pivonka, Joao Eduardo Luis, Adam King)

  • cephadm: april batch backport 2 (pr#40746, Guillaume Abrioux, Sage Weil, Paul Cuzner)

  • cephadm: specify addr on bootstrap's host add (pr#40554, Joao Eduardo Luis)

  • cephfs: minor ceph-dokan improvements (pr#40627, Lucian Petrut)

  • client: items pinned in cache preventing unmount (pr#40629, Xiubo Li)

  • client: only check pool permissions for regular files (pr#40686, Xiubo Li)

  • cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT globally (pr#40706, Kefu Chai)

  • cmake: pass unparsed args to add_ceph_test() (pr#40523, Kefu Chai)

  • cmake: use --smp 1 --memory 256M to crimson tests (pr#40568, Kefu Chai)

  • crush/CrushLocation: do not print logging message in constructor (pr#40679, Alex Wu)

  • doc/cephfs/nfs: add user id, fs name and key to FSAL block (pr#40687, Varsha Rao)

  • include/librados: fix doxygen syntax for docs build (pr#40805, Josh Durgin)

  • mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and damage ls output for details" (pr#40825, Milind Changire)

  • mds: skip the buffer in UnknownPayload::decode() (pr#40682, Xiubo Li)

  • mgr/PyModule: put mgr_module_path before Py_GetPath() (pr#40517, Kefu Chai)

  • mgr/dashboard: Device health status is not getting listed under hosts section (pr#40494, Aashish Sharma)

  • mgr/dashboard: Fix for alert notification message being undefined (pr#40588, Nizamudeen A)

  • mgr/dashboard: Fix for broken User management role cloning (pr#40398, Nizamudeen A)

  • mgr/dashboard: Improve descriptions in some parts of the dashboard (pr#40545, Nizamudeen A)

  • mgr/dashboard: Remove username and password from request body (pr#40981, Nizamudeen A)

  • mgr/dashboard: Remove username, password fields from Manager Modules/dashboard,influx (pr#40489, Aashish Sharma)

  • mgr/dashboard: Revoke read-only user's access to Manager modules (pr#40648, Nizamudeen A)

  • mgr/dashboard: Unable to login to ceph dashboard until clearing cookies manually (pr#40586, Avan Thakkar)

  • mgr/dashboard: debug nodeenv hangs (pr#40815, Ernesto Puerta)

  • mgr/dashboard: filesystem pool size should use stored stat (pr#40980, Avan Thakkar)

  • mgr/dashboard: fix broken feature toggles (pr#40474, Ernesto Puerta)

  • mgr/dashboard: fix duplicated rows when creating NFS export (pr#40990, Alfonso Martínez)

  • mgr/dashboard: fix errors when creating NFS export (pr#40822, Alfonso Martínez)

  • mgr/dashboard: improve telemetry opt-in reminder notification message (pr#40887, Waad Alkhoury)

  • mgr/dashboard: test prometheus rules through promtool (pr#40929, Aashish Sharma, Kefu Chai)

  • mon: Modifying trim logic to change paxos_service_trim_max dynamically (pr#40691, Aishwarya Mathuria)

  • monmaptool: Don't call set_port on an invalid address (pr#40690, Brad Hubbard, Kefu Chai)

  • os/FileStore: don't propagate split/merge error to "create"/"remove" (pr#40989, Mykola Golub)

  • os/bluestore/BlueFS: do not _flush_range deleted files (pr#40677, weixinwei)

  • osd/PeeringState: fix acting_set_writeable min_size check (pr#40759, Samuel Just)

  • packaging: require ceph-common for immutable object cache daemon (pr#40665, Ilya Dryomov)

  • pybind/mgr/volumes: deadlock on async job hangs finisher thread (pr#40630, Kefu Chai, Patrick Donnelly)

  • qa/suites/krbd: don't require CEPHX_V2 for unmap subsuite (pr#40826, Ilya Dryomov)

  • qa/suites/rados/cephadm: stop testing on broken focal kubic podman (pr#40512, Sage Weil)

  • qa/tasks/ceph.conf: shorten cephx TTL for testing (pr#40663, Sage Weil)

  • qa/tasks/cephfs: create enough subvolumes (pr#40688, Ramana Raja)

  • qa/tasks/vstart_runner.py: start max required mgrs (pr#40612, Alfonso Martínez)

  • qa/tasks: Add wait_for_clean() check prior to initiating scrubbing (pr#40461, Sridhar Seshasayee)

  • qa: "AttributeError: 'NoneType' object has no attribute 'mon_manager'" (pr#40645, Rishabh Dave)

  • qa: "log [ERR] : error reading sessionmap 'mds2_sessionmap'" (pr#40852, Patrick Donnelly)

  • qa: fix ino_release_cb racy behavior (pr#40683, Patrick Donnelly)

  • qa: fs:cephadm mount does not wait for mds to be created (pr#40528, Patrick Donnelly)

  • qa: test standby_replay in workloads (pr#40853, Patrick Donnelly)

  • rbd-mirror: fix UB while registering perf counters (pr#40680, Arthur Outhenin-Chalandre)

  • rgw: add latency to the request summary of an op (pr#40448, Ali Maredia)

  • rgw: Backport of datalog improvements to Pacific (pr#40559, Yuval Lifshitz, Adam C. Emerson)

  • test: disable mgr/mirroring for test_mirroring_init_failure_with_recovery test (issue#50020, pr#40684, Venky Shankar)

  • tools/cephfs_mirror/PeerReplayer.cc: add missing include (pr#40678, Duncan Bellamy)

  • vstart.sh: disable "auth_allow_insecure_global_id_reclaim" (pr#40957, Kefu Chai)

v16.2.1 Pacific

This is the first bugfix release in the Pacific stable series. It addresses a security vulnerability in the Ceph authentication framework.

We recommend all Pacific users upgrade.

Security fixes

  • This release includes a security fix that ensures the global_id value (a numeric value that should be unique for every authenticated client or daemon in the cluster) is reclaimed after a network disconnect or ticket renewal in a secure fashion. Two new health alerts may appear during the upgrade indicating that there are clients or daemons that are not yet patched with the appropriate fix.

    To temporarily mute the health alerts around insecure clients for the duration of the upgrade, you can run:

    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1h
    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1h
    

    For more information, see CVE-2021-20288: Unauthorized global_id reuse in cephx.
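
The two alerts can also be handled from a small script. The sketch below is an illustration under assumptions: the function names and the CEPH override variable are invented for the example, and the filter simply searches whatever health text it is given for the two alert codes named above.

```shell
# Sketch only. CEPH can be overridden (for example CEPH="echo ceph") to
# print the commands instead of running them.
CEPH="${CEPH:-ceph}"

# Print the global_id-related alert codes found in health output on stdin,
# one per line, without duplicates.
list_global_id_alerts() {
    grep -oE 'AUTH_INSECURE_GLOBAL_ID_RECLAIM(_ALLOWED)?' | sort -u
}

# Mute both alerts for the given duration (for example 1h).
mute_global_id_alerts() {
    $CEPH health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM "$1"
    $CEPH health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED "$1"
}
```

On a live cluster, ceph health detail | list_global_id_alerts shows which of the two alerts are currently raised, and mute_global_id_alerts 1h mutes both for an hour.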

v16.2.0 Pacific

This is the first stable release of Ceph Pacific.

Major Changes from Octopus

General

  • Cephadm can automatically upgrade an Octopus cluster to Pacific with a single command to start the process.

  • Cephadm has improved significantly over the past year, with improved support for RGW (standalone and multisite), and new support for NFS and iSCSI. Most of these changes have already been backported to recent Octopus point releases, but with the Pacific release we will switch to backporting bug fixes only.

  • Packages are built for the following distributions:

    • CentOS 8

    • Ubuntu 20.04 (Focal)

    • Ubuntu 18.04 (Bionic)

    • Debian Buster

    • Container image (based on CentOS 8)

    With the exception of Debian Buster, packages and containers are built for both x86_64 and aarch64 (arm64) architectures.

    Note that cephadm clusters may work on many other distributions, provided that Python 3 and a recent version of Docker or Podman are available to manage containers. For more information, see Requirements.

Dashboard

The Ceph Dashboard brings improvements in the following management areas:

  • Orchestrator/Cephadm:

    • Host management: maintenance mode, labels.

    • Services: display placement specification.

    • OSD: disk replacement, display status of ongoing deletion, and improved health/SMART diagnostics reporting.

  • Official Ceph RESTful API:

    • OpenAPI v3 compliant.

    • Stability commitment starting from Pacific release.

    • Versioned via HTTP Accept header (starting with v1.0).

    • Thoroughly tested (>90% coverage and per Pull Request validation).

    • Fully documented.

  • RGW:

    • Multi-site synchronization monitoring.

    • Management of multiple RGW daemons and their resources (buckets and users).

    • Bucket and user quota usage visualization.

    • Improved configuration of S3 tenanted users.

  • Security (multiple enhancements and fixes resulting from penetration testing conducted by IBM):

    • Account lock-out after a configurable number of failed log-in attempts.

    • Improved cookie policies to mitigate XSS/CSRF attacks.

    • Reviewed and improved security in HTTP headers.

    • Sensitive information reviewed and removed from logs and error messages.

    • TLS 1.0 and 1.1 support disabled.

    • Enabling debug mode triggers HEALTH_WARN.

  • Pools:

    • Improved visualization of replication and erasure coding modes.

    • CLAY erasure code plugin supported.

  • Alerts and notifications:

    • Alert triggered on MTU mismatches in the cluster network.

    • Favicon changes according to cluster status.

  • Other:

    • Landing page: improved charts and visualization.

    • Telemetry configuration wizard.

    • OSDs: management of individual OSD flags.

    • RBD: per-RBD image Grafana dashboards.

    • CephFS: Dirs and Caps displayed.

    • NFS: v4 support only (v3 backward compatibility planned).

    • Front-end: Angular 10 update.

RADOS

  • Pacific introduces RocksDB Sharding, which reduces disk space requirements.

  • Ceph now provides QoS between client I/O and background operations via the mclock scheduler.

  • The balancer is now on by default in upmap mode to improve distribution of PGs across OSDs.

  • The output of ceph -s has been improved to show recovery progress in one progress bar. More detailed progress bars are visible via the ceph progress command.
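
The RADOS items above can be inspected from the command line. The helpers below are a minimal sketch: the function names and the CEPH override variable are assumptions for illustration, and whether osd_op_queue already reports mclock_scheduler depends on the release and on local configuration.

```shell
# Sketch only. CEPH can be overridden (for example CEPH="echo ceph") to
# print the commands instead of running them.
CEPH="${CEPH:-ceph}"

# Show whether the balancer is active and which mode it uses.
show_balancer() {
    $CEPH balancer status
}

# Show the detailed per-event progress bars.
show_recovery_progress() {
    $CEPH progress
}

# Show which op queue the OSDs are configured to use.
show_op_queue() {
    $CEPH config get osd osd_op_queue
}
```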

RBD block storage

  • Image live-migration feature has been extended to support external data sources. Images can now be instantly imported from local files, remote files served over HTTP(S) or remote S3 buckets in raw (rbd export v1) or basic qcow and qcow2 formats. Support for rbd export v2 format, advanced QCOW features and rbd export-diff snapshot differentials is expected in future releases.

  • Initial support for client-side encryption has been added. This is based on LUKS and in future releases will allow using per-image encryption keys while maintaining snapshot and clone functionality -- so that parent image and potentially multiple clone images can be encrypted with different keys.

  • A new persistent write-back cache is available. The cache operates in a log-structured manner, providing full point-in-time consistency for the backing image. It should be particularly suitable for PMEM devices.

  • A Windows client is now available in the form of librbd.dll and rbd-wnbd (Windows Network Block Device) daemon. It allows mapping, unmapping and manipulating images similar to rbd-nbd.

  • librbd API now offers quiesce/unquiesce hooks, allowing for coordinated snapshot creation.
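
To make the import-only live migration flow more concrete, the sketch below builds a source spec for a raw image in a local file and runs the three migration steps. The function names, the RBD override variable, and the example paths are hypothetical, and the JSON layout of the source spec is an assumption based on the RBD live-migration documentation; verify it against the documentation for your release before relying on it.

```shell
# Sketch only. RBD can be overridden (for example RBD="echo rbd") to print
# the commands instead of running them.
RBD="${RBD:-rbd}"

# Build a source-spec JSON document for a raw image stored in a local file.
# $1: path to the raw image (must not contain double quotes or backslashes).
raw_file_source_spec() {
    printf '{"type": "raw", "stream": {"type": "file", "file_path": "%s"}}\n' "$1"
}

# Import the file into a new image, copy the data, then finalize.
# $1: source file, $2: target image spec such as pool/image.
import_raw_image() {
    $RBD migration prepare --import-only \
        --source-spec "$(raw_file_source_spec "$1")" "$2" &&
    $RBD migration execute "$2" &&
    $RBD migration commit "$2"
}
```

The target image is usable as soon as the prepare step succeeds, which is what makes the import effectively instant; the execute step copies the data in the background and commit removes the link to the source.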

RGW object storage

  • Initial support for S3 Select. See Features Support for supported queries.

  • Bucket notification topics can be configured as persistent, where events are recorded in rados for reliable delivery.

  • Bucket notifications can be delivered to SSL-enabled AMQP endpoints.

  • Lua scripts can be run during requests and access their metadata.

  • SSE-KMS now supports KMIP as a key management service.

  • Multisite data logs can now be deployed on cls_fifo to avoid large omap cluster warnings and make their trimming cheaper. See rgw_data_log_backing.

CephFS distributed file system

  • The CephFS MDS modifies on-RADOS metadata such that the new format is no longer backwards compatible. It is not possible to downgrade a file system from Pacific (or later) to an older release.

  • Support for multiple file systems in a single Ceph cluster is now stable. New Ceph clusters enable support for multiple file systems by default. Existing clusters must still set the "enable_multiple" flag on the FS. See also Multiple Ceph File Systems.

  • A new mds_autoscaler ceph-mgr plugin is available for automatically deploying MDS daemons in response to changes to the max_mds configuration. Expect further enhancements in the future to simplify and automate MDS scaling.

  • cephfs-top is a new utility for looking at performance metrics from CephFS clients. It is development preview quality and will have bugs. For more information, see CephFS Top Utility.

  • A new snap_schedule ceph-mgr plugin provides a command toolset for scheduling snapshots on a CephFS file system. For more information, see Snapshot Scheduling Module.

  • First class NFS gateway support in Ceph is here! It's now possible to create scale-out ("active-active") NFS gateway clusters that export CephFS using a few commands. The gateways are deployed via cephadm (or Rook, in the future). For more information, see CephFS & RGW Exports over NFS.

  • Multiple active MDS file system scrub is now stable. It is no longer necessary to set max_mds to 1 and wait for non-zero ranks to stop. Scrub commands can only be sent to rank 0: ceph tell mds.<fs_name>:0 scrub start /path .... For more information, see Ceph File System Scrub.

  • Ephemeral pinning -- policy based subtree pinning -- is considered stable. mds_export_ephemeral_random and mds_export_ephemeral_distributed now default to true. For more information, see Setting subtree partitioning policies.

  • A new cephfs-mirror daemon is available to mirror CephFS file systems to a remote Ceph cluster. For more information, see CephFS Snapshot Mirroring.

  • A Windows client is now available for connecting to CephFS. This is offered through a new ceph-dokan utility which operates via the Dokan userspace API, similar to FUSE. For more information, see Mount CephFS on Windows.
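
Several of the CephFS items above map to short commands. The helpers below are a sketch under assumptions: the function names and the CEPH override variable are invented for the example, and the recursive scrub option is one possible choice rather than a requirement.

```shell
# Sketch only. CEPH can be overridden (for example CEPH="echo ceph") to
# print the commands instead of running them.
CEPH="${CEPH:-ceph}"

# Existing clusters must still opt in to multiple file systems.
enable_multiple_filesystems() {
    $CEPH fs flag set enable_multiple true
}

# Enable the new ceph-mgr plugins by name.
enable_cephfs_mgr_plugins() {
    $CEPH mgr module enable mds_autoscaler
    $CEPH mgr module enable snap_schedule
}

# Scrub commands can only be sent to rank 0.
# $1: file system name, $2: path to scrub.
start_scrub() {
    $CEPH tell "mds.$1:0" scrub start "$2" recursive
}
```

For example, start_scrub cephfs / would ask rank 0 of a file system named cephfs to scrub the whole tree, with no need to reduce max_mds first.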

Upgrading from Octopus or Nautilus

Before starting, make sure your cluster is stable and healthy (no down or recovering OSDs). (This is optional, but recommended.)

Note

WARNING: Do not set bluestore_fsck_quick_fix_on_mount to true or run the ceph-bluestore-tool repair or quick-fix commands in Pacific versions <= 16.2.6, because this can lead to data corruption. For details, see https://tracker.ceph.com/issues/53062.

Upgrading cephadm clusters

If your cluster is deployed with cephadm (first introduced in Octopus), then the upgrade process is entirely automated. To initiate the upgrade, run:

ceph orch upgrade start --ceph-version 16.2.0

The same process is used to upgrade to future minor releases.

Upgrade progress can be monitored with ceph -s (which provides a simple progress bar) or more verbosely with

ceph -W cephadm

The upgrade can be paused or resumed with

ceph orch upgrade pause   # to pause
ceph orch upgrade resume  # to resume

or canceled with

ceph orch upgrade stop

Note that canceling the upgrade simply stops the process; there is no ability to downgrade back to Octopus.

Upgrading non-cephadm clusters

Note

If your cluster is running Octopus (15.2.x), you might choose to first convert it to use cephadm so that the upgrade to Pacific is automated (see above). For more information, see Converting an existing cluster to cephadm.

  1. Set the noout flag for the duration of the upgrade (optional, but recommended):

    # ceph osd set noout
    
  2. Upgrade monitors by installing the new packages and restarting the monitor daemons. For example, on each monitor host:

    # systemctl restart ceph-mon.target
    

    Once all monitors are up, verify that the monitor upgrade is complete by looking for the pacific string in the mon map. The command:

    # ceph mon dump | grep min_mon_release
    

    should report:

    min_mon_release 16 (pacific)
    

    If it doesn't, that implies that one or more monitors haven't been upgraded and restarted and/or the quorum does not include all monitors.

  3. Upgrade ceph-mgr daemons by installing the new packages and restarting all manager daemons. For example, on each manager host:

    # systemctl restart ceph-mgr.target
    

    Verify the ceph-mgr daemons are running by checking ceph -s:

    # ceph -s
    
    ...
      services:
       mon: 3 daemons, quorum foo,bar,baz
       mgr: foo(active), standbys: bar, baz
    ...
    
  4. Upgrade all OSDs by installing the new packages and restarting the ceph-osd daemons on all OSD hosts:

    # systemctl restart ceph-osd.target
    

    Note that if you are upgrading from Nautilus, the first time each OSD starts, it will do a format conversion to improve the accounting for "omap" data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data). You can disable this automatic conversion with:

    # ceph config set osd bluestore_fsck_quick_fix_on_mount false
    

    You can monitor the progress of the OSD upgrades with the ceph versions or ceph osd versions commands:

    # ceph osd versions
    {
       "ceph version 14.2.5 (...) nautilus (stable)": 12,
       "ceph version 16.2.0 (...) pacific (stable)": 22,
    }
    
  5. Upgrade all CephFS MDS daemons. For each CephFS file system:

    1. Disable standby_replay:

      # ceph fs set <fs_name> allow_standby_replay false

    2. Reduce the number of ranks to 1 (make note of the original number of MDS daemons first if you plan to restore it later):

      # ceph status
      # ceph fs set <fs_name> max_mds 1

    3. Wait for the cluster to deactivate any non-zero ranks by periodically checking the status:

      # ceph status

    4. Take all standby MDS daemons offline on the appropriate hosts with:

      # systemctl stop ceph-mds@<daemon_name>

    5. Confirm that only one MDS is online and is rank 0 for your FS:

      # ceph status

    6. Upgrade the last remaining MDS daemon by installing the new packages and restarting the daemon:

      # systemctl restart ceph-mds.target

    7. Restart all standby MDS daemons that were taken offline:

      # systemctl start ceph-mds.target

    8. Restore the original value of max_mds for the file system:

      # ceph fs set <fs_name> max_mds <original_max_mds>
      
  6. Upgrade all radosgw daemons by upgrading packages and restarting daemons on all hosts:

    # systemctl restart ceph-radosgw.target
    
  7. Complete the upgrade by disallowing pre-Pacific OSDs and enabling all new Pacific-only functionality:

    # ceph osd require-osd-release pacific
    
  8. If you set noout at the beginning, be sure to clear it with:

    # ceph osd unset noout
    
  9. Consider transitioning your cluster to use the cephadm deployment and orchestration framework to simplify cluster management and future upgrades. For more information on converting an existing cluster to cephadm, see Converting an existing cluster to cephadm.
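The progress check in step 4 can be scripted. The following is a minimal Python sketch, not part of the Ceph tooling. It assumes you have captured the JSON object printed by ceph osd versions as a string, and that the release name appears in each version string as it does in the sample output above. The helper name pending_daemons is hypothetical.

```python
import json


def pending_daemons(versions_json: str, release: str = "pacific") -> int:
    """Count daemons whose version string does not mention `release`.

    `versions_json` is assumed to be a JSON object that maps a version
    string to the number of daemons running that version.
    """
    versions = json.loads(versions_json)
    return sum(
        count
        for version, count in versions.items()
        # Pad with spaces so that only a whole-word match counts.
        if f" {release} " not in f" {version} "
    )


sample = """{
    "ceph version 14.2.5 (...) nautilus (stable)": 12,
    "ceph version 16.2.0 (...) pacific (stable)": 22
}"""

print(pending_daemons(sample))  # 12 OSDs still need to be restarted
```

A result of 0 suggests that every OSD reports a Pacific version, at which point it should be reasonable to continue with the remaining steps.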

Post-upgrade

  1. Verify the cluster is healthy with ceph health.

    If your CRUSH tunables are older than Hammer, Ceph will now issue a health warning. If you see a health alert to that effect, you can revert this change with:

    ceph config set mon mon_crush_min_required_version firefly
    

    If Ceph does not complain, however, then we recommend you also switch any existing CRUSH buckets to straw2, which was added back in the Hammer release. If you have any 'straw' buckets, this will result in a modest amount of data movement, but generally nothing too severe:

    ceph osd getcrushmap -o backup-crushmap
    ceph osd crush set-all-straw-buckets-to-straw2
    

    If there are problems, you can easily revert with:

    ceph osd setcrushmap -i backup-crushmap
    

    Moving to 'straw2' buckets will unlock a few recent features, like the crush-compat balancer mode added back in Luminous.

  2. If you did not already do so when upgrading from Mimic, we recommend that you enable the new v2 network protocol by issuing the following command:

    ceph mon enable-msgr2
    

    This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated, run:

    ceph mon dump
    

    and verify that each monitor has both a v2: and v1: address listed.

  3. Consider enabling the telemetry module to send anonymized usage statistics and crash information to the Ceph upstream developers. To see what would be reported (without actually sending any information to anyone), run:

    ceph mgr module enable telemetry
    ceph telemetry show
    

    If you are comfortable with the data that is reported, you can opt in to automatically report the high-level cluster metadata with:

    ceph telemetry on
    

    The public dashboard that aggregates Ceph telemetry can be found at https://telemetry-public.ceph.com/.

    For more information about the telemetry module, see the documentation.
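The msgr2 verification in step 2 can also be checked programmatically. The sketch below is only an illustration under stated assumptions: it assumes each monitor line of ceph mon dump has the form "<rank>: <addresses> mon.<name>", with the addresses written as v2:... and v1:... entries. The sample text is made up for the example and the function name monitors_missing_v2 is hypothetical.

```python
import re
from typing import List

# Assumed line shape: "<rank>: <addresses> mon.<name>"
MON_LINE = re.compile(r"^\d+:\s+(\S+)\s+mon\.(\S+)")


def monitors_missing_v2(mon_dump: str) -> List[str]:
    """Return names of monitors that do not list both a v2 and a v1 address."""
    missing = []
    for line in mon_dump.splitlines():
        match = MON_LINE.match(line.strip())
        if not match:
            continue  # header lines such as "min_mon_release ..."
        addrs, name = match.groups()
        if "v2:" not in addrs or "v1:" not in addrs:
            missing.append(name)
    return missing


# Illustrative input only; real output contains additional header lines.
sample = """\
min_mon_release 16 (pacific)
0: [v2:10.0.0.1:3300/0,v1:10.0.0.1:6789/0] mon.foo
1: v1:10.0.0.2:6789/0 mon.bar
"""

print(monitors_missing_v2(sample))  # ['bar']
```

An empty list would indicate that every monitor found in the text advertises both protocol versions.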

Upgrade from pre-Nautilus releases (like Mimic or Luminous)

You must first upgrade to Nautilus (14.2.z) or Octopus (15.2.z) before upgrading to Pacific.
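As a small illustration of this rule, the following Python sketch checks whether a running version is a supported starting point. The function name is hypothetical, and the check only looks at the major version number (14 for Nautilus, 15 for Octopus), as stated above.

```python
def can_upgrade_directly_to_pacific(version: str) -> bool:
    """Return True if `version` (e.g. "14.2.22") is Nautilus or Octopus."""
    major = int(version.split(".")[0])
    # Only Nautilus (14.2.z) and Octopus (15.2.z) are valid starting points.
    return major in (14, 15)


print(can_upgrade_directly_to_pacific("13.2.10"))  # False: upgrade to Nautilus or Octopus first
print(can_upgrade_directly_to_pacific("15.2.17"))  # True
```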

Notable Changes

  • A new library is available, libcephsqlite. It provides a SQLite Virtual File System (VFS) on top of RADOS. The database and journals are striped over RADOS across multiple objects for virtually unlimited scaling and throughput only limited by the SQLite client. Applications using SQLite may change to the Ceph VFS with minimal changes, usually just by specifying the alternate VFS. We expect the library to be most impactful and useful for applications that were storing state in RADOS omap, especially without striping which limits scalability.

  • New bluestore_rocksdb_options_annex config parameter. Complements bluestore_rocksdb_options and allows setting rocksdb options without repeating the existing defaults.

  • $pid expansion in config paths like admin_socket will now properly expand to the daemon pid for commands like ceph-mds or ceph-osd. Previously only ceph-fuse/rbd-nbd expanded $pid with the actual daemon pid.

  • The allowable options for some radosgw-admin commands have been changed.

    • mdlog-list, datalog-list, sync-error-list no longer accept start and end dates, but do accept a single optional start marker.

    • mdlog-trim, datalog-trim, sync-error-trim only accept a single marker giving the end of the trimmed range.

    • Similarly the date ranges and marker ranges have been removed on the RESTful DATALog and MDLog list and trim operations.

  • ceph-volume: The lvm batch subcommand received a major rewrite. This closes a number of bugs and improves usability in terms of size specification and calculation, as well as idempotency behaviour and the disk replacement process. Please refer to https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/ for more detailed information.

  • Configuration variables for permitted scrub times have changed. The legal values for osd_scrub_begin_hour and osd_scrub_end_hour are 0 - 23. The use of 24 is now illegal. Specifying 0 for both values causes every hour to be allowed. The legal values for osd_scrub_begin_week_day and osd_scrub_end_week_day are 0 - 6. The use of 7 is now illegal. Specifying 0 for both values causes every day of the week to be allowed.

  • volume/nfs: The "ganesha-" prefix was recently removed from the cluster id and the nfs-ganesha common config object, to ensure a consistent namespace across different orchestrator backends. Please delete any existing nfs-ganesha clusters prior to upgrading and redeploy new clusters after upgrading to Pacific.

  • A new health check, DAEMON_OLD_VERSION, will warn if different versions of Ceph are running on daemons. It will generate a health error if multiple versions are detected. This condition must persist for longer than mon_warn_older_version_delay (set to 1 week by default) before the health condition is triggered. This allows most upgrades to proceed without falsely seeing the warning. If an upgrade is paused for an extended time period, the health check can be muted with "ceph health mute DAEMON_OLD_VERSION --sticky". In this case, after the upgrade has finished, use "ceph health unmute DAEMON_OLD_VERSION".

  • MGR: progress module can now be turned on/off, using the commands: ceph progress on and ceph progress off.

  • An AWS-compliant API: "GetTopicAttributes" was added to replace the existing "GetTopic" API. The new API should be used to fetch information about topics used for bucket notifications.

  • librbd: The shared, read-only parent cache's config option immutable_object_cache_watermark has been updated to properly reflect the upper cache utilization before space is reclaimed. The default immutable_object_cache_watermark is now 0.9. If the capacity reaches 90%, the daemon will delete cold cache.

  • OSD: the option osd_fast_shutdown_notify_mon has been introduced to allow the OSD to notify the monitor it is shutting down even if osd_fast_shutdown is enabled. This helps with the monitor logs on larger clusters, which may otherwise accumulate many 'osd.X reported immediately failed by osd.Y' messages that confuse tools.

  • The mclock scheduler has been refined. A set of built-in profiles is now available that provides QoS between the internal and external clients of Ceph. To enable the mclock scheduler, set the config option "osd_op_queue" to "mclock_scheduler". The "high_client_ops" profile is enabled by default, and allocates more OSD bandwidth to external client operations than to internal client operations (such as background recovery and scrubs). Other built-in profiles include "high_recovery_ops" and "balanced". These built-in profiles optimize the QoS provided to clients of the mclock scheduler.

  • The balancer is now on by default in upmap mode. Since upmap mode requires require_min_compat_client luminous, new clusters will only support luminous and newer clients by default. Existing clusters can enable upmap support by running ceph osd set-require-min-compat-client luminous. It is still possible to turn the balancer off using the ceph balancer off command. In earlier versions, the balancer was included in the always_on_modules list, but needed to be turned on explicitly using the ceph balancer on command.

  • Version 2 of the cephx authentication protocol (CEPHX_V2 feature bit) is now required by default. It was introduced in 2018, adding replay attack protection for authorizers and making msgr v1 message signatures stronger (CVE-2018-1128 and CVE-2018-1129). Support is present in Jewel 10.2.11, Luminous 12.2.6, Mimic 13.2.1, Nautilus 14.2.0 and later; upstream kernels 4.9.150, 4.14.86, 4.19 and later; various distribution kernels, in particular CentOS 7.6 and later. To enable older clients, set cephx_require_version and cephx_service_require_version config options to 1.

  • blacklist has been replaced with blocklist throughout. The following commands have changed:

    • ceph osd blacklist ... is now ceph osd blocklist ...

    • ceph <tell|daemon> osd.<NNN> dump_blacklist is now ceph <tell|daemon> osd.<NNN> dump_blocklist

  • The following config options have changed:

    • mon osd blacklist default expire is now mon osd blocklist default expire

    • mon mds blacklist interval is now mon mds blocklist interval

    • mon mgr blacklist interval is now mon mgr blocklist interval

    • rbd blacklist on break lock is now rbd blocklist on break lock

    • rbd blacklist expire seconds is now rbd blocklist expire seconds

    • mds session blacklist on timeout is now mds session blocklist on timeout

    • mds session blacklist on evict is now mds session blocklist on evict

  • The following librados API calls have changed:

    • rados_blacklist_add is now rados_blocklist_add; the former will issue a deprecation warning and be removed in a future release.

    • rados.blacklist_add is now rados.blocklist_add in the C++ API.

  • The JSON output for the following commands now shows blocklist instead of blacklist:

    • ceph osd dump

    • ceph <tell|daemon> osd.<N> dump_blocklist

  • Monitors now have a config option, mon_allow_pool_size_one, which is disabled by default. Even when it is enabled, users must pass the --yes-i-really-mean-it flag to osd pool set size 1 to confirm that they really want to configure a pool size of 1.

  • ceph pg #.# list_unfound output has been enhanced to provide might_have_unfound information which indicates which OSDs may contain the unfound objects.

  • OSD: A new configuration option osd_compact_on_start has been added which triggers an OSD compaction on start. Setting this option to true and restarting an OSD will result in an offline compaction of the OSD prior to booting.

  • OSD: the option named bdev_nvme_retry_count has been removed. SPDK v20.07 provides no easy access to bdev_nvme options, and this option was rarely used.

  • Alpine build-related scripts, documentation, and tests have been removed, since the most up-to-date APKBUILD script for Ceph is already included in Alpine Linux's aports repository.
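To illustrate the scrub-time change listed above (legal hours are 0 - 23, and 0 for both values allows every hour), here is a minimal Python sketch of how such a window could be evaluated. It is not Ceph's implementation. Two behaviours are assumptions made for the example: the end hour is treated as exclusive, and a window whose begin hour is greater than its end hour is treated as wrapping past midnight. The function name scrub_hour_permitted is hypothetical.

```python
def scrub_hour_permitted(begin: int, end: int, hour: int) -> bool:
    """Decide whether `hour` falls inside the scrub window [begin, end).

    `begin` and `end` stand in for osd_scrub_begin_hour and
    osd_scrub_end_hour. Values outside 0-23 (such as 24) are rejected.
    """
    for name, value in (("begin", begin), ("end", end), ("hour", hour)):
        if not 0 <= value <= 23:
            raise ValueError(f"{name} must be in the range 0-23, got {value}")
    if begin < end:
        return begin <= hour < end
    # begin >= end: the window wraps past midnight (assumption).
    # When both values are equal, e.g. 0 and 0, every hour is permitted.
    return hour >= begin or hour < end


print(scrub_hour_permitted(0, 0, 13))   # True: 0 and 0 allow every hour
print(scrub_hour_permitted(22, 6, 12))  # False: outside the overnight window
```

Passing 24 for any argument raises ValueError, mirroring the rule that 24 is no longer a legal value. The same approach would apply to osd_scrub_begin_week_day and osd_scrub_end_week_day with a 0 - 6 range.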