.. SPDX-License-Identifier: GPL-2.0-only

========
dm-clone
========

Introduction
============

dm-clone is a device mapper target which produces a one-to-one copy of an
existing, read-only source device into a writable destination device. It
presents a virtual block device which makes all data appear immediately, and
redirects reads and writes accordingly.

The main use case of dm-clone is to clone a potentially remote, high-latency,
read-only, archival-type block device into a writable, fast, primary-type device
for fast, low-latency I/O. The cloned device is visible/mountable immediately
and the copy of the source device to the destination device happens in the
background, in parallel with user I/O.

For example, one could restore an application backup from a read-only copy,
accessible through a network storage protocol (NBD, Fibre Channel, iSCSI, AoE,
etc.), into a local SSD or NVMe device, and start using the device immediately,
without waiting for the restore to complete.

When the cloning completes, the dm-clone table can be removed altogether and
replaced, e.g., by a linear table, mapping directly to the destination device.

The dm-clone target reuses the metadata library used by the thin-provisioning
target.

Glossary
========

Hydration
    The process of filling a region of the destination device with data from
    the same region of the source device, i.e., copying the region from the
    source to the destination device.

Once a region gets hydrated we redirect all I/O regarding it to the destination
device.

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with other
parameters detailed later):

1. A source device - the read-only device that gets cloned and source of the
   hydration.

2. A destination device - the destination of the hydration, which will become a
   clone of the source device.

3. A small metadata device - it records which regions are already valid in the
   destination device, i.e., which regions have already been hydrated, or have
   been written to directly, via user I/O.

The size of the destination device must be at least equal to the size of the
source device.

Regions
-------

dm-clone divides the source and destination devices into fixed-size regions.
Regions are the unit of hydration, i.e., the minimum amount of data copied from
the source to the destination device.

The region size is configurable when you first create the dm-clone device. The
recommended region size is the same as the file system block size, which is
usually 4KB. The region size must be a power of two between 8 sectors (4KB) and
2097152 sectors (1GB).

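The region size constraints above can be checked with a few lines of
arithmetic. The following is a minimal sketch, not code from dm-clone itself;
the function names are hypothetical, and it assumes 512-byte sectors and that a
trailing partial region counts as a full region.

```python
SECTOR_SIZE = 512               # bytes per sector (assumed)
MIN_REGION_SECTORS = 8          # 4KB
MAX_REGION_SECTORS = 2097152    # 1GB


def is_valid_region_size(sectors: int) -> bool:
    """Return True if `sectors` is a power of two within the documented range."""
    in_range = MIN_REGION_SECTORS <= sectors <= MAX_REGION_SECTORS
    power_of_two = sectors > 0 and (sectors & (sectors - 1)) == 0
    return in_range and power_of_two


def region_count(device_sectors: int, region_sectors: int) -> int:
    """Number of regions needed to cover the device.

    Assumption: a partial region at the end of the device is counted as one
    region, hence the ceiling division.
    """
    return -(-device_sectors // region_sectors)
```

For instance, a 1048576000-sector device with the recommended 8-sector region
size is tracked as 131072000 regions.
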
Reads and writes from/to hydrated regions are serviced from the destination
device.

A read to a not yet hydrated region is serviced directly from the source device.

A write to a not yet hydrated region is delayed until the corresponding region
has been hydrated; the hydration of that region starts immediately.

Note that a write request with size equal to region size will skip copying of
the corresponding region from the source device and overwrite the region of the
destination device directly.

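The four cases above amount to a small decision table. The sketch below only
illustrates that table; it is not how the kernel code is organized. The `Action`
names and the `route` function are hypothetical, and the sketch assumes each
request falls within a single region and that a region-sized write is aligned
to a region boundary.

```python
from enum import Enum


class Action(Enum):
    DESTINATION = "service from the destination device"
    SOURCE = "service from the source device"
    OVERWRITE = "write the whole region to the destination, skip the copy"
    HYDRATE_THEN_WRITE = "hydrate the region first, then write"


def route(is_write: bool, hydrated: bool, io_sectors: int,
          region_sectors: int) -> Action:
    """Pick how a request to a single region would be serviced."""
    if hydrated:
        # Hydrated regions are always served by the destination device.
        return Action.DESTINATION
    if not is_write:
        # Reads to not yet hydrated regions go straight to the source.
        return Action.SOURCE
    if io_sectors == region_sectors:
        # A full-region write makes the copy from the source unnecessary.
        return Action.OVERWRITE
    # A partial write must wait for the region to be hydrated.
    return Action.HYDRATE_THEN_WRITE
```
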
Discards
--------

dm-clone interprets a discard request to a range that hasn't been hydrated yet
as a hint to skip hydration of the regions covered by the request, i.e., it
skips copying the region's data from the source to the destination device, and
only updates its metadata.

If the destination device supports discards, then by default dm-clone will pass
down discard requests to it.

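The effect of a discard on the hydration state can be modelled with a set of
valid region numbers. This is a sketch under stated assumptions, not the
dm-clone implementation: the function names are hypothetical, and it assumes
that only regions lying entirely inside the discarded range are marked as
valid.

```python
def regions_covered(start_sector: int, nr_sectors: int,
                    region_sectors: int) -> range:
    """Regions lying entirely inside [start_sector, start_sector + nr_sectors)."""
    first = -(-start_sector // region_sectors)            # round the start up
    end = (start_sector + nr_sectors) // region_sectors   # round the end down
    return range(first, max(first, end))


def apply_discard(valid: set, start_sector: int, nr_sectors: int,
                  region_sectors: int) -> set:
    """Mark fully covered regions as valid without copying any data."""
    return valid | set(regions_covered(start_sector, nr_sectors, region_sectors))
```

With 8-sector regions, a discard of sectors 4 to 19 fully covers only region 1,
so only that region would be skipped by the background hydration.
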
Background Hydration
--------------------

dm-clone copies continuously from the source to the destination device, until
the whole device has been copied.

Copying data from the source to the destination device uses bandwidth. The user
can set a throttle to prevent more than a certain amount of copying occurring at
any one time. Moreover, dm-clone takes into account user I/O traffic going to
the devices and pauses the background hydration when there is I/O in-flight.

A message `hydration_threshold <#regions>` can be used to set the maximum number
of regions being copied at any one time, the default being 1 region.

dm-clone employs dm-kcopyd for copying portions of the source device to the
destination device. By default, we issue copy requests of size equal to the
region size. A message `hydration_batch_size <#regions>` can be used to tune the
size of these copy requests. Increasing the hydration batch size results in
dm-clone trying to batch together contiguous regions, so we copy the data in
batches of this many regions.

When the hydration of the destination device finishes, a dm event will be sent
to user space.

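The batching of contiguous regions described above can be illustrated as
follows. This is a simplified sketch and not the algorithm used by the kernel;
`plan_batches` is a hypothetical helper that groups not yet hydrated region
numbers into copy requests of at most `batch_size` contiguous regions.

```python
def plan_batches(unhydrated: list, batch_size: int) -> list:
    """Group region numbers into (first_region, count) copy requests.

    Each request covers contiguous regions and is at most `batch_size`
    regions long.
    """
    batches = []
    for region in sorted(unhydrated):
        last = batches[-1] if batches else None
        if last and region == last[0] + last[1] and last[1] < batch_size:
            last[1] += 1                    # extend the current contiguous run
        else:
            batches.append([region, 1])     # start a new copy request
    return [(first, count) for first, count in batches]
```

With a batch size of 1, every region becomes its own copy request, which
matches the default behaviour of issuing copy requests of size equal to the
region size.
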
Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written. If no
such requests are made then commits will occur every second. This means the
dm-clone device behaves like a physical disk that has a volatile write cache. If
power is lost you may lose some recent writes. The metadata should always be
consistent in spite of any crash.

Target Interface
================

Constructor
-----------

::

   clone <metadata dev> <destination dev> <source dev> <region size>
         [<#feature args> [<feature arg>]* [<#core args> [<core arg>]*]]

================ ==============================================================
metadata dev     Fast device holding the persistent metadata
destination dev  The destination device, where the source will be cloned
source dev       Read only device containing the data that gets cloned
region size      The size of a region in sectors

#feature args    Number of feature arguments passed
feature args     no_hydration or no_discard_passdown

#core args       An even number of arguments corresponding to key/value pairs
                 passed to dm-clone
core args        Key/value pairs passed to dm-clone, e.g. `hydration_threshold
                 256`
================ ==============================================================

Optional feature arguments are:

==================== =========================================================
no_hydration         Create a dm-clone instance with background hydration
                     disabled
no_discard_passdown  Disable passing down discards to the destination device
==================== =========================================================

Optional core arguments are:

================================ ==============================================
hydration_threshold <#regions>   Maximum number of regions being copied from
                                 the source to the destination device at any
                                 one time, during background hydration.
hydration_batch_size <#regions>  During background hydration, try to batch
                                 together contiguous regions, so we copy data
                                 from the source to the destination device in
                                 batches of this many regions.
================================ ==============================================

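To make the argument counting concrete, the sketch below assembles a table line
from the constructor syntax above. `clone_table` is a hypothetical helper, not
part of any dm tool, and it assumes the mapping starts at sector 0. Note that
`<#core args>` counts words, so one key/value pair contributes 2, and that
`<#feature args>` has to be present (possibly as 0) whenever core arguments
follow.

```python
def clone_table(length, metadata_dev, dest_dev, source_dev, region_size,
                features=(), core_args=None):
    """Build a table line: "0 <length> clone <constructor arguments>"."""
    core_args = core_args or {}
    fields = [0, length, "clone", metadata_dev, dest_dev, source_dev,
              region_size]
    if features or core_args:
        # The feature count is needed whenever core args follow, even if 0.
        fields += [len(features), *features]
    if core_args:
        # Each key/value pair is two words.
        fields.append(2 * len(core_args))
        for key, value in core_args.items():
            fields += [key, value]
    return " ".join(str(field) for field in fields)


# Hypothetical device paths, for illustration only.
print(clone_table(1048576000, "/dev/meta", "/dev/dst", "/dev/src", 8,
                  features=["no_hydration"],
                  core_args={"hydration_threshold": 256}))
```

The resulting string is what would be passed to `dmsetup create` via
`--table`.
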
Status
------

::

   <metadata block size> <#used metadata blocks>/<#total metadata blocks>
   <region size> <#hydrated regions>/<#total regions> <#hydrating regions>
   <#feature args> <feature args>* <#core args> <core args>*
   <clone metadata mode>

======================= =======================================================
metadata block size     Fixed block size for each metadata block in sectors
#used metadata blocks   Number of metadata blocks used
#total metadata blocks  Total number of metadata blocks
region size             Configurable region size for the device in sectors
#hydrated regions       Number of regions that have finished hydrating
#total regions          Total number of regions to hydrate
#hydrating regions      Number of regions currently hydrating
#feature args           Number of feature arguments to follow
feature args            Feature arguments, e.g. `no_hydration`
#core args              Even number of core arguments to follow
core args               Key/value pairs for tuning the core, e.g.
                        `hydration_threshold 256`
clone metadata mode     ro if read-only, rw if read-write

                        In serious cases where even a read-only mode is deemed
                        unsafe no further I/O will be permitted and the status
                        will just contain the string 'Fail'. If the metadata
                        mode changes, a dm event will be sent to user space.
======================= =======================================================

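The status fields above are positional, with two variable-length groups. A
minimal parsing sketch follows; `parse_clone_status` and the dictionary keys are
hypothetical names, and the sample line in the usage example is made up rather
than captured from a real device.

```python
def parse_clone_status(params: str) -> dict:
    """Parse the target-specific part of a dm-clone status line."""
    tokens = params.split()
    if tokens == ["Fail"]:
        return {"failed": True}

    pos = 0

    def take(n: int = 1) -> list:
        """Consume the next n tokens."""
        nonlocal pos
        chunk = tokens[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated status line")
        pos += n
        return chunk

    metadata_block_size = int(take()[0])
    used_blocks, total_blocks = map(int, take()[0].split("/"))
    region_size = int(take()[0])
    hydrated, total_regions = map(int, take()[0].split("/"))
    hydrating = int(take()[0])
    features = take(int(take()[0]))      # count, then that many words
    core_words = take(int(take()[0]))    # count, then key/value words
    mode = take()[0]
    return {
        "failed": False,
        "metadata_block_size": metadata_block_size,
        "used_metadata_blocks": used_blocks,
        "total_metadata_blocks": total_blocks,
        "region_size": region_size,
        "hydrated_regions": hydrated,
        "total_regions": total_regions,
        "hydrating_regions": hydrating,
        "features": features,
        "core_args": dict(zip(core_words[0::2], core_words[1::2])),
        "mode": mode,
    }


# Made-up sample line, for illustration only.
sample = "8 16/1024 8 250/1000 1 0 4 hydration_threshold 1 hydration_batch_size 1 rw"
status = parse_clone_status(sample)
```

The leading `<start> <length> clone` fields that `dmsetup status` prints before
the target-specific part would need to be stripped before calling this helper.
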
Messages
--------

`disable_hydration`
    Disable the background hydration of the destination device.

`enable_hydration`
    Enable the background hydration of the destination device.

`hydration_threshold <#regions>`
    Set background hydration threshold.

`hydration_batch_size <#regions>`
    Set background hydration batch size.

Examples
========

Clone a device containing a file system
---------------------------------------

1. Create the dm-clone device.

   ::

    dmsetup create clone --table "0 1048576000 clone $metadata_dev $dest_dev \
      $source_dev 8 1 no_hydration"

2. Mount the device and trim the file system. dm-clone interprets the discards
   sent by the file system and it will not hydrate the unused space.

   ::

    mount /dev/mapper/clone /mnt/cloned-fs
    fstrim /mnt/cloned-fs

3. Enable background hydration of the destination device.

   ::

    dmsetup message clone 0 enable_hydration

4. When the hydration finishes, we can replace the dm-clone table with a linear
   table.

   ::

    dmsetup suspend clone
    dmsetup load clone --table "0 1048576000 linear $dest_dev 0"
    dmsetup resume clone

   The metadata device is no longer needed and can be safely discarded or reused
   for other purposes.

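Before swapping in the linear table in the last step, a script would want to
confirm that hydration has actually finished. The sketch below shows one way to
express that check and to build the replacement table line. Both function names
are hypothetical, and the region counts are assumed to come from the
`<#hydrated regions>/<#total regions>` and `<#hydrating regions>` status
fields.

```python
def hydration_complete(hydrated: int, total: int, hydrating: int) -> bool:
    """True once every region is hydrated and no copy is still in flight."""
    return hydrated == total and hydrating == 0


def linear_table(length: int, dest_dev: str) -> str:
    """Table line mapping the whole device linearly onto the destination."""
    return f"0 {length} linear {dest_dev} 0"
```

The length passed to `linear_table` should be the same length, in sectors, that
was used when the dm-clone device was created.
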
Known issues
============

1. We redirect reads, to not-yet-hydrated regions, to the source device. If
   reading the source device has high latency and the user repeatedly reads from
   the same regions, this behaviour could degrade performance. We should use
   these reads as hints to hydrate the relevant regions sooner. Currently, we
   rely on the page cache to cache these regions, so we hopefully don't end up
   reading them multiple times from the source device.

2. We should release in-core resources, i.e., the bitmaps tracking which
   regions are hydrated, after the hydration has finished.

3. During background hydration, if we fail to read the source or write to the
   destination device, we print an error message, but the hydration process
   continues indefinitely, until it succeeds. We should stop the background
   hydration after a number of failures and emit a dm event for user space to
   notice.

Why not...?
===========

We explored the following alternatives before implementing dm-clone:

1. Use dm-cache with cache size equal to the source device and implement a new
   cloning policy:

   * The resulting cache device is not a one-to-one mirror of the source device
     and thus we cannot remove the cache device once cloning completes.

   * dm-cache writes to the source device, which violates our requirement that
     the source device must be treated as read-only.

   * Caching is semantically different from cloning.

2. Use dm-snapshot with a COW device equal to the source device:

   * dm-snapshot stores its metadata in the COW device, so the resulting device
     is not a one-to-one mirror of the source device.

   * No background copying mechanism.

   * dm-snapshot needs to commit its metadata whenever a pending exception
     completes, to ensure snapshot consistency. In the case of cloning, we don't
     need to be so strict and can rely on committing metadata every time a FLUSH
     or FUA bio is written, or periodically, like dm-thin and dm-cache do. This
     improves the performance significantly.

3. Use dm-mirror: The mirror target has a background copying/mirroring
   mechanism, but it writes to all mirrors, thus violating our requirement that
   the source device must be treated as read-only.

4. Use dm-thin's external snapshot functionality. This approach is the most
   promising among all alternatives, as the thinly-provisioned volume is a
   one-to-one mirror of the source device and handles reads and writes to
   un-provisioned/not-yet-cloned areas the same way as dm-clone does.

   Still:

   * There is no background copying mechanism, though one could be implemented.

   * Most importantly, we want to support arbitrary block devices as the
     destination of the cloning process and not restrict ourselves to
     thinly-provisioned volumes. Thin-provisioning has an inherent metadata
     overhead, for maintaining the thin volume mappings, which significantly
     degrades performance.

   Moreover, cloning a device shouldn't force the use of thin-provisioning. On
   the other hand, if we wish to use thin provisioning, we can just use a thin
   LV as dm-clone's destination device.