---
title: Archive metadata
layout: docs
permalink: /docs/archives/
---

Most archive formats record metadata that will capture details about the
build environment if no care is taken. The file's last modification time is
the obvious example, but file ordering, users, groups, numeric ids, and
permissions can also be of concern. Tar is used as the main example here, but
these tips apply to other archive formats as well.

File modification times
-----------------------

Most archive formats will, by default, record file last modification
times, while some will also record file creation times.

Tar has a way to specify the modification time that is used for all
archive members:

{% highlight sh %}
$ tar --mtime='2015-10-21 00:00Z' -cf product.tar build
{% endhighlight %}

(Notice how `Z` is used to specify that time is in the UTC
[timezone]({{ "/docs/timezones/" | prepend: site.baseurl }}).)

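One way to confirm the effect of `--mtime` is to list the archive afterwards. The following is a minimal sketch, assuming GNU Tar and a scratch directory created with `mktemp`; the file name `hello.txt` is made up for illustration.

```sh
#!/bin/sh
set -eu

# Work in a scratch directory so that nothing real is touched.
cd "$(mktemp -d)"
mkdir build
echo hello > build/hello.txt   # illustrative file, not part of any real build

tar --mtime='2015-10-21 00:00Z' -cf product.tar build

# List the members with times shown in UTC; both the directory and the
# file should carry the requested modification time.
listing=$(TZ=UTC0 tar -tvf product.tar)
echo "$listing"
```

Listing with `TZ=UTC0` avoids the local timezone shifting the displayed time.
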
For other archive formats, it is always possible to use `touch` to reset
the modification times to a [predefined value]({{ "/docs/timestamps/" | prepend: site.baseurl }})
before creating the archive:

{% highlight sh %}
$ find build -print0 |
    xargs -0r touch --no-dereference --date="@${SOURCE_DATE_EPOCH}"
$ zip -r product.zip build
{% endhighlight %}

In some cases, it is preferable to keep the original times for files
that have not been created or modified during the build process:

{% highlight sh %}
$ find build -newermt "@${SOURCE_DATE_EPOCH}" -print0 |
    xargs -0r touch --no-dereference --date="@${SOURCE_DATE_EPOCH}"
$ zip -r product.zip build
{% endhighlight %}

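The clamping behaviour of the second command can be checked on a throwaway tree. This is only a sketch, assuming GNU `find`, `touch`, `xargs` and `stat`; the helper name `clamp_mtimes`, the file names and the epoch value are assumptions chosen for the demonstration.

```sh
#!/bin/sh
set -eu

# clamp_mtimes DIR EPOCH (hypothetical helper): reset the mtime of every
# entry under DIR that is newer than EPOCH, leaving older entries alone.
clamp_mtimes() {
    find "$1" -newermt "@$2" -print0 |
        xargs -0r touch --no-dereference --date="@$2"
}

demo=$(mktemp -d)
SOURCE_DATE_EPOCH=1445385600   # assumed value: 2015-10-21 00:00 UTC

touch --date='2001-01-01 00:00Z' "$demo/old.txt"   # predates the epoch
touch "$demo/new.txt"                              # "created during the build"

clamp_mtimes "$demo" "$SOURCE_DATE_EPOCH"

# Print the resulting mtimes as seconds since the Unix epoch.
stat -c '%Y %n' "$demo/old.txt" "$demo/new.txt"
```

`old.txt` should keep its 2001 timestamp (978307200), while `new.txt` is set to 1445385600.
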
A patch has been written to simplify the latter operation with GNU
Tar. It has been available in Debian since
[tar](https://packages.qa.debian.org/tar) version 1.28-1. Hopefully it
will be integrated upstream soon, but until then you might want to use it
with caution. It adds a new `--clamp-mtime` flag which will only set the time
when the file is more recent than the value specified with `--mtime`:

{% highlight sh %}
# Only in Debian unstable for now
$ tar --mtime='2015-10-21 00:00Z' --clamp-mtime -cf product.tar build
{% endhighlight %}

This has the benefit of leaving the original file modification time
untouched.

File ordering
-------------

When asked to record directories, most archive tools will read their
content in the order returned by the filesystem, which is [likely to be
different on every run]({{ "/docs/stable-inputs/" | prepend: site.baseurl }}).

With version 1.28, GNU Tar has gained the `--sort=name` option, which will
sort filenames in a locale-independent manner:

{% highlight sh %}
# Works with GNU Tar 1.28
$ tar --sort=name -cf product.tar build
{% endhighlight %}

For older versions or other archive formats, it is possible to use
`find` and `sort` to achieve the same effect:

{% highlight sh %}
$ find build -print0 | LC_ALL=C sort -z |
    tar --no-recursion --null -T - -cf product.tar
{% endhighlight %}

Care must be taken to ensure that `sort` is called in the context of the
C locale to avoid any surprises related to collation order.

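The resulting member order can be inspected with `tar -tf`. Below is a small sketch of the `find` and `sort` pipeline on a scratch tree, assuming GNU Tar; the file names `zeta`, `alpha` and `Beta` are invented to make the collation visible.

```sh
#!/bin/sh
set -eu

cd "$(mktemp -d)"
mkdir build
# Create the files in an order that differs from the sorted one.
for name in zeta alpha Beta; do
    echo "$name" > "build/$name"
done

# Feed tar an explicitly sorted, NUL-separated list of names.
find build -print0 | LC_ALL=C sort -z |
    tar --no-recursion --null -T - -cf product.tar

# Show the order in which members were stored.
members=$(tar -tf product.tar)
echo "$members"
```

In the C locale, uppercase letters sort before lowercase ones, so `build/Beta` is stored before `build/alpha`; a locale-aware sort could order them differently.
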
Users, groups and numeric ids
-----------------------------

Depending on the archive format, the user and group owning the file
can be recorded. Sometimes they are stored as names, sometimes as
the associated numeric ids.

When files belong to predefined system groups, this is not a problem,
but builds are often performed with regular users. Recording the
account name or its associated ids might be a source of reproducibility
issues.

Tar offers a way to specify the user and group owning the file. Using
`0`/`0` and `--numeric-owner` is a safe bet, as it effectively
records 0 for both values:

{% highlight sh %}
$ tar --owner=0 --group=0 --numeric-owner -cf product.tar build
{% endhighlight %}

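To check what was actually recorded, the verbose listing can be reduced to its owner column. This is a sketch assuming GNU Tar, whose `-tv` output puts `owner/group` in the second field; the scratch tree is illustrative.

```sh
#!/bin/sh
set -eu

cd "$(mktemp -d)"
mkdir build
echo hello > build/hello.txt   # illustrative content

tar --owner=0 --group=0 --numeric-owner -cf product.tar build

# Keep only the distinct owner/group pairs found in the archive.
owners=$(tar --numeric-owner -tvf product.tar | awk '{ print $2 }' | sort -u)
echo "$owners"
```

A single `0/0` line means that no account name or id from the build machine ended up in the archive.
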
PAX headers
-----------

GNU tar defaults to the pax format, and if `POSIXLY_CORRECT` is set, it adds
each file's ctime and atime, and the PID of the tar process, as
non-deterministic metadata.

To avoid this, either `unset POSIXLY_CORRECT` (this only works with
[tar>1.32](https://git.savannah.gnu.org/cgit/tar.git/commit/?id=ef0f882382f6)),
or pass `--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime`
or `--format=gnu` to tar (both are only available in GNU tar),
or use `--format=ustar` if the limitations of that format are not a problem.

Full example
------------

The recommended way to create a Tar archive is thus:

<div class="correct">
{% highlight sh %}
# requires GNU Tar 1.28+
$ tar --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
      -cf product.tar build
{% endhighlight %}
</div>

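A simple way to gain confidence in these flags is to archive two trees that have the same contents but a different history, and compare checksums. The sketch below assumes GNU Tar 1.28+ and `sha256sum`; the wrapper `make_archive`, the directory layout and the epoch value are assumptions made for the demonstration.

```sh
#!/bin/sh
set -eu

export SOURCE_DATE_EPOCH=1445385600   # assumed value: 2015-10-21 00:00 UTC

# make_archive OUT PARENT (hypothetical wrapper): archive PARENT/build
# into OUT using the recommended flags.
make_archive() {
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH}" \
        --owner=0 --group=0 --numeric-owner \
        --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
        -C "$2" -cf "$1" build
}

work=$(mktemp -d)
mkdir -p "$work/first/build" "$work/second/build"

# Same contents, but created in a different order and with different times.
for name in a b c; do echo "$name" > "$work/first/build/$name"; done
for name in c b a; do echo "$name" > "$work/second/build/$name"; done
touch --date='2020-01-01 00:00Z' "$work/second/build/b"

make_archive "$work/first.tar" "$work/first"
make_archive "$work/second.tar" "$work/second"

hash_first=$(sha256sum < "$work/first.tar")
hash_second=$(sha256sum < "$work/second.tar")

if [ "$hash_first" = "$hash_second" ]; then
    echo "archives are identical"
else
    echo "archives differ"
fi
```

Matching checksums here only show that creation order and timestamps were neutralised; file contents and permissions still have to be identical across builds.
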
Post-processing
---------------

If tools do not support options to create reproducible archives, it is
always possible to perform post-processing.
[strip-nondeterminism](https://packages.debian.org/sid/strip-nondeterminism)
already has support to normalize Zip and Jar archives (with [limitations](https://bugs.debian.org/859103)). Custom scripts
like Tor Browser's
[re-dzip.sh](https://gitweb.torproject.org/builders/tor-browser-bundle.git/tree/gitian/build-helpers/re-dzip.sh)
might also be an option.

Static libraries
----------------

Static libraries (`.a`) on Unix-like systems are *ar* archives. Like
other archive formats, they contain metadata, namely timestamps, UIDs,
GIDs, and permissions. None are actually required for using them as
libraries.

GNU `ar` and other tools from
[binutils](https://www.gnu.org/software/binutils/) have a *deterministic
mode* which will use zero for UIDs, GIDs, timestamps, and use consistent
file modes for all files. It can be made the default by passing the
`--enable-deterministic-archives` option to `./configure`. It is already
enabled by default for some distributions[^distros-with-default] and so
far it seems to be pretty safe [except for
Makefiles](https://bugs.debian.org/798804) using targets like
`archive.a(foo.o)`.

When binutils is not built with deterministic archives by default, build
systems have to be changed to pass the right options to `ar` and
friends. `ARFLAGS` can be set to `Dcvr` with many build systems to turn on the
deterministic mode. Care must also be taken to pass `-D` if `ranlib` is
used to create the function index.

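The effect of the `D` modifier can be observed without compiling anything, since `ar` accepts arbitrary files as members. This is a rough sketch, assuming GNU `ar` from binutils is installed; the member name `member.txt` is a placeholder standing in for real object files.

```sh
#!/bin/sh
set -eu

cd "$(mktemp -d)"
# Any file can be an ar member; a real build would add object files.
printf 'placeholder\n' > member.txt

ar Dcr first.a member.txt

# Change the member's timestamp, then archive it again.
touch --date='2001-01-01 00:00Z' member.txt
ar Dcr second.a member.txt

# In deterministic mode the timestamp is not recorded, so the two
# archives are expected to be byte-for-byte equal.
if cmp -s first.a second.a; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

Repeating the experiment with `U` instead of `D` should show the archives diverging, because the member's real timestamp is then recorded.
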
Another option is post-processing with
[strip-nondeterminism](https://packages.debian.org/sid/strip-nondeterminism)
or `objcopy`:

    objcopy --enable-deterministic-archives libfoo.a

The above does not fix [file ordering]({{ "/docs/stable-inputs/" | prepend: site.baseurl }}).

[^distros-with-default]: Debian since [version 2.25-6](https://tracker.debian.org/news/675691)/stretch, Ubuntu since version 2.25-8ubuntu1/artful 17.10. It is the default for Fedora 22 and Fedora 23, but it seems this will be [reverted in Fedora 24](https://bugzilla.redhat.com/show_bug.cgi?id=1195883).

Initramfs images
----------------

*cpio* archives are commonly used for initramfs images. The *cpio* header
format (see `man 5 cpio`) can contain device and inode numbers which, whilst
deterministic, can vary from system to system.

One way to filter these out is by piping through bsdtar.

Example of non-deterministic code:

```
echo ucode.bin |
    bsdcpio -o -H newc -R 0:0 > ucode.img
```

Example of deterministic code:

```
echo ucode.bin |
    bsdtar --uid 0 --gid 0 -cnf - -T - |
    bsdtar --null -cf - --format=newc @- > ucode.img
```

Note that other issues, such as timestamps, may still need to be fixed prior
to archival.

## GNU Libtool

[GNU Libtool](https://www.gnu.org/software/libtool/) prior to `74c8993c` (first included in version 2.2.7b) did not sort the output of `find`. It appears that many packages (including all [GCC](https://gcc.gnu.org/) versions so far) are bootstrapped with a version prior to this.