/Documentation/devel/stage1-implementors-guide.md

https://gitlab.com/github-cloud-corporation/rkt
# Stage1 ACI implementor's guide

## Background

rkt's execution of pods is divided roughly into three separate stages:

1. Stage 0: discovering, fetching, verifying, storing, and compositing of both application (stage2) and stage1 images for execution.
2. Stage 1: execution of the stage1 image from within the composite image prepared by stage0.
3. Stage 2: execution of individual application images within the containment afforded by stage1.

This separation of concerns is reflected in the file-system and layout of the composite image prepared by stage0:

1. Stage 0: `rkt` executable, and the pod manifest created at `/var/lib/rkt/pods/prepare/$uuid/pod`.
2. Stage 1: `stage1.aci`, made available at `/var/lib/rkt/pods/run/$uuid/stage1` by `rkt run`.
3. Stage 2: `$app.aci`, made available at `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/opt/stage2/$appname` by `rkt run`, where `$appname` is the name of the app in the pod manifest.
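
As an illustration of the layout above, the following minimal sketch builds the per-stage paths for one pod.
The helper name `pod_paths` and its dictionary keys are hypothetical; only the path components come from this guide.

```python
from pathlib import PurePosixPath

# Base directory taken from the paths quoted in this guide.
PODS_DIR = PurePosixPath("/var/lib/rkt/pods")


def pod_paths(uuid: str, appname: str, state: str = "run") -> dict:
    """Return the per-stage locations for one pod (hypothetical helper)."""
    pod = PODS_DIR / state / uuid
    stage1_rootfs = pod / "stage1" / "rootfs"
    return {
        "pod_manifest": pod / "pod",            # stage0 output
        "stage1": pod / "stage1",               # extracted stage1.aci
        "stage1_rootfs": stage1_rootfs,
        "stage2_app": stage1_rootfs / "opt" / "stage2" / appname,
    }
```

Pure path arithmetic is used here so the sketch can be exercised on any machine without touching `/var/lib/rkt`.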

The stage1 implementation is what creates the execution environment for the contained applications.
This occurs via entrypoints from stage0 on behalf of `rkt run` and `rkt enter`.
These entrypoints are executable programs located via annotations from within the stage1 ACI manifest, and executed from within the stage1 of a given pod at `/var/lib/rkt/pods/$state/$uuid/stage1/rootfs`.

Stage2 is the deployed application image.
Stage1 is the vehicle for getting there from stage0.
For any given pod instance, stage1 may be replaced by a completely different implementation.
This allows users to employ different containment strategies on the same host running the same interchangeable ACIs.

## Entrypoints

### rkt run

`coreos.com/rkt/stage1/run`

1. rkt prepares the pod's stage1 and stage2 images and pod manifest under `/var/lib/rkt/pods/prepare/$uuid`, acquiring an exclusive advisory lock on the directory.
   Upon a successful preparation, the directory will be renamed to `/var/lib/rkt/pods/run/$uuid`.
2. chdirs to `/var/lib/rkt/pods/run/$uuid`.
3. resolves the `coreos.com/rkt/stage1/run` entrypoint via annotations found within `/var/lib/rkt/pods/run/$uuid/stage1/manifest`.
4. executes the resolved entrypoint relative to `/var/lib/rkt/pods/run/$uuid/stage1/rootfs`.
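
Steps 3 and 4 above can be sketched as follows.
This is an illustration under stated assumptions, not rkt's actual code: the function name `resolve_entrypoint` is hypothetical, and the annotation value is assumed to be an absolute path inside the stage1 rootfs, as in the example manifest in this guide.

```python
import json
import os


def resolve_entrypoint(pod_dir: str, name: str) -> str:
    """Map an entrypoint annotation to a path under the stage1 rootfs.

    `pod_dir` stands for /var/lib/rkt/pods/run/$uuid.
    """
    manifest_path = os.path.join(pod_dir, "stage1", "manifest")
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)

    for annotation in manifest.get("annotations", []):
        if annotation.get("name") == name:
            # Assumption: the value is absolute within the rootfs, so the
            # leading slash is stripped before joining.
            relative = annotation["value"].lstrip("/")
            return os.path.join(pod_dir, "stage1", "rootfs", relative)

    raise KeyError(f"stage1 manifest has no {name!r} annotation")
```

Raising on a missing annotation keeps the failure explicit; how rkt itself reports that condition is not covered by this guide.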

It is the responsibility of this entrypoint to consume the pod manifest and execute the constituent apps in the appropriate environments as specified by the pod manifest.

The environment variable `RKT_LOCK_FD` contains the file descriptor number of the open directory handle for `/var/lib/rkt/pods/run/$uuid`.
Stage1 must leave this file descriptor open and in its locked state for the duration of the `rkt run`.
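
A minimal sketch of how an entrypoint might pick up the inherited descriptor is shown below.
The function name `inherited_lock_fd` is hypothetical, and marking the descriptor inheritable assumes the entrypoint later execs another long-running process that should keep holding the lock.

```python
import os


def inherited_lock_fd(environ=os.environ) -> int:
    """Return the descriptor number named by RKT_LOCK_FD without closing it."""
    raw = environ.get("RKT_LOCK_FD")
    if raw is None:
        raise RuntimeError("RKT_LOCK_FD is not set")
    fd = int(raw)

    # os.fstat raises OSError if the descriptor is not actually open.
    os.fstat(fd)

    # Assumption: the entrypoint will exec a supervisor later, so the
    # descriptor must survive exec and stay open for the pod's lifetime.
    os.set_inheritable(fd, True)
    return fd
```

The descriptor is deliberately never wrapped in a file object, since garbage collection of such an object would close it and release the lock.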

In the bundled rkt stage1, which includes systemd-nspawn and systemd, the entrypoint is a static Go program found at `/init` within the stage1 ACI rootfs.
The majority of its execution entails generating a systemd-nspawn argument list and writing systemd unit files for the constituent apps before executing systemd-nspawn.
Systemd-nspawn then boots the stage1 systemd with the just-written unit files for launching the contained apps.
The `/init` program's primary job is translating a pod manifest into systemd-nspawn systemd.services.

An alternative stage1 could forgo systemd-nspawn and systemd altogether, or retain these and introduce something like novm or qemu-kvm for greater isolation by first starting a VM.
All that is required is an executable at the place indicated by the `coreos.com/rkt/stage1/run` entrypoint that knows how to apply the pod manifest and prepared ACI file-systems.

The resolved entrypoint must inform rkt of its PID for the benefit of `rkt enter`.
Stage1 implementors have two options for doing so; only one needs to be implemented:

* `/var/lib/rkt/pods/run/$uuid/pid`: the PID of the process that will be given to the "enter" entrypoint.
* `/var/lib/rkt/pods/run/$uuid/ppid`: the PID of the parent of the process that will be given to the "enter" entrypoint. That parent process must have exactly one child process.
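
Either option amounts to writing one small file in the pod directory, as in this sketch.
The helper `write_pid_file` is hypothetical, and the file format (a decimal number followed by a newline) is an assumption, since this guide does not specify it.

```python
import os


def write_pid_file(pod_dir: str, pid: int, use_ppid: bool = False) -> str:
    """Record a PID in `pid` (or `ppid`) under the pod directory.

    `pod_dir` stands for /var/lib/rkt/pods/run/$uuid.
    """
    name = "ppid" if use_ppid else "pid"
    path = os.path.join(pod_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        # Assumed format: decimal PID plus trailing newline.
        f.write(f"{pid}\n")
    return path
```

A run entrypoint that stays resident could call `write_pid_file(os.getcwd(), os.getpid())`, relying on rkt having already changed into the pod directory.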

#### Arguments

* `--debug` to activate debugging
* `--net[=$NET1,$NET2,...]` to configure the creation of a contained network.
  See the [rkt networking documentation](../networking/overview.md) for details.
* `--mds-token=$TOKEN` passes the auth token to the apps via `AC_METADATA_URL` env var
* `--interactive` to run a pod interactively, that is, pass standard input to the application (only for pods with one application)
* `--local-config=$PATH` to override the local configuration directory
* `--private-users=$SHIFT` to define a UID/GID shift when using user namespaces. SHIFT is a two-value, colon-separated parameter: the first value is the host UID to assign to the container and the second is the number of host UIDs to assign.

#### Arguments added in interface version 2

* `--hostname=$HOSTNAME` configures the host name of the pod. If empty, it will be "rkt-$PODUUID".

#### Arguments added in interface version 3

* `--disable-capabilities-restriction` gives all capabilities to apps (overrides `retain-set` and `remove-set`)
* `--disable-paths` disables inaccessible and read-only paths (such as `/proc/sysrq-trigger`)
* `--disable-seccomp` disables seccomp (overrides `retain-set` and `remove-set`)
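
A custom run entrypoint has to accept all of the flags listed above.
The sketch below shows one way to do that with Python's `argparse`; the function names and the shape of the parsed result are hypothetical, and only the flags documented in this guide are handled.

```python
import argparse


def parse_uid_shift(value: str) -> tuple:
    """Split SHIFT ('first_host_uid:count') into two integers."""
    first, count = value.split(":")  # ValueError if not exactly two parts
    return int(first), int(count)


def build_run_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="run")
    # Interface version 1
    p.add_argument("--debug", action="store_true")
    # --net may be bare (const="") or carry a comma-separated list.
    p.add_argument("--net", nargs="?", const="", default=None)
    p.add_argument("--mds-token")
    p.add_argument("--interactive", action="store_true")
    p.add_argument("--local-config")
    p.add_argument("--private-users", type=parse_uid_shift)
    # Interface version 2
    p.add_argument("--hostname", default="")
    # Interface version 3
    p.add_argument("--disable-capabilities-restriction", action="store_true")
    p.add_argument("--disable-paths", action="store_true")
    p.add_argument("--disable-seccomp", action="store_true")
    return p
```

For example, `build_run_parser().parse_args(["--net=default", "--private-users=65536:1000"])` yields `net == "default"` and `private_users == (65536, 1000)`.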

### rkt enter

`coreos.com/rkt/stage1/enter`

1. rkt verifies the pod and image to enter are valid and running
2. chdirs to `/var/lib/rkt/pods/run/$uuid`
3. resolves the `coreos.com/rkt/stage1/enter` entrypoint via annotations found within `/var/lib/rkt/pods/run/$uuid/stage1/manifest`
4. executes the resolved entrypoint relative to `/var/lib/rkt/pods/run/$uuid/stage1/rootfs`

In the bundled rkt stage1, the entrypoint is a statically-linked C program found at `/enter` within the stage1 ACI rootfs.
This program enters the namespaces of the systemd-nspawn container's PID 1 before executing the `/enterexec` program.
`enterexec` then `chroot`s into the ACI's rootfs, loading the application and its environment.

An alternative stage1 would need to do whatever is appropriate for entering the application environment created by its own `coreos.com/rkt/stage1/run` entrypoint.

#### Arguments

1. `--pid=$PID` passes the PID of the process that is PID 1 in the container.
   rkt finds that PID by one of the two supported methods described in the `rkt run` section.
2. `--appname=$NAME` passes the app name of the specific application to enter.
3. the separator `--`
4. cmd to execute.
5. optionally, any cmd arguments.
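
The argument order above can be parsed by splitting on the `--` separator first, as in this minimal sketch.
The function name `parse_enter_args` and its return shape are hypothetical.

```python
import argparse


def parse_enter_args(argv: list) -> tuple:
    """Return (pid, appname, command) from an enter-style argument list."""
    if "--" not in argv:
        raise ValueError("missing '--' separator")
    sep = argv.index("--")
    options, command = argv[:sep], argv[sep + 1:]
    if not command:
        raise ValueError("no command given after '--'")

    p = argparse.ArgumentParser(prog="enter")
    p.add_argument("--pid", type=int, required=True)
    p.add_argument("--appname", required=True)
    ns = p.parse_args(options)

    # command[0] is the cmd to execute; the rest are its arguments.
    return ns.pid, ns.appname, command
```

Splitting manually keeps everything after the separator untouched, so command arguments that themselves start with `-` are never mistaken for entrypoint flags.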

### rkt gc

`coreos.com/rkt/stage1/gc`

The gc entrypoint deals with garbage collecting resources allocated by stage1.
For example, it removes the network namespace of a pod.

#### Arguments

* `--debug` to activate debugging
* UUID of the pod

### rkt stop

`coreos.com/rkt/stage1/stop`

The optional stop entrypoint initiates an orderly shutdown of stage1.

In the bundled rkt stage1, the entrypoint sends SIGTERM to systemd-nspawn. For the kvm flavor, it calls `systemctl halt` on the container (through SSH).

#### Arguments

* `--force` to force the stopping of the pod. For example, in the bundled rkt stage1, stop sends SIGKILL.
* UUID of the pod

## Versioning

The stage1 command line interface is versioned using an annotation with the name `coreos.com/rkt/stage1/interface-version`.
If the annotation is not present, rkt assumes the version is 1.
The current version of the stage1 interface is 3.
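
The defaulting rule can be sketched as below.
The names `interface_version`, `flags_available`, and `FLAGS_ADDED_IN` are hypothetical; the flag grouping mirrors the "Arguments added in interface version" sections of this guide, and whether rkt withholds newer flags from older stage1 images is an assumption of this sketch.

```python
VERSION_ANNOTATION = "coreos.com/rkt/stage1/interface-version"

# Run-entrypoint flags keyed by the interface version that introduced them.
FLAGS_ADDED_IN = {
    2: ["--hostname"],
    3: [
        "--disable-capabilities-restriction",
        "--disable-paths",
        "--disable-seccomp",
    ],
}


def interface_version(manifest: dict) -> int:
    """Return the declared interface version, defaulting to 1."""
    for annotation in manifest.get("annotations", []):
        if annotation.get("name") == VERSION_ANNOTATION:
            return int(annotation["value"])
    return 1


def flags_available(version: int) -> list:
    """List the versioned flags a stage1 of this version understands."""
    flags = []
    for introduced in sorted(FLAGS_ADDED_IN):
        if introduced <= version:
            flags.extend(FLAGS_ADDED_IN[introduced])
    return flags
```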

## Examples

### Stage1 ACI manifest

```json
{
    "acKind": "ImageManifest",
    "acVersion": "0.8.6",
    "name": "foo.com/rkt/stage1",
    "labels": [
        {
            "name": "version",
            "value": "0.0.1"
        },
        {
            "name": "arch",
            "value": "amd64"
        },
        {
            "name": "os",
            "value": "linux"
        }
    ],
    "annotations": [
        {
            "name": "coreos.com/rkt/stage1/run",
            "value": "/ex/run"
        },
        {
            "name": "coreos.com/rkt/stage1/enter",
            "value": "/ex/enter"
        },
        {
            "name": "coreos.com/rkt/stage1/gc",
            "value": "/ex/gc"
        },
        {
            "name": "coreos.com/rkt/stage1/stop",
            "value": "/ex/stop"
        },
        {
            "name": "coreos.com/rkt/stage1/interface-version",
            "value": "2"
        }
    ]
}
```

## Filesystem Layout Assumptions

The following paths are reserved for the stage1 image, and they will be created during stage0.
When creating a stage1 image, developers SHOULD NOT create or use these paths in the image's filesystem.

### stage2

`opt/stage2`

This directory path is used for extracting the ACI of every app in the pod.
Each app's rootfs will appear under this directory,
e.g. `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/opt/stage2/$appname/rootfs`.

### status

`rkt/status`

This directory path is used for storing the apps' exit statuses.
For example, if an app named `foo` exits with status = `42`, stage1 should write `42`
in `/var/lib/rkt/pods/run/$uuid/stage1/rootfs/rkt/status/foo`.
Later the exit status can be retrieved and shown by `rkt status $uuid`.
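
Recording an exit status is a single small write, as this sketch shows.
The helper `write_exit_status` is hypothetical; it writes the bare decimal status with no trailing newline, which is an assumption, and it creates the directory only so that the sketch also runs outside a pod prepared by stage0.

```python
import os


def write_exit_status(stage1_rootfs: str, appname: str, status: int) -> str:
    """Write an app's exit status to rkt/status/$appname under the rootfs."""
    status_dir = os.path.join(stage1_rootfs, "rkt", "status")
    # stage0 normally creates this directory; exist_ok makes this a no-op then.
    os.makedirs(status_dir, exist_ok=True)
    path = os.path.join(status_dir, appname)
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(status))
    return path
```

With the example from the text, `write_exit_status(rootfs, "foo", 42)` leaves `42` in `rkt/status/foo`.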

### env

`rkt/env`

This directory path is used for passing environment variables to each app.
For example, environment variables for an app named `foo` will be stored in `rkt/env/foo`.