.. _sandbox:

PyPy's sandboxing features
==========================

Introduction
------------

PyPy offers sandboxing at a level similar to OS-level sandboxing (e.g.
SECCOMP_ on Linux), but implemented in a fully portable way. To use it,
a (regular, trusted) program launches a subprocess that is a special
sandboxed version of PyPy. This subprocess can run arbitrary untrusted
Python code, but all its input/output is serialized to a stdin/stdout
pipe instead of being performed directly. The outer process reads the
pipe and decides which commands are allowed (sandboxing), or even
reinterprets them differently (virtualization). A potential attacker
can have arbitrary code run in the subprocess, but cannot actually do
any input/output not controlled by the outer process. Additional
barriers limit the amount of RAM and CPU time used.

Note that this is very different from sandboxing at the Python language
level, i.e. placing restrictions on what kind of Python code the
attacker is allowed to run (to see why, read about pysandbox_).

.. _SECCOMP: http://code.google.com/p/seccompsandbox/wiki/overview
.. _pysandbox: https://mail.python.org/pipermail/python-dev/2013-November/130132.html

Another point of comparison: if we were instead to try to plug CPython
into a special virtualizing C library, we would get a result
that is not only OS-specific, but also unsafe, because CPython can be
segfaulted (in many ways, all of them really, really obscure).
Given enough effort, an attacker can turn almost any
segfault into a vulnerability. The C code generated by
PyPy is not segfaultable, as long as our code generators are correct,
which leaves fewer lines of code to trust. For the paranoid,
PyPy translated with sandboxing also contains systematic run-time
checks (against buffer overflows, for example)
that are normally only present in debugging versions.

.. warning::

  The hard work on the PyPy side is done: you get a fully secure
  version. The only part that is experimental and unpolished is the
  library used to drive this sandboxed PyPy from a regular Python
  interpreter (CPython, or an unsandboxed PyPy). Contributions welcome.

.. warning::

  Tested with PyPy2. May not work out of the box with PyPy3.

Overview
--------

One of PyPy's translation aspects is a sandboxing feature. It's "sandboxing" as
in "full virtualization", but done in normal C with no OS support at all. It's
a two-process model: we can translate PyPy to a special "pypy-c-sandbox"
executable, which is safe in the sense that it doesn't make any library or
system calls. Instead, whenever it would like to perform such an operation, it
marshals the operation name and the arguments to its stdout and waits for
the marshalled result on its stdin. This pypy-c-sandbox process is meant to be
run by an outer "controller" program that answers these operation requests.

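
The request/response exchange can be pictured with a small Python sketch.
It only illustrates the idea and is not the actual wire format used by
pypy-c-sandbox: the function ``answer_request``, the handler table and the
operation name ``demo_add`` are hypothetical, and Python's standard
``marshal`` module stands in for the real encoding.

.. code-block:: python

   import marshal


   def answer_request(raw_request, handlers):
       """Decode one marshalled request, run its handler, marshal the reply."""
       name, args = marshal.loads(raw_request)
       handler = handlers.get(name)
       if handler is None:
           # The controller decides what is allowed: unknown operations
           # are refused instead of being performed.
           raise RuntimeError("operation not allowed: %s" % name)
       return marshal.dumps(handler(*args))


   # Hypothetical operation table; a real controller would list I/O operations.
   HANDLERS = {"demo_add": lambda a, b: a + b}

   # What the sandboxed side would write to its stdout ...
   request = marshal.dumps(("demo_add", (2, 3)))
   # ... and what it would then read back from its stdin.
   reply = answer_request(request, HANDLERS)
   result = marshal.loads(reply)

Refusing everything that is not explicitly listed keeps the policy in one
place, the handler table of the trusted process.
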
The pypy-c-sandbox program is obtained by adding a transformation during
translation, which turns all RPython-level external function calls into
stubs that do the marshalling/waiting/unmarshalling. An attacker who
tries to escape the sandbox is stuck within a C program that contains no
external function calls at all, except for writing to stdout and reading from
stdin. (It's still attackable in theory, e.g. by exploiting segfault-like
situations, but as explained in the introduction we think that PyPy is
rather safe against such attacks.)

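
The effect of that transformation can be imitated in plain Python. The
sketch below is not the RPython transformation itself: ``make_stub``,
``exchange`` and ``fake_controller`` are names invented for this
illustration, ``os_getcwd`` is used as a made-up operation name, and the
pipe is replaced by a direct function call.

.. code-block:: python

   import marshal


   def make_stub(name, exchange):
       """Return a replacement for the external function called `name`.

       `exchange` takes the marshalled request and returns the marshalled
       reply; in the real setup this would be a write to stdout followed
       by a blocking read from stdin.
       """
       def stub(*args):
           request = marshal.dumps((name, args))   # marshalling
           reply = exchange(request)               # waiting
           return marshal.loads(reply)             # unmarshalling
       return stub


   def fake_controller(request):
       # Stand-in for the outer process: it answers with a made-up value
       # instead of touching the real operating system.
       name, args = marshal.loads(request)
       if name == "os_getcwd":
           return marshal.dumps("/virtual/home")
       raise RuntimeError("operation not allowed: %s" % name)


   # The stub has the same call shape as the function it replaces, but it
   # never performs the operation itself.
   getcwd = make_stub("os_getcwd", fake_controller)
   answer = getcwd()
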
The outer controller is a plain Python program that can run in CPython
or a regular PyPy. It can perform any virtualization it likes, by
giving the subprocess any custom view of its world. For example, while
the subprocess thinks it's using file handles, in reality the numbers
are created by the controller process and so they need not be (and
probably should not be) real OS-level file handles at all. In the demo
controller I've implemented there is simply a mapping from numbers to
file-like objects. The controller answers the "os_open" operation by
translating the requested path to some file or file-like object in some
virtual and completely custom directory hierarchy. The file-like object
is put in the mapping with any unused number >= 3 as a key, and that
number is returned to the subprocess. The "os_read" operation works by
mapping the pseudo file handle given by the subprocess back to a
file-like object in the controller, and reading from the file-like
object.

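
The mapping described above can be sketched as follows. This is not the
code of the demo controller: the class name ``VirtualFiles`` and the
simplified signatures (no flags or mode for ``os_open``) are assumptions
made to keep the example short.

.. code-block:: python

   import errno
   import io


   class VirtualFiles(object):
       """Maps pseudo file handles to in-memory file-like objects."""

       def __init__(self, tree):
           self.tree = tree          # virtual path -> file contents (bytes)
           self.open_files = {}      # pseudo file handle -> file-like object

       def os_open(self, path):
           if path not in self.tree:
               raise OSError(errno.ENOENT, "no such virtual file", path)
           fd = 3
           while fd in self.open_files:
               fd += 1               # any unused number >= 3 will do
           self.open_files[fd] = io.BytesIO(self.tree[path])
           return fd

       def os_read(self, fd, size):
           # Map the pseudo handle back to the file-like object and read
           # from it; no real OS-level file handle is involved.
           return self.open_files[fd].read(size)


   vfs = VirtualFiles({"/tmp/script.py": b"print(42)\n"})
   fd = vfs.os_open("/tmp/script.py")
   data = vfs.os_read(fd, 5)

Because the subprocess only ever sees the numbers, the controller is free
to back them with anything that has a ``read`` method.
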
Translating an RPython program with sandboxing enabled also uses a special flag
that enables all sorts of C-level assertions against index-out-of-bounds
accesses.

By the way, as you may have realized, the sandboxing is really independent of
the fact that it's PyPy that we are translating. Any RPython program
should do. I've successfully tried it on the JS interpreter. The
controller is only called "pypy_interact" because it emulates a file
hierarchy that makes pypy-c-sandbox happy: it contains (read-only)
virtual directories like /bin/lib/pypy1.2/lib-python and
/bin/lib/pypy1.2/lib_pypy, and it
pretends that the executable is /bin/pypy-c.

Howto
-----

Grab a copy of the pypy repository_. In the directory pypy/goal, run::

   ../../rpython/bin/rpython -O2 --sandbox targetpypystandalone.py

If you don't have a regular PyPy installed, you should install one, because
translation is faster with it; but you can also run the same line with
``python`` in front.

.. _repository: https://bitbucket.org/pypy/pypy

To run it, use the tools in the pypy/sandbox directory::

   ./pypy_interact.py /some/path/pypy-c-sandbox [args...]

Just like with pypy-c, if you pass no argument you get the interactive
prompt. In theory it's impossible to do anything bad or read a random
file on the machine from this prompt. To pass a script as an argument, you need
to put it in a directory along with all its dependencies, and ask
pypy_interact to export this directory (read-only) to the subprocess's
virtual /tmp directory with the ``--tmp=DIR`` option. Example::

   mkdir myexported
   cp script.py myexported/
   ./pypy_interact.py --tmp=myexported /some/path/pypy-c-sandbox /tmp/script.py

This is safe to do even if script.py comes from some random
untrusted source, e.g. if it is run on behalf of a client by an HTTP server.

To limit the heap size, use the ``--heapsize=N`` option to
pypy_interact.py. You can also limit the CPU time (real time) by
using the ``--timeout=N`` option.

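
When the controller is started from another program, the command line can
be assembled programmatically. The helper below is a hypothetical
convenience function, not part of pypy_interact.py; it only uses the
``--tmp``, ``--heapsize`` and ``--timeout`` options described above and
passes the limit values through unchanged, since their exact format is not
specified here.

.. code-block:: python

   import posixpath


   def build_command(sandbox_binary, export_dir, script_name,
                     heapsize=None, timeout=None,
                     controller="./pypy_interact.py"):
       """Build the argument list for running one exported script."""
       command = [controller, "--tmp=%s" % export_dir]
       if heapsize is not None:
           command.append("--heapsize=%s" % heapsize)
       if timeout is not None:
           command.append("--timeout=%s" % timeout)
       command.append(sandbox_binary)
       # The exported directory shows up as /tmp inside the sandbox.
       command.append(posixpath.join("/tmp", script_name))
       return command


   command = build_command("/some/path/pypy-c-sandbox", "myexported",
                           "script.py", timeout=10)

The resulting list can be handed to ``subprocess.call`` from the
pypy/sandbox directory.
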
Not all operations are supported; e.g. if you type os.readlink('...'),
the controller crashes with an exception and the subprocess is killed.
Other operations make the subprocess die directly with a "Fatal RPython
error". None of this is a security hole. More importantly, *most other
built-in modules are not enabled. Please read all the warnings on this
page before complaining about this. Contributions welcome.*