- .. _sandbox:
- PyPy's sandboxing features
- ==========================
- Introduction
- ------------
- PyPy offers sandboxing at a level similar to OS-level sandboxing (e.g.
- SECCOMP_ on Linux), but implemented in a fully portable way. To use it,
- a (regular, trusted) program launches a subprocess that is a special
- sandboxed version of PyPy. This subprocess can run arbitrary untrusted
- Python code, but all its input/output is serialized to a stdin/stdout
- pipe instead of being directly performed. The outer process reads the
- pipe and decides which commands are allowed or not (sandboxing), or even
- reinterprets them differently (virtualization). A potential attacker
- can have arbitrary code run in the subprocess, but cannot actually do
- any input/output not controlled by the outer process. Additional
- barriers are in place to limit the amount of RAM and CPU time used.
- Note that this is very different from sandboxing at the Python language
- level, i.e. placing restrictions on what kind of Python code the
- attacker is allowed to run (why? read about pysandbox_).
- .. _SECCOMP: http://code.google.com/p/seccompsandbox/wiki/overview
- .. _pysandbox: https://mail.python.org/pipermail/python-dev/2013-November/130132.html
- Another point of comparison: if we were instead to try to plug CPython
- into a special virtualizing C library, we would get a result
- that is not only OS-specific, but unsafe, because CPython can be
- segfaulted (in many ways, all of them really, really obscure).
- Given enough effort, an attacker can turn almost any
- segfault into a vulnerability. The C code generated by
- PyPy is not segfaultable, as long as our code generators are correct,
- which leaves fewer lines of code to trust. For the paranoid,
- PyPy translated with sandboxing also contains systematic run-time
- checks (against buffer overflows, for example)
- that are normally only present in debugging versions.
- .. warning::
- The hard work from the PyPy side is done --- you get a fully secure
- version. What is only experimental and unpolished is the library to
- use this sandboxed PyPy from a regular Python interpreter (CPython, or
- an unsandboxed PyPy). Contributions welcome.
- .. warning::
-
- Tested with PyPy2. May not work out of the box with PyPy3.
- Overview
- --------
- One of PyPy's translation aspects is a sandboxing feature. It's "sandboxing" as
- in "full virtualization", but done in normal C with no OS support at all. It's
- a two-process model: we can translate PyPy to a special "pypy-c-sandbox"
- executable, which is safe in the sense that it doesn't do any library or
- system calls - instead, whenever it would like to perform such an operation, it
- marshals the operation name and the arguments to its stdout and it waits for
- the marshalled result on its stdin. This pypy-c-sandbox process is meant to be
- run by an outer "controller" program that answers these operation requests.
- The pypy-c-sandbox program is obtained by adding a transformation during
- translation, which turns all RPython-level external function calls into
- stubs that do the marshalling/waiting/unmarshalling. An attacker that
- tries to escape the sandbox is stuck within a C program that contains no
- external function calls at all except for writing to stdout and reading from
- stdin. (It's still attackable in theory, e.g. by exploiting segfault-like
- situations, but as explained in the introduction we think that PyPy is
- rather safe against such attacks.)
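The request/answer cycle described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual wire format used by pypy-c-sandbox: the helper names (``send_request``, ``serve_one``, ``read_reply``) and the ``demo_add`` operation are hypothetical, and in-memory buffers stand in for the stdin/stdout pipes. The ``handlers`` dictionary doubles as an allow-list, so any operation the controller does not know about is refused.

```python
import io
import marshal


def send_request(stream, opname, args):
    """Sandboxed side: marshal the operation name and its arguments."""
    marshal.dump((opname, tuple(args)), stream)
    stream.flush()


def serve_one(request_stream, reply_stream, handlers):
    """Controller side: read one request, check it, write the marshalled result."""
    opname, args = marshal.load(request_stream)
    if opname not in handlers:
        # Anything not explicitly allowed is refused.
        raise RuntimeError("operation not allowed: %r" % (opname,))
    result = handlers[opname](*args)
    marshal.dump(result, reply_stream)
    reply_stream.flush()
    return opname


def read_reply(stream):
    """Sandboxed side: wait for the marshalled result."""
    return marshal.load(stream)


# In-memory stand-ins for the two pipes (hypothetical demo operation).
to_controller = io.BytesIO()
to_sandbox = io.BytesIO()

send_request(to_controller, "demo_add", (2, 3))
to_controller.seek(0)
serve_one(to_controller, to_sandbox, {"demo_add": lambda a, b: a + b})
to_sandbox.seek(0)
reply = read_reply(to_sandbox)
```

In a real setup the two streams would be the subprocess's stdout and stdin, and the handlers would implement whatever view of the world the controller wants to expose.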
- The outer controller is a plain Python program that can run in CPython
- or a regular PyPy. It can perform any virtualization it likes, by
- giving the subprocess any custom view of its world. For example, while
- the subprocess thinks it's using file handles, in reality the numbers
- are created by the controller process and so they need not be (and
- probably should not be) real OS-level file handles at all. In the demo
- controller I've implemented there is simply a mapping from numbers to
- file-like objects. The controller answers the "os_open" operation by
- translating the requested path to some file or file-like object in some
- virtual and completely custom directory hierarchy. The file-like object
- is put in the mapping with any unused number >= 3 as a key, and the
- latter is returned to the subprocess. The "os_read" operation works by
- mapping the pseudo file handle given by the subprocess back to a
- file-like object in the controller, and reading from the file-like
- object.
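A minimal sketch of such a mapping is shown here, assuming a flat dictionary from virtual paths to file contents. The class name ``VirtualFiles`` and the simplified method signatures are assumptions made for illustration (a real "os_open" also carries flags and a mode); this is not the demo controller's code.

```python
import io


class VirtualFiles(object):
    """Maps pseudo file handles to file-like objects owned by the controller."""

    def __init__(self, tree):
        self.tree = tree        # virtual path -> bytes content (assumed layout)
        self.open_files = {}    # pseudo file handle -> file-like object

    def _unused_fd(self):
        # Any unused number >= 3 will do; 0, 1 and 2 are left alone.
        fd = 3
        while fd in self.open_files:
            fd += 1
        return fd

    def do_os_open(self, path):
        if path not in self.tree:
            raise OSError("no such virtual file: %s" % path)
        fd = self._unused_fd()
        self.open_files[fd] = io.BytesIO(self.tree[path])
        return fd               # this number is what the subprocess sees

    def do_os_read(self, fd, size):
        # Map the pseudo handle back to the file-like object and read from it.
        return self.open_files[fd].read(size)


vfs = VirtualFiles({"/tmp/script.py": b"print(1)\n"})
fd = vfs.do_os_open("/tmp/script.py")
data = vfs.do_os_read(fd, 5)
```

The subprocess never receives a real OS-level file handle: the number it gets back is only a key into the controller's dictionary.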
- Translating an RPython program with sandboxing enabled also uses a special flag
- that enables all sorts of C-level assertions against index-out-of-bounds
- accesses.
- By the way, as you should have realized, it's really independent from
- the fact that it's PyPy that we are translating. Any RPython program
- should do. I've successfully tried it on the JS interpreter. The
- controller is only called "pypy_interact" because it emulates a file
- hierarchy that makes pypy-c-sandbox happy - it contains (read-only)
- virtual directories like /bin/lib/pypy1.2/lib-python and
- /bin/lib/pypy1.2/lib_pypy and it
- pretends that the executable is /bin/pypy-c.
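One way to picture the emulated hierarchy is a nested dictionary that the controller walks when the subprocess asks for a path. The sketch below is an assumption for illustration only: ``VIRTUAL_ROOT`` and ``resolve`` are hypothetical names, and the real pypy_interact controller is organized differently. Paths are normalized first, so ``..`` components cannot lead outside the virtual root.

```python
import posixpath

# Hypothetical virtual tree: dicts are directories, bytes are file contents.
VIRTUAL_ROOT = {
    "bin": {
        "pypy-c": b"",
        "lib": {"pypy1.2": {"lib-python": {}, "lib_pypy": {}}},
    },
    "tmp": {},
}


def resolve(root, path):
    """Return the node for an absolute virtual path, or raise OSError."""
    node = root
    parts = [p for p in posixpath.normpath(path).split("/") if p]
    for name in parts:
        if not isinstance(node, dict) or name not in node:
            raise OSError("no such virtual path: %s" % path)
        node = node[name]
    return node


lib_pypy = resolve(VIRTUAL_ROOT, "/bin/lib/pypy1.2/lib_pypy")
```

Because lookups only ever touch this dictionary, a request such as ``/tmp/../../etc/passwd`` resolves to a missing virtual entry rather than to a file on the host.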
- Howto
- -----
- Grab a copy of the pypy repository_. In the directory pypy/goal, run::
- ../../rpython/bin/rpython -O2 --sandbox targetpypystandalone.py
- If you don't have a regular PyPy installed, you should, because it's
- faster to translate; but you can also run the same line with ``python``
- in front.
- .. _repository: https://bitbucket.org/pypy/pypy
- To run it, use the tools in the pypy/sandbox directory::
- ./pypy_interact.py /some/path/pypy-c-sandbox [args...]
- Just like with pypy-c, if you pass no argument you get the interactive
- prompt. In theory it's impossible to do anything bad or read a random
- file on the machine from this prompt. To pass a script as an argument you need
- to put it in a directory along with all its dependencies, and ask
- pypy_interact to export this directory (read-only) to the subprocess'
- virtual /tmp directory with the ``--tmp=DIR`` option. Example::
- mkdir myexported
- cp script.py myexported/
- ./pypy_interact.py --tmp=myexported /some/path/pypy-c-sandbox /tmp/script.py
- This is safe to do even if script.py comes from some random
- untrusted source, e.g. when the command is run by an HTTP server.
- To limit the heap size, use the ``--heapsize=N`` option to
- pypy_interact.py. You can also limit the running time (measured in real
- time, not CPU time) by using the ``--timeout=N`` option.
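When the sandbox is launched from another Python program, the command line above can be assembled programmatically. The helper below is a hypothetical convenience and not part of the sandbox tools. It only builds the argument list from the options documented here (``--tmp``, ``--heapsize``, ``--timeout``) and passes the values through unchanged, since their exact format is whatever pypy_interact.py accepts.

```python
def build_command(sandbox_exe, script=None, tmp_dir=None,
                  heapsize=None, timeout=None,
                  interact="./pypy_interact.py"):
    """Build the argv list for running pypy_interact.py (illustrative helper)."""
    cmd = [interact]
    if tmp_dir is not None:
        cmd.append("--tmp=%s" % tmp_dir)
    if heapsize is not None:
        cmd.append("--heapsize=%s" % heapsize)
    if timeout is not None:
        cmd.append("--timeout=%s" % timeout)
    cmd.append(sandbox_exe)
    if script is not None:
        cmd.append(script)      # path as seen inside the virtual file system
    return cmd


cmd = build_command("/some/path/pypy-c-sandbox", script="/tmp/script.py",
                    tmp_dir="myexported", timeout=10)
```

The resulting list can be handed to ``subprocess.call`` from the pypy/sandbox directory. Building a list rather than a shell string avoids quoting problems when the paths come from user input.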
- Not all operations are supported; e.g. if you type os.readlink('...'),
- the controller crashes with an exception and the subprocess is killed.
- Other operations make the subprocess die directly with a "Fatal RPython
- error". None of this is a security hole. More importantly, *most other
- built-in modules are not enabled. Please read all the warnings on this
- page before complaining about this. Contributions welcome.*