The process_utils Module

This module provides functions for process management.

These are the main sections of this module:

  • Asynchronous Process Utilities
  • Treatment of File Descriptors
  • Synchronous Process Utilities
  • Utility Functions

Asynchronous Process Utilities

There are a function and a Protocol class which can be used together with your custom trollius or asyncio event loop.

The osrf_pycommon.process_utils.async_execute_process() function is a coroutine which allows you to run a process and get the output back bit by bit in real-time, either with stdout and stderr separated or combined. This function also allows you to emulate the terminal using a pty simply by toggling a flag in the parameters.

Alongside this coroutine is a Protocol class, osrf_pycommon.process_utils.AsyncSubprocessProtocol, from which you can inherit in order to customize how the yielded output is handled.

Because this coroutine is built on the trollius/asyncio framework’s subprocess functions, it is portable and should behave the same on all major operating systems, including Windows, where an IOCP implementation is used.

osrf_pycommon.process_utils.async_execute_process(protocol_class, cmd=None, cwd=None, env=None, shell=False, emulate_tty=False, stderr_to_stdout=True)[source]

Coroutine to execute a subprocess and yield the output back asynchronously.

This function is meant to be used with the Python asyncio module, which is available via pip with Python 3.3 and built into Python 3.4. On Python >= 2.6 you can use the trollius module to get the same functionality, but without using the new yield from syntax.

Here is an example of how to use this function:

import asyncio
from osrf_pycommon.process_utils import async_execute_process
from osrf_pycommon.process_utils import AsyncSubprocessProtocol
from osrf_pycommon.process_utils import get_loop


@asyncio.coroutine
def setup():
    transport, protocol = yield from async_execute_process(
        AsyncSubprocessProtocol, ['ls', '/usr'])
    returncode = yield from protocol.complete
    return returncode

retcode = get_loop().run_until_complete(setup())
get_loop().close()

That same example using trollius would look like this:

import trollius as asyncio
from osrf_pycommon.process_utils import async_execute_process
from osrf_pycommon.process_utils import AsyncSubprocessProtocol
from osrf_pycommon.process_utils import get_loop


@asyncio.coroutine
def setup():
    transport, protocol = yield asyncio.From(async_execute_process(
        AsyncSubprocessProtocol, ['ls', '/usr']))
    returncode = yield asyncio.From(protocol.complete)
    raise asyncio.Return(returncode)

retcode = get_loop().run_until_complete(setup())
get_loop().close()

This difference is required because the yield from syntax is not valid in Python < 3.3.

In both examples, the first argument is the default protocol class, AsyncSubprocessProtocol, which simply prints output from the subprocess's stdout to stdout and output from its stderr to stderr.

If you want to capture the output and do something with it, or write to the subprocess's stdin, then you need to subclass AsyncSubprocessProtocol and override the on_stdout_received, on_stderr_received, and on_process_exited functions.

See the documentation for the AsyncSubprocessProtocol class for more details, but here is an example which uses asyncio from Python 3.4:

import asyncio
from osrf_pycommon.process_utils import async_execute_process
from osrf_pycommon.process_utils import AsyncSubprocessProtocol
from osrf_pycommon.process_utils import get_loop


class MyProtocol(AsyncSubprocessProtocol):
    def __init__(self, file_name, **kwargs):
        self.fh = open(file_name, 'w')
        AsyncSubprocessProtocol.__init__(self, **kwargs)

    def on_stdout_received(self, data):
        # Data has line endings intact, but is bytes in Python 3
        self.fh.write(data.decode('utf-8'))

    def on_stderr_received(self, data):
        self.fh.write(data.decode('utf-8'))

    def on_process_exited(self, returncode):
        self.fh.write("Exited with return code: {0}".format(returncode))
        self.fh.close()


@asyncio.coroutine
def log_command_to_file(cmd, file_name):

    def create_protocol(**kwargs):
        return MyProtocol(file_name, **kwargs)

    transport, protocol = yield from async_execute_process(
        create_protocol, cmd)
    returncode = yield from protocol.complete
    return returncode

get_loop().run_until_complete(
    log_command_to_file(['ls', '/'], '/tmp/out.txt'))
get_loop().close()

See the subprocess.Popen class for more details on some of the parameters to this function like cwd, env, and shell.

See the osrf_pycommon.process_utils.execute_process() function for more details on the emulate_tty parameter.

Parameters:
  • protocol_class (AsyncSubprocessProtocol or a subclass) – Protocol class which handles subprocess callbacks
  • cmd (list) – list of arguments where the executable is the first item
  • cwd (str) – directory in which to run the command
  • env (dict) – a dictionary of environment variable names to values
  • shell (bool) – if True, the cmd variable is interpreted by the shell
  • emulate_tty (bool) – if True, pty’s are passed to the subprocess for stdout and stderr, see osrf_pycommon.process_utils.execute_process().
  • stderr_to_stdout (bool) – if True, stderr is directed to stdout, so they are not captured separately.
class osrf_pycommon.process_utils.AsyncSubprocessProtocol(stdin=None, stdout=None, stderr=None)[source]

Protocol to subclass to get events from async_execute_process().

When subclassing this Protocol class, you should override these functions:

def on_stdout_received(self, data):
    # ...

def on_stderr_received(self, data):
    # ...

def on_process_exited(self, returncode):
    # ...

By default these functions just print the data received from stdout and stderr, and do nothing when the process exits.

Data received by the on_stdout_received and on_stderr_received functions is always in bytes (str in Python 2 and bytes in Python 3). Therefore, it may be necessary to call .decode() on the data before printing it to the screen.

Additionally, the data received will not be stripped of newlines, so take that into consideration when printing the result.

You can also override these less commonly used functions:

def on_stdout_open(self):
    # ...

def on_stdout_close(self, exc):
    # ...

def on_stderr_open(self):
    # ...

def on_stderr_close(self, exc):
    # ...

These functions are called when stdout/stderr are opened and closed, and can be useful when using pty’s for example. The exc parameter of the *_close functions is None unless there was an exception.
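
As a brief, hedged illustration, here is a minimal sketch of a subclass (the class name is illustrative) which reports when the stdout pipe opens and closes, using only the hook signatures listed above:

from osrf_pycommon.process_utils import AsyncSubprocessProtocol


class StdoutWatcherProtocol(AsyncSubprocessProtocol):
    # Illustrative subclass; only hooks listed above are overridden.
    def on_stdout_open(self):
        print("stdout pipe opened")

    def on_stdout_close(self, exc):
        # exc is None unless the pipe was closed due to an exception.
        if exc is None:
            print("stdout pipe closed cleanly")
        else:
            print("stdout pipe closed with exception:", exc)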

In addition to the overridable functions, this class has a few useful public attributes. The stdin attribute is a reference to the PipeProto which follows the asyncio.WriteTransport interface. The stdout and stderr attributes also reference their respective PipeProto objects. The complete attribute is an asyncio.Future which is set to complete when the process exits; its result is the return code.

The complete attribute can be used like this:

import asyncio
from osrf_pycommon.process_utils import async_execute_process
from osrf_pycommon.process_utils import AsyncSubprocessProtocol
from osrf_pycommon.process_utils import get_loop


@asyncio.coroutine
def setup():
    transport, protocol = yield from async_execute_process(
        AsyncSubprocessProtocol, ['ls', '-G', '/usr'])
    retcode = yield from protocol.complete
    print("Exited with", retcode)

# This will block until the protocol.complete Future is done.
get_loop().run_until_complete(setup())
get_loop().close()
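
The stdin attribute can be used in a similar way to write to the subprocess. The following is a minimal sketch assuming only what is stated above, namely that stdin follows the asyncio.WriteTransport interface (and therefore provides write() and close()):

import asyncio
from osrf_pycommon.process_utils import async_execute_process
from osrf_pycommon.process_utils import AsyncSubprocessProtocol
from osrf_pycommon.process_utils import get_loop


@asyncio.coroutine
def feed_cat():
    # 'cat' echoes back whatever is written to its stdin.
    transport, protocol = yield from async_execute_process(
        AsyncSubprocessProtocol, ['cat'])
    # write() takes bytes and close() signals EOF, per the
    # asyncio.WriteTransport interface mentioned above.
    protocol.stdin.write(b'hello\n')
    protocol.stdin.close()
    returncode = yield from protocol.complete
    return returncode

get_loop().run_until_complete(feed_cat())
get_loop().close()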

In addition to these functions, there is a utility function for getting the correct asyncio event loop:

osrf_pycommon.process_utils.get_loop()[source]

This function will return the proper event loop for the subprocess async calls.

On Unix this just returns asyncio.get_event_loop(), but on Windows it will set and return an asyncio.ProactorEventLoop instead.

Treatment of File Descriptors

Unlike subprocess.Popen, all of the process_utils functions behave the same way on Python versions 2.7 through 3.4, and they do not close inheritable file descriptors (https://docs.python.org/3.4/library/os.html#fd-inheritance) before starting subprocesses. This is equivalent to passing close_fds=False to subprocess.Popen on all Python versions.

In Python 3.2, the default for subprocess.Popen's close_fds option changed from False to True, so that file descriptors opened by the parent process are closed before spawning the child process. In Python 3.4, PEP 446 additionally made it so that, even when close_fds=False, non-inheritable file descriptors are closed before spawning the subprocess.

If you want to be able to pass file descriptors to subprocesses in Python 3.4 or higher, you will need to make sure they are inheritable (https://docs.python.org/3.4/library/os.html#fd-inheritance).
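
For example, here is a minimal sketch, using only standard library calls plus execute_process() (documented below), which marks a pipe's write end as inheritable before handing its descriptor number to a Python child process:

import os
import sys

from osrf_pycommon.process_utils import execute_process

read_fd, write_fd = os.pipe()
# On Python 3.4+ new descriptors are non-inheritable by default
# (PEP 446), so mark the one the child should receive.
os.set_inheritable(write_fd, True)

# The child writes to the inherited descriptor, whose number is
# passed to it as a command line argument.
child = 'import os, sys; os.write(int(sys.argv[1]), b"hi from child")'
for line in execute_process([sys.executable, '-c', child, str(write_fd)]):
    pass  # Ignore the child's stdout; only the pipe matters here.

os.close(write_fd)
print(os.read(read_fd, 1024))  # b'hi from child'
os.close(read_fd)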

Synchronous Process Utilities

For synchronous execution and output capture of subprocesses, there are two functions: execute_process() and execute_process_split().

These functions do not yet use the trollius/asyncio framework as a back-end, and therefore on Windows they will not stream the data from the subprocess as they do on Unix machines. Instead, no data is yielded until the subprocess has finished and all of the output has been buffered (the normal warnings about long-running programs with lots of output apply).

The streaming of output does not work on Windows because there the select.select() method only works on sockets, not on the file-like objects used with subprocess pipes. asyncio implements Windows subprocess support with a Proactor event loop based on Windows' IOCP API. One future option is to implement this synchronous style of function using IOCP in this module; another is to wrap the asynchronous calls in a synchronous interface, but there are issues with that as well. In the meantime, if you need streaming of output on both Windows and Unix, use the asynchronous calls.

osrf_pycommon.process_utils.execute_process(cmd, cwd=None, env=None, shell=False, emulate_tty=False)[source]

Executes a command with arguments and returns output line by line.

All arguments, except emulate_tty, are passed directly to subprocess.Popen.

execute_process returns a generator which yields the output, line by line, until the subprocess finishes, at which point the return code is yielded.

This is an example of how this function should be used:

from __future__ import print_function
from osrf_pycommon.process_utils import execute_process

cmd = ['ls', '-G']
for line in execute_process(cmd, cwd='/usr'):
    if isinstance(line, int):
        # This is a return code, the command has exited
        print("'{0}' exited with: {1}".format(' '.join(cmd), line))
        continue  # break would also be appropriate here
    # In Python 3, it will be a bytes array which needs to be decoded
    if not isinstance(line, str):
        line = line.decode('utf-8')
    # Then print it to the screen
    print(line, end='')

stdout and stderr are always captured together and returned line by line through the returned generator. Newline characters are preserved in the output, so if you are re-printing the data, take care to use end='' or first rstrip() the output lines.

When emulate_tty is used on Unix systems, commands will identify that they are on a tty and should output color to the screen as if you were running them in a terminal; therefore there should not be any need to pass arguments like -c color.ui=always to commands like git. Additionally, programs might behave differently when emulate_tty is being used; for example, Python will default to unbuffered output when it detects a tty.

emulate_tty works by using pseudo-terminals on Unix machines, so if you are running this command many times in parallel (like hundreds of times) then you may get one of a few different OSError's. For example, "OSError: [Errno 24] Too many open files: '/dev/ttyp0'" or "OSError: out of pty devices". You should also be aware that you share pty devices with the rest of the system, so even if you are not using many, it is possible to get this error. You can catch this error before getting data from the generator, so when using emulate_tty you might want to do something like this:

from __future__ import print_function
from osrf_pycommon.process_utils import execute_process

cmd = ['ls', '-G', '/usr']
try:
    output = execute_process(cmd, emulate_tty=True)
except OSError:
    output = execute_process(cmd, emulate_tty=False)
for line in output:
    if isinstance(line, int):
        print("'{0}' exited with: {1}".format(' '.join(cmd), line))
        continue
    # In Python 3, it will be a bytes array which needs to be decoded
    if not isinstance(line, str):
        line = line.decode('utf-8')
    print(line, end='')

This way, if a pty cannot be opened in order to emulate the tty, you can try again without emulation; any other OSError raised on the retry with emulate_tty set to False will propagate as usual. Obviously, you only want to do this if emulating the tty is non-critical to your processing, like when you are using it to capture color.

Any color information that the command outputs as ANSI escape sequences is captured by this function. That way you can print the output to the screen and preserve the color formatting.

If you do not want color to be in the output, then try setting emulate_tty to False, but that does not guarantee that there is no color in the output; it only causes the called processes to identify that they are not being run in a terminal. Most well-behaved programs will not output color if they detect that they are not being executed in a terminal, but you shouldn't rely on that.

If you want to ensure there is no color in the output from an executed process, then use this function:

osrf_pycommon.terminal_color.remove_ansi_escape_senquences()

Exceptions can be raised by functions called by the implementation; for example, subprocess.Popen can raise an OSError when the given command is not found. If you want to check for the existence of an executable on the PATH, see: which(). However, this function itself does not raise any special exceptions.
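
For example, here is a minimal sketch which checks for the executable before running it (the command used is just illustrative):

from __future__ import print_function
from osrf_pycommon.process_utils import execute_process
from osrf_pycommon.process_utils import which

cmd = ['git', '--version']
if which(cmd[0]) is None:
    print("'{0}' was not found on the PATH".format(cmd[0]))
else:
    for line in execute_process(cmd):
        if isinstance(line, int):
            print("'{0}' exited with: {1}".format(' '.join(cmd), line))
            continue
        if not isinstance(line, str):
            line = line.decode('utf-8')
        print(line, end='')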

Parameters:
  • cmd (list) – list of strings with the first item being a command and subsequent items being any arguments to that command; passed directly to subprocess.Popen.
  • cwd (str) – path in which to run the command, defaults to None which means os.getcwd() is used; passed directly to subprocess.Popen.
  • env (dict) – environment dictionary to use for executing the command, default is None which uses the os.environ environment; passed directly to subprocess.Popen.
  • shell (bool) – If True the system shell is used to evaluate the command, default is False; passed directly to subprocess.Popen.
  • emulate_tty (bool) – If True attempts to use a pty to convince subprocesses that they are being run in a terminal. Typically this is useful for capturing colorized output from commands. This does not work on Windows (no pty’s), so it is considered False even when True. Defaults to False.
Returns:

a generator which yields output from the command line by line

Return type:

generator which yields output lines (str in Python 2, bytes in Python 3) and finally the return code (int)

Availability: Unix (streaming), Windows (blocking)

osrf_pycommon.process_utils.execute_process_split(cmd, cwd=None, env=None, shell=False, emulate_tty=False)[source]

execute_process(), except stderr is returned separately.

Instead of yielding output line by line until yielding a return code, this function always yields a triplet of stdout, stderr, and return code; each time, only one of the three will not be None. Once you receive a non-None return code (its type will be int), there will be no more stdout or stderr. Therefore you can use the command like this:

from __future__ import print_function
import sys
from osrf_pycommon.process_utils import execute_process_split

cmd = ['time', 'ls', '-G']
for out, err, ret in execute_process_split(cmd, cwd='/usr'):
    # In Python 3, it will be a bytes array which needs to be decoded
    out = out.decode('utf-8') if out is not None else None
    err = err.decode('utf-8') if err is not None else None
    if ret is not None:
        # This is a return code, the command has exited
        print("'{0}' exited with: {1}".format(' '.join(cmd), ret))
        break
    if out is not None:
        print(out, end='')
    if err is not None:
        print(err, end='', file=sys.stderr)

When using this function, it is possible for the stdout and stderr data to be returned in a different order than it would appear on the terminal. This is because the subprocess is given separate buffers for stdout and stderr, so there is a race condition between the subprocess writing to the different buffers and this command reading them. This can be avoided in most scenarios by using emulate_tty, because of the use of pty's, though the ordering still cannot be guaranteed and the number of pty's is finite, as explained in the documentation for execute_process(). For situations where the ordering between stdout and stderr is critical, they should not be returned separately; instead they should share one buffer, and so execute_process() should be used.

For all other parameters and documentation see: execute_process()

Availability: Unix (streaming), Windows (blocking)

Utility Functions

Currently there is only one utility function, a Python implementation of the which shell command.

osrf_pycommon.process_utils.which(cmd, mode=1, path=None, **kwargs)[source]

Given a command, mode, and a PATH string, return the path which conforms to the given mode on the PATH, or None if there is no such file.

mode defaults to os.F_OK | os.X_OK. path defaults to the result of os.environ.get("PATH"), or can be overridden with a custom search path.

Backported from shutil.which() (https://docs.python.org/3.3/library/shutil.html#shutil.which), which is available in Python 3.3 and later.
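
For example (the printed paths will vary by system, and the missing command name is just illustrative):

from osrf_pycommon.process_utils import which

print(which('ls'))                  # e.g. '/bin/ls' on many Unix systems
print(which('ls', path='/bin'))     # search a custom path instead of PATH
print(which('not-a-real-command'))  # None, since it cannot be found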