9.2.8. Helper functions — MDAnalysis.lib.util

Small helper functions that don’t fit anywhere else.

9.2.8.1. Files and directories

MDAnalysis.lib.util.filename(name, ext=None, keep=False)[source]

Return a new name that has suffix attached; replaces other extensions.

Parameters:
  • *name* – filename; extension is replaced unless keep=True; name can also be a NamedStream (and its NamedStream.name will be changed accordingly)
  • *ext* – extension
  • *keep*
    • False: replace existing extension with ext;
    • True: keep old extension if one existed

Changed in version 0.9.0: Also permits NamedStream to pass through.
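The replace/keep rule can be illustrated with a minimal sketch. The helper name replace_extension is hypothetical and this is not the library's implementation; it only mirrors the behaviour described above for plain string names, and accepting ext with or without a leading dot is an assumption.

```python
import os
import os.path


def replace_extension(name, ext=None, keep=False):
    """Hypothetical stand-in mirroring the documented rule for plain strings."""
    if ext is None:
        return name
    # Assumption: accept the extension with or without a leading separator.
    if not ext.startswith(os.extsep):
        ext = os.extsep + ext
    root, old_ext = os.path.splitext(name)
    # Replace unless the caller asked to keep an extension that already exists.
    if not keep or not old_ext:
        return root + ext
    return name
```

With keep=True a name that has no extension still receives ext, because there is nothing to keep.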

MDAnalysis.lib.util.openany(directory[, mode='r'])[source]

Context manager to open a compressed (bzip2, gzip) or plain file (uses anyopen()).

MDAnalysis.lib.util.anyopen(datasource, mode='r', reset=True)[source]

Open datasource (gzipped, bzipped, uncompressed) and return a stream.

datasource can be a filename or a stream (see isstream()). By default, a stream is reset to its start if possible (via seek() or reset()).

If possible, the attribute stream.name is set to the filename or “<stream>” if no filename could be associated with the datasource.

Parameters:
  • *datasource* – a file (from file or open()) or a stream (e.g. from urllib2.urlopen() or cStringIO.StringIO)
  • *mode* – ‘r’, ‘w’ or ‘a’; more complicated modes (‘r+’, ‘w+’) are not supported because only the first letter is looked at ['r']
  • *reset* – try to read (mode ‘r’) the stream from the start [True]
Returns:

stream, which is a file-like object

See also

openany() to be used with the with statement.

Changed in version 0.9.0: Only returns the stream and tries to set stream.name = filename instead of the previous behavior to return a tuple (stream, filename).
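The idea of one call that opens plain, gzip and bzip2 files can be sketched with the standard library. This is an illustration only: the helper open_by_extension is hypothetical, it targets Python 3, and choosing the opener from the file name suffix is an assumption (anyopen() may detect the compression differently).

```python
import bz2
import gzip


def open_by_extension(path, mode="r"):
    """Hypothetical helper: pick an opener from the file name suffix."""
    # Only the first letter of the mode is used, as described above;
    # the added "t" requests text mode from the compression modules.
    text_mode = mode[0] + "t"
    if path.endswith(".gz"):
        return gzip.open(path, text_mode)
    if path.endswith(".bz2"):
        return bz2.open(path, text_mode)
    return open(path, text_mode)
```

All three branches return a file-like object, so the caller can use the result in a with statement without knowing how the file was stored.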

MDAnalysis.lib.util.greedy_splitext(p)[source]

Split extension in path p at the left-most separator.

Extensions are taken to be separated from the filename with the separator os.extsep (as used by os.path.splitext()).

Parameters:p (path, string) –
Returns:Tuple (root, extension) where root is the full path and filename with all extensions removed whereas extension is the string of all extensions.

Example

>>> greedy_splitext("/home/joe/protein.pdb.bz2")
('/home/joe/protein', '.pdb.bz2')
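
The behaviour shown in the example can be reproduced with a short sketch that repeatedly applies os.path.splitext() to the file name part. The function name greedy_splitext_sketch is hypothetical and the code is a re-implementation for illustration, not the library source.

```python
import os.path


def greedy_splitext_sketch(p):
    """Hypothetical re-implementation: peel extensions off the file name."""
    directory, root = os.path.split(p)
    extension = ""
    while True:
        root, ext = os.path.splitext(root)
        if not ext:
            break
        # Prepend, because extensions are removed from right to left.
        extension = ext + extension
    return os.path.join(directory, root), extension
```

Splitting only the file name part keeps dots in directory names from being mistaken for extensions.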

MDAnalysis.lib.util.which(program)[source]

Determine full path of executable program on PATH.

(Jay at http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python)
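
A PATH search of this kind can be sketched as follows. The name which_sketch is hypothetical and the code is an illustration rather than the library source; on Python 3 the standard library also offers shutil.which() for the same purpose.

```python
import os


def which_sketch(program):
    """Hypothetical PATH search returning the full path or None."""
    def is_executable(path):
        return os.path.isfile(path) and os.access(path, os.X_OK)

    # A name that already contains a directory part is checked directly.
    if os.path.dirname(program):
        return program if is_executable(program) else None
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if is_executable(candidate):
            return candidate
    return None
```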

MDAnalysis.lib.util.realpath(*args)[source]

Join all args and return the real path, rooted at /.

Expands ‘~’, ‘~user’, and environment variables such as $HOME.

Returns None if any of the args is None.
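
The three steps (join, expand, resolve) can be sketched with os.path alone. The name realpath_sketch is hypothetical, and the order in which variables and ‘~’ are expanded is an assumption.

```python
import os.path


def realpath_sketch(*args):
    """Hypothetical equivalent of the behaviour described above."""
    if None in args:
        return None
    joined = os.path.join(*args)
    # Assumption: expand environment variables first, then '~' and '~user'.
    expanded = os.path.expanduser(os.path.expandvars(joined))
    return os.path.realpath(expanded)
```

Returning None for a None argument lets callers pass optional path components through without a separate check.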

MDAnalysis.lib.util.guess_format(filename)[source]

Return the format of filename

The current heuristic simply looks at the filename extension and can work around compressed format extensions.

filename can also be a stream, in which case filename.name is looked at for a hint to the format.

Raises:*ValueError*

New in version 0.11.0: Moved into lib.util
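
An extension-based heuristic of this kind might look like the sketch below. Everything here is an assumption made for illustration: the name guess_format_sketch, the set of compression suffixes, the upper-case return value and the condition that raises ValueError are not taken from the library.

```python
import os.path

COMPRESSED = {".gz", ".bz2"}  # assumption: the compressed suffixes to skip


def guess_format_sketch(filename):
    """Hypothetical heuristic: use the last non-compression extension."""
    # A stream is expected to carry its file name in a .name attribute.
    name = getattr(filename, "name", filename)
    root, ext = os.path.splitext(name)
    if ext.lower() in COMPRESSED:
        root, ext = os.path.splitext(root)
    if not ext:
        raise ValueError("cannot guess a format for %r" % (name,))
    # Assumption: report the format as the upper-cased extension, no dot.
    return ext[1:].upper()
```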

9.2.8.2. Streams

Many of the readers are not restricted to just reading plain files. They can also read gzip-compressed or bzip2-compressed files (through the internal use of openany()). It is also possible to provide more general streams as inputs, such as cStringIO.StringIO() instances (essentially, memory buffers), by wrapping these instances in a NamedStream. This NamedStream can then be used in place of an ordinary file name (typically with a MDAnalysis.core.AtomGroup.Universe, but it is also possible to write to such a stream using MDAnalysis.Writer()).

In the following example, we use a PDB stored as a string pdb_s:

import MDAnalysis
from MDAnalysis.lib.util import NamedStream
import cStringIO

pdb_s = "TITLE     Lonely Ion\nATOM      1  NA  NA+     1      81.260  64.982  10.926  1.00  0.00\n"
u = MDAnalysis.Universe(NamedStream(cStringIO.StringIO(pdb_s), "ion.pdb"))
print(u)
#  <Universe with 1 atoms>
print(u.atoms.positions)
# [[ 81.26000214  64.98200226  10.92599964]]

It is important to provide a proper pseudo file name with the correct extension (".pdb") to NamedStream because the file type recognition uses the extension of the file name to determine the file format. Alternatively, provide the format="pdb" keyword argument to the Universe.

The use of streams becomes more interesting when MDAnalysis is used as glue between different analysis packages and when one can arrange things so that intermediate frames (typically in the PDB format) are not written to disk but remain in memory via e.g. cStringIO buffers.

Note

A remote connection created by urllib2.urlopen() is not seekable and therefore will often not work as an input. But try it...

class MDAnalysis.lib.util.NamedStream(stream, filename, reset=True, close=False)[source]

Stream that also provides a (fake) name.

By wrapping a stream stream in this class, it can be passed to code that uses inspection of the filename to make decisions. For instance, os.path.split() will work correctly on a NamedStream.

The class can be used as a context manager.

NamedStream is derived from io.IOBase (to indicate that it is a stream). Many operations that normally expect a string will also work with a NamedStream; for instance, most of the functions in os.path will work with the exception of os.path.expandvars() and os.path.expanduser(), which will return the NamedStream itself instead of a string if no substitutions were made.
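
The idea of a stream that os.path can treat like a file name can be sketched on Python 3 with the os.PathLike protocol. This is a different technique from the one NamedStream uses, and the class NamedBuffer is hypothetical; it only shows why a name attached to a stream is enough for filename inspection.

```python
import io
import os


class NamedBuffer(io.StringIO):
    """Hypothetical in-memory stream that os.path can treat as a file name."""

    def __init__(self, initial_value="", name="<stream>"):
        super().__init__(initial_value)
        self.name = name

    def __fspath__(self):
        # os.path functions call os.fspath(), which uses this hook.
        return self.name


f = NamedBuffer(name="output.pdb")
print(os.path.splitext(f))
# ('output', '.pdb')
```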

Example

Wrap a cStringIO.StringIO() instance to write to:

import cStringIO
import os.path
stream = cStringIO.StringIO()
f = NamedStream(stream, "output.pdb")
print(os.path.splitext(f))

Wrap a file instance to read from:

stream = open("input.pdb")
f = NamedStream(stream, stream.name)

Use as a context manager (with close=True, the stream is closed automatically when the with block is left):

with NamedStream(open("input.pdb"), "input.pdb", close=True) as f:
    # use f
    print(f.closed)  # --> False
    # ...
print(f.closed)      # --> True

Note

This class uses its own __getitem__() method so if stream implements stream.__getitem__() then that will be masked and this class should not be used.

Warning

By default, NamedStream.close() will not close the stream but instead reset() it to the beginning. [1] Provide the force=True keyword to NamedStream.close() to always close the stream.

Initialize the NamedStream from a stream and give it a name.

The constructor attempts to rewind the stream to the beginning unless the keyword reset is set to False. If rewinding fails, a MDAnalysis.StreamWarning is issued.

Note

By default, this stream will not be closed by with and close() (see there) unless the close keyword is set to True.

Parameters:
  • stream (stream) – an open stream (e.g. file or cStringIO.StringIO())
  • filename (str) – the filename that should be associated with the stream
  • reset (boolean, default True) – start the stream from the beginning (either reset() or seek()) when the class instance is constructed
  • close (boolean, default False) – close the stream when a with block exits or when close() is called; note that the default is not to close the stream

__del__()[source]

Always closes the stream.

__ge__(other)

x.__ge__(y) <==> x>=y

__gt__(other)

x.__gt__(y) <==> x>y

__le__(other)

x.__le__(y) <==> x<=y

close(force=False)[source]

Reset or close the stream.

If NamedStream.close_stream is set to False (the default) then this method will not close the stream and only reset() it.

If the force = True keyword is provided, the stream will be closed.

Note

This close() method is non-standard. del NamedStream always closes the underlying stream.
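
The reset-instead-of-close rule can be sketched with a small wrapper. The class ResettingStream is hypothetical and much simpler than NamedStream; only the attribute name close_stream and the force keyword are taken from the description above.

```python
import io


class ResettingStream(object):
    """Hypothetical wrapper showing the reset-instead-of-close behaviour."""

    def __init__(self, stream, close=False):
        self.stream = stream
        self.close_stream = close

    def reset(self):
        self.stream.seek(0)

    def close(self, force=False):
        # Really close only when configured to do so or when forced.
        if self.close_stream or force:
            self.stream.close()
        else:
            self.reset()

    @property
    def closed(self):
        return self.stream.closed


wrapped = ResettingStream(io.StringIO("ATOM"))
wrapped.stream.read()
wrapped.close()            # rewinds only; the buffer stays usable
wrapped.close(force=True)  # now the buffer is really closed
```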

closed

True if stream is closed.

fileno()[source]

Return the underlying file descriptor (an integer) of the stream if it exists.

An IOError is raised if the IO object does not use a file descriptor.

flush()[source]

Flush the write buffers of the stream if applicable.

This does nothing for read-only and non-blocking streams. For file objects one also needs to call os.fsync() to write contents to disk.

readable()[source]

Return True if the stream can be read from.

If False, read() will raise IOError.

reset()[source]

Move to the beginning of the stream

seek(offset, whence=0)[source]

Change the stream position to the given byte offset.

offset is interpreted relative to the position indicated by whence. Values for whence are:

  • io.SEEK_SET or 0 – start of the stream (the default); offset should be zero or positive
  • io.SEEK_CUR or 1 – current stream position; offset may be negative
  • io.SEEK_END or 2 – end of the stream; offset is usually negative
Returns:the new absolute position.
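
The three whence values can be tried on any seekable stream. The example below uses io.BytesIO from the standard library rather than a NamedStream, on the assumption that NamedStream forwards seek() to the wrapped stream.

```python
import io

buf = io.BytesIO(b"0123456789")
start = buf.seek(3, io.SEEK_SET)    # 3 bytes from the start
current = buf.seek(2, io.SEEK_CUR)  # 2 bytes forward from position 3
end = buf.seek(-4, io.SEEK_END)     # 4 bytes before the end
print(start, current, end)
# 3 5 6
print(buf.read())
# b'6789'
```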

seekable()[source]

Return True if the stream supports random access.

If False, seek(), tell() and truncate() will raise IOError.

tell()[source]

Return the current stream position.

truncate(*size)[source]

Truncate the stream’s size to size.

The size defaults to the current position (if no size argument is supplied). The current file position is not changed.

writable()[source]

Return True if the stream can be written to.

If False, write() will raise IOError.

MDAnalysis.lib.util.isstream(obj)[source]

Detect if obj is a stream.

We consider anything a stream that has the methods

  • close()

and either set of the following

  • read(), readline(), readlines()
  • write(), writeline(), writelines()

See also

io

Parameters:*obj* – stream or string
Returns:True if obj is a stream, False otherwise

New in version 0.9.0.
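
The rule stated above (close() plus one complete set of read or write methods) can be sketched with plain attribute checks. The names has_method and isstream_sketch are hypothetical; this is an illustration, not the library source.

```python
import io


def has_method(obj, name):
    """Return True if obj has a callable attribute called name."""
    return callable(getattr(obj, name, None))


def isstream_sketch(obj):
    """Hypothetical check following the rule stated above."""
    if not has_method(obj, "close"):
        return False
    read_set = ("read", "readline", "readlines")
    write_set = ("write", "writeline", "writelines")
    return (all(has_method(obj, m) for m in read_set)
            or all(has_method(obj, m) for m in write_set))


print(isstream_sketch(io.StringIO("x")), isstream_sketch("x"))
# True False
```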

9.2.8.3. Containers and lists

MDAnalysis.lib.util.iterable(obj)[source]

Returns True if obj can be iterated over and is neither a string nor a NamedStream

MDAnalysis.lib.util.asiterable(obj)[source]

Returns obj so that it can be iterated over; a string is not treated as iterable
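
A minimal sketch of the two helpers, written for Python 3, shows how a single value is promoted to a one-element list. The function names are hypothetical, and the special case for NamedStream is left out of this sketch.

```python
def iterable_sketch(obj):
    """Hypothetical check: strings count as single values, not containers."""
    if isinstance(obj, (str, bytes)):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def asiterable_sketch(obj):
    """Wrap non-iterable objects (and strings) in a list."""
    return obj if iterable_sketch(obj) else [obj]
```

This lets a function accept either one file name or a list of file names and loop over the result in both cases.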

MDAnalysis.lib.util.hasmethod(obj, m)[source]

Return True if object obj contains the method m.

9.2.8.4. File parsing

class MDAnalysis.lib.util.FORTRANReader(fmt)[source]

FORTRANReader provides a method to parse FORTRAN formatted lines in a file.

Usage:

atomformat = FORTRANReader('2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10')
for line in open('coordinates.crd'):
    serial,TotRes,resName,name,x,y,z,chainID,resSeq,tempFactor = atomformat.read(line)

Fortran format edit descriptors; see Fortran Formats for the syntax.

Only simple one-character specifiers supported here: I F E A X (see FORTRAN_format_regex).

Strings are stripped of leading and trailing white space.

Set up the reader with the FORTRAN format string.

The string fmt should look like ‘2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10’.
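
How a format string drives fixed-width parsing can be sketched in a few lines. The names DESCRIPTOR, parse_format and read_line are hypothetical and the descriptor pattern is a simplification written for this sketch; it is not the code behind FORTRANReader.

```python
import re

# Assumption: optional repeat count, one of I F E A X, then width[.decimals].
DESCRIPTOR = re.compile(
    r"(?P<repeat>\d*)(?P<format>[IFEAX])(?P<length>\d+)?(?:\.(?P<decimals>\d+))?$"
)

CONVERTERS = {"I": int, "F": float, "E": float, "A": lambda s: s.strip()}


def parse_format(fmt):
    """Expand a format string into a list of (kind, width) pairs."""
    fields = []
    for token in fmt.split(","):
        m = DESCRIPTOR.match(token.strip().upper())
        if m is None:
            raise ValueError("unrecognized edit descriptor: %r" % (token,))
        repeat = int(m.group("repeat") or 1)
        kind = m.group("format")
        if kind == "X":
            # "nX" skips n columns; the count precedes the letter.
            fields.append(("X", repeat))
        elif m.group("length") is None:
            raise ValueError("missing field width in %r" % (token,))
        else:
            fields.extend([(kind, int(m.group("length")))] * repeat)
    return fields


def read_line(fields, line):
    """Cut line into columns and convert each one to a Python value."""
    values, pos = [], 0
    for kind, width in fields:
        chunk = line[pos:pos + width]
        pos += width
        if kind != "X":
            values.append(CONVERTERS[kind](chunk))
    return values
```

A blank integer column raises ValueError in int(), which matches the failure mode described for read().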

__len__()[source]

Returns number of entries.

number_of_matches(line)[source]

Return how many format entries could be populated with legal values.

parse_FORTRAN_format(edit_descriptor)[source]

Parse the descriptor.

parse_FORTRAN_format(edit_descriptor) –> dict
Returns:dict with totallength (in chars), repeat, length, format, decimals
Raises:ValueError if the edit_descriptor is not recognized and cannot be parsed

Note

Specifiers: L ES EN T TL TR / r S SP SS BN BZ are not supported, and neither are the scientific notation Ew.dEe forms.

read(line)[source]

Parse line according to the format string and return list of values.

Values are converted to Python types according to the format specifier.

Returns:list of entries with appropriate types
Raises:ValueError if any of the conversions cannot be made (e.g. space for an int)

MDAnalysis.lib.util.FORTRAN_format_regex = '(?P<repeat>\\d+?)(?P<format>[IFEAX])(?P<numfmt>(?P<length>\\d+)(\\.(?P<decimals>\\d+))?)?'

Regular expression (see re) to parse a simple FORTRAN edit descriptor: (?P<repeat>\d+?)(?P<format>[IFEAX])(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?
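
The named groups can be inspected directly with re. The snippet below uses the value assigned to FORTRAN_format_regex (the pattern with [IFEAX]) and shows what each group captures for a few descriptors.

```python
import re

# Value assigned to FORTRAN_format_regex, as listed above.
pattern = re.compile(
    r"(?P<repeat>\d+?)(?P<format>[IFEAX])"
    r"(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?"
)

fields = pattern.match("3F20.10").groupdict()
print(fields["repeat"], fields["format"], fields["length"], fields["decimals"])
# 3 F 20 10

skip = pattern.match("2X").groupdict()
print(skip["format"], skip["numfmt"])
# X None

# The repeat group needs at least one digit, so a bare "A8" does not match.
print(pattern.match("A8"))
# None
```

Because the repeat count is mandatory in this pattern, a descriptor without one has to be given an explicit count (for example "1A8") before matching.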

9.2.8.5. Data manipulation and handling

MDAnalysis.lib.util.fixedwidth_bins(delta, xmin, xmax)[source]

Return bins of width delta that cover xmin,xmax (or a larger range).

dict = fixedwidth_bins(delta,xmin,xmax)

The dict contains ‘Nbins’, ‘delta’, ‘min’, and ‘max’.
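
One way to build such bins for scalar input is sketched below. The name fixedwidth_bins_sketch is hypothetical, and splitting the excess width evenly between both ends is an assumption; only the dictionary keys are taken from the description above.

```python
import math


def fixedwidth_bins_sketch(delta, xmin, xmax):
    """Hypothetical scalar version returning the documented keys."""
    if not xmin < xmax:
        raise ValueError("xmin must be smaller than xmax")
    length = xmax - xmin
    # Round up so that the bins cover the whole range.
    nbins = int(math.ceil(length / delta))
    # Assumption: split the excess evenly between both ends.
    pad = 0.5 * (nbins * delta - length)
    return {"Nbins": nbins, "delta": delta, "min": xmin - pad, "max": xmax + pad}
```

For delta=1.0, xmin=0.0 and xmax=2.5 this gives three bins spanning -0.25 to 2.75.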

9.2.8.6. Strings

MDAnalysis.lib.util.convert_aa_code(x)[source]

Converts between 3-letter and 1-letter amino acid codes.

See also

Data are defined in amino_acid_codes and inverse_aa_codes.

MDAnalysis.lib.util.parse_residue(residue)[source]

Process residue string.

Examples

  • “LYS300:HZ1” –> (“LYS”, 300, “HZ1”)
  • “K300:HZ1” –> (“LYS”, 300, “HZ1”)
  • “K300” –> (“LYS”, 300, None)
  • “4GB300:H6O” –> (“4GB”, 300, “H6O”)
  • “4GB300” –> (“4GB”, 300, None)
Argument:The residue must contain a 1-letter, 3-letter or 4-letter residue string, a number (the resid) and optionally an atom identifier, which must be separated from the residue with a colon (":"). White space is allowed in between.
Returns:(3-letter aa string, resid, atomname); known 1-letter aa codes are converted to 3-letter codes
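
The examples above can be reproduced with a regular expression. The sketch below is an illustration only: the pattern RESIDUE, the function name and the three-entry ONE_TO_THREE table are hypothetical and do not reproduce the library's own pattern or its full amino acid table.

```python
import re

# Abbreviated table for illustration only; the full mapping is not reproduced.
ONE_TO_THREE = {"K": "LYS", "A": "ALA", "G": "GLY"}

RESIDUE = re.compile(
    r"^\s*(?P<aa>[0-9A-Za-z][A-Za-z]{0,3}?)\s*(?P<resid>\d+)"
    r"\s*(?::\s*(?P<atom>\w+))?\s*$"
)


def parse_residue_sketch(residue):
    """Hypothetical parser returning (resname, resid, atomname or None)."""
    m = RESIDUE.match(residue)
    if m is None:
        raise ValueError("cannot parse residue %r" % (residue,))
    aa = m.group("aa").upper()
    if len(aa) == 1:
        # Unknown one-letter codes are passed through in this sketch.
        aa = ONE_TO_THREE.get(aa, aa)
    return aa, int(m.group("resid")), m.group("atom")
```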

MDAnalysis.lib.util.conv_float(s)[source]

Convert an object s to float if possible.

Function to be passed into map() or a list comprehension. If the argument can be interpreted as a float it is converted, otherwise the original object is passed back.
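
The convert-or-pass-through pattern is short enough to sketch in full. The name conv_float_sketch is hypothetical, and catching TypeError in addition to ValueError is an assumption.

```python
def conv_float_sketch(s):
    """Return float(s) if possible, otherwise s unchanged."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return s


print([conv_float_sketch(x) for x in ["1.5", "CA", "42"]])
# [1.5, 'CA', 42.0]
```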

9.2.8.7. Class decorators

MDAnalysis.lib.util.cached(key)[source]

Cache a property within a class

Requires the Class to have a cache dict called _cache.

Usage:

class A(object):
    def __init__(self):
        self._cache = dict()

    @property
    @cached('keyname')
    def size(self):
        # This code is run only if the lookup of keyname fails.
        # After this code has been run once, the result is stored in
        # _cache with the key: 'keyname'
        size = 10.0
        return size

New in version 0.9.0.
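
A decorator with this behaviour can be sketched as a closure over the key. The names cached_sketch and Box are hypothetical and the code is an illustration of the caching pattern, not the library source.

```python
import functools


def cached_sketch(key):
    """Hypothetical decorator storing results in self._cache[key]."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return self._cache[key]
            except KeyError:
                # First access: compute the value and remember it.
                value = self._cache[key] = func(self, *args, **kwargs)
                return value
        return wrapper
    return decorator


class Box(object):
    def __init__(self):
        self._cache = dict()
        self.calls = 0

    @property
    @cached_sketch("size")
    def size(self):
        self.calls += 1  # counts how often the body really runs
        return 10.0
```

Deleting the key from _cache forces the value to be recomputed on the next access.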

Footnotes

[1]The reason why NamedStream.close() does not close a stream by default (but just rewinds it to the beginning) is so that one can use the class NamedStream as a drop-in replacement for file names, which are often re-opened (e.g. when the same file is used as a topology and coordinate file or when repeatedly iterating through a trajectory in some implementations). The close=True keyword can be supplied in order to make NamedStream.close() actually close the underlying stream and NamedStream.close(force=True) will also close it.

Changed in version 0.11.0: Moved mathematical functions into lib.mdamath