gradient – Symbolic Differentiation

Symbolic gradients are usually computed with tensor.grad(), which offers a convenient syntax for the common case of wanting the gradient of a scalar cost with respect to some expressions. The grad_sources_inputs() function does the underlying work and is more flexible, but it is more awkward to use when tensor.grad() can do the job.

Driver for gradient calculations.

exception theano.gradient.DisconnectedInputError

Raised when grad is asked to compute the gradient with respect to a disconnected input and disconnected_inputs=’raise’.

class theano.gradient.DisconnectedType

A type indicating that a variable is the result of taking the gradient of c with respect to x when c is not a function of x. It is a symbolic placeholder for 0 that conveys the extra information that the gradient is 0 because the input is disconnected.

exception theano.gradient.GradientError(arg, err_pos, abs_err, rel_err, abs_tol, rel_tol)

This error is raised when a gradient is calculated, but incorrect.

theano.gradient.Lop(f, wrt, eval_points, consider_constant=None, disconnected_inputs='raise')

Computes the L operation on f with respect to wrt, evaluated at the points given in eval_points. Mathematically, this is the Jacobian of f with respect to wrt, left-multiplied by the eval points.

Return type: Variable or list/tuple of Variables depending on type of f
Returns: symbolic expression such that L_op[j] = sum_i ( d f[i] / d wrt[j]) eval_point[i], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor element at that position. If f is a list/tuple, then return a list/tuple with the results.

exception theano.gradient.NullTypeGradError

Raised when grad encounters a NullType.

theano.gradient.Rop(f, wrt, eval_points)

Computes the R operation on f with respect to wrt, evaluated at the points given in eval_points. Mathematically, this is the Jacobian of f with respect to wrt, right-multiplied by the eval points.

Return type: Variable or list/tuple of Variables depending on type of f
Returns: symbolic expression such that R_op[i] = sum_j ( d f[i] / d wrt[j]) eval_point[j], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor element at that position. If f is a list/tuple, then return a list/tuple with the results.
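
The difference between the two operations can be illustrated numerically. The following is a minimal pure-Python sketch, not Theano code: the toy function f and the helper names jacobian_of_f, r_op and l_op are hypothetical, and the Jacobian is written out by hand rather than derived symbolically. It shows that the R operation contracts the Jacobian with a vector shaped like wrt, while the L operation contracts it with a vector shaped like f.

```python
def jacobian_of_f(x):
    """Hand-written Jacobian of the toy function f(x) = [x0 * x1, x0 + x1, x1 ** 2]."""
    x0, x1 = x
    return [
        [x1, x0],          # d f[0] / d x
        [1.0, 1.0],        # d f[1] / d x
        [0.0, 2.0 * x1],   # d f[2] / d x
    ]


def r_op(jac, v):
    """R_op[i] = sum_j jac[i][j] * v[j]; v has one entry per element of wrt."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in jac]


def l_op(jac, u):
    """L_op[j] = sum_i u[i] * jac[i][j]; u has one entry per element of f."""
    n_cols = len(jac[0])
    return [sum(u[i] * jac[i][j] for i in range(len(u))) for j in range(n_cols)]


J = jacobian_of_f([2.0, 3.0])
r_result = r_op(J, [1.0, 0.0])       # one value per output of f
l_result = l_op(J, [1.0, 1.0, 1.0])  # one value per element of wrt
print(r_result)  # [3.0, 1.0, 0.0]
print(l_result)  # [4.0, 9.0]
```

Note that the result of r_op has the length of f, whereas the result of l_op has the length of wrt.
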
theano.gradient.format_as(use_list, use_tuple, outputs)

Formats the outputs according to the flags use_list and use_tuple. If use_list is True, outputs is returned as a list (if outputs is not a list or a tuple, it is converted into a one-element list). If use_tuple is True, outputs is returned as a tuple (if outputs is not a list or a tuple, it is converted into a one-element tuple). Otherwise (if both flags are False), outputs is returned unchanged.
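
The rule above can be sketched in a few lines of plain Python. This is an illustrative reimplementation under the hypothetical name format_as_sketch, not the library's code; in particular, rejecting the case where both flags are True is an assumption made here.

```python
def format_as_sketch(use_list, use_tuple, outputs):
    """Pure-Python sketch of the formatting rule described above."""
    # Assumption: asking for both a list and a tuple is contradictory.
    assert not (use_list and use_tuple), "choose at most one flag"
    is_sequence = isinstance(outputs, (list, tuple))
    if use_list:
        # Wrap a single value, or convert an existing sequence to a list.
        return list(outputs) if is_sequence else [outputs]
    if use_tuple:
        # Wrap a single value, or convert an existing sequence to a tuple.
        return tuple(outputs) if is_sequence else (outputs,)
    # Neither flag set: hand the outputs back untouched.
    return outputs


print(format_as_sketch(True, False, 3))       # [3]
print(format_as_sketch(False, True, [1, 2]))  # (1, 2)
print(format_as_sketch(False, False, 3))      # 3
```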

theano.gradient.grad(cost, wrt, consider_constant=None, disconnected_inputs='raise', add_names=True, known_grads=None, return_disconnected='zero')
Parameters:
  • consider_constant – a list of expressions not to backpropagate through
  • disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing cost (or if all links are non-differentiable). The possible values are:
    • ‘ignore’: considers that the gradient on these parameters is zero.
    • ‘warn’: consider the gradient zero, and print a warning.
    • ‘raise’: raise DisconnectedInputError.
  • add_names (bool) – If True, variables generated by grad will be named (d<cost.name>/d<wrt.name>) provided that both cost and wrt have names
  • known_grads (dict) – If not None, a dictionary mapping variables to their gradients. This is useful in the case where you know the gradient on some variables but do not know the original cost.
  • return_disconnected (string) –
    • ‘zero’ : If wrt[i] is disconnected, return value i will be wrt[i].zeros_like()
    • ‘None’ : If wrt[i] is disconnected, return value i will be None
    • ‘Disconnected’ : returns variables of type DisconnectedType
Return type:

Variable or list/tuple of Variables (depending upon wrt)

Returns:

Symbolic expression of the gradient of cost with respect to wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is always of the same kind as wrt: a list/tuple or a Variable.

theano.gradient.grad_not_implemented(op, x_pos, x, comment='')

Return an un-computable symbolic variable of type x.type.

If any call to tensor.grad results in an expression containing this un-computable variable, an exception (NotImplementedError) will be raised indicating that the gradient on the x_pos-th input of op has not been implemented. The same happens if any call to theano.function involves this variable.

Optionally adds a comment to the exception explaining why this gradient is not implemented.

theano.gradient.grad_undefined(op, x_pos, x, comment='')

Return an un-computable symbolic variable of type x.type.

If any call to tensor.grad results in an expression containing this un-computable variable, an exception (GradUndefinedError) will be raised indicating that the gradient on the x_pos-th input of op is mathematically undefined. The same happens if any call to theano.function involves this variable.

Optionally adds a comment to the exception explaining why this gradient is not defined.

theano.gradient.hessian(cost, wrt, consider_constant=None, disconnected_inputs='raise')
Parameters:
  • consider_constant – a list of expressions not to backpropagate through
  • disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing cost (or if all links are non-differentiable). The possible values are:
    • ‘ignore’: considers that the gradient on these parameters is zero.
    • ‘warn’: consider the gradient zero, and print a warning.
    • ‘raise’: raise an exception.
Returns:

Either an instance of Variable or a list/tuple of Variables (depending upon wrt) representing the Hessian of the cost with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is always of the same kind as wrt: a list/tuple or a TensorVariable.

theano.gradient.jacobian(expression, wrt, consider_constant=None, disconnected_inputs='raise')
Parameters:
  • consider_constant – a list of expressions not to backpropagate through
  • disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing expression (or if all links are non-differentiable). The possible values are:
    • ‘ignore’: considers that the gradient on these parameters is zero.
    • ‘warn’: consider the gradient zero, and print a warning.
    • ‘raise’: raise an exception.
Returns:

Either an instance of Variable or a list/tuple of Variables (depending upon wrt) representing the Jacobian of expression with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is always of the same kind as wrt: a list/tuple or a TensorVariable.

class theano.gradient.numeric_grad(f, pt, eps=None, out_type=None)

Compute the numeric derivative of a scalar-valued function at a particular point.
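
As a rough illustration of what a numeric derivative of a scalar-valued function looks like, here is a minimal pure-Python sketch. The helper forward_difference_grad, the toy cost function and the step size 1e-7 are assumptions chosen for this example; the sketch uses a plain forward difference and does not reproduce the class's handling of ndarrays, dtypes or the type-dependent default eps.

```python
def forward_difference_grad(f, pt, eps=1e-7):
    """Estimate the gradient of a scalar-valued f at pt (a list of floats)."""
    f_pt = f(pt)
    grad = []
    for i in range(len(pt)):
        shifted = list(pt)   # copy so the caller's point is not modified
        shifted[i] += eps    # perturb one coordinate at a time
        grad.append((f(shifted) - f_pt) / eps)
    return grad


def cost(x):
    """Toy scalar cost with exact gradient [2 * x0, 3]."""
    return x[0] ** 2 + 3.0 * x[1]


estimate = forward_difference_grad(cost, [2.0, -1.0])
print(estimate)  # close to the exact gradient [4.0, 3.0]
```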

static abs_rel_err(a, b)

Return absolute and relative error between a and b.

The relative error is a small number when a and b are close, relative to how big they are.

Formulas used:
abs_err = abs(a - b)
rel_err = abs_err / max(abs(a) + abs(b), 1e-8)

The denominator is clipped at 1e-8 to avoid dividing by 0 when a and b are both close to 0.

The tuple (abs_err, rel_err) is returned.
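
The formulas above translate directly into code. The sketch below applies them to plain Python floats under the hypothetical name abs_rel_err_sketch; the real method operates on arrays, which this simplified version does not attempt to handle.

```python
def abs_rel_err_sketch(a, b):
    """Scalar version of the absolute/relative error formulas given above."""
    abs_err = abs(a - b)
    # The denominator is clipped at 1e-8 so that a == b == 0 does not divide by 0.
    rel_err = abs_err / max(abs(a) + abs(b), 1e-8)
    return abs_err, rel_err


big = abs_rel_err_sketch(1000.0, 1001.0)
zero = abs_rel_err_sketch(0.0, 0.0)
print(big)   # absolute error 1.0, relative error about 0.0005
print(zero)  # (0.0, 0.0)
```

For the pair (1000.0, 1001.0) the absolute error is 1.0 but the relative error is only about 0.0005, which is the sense in which the relative error accounts for how big a and b are.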

abs_rel_errors(g_pt)

Return the absolute and relative errors of the gradient estimate g_pt.

g_pt must be a list of ndarrays of the same length as self.gf, otherwise a ValueError is raised.

Corresponding ndarrays in g_pt and self.gf must have the same shape or ValueError is raised.

max_err(g_pt, abs_tol, rel_tol)

Find the biggest error between g_pt and self.gf.

What is measured is the violation of relative and absolute errors, wrt the provided tolerances (abs_tol, rel_tol). A value > 1 means both tolerances are exceeded.

Return the argmax of min(abs_err / abs_tol, rel_err / rel_tol) over g_pt, as well as abs_err and rel_err at this point.
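
The selection rule can be sketched as follows. This is a simplified pure-Python illustration under assumptions: the name max_err_sketch is hypothetical, and both gradients are passed in as flat lists of floats, whereas the method compares lists of ndarrays against self.gf.

```python
def max_err_sketch(g_pt, gf, abs_tol, rel_tol):
    """Return (position, abs_err, rel_err) of the worst tolerance violation."""
    best_score, best = -1.0, None
    for pos, (a, b) in enumerate(zip(g_pt, gf)):
        abs_err = abs(a - b)
        rel_err = abs_err / max(abs(a) + abs(b), 1e-8)
        # min(...) exceeds 1 only when BOTH tolerances are exceeded.
        score = min(abs_err / abs_tol, rel_err / rel_tol)
        if score > best_score:
            best_score, best = score, (pos, abs_err, rel_err)
    return best


pos, abs_err, rel_err = max_err_sketch(
    [1.0, 2.0, 3.0], [1.0, 2.5, 3.0], abs_tol=1e-4, rel_tol=1e-4)
print(pos, abs_err, rel_err)  # position 1, abs_err 0.5, rel_err about 0.111
```

Taking the minimum of the two ratios means that an element is only flagged as a violation when it fails the absolute and the relative test at the same time.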

theano.gradient.verify_grad(fun, pt, n_tests=2, rng=None, eps=None, out_type=None, abs_tol=None, rel_tol=None, mode=None, cast_to_output_type=False)

Test a gradient by Finite Difference Method. Raise error on failure.

Example:
>>> import numpy
>>> import theano.tensor
>>> from theano.gradient import verify_grad
>>> verify_grad(theano.tensor.tanh,
...             (numpy.asarray([[2, 3, 4], [-1, 3.3, 9.9]]),),
...             rng=numpy.random)

Raises an Exception if the difference between the analytic gradient and the numerical gradient (computed through the Finite Difference Method) of a random projection of fun’s output to a scalar exceeds the given tolerance.
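
The idea of checking a gradient through a random projection can be sketched without Theano. In the pure-Python illustration below, the function is an elementwise tanh, the analytic gradient is written out by hand, and the helper names, the central-difference scheme, the sampling of u and the 1e-6 threshold are all assumptions made for this example rather than the library's actual procedure.

```python
import math
import random


def projected_cost(x, u):
    """Scalar cost sum(u * tanh(x)): the output projected onto the vector u."""
    return sum(ui * math.tanh(xi) for ui, xi in zip(u, x))


def analytic_grad(x, u):
    """Exact gradient of projected_cost: u * (1 - tanh(x) ** 2)."""
    return [ui * (1.0 - math.tanh(xi) ** 2) for ui, xi in zip(u, x)]


def finite_difference_grad(x, u, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for i in range(len(x)):
        above = list(x)
        below = list(x)
        above[i] += eps
        below[i] -= eps
        grad.append((projected_cost(above, u) - projected_cost(below, u)) / (2 * eps))
    return grad


rng = random.Random(0)                    # fixed seed for reproducibility
pt = [2.0, 3.0, 4.0, -1.0, 3.3, 9.9]      # flattened version of the example input
u = [rng.random() + 0.5 for _ in pt]      # random projection vector
worst = max(abs(a - n) for a, n in
            zip(analytic_grad(pt, u), finite_difference_grad(pt, u)))
if worst > 1e-6:
    raise ValueError("analytic and numeric gradients disagree: %g" % worst)
print("gradients agree")
```

Projecting onto a random vector reduces a tensor-valued output to a scalar cost, so a single scalar gradient check exercises every output element at once.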

Parameters:
  • fun – a Python function that takes Theano variables as inputs, and returns a Theano variable. For instance, an Op instance with a single output.
  • pt – the list of numpy.ndarrays to use as input values. These arrays must be either float32 or float64 arrays.
  • n_tests – number of times to run the test
  • rng – random number generator used to sample u; the test checks the gradient of sum(u * fun) at pt
  • eps – step size used in the Finite Difference Method (the default, None, is type-dependent). Raising the value of eps can raise or lower the absolute and relative error of the verification, depending on the Op. Raising eps does not lower the verification quality, so it is better to raise eps than to raise abs_tol or rel_tol.
  • out_type – dtype of output, if complex (i.e. ‘complex32’ or ‘complex64’)
  • abs_tol – absolute tolerance used as threshold for gradient comparison
  • rel_tol – relative tolerance used as threshold for gradient comparison
Note:

WARNING to unit-test writers: if fun is a function that builds a graph, try to make it a SMALL graph. verify_grad is often run in debug mode, which can be very slow if it has to verify a lot of intermediate computations.

Note:

This function does not support multiple outputs. In tests/test_scan.py there is an experimental verify_grad that covers that case as well by using random projections.