Planet SymPy

May 01, 2013

Fabian Pedregosa

Logistic Ordinal Regression

TL;DR: I've implemented a logistic ordinal regression, also known as the proportional odds model. Here is the Python code.

The logistic ordinal regression model, also known as the proportional odds model, was introduced in the early 80s by McCullagh [1, 2]. It is a generalized linear model specifically tailored to predicting ordinal variables, that is, variables that are discrete (as in classification) but that can be ordered (as in regression). It can be seen as an extension of the logistic regression model to the ordinal setting.

Let $X \in \mathbb{R}^{n \times p}$ be the input data and $y \in \mathbb{N}^n$ the target values. For simplicity we assume $y$ is a non-decreasing vector, that is, $y_1 \leq y_2 \leq \ldots$ Just as logistic regression models the posterior probability $P(y=j|X_i)$ as the logistic function, in logistic ordinal regression we model the cumulative probability as the logistic function. That is,

$$P(y \leq j|X_i) = \phi(\theta_j - w^T X_i) = \frac{1}{1 + \exp(w^T X_i - \theta_j)}$$

where $w, \theta$ are vectors to be estimated from the data and $\phi$ is the logistic function defined as $\phi(t) = 1 / (1 + \exp(-t))$.

<figure style="float: left; width: 380px; margin: 0px 15px 15px 0px"> <figcaption style="margin-left: 10px; margin-right: 5px">Toy example with three classes denoted in different colors. Also shown are the vector of coefficients $w$ and the thresholds $\theta_0$ and $\theta_1$.</figcaption> </figure>

Compared to multiclass logistic regression, we have added the constraint that the hyperplanes separating the different classes are parallel, that is, the vector $w$ is shared across classes. To decide which class $X_i$ will be assigned to, we use the vector of thresholds $\theta$. If there are $K$ different classes, $\theta$ is a non-decreasing vector (that is, $\theta_1 \leq \theta_2 \leq \ldots \leq \theta_{K-1}$) of size $K-1$. We then assign class $j$ if the prediction $w^T X$ (recall that it's a linear model) lies in the interval $[\theta_{j-1}, \theta_{j})$. In order to keep the same definition for the extremal classes, we define $\theta_{0} = - \infty$ and $\theta_K = + \infty$.

The intuition is that we are seeking a vector $w$ such that $X w$ produces a set of values that are well separated into the different classes by the thresholds $\theta$. We choose a logistic function to model the probability $P(y \leq j|X_i)$, but other choices are possible. In the proportional hazards model [1] the probability is modeled as $-\log(1 - P(y \leq j | X_i)) = \exp(\theta_j - w^T X_i)$. Other link functions are possible, where the link function satisfies $\text{link}(P(y \leq j | X_i)) = \theta_j - w^T X_i$. Under this framework, the logistic ordinal regression model has a logistic link function and the proportional hazards model has a log-log link function.

The logistic ordinal regression model is also known as the proportional odds model, because the ratio of corresponding odds for two different samples $X_1$ and $X_2$ is $\exp(w^T(X_1 - X_2))$ and so does not depend on the class $j$ but only on the difference between the samples $X_1$ and $X_2$.

Optimization

Model estimation can be posed as an optimization problem. Here, we minimize the loss function for the model, defined as minus the log-likelihood:

\begin{align} \mathcal{L}(w, \theta) &= - \sum_{i=1}^n \log(\phi(\theta_{y_i} - w^T X_i) - \phi(\theta_{y_i -1} - w^T X_i)) \\ &= \sum_{i=1}^n \left( w^T X_i - \log(\exp(\theta_{y_i}) - \exp(\theta_{y_i-1})) - \log(\phi(w^T X_i - \theta_{y_i})) - \log(\phi(w^T X_i - \theta_{y_i -1})) \right) \end{align}

All terms in this sum are convex in $w$, thus the loss function is convex in $w$. It might also be jointly convex in $w$ and $\theta$, although I haven't checked. I use the function fmin_slsqp in scipy.optimize to minimize $\mathcal{L}$ under the constraint that $\theta$ is a non-decreasing vector. There might be better options, I don't know. If you do know, please leave a comment!

Using the formula $\log(\phi(t))^\prime = (1 - \phi(t))$, we can compute the gradient of the loss function as

\begin{align} \nabla_w \mathcal{L}(w, \theta) &= - \sum_{i=1}^n X_i \left(- 1 + \phi(\theta_{y_i} - w^T X_i) + \phi(\theta_{y_i-1} - w^T X_i)\right) \\ \nabla_\theta \mathcal{L}(w, \theta) &= \sum_{i=1}^n e_{y_i} \left(1 - \phi(w^T X_i - \theta_{y_i}) - \frac{1}{1 - \exp(\theta_{y_i -1} - \theta_{y_i})}\right) \\ & \qquad + e_{y_i -1}\left(1 - \phi(w^T X_i - \theta_{y_i -1}) - \frac{1}{1 - \exp(- (\theta_{y_i-1} - \theta_{y_i}))}\right) \end{align}

where $e_i$ is the $i$th canonical vector.

Code

I've implemented a Python version of this algorithm using SciPy's optimize.fmin_slsqp function. It takes as arguments the loss function, the gradient derived above, and a function that is > 0 when the inequality constraints on $\theta$ are satisfied.
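To make this concrete, here is a minimal self-contained sketch of the approach. This is not the minirank implementation: the names ordinal_logistic_fit and ordinal_logistic_predict are mine, and for brevity it passes only the loss to fmin_slsqp, letting SciPy approximate the gradient by finite differences.

```python
import numpy as np
from scipy.optimize import fmin_slsqp

def phi(t):
    # logistic function
    return 1.0 / (1.0 + np.exp(-t))

def ordinal_logistic_fit(X, y):
    # Assumes y holds consecutive integer labels 0, 1, ..., K-1.
    n, p = X.shape
    K = int(y.max()) + 1
    # Parameter vector: w (p coefficients) followed by theta (K-1 thresholds).
    x0 = np.zeros(p + K - 1)
    x0[p:] = np.arange(K - 1)  # initial non-decreasing thresholds

    def loss(params):
        w, theta = params[:p], params[p:]
        # Pad with theta_0 = -inf and theta_K = +inf as in the text.
        t = np.concatenate(([-np.inf], theta, [np.inf]))
        z = X.dot(w)
        # Negative log-likelihood of the cumulative (proportional odds) model.
        probs = phi(t[y + 1] - z) - phi(t[y] - z)
        return -np.log(np.clip(probs, 1e-12, None)).sum()

    def ineq(params):
        # >= 0 exactly when theta is non-decreasing.
        return np.diff(params[p:])

    params = fmin_slsqp(loss, x0, f_ieqcons=ineq, iprint=0)
    return params[:p], params[p:]

def ordinal_logistic_predict(w, theta, X):
    # Assign class j when w^T X lies in [theta_{j-1}, theta_j).
    return np.searchsorted(theta, X.dot(w), side='right')
```

On a toy 1-D problem with three ordered classes this recovers two thresholds separating the classes; for anything serious, the analytic gradient above should be supplied via the fprime argument.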

Code can be found here as part of the minirank package, which is my sandbox for code related to ranking and ordinal regression. At some point I would like to submit it to scikit-learn, but right now I don't know how the code will scale to medium-scale problems; I suspect not great. On top of that, I'm not sure there is real demand for these models in scikit-learn, and I don't want to bloat the package with unused features.

Performance

I compared the prediction accuracy of this model, in the sense of mean absolute error (IPython notebook), on the Boston house-prices dataset. To obtain an ordinal variable, I rounded the target values to the closest integer, which gave me a problem of size 506 $\times$ 13 with 46 different target values. Although not a huge increase in accuracy, this model did give me better results on this particular dataset:

<figure> </figure>

Here, ordinal logistic regression is the best-performing model, followed by a Linear Regression model and a One-versus-All Logistic regression model as implemented in scikit-learn.

1. "Regression models for ordinal data", P. McCullagh, Journal of the Royal Statistical Society, Series B (Methodological), 1980.

2. "Generalized Linear Models", P. McCullagh and J. A. Nelder (book).

3. "Loss Functions for Preference Levels: Regression with Discrete Ordered Labels", Jason D. M. Rennie and Nathan Srebro.

April 15, 2013

Fabian Pedregosa

Isotonic Regression

My latest contribution for scikit-learn is an implementation of the isotonic regression model that I coded with Nelle Varoquaux and Alexandre Gramfort. This model finds the best least squares fit to a set of points, given the constraint that the fit must be a non-decreasing function. The example on the scikit-learn website gives an intuition on this model.

The original points are in red, and the estimated ones are in green. As you can see, there is one estimation (green point) for each data sample (red point). Calling $y \in \mathbb{R}^n$ the input data, the model can be written concisely as an optimization problem over $x$

$$\text{argmin}_x \|y - x \|^2 \\ \text{subject to } x_1 \leq x_2 \leq \cdots \leq x_n$$

The algorithm implemented in scikit-learn [3] is the pool adjacent violators algorithm [1], an efficient linear-time $\mathcal{O}(n)$ algorithm. It sweeps through the data looking for violations of the monotonicity constraint. When it finds one, it adjusts the estimate to the best possible fit under the constraint. Sometimes it also needs to modify previous points to make sure the new estimate does not violate the constraint. The following picture shows how it proceeds at each iteration
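To illustrate the sweep-and-merge idea, here is a minimal toy sketch of pool adjacent violators (not the scikit-learn implementation): it keeps a stack of blocks, each holding the sum, weight, and length of a merged run, and merges backwards whenever a block mean would violate monotonicity.

```python
import numpy as np

def pava(y):
    # Isotonic least-squares fit by pool adjacent violators (sketch).
    blocks = []  # each block: [sum, weight, length]
    for value in np.asarray(y, dtype=float):
        blocks.append([value, 1.0, 1])
        # Merge while the newest block's mean is below the previous one's.
        while len(blocks) > 1 and \
                blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, w, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w
            blocks[-1][2] += c
        # Invariant: block means are now non-decreasing.
    # Expand each block's mean back to its points.
    return np.concatenate([np.full(c, s / w) for s, w, c in blocks])
```

For example, pava([1, 3, 2, 4]) pools the violating pair (3, 2) into their mean, giving [1, 2.5, 2.5, 4].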

1. "Active set algorithms for isotonic regression; A unifying framework", Michael J. Best and Nilotpal Chakravarti.

2. Python notebook to generate the figures: ipynb and web version.

3. The algorithm is used through the sklearn.isotonic.IsotonicRegression object (doc) or the function sklearn.isotonic.isotonic_regression (doc).

April 08, 2013

Official SymPy blog

Google Summer of Code 2013

SymPy was accepted by Google once again to participate in Google Summer of Code for 2013. Please go to http://www.google-melange.com/gsoc/org/google/gsoc2013/sympy for more information about how to apply and get started.

April 06, 2013

Aaron Meurer

How to make attributes un-inheritable in Python using descriptors

For https://github.com/sympy/sympy/pull/1969, and previous work at https://github.com/sympy/sympy/pull/1901, we added the ability for the SymPy doctester to run or not run doctests conditionally depending on whether or not required external dependencies are installed. This means that for example we can doctest all the plotting examples without them failing when matplotlib is not installed.

For functions, this is as easy as decorating the function with @doctest_depends, which adds the attribute _doctest_depends_on to the function with a list of the external dependencies its doctest requires. The doctester will then not run the doctest unless those dependencies are installed.

For classes, this is not so easy. Ideally, one could just define _doctest_depends_on as an attribute of the class. However, the issue is that with classes, we have inheritance. But if class A has a docstring with a doctest that depends on some modules, it doesn’t mean that a subclass B of A will have a doctest that does.

Really, what we need to do is to decorate the docstring itself, not the class. Unfortunately, Python does not allow adding attributes to strings:

>>> a = ""
>>> a.x = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'x'


So what we have to do is to create an attribute that doesn't inherit.

I had for some time wanted to give descriptors in Python a try, since they are a cool feature, but also the second most complicated feature in Python (the first is metaclasses). If you don’t know what a descriptor is, I recommend reading this blog post by Guido van Rossum, the creator of Python. It’s the best explanation of the feature there is.

Basically, Python lets attributes define what happens when they are accessed (like a.x). You may already know that objects can define how their attributes are accessed via __getattr__. This is different. With descriptors, the attributes themselves define what happens. This may sound less useful, but in fact, it’s a very core feature of the language.

If you’ve ever wondered how property, classmethod, or staticmethod work in Python, the answer is descriptors. Basically, if you have something like

class A(object):
    def f(self):
        return 1
    f = property(f)


Then A().f magically returns what A().f() would normally return. The way it works is that property defines the __get__ method, which returns f(obj), where obj is the calling object, here A() (remember that in Python the first argument of a method, usually called self, is the object that calls the method).
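To see that machinery directly, here is a stripped-down, getter-only reimplementation of property (a sketch; the real property also handles setters, deleters, and docstrings):

```python
class my_property(object):
    # Minimal getter-only descriptor mimicking property.
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class itself
            return self
        return self.fget(obj)    # accessed on an instance: call the getter

class A(object):
    @my_property
    def f(self):
        return 1
```

Now A().f returns 1 without a call, just like the property example above.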

Descriptors allow attributes to define arbitrary behavior when they are accessed, set, or deleted. To make an attribute inaccessible to subclasses, then, you just need to define a descriptor that prevents the attribute from being accessed when the class of the calling object is not the original class. Here is some code:

class nosubclasses(object):
    def __init__(self, f, cls):
        self.f = f
        self.cls = cls
    def __get__(self, obj, type=None):
        if type == self.cls:
            if hasattr(self.f, '__get__'):
                return self.f.__get__(obj, type)
            return self.f
        raise AttributeError


It works like this:

In [2]: class MyClass(object):
...:     x = 1
...:

In [3]: MyClass.x = nosubclasses(MyClass.x, MyClass)

In [4]: class MySubclass(MyClass):
...:     pass
...:

In [5]: MyClass.x
Out[5]: 1

In [6]: MyClass().x
Out[6]: 1

In [80]: MySubclass.x
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-80-2b2f456dd101> in <module>()
----> 1 MySubclass.x

<ipython-input-51-7fe1b5063367> in __get__(self, obj, type)
8                 return self.f.__get__(obj, type)
9             return self.f
---> 10         raise AttributeError

AttributeError:

In [81]: MySubclass().x
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-81-93764eeb9948> in <module>()
----> 1 MySubclass().x

<ipython-input-51-7fe1b5063367> in __get__(self, obj, type)
8                 return self.f.__get__(obj, type)
9             return self.f
---> 10         raise AttributeError

AttributeError:


Note that by using the third argument to __get__, this works regardless of whether the attribute is accessed from the class or from an instance. I have to call __get__ on self.f again if it has it, to ensure that the right thing happens if the attribute has other descriptor logic defined (note that regular methods themselves have descriptor logic defined; that's how the first argument self implicitly becomes the calling object).

One could easily make a class decorator that automatically adds the attribute to the class in a non-inheritable way:

def nosubclass_x(args):
    def _wrapper(cls):
        cls.x = nosubclasses(args, cls)
        return cls
    return _wrapper


This automatically adds the property x to the decorated class with the value given in the decorator, and it won’t be accessible to subclasses:

In [87]: @nosubclass_x(1)
....: class MyClass(object):
....:     pass
....:

In [88]: MyClass().x
Out[88]: 1

In [89]: MySubclass().x
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-89-93764eeb9948> in <module>()
----> 1 MySubclass().x

<ipython-input-51-7fe1b5063367> in __get__(self, obj, type)
8                 return self.f.__get__(obj, type)
9             return self.f
---> 10         raise AttributeError

AttributeError:


For SymPy, we can’t use class decorators because we still support Python 2.5, and they were introduced in Python 2.6. The best workaround is to just call Class.attribute = nosubclasses(Class.attribute, Class) after the class definition. Unfortunately, you can’t access a class inside its own definition like you can with functions, so this has to go at the end.

Name Mangling

After coming up with all this, I remembered that Python already has a pretty standard way to define attributes so that subclasses won’t have access to them. All you have to do is use two underscores before the name, like __x, and it will be name mangled. This means that the name will be rewritten to _classname__x outside the class definition, and the name will not be inherited by subclasses. There are some subtleties with this, particularly for strange class names (names that are too long, or names that begin with an underscore). I asked about this on StackOverflow. The best answer is that there was a function for this in the standard library, but it was removed in Python 3. My tests reveal that the behavior differs between CPython and PyPy, so getting it right for every possible class is nontrivial. The descriptor approach should work everywhere, though. On the other hand, getattr(obj, '_' + obj.__class__.__name__ + attributename) will work 99% of the time, and is much easier both to write and to understand than the descriptor.
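A quick demonstration of the mangling behavior (the class names here are arbitrary): an attribute __x defined in Base is stored as _Base__x, and a reference to __x written inside a subclass body mangles to the subclass's own name, so it fails to find the attribute.

```python
class Base(object):
    __x = 1              # stored on Base as _Base__x

class Sub(Base):
    def get_x(self):
        return self.__x  # compiled as self._Sub__x, which does not exist

# The mangled attribute is still reachable under its full name:
base_value = Base()._Base__x      # 1

# But subclass code referring to __x raises AttributeError:
try:
    Sub().get_x()
    mangled_away = False
except AttributeError:
    mangled_away = True
```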

Introduction

This post uses some LaTeX. You may want to read it on the original site.

This is the last of a three part series connecting SymPy and Theano to transform mathematical expressions into efficient numeric code (see part 1 and part 2). We have seen that it is simple and computationally profitable to combine the best parts of both projects.

In this post we’ll switch from computing scalar expressions to computing matrix expressions. We’ll define the Kalman filter in SymPy and send it to Theano for code generation. We’ll then use SymPy to define a more performant blocked version of the same algorithm.

Kalman Filter

The Kalman filter is an algorithm to compute the Bayesian update of a normal random variable given a linear observation with normal noise. It is commonly used when an uncertain quantity is updated with the results of noisy observations. For example it is used in weather forecasting after weather stations report in with new measurements, in aircraft/car control to automatically adjust for external conditions real-time, or even on your smartphone’s GPS navigation as you update your position based on fuzzy GPS signals. It’s everywhere, it’s important, and it needs to be computed quickly and continuously. It suits our needs today because it can be completely defined with a pair of matrix expressions.

from sympy import MatrixSymbol, latex
n       = 1000                          # Number of variables in our system/current state
k       = 500                           # Number of variables in the observation
mu      = MatrixSymbol('mu', n, 1)      # Mean of current state
Sigma   = MatrixSymbol('Sigma', n, n)   # Covariance of current state
H       = MatrixSymbol('H', k, n)       # A measurement operator on current state
R       = MatrixSymbol('R', k, k)       # Covariance of measurement noise
data    = MatrixSymbol('data', k, 1)    # Observed measurement data

newmu   = mu + Sigma*H.T * (R + H*Sigma*H.T).I * (H*mu - data)      # Updated mean
newSigma= Sigma - Sigma*H.T * (R + H*Sigma*H.T).I * H * Sigma       # Updated covariance

print latex(newmu)
print latex(newSigma)


$$\Sigma H^T \left(H \Sigma H^T + R\right)^{-1} \left(-data + H \mu\right) + \mu$$ $$- \Sigma H^T \left(H \Sigma H^T + R\right)^{-1} H \Sigma + \Sigma$$

Theano Execution

The objects above are for symbolic mathematics, not for numeric computation. If we want to compute this expression we pass our expressions to Theano.

inputs = [mu, Sigma, H, R, data]
outputs = [newmu, newSigma]
dtypes = {inp: 'float64' for inp in inputs}

from sympy.printing.theanocode import theano_function
f = theano_function(inputs, outputs, dtypes=dtypes)


Theano builds a Python function that calls down to a combination of low-level C code, scipy functions, and calls to the highly optimized DGEMM routine for matrix multiplication. As input this function takes five numpy arrays corresponding to our five symbolic inputs and produces two numpy arrays corresponding to our two symbolic outputs. Recent work allows any SymPy matrix expression to be translated to and run by Theano.

import numpy
ninputs = [numpy.random.rand(*i.shape).astype('float64') for i in inputs]
nmu, nSigma = f(*ninputs)


Blocked Execution

These arrays are too large to fit comfortably in the fastest parts of the memory hierarchy. As a result each sequential C, scipy, or DGEMM call needs to move big chunks of memory around while it computes. After one operation completes the next operation moves around the same memory while it performs its task. This repeated memory shuffling hurts performance.

A common approach to reduce memory shuffling is to cut the computation into smaller blocks. We then perform as many computations as possible on a single block before moving on. This is a standard technique in matrix multiplication.
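The idea can be sketched in a few lines of NumPy (purely illustrative; real BLAS implementations choose block sizes to match cache levels and avoid Python-loop overhead entirely):

```python
import numpy as np

def blocked_matmul(X, Y, b):
    # Multiply X (m x n) by Y (n x p) one b-sized tile at a time.
    m, n = X.shape
    p = Y.shape[1]
    Z = np.zeros((m, p))
    for i in range(0, m, b):
        for j in range(0, p, b):
            for k in range(0, n, b):
                # All three operands here are small (at most b x b) tiles,
                # so each inner product stays in fast memory.
                Z[i:i+b, j:j+b] += X[i:i+b, k:k+b].dot(Y[k:k+b, j:j+b])
    return Z
```

The result is identical to X.dot(Y); only the order in which memory is touched changes.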

from sympy import BlockMatrix, block_collapse
A, B, C, D, E, F, G, K = [MatrixSymbol(a, n, n) for a in 'ABCDEFGK']
X = BlockMatrix([[A, B],
                 [C, D]])
Y = BlockMatrix([[E, F],
                 [G, K]])
print latex(X*Y)


$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} E & F \\ G & K \end{bmatrix}$$

print latex(block_collapse(X*Y))


$$\begin{bmatrix} A E + B G & A F + B K \\ C E + D G & C F + D K\end{bmatrix}$$

We are now able to focus on substantially smaller chunks of the array. For example we can choose to keep A in local memory and perform all computations that involve A. We will still need to shuffle some memory around (this is inevitable) but by organizing with blocks we’re able to shuffle less.

This idea extends beyond matrix multiplication. For example, SymPy knows how to block a matrix inverse

print latex(block_collapse(X.I))


$$\begin{bmatrix} \left(- B D^{-1} C + A\right)^{-1} & - A^{-1} B \left(- C A^{-1} B + D\right)^{-1} \\ - \left(- C A^{-1} B + D\right)^{-1} C A^{-1} & \left(- C A^{-1} B + D\right)^{-1} \end{bmatrix}$$

High performance dense linear algebra libraries hard-code all of these tricks into each individual routine. The call to the general matrix multiply routine DGEMM performs blocked matrix multiply within the call. The call to the general matrix solve routine DGESV can perform blocked matrix solve. Unfortunately these routines are unable to coordinate blocked computation between calls.

Fortunately, SymPy and Theano can.

SymPy can define and reduce the blocked matrix expressions using relations like what are shown above.

from sympy import blockcut, block_collapse
blocksizes = {
    Sigma: [(n/2, n/2), (n/2, n/2)],
    H:     [(k/2, k/2), (n/2, n/2)],
    R:     [(k/2, k/2), (k/2, k/2)],
    mu:    [(n/2, n/2), (1,)],
    data:  [(k/2, k/2), (1,)]
}
blockinputs = [blockcut(i, *blocksizes[i]) for i in inputs]
blockoutputs = [o.subs(dict(zip(inputs, blockinputs))) for o in outputs]
collapsed_outputs = map(block_collapse, blockoutputs)

fblocked = theano_function(inputs, collapsed_outputs, dtypes=dtypes)


Theano is then able to coordinate this computation and compile it to low-level code. At this stage the expressions/computations are fairly complex and difficult to present. Here is an image of the computation (click for zoomable PDF) as a directed acyclic graph.

Results

Let's time each function on the same inputs and see which is faster:

>>> timeit f(*ninputs)
1 loops, best of 3: 2.69 s per loop

>>> timeit fblocked(*ninputs)
1 loops, best of 3: 2.12 s per loop


That’s a 20% performance increase from just a few lines of high-level code.

Blocked matrix multiply and blocked solve routines have long been established as a good idea. High level mathematical and array programming libraries like SymPy and Theano allow us to extend this good idea to arbitrary array computations.

Analysis

Good Things

First, let's note that we're not introducing a new library for dense linear algebra. Instead we're noting that pre-existing general-purpose high-level tools can be composed to the same effect.

Second, let's acknowledge that we could take this further. For example, Theano seamlessly handles GPU interactions. We could take this same code to a GPU-accelerated machine and it would just run faster without any action on our part.

Bad Things

However, there are some drawbacks.

Frequent readers of my blog might recall a previous post about the Kalman filter. In it I showed how we could use SymPy’s inference engine to select appropriate BLAS/LAPACK calls. For example we could infer that because $H \Sigma H^T + R$ was symmetric positive definite we could use the substantially more efficient POSV routine for matrix solve rather than GESV (POSV uses the Cholesky algorithm for decomposition rather than straight LU). Theano doesn’t support the specialized BLAS/LAPACK routines though, so we are unable to take advantage of this benefit. The lower-level interface (Theano) is not sufficiently rich to use all information captured in the higher-level (SymPy) representation.

Also, I’ve noticed that the blocked version of this computation experiences some significant roundoff errors (on the order of 1e-3). I’m in the process of tracking this down. The problem must occur somewhere in the following tool-chain

SymPy -> Blocking -> Theano -> SciPy -> C routines -> BLAS

Debugging in this context can be wonderful if all elements are well unit-tested. If they’re not (they’re not) then tracking down errors like this requires an unfortunate breadth of expertise.

Scripts

March 29, 2013

Fabian Pedregosa

Householder matrices

Householder matrices are square matrices of the form

$$P = I - \beta v v^T$$

where $\beta$ is a scalar and $v$ is a vector. They have the useful property that for suitably chosen $v$ and $\beta$, the product $P x$ zeroes out all of the coordinates of $x$ but the first, that is, $P x = \|x\| e_1$. The following code, given $x$, finds the values of $\beta, v$ that satisfy that property. The algorithm can be found in several textbooks [1].

import numpy as np
from scipy import linalg

def house(x):
    """
    Given a vector x, computes a vector v with v[0] = 1
    and a scalar beta such that P = I - beta v v^T
    is orthogonal and P x = ||x|| e_1

    Parameters
    ----------
    x : array, shape (n,) or (n, 1)

    Returns
    -------
    beta : scalar
    v : array, shape (n, 1)
    """
    x = np.asarray(x)
    if x.ndim == 1:
        x = x[:, np.newaxis]
    sigma = linalg.norm(x[1:, 0]) ** 2
    v = np.vstack((1, x[1:]))
    if sigma == 0:
        beta = 0
    else:
        mu = np.sqrt(x[0, 0] ** 2 + sigma)
        if x[0, 0] <= 0:
            v[0, 0] = x[0, 0] - mu
        else:
            v[0, 0] = - sigma / (x[0, 0] + mu)
        beta = 2 * (v[0, 0] ** 2) / (sigma + v[0, 0] ** 2)
        v /= v[0, 0]
    return beta, v


As promised, this computes the parameters of $P$ such that $P x = \|x\| e_1$, exact to 15 decimals:

>>> n = 5
>>> x = np.random.randn(n)
>>> beta, v = house(x)
>>> P = np.eye(n) - beta * np.dot(v, v.T)
>>> print np.round(P.dot(x) / linalg.norm(x), decimals=15)
[ 1. -0. -0.  0. -0.]


This property is what makes Householder matrices useful in the context of numerical analysis. They can be used, for example, to compute the QR decomposition of a given matrix: the idea is to successively zero out the sub-diagonal elements, leaving a triangular matrix at the end. In the first iteration we compute a Householder matrix $P_0$ such that $P_0 X$ has only zeros below the diagonal in the first column, then a Householder matrix $P_1$ such that $P_1 P_0 X$ also zeroes out the subdiagonal elements of the second column, and so on. At the end we will have that $P_n \cdots P_1 P_0 X$ is an upper triangular matrix. Since all $P_i$ are orthogonal (and symmetric), the product $P_0 P_1 \cdots P_n$ is again an orthogonal matrix, namely the $Q$ matrix in the QR decomposition.
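As an illustration, here is a compact (and numerically naive) sketch of QR via successive Householder reflections. The name house_qr is mine, and unlike house above it normalizes each reflector to unit norm instead of fixing v[0] = 1, which lets us apply P = I - 2 v v^T directly.

```python
import numpy as np

def house_qr(A):
    # QR decomposition by successive Householder reflections (sketch).
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.eye(m)
    R = A.copy()
    for j in range(min(m, n)):
        x = R[j:, j]
        v = x.copy()
        # Standard sign choice to avoid cancellation.
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0:       # column already zero, nothing to do
            continue
        v /= norm_v
        # Apply P_j = I - 2 v v^T to the trailing rows of R ...
        R[j:, :] -= 2.0 * np.outer(v, v.dot(R[j:, :]))
        # ... and accumulate Q = P_0 P_1 ... on the right.
        Q[:, j:] -= 2.0 * np.outer(Q[:, j:].dot(v), v)
    return Q, R
```

Afterwards Q.dot(R) reconstructs the input, Q is orthogonal, and R is upper triangular, matching the construction described above.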

If we choose X as 20-by-20 random matrix, with colors representing different values

we can see the process of the Householder matrices being applied one by one to obtain an upper triangular matrix

A similar application of Householder matrices is reducing a given symmetric matrix to tridiagonal form. This proceeds much like the QR algorithm, except that now we multiply the matrix $X$ on both the left and the right by the Householder matrices, and in this case we seek Householder matrices that zero out the elements below the first subdiagonal, instead of all subdiagonal elements. This algorithm is used, for example, as a preprocessing step for most dense eigensolvers.

1. "Matrix Computations", third edition, Golub & Van Loan (Algorithm 5.1.1).

2. Code to reproduce the figures can be found here; source for the IPython notebook can be found here.

Introduction

This post uses some LaTeX. You may want to read it on the original site.

In my last post I showed how SymPy can benefit from Theano. In particular Theano provided a mature platform for code generation that outperformed SymPy’s attempt at the same problem. I argued that projects should stick to one specialty and depend on others for secondary concerns. Interfaces are better than add-ons.

In this post I’ll show how Theano can benefit from SymPy. In particular I’ll demonstrate the practicality of SymPy’s impressive scalar simplification routines for generating efficient programs.

After re-reading over this post I realize that it’s somewhat long. I’ve decided to put the results first in hopes that it’ll motivate you to keep reading.

Project         Operation count
SymPy           27
Theano          24
SymPy+Theano    17

Now, let's find out what those numbers mean.

Example problem

We use a larger version of our problem from last time: a radial wavefunction corresponding to n = 6 and l = 2 for Carbon (Z = 6).

from sympy.physics.hydrogen import R_nl
from sympy.abc import x
from sympy import latex, simplify, count_ops, Derivative
n, l, Z = 6, 2, 6
expr = R_nl(n, l, x, Z)
print latex(expr)

$$\frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}$$

We want to generate code to compute both this expression and its derivative. Both SymPy and Theano can compute and simplify derivatives. In this post we’ll measure the complexity of a computation that simultaneously computes both the above expression and its derivative. We’ll arrive at this computation through a couple of different routes that use overlapping parts of SymPy and Theano. This will supply a couple of direct comparisons.

Disclaimer: I’ve chosen a larger expression here to exaggerate results. Simpler expressions yield less impressive results.

Simplification

We show the expression, its derivative, and SymPy's simplification of that derivative. In each case we quantify the complexity of the expression by the number of algebraic operations.

The target expression:

print latex(expr)

$$\frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}$$

print "Operations: ", count_ops(expr)
Operations:  17

Its derivative:

print latex(expr.diff(x))

$$\frac{1}{210} \sqrt{70} x^{2} \left(- 4 x^{2} + 32 x - 56\right) e^{- x} - \frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x} + \frac{1}{105} \sqrt{70} x \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}$$

print "Operations: ", count_ops(expr.diff(x))
Operations:  48

The result of simplify on the derivative. Note the significant cancellation of the above expression.

print latex(simplify(expr.diff(x)))

$$\frac{2}{315} \sqrt{70} x \left(x^{4} - 17 x^{3} + 90 x^{2} - 168 x + 84\right) e^{- x}$$

print "Operations: ", count_ops(simplify(expr.diff(x)))
Operations:  18

An unevaluated derivative object. We’ll end up passing this to Theano so that it computes the derivative on its own.

print latex(Derivative(expr, x))

$$\frac{\partial}{\partial x}\left(\frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}\right)$$

Bounds on the cost of Differentiation

Scalar differentiation is actually a very simple transformation.

You need to know how to transform all of the elementary functions (exp, log, sin, cos, polynomials, etc.), plus the chain rule, and that's it. Theorems behind automatic differentiation state that the cost of a derivative will be at most five times the cost of the original. In this case we're guaranteed at most 17*5 == 85 operations in the derivative computation; this holds in our case because 48 < 85.

However derivatives are often far simpler than this upper bound. We see that after simplification the operation count of the derivative is 18, only one more than the original. This is common in practice.

Theano Simplification

Like SymPy, Theano transforms graphs into mathematically equivalent but computationally more efficient representations. It provides standard compiler optimizations like constant folding and common sub-expression elimination, as well as array-specific optimizations like element-wise operation fusion.

Because users regularly handle mathematical terms Theano also provides a set of optimizations to simplify some common scalar expressions. For example Theano will convert expressions like x*y/x to y. In this sense it overlaps with SymPy’s simplify functions. This post is largely a demonstration that SymPy’s scalar simplifications are far more powerful than Theano’s and that their use can result in significant improvements. This shouldn’t be surprising. Sympians are devoted to scalar simplification to a degree that far exceeds the Theano community’s devotion to this topic.

Experiment

We’ll compute the derivative of our radial wavefunction and then simplify the result. We’ll do this using both SymPy’s derivative and simplify routines and using Theano’s derivative and simplify routines. We’ll then compare the two results by counting the number of required operations.

Here is some setup code that you can safely ignore:

def fgraph_of(*exprs):
    """ Transform SymPy expressions into Theano Computation """
    outs = map(theano_code, exprs)
    ins = theano.gof.graph.inputs(outs)
    ins, outs = theano.gof.graph.clone(ins, outs)
    return theano.gof.FunctionGraph(ins, outs)

def theano_simplify(fgraph):
    """ Simplify a Theano Computation """
    mode = theano.compile.get_default_mode().excluding("fusion")
    fgraph = fgraph.clone()
    mode.optimizer.optimize(fgraph)
    return fgraph

def theano_count_ops(fgraph):
    """ Count the number of Scalar operations in a Theano Computation """
    return len(filter(lambda n: isinstance(n.op, theano.tensor.Elemwise),
                      fgraph.apply_nodes))


In SymPy we create both an unevaluated derivative and a fully evaluated and sympy-simplified version. We translate each to Theano, simplify within Theano, and then count the number of operations both before and after simplification. In this way we can see the value added by both SymPy’s and Theano’s optimizations.

exprs = [Derivative(expr, x),     # derivative computed in Theano
         simplify(expr.diff(x))]  # derivative computed in SymPy, also sympy-simplified

for expr in exprs:
    fgraph = fgraph_of(expr)
    simp_fgraph = theano_simplify(fgraph)
    print latex(expr)
    print "Operations:                             ", theano_count_ops(fgraph)
    print "Operations after Theano Simplification: ", theano_count_ops(simp_fgraph)


Theano Only

$$\frac{\partial}{\partial x}\left(\frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}\right)$$

Operations:                              40
Operations after Theano Simplification:  21

SymPy + Theano

$$\frac{2}{315} \sqrt{70} x \left(x^{4} - 17 x^{3} + 90 x^{2} - 168 x + 84\right) e^{- x}$$

Operations:                              13
Operations after Theano Simplification:  10

Analysis

On its own Theano produces a derivative expression that is about as complex as the unsimplified SymPy version. Theano simplification then does a surprisingly good job, roughly halving the amount of work needed (40 -> 21) to compute the result. If you dig deeper however you find that this isn’t because it was able to algebraically simplify the computation (it wasn’t) but rather because the computation contained several common sub-expressions. The Theano version looks a lot like the unsimplified SymPy version. Note the common sub-expressions like 56*x below.

$$\frac{1}{210} \sqrt{70} x^{2} \left(- 4 x^{2} + 32 x - 56\right) e^{- x} - \frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x} + \frac{1}{105} \sqrt{70} x \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}$$

The pure-SymPy simplified result is again substantially more efficient (13 operations). Interestingly Theano is still able to improve on this, again not because of additional algebraic simplification but rather due to constant folding. The two projects simplify in orthogonal ways.

Simultaneous Computation

When we compute both the expression and its derivative simultaneously we find substantial benefits from using the two projects together.

orig_expr = R_nl(n, l, x, Z)
for expr in exprs:
    fgraph = fgraph_of(expr, orig_expr)
    simp_fgraph = theano_simplify(fgraph)
    print latex((expr, orig_expr))
    print "Operations:                             ", len(fgraph.apply_nodes)
    print "Operations after Theano Simplification: ", len(simp_fgraph.apply_nodes)


$$\begin{pmatrix}\frac{\partial}{\partial x}\left(\frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}\right), & \frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}\end{pmatrix}$$

Operations:                              57
Operations after Theano Simplification:  24

$$\begin{pmatrix}\frac{2}{315} \sqrt{70} x \left(x^{4} - 17 x^{3} + 90 x^{2} - 168 x + 84\right) e^{- x}, & \frac{1}{210} \sqrt{70} x^{2} \left(- \frac{4}{3} x^{3} + 16 x^{2} - 56 x + 56\right) e^{- x}\end{pmatrix}$$

Operations:                              27
Operations after Theano Simplification:  17

The combination of SymPy’s scalar simplification and Theano’s common sub-expression optimization yields a significantly simpler computation than either project could do independently.

To summarize

Project         Operation count
SymPy           27
Theano          24
SymPy+Theano    17

March 19, 2013

MRocklin

SymPy and Theano -- Code Generation

No one is good at everything, that’s why we have society.

No project is good at everything, that’s why we have interfaces.

This is the first of three posts that join SymPy, a library for symbolic mathematics, and Theano, a library for mathematical compilation to numeric code. Each library does a few things really well. Each library also over-reaches a bit and does a few things not-as-well. Fortunately the two libraries have clear and simple data structures and so can be used together effectively.

In this post I’ll focus on how SymPy can use Theano to generate efficient code.

Physics

SymPy knows Physics. For example, here is the radial wavefunction corresponding to n = 3 and l = 1 for Carbon (Z = 6)

from sympy.physics.hydrogen import R_nl
from sympy.abc import x
expr = R_nl(3, 1, x, 6)
print latex(expr)

$$\frac{8}{3} x \left(- 4 x + 4\right) e^{- 2 x}$$

SymPy is great at this. It can manipulate high level mathematical expressions very naturally. When it comes to numeric computation it is less effective.

Numerics

Fortunately there are methods to offload the work to numerical projects like numpy or to generate and compile straight Fortran code. Here we use two existing methods to create two identical vectorized functions to compute the above expression.

from sympy.utilities.autowrap import ufuncify
from sympy.utilities.lambdify import lambdify
fn_numpy   = lambdify(x, expr, 'numpy')
fn_fortran = ufuncify([x], expr)

fn_numpy replaces each of the SymPy operations with the equivalent function from the popular NumPy package. fn_fortran generates and compiles low-level Fortran code and uses f2py to bind it to a Python function. They each use numpy arrays as common data structures, supporting broad interoperability with the rest of the Scientific Python ecosystem. They both work well and produce identical results.

>>> from numpy import linspace
>>> xx = linspace(0, 1, 5)
>>> fn_numpy(xx)
[ 0.          1.21306132  0.98101184  0.44626032  0.        ]
>>> fn_fortran(xx)
[ 0.          1.21306132  0.98101184  0.44626032  0.        ]

We use these functions and matplotlib to plot the original equation

from pylab import plot, show, legend
xx = linspace(0, 5, 50000)
plot(xx, fn_numpy(xx))

Performance

When we profile these functions we find that the Fortran solution runs a bit faster. This is because it is able to fuse all of the scalar operations into one loop while the numpy solution walks over memory several times, performing each operation individually. Jensen wrote a more thorough blogpost about this when he worked on code generation. He shows substantial performance increases as the complexity of the mathematical expression increases.

>>> timeit fn_numpy(xx)
1000 loops, best of 3: 1.4 ms per loop
>>> timeit fn_fortran(xx)
1000 loops, best of 3: 884 us per loop
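The fusion idea itself can be sketched in pure Python. The unfused version below makes one pass over the data per elementary operation, the way a chain of numpy calls walks memory repeatedly, while the fused version computes the whole expression in a single pass (this is an illustration of the concept, not what either backend literally generates):

```python
import math

def unfused(xs):
    # one pass over memory per elementary operation, numpy-style
    t1 = [8.0 / 3.0 * x for x in xs]
    t2 = [-4.0 * x + 4.0 for x in xs]
    t3 = [math.exp(-2.0 * x) for x in xs]
    return [a * b * c for a, b, c in zip(t1, t2, t3)]

def fused(xs):
    # a single pass computing (8/3) * x * (-4x + 4) * exp(-2x), Fortran-style
    return [8.0 / 3.0 * x * (-4.0 * x + 4.0) * math.exp(-2.0 * x) for x in xs]
```

Both produce the same values; the fused loop simply touches each element once.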

This weekend I built up a translation from SymPy expressions to Theano computations. This builds off of old work done with Frederic Bastien at SciPy2012.

>>> from sympy.printing.theanocode import theano_function
>>> fn_theano  = theano_function([x], [expr], dims={x: 1}, dtypes={x: 'float64'})
>>> timeit fn_theano(xx)
1000 loops, best of 3: 1.04 ms per loop

Theano generates C code that performs the same loop fusion done in Fortran but it incurs a bit more startup time. It performs somewhere between the numpy and Fortran solutions.

However, the SymPy to Theano translation interface only takes up about a page of code while the lambdify and autowrap modules are substantially more complex. Additionally, Theano is actively developed and is sure to improve and track changes in hardware well into the future; for example, Theano is able to seamlessly compile these computations to the GPU. lambdify and autowrap have been relatively untouched over the past year.

Leveraging Theano

In the above example we used Theano to copy the behavior of SymPy’s existing numpy and Fortran numeric solutions. Theano is capable of substantially more than this. To show a simple example we’ll compute both our original output and the derivative simultaneously.

outputs = expr, simplify(expr.diff(x))
print latex(outputs)

$$\begin{pmatrix}\frac{8}{3} x \left(- 4 x + 4\right) e^{- 2 x}, & \frac{32}{3} \left(2 x^{2} - 4 x + 1\right) e^{- 2 x}\end{pmatrix}$$

We redefine our functions to produce both outputs, instead of just expr alone

fn_numpy  = lambdify([x], outputs, 'numpy')
fn_theano = theano_function([x], outputs, dims={x: 1}, dtypes={x: 'float64'})

fns_fortran = [ufuncify([x], output) for output in outputs]
fn_fortran  = lambda xx: [fn_fortran(xx) for fn_fortran in fns_fortran]

The expression and its derivative look like this:

for y in fn_theano(xx):
    plot(xx, y)
legend(['$R_{31}$', "$R'_{31}$"])

Because Theano handles common subexpressions well it is able to perform the extra computation with only a very slight increase in runtime, easily eclipsing either of the other two options.

>>> timeit fn_numpy(xx)
100 loops, best of 3: 2.85 ms per loop
>>> timeit fn_fortran(xx)
1000 loops, best of 3: 1.8 ms per loop
>>> timeit fn_theano(xx)
1000 loops, best of 3: 1.16 ms per loop

When we extend this experiment and vary the number of simultaneous derivatives we observe the following runtimes

In the case of highly structured computation Theano is able to scale very favorably.

Conclusion

The Theano project is devoted to code generation at a level that exceeds the devotion of SymPy to this same topic. This is natural and prevalent. When we combine the good parts of both projects we often achieve a better result than with an in-house solution.

In-house solutions to foreign problems lack persistence. As programmers within an ecosystem we should make projects that do one thing well and provide clean interfaces and simple data structures to encourage inter-project communication.

March 03, 2013

Aaron Meurer

When does x^log(y) = y^log(x)?

In this blog post, when I write $\log(x)$, I mean the natural logarithm, or log base $e$, i.e., $\ln(x)$.

A discussion on a pull request got me thinking about this question: what are the solutions to the complex equation $x^{\log{(y)}} = y^{\log(x)}$?  At the outset, they look like different expressions.  But clearly there are some solutions. For example, if $x = y$, then obviously the two expressions will be the same.  We probably should exclude $x = y = 0$, though note that even if $0^{\log(0)}$ is well-defined (probably if it is, it is either 0 or complex $\infty$), it will be the same well-defined value on both sides. But for the remainder of this blog post, I’ll assume that $x$ and $y$ are nonzero.

Now, observe that if we apply $\log$ to both sides of the equation, we get $\log{\left(x^{\log(y)}\right )} = \log {\left (y^{\log(x)}\right )}$.  Now, supposing that we can apply the famous logarithm exponent rule, we would get $\log(x)\log(y) = \log(y)\log(x)$, which means that if additionally $\log$ is one-to-one, we would have that the original expressions must be equal.

The second question, that of injectivity, is easier to answer than the first, so I’ll address it first.  Note that the complex exponential is not one-to-one, because for example $e^0 = e^{2\pi i} = 1$.  But we still define the complex logarithm as the “inverse” of the complex exponential.  What this really means is that the complex logarithm is strictly speaking not a function, because it is not well-defined. Recall that the definition of one-to-one means that $f(x) = f(y)$ implies $x = y$, and that the definition of well-defined is that $x = y$ implies $f(x) = f(y)$.  It is easy to see here that $f$ being one-to-one is the same as $f^{-1}$ being well-defined and vice versa ($f^{-1}$ here is the same loose definition of an inverse as saying that the complex logarithm is the inverse of the complex exponential).

So note that the complex logarithm is not well-defined exactly because the complex exponential is not one-to-one.  We of course fix this problem by making it well-defined, i.e., it normally is multivalued, but we pick a single value consistently (i.e., we pick a branch), so that it is well-defined.  For the remainder of this blog post, I will assume the standard choice of branch cut for the complex logarithm, i.e., the branch cut is along the negative axis, and we choose the branch where, for $x > 0$, $\log(x)$ is real and $\log(-x) = \log(x) + i\pi$.
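Python's cmath module uses this same principal branch, with the cut along the negative real axis, so the stated convention can be checked directly:

```python
import cmath
import math

x = 2.0
# principal branch: log(-x) = log(x) + i*pi for real x > 0
assert cmath.isclose(cmath.log(-x), cmath.log(x) + 1j * math.pi)
```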

My point here is that we automatically know that the complex logarithm is one-to-one because we know that the complex exponential is well-defined.

So our question boils down to, when does the identity $\log{\left (z^a\right)} = a \log(z)$ hold?  In SymPy, this identity is only applied by expand_log() or logcombine() when $a$ is real and $z$ is positive, so let us assume that we know that it holds under those conditions. Note that it also holds for some other values too.  For example, by our definition $\log{\left (e^{i\pi}\right)} = \log(-1) = \log(1) + i\pi = i\pi = i\pi\log(e)$.  For our example, this means that $x = e$, $y = -1$ is a non-trivial solution (non-trivial meaning $x \neq y$).   Actually, the way that the complex logarithm being the “inverse” of the complex exponential works is that $e^{\log(x)} = x$ for all $x$ (on the other hand $\log{\left(e^x\right)} \neq x$ in general), so that if $x = e$, then $x^{\log(y)} = e^{\log(y)} = y$ and $y^{\log(x)} = y^{\log(e)} = y^1 = y$.  In other words, $x = e$ is always a solution, for any $y\, (\neq 0)$ (and similarly $y = e$ for all $x$).  In terms of our question of when $\log{\left(z^a\right)} = a\log(z)$, this just says that this is always true for $a = \log(e) = 1$, regardless of $z$, which is obvious.  We can also notice that this identity always holds for $a = 0$, regardless of $z$. In terms of our original equation, this means that $x = e^0 = 1$ is a solution for all $y$ (and as before, $y = 1$ for all $x$).
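The asymmetry between $e^{\log(x)} = x$ (always, for nonzero $x$) and $\log\left(e^x\right) \neq x$ (in general) is easy to confirm numerically with cmath:

```python
import cmath
import math

# exp(log(z)) recovers any nonzero z ...
z = -3 + 4j
assert cmath.isclose(cmath.exp(cmath.log(z)), z)

# ... but log(exp(w)) can differ from w when Im(w) leaves (-pi, pi]:
# log(exp(2*pi*i)) = log(1) = 0, not 2*pi*i
w = 2j * math.pi
assert abs(cmath.log(cmath.exp(w)) - w) > 1.0
```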

Note that $z > 0$ and $a$ real corresponds to $x, y > 0$ and $\log(x), \log(y)$ real, respectively, (which are the same condition).  So we have so far that the following are solutions to $x^{\log(y)} = y^{\log(x)}$:

• $x, y > 0$
• $x = y$
• $x = e$, $y$ arbitrary
• $y = e$, $x$ arbitrary
• $x = 1$, $y$ arbitrary
• $y = 1$, $x$ arbitrary

Now let’s look at some cases where $\log{\left (z^a\right)} \neq a\log(z)$.  If $z < 0$ and $a$ is a nonzero even integer, then $z^a > 0$ so $\log{\left (z^a \right)} = \log{\left (\left (-z\right )^a \right )} = a\log(-z)$, whereas $a\log(z) = a(\log(-z) + i\pi)$, which are different by our assumption that $a \neq 0$.  If $a$ is an odd integer not equal to 1, then $z^a < 0$, so $\log{\left (z^a \right)} = \log{\left (-z^a \right )} + i\pi = \log{\left (\left (-z\right )^{a} \right )} + i\pi = a\log(-z) + i\pi$, whereas $a\log(z) = a(\log(-z) + i\pi)$ again, which is not the same because $a \neq 1$. This means that if we let $x < 0$ and $y = e^a$, where $a \neq 0, 1$, we get a non-solution (and the same if we swap $x$ and $y$).

This is as far as I got tonight. WordPress is arbitrarily not rendering that LaTeX for no good reason. That and the very ugly LaTeX images is pissing me off (why wordpress.com hasn't switched to MathJaX yet is beyond me). The next time I get some free time, I am going to seriously consider switching my blog to something hosted on GitHub, probably using the IPython notebook. I welcome any hints people can give me on that, especially concerning migrating pages from this blog.

Here is some work on finding the rest of the solutions: the general definition of $\log(x)$ is $\log(|x|) + i\arg(x)$, where $\arg(x)$ is chosen in $(-\pi, \pi]$. Therefore, if $\log{\left(z^a\right )} = a\log(z)$, we must have $\arg(z^a) = a\arg(z)$. I believe a description of all such complex $z$ and $a$ will give all solutions $x = z$, $y = e^a$ (and $y = z$, $x = e^a$) to $x^{\log(y)} = y^{\log(x)}$. I need to verify that, though, and I also need to think about how to describe such $z$ and $a$. I will (hopefully) continue this post later, either by editing this one or writing a new one (depending on how much more I come up with).

Any comments to this post are welcome. I know you can't preview comments, but if you want to use math, just write it as $latex math$ (like $latex \log(x)$ for $\log(x)$). If you mess something up, I’ll edit your comment and fix it.

February 26, 2013

Fabian Pedregosa

Loss Functions for Ordinal regression

Note: this post contains a fair amount of LaTeX, if you don't visualize the math correctly come to its original location

In machine learning it is common to formulate the classification task as a minimization problem over a given loss function. Given input data $(x_1, ..., x_n)$ and associated labels $(y_1, ..., y_n), y_i \in \lbrace-1, 1\rbrace$, the problem becomes to find a function $f(x)$ that minimizes

$$L(x, y) = \sum_i ^n \text{loss}(f(x_i), y_i)$$

where loss is any loss function. These are usually functions that are close to zero when $f(x_i)$ agrees in sign with $y_i$ and take a positive value when $f(x_i)$ and $y_i$ have opposite signs. Common choices of loss functions are:

• Zero-one loss, $I(\text{sign}(f(x_i)) \neq y_i)$, where $I$ is the indicator function.
• Hinge loss, $\text{max}(0, 1 - f(x_i) y_i)$
• Logistic loss, $\log(1 + \exp(- f(x_i) y_i))$


In the paper Loss functions for preference levels: Regression with discrete ordered labels, the above setting, commonly used for classification and regression, is extended to the ordinal regression problem. In ordinal regression, classes can take one of several discrete, but ordered, labels. Think for example of movie ratings that go from zero to ten stars. Here there's an inherent order in the sense that (unlike the multiclass classification setting) not all errors are equally bad. For instance, it is worse to mistake a 1-star movie for a 10-star one than a 4-star movie for a 5-star one.

To extend the binary loss to the case of ordinal regression, the author introduces K-1 thresholds $\theta_1 \lt \theta_2 \lt ... \lt \theta_{K-1}$, together with $\theta_0 = -\infty$ and $\theta_K = \infty$. Each of the K segments corresponds to one of the K labels and a predicted value between $\theta_{d}$ and $\theta_{d+1}$ corresponds to the prediction of class $d$ (supposing that classes go from zero to $K-1$). This generalizes the binary case by considering zero the unique threshold.
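Under this convention, turning a real-valued prediction into a label is just a search over the sorted thresholds. A small sketch, assuming zero-indexed classes and a list `theta` holding the K-1 finite thresholds:

```python
import bisect

def predict_class(score, theta):
    # the number of thresholds at or below the score gives the segment
    # index, i.e. the predicted ordinal label
    return bisect.bisect_right(theta, score)
```

For example, with theta = [0, 1, 2] (so K = 4), a score of -0.5 maps to class 0 and a score of 1.5 maps to class 2.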

Suppose K=7 and that the correct label $y_i$ is 2, that is, $f(x_i)$ must lie between $\theta_2$ and $\theta_3$. In that case, it must be verified that $f(x_i) > \theta_1$ and $f(x_i) > \theta_2$ as well as $f(x_i) < \theta_3 < \theta_4 < ... < \theta_6$. Treating this as K-1=6 independent classification problems produces the following family of loss functions (for a hinge loss):

The idea behind Rennie's paper is to sum all these loss functions to produce a single optimization problem that is then the sum of all threshold violations. Here, not only does the loss increase when thresholds are violated, but the slope of the loss function also increases each time a threshold is crossed.
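A minimal sketch of this summed loss with hinge penalties (an illustration of the construction, not the paper's exact code; classes are assumed zero-indexed and `theta` holds the K-1 finite thresholds in increasing order):

```python
def hinge(z):
    return max(0.0, 1.0 - z)

def all_thresholds_loss(pred, y, theta):
    # every threshold below the true class should sit below the prediction,
    # every threshold at or above it should sit above; each violation is
    # penalized with a hinge, so the slope grows with each crossed threshold
    loss = 0.0
    for j, t in enumerate(theta):
        if j < y:
            loss += hinge(pred - t)
        else:
            loss += hinge(t - pred)
    return loss
```

A prediction far on the wrong side of several thresholds accumulates one hinge term per violated threshold, which is exactly the increasing-slope behavior described above.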

1. Code for generating all figures can be found here

February 25, 2013

MRocklin

Maximum a Posteriori Estimation

Disclaimer: I know relatively little about this application. Corrections welcome.

In this post we see how SymPy can simplify common numeric calculations, particularly in Bayesian inference problems.

Imagine you are a scientist studying some counting process (like radioactive decay or the number of page requests on a web server). You describe this process with a Poisson random variable and try to learn the rate parameter of this distribution by observing some random samples.

If you have no preconceptions about the rate then this problem is easy. You just divide total counts by total time and you’re done.
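With a flat prior, the maximum-likelihood estimate of a Poisson rate really is just the sample mean of the observed counts (a sketch assuming unit-length observation intervals, so "total time" equals the number of observations):

```python
def poisson_rate_mle(counts):
    # MLE of the Poisson rate: total events divided by number of observations
    return sum(counts) / float(len(counts))
```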

A more complex problem arises when external theory provides prior information about your rate parameter (for example physics might impose rules on the rate of radioactive decay). Let’s model this problem in SymPy. For the sake of concreteness let’s arbitrarily assume that $\lambda$, the rate parameter, follows a Beta distribution with parameters a and b.

a, b = symbols('a,b', positive=True)
lam = Symbol('lambda', positive=True)
rate = Beta(lam, a, b)
count = Poisson('X', rate)


In the lab we observe many samples $x_i$ taken from count. From these we wish to find the most likely value of rate. The probability of any single value of rate given our data can be rewritten with Bayes’ rule.

$$p(\lambda \vert x_i) \propto \prod_i p(x_i \vert \lambda) \cdot p(\lambda)$$

In this case the distributions are given by

pdf = density(count, rate);  print latex(pdf(x))    # density of count, given rate
pdf = density(rate);         print latex(pdf(lam))  # prior density of rate


$$p(x_i \vert \lambda) = \frac{\lambda^{x}}{e^{\lambda} x!} \;\;\;\; p(\lambda) = \frac{\lambda^{a - 1} \left(- \lambda + 1\right)^{b - 1} \Gamma\left(a + b\right)}{\Gamma\left(a\right) \Gamma\left(b\right)}$$

To find the maximizer of $p(\lambda \vert x_i)$ we set the derivative equal to zero. We simplify the computation by taking the log. Because log is monotonic this does not change the solution.

$$0 = \frac{d}{d\lambda} \log\left( \prod_i p(x_i \vert \lambda) \cdot p(\lambda)\right) = \frac{d}{d\lambda} \sum_i \log(p(x_i \vert \lambda) \cdot p(\lambda))$$

We can accomplish this in SymPy with the following code

# Model n observations with a function data indexed by integer i
i, n = symbols('i,n', integer=True)
data = Function('data')

# Compute log likelihood
loglikelihood = log(Product(density(count, rate)(data(i)) * density(rate)(lam), (i, 1, n)))
Eq(simplify(loglikelihood.diff(lam)), 0)


$$\sum_{i=1}^{n} \frac{a \left(\lambda - 1\right) + b \lambda - \lambda \left(\lambda - 1\right) - 2 \lambda + \left(\lambda - 1\right) \operatorname{data}{\left[i \right]} + 1}{\lambda \left(\lambda - 1\right)} = 0$$

Discussion

SymPy reduces this Bayesian inference problem to finding roots of the above equation. I suspect that many prevalent numeric problems could be similarly accelerated through a symbolic preprocessing step.

Looking at the equation above it’s clear that this problem can be simplified further. However I like the existing solution because it does not depend on the user possessing any mathematical expertise beyond the ability to describe their mathematical model (the derivatives, log, etc… are generally applicable to this problem). In what other automated ways can SymPy further process computations like this? What are other ways that aren’t in SymPy but could be developed in the future?

I suspect that the problem given here is analytically solvable. To the extent possible SymPy should try to solve these problems. However for the vast number of problems without analytic solutions I suspect there is still a great deal we can do, either by reducing the problem as above or through the mathematically informed selection of numeric algorithms.

Various root finding algorithms are appropriate in different cases. Wikipedia suggests Householder’s Method, a generalization on Newton’s method for scalar systems with known derivatives. Perhaps in cases where SymPy is unable to solve the problem analytically it could select the correct numeric algorithm. Is this a reasonable use case for SymPy?
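As a sketch of that numeric fallback, a basic Newton iteration (the simplest member of the Householder family) on a scalar equation with a known derivative looks like this; `f` and `fprime` here stand in for a lambdified score function and its derivative, which are hypothetical names for this illustration:

```python
def newton(f, fprime, x0, tol=1e-12, maxiter=100):
    # classic Newton-Raphson: repeatedly step by f(x)/f'(x) until the
    # step size falls below the tolerance
    x = x0
    for _ in range(maxiter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")
```

For example, newton(lambda t: t*t - 2, lambda t: 2*t, 1.0) converges to sqrt(2) in a handful of iterations.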

February 05, 2013

MRocklin

Assuming assumptions

SymPy has two assumptions systems called (unimaginatively) “old assumptions” and “new assumptions.” They differ in how they manage mathematical attributes.

Old Assumptions

In old assumptions attributes are bound to variables

>>> x = Symbol('x', positive=True)
>>> y = Symbol('y', positive=True)


These are then composed into expressions.

>>> expr = 2*x + y


And we query these expressions directly

>>> expr.is_positive
True


The expression and the attributes are woven into the same object.

New Assumptions

In new assumptions variables and attributes are maintained separately.

>>> x = Symbol('x')
>>> y = Symbol('y')

>>> facts = Q.positive(x) & Q.positive(y)


The construction of mathematical expressions remains unchanged

>>> expr = 2*x + y


But querying now requires two inputs, both the expression and the facts.

>>> ask(Q.positive(expr), facts)
True


The separation of facts from expressions enables rich logical inference but it requires the management of two separate variables, expr and facts, rather than just one, expr. It is difficult to consistently pass the extra variable through all relevant function calls.

Global assumptions

One solution to the management problem is to keep all facts in a globally accessible collection. This removes the need to pass an extra argument between function calls.

This little-known feature is already accessible in SymPy

>>> # Setup
>>> global_assumptions.add(Q.positive(x))
>>> global_assumptions.add(Q.positive(y))

>>> # Compute in this context
>>> ask(Q.positive(2*x + y))
True


Unfortunately global variables often cause confusion. We will invariably add an experimental fact to the global collection and then forget to clean up, polluting future computations. In this case we need to always remember to clean up after we’re done.

>>> # Cleanup
>>> global_assumptions.remove(Q.positive(x))
>>> global_assumptions.remove(Q.positive(y))


This cleanup step is both crucial and forgettable. We cannot trust ourselves to remember it.

Introducing assuming

Context managers provide the convenience of global variables with side-effect free security. This is accomplished through consistent cleanup.

SymPy now includes assuming, a context manager for mathematical assumptions. Here is an example

>>> facts = Q.positive(x), Q.positive(y)

>>> with assuming(*facts):
...     ask(Q.positive(2*x + y))
True


All ask calls within this block have global-like access to the knowledge Q.positive(x) and Q.positive(y). These calls may be at top level as in the example above or buried deeply within function calls. This arrangement is convenient because we do not need to pass down facts through all function calls. This knowledge is pervasive like a global variable but contained within the with assuming clause.
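The mechanics behind such a context manager are simple: push the facts into the global collection on entry and remove them on exit, with the cleanup guaranteed by a finally block. A rough sketch of the pattern (not SymPy's actual implementation; `GLOBAL_FACTS` is a stand-in for the global assumptions collection):

```python
from contextlib import contextmanager

GLOBAL_FACTS = set()  # stands in for the global assumptions collection

@contextmanager
def assuming(*facts):
    GLOBAL_FACTS.update(facts)
    try:
        yield
    finally:
        # cleanup runs even if the block raises, so no fact leaks out
        GLOBAL_FACTS.difference_update(facts)
```

Inside the with block the facts are globally visible; outside it, they are gone, which is exactly the side-effect-free behavior described above.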

January 28, 2013

Aaron Meurer

Tip for debugging SymPy with PuDB

Usually, when I debug SymPy code with PuDB, I create a script that calls the code, then I put a

import pudb; pudb.set_trace()


in the SymPy library code where I want to start debugging. But this is annoying, first because I have to create the script, and second, because I have to modify the library code, and there’s always the risk of accidentally committing that. Also, if I want to start debugging somewhere else, I have to edit the files and change it.

Well, I just figured out a better way. First, if you haven’t already, add an alias like this in your bash config file (~/.profile or ~/.bashrc): alias pudb='python -m pudb.run'. As of this pull request, this is no longer necessary. A pudb script is installed automatically with PuDB.

This will let you run pudb script.py to debug script.py. Next, start PuDB. It doesn’t matter with what. You can just run touch test.py, and then pudb test.py. It occurred to me that you can just set the breakpoint when starting isympy with PuDB.

Now, press m, and navigate to where in the library code you want to start debugging. It also helps to use / to search the current file and L to jump to a specific line. When you get to the line where you want to start debugging, press b to set a breakpoint. You can do this in multiple places if you want.

Now, you just have to start isympy from within PuDB. Just run pudb bin/isympy, and immediately press c to jump to the interactive prompt. Now, run whatever code you want to debug. When it gets to the breakpoint, PuDB will open, and you can start debugging. If you type c to continue, it will go back to isympy. But the next time you run something that hits the breakpoint, it will open PuDB again.

This trick works because breakpoints are saved to file (at ~/.config/pudb/saved-breakpoints). In fact, if you want, you can just modify that file in the first step. You can edit your saved breakpoints in the bottom right pane of PuDB.

When you are done and you type Ctrl-D PuDB will pop up again, asking if you want to quit. That’s because it was running the whole time, underneath isympy. Just press q. Note that you should avoid pressing q while debugging, or else PuDB will quit, and you will be left with just normal isympy (it won’t break at your breakpoints any more). Actually, if you do this but Ctrl-D still opens the PuDB prompt, you can just press “Restart”, and it should start working again. Note that “Restart” will not actually reset isympy: all your saved variables will still be the same, and any changes to the library code will not be reloaded. To do that, you have to completely exit and start over again.

Of course, there is nothing SymPy specific about this trick. As long as you have a script that acts as an entry point to an interactive console for your application, you can use it. If you just use IPython, you can use something like pudb /bin/ipython (replace /bin/ipython with the output of which ipython).

January 25, 2013

MRocklin

Commutative Unification

LogPy now supports commutative and associative pattern matching on expression trees. This is a standard requirement for computer algebra systems like SymPy but not a traditional feature of logic programming systems.

Pattern-matching in LogPy is expressed by the eq goal. This goal relies on unification to match trees of tuples. Unification is a computational cornerstone of LogPy. Traditionally eq performs exact structural pattern matching. For example

(1, x, (5, y, 7))  matches  (1, (2, 3, 4), (5, 6, 7))

with the following substitution

{x: (2, 3, 4), y: 6}
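A toy structural unifier over tuples shows the idea behind this exact matching (a sketch, not LogPy's implementation; logic variables are represented here as strings starting with '?'):

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, s):
    # follow substitution chains until we reach a non-variable
    # or an unbound variable
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(u, v, s=None):
    """Return a substitution dict unifying u and v, or None on failure."""
    s = {} if s is None else s
    u, v = walk(u, s), walk(v, s)
    if u == v:
        return s
    if is_var(u) or is_var(v):
        var, val = (u, v) if is_var(u) else (v, u)
        s2 = dict(s)
        s2[var] = val
        return s2
    if isinstance(u, tuple) and isinstance(v, tuple) and len(u) == len(v):
        for a, b in zip(u, v):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None
```

Running it on the example above, unify((1, '?x', (5, '?y', 7)), (1, (2, 3, 4), (5, 6, 7))) yields {'?x': (2, 3, 4), '?y': 6}.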

Expression Trees

We traditionally represent both mathematical expressions and computer programs with expression trees. For example $$y * (x + 1)$$ can be visualized as follows

We represent this expression in LogPy with tuples. The head/first element of each tuple is an operation like add or mul. All subsequent elements (the tail) are the arguments/children of that expression.

y * (x + 1) -> (mul, y, (add, x, 1))

Matching Expression Trees

This form is exactly what we use for unification. We could match this pattern against the following expression, treating x and y as wildcard logic variables

(mul, y, (add, x, 1))  matches  (mul, (pow, 2, 10), (add, 3, 1))

with the following substitution

{x: 3, y: (pow, 2, 10)}

But what about the following?

(add, x, 1)  matches?  (add, 1, 3)

This doesn’t unify. While the first (add, add) and second (x, 1) elements can match (if {x: 1}) the third elements (1, 3) will not.

As mathematicians however we know that because add is commutative these expressions should match if we are allowed to rearrange the terms in the tail and match 1 to 1 and x to 3. LogPy doesn’t know this by default. LogPy is not a math library.

Building Commutative Equality

Given the goal seteq for set unification and a goal conso for head-tail pattern matching we build eq_commutative for commutative matching.

Example of seteq

run(0, x, seteq((1, 2, x), (3, 1, 2)))  # seteq matches within sets
(3,)

Example of conso

run(0, head,  conso(head, tail, (1, 2, 3, 4)))
(1,)
run(0, tail,  conso(head, tail, (1, 2, 3, 4)))
((2, 3, 4),)

Given these two it is easy to build eq_commutative

def eq_commutative(u, v):
    operation, utail, vtail = var(), var(), var()
    return lall(conso(operation, utail, u),
                conso(operation, vtail, v),
                commutative(operation),
                seteq(utail, vtail))


That is we require all of the following (lall is logical all).

• u must be of the form (operation, utail....)
• v must be of the form (operation, vtail....). Note that the variable operation must be the same in both expressions.
• The operation must be commutative (operations register themselves beforehand, see example below)
• utail and vtail must unify under set equality.

I am glossing over some details here, like “what about associative matching?” and “how does seteq work?”, but this should give a high-level view of how logic programs are made. Let’s see an example of associative/commutative matching

Example

This is the standard example for commutative matching found in the repository

from logpy import run, var, fact
from logpy.assoccomm import eq_assoccomm as eq
from logpy.assoccomm import commutative, associative

# Define some dummy operations
add = 'add'
mul = 'mul'

# Register that these ops are commutative using the facts system
fact(commutative, mul)
fact(commutative, add)
fact(associative, mul)
fact(associative, add)

# Define some wild variables
x, y = var('x'), var('y')

# Two expressions to match
pattern = (mul, (add, 1, x), y)                # (1 + x) * y
expr    = (mul, 2, (add, 3, 1))                # 2 * (3 + 1)
print run(0, (x,y), eq(pattern, expr))
# ((3, 2),)  meaning one result, with x matched to 3 and y matched to 2


Conclusion

With this, LogPy contains all of the functionality of SymPy’s old unify module, but in a cleaner and much more extensible form.

January 17, 2013

MRocklin

LogPy - Facts and Relations

In my last post I introduced LogPy, a library for logic and relational programming in Python. In this post I show how LogPy can be used as a quick and dirty in-memory database.

Data

As an example we’ll look at the 50 states in the US. We know two things about each state.

1. Is it coastal? For example California (CA) is coastal because it is next to the Pacific Ocean, Arizona (AZ) is not.
2. To which other states is it adjacent? For example California (CA) is adjacent to Oregon (OR), Arizona (AZ) and Nevada (NV).

We express data in LogPy using relations and facts

>>> from logpy import Relation, fact, facts
>>> coastal = Relation()
>>> fact(coastal, 'CA')


Here we have asserted the fact that 'CA' is coastal. Let’s quickly do this for all of the coastal states

>>> coastal_states = 'WA,OR,CA,TX,LA,MS,AL,GA,FL,SC,NC,VA,MD,DE,NJ,NY,CT,RI,MA,ME,NH,AK,HI'
>>> for state in coastal_states.split(','):
...     fact(coastal, state)


Adjacency is only slightly more complex to express. The following code asserts that California (CA) is adjacent to Arizona (AZ) and that California (CA) is adjacent to Oregon (OR).

>>> adjacent = Relation()
>>> fact(adjacent, 'CA', 'AZ')
>>> fact(adjacent, 'CA', 'OR')


Now we need a list of all adjacent pairs of states. Fortunately someone else has already compiled such a list. His data looks like this

AK
AL,MS,TN,GA,FL
AR,MO,TN,MS,LA,TX,OK
AZ,CA,NV,UT,CO,NM
CA,OR,NV,AZ
CO,WY,NE,KS,OK,NM,AZ,UT
...

Each line says that the first element is adjacent to the following ones. So for example Alaska (AK) is adjacent to no states and California (CA) is adjacent to Oregon (OR), Nevada (NV) and Arizona (AZ). We can parse this file and assert the relevant facts with fairly standard Python code

f = open('examples/data/adjacent-states.txt')  # lines like 'CA,OR,NV,AZ'
adjlist = [line.strip().split(',') for line in f]
f.close()

for L in adjlist:                   # ['CA', 'OR', 'NV', 'AZ']
    head, tail = L[0], L[1:]        # 'CA', ['OR', 'NV', 'AZ']
    for state in tail:
        fact(adjacent, head, state) # e.g. 'CA' is adjacent to 'OR',
                                    #      'CA' is adjacent to 'NV', etc...


Queries

Once we have asserted the relevant facts, we can run queries with the logical expressions of LogPy. Recall from the last post that we can use relations to express logical goals and use run to search for cases that satisfy those goals. Here are two simple queries

>>> from logpy import var, run
>>> x = var()
>>> print run(0, x, adjacent('CA', 'NY')) # is California adjacent to New York?
()

>>> print run(0, x, adjacent('CA', x))    # all states next to California
('OR', 'NV', 'AZ')


We can construct more complex queries with multiple goals. In SQL the following queries would require a JOIN

>>> y = var()  # create second variable for complex queries

>>> print run(0, x, adjacent('TX', x),    # all coastal states next to Texas
...                 coastal(x))
('LA',)
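A conjunction of goals behaves like an implicit join. Spelled out in plain Python over a hand-copied subset of the facts (the tuples below are ours, mirroring the adjacency file), the “coastal states next to Texas” query becomes:

```python
# A small hand-copied subset of the facts (ours, for illustration only).
adjacent_facts = {('TX', 'LA'), ('TX', 'OK'), ('TX', 'NM'), ('TX', 'AR')}
coastal_facts = {'LA', 'CA', 'TX'}

# run(0, x, adjacent('TX', x), coastal(x)) conjoins two goals: each fact
# satisfying the first goal is filtered through the second -- a join.
result = tuple(x for (s, x) in sorted(adjacent_facts)
               if s == 'TX' and x in coastal_facts)
print(result)  # ('LA',)
```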

>>> print run(5, x, coastal(y),           # five states that border a coastal state
...                 adjacent(x, y))
('VT', 'AL', 'WV', 'DE', 'MA')

>>> print run(0, x, adjacent('TN', x),    # all states adjacent to Tennessee
...                 adjacent('FL', x))    #        and adjacent to Florida
('GA', 'AL')


Facts and relations are currently indexed by default, yielding relatively fast query times.

Conclusion

LogPy provides a declarative interface to query complex data. Data is stored as facts/tuples and queries are expressed as logical goals. This system is expressive and can match SQL in many respects. The use of Logic programming languages for database queries has roots in Datalog a subset of Prolog designed for databases.

January 14, 2013

MRocklin

Introducing LogPy

LogPy is a library for logic and relational programming in Python. This post contains some introductory examples.

Informative Examples

LogPy enables the expression of relations and the search for values which satisfy them. The following code is the “Hello, world!” of logic programming. It asks for 1 number, x, such that x == 5

>>> from logpy import run, eq, membero, var, conde
>>> x = var()
>>> run(1, x, eq(5, x))
(5,)


Multiple variables and multiple goals can be used simultaneously. The following code asks for a number x such that x == z and z == 3

>>> z = var()
>>> run(1, x, eq(x, z),
...           eq(z, 3))
(3,)


LogPy uses unification, an advanced form of pattern matching, to match within expression trees. The following code asks for a number, x, such that (1, 2) == (1, x) holds.

>>> run(1, x, eq((1, 2), (1, x)))
(2,)


The above examples use eq, a goal to state that two expressions are equal. Other goals exist. membero(item, coll), a goal, states that item is a member of coll, a collection.

The following example uses membero twice to ask for 2 values of x, such that x is a member of (1, 2, 3) and that x is a member of (2, 3, 4).

>>> run(2, x, membero(x, (1, 2, 3)),  # x is a member of (1, 2, 3)
...           membero(x, (2, 3, 4)))  # x is a member of (2, 3, 4)
(2, 3)


We can write other fancier goals too. Here is a list of all prime numbers within 1..10. primo depends on the traditional prime and isprime functions found in sympy.

>>> from logpy.math import primo
>>> run(0, x, (membero, x, (1,2,3,4,5,6,7,8,9,10)),
...           (primo, x))
(3, 2, 7, 5)


Want just a few primes? Here are five numbers that satisfy the primo goal

>>> run(5, x, primo(x))
(2, 3, 5, 7, 11)


Relations

We often want to state and then query data. Logic programming represents data as a set of facts and represents queries with logical goals. In the following examples we assert some facts about the Simpsons family, construct queries through logical goals and then run the queries to obtain results.

The following code defines a parent relation and uses it to state who fathered whom.

>>> from logpy import Relation, facts
>>> parent = Relation()
>>> facts(parent, ('Homer', 'Bart'),
...               ('Homer', 'Lisa'),
...               ('Abe',  'Homer'))


We ask some questions using the parent relation as a goal constructor. Who is Bart’s father?

>>> run(1, x, parent(x, 'Bart'))  # one x such that x is a parent of Bart
('Homer',)

>>> run(2, x, parent('Homer', x)) # two xs such that Homer is a parent of x
('Lisa', 'Bart')


We can use intermediate variables for more complex queries. Who is Bart’s grandfather?

>>> y = var()
>>> run(1, x, parent(x, y),
...           parent(y, 'Bart'))
('Abe',)


We can express the grandfather relationship separately. In this example we use conde, a goal constructor for logical and and or.

>>> def grandparent(x, z):
...     y = var()
...     return conde((parent(x, y), parent(y, z)))

>>> run(1, x, grandparent(x, 'Bart'))
('Abe',)


grandparent demonstrates that we can construct complex relations programmatically. How would you define sibling? How about uncle or aunt? How about descendant?
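One possible answer to the sibling exercise, sketched in plain Python over the same facts (our stand-in for the relational version; with LogPy one would conjoin parent(p, x) and parent(p, y) under conde):

```python
# The facts asserted above, as plain tuples.
parent_facts = {('Homer', 'Bart'), ('Homer', 'Lisa'), ('Abe', 'Homer')}

def siblings(child):
    """People who share a parent with `child`, excluding `child` itself."""
    parents = {p for (p, c) in parent_facts if c == child}
    return sorted(c for (p, c) in parent_facts if p in parents and c != child)

print(siblings('Bart'))  # ['Lisa']
```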

If you’d like to play with LogPy you can install it with pip or easy_install using

pip install logic

or clone it directly from github

git clone git@github.com:logpy/logpy.git

Source is available at http://github.com/logpy/logpy/, design input and contributions are much appreciated.

Logic Programming in General

Logic and relational programming are making a comeback. They were popular in the 80s, died during the AI dark ages, and have recently begun a resurgence in the functional programming community. Logic programs write music, search databases, write numeric algorithms, and build testing frameworks. The paradigm is expressive for a wide class of problems.

The design of LogPy is based on miniKanren, a simple and powerful implementation in Scheme, popularized through the core.logic Clojure library.

January 09, 2013

Brian Granger

Personal Reflections on Features and Scope

In a previous blog post, I talked about the problem of ever expanding feature sets in open source software projects.  I discussed the hidden  costs and liabilities of adding new features and proposed practical ways of tackling these problems.  However, I think that story is incomplete.  Of course open source projects will want to add features, even some with massive costs and liabilities, but how should that happen?  How should projects decide which features to add?  What criteria should be used?  To answer these questions, I want to describe a few of my own experiences in dealing with features and scope in IPython and PyZMQ.

First, IPython.  Anyone who is familiar with IPython will probably read my previous blog post and say, “wait, hasn’t IPython experienced a massive amount of feature creep over the last few years?  It started out as a simple terminal based interactive shell, but has grown to include parallel computing, a web application and a GUI application.  The IPython developers, including yourself, have shown little, if any, restraint in adding features.”

Back in 2004, IPython was the 3-year-old child of Fernando Perez.  At that point, it was already a popular, enhanced interactive Python shell.  I had started to work on parallel computing libraries for Python, when Fernando visited me at Santa Clara University (we had been classmates in graduate school starting in 1996).  We stayed up until 3 am talking about Python’s rapidly evolving ecosystem of scientific computing libraries, interactive computing environments and the future of IPython and my parallel computing libraries.  IPython was a fantastic tool, but both of us wanted more.  We wanted a web based IPython Notebook.  We wanted integrated parallel computing tools.  We began to see that all of these things required the same architecture: a stateful computational engine, the IPython Kernel, that could run user code and send back results over a network.  Once that existed, everything else could be layered on top, even the terminal based IPython.  The vision that emerged that night was specific and, while ambitious, was ultimately finite.  As Fernando describes in this blog post, it took us ~6 years and multiple attempts to build this architecture.  Now, we have a terminal, QtConsole, web-based notebook and parallel computing library all built on top of a common IPython Kernel and message specification.  While these things may look like feature creep from the outside, they have been part of a deliberate and calculated plan.

What I have learned from this experience is that it is absolutely critical for the core developers of an open source project to consciously and deliberately set the scope and vision for the project.  This vision can even be ambitious.  Once that scope is set, choosing which features to add becomes easy: to first order, you add features within that scope.  That doesn’t mean you don’t count the cost of those features or that you add all possible features within the scope.  The scope provides an upper bound on the feature set.  So why does IPython have parallel computing tools?  Because we set the scope early on to include that.

Second, PyZMQ.  As we developed IPython’s architecture, we realized that we needed better networking/messaging libraries.  This led to the creation of PyZMQ in 2010, which is a set of Python bindings for ZeroMQ.  Notice the scope of PyZMQ: Python bindings for ZeroMQ.  Fast forward to Spring of 2012.  In the course of working on the IPython Notebook, I had written some additional code for building PyZMQ based web applications: a module for simple, PyZMQ-based RPC and a module for running Tornado event handlers in a scalable and distributed manner using PyZMQ.  I submitted two pull requests to include these modules in PyZMQ.  One of them was quickly merged and the other languished in code review for almost a year.  Min began to sense that these modules, while useful, were not a good match for inclusion in PyZMQ.  Their API was very unstable and they hadn’t been well tested.  They would require a much faster release cycle compared to the rest of PyZMQ, which was very stable and well tested.  This Christmas break, I finally woke up to the real problem.  These modules were outside the scope of PyZMQ.  They involved an implicit and undiscussed increase in scope from “Python bindings for ZeroMQ” to “A general place for ZeroMQ based tools in Python.”  Once I realized this, Min and I pulled them out of PyZMQ and created separate projects for them.  Again, the important thing is to set the scope of a project.

Features and Scope in Open Source Software

Follow up: I have created another blog post to clarify some of these issues here.

The IPython project, through UC Berkeley and Cal Poly San Luis Obispo, just received a $1.15 million grant from the Alfred P. Sloan Foundation to develop the IPython Notebook over a two year period. More details about this grant can be found on the IPython website. This is really exciting for us because, so far, we have mostly developed IPython in our spare time. But I think there is also a potential danger here. The danger is that we will add lots of new features. What, you say, lots of features will endanger IPython? What else are you going to do with a million dollars if you are not going to add lots of new features? The answer is simple: we are going to add as few features as possible and knock each of them out of the park. The future of the project depends on this.

This is a topic that I have been thinking about a lot lately: how do open source projects decide which features to implement? Most active open source projects I am involved in see a continual stream of new features. Just hop onto the GitHub pages for SymPy or IPython and watch the activity. Anyone with the right skill set can fork a project on GitHub and submit a pull request within a few hours. Amazingly, this is happening all the time; apparently people love to code. While each new feature is an asset for the project, it also brings a cost, or liability, with it. If a project ignores those costs, it can have long term, detrimental effects on the project. What are these liabilities and costs associated with new features?

• Each new feature adds complexity to the code base. Complexity makes a code base less hackable, maintainable, extensible.
• Each new feature increases the “bug surface” of the project. When a feature also adds complexity, those bugs become harder to find and fix.
• Each new feature requires documentation to be written and maintained.
• Each new feature requires support over email or IRC.
• Endless feature expansion, or feature creep, requires developers to specialize. They can’t follow the entire project, so they have to focus on a subset that can fit into their brain and schedule.
• Each new feature has to be tested on a wide variety of platforms (Linux, Mac, Windows) and environments (PyPy, Python 2, Python 3).
• Each new feature adds complexity to the user experience. Sometimes it’s the documentation, other times the UI or configuration options.
• When you spend time on one feature, another feature or bug fix didn’t get worked on. If you didn’t prioritize things beforehand, you just spent time on something less important to your users. Either that or you did shoddy work while trying to do it all.
• Features multiply like bunnies. How many times have you heard, “wow, that new feature is really cool, could you make it do X as well?”
• Features are easy to add, difficult to remove. Once you add a feature, you are stuck with the costs and liabilities.

For a typical open source project, what percentage of features get used regularly by most users? 15%? 40%? Let’s be really optimistic and say that number is 60%. That means that developers are spending 40% of their time and effort on features that don’t really get used. Ouch.

Then why do open source projects keep adding features without counting the cost? I think there are a number of factors that lead to this, but one in particular comes to my mind. It is hard to say no. When an end user submits a feature request over email or on GitHub issues, it is hard to tell them, “great idea, but we are not doing that.” It is even more difficult to say no after someone submits a pull request that implements a new feature. It is difficult to build a vibrant project community if you are always saying no in these contexts. Clearly, we need a better way of limiting feature expansion in open source projects. What can we do to better evaluate the hidden costs of adding new features so we can make informed, strategic decisions about which features to add?

Here are some ideas that have emerged out of my recent reading and conversations.

• Create a wiki page for your project, where you list all of the features you are not going to implement. Publicize this list, discuss it and make it an important part of the development workflow. Another way of phrasing this is to decide on a finite scope for the project. When you are going through this exercise, come up with an initial scope and then force yourself to reduce it.
• Make features fight hard to be accepted and implemented. Communicate to the community and developers that the default answer to feature requests is no (it’s not personal!) and don’t even consider implementation until much of the community is crying “we absolutely must have this.” Even then, you don’t have to say yes.
• Create a workflow that separates feature requests from other tickets/issues. When people submit new feature requests, encourage discussion, but don’t automatically promote the feature to the project’s todo list. Then you can promote them, as needed, to the project’s todo list in an informed and prioritized manner.
• When new feature requests appear, discuss the specific costs and liabilities associated with the feature. Build this thinking into your development DNA.
• Communicate to the community and its developers why it is important to fight against feature expansion. Focus on the benefits of waging this war: smaller, simpler code base, fewer bugs, more time to focus on important features, easier to support, etc.
• Remove features that have too great a cost or that few users actually use. Maybe even create a special exception you can raise (FeatureReductionWarning?) to help people transition away from them.
• Refactor the codebase to reduce complexity. While this doesn’t directly reduce the number of features, it can mitigate the cost of existing and future features. Extra bonus points if you can implement a new feature while dramatically reducing the complexity of the code base.

• Improve testing. Again, this is mitigation.

As you discuss and evaluate features, here are some questions you can ask yourself and the community:

• What fraction of your user base will use the feature? How often will they use it? If it won’t be used by most of your users, just say no.
• Can the feature be added as a third party plugin or library? This is especially useful if the new feature would increase the overall scope of the project, but make a great standalone project.
• How difficult will it be to test, debug, document, and maintain the feature? What fraction of your development team is capable of, or interested in, doing this work? If the maintenance is huge and only one person is willing to do it, it is time to rethink it.
• Can you implement the functionality in a more limited, but much simpler manner? Developers absolutely love to implement features in the most general way possible. It requires developer discipline and focus to resist this temptation.
• One way that developers over-engineer things is by making every conceivable thing configurable. Can you simplify the feature by removing configurability and just choosing great defaults?

One thoughtful discussion of these issues is in the book Getting Real, by some of the folks at 37signals. They propose something quite radical for handling feature requests in commercial web applications. Here is what they say: “How do you manage them? You don’t. Just read them and then throw them away.” Their experience is that the important features will keep coming up. When this happens, you know they are important and you don’t have to write them down to keep track of them. This is definitely my experience in developing the IPython Notebook. The most important features, the ones that I am going to spend time on this year, are probably not written down anywhere, but everyone in the community is discussing them and couldn’t possibly forget them.

So why on earth do we currently have 177 open new feature issues (we call them “enhancements”) on IPython’s GitHub site? In an open source project, I don’t think it makes sense to literally throw away feature requests; sometimes the ensuing discussion is valuable and worth preserving. But what about allowing the discussion to occur, for example in the context of a GitHub issue, but then closing the issue? If someone wants to re-open the feature request at a later time to voice their support, they should be encouraged to do that. But again, once discussion stops, the issue should be re-closed. As this process repeats itself, the community is essentially voting for the features they want. This would also dramatically reduce the number of open issues, which helps developers to better manage the active work on the project.

I don’t think this is the only way to manage feature requests intelligently in an open source project. I would love to hear other ideas. How are you managing these things in your own projects? I realize that I am far from the first person to write about these things. What other resources do you know of that address these problems?

January 03, 2013

Fabian Pedregosa

Memory plots with memory_profiler

Besides performing a line-by-line analysis of memory consumption, memory_profiler exposes some functions that allow retrieving the memory consumption of a function in real time, making it possible e.g. to visualize the memory consumption of a given function over time.

The function to be used is memory_usage. The first argument specifies what code is to be monitored. This can represent either an external process or a Python function. In the case of an external process the first argument is an integer representing its process identifier (PID). In the case of a Python function, we need to pass the function and its arguments to memory_usage.

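The way memory_usage dispatches on that first argument can be sketched in plain Python (run_spec is our hypothetical stand-in, not part of memory_profiler): a tuple is unpacked into a function plus its arguments, while a bare int would be treated as a PID.

```python
def run_spec(spec):
    """Mimic how memory_usage interprets its first argument (a sketch)."""
    if isinstance(spec, tuple):
        f = spec[0]
        args = spec[1] if len(spec) > 1 else ()
        kw = spec[2] if len(spec) > 2 else {}
        return f(*args, **kw)        # the (f, args, kw) convention
    return spec                      # an int would name an external process

print(run_spec((pow, (2, 10))))  # 1024
```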
We do this by passing the tuple (f, args, kw) that specifies the function, its positional arguments as a tuple and its keyword arguments as a dictionary, respectively. This will then be executed by memory_usage as f(*args, **kw). Let’s see this with an example. Take as function NumPy’s pseudo-inverse function. Thus f = numpy.linalg.pinv and f takes one positional argument (the matrix to be inverted), so args = (a,) where a is the matrix to be inverted. Note that args must be a tuple consisting of the different arguments, thus the parentheses around a. The third item is a dictionary kw specifying the keyword arguments. Here kw is optional and is omitted.

>>> from memory_profiler import memory_usage
>>> import numpy as np
>>> # create a random matrix
>>> a = np.random.randn(500, 500)
>>> mem_usage = memory_usage((np.linalg.pinv, (a,)), interval=.01)
>>> print(mem_usage)
[57.02734375, 55.0234375, 57.078125, ...]


This has given me a list of memory measurements, one per time interval (t0, t0 + .01, t0 + .02, ...). Now I can use that to, for example, plot the memory consumption as a function of time:

>>> import pylab as pl
>>> pl.plot(np.arange(len(mem_usage)) * .01, mem_usage, label='linalg.pinv')
>>> pl.xlabel('Time (in seconds)')
>>> pl.ylabel('Memory consumption (in MB)')
>>> pl.show()


This will give the memory usage of a single function across time, which might be interesting for example to detect temporaries that would be created during the execution.

Another use case for memory_usage would be to see how memory behaves as input data gets bigger. In this case we are interested in memory as a function of the input data. One obvious way we can do this is by calling the same function each time with a different input and taking as memory consumption the maximum consumption over time. This way we will have a memory usage for each input.

>>> for i in range(1, 5):
...     A = np.random.randn(100 * i, 100 * i)
...     mem_usage = memory_usage((np.linalg.pinv, (A,)))
...     print max(mem_usage)
29.22
30.10
40.66
53.96


It is now possible to plot these results as a function of the dimensions.

import numpy as np
import pylab as pl
from memory_profiler import memory_usage

dims = np.linspace(100, 1000, 10)
pinv_mem = np.zeros(dims.size)

for i_dim, k in enumerate(dims):
    x = np.random.randn(int(k), int(k))
    tmp = memory_usage((np.linalg.pinv, (x,)), interval=.01)
    pinv_mem[i_dim] = np.max(tmp)

pl.plot(dims, pinv_mem, label='np.linalg.pinv')
pl.ylabel('Memory (in MB)')
pl.xlabel('Dimension of the square matrix')
pl.legend(loc='upper left')
pl.axis('tight')
pl.show()


January 01, 2013

Aaron Meurer

Emacs: One year later

As readers of this blog may remember, back in 2011, I decided to move to a command-line based editor. For roughly two weeks in December, 2011, I exclusively used Vim, and for the same amount of time in January, 2012, I used exclusively Emacs. I had used a little of each editor in the past, but this was my first time using them to do true editing work. My experiences are chronicled in my blog posts (parts 1, 2, 3, and a 7-months-later follow up). To summarize, I decided to use Emacs, as I found it to be much more intuitive and much more user-friendly.

Today, January 1, marks the one-year point of my using Emacs as my sole text editor, with some exceptions (notably, I’m currently writing this blog post in the browser). So I’d like to make some observations:

• Either one of these editors (Vim or Emacs) is going to really suck unless you are willing to make a serious investment in customizing them and installing nice addons. For the second point, Emacs has an advantage, because the philosophy of Vim is to be barebones whereas the philosophy of Emacs is to be featureful, so that in particular many things that were once addons of Emacs are now included in the standard installation.
For customization, on the one hand, Emacs is easier, because it has a nice interface (M-x customize), but on the other hand, Vim’s scripting language is much easier to hack on than Emacs Lisp (I still can’t code in Lisp to save my life; it’s a very challenging programming language). But my point here is that neither has really great defaults. For example, in Emacs, M-space is bound to just-one-space, which is great for programming. What it does is remove all spaces around the cursor, except for one. But to be really useful, it also should include newlines. It doesn’t do this by default. Rather, you have to call it with a negative argument. So to be really useful, you have to add

(defun just-one-space-with-newline ()
  "Call just-one-space with a negative argument"
  (interactive)
  (just-one-space -1))

(global-set-key (kbd "M-SPC") 'just-one-space-with-newline)


to your .emacs file.

• Emacs has great features, but I always have to look them up. Or rather, I have to look up the keyboard shortcuts for them. I only have the keyboard shortcuts memorized for the things I do every day. I even ended up forgetting really important ones, like M-w (the Emacs version of copy). And if a feature involves several keystrokes to access, forget about it (for example, rectangular selection, or any features of special modes). If I use a new mode, e.g., for some file type that I rarely edit (like HTML), I might as well not have any of the features, other than the syntax highlighting, because I either don’t know what they are, or even if I know that they should exist (like automatic tag completion for HTML), I have no idea how to access them. There’s really something to be said about GUI editors, which give these things to users in a way that doesn’t require memorizing anything. Perhaps I should try to use the menu more. Or maybe authors of addons should aim to make features require as little cognitive user interaction as possible (such as the excellent auto-complete-mode I mentioned in part 3). I mention this because it is one of the things I complained about with Vim, that the keybindings were too hard to memorize. Of course, the difference with Vim is that one has to memorize keybindings to do even the most basic of editing tasks, whereas with Emacs one can always fall back to more natural things like Shift-Arrow Key to select text or Delete to delete the character under the cursor (and yes, I know you can rebind this stuff in Vim; I refer you to the previous bullet point).

• I mentioned at the end of part 3 that Vim might still be useful to learn, as vi is available literally anywhere that you have POSIX. I honestly don’t think I would be able to use vi or vim if I had to, customization or no, unless I had my keyboard cheat sheet and a decent amount of time. If I’m stuck on a barebones system and I can’t do anything about it, I’ll use nano/pico before I use vi. It’s not that I hate vi. I just can’t do anything with it. It is the same to me now as it was before I used it in-depth. I have forgotten all the keyboard shortcuts, except for ESC and i.

• I don’t use emacsclient any more. Ever since I got my new retina MacBook Pro, I don’t need it any more, because with the solid state drive starting Emacs from scratch is instantaneous. I’m glad to get rid of it, because it had some seriously annoying glitches.

• Add alias e=emacs to your Bash config file (.profile or .bashrc). It makes life much easier. “emacs” is not an easy word to type, at least on QWERTY keyboards.

• I still feel like I am not nearly as efficient in Emacs as I could be. On the one hand, I know there are built-in features (like rectangular selection) that I do not take advantage of enough. I have been a bit lazy with customization: there are a handful of things that I do often that require several keystrokes, but I still haven’t created custom keyboard shortcuts for (off the top of my head: copying and pasting to/from the Mac OS X clipboard, and rigidly indenting/dedenting a block of text (C-u 4 C-x TAB, actually C-c u 4 C-x TAB, since I did the sensible thing and rebound C-u to clear to the previous newline, and bound universal-argument to C-c u)). I feel as if, were I to watch someone who has used Emacs for a long time, I would learn a lot of tricks.

• I really should learn Emacs Lisp. There are a lot of little customizations that I would like to make, but they are really niche, and can only be done programmatically. But who has the time to learn a completely new programming language (plus a whole library, as just knowing Lisp is useless if you don’t know the proper Emacs functions and variables and coding styles)?

• I’ve still not found a good visual browser for jumping to function definitions in a file (mostly Python function definitions, but also other kinds of headers for other kinds of files). The best I’ve found is imenu. If you know of anything, please let me know. One thing I really liked about Vim was the taglist extension, which did this perfectly (thanks to commenter Scott for pointing it out to me). I’ve been told that CEDET has something like this, but every time I try to install it, I run into some issues that just seem like way too much work (I don’t remember what they are, it won’t compile or something, or maybe it just wants to do way too much and I can’t figure out how to disable everything except for the parts I want).

• If you ever code in C, add the following to your Makefile

check-syntax:
	$(CC) -o nul $(FLAGS) -S $(CHK_SOURCES)


(and if you don’t use a Makefile, start using one now). This is assuming you have CC and FLAGS defined at the top (generally to something like cc and -Wall, respectively). Also, add the following to your .emacs

;; ===== Turn on flymake-mode ====

(add-hook 'c-mode-common-hook 'turn-on-flymake)
(defun turn-on-flymake ()
  "Force flymake-mode on. For use in hooks."
  (interactive)
  (flymake-mode 1))

(add-hook 'c-mode-common-hook 'flymake-keyboard-shortcuts)
(defun flymake-keyboard-shortcuts ()
  "Add keyboard shortcuts for flymake goto next/prev error."
  (interactive)
  (local-set-key "\M-n" 'flymake-goto-next-error)
  (local-set-key "\M-p" 'flymake-goto-prev-error))


The last part adds the useful keyboard shortcuts M-n and M-p to move between errors. Now, errors in your C code will show up automatically as you type. If you use the command line version of emacs like I do, and not the GUI version, you’ll also need to install the flymake-cursor module, which makes the errors show up in the mode line, since otherwise it tries to use mouse popups. You can change the colors using M-x customize-face (search for “flymake”).

• I never got flymake to work with LaTeX. Does anyone know how to do it? It seems it is hardcoded to use MikTeX, the Windows version of LaTeX. I found some stuff, but none of it worked.

Actually, what I really would like is not syntax checking (I rarely make syntax mistakes in LaTeX any more), but rather something that automatically builds the PDF constantly as I type. That way, I can just look over at the PDF as I am writing (I use an external monitor for this. I highly recommend it if you use LaTeX, especially one of those monitors that swivels to portrait mode).

• If you use Mac OS X, you can use the very excellent KeyRemap4MacBook program to make regular Mac OS X programs act more like Emacs. Mac OS X already has many Emacs shortcuts built in (like C-a, C-e, etc.), but that only works in Cocoa apps, and it doesn’t include any meta key shortcuts. This lets you use additional shortcuts literally everywhere (don’t worry, it automatically disables them in the Terminal), including an emulator for C-space and some C-x commands (like mapping C-x C-s to Command-s). It doesn’t work on context-sensitive shortcuts, unfortunately, unless the operating system already supports the action with another keyboard shortcut (e.g., it can map M-f to Option-right arrow). For example, it can’t enable moving between paragraphs with C-S-{ and C-S-}. If anyone knows how to do that, let me know.
• For about a month this summer, I had to use a Linux laptop, because my Mac broke and my new Mac took a month to arrive (the downside to ordering a new computer immediately after it is announced by Apple). At this point, having saved all my customizations to GitHub really helped a lot. I created a new branch for the Linux computer (because several things in my customizations were Mac specific), and just symlinked the files I wanted. A hint I can give to people using Linux is to use Konsole. The Gnome terminal sucks. One thing I never figured out is how to make Konsole (or any other terminal, for that matter) send Control-Shift shortcuts to Emacs (see http://superuser.com/q/439961/39697). I don’t use Linux any more at the moment, but if anyone knows what was going on there, add an answer to that question.
• In part 3 I mentioned that predictive mode was cool, but not very useful. What it does is basically add tab completion for every word in the English language. Actually, I’ve found using auto-complete-mode even when editing text (or LaTeX) to be very useful. Unlike predictive mode, it only guesses words that you’ve already typed (it turns out that you tend to type the same words over and over, and doubly so if those words are LaTeX math commands). Also, predictive mode has a set order of words, which supposedly helps to use it with muscle memory, whereas auto-complete-mode tries to learn what words you are more likely to use based on some basic statistical machine learning. Also, auto-complete-mode has a much better visual UI and smarter defaults than predictive mode. The result is that it’s actually quite useful and makes typing plain text, as well as LaTeX (actually, pretty much anything, as long as you tend to use the same words repeatedly) much faster. I recommend enabling auto-complete-mode almost everywhere using hooks, like
(add-hook 'latex-mode-hook 'auto-complete-mode)
(add-hook 'LaTeX-mode-hook 'auto-complete-mode)
(add-hook 'prog-mode-hook 'auto-complete-mode)
;; etc.

• At the end of the day, I’m pretty happy with Emacs. I’ve managed to fix most of the things that make it annoying, and it is orders of magnitude more powerful than any GUI editor or IDE I’ve ever seen, especially at just basic text editing, which is the most important thing (I can always use another program for other things, like debugging or whatever). The editor uses the basic shortcuts that I am used to, and is quite efficient to write in. Extensions like auto-complete-mode make using it much faster, though I could use some more extensions to make it even better (namely, a better isearch and a better imenu). Regarding Vim vs. Emacs, I’d like to quote something I said back in my first blog post about Vim over a year ago:

Vim is great for text editing, but not so hot for text writing (unless you always write text perfectly, so that you never need to leave insert mode until you are done typing). Just the simple act of deleting a mistyped word (yes, word, that happens a lot when you are a decently fast touch typist) takes several keystrokes, when it should in my opinion only take one (two if you count the meta-key).

Needless to say, I find Emacs to be great for both text editing and text writing.

• December 30, 2012

Aaron Meurer

2012 in review

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 20,000 views in 2012. If each view were a film, this blog would power 5 Film Festivals

Click here to see the complete report.

December 11, 2012

MRocklin

Statistical Simplification

Lawrence Leemis, a statistician at William & Mary, recently published a wonderful interactive visualization on the reduction relationships of statistical distributions. (found via John Cook’s blog)

This excites me because it touches on one of my favorite topics

How do we reusably encode expert knowledge into computational systems?

The Big Challenge

Correct use of mathematical information can accelerate important computations by several orders of magnitude. Unfortunately the people who know the mathematics are not always the ones doing the computation. This results in substantial waste.

How do we integrate expert mathematical knowledge everywhere? One solution is to collaborate more. While collaboration is generally good, it doesn’t scale well. As problems become more complex it is more difficult to find all of the necessary experts, especially for smaller, relatively unimportant projects. Also, collaboration rarely results in reusable infrastructure. Statistical chemistry projects are rarely applicable to statistical biology problems despite their shared interest in statistics.

One Solution - Multi-Stage Compilation

We could write each expert’s knowledge into a single project and then connect many such projects into a multi-stage compiler. At each stage we simplify the expression with the knowledge relevant at that stage. We must create a transformation between each pair of connecting stages. Ideally the conceptual distance between connected stages is small and so these transformations are easy.

This isn’t clearly the right solution. It is difficult to chain many small projects together and end up with efficient code. You need to find the right sequence of intermediate layers that are able to communicate mathematical information down to the lowest level of computational code.

Relevance to SymPy.stats

SymPy.stats endeavors to be a transformation in such a toolchain. It converts stochastic expressions into integral expressions.

The surrounding infrastructure looks like this

When SymPy expressions are imbued with random variables they form stochastic expressions. SymPy.stats transforms these into integral expressions, which are then converted through a variety of methods, either numeric (like Monte Carlo) or symbolic.

Each stage within this pipeline presents us with the opportunity to simplify the expression with knowledge relevant to that stage. For example at the input and output SymPy Expr layers we make algebraic and trigonometric simplifications like the following

X + X -> 2*X
sin(x)**2 + cos(x)**2 -> 1 
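Both of these Expr-layer rewrites can be checked directly in SymPy; a quick sketch:

```python
from sympy import symbols, sin, cos, simplify

x, X = symbols('x X')

# The algebraic rewrite happens automatically on construction
assert X + X == 2*X

# The trigonometric identity needs an explicit simplify() call
assert simplify(sin(x)**2 + cos(x)**2) == 1
```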

At the integration stage we might separate multivariate integrals if possible or use integration by parts.

Notice that there is no such simplification self-loop at the Stochastic Expr node. This is where the information in Leemis’ chart would fit.

A Failing of sympy.stats

Currently sympy.stats does not simplify stochastic expressions with expert knowledge; it converts directly from stochastic expressions to integral expressions without first applying known simplifications like those encoded in Leemis’ chart. This causes some fairly embarrassing failures

In [1]: from sympy.stats import *

In [2]: X = Normal('X', 0, 1)  # A standard normal random variable

In [3]: density(X**2)
Out[3]:
<< failure: unevaluated integral >>


Any statistician could tell you that the expression X**2 has a Chi Squared distribution which has a simple and well understood density. This relation is commonly known and commonly occurs in practice.

Because sympy.stats doesn’t know this it blindly takes the expression density(X**2) and converts it directly into an integral. The resulting integral is difficult and stumps the integration routines.* In this case knowing basic statistics would have turned an impossible problem into a trivial one.
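For comparison, the closed form that the missing simplification would produce is already available if you construct the chi-squared distribution directly; a sketch using sympy.stats' ChiSquared:

```python
from sympy import Symbol, exp, sqrt, pi, simplify
from sympy.stats import ChiSquared, density

y = Symbol('y', positive=True)

# X**2 for X ~ Normal(0, 1) follows ChiSquared(1), whose density has
# the simple, well-understood closed form exp(-y/2)/sqrt(2*pi*y)
d = density(ChiSquared('Y', 1))(y)
assert simplify(d - exp(-y/2)/sqrt(2*pi*y)) == 0
```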

Future work

We should encode relations on distributions into SymPy. The knowledge in Leemis’s chart could be written down as a knowledgebase of known transformations. Transformations like the following could solve our immediate problem.

Normal(0, 1) -> StandardNormal()
StandardNormal()**2 -> ChiSquared(1)
StandardNormal()**2 + ChiSquared(n) -> ChiSquared(n+1)


Each stage in the compilation pipeline presents us with an opportunity to apply expert knowledge. The Stochastic Expr stage is such an opportunity of which we are not currently taking advantage.

Leemis’s chart is written declaratively, highlighting which logical transformations are possible under which conditions. The new modules on unification and strategies should provide all of the necessary infrastructure to translate Leemis’ chart into functioning code. Writing a minimal simplification scheme for the above problem might be as simple as

#    rewriterule(from-pattern, to-pattern, wilds)
p1 = rewriterule(Normal(name, 0, 1), StandardNormal(name), wilds=(name,))
p2 = rewriterule(StandardNormal(name)**2, ChiSquared(name, 1), wilds=(name,))
p3 = rewriterule(StandardNormal(name)**2 + ChiSquared(name, n),
                 ChiSquared(name, n+1), wilds=(name, n))

statsimp = exhaust(bottom_up(multiplex(p1, p2, p3)))
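The StandardNormal and ChiSquared patterns above are hypothetical, but the rewriterule/strategies machinery itself exists in SymPy; a minimal sanity check on scalar expressions:

```python
from sympy import Symbol, sin, cos, exp
from sympy.strategies import exhaust
from sympy.strategies.traverse import bottom_up
from sympy.unify.rewrite import rewriterule

x = Symbol('x')
a = Symbol('a')  # plays the role of a wild/variable symbol

# Rewrite sin(a) -> cos(a) for any a
rl = rewriterule(sin(a), cos(a), (a,))

# rewriterule yields zero or more rewrites; take the first, or keep expr
rule = exhaust(bottom_up(lambda expr: next(rl(expr), expr)))

assert rule(exp(sin(x))) == exp(cos(x))
```

These are the non-branching strategies; the statsimp sketch above would use their branching counterparts from sympy.strategies.branch.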


If anyone is interested in this I’d be happy to help out. This is the sort of project that really excites me but that I can’t currently justify time-wise.

December 07, 2012

Fabian Pedregosa

Singular Value Decomposition in SciPy

SciPy contains two methods to compute the singular value decomposition (SVD) of a matrix: scipy.linalg.svd and scipy.sparse.linalg.svds. In this post I'll compare both methods for the task of computing the full SVD of a large dense matrix.

The first method, scipy.linalg.svd, is perhaps the best known and uses the linear algebra library LAPACK to handle the computations. It implements the Golub-Kahan-Reinsch algorithm [1], which is accurate and highly efficient, with a cost of O(n^3) floating-point operations [2].

The second method is scipy.sparse.linalg.svds and despite its name it also works fine for dense arrays. This implementation is based on the ARPACK library and consists of an iterative procedure that finds the SVD decomposition by reducing the problem to an eigendecomposition of the array dot(X.T, X). This method is usually very effective when the input matrix X is sparse or when only the largest singular values are required. There are other SVD solvers that I did not consider, such as sparsesvd or pysparse.jdsym, but my points about the sparse solver probably hold for those packages too, since they both implement iterative algorithms based on the same principles.
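The reduction svds relies on is easy to verify numerically: the eigenvalues of dot(X.T, X) are the squared singular values of X. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))

s = np.linalg.svd(X, compute_uv=False)  # singular values of X
ev = np.linalg.eigvalsh(X.T @ X)        # eigenvalues of X^T X

# Squared singular values match the eigenvalues of X^T X
assert np.allclose(np.sort(s)**2, np.sort(ev))
```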

When the input matrix is dense and all the singular values are required, the first method is usually more efficient. To support this statement I've created a little benchmark: timings for both methods as a function of the size of the matrices. Notice that we are in a case that is clearly favorable to linalg.svd: after all, sparse.linalg.svds was not created with this setting in mind; it was created for sparse matrices or dense matrices with some special structure. We will see however that even in this setting it has interesting advantages.

I'll create random square matrices with different sizes and plot the timings for both methods. For the benchmarks I used SciPy v0.12 linked against Intel Math Kernel Library v11. Both methods are single-threaded (I had to set OMP_NUM_THREADS=1 so that MKL does not try to parallelize the computations). [code]
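The [code] link is from the original post; a minimal sketch of such a benchmark (with small sizes here, and svds restricted to a few singular values) might look like this:

```python
import time

import numpy as np
from scipy.linalg import svd
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
for n in (100, 200, 400):
    X = rng.standard_normal((n, n))

    t0 = time.perf_counter()
    s_full = svd(X, compute_uv=False)  # LAPACK: all n singular values
    t_lapack = time.perf_counter() - t0

    t0 = time.perf_counter()
    _, s_top, _ = svds(X, k=6)         # ARPACK: largest k only (k < n)
    t_arpack = time.perf_counter() - t0

    print(f"n={n}: LAPACK {t_lapack:.3f}s, ARPACK(k=6) {t_arpack:.3f}s")
```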

Lower timings are better, so this gives scipy.linalg.svd as clear winner. However, this is just part of the story. What this graph doesn't show is that this method is winning at the price of allocating a huge amount of memory for temporary computations. If we now plot the memory consumption for both methods under the same settings, the story is completely different. [code]

The memory requirements of scipy.linalg.svd scale with the number of dimensions, while for the sparse version the amount of allocated memory is constant. Notice that we are measuring the amount of total memory used, so it is natural to see a slight increase in memory consumption, since the input matrix is bigger on each iteration.

For example, in my applications I need to compute the SVD of a matrix for which the needed workspace does not fit in memory. In cases like this, the sparse algorithm (sparse.linalg.svds) can come in handy: the timing is just a factor worse (but I can easily parallelize jobs) and the memory requirements for this method are peanuts compared to the dense version.

1. Calculating the singular values and pseudo-inverse of a matrix, Golub, Gene H., Kahan, William, 1965, JSTOR

2. A Survey of Singular Value Decomposition Methods and Performance Comparison of Some Available Serial Codes, Plassman, Gerald E. 2005 PDF

December 03, 2012

MRocklin

Characteristic Functions

In a recent post, Characteristic Functions and scipy.stats, Josef Perktold created functions for the characteristic functions of the Normal (easy) and t (hard) distributions. The t-distribution is challenging because the solution involves special functions and has numerically challenging behavior around 0 for high degrees of freedom. Some quotes

The characteristic function for the normal distribution is easy, but looking at the characteristic function of the t-distribution, I wish someone had translated it into code already

Since I haven’t seen it yet, I sat down and tried it myself. I managed to code the characteristic function of the t-distribution, but it returns NaNs when it is evaluated close to zero for large df. I didn’t find a Bessel “k” function that works in this case

He then includes his code and discusses a particular application of the characteristic function which I won’t discuss here.

Symbolic Solution

Josef wished that this code had been written already. Characteristic functions aren’t built into SymPy but we may be able to derive them automatically.

Wikipedia says that the characteristic function $$\phi(t)$$ of a random variable X is defined as follows

$$\phi_X(t) = E(e^{itX})$$

It is equal to the expectation of exp(I*t*X). Let's do this in SymPy

>>> from sympy import *
>>> from sympy.stats import *
>>> mu = Symbol('mu', bounded=True)
>>> sigma = Symbol('sigma', positive=True, bounded=True)
>>> t = Symbol('t', positive=True)

>>> X = Normal('X', mu, sigma)  # Normal random variable
>>> simplify(E(exp(I*t*X)))     # Expectation of exp(I*t*X)
exp(I*mu*t - sigma**2*t**2/2)

I was actually pretty surprised that this worked as smoothly as it did. SymPy stats wasn’t designed for this.

Here are some gists for the Cauchy and Student-T distributions. Cauchy simplifies down pretty well but the Student-T characteristic function has a few special functions included.

Behavior near zero

Josef says that the behavior of the characteristic function of the t distribution near zero for high degrees of freedom (the nu parameter) is challenging to compute numerically. Can SymPy handle this symbolically?

>>> nu = Symbol('nu', positive=True, integer=True)
>>> t = Symbol('t', positive=True)

>>> X = StudentT('X', nu)
>>> simplify(E(exp(I*t*X)))


The bold scripted I’s are besseli functions. We restrict this expression to a specific number of degrees of freedom, setting nu = 50

>>> simplify(E(exp(I*t*X))).subs(nu, 50)  # replace nu with 50
oo*besseli(-25, 5*sqrt(2)*t) - oo*besseli(25, 5*sqrt(2)*t)

Whoops, simple substitution at that number of degrees of freedom fails, giving us infinities. What if we build the expression with 50 from the beginning? This gives the integration routines more information when they compute the expectation.

>>> X = StudentT('X', 50)
>>> simplify(E(exp(I*t*X)))
(meijerg(((1/2,), ()), ((25, 0, 1/2), ()), 25*t**2*exp(-I*pi)/2) +
 meijerg(((1/2,), ()), ((25, 0, 1/2), ()), 25*t**2*exp(I*pi)/2))
 / (1240896803466478878720000*pi)

The solution is in terms of Meijer-G functions. Can we evaluate this close to t = 0?

>>> simplify(E(exp(I*t*X))).subs(t, 1e-6).evalf()
0.999999999999479 + 1.56564942264937e-29⋅ⅈ

>>> simplify(E(exp(I*t*X))).subs(t, 1e-30).evalf()
1.0 - 1.2950748484704e-53⋅ⅈ

This is where scipy’s special functions failed in Josef’s post, yielding infinity instead of 1.

And finally we plot the behavior around 0.

>>> plot(re(simplify(E(exp(I*t*X)))), (t, 1e-7, 1e-1))

Closing Notes

The sympy.stats module was not designed with characteristic functions in mind. I was pleasantly surprised when I entered code almost identical to the mathematical definition

X = Normal('X', mu, sigma)
E(exp(I*t*X))

and received the correct answer. I am always happy when projects work on problems for which they were not originally designed. The fact that this works is largely due to SymPy’s excellent handling of integrals of special functions, much of which was written by Tom Bachmann.

If you do complex work on statistical functions I recommend you take a look at sympy.stats. It’s able to perform some interesting high level transformations. Thanks to recent work by Raoul Bourquin many of the common distributions found in scipy.stats are now available (with even more in a pull request).

November 29, 2012

Brian Granger

PyData NY

This past month, I attended PyData NY in midtown Manhattan. This is the second PyData conference and the folks at Continuum Analytics have done a spectacular job in promoting and running this conference. In less than a year PyData has grown to over 200 attendees.  The timing (right after Strata) and location (NY and Bay Area) of the PyData conferences are definitely a strong part of the success.

The conference had a wide range of talks from a fantastic set of speakers. It is very clear that the community is growing in leaps and bounds and that there are a lot of bright people writing great code.  It is always fun to think back on where the community and various projects were in the past.  Just think, it wasn’t that long ago, that Pandas didn’t exist!  Where will we be in another 3 years?

The representation of IPython at the conference was amazing.  Not only did I present two talks on the IPython Notebook and IPython.parallel, but many of the other presenters used the IPython Notebook in their talks.  Ultimately, in developing the IPython Notebook we are scratching our own itch.  We are working scientists who need better tools for our research.  But it is seriously gratifying to see how useful other people find it.  That makes developing it one of the most enjoyable things I have ever done.

Continuum has put all of the talks on Vimeo here, but I wanted to point out a few related to IPython:

First, I gave a talk about the IPython Notebook.  With 45 minutes, I was able to give a talk that was almost tutorial in nature.  In preparing this talk, I went through the IPython example Notebooks and organized/updated all of them.  This should provide a much nicer set of materials for people wanting to learn about the Notebook.  We are currently in the process of reviewing the pull request for these updates.

Second, I gave a talk about IPython.parallel.  For this talk, I mostly followed a set of materials that Min Ragan-Kelley prepared for previous SciPy/PyCon tutorials.  Min has done a fantastic job with IPython.parallel and this set of materials really goes into great depth on all of the details.  This material is hosted here.

Third, Michael Selik gave a nice talk entitled IPython as a teaching tool, which explores using the IPython Notebook in a teaching context.  It is becoming clear that the Notebook is particularly useful in this context and it was very helpful to have someone reflect on how that was going for them.

Finally, Massimo di Stefano is doing some amazing things with the Notebook and GIS.  He is really leveraging the ability of the Notebook to embed arbitrary HTML.  He even has a “gearth” function that embeds Google Earth in an output cell!  Here is one of his Notebooks on nbviewer.ipython.org.

Unfortunately, many people left early because of Hurricane Sandy, so the sprints were sparsely attended.  I was glad to catch my flight back to the west coast only hours before JFK closed.  Can’t wait for PyData in Silicon Valley after PyCon.

November 24, 2012

MRocklin

Computing the Kalman Filter

The Kalman Filter is an algorithm to update probability distributions with new observations made under noisy conditions. It is used in everything from smartphone GPS navigation systems to large scale climate simulations. It is implemented in hardware ranging from embedded devices to super-computers. It is important and widely used.

In this post I will

1. Show how to compute the Kalman Filter with BLAS/LAPACK
2. Relate this to old work on sympy.stats
3. Analyze the computation graph for some flaws and features

Results

The Kalman filter can be defined as a pair of matrix expressions

# kalman.py
from sympy.matrices.expressions import MatrixSymbol, Identity
from sympy import Symbol, Q

n, k    = Symbol('n'), Symbol('k')
mu      = MatrixSymbol('mu', n, 1)
Sigma   = MatrixSymbol('Sigma', n, n)
H       = MatrixSymbol('H', k, n)
R       = MatrixSymbol('R', k, k)
data    = MatrixSymbol('data', k, 1)
I       = Identity(n)

new_mu      = mu + Sigma*H.T * (R + H*Sigma*H.T).I * (H*mu - data)
new_Sigma   = (I - Sigma*H.T * (R + H*Sigma*H.T).I * H) * Sigma

assumptions = (Q.positive_definite(Sigma) & Q.symmetric(Sigma) &
               Q.positive_definite(R) & Q.symmetric(R))


We convert these matrix expressions into a computation

# kalman_computation.py
from kalman import new_mu, new_Sigma, assumptions
from sympy.computations.matrices.compile import make_rule, patterns
from sympy.computations.core import Identity

ident = Identity(new_mu, new_Sigma)
rule = make_rule(patterns, assumptions)
mathcomp = next(rule(ident))
mathcomp.show()


This is a useful result.

History with stats

Two years ago I wrote two blogposts about the Kalman filter, one on the univariate case and one on the multivariate case. At the time I was working on sympy.stats, a module that enabled uncertainty modeling through the introduction of random variables into the SymPy language.

SymPy stats was designed with atomicity in mind. It tried to be as small and thin a layer as possible.

• It turned queries on continuous random expressions into integral expressions.
• It turned queries on discrete random expressions into iterators and sums.
• It also turned queries on multivariate normal random expressions into matrix expressions.

The goal was to turn a specialized problem (uncertainty quantification) into a general one (integrals, sums, matrix expressions) under the assumption that tools are much richer to solve general problems than specific ones.

The first blogpost on the univariate Kalman filter produced integral expressions that were then solved by SymPy’s integration routines. The second blogpost on the multivariate Kalman filter generated the following matrix expressions

$$\mu’ = \mu + \Sigma H^T \left( R + H \Sigma H^T \right )^{-1} \left(H\mu - \textrm{data} \right)$$ $$\Sigma’ = \left( \mathbb{I} - \Sigma H^T \left(R + H \Sigma H^T \right)^{-1} H \right) \Sigma$$

That blogpost finished with this result, claiming that the job of sympy.stats was finished and that any of the popular numerical linear algebra packages could pick up from that point.

These two equations are the Kalman filter and were our starting point for today. Matrix Expressions are an intermediate representation layer between sympy.stats and sympy.computations.matrices.

By composing these projects we compile high level statistical expressions of sympy.stats to intermediate level matrix expressions to the DAGs of sympy.computations and even down to low level Fortran code. If we add a traditional Fortran compiler we can build working binaries directly from sympy.stats.
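As a concrete endpoint of that pipeline, the two matrix expressions translate almost verbatim into NumPy (keeping the post's sign convention, H*mu - data); a sketch:

```python
import numpy as np

def kalman_update(mu, Sigma, H, R, data):
    # S = R + H Sigma H^T is the (symmetric positive definite) innovation
    # covariance; a tuned implementation would exploit that via POSV
    S = R + H @ Sigma @ H.T
    gain = Sigma @ H.T @ np.linalg.inv(S)
    new_mu = mu + gain @ (H @ mu - data)
    new_Sigma = (np.eye(len(mu)) - gain @ H) @ Sigma
    return new_mu, new_Sigma
```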

Features and Flaws in the Kalman graph

Let's investigate the Kalman computation. It contains a few notable features.

First, unlike previous examples it has two outputs, new_mu and new_Sigma. These two have large common subexpressions like H*Sigma*H' + R. You can see that these were computed once and shared.
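The same sharing can be seen at the expression level with SymPy's cse applied to the kalman.py expressions; a sketch:

```python
from sympy import MatrixSymbol, Symbol, Identity, cse

n, k = Symbol('n'), Symbol('k')
mu = MatrixSymbol('mu', n, 1)
Sigma = MatrixSymbol('Sigma', n, n)
H = MatrixSymbol('H', k, n)
R = MatrixSymbol('R', k, k)
data = MatrixSymbol('data', k, 1)
I = Identity(n)

new_mu = mu + Sigma*H.T * (R + H*Sigma*H.T).I * (H*mu - data)
new_Sigma = (I - Sigma*H.T * (R + H*Sigma*H.T).I * H) * Sigma

# cse pulls the shared subexpressions (such as the inverse of
# R + H*Sigma*H.T) out so they appear only once
replacements, reduced = cse([new_mu, new_Sigma])
assert len(replacements) >= 1
```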

Second, let's appreciate that H*Sigma*H' + R is identified as symmetric positive definite, allowing the more efficient POSV routine. I’ve brought this up before in artificial examples. It’s nice to see that this occurs in practice.

Third, there is an unfortunately common motif. This instance was taken from the upper right of the image but the GE/SYMM -> AXPY motif occurs four times in this graph.

GEMM/SYMM are matrix multiply operations used to break down expressions like alpha*A*B. AXPY is a matrix addition used to break down expressions like alpha*A + B. They are both used properly here.

This motif is unfortunate because GEMM is also capable of breaking down a larger expression like alpha*A*B + beta*C, doing a fused matrix multiply and add all in one pass. The AXPY would be unnecessary if the larger GE/SYMM patterns had matched correctly.

Canonicalization

The alpha*A*B + beta*C patterns did not match here for a silly reason: there wasn’t anything to match for the scalars alpha and beta. Instead the patterns were like A*B - C. One solution to this problem is to make more patterns covering all the possibilities. Alternatively we could change how we canonicalize MatMul objects so that they always have a scalar coefficient, even if it defaults to 1.

We don’t want to change MatMul to behave like this throughout all of SymPy though; the extra 1 is usually unwanted. This is a case where there are multiple correct definitions of canonical. Fortunately MatMul is written with this eventuality in mind.

November 23, 2012

MRocklin

Building Computations

In my last post I described a base type that represented a computation as a directed acyclic graph. In my post on preliminary results I showed how we could write Fortran code for a simple matrix expression. In this post I want to show how unification, rewrite rules, and manipulations on computations can compile computations from fairly complex matrix expressions.

Inputs

Lets begin with a complex expression and a set of assumptions

expr = (Y.I*Z*Y + b*Z*Y + c*W*W).I*Z*W
assumptions = Q.symmetric(Y) & Q.positive_definite(Y) & Q.symmetric(X)


We also specify a list of conditional rewrite patterns. A pattern has the following form

Source:     alpha*A*B
Target:     SYMM(alpha, A, B, S.Zero, B)
Wilds:      alpha, A, B
Condition:  Q.symmetric(A) | Q.symmetric(B)


This means that we convert the expression alpha*A*B into the computation SYMM(alpha, A, B, S.Zero, B) (a SYmmetric Matrix Multiply) for any (alpha, A, B) when either A is symmetric or B is symmetric.

Thanks to unification, rewrite patterns are easy to write. Someone who is familiar with BLAS/LAPACK but unfamiliar with compilers would be able to make these easily.

Expressions to Computations

Each pattern is turned into a function/rule that transforms an expression into a computation. We start with an identity computation

expr = (Y.I*Z*Y + b*Z*Y + c*W*W).I*Z*W
assumptions = Q.symmetric(Y) & Q.positive_definite(Y) & Q.symmetric(X)
identcomp = Identity(expr)
identcomp.show()


Computations are able to print themselves in the DOT Language, enabling simple visualization. Here we see a computation that produces the expression we want, but whose input is that same expression. We’d prefer one that had more atomic inputs like a, b, c, W, X, Y, Z

Our patterns know how to break down big expressions into smaller ones by adding the right computation (e.g. alpha*A*B -> alpha, A, B via SYMM). We convert each of our patterns into a rule. This rule looks at the inputs and, if it finds a matching expression, adds a new computation to break down that expression. We use branching strategies to orchestrate how all of these rules are applied. This is accomplished in the last line of the make_matrix_rule function

def make_matrix_rule(patterns, assumptions):
    rules = [expr_to_comp_rule(src, target, wilds, cond, assumptions)
             for src, target, wilds, cond in patterns]
    return exhaust(multiplex(*map(input_crunch, rules)))


This function combines logic (patterns/assumptions) with control (exhaust/multiplex/input_crunch) to create a complete algorithm. We apply this algorithm to our identity computation and pull off a compiled result
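To see what exhaust and multiplex do in isolation, here is a toy example with the branching strategies from sympy.strategies.branch, using plain integers instead of computations:

```python
from sympy.strategies.branch import exhaust, multiplex

# Branching rules: each yields zero or more rewrites of its input
def inc(n):
    if n < 3:
        yield n + 1

def double(n):
    if n < 3:
        yield 2 * n

# multiplex tries every rule; exhaust follows each branch until no
# rule applies any more
rule = exhaust(multiplex(inc, double))

assert set(rule(1)) == {3, 4}
```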

rule = make_matrix_rule(patterns, assumptions)
comp = next(rule(identcomp))
comp.show()


We still have the same output but now the input is broken down into smaller pieces by a set of computations. These computations are arranged in a graph based on their dependencies. We had to use a GESV, a POSV, two GEMMs, a SYMM, and two AXPYs to break down this computation. Our inputs are now a, b, c, W, X, Y, Z as desired.

rule(identcomp) iterates over all possible computations to compute this expression. If you are not satisfied with the computation above you may ask for another.

Inplace Computations

The BLAS/LAPACK routines are inplace; they write their results to the memory locations of some of their inputs. The above mathematical graph doesn’t have the necessary information to address this computational concern. We have a separate system to compile and optimize inplace computations.

from sympy.computations.inplace import inplace_compile
icomp = inplace_compile(comp)
icomp.show()


Each variable is now of the form

Mathematical Expression @ memory location

We have introduced Copy operations into the graph where necessary to prevent dangerous overwrites. If you track the memory locations you can see which BLAS/LAPACK operations overwrite which variables. For example Z is never overwritten and so is never copied. On the other hand W is used in two overwrite operations and so it is copied to two new variables, W_2 and W_3. Copies are not added if obviously unnecessary.

Future Work

There are a couple of small items and one large one.

1. An expert in BLAS/LAPACK will note that there are some issues with my graphs; they are not yet ideal. I don’t handle IPIV permutation operations well (I need to add some new patterns), I overwrite the INFO out parameter, and there are a few cases where a copy could be avoided by reordering operations.

2. I need to refactor my old Fortran generation code to work with the new inplace system.

3. The largest challenge is to build strategies for intelligent application of rewrite rules. Expressions are now large enough, and the list of patterns long enough, that checking all possibilities is definitely infeasible. I need to think hard about traversals. Fortunately this problem is purely algorithmic and has no connection to BLAS, inplace computations, etc. I should be able to think about it in isolation.

Closing Note

Except for the mathematical definition of BLAS, none of this code is specific to generating matrix computations. The majority of this technology isn’t even specific to building computations. The computational core of most of these technologies isn’t even dependent on SymPy. My final sympy.computations.matrices directory is small.

Throughout this project I’ve tried to keep all of the technology as general as possible in hopes that others will make use of it. Only a small fraction of my work has been specific to my application. I hope that others find this work interesting. I hope that this technology enables a variety of other unrelated projects.

November 21, 2012

MRocklin

Computations

How can we symbolically represent computations? In SymPy we normally represent expressions as trees. Each node in the tree is an expression, defined by some operation like Add or Mul and a sequence of arguments/children.

While trees are a convenient data structure they are also very restrictive. Consider the following operation

This MinMax operation takes two input variables, x and y, and produces two outputs, Min(x, y) and Max(x, y). Computationally you might prefer this over two separate trees because both outputs can be produced at once with a single comparison. This also supports natural grouping of common sub-expressions. If x and y were large trees we would not want to have copies of each in separate Min and Max trees.
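The single-comparison point can be sketched in plain Python (illustrative only; the MinMax above is a symbolic operation, not this function):

```python
def minmax(x, y):
    """Return (min, max) of two values using a single comparison."""
    return (x, y) if x <= y else (y, x)

print(minmax(3, 1))  # (1, 3)
```

Two separate Min and Max calls would each repeat the comparison (and, symbolically, duplicate the x and y subtrees).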

Because the MinMax operation has two outputs we can no longer represent its graph with a single tree; we need a more general data structure. This graph can be described as a bipartite directed acyclic graph (BiDAG). It is bipartite because there are two types of nodes, variables (circles) and operations [boxes]. It is directed and acyclic by the dependence of data (e.g. if Min(x, y) depends on x then x can not depend on Min(x, y)).

After trees, a DAG is the next most restrictive graph subtype. In some sense this is the smallest generalization we can make.

Computation Type

Enter the Computation base type. This is an interface that must provide tuples of inputs and outputs instead of the standard args we use for trees.

We also add a CompositeComputation type which collects many computations together. Consider the collection of the following computations.

Note that A produces x which is used by MinMax. This computation has inputs (w, y) and outputs (Min(x, y), Max(x, y)). The data dependencies imply an ordering; the A computation must occur before the MinMax computation.
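The ordering inferred from data dependencies amounts to a topological sort. A minimal sketch, illustrative only, with computations modeled as plain (name, inputs, outputs) tuples rather than Computation objects:

```python
# hypothetical encodings of the two computations from the text
A      = ("A", ("w",), ("x",))
MinMax = ("MinMax", ("x", "y"), ("Min(x, y)", "Max(x, y)"))

def toposort(comps):
    """Order computations so each variable is produced before it is used."""
    producers = {out for c in comps for out in c[2]}
    produced, ordered, pending = set(), [], list(comps)
    while pending:
        for c in pending:
            # ready when every input is external or already computed
            if all(v in produced or v not in producers for v in c[1]):
                ordered.append(c[0])
                produced.update(c[2])
                pending.remove(c)
                break
        else:
            raise ValueError("cyclic dependency")
    return ordered

print(toposort([MinMax, A]))  # ['A', 'MinMax']
```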

Internal Representation

My current implementation of CompositeComputation is represented internally as an immutable set of computations. Inter-computation interactions are inferred as needed by their variables. We provide methods to form an alternative dict-based data structure with fast access and traversal should performance become necessary.

All variables are assumed immutable and unique. The intention is that variables should be entirely defined by their mathematical meaning. The expectation is that the variables are SymPy expressions.

This approach focuses on immutability and mathematical attributes rather than performance and computational attributes. For example it is impossible to represent a Copy operation within this framework because the mathematical meanings of the input and output variable would be identical. Similarly, inplace operations are not expressible in this framework.

Inplace

And yet copies and inplace operations are important parts of real computation. We make an explicit separation between mathematics-based optimizations and infrastructure-based optimizations (like inplace). We perform this transition by replacing each variable with a pair that contains a purely mathematical expression (left) and a purely computational variable (right).

In the example above we see the MinMax computation where the x and y expressions are stored in variables "x" and "y" and the outputs are stored in dummy variables "_1" and "_2". For performance reasons a computation may instead write the outputs back into the memory of the inputs as follows (note that the two outputs are stored in the same variables as the inputs).
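As a rough sketch, the pairing might be modeled like this (the representation and names are invented for illustration):

```python
from collections import namedtuple

# a variable pairs a mathematical expression (left) with a memory
# location (right)
Var = namedtuple("Var", ["expr", "loc"])

# out-of-place MinMax: outputs land in fresh dummy locations
out_of_place = [Var("Min(x, y)", "_1"), Var("Max(x, y)", "_2")]

# inplace MinMax: outputs reuse the memory of the inputs "x" and "y"
inplace = [Var("Min(x, y)", "x"), Var("Max(x, y)", "y")]
```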

Inplace computations provide higher performance at the cost of memory safety. We must avoid situations like the following where the x variable may be overwritten (for example by B) before it is read (by C).
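A safety check of this kind can be sketched as a scan over the ordered operations (illustrative only, not the actual implementation):

```python
def is_safe(ops):
    """ops: ordered (name, reads, overwrites) triples.  Unsafe if an op
    overwrites a variable that a later op still needs to read."""
    for i, (_, _, writes) in enumerate(ops):
        for var in writes:
            if any(var in reads for _, reads, _ in ops[i + 1:]):
                return False
    return True

# hypothetical schedules: B overwrites x before C reads it -> unsafe;
# running C first makes the same pair of operations safe
bad  = [("B", {"x"}, {"x"}), ("C", {"x"}, set())]
good = [("C", {"x"}, set()), ("B", {"x"}, {"x"})]
print(is_safe(bad), is_safe(good))  # False True
```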

Motivation

I am working to translate matrix expressions (a tree) into a computation (a DAG) of BLAS/LAPACK operations. I do this by setting up and matching mathematical patterns like the following

alpha*X*Y + beta*Z -> GEMM(alpha, X, Y, beta, Z)

However the available operations (like GEMM) are inplace by default. These two goals of mathematical pattern matching and inplace computations are challenging to solve simultaneously for non-trivial expressions. My solution has been to consider the mathematical pattern matching problem first and then switch to ‘inplace mode’ and resolve the inplace issues separately.

Question

Should this be a part of SymPy?

November 17, 2012

Guru Devanla

My GSoC Experience

Recently, I gave a talk at UIC Friendly Friday session!

The slides give an overview about Google Summer of Code, some relevant links and what is in it for students.

Take a peek.

November 10, 2012

MRocklin

Preliminary BLAS Results

In the last few posts I’ve built up some independent technology.

1. BLAS and code generation - a logical description
2. Unification - advanced pattern matching
3. Strategies - programmatic control
4. Branching Strategies - control with multiple possibilities

In this post I’ll pull them all together for my first substantial results generating Fortran code to call BLAS/LAPACK. Let’s go through a working example

We set up a problem that we’d like to solve. We want to compute $$(4 X X^{T} + 2 Z)^{-1} X$$ where $$X$$ is invertible and $$Z$$ is symmetric positive definite.

>>> # Set up a mathematical problem to solve
>>> X = MatrixSymbol('X', n, n)
>>> Z = MatrixSymbol('Z', n, n)
>>> target = (4*X*X.T + 2*Z).I*X
>>> assumptions = Q.invertible(X) & Q.positive_definite(Z) & Q.symmetric(Z)


We have described a set of BLAS/LAPACK operations to perform certain transformations when the right conditions are met. Each BLAS operation is a single rewrite rule.

>>> from sympy.matrices.expressions.gen import rr_from_blas
>>> from sympy.matrices.expressions.blas import GEMM, SYMM, TRMM, TRSV
>>> from sympy.matrices.expressions.lapack import POSV, GESV
>>> routines = (TRSV, POSV, GESV, TRMM, SYMM, GEMM)
>>> rules = [rr_from_blas(r, assumptions) for r in routines]


Each of these rules can convert one kind of expression into a computation given certain conditions. For example

SYMM:  alpha*A*B + beta*C -> SYMM(alpha, A, B, beta, C) if A or B is symmetric

We need to combine them to turn the large target expression into a set of atomic inputs. Some of the BLAS routines overlap so there are potentially many possibilities.

>>> from sympy.matrices.expressions.gen import top_down
>>> from sympy.rules.branch import multiplex

>>> rule = top_down(multiplex(*rules))

>>> computations = list(rule(target))
>>> len(computations)
2


We generate Fortran code from the first computation

>>> print computations[0].print_Fortran(str, assumptions)

subroutine f(X, Z, INFO, n)

real*8, intent(inout) :: X(n, n)
real*8, intent(inout) :: Z(n, n)
integer, intent(out) :: INFO
integer, intent(in) :: n

call dgemm('N', 'T', n, n, n, 4, X, n, X, n, 2, Z, n)
call dposv(U, n, n, Z, n, X, n, INFO)

RETURN
END


This solution first uses GEMM to multiply $$4X X^{T} + 2 Z$$. It then uses POSV to perform the solve $$(4X X^{T} + 2Z)^{-1} X$$. The POSV routine solves systems of the form $$A^{-1}B$$ where $$A$$ is symmetric positive definite. Internally we used a logical programming framework to infer that $$4X X^{T} + 2Z$$ is symmetric positive definite given the original mathematical assumptions.

>>> assumptions = Q.invertible(X) & Q.positive_definite(Z) & Q.symmetric(Z)
>>> expr = 4*X*X.T + 2*Z
>>> ask(Q.symmetric(expr) & Q.positive_definite(expr), assumptions)
True


This computation is in-place. GEMM stores its result in the argument Z. POSV uses Z and stores the output in X. Note that both X and Z have been declared with inout intents in the Fortran code.

This Fortran code is independent of Python or SymPy and can be used in any project. However, if we prefer the Python environment we can bring it back into the Python session with F2PY.

>>> f = computations[0].build(str, assumptions)
>>> f?
f - Function signature:
info = f(x,z,[n])
Required arguments:
x : in/output rank-2 array('d') with bounds (n,n)
z : in/output rank-2 array('d') with bounds (n,n)
Optional arguments:
n := shape(x,0) input int
Return objects:
info : int

This function accepts numpy arrays and so integrates well into the Python scientific computing stack.

Multiple Matches

There were two computations. What was the other?

>>> len(computations)
2
>>> print computations[1].print_Fortran(str, assumptions)

subroutine f(X, Z, INFO, IPIV, n)

real*8, intent(inout) :: X(n, n)
real*8, intent(inout) :: Z(n, n)
integer, intent(out) :: INFO
integer, intent(out) :: IPIV(n)
integer, intent(in) :: n

call dgemm('N', 'T', n, n, n, 4, X, n, X, n, 2, Z, n)
call dgesv(n, n, Z, n, IPIV, X, n, INFO)

RETURN
END


This solution uses the GESV routine for general matrices in place of the specialized POSV for symmetric positive definite matrices. Which is best? In this case POSV is likely faster because it is able to use faster algorithms due to the symmetric positive definite assumption. After looking at both possibilities we choose it.

For large matrix expressions the number of possible computations may stop us from inspecting all possible solutions. How can we ensure that the best solution is in the first few?

Code Separation

The definition of BLAS/LAPACK is separated from the pattern matching code and the branching control code. This allows me (or other people) to develop one without thinking about the other. It also allows for a declarative definition of BLAS and LAPACK routines. If anyone is interested I could use more routines than just the six used in this example.

This project requires the technology from the previous four posts. While all of that technology (strategies, unification, code generation) is necessary to this project none of it is specific to this project. All of the pieces are general, composable, and applicable to other ends. I hope that others are able to find some use for them.

Caveats

This code is still experimental. It is not yet merged into the SymPy master branch and the interface may change. Results are promising but there are still big pieces missing before it’s ready for public use.

References

1. F2PY
2. D. Fabregat-Traver, P. Bientinesi, A Domain-Specific Compiler for Linear Algebra Operations, arXiv preprint arXiv:1205.5975 (2012).
3. Example code from this post
4. My development branch of SymPy

November 09, 2012

MRocklin

Branching Strategies

In my last post on strategies I introduced a set of higher order functions to represent common control patterns (like top_down). We combined these with transformation rules (like flatten) to create complex functions for tree manipulation (like flatten_tree).

rule     :: expr -> expr
strategy :: parameters, rule -> rule

In my post on unification we showed how to easily create rules from patterns. At the end of that post I described that, because patterns might match in multiple ways, one rule might produce many different results. To avoid combinatorial blowup in the number of possible matches we yield matches lazily.

Transformation rules produced by unify don’t return values; they yield possible solutions lazily. How do we reconcile this with our previous notion of rules and strategies? We make a new set of strategies for branching rules.

branching-rule      :: expr -> {expr}
branching-strategy  :: parameters, branching-rule -> branching-rule

In sympy.rules.branch we have implemented lazy analogs for the strategies found in sympy.rules. This allows us to apply strategies to transformations like the sincos_to_one rule created in the unification post.

Toy Problem

Let’s see branching strategies in action with a toy problem. Consider the following “function”

$$f(x) = \begin{cases} x+1 & \text{if } 5 < x < 10 \\ x-1 & \text{if } 0 < x < 5 \\ x+1 \text{ or } x-1 & \text{if } x = 5 \\ x & \text{otherwise} \end{cases}$$

And its equivalent in Python

def f(x):
    if 0 < x < 5:   yield x - 1
    if 5 < x < 10:  yield x + 1
    if x == 5:      yield x + 1; yield x - 1


Notice that in the case where x = 5 there are two possible outcomes. Each of these is preserved by the application of branching strategies. We use the branching version of the exhaust strategy to make a new exhaustive version of this function

>>> from sympy.rules.branch import exhaust
>>> newf = exhaust(f)
>>> set(newf(6))
{10}
>>> set(newf(3))
{0}
>>> set(newf(5))
{0, 10}
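To illustrate what a branching exhaust does, here is a minimal sketch together with a redefinition of the toy f (self-contained; this is not sympy.rules.branch's actual implementation):

```python
def exhaust(brule):
    """Branching analog of exhaust: follow every branch of the rule
    until no branch changes the value, yielding each fixed point."""
    def exhaust_brl(expr):
        stack, seen = [expr], set()
        while stack:
            e = stack.pop()
            nexts = [n for n in brule(e) if n != e]
            if nexts:
                stack.extend(nexts)     # keep rewriting each branch
            elif e not in seen:
                seen.add(e)             # fixed point: no rule applies
                yield e
    return exhaust_brl

def f(x):
    if 0 < x < 5:   yield x - 1
    if 5 < x < 10:  yield x + 1
    if x == 5:      yield x + 1; yield x - 1

newf = exhaust(f)
print(set(newf(5)))  # {0, 10}
```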


Practical Problem

We have all the machinery necessary. Let’s make a sin(x)**2 + cos(x)**2 -> 1 tree-wise simplification function.

>>> from sympy.rules.branch.traverse import top_down
>>> from sympy.unify.rewrite import rewriterule
>>> from sympy.abc import a, b, c, x, y

>>> sincos_to_one = rewriterule(sin(x)**2 + cos(x)**2, S.One, wilds=[x])
>>> sincos_tree = top_down(sincos_to_one)

>>> list(sincos_tree(2 + c**(sin(a+b)**2 + cos(a+b)**2)))  # see footnote
[c**1 + 2]


Let’s make a rule to simplify expressions like c**1

>>> pow_simp = rewriterule(Pow(x, 1, evaluate=False), x, wilds=[x]) # footnote 2

>>> from sympy.rules.branch.strat_pure import multiplex, exhaust
>>> simplify = exhaust(top_down(multiplex(sincos_to_one, pow_simp)))

>>> list(simplify(2 + c**(sin(a+b)**2 + cos(a+b)**2)))
[c + 2]


We see how we can easily build up powerful simplification functions through the separate description of logic

sin(x)**2 + cos(x)**2 -> 1
x ** 1 -> x

and control

simplify = exhaust(top_down(multiplex( ... )))

Footnote 1: At the time of this writing this line should actually be

map(rebuild, sincos_tree( ... ))

The rebuild function is necessary because rules don’t play well with Exprs. Exprs need to be constructed normally in order to function properly. In particular, all expressions built by rules lack the is_commutative flag which is attached to the object at construction time. I neglected to mention this above to simplify the discussion.

Footnote 2: This also requires a slight modification due to the Expr/rules mismatch. In particular the pattern Pow(x, 1, evaluate=False) unfortunately matches to just x because x == x**1 in SymPy.

November 07, 2012

MRocklin

Strategies

In my last post I showed how unification and rewrite rules allow us to express what we want without specifying how to compute it. As an example we were able to turn the mathematical identity sin(x)**2 + cos(x)**2 -> 1 into a function with relatively simple code

# Transformation : sin(x)**2 + cos(x)**2 -> 1
>>> sincos_to_one = rewriterule(sin(x)**2 + cos(x)**2, 1, wilds=[x])

>>> sincos_to_one(sin(a+b)**2 + cos(a+b)**2).next()
1


However we found that this function did not work deep within an expression tree

>>> list(sincos_to_one(2 + c**(sin(a+b)**2 + cos(a+b)**2))) # no matches
[]


sincos_to_one does not know how to traverse a tree. It is pure logic and has no knowledge of control. We define traversals separately using strategies.

Short version: we give you a higher order function, top_down, which turns an expression-wise function into a tree-wise function. We provide a set of similar functions which can be composed to various effects.

A Toy Example

How do we express control programmatically?

Traditional control flow is represented with constructs like if, for, while, def, return, try, etc. These terms direct the flow of what computation occurs when. Traditionally we mix control and logic. Consider the following toy problem that reduces a number until it reaches a multiple of ten

def reduce_to_ten(x):
    """ Reduce a number to the next lowest multiple of ten

    >>> reduce_to_ten(26)
    20
    """
    old = None
    while(old != x):
        old = x
        if (x % 10 != 0):
            x -= 1
    return x


While the logic in this function is somewhat trivial

if (x % 10 != 0):
    x -= 1


the control pattern is quite common in serious code

while(old != expr):
    old = expr
    expr = f(expr)
return expr


It is the “Exhaustively apply this function until there is no effect” control pattern. It occurs often in general programming and very often in the SymPy source code. We separate this control pattern into a higher order function named exhaust

def exhaust(rule):
    """ Apply a rule repeatedly until it has no effect """
    def exhaustive_rl(expr):
        old = None
        while(expr != old):
            expr, old = rule(expr), expr
        return expr
    return exhaustive_rl


We show how to use this function to achieve the previous result.

def dec_10(x):                          # Close to pure logic
    if (x % 10 != 0):   return x - 1
    else:               return x

reduce_to_ten = exhaust(dec_10)


By factoring out the control strategy we achieve several benefits

1. Code reuse of the while(old != new) control pattern
2. Exposure of logic - we can use the dec_10 function in other contexts more easily. This version is more extensible.
3. Programmability of control - the control pattern is now first class. We can manipulate and compose it as we would manipulate or compose a variable or function.

Example - Debug

When debugging code we often want to see the before and after effects of running a function. We often do something like the following

new = f(old)
if new != old:
    print "Before: ", old
    print "After:  ", new


This common structure is encapsulated in the debug strategy

def debug(rule):
    """ Print out before and after expressions each time rule is used """
    def debug_rl(expr):
        result = rule(expr)
        if result != expr:
            print "Rule: ", rule.func_name
            print "In:   ", expr
            print "Out:  ", result
        return result
    return debug_rl


Because control is separated we can inject this easily into our function

>>> reduce_to_ten = exhaust(debug(dec_10))

>>> reduce_to_ten(23)
Rule:  dec_10
In:    23
Out:   22
Rule:  dec_10
In:    22
Out:   21
Rule:  dec_10
In:    21
Out:   20
20


Traversals

Finally we show off the use of a tree traversal strategy which applies a function at each node in an expression tree. Here we use the Basic type to denote a tree of generic nodes.

def top_down(rule):
    """ Apply a rule down a tree running it on the top nodes first """
    def top_down_rl(expr):
        newexpr = rule(expr)
        if is_leaf(newexpr):
            return newexpr
        return new(type(newexpr), *map(top_down_rl, newexpr.args))
    return top_down_rl

>>> reduce_to_ten_tree = top_down(exhaust(tryit(dec_10)))

>>> expr = Basic(23, 35, Basic(10, 13), Basic(Basic(5)))
>>> reduce_to_ten_tree(expr)
Basic(20, 30, Basic(10, 10), Basic(Basic(0)))


Use in Practice

We have rewritten the canonicalization code in the Matrix Expression module to use these strategies. There are a number of small functions to represent atomic logical transformations of expressions. We call these rules. Rules are functions from expressions to expressions

rule :: expr -> expr

And there are a number of strategies like exhaust and top_down which transform rules and parameters into larger rules

strategy :: parameters, rule -> rule

For example there are general rules like flatten that simplify nested expressions like

Add(1, 2, Add(3, 4)) -> Add(1, 2, 3, 4)

def flatten(expr):
    """ Flatten T(a, b, T(c, d), T2(e)) to T(a, b, c, d, T2(e)) """
    cls = expr.__class__
    args = []
    for arg in expr.args:
        if arg.__class__ == cls:
            args.extend(arg.args)
        else:
            args.append(arg)
    return new(expr.__class__, *args)


We compose these general rules (e.g. ‘flatten’, ‘unpack’, ‘sort’, ‘glom’) with strategies to create large canonicalization functions

rules = (rm_identity(lambda x: x == 0 or isinstance(x, ZeroMatrix)),
         unpack,
         flatten,
         glom(matrix_of, factor_of, combine),
         sort(str))

canonicalize = exhaust(top_down(typed({MatAdd: do_one(*rules)})))


Going Farther

We use strategies to build large rules out of small rules. Can we build large strategies out of small strategies? The canonicalize function above follows a common pattern “Apply a set of rules down a tree, repeat until they have no effect.” This is built into the canon strategy.

def canon(*rules):
    """ Strategy for canonicalization """
    return exhaust(top_down(do_one(*rules)))


Previous Work

This implementation of strategies was inspired by the work in the language StrategoXT. Stratego is a language for control that takes these ideas much farther and implements them more cleanly. It is a language where control structures are primitives that can be built up, composed, and compiled down. It is a language in which ideas like “breadth first search” and “dynamic programming” are natural expressions.

References

1. Ralf Lämmel , Eelco Visser , Joost Visser, The Essence of Strategic Programming, 2002
2. Eelco Visser, Program Transformation with Stratego/XT

Guru Devanla

Emacs script to add function headers in python mode

I have the usual python-mode setup with python-mode.el and all the packages that help set up a good IDE environment for Python in Emacs. But I did not find any scripts/functions that would help me generate a doc string for methods. The script shared below should do the trick. This is an initial script and I hope to add to it as I use it. The script tries to follow the markup syntax supported by reStructuredText

Please feel free to comment on the git repo if you find any issues or have any improvements

    ;; add the following lines to .emacs file.
    ;; When a function header needs to be generated, place the cursor
    ;; on the same line as the function definition and call the function
    ;; generate-header. Alternatively, a key-binding can be added
    ;; using (global-set-key) or by adding this function to the python-mode-hook

    (defun get-function-definition (sentence)
      (if (string-match "def.*(.*):" sentence)
          (match-string 0 sentence)))

    (defun get-parameters (sentence)
      (setq y (get-function-definition sentence))
      (if y
          (if (string-match "(.*)" y)
              (match-string 0 y))))

    (require 'thingatpt)

    (defun generate-header ()
      (interactive)
      (setq p (get-parameters (thing-at-point 'sentence)))
      (forward-line 1)
      (insert "\t\"\"\"\n\n\n")
      (setq params (split-string p "[?\,?(?)?\ ]"))
      (while params
        (if (/= (length (chomp (car params))) 0)
            (progn
              (insert "\t:param ")
              (insert (chomp (car params)))
              (insert ": \n")))
        (setq params (cdr params)))
      (insert "\n\t\"\"\"\n\n"))

Git

November 01, 2012

MRocklin

Unification in SymPy

Unification is a way to ask questions by matching expressions against patterns. It is a powerful form of pattern matching found in logical programming languages like Prolog, Maude, and Datalog. It is the computational backbone behind the logical programming paradigm and is now a part of SymPy (in a pull request).

Consider the following example. Imagine that you want to find the name of the MatrixSymbol within the Transpose in the following expression (i.e. we’re looking for the string 'X')

    >>> X = MatrixSymbol('X', 3, 3)
    >>> Y = MatrixSymbol('Y', 3, 3)
    >>> expr = Transpose(X) + Y


Traditionally we could solve this toy problem with a simple function

    def name_of_symbol_in_transpose_in_add(matadd):
        for arg in matadd.args:
            if isinstance(arg, Transpose) and isinstance(arg.arg, MatrixSymbol):
                return arg.arg.name


We solve this task with unification by setting up a pattern and then unifying that pattern against a target expression

    >>> A = MatrixSymbol('name', n, m)
    >>> B = MatrixSymbol('B', m, n)
    # Look for an expression tree like A.T + B
    # Treat the leaves 'name', n, m, B as Wilds
    >>> pattern = patternify(Transpose(A) + B, 'name', n, m, B)

    >>> unify(pattern, expr).next()
    {'name': 'X', m: 3, n: 3, B: Y}


We get back a matching for each of the wildcards (name, n, m, B) and see that 'name' was matched to the string 'X'. Is this better or worse than the straight Python solution? Given the relative number of users of Python and Prolog it’s a safe bet that the style of Python programs has some significant advantages over the logical programming paradigm. Why would we program in this strange way?

Unification allows a clean separation between what we’re looking for and how we find it. In the Python solution the mathematical definition of what we want is spread among a few lines and is buried inside of control flow.

    for arg in matadd.args:
        if isinstance(arg, Transpose) and isinstance(arg.arg, MatrixSymbol):
            return arg.arg.name


In the unification solution the line

    pattern = patternify(Transpose(A) + B, 'name', n, m, B)


expresses exactly what we’re looking for and gives no information on how it should be found. The how is wrapped up in the call to unify

    unify(pattern, expr).next()


This separation of the what and how is what excites me about declarative programming. I think that this separation is useful when mathematical and algorithmic programmers need to work together to solve a large problem. This is often the case in scientific computing. Mathematical programmers think about what should be done while algorithmic programmers think about how it can be efficiently computed. Declarative techniques like unification enable these two groups to work independently.

Multiple Matches

Let’s see how unify works on a slightly more interesting expression

    >>> expr = Transpose(X) + Transpose(Y)
    >>> unify(pattern, expr)
    <generator object unify at 0x548cb90>


In this situation because both matrices X and Y are inside transposes our pattern to match “the name of a symbol in a transpose” could equally well return the strings 'X' or 'Y'. The unification algorithm will give us both of these options

    >>> for match in unify(pattern, expr):
    ...    print match
    {'name': 'Y', m: 3, n: 3, B: X}
    {'name': 'X', m: 3, n: 3, B: Y}


Because expr is commutative we can match {A: Transpose(X), B: Transpose(Y)} or {A: Transpose(Y), B: Transpose(X)} with equal validity. Instead of choosing one, unify returns an iterable of all possible matches.

Combinatorial Blowup

In how many ways can we match the following pattern

w + x + y + z

to the following expression?

a + b + c + d + e + f

This is a variant on the standard “N balls in K bins” problem often given in a discrete math course. The answer is “quite a few.” How can we avoid this combinatorial blowup?
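One way to pin down "quite a few": under the simplifying assumption that each of the four pattern variables must match a non-empty sub-sum and each addend goes to exactly one variable, the count is the number of surjections from six addends onto four variables, computable by inclusion-exclusion (the real matcher's count depends on its associative-commutative matching rules):

```python
from math import comb

n, k = 6, 4  # six addends, four pattern variables
# inclusion-exclusion over the variables left empty
matches = sum((-1)**j * comb(k, j) * (k - j)**n for j in range(k + 1))
print(matches)  # 1560
```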

unify produces matches lazily. It returns a Python generator which yields results only as you ask for them. You can ask for just one match (a common case) very quickly.

The bigger answer is that if you aren’t satisfied with this and want a better/stronger/faster way to find your desired match you could always rewrite unify. The unify function is all about the how and is disconnected from the what. Algorithmic programmers can tweak unify without disrupting the mathematical code.

Rewrites

Unification is commonly used in term rewriting systems. Here is an example

    >>> pattern_source = patternify(sin(x)**2 + cos(x)**2, x)
    >>> pattern_target = 1
    >>> sincos_to_one = rewriterule(pattern_source, pattern_target)

    >>> sincos_to_one(sin(a+b)**2 + cos(a+b)**2).next()
    1


We were able to turn a mathematical identity sin(x)**2 + cos(x)**2 => 1 into a function very simply using unification. However unification only does exact pattern matching so we can only find the sin(x)**2 + cos(x)**2 pattern if that pattern is at the top node in the tree. As a result we’re not able to apply this simplification within a larger expression tree

    >>> list(sincos_to_one(2 + c**(sin(a+b)**2 + cos(a+b)**2))) # no matches
    []


I will leave the solution of this problem to a future post. Instead, I want to describe why I’m working on all of this.

Matrix Computations

My last post was about translating Matrix Expressions into high-performance Fortran code. I ended that post with the following problem:

So how can we transform a matrix expression like

    (alpha*A*B).I * x


Into a graph of BLAS calls like one of the following?

    DGEMM(alpha, A, B, 0, B) -> DTRSV(alpha*A*B, x)
    DTRMM(alpha, A, B)       -> DTRSV(alpha*A*B, x)


This problem can be partially solved by unification and rewrite rules. Each BLAS operation is described by a class

class MM(BLAS):
    """ Matrix Multiply """
    _inputs   = (alpha, A, B, beta, C)
    _outputs  = (alpha*A*B + beta*C,)


The _outputs and _inputs fields mathematically define when MM is appropriate. This is all we need to make a transformation

    pattern_source = patternify(MM._outputs[0], *MM._inputs)
    pattern_target = MM(*MM._inputs)
    rewriterule(pattern_source, pattern_target)


Unification allows us to describe BLAS mathematically without thinking about how each individual operation will be detected in an expression. The control flow and the math are completely separated allowing us to think hard about each problem in isolation.

References

I learned a great deal from the following sources

October 29, 2012

Gilbert Gede

Google Summer of Code, Mentor Summit 2012

I probably should have written a blog post during the summer, displaying Angadh’s work . . . oh well.

SymPy sent me as one of 3 mentors to the GSoC Mentor Summit this year, at the Googleplex. I was very excited to go, meet some of the other SymPy developers (Matthew and Stefan), and meet others in the open source community. My overall experience was a little mixed though.

Matthew covered some feelings/conclusions I also share in his write-up on the event, but I have some others.

Meeting and interacting with people in the open source community was probably the highlight of the trip for me. It was interesting to talk to a few people who had taken projects that were originally academic/research code and translated them into more successful (open source) products. I’m not sure if it will be relevant to SymPy’s future, but it might be to mine. A common thread seemed to be: jumpstart a project with grants, keep improving it for long enough with that money, and then put it in a position where it can join an umbrella organization or can be used by professionals who will pay for support. I certainly don’t speak for any other SymPy developers, but I’ve never got the feeling that this was the intended trajectory of the project (at least the selling support part).

There was a talk on how to structure student/mentor/organization interactions for future GSoC projects. Some of the organizations have a much more defined structure than SymPy, and benefit from things like daily group meetings. SymPy, though, has a more distributed organizational structure: there are a lot of different modules that all have some independence from each other. Despite this, SymPy’s code base is of very high quality, with credit going to the review process, the reviewers, and the high standards that are enforced.

The talk on forming non-profit organizations was also interesting. One major takeaway was that the IRS would rather you put your project under an existing umbrella, rather than grant you 501(c) status, due to the potential for abuse. Also, a lot of work is involved in managing money properly once a certain amount of cash is flowing through. Although there are hurdles, getting to form a board of directors sounded interesting. Scheduling board meetings could also be fun (with the money being spent responsibly, of course…).

On the less positive side of things: the unconference format. I think that it could have worked a lot better. Again, Matthew touched on this in his post, but all of the sessions (that were not just presentations) were very unfocused, with some more productive than others. The lack of moderation was a serious impediment to keeping the discussions on track. There were definitely a few times when people would hijack a session to try and talk about or show off their project. While I don’t have a problem with people showing off their work, there was limited time for each session.

The time limitations on the sessions were another issue. Each one was ~45 minutes, and almost always there was another session waiting for you to leave the room so they could start. In a few of the talks I went to, I think people would have been happy to stay in the room and continue to discuss the subject. Perhaps the expectations of the session initiators (and myself) about what we would accomplish were too high. Perhaps the 45 minutes should have been spent networking with other people thinking about that topic, and then spawning a more detailed discussion in the future? Again, I think moderation would have helped this.

There were also some talks, with interesting-sounding titles, that were just unproductive. There was too much recounting of what one organization did, with little generalization to what others could use. There was also little consideration of how important decisions were reached; e.g. the fact that a project was split into parts A and B which were developed separately was recounted, but not what went into making that decision. Perhaps another example of expectations being too high…

I’m certainly not going to write off the unconference format – I think it could have led to some really cool things. But I don’t think I will attend another one that has sessions which are so short and under-moderated.

MRocklin

Matrix Computations in SymPy

I want to translate matrix expressions like this

    (alpha*A*B).I * x


Into Fortran code that calls BLAS and LAPACK routines like this

    subroutine f(alpha, A, B, x, n)

    real*8,  intent(in)     :: A(n, n)
    real*8,  intent(inout)  :: B(n, n)
    real*8,  intent(in)     :: alpha
    integer, intent(in)     :: n
    real*8,  intent(inout)  :: x(n, 1)

    call dgemm('N', 'N', n, n, n, alpha, A, n, B, n, 0, B, n)
    call dtrsv('L', 'N', 'N', n, B, n, x, 1)

    RETURN
    END


And then call it in Python like this

    nA, nB, nx = .... # Get numpy arrays
    f(nalpha, nA, nB.T, nx.T)


What is BLAS?

BLAS stands for Basic Linear Algebra Subprograms. It is a library of Fortran routines for dense linear algebra, first published in 1979.

The most famous BLAS routine is DGEMM, a routine for Double-precision GEneral Matrix-Matrix multiplication. DGEMM is very well implemented. DGEMM traditionally handles blocking for fewer cache misses, autotuning for each individual architecture, and even assembly-level code optimization. You should never code up your own matrix multiply; you should always use DGEMM. Unfortunately, you may not know Fortran, and, even if you did, you might find the function header to be daunting.

    SUBROUTINE DGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)

Even if you’re capable of working at this low level, most scientific users are not. DGEMM is fast but inaccessible. To solve this problem we usually build layers on top of BLAS. For example, numpy.dot calls DGEMM if the BLAS library is available on your system.
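To make the DGEMM contract concrete, here is a small pure-NumPy sketch of what the routine computes (my own illustration, not part of the original post; `gemm` is a hypothetical helper name):

```python
import numpy as np

def gemm(alpha, A, B, beta=0.0, C=None):
    """Pure-NumPy sketch of the DGEMM contract: C <- alpha*A*B + beta*C.

    numpy.dot covers the alpha=1, beta=0 special case and delegates
    to the real DGEMM when a BLAS library is available.
    """
    result = alpha * np.dot(A, B)
    if C is not None:
        result = result + beta * C
    return result

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.eye(2)
print(gemm(2.0, A, B))  # twice A
```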

Why not just use NumPy?

If you’re reading this then you’re probably comfortable with NumPy and happy that it gives you access to highly optimized low-level code like DGEMM. What else could we desire? NumPy has two flaws:

1. Each operation occurs at the Python level. This causes sub-optimal operation ordering and lots of unnecessary copies. For example, D = A*B*C is executed as follows

    _1 = A*B   # store A*B  -> _1
    _2 = _1*C  # store _1*C -> _2
    D  = _2    # store _2   -> D



It might have been cheaper to multiply A*B*C as (A*B)*C or A*(B*C), depending on the shapes of the matrices. Additionally, the temporary matrices _1 and _2 did not need to be created. If we’re allowed to reason about the computation before execution then we can make some substantial optimizations.

2. BLAS contains many special functions for special cases. For example, you can use DSYMM when one of your matrices is SYmmetric or DTRMM when one of your matrices is TRiangular. These allow for faster execution time if we are able to reason about our matrices.
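A back-of-the-envelope sketch of both gains (my own illustration; the shapes and multiply counts are assumptions chosen for the example):

```python
def matmul_mults(p, q, r):
    # multiplying a (p x q) matrix by a (q x r) matrix costs ~p*q*r scalar multiplies
    return p * q * r

# Gain 1 -- ordering: suppose A is 1000x2, B is 2x1000, C is 1000x1
left  = matmul_mults(1000, 2, 1000) + matmul_mults(1000, 1000, 1)  # (A*B)*C
right = matmul_mults(2, 1000, 1) + matmul_mults(1000, 2, 1)        # A*(B*C)
print(left, right)  # 3000000 4000

# Gain 2 -- structure: a triangular n x n multiply (DTRMM) needs roughly
# half the multiplies of a general one (DGEMM)
n = 1000
gemm_cost = matmul_mults(n, n, n)
trmm_cost = n * n * (n + 1) // 2
print(gemm_cost / trmm_cost)  # close to 2
```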

Previous Work

In the cases above we argue that we can make substantial gains if we are allowed to reason about the computation before it is executed. This is the job of a compiler. Computation usually happens as follows:

1. Write down code
2. Reason about and transform code
3. Execute code

Step (2) is often removed in scripting languages for programmer simplicity. There has been a lot of activity recently in putting it back in for array computations. The following projects compile array expressions prior to execution:

1. NumExpr
2. Theano
3. Numba
4. … I’m undoubtedly forgetting many excellent projects. Here is a more complete list

Where does SymPy fit in?

The projects above are all numerical in nature. They are generally good at solving problems of the first kind (operation ordering, in-place operations, …) but none of them reason very deeply about the mathematical properties of the matrices. This is where SymPy can be useful. Using its logic-programming assumptions framework, SymPy is able to reason about the properties of matrix expressions. Consider the following situation:

We know that A is symmetric and positive definite. We know that B is orthogonal.

Question: is BAB' symmetric and positive definite?

Let's see how we can pose this question in SymPy.

    >>> A = MatrixSymbol('A', n, n)
    >>> B = MatrixSymbol('B', n, n)
    >>> context = Q.symmetric(A) & Q.positive_definite(A) & Q.orthogonal(B)
    >>> ask(Q.symmetric(B*A*B.T) & Q.positive_definite(B*A*B.T), context)
    True


Positive-definiteness is a very important property of matrix expressions. It strongly influences our choice of numerical algorithm. For example, the fast Cholesky variant of LU decomposition may only be used if a matrix is symmetric and positive definite. Expert numerical analysts know this, but most scientific programmers do not. NumPy does not know this, but SymPy does.
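As a sketch of how this knowledge could drive algorithm choice (my own example; `pick_factorization` is a hypothetical helper, not project code):

```python
from sympy import MatrixSymbol, Symbol, Q, ask

n = Symbol('n')
A = MatrixSymbol('A', n, n)

def pick_factorization(expr, facts):
    # use Cholesky only when SymPy can prove symmetric positive-definiteness;
    # ask() returns None (unknown) otherwise, so we fall back to plain LU
    if ask(Q.symmetric(expr) & Q.positive_definite(expr), facts):
        return 'Cholesky'
    return 'LU'

print(pick_factorization(A, Q.symmetric(A) & Q.positive_definite(A)))  # Cholesky
print(pick_factorization(A, Q.invertible(A)))                          # LU
```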

Describing BLAS

We describe a new matrix operation in SymPy with code like the following:

    S = MatrixSymbol('S', n, n)

    class LU(BLAS):
        """ LU Decomposition """
        _inputs   = (S,)
        _outputs  = (Lof(S), Uof(S))
        view_map  = {0: 0, 1: 0} # Both outputs are stored in first input
        condition = True         # Always valid

    class Cholesky(LU):
        """ Cholesky LU Decomposition """
        condition = Q.symmetric(S) & Q.positive_definite(S)


This description allows us to concisely encode the expert knowledge used by numerical analysts. It allows us to describe the mathematical properties of linear algebraic operations.

Matrix Computation Graphs

We usually write code in a linear, top-down text file. This representation does not capture the full generality of a program. Instead we need to use a graph.

A computation can be described as a directed acyclic graph (DAG) where each node in the graph is an atomic computation (a function call like DGEMM or Cholesky) and each directed edge represents a data dependency between function calls (an edge from DGEMM to Cholesky implies that the Cholesky requires an output of the DGEMM call in order to run). This graph may not contain cycles - they would imply that some set of jobs all depend on each other; they could never start.

Graphs must be eventually linearized and turned into code. Before that happens we can think about optimal ordering and, if we feel adventurous, parallel scheduling onto different machines.
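A minimal sketch of such a graph and its linearization (my own illustration; the dependency structure mirrors the DGEMM-then-Cholesky example above):

```python
# each key is an atomic computation; its list holds the calls it depends on
deps = {
    'DGEMM':    [],            # alpha*A*B
    'Cholesky': ['DGEMM'],     # factors the product
    'DTRSV':    ['Cholesky'],  # triangular solve with the factor
}

def toposort(graph):
    """Linearize a DAG: every node appears after all of its dependencies."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in graph[node]:
            visit(dep)
        order.append(node)
    for node in graph:
        visit(node)
    return order

print(toposort(deps))  # ['DGEMM', 'Cholesky', 'DTRSV']
```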

SymPy contains a very simple Computation graph object. Here we localize all of the logic about inplace operations, ordering, and (eventually) parallel scheduling.

Translating Matrix Expressions into Matrix Computations

So how can we transform a matrix expression like

    (alpha*A*B).I * x


And a set of predicates like

    Q.lower_triangular(A) & Q.lower_triangular(B) & Q.invertible(A*B)


Into a graph of BLAS calls like one of the following?

    DGEMM(alpha, A, B, 0, B) -> DTRSV(alpha*A*B, x)
    DTRMM(alpha, A, B)       -> DTRSV(alpha*A*B, x)


And, once we have this set of valid computations, how do we choose the right one? These are the questions this project faces right now; both are challenging problems.
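One simple way to choose among candidate computations is a cost model over the calls involved. The relative costs below are made up purely for illustration; a real model would use flop counts or measured timings:

```python
COST = {'DGEMM': 3, 'DTRMM': 1, 'DTRSV': 1}  # hypothetical relative costs

candidates = [
    ('DGEMM', 'DTRSV'),  # general multiply, then triangular solve
    ('DTRMM', 'DTRSV'),  # exploit the triangularity of A and B
]

def total_cost(calls):
    return sum(COST[c] for c in calls)

best = min(candidates, key=total_cost)
print(best)  # ('DTRMM', 'DTRSV')
```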

October 24, 2012

Stefan Krastanov

Sage vs SymPy – integration

During the recent GSoC summit I had the chance to participate in many fascinating discussions. One such occasion was while meeting the Sage representative.

A detail he mentioned was that during his tests SymPy frequently failed to solve integrals that Sage (using Maxima) was able to solve. An explanation, which I would like to believe, is that he was testing an old version of SymPy lacking the new integration routines implemented during the last few GSoC projects. Hence my decision to compare the most recent versions of both projects.

The tested versions are SymPy 0.7.2 and Sage 5.3.

Depending on screen size and WordPress theme the table might be badly formatted, so here is a link to the wiki html page and a pdf version.

It should be noted that Sage is more rigorous about the assumptions on its symbols, and so it fails to integrate something like $x^n$ if $n$ is not explicitly different from -1. I personally think that this is a feature and not a bug. Due to this difference, however, the script used to test Sage differs from the one used for SymPy.
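For what it's worth, SymPy versions newer than the 0.7.2 benchmarked here handle the same concern with a case split rather than a question: integrate(x**n, x) returns a Piecewise that covers the $n = -1$ case (a quick sketch of current behavior, not of the version tested in this post):

```python
from sympy import symbols, integrate, log

x, n = symbols('x n')   # nothing rules out n == -1

res = integrate(x**n, x)
print(res)              # a Piecewise over n != -1 and n == -1

# substituting a concrete n picks the right branch
print(res.subs(n, 2))   # x**3/3
print(res.subs(n, -1))  # log(x)
```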

Another minor methodological difference is that the timeout pattern I used failed to work in the Sage interpreter. Hence, SymPy integration times out at about 120 seconds, while Sage integration is manually interrupted when it takes too long.
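The timeout pattern from the scripts below can be packaged as a small helper; this is a Python 3 sketch of the same idea (SIGALRM-based, so Unix-only), not the exact code used for the benchmarks:

```python
import signal

class TimeoutException(Exception):
    pass

def _alarm_handler(signum, frame):
    raise TimeoutException()

def run_with_timeout(fn, seconds):
    """Run fn(); give up and return TimeoutException after `seconds`."""
    old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(seconds)
    try:
        return fn()
    except TimeoutException:
        return TimeoutException
    finally:
        signal.alarm(0)                           # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

print(run_with_timeout(lambda: 1 + 1, 2))  # 2
```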

A final methodological difference is that I purge the SymPy cache between integrals, as otherwise the RAM usage becomes too great.

The results show that SymPy is somewhat better at using special functions to solve integrals, but there are also a few integrals that Sage solves while SymPy fails to do so. On a few occasions Sage fails disgracefully, meaning that it returns an error instead of an unevaluated integral. When both packages fail to evaluate an integral, SymPy is much slower to say so (a timeout for SymPy compared to 1 or 2 seconds for Sage to return an unevaluated integral). Finally, on some occasions the results from Sage seem better simplified.

Integrals solved better by SymPy (if you consider special functions “better”):

• $\frac{1}{a x^n + 1}$ with the use of a special function while Sage returns unevaluated integrals
• $\frac{a x^n}{b x^m + 1}$ with the use of a special function while Sage returns unevaluated integrals
• $\frac{a x^n + 1}{b x^m + 1}$ with the use of a special function while Sage returns unevaluated integrals
• $\frac{a x^5 + x^3 + 1}{b x^5 + x^3 + 1}$ with the use of a special function while Sage returns unevaluated integrals

Integrals solved better by Sage:

• $\frac{a x^2}{b x^2 + 1}$ solved by both but Sage’s result is simpler (it uses arctan instead of log)
• $\frac{1}{\sqrt{x^2 + 1}}$ SymPy fails this simple integral
• $\log\left(\frac{a x^3}{b x^3 + 1}\right)$ solved by both but Sage’s result is much simpler
• $\log\left(\frac{a x^2 + 1}{b x^2 + 1}\right)$ SymPy fails
• $\log\left(\frac{a x^3 + 1}{b x^3 + 1}\right)$ SymPy fails
• $\frac{1}{\sin x + 1}$ SymPy fails
• $\frac{a \sin^2 x + 1}{b \sin^2 x + 1}$ SymPy fails

The code for the SymPy tests:

import signal
from time import clock
from sympy import *
from sympy.core.cache import clear_cache

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

a, b, x = symbols('a b x')
n, m = symbols('n m', integer=True)

integrants = [
x,
a*x**n,
a*x**n + 1,
a*x**b + 1,

1/x,
1/(x + 1),
1/(x**2 + 1),
1/(x**3 + 1),
1/(a*x**n),
1/(a*x**n + 1),
1/(a*x**b + 1),

a*x**2/(b*x**2 + 1),
a*x**3/(b*x**3 + 1),
a*x**n/(b*x**m + 1),
(a*x**2 + 1)/(b*x**2 + 1),
(a*x**3 + 1)/(b*x**3 + 1),
(a*x**n + 1)/(b*x**m + 1),
(a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1),

sqrt(1/x),
sqrt(1/(x + 1)),
sqrt(1/(x**2 + 1)),
sqrt(1/(x**3 + 1)),
sqrt(1/(a*x**n)),
sqrt(1/(a*x**n + 1)),
sqrt(1/(a*x**b + 1)),
sqrt(a*x**2/(b*x**2 + 1)),
sqrt(a*x**3/(b*x**3 + 1)),
sqrt(a*x**n/(b*x**m + 1)),
sqrt((a*x**2 + 1)/(b*x**2 + 1)),
sqrt((a*x**3 + 1)/(b*x**3 + 1)),
sqrt((a*x**n + 1)/(b*x**m + 1)),
sqrt((a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1)),

log(x),
log(1/x),
log(1/(x + 1)),
log(1/(x**2 + 1)),
log(1/(x**3 + 1)),
log(1/(a*x**n)),
log(1/(a*x**n + 1)),
log(1/(a*x**b + 1)),
log(a*x**2/(b*x**2 + 1)),
log(a*x**3/(b*x**3 + 1)),
log(a*x**n/(b*x**m + 1)),
log((a*x**2 + 1)/(b*x**2 + 1)),
log((a*x**3 + 1)/(b*x**3 + 1)),
log((a*x**n + 1)/(b*x**m + 1)),
log((a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1)),

sin(x),
sin(x)**n*cos(x)**m,
sin(a*x)**n*cos(b*x)**m,
1/sin(x),
1/(sin(x) + 1),
1/(sin(x)**2 + 1),
1/(sin(x)**3 + 1),
1/(a*sin(x)**n),
1/(a*sin(x)**n + 1),
1/(a*sin(x)**b + 1),
a*sin(x)**2/(b*sin(x)**2 + 1),
a*sin(x)**3/(b*sin(x)**3 + 1),
a*sin(x)**n/(b*sin(x)**m + 1),
(a*sin(x)**2 + 1)/(b*sin(x)**2 + 1),
(a*sin(x)**3 + 1)/(b*sin(x)**3 + 1),
(a*sin(x)**n + 1)/(b*sin(x)**m + 1),
(a*sin(x)**5 + sin(x)**3 + 1)/(b*sin(x)**5 + sin(x)**3 + 1),
]

integrated = []
durations = []

f_integrants = open('dump_integrants', 'w')
f_integrated = open('dump_integrated', 'w')
f_durations = open('dump_duration', 'w')

for index, integrant in enumerate(integrants):
    clear_cache()
    print '====================================='
    print index, ' of ', len(integrants)
    print '###', integrant
    start = clock()
    try:
        old_handler = signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(120)
        integrated.append(integrate(integrant, x))
        signal.alarm(0)
    except TimeoutException:
        integrated.append(TimeoutException)
    finally:
        signal.signal(signal.SIGALRM, old_handler)
    durations.append(clock() - start)
    print '###', integrated[-1]
    print 'in %f seconds'%durations[-1]

    f_integrants.write(str(integrant))
    f_integrated.write(str(integrated[-1]))
    f_durations.write(str(durations[-1]))
    f_integrants.write('\n')
    f_integrated.write('\n')
    f_durations.write('\n')


And for Sage:

import signal
from time import clock
from sage.all import *
from sage.symbolic.integration.integral import indefinite_integral

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

a, b, x = var('a b x')
n, m = var('n m')
assume(n, 'integer')
assume(m, 'integer')

assume(n != 1)
assume(n != -1)
assume(n != 2)
assume(n>0)
assume(b != 1)
assume(b != -1)
assume(b>0)
assume(a>0)

integrants = [
x,
a*x**n,
a*x**n + 1,
a*x**b + 1,

1/x,
1/(x + 1),
1/(x**2 + 1),
1/(x**3 + 1),
1/(a*x**n),
1/(a*x**n + 1),
1/(a*x**b + 1),

a*x**2/(b*x**2 + 1),
a*x**3/(b*x**3 + 1),
a*x**n/(b*x**m + 1),
(a*x**2 + 1)/(b*x**2 + 1),
(a*x**3 + 1)/(b*x**3 + 1),
(a*x**n + 1)/(b*x**m + 1),
(a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1),

sqrt(1/x),
sqrt(1/(x + 1)),
sqrt(1/(x**2 + 1)),
sqrt(1/(x**3 + 1)),
sqrt(1/(a*x**n)),
sqrt(1/(a*x**n + 1)),
sqrt(1/(a*x**b + 1)),
sqrt(a*x**2/(b*x**2 + 1)),
sqrt(a*x**3/(b*x**3 + 1)),
sqrt(a*x**n/(b*x**m + 1)),
sqrt((a*x**2 + 1)/(b*x**2 + 1)),
sqrt((a*x**3 + 1)/(b*x**3 + 1)),
sqrt((a*x**n + 1)/(b*x**m + 1)),
sqrt((a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1)),

log(x),
log(1/x),
log(1/(x + 1)),
log(1/(x**2 + 1)),
log(1/(x**3 + 1)),
log(1/(a*x**n)),
log(1/(a*x**n + 1)),
log(1/(a*x**b + 1)),
log(a*x**2/(b*x**2 + 1)),
log(a*x**3/(b*x**3 + 1)),
log(a*x**n/(b*x**m + 1)),
log((a*x**2 + 1)/(b*x**2 + 1)),
log((a*x**3 + 1)/(b*x**3 + 1)),
log((a*x**n + 1)/(b*x**m + 1)),
log((a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1)),

sin(x),
sin(x)**n*cos(x)**m,
sin(a*x)**n*cos(b*x)**m,
1/sin(x),
1/(sin(x) + 1),
1/(sin(x)**2 + 1),
1/(sin(x)**3 + 1),
1/(a*sin(x)**n),
1/(a*sin(x)**n + 1),
1/(a*sin(x)**b + 1),
a*sin(x)**2/(b*sin(x)**2 + 1),
a*sin(x)**3/(b*sin(x)**3 + 1),
a*sin(x)**n/(b*sin(x)**m + 1),
(a*sin(x)**2 + 1)/(b*sin(x)**2 + 1),
(a*sin(x)**3 + 1)/(b*sin(x)**3 + 1),
(a*sin(x)**n + 1)/(b*sin(x)**m + 1),
(a*sin(x)**5 + sin(x)**3 + 1)/(b*sin(x)**5 + sin(x)**3 + 1),
]

integrated = []
durations = []

f_integrants = open('dump_integrants', 'w')
f_integrated = open('dump_integrated', 'w')
f_durations = open('dump_duration', 'w')

for index, integrant in enumerate(integrants):
    print '====================================='
    print index, ' of ', len(integrants)
    print '###', integrant
    start = clock()
    try:
        old_handler = signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(120)
        integrated.append(indefinite_integral(integrant, x))
        signal.alarm(0)
    except Exception, e:
        integrated.append(e)
    finally:
        signal.signal(signal.SIGALRM, old_handler)
    durations.append(clock() - start)
    print '###', integrated[-1]
    print 'in %f seconds'%durations[-1]

    f_integrants.write(str(integrant))
    f_integrated.write(str(integrated[-1]))
    f_durations.write(str(durations[-1]))
    f_integrants.write('\n')
    f_integrated.write('\n')
    f_durations.write('\n')


Below is the complete table (available as a wiki html page and a pdf).

Legend for the result columns: "unevaluated" = the integral is returned unevaluated; "partially unevaluated" = the answer still contains unevaluated integrals; "timeout" = interrupted at ~120 seconds (manual interrupt for Sage); "no result" and Maxima errors = disgraceful failure; "(special functions)" = solution uses fancy special functions. Very long symbolic results are abbreviated here; the full expressions are in the linked wiki page and pdf. Times are cpu seconds.

| Integrand | Sympy integrate | s | Sage indefinite_integral | s | Comments |
|---|---|---|---|---|---|
| x | x**2/2 | 0 | 1/2*x^2 | 0 | |
| a*x**n | a*x**(n + 1)/(n + 1) | 0 | a*x^(n + 1)/(n + 1) | 0 | Sage asks whether the denominator is zero before solving. |
| a*x**n + 1 | a*x**(n + 1)/(n + 1) + x | 0 | a*x^(n + 1)/(n + 1) + x | 0 | Sage asks whether the denominator is zero before solving. |
| a*x**b + 1 | a*x**(b + 1)/(b + 1) + x | 0 | a*x^(b + 1)/(b + 1) + x | 0 | Sage asks whether the denominator is zero before solving. |
| 1/x | log(x) | 0 | log(x) | 0 | |
| 1/(x + 1) | log(x + 1) | 0 | log(x + 1) | 0 | |
| 1/(x**2 + 1) | atan(x) | 0 | arctan(x) | 0 | |
| 1/(x**3 + 1) | log(x + 1)/3 - log(x**2 - x + 1)/6 + sqrt(3)*atan(2*sqrt(3)*x/3 - sqrt(3)/3)/3 | 0 | 1/3*sqrt(3)*arctan(1/3*(2*x - 1)*sqrt(3)) + 1/3*log(x + 1) - 1/6*log(x^2 - x + 1) | 0 | |
| x**(-n)/a | x**(-n + 1)/(a*(-n + 1)) | 0 | -x^(-n + 1)/((n - 1)*a) | 0 | Sage asks whether the denominator is zero before solving. |
| 1/(a*x**n + 1) | x*gamma(1/n)*lerchphi(a*x**n*exp_polar(I*pi), 1, 1/n)/(n**2*gamma(1 + 1/n)) (special functions) | 2 | unevaluated | 1 | Sympy solves with special functions an integral that Sage cannot solve. |
| 1/(a*x**b + 1) | analogous lerchphi/gamma expression (special functions) | 1 | unevaluated | 1 | Sympy solves with special functions an integral that Sage cannot solve. |
| a*x**2/(b*x**2 + 1) | a*(sqrt(-1/b**3)*log(-b*sqrt(-1/b**3) + x)/2 - sqrt(-1/b**3)*log(b*sqrt(-1/b**3) + x)/2 + x/b) | 0 | (x/b - arctan(sqrt(b)*x)/b^(3/2))*a | 0 | |
| a*x**3/(b*x**3 + 1) | a*(RootSum(_t**3 + 1/(27*b**4), Lambda(_t, _t*log(-3*_t*b + x))) + x/b) | 0 | elementary arctan/log expression | 0 | Interesting: Sympy sums over the roots of a polynomial while Sage uses elementary functions. |
| a*x**n/(b*x**m + 1) | lerchphi/gamma expression (special functions) | 3 | partially unevaluated | 0 | Sympy solves with special functions an integral that Sage cannot solve. |
| (a*x**2 + 1)/(b*x**2 + 1) | a*x/b plus log terms | 0 | a*x/b - (a - b)*arctan(sqrt(b)*x)/b^(3/2) | 0 | Sage simplifies better (log-to-trig formulas). |
| (a*x**3 + 1)/(b*x**3 + 1) | a*x/b + RootSum over a cubic | 0 | elementary arctan/log expression | 0 | Interesting: Sympy sums over the roots of a polynomial while Sage uses elementary functions. |
| (a*x**n + 1)/(b*x**m + 1) | lerchphi/gamma expression (special functions) | 5 | partially unevaluated | 1 | Sympy solves with special functions an integral that Sage cannot solve. |
| (a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1) | enormous RootSum expression (special functions) | 106 | partially unevaluated | 0 | Sympy solves with special functions an integral that Sage cannot solve. |
| sqrt(1/x) | 2*x*sqrt(1/x) | 0 | 2*x*sqrt(1/x) | 0 | |
| sqrt(1/(x + 1)) | 2*x*sqrt(1/(x + 1)) + 2*sqrt(1/(x + 1)) | 0 | 2/sqrt(1/(x + 1)) | 0 | |
| sqrt(1/(x**2 + 1)) | unevaluated | 0 | arcsinh(x) | 0 | Sympy cannot solve this simple integral while Sage can. |
| sqrt(1/(x**3 + 1)) | unevaluated | 3 | unevaluated | 0 | |
| sqrt(x**(-n)/a) | -2*x*sqrt(1/a)*sqrt(x**(-n))/(n - 2) | 0 | -2*x*sqrt(x^(-n)/a)/(n - 2) | 0 | Sage asks whether the denominator is zero before solving. |
| sqrt(1/(a*x**n + 1)) | unevaluated | 29 | unevaluated | 0 | When both fail, Sage is quicker. |
| sqrt(1/(a*x**b + 1)) | unevaluated | 35 | unevaluated | 1 | When both fail, Sage is quicker. |
| sqrt(a*x**2/(b*x**2 + 1)) | sqrt(a)*x*sqrt(x**2)*sqrt(1/(b*x**2 + 1)) + sqrt(a)*sqrt(x**2)*sqrt(1/(b*x**2 + 1))/(b*x) | 2 | (sqrt(a)*b*x^2 + sqrt(a))/(sqrt(b*x^2 + 1)*b) | 0 | |
| sqrt(a*x**3/(b*x**3 + 1)) | unevaluated | 7 | unevaluated | 0 | |
| sqrt(a*x**n/(b*x**m + 1)) | timeout | 115 | unevaluated | 1 | When both fail, Sage is quicker. |
| sqrt((a*x**2 + 1)/(b*x**2 + 1)) | timeout | 110 | unevaluated | 0 | When both fail, Sage is quicker. |
| sqrt((a*x**3 + 1)/(b*x**3 + 1)) | timeout | 109 | unevaluated | 0 | When both fail, Sage is quicker. |
| sqrt((a*x**n + 1)/(b*x**m + 1)) | timeout | 114 | unevaluated | 1 | When both fail, Sage is quicker. |
| sqrt((a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1)) | timeout | 104 | unevaluated | 0 | When both fail, Sage is quicker. |
| log(x) | x*log(x) - x | 0 | x*log(x) - x | 0 | |
| log(1/x) | -x*log(x) + x | 0 | -x*log(x) + x | 0 | |
| log(1/(x + 1)) | -x*log(x + 1) + x - log(x + 1) | 0 | -(x + 1)*log(x + 1) + x + 1 | 0 | |
| log(1/(x**2 + 1)) | -x*log(x**2 + 1) + 2*x - 2*I*log(x + I) + I*log(x**2 + 1) | 2 | -x*log(x^2 + 1) + 2*x - 2*arctan(x) | 0 | |
| log(1/(x**3 + 1)) | log expression with complex terms | 6 | -x*log(x^3 + 1) - sqrt(3)*arctan(1/3*(2*x - 1)*sqrt(3)) + 3*x - log(x + 1) + 1/2*log(x^2 - x + 1) | 0 | |
| log(x**(-n)/a) | -n*x*log(x) + n*x - x*log(a) | 0 | n*x + x*log(x^(-n)/a) | 0 | |
| log(1/(a*x**n + 1)) | unevaluated | 68 | partially unevaluated | 1 | When both fail, Sage is quicker. |
| log(1/(a*x**b + 1)) | timeout | 91 | partially unevaluated | 2 | When both fail, Sage is quicker. |
| log(a*x**2/(b*x**2 + 1)) | log expression with complex terms | 10 | x*log(a*x^2/(b*x^2 + 1)) - 2*arctan(sqrt(b)*x)/sqrt(b) | 0 | |
| log(a*x**3/(b*x**3 + 1)) | enormous expression | 26 | x*log(a*x^3/(b*x^3 + 1)) minus a compact arctan/log expression | 0 | Sage simplifies better. |
| log(a*x**n/(b*x**m + 1)) | timeout | 96 | partially unevaluated | 2 | When both fail, Sage is quicker. |
| log((a*x**2 + 1)/(b*x**2 + 1)) | unevaluated | 72 | x*log((a*x^2 + 1)/(b*x^2 + 1)) + 2*arctan(sqrt(a)*x)/sqrt(a) - 2*arctan(sqrt(b)*x)/sqrt(b) | 0 | Sage asks whether a and b are positive and then returns an answer; Sympy fails irrespective of the assumptions. |
| log((a*x**3 + 1)/(b*x**3 + 1)) | timeout | 89 | arctan/log expression | 0 | Sage asks whether a and b are positive and then returns an answer; Sympy fails irrespective of the assumptions. |
| log((a*x**n + 1)/(b*x**m + 1)) | timeout | 89 | partially unevaluated | 4 | When both fail, Sage is quicker. |
| log((a*x**5 + x**3 + 1)/(b*x**5 + x**3 + 1)) | unevaluated | 42 | partially unevaluated | 1 | When both fail, Sage is quicker. |
| sin(x) | -cos(x) | 0 | -cos(x) | 0 | |
| sin(x)**n*cos(x)**m | timeout | 102 | no result | 110 | Disgraceful failure by Sage. |
| sin(a*x)**n*cos(b*x)**m | timeout | 81 | no result | 112 | Disgraceful failure by Sage. |
| 1/sin(x) | log(cos(x) - 1)/2 - log(cos(x) + 1)/2 | 0 | 1/2*log(cos(x) - 1) - 1/2*log(cos(x) + 1) | 0 | |
| 1/(sin(x) + 1) | -2/(tan(x/2) + 1) | 1 | -2/(sin(x)/(cos(x) + 1) + 1) | 0 | |
| 1/(sin(x)**2 + 1) | timeout | 96 | 1/2*sqrt(2)*arctan(sqrt(2)*tan(x)) | 0 | Sage simply beats Sympy. |
| 1/(sin(x)**3 + 1) | timeout | 87 | Maxima: quotient by zero | 78 | Disgraceful failure by Sage. |
| sin(x)**(-n)/a | unevaluated | 36 | no result | 227 | Disgraceful failure by Sage. |
| 1/(a*sin(x)**n + 1) | timeout | 98 | Maxima: expt: undefined: 0 to a negative exponent | 1 | Disgraceful failure by Sage. |
| 1/(a*sin(x)**b + 1) | timeout | 83 | no result | 140 | Disgraceful failure by Sage. |
| a*sin(x)**2/(b*sin(x)**2 + 1) | timeout | 93 | (x/b - arctan(sqrt(b + 1)*tan(x))/(sqrt(b + 1)*b))*a | 0 | Sage simply beats Sympy. |
| a*sin(x)**3/(b*sin(x)**3 + 1) | timeout | 82 | no result | 568 | Disgraceful failure by Sage. |
| a*sin(x)**n/(b*sin(x)**m + 1) | unevaluated | 24 | manual interrupt | 1527 | Both fail, but Sympy is quicker to say so. |
| (a*sin(x)**2 + 1)/(b*sin(x)**2 + 1) | timeout | 98 | a*x/b - (a - b)*arctan(sqrt(b + 1)*tan(x))/(sqrt(b + 1)*b) | 0 | Sage simply beats Sympy. |
| (a*sin(x)**3 + 1)/(b*sin(x)**3 + 1) | timeout | 96 | manual interrupt | 203 | |
| (a*sin(x)**n + 1)/(b*sin(x)**m + 1) | timeout | 83 | Maxima: expt: undefined: 0 to a negative exponent | 1 | Disgraceful failure by Sage. |
| (a*sin(x)**5 + sin(x)**3 + 1)/(b*sin(x)**5 + sin(x)**3 + 1) | timeout | 89 | manual interrupt | 142 | |

October 22, 2012

Fabian Pedregosa

Learning to rank with scikit-learn: the pairwise transform

This tutorial introduces the concept of pairwise preference used in most ranking problems. I'll use scikit-learn for learning and matplotlib for visualization.

In the ranking setting, training data consists of lists of items with some order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item, so that for any two samples a and b, either a < b, a > b, or a and b are not comparable.

For example, in the case of a search engine, our dataset consists of results that belong to different queries and we would like to only compare the relevance for results coming from the same query.

This order relation is usually domain-specific. For instance, in information retrieval the set of comparable samples is referred to as a "query id". The goal behind this is to compare only documents that belong to the same query (Joachims 2002). In medical imaging, on the other hand, the order of the labels usually depends on the subject, so the set of comparable samples is given by the different subjects in the study (Pedregosa et al 2012).

import itertools
import numpy as np
from scipy import stats, linalg  # linalg is used below to normalize coefficient vectors
import pylab as pl
from sklearn import svm, linear_model, cross_validation


To start with, we'll create a dataset in which the target values consist of three graded measurements Y = {0, 1, 2} and the input data is a collection of 30 samples, each one with two features.

The set of comparable elements (queries in information retrieval) will consist of two equally sized blocks, $X = X_1 \cup X_2$, where each block is generated using a normal distribution with different mean and covariance. In the pictures, we represent $X_1$ with round markers and $X_2$ with triangular markers.

np.random.seed(0)
theta = np.deg2rad(60)
w = np.array([np.sin(theta), np.cos(theta)])
K = 20
X = np.random.randn(K, 2)
y = [0] * K
for i in range(1, 3):
    X = np.concatenate((X, np.random.randn(K, 2) + i * 4 * w))
    y = np.concatenate((y, [i] * K))

# slightly displace data corresponding to our second partition
X[::2] -= np.array([3, 7])
blocks = np.array([0, 1] * (X.shape[0] // 2))

# split into train and test set
cv = cross_validation.StratifiedShuffleSplit(y, test_size=.5)
train, test = next(iter(cv))
X_train, y_train, b_train = X[train], y[train], blocks[train]
X_test, y_test, b_test = X[test], y[test], blocks[test]

# plot the result
idx = (b_train == 0)
pl.scatter(X_train[idx, 0], X_train[idx, 1], c=y_train[idx],
           marker='^', cmap=pl.cm.Blues, s=100)
pl.scatter(X_train[~idx, 0], X_train[~idx, 1], c=y_train[~idx],
           marker='o', cmap=pl.cm.Blues, s=100)
pl.arrow(0, 0, 8 * w[0], 8 * w[1], fc='gray', ec='gray',
         head_width=0.5, head_length=0.5)
pl.text(0, 1, '$w$', fontsize=20)
pl.arrow(-3, -8, 8 * w[0], 8 * w[1], fc='gray', ec='gray',
         head_width=0.5, head_length=0.5)
pl.text(-2.6, -7, '$w$', fontsize=20)
pl.axis('equal')
pl.show()


In the plot we clearly see that for both blocks there's a common vector w such that the projection onto w gives a list with the correct ordering.

However, because linear regression assumes that the output labels live in a metric space, it will consider all pairs comparable. Thus if we fit this model to the problem above it will fit both blocks at the same time, yielding a result that is clearly not optimal. In the following plot we estimate $\hat{w}$ using an l2-regularized linear model.

ridge = linear_model.Ridge(1.)
ridge.fit(X_train, y_train)
coef = ridge.coef_ / linalg.norm(ridge.coef_)
pl.scatter(X_train[idx, 0], X_train[idx, 1], c=y_train[idx],
           marker='^', cmap=pl.cm.Blues, s=100)
pl.scatter(X_train[~idx, 0], X_train[~idx, 1], c=y_train[~idx],
           marker='o', cmap=pl.cm.Blues, s=100)
pl.arrow(0, 0, 7 * coef[0], 7 * coef[1], fc='gray', ec='gray',
         head_width=0.5, head_length=0.5)
pl.text(2, 0, '$\hat{w}$', fontsize=20)
pl.axis('equal')
pl.title('Estimation by Ridge regression')
pl.show()


To assess the quality of our model we need to define a ranking score. Since we are interested in a model that orders the data, it is natural to look for a metric that compares the ordering of our model to the given ordering. For this, we use Kendall's tau correlation coefficient, which is defined as (P - Q)/(P + Q), where P is the number of concordant pairs and Q the number of discordant pairs. This measure is used extensively in the ranking literature (e.g. Optimizing Search Engines using Clickthrough Data).
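This definition is easy to check directly against scipy's implementation. The following is a quick sketch on tie-free toy data (the variable names `pred` and `truth` are illustrative, not part of the dataset above):

```python
from scipy import stats

def kendall_tau(a, b):
    """Kendall's tau from the definition (P - Q)/(P + Q),
    assuming no ties: P concordant pairs, Q discordant pairs."""
    P = Q = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                P += 1   # pair ordered the same way in both lists
            elif s < 0:
                Q += 1   # pair ordered in opposite ways
    return (P - Q) / float(P + Q)

pred = [1.2, 0.5, 3.1, 2.0]   # model scores
truth = [1, 0, 2, 3]          # true ordinal labels
# one swapped pair out of six -> tau = (5 - 1)/6 = 2/3
print(kendall_tau(pred, truth))
print(stats.kendalltau(pred, truth)[0])  # agrees when there are no ties
```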

We thus evaluate this metric on the test set for each block separately.

for i in range(2):
    tau, _ = stats.kendalltau(
        ridge.predict(X_test[b_test == i]), y_test[b_test == i])
    print('Kendall correlation coefficient for block %s: %.5f' % (i, tau))

Kendall correlation coefficient for block 0: 0.71122
Kendall correlation coefficient for block 1: 0.84387


The pairwise transform

As proved in (Herbrich 1999), if we consider linear ranking functions, the ranking problem can be transformed into a two-class classification problem. For this, we form the difference of all comparable elements such that our data is transformed into $(x'_k, y'_k) = (x_i - x_j, sign(y_i - y_j))$ for all comparable pairs.

This way we have transformed our ranking problem into a two-class classification problem. The following plot shows this transformed dataset; color reflects the difference in labels, and our task is to separate positive samples from negative ones. The hyperplane {x^T w = 0} separates these two classes.

# form all pairwise combinations
comb = itertools.combinations(range(X_train.shape[0]), 2)
k = 0
Xp, yp, diff = [], [], []
for (i, j) in comb:
    if y_train[i] == y_train[j] \
            or blocks[train][i] != blocks[train][j]:
        # skip if same target or different group
        continue
    Xp.append(X_train[i] - X_train[j])
    diff.append(y_train[i] - y_train[j])
    yp.append(np.sign(diff[-1]))
    # output balanced classes
    if yp[-1] != (-1) ** k:
        yp[-1] *= -1
        Xp[-1] *= -1
        diff[-1] *= -1
    k += 1
Xp, yp, diff = map(np.asanyarray, (Xp, yp, diff))
pl.scatter(Xp[:, 0], Xp[:, 1], c=diff, s=60, marker='o', cmap=pl.cm.Blues)
x_space = np.linspace(-10, 10)
pl.plot(x_space * w[1], -x_space * w[0], color='gray')
pl.text(3, -4, '$\{x^T w = 0\}$', fontsize=17)
pl.axis('equal')
pl.show()


As we see in the previous plot, this classification problem is separable. This will not always be the case; however, in our training set there are no order inversions, so the associated classification problem is separable.

We will now finally train a Support Vector Machine model on the transformed data. This model is known as RankSVM. We will then plot the training data together with the coefficient $\hat{w}$ estimated by RankSVM.

clf = svm.SVC(kernel='linear', C=.1)
clf.fit(Xp, yp)
coef = clf.coef_.ravel() / linalg.norm(clf.coef_)
pl.scatter(X_train[idx, 0], X_train[idx, 1], c=y_train[idx],
           marker='^', cmap=pl.cm.Blues, s=100)
pl.scatter(X_train[~idx, 0], X_train[~idx, 1], c=y_train[~idx],
           marker='o', cmap=pl.cm.Blues, s=100)
pl.arrow(0, 0, 7 * coef[0], 7 * coef[1], fc='gray', ec='gray',
         head_width=0.5, head_length=0.5)
pl.arrow(-3, -8, 7 * coef[0], 7 * coef[1], fc='gray', ec='gray',
         head_width=0.5, head_length=0.5)
pl.text(1, .7, '$\hat{w}$', fontsize=20)
pl.text(-2.6, -7, '$\hat{w}$', fontsize=20)
pl.axis('equal')
pl.show()


Finally, we will check that, as expected, the ranking score (Kendall tau) increases with the RankSVM model with respect to linear regression.

for i in range(2):
    tau, _ = stats.kendalltau(
        np.dot(X_test[b_test == i], coef), y_test[b_test == i])
    print('Kendall correlation coefficient for block %s: %.5f' % (i, tau))

Kendall correlation coefficient for block 0: 0.83627
Kendall correlation coefficient for block 1: 0.84387


This is indeed higher than the values (0.71122, 0.84387) obtained in the case of linear regression.

Original ipython notebook for this blog post can be found here

Matthew Rocklin

GSoC Mentor Summit

I’m in the airport waiting for my flight after finishing the Google Summer of Code Mentor Summit. This event took place this weekend. Two or three mentors from many of the GSoC projects came out to the Google campus to participate in an un-conference about GSoC. Google and SymPy were kind enough to send me and two others (Stefan Krastanov and Gilbert Gede) so I thought I’d slightly repay the favor by reporting my experiences. I’ll list some generated ideas and thoughts below. They range from the application process, to SymPy and scientific Python in general to some meta-thoughts about the conference itself.

Application process

How should we improve our application process to attract good students, detect good students, and match good students to appropriate mentors?

We should ask indirect questions to query for curiosity and passion.

Negative examples:

• Why do you want to work on SymPy?
• Why do you like Math?
• How long have you been programming?

Positive examples:

• Copy the favorite snippet of SymPy code you’ve seen here and tell us why you like it.
• What aspects of SymPy do you like the most?
• What editor do you use and why?

Experience was that direct questions tend to have low information content (everyone says the same thing). Indirect questions will be ignored by the lazy but engage the engaged. You often want to test for curiosity and passion more than actual experience in the domain.

We should match mathematically strong students with strong programmer mentors, and strong programmer students with strong mathematical mentors. We often do the opposite due to shared interests, but this might not result in ideal contributions.

Funding

Other people have funding. Should we? What would we do with it? How would we get it? It might not be as hard as we think. Who uses us? Can we get a grant? Are there companies who might be willing to fund directed work on SymPy?

Interactions with SymPians

This is my first time physically interacting with SymPy contributors other than my old mentor. It was a really positive experience. As a community we’re pretty distributed, both geographically and in applications/modules. Getting together and talking about SymPy was oddly fascinating. We should do it more. It made us think about SymPy at a bigger scale.

Some thoughts

• Do we want to organize a SymPy meetup (perhaps collocated with some other conference like SciPy)? What would this accomplish?
• What is our big plan for SymPy? Do we have one or are we all just a bunch of hobbyists who work on our own projects? Are we actively pursuing a long term vision? I think that we could be more cohesive and generate more forward momentum. I think that this can be created by occasional collocation.
• This could also be accomplished by some sort of digital meetup that’s more intense than the e-mail/IRC list. An easy test version of this could be a monthly video conference.

Broader community

I’m accustomed to academic conferences. I recently had a different experience at the SciPy conference which mixed academic research with code. I really liked this mix of theory and application and had a great time at SciPy. GSoC amplified this change, replacing a lot of academics with attendees that were purely interested in code. This was personally very strange for me, I felt like an outsider.

The scientific/numeric python community doesn’t care as intensely about many of the issues that are religion to a substantial fraction of the open source world. My disinterest in these topics and my interest in more esoteric/academic topics also made me feel foreign. There were still people like me though and they were very fun to find, just a bit rarer.

This is the first conference I’ve been to where I was one of the better dressed attendees :)

Local Community

Other projects of our size exist under an umbrella organization like the Apache foundation. I see our local community as the numpy/scipy/matplotlib stack. How can we more tightly integrate ourselves with this community? NumFocus was started up recently. Should we engage/use NumFocus more? How can we make use of and how can we support our local community?

Meta-Mentor Summit

This section includes my thoughts about the summit itself. It’s distinctly structured. I’ll share my opinions about this structure.

The informal meeting spaces were excellent. Far better than the average academic conference. I felt very comfortable introducing myself and my project to everyone. It was a very social and outgoing crowd.

Some of the sessions were really productive and helpful. The unconference structure had a few strong successes.

There were a lot of sessions that could have been better organized.

• Frequently we didn’t have a goal in mind; this can be ok but I felt that in many cases a clear goal would have kept conversation on topic.
• People very often wanted to share their experiences from events in their organization. This is good, we need to share experiences, but often people wouldn’t filter out org-specific details. We need to be mindful about holding the floor. We have really diverse groups and I’m pretty sure that the KDE guys don’t want to hear the details of symbolic algebra algorithms.
• Sessions are sometimes dominated by one person
• In general I think that we should use neutral meeting facilitators within the larger sessions. I think that they could be much more productive with some light amount of control.

Specific Interactions with other Orgs

It was really cool to associate physical humans with all of the software projects I’ve benefitted from over the years. It’s awesome to realize that it’s all built by people, and not by some abstract force. I had a number of positive experiences with orgs like Sage and SciLab that are strongly related to SymPy as well as orgs that are completely unrelated like OpenIntents, Scala, and Tor.

Conclusions

I had a good time and came away with thoughts of the future. We have something pretty cool here and I think that we should think more aggressively about where we want to take it.

Major Changes

Python 3 support

SymPy now supports Python 3. The officially supported versions are 3.2 and 3.3, but 3.1 should also work in a pinch. The Python 3-compatible tarballs will be provided separately, but it is also possible to download the Python 2 code and convert it manually via the bin/use2to3 utility. See the README for more information.

PyPy support

All SymPy tests pass in recent nightlies of PyPy, and so it should have full support as of the next version after 1.9.

Combinatorics

A new module called Combinatorics was added which is the result of a successful GSoC project. It attempts to replicate the functionality of Combinatorica and currently has full-featured support for Permutations, Subsets, Gray codes and Prufer codes.
In another GSoC project, facilities from computational group theory were added to the combinatorics module, mainly following the book "Handbook of Computational Group Theory". Currently only permutation groups are supported. The main functionalities are: basic properties (orbits, stabilizers, random elements...), the Schreier-Sims algorithm (three implementations, in increasing speed: with Jerrum's filter, incremental, and randomized (Monte Carlo)), and backtrack searching for subgroups with certain properties.
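A minimal sketch of the permutation-group facilities (using the current public API; method names may have shifted slightly since this release):

```python
from sympy.combinatorics import Permutation, PermutationGroup

# The symmetric group S3, generated by a 3-cycle and a transposition
a = Permutation([1, 2, 0])   # maps 0->1, 1->2, 2->0
b = Permutation([1, 0, 2])   # swaps 0 and 1
G = PermutationGroup([a, b])

print(G.order())          # 6, the full symmetric group on 3 points
print(G.orbit(0))         # the orbit of 0 is {0, 1, 2}
print(G.is_transitive())  # True
```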

Definite Integration

A new module called meijerint was added, which is also the result of a successful GSoC project. It implements a heuristic algorithm for (mainly) definite integration, similar to the one used in Mathematica. The code is automatically called by the standard integrate() function. This new algorithm allows computation of important integral transforms in many interesting cases, so helper functions for Laplace, Fourier and Mellin transforms were added as well.
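For instance, the new machinery is invoked transparently through integrate(), and the transform helpers are available as top-level functions (a sketch; exact output forms may vary between versions):

```python
from sympy import symbols, integrate, laplace_transform, exp, sqrt, pi, oo

x, t, s = symbols('x t s', positive=True)

# A definite integral handled by the Meijer G-function machinery
print(integrate(exp(-x**2), (x, 0, oo)))   # sqrt(pi)/2

# Laplace transform of t -> 1/s**2, with convergence information
print(laplace_transform(t, t, s))          # (s**(-2), 0, True)
```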

Random Variables

A new module called stats was added. This introduces a RandomSymbol type which can be used to model uncertainty in expressions.
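A quick sketch of the new random variables (constructor signature as in current SymPy; it differed slightly in early releases):

```python
from sympy import Rational
from sympy.stats import Normal, E, variance, P

X = Normal('X', 0, 1)      # a standard normal RandomSymbol

print(E(X))                # 0
print(variance(2*X + 1))   # 4
print(P(X > 0))            # 1/2
```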

Matrix Expressions

A new matrix submodule named expressions was added. This introduces a MatrixSymbol type which can be used to describe a matrix without explicitly stating its entries. A new family of expression types was also added: Transpose, Inverse, Trace, and BlockMatrix. ImmutableMatrix was added so that explicitly defined matrices could interact with other SymPy expressions.
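MatrixSymbol objects carry shapes and obey the usual matrix identities symbolically (a sketch; simplification behavior may differ across versions):

```python
from sympy import MatrixSymbol

A = MatrixSymbol('A', 3, 3)
B = MatrixSymbol('B', 3, 3)

print((A*B).shape)   # (3, 3) -- shapes propagate through products
print(A.T.T)         # A -- a double transpose simplifies away
print((A*B).T)       # B.T*A.T -- transpose reverses the product
```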

Sets

A number of new sets were added including atomic sets like FiniteSet, Reals, Naturals, Integers, UniversalSet as well as compound sets like ProductSet and TransformationSet. Using these building blocks it is possible to build up a great variety of interesting sets.
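A small sketch of what these sets support (set-algebra method names as in current SymPy):

```python
from sympy import FiniteSet, Interval

# Union and intersection of atomic sets
print(Interval(0, 2).union(Interval(1, 3)))            # [0, 3]
print(FiniteSet(1, 2, 3).intersect(Interval(2, 10)))   # {2, 3}

# Membership tests work with plain Python numbers
print(2 in Interval(0, 2))                             # True
```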

Classical Mechanics

A physics submodule named mechanics was added which assists in the formation of equations of motion for constrained multi-body systems. It is the result of 3 GSoC projects. Some nontrivial systems can be solved, and examples are provided.

Quantum Mechanics

Density operator module has been added. The operator can be initialized with generic Kets or Qubits. The Density operator can also work with TensorProducts as arguments. Global methods are also added that compute entropy and fidelity of states. Trace and partial-trace operations can also be performed on these density operators.
To enable partial trace operations, a Tr module has been added to the core library. While the functionality should remain the same, this module is likely to be relocated to an alternate folder in the future. One can currently also use sympy.core.Tr to work on general trace operations, but this module is what is needed to work on trace and partial-trace operations on any sympy.physics.quantum objects.
The Density operators, Tr and partial-trace functionality were implemented as part of student participation in GSoC 2012.
Expanded angular momentum to include coupled-basis states and product-basis states. Operators can also be treated as acting on the coupled basis (default behavior) or on one component of the tensor product states. The methods for coupling and uncoupling these states can work on an arbitrary number of states. Representing, rewriting and applying states and operators between bases has been improved.
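A minimal sketch of the Density operator (constructor and accessor names as in the quantum module's documentation; assumed unchanged here):

```python
from sympy.physics.quantum.density import Density
from sympy.physics.quantum.qubit import Qubit

# An equal classical mixture of the |0> and |1> qubit states
d = Density([Qubit('0'), 0.5], [Qubit('1'), 0.5])

print(d.states())   # the kets in the mixture
print(d.probs())    # their probabilities, (0.5, 0.5)
print(d.doit())     # expanded into a sum of outer products
```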

Commutative Algebra

A new module agca was started which seeks to support computations in commutative algebra (and eventually algebraic geometry) in the style of Macaulay2 and Singular. Currently there is support for computing Groebner bases of modules over a (generalized) polynomial ring over a field. Based on this, there are algorithms for various standard problems in commutative algebra, e.g., computing intersections of submodules, equality tests in quotient rings, etc....

Plotting Module

A new plotting module has been added which uses Matplotlib as its back-end. The plotting module has functions to plot the following:
• 2D line plots.
• 2D parametric plots.
• 2D implicit and region plots.
• 3D surface plots.
• 3D parametric surface plots.
• 3D parametric line plots.
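The simplest entry point is the plot() function (a sketch; show=False builds the plot object without opening a Matplotlib window):

```python
from sympy import symbols, sin, plot

x = symbols('x')

# Build a 2D line plot of sin(x) over [-5, 5] without rendering it;
# call p.show() in an interactive session to display it.
p = plot(sin(x), (x, -5, 5), show=False)
```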

Differential Geometry

Thanks to a GSoC project, a new module covering the theory of differential geometry was started. It can be imported with sympy.diffgeom. It is based on "Functional Differential Geometry" by Sussman and Wisdom. Currently implemented are scalar, vector and form fields over manifolds, as well as covariant and other derivatives.

Backwards compatibility breaks

• The KroneckerDelta class was moved from sympy/physics/quantum/kronecker.py to sympy/functions/special/tensor_functions.py.
• Merged the KroneckerDelta class in sympy/physics/secondquant.py with the class above.
• The Dij class in sympy/functions/special/tensor_functions.py was replaced with KroneckerDelta.
• The errors raised for invalid float calls on SymPy objects were changed in order to emulate more closely the errors raised by the standard library. The __float__ and __complex__ methods of Expr are affected by this change.
• The solve() function returns empty lists instead of None objects if no solutions were found. Idiomatic code of the form sol = solve(...); if sol: ... will not be affected by this change.
• Piecewise no longer accepts a Set or Interval as a condition. One should explicitly specify a variable using Set().contains(x) to obtain a valid conditional.
• The statistics module has been deprecated in favor of the new stats module.
• sympy/galgebra/GA.py:
• set_main() is no longer needed
• make_symbols() is deprecated (use sympy.symbols() instead)
• the symbols used in this package are no longer broadcast to the main program
• The classes for Infinity, NegativeInfinity, and NaN no longer subclass from Rational. Creating a Rational with 0 in the denominator will still return one of these classes, however.
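The last point can be checked directly (behavior as described in the note above):

```python
from sympy import oo, nan, zoo, Rational, Number

# Infinity is still a Number, but no longer a Rational subclass
print(isinstance(oo, Rational))   # False
print(isinstance(oo, Number))     # True

# A zero denominator still yields one of these special classes
print(Rational(1, 0))             # zoo (complex infinity)
print(Rational(0, 0))             # nan
```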

Other Changes

• A new module gaussopt was added supporting the most basic constructions from Gaussian optics (ray tracing matrices, geometric rays and Gaussian beams).
• New classes were added to represent the following special functions: classical and generalized exponential integrals (Ei, expint), trigonometric (Si, Ci) and hyperbolic integrals (Shi, Chi), the polylogarithm (polylog) and the Lerch transcendent (lerchphi). In addition to providing all the standard sympy functionality (differentiation, numerical evaluation, rewriting ...), they are supported by both the new meijerint module and the existing hypergeometric function simplification module.
• An ImmutableMatrix class was created. It has the same interface and functionality of the old Matrix but is immutable and inherits from Basic.
• A new function in geometry.util named centroid was added which will calculate the centroid of a collection of geometric entities. And the polygon module now allows triangles to be instantiated from combinations of side lengths and angles (using keywords sss, asa, sas) and defines utility functions to convert between degrees and radians.
• In ntheory.modular there is a function (solve_congruence) to solve congruences such as "What number is 2 mod 3, 3 mod 5 and 2 mod 7?"
• A utility function named find_unit has been added to physcis.units that allows one to find units that match a given pattern or contain a given unit.
• There have been some additions and modifications to Expr's methods:
• Although the problem of proving that two expressions are equal is in general a difficult one (since whatever algorithm is used, there will always be an expression that will slip through the algorithm), the new Expr method named equals will do its best to answer whether A equals B: A.equals(B) might give True, False or None.
• coeff now supports a third argument n (which comes 2nd now, instead of right). This n is used to indicate the exponent on x which one seeks: (x**2 + 3*x + 4).coeff(x, 1) -> 3. This makes it possible to extract the constant term from a polynomial: (x**2 + 3*x + 4).coeff(x, 0) -> 4.
• The method round has been added to round a SymPy expression to a given number of decimal places (to the left or right of the decimal point).
• divmod is now supported for all SymPy numbers.
• In the simplify module, the algorithms for denesting of radicals (sqrtdenest) and simplifying gamma functions (in combsimp) have been significantly improved.
• The Mathematica-like TableForm function has been added to the printing.tableform module so one can easily generate tables with headings.
• In addition to the more noticeable changes listed above, there have been numerous smaller additions, improvements and bug fixes in the commits in this release. See the git log for a full list of all changes. The command git log sympy-0.7.1..sympy-0.7.2 will show all commits made between this release and the last. You can also see the issues closed since the last release here.
• The expand API has been updated. expand() now officially supports arbitrary _eval_expand_hint() methods on custom objects. _eval_expand_hint() methods are now only responsible for expanding the top-level expression. All deep=True related logic happens in expand() itself. See the docstring of expand() for more information and an example.
• Two options were added to isympy to aid in interactive usage. isympy -a automatically creates symbols, so that typing something like a will give Symbol('a'), even if you never typed a = Symbol('a') or var('a'). isympy -i automatically wraps integer literals with Integer, so that 1/2 will give Rational(1, 2) instead of 0.5. isympy -I is the same as isympy -a -i; it makes isympy act much more like a traditional interactive computer algebra system. Both options require IPython.
• The official documentation at http://docs.sympy.org now includes an extension that automatically hooks the documentation examples in to SymPy Live.
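A few of the smaller additions listed above in action (a sketch; outputs are as documented for these functions):

```python
from sympy import symbols, pi, Integer
from sympy.ntheory.modular import solve_congruence

x = symbols('x')

# "What number is 2 mod 3, 3 mod 5 and 2 mod 7?" -> (residue, modulus)
print(solve_congruence((2, 3), (3, 5), (2, 7)))   # (23, 105)

# coeff with an explicit exponent, including the constant term
print((x**2 + 3*x + 4).coeff(x, 1))   # 3
print((x**2 + 3*x + 4).coeff(x, 0))   # 4

# round and divmod on SymPy numbers
print((pi**2).round(3))
print(divmod(Integer(7), Integer(3)))  # (2, 1)
```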

Authors

The following people contributed at least one patch to this release (names are given in alphabetical order by last name). A total of 103 people contributed to this release. People with a * by their names contributed a patch for the first time for this release; 77 people contributed for the first time for this release.
Thanks to everyone who contributed to this release!
• Sanket Agarwal*
• Swapnil Agarwal*
• Bilal Akhtar*
• Nathan Alison*
• Steve Anton*
• Takafumi Arakaki*
• Chancellor Arkantos*
• Manoj Babu K.*
• Tom Bachmann
• Oscar Benjamin
• Raoul Bourquin*
• Christian Bühler*
• Jorge E. Cardona*
• Ondřej Čertík
• Puneeth Chaganti*
• Roberto Colistete, Jr.*
• Renato Coutinho
• Joan Creus*
• Addison Cugini
• Guru Devanla*
• Joseph Dougherty*
• Comer Duncan*
• Joachim Durchholz*
• Tarun Gaba*
• Luis Garcia*
• Gilbert Gede
• Arpit Goyal*
• Brian E. Granger
• Alexey U. Gudchenko
• Alexandr Gudulin*
• Matt Habel*
• Tristan Hume*
• Kevin Hunter*
• Gert-Ludwig Ingold*
• Sachin Irukula*
• Sergiu Ivanov*
• Siddhant Jain*
• Saurabh Jha*
• Fredrik Johansson
• David Ju*
• Kendhia*
• Andreas Kloeckner*
• Carsten Knoll*
• Piotr Korgul*
• Marcin Kostrzewa*
• Stefan Krastanov
• Priit Laes
• Tim Lahey*
• Ronan Lamy
• Nikolay Lazarov*
• Tomo Lazovich
• Tobias Lenz*
• David Li*
• Bharath M R*
• Sam Magura
• Aleksandar Makelov*
• Saptarshi Mandal
• Imran Ahmed Manzoor*
• Shruti Mangipudi*
• Davy Mao*
• Miha Marolt*
• marshall2389*
• Michael Mayorov*
• Aaron Meurer
• Raphael Michel*
• Jason Moore*
• Ljubiša Moćić*
• Angadh Nanjangud*
• Natalia Nawara*
• Jens H. Nielsen*
• Sai Nikhil*
• Ashwini Oruganti*
• Prateek Papriwal*
• Mateusz Paprocki
• Vladimir Perić
• Mario Pernici
• Luke Peterson
• Alexandr Popov*
• Nicolas Pourcelot
• Martin Povišer*
• Matt Rajca*
• Julien Rioux*
• Matthew Rocklin
• Nikhil Sarda
• Siddhanathan Shanmugam*
• Stepan Simsa*
• Sam Sleight*
• Chris Smith
• Geoffry Song*
• Andrew Straw
• Alexey Subach*
• Grzegorz Świrski*
• Prafullkumar P. Tale
• Matthias Toews*
• tsmars15*
• Nichita Utiu*
• Srinivas Vasudevan*
• Sean Vig
• vishal*
• George Waksman*
• Luca Weihs
• Raymond Wong
• Jeremias Yehdegho
• Jim Zhang*
• Tiffany Zhu*
• jerryma1121*
• Rom le Clair*

October 05, 2012

Brian Granger

Blogging with the IPython Notebook

The IPython dev team gets a lot of questions about how IPython Notebooks can be used on various blogging platforms. There have been a number of different attempts to use nbconvert to export a notebook to HTML and then embedding that HTML in the blogging platform. With the introduction of nbviewer it is trivial to embed a Notebook in any web page using iframes. Here is a simple example:

<iframe src="http://nbviewer.ipython.org/3835181/" width="800" height="1500"></iframe>

Which produces the embedded Notebook shown below. The only subtle part is that you have to set the width and height attributes manually to avoid having scroll bars on the iframe.

<iframe height="1500" src="http://nbviewer.ipython.org/3835181/" width="800"></iframe>

September 19, 2012

Aaron Meurer

Infinitely nested lists in Python

Readers of this blog know that I sometimes like to write about some strange, unexpected, and unusual things in Python that I stumble across. This post is another one of those.

First, look at this

>>> a = []
>>> a.append(a)
>>> a
[[...]]


What am I doing here? I’m creating a list, a, and I’m adding it to itself. What you end up with is an infinitely nested list. The first interesting thing about this is that Python is smart enough to not explode when printing this list. The following should convince you that a does indeed contain itself.

>>> a[0] is a
True
>>> a[0] == a
True


Now, if you have programmed in C, or a similar language that uses pointers, this should not come as a surprise to you. Lists in Python, like most things, do not actually contain the items inside them. Rather, they contain references (in C terminology, “pointers”) to the items inside them. From this perspective, there is no issue at all with a containing a pointer to itself.

The first thing I wondered when I saw this was just how clever the printer was at noticing that the list was infinitely nested. What if we make the cycle a little more complex?

>>> a = []
>>> b = []
>>> a.append(b)
>>> b.append(a)
>>> a
[[[...]]]
>>> b
[[[...]]]
>>> a[0] is b
True
>>> b[0] is a
True


So it still works. I had thought that maybe repr just catches RuntimeError and falls back to printing ... when the list is nested too deeply, but it turns out that is not true:

>>> a = []
>>> for i in range(10000):
...     a = [a]
...
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded while getting the repr of a list


And by the way, in case you were wondering, it is possible to catch a RuntimeError (using the same a as the previous code block)

>>> try:
...     print(a)
... except RuntimeError:
...     print("no way")
...
no way


(And you may also notice that this is Python 3. Things behave the same way in Python 2.)

Back to infinitely nested lists, we saw that printing works, but there are some things that don’t work.

>>> a[0] == b
True
>>> a[0] == a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in comparison


a[0] is b holds (i.e., they are exactly the same object in memory), so == is able to short-circuit on them. But to test a[0] == a it has to recursively compare the elements of a and a[0]. Since it is infinitely nested, this leads to a recursion error. Now an interesting question: why does this happen? Is it because == on lists uses a depth first search? If it were somehow possible to compare these two objects, would they be equal?

One is reminded of Russell’s paradox, and of the reason why, in axiomatic set theory, sets are not allowed to contain themselves.

Thinking of this brought me to my final question. Is it possible to make a Python set that contains itself? The answer is obviously no, because set objects can only contain hashable objects, and set is not hashable. But frozenset, set's counterpart, is hashable. So can you create a frozenset that contains itself? The same for tuple. The method I used for a above won’t work, because a must be mutable to append it to itself.
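For what it’s worth, here is a quick demonstration of the obstruction (my own sketch, not from the post): an immutable container is fixed at creation time, so there is no object to refer to before its constructor returns.

```python
# A mutable list can contain itself, as shown earlier:
a = []
a.append(a)
assert a[0] is a

# frozenset members must be hashable, and a plain set is not:
try:
    frozenset([set()])
except TypeError:
    print("a frozenset cannot contain a set")

# And there is no way to name the frozenset before it exists,
# so "f contains f" can't even be written down:
try:
    f = frozenset([f])
except NameError:
    print("f is not defined yet")
```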

September 09, 2012

Guru Devanla

Ubuntu to Mac – My experience with the new setup

Recently I had an opportunity to get my hands dirty on a Mac based development environment. I have done most of my development on Ubuntu and getting to work on Mac was a mixed experience. As anyone would have guessed the overall UI experience has been great.

Here are some of the tools I was introduced to during my first week:

1. SQLPro: This is a nice GUI-based MySQL client. I was never impressed with the command-line MySQL client. I hated it all the more once I started using Postgres and its accompanying client. If you have used Golden for Oracle, it is on par with that tool.

2. iTerm2: This is a terminal replacement with tmux-style features, tuned to the Mac UI experience. The key bindings are easy to learn and worth spending time on to get used to the keyboard shortcuts.

3. Homebrew: This is the Mac equivalent of Ubuntu’s apt-get. Other options include MacPorts and Fink. I have yet to familiarize myself with the differences between them.

4. Alfred: This is a quick launch app which helps you launch applications using Cmd-Space shortcut. This is equivalent to Alt-F2 on Ubuntu.

5. PyCharm : This is not Mac specific, but this tool has been handy in helping me step through code and understand the implementation as I debug.

6. Emacs on Mac: The biggest challenge I had was trying to set up Emacs to match my setup on Ubuntu. ‘Monaco’ is the usual font used in most Mac applications. But the GUI version of Emacs just does not render Monaco-11 or Monaco-12 in anti-aliased mode. Turning anti-aliasing on or off did not help either. Doing anything in a font bigger than that was very painful. Finally, I settled on using the non-GUI version of Emacs 24. I just had to create an alias and point it at the Emacs app with the non-GUI option. Here are the instructions to follow if you have the same experience.
a. Install the Emacs-24.x version.
b. Add this line to your bash_profile.

alias emacs="/Applications/Emacs.app/Contents/MacOS/Emacs -nw"

c. And, happily run Emacs with the beautiful fonts rendered by iTerm2.

This option also helps you open Emacs within one of the sessions in iTerm2. That way you can remain working on the terminals without the need to switch windows.

There are some new Python tools I have started working in lately, and that will be the topic of my next post.

Happy Coding!

August 31, 2012

Aaron Meurer

isympy -I: A saner interactive environment

As promised, here is another post describing a new feature in the upcoming SymPy 0.7.2.

Automatic Symbol Definition

While not as ground breaking as the feature I described in my last post, this feature is still quite useful. As you may know, SymPy is inherently a Python library, meaning that it lives by the rules of Python. If you want to use any name, whether it be a Symbol or a function (like cos), you need to define it (in the case of Symbols), or import it (in the case of functions that come with SymPy). We provide the script isympy with SymPy to assist with this. This script automatically runs IPython (if it’s installed), imports all names from sympy (from sympy import *), and defines common symbol names (like x, y, and z).

But if you want to use a Symbol that is not one of the ones predefined by isympy, you will get something like

In [1]: r*x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
in ()
----> 1 r*x

NameError: name 'r' is not defined


The best solution for this has been either to type var('r'), which will create the Symbol r and inject it into the namespace, or to wrap your text in a string and pass it to sympify(), like sympify("r*x"). Neither of these is very friendly in interactive mode.

In SymPy 0.7.2, isympy has a new command line option, isympy -a, which will enable a mechanism that will automatically define all undefined names as Symbols for you:

In [1]: r*x
Out[1]: r⋅x


There are some caveats to be aware of when using this feature:

• Names must be undefined for isympy -a to work. If you type something like S*x, you’ll get:
In [3]: S*x
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-6656a97ea7b0> in <module>()
----> 1 S*x

TypeError: unsupported operand type(s) for *: 'SingletonRegistry' and 'Symbol'


That’s because S is already defined (it’s the SingletonRegistry, and also a shortcut to sympify()). To use a name that’s already defined, either create it manually with var() or delete it using del.

• This only works on the top level namespace. If you define a function with an undefined name, it will not automatically define that symbol when run.
• This works by catching NameError, defining the name, and then re-running the expression. If you have a multiline statement, any lines before the undefined name will be run before the NameError will be caught. This usually won’t happen, but it’s a potential side-effect to be aware of. We plan to rewrite it using either ast or tokenize to avoid this issue.
• Obviously, this is intended for interactive use only. If you copy code and put it in a script, or in some other place where someone might be expected to run it, but not necessarily from isympy -a, you should include symbol definitions.
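The catch-and-retry mechanism described in the caveats above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not SymPy’s actual implementation; the Symbol stub and the eval_auto_symbols name are my own inventions.

```python
import re

class Symbol:
    """Tiny stand-in for sympy.Symbol, just for this sketch."""
    def __init__(self, name):
        self.name = name
    def __mul__(self, other):
        return Symbol("%s*%s" % (self.name, other.name))

def eval_auto_symbols(code, namespace):
    """Evaluate code, defining any undefined name as a Symbol and retrying."""
    while True:
        try:
            return eval(code, namespace)
        except NameError as e:
            # the error message looks like: name 'r' is not defined
            missing = re.search(r"name '(\w+)'", str(e)).group(1)
            namespace[missing] = Symbol(missing)

result = eval_auto_symbols("r*x", {})
print(result.name)  # r*x
```

Note how this sketch makes the multiline-statement caveat concrete: eval runs the whole expression each time through the loop, so any side effects before the undefined name would be repeated.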

Automatic int to Integer Conversion

A second thing that is annoying with Python and SymPy is that something like 1/2 will be interpreted completely by Python, without any SymPy. This means that something like 1/2 + x will give either 0 + x or 0.5 + x, depending on whether or not __future__.division has been imported. isympy has always run from __future__ import division, so that you’ll get the latter, but we would usually prefer to get Rational(1, 2). Previously, the best way to do this was again to either run it through sympify() as a string, or to sympify at least one of the numbers (here the S() shortcut to sympify() is useful, because you can type just S(1)/2).

With SymPy 0.7.2, you can run isympy -i, and it will automatically wrap all integer literals with Integer(). The result is that 1/2 produces Rational(1, 2):

In [1]: 1/2 + x
Out[1]: x + 1/2


Again, there are a couple of caveats:

• If you want to get Python style division, you just need to wrap both arguments in int():
In [2]: int(1)/int(2)
Out[2]: 0.5


Of course, if you just want a floating point number, you can just use N() or .evalf()

• This works by parsing the text and wrapping all integer literals with Integer(). This means that if you have a variable set to a Python int, it will still act like a Python int:
In [6]: a = int(1)

In [7]: b = int(2)

In [8]: a/b
Out[8]: 0.5


Note that to even do that example, I had to manually make a and b Python ints by wrapping them in int(). If I had just done a = 1, it would have been parsed as a = Integer(1), and I would have gotten a SymPy Integer. But this can be an issue if you use the result of some function that returns an int (again, note that most functions in SymPy that return integers return Integer, not int).

• The same as before: this will only work interactively. If you want to reuse your code outside of isympy -i, you should take care of any int/int by rewriting it as S(int)/int.
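The parse-and-wrap idea described above can be sketched with the standard tokenize module. This is a standalone toy, not the actual isympy hook, and wrap_ints is a name I made up for illustration.

```python
import io
import tokenize

def wrap_ints(source):
    """Rewrite source so each integer literal becomes Integer(<literal>)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok[0] == tokenize.NUMBER and tok[1].isdigit():
            # replace the literal N with the four tokens: Integer ( N )
            out.extend([(tokenize.NAME, "Integer"), (tokenize.OP, "("),
                        (tokenize.NUMBER, tok[1]), (tokenize.OP, ")")])
        else:
            out.append(tok[:2])
    return tokenize.untokenize(out)

# With Integer bound to an exact-rational type, 1/2 stays exact:
from fractions import Fraction
print(eval(wrap_ints("1/2"), {"Integer": Fraction}))  # 1/2
```

Because only literals are rewritten, a variable already holding a Python int is untouched, which is exactly the caveat noted above.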

Since these are both useful features, we’ve added a way that you can get them both at once: by doing isympy -I (the “I” stands for “Interactive”). If we add similar features in the future, we will also add them to the -I shortcut (for example, we may add an option to allow ^ to automatically be replaced with **).

August 21, 2012

Aaron Meurer

SymPy Live Sphinx Extension

I didn’t blog about SymPy all summer, so I thought I would write a post about my favorite feature of the upcoming SymPy 0.7.2 release.  In fact, this feature has got me more excited than any other feature from any version of SymPy.  Yeah, it’s that good.

The feature is the SymPy Live Sphinx extension.  To start, if you don’t know about it, check out SymPy Live.  This is a console that runs on the App Engine.  We’ve actually had this for quite some time, but this winter, it got a huge upgrade thanks to the contribution of some GCI students.  Basically, SymPy Live lets you try out SymPy in your browser completely for free, because it runs all the code on the App Engine.  Actually, the console is a full Python console, so you can actually run any valid Python command on it.  This past winter, GCI students upgraded the look of the site, added a mobile version (visit live.sympy.org on your phone), and added other neat features like search history and autocompletion.

Now, Sphinx is the documentation system that we use to generate SymPy’s html documentation. Last year, when I was at the SciPy Conference, Mateusz had an idea at the sprints to create an extension linking SymPy Live and Sphinx, so that the examples in Sphinx could be easily run in SymPy Live.  He didn’t finish the extension, but I’m happy to report that thanks to David Li, who was also one of the aforementioned GCI students, the extension is now complete, and is running live on our development docs.  When SymPy 0.7.2 is released (soon I promise), it will be part of the official documentation.

The best way to see how awesome this is is to visit the website and check it out.  You will need a modern browser (the latest version of Firefox, Safari, or Chrome will work, IE might work too).  Go to a page in the development docs with documentation examples, for example, http://docs.sympy.org/dev/tutorial.html#algebra, and click on one of the examples (or click on one of the green “Run code block in SymPy Live” buttons). You should see a console pop up from the bottom-right of the screen, and run your code.  For example:

Example of the SymPy Live Sphinx extension at http://docs.sympy.org/dev/tutorial.html#algebra. Click for larger image.

You can access or hide the console at any time by clicking on the green box at the bottom-right of the page.  If you click on “Settings”, you will see that you can change all the same settings as the regular SymPy Live console, such as the printer type, and the keys for execution and autocompletion.  Additionally, there is a new setting, “Evaluation Mode”, which changes how the Sphinx examples are evaluated.  The default is “Evaluate”.  In this mode, if you click on an example, it is executed immediately.  The other option is “Copy”.  In this mode, if you click an example, it is copied to the console, but not executed right away. This way, you can edit the code to try something different.  Remember, this is a full-fledged Python console running SymPy, so you can try literally anything.

So play with this and let us know what you think.  We would love to hear ways that we can improve the experience even further.  In particular, I think we should think about ways to make the “Copy” mode more user-friendly.  Suggestions welcome!  Also, please report any bugs.

And one word of warning:  even though these are the development docs, SymPy Live is still running SymPy 0.7.1.  So some examples may not work until 0.7.2 is released, at which point we will update SymPy Live.

I believe that this extension represents the future of interactive documentation. I hope you enjoy.

August 20, 2012

Aleksandar Makelov

Google Summer of Code 2012: Week 13

Hi all, here’s a brief summary of my 13th (and last) week of GSoC.

• I continued my work on centralizers, improving normal closure, the derived and lower central series, etc. My most recent pull request containing these additions just got merged and can be found here. This week I spent a lot of time on writing better tests and developing some new test practices. The group-theoretical algorithms in the combinatorics module are getting more and more complicated, so better, cleverer and more thorough tests are needed. I came up with the following model for verification:
- since the results of the tests are very hard to compute by hand, some helper functions are needed that find the wanted object in a brute-force manner using only definitions. For example, we often look for a subgroup with certain properties. The most naive and robust approach to this is to:
- list all group elements, go over the list and check each element for the given property.
- Then, make a list of all the “good” elements and compare it (as a set) with the list of all elements of the group the function being tested returns.
Hence, a new file was created, sympy/combinatorics/testutil.py, that will host such functions. (Needless to say, they are exponential in complexity, and for example going over all the elements of SymmetricGroup(n) becomes infeasible for n larger than 10.)
- The presence of functions being used to test other functions gets us in a bit of a Quis custodiet ipsos custodes? situation, but this is not fatal: the functions in testutil.py are extremely straightforward compared to the functions in perm_groups.py that they test, and it’s really obvious what they’re doing, so it’ll take less tests to verify them.
- In the tests for the new functions from perm_groups.py, I introduced some comments to indicate what (and why) I’m testing. Another practice that seems to be good is to verify the algorithms for small groups (degrees 1, 2, 3) since there are a lot of corner cases there that seem to break them.
• I started work on improving the disjoint cycle notation, namely excluding singleton cycles from the cyclic form; however, there are other changes to handling permutations that are waiting to be merged in the combinatorics module here, so I guess I’ll first discuss my changes with Christopher. Currently, I see the following two possibilities for handling the singleton cycles:
- add a _size attribute to the Permutation class, and then, when faced with something like Permutation([[2, 3], [4, 5, 6], [8]]), find the maximum index appearing in the permutation (here it’s 8) and assign the size of the permutation to that + 1. Then it remains to adjust some of the other methods in the class (after I adjusted mul so that it treats permutations of different sizes as if they leave all points outside their domain fixed, all the tests passed) so that they make sense with that new approach to cyclic forms.
- more ambitious: make a new class, ExtendedArrayForm or something, with a field _array_form that holds the usual array form of a permutation. Then we overload the __getitem__ method so that if the index is outside the bounds of self._array_form we return the index unchanged. Of course, we’ll have to overload other things, like the __len__ and __str__ to make it behave like a list. Then instead of using a list to initialize the array form of a permutation, we use the corresponding ExtendedArrayForm. This will make all permutations behave as if they are acting on a practically infinite domain, and if we do it that way, we won’t have to make any changes to the methods in Permutation – everything is going to work as expected, no casework like if len(a) > len(b),... will be needed. So this sounds like a rather elegant approach. On the other hand, I’m not entirely sure if it is possible to make it completely like a list, and also it doesn’t seem like a very performance-efficient decision since ExtendedArrayForm instances will be created all the time. (see the discussion here).
• Still nothing on a database of groups. I looked around the web for a while but didn’t find any resources… the search continues. Perhaps I should ask someone more knowledgeable.
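The second idea above, a hypothetical ExtendedArrayForm, can be sketched quickly. This is my own illustration of the proposal, not merged code.

```python
class ExtendedArrayForm:
    """List-like array form that fixes every point outside its bounds."""
    def __init__(self, array_form):
        self._array_form = list(array_form)
    def __getitem__(self, i):
        if i < len(self._array_form):
            return self._array_form[i]
        return i  # points outside the domain are left fixed
    def __len__(self):
        return len(self._array_form)

p = ExtendedArrayForm([1, 0])  # the transposition swapping 0 and 1
assert p[0] == 1
assert p[100] == 100  # 100 is outside the domain, hence fixed
```

With this behaviour, multiplying permutations of different sizes needs no casework, since every permutation effectively acts on an infinite domain.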

That’s it for now, and that’s the end of my series of blog posts for the GSoC, but I don’t really feel that something has ended since it seems that my contributions to the combinatorics module will continue (albeit not that regularly : ) ). After all, it’s a lot of fun, and there are a lot more things to be implemented/fixed there! So, a big “Thank you” to everyone who helped me get through (and to) GSoC, it’s been a pleasure and I learned a lot. Goodbye!

August 19, 2012

GSoC last week

This happens to be the last week of GSoC. The major things that I accomplished this week are

• Got pylab to work interactively.
• Made more changes to the documentation of plotting module.

I have a pull request for the restructured plotting module here. There has been a lot of discussion in the pull request on how the new plot API should look. The API as of now has 5 functions:

• plot_line which plots 2D line plots, which I think I will change to plot.
• plot_parametric which plots 2D parametric plots.
• plot3D which plots 3D plots.
• plot3D_parametric which plots 3D parametric line plots. I think I will have to change it into plot_parametric3D.
• plot3D_surface which plots 3D parametric surfaces.

The names are slightly confusing, but the alternatives to these names are long. If you have any good names for the 3D plots, please leave them in the comments.

I will have another post describing the things I learnt over this GSoC period.


August 13, 2012

Aleksandar Makelov

Google Summer of Code 2012: Week 12

Hi all, here’s a brief summary of the 12th week of my GSoC:

• Centralizers got some more attention since there were several bugs in the implementation from last week; this also exposed a bug in .subgroup_search() as it is on sympy/master right now. Fortunately, I located it and fixed it earlier today, so the fix for .subgroup_search() will be contained in my next pull request. In fact, it is just three more lines that should be added. Namely,
# line 29: set the next element from the current branch and update
# accordingly
c[l] += 1
element = ~(computed_words[l - 1])


should be replaced with

# line 29: set the next element from the current branch and update
# accordingly
c[l] += 1
if l == 0:
element = identity
else:
element = ~(computed_words[l - 1])


since we might be at the bottom level with $l=0$. In this case, python doesn’t yell at you for looking up computed_words[-1] since negative indices wrap around the list in python. Yet another silly mistake that’s incredibly hard to track down! I hope that it will work properly from now on, and I’ll have to include some more tests to it.

• The description of the algorithm for finding the center in polynomial time given in [1] didn’t really make sense to me, so instead a straightforward one,
def center(self):
    return self.centralizer(self)


was used. This can be updated later when I (or someone else) figures out the polynomial-time algorithm.

• A new, faster algorithm for finding normal closures: this one uses the incremental version of Schreier-Sims, and some randomization. It’s described in [1].
• Some applications of normal closure: the derived series, lower central series, the commutator of two subgroups of a group, nilpotency testing. Now we have things like this:
In [68]: from sympy.combinatorics.named_groups import *
In [69]: S = SymmetricGroup(4)
In [70]: ds = S.derived_series()
In [71]: len(ds)
Out[71]: 4
In [72]: ds[1] == AlternatingGroup(4)
Out[72]: True
In [73]: ds[2] == DihedralGroup(2)
Out[73]: True
In [74]: ds[3] == PermutationGroup([Permutation([0, 1, 2, 3])])
Out[74]: True


demonstrating the well-known normal series $e < K_4 < A_4 < S_4$ that shows the symmetric group on 4 letters is solvable. Note that the normal closure algorithm was already there thanks to the work of Mario; I just improved it a bit and added some applications.

• Moved DirectProduct() to a new file, group_constructs.py, that is planned to hold functions that treat several groups equally (for one other example, the commutator of two groups in the full symmetric group) rather than treating them in some sort of subgroup-supergroup relationship (such as .centralizer()).

I wrote docstrings for the new stuff, and my current work can be found on my week10 branch. There will be some comprehensive test following the new additions (and I’ll need GAP to verify the results of some of them, probably). It seems that Todd-Coxeter won’t happen during GSoC since there’s just one more week; instead, I plan to focus on improving disjoint cycle notation and group databases.

[1] Derek F. Holt, Bettina Eick, Bettina, Eamonn A. O’Brien, “Handbook of computational group theory”, Discrete Mathematics and its Applications (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, 2005. ISBN 1-58488-372-3

Guru Devanla

Week 12 : Starting work on Shor’s algorithm

This has been a great 12-week run filled with learning and valuable experience! I believe I have accomplished most of what I had set out to do.  The only other thing pending, as far as proposed tasks go, is to relocate the Tr module. This depends on decisions that need to be made by more experienced folks here! Once the decision is made, I don’t think this would take too much of my time. (I am estimating this based on the current approaches that have been proposed.)  Therefore, I am committed to getting this done even if it goes beyond the GSoC pencils-down deadline, which is this week!

So, that was all about the wrap-up! So, what am I doing now?

~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

Looking forward, I have taken up another interesting task I am excited to work on.  This past week I started working on implementing/completing Shor’s algorithm. The two major activities that consumed my time this week were:

a) Understand Shor’s algorithm in more detail to help me understand what is available in the current implementation.

b) Review the current implementation and plan on next steps.

After reviewing the current implementation, I have decided to approach this task with the following smaller tasks:

a) Review and complete the implementation of QFT gate ( include tests, documentation), so that it can work independent of Shor’s algorithm.

b) Review and complete the implementation of the CMOD gate. Right now, the implementation is not complete, and needs more tests too. This is the set of controlled gates that multiply the state by $a^{2^j} \bmod N$, for j = 0, 1, …, n-1.

c) Put this all together and refactor the current implementation (especially breaking down the current period_find()) to allow more detailed testing.

d) Also, I plan to provide an argument to the shor() method to act as a flag that turns on detailed logging as the circuit progresses.
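As a numeric sanity check for task (a): on n qubits the QFT is the unitary with entries $\omega^{jk}/\sqrt{N}$, where $\omega = e^{2\pi i/N}$ and $N = 2^n$. A plain-Python sketch (mine, not SymPy’s symbolic gate):

```python
import cmath

def qft_matrix(n):
    """Dense QFT matrix on n qubits: F[j][k] = w**(j*k) / sqrt(N)."""
    N = 2 ** n
    w = cmath.exp(2j * cmath.pi / N)
    return [[w ** (j * k) / N ** 0.5 for k in range(N)] for j in range(N)]

F = qft_matrix(1)  # on one qubit the QFT is the Hadamard gate
assert abs(F[1][1] - (-1 / 2 ** 0.5)) < 1e-12
```

A reference like this is handy for checking the symbolic gate’s matrix on small sizes while the tests in (a) are being written.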

All the code I am currently reviewing and planning to work on is available on the master branch, even though it is not complete.

Happy coding!

August 12, 2012

Sergiu Ivanov

The Revolution (The Preview)

This week I continued the work on diagram embeddings and, unfortunately, I have discovered that Diagram did not actually work properly. I have written a status report e-mail to Tom, in which I briefly outline the progress. This e-mail (with some omissions) will serve as this week’s blog post, because writing a proper blog post would take me at least three hours, and I would rather code right now, given the limited timeframe.

Unfortunately, I’ve got some, well, ambiguous news.

Remember I told you about hash-randomisation failures in computing
diagram embeddings? Well, it turned out that diagram embeddings was
quite OK, and the problem went as far back as the Diagram class.
Essentially, I have done a really bad job implementing it at the
beginning of the summer: I wanted it to directly store all possible
morphism compositions. However, in that implementation, I didn’t
really store all compositions, but just a part of them; which part I
stored depended on the order in which the morphisms were supplied
(severe facepalm )

I tried thinking of some good fixes, but, as you can easily imagine,
the whole idea of storing all composites has suffered an epic
disintegration in the face of diagrams with cycles. I am really
_really_ astonished at how this has managed to slip by me for such a
long time!

I have spent the first one third of Friday on trying to save the
existing design somehow, by carefully processing all possible edge
cases, but this has very quickly started to be abominable, and, of
course, it didn’t work. So, I have spent the greater part of Friday
on thinking out a new Diagram. I have spent all of yesterday, (all of
Saturday, that is), on implementing this new concept. Basically, the
new Diagram only stores the relevant generator morphisms, but it is
still able to correctly determine and enumerate the morphisms that
belong to it. It is also capable of determining whether it is finite
or not (i.e., whether there are cycles in the underlying directed
multigraph). When asked for all morphisms, the new Diagram yields a
generator which acts correctly both in the finite and infinite cases.
In the finite case it produces all the morphisms of the Diagram. This
is clearly impossible in the infinite case, but Diagram is
sufficiently clever in this case to produce the morphisms in a
BFS-like manner. Intuitively, it will first yield all length one
morphisms, then all morphisms of length two, etc.
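The BFS-like enumeration just described can be sketched as a plain-Python
generator over paths in a directed multigraph. The names here are
illustrative only, not the actual Diagram code.

```python
from collections import deque
from itertools import islice

def enumerate_paths(edges):
    """Yield composite morphisms (edge paths) in order of increasing length."""
    queue = deque([e] for e in edges)  # all length-one morphisms first
    while queue:
        path = queue.popleft()
        yield path
        tail = path[-1][1]
        for e in edges:
            if e[0] == tail:  # composable: extend the path by one edge
                queue.append(path + [e])

# A -> B -> A has a cycle, hence infinitely many morphisms;
# the BFS order still lets us take any finite prefix safely:
first = list(islice(enumerate_paths([("A", "B"), ("B", "A")]), 4))
assert [len(p) for p in first] == [1, 1, 2, 2]
```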

I have made some changes to the interface of the Diagram to better
reflect the new internals. Nevertheless, the behaviour in the finite
case is the same as that of the old Diagram (modulo some property
names and minor changes, of course).

One bit of good news that deserves standing out in a separate
paragraph is that I only had to change _one_ line of code in
diagram_drawing.py to get it to work with the new Diagram. (Well, I
did drop three other lines, because they were redundant), so this
radical swerve with the Diagram has left the larger part of my GSoC
work unaffected.

Now, I have started cherry-picking the diagram embeddings code, and I
have arrived at a conclusion that Diagram has to be further extended.
(“Extending” means adding something new, not rewriting it again.)
Namely, it is insufficient to know whether the whole Diagram is finite
or not; I really need to know whether a certain hom-set is finite or
not. It’s not that hard to implement, and I’ve got a cool book on
graphs; however, it’s going to require some extra time.

Here comes the most important part of my message: I’m working at the
fullest possible cruising speed (not sprinting yet; that I’m saving
for the last 100m). I won’t obviously have everything done tomorrow,
on Monday; however, I strongly believe that I only need another couple
days to finish the bulk of inferencing. Provided that on Monday we
have what is referred to as _soft_ pencils-down date, I hope that I’m
still OK with the GSoC timeframe. Further, I think I have already
mentioned a couple times that I’m going to have another couple free
weeks after GSoC, during which I will be easily able to finalise
whatever will be unfinished. Do note, however, that I definitely
expect to have inferencing done _within_ the GSoC timeframe.

Conclusion: despite the rather radical direction things have taken in
the last two days, I’m _still_ more or less fine with the timing.

At the moment, you will not be able to see the code I’m working on on
GitHub. The reason is that I’m juggling branches rather ninja-ily
right now, so I don’t really have the most relevant one to push
online, and they are all relatively short-lived. I do expect to get
back to working sequentially today, and once I’ve got there, I’ll push
to ct3-commutativity to reflect the updates.

I’m documenting everything I do in as minute detail as possible. I
think the Diagram class and the embeddings functionality has more
comments than actual code. I expect this to make reviewing and later
maintenance considerably more agreeable. Further, my commits are
almost all rather short, with acceptably long commit messages. There
is one commit that breaks the rule, however: the commit which adds the
new Diagram. It is one relatively large chunk of code, which replaces
the old Diagram with the new one and shows that the old tests still
pass modulo minor changes. I have nevertheless reformatted the
history a bit to make this commit easier to review and, of course, the
code itself is just literally stuffed with comments. All other
commits are much more like my usual ones.

Whenever I’m done with the core parts of inferencing, I will write a proper blogpost.

Angadh Nanjangud

GSoC blog post

Penultimate week of the GSoC period and it has been busy busy busy.

In my last post I spoke about opening a PR for LagrangesMethod and about cleaning up the PR on energy functions. (Links to both are in last week’s post and I won’t repeat them here.) Much of this week was spent cleaning up both those PRs and quite extensively testing LagrangesMethod. The testing has been mostly successful. I shall explain why ‘mostly’ in a bit. The PR for the energy functions has been merged and I’m just waiting for approval from ‘the boss’ so that LagrangesMethod can be merged too.

I would like to direct the reader to my proposal. In it I had said that I would write this class only for unconstrained systems. The idea was to modify this to be a ‘complete’ class post the GSoC period. But as we got into the week to begin working on this class, Gilbert and I decided that we would make it a full-fledged Lagrange class that could handle any kind of constraint on mechanical systems. Constraints on mechanical systems are basically of two types – configuration constraints (or holonomic constraints) and velocity constraints (or non-holonomic constraints). Depending on the method used (Newton-Euler or Lagrange or Kane’s method and so on), these constraint equations are accounted for differently. In the case of Lagrange’s method, there are additional terms due to these constraints that result in the introduction of the Lagrange multipliers. So, basically, repeating myself for the sake of clarity, one can now obtain the equations of motion in sympy.physics.mechanics using LagrangesMethod for any kind of system. I would even like to go out on a limb (quite literally under my current circumstances) and claim that it could be made use of for more generic applications involving ‘The Method of Lagrange Multipliers’, provided the user has the Lagrangian and constraint equations. (The documentation will however be limited to the domain of mechanical systems, but it shouldn’t be too hard to translate into something more generic for a user.) The class also handles nonconservative forces, thus making it a complete class.
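
As a concrete (if minimal) illustration of what the class enables, here is an unconstrained spring-mass example written against the present-day sympy.physics.mechanics API; the names may differ in detail from the GSoC-era code described in this post.

```python
from sympy import symbols, simplify
from sympy.physics.mechanics import (dynamicsymbols, ReferenceFrame, Point,
                                     Particle, Lagrangian, LagrangesMethod)

# Generalized coordinate q(t) and its time derivative.
q = dynamicsymbols('q')
qd = dynamicsymbols('q', 1)
m, k = symbols('m k')

# A particle of mass m moving along N.x, attached to a spring of stiffness k.
N = ReferenceFrame('N')
P = Point('P')
P.set_vel(N, qd * N.x)
pa = Particle('pa', P, m)
pa.potential_energy = k * q**2 / 2

# Form Lagrange's equations: d/dt(dL/dq') - dL/dq = 0.
L = Lagrangian(N, pa)
lm = LagrangesMethod(L, [q])
eoms = lm.form_lagranges_equations()  # expect m*q'' + k*q
```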

In terms of testing each of these functionalities, I feel the tests are pretty thorough. I have tested the nonconservative force handling of the class on a simple spring-mass-damper system. I have tested the handling of the constraint equations using the famous ‘disc on an inclined plane’ test problem. Tests on more complex 3D systems, like the rolling disc, have also been performed (more on this later). And then tests on a multibody system have been verified for a double pendulum. All of these work correctly; results have been compared to solutions by hand.

So with all of this down, why did I say it was “mostly successful”? Well, as it turns out, the tests work perfectly well when limited to problems involving planar motion. The results match up to those obtained by hand. But the results from the class get extremely nasty when dealing with more complex systems; I have implemented the rolling disc in two separate cases. In one test, I use the minimal set of generalized coordinates and the correct eoms are generated. But in another case I tried to use the non-minimal set of GCs, and the equations generated are near impossible to comprehend (or I haven’t found the best way to deal with them yet). A big contributor to this messiness is the way in which Lagrange’s approach requires the definition of generalized speeds. His approach requires the generalized speeds to be ‘simple’, i.e. the gen. speeds are just derivatives of the gen. coords. This is different in Kane’s approach, where the generalized speeds can but needn’t necessarily be ‘simple’. From my experience, Kane’s generalized speeds are defined in a manner which makes physical sense. This definitely validates why most dynamicists today (or so I have heard) prefer to choose Kane’s method on complex multi-body systems. The only way I can think of circumventing this situation in the ‘LagrangesMethod’ class right now is using the minimal set of GCs for well known systems like the rolling disc and hoping for the best.

Having all the additional functionality in this class and also playing with the rolling disc in particular has definitely led to a lot of insight, but it has also taken a good chunk of time away from a period I wanted to dedicate to the ‘code output’ section, which I have been unable to get started on. It looks like I will be unable to meet that one goal by the ‘hard’ pencils down date as I complete and fine tune the documentation (pending final approval of the Lagrange PR, of course). But I do feel that the time spent on Lagrange has been for the good. The code, I personally feel, is easy to read and appears to be easy to use. With people’s comments I was able to weed out all the unnecessary stuff. It is also ‘complete’, like I previously highlighted. I will continue to work on ‘code output’ post the GSoC period though, as its usefulness is undeniable and also because I have developed a general interest in coding (surprise surprise!).

Anyhow, apart from this, the other stuff I got done this week: I wrote up minor functions to compute a Lagrangian and changed how the potential energy function behaves. That’s it for this week. See you next week, one last time.

August 08, 2012

Stefan Krastanov

Graph of the Relations between Objects in the diffgeom Module

This graph, besides showing in a rather simplistic way the structure of the theory of differential geometry (and most of what I have implemented in the diffgeom module), brings attention to the one non-trivial part of the module on which I have spent most of my time lately. Namely, implementing covariant derivatives.

All directional derivatives are defined as a limiting procedure on a transport operator. Besides the Lie derivative, which uses a transport operator that is easy to express in a coordinate-free way, all other derivatives, called covariant derivatives, have to be expressed using something called Christoffel symbols. And these are the ugly coordinate-dependent sources of pain, as the module structure becomes very cumbersome when such dependence must be accounted for. Thankfully, I think I have found a nice way to implement them in a new CovariantDerivativeOperator class on its own, that will contain all the logic in the same way in which the Base*Field classes do it. This will also require a rewrite of LieDerivative into a LieDerivativeOperator class.
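
For a feel of what those Christoffel symbols look like computationally, here is a small sketch against the sympy.diffgeom API as it exists today (which postdates this post, so names may have changed since): the flat Euclidean metric on the plane, in Cartesian coordinates, where all the symbols vanish.

```python
from sympy.diffgeom import metric_to_Christoffel_2nd, TensorProduct
from sympy.diffgeom.rn import R2_r

TP = TensorProduct

# The flat Euclidean metric dx (x) dx + dy (x) dy in Cartesian coordinates.
metric = TP(R2_r.dx, R2_r.dx) + TP(R2_r.dy, R2_r.dy)

# Christoffel symbols of the second kind, Gamma^k_ij; every component
# is zero for a flat metric expressed in Cartesian coordinates.
christoffel = metric_to_Christoffel_2nd(metric)
```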

August 06, 2012

Aleksandar Makelov

Google Summer of Code 2012: Week 11

Hi all, here’s a brief summary of the 11th week of my GSoC.

• Yay! Subgroup searching now works with the use of .stabilizer(), as I discussed in my previous blog post. Surprisingly, the running time is similar to that of the flawed version using .baseswap() (whenever the one using .baseswap() works). You can play around with the two versions on my week6 (has a bug, using .baseswap()) and week9 (seems to work, using .stabilizer()) branches.
• Consequently, I made a new pull request containing the incremental version of Schreier-Sims, the remove_gens utility for getting rid of redundant generators in a strong generating set, and the new (working) subgroup_search algorithm. You’re most welcome to help with the review!
• I worked on several applications of subgroup_search() and the incremental Schreier-Sims algorithm. Namely, the pointwise stabilizer of a set of points (via the incremental Schreier-Sims algorithm):
In [4]: from sympy.combinatorics.named_groups import *
In [5]: A = AlternatingGroup(9)
In [6]: G = A.pointwise_stabilizer([2, 3, 5])
In [7]: G == A.stabilizer(2).stabilizer(3).stabilizer(5)
Out[7]: True


(this is much faster than the naive implementation using .stabilizer() repeatedly), and the centralizer of a group $H$ inside a group $G$:

In [11]: from sympy.combinatorics.named_groups import *
In [12]: S = SymmetricGroup(6)
In [13]: A = AlternatingGroup(6)
In [14]: C = CyclicGroup(6)
In [15]: S_els = list(S.generate())
In [16]: G = S.centralizer(A)
In [17]: G.order()
Out[17]: 1
In [18]: temp = [[el*gen for gen in A.generators] == [gen*el for gen in A.generators] for el in S_els]
In [19]: temp.count(False)
Out[19]: 719
In [20]: temp.count(True)
Out[20]: 1
In [21]: G = S.centralizer(C)
In [22]: G == C
Out[22]: True
In [23]: temp = [[el*gen for gen in C.generators] == [gen*el for gen in C.generators] for el in S_els]
In [24]: temp.count(True)
Out[24]: 6


(it takes some effort to see that these calculations indeed prove that .centralizer() returned the needed centralizer). The centralizer algorithm uses a pruning criterion described in [1], and even though it’s exponential in complexity, it’s fast for practical purposes. Both of the above functions are available (albeit not documented yet) on my week10 branch.
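
The brute-force commutation check from the session above can be packaged into a short sketch (a hypothetical helper, shown here only to contrast the naive exponential scan with the pruned backtrack search):

```python
from sympy.combinatorics.named_groups import SymmetricGroup, CyclicGroup

S = SymmetricGroup(4)
C = CyclicGroup(4)

# Naive centralizer: scan every element of S and keep those commuting
# with all generators of C.  This is the work the pruning criterion avoids.
naive = [el for el in S.generate()
         if all(el*g == g*el for g in C.generators)]

# The backtrack-search centralizer should agree with the brute-force scan:
# the centralizer of a 4-cycle in S4 is the cyclic group it generates.
G = S.centralizer(C)
```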

• The next steps are an algorithm for the centre in polynomial time, and an algorithm to find the intersection of two subgroups! And after that, I hope to be able to implement the Todd-Coxeter algorithm…

That’s it for now!

[1] Derek F. Holt, Bettina Eick, Eamonn A. O’Brien, “Handbook of computational group theory”, Discrete Mathematics and its Applications (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, 2005. ISBN 1-58488-372-3

August 05, 2012

Sergiu Ivanov

The Embedding

This week I have started on some actual code for the derivation of commutativity of diagrams and implications. The first half of the week went to splitting Diagram into Diagram and Implication, as outlined in the previous post. Nothing really unexpected happened during that part, so there isn’t much to say about it, save that the code has become clearer and better organised. Furthermore, I have gained a better understanding of some corner cases, as well as implemented more robust handling for them.

The second half of the week was considerably more exciting and thought-intensive: it was related to finding diagram embeddings. As should be clear from the last post, this functionality lies at the foundation of deciding the commutativity of diagrams and implications. In what follows, I will refer to the diagram which we need to embed as the pattern, and to the diagram into which we need to embed as the model. This seems to be an almost universally accepted terminology and comes from the fact that finding subgraph isomorphisms often lies at the base of various pattern matching implementations.

I started by selecting and analysing the excellent paper by J. R. Ullman, [Ullm1976], which describes a very clear way of enumerating all possible graph embeddings. This solution, however, was not exactly what I needed. First of all, the algorithm described in detail in [Ullm1976] is actually meant for undirected graphs, whereas one can clearly see arrows in diagrams. Furthermore (a thought that occurred to me quite late), diagrams are actually multigraphs, in the sense that there can be more than one morphism between two objects. Yet further, a diagram embedding must preserve morphism properties, in the sense that the embedding must map a morphism in the pattern to a morphism in the model which has exactly the same properties as the morphism in the pattern.

I attempted to find whether someone had addressed the directed multigraph embedding problem before; however, I haven’t managed to find any references on the Internet, so I started thinking about adapting Ullman’s solution to my case. The first thing I figured out was that I could reduce the directed multigraph embedding problem to a directed graph embedding problem. Indeed, take a diagram and flatten all multiple morphisms going in the same direction between the same two objects down to one directed edge between these two objects. Then construct directed graph embeddings and, for each such embedding, for each directed edge $e$ of the flattened pattern, construct injective, property-preserving mappings from the set of morphisms of the pattern which were flattened to $e$, into the set of morphisms associated with the edge in the flattened model to which $e$ is mapped by the subgraph isomorphism. (These mappings are actually property-preserving embeddings in their own right, but I won’t call them so, since I’m good and I understand that the blog post has just become a bit unclear, so to say )
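
The flattening step can be sketched in a few lines; morphisms are represented here as hypothetical (source, target, name) triples, not the actual Diagram API:

```python
from collections import defaultdict

def flatten(morphisms):
    """Collapse parallel morphisms between the same pair of objects into a
    single directed edge, remembering which morphisms each edge carries."""
    carried = defaultdict(list)
    for src, tgt, name in morphisms:
        carried[(src, tgt)].append(name)
    # The flattened digraph is the set of keys; the values record the
    # morphisms that must later be matched injectively per edge.
    return set(carried), dict(carried)

edges, groups = flatten([('A', 'B', 'f'), ('A', 'B', 'g'), ('B', 'C', 'h')])
```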

Let’s see an example. Consider the diagram comprising two different morphisms: $f:A\rightarrow B$ and $g:A\rightarrow B$, where $f$ has the property golden; this diagram is going to be our pattern. Now, consider the model: a diagram comprising three morphisms $\alpha:C\rightarrow D$, $\beta:C\rightarrow D$, and $\gamma:C\rightarrow D$, in which $\beta$ has the property golden. Quite obviously, all of our property-preserving embeddings should map $f$ to $\beta$, while $g$ can be mapped to either $\alpha$ or $\gamma$. Also note that the flattened pattern in this case is the graph consisting of a single edge $(A,B)$, while the flattened model is another one-edge graph, $(C,D)$. More complex diagrams are treated in a similar fashion: flatten the pattern and the model to directed graphs, find directed graph embeddings, and then find the property-preserving morphism mappings.

There was another slight surprise underway, however. Ullman does describe some of the modifications which make the original algorithm capable of constructing directed graph embeddings; however, he has apparently forgotten to describe one of them. I will give some definitions before going into more detail. Ullman uses $A$ to refer to the adjacency matrix of the pattern, $B$ to refer to the adjacency matrix of the model, and $M$ to refer to the matrix representing a mapping of the vertices of the pattern into the vertices of the model; $M_{ij} = 1$ means that vertex $i$ in the pattern is mapped to vertex $j$ in the model.

Now, for given $A$, $B$, and $M$, compute $C = M (M B)^T$. Condition (1) in [Ullm1976] states that if $a_{ij} = 1 \Rightarrow c_{ij} = 1$ for all vertices $i$ and $j$ of the pattern, then $M$ represents an embedding. (As usual, $a_{ij}$ are elements of $A$ and $c_{ij}$ are elements of $C$.) When I tried to actually use this criterion for a directed graph, I found that, apparently, $C^T$ should be used instead of $C$. The formal explanation follows. By abuse of terminology, I will use “pattern” and “model” to refer to the flattened pattern and flattened model as well.

Let $D = M B$. $d_{ij} = 1$ means that $\exists! k . (m_{ik} = 1 \mbox{ and } b_{kj} = 1)$, where $k$ is a vertex of the model. In other words, this means that the vertex $i$ of the pattern is mapped to a unique vertex $k$ of the model, such that in the model there exists the (directed) edge $(k, j)$. Obviously, if $d^T_{ij}$ is an element of $D^T$, the role of the indices is reversed, that is: the vertex $j$ of the pattern is mapped to a unique vertex $k$ of the model, such that in the model there exists the (directed) edge $(k, i)$.

Now, $c_{ij} = 1$ means that $\exists! t . (m_{it} = 1 \mbox{ and } d^T_{tj} = 1)$. Deciphering the meanings of the values of the elements of these matrices, this means that the vertex $i$ of the pattern is mapped to a vertex $t$ of the model, vertex $j$ of the pattern is mapped to a vertex $k$ of the model, such that in the model there is the edge $(k, t)$. Now, suppose there is an edge $(i, j)$ in the pattern. $c_{ij} = 1$ means $M$ maps $i$ to $t$ and $j$ to $k$, such that the model contains the edge $(k, t)$. That is, the condition (1) as stated in [Ullm1976] and applied to directed graphs checks that $M$ actually reverses the direction of edges! Therefore, one must actually use $C^T$ to check for embeddings.

Now, since the original algorithm in [Ullm1976] was designed for undirected graphs, this extra transposition did not matter, and I think this is the reason why Ullman does not mention it.
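
The transposition argument can be checked numerically on the smallest possible directed example (a sketch with numpy; the matrix names follow the text above):

```python
import numpy as np

# Pattern: edge 0 -> 1.  Model: edge 0 -> 1.  M is the identity mapping,
# which is clearly a valid directed embedding.
A = np.array([[0, 1],
              [0, 0]])          # pattern adjacency
B = np.array([[0, 1],
              [0, 0]])          # model adjacency
M = np.eye(2, dtype=int)        # vertex mapping matrix

C = M @ (M @ B).T

# Condition (1) as literally stated: a_ij = 1  =>  c_ij = 1.
as_stated = bool(np.all((A == 0) | (C == 1)))
# The corrected condition for directed graphs uses C^T instead.
transposed = bool(np.all((A == 0) | (C.T == 1)))
```

The literal condition rejects this obviously valid embedding, while the transposed one accepts it, which matches the derivation in the text.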

I have implemented all the things I have described so far, so diagram embeddings kinda work. I have played with Python generators a bit, so the code only produces embeddings on an as-needed basis. I did that because I thought of the situation when any diagram embedding will suffice, but also because using generators has resulted in what I believe to be more elegant code. The code abounds in comments, so I think it shouldn’t be a problem to comprehend for someone different from myself. I don’t have a formal proof for this statement, however, so, I guess, Tom is going to be the test subject for this supposition ^_^
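
As an illustration of the generator-based, as-needed enumeration (a naive stand-in for the actual Ullman-style search, with hypothetical names):

```python
from itertools import permutations

def embeddings(pattern_edges, model_edges, p_vertices, m_vertices):
    """Lazily yield injective vertex maps sending every pattern edge to a
    model edge.  Being a generator, it does no work until iterated, and a
    caller needing any one embedding can stop after the first."""
    model = set(model_edges)
    for image in permutations(m_vertices, len(p_vertices)):
        mapping = dict(zip(p_vertices, image))
        if all((mapping[a], mapping[b]) in model for a, b in pattern_edges):
            yield mapping

# Only the first embedding is computed here; the rest stay unevaluated.
first = next(embeddings([(0, 1)], [(0, 1), (1, 2)], [0, 1], [0, 1, 2]))
```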

There are still a couple of things to do, though. First of all, Ullman shows a nice optimisation to his algorithm; it looks pretty simple, so I’ll add it. I will then write a couple more tests, including some crash tests involving complete graphs. I will also have to rename the function which does all this magic from subdiagram_embeddings to diagram_embeddings, for obvious (I hope ) reasons.

References

[Ullm1976] J. R. Ullman, An Algorithm for Subgraph Isomorphism, Journal of the Association for Computing Machinery, 23(1), January 1976, 31–42.

GSOC week 11

I got my adaptive sampling branch merged last week. Now the plots are sampled adaptively and are more accurate. I also added a lot of tests to the implicit plotting branch, and the coverage now is greater than 90%.

One of the major things decided in the previous week was to restructure the plot function. Presently plot is a single function which, depending on its input, renders a 2D or a 3D plot. Though it plots the right kind of plot, the plot function is quite complex, and it was decided to split it into smaller functions, each of which plots a particular type of plot. I tried an approach where all 2D plots are plotted by a plot2d function, the 3D plots by plot3D, and the existing plot_implicit plots regions and implicit equations. Aaron mentioned that the API is still very complex, as I was using tuples and lists to differentiate between a parametric plot and a 2D line plot, and he was right. It is a bit complex, and it was decided to have a function for each kind of plot.
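
A function-per-kind split is, in fact, how things eventually settled in sympy's plotting module; a sketch with today's function names, which may differ from the API being planned at the time:

```python
from sympy import symbols, sin
from sympy.plotting import plot, plot_parametric, plot3d

x, y = symbols('x y')

# One entry point per kind of plot, instead of one overloaded plot():
p1 = plot(sin(x), (x, -5, 5), show=False)                   # 2D line plot
p2 = plot_parametric(sin(x), x**2, (x, 0, 5), show=False)   # 2D parametric
p3 = plot3d(x*y, (x, -5, 5), (y, -5, 5), show=False)        # 3D surface
```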

I think I can have the new plot functions in a PR by next week, and I would like to try getting a Mayavi backend ready by the end of my GSoC period.

I forgot to mention why I deviated from what I said I would do in my GSoC application. I tried getting a svgfig backend ready for almost one and a half weeks, and it was quite difficult. svgfig is not being updated, and I had a hard time getting the axis tick labelling to work most of the time. I wrote to the project maintainer many times and he helped me with a lot of things, but the library was not polished enough to use with SymPy Live. So plotting for SymPy Live should be attempted with a JavaScript plotting library rather than a Python backend. If we get matplotlib support on GAE, then it would be awesome.

August 04, 2012

Angadh Nanjangud

GSoC Blog Post

So, in last week’s post I had said that the Lagrange class had neared completion. By mid-week I had it functional, so I opened a discussion on the mailing list asking for suggestions to improve the class. Several people in the group suggested that it’d be better to supply all the parameters on initialization. At first I was loath to make this change, for the following few reasons:

1. My primary concern was that I felt it was taking away from the clarity in generating the equations of motion. I have spoken about that in some detail in the conversation on the group page, about how I feel dynamicists would probably prefer typing an extra couple of lines as long as the procedure to obtain the equations of motion feels similar to the work done by hand. I’m a fan of the (possibly false) sense of security that comes from bookkeeping, even whilst on a computer. Of course, I had to accede that it made no sense to have so many methods when everything could be passed at initialization. The argument that users would essentially have to memorize a sequence or procedure was a good one.
2. The other thing I felt was that this was making the structure of the ‘Lagrange’ class tremendously different from the ‘Kane’ class. But after discussions with Luke, Jason and Gilbert, it became pretty apparent that even they felt that it might not have been the best way to implement that class. At one point I too felt like it was a little weird that I was creating so many little methods which could’ve been merged into the initialization of the class itself. But I was just concentrating on staying true to the current structure of things in the mechanics package.
3. And the final reason for my apprehension was that I would have to revamp the whole method after having spent quite some time on it.

But it was very clear after a point that things had to change. So, I spent a good chunk of time making the required changes.

Also, we decided on, what I feel is, a more appropriate name for the class changing it from ‘Lagrange’ to ‘LagrangesMethod’. Most of the equation derivation techniques in dynamics have the ‘Method’ attached to the founder’s name. It made even more sense for this class because Lagrange’s contributions are numerous so just calling a class ‘Lagrange’ could lead to some ambiguity.

Rewriting the class also helped me hone my Python skills some more. I had come across the keyword-arguments construct several times in my preparation for the Summer of Code, but I was a little reluctant to use it, probably because it felt so alien to me, as I had never seen something like it in my fledgling programming career. But with great explanations in the group’s messages (linked above), things were made clearer as to how it should be done.
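
For readers unfamiliar with the construct: keyword arguments let optional inputs be supplied by name, in any order, and simply omitted when absent. A generic sketch with hypothetical names, not the actual class signature:

```python
def form_eoms(lagrangian, qs, coneqs=None, forcelist=None, frame=None):
    """Everything arrives at once; optional pieces default to 'absent'."""
    coneqs = coneqs if coneqs is not None else []
    forcelist = forcelist if forcelist is not None else []
    return lagrangian, qs, coneqs, forcelist, frame

# A forced but unconstrained system passes only what it needs, by name:
result = form_eoms('L', ['q1'], forcelist=[('pt', 'F')])
```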

So having rewritten the class, I added the docstrings for it. I’m not too pleased with that part right now, but I’m confident it will get better with more input on the PR discussion.

So having opened that PR, I thought I would get back to working on the documentation as I had planned. But I ended up going off on a tangent with the discussion that was sparked on PR 1407 which is the stuff that I have added on the energy functions. I spent a good chunk of time going through that and almost completely changing the way the ‘kinetic_energy’ function works.

On that same PR, there was a discussion about how a more readable error should be generated if a user calls the ‘potential_energy’ property for either a Particle or RigidBody without having first set the potential energy using the ‘set_potential_energy’ method. What at the time seemed an innocuous thing to repair became an interesting little challenge for me. Without going into too many more details, I was pleased to have found a relatively simple fix with the ‘callable’ function in Python, with the help of the online forum Stack Overflow.
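
One simple way to get such a readable error out of a property is sketched below, using a None sentinel; this is a simplified, hypothetical stand-in, as the actual fix in the PR is built around Python's callable(), as mentioned above.

```python
class Body:
    """Hypothetical stand-in for Particle/RigidBody, showing the pattern."""
    def __init__(self, name):
        self.name = name
        self._pe = None

    @property
    def potential_energy(self):
        # Raise a readable error if the energy was never set.
        if self._pe is None:
            raise AttributeError(
                "Set the potential energy of %s first, e.g. with "
                "set_potential_energy()." % self.name)
        return self._pe

    def set_potential_energy(self, scalar):
        self._pe = scalar

b = Body('P')
b.set_potential_energy(42)
```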

Having handled most of the recommendations on PR 1407, I decided to skip the documentation for the time being and returned to “complete” what would be the most important part related to the ‘LagrangesMethod’ class – the test! While writing the class, I had written a little dummy test to check for the little things, but I hadn’t subjected a real dynamical system to the ultimate test (pun intended). I decided to test the well known ‘disc rolling down an inclined plane’ problem. Not to generate any suspense, but I would like to point out that in my proposal I had said that I would only concentrate on unconstrained systems. But Gilbert and I spent a little more time to make the ‘LagrangesMethod’ class more useful and complete. The class should now be able to handle any system, i.e. constrained or unconstrained. A lot of the credit goes to Gilbert for helping me through the numerous confusions I had with the implementation of the constrained systems. But back to the test. I picked that system because it has a configuration constraint, and we handle configuration constraints a little unconventionally in this class. I was a little anxious about how the results would turn out, but it worked like a charm. With the one test that I have written, which I think exercises a pretty good system, the ‘LagrangesMethod’ class appears to work correctly.

Anyhow, it’s now time to get some shuteye and more importantly rest the leg as I have been a little cavalier with it in the last couple of days. Until next week.

August 03, 2012

Guru Devanla

Week 11: Fidelity of quantum states

As discussed in my previous week’s report, this week I started addressing some of the issues in the pending PRs – and all the pending PRs were merged!

In addition to getting the pending PRs accepted, I have started working on 2 other tasks. One of them, now completed, is the feature to compute fidelity (PR 1459) of quantum states. Some screenshots follow (taken from notebook examples available with the code base). Information regarding fidelity can be found at these 2 wikis: 1, 2
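
The sympy function works symbolically on density matrices; as a quick numerical reminder of the quantity itself (not the PR's API), for pure states the fidelity reduces to the overlap $|\langle\psi|\phi\rangle|$:

```python
import numpy as np

def fidelity_pure(psi, phi):
    """Fidelity of two normalized pure states: |<psi|phi>|."""
    return abs(np.vdot(psi, phi))

up = np.array([1, 0], dtype=complex)                 # |0>
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # (|0> + |1>)/sqrt(2)

# Identical states have fidelity 1; |0> vs |+> gives 1/sqrt(2).
```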

For the next week, my plan is to finish up work on the implementation of Shor’s algorithm available here.

Here are some examples of using the fidelity function: