June 27, 2017

Hello, this post contains the fourth report of my GSoC progress. Though I had planned on writing a new blog post every Monday, it seems I’m a bit late on that front.

Report

SymEngine

Last week I mentioned that I’d be finishing up my work on the Range set; however, after a talk with Isuru, it was decided to schedule it for a later part of the timeline. Instead I finished the simplification of Add objects in the Sign class through PR #1297.

Also, Isuru suggested implementing parser support for Relationals, which was required for PyDy. The work is in progress in PR #1298: we hit an issue with two-character operators (<= and >=), but we’re working through it.

I sent in another PR #1302 removing some redundant includes.

From now on my work on SymEngine will probably be a little slower than usual, as I’ve started wrapping classes in SymEngine.py, which will continue for some time.

SymEngine.py

I pushed #162, wrapping a large portion of the functions module from SymEngine.

I’ll be working on wrapping Logic classes and functions in the coming week, as well as finishing off my work on the parser support in SymEngine.

See you again!

Vale

June 26, 2017

Continued work on tutorial material

During the past week I got Binder to work with our tutorial material. Using environment.yml required that we have a conda package available, so I set up our instance of Drone IO to push to the sympy conda channel. The new base Docker image used in the beta version of Binder (v2) does not contain gcc by default. This prompted me to add gcc to our environment.yml; unfortunately, that breaks the build process:

Attempting to roll back.
LinkError: post-link script failed for package defaults::gcc-4.8.5-7
running your command again with `-v` will provide additional information
location of failed script: /opt/conda/bin/.gcc-post-link.sh
==> script messages <==
<None>

I've reached out on their Gitter channel; we'll see if anyone knows what's up. If we cannot work around this, we will have two options:

  1. Accept that the binder version cannot compile any generated code
  2. Try to use binder with a Dockerimage based on the one used for CI tests (with the difference that it will include a prebuilt conda environment from environment.yml)

I hope that the second approach will work unless we hit an image size limitation (and perhaps we need a special base image; I haven't looked into those details yet).

Jason has been producing some really high quality tutorial notebooks and left lots of constructive feedback on my initial work. Based on his feedback I've started reworking the code-generation examples not to use classes as extensively. I've also added some introductory notebooks: numerical integration of ODEs in general & intro to SymPy's cryptically named lambdify (the latter notebook is still to be merged into master).

Work on SymPy for version 1.1

After last week's mentor meeting we decided that we would try to revert a change to sympy.cse (a function which performs common subexpression elimination). I did spend some time profiling the new implementation; however, I was not able to find any obvious bottlenecks, and given that it is not the main scope of my project I did not pursue this further for the moment.

I also just started looking at introducing a Python code printer. There is a PythonPrinter in SymPy right now, although it assumes that SymPy is available. The plan is to rename the old printer to SymPyCodePrinter and have the new printer primarily print expressions, which is what the CodePrinter class does best, even though it can be "coerced" into emitting statements.

Plans for the upcoming week

As the conference approaches the work intensifies, both with respect to finishing up the tutorial material and fixing blocking issues for a SymPy 1.1 release. Working on the tutorial material helps me find things to improve code-generation-wise (just today I opened a minor issue, only to realize that my work-in-progress pull request from an earlier week, which I intend to get back to next week, actually fixes said issue).

June 25, 2017

Risch’s structure theorems

For the next couple of weeks we will be working to understand and implement Risch’s structure theorems and the applications of them that we will make use of in the GSoC project. One of the applications is “Simplification of real elementary functions”, a 1989 paper by Manuel Bronstein [1]. Using Risch’s real structure theorem, the paper determines explicitly all algebraic relations among a set of real elementary functions.

Simplification of Real Elementary Functions

Risch in 1979 gave a theorem and an algorithm that find explicitly all algebraic relations among a set of exp’s and log’s only. Applying this algorithm requires converting trigonometric functions to exp’s and log’s.

Consider a real trigonometric integrand. To apply the 1979 algorithm we first rewrite the trigonometric functions in their complex exponential form and substitute back into the original expression; the final expression we obtain is a complex algebraic function.

All of this could be avoided, since the original function is already a real algebraic function (the only part for which this is not obvious can be seen to satisfy an algebraic equation over the base field). Bronstein discusses an algorithm that does not require the use of complex exp’s and log’s.

A function is called real elementary over a differential field K if it lies in a real elementary extension of K: an extension built by repeatedly adjoining elements that are either algebraic over the field constructed so far, or an exp, log, tan or arctan of an element of it (consider this recursively).

A point to mention here: we have to explicitly include tan and arctan, since we don’t want to turn the function into a complex one by rewriting it in exp, log form (tan and arctan can be written in terms of the complex exp and log respectively), and the other trigonometric functions have real elementary relations with tan and arctan. exp and log alone can’t do the job.

This way we can form the definition of a real elementary extension of a differential field K.

Functions currently using the structure-theorem approach in SymPy

Moving on, let us now look at the three functions in SymPy that use the structure theorem “approach”:

  1. is_log_deriv_k_t_radical(fa, fd, DE): Checks if Df is the logarithmic derivative of a k(t)-radical. Mathematically, it checks whether there exist nonzero n in Z and u in k(t) such that n*Df == Du/u. In naive terms, whether exp(f) is (up to a constant factor) an n-th root of an element of k(t); here a k(t)-radical means an n-th root of an element of k(t). It is used in the process of calculating the DifferentialExtension of an object where ‘case’ is ‘exp’.

  2. is_log_deriv_k_t_radical_in_field(fa, fd, DE): Checks if f itself is the logarithmic derivative of a k(t)-radical, i.e. whether n*f == Du/u for some nonzero n in Z and u in k(t). It may seem like it is just the same as above with f given as input instead of having to calculate Df, but the “in_field” part of the function name is important: the solution is searched for in the given field, not in some extension of it.

  3. is_deriv_k: Checks if Df/f is the derivative of an element of k(t), i.e. whether Df/f == Da for some a in k(t).

What have I done this week?

Moving on to what I have been doing for the last few days (at a very slow pace): I went through a debugger to understand the working of DifferentialExtension(exp(x**2/2) + exp(x**2), x), in which the integer_powers function is currently used to determine the relation exp(x**2) = (exp(x**2/2))**2, i.e. the extension is built with t = exp(x**2/2) and exp(x**2) = t**2, instead of t = exp(x**2) and exp(x**2/2) = t**(1/2), since we can’t handle algebraic extensions currently (that will hopefully come later in my GSoC project). A similar example is worked in Bronstein’s book, though the difference is that it uses is_deriv_k (for case == 'primitive'; we have is_log_deriv_k_t_radical for case == ‘exp’) to reach the analogous conclusion.
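
For reference, the extension built above can be inspected directly; DifferentialExtension lives in sympy.integrals.risch, and its .T and .D attributes hold the tower of variables and their derivations (shown here only as an illustration, output omitted).

from sympy import exp, symbols
from sympy.integrals.risch import DifferentialExtension

x = symbols('x')
DE = DifferentialExtension(exp(x**2/2) + exp(x**2), x)
print(DE.T)   # the tower of variables, e.g. [x, t0] with t0 standing for exp(x**2/2)
print(DE.D)   # the corresponding derivations for each variable in the tower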

I still have to understand the structure theorems: what they are and how exactly they are used. According to Aaron and Kalevi, I should start reading the source of the is_deriv_k, is_log_deriv_k_t_radical and parametric_log_deriv functions in the prde.py file.

This week we worked on #12743, Liouvillian case for Parametric Risch Diff Eq., which handles the Liouvillian cancellation cases. It enables us to handle integrands like log(x/exp(x) + 1):

>>> risch_integrate(log(x/exp(x) + 1), x)
(x*log(x*exp(-x) + 1) + NonElementaryIntegral((x**2 - x)/(x + exp(x)), x))

Earlier it used to raise the error “prde_cancel() not implemented”. After testing it a bit I realised that part of the returned answer could be further integrated instead of being returned as a NonElementaryIntegral. Consider this:

In [4]: risch_integrate(x + exp(x**2), x)
Out[4]: Integral(x + exp(x**2), x)

As can easily be seen, the x part can be integrated further even though exp(x**2) has no elementary antiderivative. So Aaron opened the issue #12779, Risch algorithm could split out more nonelementary cases.

I am not sure why Kalevi has still not merged the Liouvillian PR (only a comment needs to be fixed, n=2 -> n=5), though that PR is not blocking me from doing further work.

Starting tomorrow (26th) we have the first evaluation of GSoC. Anyways I don’t like the idea of having 3 evaluations.

TODO for the next week:

Figuring out and implementing the structure theorems.

References

  • [1] Manuel Bronstein (1989), Simplification of real elementary functions, http://dl.acm.org/citation.cfm?id=74566

At the beginning of the week, all of my previous PRs got merged. This is good. Also, the homomorphism class is now mostly written, with the exception of the kernel computation. The kernel is the only tricky bit. I don’t think there is any general method of computing it, but there is a way for finite groups. Suppose we have a homomorphism phi from a finite group G. If g is an element of G and phi.invert(h) returns an element of the preimage of h under phi, then g*phi.invert(phi(g))**-1 is an element of the kernel (note that this is not necessarily the identity, as the preimage of phi(g) can contain other elements besides g). With this in mind, we could start by defining K to be the trivial subgroup of G. If K.order()*phi.image().order() == G.order(), then we are done. If not, generate some random elements until g*phi.invert(phi(g))**-1 is not the identity, and redefine K to be the subgroup generated by g*phi.invert(phi(g))**-1. Continue adding generators like this until K.order()*phi.image().order() == G.order(). For finite G this will always terminate (or rather almost always, considering that the elements are generated randomly - though it would only fail to terminate if at least one of the elements of G is never generated, which is practically impossible).
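
To make the procedure above concrete, here is a minimal sketch of the kernel computation. The method names (random(), subgroup(), invert(), image(), identity) are assumptions about the eventual homomorphism and group API, not the actual SymPy implementation.

def kernel(phi, G):
    # Grow K from the trivial subgroup using random kernel elements until
    # |K| * |image(phi)| == |G| (which, for finite G, means K is the kernel).
    gens = []
    K = G.subgroup(gens)                      # trivial subgroup of G
    while K.order() * phi.image().order() != G.order():
        g = G.random()                        # pseudo-random element of G
        k = g * phi.invert(phi(g))**-1        # always lies in the kernel
        if k != G.identity and k not in K:
            gens.append(k)
            K = G.subgroup(gens)              # redefine K with the new generator
    return K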

This is how I am going to implement it. I haven’t already because I’ve been thinking about some of the details of the implementation. One thing that could be problematic is multiple calls to reidemeister_presentation() to compute the presentation of K at each step. As I discovered when I was implementing the search for finite index subgroups last week, this can be very inefficient if the index of K is large (which it could well be for a small number of generators at the start). After giving it some thought, I realised that we could actually avoid finding the presentation completely, because K’s coset table would be enough to calculate the order and check if an element belongs to it. Assuming G is finite, K.order() can be calculated as the order of G divided by the length of the coset table, so knowledge of the generators is enough. And for membership testing, all that’s necessary is to check if a given element stabilises K with respect to the action on the cosets described by the coset table. That should be a huge improvement over the obvious calls to the subgroup() method.
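
As a toy illustration of those two shortcuts: below, the coset table is modelled as a plain dict mapping (coset, generator) -> coset. The real CosetTable in sympy.combinatorics stores this differently, so treat the names here as illustrative assumptions only.

def subgroup_order(group_order, num_cosets):
    # For finite G, |K| = |G| / [G : K]; the index [G : K] is the number of
    # rows (cosets) in the completed coset table.
    return group_order // num_cosets

def word_in_subgroup(word, action, identity_coset=0):
    # A word belongs to K iff it stabilises K's own coset (row 0) under the
    # action on cosets encoded by the table.
    coset = identity_coset
    for gen in word:                          # word given as a sequence of generators
        coset = action[(coset, gen)]
    return coset == identity_coset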

Actually, FpGroups doesn’t currently have a __contains__() method so one can’t check if a word w is in a group G using w in G. This is easy to correct. What I was wondering was if we wanted to be able to check if words over G belong to a subgroup of G created with the subgroup() method - that wouldn’t be possible directly because subgroup() returns a group on a different set of generators but it wouldn’t be too unreasonable to have SymPy do this:

>>> F, a, b = free_group('a, b')
>>> G = FpGroup(F, [a**3, b**2, (a*b)**2])
>>> K = G.subgroup([a])
>>> a**2 in K
True

I asked Kalevi about it earlier today and they said that it would be preferable to treat K in this case as a different group that happens to be linked to G through an injective homomorphism (essentially the canonical inclusion map). If we call this homomorphism phi, then the user can check if an element of G belongs to the subgroup represented by K like so: a**2 in phi.image(). Here phi.image() wouldn’t be an instance of FpGroup but rather of a new class FpSubgroup that I wrote today - it is a way to represent a subgroup of a group while keeping the same generators as in the original group. Its only attributes are generators, parent (the original group) and C (the coset table of the original group by the subgroup), and it has __contains__(), order() and to_FpGroup() methods (the names are self-explanatory). For finite parent groups, the order is calculated as I described above for the kernel, and for infinite ones reidemeister_presentation() has to be called. The injective homomorphism from an instance of FpGroup returned by subgroup() would need to be worked out during the running of reidemeister_presentation() when the Schreier generators are defined - this is still to be done.

Another thing I implemented this week was an improved method for generating random elements of finitely presented groups. However, when I tried it out to see how random the elements looked, I got huge words even for small groups, so I didn’t send a PR with it yet. Once I implement rewriting systems, these words could be reduced to much shorter ones. Speaking of rewriting systems, I think it could be good if the rewriting was applied automatically, much like x*x**-1 is removed for FreeGroupElements. Though I suppose this could sometimes be too inefficient - this would need testing. This is what I’ll be working on this week.

June 24, 2017

Note : If you’re viewing this on Planet SymPy and Latex looks weird, go to the WordPress site instead.

Prof. Sukumar came up with the following optimization idea:
Consider equation 10 in Chin et al. The integral of the inner product of the gradient of f(x) and x_0 need not always be re-calculated, because it might have been calculated before. Hence, the given polynomial can be broken up into a list of monomials of increasing degree. Over a certain facet, the integrals of the monomials in the list can be calculated and stored for later use. Before computing a certain monomial we check whether its gradient has already been calculated and is hence available for re-use.
Then again, there’s another way to implement this idea of re-use. Given the degree of the input polynomial we know all the types of monomials which can possibly exist. For example, for max_degree = 2 the monomials are: [1, x, y, x^2, xy, y^2].
All the monomial integrals over all facets can be calculated and stored in a list of lists. This would be very useful for one specific use-case, namely when there are many different polynomials, each with max degree less than or equal to a certain global maximum, to be integrated over a given polytope.
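
As a rough illustration of the second re-use idea, one could pre-compute the integral of every monomial up to the global maximum degree over each facet once; the helper integrate_monomial_over_facet below is hypothetical, standing in for the per-facet routine from the paper.

from itertools import product

def build_monomial_cache(facets, max_degree, integrate_monomial_over_facet):
    # Cache the integral of every monomial x**i * y**j with i + j <= max_degree
    # over each facet, keyed by (facet_index, i, j), so later polynomials only
    # need dictionary look-ups instead of fresh recursive computations.
    cache = {}
    for facet_index, facet in enumerate(facets):
        for i, j in product(range(max_degree + 1), repeat=2):
            if i + j <= max_degree:
                cache[(facet_index, i, j)] = integrate_monomial_over_facet(facet, i, j)
    return cache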

I’m still working on implementing the first one. That’s because the worst-case time (when no or very few dictionary terms are re-used) turns out to be much more expensive than the straightforward method. Maybe computing the standard deviation among monomial degrees would be a good way to decide which technique to use. Here is an example of what I mean. If we have the monomial list [x^3, y^3, 3x^2y, 3xy^2], then the dynamic programming technique is useless here, since the gradient of any term will have degree 2 and not belong to the list. The corresponding list of degrees is [3, 3, 3, 3], and its standard deviation is zero. Hence the normal technique is applicable here.
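
A minimal sketch of that degree-spread heuristic (the cutoff value and the two technique names are hypothetical placeholders):

from statistics import pstdev
from sympy import Poly
from sympy.abc import x, y

def choose_technique(expr, cutoff=0.5):
    degrees = [sum(monom) for monom in Poly(expr, x, y).monoms()]
    # A small spread in degrees means gradients of the monomials rarely re-appear
    # in the list, so the plain method is preferable; a large spread favours caching.
    return "dynamic" if pstdev(degrees) > cutoff else "normal"

# For the example above this returns "normal":
# choose_technique(x**3 + y**3 + 3*x**2*y + 3*x*y**2)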

In reality, I’ll probably have to use another measure to make the judgement of dynamic vs. normal. Currently, I’m trying to fix a bug in the implementation of the first technique. After that, I’ll try out the second one.


I ran the test suite for algebraic rules 1.2. Here is the report:

Total number of tests: 1480
Passed: 130
Failed: 1350
Time taken to run the tests: 4967.428s

Very few tests passed, since many rules use the ExpandIntegrand[ ] function to evaluate their integrals. There are 37 definitions of ExpandIntegrand[ ] (for different expressions). I am trying to implement the few definitions which are used in the algebraic rules. The problem is that ExpandIntegrand[ ] is defined after 50% of all the rules in the list, and we have only been able to complete 10% of the utility functions in SymPy. Hence, it is tricky to implement ExpandIntegrand[ ] when it depends on functions which we haven’t implemented.

A good thing is that MatchPy was able to match almost all the patterns. Manuel said that he is going to work on incorporating optional arguments in MatchPy next week. I will rework the Rubi parser to incorporate his changes.

I will complete my work on ExpandIntegrand[ ] and also continue my work in implementing remaining utility functions with Abdullah.

June 23, 2017

Hello all!

After adding Algebraic\Linear Product rules and tests, Arihant and I are working on implementations of the utility functions; there is a huge set of utility functions that needs to be implemented before we proceed with the next set of rules. Once we are done with the entire set of utility functions, implementing rules and adding tests will be an easy task. Understanding the Mathematica code, analyzing its dependent functions and converting it into Python syntax is a major task.

We started with the implementation of only those functions which were necessary to support Algebraic rules\Linear Products, but because they were dependent on earlier functions we had to start from the very beginning of the PDF. So far we have implemented more than 100 utility functions. Our priority is to implement all the utility functions needed to support the Algebraic rules as soon as possible.


June 20, 2017

In the last week I focused mainly on finishing tasks related to Lamé coefficients. During this time two PRs were merged. In my previous post I described how we can calculate gradient, curl and divergence in different types of coordinate systems, so now I will describe only the new things which were already added to mainline. We decided to remove the dependency between the Del class and CoordSysCartesian. From a mathematical point of view this makes sense, because the nabla operator is just an entity which acts on a vector or scalar, and its behavior is independent of the coordinate system.

This week, I tried to get the earlier PRs #1291 and #1293 merged in. Unfortunately, there were several mistakes in the earlier implementation of simplifications in ConditionSet. Thanks to isuruf, I was able to correct them and finally it got merged in. PR #1293 on ImageSet is also complete but not yet merged.

Along side, I started working on implementing lower order polynomial solvers. This work is done here in #1296.

Short description of this PR:

  • Basic module for solvers is up and running.
  • Adds solvers for polynomials with degree <= 4.
  • Integrates Flint’s wrappers for factorisation into the solvers.
  • Fixes a bug in the forward iterator (credits: Srajangarg).

This PR is still a WIP, as it doesn’t handle polynomials with symbolic coefficients.

More on Solvers coming soon !! Until then Stay tuned !!

June 19, 2017

Fast callbacks from SymPy using SymEngine

My main focus the past week has been to get Lambdify in SymEngine to work with multiple output parameters. Last year Isuru Fernando led the development to support JIT-compiled callbacks using LLVM in SymEngine. I started work on leveraging this in the Python wrappers of SymEngine, but my work stalled due to time constraints.

But since it is very much related to code generation in SymPy I did put it into my time-line (later in the summer) in my GSoC application. But with the upcoming SciPy conference, and the fact that it would make a nice addition to our tutorial, I have put in work to get this done earlier than first planned.

Another thing on my to-do-list from last week was to get numba working with lambdify. But for this to work we need to wait for a new upstream release of numba (which they are hoping to release before the SciPy conference).

Status of codegen-tutorial material

I have not added any new tutorial material this week, but have been working on making all notebooks work under all targeted operating systems. However, every change to the notebooks has to be checked on all operating systems using both Python 2 and Python 3. This becomes tedious very quickly, so I decided to enable continuous integration on our repository. I followed conda-forge's approach: Travis CI for OS X, CircleCI for Linux and AppVeyor for Windows (and a private CI server for another Linux setup). And last night I finally got green light on all 4 of our CI services.

Plans for the upcoming week

We have had a performance regression in sympy.cse which has bit me multiple times this week. I managed to craft a small test case indicating that the algorithmic complexity of the new function is considerably worse than before (effectively making it useless for many applications). In my weekly mentor-meeting (with Aaron) we discussed possibly reverting that change. I will first try to see if I can identify easy-to-fix bottlenecks by profiling the code. But the risk is that it is too much work to be done before the upcoming new release of SymPy, and then we will simply revert for now (choosing speed over extensiveness of the sub-expression elimination).

I still need to test the notebooks using not only msvc under Windows (which is currently used in the AppVeyor tests), but also mingw. I did manage to get it working locally but there is still some effort left in order to make this work on AppVeyor. It's extra tricky since there is a bug in distutils in Python 3 which causes the detection of mingw to fail. So we need to either:

  • Patch cygwincompiler.py in distutils (which I believe we can do if we create a conda package for our tutorial material).
  • ...or use something else than pyximport (I'm hesitant to do this before the conference).
  • ...or provide a gcc executable (not a .bat file) that simply spawns gcc.bat (but that executable would need to be compiled during build of our conda package).

Based on my work on making the CI services work, we will need to provide test scripts for the participants to run. We need to provide the organizers with these scripts by June 27th so this needs to be decided upon during next week. I am leaning towards providing an environment.yml file together with a simple instruction of activating said environment, e.g.:

$ conda env create -f environment.yml
$ source activate codegen17
$ python -c "import scipy2017codegen as cg; cg.test()"

This could even be tested on our CI services.

I also intend to add a (perhaps final) tutorial notebook for chemical kinetics where we also consider diffusion. We will solve the PDE using the method of lines. The addition of a spatial dimension in this way is simple in principle, things do tend to become tricky when handling boundary conditions though. I will try to use the simplest possible treatment in order to avoid taking focus from what we are teaching (code-generation).

It is also my hope that this combined diffusion-reaction model is a good candidate for ufuncify from sympy.utilities.autowrap.

This blog is now deployed to GitHub pages automatically from Travis CI.

As I've outlined in the past, this blog is built with the Nikola static blogging engine. I really like Nikola because it uses Python, has lots of nice extensions, and is sanely licensed.

Most importantly, it is a static site generator, meaning I write my posts in Markdown, and Nikola generates the site as static web content ("static" means no server-side code is required to run the site). This means that the site can be hosted for free on GitHub pages. This is how this site has been hosted since I started it. I have a GitHub repo for the site, and the content itself is deployed to the gh-pages branch of the repo. But until now, the deployment has happened only manually with the nikola github_deploy command.

A much better way is to deploy automatically using Travis CI. That way, I do not need to run any software on my computer to deploy the blog.

The steps outlined here will work for any static site generator. They assume you already have one set up and hosted on GitHub.

Step 1: Create a .travis.yml file

Create a .travis.yml file like the one below

sudo: false
language: python

python:
  - 3.6

install:
  - pip install "Nikola[extras]" doctr

script:
  - set -e
  - nikola build
  - doctr deploy . --built-docs output/
  • If you use a different static site generator, replace nikola with that site generator's command.
  • If you have Nikola configured to output to a different directory, or use a different static site generator, replace --built-docs output/ with the directory where the site is built.
  • Add any extra packages you need to build your site to the pip install command. For instance, I use the commonmark extension for Nikola, so I need to install commonmark.
  • The set -e line is important. It will prevent the blog from being deployed if the build fails.

Then go to https://travis-ci.org/profile/ and enable Travis for your blog repo.

Step 2: Run doctr

The key here is doctr, a tool I wrote with Gil Forsyth that makes deploying anything from Travis CI to GitHub Pages a breeze. It automatically handles creating and encrypting a deploy SSH key for GitHub, and the syncing of files to the gh-pages branch.

First install doctr. doctr requires Python 3.5+, so you'll need that. You can install it with conda:

conda install -c conda-forge doctr

or if you don't use conda, with pip

pip install doctr

Then run this command in your blog repo:

doctr configure

This will ask you for your GitHub username and password, and for the name of the repo you are deploying from and to (for instance, for my blog, I entered asmeurer/blog). The output will look something like this:

$ doctr configure
What is your GitHub username? asmeurer
Enter the GitHub password for asmeurer:
A two-factor authentication code is required: app
Authentication code: 911451
What repo do you want to build the docs for (org/reponame, like 'drdoctr/doctr')? asmeurer/blog
What repo do you want to deploy the docs to? [asmeurer/blog] asmeurer/blog
Generating public/private rsa key pair.
Your identification has been saved in github_deploy_key.
Your public key has been saved in github_deploy_key.pub.
The key fingerprint is:
SHA256:4cscEfJCy9DTUb3DnPNfvbBHod2bqH7LEqz4BvBEkqc doctr deploy key for asmeurer/blog
The key's randomart image is:
+---[RSA 4096]----+
|    ..+.oo..     |
|     *o*..  .    |
|      O.+  o o   |
|     E + o  B  . |
|      + S .  +o +|
|       = o o o.o+|
|        * . . =.=|
|       . o ..+ =.|
|        o..o+oo  |
+----[SHA256]-----+

The deploy key has been added for asmeurer/blog.

You can go to https://github.com/asmeurer/blog/settings/keys to revoke the deploy key.

================== You should now do the following ==================

1. Commit the file github_deploy_key.enc.

2. Add

    script:
      - set -e
      - # Command to build your docs
      - pip install doctr
      - doctr deploy <deploy_directory>

to the docs build of your .travis.yml.  The 'set -e' prevents doctr from
running when the docs build fails. Use the 'script' section so that if
doctr fails it causes the build to fail.

3. Put

    env:
      global:
        - secure: "Kf8DlqFuQz9ekJXpd3Q9sW5cs+CvaHpsXPSz0QmSZ01HlA4iOtdWVvUttDNb6VGyR6DcAkXlADRf/KzvAJvaqUVotETJ1LD2SegnPzgdz4t8zK21DhKt29PtqndeUocTBA6B3x6KnACdBx4enmZMTafTNRX82RMppwqxSMqO8mA="

in your .travis.yml.

Follow the steps at the end of the command:

  1. Commit the file github_deploy_key.enc.
  2. You already have doctr deploy in your .travis.yml from step 1 above.
  3. Add the env block to your .travis.yml. This will let Travis CI decrypt the SSH key used to deploy to gh-pages.

That's it

Doctr will now deploy your blog automatically. You may want to look at the Travis build to make sure everything works. Note that doctr only deploys from master by default (see below). You may also want to look at the other command line flags for doctr deploy, which let you do things such as deploy to gh-pages for a different repo than the one your blog is hosted on.

I recommend these steps over the ones in the Nikola manual because doctr handles the SSH key generation for you, making things more secure. I also found that the nikola github_deploy command was doing too much, and doctr handles syncing the built pages already anyway. Using doctr is much simpler.

Extra stuff

Reverting a build

If a build goes wrong and you need to revert it, you'll need to use git to revert the commit on your gh-pages branch. Unfortunately, GitHub doesn't seem to have a way to revert commits in their web interface, so it has to be done from the command line.

Revoking the deploy key

To revoke the deploy key generated by doctr, go your repo in GitHub, click on "settings" and then "deploy keys". Do this if you decide to stop using this, or if you feel the key may have been compromised. If you do this, the deployment will stop until you run step 2 again to create a new key.

Building the blog from branches

You can also build your blog from branches, e.g., if you want to test things out without deploying to the final repo.

We will use the steps outlined here.

Replace the line

  - doctr deploy . --built-docs output/

in your .travis.yml with something like

  - if [[ "${TRAVIS_BRANCH}" == "master" ]]; then
      doctr deploy . --built-docs output/;
    else
      doctr deploy "branch-$TRAVIS_BRANCH" --built-docs output/ --no-require-master;
    fi

This will deploy your blog as normal from master, but from a branch it will deploy to the branch-<branchname> subdir. For instance, my blog is at http://www.asmeurer.com/blog/, and if I had a branch called test, it would deploy it to http://www.asmeurer.com/blog/branch-test/.

Note that it will not delete old branches for you from gh-pages. You'll need to do that manually once they are merged.

This only works for branches in the same repo. It is not possible to deploy from a pull request branch from a fork, for security reasons.

Enable build cancellation in Travis

If you go the Travis page for your blog and choose "settings" from the hamburger menu, you can enable auto cancellation for branch builds. This will make it so that if you push many changes in succession, only the most recent one will get built. This makes the changes get built faster, and lets you revert mistakes or typos without them ever actually being deployed.

Hello, this post contains the third report of my GSoC progress. This week was mostly spent on learning the internal structures of SymEngine functionalities and methods on a deeper level.

Report

SymEngine

This week I worked on implementing Conjugate class and the related methods in SymEngine, through PR #1295.

I also worked on implementing the “fancy-set” Range, the code for which would be complete enough to be pushed sometime in the coming GSoC week.

Also, since it would probably be the last week that I’d be working on SymEngine, I spent some time going through the codebase and checking for discrepancies between SymEngine’s and SymPy’s implementations.

SymEngine.py

I pushed #155, a trivial change from _sympify to sympify in the relevant places throughout the SymEngine.py codebase. The PR is reviewed and merged.

I reached out to Isuru once again regarding further work to be undertaken for PyDy, and he suggested wrapping Relationals from SymEngine. The work, which is pushed through #159, is itself close to completion, with only specific parsing capabilities left to be implemented (e.g. x < y should return a LessThan(x, y) object).

Wrapping Relationals also marks the initiation of SymEngine.py’s side of Phase II, which predominantly focuses on bug-fixing and wrapping.

See you again!

Addio

June 18, 2017

Last week and some of this one I was working on changing the order() method of FpGroups. Currently SymPy attempts to perform coset enumeration on the trivial subgroup and, if it terminates, the order of the group is the length of the coset table. A somewhat better way, at least theoretically, is to try and find a subgroup of finite index and compute the order of this subgroup separately. The function I’ve implemented only looks for a finite index subgroup generated by a subset of the group’s generators with a pseudo-random element thrown in (this can sometimes give smaller index subgroups and make the computation faster). The PR is here.

The idea is to split the list of generators (with a random element) into two halves and try coset enumeration on one of the halves. To make sure this doesn’t go on for too long, it is necessary to limit the number of cosets that the coset enumeration algorithm is allowed to define. (Currently, the only way to set the maximum number of cosets is by changing the class variable CosetTable.coset_table_max_limit which is very large (4096000) by default - in the PR, I added a keyword argument to all functions relevant to coset enumeration so that the maximum can be set when calling the function.) If the coset enumeration fails (because the maximum number of cosets was exceeded), try the other half. If this doesn’t succeed, double the maximum number of cosets and try again. Once (if) a suitable subgroup is found, the order of the group is just the index times the order of the subgroup. The latter is computed in the same way by having order() call itself recursively.
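
A condensed sketch of that strategy is below; the helper names (coset_enumeration, CosetTableOverflow, random_element) are hypothetical stand-ins, not the exact SymPy API.

def order_via_finite_index_subgroup(G, max_cosets=1000):
    gens = list(G.generators) + [random_element(G)]
    halves = [gens[:len(gens) // 2], gens[len(gens) // 2:]]
    while True:
        for half in halves:
            try:
                C = coset_enumeration(G, half, max_cosets=max_cosets)
            except CosetTableOverflow:
                continue                         # too many cosets; try the other half
            H = G.subgroup(half)
            return len(C.table) * H.order()      # |G| = [G : H] * |H|; H handled recursively
        max_cosets *= 2                          # neither half finished; allow more cosets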

The implementation wasn’t hard in itself but I did notice that finding the subgroup’s presentation was taking far too long in certain cases (specifically when the subgroup’s index wasn’t small enough) and spent a while trying to think of a way around it. I think that for cyclic subgroups, there is a way to calculate the order during coset enumeration without having to find the presentation explicitly but I couldn’t quite work out how to do that. Perhaps, I will eventually find a way and implement it. For now, I left it as it is. For large groups, coset enumeration will take a long time anyway and at least the new way will be faster in some cases and may also be able to tell if a group is infinite (while coset enumeration on the trivial subgroup wouldn’t terminate at all).

Now I am going to actually start working on homomorphisms which will be the main and largest part of my project. I’ll begin by writing the GroupHomomorphism class in this coming week. This won’t actually become a PR for a while but it is easier to do it first because I still have exams in the next several days. After that I’ll implement the Knuth-Bendix algorithm for rewriting systems (I might make a start on this later this week as I’ll have more time once the exams are over). Then I’ll send a PR with rewriting systems and once that’s merged, the GroupHomomorphism class one, because it would depend on rewriting systems.

June 17, 2017

Ondrej and Prof.Sukumar were quite busy this week, but eventually they reviewed both the notebook and code. Their review was quite insightful as it brought to surface one major optimization problem which I’ll discuss below. I also encountered a major issue with the already defined Polygon class.

Firstly, there were some minor issues :

  1. I had used Python floating point numbers instead of SymPy’s exact representation in numerous places (both in the algorithm and the test file). So, that had to be changed first.
  2. The decompose() method discarded all constant terms in a polynomial. Now, the constant term is stored with zero as its key.
    Example:
    Before:

    decompose(x**2 + x + 2) = {1: x, 2: x**2}
    

    After:

    decompose(x**2 + x + 2) = {0: 2, 1: x, 2: x**2}
    
  3. Instead of computing component-wise and passing that value to integration_reduction(), the inner product is computed directly and then passed on. This leads to only one recursive call instead of two for the 2D case (and three for the future 3D case).

Prof. Sukumar also suggested that I should add the option of hyperplane representation. This was simple to do as well. All I did was compute the intersections of the hyperplanes (lines as of now) to get the vertices. In the case of vertex representation, the hyperplane parameters have to be computed instead.

Major Issues:

  1. Another suggestion was to add the tests for 2D polytopes mentioned in the paper (page 10). The two tests which failed were polytopes with intersecting sides. In fact, this was the error I got:
sympy.geometry.exceptions.GeometryError: Polygon has intersecting sides.

It seems that the existing Polygon class in SymPy does not account for polygons with intersecting sides. At first I thought that maybe polygons are not supposed to have intersecting sides by geometrical definition, but that is not true. I’ll have to discuss with my mentor how to circumvent this problem.

2. Prof. Sukumar correctly questioned the use of best_origin(). As explained in an earlier post, best_origin() finds the point on the facet which leads to an inner product of lower degree. But obviously there is an associated cost with computing that intersection point. So, I wrote some really basic code to test the current best_origin() versus simply choosing the first vertex of the facet.


from __future__ import print_function, division
from time import time

import matplotlib.pyplot as plt

from sympy import sqrt

from sympy.core import S

from sympy.integrals.intpoly import (is_vertex, intersection, norm,
decompose, best_origin,
hyperplane_parameters,
integration_reduction, polytope_integrate,
polytope_integrate_simple)
from sympy.geometry.line import Segment2D
from sympy.geometry.polygon import Polygon
from sympy.geometry.point import Point
from sympy.abc import x, y

MAX_DEGREE = 10

def generate_polynomial(degree, max_diff):
    poly = 0
    if max_diff % 2 == 0:
        degree += 1
    for i in range((degree - max_diff)//2, (degree + max_diff)//2):
        if max_diff % 2 == 0:
            poly += x**i*y**(degree - i - 1) + y**i*x**(degree - i - 1)
        else:
            poly += x**i*y**(degree - i) + y**i*x**(degree - i)
    return poly

times = {}
times_simple = {}
for max_diff in range(1, 11):
    times[max_diff] = 0
for max_diff in range(1, 11):
    times_simple[max_diff] = 0

def test_timings(degree):
    hexagon = Polygon(Point(0, 0), Point(-sqrt(3) / 2, S(1) / 2),
                      Point(-sqrt(3) / 2, 3 / 2), Point(0, 2),
                      Point(sqrt(3) / 2, 3 / 2), Point(sqrt(3) / 2, S(1) / 2))
    square = Polygon(Point(-1,-1), Point(-1,1), Point(1,1), Point(1,-1))

    for max_diff in range(1, degree):
        poly = generate_polynomial(degree, max_diff)

        t1 = time()
        polytope_integrate(square, poly)
        times[max_diff] += time() - t1

        t2 = time()
        polytope_integrate_simple(square, poly)
        times_simple[max_diff] += time() - t2

    return times

for i in range(1, MAX_DEGREE + 2):
    test_timings(i)

plt.plot(list(times.keys()), list(times.values()), 'b-', label="Best origin")
plt.plot(list(times_simple.keys()), list(times_simple.values()), 'r-', label="First point")
plt.show()

The following figures show computation time vs. maximum difference in exponents of x and y. The blue line is when best_origin() is used; the red line is when simply the first vertex of the facet (line segment) is selected.

[Figure: timing comparison for the hexagon]

[Figure: timing comparison for the square]

When the polygon has a lot of facets which intersect the axes, making it an obvious choice to select that intersection point as the best origin, the current best-origin technique works better, as in the case of the square where all four sides intersect the axes.
However, in the case of the hexagon the best_origin() technique results in a better point than the first vertex for only one facet, and the added computation time makes it more expensive than just selecting the first vertex. Of course, as the difference between exponents increases, the time taken by best_origin() is overshadowed by other processes in the algorithm. I’ll need to look at the method again and see if there are preliminary checks which can be performed, making computing intersections a last resort.


June 16, 2017

Our plan was to implement all Algebraic rules and complete the Rubi test suite for algebraic integration. However, there was a setback in our work because of a bug I found in the ManyToOneReplacer of MatchPy. This bug prevents matching of expressions having many nested commutative expressions; hence, we were not able to match all expressions in the test suite. Manuel Krebber has helped us a lot by adding features and giving suggestions to make MatchPy work for Rubi. He fixed the bug today. I will resume my testing of algebraic rules ASAP.

Utility functions

Previously, our strategy was to implement only those functions which were used by algebraic rules. However, we found that those functions have dependencies on many other functions. So, we decided to implement the functions from the beginning itself, to avoid problems in the long run.

Mathematica has the ability to define functions via patterns on their arguments:

SqrtNumberQ[m_^n_] :=
  IntegerQ[n] && SqrtNumberQ[m] || IntegerQ[n-1/2] && RationalQ[m]


SqrtNumberQ[u_*v_] :=
  SqrtNumberQ[u] && SqrtNumberQ[v]


SqrtNumberQ[u_] :=
  RationalQ[u] || u===I

In the above code SqrtNumberQ is defined multiple times for different function arguments. To implement these functions in Python, Francesco suggested that we test the type of argument using conditionals:

def SqrtNumberQ(expr):
    # SqrtNumberQ[u] returns True if u^2 is a rational number; else it returns False.
    if expr.is_Pow:
        m = expr.base
        n = expr.exp
        return IntegerQ(n) & SqrtNumberQ(m) | IntegerQ(n-1/2) & RationalQ(m)
    elif expr.is_Mul:
        return all(SqrtNumberQ(i) for i in expr.args)
    else:
        return RationalQ(expr) or expr == I

There was a problem while implementing Catch, since SymPy doesn’t currently have a Throw object:

MapAnd[f_,lst_] :=
  Catch[Scan[Function[If[f[#],Null,Throw[False]]],lst];True]

I used the following code to implement MapAnd in Python:

def MapAnd(f, l, x=None):
    # MapAnd[f,l] applies f to the elements of list l until False is returned; else returns True
    if x:
        for i in l:
            if f(i, x) == False:
                return False
        return True
    else:
        for i in l:
            if f(i) == False:
                return False
        return True

TODO

Abdullah and I have implemented ~40 utility functions so far. There are around 40 more functions which need to be implemented in order to support all the algebraic rules. We should be able to complete those functions by tomorrow.

I will also resume adding tests to the Rubi test suite for algebraic functions.

June 14, 2017

After Kalevi’s comment:

The param-rde branch is getting inconveniently big for hunting down and reviewing small changes. I think we should create another branch that would contain my original PR plus the added limited_integrate code (and optionally something else). Then that should be merged. Thereafter it would be easier to review new additions.

I decided to remove a few commits from the top of the param-rde branch and made the branch mergeable.

Yesterday Kalevi merged the pull request #11761, Parametric Risch differential equation. There were quite a few problems (unrelated to my work) with Travis, but finally it passed the tests.

So till now these are pull requests that have been completed/started for my project:

  • Merged:

    param-rde #11761: This is the pull request that Kalevi made back in September 2016. I further made commits for the limited_integrate function and the implementation of the parametric Risch differential equation, though not many tests were added, which should definitely be done. I haven’t been able to find tests that lead to non-cancellation cases (Kalevi mentions that we should be able to find them), so for the time being we decided to start the implementation of the cancellation routines, particularly the Liouvillian case (the others being the non-linear and hypertangent cases); there isn’t a good reason to implement the hypertangent case right now.

  • Unmerged:

    param-rde_polymatrix: this pull request is intended to use PolyMatrix instead of Matrix (which is MutableDenseMatrix). Here is Kalevi’s comment regarding it: “It would also be possible to use from … import PolyMatrix as Matrix. That would hint that there might be a single matrix in the future.” The reason for making it is that Matrix doesn’t play well with Poly (or related) elements.

  • Merged:

    Change printing of DifferentialExtension object: it wasn’t strictly necessary to make this pull request, but it does make debugging problems a little easier.

I was hoping that I would write a bit of mathematics in my blog posts, but unfortunately the things I have dealt with till now required me to focus on the programming API (introducing PolyMatrix so that it deals well with Poly elements). This week, though, I am going to deal with more of the mathematics.

TODO for this week

I hope the next blog post is going to be a good mathematical one :)

June 13, 2017

I spent my second week on introducing Lamé coefficients into CoordSysCartesian. Unfortunately, our work is restricted by SymPy’s structure, so we don’t have too much freedom in our implementation. Fortunately, with my mentor Francesco, we found a solution for how to achieve our goals without breaking the vector module. This week showed that I have gaps in my knowledge of object-oriented programming and SymPy. Having access to the Lamé coefficients, I was able to modify the Del operator (the nabla operator in mathematics) to handle spherical and cylindrical coordinate systems.

Last week’s PR #1281 on Complement, set_intersection and set_complement got merged in. This week, I implemented ConditionSet and ImageSet. This work is done in #1291 for ConditionSet and #1293 for ImageSet.

ConditionSet :
It is useful to represent unsolved equations or a partially solved equation.

class ConditionSet : public Set
{
private:
    vec_sym syms_;
    RCP<const Boolean> condition_;
...
}

Earlier, I used another data member for storing the base set; thanks to isuruf, I was able to merge it into condition_.
For implementing the contains method of ConditionSet, I added a Subs visitor for Contains and And.

ImageSet :
It is a Set representation of a mathematical function defined on symbols over some set. For example, x**2 for all x in [0,2] is represented as imageset({x},x**2,[0,2]).

When is ImageSet useful?
Say I need to solve the trigonometric equation sin(x) = 1. The solution is 2*n*pi + pi/2 for integer n. For such solutions an imageset is a useful representation to have.

I will try to get these two PRs merged in ASAP.
Next week, I will be working on implementing solvers for lower degree (<= 4) polynomials.

See you next time. Bye for now !

June 12, 2017

Hello, this post contains the second report of my GSoC progress.

Report

SymEngine

This week I mostly worked on implementing specific classes in SymEngine, namely Sign, Float and Ceiling, through PRs #1287 and #1290. The work is currently under review, but again, mostly complete.

Though I had originally planned to implement more classes in my proposal, after a thorough look I realised that a number of the mentioned classes could easily be implemented on the SymEngine.py side only. As such, there was no hard requirement for them to be implemented in SymEngine. Also, a number of them had been pre-implemented, but rather as virtual methods and not standalone classes. There are still a couple of classes that I’ll be working on in the coming week, which would effectively finish up a huge part of the planned Phase I of my proposal.

SymEngine.py

Isuru and I conversed a bit on whether I’d be interested in working on providing SymEngine support for PyDy (a multi-body dynamics toolkit), as a part of GSoC. I agreed happily, and Isuru opened a couple of issues in SymEngine.py for me to work on, as I was completely new to PyDy. I started off wrapping up Infinity, NegInfinity, ComplexInfinity and NaN classes through PR #151. I also worked on finishing Isuru’s code, wrapping ccode and CodePrinter class with an improvised doprint function through PR #152. I also opened #153, working on acquisition and release of Global Interpreter Lock or GIL in pywrapper.cpp file.

See you again!

Au Revoir

June 11, 2017

I have spent the second week of Google Summer of Code on essentially two things:

  1. Continued work on type awareness in the code printers (CCodePrinter and FCodePrinter). The work is ongoing in gh-12693.
  2. Writing tutorial code on code-generation in the form of jupyter notebooks for the upcoming SciPy 2017 conference.

Precision aware code printers

After my weekly mentor meeting, we decided to take another approach to how we are going to represent Variable instances in the .codegen.ast module. Previously I had proposed to use quite a number of arguments (stored in .args since it inherits Basic). Aaron suggested we might want to represent the underlying information differently. After some discussion we came to the conclusion that we could introduce an Attribute class (inheriting from Symbol) to describe things such as value const-ness and pointer const-ness (those two are available as value_const and pointer_const). Attributes will be stored in a FiniteSet (essentially SymPy's version of a set) and the instances we provide as "pre-made" in the .codegen.ast module will be supported by the printers by default. Here is some example code showing what the current proposed API looks like (for C99):

>>> u = symbols('u', real=True)
>>> ptr = Pointer.deduced(u, {pointer_const, restrict})
>>> ccode(Declaration(ptr))
'double * const restrict u;'

and for Fortran:

>>> vx = Variable(Symbol('x'), {value_const}, float32)
>>> fcode(Declaration(vx, 42))
'real(4), parameter :: x = 42'

The C code printer can now also print code using different math functions depending on the targeted precision (functions guaranteed to be present in the C99 standard):

>>> ccode(x**3.7, standard='c99', precision=float32)
'powf(x, 3.7F)'
>>> ccode(exp(x/2), standard='c99', precision=float80)
'expl((1.0L/2.0L)*x)'

Tutorial material for code generation

Aaron, Jason and I have been discussing what examples to use for the tutorial on code generation with SymPy. Right now we are aiming to use quite a few examples from chemistry actually, and more specifically chemical kinetics. This is the precise application which got me started using SymPy for code-generation, so it lies close to my heart (I do extensive modeling of chemical kinetics in my PhD studies).

Working on the tutorial material has already been really helpful for getting insight into the development needs of the existing classes and functions used for code-generation. I was hoping to use autowrap from the .utilities module. Unfortunately I found that it was not flexible enough to be useful for integration of systems of ODEs (where we need to evaluate a vector-valued function taking a vector as input). I did attempt to subclass the CodeWrapper class to allow me to do this. But personally I found those classes to be quite hard to extend (much unlike the printers, which I've often found to be intuitive).

My current plan for the chemical kinetics case is to first solve it using sympy.lambdify. That allows for quick prototyping, and unless one has very high demands with respect to performance, it is usually good enough.
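
A small sketch of that lambdify-based prototyping step (the two-component system and its rate constants below are made up for illustration, not the actual tutorial system):

import numpy as np
from scipy.integrate import odeint
import sympy as sp

y1, y2 = sp.symbols('y1 y2')
k1, k2 = 0.04, 1e2                         # illustrative rate constants
exprs = [-k1*y1 + k2*y2, k1*y1 - k2*y2]    # right-hand sides dy/dt

f = sp.lambdify((y1, y2), exprs, modules='numpy')

def rhs(y, t):
    return f(*y)

tout = np.linspace(0, 10, 50)
yout = odeint(rhs, [1.0, 0.0], tout)       # quick prototype integration with SciPy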

The next step is to generate a native callback (it was here I was hoping to use autowrap with the Cython backend). The current approach is to write the expressions as Cython code using a template. Cython conveniently follows Python syntax, and hence the string printer can be used for the code generation. Doing this speeds up the integration considerably. At this point the bottleneck is going back and forth through the Python layer.

So in order to speed up the integration further, we need to bypass Python during integration (and let the solver call the user provided callbacks without going through the interpreter). I did this by providing a C-code template which relies on CVode from the Sundials suite of non-linear solvers. It is a well established solver and is already available for Linux, Mac & Windows under conda from the conda-forge channel. I then provide a thin Cython wrapper, calling into the C-function, which:

  1. sets up the CVode solver
  2. runs the integration
  3. records some statistics

Using native code does come at a cost. One of the strengths of Python is that it is cross-platform. It (usually) does not matter if your Python application runs on Linux, OSX or Windows (or any other supported operating system). However, since we are doing code-generation, we are relying on compilers provided by the respective platform. Since we want to support both Python 2 & Python 3 on said three platforms, there are quite a few combinations to cover. That meant quite a few surprises (I now know for example that MS Visual C++ 2008 does not support C99), but thanks to the kind help of Isuru Fernando I think I will manage to have all platform/version combinations working during next week.

Also planned for next week:

  • Use numba together with lambdify
  • Use Lambdify from SymEngine (preferably with the LLVM backend).
  • Make the notebooks more tutorial like (right now they are more of a show-case).

and of course: continued work on the code printers. That's all for now; feel free to get in touch with any feedback or questions.

Note : If you’re viewing this on Planet SymPy and Latex looks weird, go to the WordPress site instead.

The 2D use case works perfectly although there are limitations. The current API and method-wise limitations are discussed here.

That lone failing test was caused by converting a Python set object to a list. Sets in Python are unordered, so when they are coerced to a list, the ordering of the resulting object is arbitrary. Therefore, I had to change to an explicit axes list of [x, y].

Some other coding improvements were suggested by Gaurav Dhingra (another GSoC student). One can view them as comments to the pull request.

Clockwise Sort
I wanted to overcome some of the method-wise limitations. So, the first improvement I made was to implement a clockwise sorting algorithm for the points of the input polygon.
Come to think of it, for the 3D use case the points belonging to a particular facet will have to be sorted anti-clockwise for the algorithm to work. Therefore, it would be better to include an extra argument “orient” with default value 1 for clockwise sort and -1 for anti-clockwise sort.

First I tried to think of it myself, and naturally the first thing that came to my mind was sorting the points by their \arctan\theta value, where \theta is the angle that the line from that point to an arbitrary reference point (the center) makes with the x-axis (more correctly, the independent variable axis). But obviously this is not efficient, because calculating the \arctan\theta value is expensive.

Then I came across this answer on StackOverflow. It is a much better technique because it uses no \arctan\theta, no division operations and no distance computation using square roots (as pointed out in the comments). Let us understand the reasoning.

Firstly, I’m sure we all know that the (signed) distance of a point (x_1, y_1) from the line ax + by + c = 0 is \frac{a x_1 + b y_1 + c}{\sqrt{a^2 + b^2}}. Now, for clockwise sorting we need to decide the left or right orientation of a point given the other point and a constant reference (in this case, the center). Therefore, it is required to define a custom compare function.

Given three points a, b and center, consider the line through b and center. The point a will be on the left if the numerator of the above formula is negative, and on the right if it is positive. The cases where we need not consider this formula are when we are sure of:

Case 1 > The center lies in between a and b (first check this by comparing x-coordinates).

Case 2 > All the points are on a line parallel to the y-axis:
Sub-Case 1 > If any point is above center, that point is “lesser” than the other one.
Sub-Case 2 > If both are below center, the point closer to center is “lesser” than the other one.

Case 3 > This is when Case 1, Case 2 and the standard check all fail.
This can only happen if a, b and center are collinear and a, b both lie on the same side of center on the line. Then, the one farthest from center is “lesser” than the other one.
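
A minimal sketch of such a comparator (not the exact code in the PR): the sign of the cross product of (a - center) and (b - center) replaces the arctan comparison, and the special cases above handle the vertical and collinear situations.

def clockwise_less(a, b, center):
    ax, ay = a[0] - center[0], a[1] - center[1]
    bx, by = b[0] - center[0], b[1] - center[1]
    # Points in the right half-plane come before points in the left half-plane.
    if ax >= 0 and bx < 0:
        return True
    if ax < 0 and bx >= 0:
        return False
    if ax == 0 and bx == 0:
        # Both on the vertical line through center (Case 2 above).
        if ay >= 0 or by >= 0:
            return ay > by
        return by > ay
    cross = ax * by - ay * bx              # plays the role of the formula's numerator
    if cross != 0:
        return cross < 0                   # a comes before b in clockwise order
    # Collinear on the same side of center (Case 3): the farther point comes first.
    return ax * ax + ay * ay > bx * bx + by * by

With a comparator like this the polygon's vertices can be sorted via functools.cmp_to_key (after turning the boolean into a -1/0/1 result).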

Ondrej and Prof. Sukumar have looked at the notebook; they will discuss it further in detail and then inform me about the final 2D API.

Till then, I’ll work on the existing limitations.

 


June 09, 2017

Francesco was skeptical about whether MatchPy could support all the Rubi rules. Hence we (Abdullah and I) are trying to implement the first set of rules (Algebraic) in MatchPy as soon as possible. The major things involved in accomplishing this are:

  • [Completed] Complete writing a parser for Rubi rules from the DownValues[] generated by Mathematica.
  • [Partially Completed] Complete basic framework for MatchPy to support Rubi rules.
  • [Completed] Parse Rubi tests.
  • [Incomplete] Add utility functions for Rubi into SymPy syntax.

Generation of new patterns from Optional Arguments

Mathematica supports optional arguments for Wild symbols. Some common functions (such as Mul, Add, and Pow) of Mathematica have built‐in default values for their arguments.

MatchPy does not support optional arguments for its Wildcards. So, Manuel Krebber suggested adding additional rules for each optional argument present in the pattern. For example:

Int[x_^m_.,x_Symbol] :=
  x^(m+1)/(m+1) /;
FreeQ[m,x] && NonzeroQ[m+1]

In the above rule, the default value of m_ is 1. So I implemented these rules in MatchPy:

Int[x_^m_.,x_Symbol] :=
  x^(m+1)/(m+1) /;
FreeQ[m,x] && NonzeroQ[m+1]

(* substituting m = 1 *)
Int[x_,x_Symbol] :=
  x^(2)/(2) /;
FreeQ[1,x] && NonzeroQ[2]

I used a powerset to generate all combinations of default values and generate the corresponding patterns. The code for the parser can be found here.
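For the curious, the idea is roughly the following (a simplified sketch using itertools, not the actual parser code; the wildcard names and default values are just for illustration):

from itertools import chain, combinations

def powerset(iterable):
    # All subsets of the given iterable, from the empty set upwards.
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Optional wildcards in a pattern and their Mathematica default values,
# e.g. the exponent m_. in a Pow defaults to 1.
defaults = {'m': 1, 'a': 0}

for subset in powerset(defaults):
    # For every subset of optional arguments, substitute their default
    # values into the pattern and emit that as an additional rule.
    substitutions = {name: defaults[name] for name in subset}
    print(substitutions)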

Utility functions

There are many utility functions for Rubi written in Mathematica. We are currently focusing on implementing the functions which are used in the Algebraic rules. As soon as we complete our implementation (hopefully by this weekend), we can start running the test suite for Rubi.

I will work on implementing utility functions along with Abdullah in the coming days. I will keep testing the module as we implement utility functions and add more rules to our matcher.

June 08, 2017

Hello all!

The first week comes to an end, and we (Arihant and I) have partially implemented Algebraic functions\Linear products.

THE PROGRESS

Most of my task was to translate the Algebraic Integration tests from Maple syntax to Python, write Utility Functions, and write tests for the Utility Functions. I have already implemented 1 Algebraic functions\1 Linear products\1.2 (a+b x)**m (c+d x)**n here. Test sets for 1 Algebraic functions\1 Linear products\1.3 (a+b x)**m (c+d x)**n (e+f x)**p and 1 Algebraic functions\1 Linear products\1.4 (a+b x)^m (c+d x)^n (e+f x)^p (g+h x)^q are almost ready here, along with most of the Utility Functions we require so far. After this, we will have covered the Algebraic\Linear products portion.

What’s mainly delaying our progress is the Utility Functions. I have been taking help from this PDF on Integration Utility Functions and looking up their definitions on the Mathematica website, but the major problem is that the definitions provided are either not very clear or missing altogether. Meanwhile, Arihant has been implementing Rubi rules, default values for variables, and working on constraints.


June 05, 2017

Ahoy there! This post contains my first GSoC progress report.

Report

SymEngine

My previous PR (#1276) on Relationals has been reviewed and merged. I also worked on introducing additional support for them. The PRs #1279 and #1280 have also been reviewed and merged, leaving only LLVM support as a work in progress.

I also noticed that one of the pending requests for the 0.3.0 milestone of SymEngine.py was the implementation of vector-specific methods such as dot() and cross() in SymEngine. The work is done here and is mostly complete.

Apart from this, I started the planned implementation of SymPy classes by implementing the Dummy class in SymEngine here.

Most of the above-mentioned pending work should be ready to merge within a couple of days.

SymPy

Continuing my work on sympy/physics, I pushed #12703, covering the stand-alone files in the physics module, and #12700, which is a minor addition to the work done in physics/mechanics.

SymEngine.py

Isuru pointed out some inconsistencies in the existing code for the ImmutableMatrix class, which needed to be fixed for the 0.3.0 milestone. The code was fixed through PR #148.

We’ll probably have a SymEngine.py release next week, after which I plan to port over pre-existing functionalities in SymEngine.py to SymPy’s left-over modules.

That’s all for now.

Adiós

As discussed in the previous blog, my task for this week and the next is to improve the Sets module. I started off by implementing the class for Complement and the functions set_complement and set_intersection. This work is done in #1281.

set_intersection :

RCP<const Set> set_intersection(const set_set &in);

set_intersection tries to simplify the input set of sets by applying various rules:

  • trivial rules, e.g. when one of the inputs is emptyset()
  • handles FiniteSets by checking every element of the FiniteSets against all the other sets in the input
  • if any of the sets is a Union, it tries to return a Union of Intersections
  • if any of the sets is a Complement, it returns a simplified Complement
  • pair-wise rules check every pair of sets and try to merge them into one

set_complement :

RCP<const Set> set_complement(const RCP<const Set> &universe, const RCP<const Set> &container);

For this function, I had to implement a virtual function set_complement for all the existing child classes of Set, and this free function simply calls the container’s set_complement() with the given universe.

Details of Complement class :

  • Similar to the other classes in the sets module, the class prototype for Complement is
    class Complement : public Set
    
  • It stores two Sets, the universe and the container; Complement(a, b) represents a - b, where a is the universe and b is the container.

Apart from this, I have started maintaining a wiki page for my GSoC progress reports, as suggested by Amit Kumar in the first weekly meeting. I will post the minutes of all meetings on this wiki page.

Next week, I will be working on ImageSet and ConditionSet.

That’s it for now. See you next time. Until then, Goodbye !

Last week I was working on implementing the Smith Normal Form for matrices over principal ideal domains. I’m still making corrections as the PR is being reviewed. I used the standard algorithm: use invertible (in the ring) row and column operations to make the matrix diagonal, making sure the diagonal entries have the property that each entry divides all of the entries that come after it (this is described in more detail on Wikipedia, for example).

I ran into trouble when trying to determine the domain of the matrix entries if the user hadn’t explicitly specified one. Matrices in SymPy don’t have a .domain attribute or anything similar, and can contain objects of different types. So, if I did attempt to find some suitable principal ideal domain over which to consider all of them, the only way would be to try a few of the ones that are currently implemented until something fits. That would have to be extended every time a new domain is implemented, and generally sounds tedious to do. I asked on the Group Theory channel if there was a better way, and that started a discussion about changing the Matrix class to have a .domain or a .ring attribute and have the entries checked at construction. In fact, this has been brought up by other people before as well. Unfortunately, adding this attribute would require going over the matrix methods implemented so far and making sure they don’t assume anything that might not hold for general rings (especially non-commutative ones: the determinant in its traditional form wouldn’t even make sense, though it turns out there are several generalisations of determinants to non-commutative rings, such as quasideterminants and the Dieudonné determinant). This would probably take quite a while and is not directly related to my project.

So in the end we decided to have the function work only for matrices to which a .ring attribute has been added manually by the user. For example,

>>> from sympy.polys.solvers import RawMatrix as Matrix
>>> from sympy.polys.domains import ZZ
>>> m = Matrix([[0, 1, 3], [2, 4, 1]])
>>> setattr(m, "ring", ZZ)

Hopefully, at some point in the future matrices will have this attribute by default.
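Once the matrix has a ring attached, computing the Smith Normal Form would look something like this (a sketch; the exact name and import path of the function may still change while the PR is under review):

>>> from sympy.matrices.normalforms import smith_normal_form
>>> smith_normal_form(m)
Matrix([
[1, 0, 0],
[0, 1, 0]])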

The Smith Normal Form was necessary to analyse the structure of the abelianisation of a group: abelian groups are modules over the integers, which form a PID, and so the Smith Normal Form is applicable if the relators of the abelianisation are written in the form of a matrix. If 0 is one of the abelian invariants (the diagonal entries of the Smith Normal Form), then the abelianisation is infinite and so must be the whole group. I’ve added this test to the .order() method for FpGroups and now it is able to terminate with the answer oo (infinity) for certain groups for which it wouldn’t terminate previously. I hope to extend this further with another way to evaluate the order: trying to find a finite index cyclic subgroup (this can be achieved by generating a random word in the group and considering the coset table corresponding to it), and obtaining the order of the group by multiplying the index with the order of the cyclic subgroup. The latter could be infinite, in which case the whole group is as well. Of course, this might not always terminate, but it will terminate in more cases than coset enumeration applied directly to the whole group. This is also what’s done in GAP.

I have begun working on it, but this week is the last before my exams, and I feel that I should spend more time revising. For this reason, I probably won’t be able to send a PR with this new test by the end of the week. However, it will most likely be ready by the end of the next one, and considering that the only other thing I planned to do until the first evaluation period was to write (the main parts of) the GroupHomomorphism class assuming that the things it depends on (e.g. rewriting systems) are already implemented, I believe I am going to stay on schedule.

June 03, 2017

Google generously funds work on selected open-source projects each year through the Google Summer of Code program. The program allows undergraduate and postgraduate students around the world to apply to mentoring organizations for a scholarship to work on a project during the summer. This spring I made the leap: I wrote a proposal which got accepted, and I am now working full time for the duration of this summer on one of these projects. In this blog post I'll give some background and tell you about the first project week.

Background

For a few years now I've been contributing code to the open-source project SymPy. SymPy is a so-called "computer algebra system", which lets you manipulate mathematical expressions symbolically. I've used this software package extensively in my own doctoral studies and it has been really useful.

My research involves formulating mathematical models to rationalize experimental observations, fit parameters or aid in the design of experiments. Traditionally one sits down and derives equations, often using pen & paper, then writes computer code which implements said model, and finally writes a paper with the same formulas as LaTeX code (or something similar). Note how this procedure involves writing the same equations essentially three times: during derivation, coding and finally in the article.

By using SymPy I can, from a single source:

  1. Do the derivations (fewer hard-to-find mistakes)
  2. Generate the numerical code (a blazing fast computer program)
  3. Output LaTeX formatted equations (pretty formulas for the report)

A very attractive side effect of this is that one truly gets reproducible research (reproducibility is one of the pillars of science). Every step of the process is self-documented, and because SymPy is free software, anyone can redo them. I can't stress enough how big this truly is. It is also the main reason why I haven't used proprietary software in place of SymPy: even though that software may be considerably more feature-complete than SymPy, any code I wrote for it would be inaccessible to people without a license (possibly even including myself if I leave academia).

For this work-flow to work in practice the capabilities of the computer algebra system need to be quite extensive, and it is here my current project with SymPy comes in. I have had several ideas on how to improve capability number two listed above: generating the numerical code, and now I get the chance to realize some of them and work with the community to improve SymPy.

First week

The majority of the first week has been spent on introducing type-awareness into the code printers. SymPy has printer classes which specialize printing of e.g. strings, C code, Fortran code etc. Up to now there has been no way to indicate what precision the generated code should target. The default floating-point type in Python, for example, is "double precision" (i.e. 64-bit binary IEEE 754 floating point). This is also the default precision targeted by the code printers.

However, there are occasions where one wants to use another precision. For example, consumer-class graphics cards, which are ubiquitous, often have excellent single precision performance, but are intentionally capped with respect to double precision arithmetic (for marketing reasons). At other times, one wants just a bit of extra precision, and extended precision (80-bit floating point, usually the data type of C's long double) is just what's needed to compute some values with the required precision. In C, the corresponding math functions have been standardized since C99.

I have started the work of enabling the code printers to handle this in a pull request to the SymPy source repository. I have also started experimenting with a class representing arrays.
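To give a flavour of what type-aware printing could look like, here is a sketch (the names real and float32 and the type_aliases keyword are assumptions about the interface on my part and may well differ from what ends up being merged):

from sympy import symbols, sin, ccode
from sympy.codegen.ast import real, float32  # assumed names for the type nodes

x = symbols('x')
print(ccode(sin(x)))                                # default double precision: sin(x)
print(ccode(sin(x), type_aliases={real: float32}))  # single precision: roughly sinf(x)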

The first weekly meeting with Aaron Meurer went well. We also briefly discussed how to reach out to the SymPy community for wishes on what code-generation functions to provide; I've set up a wiki page for it under the SymPy project's wiki:

https://github.com/sympy/sympy/wiki/codegen-gsoc17

I'll be sending out an email to the mailing list for SymPy asking for feedback.

We also discussed the upcoming SciPy 2017 conference where Aaron Meurer and Jason Moore will be giving a tutorial on code-generation with SymPy. They've asked me to join forces with them and I've happily accepted that offer and am looking forward to working on the tutorial material and teaching fellow developers and researchers in the scientific python community about how to leverage SymPy for code generation.

Next blog post will most likely be a bit more technical, but I thought it was important to give some background on what motivates this effort and what the goal is.

As per the timeline, I spent the week writing a prototype for the 2D use case. Here is the current status of the implementation.

At the time of writing this blog post, the prototype mostly works but fails for one of the test cases. I haven’t used a debugger on it yet but will get around to fixing it today. The main agenda for the next week should be to improve this prototype and make it more robust and scalable. Making it robust would mean that the implementation need not require a naming convention for the axes and can handle more boundary cases.
Scalability refers to implementing it in a way which works reasonably fast for large input data. It would also mean designing the functions in a use-case-independent way so that they can be re-used for the 3-Polytopes. The simplest example would be the norm function: it works for both 2D and 3D points.

Currently, some parts of the implementation are hacky and depend upon denoting the axes by specific symbols, namely x and y. That will be the main focus for the beginning of Week 2. Let’s take a look at the methods implemented and their respective shortcomings. The methods currently implemented are:

1. polytope_integrate : This is the main function which calls integration_reduction, which further calls almost everything else. Mathematically, this function is the first application of the Generalized Stokes Theorem: integration over the 2-Polytope surface is related to integration over its facets (lines in this use case). Nothing much to change here.

2. integration_reduction : This one is the second application of the Stokes Theorem: integration over the facets (lines) is related to evaluation of the function at the vertices. The implementation can be made more robust by not accessing the values of best_origin by index and instead accessing them by key. One workaround would be to assign the Symbol ‘x’ to the independent variable and ‘y’ to the dependent one. In the code, all dictionary accesses would be done with these symbols. This offers scalability as well: an extra Symbol ‘z’ will suffice to denote the third axis variable for the 3D use case.

3. hyperplane_parameters :  This function returns the values of the hyperplane parameters of which the facets are a part. I can’t see improvements to be made with respect to the 2D use case, but that may change with the new API.

4. best_origin : This function returns a point on the line for which the vector inner product between the divergence of the homogeneous polynomial and that point yields a polynomial of least degree. This function is very much dependent on the naming scheme of the axes but this issue can be circumvented by assigning symbols as explained above.

5. decompose : This function works perfectly for all test data. However, I’ll see if it can be made faster; that will be important for scaling up to polynomials containing a large number of terms.

6. norm : This is a really simple function. The only reason to change its code would be adding support for different representations of points.

7. intersection, is_vertex : Both of these methods are simple to write and don’t require any further changes for the 2D case (at least as far as I can see).

8. plot_polytope, plot_polynomial : These are simple plotting functions to help visualize the polytope and polynomial respectively. If extra features for the plots are required then suitable changes to the code can be made.

After I get all the tests for the 2D prototype to pass, I’ll write a Jupyter notebook and add it to the examples/notebooks folder of SymPy. As recommended by Ondrej, it should contain examples along with some plots and a description of whatever basic API exists now.
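To give an idea of the intended usage, here is a sketch (the function name polytope_integrate comes from the prototype, while the module path and the use of SymPy geometry objects are assumptions on my part; the final API is still being discussed):

from sympy.abc import x, y
from sympy.geometry import Point, Polygon
from sympy.integrals.intpoly import polytope_integrate  # hypothetical import path

# Integrate the polynomial x*y over the unit square (a 2-Polytope).
square = Polygon(Point(0, 0), Point(0, 1), Point(1, 1), Point(1, 0))
print(polytope_integrate(square, x*y))  # expected result: 1/4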
The main focus for Week 2 should be:

1 > Get the prototype to pass all test cases (should be completed really soon).
2 > Make the notebook and discuss a better API for the 2D use case.
3 > Implement the final 2D API and write tests for it.
4 > Discuss how to extend to the 3D use case (implementation and API).


June 02, 2017

My first GSoC task was to create three classes: Curl, Divergence and Gradient. They create objects which are unevaluated mathematical expressions. Sometimes it is better to work with such expressions, for example when we want to check some identity: we have to check whether it holds for every possible vector. There is still some work left here, because in the next step we want to create abstract vector expressions. There is one open PR corresponding to the described task:
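A small sketch of how these unevaluated objects are meant to behave (the coordinate system class and the import names below are assumptions based on the sympy.vector module and may differ from the PR):

from sympy.vector import CoordSys3D, Divergence  # class names assumed

C = CoordSys3D('C')
field = C.x*C.y*C.i + C.z*C.j

d = Divergence(field)  # stays unevaluated, handy when checking identities
print(d)               # something like Divergence(C.x*C.y*C.i + C.z*C.j)
print(d.doit())        # evaluates to C.y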

The community bonding period is complete, and the coding period started on 31st May.

Our original plan was to rewrite the pattern matcher for SymPy and generate a decision tree for Rubi rules from the DownValues[] generated by Francesco.

Aaron gave us a link to this paper by Manuel Krebber. Pattern matching algorithms discussed in the paper are implemented in MatchPy library.

MatchPy

MatchPy uses discrimination nets to do many-to-one matching (i.e. matching one subject against multiple patterns). MatchPy builds its own discrimination net as we add patterns to its ManyToOneReplacer.

Discrimination nets can be more efficient than a decision tree, hence we decided to use MatchPy as the pattern matcher for Rubi. I wrote basic matching programs which implement a few Rubi rules in MatchPy.
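For illustration, a toy many-to-one replacement in MatchPy looks roughly like this (not an actual Rubi rule, just the ManyToOneReplacer mechanics as I understand them from the MatchPy documentation):

from matchpy import (Operation, Arity, Symbol, Wildcard, Pattern,
                     ReplacementRule, ManyToOneReplacer)

# A toy unary operation f, a symbol a and a dot wildcard x_
f = Operation.new('f', Arity.unary)
a = Symbol('a')
x_ = Wildcard.dot('x')

# A single replacement rule: f(x) -> x
rule = ReplacementRule(Pattern(f(x_)), lambda x: x)

replacer = ManyToOneReplacer(rule)
print(replacer.replace(f(a)))  # prints: a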

I found the following issues with MatchPy:

  • MatchPy cannot be directly added to SymPy because it is written in Python 3.6 (whereas SymPy also supports Python 2).

  • It lacks mathematical operations on its Symbols, which makes it difficult to implement Rubi constraints. A workaround for this issue is to sympify the expression and do the calculations in SymPy.

  • MatchPy uses external libraries such as multiset, enum and typing. SymPy does not encourage using external libraries in its code, so those modules would need to be reimplemented in SymPy if we are going to import MatchPy code into SymPy directly.

Re-implementing the MatchPy algorithms in SymPy can be a very challenging and time-consuming task, as I am not very familiar with the algorithms used in MatchPy.

I used 3to2 to convert the MatchPy code to Python 2 syntax. The majority of tests pass in Python 2, and I am currently trying to get the code fully working there.

In the coming week I will import the MatchPy code into SymPy directly. If there are setbacks with this approach, I will reimplement the MatchPy algorithms in SymPy.



