May 24, 2016

Since my last blog post I have opened two new PRs for the project. Let's look at what they do and then discuss the goals for this week.

A major issue in the first PR was the slow performance of the algorithms, a consequence of using recursive expressions internally. Most of the implemented algorithms used the matrices module to solve linear systems, and it doesn't support DMP and DMF objects.

So I defined a new class subclassing MutableDenseMatrix, changed some of its methods to make it work with polynomials and fractions, and used it so that polynomials are handled internally. Thanks to Kalevi for this idea. It is much more robust now. I have also added methods for composing holonomic functions and for converting a hypergeometric function to a holonomic one. These things are all part of this PR, and I hope it gets merged in a couple of days.

A new PR was opened for features relating to recurrence relations satisfied by the coefficients of the power series expansion of holonomic functions. The first thing I did was define a class RecurrenceOperator, parallel to DifferentialOperator, to store the recurrence relation.
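
For a concrete example (my own illustration, not taken from the PR): the power series of exp(x) has coefficients a_n = 1/n!, and these satisfy the first-order recurrence

(n + 1)*a_{n+1} - a_n = 0,

which is exactly the kind of relation a RecurrenceOperator is meant to encode, with the shift n -> n + 1 playing the role that Dx plays for differential operators.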

Goals of the Week:

This week I plan to define a function that finds the recurrence relation of the series coefficients and then move on to numerical computation of holonomic functions. Let me know if anything else should be implemented first, as I haven't discussed this with my mentors yet.

The chronology might be different from what I wrote in the proposal, but we are quite a bit ahead of it.

Cheers Everyone.


May 23, 2016

Logo

I have been selected for GSoC’16 to work with SymPy on implementing finite fields and the sets module in SymEngine.

SymEngine is a standalone fast C++ symbolic manipulation library.

About the Proposal

We all know that polynomial factorization is one of the fundamental tools of a computer algebra system, and in symbolic mathematics it is a basic requirement on top of which other algorithms can be built.
Currently, SymEngine has an implementation of a univariate polynomial class, which provides the basic functionality to add, multiply and subtract two polynomials.
Now comes the problem of factoring polynomials.
We have explicit solution formulas only for polynomials up to degree four (the quadratic formula for degree 2, the Cardano formulas for third-degree equations, and the Ferrari formula for degree 4).
For higher degrees we clearly need a different way out, and there are algorithms for factorization over finite fields.
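
For reference, the degree-two case is the familiar quadratic formula: for a*x**2 + b*x + c = 0,

x = (-b ± sqrt(b**2 - 4*a*c)) / (2*a),

and by the Abel-Ruffini theorem no such general formula in radicals exists from degree five onwards, which is part of why factorization algorithms have to take a different route.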

So, this summer I will be working on converting polynomials over the integers to polynomials over a finite field and factoring them there. After that, we have to do Hensel lifting to bring the factored polynomial back to the integers.

Furthermore, I will be working on implementing the sets module. These two together will give us a basic infrastructure over which we can develop a solvers module in SymEngine.

My proposal can be found here.

Community Bonding Period

I have been allotted Isuru Fernando, Thilina Rathnayake, Sumith and Ondřej Čertík as mentors.
The SymEngine community is very responsive. We had a discussion on the SymEngine gitter channel about how our proposals will proceed. As SymEngine already has an implementation of sparse polynomials, I will be working on adapting them to finite fields, like this:

GaloisField::GaloisField(std::map<unsigned, int> &dict, unsigned modulo)
    : modulo_(modulo)
{
    unsigned temp;
    for (auto iter : dict) {
        temp = iter.second % modulo;
        if (temp != 0)
            dict_[iter.first] = temp;
    }
}
where dict is the dictionary of the univariate polynomial representation and dict_ stores its finite field representation modulo modulo_. I will be implementing this in the first week of the GSoC period.

Work already Done

During the Community Bonding Period, I worked on implementing UniversalSet and FiniteSet.
UniversalSet is a singleton class like EmptySet, and while implementing this I learned a lot about Singleton classes.
FiniteSet is a class with a set of RCP<const Basic> as its member variable; it can contain any object of Basic type. While implementing it, we settled on what to do when we have an interval like [1, 1], i.e. one with both end points equal. This led to a small change in Interval's code, which now returns a FiniteSet in that case. The PR is not merged yet; I hope to get it merged in the next few days and, along with it, keep working on the finite field implementation.

May 22, 2016

The community bonding period comes to an end now. First of all, regarding the issues described in the last post:

  • Aaron created a new channel for our GSoC project discussion, sympy/GroupTheory.
  • I have also added rss-feed in my blog.
  • As for the meeting time, Kalevi and I often have discussions on the gitter channel, but because of the sizeable difference in timings between me and Aaron (I tend to sleep early, at 11 PM IST), the three of us haven't been able to have a meeting together. Aaron suggested that "Kalevi is the primary mentor, so if you have to meet without me that is fine". I also think that's not too big of an issue now, but his opinion has always helped, since he has the best knowledge of the sympy core.


In the past few weeks I wasn't very productive, since I had a touch of fever; anyway, I am okay now. We have now completed the implementation of the FreeGroup class in PR #10350. I started working on the PR back in January but halted because of my semester classes. FreeGroup is quite similar to the PolyRing implemented in sympy.polys.rings.py. We first started with a list of tuples as the data structure for FreeGroupElm, each tuple being (index, exp), but because of the mutable nature of lists, Kalevi suggested going with a tuple of tuples; tuples are also probably more efficient, as there is no 'housekeeping' overhead. We also changed the element from (index, exp) --> (symbol, exp).


FreeGroupElm is implemented in such a way that it can't be created independently through the public interface: every FreeGroupElm is created only via the dtype attribute of the FreeGroup class. The assignment is as follows:

obj.dtype = type("FreeGroupElm", (FreeGroupElm,), {"group": obj})

It's sort of an advanced usage of the type function as a metaclass.
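
To illustrate the pattern in isolation (the names here are made up for illustration and are not SymPy's actual code), the three-argument form of type builds, for each group instance, a fresh element class that carries a reference back to its group:

class GroupElement(object):
    def __init__(self, data):
        self.data = data

class Group(object):
    def __init__(self, name):
        self.name = name
        # build a subclass of GroupElement that knows which group it belongs to
        self.dtype = type("GroupElement", (GroupElement,), {"group": self})

>>> F = Group("free group on x, y")
>>> w = F.dtype((("x", 2), ("y", -1)))   # elements are only built through F.dtype
>>> w.group is F
True

This is also why elements of different groups end up with different (though identically named) classes.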


Currently the printing code of latex and pprint for FreeGroupElm is a little hacky. I need to work on that as well.


Plan for Next few weeks

According to my proposal timeline, we were supposed to go on with the implementation of other algebraic structures, i.e. Magma, SemiGroup and Monoid, but we will instead move on to "coset enumeration" next. It is going to be a big task, harder and more important than the other algebraic structures. The timeline states it as a 5-week task, which is almost half the GSoC coding period. How do we go about that? I plan to study the mathematics in parallel with the implementation.

We have created a PR for the implementation of finitely presented groups, #11140; not much code has been added there yet. The papers I am reading are the one on coset enumeration, "Implementation and Analysis of the Todd Coxeter Algorithm" (by John J. Cannon and George Havas), and the original paper by Todd and Coxeter, "A practical method for enumerating cosets of a finite abstract group". As for the implementation of Todd-Coxeter, we will be following the methods described in the book "Handbook of Computational Group Theory" by Derek F. Holt.

Also now the "official" coding period begins, good luck to everyone.

The last week of the Community Bonding period was awesome. From tomorrow onwards, the coding period begins. I am supposed to start working on my project from tomorrow, but I have already been doing that since the second week of the Community Bonding Period, because I was supposed to take a vacation of 4 days (25 May – 29 May). Due to some issues, I had to cancel that vacation, so now I have a few more days to work on my project. Let's see what I have done so far…

So far

  •  PR 10863 finally got merged.
  •  rewrite(Piecewise): In PR 11103, I was trying to handle the arg = 0 part using the solve functionality in sympy, but Jason suggested not to use solve, because there may be cases where solve cannot provide the desired output. So I kept the arg = 0 part as it is. The story doesn't end there: there is still some confusion regarding keeping the check on whether arg is real. Personally, I think that check should be there, since both Heaviside and DiracDelta are defined only on the real axis.
  • In PR 11137, I have improved the docstrings of all the methods under the DiracDelta and Heaviside classes. I have added a contextual example for DiracDelta(x, k) and described the relation between fdiff() and diff(). This pull request needs a final review.
  • In PR 11139, I have added the functionality to pretty print DiracDelta(x) as δ(x). This pull request also needs a final review.
  • Finally, almost every improvement proposed under issue 11075 has been addressed.

Next Week

My plans for the next week are:

  • To polish PR 11103, PR 11137 and PR 11139 and get them merged.
  • To start working on the implementation of Singularity Functions.

May 21, 2016

The Kronecker Substitution

I started off my work by reading through the existing mul_poly function. It uses the Kronecker substitution technique to multiply two polynomials. An insight can be gained by looking at the slides here. Think of it this way,

“If you evaluate a polynomial at a large enough power of 10, I bet you can tell all it’s coefficients just by looking at the result!”
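
To make that concrete, here is a minimal Python sketch of the trick (my own illustration, using a power of two instead of a power of ten; this is not SymEngine's actual code): pack each polynomial's coefficients into one big integer by evaluating at a sufficiently large power of two, multiply the two integers, and slice the product back into coefficients.

def ks1_mul(a, b, coeff_bits=8):
    # multiply polynomials given as coefficient lists (lowest degree first);
    # assumes non-negative coefficients that fit in coeff_bits bits
    # the shift is chosen large enough that the packed coefficient slots
    # of the product never overlap
    shift = 2 * coeff_bits + max(len(a), len(b)).bit_length()
    # evaluate both polynomials at x = 2**shift
    ea = sum(c << (i * shift) for i, c in enumerate(a))
    eb = sum(c << (i * shift) for i, c in enumerate(b))
    prod = ea * eb
    # read the product's coefficients back out, one slot at a time
    mask = (1 << shift) - 1
    out = []
    for _ in range(len(a) + len(b) - 1):
        out.append(prod & mask)
        prod >>= shift
    return out

>>> ks1_mul([1, 2, 3], [4, 5])   # (1 + 2x + 3x**2) * (4 + 5x)
[4, 13, 22, 15]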

The slides call this the KS1 algorithm. Another algorithm they propose is the KS2 algorithm, which evaluates the polynomial at two points (in contrast to just one) to interpolate the polynomial. A more mathematical explanation of the two techniques can be found here. I implemented the algorithm, and it wasn't too difficult, as it was a slight modification of the already existing multiplication technique. Later, I added a benchmark test comparing the two techniques, KS1 & KS2. The benchmark (roughly) calculates the ratio of the time required for multiplying two polynomials using the two algorithms. Both the polynomial length (from 1 to 10,000) and the bit length of the coefficients (5, 10, 15, 20 bits) were varied, and the results were plotted as graphs on both linear and log scales.

During this time, I was asked by Isuru to switch my work towards the polynomial interface with FLINT & Piranha (and shift the polynomial manipulations to the end of the summer). So, the PR hasn't been merged yet, and no conclusions or observations comparing the two algorithms have been drawn as of yet; that will be done later during the summer. Here's the PR: #930

Dictionary wrappers

I also started work on dictionary wrappers for SymEngine. One was already made for the UnivariatePolynomial class, i.e. the class for univariate polynomials with symbolic coefficients; it is a map from int -> Expression. We needed another wrapper for the uint -> integer_class map, so that the UnivariateIntPolynomial class can be structured the same way as the former. Since we need almost the same functionality, why not templatize the wrapper (as suggested by Isuru)? That's what I did, and PR #946 is almost merged in. More on wrappers next time!

Miscellaneous issues

Most of my work during this period revolved around reading the existing polynomial class, refactoring it and removing redundancies. Some of the miscellaneous work that was done:

  • Some refactoring was done in the dict.cpp file. There was some redundancy in the functions, which was removed. Templatized methods for checking equality and comparing vectors (and sets) were written, and the other specific eq & compare methods were derived from these. #933

  • Initially, the mul_poly method was constructing a vector of coefficients for the resulting multiplied polynomial (thus implicitly storing it in a dense representation for a while), yet it was returned as a sparsely represented polynomial, using a dictionary. This was changed so that the dictionary is created directly and the intermediate vector isn't required. Some variable names were also changed for clarity, and the redundant function dict_add_term was removed. #928

  • A redundant function create was removed. All it was doing was calling from_vec within the same class. #941

See you next week, Goodbye!

May 20, 2016

The community bonding period is coming to a close and so I’d like to write about what I’ve done/learned during this time. I’ve had the opportunity to create my first blog, have my first meeting with my mentors, submit a couple of minor pull requests to pydy and sympy, add an example script to the pydy repository, begin learning about spatial vectors and begin work on some benchmarking code.

This is my first attempt at blogging and I had some trouble initially setting it up. For starters, I did not know what an RSS feed was or what it was used for. Also, I wanted the blog to be hosted on GitHub Pages, where I currently keep my collection of notes, and thus I decided to try the Jekyll static site backend that GitHub uses. I tried, however, to isolate the blog in its own subfolder with the rest of my GSoC information, but this caused all kinds of problems with posts showing up multiple times and the RSS feed not updating properly. I eventually decided to stop trying to separate the posts and just centrally locate them as is demonstrated in the Jekyll documentation. For the RSS feed I used a template that I found online. Now the posts appeared properly and the RSS feed updated correctly. The last thing I wanted to do for my blog before I considered it officially set up was to have some method of allowing people to comment on my posts. I found a blog post online on how to achieve this without the need for anything other than what GitHub Pages offers, so I set out to try this method. I used the code shown in the blog post without any luck. Prior to this I had zero experience working with JavaScript, so I didn't even know where to begin to debug why the comments were not showing up, and so I sent the writer of the blog post an email asking for his assistance. And he replied! He pointed out that I was missing the part where a JavaScript library would be loaded for use on the page, and once I added the couple of lines of code, commenting on my blog posts became possible (at least I think that's what the problem was, but again I have little experience working with JavaScript). With the ability to comment added, my blog is completely set up and is connected to the correct channels for the Google Summer of Code.

Early in the community bonding period I was able to have my first meeting with the mentors for my project. During this meeting it was discussed that I could change the later portion of my project from implementing a Newton-Euler method of equations of motion generation to implementing the faster Featherstone method. Considering I had no great attachment to the Newton-Euler method, I agreed that the faster method would provide a greater benefit to the overall project. Since the meeting I have spent some time reading about the math involved in the Featherstone method, specifically spatial vectors and their uses in dynamics. To this end I have read A Beginner's Guide to 6-D Vectors (Part 1) and started reading both A Beginner's Guide to 6-D Vectors (Part 2) and Roy Featherstone's short course on spatial vector algebra.

I have also spent some time beginning to familiarize myself with the code that I will be working with. To begin, I followed Jason Moore's great suggestion of coding through one of the examples from my dynamics course and adding it to the pydy/examples folder in the pydy repository. The example I chose was a simple pendulum, so that I could focus on the code rather than the complexities of the problem itself. This code and diagram are currently undergoing review in order to be added to the pydy repository.

Lastly, I have begun work on the benchmarking code which is mentioned as part of my project. While working on this part of the project I learned how to use an SQLite database with Python, something I had only had brief exposure to in the past. The code currently uses Python's timeit library to run a file utilizing Lagrange's method of equations of motion generation and another using Kane's method. It runs each file several thousand times, iterates through this process 30 times, and saves the average of the 30 runs along with several other useful bits of information about the computer and version of Python being used to run the tests. In addition to the benchmarking code itself, I have been working on a script that allows viewing a graph of the test results using matplotlib and tkinter. This code is close to completion; the next major addition will be the ability to filter the tests based on which platform and version of Python were used to run them.
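
As a rough illustration of that benchmarking idea (a hedged sketch with hypothetical statement, file and table names, not the actual project code):

import sqlite3
import timeit
import platform

def benchmark(stmt, setup="pass", repeats=30, number=1000):
    # time `stmt` `number` times per run, repeat `repeats` times, return the average
    times = timeit.repeat(stmt, setup=setup, repeat=repeats, number=number)
    return sum(times) / len(times)

def save_result(db_path, method_name, avg_time):
    # store the result together with some information about the machine
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(method TEXT, avg_time REAL, platform TEXT, python TEXT)"
    )
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?)",
        (method_name, avg_time, platform.platform(), platform.python_version()),
    )
    conn.commit()
    conn.close()

avg = benchmark("sum(range(100))")
save_result("benchmarks.db", "example", avg)

A plotting script can then read the rows back out of the database and filter them by platform or Python version before handing them to matplotlib.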

This community bonding period has been productive and I am excited to begin the Google Summer of Code program on Monday.

May 19, 2016

About a month ago I tweeted this:

EDIT: Some people have started working on making this happen. See https://python3statement.github.io/.

For those of you who don't know, Python 2.7 is slated to reach end-of-life in 2020 (originally, it was slated to end in 2015, but it was extended in 2014, due to the extraordinary difficulty of moving to a newer version). "End-of-life" means absolutely no more support from the core Python team, even for security updates.

I'm writing this post because I want to clarify why I think this should be done, and to clear up some misconceptions, the primary one being that this represents library developers being antagonistic against those who want or have to use Python 2.

I'm writing this from my perspective as a library developer. I'm the lead developer of SymPy, and I have sympathies for developers of other libraries.1 I say this because my idea may seem a bit in tension with "users" (even though I hate the "developer/user" distinction).

Python 2

There are a few reasons why I think libraries should drop (and announce that they will drop) Python 2 support by 2020 (actually earlier, say 2018 or 2019, depending on how core the library is).

First, library developers have to be the leaders here. This is apparent from the historical move to Python 3 up to this point. Consider the three (not necessarily disjoint) classes of people: CPython core developers, library developers, and users. The core developers were the first to move to Python 3, since they were the ones who wrote it. They were also the ones who provided the messaging around Python 3, which has varied over time. In my opinion, it should have been and should be more forceful.2 Then you have the library developers and the users. A chief difference here is that users are probably going to be using only one version of Python. In order for them to switch that version to Python 3, all the libraries that they use need to support it. This took some time, since library developers saw little impetus to support Python 3 when no one was using it (Catch 22), and to worsen the situation, versions of Python older than 2.6 made single codebase compatibility almost impossible.

Today, though, almost all libraries support Python 3, and we're reaching a point where those that don't have forks that do.

But it only happened after the library developers transitioned. I believe libraries need to be the leaders in moving away from Python 2 as well. It's important to do this for a few reasons:

  • Python 2.7 support ends in 2020. That means all updates, including security updates. For all intents and purposes, Python 2.7 becomes an insecure language to use at that point in time.

  • Supporting two major versions of Python is technical debt for every project that does it. While writing cross compatible code is easier than ever, it still remains true that you have to remember to add __future__ imports to the top of every file, to import all relevant builtins from your compatibility file or library, and to run all your tests in both Python 2 and 3. Supporting both versions is a major cognitive burden to library developers, as they always have to be aware of important differences in the two languages. Developers on any library that does anything with strings will need to understand how things work in both Python 2 and 3, and the often obscure workarounds required for things to work in both (pop quiz: how do you write Unicode characters to a file in a Python 2/3 compatible way?).

  • Some of Python 3's new syntax features (i.e., features that are impossible to use in Python 2) only matter for library developers. A great example of this is keyword-only arguments. From an API standpoint, almost every instance of keyword arguments should be implemented as keyword-only arguments. This avoids mistakes that come from the antipattern of passing keyword arguments without naming the keyword, and allows the argspec of the function to be expanded in the future without breaking API.3
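
A minimal sketch of the syntax (Python 3 only; the function and its parameters are made up purely for illustration):

def plot(data, *, color="blue", grid=True):
    # color and grid can only be passed by name: plot(d, "red") raises a
    # TypeError, while plot(d, color="red") works, and new positional
    # parameters can later be added before the * without breaking callers.
    ...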

The second reason I think library developers should agree to drop Python 2 support by 2020 is completely selfish. A response that I heard on that tweet (as well as elsewhere), was that libraries should provide carrots, not sticks. In other words, instead of forcing people off of Python 2, we should make them want to come to Python 3. There are some issues with this argument. First, Python 3 already has tons of carrots. Honestly, not being terrible at Unicode ought to be a carrot in its own right.4

If you don't deal with strings, or do but don't care about those silly foreigners with weird accents in their names, there are other major carrots as well. For SymPy, the fact that 1/2 gives 0 in Python 2 has historically been a major source of frustration for new users. Imagine writing out 1/2*x + x**(1/2)*y*z - 3*z**2 and wondering why half of what you wrote just "disappeared" (granted, this was worse before we fixed the printers). While integer/integer not giving a rational number is a major gotcha for SymPy, giving a float is infinitely better than giving what is effectively the wrong answer. Don't use strings or integers? I've got more.

Frankly, if these "carrots" haven't convinced you yet, then I'll wager you're not really the sort of person who is persuaded by carrots.

Second, some "carrots" are impossible unless they are implemented in libraries. While some features can be implemented in 2/3 compatible code and only work in Python 3 (such as @ matrix multiplication), others, such as keyword-only arguments, can only be implemented in code that does not support Python 2. Supporting them in Python 2 would be a net deficit of technical debt (one can imagine, for instance, trying to support keyword-only arguments manually using **kwargs, or by using some monstrous meta-programming).

Third, as I said, I'm selfish. Python 3 does have carrots, and I want them. As long as I have to support Python 2 in my code, I can't use keyword-only arguments, or extended argument unpacking, or async/await, or any of the dozens of features that can't be used in cross compatible code.

A counterargument might be that instead of blocking users of existing libraries, developers should create new libraries which are Python 3-only and make use of new exciting features of Python 3 there. I agree we should do that, but existing libraries are good too. I don't see why developers should throw out all of a well-developed library just so they can use some Python features that they are excited about.

Legacy Python

A lot of people have taken to calling Python 2 "legacy Python". This phrase is often used condescendingly and angers a lot of people (and indeed, this blog post is the first time I've used it myself). However, I think Python 2 really should be seen this way, as a "legacy" system. If you want to use it, for whatever your reasons, that's fine, but just as you shouldn't expect to get any of the newest features of Python, you shouldn't expect to be able to use the newest versions of your libraries. Those libraries that have a lot of development resources may choose to support older Python 2-compatible versions with bug and/or security fixes. Python 2 itself will be supported for these until 2020. Those without resources probably won't (keep in mind that you're using open source libraries without paying money for them).

I get that some people have to use Python 2, for whatever reasons. But using outdated software comes at a cost. Libraries have borne this technical debt for the most part thus far, but they shouldn't be expected to bear it forever. The debt will only increase, especially as the technical opportunity cost, if you will, of not being able to use newer and shinier versions of Python 3 grows. The burden will have to shift at some point. Those with the financial resources may choose to offload this debt to others,5 say, by backporting features or bugfixes to older library versions that support Python 2 (or by helping to move code to Python 3).

I want to end by pointing out that if you are, for whatever reason, still using Python 2, you may be worried that if libraries become Python 3-only and start using Python 3 features, won't that break your code? The answer is no. Assuming package maintainers mark the metadata on their packages correctly, tools like pip and conda will not install non-Python 2 compatible versions into Python 2.

If you haven't transitioned yet, and want to know more, a good place to start is the official docs. I also highly recommend using conda environments, as it will make it easy to separate your Python 2 code from your Python 3 code.

Footnotes


  1. With that being said, the opinions here are entirely my own, and don't necessarily represent those of other people, nor do they represent official SymPy policy (no decisions have been made by the community about this at this time). 

  2. It often feels like core Python itself doesn't really want people to use Python 3. It's little things, like docs links that redirect to Python 2, or PEP 394, which still says that the python command should always point to Python 2. 

  3. In Swift, Apple's new language for iOS and OS X, function parameter names are effectively "keyword-only" by default.

  4. As an example of this, in conda, if you use Python 2 in the root environment, then installing into a path with non-ASCII characters is unsupported. This is common on Windows, because Windows by default uses the user's full name as the username, and the default conda install path is in the user directory.

    This is unsupported except in Python 3, because to fix the issue, every single place in conda where a string appears would have to be changed to use a unicode string in Python 2. The basic issue is that things like 'π' + u'i' raise UnicodeDecodeError in Python 2 (even though 'π' + 'i', u'π' + 'i', and u'π' + u'i' all work fine). You can read a more in-depth description of the problem here. Incidentally, this is also why you should never use from __future__ import unicode_literals in Python 2, in my opinion.

    Even though I no longer work on conda, as far as I know, the issue remains unfixed. Of course, this whole thing works just fine if conda is run in Python 3. 

  5. If that legitimately interests you, I hear Continuum may be able to help you. 

May 18, 2016

The GSoC coding period is about to start next week.

This past week I focused on completing the NTheory Ruby wrappers, in order to finish my promised workload for the pre-coding period.

 

The main lesson learnt from this week's work was handling conversions between Ruby and C types. This proved to be quite an easy task with the Ruby C API.

The Definitive Guide to Ruby’s C API covers this in detail.

Then I had to figure out how to let SymEngine Integers be implicitly convertible into Ruby numeric types. This was tricky for me to get around at first, as I wasn't aware that it could be done in plain Ruby code without having to use the Ruby C API; once I knew that, the implicit conversion to Ruby numeric types was quite easy.

As shown in the gist above, I just had to declare the class in the lib folder with the necessary conversion method.

With this part done, several number theory functions can now be called on SymEngine Integers. Those functions are:

  • GCD
  • LCM
  • Mod
  • Next Prime
  • Quotient
  • Fibonacci Number
  • Lucas Number
  • Binomials

Now that this part is done, the next step would be to start coding from next Monday. From my proposed plan, the first two weeks would be for wrapping Complex Numbers and Floating Point Numbers for Ruby.

See you after the first week.


May 15, 2016

The 4th week of Community Bonding Period is about to kick off. I am here to write about what I’ve done so far and my goals for next week.

The first PR for the project “Implementation of Holonomic Functions” got merged today. It adds the following functionality to SymPy.

  • Differential operators with polynomial coefficients, and operations on them like addition, multiplication etc.
  • Holonomic functions: a representation of a holonomic function given its annihilator and, optionally, initial conditions.

A little about the API to give you an idea of this:

>>> from sympy import *
>>> from sympy.holonomic import HolonomicFunction, DiffOperatorAlgebra

>>> x = symbols('x')
>>> R, Dx = DiffOperatorAlgebra(ZZ.old_poly_ring(x), 'Dx')

>>> Dx * x
(1) + (x)Dx

>>> HolonomicFunction(Dx - 1, x, 0, [1]) + HolonomicFunction(Dx**2 + 1, x, 0, [0, 1])
HolonomicFunction((-1) + (1)Dx + (-1)Dx**2 + (1)Dx**3, x), f(0) = 1, f'(0) = 2, f''(0) = 1

Operations supported for differential operators are addition, multiplication, subtraction and power. Holonomic functions can be added and multiplied with or without giving the initial conditions. Special thanks to Ondrej, Kalevi and Aaron for all the help, suggestions and reviews.

What Now?

Now the goal is to use polynomials and fractions, i.e. instances of the DMP and DMF classes, instead of expressions for all the manipulation done internally. This is necessary for robustness. After that is done I will work on implementing the conversion of hypergeometric functions to holonomic functions.

This has been super exciting so far and I hope it stays that way.

Thank You.

 


May 14, 2016

Hi there! This week was great; I got to learn about many new things. I mentioned my goals for this week in my last post, so let us see what I have done so far.

So far

  • In PR 10863, the implementation of _eval_expand_diracdelta is almost done; a final review is needed. At the same time, I had forgotten that the simplify method has to be deprecated in order to keep things backwards compatible. Thanks, Jason, for the suggestion.

I have made the simplify() method call the _eval_expand_diracdelta() method and raise a deprecation warning. I have also added tests for this method, catching the deprecation warnings properly. The API works like this:

In [3]: DiracDelta(x*y).simplify(x)
/home/ahappyidiot/anaconda2/bin/ipython:1: SymPyDeprecationWarning: 

simplify has been deprecated since SymPy 1.0.1. Use
expand(diracdelta=True, wrt=x) instead.

  #!/home/ahappyidiot/anaconda2/bin/python
Out[3]: DiracDelta(x)/Abs(y)

These commits need to be reviewed properly in order to merge PR 10863.

  •  rewrite(Piecewise): In PR 11103, I have implemented a new method under the DiracDelta class which outputs a Piecewise representation of a DiracDelta object. A final review is needed for this pull request as well. The API works as:
In [4]: DiracDelta(x).rewrite(Piecewise)
Out[4]:
⎧  oo       for x = 0
⎨                           
⎩  0        otherwise 

In [4]: DiracDelta(x - 5).rewrite(Piecewise)
Out[4]:
⎧  oo       for x - 5 = 0
⎨                           
⎩  0        otherwise
  • I have also reviewed PR 11065; I personally think the implementation is a great idea.

Next Week

My plans for the next week are:

  • Polish both PR 11103 and PR 10863 and get these pull requests merged.
  • Improve the docstrings of the DiracDelta and Heaviside classes and methods.

 

I will get back again by the end of next week. Cheers!!!

Happy Coding.


May 10, 2016

The second week of the Community Bonding Period is over. Though this post is quite late, I will try to post updates every Friday.

So far

I had my first meeting with Jason Moore, one of my mentors, on 5th May through Google Hangouts. We had a brief discussion about my proposal. I am getting a head start on coding alongside community bonding and have started a discussion about the first phase of my proposal. Jason has created an issue tracker for improvements to DiracDelta and Heaviside.

  • PR 10863 is almost complete; only the deprecation part is left.

Almost all the properties of the DiracDelta function have already been implemented in SymPy, but I need to check whether all of them are unit tested and well documented.

Next Week

My targets for this week are:

  • Polish PR 10863 and get it merged.
  • Implement rewriting DiracDelta as Piecewise.
  • Improve the docstrings of the DiracDelta and Heaviside classes and methods.

I will get back again by the end of this week. Cheers!!!

Happy Coding.


May 07, 2016

So, GSoC is officially starting with the community bonding period, which lasts until the last week of May. Until then, the official requirement is to get to know the communities. For my project, this puts me in an odd situation, with the project being done for the SymEngine community while under the auspices of the SciRuby Foundation. For starters, I am already familiar with many people from the SymEngine community, and now I am trying to get more involved with the SciRuby people.

Also, in my proposal I listed a couple of tasks to be completed before the actual coding begins. So this week was mostly spent on getting my existing, long-standing PR for Ruby wrappers for trigonometric, hyperbolic and other functions merged. The main problem I had was writing the repetitive tests for all the functions included in the wrappers, but apart from that the requirements were quite straightforward.

For the next week, I am planning to wrap the number theory functions in Ruby. These already have a C wrapper, which makes my task a lot easier.

Apart from coding, I want to set the record straight for the SymEngine gem on the SciRuby website. It lists the SymEngine gem as broken, and I need to correct the gem's installation scripts. Figuring this out is another task I carry over to the coming week.

See you!


May 02, 2016

About one and a half weeks ago, the results of Google Summer of Code came out. I am extremely glad to report that my project for SymPy on the implementation of singularity functions got selected for GSoC 2016.

Google Summer of Code

Google Summer of Code is an annual global program focused on bringing more student developers into open source software development. It offers students stipends to write code for open source projects.

Sympy

SymPy  is a Python library for symbolic mathematics. It aims to become a full-featured Computer Algebra System (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible.

About my Project

I have proposed to work on the implementation of full-fledged Computer Algebra System (CAS) support for singularity functions. I will create a module to represent a singularity function and implement different mathematical operations on it. This module will further be used to create another module for solving complicated beam bending problems.

Jason Moore, Sartaj Singh and Ondřej Čertík are going to mentor me throughout the whole program. All of them are really talented and very humble people. I have learned a lot from all of them. I am extremely lucky to work under such great people.

The Community Bonding Period is going on now. It is intended to get students ready to start contributing to their organization full time from 23rd May. I am supposed to:

  • Become familiar with the community practices and processes.
  • Participate on Mailing Lists / IRC / etc.
  • Set up my development environment.
  • Submit small (or large) patches/bug fixes.
  • Participate in code reviews for others.
  • Work with my mentor and other org members on refining my project plan. This might include finalizing deadlines and milestones, adding more detail, figuring out potential issues, etc.

Looking forward to a great summer.

Cheers!!!

 

 


May 01, 2016

Hello, I'm Gaurav Dhingra, a 3rd year undergraduate student at IIT Roorkee. My proposal on Group Theory with SymPy has been accepted as a part of Google Summer of Code.


First, a little bit about SymPy, a Computer Algebra System (CAS) written entirely in Python. SymPy 1.0 was released about 2 months ago; SymPy has been built by hundreds of contributors since 2006. I will be working on group theory over the summer, for the next 3 months, implementing computational group theory (CGT), a part of mathematics I particularly enjoy. You can view my project proposal: GSoC 2016 Application Gaurav Dhingra: Group Theory. Until a few days ago I was pretty busy with my exams, but in the next few weeks I will get going on the project. I will particularly focus on finite and finitely presented groups.


I hope that I'll be able to implement everything that I promised in it. Moving on to the ongoing community bonding: since I am already well acquainted with the workflow of SymPy, I can get straight to a few important things, which I will do in the next few days.

This includes things like:

  • Setting up a blog with an RSS feed, i.e. this blog, to which I am supposed to add RSS feed functionality.
  • Talking to my mentors regarding the time and place of chat on the internet; we differ by almost 5 hours. Time shouldn't be an issue, since, judging from the past, I haven't faced such difficulty, as my mentor and I work during almost the same time intervals. From the GSoC 2015 discussions, I remember that Ondrej tries to make sure everyone knows what time each student-mentor pair meets, because of the different time zones.
  • In the past we have had discussions on my private gitter channel, Group Theory Implementation. Would it be wise to continue code discussions there, since no one can be added to the channel without my permission?


One thing that has been extremely annoying is the GSoC mailing list; it's quite distracting. I changed the list settings to abridged daily updates because I was getting around 50 mails every day, and that too about some really irrelevant things. But yeah, whatever.


"LESS TALK, MORE CODE" is the policy that I always tend to follow (not for blog!!). I will try my best to implement it in a strict way this summer. I have seen this policy working fine for me, mostly first I start writing question in a message box to my mentor, and then i think more about it myself and in the end I come up with a solution on my own, instead of asking.


I'm quite sure that I will write more than enough blog posts about my project over the summer, since I enjoy writing, especially about things that occupy the larger part of my day.


I'd like to thank all the people involved with contributions to SymPy. My special thanks to my mentor - Kalevi Suominen and my co-mentor - Aaron Meurer for all the suggestions while making my proposal, and showing faith and enthusiasm in my ability and my proposal.

April 29, 2016

I have been selected for GSoC’16! The results came out on Apr 23, and I have never been happier! I got around to writing this blog post only now because of my end semester examinations, which ended yesterday. I have been allotted Isuru and Sumith as my official mentors. I'm very excited to start working on the project alongside them.

Right now, I'll start discussions on the implementation details and the overall structure of the code. I will also begin work on the Fast Fourier Transform algorithm for univariate polynomial multiplication.

Looking forward to a busy summer!

April 26, 2016

So, I have been accepted for the Google Summer of Code – 2016 for the project “Ruby Wrappers for SymEngine”, under the mentoring organization SciRuby.

The aim of this post is to give an introduction to the project.

The abstract of the project is as follows:

A project started by the SymPy organisation, SymEngine is a standalone fast C++ symbolic manipulation library. It solves mathematical problems the same way a human does, but far more quickly and precisely. The motivation for SymEngine is to develop the Computer Algebra System once in C++ and then use it from other languages, rather than doing the same thing all over again for each language that it is required in. The project for Ruby bindings has already been set up at symengine.rb. A few things that the project involves are:
  • Extending the C interface of SymEngine library.
  • Wrapping up the C interface for Ruby using Ruby C API, including error handling.
  • Designing the Ruby interface.
  • Integrating IRuby with symengine gem for better printing and writing IRuby notebooks.
  • Integrating the gem with existing gems like gmp, mpfr and mpc.
  • Making the installation of symengine gem easier.

If you are interested, the full proposal, which includes the timeline is available online.

Also, the GitHub repository for the project is at SymEngine/SymEngine.rb.

The actual coding phase starts in about a month, and before that I plan to complete the Ruby Wrappers for the Trigonometric and Hyperbolic Functions and to write the necessary tests. Next, the NTheory CWrappers can be wrapped into Ruby. This too will be done before the GSoC period starts.

Keep checking the blog if you are interested to track the progress of this project. I will be posting weekly updates in the blog.

Auf Wiedersehen!

 


I am excited to announce that I have been accepted into the Google Summer of Code program for the summer of 2016. I will be working with the SymPy open source project's equation of motion generators. For the project I will mainly be focusing on creating a shared base class for the current equation of motion generators and adding an additional generator.

March 25, 2016

Hi there! Last week I went to Singapore for the FOSSASIA Open Tech Summit 2016. I conducted a workshop on SymPy and assisted with the PyDy workshop in the Python track hosted by Kushal Das. This blog post recounts my experience as a speaker, an attendee at FOSSASIA and a traveler to Singapore.

About FOSSASIA

FOSSASIA is the premier Free and Open Source technology event in Asia for developers, start-ups, and contributors. Projects at FOSSASIA range from open hardware, to design, graphics and software. FOSSASIA was established in 2009. Previous events took place in Cambodia and Vietnam.

As the name suggests, it's one of the largest tech conferences in Asia, so my expectations for this conference were pretty high; moreover, it was my first international conference. I met lots of amazing people at the conference and interacted with quite a few as well. This is how it started:

The SymPy/PyDy Workshop

Community is more important than Code @ Singapore Science Center Level 3, Pauling Lab

The SymPy and PyDy workshops were scheduled for 20th March, at 1:00 - 2:00 PM (PyDy) and 2:00 - 4:00 PM (SymPy). Jason suggested conducting the SymPy workshop first, since PyDy uses SymPy and it would be easier for people to learn SymPy first and then PyDy, but since the schedule was already published it was not possible to reschedule the workshops, so we had to continue with PyDy first. Sahil started the PyDy workshop at 1:00 PM, though we had to spend a lot of time installing Anaconda on everyone's systems by creating a local server and distributing flash drives, as most people didn't have Anaconda or Canopy installed. This has been the problem at almost all the workshops I have conducted in the past; it seems I need to find a more efficient way to do this in the future, as we spent 30-40 odd minutes on installation.

Fortunately Sahil finished his presentation at around 2:15 PM. Then I took over for the SymPy workshop. I started with a basic introduction to SymPy (the slides can be found here), then jumped to IPython notebook exercises to demonstrate more of SymPy. People were amazed by the capabilities of this piece of software; the features they liked most were printing and integration. The workshop went pretty well, except for glitches in the HDMI port of my laptop (it's probably the right time to get a new laptop). Here are some SymPy stickers for you, if you missed getting them there.

Singapore was Fun ;)

Visiting Singapore was a great experience; the culture is a mix of Malaysian, Indian and native Singaporean. The city is well connected with MRT/SMRT (metro and buses), and it's quite easy to get anywhere around the city. People there are very helpful and nice; I didn't face any problems throughout my stay. I spent most of my time near the Science Centre, Chinatown and Little India. There were a lot of people from India, particularly from Delhi, and three from my university. It was an awesome time spent with geeks all around. Tagging some of them: Mayank, Ishaan, Umair, Jigyasa, Yask, Garvit, Manan; sorry if I missed someone. Here is a pic from the last day of the conference.

Thank you!

Thank you, FOSSASIA organizing team and Hong Phuc Dang, for inviting me to be part of this awesome FOSS community. I would not have been able to attend the conference without the generous financial support from SymPy; thank you Ondrej Certik, Aaron Meurer & the SymPy contributors.

Good Bye!

Goodbye everyone, see you in my next blog post. Meanwhile, you can have a look at a picture of me doing a backflip at Sentosa ;)

March 24, 2016

Hello World! My previous blog posts were at: Global Class.

You should try not to reinvent the wheel, so I thought it would be better for me to fork the jekyll-now repo and build this jekyll blog in minutes :smile:.

This is cool, and the most awesome part is that it is Markdown flavoured. It's quite cool writing in markdown now. Oh! I remember how weird I felt writing in markdown back in December. Snap!

Now I will search for a way to do spell correction in markdown. It is much needed; I know a little bit of googling will help me.

Happy Holi! :fire:

March 07, 2016

This is my first blog post. The blog was made to track the progress of my GSoC project and get feedback from my mentors, if my proposal gets selected. I'm proposing to implement the multivariate and univariate polynomial classes in SymEngine.

Wish me luck!

February 06, 2016

Hi there! It's been some time since my last blog post; it's probably the right time to write one now. Yesterday, I gave a talk on SymPy at the Python Delhi User Group meetup at CSDS, New Delhi. Things never go the way you want: an hour was wasted just setting up Anaconda on everyone's systems, and eventually I had to cut the material I could demonstrate. Still, it was nice to see that people were very enthusiastic about SymPy and actively solved the exercises. It was fun interacting with everyone.

Here is a Pic of the Seminar room at CSDS:

I should also admit that my appetite for attending conferences and meetups has increased these days. In the last 4 months I have attended 3 meetups (PyDelhi meetups) and 1 conference (PyCon India 2015). I think this is one of the best things I have done in the last few years, and I would recommend that anyone with even a slight interest in Python, whether beginner or expert, attend the PyDelhi meetups. Looking forward to more such meetups and conferences.

I gave SymPy stickers to everyone who solved at least one exercise (since I didn't have enough stickers for everyone).

January 26, 2016

This post is based off a Jupyter notebook I made in 2013. You can download the original here. That notebook was based off a wiki page on the SymPy wiki, which in turn was based on a message to the SymPy mailing list.

What is hashing?

Before we start, let's have a brief introduction to hashing. A hash function is a function that maps a set of objects to a set of integers. There are many kinds of hash functions, which satisfy many different properties, but the most important property that must be satisfied by any hash function is that it be a function (in the mathematical sense), that is, if two objects are equal, then their hash should also be equal.

Usually, the set of integers that the hash function maps to is much smaller than the set of objects, so that there will be multiple objects that hash to the same value. However, generally for a hash function to be useful, the set of integers should be large enough, and the hash function well distributed enough that if two objects hash to the same value, then they are very likely to be equal.

To summarize, a hash function must satisfy the property:

  • If two objects are equal, then their hashes should be equal.

Additionally, a good hash function should satisfy the property:

  • If two objects have the same hash, then they are likely to be the same object.

Since there are generally more possible objects than hash values, two objects may hash to the same value. This is called a hash collision, and anything that deals with hashes should be able to deal with them.

This won't be discussed here, but an additional property that a good hash function should satisfy to be useful is this:

  • The hash of an object should be cheap to compute.

What is it used for?

If we have a hash function that satisfies the above properties, then we can use it to create from a collection of objects something called a hash table. Suppose we have a collection of objects, and given any object, we want to be able to compute very quickly if that object belongs to our collection. We could store these objects in an ordered array, but then to determine if it is in the array, we would have to search potentially through every element of the array (in other words, an \(O(n)\) algorithm).

With hashing, we can do better. We create what is known as a hash table. Instead of storing the objects in an ordered array, we create an array of buckets, each corresponding to some hash values. We then hash each object, and store it into the array corresponding to its hash value (if there are more hash values than buckets, we distribute them using a second hash function, which can be as simple as taking the modulus with respect to the number of buckets, % n).
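
As a toy illustration (a minimal sketch of the idea, not how Python's actual implementation works):

class ToyHashTable(object):
    def __init__(self, n_buckets=8):
        # each bucket is a list of the objects whose hash maps to that bucket
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, obj):
        # the "second hash": reduce the hash to a bucket index
        return self.buckets[hash(obj) % len(self.buckets)]

    def add(self, obj):
        if obj not in self._bucket(obj):
            self._bucket(obj).append(obj)

    def __contains__(self, obj):
        # only the (short) bucket for obj's hash is searched
        return obj in self._bucket(obj)

>>> t = ToyHashTable()
>>> t.add('a')
>>> t.add(42)
>>> 42 in t
True
>>> 'b' in t
False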

This image from Wikipedia shows an example.

img

To determine if an object is in a hash table, we only have to hash the object, and look in the bucket corresponding to that hash. This is an \(O(1)\) algorithm, assuming we have a good hash function, because each bucket will generally hold very few objects, possibly even none.

Note: there are some additional things that need to be done to handle hash collisions, but the basic idea is the same, and as long as there aren't too many hash collisions, which should happen if hash values are evenly distributed and the size of the hash table is large compared to the number of objects stored in it, the average time to determine if an object is in the hash table is still \(O(1)\).

Hashing in Python

Python has a built in function that performs a hash called hash(). For many objects, the hash is not very surprising. Note, the hashes you see below may not be the same ones you see if you run the examples, because Python hashing depends on the architecture of the machine you are running on, and, in newer versions of Python, hashes are randomized for security purposes.

>>> hash(10)
10
>>> hash(()) # An empty tuple
3527539
>>> hash('a')
12416037344

In Python, not all objects are hashable. For example

>>> hash([]) # An empty list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

This is because Python has an additional restriction on hashing:

  • In order for an object to be hashable, it must be immutable.

This is important basically because we want the hash of an object to remain the same across the object's lifetime. But if we have a mutable object, then that object itself can change over its lifetime. But then according to our first bullet point above, that object's hash has to change too.

This restriction simplifies hash tables. If we allowed an object's hash to change while it is in a hash table, we would have to move it to a different bucket. Not only is this costly, but the hash table would have to notice that this happened; the object itself doesn't know that it is sitting in a hash table, at least not in the Python implementation.

In Python, there are two objects that correspond to hash tables, dict and set. A dict is a special kind of hash table called an associative array. An associative array is a hash table where each element of the hash table points to another object. The other object itself is not hashed.

Think of an associative array as a generalization of a regular array (like a list). In a list, objects are associated to nonnegative integer indices, like

>>> l = ['a', 'b', 7]
>>> l[0]
'a'
>>> l[2]
7

In an associative array (i.e., a dict) we can index objects by anything, so long as the key is hashable.

>>> d = {0: 'a', 'hello': ['world']}
>>> d[0]
'a'
>>> d['hello']
['world']

Note that only the keys need to be hashable. The values can be anything, even unhashable objects like lists.

The uses for associative arrays are boundless. dict is one of the most useful data types in the Python language. Some example uses are

  • Extension of list with "missing values". For example, {0: 'a', 2: 7} would correspond to the above list l with the value 'b' corresponding to the key 1 removed.

  • Representation of a mathematical function with a finite domain.

  • A poor-man's database (the Wikipedia image above is an associative array mapping names to telephone numbers).

  • Implementing a Pythonic version of the switch-case statement.
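
As a tiny sketch of that last item (illustrative only), the dict acts as a dispatch table:

>>> def start():
...     return "starting"
...
>>> def stop():
...     return "stopping"
...
>>> dispatch = {'start': start, 'stop': stop}   # map command names to handlers
>>> dispatch['start']()
'starting'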

The other type of hash table, set, more closely matches the definition I gave above for a hash table. A set is just a container of hashable objects. sets are unordered, and can only contain one of each object (this is why they are called "sets," because this matches the mathematical definition of a set).

In Python 2.7 or later, you can create a set with { and }, like {a, b, c}. Otherwise, use set([a, b, c]).

>>> s = {0, (), '2'}
>>> s
{0, '2', ()}
>>> s.add(1)
>>> s
{0, 1, '2', ()}
>>> s.add(0)
>>> s
{0, 1, '2', ()}

A final note: set and dict are themselves mutable, and hence not hashable! There is an immutable version of set called frozenset. There are no immutable dictionaries.

>>> f = frozenset([0, (), '2'])
>>> f
frozenset({0, '2', ()})
>>> hash(f)
-7776452922777075760
>>> # A frozenset, unlike a set, can be used as a dictionary key
>>> d[f] = 'a set'
>>> d
{0: 'a', frozenset({0, '2', ()}): 'a set', 'hello': ['world']}

Creating your own hashable objects

Before we move on, there is one final thing we need to know about hashing in Python, which is how to create hashes for custom objects. By default, if we create an object, it will be hashable.

>>> class Nothing(object):
...     pass
...
>>> N = Nothing()
>>> hash(N)
270498113

Implementation-wise, the hash is just the object's id, which corresponds to its position in memory. This satisfies the above conditions: it is (extremely) cheap to compute, and since by default objects in Python compare unequal to one another, objects with different hashes will be unequal.

>>> M = Nothing()
>>> M == N
False
>>> hash(M)
270498117
>>> hash(M) == hash(N)
False

To define a hash function for an object, define the __hash__ method.

>>> class HashToOne(object):
...     def __hash__(self):
...         return 1
...
>>> HTO = HashToOne()
>>> hash(HTO)
1

To set an object as not hashable, set __hash__ to None.

>>> class NotHashable(object):
...     __hash__ = None
...
>>> NH = NotHashable()
>>> hash(NH)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'NotHashable'

Finally, to override the equality operator ==, define __eq__.

>>> class AlwaysEqual(object):
...     def __eq__(self, other):
...         if isinstance(other, AlwaysEqual):
...             return True
...         return False
...
>>> AE1 = AlwaysEqual()
>>> AE2 = AlwaysEqual()
>>> AE1 == AE2
True

One of the key points that I hope you will take away from this post is that if you override __eq__, you must also override __hash__ to agree. Note that Python 3 will actually require this: in Python 3, you cannot override __eq__ and not override __hash__. But that's as far as Python goes in enforcing these rules, as we will see below. In particular, Python will never actually check that your __hash__ actually agrees with your __eq__.

Messing with hashing

Now to the fun stuff. What happens if we break some of the invariants that Python expects of hashing? Python expects two key invariants to hold:

  1. The hash of an object does not change across the object's lifetime (in other words, a hashable object should be immutable).

  2. a == b implies hash(a) == hash(b) (note that the reverse might not hold in the case of a hash collision).

As we shall see, Python expects, but does not enforce either of these.

Example 1: Mutating a hash

Let's break rule 1 first. Let's create an object with a hash, and then change that object's hash over its lifetime, and see what sorts of things can happen.

>>> class Bad(object):
...     def __init__(self, hash): # The object's hash will be hash
...         self.hash = hash
...     def __hash__(self):
...         return self.hash
...
>>> b = Bad(1)
>>> hash(b)
1
>>> d = {b:42}
>>> d[b]
42
>>> b.hash = 2
>>> hash(b)
2
>>> d[b]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.Bad object at 0x1047e7438>

Here, we implicitly changed the hash of b by mutating the attribute of b that is used to compute the hash. As a result, the object is no longer found in a dictionary, which uses the hash to find the object.

The object is still there, we just can't access it any more.

>>> d
{<__main__.Bad object at 0x1047e7438>: 42}

Note that Python doesn't prevent me from doing this. We could make it harder if we wanted (e.g., by making __setattr__ raise AttributeError), but even then we could forcibly change it by modifying the object's __dict__. We could try some more fancy things using descriptors, metaclasses, and/or __getattribute__, but even then, if we knew what was happening, we could probably find a way to change it.
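
Here is a hypothetical sketch of that idea: a class that blocks attribute assignment, and the __dict__ trick that still defeats it.

>>> class Guarded(object):
...     def __init__(self, hash):
...         object.__setattr__(self, 'hash', hash)  # bypass our own __setattr__
...     def __hash__(self):
...         return self.hash
...     def __setattr__(self, name, value):
...         raise AttributeError("Guarded objects are read-only")
...
>>> g = Guarded(1)
>>> g.hash = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __setattr__
AttributeError: Guarded objects are read-only
>>> g.__dict__['hash'] = 2  # ...but we can still force it
>>> hash(g)
2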

This is what is meant when people say that Python is a "consenting adults" language. You are expected to not try to break things, but generally aren't prevented from doing so if you try.

Example 2: More mutation

Let's try something even more crazy. Let's make an object that hashes to a different value each time we look at the hash.

>>> class DifferentHash(object):
...     def __init__(self):
...         self.hashcounter = 0
...     def __hash__(self):
...         self.hashcounter += 1
...         return self.hashcounter
...
>>> DH = DifferentHash()
>>> hash(DH)
1
>>> hash(DH)
2
>>> hash(DH)
3

Obviously, if we use DH as a key to a dictionary, then it will not work, because we will run into the same issue we had with Bad. But what about putting DH in a set?

>>> DHset = {DH, DH, DH}
>>> DHset
{<__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>}

Woah! We put the exact same object in a set three times, and it appeared all three times. This is not what is supposed to happen with a set.

>>> {1, 1, 1}
{1}

What happens when we do stuff with DHset?

>>> DHset.remove(DH)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.DifferentHash object at 0x1047e75f8>

That didn't work, because set.remove searches for an object by its hash, which is different by this point.

Now let's make a copy of DHset. The set.copy method will create a shallow copy (meaning that the set container itself will be different, according to is comparison, but the objects themselves will be the same, according to is comparison).

>>> DHset2 = DHset.copy()
>>> DHset2 == DHset
True

Everything is fine so far. This object is only going to cause trouble if something recomputes its hash. But remember that the whole reason that we had trouble with something like Bad above is that Python doesn't recompute the hash of an object unless it has to. So let's do something that will force it to do so: let's pop an object from one of the sets and add it back in.

>>> D = DHset.pop()
>>> DHset.add(D)
>>> DHset
{<__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>}
>>> DHset2
{<__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>}
>>> DHset == DHset2
False

There we go. By removing it from the set, we made the set forget about its hash, so it had to be recomputed when we added it again. This version of DHset now has a DH with a different hash than it had before. Thinking back to set being a hash table, in this DHset, the three DH objects are in different "buckets" than they were in before. DHset.__eq__(DHset2) notices that the bucket structure is different right away and returns False.

By the way, what hash value are we up to these days?

>>> hash(DH)
9

Example 3: When a == b does not imply hash(a) == hash(b)

Now let's look at point 2. What happens if we create an object whose __eq__ disagrees with its __hash__? We have actually already made a class like this, the AlwaysEqual object above. Instances of AlwaysEqual will always compare equal to one another, but they will not have the same hash, because they use object's default id-based __hash__. Let's take a closer look at the AE1 and AE2 objects we created above.

>>> hash(AE1)
270498221
>>> hash(AE2)
270498197
>>> hash(AE1) == hash(AE2)
False
>>> AE1 == AE2
True
>>> {AE1, AE2}
{<__main__.AlwaysEqual at 0x101f79950>,
 <__main__.AlwaysEqual at 0x101f79ad0>}

We can already see that we have broken one of the key properties of a set, which is that it does not contain the same object twice (remember that AE1 and AE2 should be considered the "same object" because AE1 == AE2 is True).

This can lead to subtle issues. For example, suppose we had a list and we wanted to remove all the duplicate items from it. An easy way to do this is to convert the list to a set and then convert it back to a list.

>>> l = ['a', 'a', 'c', 'a', 'c', 'b']
>>> list(set(l))
['a', 'c', 'b']

Now, this method is obviously not going to work for a list of AlwaysEqual objects.

>>> AE3 = AlwaysEqual()
>>> l = [AE1, AE1, AE3, AE2, AE3]
>>> list(set(l))
[<__main__.AlwaysEqual at 0x102c1d590>,
 <__main__.AlwaysEqual at 0x101f79ad0>,
 <__main__.AlwaysEqual at 0x101f79950>]

Actually, what happened here is that the equality that we defined on AlwaysEqual was essentially ignored. We got a list of unique items by id, instead of by __eq__. You can imagine that if __eq__ were something a little less trivial, where some, but not all, objects are considered equal, this could lead to very subtle issues.

But there is an issue with the above algorithm. It isn't stable, that is, it removes the ordering that we had on the list. We could do this better by making a new list, and looping through the old one, adding elements to the new list if they aren't already there.

>>> def uniq(l):
...     newl = []
...     for i in l:
...         if i not in newl:
...             newl.append(i)
...     return newl
...
>>> uniq(['a', 'a', 'c', 'a', 'c', 'b'])
['a', 'c', 'b']
>>> uniq([AE1, AE1, AE3, AE2, AE3])
[<__main__.AlwaysEqual at 0x101f79ad0>]

This time, we used in, which uses ==, so we got only one unique element of the list of AlwaysEqual objects.

But there is an issue with this algorithm as well. Checking if something is in a list is \(O(n)\), but we have an object that allows checking in \(O(1)\) time, namely, a set. So a more efficient version might be to create a set alongside the new list for containment checking purposes.

>>> def uniq2(l):
...     newl = []
...     newlset = set()
...     for i in l:
...         if i not in newlset:
...             newl.append(i)
...             newlset.add(i)
...     return newl
...
>>> uniq2(['a', 'a', 'c', 'a', 'c', 'b'])
['a', 'c', 'b']
>>> uniq2([AE1, AE1, AE3, AE2, AE3])
[<__main__.AlwaysEqual at 0x101f79ad0>,
 <__main__.AlwaysEqual at 0x102c1d590>,
 <__main__.AlwaysEqual at 0x101f79950>]

Bah! Since we used a set, we compared by hashing, not equality, so we are left with three objects again. Notice the extremely subtle difference here. Basically, it is this:

>>> AE1 in {AE2}
False
>>> AE1 in [AE2]
True

Set containment uses hashing; list containment uses equality. If the two don't agree, then the result of your algorithm will depend on which one you use!

By the way, as you might expect, dictionary containment also uses hashing, and tuple containment uses equality:

>>> AE1 in {AE2: 42}
False
>>> AE1 in (AE2,)
True

Example 4: Caching hashing

If you ever want to add subtle bizarreness to a system, add some sort of caching, and then do it wrong.

As we noted in the beginning, one important property of a hash function is that it is quick to compute. A nice way to achieve this for objects that are hashed often is to cache the value of the hash on the object, so that it only needs to be computed once. The pattern (which is modeled after SymPy's Basic) is something like this:

>>> class HashCache(object):
...     def __init__(self, arg):
...         self.arg = arg
...         self.hash_cache = None
...     def __hash__(self):
...         if self.hash_cache is None:
...             self.hash_cache = hash(self.arg)
...         return self.hash_cache
...     def __eq__(self, other):
...         if not isinstance(other, HashCache):
...             return False
...         return self.arg == other.arg
...

HashCache is nothing more than a small wrapper around a hashable argument, which caches its hash.

>>> hash('a')
12416037344
>>> a = HashCache('a')
>>> hash(a)
12416037344

For ordinary Python builtins, simply recomputing the hash will be faster than the attribute lookup used by HashCache. Note: This uses the %timeit magic from IPython. %timeit only works when run in IPython or Jupyter.

>>> %timeit hash('a')
10000000 loops, best of 3: 69.9 ns per loop
>>> %timeit hash(a)
1000000 loops, best of 3: 328 ns per loop

But for a custom object, computing the hash may be more computationally expensive. As hashing is supposed to agree with equality (as I hope you've realized by now!), if computing equality is expensive, computing a hash function that agrees with it might be expensive as well.

As a simple example of where this might be useful, consider a highly nested tuple, an object whose hash is relatively expensive to compute.

>>> a = ()
>>> for i in range(1000):
...     a = (a,)
...
>>> A = HashCache(a)
>>> %timeit hash(a)
100000 loops, best of 3: 9.61 µs per loop
>>> %timeit hash(A)
1000000 loops, best of 3: 325 ns per loop

So far, we haven't done anything wrong. HashCache, as you may have noticed, has __eq__ defined correctly:

>>> HashCache(1) == HashCache(2)
False
>>> HashCache(1) == HashCache(1)
True

But what happens if we mutate a HashCache? This is different from examples 1 and 2 above, because we will be mutating what happens with equality testing, but not the hash (because of the cache).

In the below example, recall that small integers hash to themselves, so hash(1) == 1 and hash(2) == 2.

>>> a = HashCache(1)
>>> d = {a: 42}
>>> a.arg = 2
>>> hash(a)
1
>>> d[a]
42

Because we cached the hash of a, which was computed as soon as we created the dictionary d, it remained unchanged when we modified arg to be 2. Thus, we can still find the key of the dictionary. But since we have mutated a, the equality testing on it has changed. This means that, as with the previous example, we are going to have issues with dicts and sets keeping unique keys and entries (respectively).

>>> a = HashCache(1)
>>> b = HashCache(2)
>>> hash(a)
1
>>> hash(b)
2
>>> b.arg = 1
>>> a == b
True
>>> hash(a) == hash(b)
False
>>> {a, b}
{<__main__.HashCache at 0x102c32050>, <__main__.HashCache at 0x102c32450>}
>>> uniq([a, b])
[<__main__.HashCache at 0x102c32050>]
>>> uniq2([a, b])
[<__main__.HashCache at 0x102c32050>, <__main__.HashCache at 0x102c32450>]

Once we mutate b so that it compares equal to a, we start to have the same sort of issues that we had in example 3 with AlwaysEqual. Let's look at an instant replay.

>>> a = HashCache(1)
>>> b = HashCache(2)
>>> b.arg = 1
>>> print(a == b)
True
>>> print(hash(a) == hash(b))
True
>>> print({a, b})
set([<__main__.HashCache object at 0x102c32a10>])
>>> print(uniq([a, b]))
[<__main__.HashCache object at 0x102c32a50>]
>>> print(uniq2([a, b]))
[<__main__.HashCache object at 0x102c32a50>]

Wait a minute, this time it's different! Comparing it to above, it's pretty easy to see what was different this time. We left out the part where we showed the hash of a and b. When we did that the first time, it cached the hash of b, making it forever be 2, but when we didn't do it the second time, the hash had not been cached yet, so the first time it is computed (in the print(hash(a) == hash(b)) line), b.arg has already been changed to 1.

And herein lies the extreme subtlety: if you mutate an object that caches its hash like this, you will run into issues only if you had already called some function that hashed the object somewhere. Now just about anything might compute the hash of an object. Or it might not. For example, our uniq2 function computes the hash of the objects in its input list, because it stores them in a set, but uniq does not:

>>> a = HashCache(1)
>>> b = HashCache(2)
>>> uniq2([a, b])
>>> b.arg = 1
>>> print(a == b)
True
>>> print(hash(a) == hash(b))
False
>>> print({a, b})
set([<__main__.HashCache object at 0x102c32c50>, <__main__.HashCache object at 0x102c32c10>])
>>> print(uniq([a, b]))
[<__main__.HashCache object at 0x102c32c50>]
>>> print(uniq2([a, b]))
[<__main__.HashCache object at 0x102c32c50>, <__main__.HashCache object at 0x102c32c10>]
>>> a = HashCache(1)
>>> b = HashCache(2)
>>> uniq([a, b])
>>> b.arg = 1
>>> print(a == b)
True
>>> print(hash(a) == hash(b))
True
>>> print({a, b})
set([<__main__.HashCache object at 0x102c32c90>])
>>> print(uniq([a, b]))
[<__main__.HashCache object at 0x102c32bd0>]
>>> print(uniq2([a, b]))
[<__main__.HashCache object at 0x102c32bd0>]

The moral of this final example is that if you are going to cache something, that something had better be immutable.

Conclusion

The conclusion is this: don't mess with hashing. The two invariants above are important. Let's restate them here,

  1. The hash of an object must not change across the object's lifetime (in other words, a hashable object should be immutable).

  2. a == b implies hash(a) == hash(b) (note that the reverse might not hold in the case of a hash collision).

If you don't follow these rules, you will run into very subtle issues, because very basic Python operations expect these invariants.

If you want to be able to mutate an object's properties, you have two options. First, make the object unhashable (set __hash__ = None). You won't be able to use it in sets or as keys to a dictionary, but you will be free to change the object in-place however you want.

A second option is to make all mutable properties non-dependent on hashing or equality testing. This option works well if you just want to cache some internal state that doesn't inherently change the object. Both __eq__ and __hash__ should remain unchanged by changes to this state. You may also want to make sure you use proper getters and setters to prevent modification of internal state that equality testing and hashing does depend on.
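
As a rough sketch of this second option (the class and attribute names here are hypothetical), the mutable state lives in an attribute that neither __eq__ nor __hash__ looks at:

>>> class Cached(object):
...     def __init__(self, arg):
...         self.arg = arg          # part of equality and hashing
...         self._result = None     # internal state, free to mutate
...     def __eq__(self, other):
...         return isinstance(other, Cached) and self.arg == other.arg
...     def __hash__(self):
...         return hash(self.arg)
...     def compute(self):
...         if self._result is None:
...             self._result = self.arg * 2  # stand-in for expensive work
...         return self._result
...
>>> a = Cached(10)
>>> h = hash(a)
>>> a.compute()  # mutates _result, but __eq__ and __hash__ ignore it
20
>>> hash(a) == h
True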

If you choose this second option, however, be aware that Python considers it fair game to swap out two identical immutable (i.e., hashable) objects at any time. If a == b and a is hashable, Python (and Python libraries) are free to replace a with b anywhere. For example, Python uses an optimization on strings called interning, where common strings are stored only once in memory. A similar optimization is used in CPython for small integers. If you store something on a but not on b and make a's hash ignore that data, you may find that some function that should return a may actually return b. For this reason, I generally don't recommend this second option unless you know what you are doing.

Finally, to keep invariant 2, here are some tips:

  • Make sure that the parts of the object that you use to compare equality are not themselves mutable. If they are, then your object cannot itself be immutable. This means that if a == b depends on a.attr == b.attr, and a.attr is a list, then you will need to use a tuple instead (if you want a to be hashable).

  • You don't have to invent a hash function. If you find yourself doing bitshifts and XORs, you're doing it wrong. Reuse Python's builtin hashable objects. If the hash of your object should depend on the hash of a and b, define __hash__ to return hash((a, b)). If the order of a and b does not matter, use hash(frozenset([a, b])).
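
    For instance, a hypothetical Point class can delegate both equality and hashing to a tuple of its fields:

    >>> class Point(object):
    ...     def __init__(self, x, y):
    ...         self.x = x
    ...         self.y = y
    ...     def __eq__(self, other):
    ...         return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    ...     def __hash__(self):
    ...         return hash((self.x, self.y))
    ...
    >>> len({Point(1, 2), Point(1, 2)})
    1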

  • Don't cache something unless you know that the entire cached state will not be changed over the lifetime of the cache. Hashable objects are actually great for caches. If they properly satisfy invariant 1, and all the state that should be cached is part of the hash, then you will not need to worry. And the best part is that you can just use dict for your cache.

  • Unless you really need the performance or memory gains, don't make your objects mutable. This makes programs much harder to reason about. Some functional programming languages take this idea so far that they don't allow any mutable objects.

  • Don't worry about the situation where hash(a) == hash(b) but a != b. This is a hash collision. Unlike the issues we looked at here, hash collisions are expected and checked for in Python. For example, our HashToOne object from the beginning will always hash to 1, but different instances will compare unequal. We can see that the right thing is done in every case with them.

    >>> a = HashToOne()
    >>> b = HashToOne()
    >>> a == b
    False
    >>> hash(a) == hash(b)
    True
    >>> {a, b}
    {<__main__.HashToOne at 0x102c32a10>, <__main__.HashToOne at 0x102c32cd0>}
    >>> uniq([a, b])
    [<__main__.HashToOne at 0x102c32cd0>, <__main__.HashToOne at 0x102c32a10>]
    >>> uniq2([a, b])
    [<__main__.HashToOne at 0x102c32cd0>, <__main__.HashToOne at 0x102c32a10>]
    

    The only concern with hash collisions is that too many of them can remove the performance gains of dict and set.

  • Conversely, if you are writing something that uses an object's hash, remember that hash collisions are possible and unavoidable.

    A classic example of a hash collision is -1 and -2. Remember I mentioned above that small integers hash to themselves:

    >>> hash(1)
    1
    >>> hash(-3)
    -3
    

    The exception to this is -1. The CPython interpreter uses -1 as an error state, so -1 is not a valid hash value. Hence, hash(-1) can't be -1. So the Python developers picked the next closest thing.

    >>> hash(-1)
    -2
    >>> hash(-2)
    -2
    

    If you want to check if something handles hash collisions correctly, this is a simple example. I should also note that the fact that integers hash to themselves is an implementation detail of CPython that may not be true in alternate Python implementations.

  • Finally, we didn't discuss this much here, but don't assume that the hash of your object will be the same across Python sessions. In Python 3.3 and up, hash values of strings are randomized from a value that is seeded when Python starts up. This also affects any object whose hash is computed from the hash of strings. In Python 2.7, you can enable hash randomization with the -R flag to the interpreter. The following are two different Python sessions.

    >>> print(hash('a'))
    -7750608935454338104
    
    >>> print(hash('a'))
    8897161376854729812
    

December 19, 2015

Note: No Starch Press has sent me a copy of this book for review purposes.

SHORT VERSION: Doing Math with Python is well written and introduces topics in a nice, mathematical way. I would recommend it for new users of SymPy.

Doing Math with Python by Amit Saha is a new book published by No Starch Press. The book shows how to use Python to do high school-level mathematics. It makes heavy use of SymPy in many chapters, and this review will focus mainly on those parts, as that is the area I have expertise in.

The book assumes a basic understanding of programming in Python 3, as well as the mathematics used (although advanced topics are explained). No prior background in the libraries used, SymPy and matplotlib, is assumed. For this reason, this book can serve as an introduction to them. Each chapter ends with some programming exercises, which range from easy exercises to more advanced ones.

The book has seven chapters. In the first chapter, "Working with numbers", basic mathematics using pure Python is introduced (no SymPy yet). It should be noted that Python 3 (not Python 2) is required for this book. One of the earliest examples in the book (3/2 == 1.5) will not work correctly without it. I applaud this choice, although I might have added a more prominent warning to wary users. (As a side note, in the appendix, it is recommended to install Python via Anaconda, which I also applaud). This chapter also introduces the fractions module, which seems odd since sympy.Rational will be implicitly used for rational numbers later in the text (to little harm, however, since SymPy automatically converts fractions.Fraction instances to sympy.Rational).

In all, this chapter is a good introduction to the basics of the mathematics of Python. There is also an introduction to variables and strings. However, as I noted above, one should really have some background with basic Python before reading this book, as concepts like flow control and function definition are assumed (note: there is an appendix that goes over this).

Chapters 2 and 3 cover plotting with matplotlib and basic statistics, respectively. I will not say much about the matplotlib chapter, since I know only basic matplotlib myself. I will note that the chapter covers matplotlib from a (high school) mathematics point of view, starting with a definition of the Cartesian plane, which seems a fitting choice for the book.

Chapter 3 shows how to do basic statistics (mean, median, standard deviation, etc.) using pure Python. This chapter is clearly meant for pedagogical purposes for basic statistics, since the basic functions mean, median, etc. are implemented from scratch (as opposed to using numpy.mean or the standard library statistics.mean). This serves as a good introduction to more Python concepts (like collections.Counter) and statistics.

Note that the functions in this chapter assume that the data is the entire population, not a sample. This is mentioned at the beginning of the chapter, but not elaborated on. For example, this leads to a different definition of variance than what might be seen elsewhere (the calculate_variance used in this chapter is statistics.pvariance, not statistics.variance).

It is good to see that a numerically stable definition of variance is used here (see PEP 450 for more discussion on this). These numerical issues show why it is important to use a real statistics library rather than a home grown one. In other words, use this chapter to learn more about statistics and Python, but if you ever need to do statistics on real data, use a statistics library like statistics or numpy. Finally, I should note that this book appears to be written against Python 3.3, whereas statistics was added to the Python standard library in Python 3.4. Perhaps it will get a mention in future editions.
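
For example, with the standard library statistics module (the book's calculate_variance corresponds to pvariance):

>>> import statistics
>>> statistics.pvariance([1, 3])  # population variance, divides by n
1
>>> statistics.variance([1, 3])   # sample variance, divides by n - 1
2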

Chapter 4, "Algebra and Symbolic Math with SymPy" starts the introduction to SymPy. The chapter starts similar to the official SymPy tutorial in describing what symbolics is, and guiding the reader away from common misconceptions and gotchas. The chapter does a good job of explaining common gotchas and avoiding antipatterns.

This chapter may serve as an alternative to the official tutorial. Unlike the official tutorial, which jumps into higher-level mathematics and broader use-cases, this chapter may be better suited to those wishing to use SymPy from the standpoint of high school mathematics.

My only gripes with this chapter, which, in total, are minor, relate to printing.

  1. The typesetting of the pretty printing is inconsistent and, in some cases, incorrect. Powers are printed in the book using superscript numbers, like

    However, SymPy prints powers like

     2
    x
    

    even when Unicode pretty printing is enabled. This is a minor point, but it may confuse users. Also, the output appears to use ASCII pretty printing (mixed with superscript powers), for example

        x²   x³   x⁴   x⁵
    x + -- + -- + -- + --
        2    3    4    5
    

    Most users will either get MathJax printing (if they are using the Jupyter notebook), or Unicode printing, like

         2    3    4    5
        x    x    x    x
    x + ── + ── + ── + ──
        2    3    4    5
    

    Again, this is a minor point, but at the very least the correct printing looks better than the fake printing used here.

  2. In line with the previous point, I would recommend telling the user to start with init_printing(). The function is used once to change the order of printing to rev-lex (for series printing). There is a link to the tutorial page on printing. That page goes into more depth than is necessary for the book, but I would recommend at least mentioning to always call init_printing(), as 2-D printing can make a huge difference over the default str printing, and it obviates the need to call pprint.

Chapter 5, "Playing with Sets and Probability" covers SymPy's set objects (particularly FiniteSet) to do some basic set theory and probability. I'm excited to see this in the book. The sets module in SymPy is relatively new, but quite powerful. We do not yet have an introduction to the sets module in the SymPy tutorial. This chapter serves as a good introduction to it (albeit only with finite sets, but the SymPy functions that operate on infinite sets are exactly the same as the ones that operate on finite sets). In all, I don't have much to say about this chapter other than that I was pleasantly surprised to see it included.

Chapter 6 shows how to draw geometric shapes and fractals with matplotlib. I again won't say much on this, as I am no matplotlib expert. The ability to draw leaf fractals and Sierpiński triangles with Python does look entertaining, and should keep readers enthralled.

Chapter 7, "Solving Calculus Problems" goes into more depth with SymPy. In particular, assumptions, limits, derivatives, and integrals. The chapter alternates between symbolic formulations using SymPy and numeric calculations (using evalf). The numeric calculations are done both for simple examples and more advanced things (like implementing gradient descent).

One small gripe here. The book shows that

from sympy import Symbol
x = Symbol('x')
if (x + 5) > 0:
    print('Do Something')
else:
    print('Do Something else')

raises TypeError at the evaluation of (x + 5) > 0 because its truth value cannot be determined. The solution to this issue is given as

x = Symbol('x', positive=True)
if (x + 5) > 0:
    print('Do Something')
else:
    print('Do Something else')

Setting x to be positive via Symbol('x', positive=True) is correct, but even in this case, evaluating an inequality may still raise a TypeError (for example, if (x - 5) > 0). The better way to do this is to use (x + 5).is_positive. This would require a bit more discussion, especially since SymPy uses a three-valued logic for assumptions, but I do consider "if <symbolic inequality>" to be a SymPy antipattern.
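
To illustrate the difference (a sketch with a recent SymPy; the exact error message may vary across versions):

>>> from sympy import Symbol
>>> x = Symbol('x', positive=True)
>>> (x + 5).is_positive
True
>>> (x - 5).is_positive       # None: SymPy's three-valued logic says "unknown"
>>> if (x - 5) > 0:           # so a bare "if" still fails
...     print('Do Something')
...
Traceback (most recent call last):
  ...
TypeError: cannot determine truth value of Relational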

I like Saha's approach in this chapter of first showing unevaluated forms (Limit, Derivative, Integral), and then evaluating them with doit(). This puts users in the mindset of a mathematical expression being a formula which may or may not later be "calculated". The opposite approach, using the function forms, limit, diff, and integrate, which evaluate if they can and return an unevaluated object if they can't, can be confusing to new users in my experience. A common new SymPy user question is (some form of) "how do I evaluate an expression?" (the answer is doit()). Saha's approach avoids this question by showing doit() from the outset.

I also like that this chapter explains the gotcha of math.sin(Symbol('x')), although I personally would have included this earlier in the text.

(Side note: now that I look, these are both areas in which the official tutorial could be improved).

Summary

This book is a good introduction to doing math with Python, and, for the chapters that use it, a good basic introduction to SymPy. I would recommend it to anyone wishing to learn SymPy, but especially to anyone whose knowledge of mathematics may preclude them from getting the most out of the official SymPy tutorial.

I imagine this book would work well as a pedagogical tool, either for math teachers or for self-learners. The exercises in this book should push the motivated to learn more.

I have a few minor gripes, but no major issues.

You can purchase this book from the No Starch Press website, both as a print book or an ebook. The website also includes a sample chapter (chapter 1), code samples from the book, and exercise solutions.

October 21, 2015

The excitement

People travelling from all over the country (and outside!) to Bangalore for a conference on a weekend, yay!
We were really excited about the workshop and devsprint that the SymPy team was about to deliver. We were even more excited about the fact that we would finally be meeting one another.

Day 0

DevSprint

The first day of the conference kicked off with the devsprints. That morning the whole team met up; present were Harsh, Sudhanshu, AMiT, Sartaj, Shivam and Sumith. Abinash couldn't make it but he was there in spirit :)
We all got our awesome SymPy tees and stickers, thanks to AMiT.
Having been allotted a mentoring space in the devsprint, Sumith gave a basic introduction to SymPy. Some other interesting mentoring spaces were CPython by Kushal Das and Data Science by Bargava. The whole list is here
We got the participants started with setting up the SymPy development workflow, and then they began working on the internals. We allotted bugs to many of them and pointed them toward solutions. Sadly, not many issues could be allotted or closed due to the really poor internet connection at the conference hall, but it was great interacting with the enthusiasts. We also happened to meet Saurabh Jha, a SymPy contributor who had worked on Linear Algebra, and he helped us out with the devsprint.

Workshop

The workshop ran in a two and a half hour slot. It was conducted by Harsh, Sudhanshu, AMiT and Sumith.
Sumith started off with an introduction to SymPy. Then we spent some time helping everyone set up their systems with SymPy and IPython notebooks; even though prior instructions were given, we had to do this to get everyone on level ground.

Harsh took the first half of the content and exercises.
Sudhanshu took the second half, while AMiT and Sumith helped the participants with their queries.

We distributed t-shirts to all the participants at the end. Thanks to all those who participated, we had an awesome time.

Day 0 ended with all of us wrapping off the devsprint.
After having dinner together, everybody headed back looking forward to the coming two days of the conference.

Day 1

Day 1 started off with a keynote by Dr Ajith Kumar B.P followed by multiple talks and lightning talks.
More interesting than the scheduled talks were the conversations we had with people present at the conference. Exchanging views and discussing common points of interest was surely one of the best experiences I had.

Lightning talk

Shivam delivered a lightning talk titled Python can be fast. Here, he stressed the fact that implementing the correct data structures is important and that Python is not always to blame. He gave relevant examples from his summer work at SymPy.

By this point, we had reached a considerable audience at the conference, and a lot of them were really interested in SymPy. We had many younger participants who were enthusiastic about SymPy since it participates in GSoC; some of them also sent in patches.

Day 2

Day 2 started off with a keynote by Nicholas H.Tollervey.

Talk

Sumith delivered a talk titled SymEngine: The future fast core of computer algebra systems. The content included SymPy, SymEngine and the interface. Some light was shed on Python wrappers to C++ code. Thanks to all the audience present there.

As the day was closing in, Harsh and Shivam had to leave to catch their flights.

Open Space

After multiple people requested help getting started with SymPy, we decided to conduct an open space.
Open spaces are a way for people to come together to talk about topics, ideas or whatever they want. All people had to do was just show up :) Present there were Sudhanshu, Sartaj, AMiT and Sumith. Sartaj luckily came up with a solveset bug, so we had a live demonstration of how bug-fixing is done: filing an issue, fixing the code, writing tests and sending in a PR.

Closing thoughts

Conferences are the perfect place to discuss and share knowledge and ideas. The people present were experts in their areas of interest, and conversations with them are a great experience. Meeting the team was something that we were looking forward to right from the start.

Missing Sartaj and Abinash

Discussing SymPy and the gossip in person is a different experience altogether. I'll make sure to attend every conference that I possibly can from here on.

Thanks for reading
Be back for more

October 05, 2015

Last Friday was my last day working at Continuum Analytics. I enjoyed my time at the company, and I wish it success, but the time has come for me to move on. Later this year, I will start working with Anthony Scopatz at his new lab, ERGS, at the University of South Carolina.

During my time at Continuum (over two years if you count a summer internship), I primarily worked on the Anaconda distribution and its open source package manager, conda. I learned a lot of lessons in that time, and I'd like to share some of them here.

In no particular order:

  • Left to their own devices, people will make the minimal possible solution to packaging. They won't try to architect something. The result will be over-engineered, specific to their use-case, and lack reproducibility.

  • The best way to ensure that some software has no bugs is for it to have many users.

  • Be wary of the "software would be great if it weren't for all the users" mentality (cf. the previous point).

  • Most people don't code defensively. If you are working on a project that requires extreme stability, be cautious of contributions from those outside the development team.

  • Hostility towards Windows and Windows users doesn't help anyone.

  • https://twitter.com/asmeurer/status/593170976981913600

  • For a software updater, stability is the number one priority. If the updater breaks, how can a fix be deployed?

  • Even if you configure your program to update itself every time it runs you will still get bug reports with arbitrarily old versions.

  • Separating components into separate git repositories leads to a philosophical separation of concerns among the components.

  • Everyone who isn't an active developer on the project will ignore this separation and open issues in the wrong repo.

  • Avoid object oriented programming when procedural programming will do just fine.1

  • Open source is more about the open than the source. Develop things in the open, and you will create a community that respects you.1

  • Academics (often) don't know good software practices, nor good licensing practices.

  • Neither do some large corporations.

  • Avoid over-engineering things.

  • Far fewer people than I would have thought understand the difference between hard links and soft links.2

  • Changelogs are useful.

  • Semantic versioning is over-hyped.

  • If you make something and release it, the first version should be 1.0 (not 0.1 or 0.0.1).

  • Getting a difficult package to compile is like hacking a computer. All it takes is time.

  • It doesn't matter how open source friendly your business is, there will always be people who will be skeptical and point their fingers at the smallest proprietary components, fear monger, and overgeneralize unrelated issues into FUD. These people should generally be ignored.

  • Don't feed the trolls.1

  • People constantly misspell the name of Apple's desktop operating system.

  • People always assume you have way more automation than you really do.

  • The Python standard library is not a Zen garden. Some parts of it are completely broken, and if you need to rely on them, you'll have to rewrite them. shutil.rmtree on Windows is one example of this.

  • Linux is strictly backwards compatible. Windows is strictly forwards compatible. 3

  • On Linux, things tend to be very simple. On Windows, things tend to be very complicated.

  • I can't decide about OS X. It lies somewhere in between.

  • Nobody uses 32-bit Linux. Why do we even support that?

  • People oversimplify the problem of solving for package dependencies in their heads. No one realizes that it's meaningless to say something like "the dependencies of NumPy" (every build of every version of NumPy has its own set of dependencies, which may or may not be the same).

  • Writing a set of rules and a solver to solve against those rules is relatively easy. Writing heuristics to tell users why those rules are unsolvable when they are is hard.

  • SAT solvers solve NP-complete problems in general, but they can be very fast to solve common case problems. 1

  • Some of the smartest people I know, who otherwise make very rational and intelligent decisions, refuse to update to Python 3.

  • As an introvert, the option of working from home is great for maintaining sanity.

  • Aeron chairs are awesome.

  • If living in Austin doesn't turn you into a foodie you will at least gain a respect for them.

  • Twitter, if used correctly, is a great way to interact with your users.

  • Twitter is also a great place to learn new things. Follow John Cook and Bret Victor.

  • One of the best ways to make heavily shared content is to make it about git (at least if you're an expert).

  • A good optimization algorithm avoids getting caught in local maxima by trying different parts of the search space that initially appear to be worse. The same approach should be taken in life.

Footnotes


  1. These are things that I already knew, but were reiterated. 

  2. If you are one of those people, I have a small presentation that explains the difference here 

  3. These terms can be confusing, and I admit I got this backwards the first time I wrote this. According to Wikipedia, forwards compatible means a system can accept input intended for a later version of itself and backwards compatible means a system can accept input intended for an earlier version of itself.

    What I specifically mean here is that in terms of building packages for Linux or Windows, for Linux, you should build a package on the oldest version that you wish to support. That package will work on newer versions of Linux, but not anything older (generally due to the version of libc you are linked against).

    On the other hand, on Windows, you can compile things on the newest version (I used Windows 8 on my main Windows build VM), and it will work on older versions of Windows like XP (as long as you ship the right runtime DLLs). This is also somewhat confusing because Windows tends to be both forwards compatible and backwards compatible. 

August 27, 2015

Hi! I am Amit Kumar (@aktech), a final year undergraduate student of Mathematics & Computing at Delhi Technological University. This post summarizes my experience working on GSoC Project on Improving Solvers in SymPy.

Introduction

I first stumbled upon SymPy last year, while looking for an Open Source Computer Algebra System to contribute to. I didn't have any Open Source experience then, so SymPy was an ideal choice for getting into the beautiful world of Open Source. I wasn't even proficient in Python, so at first it was a little difficult for me, but thanks to the beauty of the language itself, anyone can get comfortable with it in no time. Soon, I decided to participate in Google Summer of Code under SymPy, though at that point I hadn't decided which project I would like to work on in the summer.

First Contribution

I started learning the codebase and made my first contribution by fixing an EasyToFix bug in solvers.py through the PR #8647; thanks to @smichr for helping me make my first ever open source contribution. After my first PR, I started looking for more things to work on and improve, and I started committing quite often. During this period I learnt the basics of Git, which is one of the most important tools for contributing to Open Source.

Project Ideas

When I got a bit comfortable with the basics of SymPy and contributing to open source in general, I decided to choose an area (module) to concentrate on. The modules I was interested in were Solvers and Integrals; I was literally amazed by the capability of a CAS to integrate and solve equations. I decided to work on one of these in the summer. There was already some work done on the Integrals module in 2013, which was yet to be merged. I wasn't well versed in Manuel Bronstein's work on methods of integration in a computer algebra system, so I was a little skeptical about working on Integrals. The Solvers module attracted me due to its awesome capabilities; I find it one of the most useful features of any Computer Algebra System, so I finally decided to work on the Solvers module.

Coding

I was finally accepted to work on Solvers this summer. I had my exams during the community bonding period, so I started almost in the first week of the coding period. I made a detailed timeline of my work for the summer, but from my experience I can say that's seldom useful, since you never know what may come between you and your schedule. For instance, PR #9540 was a stumbling block for a lot of my work, and fixing it was necessary for proceeding ahead.

Phase I (Before Mid Terms)

When the coding period commenced, I started implementing linsolve, the linear system solver which is tolerant of different input forms and can solve almost all forms of linear systems. At the start I got a lot of reviews from Jason and Harsh regarding improvements to the function. One of the most important things I learnt, which they focused on, was Test Driven Development: they suggested I write extensive tests before implementing the logic, which helps in visualizing the final implementation of the function and avoids API changes.
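
For a flavour of what linsolve does (output shown as a recent SymPy prints it):

>>> from sympy import symbols, linsolve
>>> x, y = symbols('x y')
>>> linsolve([x + y - 2, x - y], x, y)  # each expression is taken to equal zero
{(1, 1)}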

After linsolve I implemented ComplexPlane, which is basically complex sets. It is useful for representing infinite solutions in the Argand plane. While implementing this I learnt that choosing the right API is one of the most important factors when designing an important piece of functionality. To know more about it, see my blog post here. During this period I also worked on fixing Intersections of FiniteSets with symbolic elements, which was a stumbling block.

Phase II (After Mid Terms)

After successfully passing the mid terms, I started working more on the robustness of solveset; thanks to @hargup for pointing out the motivation for this work. The idea is to tell the user about the domain of the solution returned. The simplest motivation was the solution of the equation |x| - n; for more info see my blog post here. I also worked on various trivial and non-trivial bugs which were more or less blocking my work.

Then I started replacing solve with solveset in the codebase; the idea was to make a smooth transition between solve and solveset. While doing this, Jason pointed out that I should not remove solve tests, which could leave solve vulnerable to breaking, so I reverted the removal of solve tests. Later we decided to add a domain argument to solveset, which would help the user easily tell solveset what solutions they are interested in; thanks to @shivamvats for doing this in a PR. After the decision to add the domain argument, Harsh figured out that, as of now, solveset is vulnerable to API changes, so it's not the right time to replace solve with solveset. We decided to halt this work, and as a result I closed several of my PR's unmerged.

I also worked on implementing differential calculus methods such as is_increasing etc., which are also merged now. Meanwhile I have been working on documenting solveset, because a lot of people don't know what we are doing and why we are doing it, so it's very important to answer all those subtle questions which may come up in their minds. We decided to create FAQ-style documentation for solveset; see PR #9500. This is almost done, some polishing is needed, and it should be merged soon.

During this period, apart from my work, there is some other work worth mentioning as well. One of them is ConditionSet by Harsh, which serves the purpose of an unevaluated solve object and much more than that for our future endeavours with solveset. Others are codomain and not_empty by Gaurav @gxyd, which are also important additions to SymPy.

Advice

TODO: Probably, this will need a comprehensive post, I would write soon.

Future Plans

Recently Harsh came up with the idea of a tree-based solver. Now that ConditionSet has been introduced, solving equations can be seen as a set transformation. We can do the following things to solve equations (abstract view):

  • Apply Various Set Transformations on the given Set.
  • Define a metric of usability, or a notion of one solution being better than another.
  • Different Transformation would be the nodes of the tree.
  • Suitable searching techniques could be applied to get the best solution.

For more info see mailing list thread here.

As a part of this I worked on implementing a general decomposition function, decompogen, in PR #9831. It's almost done and will be merged soon.
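
As a small taste of what decompogen does (import path as in current SymPy):

>>> from sympy import sin, cos, Symbol
>>> from sympy.solvers.decompogen import decompogen
>>> x = Symbol('x')
>>> decompogen(sin(cos(x)), x)  # f(g(x)) is split into [f, g]
[sin(x), cos(x)]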

I plan for a long-term association with SymPy, and I take full responsibility for my code. I will try to contribute as much as I can, particularly in the sets and solvers modules.

Conclusion

On a concluding note, I must say that getting the opportunity to work on SymPy this summer has been one of the best things that could happen to me. Thanks to Harsh for helping me in all my endeavours and for being one of the best mentors I could get. I would like to thank Sean as well, who took time from his busy schedule to attend meetings and hangouts and to do code reviews. Also thanks to Chris Smith, who is the most gentle and helpful person I have ever seen; he is one of the reasons I started contributing to SymPy. Thanks to Aaron, Ondrej, and last but not least my fellow GSoCers at SymPy: leosartaj, debugger22, sumith1896, shivamvats, abinashmeher999. Special thanks to the whole SymPy team and community for a wonderful collaboration experience. Kudos!

August 23, 2015

This week we announced the release of SymEngine on the Sage list. For that, I made some changes to the build system for versioning and for using SymEngine from other C/C++ projects.
First, SymEngineConfig.cmake outputs a set of flags, imported dependencies, etc. SymEngineConfigVersion.cmake checks that the version is compatible and that the 32/64-bitness of the SymEngine project matches the other CMake project. When SymEngine is only built, these files are at the root level of the build directory; when installed, they are at /lib/cmake/symengine. Below is an excerpt from the wiki page I wrote at https://github.com/sympy/symengine/wiki/Using-SymEngine-from-a-Cpp-project
Using SymEngine in another CMake project
To use SymEngine from another CMake project, include the following in your CMakeLists.txt file:
find_package(SymEngine 0.1.0 CONFIG)
You can give the path to the SymEngine installation directory if it was installed to a non standard location by,
find_package(SymEngine 0.1.0 CONFIG PATHS /path/to/install/dir/lib/cmake/symengine)
Alternatively, you can give the path to the build directory.
find_package(SymEngine 0.1.0 CONFIG PATHS /path/to/build/dir)
An example project would be,
cmake_minimum_required(VERSION 2.8)
find_package(SymEngine 0.1.0 CONFIG)
set(CMAKE_CXX_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE} "-std=c++0x")

include_directories(${SYMENGINE_INCLUDE_DIRS})
add_executable(example main.cpp)
target_link_libraries(example ${SYMENGINE_LIBRARIES})
More options are here
Using SymEngine in Non CMake projects
You can get the include flags and link flags needed for SymEngine using the command line CMake.
compile_flags=`cmake --find-package -DNAME=SymEngine -DCOMPILER_ID=GNU -DLANGUAGE=CXX -DMODE=COMPILE`
link_flags=`cmake --find-package -DNAME=SymEngine -DCOMPILER_ID=GNU -DLANGUAGE=CXX -DMODE=LINK`

g++ $compile_flags main.cpp $link_flags
Python wrappers
There was a suggestion to make the Python wrappers separate, so that in a distribution like Gentoo the package sources can be distributed separately.
So, I worked on the Python wrappers to let them be built either independently or with the main repo. Now, the Python wrappers directory, along with the setup.py file from the root folder, can be packaged and will work without a problem.

August 21, 2015

From not knowing anything considerable about programming and open source to reaching this level, it has been a wonderful ride. Google Summer of Code has been full of ups and downs but nonetheless exhilarating.

I didn't even know at the time of my first patch that I would be so closely associated with SymEngine and the team members just a few months down the line.

After a couple of bug fixes, my first major contribution came in as the UnivariatePolynomial class. The biggest challenge here was implementing multiplication using Kronecker's trick. This was my first experience of implementing an algorithm from a paper. The UnivariatePolynomial class shaped up really well; there are minor improvements that can be made and some optimizations that could be done, but standalone, it is a fully functional class.
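
For readers curious about Kronecker's trick, here is a toy Python sketch of the idea (restricted to nonnegative integer coefficients; the actual SymEngine code is C++ and handles the general case): pack each polynomial into one big integer, do a single big-integer multiplication, and read the product's coefficients back off as digits.

def kronecker_mul(a, b):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    if not a or not b:
        return []
    # Pick a base bigger than any coefficient the product can have.
    bound = min(len(a), len(b)) * max(a) * max(b) + 1
    base = 1
    while base < bound:
        base <<= 1
    def pack(p):
        # Evaluate the polynomial at `base`, giving one big integer.
        n = 0
        for c in reversed(p):
            n = n * base + c
        return n
    prod = pack(a) * pack(b)  # a single big-integer multiplication
    out = []
    for _ in range(len(a) + len(b) - 1):
        prod, c = divmod(prod, base)
        out.append(c)
    return out

# (1 + 2*x) * (3 + 4*x) == 3 + 10*x + 8*x**2
print(kronecker_mul([1, 2], [3, 4]))  # [3, 10, 8]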

Once this was done, my next aim was to optimize multiplication to reach Piranha's speed. This was a very enriching period, and the discussions with the team members and Francesco were a great learning experience. En route, I also got a chance to explore Piranha under the hood and trouble Francesco for the reasoning behind why certain things were the way they were. By the end of this, we were able to hit Piranha's speed. I remember I was the happiest I had been in days.

Once we hit the lower-level speed, we decided to hard-depend on Piranha for Polynomial. This meant adding Piranha as a SymEngine dependency. Here I had to learn how to write CMake files, and setting up Piranha testing in Travis meant writing shell and CI scripts. We faced a problem here whose resolution meant adopting Catch as a testing framework for SymEngine. Catch is an awesome library and the community is very pleasant; implementing this was fun work too. Also, the high-level value class Expression was implemented in SymEngine, mostly taken from Francesco's work.

I then started writing the Polynomial class; most of the work is done here (597). But the design is not very well thought out. I say this because, once ready, it can only support the integer (ZZ) domain, but we will also need rational (QQ) and expression (EX) domains. The code will be of much use, but we have been discussing a much cleaner implementation with a Ring class. Most of the progress and the new design decisions are being documented here.

The second half has been really rough, with university running. Ondrej has been really patient with me, and I thank him for that. The bond that I made with him through mails, technical and non-technical, has really grown strong. He has allowed me to continue the work on Polynomial and implement more details and algorithms in the future. I am looking forward to that, as a long-term association is an amazing thing, and I am proud to be responsible for the Polynomial module in SymEngine.

I am indebted to my mentor Ondrej Certik and all the SymEngine and SymPy developers, who were ever ready to help and answer my silliest questions. It's an amazing community; they are really very helpful and always appreciated even the smallest of my contributions. The best part of SymEngine is that you know the contributors one to one, and it is like a huge family of learners. I am looking forward to meeting the team (at least SymPy India in the near future).

Google Summer of Code has been one exhilarating journey. I don't know if I was a good programmer then or a good programmer now but I can say that I am a better programmer now.

This is just the beginning of the ride, GSoC a stepping stone.

There will be blog posts coming here, so stay tuned. Till then,
Bye

August 20, 2015

This is the 12th week. The hard deadline is this Friday. GSoC is coming to an end, leaving behind a wonderful experience. Well, here's how my past few weeks went.

Highlights:

Work on Formal Power Series:

  • #9776 added the fps method in Expr class. Instead of fps(sin(x)), user can now simply do sin(x).fps().
  • #9782 implements some basic operations like addition, subtraction on FormalPowerSeries. The review is almost complete and should get merged soon.
  • #9783 added the sphinx docs for the series.formal module.
  • #9789 replaced all the solve calls in the series.formal with the new solveset function.

Work on computing limits of sequences:

This is the second part of my GSoC project aiming to implement the algorithm for computing limits of sequences as described in the poster Computing Limits Of Sequences by Manuel Kauers.

  • #9803 implemented the difference_delta function. difference_delta(a(n), n) is defined as a(n + 1) - a(n). It is the discrete analogue of differentiation (see the example after this list).
  • #9836 aims at completing the implementation of the algorithm. It is still under review and hopefully it will be in soon.
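
As a quick example of difference_delta (import path as in current SymPy):

>>> from sympy import Symbol
>>> from sympy.series.limitseq import difference_delta as dd
>>> n = Symbol('n', integer=True)
>>> dd(n*(n + 1), n)
2*n + 2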

Final Tasks:

Get #9782 and #9836 merged soon.

Upcoming:

A thank you post ;)



Planet SymPy is made from the blogs of SymPy's contributors. The opinions it contains are those of the contributor. This site is powered by Rawdog and Rawdog RSS. Feed readers can read Planet SymPy with RSS, FOAF or OPML.