April 29, 2016

I have been selected for GSoC’16! The results came out on April 23, and I have never been happier! I got around to writing this blog post only now because of my end-semester examinations, which ended yesterday. I have been allotted Isuru and Sumith as my official mentors. I’m very excited to start working on the project alongside them.

Right now, I’ll start my discussions on the implementation details and the overall structure of the code. I will also begin work on the Fast Fourier Transform (FFT) algorithm for univariate polynomial multiplication.
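
To sketch the idea, here is a minimal NumPy illustration of FFT-based multiplication of integer coefficient lists (not the eventual SymEngine implementation; fft_mul is just a name for this sketch):

import numpy as np

def fft_mul(p, q):
    # Multiply two polynomials given as coefficient lists (lowest degree first).
    n = len(p) + len(q) - 1                  # number of coefficients in the product
    size = 1 << max(n - 1, 0).bit_length()   # round up to a power of two for the FFT
    fp = np.fft.rfft(p, size)
    fq = np.fft.rfft(q, size)
    # Pointwise multiplication in Fourier space is convolution of coefficients.
    return np.rint(np.fft.irfft(fp * fq, size)[:n]).astype(int)

fft_mul([1, 2], [3, 4])   # (1 + 2x)(3 + 4x) -> [3, 10, 8]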

Looking forward to a busy summer!

April 26, 2016

So, I have been accepted for the Google Summer of Code – 2016 for the project “Ruby Wrappers for SymEngine”, under the mentoring organization SciRuby.

The aim of this post is to give an introduction to the project.

The abstract of the project is as follows:

A project started by the SymPy organisation, SymEngine is a standalone, fast C++ symbolic manipulation library. It solves mathematical problems the same way a human does, but far more quickly and precisely. The motivation for SymEngine is to develop the computer algebra system once in C++ and then use it from other languages, rather than doing the same thing all over again for each language it is required in. The project for Ruby bindings has already been set up at symengine.rb. A few of the things the project involves are:
  • Extending the C interface of SymEngine library.
  • Wrapping up the C interface for Ruby using Ruby C API, including error handling.
  • Designing the Ruby interface.
  • Integrating IRuby with symengine gem for better printing and writing IRuby notebooks.
  • Integrating the gem with existing gems like gmp, mpfr and mpc.
  • Making the installation of symengine gem easier.

If you are interested, the full proposal, which includes the timeline is available online.

Also, the GitHub repository for the project is at SymEngine/SymEngine.rb.

The actual coding phase starts in about a month. Before that, I plan to complete the Ruby wrappers for the trigonometric and hyperbolic functions and to write the necessary tests. Next, the NTheory C wrappers can be wrapped into Ruby; this too will be done before the GSoC period starts.

Keep checking the blog if you are interested to track the progress of this project. I will be posting weekly updates in the blog.

Auf Wiedersehen!

March 25, 2016

Hi there! Last week I went to Singapore for the FOSSASIA Open Tech Summit 2016. I conducted a workshop on SymPy and assisted with the PyDy workshop in the Python track hosted by Kushal Das. This blog post recounts my experience as a speaker and attendee at FOSSASIA, and as a traveler to Singapore.

About FOSSASIA

FOSSASIA is the premier Free and Open Source technology event in Asia for developers, start-ups, and contributors. Projects at FOSSASIA range from open hardware, to design, graphics and software. FOSSASIA was established in 2009. Previous events took place in Cambodia and Vietnam.

As the name suggests, it's one of the largest tech conferences in Asia, so my expectations were pretty high; moreover, it was my first international conference. I met lots of amazing people at the conference and interacted with quite a few of them. This is how it started:

The SymPy/PyDy Workshop

Community is more important than Code @ Singapore Science Center Level 3, Pauling Lab

The SymPy and PyDy workshops were scheduled for March 20, at 1:00 - 2:00 PM (PyDy) and 2:00 - 4:00 PM (SymPy). Jason suggested conducting the SymPy workshop first, since PyDy uses SymPy and it would be easier for people to learn SymPy first and then PyDy, but since the schedule was already published it was not possible to reschedule, so we had to continue with PyDy first. Sahil started the PyDy workshop at 1:00 PM, though we had to spend a lot of time installing Anaconda on everyone's systems by creating a local server and distributing flash drives, as most people didn't have Anaconda or Canopy installed. This has been a problem at almost every workshop I have conducted; it seems I need to find a more efficient way to do this in the future, as we spent 30-40 odd minutes on installation.

Fortunately, Sahil finished his presentation at around 2:15 PM. Then I took over for the SymPy workshop. I started with a basic introduction to SymPy (the slides can be found here), then jumped to IPython notebook exercises to demonstrate more of SymPy. People were amazed by the capabilities of this piece of software; the features they liked most were the printing and the integration. The workshop went pretty well, except for glitches in the HDMI port of my laptop (probably the right time to get a new laptop). Here are some SymPy stickers for you, if you missed them there.

Singapore was Fun ;)

Visiting Singapore was a great experience; the culture is a mix of Malaysian, Indian, and native Singaporean. The city is well connected by MRT/SMRT (metro and buses), so it's quite easy to get anywhere around the city. People there are very helpful and nice, and I didn't face any problems throughout my stay. I spent most of my time near the Science Centre, Chinatown, and Little India. There were lots of people from India, particularly from Delhi, and three from my university. It was an awesome time spent with geeks all around. Tagging some of them: Mayank, Ishaan, Umair, Jigyasa, Yask, Garvit, Manan; sorry if I missed someone. Here is a pic from the last day of the conference.

Thank you!

Thank you, FOSSASIA Organizing Team and Hong Phuc Dang, for inviting me to be part of this awesome FOSS community. I would not have been able to attend the conference without the generous financial support from SymPy. Thank you, Ondrej Certik, Aaron Meurer, and the SymPy contributors.

Good Bye!

Goodbye, everyone, see you in my next blog post! Meanwhile, you can have a look at a picture of me doing a backflip at Sentosa ;)

March 07, 2016

This is my first blog post. The blog was made to track the progress of my GSoC project and get feedback from my mentors, if my proposal gets selected. I’m proposing to implement the multivariate and univariate polynomial classes in SymEngine.

Wish me luck!

February 06, 2016

Hi there! It's been some time since my last blog post, so it's probably the right time to write one. Yesterday I gave a talk on SymPy at the Python Delhi User Group meetup at CSDS, New Delhi. Things never go the way you want: an hour was wasted just setting up Anaconda on everyone's systems, and eventually I had to cut the material I could demonstrate. Still, it was nice to see that people were very enthusiastic about SymPy and actively solved exercises. It was fun interacting with everyone.

Here is a Pic of the Seminar room at CSDS:

I should also admit that I have increased my appetite for attending conferences and meetups these days. In the last four months I have attended three meetups (PyDelhi Meetup) and one conference (PyCon India 2015). I think this is one of the best things I have done in the last few years, and I would recommend that anyone with even a slight interest in Python, whether beginner or expert, attend the PyDelhi meetups. Looking forward to more such meetups and conferences.

I gave SymPy stickers to everyone who solved at least one exercise (since I didn't have enough stickers for everyone).

January 26, 2016

This post is based off a Jupyter notebook I made in 2013. You can download the original here. That notebook was based off a wiki page on the SymPy wiki, which in turn was based on a message to the SymPy mailing list.

What is hashing?

Before we start, let's have a brief introduction to hashing. A hash function is a function that maps a set of objects to a set of integers. There are many kinds of hash functions, which satisfy many different properties, but the most important property that must be satisfied by any hash function is that it be a function (in the mathematical sense), that is, if two objects are equal, then their hashes should also be equal.

Usually, the set of integers that the hash function maps to is much smaller than the set of objects, so that there will be multiple objects that hash to the same value. However, generally for a hash function to be useful, the set of integers should be large enough, and the hash function well distributed enough that if two objects hash to the same value, then they are very likely to be equal.

To summarize, a hash function must satisfy the property:

  • If two objects are equal, then their hashes should be equal.
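
For example, in Python this holds even across numeric types: equal numbers hash equal.

>>> 1 == 1.0
True
>>> hash(1) == hash(1.0)
True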

Additionally, a good hash function should satisfy the property:

  • If two objects have the same hash, then they are likely to be the same object.

Since there are generally more possible objects than hash values, two objects may hash to the same value. This is called a hash collision, and anything that deals with hashes should be able to deal with them.

This won't be discussed here, but an additional property that a good hash function should satisfy to be useful is this:

  • The hash of an object should be cheap to compute.

What is it used for?

If we have a hash function that satisfies the above properties, then we can use it to create from a collection of objects something called a hash table. Suppose we have a collection of objects, and given any object, we want to be able to compute very quickly if that object belongs to our collection. We could store these objects in an ordered array, but then to determine if an object is in the array, we would have to search potentially through every element of the array (in other words, an \(O(n)\) algorithm).

With hashing, we can do better. We create what is known as a hash table. Instead of storing the objects in an ordered array, we create an array of buckets, each corresponding to some hash values. We then hash each object, and store it into the array corresponding to its hash value (if there are more hash values than buckets, we distribute them using a second hash function, which can be as simple as taking the modulus with respect to the number of buckets, % n).
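
Here is a minimal sketch of that bucketing idea (buckets and n_buckets are made-up names for illustration):

>>> n_buckets = 8
>>> buckets = [[] for _ in range(n_buckets)]
>>> for obj in (10, (), 'a'):
...     buckets[hash(obj) % n_buckets].append(obj)  # second hash: % n_buckets
...
>>> 10 in buckets[hash(10) % n_buckets]
True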

This image from Wikipedia shows an example (a hash table mapping names to telephone numbers).

To determine if an object is in a hash table, we only have to hash the object, and look in the bucket corresponding to that hash. This is an \(O(1)\) algorithm, assuming we have a good hash function, because each bucket will generally hold very few objects, possibly even none.

Note: there are some additional things that need to be done to handle hash collisions, but the basic idea is the same, and as long as there aren't too many hash collisions, which should happen if hash values are evenly distributed and the size of the hash table is large compared to the number of objects stored in it, the average time to determine if an object is in the hash table is still \(O(1)\).

Hashing in Python

Python has a built-in function called hash() that computes an object's hash. For many objects, the hash is not very surprising. Note, the hashes you see below may not be the same ones you see if you run the examples, because Python hashing depends on the architecture of the machine you are running on, and, in newer versions of Python, hashes are randomized for security purposes.

>>> hash(10)
10
>>> hash(()) # An empty tuple
3527539
>>> hash('a')
12416037344

In Python, not all objects are hashable. For example

>>> hash([]) # An empty list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

This is because Python has an additional restriction on hashing:

  • In order for an object to be hashable, it must be immutable.

This is important basically because we want the hash of an object to remain the same across the object's lifetime. But if we have a mutable object, then that object can change over its lifetime, so that what it compares equal to can change as well. By our first bullet point above, that would force the object's hash to change too.

This restriction simplifies hash tables. If we allowed an object's hash to change while it is in a hash table, we would have to move it to a different bucket. Not only is this costly, but the hash table would have to notice that this happened; the object itself doesn't know that it is sitting in a hash table, at least not in the Python implementation.

In Python, there are two objects that correspond to hash tables, dict and set. A dict is a special kind of hash table called an associative array. An associative array is a hash table where each element of the hash table points to another object. The other object itself is not hashed.

Think of an associative array as a generalization of a regular array (like a list). In a list, objects are associated to nonnegative integer indices, like

>>> l = ['a', 'b', 7]
>>> l[0]
'a'
>>> l[2]
7

In an associative array (i.e., a dict) we can index objects by anything, so long as the key is hashable.

>>> d = {0: 'a', 'hello': ['world']}
>>> d[0]
'a'
>>> d['hello']
['world']

Note that only the keys need to be hashable. The values can be anything, even unhashable objects like lists.

The uses for associative arrays are boundless. dict is one of the most useful data types in the Python language. Some example uses are

  • Extension of list with "missing values". For example, {0: 'a', 2: 7} would correspond to the above list l with the value 'b' corresponding to the key 1 removed.

  • Representation of a mathematical function with a finite domain.

  • A poor-man's database (the Wikipedia image above is an associative array mapping names to telephone numbers).

  • Implementing a Pythonic version of the switch-case statement.
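
As a sketch of the last item, a dict of callables can stand in for a switch-case (dispatch is a made-up name for this illustration):

>>> def dispatch(op, a, b):
...     return {'add': lambda: a + b,
...             'sub': lambda: a - b}[op]()
...
>>> dispatch('add', 2, 3)
5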

The other type of hash table, set, more closely matches the definition I gave above for a hash table. A set is just a container of hashable objects. sets are unordered, and can only contain one of each object (this is why they are called "sets," because this matches the mathematical definition of a set).

In Python 2.7 or later, you can create a set with { and }, like {a, b, c}. Otherwise, use set([a, b, c]).

>>> s = {0, (), '2'}
>>> s
{0, '2', ()}
>>> s.add(1)
>>> s
{0, 1, '2', ()}
>>> s.add(0)
>>> s
{0, 1, '2', ()}

A final note: set and dict are themselves mutable, and hence not hashable! There is an immutable version of set called frozenset. There are no immutable dictionaries.

>>> f = frozenset([0, (), '2'])
>>> f
frozenset({0, '2', ()})
>>> hash(f)
-7776452922777075760
>>> # A frozenset, unlike a set, can be used as a dictionary key
>>> d[f] = 'a set'
>>> d
{0: 'a', frozenset({0, '2', ()}): 'a set', 'hello': ['world']}

Creating your own hashable objects

Before we move on, there is one final thing we need to know about hashing in Python, which is how to create hashes for custom objects. By default, if we create an object, it will be hashable.

>>> class Nothing(object):
...     pass
...
>>> N = Nothing()
>>> hash(N)
270498113

Implementation-wise, the hash is just the object's id, which corresponds to its position in memory. This satisfies the above conditions: it is (extremely) cheap to compute, and since by default objects in Python compare unequal to one another, objects with different hashes will be unequal.

>>> M = Nothing()
>>> M == N
False
>>> hash(M)
270498117
>>> hash(M) == hash(N)
False

To define a hash function for an object, define the __hash__ method.

>>> class HashToOne(object):
...     def __hash__(self):
...         return 1
...
>>> HTO = HashToOne()
>>> hash(HTO)
1

To set an object as not hashable, set __hash__ to None.

>>> class NotHashable(object):
...     __hash__ = None
...
>>> NH = NotHashable()
>>> hash(NH)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'NotHashable'

Finally, to override the equality operator ==, define __eq__.

>>> class AlwaysEqual(object):
...     def __eq__(self, other):
...         if isinstance(other, AlwaysEqual):
...             return True
...         return False
...
>>> AE1 = AlwaysEqual()
>>> AE2 = AlwaysEqual()
>>> AE1 == AE2
True

One of the key points that I hope you will take away from this post is that if you override __eq__, you must also override __hash__ to agree. Note that Python 3 goes partway to requiring this: in Python 3, a class that overrides __eq__ without also defining __hash__ has its __hash__ set to None, making it unhashable. But that's as far as Python goes in enforcing these rules, as we will see below. In particular, Python will never actually check that your __hash__ actually agrees with your __eq__.
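
For instance, under Python 3 (EqOnly is a made-up class for this demonstration):

>>> class EqOnly(object):
...     def __eq__(self, other):
...         return True
...
>>> hash(EqOnly())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'EqOnly'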

Messing with hashing

Now to the fun stuff. What happens if we break some of the invariants that Python expects of hashing? Python expects two key invariants to hold:

  1. The hash of an object does not change across the object's lifetime (in other words, a hashable object should be immutable).

  2. a == b implies hash(a) == hash(b) (note that the reverse might not hold in the case of a hash collision).

As we shall see, Python expects, but does not enforce, either of these.

Example 1: Mutating a hash

Let's break rule 1 first. Let's create an object with a hash, and then change that object's hash over its lifetime, and see what sorts of things can happen.

>>> class Bad(object):
...     def __init__(self, hash): # The object's hash will be hash
...         self.hash = hash
...     def __hash__(self):
...         return self.hash
...
>>> b = Bad(1)
>>> hash(b)
1
>>> d = {b:42}
>>> d[b]
42
>>> b.hash = 2
>>> hash(b)
2
>>> d[b]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.Bad object at 0x1047e7438>

Here, we implicitly changed the hash of b by mutating the attribute of b that is used to compute the hash. As a result, the object is no longer found in a dictionary, which uses the hash to find the object.

The object is still there, we just can't access it any more.

>>> d
{<__main__.Bad object at 0x1047e7438>: 42}

Note that Python doesn't prevent me from doing this. We could make it harder if we wanted (e.g., by making __setattr__ raise AttributeError), but even then we could forcibly change it by modifying the object's __dict__. We could try some more fancy things using descriptors, metaclasses, and/or __getattribute__, but even then, if we knew what was happening, we could probably find a way to change it.
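
A minimal sketch of that idea (Frozen is a made-up class):

>>> class Frozen(object):
...     def __setattr__(self, name, value):
...         raise AttributeError("immutable")
...
>>> f = Frozen()
>>> f.x = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __setattr__
AttributeError: immutable
>>> f.__dict__['x'] = 1  # ...but we can still mutate the instance dict directly
>>> f.x
1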

This is what is meant when people say that Python is a "consenting adults" language. You are expected to not try to break things, but generally aren't prevented from doing so if you try.

Example 2: More mutation

Let's try something even more crazy. Let's make an object that hashes to a different value each time we look at the hash.

>>> class DifferentHash(object):
...     def __init__(self):
...         self.hashcounter = 0
...     def __hash__(self):
...         self.hashcounter += 1
...         return self.hashcounter
...
>>> DH = DifferentHash()
>>> hash(DH)
1
>>> hash(DH)
2
>>> hash(DH)
3

Obviously, if we use DH as a key to a dictionary, then it will not work, because we will run into the same issue we had with Bad. But what about putting DH in a set?

>>> DHset = {DH, DH, DH}
>>> DHset
{<__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>}

Woah! We put the exact same object in a set three times, and it appeared all three times. This is not what is supposed to happen with a set.

>>> {1, 1, 1}
{1}

What happens when we do stuff with DHset?

>>> DHset.remove(DH)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.DifferentHash object at 0x1047e75f8>

That didn't work, because set.remove searches for an object by its hash, which is different by this point.

Now let's make a copy of DHset. The set.copy method will create a shallow copy (meaning that the set container itself will be different, according to is comparison, but the objects themselves will be the same, according to is comparison).

>>> DHset2 = DHset.copy()
>>> DHset2 == DHset
True

Everything is fine so far. This object is only going to cause trouble if something recomputes its hash. But remember that the whole reason that we had trouble with something like Bad above is that Python doesn't recompute that hash of an object, unless it has to. So let's do something that will force it to do so: let's pop an object from one of the sets and add it back in.

>>> D = DHset.pop()
>>> DHset.add(D)
>>> DHset
{<__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>}
>>> DHset2
{<__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>,
 <__main__.DifferentHash at 0x101f79f50>}
>>> DHset == DHset2
False

There we go. By removing it from the set, we made the set forget about its hash, so it had to be recomputed when we added it again. This version of DHset now has a DH with a different hash than it had before. Thinking back to set being a hash table, in this DHset, the three DH objects are in different "buckets" than they were in before. DHset.__eq__(DHset2) notices that the bucket structure is different right away and returns False.

By the way, what hash value are we up to these days?

>>> hash(DH)
9

Example 3: When a == b does not imply hash(a) == hash(b)

Now let's look at point 2. What happens if we create an object with __eq__ that disagrees with __hash__? We actually already made a class like this: the AlwaysEqual object above. Instances of AlwaysEqual will always compare equal to one another, but they will not have the same hash, because they use the default __hash__ based on the object's id. Let's take a closer look at the AE1 and AE2 objects we created above.

>>> hash(AE1)
270498221
>>> hash(AE2)
270498197
>>> hash(AE1) == hash(AE2)
False
>>> AE1 == AE2
True
>>> {AE1, AE2}
{<__main__.AlwaysEqual at 0x101f79950>,
 <__main__.AlwaysEqual at 0x101f79ad0>}

We can already see that we have broken one of the key properties of a set, which is that it does not contain the same object twice (remember that AE1 and AE2 should be considered the "same object" because AE1 == AE2 is True).

This can lead to subtle issues. For example, suppose we had a list and we wanted to remove all the duplicate items from it. An easy way to do this is to convert the list to a set and then convert it back to a list.

>>> l = ['a', 'a', 'c', 'a', 'c', 'b']
>>> list(set(l))
['a', 'c', 'b']

Now, this method is obviously not going to work for a list of AlwaysEqual objects.

>>> AE3 = AlwaysEqual()
>>> l = [AE1, AE1, AE3, AE2, AE3]
>>> list(set(l))
[<__main__.AlwaysEqual at 0x102c1d590>,
 <__main__.AlwaysEqual at 0x101f79ad0>,
 <__main__.AlwaysEqual at 0x101f79950>]

Actually, what happened here is that the equality that we defined on AlwaysEqual was essentially ignored. We got a list of unique items by id, instead of by __eq__. You can imagine that if __eq__ were something a little less trivial, where some, but not all, objects are considered equal, that this could lead to very subtle issues.

But there is an issue with the above algorithm: it isn't stable, that is, it destroys the ordering that we had on the list. We could do this better by making a new list, and looping through the old one, adding elements to the new list if they aren't already there.

>>> def uniq(l):
...     newl = []
...     for i in l:
...         if i not in newl:
...             newl.append(i)
...     return newl
...
>>> uniq(['a', 'a', 'c', 'a', 'c', 'b'])
['a', 'c', 'b']
>>> uniq([AE1, AE1, AE3, AE2, AE3])
[<__main__.AlwaysEqual at 0x101f79ad0>]

This time, we used in, which uses ==, so we got only one unique element of the list of AlwaysEqual objects.

But there is an issue with this algorithm as well. Checking if something is in a list is \(O(n)\), but we have an object that allows checking in \(O(1)\) time, namely, a set. So a more efficient version might be to create a set alongside the new list for containment checking purposes.

>>> def uniq2(l):
...     newl = []
...     newlset = set()
...     for i in l:
...         if i not in newlset:
...             newl.append(i)
...             newlset.add(i)
...     return newl
...
>>> uniq2(['a', 'a', 'c', 'a', 'c', 'b'])
['a', 'c', 'b']
>>> uniq2([AE1, AE1, AE3, AE2, AE3])
[<__main__.AlwaysEqual at 0x101f79ad0>,
 <__main__.AlwaysEqual at 0x102c1d590>,
 <__main__.AlwaysEqual at 0x101f79950>]

Bah! Since we used a set, we compared by hashing, not equality, so we are left with three objects again. Notice the extremely subtle difference here. Basically, it is this:

>>> AE1 in {AE2}
False
>>> AE1 in [AE2]
True

Set containment uses hashing; list containment uses equality. If the two don't agree, then the result of your algorithm will depend on which one you use!

By the way, as you might expect, dictionary containment also uses hashing, and tuple containment uses equality:

>>> AE1 in {AE2: 42}
False
>>> AE1 in (AE2,)
True

Example 4: Caching hashing

If you ever want to add subtle bizarreness to a system, add some sort of caching, and then do it wrong.

As we noted in the beginning, one important property of a hash function is that it is quick to compute. A nice way to achieve this for objects whose hashes are expensive to compute is to cache the value of the hash on the object, so that it only needs to be computed once. The pattern (which is modeled after SymPy's Basic) is something like this:

>>> class HashCache(object):
...     def __init__(self, arg):
...         self.arg = arg
...         self.hash_cache = None
...     def __hash__(self):
...         if self.hash_cache is None:
...             self.hash_cache = hash(self.arg)
...         return self.hash_cache
...     def __eq__(self, other):
...         if not isinstance(other, HashCache):
...             return False
...         return self.arg == other.arg
...

HashCache is nothing more than a small wrapper around a hashable argument, which caches its hash.

>>> hash('a')
12416037344
>>> a = HashCache('a')
>>> hash(a)
12416037344

For ordinary Python builtins, simply recomputing the hash will be faster than the attribute lookup used by HashCache. Note: This uses the %timeit magic from IPython. %timeit only works when run in IPython or Jupyter.

>>> %timeit hash('a')
10000000 loops, best of 3: 69.9 ns per loop
>>> %timeit hash(a)
1000000 loops, best of 3: 328 ns per loop

But for a custom object, computing the hash may be more computationally expensive. As hashing is supposed to agree with equality (as I hope you've realized by now!), if computing equality is expensive, computing a hash function that agrees with it might be expensive as well.

As a simple example of where this might be useful, consider a highly nested tuple, an object whose hash is relatively expensive to compute.

>>> a = ()
>>> for i in range(1000):
...     a = (a,)
...
>>> A = HashCache(a)
>>> %timeit hash(a)
100000 loops, best of 3: 9.61 µs per loop
>>> %timeit hash(A)
1000000 loops, best of 3: 325 ns per loop

So far, we haven't done anything wrong. HashCache, as you may have noticed, has __eq__ defined correctly:

>>> HashCache(1) == HashCache(2)
False
>>> HashCache(1) == HashCache(1)
True

But what happens if we mutate a HashCache? This is different from examples 1 and 2 above, because we will be mutating what happens with equality testing, but not the hash (because of the cache).

In the below example, recall that small integers hash to themselves, so hash(1) == 1 and hash(2) == 2.

>>> a = HashCache(1)
>>> d = {a: 42}
>>> a.arg = 2
>>> hash(a)
1
>>> d[a]
42

Because we cached the hash of a, which was computed as soon as we created the dictionary d, it remained unchanged when we modified arg to be 2. Thus, we can still find the key of the dictionary. But since we have mutated a, equality testing against it has changed. This means that, as with the previous example, we are going to have issues with dicts and sets keeping unique keys and entries (respectively).

>>> a = HashCache(1)
>>> b = HashCache(2)
>>> hash(a)
1
>>> hash(b)
2
>>> b.arg = 1
>>> a == b
True
>>> hash(a) == hash(b)
False
>>> {a, b}
{<__main__.HashCache at 0x102c32050>, <__main__.HashCache at 0x102c32450>}
>>> uniq([a, b])
[<__main__.HashCache at 0x102c32050>]
>>> uniq2([a, b])
[<__main__.HashCache at 0x102c32050>, <__main__.HashCache at 0x102c32450>]

Once we mutate b so that it compares equal to a, we start to have the same sort of issues that we had in example 3 with AlwaysEqual. Let's look at an instant replay.

>>> a = HashCache(1)
>>> b = HashCache(2)
>>> b.arg = 1
>>> print(a == b)
True
>>> print(hash(a) == hash(b))
True
>>> print({a, b})
set([<__main__.HashCache object at 0x102c32a10>])
>>> print(uniq([a, b]))
[<__main__.HashCache object at 0x102c32a50>]
>>> print(uniq2([a, b]))
[<__main__.HashCache object at 0x102c32a50>]

Wait a minute, this time it's different! Comparing it to above, it's pretty easy to see what was different this time: we left out the part where we showed the hashes of a and b. When we did that the first time, it cached the hash of b, making it forever be 2; since we didn't do that the second time, the hash had not been cached yet, so by the first time it was computed (in the print(hash(a) == hash(b)) line), b.arg had already been changed to 1.

And herein lies the extreme subtlety: if you mutate an object that caches its hash like this, you will run into issues only if something had already computed the object's hash somewhere. And just about anything might compute the hash of an object. Or it might not. For example, our uniq2 function computes the hash of the objects in its input list, because it stores them in a set, but uniq does not:

>>> a = HashCache(1)
>>> b = HashCache(2)
>>> uniq2([a, b])
>>> b.arg = 1
>>> print(a == b)
True
>>> print(hash(a) == hash(b))
False
>>> print({a, b})
set([<__main__.HashCache object at 0x102c32c50>, <__main__.HashCache object at 0x102c32c10>])
>>> print(uniq([a, b]))
[<__main__.HashCache object at 0x102c32c50>]
>>> print(uniq2([a, b]))
[<__main__.HashCache object at 0x102c32c50>, <__main__.HashCache object at 0x102c32c10>]
>>> a = HashCache(1)
>>> b = HashCache(2)
>>> uniq([a, b])
>>> b.arg = 1
>>> print(a == b)
True
>>> print(hash(a) == hash(b))
True
>>> print({a, b})
set([<__main__.HashCache object at 0x102c32c90>])
>>> print(uniq([a, b]))
[<__main__.HashCache object at 0x102c32bd0>]
>>> print(uniq2([a, b]))
[<__main__.HashCache object at 0x102c32bd0>]

The moral of this final example is that if you are going to cache something, that something had better be immutable.

Conclusion

The conclusion is this: don't mess with hashing. The two invariants above are important. Let's restate them here,

  1. The hash of an object must not change across the object's lifetime (in other words, a hashable object should be immutable).

  2. a == b implies hash(a) == hash(b) (note that the reverse might not hold in the case of a hash collision).

If you don't follow these rules, you will run into very subtle issues, because very basic Python operations expect these invariants.

If you want to be able to mutate an object's properties, you have two options. First, make the object unhashable (set __hash__ = None). You won't be able to use it in sets or as keys to a dictionary, but you will be free to change the object in-place however you want.

A second option is to make all mutable properties non-dependent on hashing or equality testing. This option works well if you just want to cache some internal state that doesn't inherently change the object. Both __eq__ and __hash__ should remain unchanged by changes to this state. You may also want to make sure you use proper getters and setters to prevent modification of internal state that equality testing and hashing does depend on.

If you choose this second option, however, be aware that Python considers it fair game to swap out two identical immutable (i.e., hashable) objects at any time. If a == b and a is hashable, Python (and Python libraries) are free to replace a with b anywhere. For example, Python uses an optimization on strings called interning, where common strings are stored only once in memory. A similar optimization is used in CPython for small integers. If you store something on a but not on b and make a's hash ignore that data, you may find that some function that should return a actually returns b. For this reason, I generally don't recommend this second option unless you know what you are doing.
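
Here is interning at work (a CPython implementation detail; identifier-like string literals are typically interned):

>>> a = 'hello'
>>> b = 'hello'
>>> a == b
True
>>> a is b  # the very same object in memory, because 'hello' is interned
True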

Finally, to keep invariant 2, here are some tips:

  • Make sure that the parts of the object that you use to compare equality are not themselves mutable. If they are, then your object cannot itself be immutable. This means that if a == b depends on a.attr == b.attr, and a.attr is a list, then you will need to use a tuple instead (if you want a to be hashable).

  • You don't have to invent a hash function. If you find yourself doing bitshifts and XORs, you're doing it wrong. Reuse Python's builtin hashable objects. If the hash of your object should depend on the hash of a and b, define __hash__ to return hash((a, b)). If the order of a and b does not matter, use hash(frozenset([a, b])).
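
    For instance (a sketch; Point is a made-up class):

    >>> class Point(object):
    ...     def __init__(self, x, y):
    ...         self.x = x
    ...         self.y = y
    ...     def __eq__(self, other):
    ...         return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    ...     def __hash__(self):
    ...         return hash((self.x, self.y))  # reuse the hash of a tuple
    ...
    >>> Point(1, 2) == Point(1, 2)
    True
    >>> hash(Point(1, 2)) == hash(Point(1, 2))
    True
    >>> len({Point(1, 2), Point(1, 2)})
    1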

  • Don't cache something unless you know that the entire cached state will not be changed over the lifetime of the cache. Hashable objects are actually great for caches. If they properly satisfy invariant 1, and all the state that should be cached is part of the hash, then you will not need to worry. And the best part is that you can just use dict for your cache.

  • Unless you really need the performance or memory gains, don't make your objects mutable. This makes programs much harder to reason about. Some functional programming languages take this idea so far that they don't allow any mutable objects.

  • Don't worry about the situation where hash(a) == hash(b) but a != b. This is a hash collision. Unlike the issues we looked at here, hash collisions are expected and checked for in Python. For example, our HashToOne object from the beginning will always hash to 1, but different instances will compare unequal. We can see that the right thing is done in every case with them.

    >>> a = HashToOne()
    >>> b = HashToOne()
    >>> a == b
    False
    >>> hash(a) == hash(b)
    True
    >>> {a, b}
    {<__main__.HashToOne at 0x102c32a10>, <__main__.HashToOne at 0x102c32cd0>}
    >>> uniq([a, b])
    [<__main__.HashToOne at 0x102c32cd0>, <__main__.HashToOne at 0x102c32a10>]
    >>> uniq2([a, b])
    [<__main__.HashToOne at 0x102c32cd0>, <__main__.HashToOne at 0x102c32a10>]
    

    The only concern with hash collisions is that too many of them can remove the performance gains of dict and set.

  • Conversely, if you are writing something that uses an object's hash, remember that hash collisions are possible and unavoidable.

    A classic example of a hash collision is -1 and -2. Remember I mentioned above that small integers hash to themselves:

    >>> hash(1)
    1
    >>> hash(-3)
    -3
    

    The exception to this is -1. The CPython interpreter uses -1 as an error state, so -1 is not a valid hash value. Hence, hash(-1) can't be -1. So the Python developers picked the next closest thing.

    >>> hash(-1)
    -2
    >>> hash(-2)
    -2
    

    If you want to check if something handles hash collisions correctly, this is a simple example. I should also note that the fact that integers hash to themselves is an implementation detail of CPython that may not be true in alternate Python implementations.

  • Finally, we didn't discuss this much here, but don't assume that the hash of your object will be the same across Python sessions. In Python 3.3 and up, hash values of strings are randomized from a value that is seeded when Python starts up. This also affects any object whose hash is computed from the hash of strings. In Python 2.7, you can enable hash randomization with the -R flag to the interpreter. The following are two different Python sessions.

    >>> print(hash('a'))
    -7750608935454338104
    
    >>> print(hash('a'))
    8897161376854729812
    

December 19, 2015

Note: No Starch Press has sent me a copy of this book for review purposes.

SHORT VERSION: Doing Math with Python is well written and introduces topics in a nice, mathematical way. I would recommend it for new users of SymPy.

Doing Math with Python by Amit Saha is a new book published by No Starch Press. The book shows how to use Python to do high school-level mathematics. It makes heavy use of SymPy in many chapters, and this review will focus mainly on those parts, as that is the area I have expertise in.

The book assumes a basic understanding of programming in Python 3, as well as of the mathematics used (although advanced topics are explained). No prior background in the libraries used, SymPy and matplotlib, is assumed. For this reason, this book can serve as an introduction to them. Each chapter ends with some programming exercises, which range from easy to more advanced.

The book has seven chapters. In the first chapter, "Working with numbers", basic mathematics using pure Python is introduced (no SymPy yet). It should be noted that Python 3 (not Python 2) is required for this book. One of the earliest examples in the book (3/2 == 1.5) will not work correctly without it. I applaud this choice, although I might have added a more prominent warning to wary users. (As a side note, in the appendix, it is recommended to install Python via Anaconda, which I also applaud). This chapter also introduces the fractions module, which seems odd since sympy.Rational will be implicitly used for rational numbers later in the text (to little harm, however, since SymPy automatically converts fractions.Fraction instances to sympy.Rational).
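
For reference, a quick sketch of both points under Python 3 (the Fraction-to-Rational conversion is the automatic one mentioned above):

>>> 3/2  # true division; in Python 2 this would be 1
1.5
>>> from fractions import Fraction
>>> from sympy import Symbol
>>> x = Symbol('x')
>>> Fraction(1, 2)*x  # the Fraction becomes a sympy.Rational
x/2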

In all, this chapter is a good introduction to the basics of the mathematics of Python. There is also an introduction to variables and strings. However, as I noted above, one should really have some background with basic Python before reading this book, as concepts like flow control and function definition are assumed (note: there is an appendix that goes over this).

Chapters 2 and 3 cover plotting with matplotlib and basic statistics, respectively. I will not say much about the matplotlib chapter, since I know only basic matplotlib myself. I will note that the chapter covers matplotlib from a (high school) mathematics point of view, starting with a definition of the Cartesian plane, which seems a fitting choice for the book.

Chapter 3 shows how to do basic statistics (mean, median, standard deviation, etc.) using pure Python. This chapter is clearly meant for pedagogical purposes for basic statistics, since the basic functions mean, median, etc. are implemented from scratch (as opposed to using numpy.mean or the standard library statistics.mean). This serves as a good introduction to more Python concepts (like collections.Counter) and statistics.

Note that the functions in this chapter assume that the data is the entire population, not a sample. This is mentioned at the beginning of the chapter, but not elaborated on. For example, this leads to a different definition of variance than what might be seen elsewhere (the calculate_variance used in this chapter is statistics.pvariance, not statistics.variance).

It is good to see that a numerically stable definition of variance is used here (see PEP 450 for more discussion on this). These numerical issues show why it is important to use a real statistics library rather than a home grown one. In other words, use this chapter to learn more about statistics and Python, but if you ever need to do statistics on real data, use a statistics library like statistics or numpy. Finally, I should note that this book appears to be written against Python 3.3, whereas statistics was added to the Python standard library in Python 3.4. Perhaps it will get a mention in future editions.

Chapter 4, "Algebra and Symbolic Math with SymPy" starts the introduction to SymPy. The chapter starts similar to the official SymPy tutorial in describing what symbolics is, and guiding the reader away from common misconceptions and gotchas. The chapter does a good job of explaining common gotchas and avoiding antipatterns.

This chapter may serve as an alternative to the official tutorial. Unlike the official tutorial, which jumps into higher-level mathematics and broader use-cases, this chapter may be better suited to those wishing to use SymPy from the standpoint of high school mathematics.

My only gripes with this chapter, which, in total, are minor, relate to printing.

  1. The typesetting of the pretty printing is inconsistent and, in some cases, incorrect. Powers are printed in the book using superscript numbers, like x².

    However, SymPy prints powers like

     2
    x
    

    even when Unicode pretty printing is enabled. This is a minor point, but it may confuse users. Also, the output appears to use ASCII pretty printing (mixed with superscript powers), for example

        x²   x³   x⁴   x⁵
    x + -- + -- + -- + --
        2    3    4    5
    

    Most users will either get MathJax printing (if they are using the Jupyter notebook), or Unicode printing, like

         2    3    4    5
        x    x    x    x
    x + ── + ── + ── + ──
        2    3    4    5
    

    Again, this is a minor point, but at the very least the correct printing looks better than the fake printing used here.

  2. In line with the previous point, I would recommend telling the user to start with init_printing(). The function is used once to change the order of printing to rev-lex (for series printing). There is a link to the tutorial page on printing. That page goes into more depth than is necessary for the book, but I would recommend at least mentioning to always call init_printing(), as 2-D printing can make a huge difference over the default str printing, and it obviates the need to call pprint.

Chapter 5, "Playing with Sets and Probability" covers SymPy's set objects (particularly FiniteSet) to do some basic set theory and probability. I'm excited to see this in the book. The sets module in SymPy is relatively new, but quite powerful. We do not yet have an introduction to the sets module in the SymPy tutorial. This chapter serves as a good introduction to it (albeit only with finite sets, but the SymPy functions that operate on infinite sets are exactly the same as the ones that operate on finite sets). In all, I don't have much to say about this chapter other than that I was pleasantly surprised to see it included.

Chapter 6 shows how to draw geometric shapes and fractals with matplotlib. I again won't say much on this, as I am no matplotlib expert. The ability to draw leaf fractals and Sierpiński triangles with Python does look entertaining, and should keep readers enthralled.

Chapter 7, "Solving Calculus Problems" goes into more depth with SymPy. In particular, assumptions, limits, derivatives, and integrals. The chapter alternates between symbolic formulations using SymPy and numeric calculations (using evalf). The numeric calculations are done both for simple examples and more advanced things (like implementing gradient descent).

One small gripe here. The book shows that

from sympy import Symbol
x = Symbol('x')
if (x + 5) > 0:
    print('Do Something')
else:
    print('Do Something else')

raises TypeError at the evaluation of (x + 5) > 0 because its truth value cannot be determined. The solution to this issue is given as

x = Symbol('x', positive=True)
if (x + 5) > 0:
    print('Do Something')
else:
    print('Do Something else')

Setting x to be positive via Symbol('x', positive=True) is correct, but even in this case, evaluating an inequality may still raise a TypeError (for example, if (x - 5) > 0). The better way to do this is to use (x + 5).is_positive. This would require a bit more discussion, especially since SymPy uses a three-valued logic for assumptions, but I do consider "if <symbolic inequality>" to be a SymPy antipattern.
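
To illustrate the three-valued logic (is_positive can be True, False, or None):

>>> from sympy import Symbol
>>> x = Symbol('x', positive=True)
>>> (x + 5).is_positive
True
>>> print((x - 5).is_positive)  # unknown: x may or may not exceed 5
None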

I like Saha's approach in this chapter of first showing unevaluated forms (Limit, Derivative, Integral), and then evaluating them with doit(). This puts users in the mindset of a mathematical expression being a formula which may or may not later be "calculated". The opposite approach, using the function forms, limit, diff, and integrate, which evaluate if they can and return an unevaluated object if they can't, can be confusing to new users in my experience. A common new SymPy user question is (some form of) "how do I evaluate an expression?" (the answer is doit()). Saha's approach avoids this question by showing doit() from the outset.
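
For instance, the unevaluated/evaluated pair looks like this:

>>> from sympy import Derivative, Symbol
>>> x = Symbol('x')
>>> Derivative(x**2, x)
Derivative(x**2, x)
>>> Derivative(x**2, x).doit()
2*x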

I also like that this chapter explains the gotcha of math.sin(Symbol('x')), although I personally would have included this earlier in the text.

(Side note: now that I look, these are both areas in which the official tutorial could be improved).

Summary

This book is a good introduction to doing math with Python, and, for the chapters that use it, a good basic introduction to SymPy. I would recommend it to anyone wishing to learn SymPy, but especially to anyone whose knowledge of mathematics may preclude them from getting the most out of the official SymPy tutorial.

I imagine this book would work well as a pedagogical tool, either for math teachers or for self-learners. The exercises in this book should push the motivated to learn more.

I have a few minor gripes, but no major issues.

You can purchase this book from the No Starch Press website, both as a print book or an ebook. The website also includes a sample chapter (chapter 1), code samples from the book, and exercise solutions.

October 21, 2015

The excitement

People travelling from all over the country (and outside!) to Bangalore for a conference on a weekend, yay!
We were really excited about the workshop and devsprint that the SymPy team was about to deliver, and even more excited that we would finally be meeting one another.

Day 0

DevSprint

The first day of the conference kicked off with the devsprints. That morning the whole team met up; present were Harsh, Sudhanshu, AMiT, Sartaj, Shivam and Sumith. Abinash couldn't make it, but he was there in spirit :)
We all got our awesome SymPy tees and stickers, thanks to AMiT.
Having been allotted a mentoring space in the devsprint, Sumith gave a basic introduction to SymPy. Some other interesting mentoring spaces were CPython by Kushal Das and Data Science by Bargava; the whole list is here.
We got the participants started with setting up the SymPy development workflow, and then they started working on the internals. We allotted bugs to many of them and directed them towards the solutions. Sadly, not many issues could be allotted or closed due to the really poor internet connection at the conference hall, but it was cool interacting with the enthusiasts. We also happened to meet Saurabh Jha, a contributor to SymPy who had worked on linear algebra, and he helped us out with the devsprint.

Workshop

The workshop ran in a two-and-a-half-hour slot, conducted by Harsh, Sudhanshu, AMiT and Sumith.
Sumith started off with an introduction to SymPy. Then we spent some time helping everyone set up their systems with SymPy and IPython notebooks; even though prior instructions had been given, we had to do this to get everyone on level ground.

Harsh took the first half of the content and exercises, and Sudhanshu took the second half, while AMiT and Sumith helped the participants with their queries.


We distributed t-shirts to all the participants at the end. Thanks to all those who participated; we had an awesome time.


Day 0 ended with all of us wrapping up the devsprint.
After having dinner together, everybody headed back, looking forward to the coming two days of the conference.

Day 1

Day 1 started off with a keynote by Dr Ajith Kumar B.P., followed by multiple talks and lightning talks.
More interesting than the scheduled talks were the conversations we had with people at the conference. Exchanging views and discussing common points of interest was surely one of the best experiences I had.

Lightning talk

Shivam delivered a lightning talk titled Python can be fast, in which he stressed that choosing the correct data structures is important and that Python is not always to blame. He gave relevant examples from his summer's work on SymPy.


By this point, we had reached a considerable audience at the conference, and a lot of them were really interested in SymPy. We had many younger participants who were enthusiastic about SymPy because it participates in GSoC; some of them also sent in patches.

Day 2

Day 2 started off with a keynote by Nicholas H. Tollervey.

Talk

Sumith delivered a talk titled SymEngine: The future fast core of computer algebra systems. The content covered SymPy, SymEngine and the interface between them, and some light was shed on Python wrappers to C++ code. Thanks to all the audience present there.


As the day was closing in, Harsh and Shivam had to leave to catch their flights.

Open Space

After multiple people requested help getting started with SymPy, we decided to conduct an open space.
Open spaces are a way for people to come together to talk about topics, ideas or whatever they want; all people have to do is show up :) Present were Sudhanshu, Sartaj, AMiT and Sumith. Sartaj luckily came up with a solveset bug, so we had a live demonstration of how bug-fixing is done: filing an issue, fixing the code, writing tests and sending in a PR.


Closing thoughts

Conferences are the perfect place to discuss and share knowledge and ideas. The people there were experts in their areas of interest, and conversations with them were a cool experience. Meeting the team was something we had been looking forward to right from the start.


Missing Sartaj and Abinash


Discussing SymPy, and the gossip, in person is a different experience altogether. I'll make sure to attend every conference I possibly can from here on.

Thanks for reading.
Be back for more!

October 05, 2015

Last Friday was my last day working at Continuum Analytics. I enjoyed my time at the company, and wish success to it, but the time has come for me to move on. Starting later this year, I will start working with Anthony Scopatz at his new lab ERGS at the University of South Carolina.

During my time at Continuum (over two years if you count a summer internship), I primarily worked on the Anaconda distribution and its open source package manager, conda. I learned a lot of lessons in that time, and I'd like to share some of them here.

In no particular order:

  • Left to their own devices, people will make the minimal possible solution to packaging. They won't try to architect something. The result will be under-engineered, specific to their use-case, and lack reproducibility.

  • The best way to ensure that some software has no bugs is for it to have many users.

  • Be wary of the "software would be great if it weren't for all the users" mentality (cf. the previous point).

  • Most people don't code defensively. If you are working on a project that requires extreme stability, be cautious of contributions from those outside the development team.

  • Hostility towards Windows and Windows users doesn't help anyone.

  • https://twitter.com/asmeurer/status/593170976981913600

  • For a software updater, stability is the number one priority. If the updater breaks, how can a fix be deployed?

  • Even if you configure your program to update itself every time it runs you will still get bug reports with arbitrarily old versions.

  • Separating components into separate git repositories leads to a philosophical separation of concerns among the components.

  • Everyone who isn't an active developer on the project will ignore this separation and open issues in the wrong repo.

  • Avoid object oriented programming when procedural programming will do just fine.1

  • Open source is more about the open than the source. Develop things in the open, and you will create a community that respects you.1

  • Academics (often) don't know good software practices, nor good licensing practices.

  • Neither do some large corporations.

  • Avoid over-engineering things.

  • Far fewer people than I would have thought understand the difference between hard links and soft links.2

  • Changelogs are useful.

  • Semantic versioning is over-hyped.

  • If you make something and release it, the first version should be 1.0 (not 0.1 or 0.0.1).

  • Getting a difficult package to compile is like hacking a computer. All it takes is time.

  • It doesn't matter how open source friendly your business is, there will always be people who will be skeptical and point their fingers at the smallest proprietary components, fear monger, and overgeneralize unrelated issues into FUD. These people should generally be ignored.

  • Don't feed the trolls.1

  • People constantly misspell the name of Apple's desktop operating system.

  • People always assume you have way more automation than you really do.

  • The Python standard library is not a Zen garden. Some parts of it are completely broken, and if you need to rely on them, you'll have to rewrite them. shutil.rmtree on Windows is one example of this.

  • Linux is strictly backwards compatible. Windows is strictly forwards compatible. 3

  • On Linux, things tend to be very simple. On Windows, things tend to be very complicated.

  • I can't decide about OS X. It lies somewhere in between.

  • Nobody uses 32-bit Linux. Why do we even support that?

  • People oversimplify the problem of solving for package dependencies in their heads. No one realizes that it's meaningless to say something like "the dependencies of NumPy" (every build of every version of NumPy has its own set of dependencies, which may or may not be the same).

  • Writing a set of rules and a solver to solve against those rules is relatively easy. Writing heuristics to tell users why those rules are unsolvable when they are is hard.

  • SAT solvers solve NP-complete problems in general, but they can be very fast to solve common case problems. 1

  • Some of the smartest people I know, who otherwise make very rational and intelligent decisions, refuse to update to Python 3.

  • As an introvert, the option of working from home is great for maintaining sanity.

  • Aeron chairs are awesome.

  • If living in Austin doesn't turn you into a foodie you will at least gain a respect for them.

  • Twitter, if used correctly, is a great way to interact with your users.

  • Twitter is also a great place to learn new things. Follow John Cook and Bret Victor.

  • One of the best ways to make heavily shared content is to make it about git (at least if you're an expert).

  • A good optimization algorithm avoids getting caught in local maxima by trying different parts of the search space that initially appear to be worse. The same approach should be taken in life.

Footnotes


  1. These are things that I already knew, but were reiterated. 

  2. If you are one of those people, I have a small presentation that explains the difference here 

  3. These terms can be confusing, and I admit I got this backwards the first time I wrote this. According to Wikipedia, forwards compatible means a system can accept input intended for a later version of itself and backwards compatible means a system can accept input intended for an earlier version of itself.

    What I specifically mean here is that in terms of building packages for Linux or Windows, for Linux, you should build a package on the oldest version that you wish to support. That package will work on newer versions of Linux, but not anything older (generally due to the version of libc you are linked against).

    On the other hand, on Windows, you can compile things on the newest version (I used Windows 8 on my main Windows build VM), and it will work on older versions of Windows like XP (as long as you ship the right runtime DLLs). This is also somewhat confusing because Windows tends to be both forwards compatible and backwards compatible. 

August 27, 2015

Hi! I am Amit Kumar (@aktech), a final year undergraduate student of Mathematics & Computing at Delhi Technological University. This post summarizes my experience working on GSoC Project on Improving Solvers in SymPy.

Introduction

I first stumbled upon SymPy last year, while looking for some open source computer algebra systems to contribute to. I didn't have any open source experience at that point, so SymPy was an ideal choice for getting into the beautiful world of open source. I wasn't even proficient in Python, so at first it was a little difficult for me, but thanks to the beauty of the language itself, anyone becomes comfortable with it in no time. Soon I decided to participate in Google Summer of Code under SymPy, though at that point I hadn't yet decided which project I would like to work on in the summers.

First Contribution

I started learning the codebase and made my first contribution by fixing an EasyToFix bug in solvers.py through the PR #8647. Thanks to @smichr for helping me make my first ever open source contribution. After my first PR, I started looking for more things to work on and improve, and I started committing quite often. During this period I learnt the basics of Git, which is one of the most important tools for contributing to open source.

Project Ideas

When I got a bit comfortable with the basics of SymPy and contributing to open source in general, I decided to choose an area (module) to concentrate on. The modules I was interested in were Solvers and Integrals; I was literally amazed by the capability of a CAS to integrate and solve equations, and I decided to work on one of these over the summer. There was already some work done on the Integrals module in 2013, which was yet to be merged. I wasn't well versed with Manuel Bronstein's work on methods of integration in a Computer Algebra System, so I was a little skeptical about working on Integrals. The Solvers module attracted me due to its awesome capabilities; I found it one of the most useful features of any Computer Algebra System, so I finally decided to work on the Solvers module.

Coding

I was finally accepted to work on Solvers this summer. I had my exams during the community bonding period, so I started almost in the first week of the coding period. I made a detailed timeline of my work for the summer, but from my experience I can say that's seldom useful, since you never know what may come between you and your schedule. For instance, PR #9540 was a stumbling block for a lot of my work, and fixing it was necessary before proceeding.

Phase I (Before Mid Terms)

When the coding period commenced, I started implementing linsolve, the linear system solver which is tolerant to different input forms and can solve almost all forms of linear systems. At the start I got a lot of reviews from Jason and Harsh regarding improvement of the function. One of the most important things I learnt, which they focused on, was Test Driven Development: they suggested I write extensive tests before implementing the logic, which reduces the problems in visualizing the final implementation of the function and avoids API changes.
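To give a flavour of what that implementation handles, here is a small usage sketch (mirroring the examples that ended up in the SymPy docs; an underdetermined system comes back parametrized):

from sympy import linsolve, symbols

x, y, z = symbols('x y z')
# Two equations, three unknowns: the solution is parametrized by z.
print(linsolve([x + y + z - 1, x + 2*y + 3*z - 4], (x, y, z)))
# {(z - 2, -2*z + 3, z)}, a FiniteSet of solution tuples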

After linsolve I implemented ComplexPlane, which is basically complex sets. It is useful for representing infinite solutions in the Argand plane. While implementing this I learnt that choosing the right API is one of the most important factors while designing an important functionality. To know more about it, see my blog post here. During this period I also worked on fixing Intersections of FiniteSet with symbolic elements, which was a stumbling block.

Phase II (After Mid Terms)

After successfully passing the mid terms, I started working more on the robustness of solveset. Thanks to @hargup for pointing out the motivation for this work. The idea is to tell the user about the domain of the solution returned. The simplest motivation was the solution of the equation |x| - n; for more info see my blog post here. I also worked on various trivial and non-trivial bugs which were more or less blocking my work.

Then I started replacing solve with solveset in the codebase; the idea was to make a smooth transition between solve and solveset. While doing this Jason pointed out that I should not remove solve tests, which would make solve vulnerable to breaking, so I reverted the removal of solve tests. Later we decided to add a domain argument to solveset, which would let the user easily dictate to solveset what solutions they are interested in; thanks to @shivamvats for doing this in a PR. After the decision to add the domain argument, Harsh figured out that solveset is, as of now, vulnerable to API changes, so it's not the right time to replace solve with solveset. We decided to halt this work, and as a result I closed several of my PRs unmerged.

I also worked on implementing differential calculus methods such as is_increasing etc., which is also merged now. Meanwhile I have been working on documenting solveset, because a lot of people don't know what we are doing and why we are doing it, so it's very important to answer all those subtle questions which may come up in their mind. We decided to create a FAQ style documentation of solveset; see PR #9500. This is almost done, some polishing is needed, and it would be merged soon.
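A taste of those methods, assuming the sympy.calculus API as it stands today (the exact import location may differ between versions):

from sympy import symbols, S, Interval
from sympy.calculus.singularities import is_increasing

x = symbols('x')
# f' = 3*x**2 - 6*x + 4 has a negative discriminant, so f' > 0 everywhere.
print(is_increasing(x**3 - 3*x**2 + 4*x, S.Reals))  # True
# -x**2 is decreasing to the right of the origin.
print(is_increasing(-x**2, Interval(1, 3)))         # False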

During this period, apart from my work, there was some other work as well worth mentioning. One of them is ConditionSet by Harsh, which serves the purpose of an unevaluated solve object, and even much more than that for our future endeavours with solveset. Others are codomain and not_empty by Gaurav (@gxyd), which are also important additions to SymPy.

Advice

TODO: This will probably need a comprehensive post, which I will write soon.

Future Plans

Recently Harsh came up with the idea of a tree based solver. Since ConditionSet has now been introduced, the solving of equations can be seen as a set transformation. We can do the following things to solve equations (abstract view):

  • Apply Various Set Transformations on the given Set.
  • Define a Metric of the usability or define a notion of better solution over others.
  • Different Transformation would be the nodes of the tree.
  • Suitable searching techniques could be applied to get the best solution.

For more info see mailing list thread here.
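In code, the abstract view might look something like the following best-first search skeleton. Every name here is hypothetical, since the real design was still under discussion on the list; states stand for candidate solution sets, and the metric scores how usable a solution is (lower is better).

import heapq

def tree_solve(start, transformations, metric, is_solved, max_nodes=1000):
    # Nodes are candidate solution sets; edges are set transformations.
    heap = [(metric(start), 0, start)]
    seen, counter = set(), 1  # counter breaks ties between equal scores
    while heap and max_nodes > 0:
        score, _, state = heapq.heappop(heap)
        if is_solved(state):
            return state
        if state in seen:
            continue
        seen.add(state)
        max_nodes -= 1
        for transform in transformations:
            child = transform(state)
            heapq.heappush(heap, (metric(child), counter, child))
            counter += 1
    return None  # search budget exhausted without a usable solution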

As a part of this I worked on implementing a general decomposition function, decompogen, in PR #9831. It's almost done and will be merged soon.
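Its intended behaviour is to split f into a list of simpler functions whose composition reproduces f. A couple of examples along the lines of the PR (the import path assumes where it eventually landed in SymPy):

from sympy import sin, cos, symbols
from sympy.solvers.decompogen import decompogen

x = symbols('x')
print(decompogen(sin(cos(x)), x))              # [sin(x), cos(x)]
print(decompogen(sin(x)**2 + sin(x) + 1, x))   # [x**2 + x + 1, sin(x)]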

I plan for a long term association with SymPy, and I take full responsibility for my code. I will try to contribute as much as I can, particularly in the sets and solvers modules.

Conclusion

On a concluding note, I must say that getting the opportunity to work on SymPy this summer has been one of the best things that could happen to me. Thanks to Harsh for helping me in all my endeavours, and for being one of the best mentors I could get. I would like to thank Sean as well, who took time out of his busy schedule to attend meetings and hangouts and to do code reviews. Also thanks to Chris Smith, who is the most gentle and helpful person I have ever seen; he is one of the reasons I started contributing to SymPy. Thanks to Aaron, Ondrej, and last but not least my fellow GSoCers at SymPy: leosartaj, debugger22, sumith1896, shivamvats, abinashmeher999. Special thanks to the whole SymPy team and community for a wonderful collaboration experience. Kudos!

August 23, 2015

This week we announced the release of SymEngine on the Sage list. For that, I made some changes to the build system for versioning and for using SymEngine from other C/C++ projects.
First, SymEngineConfig.cmake outputs a set of flags, imported dependencies, etc. SymEngineConfigVersion.cmake checks that the version is compatible and that the 32/64-bitness matches between the SymEngine project and the other CMake project. When SymEngine is only built, these files are at the root level of the build directory; when installed, they are at /lib/cmake/symengine under the installation prefix. Below is an excerpt from the wiki page I wrote at https://github.com/sympy/symengine/wiki/Using-SymEngine-from-a-Cpp-project
Using SymEngine in another CMake project
To use SymEngine from another CMake project, include the following in your CMakeLists.txt file
find_package(SymEngine 0.1.0 CONFIG)
If it was installed to a non-standard location, you can give the path to the SymEngine installation directory:
find_package(SymEngine 0.1.0 CONFIG PATHS /path/to/install/dir/lib/cmake/symengine)
Alternatively, you can give the path to the build directory.
find_package(SymEngine 0.1.0 CONFIG PATHS /path/to/build/dir)
An example project would be,
cmake_minimum_required(VERSION 2.8)
find_package(SymEngine 0.1.0 CONFIG)
set(CMAKE_CXX_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE} "-std=c++0x")

include_directories(${SYMENGINE_INCLUDE_DIRS})
add_executable(example main.cpp)
target_link_libraries(example ${SYMENGINE_LIBRARIES})
More options are here
Using SymEngine in Non CMake projects
You can get the include flags and link flags needed for SymEngine using CMake on the command line.
compile_flags=`cmake --find-package -DNAME=SymEngine -DCOMPILER_ID=GNU -DLANGUAGE=CXX -DMODE=COMPILE`
link_flags=`cmake --find-package -DNAME=SymEngine -DCOMPILER_ID=GNU -DLANGUAGE=CXX -DMODE=LINK`

g++ $compile_flags main.cpp $link_flags
Python wrappers
There was a suggestion to make the Python wrappers separate, so that in a distribution like Gentoo the package sources can be distributed separately.
So, I worked on the Python wrappers to get them to build independently or with the main repo. Now the Python wrappers directory, along with the setup.py file from the root folder, can be packaged, and they work without a problem.

August 21, 2015

From not knowing anything considerable in programming and open source to reaching this level, it has been a wonderful ride. Google Summer of Code has been full of ups and downs, but nonetheless exhilarating.

I didn't even know at the time of my first patch that I would be so closely associated with SymEngine and the team members just a few months down the line.

After a couple of bug fixes, my first major contribution came in as the UnivariatePolynomial class. The biggest challenge here was implementing multiplication using Kronecker's trick; this was my first experience of implementing an algorithm from a paper. The UnivariatePolynomial class shaped up really well. There are minor improvements that can be made and some optimizations that could be done, but standalone it is a fully functional class.
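The idea behind Kronecker's trick is to pack each polynomial into one big integer by evaluating it at a large base, do a single big-integer multiplication, and read the product's coefficients back out of the digits. Below is my own toy Python version for non-negative coefficients only (SymEngine's C++ implementation differs in detail):

def kronecker_mul(p, q):
    # p, q are coefficient lists, lowest degree first.
    # The base must exceed every coefficient of the product,
    # so that the "digits" do not overlap.
    bound = max(p) * max(q) * min(len(p), len(q)) + 1
    base = 1
    while base < bound:
        base <<= 1
    pack = lambda coeffs: sum(c * base**i for i, c in enumerate(coeffs))
    n = pack(p) * pack(q)  # one big-integer multiplication
    out = []
    while n:
        n, digit = divmod(n, base)
        out.append(digit)
    return out

# (1 + 2*x) * (3 + 4*x) = 3 + 10*x + 8*x**2
print(kronecker_mul([1, 2], [3, 4]))  # [3, 10, 8]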

Once this was done, my next aim was to optimize multiplication to reach Piranha's speed. This was a very enriching period, and the discussions with the team members and Francesco were a great learning experience. En route, I also got a chance to explore Piranha under the hood and trouble Francesco for the reasoning why certain things were the way they were. At the end of this, we were able to hit Piranha's speed. I remember I was the happiest I had been in days.

Once we hit the lower level speed, we decided to hard-depend on Piranha for Polynomial. This meant adding Piranha as a SymEngine dependency. Here I had to learn how to write CMake files, and setting up Piranha testing in Travis meant writing shell and CI scripts. We faced a problem here whose resolution meant adopting Catch as a testing framework for SymEngine. Catch is an awesome library and the community is very pleasant; implementing this was fun work too. Also, the high level value class Expression was implemented in SymEngine, mostly taken from Francesco's work.

I then started writing the Polynomial class; most of the work is done here (597). But the design is not very well thought out. I say this because, once ready, it can only support the integer (ZZ) domain, whereas we will also need rational (QQ) and expression (EX) domains. The code will be of much use, but we have been discussing a much cleaner implementation with a Ring class. Most of the progress and the new design decisions are being documented here.

The second half has been really rough, with the university running. Ondrej has been really patient with me, and I thank him for that. The bond that I made with him through mails, technical and non-technical, has really grown strong. He has allowed me to continue the work on Polynomial and implement more details and algorithms in the future. I am looking forward to that, as a long term association is an amazing thing, and I am proud to be responsible for the Polynomial module in SymEngine.

I am indebted to my mentor Ondrej Certik and all the SymEngine and SymPy developers, who were ever ready to help and answer my silliest of questions. It's an amazing community; they are really very helpful and always appreciated even the smallest of my contributions. The best part of SymEngine is that you know the contributors one to one, and it is like a huge family of learners. I am looking forward to meeting the team (at least SymPy India, in the near future).

Google Summer of Code has been one exhilarating journey. I don't know if I was a good programmer then or a good programmer now but I can say that I am a better programmer now.

This is just the beginning of the ride, GSoC a stepping stone.

There will be blog posts coming here, so stay tuned. Till then,
Bye

August 20, 2015

This is the 12th week. The hard deadline is this Friday. GSoC is coming to an end, leaving behind a wonderful experience. Well, here's how my past few weeks went.

Highlights:

Work on Formal Power Series:

  • #9776 added the fps method to the Expr class. Instead of fps(sin(x)), the user can now simply do sin(x).fps().
  • #9782 implements some basic operations like addition, subtraction on FormalPowerSeries. The review is almost complete and should get merged soon.
  • #9783 added the sphinx docs for the series.formal module.
  • #9789 replaced all the solve calls in the series.formal with the new solveset function.

Work on computing limits of sequences:

This is the second part of my GSoC project aiming to implement the algorithm for computing limits of sequences as described in the poster Computing Limits Of Sequences by Manuel Kauers.

  • #9803 implemented the difference_delta function. difference_delta(a(n), n) is defined as a(n + 1) - a(n). It is the discrete analogue of differentiation (a toy version follows this list).
  • #9836 aims at completing the implementation of the algorithm. It is still under review and hopefully it will be in soon.
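To see the "discrete derivative" intuition, here is a toy stand-in (my own helper, not SymPy's implementation, which lives in sympy.series.limitseq):

from sympy import symbols, expand

n = symbols('n')

def dd(expr, var):
    # The discrete analogue of differentiation: a(n + 1) - a(n).
    return expand(expr.subs(var, var + 1) - expr)

print(dd(n**2, n))  # 2*n + 1, matching d/dn (n**2) = 2*n up to the +1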

Final Tasks:

Get #9782 and #9836 merged soon.

Upcoming:

A thank you post ;)

August 16, 2015

Hello all. Here are the most recent developments in the Polynomial wrappers.

Report

  • The Polynomial wrapper was using piranha::hash_set as its underlying container, hence when Piranha was not available as a dependency, Polynomial wouldn't compile. The fix was to use std::unordered_set when building with -DWITH_PIRANHA=no, so that at least a slow version is available.

  • Another issue was Travis testing of Polynomial. Since we depend on Piranha, we had to set up Travis testing with Piranha included and the Polynomial tests run. This was done in the merged PR 585.

  • Before we get Polynomial merged we have to add mul_poly, improve printing, and test exhaustively. mul_poly is ready here and will be merged once more tests are prepared.

For mul_poly, previously we never checked the variables corresponding to the hash_sets, which means you could only multiply an n-variable polynomial with another n-variable polynomial having the same variable symbols. When the variables of the two hash_sets are different, a workaround is needed, and this would result in a slowdown if done directly.

As suggested by Ondřej, mul_poly now calls two functions, _normalize_mul and _mul_hashset. Here _normalize_mul sees to it that the hash_sets satisfy the aforementioned criteria, and then _mul_hashset operates on them.
For example, say mul_poly is called; then _normalize_mul converts {1, 2, 3} of x, y, z and {4, 5, 6} of p, q, r to {1, 2, 3, 0, 0, 0} and {0, 0, 0, 4, 5, 6}, and _mul_hashset multiplies the two hash_sets. The speed in the benchmarks is determined by _mul_hashset.
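In Python terms, the normalization step amounts to something like this toy sketch (the names are made up; the real code operates on Piranha hash_sets in C++):

def normalize_mul(vars1, exps1, vars2, exps2):
    # Embed two exponent vectors over different variable sets into the
    # union of the variables, so a plain multiplication routine can run.
    all_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    def embed(vars_, exps):
        lookup = dict(zip(vars_, exps))
        return tuple(lookup.get(v, 0) for v in all_vars)
    return all_vars, embed(vars1, exps1), embed(vars2, exps2)

# {1, 2, 3} of x, y, z and {4, 5, 6} of p, q, r:
print(normalize_mul(['x', 'y', 'z'], (1, 2, 3), ['p', 'q', 'r'], (4, 5, 6)))
# (['x', 'y', 'z', 'p', 'q', 'r'], (1, 2, 3, 0, 0, 0), (0, 0, 0, 4, 5, 6))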

  • The printing needs improvement. As of now the polynomial 2*x + 2*y gets printed as 2*y**1*x**0 + 2*y**0*x**1.

  • Not all that was planned could be completed this summer, mostly because of my hectic schedule after the vacations ended and the institute began. As the final deadline week of GSoC is coming up, I need to ensure that at least the PRs on hold get merged. I am planning to continue working after the program ends, when the workload eases.

That's all I have
See ya

August 15, 2015

symengine-0.1.0 beta version was released this week and these two weeks were spent on making sure symengine works without a problem on Sage.

One issue was the linking of the Python libraries in Sage. In binary releases of Sage, the variable distutils.sysconfig.get_config_var('LIBDIR') is wrong: it is set to the build machine's location, and on Windows it is set to empty. Earlier, to link the Python libraries into the Python wrappers, the Python library was found using the above variable, but in cases like Sage and Windows this method fails. To fix this, CMake now also looks in `sys.prefix`/libs and `sys.prefix`/lib to find the Python libraries.

Another issue that came up was cmake generating bad link flags. When installing in Sage, it is important to make sure the libraries in Sage are linked and not the system-wide libraries. To do that, libraries were searched for in the Sage directories, ignoring the system-wide libraries. When given the full path of the libraries to link, we noticed a strange behaviour: `/path/to/sage/local/lib/libgmp.so` was changed to `-lgmp`, causing the linker to pick up the system-wide gmp library.

After reading through CMake documentation, I realized that this was due to `find_library` giving wrong locations of system libraries where there are multiple libraries of the same name for different architectures. For example if the output of `find_library(math NAMES m)` was given to find the standard math library, it may find a `libm.so` that was intended for a different architecture. Therefore when cmake realizes that the library being linked to is a system library then the full path is converted to `-lm` to delegate the task to the linker to find the correct library. 

This behaviour is useful for some scenarios, but in our case, this was not the behaviour I needed. Fortunately there was a workaround for this mentioned in the documentation. Using IMPORTED target feature in CMake, I was able to get CMake to use the full path of the library.

R7 and R8 benchmarks from symbench benchmarks of sage were added to benchmark SymEngine-C++ against GiNaC and also SymEngine-Python against SymPy and Sage.

August 10, 2015

Hi there! It's been 11 weeks into GSoC and we have reached the last week before the soft deadline. Here is the progress so far.

  Progress of Week 10 & 11

The last couple of weeks, I worked mainly on the documentation of the solveset module. It's very important to let others know what we are doing and why we are doing it, so PR #9500 is an effort to accomplish that. Here are some of the important questions I have tried to answer in the PR:

  • What was the need of a new solvers module?
  • Why do we use sets as an output type?
  • What is this domain argument about?
  • What will you do with the old solve?
  • What are the general design principles behind the development of solveset?
  • What are the general methods employed by solveset to solve an equation?
  • How do we manipulate and return infinite solutions?
  • How does solveset ensure that it is not returning any wrong solution?

There is still some polishing required in this as suggested by @hargup

Linsolve Docs

I completed the documentation PR for linsolve. See PR #9587

Differential Calculus Methods

I have also started working on the differential calculus methods as mentioned in my proposal here. See diff-cal branch.

from future import plan Week #12:

This week I plan to finish up all the pending work and wrap up the project and get PR #9500 Merged.

$ git log

  PR #9500 : Documenting solveset

  PR #9587 : Add Linsolve Docs

That's all for now, looking forward for week #12. :grinning:

July 29, 2015

It's been a long time since my last post. The holidays are now over and my classes have started. The last few days have been hectic for me. Here are the highlights of my last two weeks with SymPy.

Highlights:

My implementation of the algorithm to compute formal power series is finally done. As a result, #9639 finally got merged. Thanks Jim and Sean for all the help. As #9639 brought in all the necessary changes, #9572 was closed.

In the SymPy master,

>>> fps(sin(x), x)
x - x**3/6 + x**5/120 + O(x**6)
>>> fps(1/(1-x), x)
1 + x + x**2 + x**3 + x**4 + x**5 + O(x**6)

On a side note, I was invited for Push access by Aaron. Thanks Aaron. :)

Tasks:

  • Improve test coverage of series.formal.
  • Start working on operations on Formal Power Series.

July 27, 2015

Hi there! It's been nine weeks into GSoC. Here is the progress for this week.

  Progress of Week 9

This week I worked on replacing solve with solveset or linsolve in the codebase. The modules I have covered so far are listed in the git log below.

@moorepants pointed out that I should not change the old solve tests, since people may break untested code; this argument is valid, so I have added equivalent tests for solveset where it is on par with solve.

There is some untested code in the codebase as well where solve is used. For those cases the replacement has not been done, as the tests would pass anyway, since those lines are not tested. So I have added a TODO for those instances, to replace solve with solveset once those lines are tested.

Other Work

I also changed the output of linsolve when there are no solutions: earlier it threw a ValueError, and now it returns an EmptySet(), which is consistent with the rest of solveset. See PR #9726.
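Concretely, assuming the behaviour described above, an inconsistent system now comes back empty instead of raising:

from sympy import linsolve, symbols

x, y = symbols('x y')
# x + y cannot equal both 1 and 2, so there is no solution.
print(linsolve([x + y - 1, x + y - 2], (x, y)))  # EmptySet()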

from future import plan Week #10:

This week I plan to merge my pending PRs on replacing old solve in the codebase with solveset, and to work on documentation and the lambertw solver.

$ git log

  PR #9726 : Return EmptySet() if there are no solution to linear system

  PR #9724 : Replace solve with solveset in core

  PR #9717 : Replace solve with solveset in sympy.calculus

  PR #9716 : Use solveset instead of solve in sympy.sets

  PR #9717 : Replace solve with solveset in sympy.series

  PR #9710 : Replace solve with solveset in sympy.stats

  PR #9708 : Use solveset instead of solve in sympy.geometry

  PR #9587 : Add Linsolve Docs

  PR #9500 : Documenting solveset

That's all for now, looking forward for week #10. :grinning:

July 25, 2015

These two weeks Ondrej and I started adding support for different compilers.

I added support for MinGW and MinGW-w64. There were some documented, but not yet fixed, bugs in MinGW that I encountered. When including cmath, there were errors saying `_hypot` was not defined and `off64_t` was not defined. I added the flags `-D_hypot=hypot -Doff64_t=_off64_t` to fix this temporarily. With that, symengine was successfully built.

For the Python wrappers on Windows, after building there was a wrapper-not-found error, which was the result of the extension not being named .pyd on Windows. Another problem was that the Python distribution's `libpython27.a` for x64 was compiled for a 32-bit architecture, so there were linking errors. I found some patched files at http://www.lfd.uci.edu/~gohlke/pythonlibs/#libpython and the Python wrappers were built successfully. I also added continuous integration for MinGW using AppVeyor.

With MinGW, to install gmp all you had to do was run the command `mingw-get install mingw32-gmp`. For MinGW-w64, I had to compile gmp myself. For this, AppVeyor came in handy: I started a build in AppVeyor, stopped it, and then logged into the AppVeyor machine remotely using `remmina` (each VM is shut down after 40 minutes; within that 40 minutes you can log in and debug the build). I compiled gmp using msys and mingw-w64 and then downloaded the binaries to my machine. For AppVeyor runs, these pre-compiled binaries of gmp were used to test MinGW-w64.

Ondrej and I worked together to make sure SymEngine could be built using MSVC in Debug mode. Since gmp couldn't be used out of the box in MSVC, we used MPIR project's sources which included visual studio project files. MPIR is a fork of GMP and provides MSVC support. We used it to build SymEngine in MSVC. Later I added support for Release mode and also added continuous integration for both build types and platform types.

Python extension can also be built with MSVC. We are testing the Python extensions in Release mode only right now, because appveyor has only python release mode libraries and therefore when building the extension in Debug mode it gives an error saying python27_d.lib is not found.

I also improved the wrappers for Matrix by adding `__getitem__` and `__setitem__` so that the matrices can be used easily in Python.

Another improvement to SymEngine was the automatic simplification of expressions like `0.0*x` and `x**0.0`. These expressions are currently not simplified in master, so I'm proposing a patch to simplify them to `0.0` and `1.0` respectively.

July 24, 2015

Hello all. Last week has been rough, here's what I could do.

Report

The printing now works, hence I could test things; due to that, we could even test both the constructors, one from hash_set and the other from Basic.

The Polynomial wrappers PR needs to get in quick; it's our highest priority.

We need to make the methods more robust; we plan to get it in this weekend.
Once this is in, Shivam can start writing function expansions.

I also have a couple of other tasks:

  • Use std::unordered_set so that we can have something even when there is no Piranha as a dependency.
  • Replace mpz_class with piranha::integer throughout SymEngine and check the benchmarks.

I intend to get Polynomial in this weekend because I get free on weekends :)
As there are only 3-4 weeks remaining, I need to buck up.

That's all I have
Bidāẏa

July 20, 2015

Hi there! It's been eight weeks into GSoC. Here is the progress for this week.

  Progress of Week 8

This week, my PR for making invert_real more robust was merged, along with these:

  • PR #9628 : Make invert_real more robust

  • PR #9668 : Support solving for Dummy symbols in linsolve

  • PR #9666 : Equate S.Complexes with ComplexPlane(S.Reals*S.Reals)

Note: We renamed S.Complex to S.Complexes, which is analogous to S.Reals, as suggested by @jksuom.

I also opened PR #9671 for simplifying the ComplexPlane output when a ProductSet of FiniteSets is given as input, e.g. ComplexPlane(FiniteSet(x)*FiniteSet(y)). It was earlier simplified to:

ComplexPlane(Lambda((x, y), x + I*y), {x} x {y})

It isn't very useful to represent a point or a discrete set of points in the ComplexPlane with an expression like the above. So in the above PR it is now simplified to a FiniteSet of discrete points in the ComplexPlane:

In [3]: ComplexPlane(FiniteSet(a, b, c)*FiniteSet(x, y, z))
Out[3]: {a + I*x, a + I*y, a + I*z, b + I*x, b + I*y, b + I*z, c + I*x, c + I*y, c + I*z}

It's awaiting Merge, as of now.

Now, I have started replacing solve with solveset and linsolve.

from future import plan Week #9:

This week I plan to merge my pending PRs and work on replacing old solve in the codebase with solveset.

$ git log

  PR #9710 : Replace solve with solveset in sympy.stats

  PR #9708 : Use solveset instead of solve in sympy.geometry

  PR #9671 : Simplify ComplexPlane({x}*{y}) to FiniteSet(x + I*y)

  PR #9668 : Support solving for Dummy symbols in linsolve

  PR #9666 : Equate S.Complexes with ComplexPlane(S.Reals*S.Reals)

  PR #9628 : Make invert_real more robust

  PR #9587 : Add Linsolve Docs

  PR #9500 : Documenting solveset

That's all for now, looking forward for week #9. :grinning:

July 17, 2015

Hello. Short time since my last post. Here's my report since then.

Progress

I have continued my work on the Polynomial wrappers.

Constructors from hash_set and Basic have been developed and pushed up. Printing has also been pushed. I'm currently writing tests for both; they'll be ready soon.

When hash_set_eq() and hash_set_compare() were developed, we realised that there were many functions of the *_eq() and *_compare() form with repeated logic; the idea was to templatize them, which Shivam did in his PR #533.

A solution to the worry of slow compilation was chalked out, which I wish to try in the coming week: use a std::unique_ptr to a hash_set instead of a straight hash_set, so that it is not necessary to know the full definition of hash_set in the header. I've been reading the relevant material; this is known as the PIMPL idiom.

Report

WIP
* #511 - Polynomial Wrapper

Targets for Week 9

I wish to develop the Polynomial wrappers further in the following order.

  • Constructors and basic methods, add, mul, etc, working with proper tests.
  • Solve the problem of slow compilation times.
  • As mentioned previously, use standard library alternates to Piranha constructs so that we can have something even when there is no Piranha as a dependency.

After the institute began, times have been rough. Hoping everything falls into place.

Oh, by the way, SymPy will be present (and represented heavily) at PyCon India 2015. We sent in the content and the final proposal for review last week. Have a look at our proposal on the website here.

That's all this week.
sayōnara

July 14, 2015

Hi there! It's been seven weeks into GSoC and the second half has started now. Here is the progress so far.

  Progress of Week 7

This week I opened #9628, which is basically an attempt to make solveset more robust, as I mentioned in my last post. The idea is to tell the user about the domain of the solution returned.

Now, it makes sure that n is positive in the following example:

In [3]: x = Symbol('x', real=True)

In [4]: n = Symbol('n', real=True)

In [7]: solveset(Abs(x) - n, x)
Out[7]: Intersection([0, oo), {n}) U Intersection((-oo, 0], {-n})

Otherwise it will return an EmptySet()

In [6]: solveset(Abs(x) - n, x).subs(n, -1)
Out[6]: EmptySet()
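And substituting a positive value collapses the intersections to the expected pair (again assuming the PR's behaviour):

In [8]: solveset(Abs(x) - n, x).subs(n, 1)
Out[8]: {-1, 1}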

Earlier:

In [12]: solveset(Abs(x) - n, x)
Out[12]: {-n, n}

So, for this to happen, we needed to make changes in the invert_real:

Earlier, the Abs case discarded known-negative values and returned both preimages unconditionally:

if isinstance(f, Abs):
    g_ys = g_ys - FiniteSet(*[g_y for g_y in g_ys if g_y.is_negative])
    return _invert_real(f.args[0],
        Union(g_ys, imageset(Lambda(n, -n), g_ys)), symbol)

Now each preimage is intersected with the interval on which it is valid:

if isinstance(f, Abs):
    return _invert_real(f.args[0],
        Union(imageset(Lambda(n, n), g_ys).intersect(Interval(0, oo)),
              imageset(Lambda(n, -n), g_ys).intersect(Interval(-oo, 0))),
        symbol)

So, we applied set operations on the invert to make it return non-EmptySet only when there is a solution.

Now for more complex cases:

For the following case:

In [14]: invert_real(2**x, 2 - a, x)
Out[14]: (x, {log(-a + 2)/log(2)})

For the invert to be real, we must state that a belongs to the interval (-oo, 2]; otherwise the invert would be complex. But no set operation on {log(-a + 2)/log(2)} can constrain a to lie in (-oo, 2].

It does, though, return an EmptySet() on substituting absurd values:

In [23]: solveset(2**x + a - 2, x).subs(a, 3)
Out[23]: EmptySet()

So, we need not make any changes to the Pow handling in invert_real. It's almost done now, except for a couple of TODOs:

  • Document new changes
  • Add More tests

Though, I will wait for final thumbs up from @hargup, regarding this.

from future import plan Week #8:

This week I plan to complete PR #9628, get it merged, and start working on replacing old solve in the codebase with solveset.

$ git log

Below is the list of other PR's I worked on:

  PR #9671 : Simplify ComplexPlane({x}*{y}) to FiniteSet(x + I*y)

  PR #9668 : Support solving for Dummy symbols in linsolve

  PR #9666 : Equate S.Complexes with ComplexPlane(S.Reals*S.Reals)

  PR #9628 : [WIP] Make invert_real more robust

  PR #9587 : Add Linsolve Docs

  PR #9500 : Documenting solveset

That's all for now, looking forward for week #8. :grinning:

Highlights:

  • I opened #9639, bringing in the rest of the algorithm for computing Formal Power Series. There are still some unimplemented features; I hope to complete them this week.

  • A few of my PRs got merged this week (#9622, #9615 and #9599). Thanks @jcrist, @aktech and @pbrady.

  • Opened #9643 for adding the docs related to Fourier Series.

Tasks:

  • Polish #9572 and get it ready to be merged.

  • Complete #9639.

  • Get docs of Fourier Series merged.

That's it. See you all next week. Happy Coding!

July 13, 2015

This week I worked on the Sage wrappers and the Python wrappers. To make it easier to try out symengine, I made changes to the Sage wrappers such that if Sage does not have the symengine_conversions methods (i.e. Sage is not updated to the symengine branch), then conversions are done via Python strings. For example, an integer is converted to a Python string and then to a Sage integer. This is slow, but makes it easier to install symengine. You can try it out by downloading cmake-3.2.3.spkg and symengine-0.1.spkg and installing them. (Link to download is .....) To install, type

sage -i /path/to/cmake-3.2.3.spkg

sage -i /path/to/symengine-0.1.spkg

The Python wrappers included only a small number of functions from SymEngine. Wrappers were added for functions like log, the trigonometric functions, the hyperbolic functions and their inverses.

CMake package for Sage is now ready for review, http://trac.sagemath.org/ticket/18078.

SymEngine package for Sage can be found here, https://github.com/isuruf/sage/tree/symengine. A PR would be sent as soon as CMake ticket is positively reviewed.

Next week, testing with Sage, Python docstrings, and the SymEngine package for Sage are the main things that I have planned for now. A PyNumber class to handle Python numbers will be started as well.

Hello. Sorry for the really late post. As I was moving back from home to Mumbai and was also part of the grading team of the International Physics Olympiad (IPhO), I could not contribute as much as I had thought I could. Here is what I have for this week.

Progress

The Expression class was built upon the initial work of Francesco. I made a SymEngine patch with his work as the initial commit. We now have a top-level value class.

The slowdowns finally got tackled. It was Piranha that needed amendment. The slowdown, as discussed previously, was due to the class thread_pool. This was resolved by templatizing thread_pool, i.e. replacing class thread_pool: private detail::thread_pool_base<> with template <typename = void> class thread_pool_: private detail::thread_pool_base<>. This fixed the slowdown when individual headers were included. Including the single piranha.hpp still had the problem: piranha.hpp includes settings.hpp, which in turn defines a non-template function called set_n_threads() that internally invokes the thread pool. This was resolved by a similar fix, turning the settings class into template <typename = void> class settings_.

Many things had been reported up to now, hence Ondřej suggested documenting all the decisions taken. The wiki page En route to Polynomial was hence made.

Report

WIP
* #511 - Polynomial Wrapper

Merged
* #512 - Add Francesco to AUTHORS
* #500 - Expression wrapper.

Documentation
En route to Polynomial

Targets for Week 8

Get the Polynomial wrapper merged.

Points to be noted:
* Use standard library alternates to Piranha constructs so that we can have something even when there is no Piranha as dependency.
* Basic class in, so that Shivam can start some work in SymEngine.

I am thankful to Ondřej and the SymEngine team for bearing with my delays. I hope I can compensate in the coming week.

That's all this week.
Adéu

July 07, 2015

Midterm evaluations are complete. I've got to say, Google was fairly quick in mailing the results. Just a few minutes after the deadline, I received a mail telling me I had passed. Yay!

Here's my report for week 6.

Highlights:

1. Formal Power Series:

For most of the week I worked towards improving the implementation of the second part of the algorithm. I was able to increase the range of admissible functions. For this I had to write a custom solver for solving REs of hypergeometric type. It's a lot faster and better at solving the specific type of REs this algorithm generates, in comparison to just using rsolve for all the cases. However, it still has some issues. It's currently in the testing phase and will probably be PR-ready by the end of this week.

The code can be found here.

While working on it, I also added some more features to FormalPowerSeries (#9572).

Some working examples. (All the examples were run in isympy)

In [1]: fps(sin(x), x)
Out[1]: x - x**3/6 + x**5/120 + O(x**6)
In [2]: fps(cos(x), x)
Out[2]: 1 - x**2/2 + x**4/24 + O(x**6)
In [3]: fps(exp(acosh(x)), x)
Out[3]: I + x - I*x**2/2 - I*x**4/8 + O(x**6)

2. rsolve:

During testing, I found that rsolve raises exceptions while trying to solve REs like (k + 1)*g(k) and (k + 1)*g(k) + (k + 3)*g(k+1) + (k + 5)*g(k+2), rather than simply returning None, which it generally does in case it is unable to solve a particular RE. The first and second REs arise from the functions 1/x and (x**2 + x + 1)/x**3 respectively, which can often come up in practice. To solve this I opened #9615. It is still under review.

3. Fourier Series:

#9523 introduced SeriesBase class and FourierSeries. Both FormalPowerSeries and FourierSeries are based on SeriesBase. Thanks @flacjacket and @jcrist for reviewing and merging this.

In [1]: f = Piecewise((0, x <= 0), (1, True))
In [2]: fourier_series(f, (x, -pi, pi))
Out[2]: 2*sin(x)/pi + 2*sin(3*x)/(3*pi) + 1/2 + ...

4. Sequences:

While playing around with sequences, I realized periodic sequences can be made more powerful. They can now be used for periodic formulas (#9613).

In [1]: sequence((k, k**2, k**3))
Out[1]: [0, 1, 8, 3, ...]

The formulas are applied cyclically: k at n = 0, k**2 at n = 1, k**3 at n = 2, then k again at n = 3, and so on.

5. Others:

Well, I got tired of FormalPowerSeries (I am just a human), so I took a little detour from my regular project work and opened #9622 and #9626. The first one deals with inconsistent diff of Polys, while the second adds more assumption handlers, like is_positive, to Min/Max.

Tasks Week-7:

  • Test and polish hyper2 branch. Complete the algorithm.
  • Add sphinx docs for FourierSeries.
  • Start thinking on the operations that can be performed on FormalPowerSeries.

That's it. See you all next week. Happy Coding!

July 05, 2015

Hi there! It's been six weeks into GSoC, which marks the halfway point. The mid-term evaluations have been done now, and Google has been pretty quick doing this; I received the passing mail within 15 minutes of the deadline to fill up evaluations. So basically the GSoC admin did the following (I guess):

SELECT * FROM GSoCStudents
WHERE EvaluationResult='PASS';
and SendThemMail

(Don't Judge my SQL, I am not good at it!)

  Progress of Week 6

Last week my linsolve PR #9438 finally got merged. Thanks to @hargup, @moorepants, @flacjacket and @debugger22 for reviewing it and suggesting constructive changes.

This week I worked on Intersections of FiniteSet with symbolic elements, which was a blocking issue for a lot of things. I managed to fix the failing test which I mentioned in my last post, and eventually this PR got merged as well, which has opened doors for a lot of improvements.

Thanks to @jksuom & @hargup for iterating over this PR, and making some very useful comments for improving it to make it Mergeable.

I had a couple of hangout meetings with @hargup this week (though now he has left for SciPy for a couple of weeks). We discussed the further plan for making solveset more robust, such as returning the domain of the invert when calling invert_real; see #9617.

Motivation for this:

In [8]: x = Symbol('x', real=True)

In [9]: n = Symbol('n', real=True)

In [12]: solveset(Abs(x) - n, x)
Out[12]: {-n, n}

The solution returned above is not actually complete unless we somehow state that n should be positive for the output set to be non-empty. See #9588

from future import plan Week #7:

This week I plan to work on making invert_real more robust.

Relevant issue:

$ git log

  PR #9618 : Add test for solveset(x**2 + a, x) issue 9557

  PR #9587 : Add Linsolve Docs

  PR #9500 : Documenting solveset

  PR #9612 : solveset return solution for solveset(True, ..)

  PR #9540 : Intersection's of FiniteSet with symbolic elements

  PR #9438 : Linsolve

  PR #9463 : ComplexPlane

  PR #9527 : Printing of ProductSets

  PR #9524 : Fix solveset returned solution making denom zero

That's all for now, looking forward for week #7. :grinning:

July 04, 2015

This week, I worked on improving the testing and making the Sage wrappers. First, building with Clang had several issues and was not tested. One issue was a clang bug when the `-ffast-math` optimization is used. This flag makes floating point arithmetic perform better, but it may do arithmetic not allowed by the IEEE floating point standard. Since it performs faster, we have enabled it in Release mode; due to a bug in clang, a compiler error is given saying error: unknown type name '__extern_always_inline'. This was fixed by first checking in cmake whether the error is there, and then adding the flag -D__extern_always_inline=inline. Another issue was that the type_traits header was not found. This was fixed by upgrading the standard C++ library, libstdc++.

This week, I finished the wrappers for Sage. Now converters to and from Sage can be found at sage.symbolic.symengine. For this module to convert using the C++ level members, symengine_wrapper.pyx's class definitions were taken out and declared in symengine_wrapper.pxd and implemented in the pyx file. To install symengine in Sage, https://github.com/sympy/symengine/issues/474 has to be resolved. A cmake check will be added to find whether this issue exists, and if so, the flag -Wa,-q will be added to the list of flags. We have to make a release of symengine if we are to make spkgs to install symengine in Sage, so some of my time next week will go into getting symengine ready for a release and then making spkgs for everyone to try out symengine.

July 03, 2015

Hello, I received a mail a few minutes into typing this: passed the midterm review successfully :)
It just left me wondering how these guys process so many evaluations so quickly. I do have to confirm with Ondřej about this.
Anyway, the project goes on, and here is this week's summary.

Progress

SymEngine successfully moved to using Catch as a testing framework.

The Travis builds for clang were breaking; this led me to play around with Travis and clang builds to fix the issue. The Linux clang build used to break because we used to mix up and link libraries like GMP compiled with different standard libraries.
Thanks to Isuru for lending a helping hand and fixing it in his PR.

The next task was to make SYMENGINE_ASSERT not use the standard assert(); hence I wrote a custom assert which simulates the internal one.
Now we could add -DNDEBUG as a release flag when Piranha is a dependency; this was also done.

Started work on the Expression wrapper; a PR that starts off from Francesco's work was sent in.

Investigated the slowdown in benchmarks that I have been reporting in the last couple of posts. Using git bisect (amazing tool, good to see binary search in action!), the first bad commit was tracked down. We realized that the inclusion of the piranha.hpp header caused the slowdown; it was resolved by using mp_integer.hpp, just the required header.
With the immense help of Francesco, the problem was cornered to this:
* Inclusion of thread_pool leads to the slowdown, specifically a global variable that it declares.
* In general, a multi-threaded application may cause some compiler optimizations to be turned off, hence the slowdown.
* Since this benchmark is memory allocation intensive, another speculation is that the compiler allocates memory differently.

This SO question asked by @bluescarni should lead to very interesting developments.

We have to investigate this problem and get it sorted, not only because we depend on Piranha, but also because we might have multi-threading in SymEngine later too.

Report

No benchmarking was done this week.
Here is my PR reports.

WIP
* #500 - Expression Wrapper

Merged
* #493 - The PR with Catch got merged.
* #498 - Made SYMENGINE_ASSERT use custom assert instead of assert() and DNDEBUG as a release flag with PIRANHA.
* #502 - Make poly_mul use mpz_addmul (FMA), nice speedup of expand2b.
* #496 - En route to fixing SYMENGINE_ASSERT, led to a minor fix in one of the assert statements.
* #491 - Minor fix in compiler choice documentation.

Targets for Week 7

  • Get the Expression class merged.
  • Investigate and fix the slow-downs.

The rest of tasks can be finalized in later discussion with Ondřej.

That's all this week.
Ciao

