As regular readers will know, I often share one-file Python scripts here. Recently I wrote table.py, a simple module for creating reStructuredText tables in Python. It is available from my code section, or you can go to the direct download. Save it as table.py somewhere you can find it!
As always, constructive comments, suggestions and improvements are very welcome.
The way it works is that you put the data you want into a matrix-like structure, i.e. a list or tuple containing a number of lists or tuples. Each inner list/tuple forms a table row, and each item inside it forms a cell.
So as an example, say we want to make a table showing the expansion of the European Union. We start with a matrix (i.e. tuple of tuples):
>>> eu_enlargement = (("Name", "Year", "Members"), ("ECSC", "1958", "6"),
... ("EC", "1973", "9"), ("EC", "1981", "10"), ("EC", "1986", "12"),
... ("EU", "1995", "15"), ("EU", "2004", "25"), ("EU", "2007", "27"))
We import the Table class from the table.py module:
>>> from table import Table
We make an instance of Table, giving the matrix as the argument:
>>> eu_table = Table(data = eu_enlargement)
Or to be less verbose:
>>> eu_table = Table(eu_enlargement)
Lastly, we can print the data as a reStructuredText table:
>>> print eu_table.create_table()
Which outputs the following table:
+------+------+---------+
| Name | Year | Members |
+======+======+=========+
| ECSC | 1958 | 6       |
+------+------+---------+
| EC   | 1973 | 9       |
+------+------+---------+
| EC   | 1981 | 10      |
+------+------+---------+
| EC   | 1986 | 12      |
+------+------+---------+
| EU   | 1995 | 15      |
+------+------+---------+
| EU   | 2004 | 25      |
+------+------+---------+
| EU   | 2007 | 27      |
+------+------+---------+
Which when rendered to HTML, looks like this:
| Name | Year | Members |
|---|---|---|
| ECSC | 1958 | 6 |
| EC | 1973 | 9 |
| EC | 1981 | 10 |
| EC | 1986 | 12 |
| EU | 1995 | 15 |
| EU | 2004 | 25 |
| EU | 2007 | 27 |
Very simple stuff, but handy if you are a reStructuredText fan.
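For readers who want to see the idea without downloading anything, here is a minimal sketch of how such a grid table could be built. It is not the code in table.py: the function name make_grid_table and its header parameter are made up for illustration, and it is written to run on Python 3. It assumes every row has the same number of cells and that every cell is a string.

```python
def make_grid_table(rows, header=True):
    """Format rows (equal-length sequences of strings) as a grid table."""
    # Column width is the length of the longest cell in that column.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]

    def divider(char):
        # One run of `char` per column, two characters wider than the column
        # to cover the space on each side of the cell text.
        return "+" + "+".join(char * (width + 2) for width in widths) + "+"

    def format_row(row):
        # Pad each cell on the right so the column borders line up.
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return "| " + " | ".join(cells) + " |"

    lines = [divider("-")]
    for index, row in enumerate(rows):
        lines.append(format_row(row))
        # A header row is closed with "=" instead of "-".
        lines.append(divider("=" if header and index == 0 else "-"))
    return "\n".join(lines)


eu_enlargement = (("Name", "Year", "Members"), ("ECSC", "1958", "6"),
                  ("EC", "1973", "9"))
print(make_grid_table(eu_enlargement))
```

The padding step is what keeps the column borders aligned, which reStructuredText needs in order to parse the table.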
Doctest
I used table.py to finally wield the doctest module in anger. doctest is a standard library test module.
doctest is one of those things that is harder to explain than to just do.
Start by running the following command at the shell:
python table.py
This should hopefully print nothing, which means that all of the tests have passed.
Now run:
python table.py -v
If you compare the output with the code of table.py, you should see what is going on.
The idea is you put interactive examples into the docstrings of the module, which not only help to document the module, but also provide something that can be automatically checked.
So to write tests using doctest, you simply use the module at the shell, as we did in the first half of this post, and then copy everything in the shell into the docstrings.
The one complication is that if one of your commands creates a blank line, then you need to put <BLANKLINE> in the blank line in your docstring.
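As a small sketch of that last point (the function two_lines is made up for illustration and is not part of table.py, and the print calls use Python 3 syntax), the docstring below records a session whose output contains an empty line, so the docstring writes <BLANKLINE> in its place:

```python
import doctest


def two_lines():
    """Print two lines of text separated by an empty line.

    >>> two_lines()
    first
    <BLANKLINE>
    second
    """
    print("first")
    print()  # the empty line that the docstring spells as <BLANKLINE>
    print("second")


if __name__ == "__main__":
    # Runs every example found in this module's docstrings.
    # Silent when they all pass; add -v on the command line for a full report.
    doctest.testmod()
```

Saved to a file and run with python, this should print nothing, just like python table.py above.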
<p>Some matters of style, some matters of taste:</p>
<p>Have you considered 'Name Year Members'.split() instead of ("Name", "Year", "Members")? It seems to split Python programmers into "love it"/"hate it". I like it and so does Raymond Hettinger.</p>
<p>The matrix version of that would be:</p>
<div class="highlight"><pre><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">(),</span> <span class="p">[</span><span class="s">'Name Year Members'</span><span class="p">,</span>
<span class="s">'ECSC 1958 6'</span><span class="p">,</span> <span class="s">'EC 1973 9'</span><span class="p">,</span> <span class="s">'EC 1981 10'</span><span class="p">,</span> <span class="s">'EC 1986 12'</span><span class="p">,</span>
<span class="s">'EU 1995 15'</span><span class="p">,</span> <span class="s">'EU 2004 25'</span><span class="p">,</span> <span class="s">'EU 2007 27'</span><span class="p">])</span>
</pre></div>
<p>Hmm, what a shame split isn't a builtin.</p>
<p>You often use an auxiliary variable to maintain an index count whilst looping through a list:</p>
<div class="highlight"><pre><span class="n">i</span> <span class="o">=</span> <span class="mf">0</span>
<span class="k">for</span> <span class="n">cell</span> <span class="ow">in</span> <span class="n">row</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span> <span class="o">></span> <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">cell</span><span class="p">)</span>
    <span class="n">i</span> <span class="o">+=</span> <span class="mf">1</span>
</pre></div>
<p>use <tt class="docutils literal"><span class="pre">for</span> <span class="pre">i,cell</span> <span class="pre">in</span> <span class="pre">enumerate(row):</span></tt> instead.</p>
<p>And another thing... <tt class="docutils literal"><span class="pre">if</span> <span class="pre">len(cell)</span> <span class="pre">></span> <span class="pre">self.widths[i]:</span> <span class="pre">self.widths[i]</span> <span class="pre">=</span> <span class="pre">len(cell)</span></tt> is better written as <tt class="docutils literal"><span class="pre">self.widths[i]</span> <span class="pre">=</span> <span class="pre">max(self.widths[i],</span> <span class="pre">len(cell))</span></tt></p>
<p>So that's:</p>
<div class="highlight"><pre><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">cell</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">row</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="nb">len</span><span class="p">(</span><span class="n">cell</span><span class="p">))</span>
</pre></div>
<p>More in a bit.</p>
<div class="highlight"><pre><span class="n">div_tup</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([((</span><span class="n">width</span> <span class="o">+</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">line</span><span class="p">)</span> <span class="k">for</span> <span class="n">width</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">])</span>
</pre></div>
<p>What version of Python do you use? Since we got generator expressions (2.4) almost everything that takes a list takes a generator expression (an iterator) instead. That means you don't need to create an intermediate list with «[ some comprehension ]». You can just drop the square brackets and go:</p>
<div class="highlight"><pre><span class="nb">tuple</span><span class="p">(((</span><span class="n">width</span> <span class="o">+</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">line</span><span class="p">)</span> <span class="k">for</span> <span class="n">width</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">)</span>
</pre></div>
<p>I would also recommend dropping the brackets around the expression to the left of "for"; the "for" keyword is a strong hint (to you and the parser) that it's a generator expression:</p>
<div class="highlight"><pre><span class="nb">tuple</span><span class="p">((</span><span class="n">width</span> <span class="o">+</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">line</span> <span class="k">for</span> <span class="n">width</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">)</span>
</pre></div>
<p>You ought to decide whether to write «'string' * number» or «number * 'string'»; you use both and switch between them. I wish PEP 8 said something about this. Personally I use the first one, with number on the right. I've no idea why, I just picked one and stuck with it.</p>
<p>Surrounding code:</p>
<div class="highlight"><pre><span class="n">div_sub</span> <span class="o">=</span> <span class="s">"+</span><span class="si">%s</span><span class="s">"</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">)</span> <span class="o">+</span> <span class="s">"+</span><span class="se">\n</span><span class="s">"</span>
<span class="n">div_tup</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([((</span><span class="n">width</span> <span class="o">+</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">line</span><span class="p">)</span> <span class="k">for</span> <span class="n">width</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">])</span>
<span class="k">return</span> <span class="n">div_sub</span> <span class="o">%</span> <span class="n">div_tup</span>
</pre></div>
<p>Ahhh.. using string formatting to join lots of strings together. The alarm bell is that you're constructing a format string dynamically. It's not always a mistake, but often is. In this case, use '+'.join(...). You can even get rid of the intermediate tuple:</p>
<div class="highlight"><pre><span class="k">return</span> <span class="s">'+'</span> <span class="o">+</span> <span class="s">'+'</span><span class="o">.</span><span class="n">join</span><span class="p">((</span><span class="n">width</span> <span class="o">+</span> <span class="mf">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">line</span> <span class="k">for</span> <span class="n">width</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">widths</span><span class="p">)</span> <span class="o">+</span> <span class="s">'+</span><span class="se">\n</span><span class="s">'</span>
</pre></div>
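<p>To check that the two approaches really build the same divider, here is a standalone sketch. The values of widths and line are assumptions chosen to match the EU table in the post; in table.py they live on the instance.</p>

```python
# Assumed values: column widths of the EU table and the divider character.
widths = [4, 4, 7]
line = "-"

# Approach 1: build a format string dynamically, then fill it in.
div_sub = "+%s" * len(widths) + "+\n"
div_tup = tuple((width + 2) * line for width in widths)
via_format = div_sub % div_tup

# Approach 2: join the column segments directly, no format string needed.
via_join = "+" + "+".join((width + 2) * line for width in widths) + "+\n"

# Both produce the same divider line.
assert via_format == via_join
print(via_join, end="")
```

<p>The join version also keeps working if a segment ever contains a percent sign, which would need escaping in the format-string version.</p>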
<p>It's a shame you couldn't come to the code clinic at PyCon UK that Raymond Hettinger (and me, but Raymond did all the heavy lifting) did. It was lots of fun and I think you would've liked it.</p>
<p>Hmm. Those were long comments. Maybe I should've written a blog article instead.</p>
<p>Hi David,</p>
<p>Thanks again for those really useful comments, I will study them carefully. On the code clinic, I heard a lot of people say it was fantastic. I was pretty involved in setup tasks that day, though I did manage to grab a little of the generator tutorial. How about you hold it again next year?</p>
<p><em>What version of Python do you use?</em></p>
<p>I have tended to target Python 2.3 and above. I probably end up using 2.4 or 2.5 features, but I usually try to make it work on 2.3 when required, because that is what the last version of Redhat and the previous version of OS X (10.4 Tiger) shipped with.</p>
<p>The dependencies I tend to use also tend to require 2.3 and above, so there is no point going below that and supporting Python 2.2.</p>
<p>Perhaps that is outdated and I should try out more modern features, but there are a lot of Redhat and Solaris servers out there with Python 2.3.</p>
<p><em>use an auxiliary variable to maintain an index count whilst looping through a list</em></p>
<p>On the i = 0, you caught me there. Being mostly self-taught and having started with the BBC Micro and Amstrad CPC, the following quote from the Dutch computer scientist Edsger Dijkstra is probably true:</p>
<p><em>It is practically impossible to teach good programming style to students that have had prior exposure to BASIC; as potential programmers they are mentally mutilated beyond hope of regeneration.</em></p>
<p>I try not to use integer counts when unpacking, unless I do care how many items I have unpacked. I will try out <em>enumerate</em>, as I have never seen it before.</p>
<p>I think the BBC Micro provides a fine start in life.</p>
<p>I sympathise with you on the Python 2.3 thing. That would normally be my "must run on" version, but for our recent <a href="http://www.clearclimatecode.org" rel="nofollow">www.clearclimatecode.org</a> project we chose Python 2.4 (more or less by accident), and it has proved annoying when we tried to use AIX 6.1 for which only Python 2.3 binaries are easily available.</p>
<p>Great code snippet - I think I'd find some usage for that one. I wonder what is the license of your code? MIT?</p>
<p>@David Jones:</p>
<p>You can do</p>
<div class="highlight"><pre><span class="k">from</span> <span class="nn">string</span> <span class="k">import</span> <span class="n">split</span>
</pre></div>
<p>instead of lambda ugliness</p>
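<p>A note for anyone trying this on a newer interpreter: the split function in the string module exists only in Python 2. A sketch of two lambda-free alternatives that should work on both Python 2 and Python 3 (the variable names are made up for illustration):</p>

```python
rows = ['Name Year Members', 'ECSC 1958 6', 'EC 1973 9']

# A list comprehension calls the split method on each string.
matrix = [row.split() for row in rows]

# str.split can also be passed to map directly, with no lambda.
# list() is needed on Python 3, where map returns an iterator.
same_matrix = list(map(str.split, rows))

assert matrix == same_matrix
print(matrix[1])
```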
<p>Hello, Zeth.</p>
<p>Congratulations on the script. It's very helpful.
I'd like to report a bug.
I'm trying to create tables with special characters, and the output is not OK. I modified the script using unicode to calculate the right length of cell.
Look at these lines, around line 194 in table.py</p>
<div class="highlight"><pre>for cell in row:
    # Calculate the required spacing for each cell
    row_space.append((self.widths[i] - len(unicode(cell, 'utf-8'))) * " ")
    # Add the cell data and the space to the row list
    row_list.append(cell)
    row_list.append(row_space[i])
    i += 1
</pre></div>
<p>Your original code is:</p>
<blockquote>
row_space.append((self.widths[i] - len(cell)) * " ")</blockquote>
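<p>The reason the original line misaligns such cells is that a plain Python 2 string holds bytes, so len counts bytes rather than characters. A small sketch of the effect, written for Python 3, where the two are separate types (the sample cell and the width of 8 are assumptions for illustration):</p>

```python
# "Malmö" written with an escape so the source file encoding does not matter.
cell_text = "Malm\u00f6"                 # 5 characters
cell_bytes = cell_text.encode("utf-8")   # 6 bytes: the last letter takes two

width = 8  # assumed column width

# Padding computed from the byte length comes out one space short.
padded_from_bytes = cell_text + (width - len(cell_bytes)) * " "
# Padding computed from the character length fills the column exactly.
padded_from_text = cell_text + (width - len(cell_text)) * " "

print(len(cell_text), len(cell_bytes))
print(len(padded_from_bytes), len(padded_from_text))
```

<p>Decoding each cell before measuring it, as in the modified line above, gives the character count. Characters that display wider than one column, such as many East Asian characters, would still need separate handling.</p>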