Scikits for developers

scikits is a ...

Scikits are independent and separately installable projects hosted under a common namespace that are too specialized to live in scipy itself, but targeted at the same community. People interested in starting a scikit should write an email to the scipy-dev list.

Whilst the recommended license for scikits projects is the (new) BSD license, scikits packages are free to choose their own open source license. The license should be officially OSI approved. We, the scipy-developers, will allow packages to contain code with licenses that, in our judgment, comply with the Open Source Definition but have not gone through the approval process. This is to allow us to adopt old code with permissive licenses. The package itself, though, should use a well-known OSI-approved license.

Scikits SVN and project structure

All scikits share a common svn repository, and tags and branches are kept at the top level rather than per project. Consequently, each project should prefix its project name to any tag or branch directory (see the example below). Here is what the structure looks like from the point of view of the mlabwrap scikit:

   branches
     mlabwrap-1.0
     ...
   tags
     mlabwrap-1.0.0
     ...
   trunk
     mlabwrap/
        setup.py                  # setuptools based
        setup.cfg                 # optional, needed e.g. for development releases
        README.txt
        scikits/
          __init__.py             # __import__('pkg_resources').declare_namespace(__name__)
          mlabwrap/
             __init__.py          # everything scikits.mlabwrap exports
             version.py           # see below (FIXME is this the right place?)
             tests/               # unit-tests where NumpyTest can find them, i.e.
                __init__.py       # empty
                test_mlabwrap.py  # .../scikits/NAME/tests/test_NAME.py
          ...

             ...              
     some_other_scikit
         ...
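
As a rough illustration of the layout above, the following sketch creates the per-project part of the tree (everything under trunk/NAME) for a new scikit. The helper name make_scikit_skeleton and the scikit name 'example' are hypothetical; only the file names and the one-line namespace declaration are taken from the tree shown above, and the generated files are empty placeholders rather than working code.

```python
import os
import tempfile

# The one-line namespace declaration quoted in the tree above.
NAMESPACE_INIT = "__import__('pkg_resources').declare_namespace(__name__)\n"


def make_scikit_skeleton(root, name):
    """Create trunk/<name>/... under root and return the created file paths."""
    project = os.path.join(root, "trunk", name)
    package = os.path.join(project, "scikits", name)
    files = {
        os.path.join(project, "setup.py"): "# setuptools based\n",
        os.path.join(project, "README.txt"): "",
        os.path.join(project, "scikits", "__init__.py"): NAMESPACE_INIT,
        os.path.join(package, "__init__.py"): "",
        os.path.join(package, "version.py"): "",
        os.path.join(package, "tests", "__init__.py"): "",
        os.path.join(package, "tests", "test_%s.py" % name): "",
    }
    for path, content in files.items():
        # Parent directories are created on demand.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(content)
    return sorted(os.path.relpath(path, root) for path in files)


if __name__ == "__main__":
    # 'example' is a hypothetical scikit name; the tree lands in a temp dir.
    for path in make_scikit_skeleton(tempfile.mkdtemp(), "example"):
        print(path)
```

The branches and tags directories are left out on purpose, since they are shared by all projects and only need the project-name prefix described above.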

Scikits' use of setuptools

Setuptools is becoming the de facto standard for creating, building and distributing (CPAN-style) Python packages, and as such more or less supersedes distutils, although it is not distributed with Python itself.

All scikits will need to use setuptools in order to allow for the scikits namespace package (i.e. project names look like scikits.mlabwrap, but scikits is essentially just a namespace, not a package in its own right). The scikits/__init__.py file is identical for all projects and just contains the single line in the comment above. But setuptools offers other benefits as well, chiefly the ability to automatically download and install packages (".egg"s) and their dependencies via easy_install, as well as several additional conveniences and features that distutils does not offer (a few useful tricks are mentioned below).
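
To make the namespace idea concrete, here is a minimal sketch showing two separately installed projects sharing one top-level scikits name. It is an illustration under assumptions, not the scikits mechanism itself: it uses the standard library's pkgutil.extend_path as a stand-in for pkg_resources.declare_namespace so that it runs without setuptools, and the project and package names (proj_a, proj_b, demo_a, demo_b) are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the one-line scikits/__init__.py. pkgutil.extend_path is the
# stdlib analogue of pkg_resources.declare_namespace (an assumption made here
# so that the sketch does not depend on setuptools).
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_project(root, project, subpackage):
    """Create root/project/scikits/subpackage/ and return the project dir."""
    ns_dir = os.path.join(root, project, "scikits")
    pkg_dir = os.path.join(ns_dir, subpackage)
    os.makedirs(pkg_dir)
    with open(os.path.join(ns_dir, "__init__.py"), "w") as handle:
        handle.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
        handle.write("NAME = %r\n" % subpackage)
    return os.path.join(root, project)


def import_from_two_projects():
    """Import two hypothetical scikits that live in separate directories."""
    root = tempfile.mkdtemp()
    for project, sub in [("proj_a", "demo_a"), ("proj_b", "demo_b")]:
        # Each project directory is put on sys.path separately, much as two
        # independently installed distributions would be.
        sys.path.insert(0, make_project(root, project, sub))
    importlib.invalidate_caches()
    first = importlib.import_module("scikits.demo_a")
    second = importlib.import_module("scikits.demo_b")
    return first.NAME, second.NAME


if __name__ == "__main__":
    print(import_from_two_projects())
```

Both imports succeed even though the two scikits directories are unrelated on disk, which is the property that lets each scikit be installed on its own.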

Quick setuptools intro

Installing setuptools/easy_install

On Ubuntu or Debian you should be able to do:

> sudo aptitude install python-setuptools
# then download and install a python package system-wide; careful, uninstalling is by hand!
> sudo easy_install some_python_package

Alternatively, under Unix:

> cd /tmp; wget http://peak.telecommunity.com/dist/{virtual-python,ez_setup}.py
# choice a) install everything as root, eggs go into the system dirs
> sudo python ez_setup.py
# choice b) create a local ~/bin/python ~/lib/python2.x with system python as prototype
# (PYTHONPATH should be empty, PATH must contain ~/bin)
> python virtual-python.py; python ez_setup.py;
# choice c) use existing PYTHONPATH (e.g., if PYTHONPATH=~/py-lib)
# then create ~/.pydistutils.cfg with this content:
[install]
install_lib = ~/py-lib
install_scripts = ~/bin
# and do 
> python ez_setup.py

Additional use of numpy.distutils

Most (all?) scikits will also use numpy, but additionally using `numpy.distutils` is not strictly required: projects with C extensions need the numpy headers, but their location can also be obtained with numpy.get_include(). If numpy.distutils is used, however, it must be imported after setuptools.

Template setup.py for a scikits project

FIXME not quite right.

    # file `setup.py`
    import setuptools
    from numpy.distutils.core import setup, Extension
    ...
    setup(install_requires='numpy', # can also add version specifiers      
          namespace_packages=['scikits'],
          packages=setuptools.find_packages(),
          test_suite="scikits.mlabwrap.tests.test_mlabwrap", # for python setup.py test
          zip_safe=True, # the package can run out of an .egg file
          name="mlabwrap", # what the project will be known as in cheeseshop
          version="1.1",  # see notes on version handling below
          description="A high-level bridge to matlab",
          author="Alexander Schmolck",
          author_email=...,
          license="MIT",
          #FIXME url, download_url, ext_modules
         
          ...)

Versioning

Hardcoding a version string in several places is error-prone, so the following approach is recommended:

1. Specify the version number of the *official* release in setup.py, as above (e.g. setup(version='1.1', ...)).

2. Put this in setup.cfg for a development release (or leave blank for an official release). NB: if you are doing a development release, the version above should refer to the *coming* official release, not to the one that has already been released. The development version identifier will then look something like 1.1.dev77, where 77 is the svn revision; for official releases the identifier will just be "1.1".

[egg_info]
tag_svn_revision = 1
tag_build = .dev

3. Have a version.py file at the place shown in the hierarchy above, following this template:

from pkg_resources import require
__version__ = require('scikits.mlabwrap')[0].version # substitute your project name

4. Use `from version import __version__` in your __init__.py or scikits/package/api.py file, depending on which convention you use.

XXX note: the reason I recommend the version.py file approach above is that it means there's one consistent place to get the version from, regardless of whether the package itself is only a namespace or not (i.e. if __init__.py is empty).
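
The way steps 1 and 2 combine can be sketched as follows. This does not call setuptools; it only mimics the identifier shape described above (official version, then tag_build, then the svn revision), so the exact string setuptools generates may differ in punctuation. The function name release_identifier and the idea of passing the svn revision in by hand are assumptions made for illustration.

```python
import configparser

# The [egg_info] section from step 2, as it would appear in setup.cfg.
DEV_SETUP_CFG = """\
[egg_info]
tag_svn_revision = 1
tag_build = .dev
"""


def release_identifier(version, setup_cfg_text, svn_revision):
    """Combine the setup.py version with the [egg_info] tags, if any."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    if not parser.has_section("egg_info"):
        # Blank setup.cfg: an official release keeps the plain version.
        return version
    tag_build = parser.get("egg_info", "tag_build", fallback="")
    tag_svn = parser.getboolean("egg_info", "tag_svn_revision", fallback=False)
    suffix = tag_build + (str(svn_revision) if tag_svn else "")
    return version + suffix


if __name__ == "__main__":
    print(release_identifier("1.1", DEV_SETUP_CFG, 77))  # development release
    print(release_identifier("1.1", "", 77))             # official release
```

The point of the sketch is that the version in setup.py never changes between a development release and the official one; only the presence of the setup.cfg section does.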

Using setuptools to distribute your project as an .egg

> python setup.py register sdist bdist_egg upload -s

This will register the project with PyPI (the Python Package Index) and upload a gpg-signed (-s) source distribution (sdist) and binary egg distribution (bdist_egg) of your project.

Some more notes regarding setuptools

  • setuptools is largely backwards-compatible with distutils, so the old distutils ways can also be used, although setuptools offers preferred ways of doing several things. For example, a MANIFEST.in file can be used; however, setuptools will use it directly (in conjunction with version-control info) and completely ignore the contents of the MANIFEST file that distutils generates from it. Typically neither is required, though, since in the absence of a MANIFEST.in file setuptools will automatically include all files under version control when building a source distribution.
  • During development, use python setup.py develop [...] (once) instead of python setup.py install [...] (many times). This lets you edit the files in your svn working directory without having to reinstall your module into site-packages every time you make a change and want to test it.
  • The python setup.py test command is also useful; see Testing below.
  • with `pkg_resources` setuptools provides a convenient and platform independent way to access data files and other resources from within the egg, extracting them if required -- for example, during testing (and only testing), mlabwrap needs to unpack its tests directory in order to make some helper files visible to matlab; so test_mlabwrap.py includes the following lines:
    import pkg_resources
    import atexit
    # testDir is the filename that the tests directory is installed or unpacked to
    testDir = pkg_resources.resource_filename('scikits.mlabwrap', 'tests')
    atexit.register(pkg_resources.cleanup_resources)
    

  • Finally, setuptools makes it possible to write packages that offer plug-in entry points for third-party libraries (e.g. a documentation framework that accepts different types of markup parsers); see Egg info files.

Testing

Here is a simple template for writing unit tests with numpy.testing:

# file trunk/mlabwrap/scikits/mlabwrap/tests/test_mlabwrap.py

from numpy.testing import NumpyTest, NumpyTestCase, assert_equal
import numpy
from scikits.mlabwrap import mlab
# testcase subclass names *must* follow the pattern 'test...'
# use as many different testcases as desired
class testMlabwrap(NumpyTestCase): # unittest.TestCase would also be fine
    # use prefix 'test' for test-methods (NumpyTestCase also accepts 'check' and 'bench')
    def test_basic(self):
        "This tests the fundamental functionality of mlabwrap."
        # use numpy.testing.assert_equal to compare arrays or containers that contain arrays
        assert_equal(numpy.array([1.,2.,3.]), mlab.linspace(1,3,3))
        # ...
    # only run if n>=3 in NumpyTest(...).run(level=n), on which see below
    def test_more_exotic_stuff(self, level=3):
        something_obscure_and_expensive()

# adding this will make ``python test_mlabwrap.py`` run the tests
# for running the tests from the interactive shell or other modules see below
if __name__ == '__main__':
    NumpyTest().run()
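
Since the template notes that unittest.TestCase would also be fine, here is a minimal sketch of the same kind of test written with the standard library only. The linspace function is a hypothetical pure-Python stand-in for mlab.linspace, used so the example runs without matlab or numpy; the names TestLinspace and run_tests are likewise made up for illustration.

```python
import unittest


def linspace(start, stop, count):
    """Hypothetical stand-in for mlab.linspace; assumes count >= 2."""
    step = (stop - start) / float(count - 1)
    return [start + i * step for i in range(count)]


class TestLinspace(unittest.TestCase):
    # Method names start with 'test' so the default loader picks them up.
    def test_basic(self):
        self.assertEqual(linspace(1, 3, 3), [1.0, 2.0, 3.0])

    def test_endpoints(self):
        values = linspace(0, 1, 5)
        self.assertEqual((values[0], values[-1]), (0.0, 1.0))


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLinspace)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Plain unittest has no equivalent of the level argument, so expensive tests would need some other way of being skipped in this variant.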

Every scikit should have a tests/test_<SCIKIT_NAME>.py file in its package dir (see directory tree above). This location ensures that numpy's testing framework can find and run all the tests by issuing a command such as

numpy.testing.NumpyTest(<SCIKIT_NAME>).run(verbose=2, level=3)

<SCIKIT_NAME> here could either be a string ('scikits.mlabwrap') or the actual package (previously imported). Apart from assert_equal and similar comparison functions that handle arrays correctly, the main advantage numpy.testing has over plain unittest is that it allows

Finally, since we specified setup(..., test_suite='scikits.mlabwrap.tests.test_mlabwrap') in setup.py, the tests can also be run via python setup.py test. TODO: write about using ipython for convenient interactive testing