Zero-Rank Arrays

Zero-rank arrays are arrays with shape=(). For example:

>>> x = array(1)
>>> x.shape
()

Zero-Rank Arrays and Array Scalars

Array scalars are similar to zero-rank arrays in many aspects

>>> int_(1).shape
()

They even print the same:

>>> print int_(1)
1
>>> print array(1)
1

However there are some important differences:

  • Array scalars are immutable
  • Array scalars have different python type for different data types

Motivation for Array Scalars

Numpy's design decision to provide 0-d arrays and array scalars in addition to native python types goes against one of the fundamental python design principles that there should be only one obvious way to do it. In this section we will try to explain why it is necessary to have three different ways to represent a number.

There were several numpy-discussion threads:

It has been suggested several times that numpy just use rank-0 arrays to represent scalar quantities in all case. Pros and cons of converting rank-0 arrays to scalars were summarized as follows:

         Pros: 

               1) Some cases when Python expects an integer (the most
               dramatic is when slicing and indexing a sequence:
               _PyEval_SliceIndex in ceval.c) it will not try to
               convert it to an integer first before raising an error.
               Therefore it is convenient to have 0-dim arrays that
               are integers converted for you by the array object.

               2) No risk of user confusion by having two types that
               are nearly but not exactly the same and whose separate
               existence can only be explained by the history of
               Python and NumPy development.

               3) No problems with code that does explicit typechecks
               (isinstance(x, float) or type(x) ==
               types.FloatType). Although explicit typechecks are
               considered bad practice in general, there are a couple
               of valid reasons to use them.

               4) No creation of a dependency on Numeric in pickle
               files (though this could also be done by a special case
               in the pickling code for arrays)
         
         Cons:  It is difficult to write generic code because scalars
                do not have the same methods and attributes as arrays.
                (such as .type  or .shape).  Also Python scalars have
                different numeric behavior as well. 

                This results in a special-case checking that is not 
                pleasant.  Fundamentally it lets the user believe that 
                somehow multidimensional homoegeneous arrays
                are something like Python lists (which except for
                Object arrays they are not).

(See "Multidimensional Arrays for Python" by Travis Oliphant, draft 02-Feb-2005.)

Numpy implements a solution that is designed to have all the pros and none of the cons above.

      Create Python scalar types for all of the 21 types and also
      inherit from the three that already exist.  Define equivalent
      methods and attributes for these Python scalar types.

(Ibid.)

The Need for Zero-Rank Arrays

Once the idea to use zero-rank arrays to represent scalars was rejected, it was natural to consider whether zero-rank arrays can be eliminated alltogether. However there are some important use cases where zero-rank arrays cannot be replaced by array scalars. See also A case for rank-0 arrays.

  • Output arguments
    >>> y = int_(5)
    >>> add(5,5,x)
    array(10)
    >>> x
    array(10)
    >>> add(5,5,y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: return arrays must be of ArrayType
    
  • Shared data
    >>> x = array([1,2])
    >>> y = x[1:2]
    >>> y.shape = ()
    >>> y
    array(2)
    >>> x[1] = 20
    >>> y
    array(20)
    

Indexing of Zero-Rank Arrays

As of NumPy release 0.9.3, zero-rank arrays do not support any indexing

>>> x[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: 0-d arrays can't be indexed.

On the other hand there are several cases that make sense for rank-zero arrays.

Ellipsis and empty tuple

Sasha started a discussion on scipy-dev with the folowing proposal:

... it may be reasonable to allow a[...].  This way
ellipsis can be interpereted as any number of  ":"s including zero. 
Another subscript operation that makes sense for scalars would be
a[...,newaxis] or even a[{newaxis, }* ..., {newaxis,}*], where 
{newaxis,}* stands for any number of comma-separated newaxis tokens. 
This will allow one to use ellipsis in generic code that would work on
any numpy type. 

The idea to allow increasing the rank of zero-rank arrays with [..., newaxis] was originally proposed by Sebastien.

Francesc Altet supported the idea of [...] on zero-rank arrays and suggested that [()] be supported as well.

Francesc's proposal was:

In [65]: type(numpy.array(0)[...])
Out[65]: <type 'numpy.ndarray'>

In [66]: type(numpy.array(0)[()])   # Indexing a la numarray
Out[66]: <type 'int32_arrtype'>

In [67]: type(numpy.array(0).item())  # already works
Out[67]: <type 'int'>

There is a consensus that for a zero-rank array x, both x[...] and x[()] should be valid, but the question remains on what should be the type of the result - zero rank ndarray or x.dtype?

(Sasha) First, whatever choice is made for x[...] and x[()] they should be the same because ... is just syntactic sugar for "as many : as necessary", which in the case of zero rank leads to ... = (:,)*0 = (). Second, rank zero arrays and numpy scalar types are interchangeable within numpy, but numpy scalars can be use in some python constructs where ndarrays can't. For example:

>>> (1,)[array(0)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: tuple indices must be integers
>>> (1,)[int32(0)]
1

Since most if not all numpy function automatically convert zero-rank arrays to scalars on return, there is no reason for [...] and [()] operations to be different.

See changeset:1864 for implementation of x[...] and x[()] returning numpy scalars.

See changeset:1866 for implementation of x[...] = v and x[()] = v.

Increasing rank with newaxis

Everyone who commented liked this feature, so as of changeset:1871 any number of ellipses and newaxis tokens can be placed as a subscript argument for a zero-rank array. For example:

>>> x = array(1)
>>> x[newaxis,...,newaxis,...]
array([[1]])

It is not clear why more than one ellipsis should be allowed, but this is the behavior of higher rank arrays that we are trying to preserve.

Refactoring

Currently all indexing on zero-rank arrays is implemented in a special if (nd == 0) branch of code that used to always raise an index error. This ensures that the changes do not affect any existing usage (except, the usage that relies on exceptions). On the other hand part of motivation for these changes was to make behavior of ndarrays more uniform and this should allow to eliminate if (nd == 0) checks alltogether.