[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)
Robert Kern
robert.kern@gmail....
Thu May 31 12:17:23 CDT 2007
Anne Archibald wrote:
> Datasets published in academic papers are no less subject to these
> restrictions; generally if you want to use one you must negotiate with
> the author.
Not necessarily. There is another US-specific exception. Data is not
copyrightable in the United States. For something to be copyrightable here, it
must contain some creative content. Thus, while I may not photocopy a phone book
and sell the copy (the arrangement, typography, etc. are deemed creative and
copyrightable), I may write down all of the numbers and typeset my own phone book.

Now, most other countries don't have this rule. Notably, countries in the EU
tend to recognize "the sweat of the brow" expended in collecting the data as
being worthy of copyright protection.

IANAL, but my approach would be to get in touch with the original source of the
data if possible, and ask. The biggest problem you'll face is that few of those
sources have ever thought about their datasets in terms of copyright licenses,
particularly *software* copyright licenses that permit modification to their
precious data. If it's an American source and the data appears to be freely
distributed, as in the UCI database, I would probably just take it as public
domain according to US law.

But of course, I'm in the US. There is a *tiny* possibility that although the
"author" of the data is inside the US, too, he intends to pursue copyright
outside of the US. However, if it's on something as visible as the UCI site,
this possibility is really tiny.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco