[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)

Bruce Southey bsouthey@gmail....
Wed May 30 08:15:59 CDT 2007


Hi,
You might find the UCI Machine Learning Repository a useful resource for data:
http://www.ics.uci.edu/~mlearn/MLRepository.html

Standard sources are:
Statlib: http://lib.stat.cmu.edu/
Netlib: http://www.netlib.org/

Even the datasets included with R may be usable, because some are in the public domain.

Regards
Bruce


On 5/14/07, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
> This Google search turns up more sources:
>
> http://www.google.com/search?q=data+%22in+the+public+domain%22+%22freely+by+the+public%22&ie=utf-8&oe=utf-8&rls=org.mozilla:en-US:official&client=firefox-a
>
>
> On 5/14/07, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
> > With some digging, we should be able to find text, images, scientific
> > data, census info, and audio which is in the public domain.  I see this a
> > lot:
> >
> > "The information on government web pages is in the public domain unless
> > specifically annotated otherwise (copyright may be held elsewhere) and may
> > therefore be used freely by the public."
> >
> > I'm still learning the ins and outs of licensing, but my understanding is
> > that data/information released as "public domain" can be bundled with
> > anything, e.g. BSD-licensed code.  Does anyone have any insight into this?
> >
> > Here are a few starter sites which mention that the data they provide is
> > in the public domain:
> >
> > Geophysical data:
> >
> > http://www.ngdc.noaa.gov/ngdcinfo/onlineaccess.html
> >
> > NASA images and measurements:
> >
> > http://adc.gsfc.nasa.gov/adc/questions_feedback.html#policies1%22
> >
> > Weather, forecasts:
> >
> > http://www.weather.gov/disclaimer.php
> >
> > Genomic data:
> >
> > http://www.drugresearcher.com/news/ng.asp?id=10960-snp-database-goes
> >
> >
> > -Pete
> >
> >
> > On 5/14/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> >
> > > Peter Skomoroch wrote:
> > > > I followed some of the discussion around datasets in the previous
> > > > threads.  As you mentioned, it might make sense to make some larger
> > > > datasets available separately or as cran-style optional installs, but
> > > > I think it would also be a big plus for scipy to have some more
> > > > smaller built-in datasets of various types.
> > > I agree, but there is the problem of licensing. I don't know what the
> > > legal status of data is: if I copy data from a book, I guess it is
> > > copyrighted like the rest of the book. But then there are some famous
> > > datasets (e.g. old faithful, iris, etc.) which are available in
> > > different software packages under different licenses.
> > >
> > > Copying the datasets of R (in r-base) would be useful, but they are
> > > distributed under the GPL, hence they cannot be included in scipy, at
> > > least if the datasets themselves are also under the GPL. Unfortunately,
> > > I don't know the legal details of those cases (status of data in a
> > > package licensed under the GPL).
> > >
> > > Anyway, if I have some useful datasets, I will certainly put them in a
> > > separate package, so that they can be used outside pymachine.
> > >
> > > David
> > > _______________________________________________
> > > Scipy-dev mailing list
> > > Scipy-dev@scipy.org
> > > http://projects.scipy.org/mailman/listinfo/scipy-dev
> > >
> >
> >
> >
> >
> > --
> > Peter N. Skomoroch
> > peter.skomoroch@gmail.com
> > http://www.datawrangling.com
>
>
>
> --
> Peter N. Skomoroch
> peter.skomoroch@gmail.com
> http://www.datawrangling.com