[SciPy-dev] Machine learning datasets (was Presentation of pymachine, a python package for machine learning)

Peter Skomoroch peter.skomoroch@gmail....
Wed May 30 08:37:59 CDT 2007


I was going to suggest UCI, but the licensing wasn't clear from the
site (there are a few restricted datasets).  Maybe contacting the
maintainer is the best way to sort out the usage:

Librarian: Patrick M. Murphy (ml-repository@ics.uci.edu)




On 5/30/07, Bruce Southey <bsouthey@gmail.com> wrote:
> Hi,
> You might find the UCI Machine Learning Repository a useful resource for data:
> http://www.ics.uci.edu/~mlearn/MLRepository.html
>
> Standard sources are:
> Statlib: http://lib.stat.cmu.edu/
> Netlib: http://www.netlib.org/
>
> Even some of the datasets included with R may be usable, because some are in the public domain.
>
> Regards
> Bruce
>
>
> On 5/14/07, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
> > This Google search turns up more sources:
> >
> > http://www.google.com/search?q=data+%22in+the+public+domain%22+%22freely+by+the+public%22&ie=utf-8&oe=utf-8&rls=org.mozilla:en-US:official&client=firefox-a
> >
> >
> > On 5/14/07, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
> > > With some digging, we should be able to find text, images, scientific
> > > data, census info, and audio that are in the public domain. I see this
> > > notice a lot:
> > >
> > > "The information on government web pages is in the public domain unless
> > > specifically annotated otherwise (copyright may be held elsewhere) and may
> > > therefore be used freely by the public."
> > >
> > > I'm still learning the ins and outs of licensing, but my understanding is
> > > that data/information released into the public domain can be bundled with
> > > anything, e.g. BSD-licensed code. Does anyone have any insight into this?
> > >
> > > Here are a few starter sites which mention that the data they provide is
> > > in the public domain:
> > >
> > > Geophysical data:
> > >
> > > http://www.ngdc.noaa.gov/ngdcinfo/onlineaccess.html
> > >
> > > NASA images and measurements:
> > >
> > > http://adc.gsfc.nasa.gov/adc/questions_feedback.html#policies1%22
> > >
> > > Weather and forecasts:
> > >
> > > http://www.weather.gov/disclaimer.php
> > >
> > > Genomic data:
> > >
> > > http://www.drugresearcher.com/news/ng.asp?id=10960-snp-database-goes
> > >
> > >
> > > -Pete
> > >
> > >
> > > On 5/14/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
> > >
> > > > Peter Skomoroch wrote:
> > > > > I followed some of the discussion around datasets in the previous
> > > > > threads. As you mentioned, it might make sense to make some larger
> > > > > datasets available separately or as CRAN-style optional installs, but
> > > > > I think it would also be a big plus for scipy to have more small
> > > > > built-in datasets of various types.
> > > > I agree, but there is the problem of licensing. I don't know what the
> > > > status of data is from a legal point of view: if I copy data from a book,
> > > > I guess it is copyrighted like the rest of the book. But there are also
> > > > some famous datasets (e.g. Old Faithful, iris, etc.) which are available
> > > > in different software packages under different licenses.
> > > >
> > > > Copying the datasets of R (in r-base) would be useful, but they fall
> > > > under the GPL, hence they cannot be included in scipy, at least if the
> > > > datasets themselves are also under the GPL. Unfortunately, I don't know
> > > > the legal details of those cases (the status of data in a package
> > > > licensed under the GPL).
> > > >
> > > > Anyway, if I have some useful datasets, I will certainly make them
> > > > available in a separate package, so that they can be used outside pymachine.
> > > >
> > > > David
> > > > _______________________________________________
> > > > Scipy-dev mailing list
> > > > Scipy-dev@scipy.org
> > > > http://projects.scipy.org/mailman/listinfo/scipy-dev
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Peter N. Skomoroch
> > > peter.skomoroch@gmail.com
> > > http://www.datawrangling.com
> >
> >
> >
> >
> >
>




