Introduction
This is the main page of pymachine, a project to implement machine learning tools for the scipy environment. The project is starting as a Summer of Code project, but we hope that other people will jump in later to improve it and add functionality. The original wikified proposal can be found here: MachineLearningOriginalProposal. Some useful information will be posted on the pymachine blog: http://pymachine.blogspot.com/.
Outside the proposal, not much information is available yet, but this is only the start of the project (28th May 2007). As nothing is released yet, this page is mainly a brain dump. Once I have something public to release, I will clean up this page.
Presentation and wishes from the community
There was quite a lengthy thread on scipy-user about pymachine; I will sum it up at MachineLearningOpeningThread. http://projects.scipy.org/pipermail/scipy-user/2007-May/012146.html
Things to do
rows vs cols
Should we use one feature per row or one feature per column? I forgot to ask about frames per row vs frames per column. I should run a small benchmark with a recent numpy on some basic operations, such as a basic likelihood computation. -> Some preliminary work is done in pyem.profile_data.
At least netlab and R seem to follow the convention of one feature vector (frame) per row. This is a compelling reason to make it the default. Nothing prevents me from supporting the other convention later (e.g. through an axis argument).
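The kind of benchmark mentioned above could look like the sketch below. It is not the code in pyem.profile_data; the function name gauss_loglik, its axis argument and the array sizes are assumptions made up for illustration. It computes a diagonal Gaussian log-likelihood with frames stored per row and per column, checks that both layouts agree, and times each one.

```python
import timeit

import numpy as np


def gauss_loglik(data, mu, var, axis=0):
    """Log-likelihood of each frame under a diagonal Gaussian.

    axis=0: one frame per row, data has shape (n_frames, d).
    axis=1: one frame per column, data has shape (d, n_frames).
    (Hypothetical helper, not part of pyem.)
    """
    if axis == 1:
        # Turn the parameters into columns so they broadcast over frames.
        mu = mu[:, np.newaxis]
        var = var[:, np.newaxis]
    feat_axis = 1 - axis  # the axis that runs over the features
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * var))
    return log_norm - 0.5 * np.sum((data - mu) ** 2 / var, axis=feat_axis)


rng = np.random.default_rng(0)
n_frames, d = 20_000, 10  # arbitrary sizes for the sketch
mu = rng.standard_normal(d)
var = rng.uniform(0.5, 2.0, size=d)

by_row = rng.standard_normal((n_frames, d))
# Same data, one frame per column, stored contiguously in that layout.
by_col = np.ascontiguousarray(by_row.T)

# Both layouts must give the same likelihoods.
assert np.allclose(gauss_loglik(by_row, mu, var, axis=0),
                   gauss_loglik(by_col, mu, var, axis=1))

t_row = timeit.timeit(lambda: gauss_loglik(by_row, mu, var, axis=0), number=20)
t_col = timeit.timeit(lambda: gauss_loglik(by_col, mu, var, axis=1), number=20)
print("frames per row:    %.4f s" % t_row)
print("frames per column: %.4f s" % t_col)
```

The timings depend on the numpy version, the array sizes and the machine, so the numbers are only meaningful when compared on the same setup.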
pyem
To do in this order:
- have one example of pdf estimation, one clustering example and one discriminative example (e.g. supervised). This would help polish the basic API.
- set docstrings everywhere it makes sense, according to the scipy standard.
- plotting facilities: density contour plots would be nice.
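As a rough idea of what the density contour plots in the last item could involve, here is a minimal sketch for a single 2-D Gaussian. The names gauss2d_density and plot_density_contour are hypothetical and not part of pyem, and the plotting part assumes matplotlib is installed.

```python
import numpy as np


def gauss2d_density(mu, cov, xlim, ylim, n=101):
    """Evaluate a 2-D Gaussian density on a regular n x n grid."""
    xs = np.linspace(xlim[0], xlim[1], n)
    ys = np.linspace(ylim[0], ylim[1], n)
    gx, gy = np.meshgrid(xs, ys)
    # Offsets from the mean at every grid point, shape (n, n, 2).
    diff = np.stack([gx - mu[0], gy - mu[1]], axis=-1)
    prec = np.linalg.inv(cov)
    # Squared Mahalanobis distance at every grid point.
    maha = np.einsum("...i,ij,...j->...", diff, prec, diff)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return gx, gy, norm * np.exp(-0.5 * maha)


def plot_density_contour(gx, gy, dens, levels=8):
    """Draw contour lines of the density (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily, only for plotting

    fig, ax = plt.subplots()
    ax.contour(gx, gy, dens, levels=levels)
    ax.set_xlabel("feature 1")
    ax.set_ylabel("feature 2")
    return fig


mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
gx, gy, dens = gauss2d_density(mu, cov, xlim=(-4, 4), ylim=(-5, 5))
```

Calling plot_density_contour(gx, gy, dens) and saving or showing the returned figure gives the contour plot. For a mixture, the same grid could be reused by summing the weighted densities of the components.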
