Ticket #2 (assigned defect)

Opened 3 years ago

Last modified 2 years ago

Notebook file format validator

Reported by: tzanko
Assigned to: antont (accepted)
Priority: lowest
Milestone:
Component: nbdoc
Version:
Severity: normal
Keywords: xml validate
Cc:

Description

I need something that can find errors in a notebook file and give the user enough information to correct them. It should find both XML syntax errors and errors in the structure of the file. Can it be done? The first part is probably hard: the lxml tracebacks give very little information about the error, so some modifications to lxml will probably be required.

Attachments

niceerrs.py (0.9 kB) - added by Robert Kern <rkern@enthought.com> on 08/15/05 22:43:52.

Change History

08/14/05 16:41:19 changed by anonymous

  • component changed from ipython to nbdoc.

Set to the nbdoc component (I've now made one component per project: ipython, nbshell, nbdoc), so we can use the component field to tag where each ticket belongs.

08/15/05 07:32:54 changed by antont

I've used ElementTree for this, since it can point out the exact location of an error in a malformed XML file. I can add a utility to nbdoc if needed, but so far just calling ElementTree.parse(file) has served me fine.
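For reference, a minimal sketch of that approach using the standard library's xml.etree.ElementTree (the version bundled with current Python, where `ParseError` carries a `position` attribute; the standalone 2005 ElementTree package reported errors differently). The helper name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(source):
    """Parse *source* (a path or file object).

    Returns None if the document is well-formed, otherwise a
    (line, column, message) tuple taken from the ParseError.
    """
    try:
        ET.parse(source)
    except ET.ParseError as err:
        # err.position is (line, column): line is 1-based, column 0-based
        line, column = err.position
        return line, column, str(err)
    return None
```

Calling `check_well_formed("invalid.nbk")` on a broken notebook returns the location and expat's message (for example a "mismatched tag" error), which is enough to point the user at the right line.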

08/15/05 15:54:32 changed by tzanko

I'm a little reluctant to use both lxml and ElementTree in nbshell. If absolutely necessary, I'll include the parts of ElementTree that I need in nbshell. That's only part of the problem, however. We must also check that a well-formed XML file is also a valid notebook file. I think such a test is quite important: among other things, it will ensure that nbdoc and nbshell always use the same format.

08/15/05 22:43:52 changed by Robert Kern <rkern@enthought.com>

  • attachment niceerrs.py added.

08/15/05 22:45:00 changed by Robert Kern <rkern@enthought.com>

There are a couple of options for validation:

* If we write a real RelaxNG or XSD schema, then the output of xmllint(1) could be used. xmllint(1) is part of the libxml2 distribution, so our users will probably have installed it when they installed libxml2 (and we can tell them to in the instructions).

* We could use the libxml2 Python bindings that are included in the libxml2 distribution to do that validation. They're a pain to use for bread-and-butter tasks (thus lxml), but confining our use of them to just validation might be worthwhile. As you can see from the validate.py example on that page, you can register a callback to redirect the error messages.

* We could add the error-redirection bits to lxml and contribute them.

* On a completely different track, we can ignore standard schemata altogether and abuse unittest to do validation. I think it's going to require replacing the loader (to parameterize the TestSuite by giving it an ElementTree), the runner (to descend the ElementTree and run the appropriate test methods based on the element), and the case objects (to report friendly errors). There's a brief tutorial on such hacking here. A potential downside of this approach is that we can only validate a subset of Docbook markup. That may be a good thing, since that subset will be the only part of Docbook the GUI supports; but it may also be a bad thing, since it may prevent me from using full Docbook outside of the GUI. If this is done carefully, however, that downside might be avoided. This might be our best option, since it doesn't involve more of the XML gunk that only I seem to like doing.
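If we take the schema route, nbshell could shell out to xmllint and show its diagnostics. A minimal sketch, assuming a RELAX NG schema exists (the file name `notebook.rng` is hypothetical; no schema has been written yet) and that xmllint is on the PATH. `--noout` and `--relaxng` are standard xmllint options; `--schema` would be used instead for an XSD.

```python
import shutil
import subprocess

def xmllint_command(document, schema):
    # --noout: don't echo the parsed document back
    # --relaxng: validate against the given RELAX NG schema
    return ["xmllint", "--noout", "--relaxng", schema, document]

def validate_with_xmllint(document, schema):
    """Run xmllint and return (is_valid, diagnostics).

    xmllint writes syntax and validity errors to stderr and exits
    with a non-zero status when the document fails.
    """
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found; install the libxml2 tools")
    result = subprocess.run(xmllint_command(document, schema),
                            capture_output=True, text=True)
    return result.returncode == 0, result.stderr
```

The stderr text could be shown to the user as-is, since it already names the file, line, and element.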
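The core of the unittest idea, descending the tree and running a check method chosen by the element's tag, can be sketched without the loader/runner machinery. This is a simplified stand-in, not a unittest integration: the `NotebookChecker` class, the `check_<tag>` naming convention, the root tag `notebook`, and the set of allowed cell types are all assumptions for illustration. Only the `ipython-cell` element with `type` and `number` attributes comes from the niceerrs.py sample output in this ticket.

```python
import xml.etree.ElementTree as ET

class NotebookChecker:
    """Walk an element tree and call check_<tag> for each element.

    Hyphens in tag names map to underscores, so <ipython-cell> is
    handled by check_ipython_cell. Elements without a check method
    are accepted as-is.
    """

    # Hypothetical rule: the cell types a notebook may contain.
    CELL_TYPES = {"input", "output", "stdout", "stderr"}

    def __init__(self):
        self.errors = []

    def report(self, element, message):
        self.errors.append("<%s>: %s" % (element.tag, message))

    def visit(self, element):
        if isinstance(element.tag, str):  # skip comments/PIs if present
            name = "check_" + element.tag.replace("-", "_")
            check = getattr(self, name, None)
            if check is not None:
                check(element)
        for child in element:
            self.visit(child)

    def check_ipython_cell(self, cell):
        kind = cell.get("type")
        if kind not in self.CELL_TYPES:
            self.report(cell, "unknown type %r" % kind)
        number = cell.get("number", "")
        if not number.isdigit():
            self.report(cell, "number must be a non-negative integer, got %r" % number)

def check_notebook(root):
    """Return a list of friendly error strings for the tree under *root*."""
    checker = NotebookChecker()
    checker.visit(root)
    return checker.errors
```

Because unknown elements are passed over rather than rejected, this style validates only the subset the GUI cares about while leaving other Docbook markup alone, which is one way to soften the downside mentioned above.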

As for nice reporting of syntax errors:

* xmllint(1)

* You could try one of the XML parsers that come with Python whenever lxml fails to parse the file. xml.parsers.expat seems to report a reasonable amount of information about the error. See the attached file. It gives output like this:

[test]$ ./niceerrs.py invalid.nbk
   26      <ipython-cell type="input" number="5">
   27      </ipython-cell>
   28      </ipython-cell>
            ^- mismatched tag
   29      <ipython-cell type="stdout" number="5">
   30      </ipython-cell>
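The contents of niceerrs.py aren't reproduced in this ticket; the following is an independent sketch of the same idea using only xml.parsers.expat, assuming the whole file fits in memory. `ExpatError.lineno` is 1-based, `ExpatError.offset` is a 0-based column, and `ErrorString(code)` returns the short message such as "mismatched tag". The function name `describe_syntax_error` and the exact layout are hypothetical, so its output only resembles the sample above.

```python
import xml.parsers.expat
from xml.parsers.expat import ErrorString, ExpatError

def describe_syntax_error(text, context=2):
    """Return a report of the first XML syntax error in *text*, showing
    *context* lines on each side, or None if expat parses it cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True: this is the final chunk
    except ExpatError as err:
        lines = text.split("\n")
        first = max(err.lineno - context, 1)
        last = min(err.lineno + context, len(lines))
        report = []
        for number in range(first, last + 1):
            prefix = "%5d      " % number
            report.append(prefix + lines[number - 1])
            if number == err.lineno:
                # Put a caret under the column where expat stopped.
                report.append(" " * (len(prefix) + err.offset)
                              + "^- " + ErrorString(err.code))
        return "\n".join(report)
    return None
```

In nbshell this would only run after lxml has already refused the file, so the extra parse costs nothing in the common case.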

08/16/05 10:26:27 changed by antont

  • status changed from new to assigned.

I'll adopt niceerrs as a notebook method for giving users feedback in nbshell, probably tomorrow.

02/14/07 00:58:53 changed by fperez

  • priority changed from normal to lowest.