Bonus Question Blog!
First thoughts: let's step through the program execution.
This trace follows the program in general. If I go down a path, it is because the initializer was explicitly called (I don't say so every time, for the sake of space).
- Python runs the '__main__' block and calls the HtmlTreeViewer function
- Creates an HtmlTreeParser with "htmlString" as a parameter. One of the arguments is formatter.NullFormatter( ), which makes for some interesting reading; it is found in /usr/share/jython/Lib/formatter.py
- HtmlTreeParser is a subclass of htmllib.HTMLParser
- Inside htmllib is a class HTMLParser which in turn is a subclass of sgmllib.SGMLParser
- In the file sgmllib there is SGMLParser which calls markupbase.ParserBase.reset(self)
- I haven't found where HtmlTagTreeModel() is located yet, but I don't think it is vital for the error at this point.
- Call function feed(initialText) which is a part of sgmllib.py
- feed takes in the raw data and begins to process it. On line 138 we call self.parse_endtag(i), where i=4279 on the error-producing run.
- parse_endtag runs along, does its thing, and calls finish_endtag, again passing the value 4279
- finish_endtag executes the 'else' statement:
- "if tag not in self.stack" returns true
- tag='p' self.stack = ['html', 'body']
- The try statement throws the AttributeError and the program calls unknown_endtag. In sgmllib the body of that method is just 'pass', so the call should go all the way up to our code!
- Our code gets a call to unknown_endtag which just passes the buck to endtag()
- endtag() runs through all of the tags in the tagStack and checks for a match. If there is no match, it 'pushes' the popped tag back onto the previously popped tag until a match is found. But for our 'p', no match is found, and we cannot pop yet another tag (we are popping from the previously popped, i.e. the last, tag in the tagStack). Note: I am saying "pushing" and "popping", but the code uses "addChildren" etc.
- So, because that last tag does not have another child, in the quest to find the end tag for 'p', we attempt to access the empty children's self.children
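The dispatch part of the trace can be sketched in a few lines. This is a simplified stand-in, not the sgmllib source: the class names MiniParser and TreeParser and the log list are made up for illustration, and only the stack contents ['html', 'body'] and the tag 'p' come from the failing run described above.

```python
class MiniParser(object):
    """Simplified stand-in for the end-tag dispatch in the trace (not sgmllib itself)."""

    def __init__(self):
        # State observed on the failing run: 'p' was never pushed.
        self.stack = ['html', 'body']
        self.log = []

    def finish_endtag(self, tag):
        if tag not in self.stack:
            try:
                # Look for a dedicated handler such as end_p.
                getattr(self, 'end_' + tag)
            except AttributeError:
                # No handler exists, so fall back to unknown_endtag.
                self.unknown_endtag(tag)
            else:
                self.log.append('unbalanced: ' + tag)
            return
        # Tag is open: pop back down to it (handler calls omitted here).
        while self.stack[-1] != tag:
            self.stack.pop()
        self.stack.pop()
        self.log.append('closed: ' + tag)

    def unknown_endtag(self, tag):
        # Base-class behaviour in the trace: do nothing.
        pass


class TreeParser(MiniParser):
    """Hypothetical subclass playing the role of our own parser."""

    def unknown_endtag(self, tag):
        # Our override just passes the buck to endtag().
        self.endtag(tag)

    def endtag(self, tag):
        self.log.append('endtag called for: ' + tag)


parser = TreeParser()
parser.finish_endtag('p')
print(parser.log)
```

With 'p' missing from the stack, the only thing that happens is the call into our own endtag(), which matches the point in the trace where the trouble starts.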
Future Work
- Find out how to fix it (obviously!)
- Figure out why 'p' is giving us so much trouble
- When I open the source for www.jython.org in a simple editor, the first </p> end tag (the one we are looking for, I think) is shown in red (why?)
- Analyze how tags are added to the tagStack and why our 'p' doesn't have a corresponding one.
- Determine where the fix should be: starttag, unknown_endtag, or other?
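One possible direction for the end-tag side, as a rough sketch only: check whether a matching open tag exists before popping anything, and ignore the end tag if it does not. Node, TreeBuilder, start_tag, end_tag and tag_stack are hypothetical names for illustration; the real HtmlTreeParser and its addChildren code are not reproduced here, so this may not map directly onto it.

```python
class Node(object):
    """Hypothetical tree node: a tag name plus a list of child nodes."""

    def __init__(self, name):
        self.name = name
        self.children = []


class TreeBuilder(object):
    """Hypothetical builder that tolerates end tags with no matching start tag."""

    def __init__(self):
        self.root = Node('#root')
        self.tag_stack = [self.root]

    def start_tag(self, name):
        # Attach the new node to the innermost open tag, then open it.
        node = Node(name)
        self.tag_stack[-1].children.append(node)
        self.tag_stack.append(node)

    def end_tag(self, name):
        # Search from the innermost open tag outwards; never pop the root.
        for i in range(len(self.tag_stack) - 1, 0, -1):
            if self.tag_stack[i].name == name:
                # Close the match and everything opened inside it.
                del self.tag_stack[i:]
                return True
        # No open tag matches: leave the stack untouched.
        return False


builder = TreeBuilder()
builder.start_tag('html')
builder.start_tag('body')
matched = builder.end_tag('p')  # stray </p>, as in the failing run
open_names = [node.name for node in builder.tag_stack]
print(matched, open_names)
```

The design choice here is to look before popping, so a stray </p> never empties the stack or reaches into a node that has no children. Whether the guard belongs in starttag, unknown_endtag, or somewhere else is still the open question.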