Open Bug 1296313 Opened 8 years ago Updated 7 months ago

Show non ASCII spaces as their HTML entities in the markup-view

Categories

(DevTools :: Inspector, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: pbro, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: parity-chrome)

Attachments

(3 files)

Attached file entity.html
- Open the attached HTML test case in a tab,
- Open the devtools inspector,
- Notice how the 3 HTML entities present in the source of the page are shown as ≡, < and > respectively

The inspector shows the DOM tree, not the authored markup as it came from the server. So showing the evaluated entities makes sense from that point of view.

However, this can lead to a lot of confusion for users. We've had reports of people thinking that the <input /> shown in the inspector was an element, when it really is just text in a text node.

And we've had reports of people using &emsp; for instance and not seeing it in the inspector, and therefore not understanding where the extra space was coming from.

So far we've WONTFIX'd these because there was no way for devtools to get to the authored content. But I'd like to use this bug to investigate whether we could fix this.

The inspector uses an inIDeepTreeWalker instance to walk the DOM [1] and therefore display nodes in the UI. At this stage, the HTML has already been parsed and we only walk the DOM tree, we don't have access to the original content source, right?

Would there be a way to get, for each DOM node in the tree, either the original source text for that node, or location information that would let us find it in the HTML response?

We do something sort of similar for the CSS Rules that are shown in the inspector in the sidebar. We use the CSSOM to retrieve the list of rules, but we also get the location information for these rules and properties, so we then extract the authored text from the stylesheets.
Do you think something like this would be possible for the HTML panel?

[1] http://searchfox.org/mozilla-central/rev/ae78ab94fadabc89fc6258d03c4a1a70f763f43a/devtools/server/actors/inspector.js#2842-2865
See Also: → 967493
This would probably be a bit slow and memory hungry, so I think it should happen only when loading with devtools open.
But the parser could attach some extra information to nodes. The easiest way would be to use nsINode::SetProperty. What can be a bit tricky is detecting when data comes from a script, in which case the property the parser had set should be removed.


hsivonen might have some ideas here.
Flags: needinfo?(hsivonen)
(In reply to Patrick Brosset <:pbro> from comment #0)
> So far we've WONTFIX'd these because there was no way for devtools to get to
> the authored content. But I'd like to use this bug to investigate whether we
> could fix this.

... and be the ONLY browser manufacturer to have such a facility!
(In reply to Patrick Brosset <:pbro> from comment #0)
> However, this can lead to many confusions for users. We've had reports of
> people thinking that the <input /> shown in the inspector was an element,
> where it really is just the text in a text node.
> 
> And we've had reports of people using &emsp; for instance and not seeing it
> in the inspector, and therefore not understanding where the extra space was
> coming from.

I observe that all the characters cited as actually confusing are ones that the innerHTML getter escapes.
 
> So far we've WONTFIX'd these because there was no way for devtools to get to
> the authored content. But I'd like to use this bug to investigate whether we
> could fix this.

Addressing the confusion of users who lack conceptual clarity between the source and the DOM by showing the *actual* source is a total overkill if the confusion could be addressed by escaping just the few characters that the innerHTML getter escapes.
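
As a rough illustration of that alternative, here is a minimal sketch that escapes text node data the way the HTML serialization rules used by the innerHTML getter escape text content (&, U+00A0, < and >). The function name escapeTextLikeInnerHTML is hypothetical and is not an existing DevTools API.

```javascript
// Hypothetical helper (not part of DevTools): escape text node data the
// way the innerHTML getter's serialization escapes text content.
function escapeTextLikeInnerHTML(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or the entities added below get re-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeTextLikeInnerHTML("<input /> a\u00A0b & c"));
// prints: &lt;input /&gt; a&nbsp;b &amp; c
```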

> The inspector uses an inIDeepTreeWalker instance to walk the DOM [1] and
> therefore display nodes in the UI. At this stage, the HTML has already been
> parsed and we only walk the DOM tree, we don't have access to the original
> content source, right?

Right.

> Would there be a way to get, for each DOM node in the tree, either the
> original source text for that node, or a location information for us to find
> it in the HTML response?

At present, no.

> We do something sort of similar for the CSS Rules that are shown in the
> inspector in the sidebar. We use the CSSOM to retrieve the list of rules,
> but we also get the location information for these rules and properties, so
> we then extract the authored text from the stylesheets.
> Do you think something like this would be possible for the HTML panel?

Making the HTML parser attach location info to all parser-created DOM nodes is in theory possible. In terms of performance, this would have the cost of making the parser track the column in addition to tracking the line, the performance cost of transferring more data from the parser thread to the main thread and the post-parse memory cost of having that data in the DOM.
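
To make "location info" concrete, here is a small sketch of line and column tracking over a source string. It is only an illustration: it is not how the Gecko HTML parser works, and the locate function and the sample markup are made up for this example.

```javascript
// Hypothetical sketch: map a character offset in the source text to a
// 1-based line and column by counting newlines up to that offset.
function locate(source, offset) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === "\n") {
      line++;
      column = 1; // a newline starts a fresh line at column 1
    } else {
      column++;
    }
  }
  return { line, column };
}

const sampleSource = "<p>\n  a&emsp;b\n</p>";
const entityLocation = locate(sampleSource, sampleSource.indexOf("&emsp;"));
console.log(entityLocation); // { line: 2, column: 4 }
```

Storing such a pair per node is the extra data that would have to cross from the parser thread to the main thread and stay in memory afterwards.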

Column tracking exists in the Java original but has been removed from the C++ as a micro optimization, since Gecko didn't want to see the column info. Validator.nu effectively annotates the source text with character runs that correspond to tokens in order to be able to highlight erroneous tags in the source view. (Note the tag "</head>" being highlighted here: https://html5.validator.nu/?doc=https%3A%2F%2Fbug1296313.bmoattachments.org%2Fattachment.cgi%3Fid%3D8782474&showsource=yes )

If we did this only when loading with the dev tools open, chances are we wouldn't end up addressing the user confusion issue.
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> (In reply to Patrick Brosset <:pbro> from comment #0)
> > However, this can lead to many confusions for users. We've had reports of
> > people thinking that the <input /> shown in the inspector was an element,
> > where it really is just the text in a text node.
> > 
> > And we've had reports of people using &emsp; for instance and not seeing it
> > in the inspector, and therefore not understanding where the extra space was
> > coming from.
> 
> I observe that all the characters cited as actually confusing are ones that
> the innerHTML getter escapes.
>  
> > So far we've WONTFIX'd these because there was no way for devtools to get to
> > the authored content. But I'd like to use this bug to investigate whether we
> > could fix this.
> 
> Addressing the confusion of users who lack conceptual clarity between the
> source and the DOM by showing the *actual* source is a total overkill if the
> confusion could be addressed by escaping just the few characters that the
> innerHTML getter escapes.

That's basically what the 'names' option in bug 1277828 is meant to do. For the use cases mentioned so far the HTML entity display is enough and it is irrelevant how the characters were actually authored.

Sebastian
See Also: → 1277828
A couple of more thoughts:

 * The nbsp issue could be addressed by rendering space-like characters that aren't plain ASCII spaces with e.g. a gray background or maybe as a middle dot, similarly to how word processors optionally show non-printing characters.

 * To the extent users are confused by text nodes containing < or >, showing the actual source wouldn't fully address the confusion with e.g. the contents of script, iframe and textarea nodes. It might be a good idea to try to come up with some way of making users understand that they are looking at post-parse text nodes instead of trying to cater to the misconception that they are looking at pre-parse HTML.
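
The first idea above could be sketched as follows: split a text node's data into runs, flagging the runs made of non-ASCII space-like characters so a UI could give them a gray background or draw them as a middle dot. The character set and the splitForHighlight name are assumptions chosen for illustration; the set is not exhaustive.

```javascript
// Assumption: an illustrative, non-exhaustive set of space-like characters
// (no-break space, en space, em space, thin space).
const SPACE_LIKE = /[\u00A0\u2002\u2003\u2009]/;

// Hypothetical helper: group consecutive characters into segments and mark
// each segment as "special" when it consists of space-like characters.
function splitForHighlight(text) {
  const segments = [];
  for (const ch of text) {
    const special = SPACE_LIKE.test(ch);
    const last = segments[segments.length - 1];
    if (last && last.special === special) {
      last.text += ch; // extend the current run
    } else {
      segments.push({ text: ch, special }); // start a new run
    }
  }
  return segments;
}
```

A renderer could then wrap each segment whose special flag is true in a styled element and leave the other segments as plain text.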
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> A couple of more thoughts:
> 
>  * The nbsp issue could be addressed by rendering space-like characters that
> aren't plain ASCII spaces with e.g. a gray background or maybe as a middle
> dot similarly to how word processor optionally show non-printing characters.

Yep. This would be immensely beneficial (as I previously suggested).

>  * To the extent users are confused by text nodes containing < or >, showing
> the actual source wouldn't fully address the confusion with e.g. the
> contents of script, iframe and textarea nodes. It might be a good idea to
> try to come up with some way of making users understand that they are
> looking at post-parse text nodes instead of trying to cater to the
> misconception that they are looking at pre-parse HTML.

Well, as a first "stab", make it clear in the documentation that uncolorized text nodes, *whatever their content*, are just that, TEXT. There's a system I work on that uses textareas to host html templates:
<textarea id="template-id" appendto="something">
  <div> blah {macro} blah </div>
</textarea>

etc. So yes, this would be beneficial.

I realize that textnodes are already uncolored, but it needs to be made a formal definition of the tool's functionality (I believe).
Priority: -- → P3

Chrome DevTools seems to be doing what Henri said in comment 5:

> The nbsp issue could be addressed by rendering space-like characters that aren't plain ASCII spaces with e.g. a gray background or maybe as a middle dot similarly to how word processor optionally show non-printing characters.

This appears to be the best we can do here, and would help reduce people's confusion considerably.
So let me move this bug back to DevTools and see if we can get that done.

Type: defect → enhancement
Component: DOM: Core & HTML → Inspector
Product: Core → DevTools
Summary: Investigate a way to display authored text content in the DOM inspector → Show non ASCII spaces as their HTML entities in the markup-view

This is to avoid cases where people are confused because they wrote
&nbsp; in their HTML documents but are not seeing this in the markup
view and spend a long time debugging a problem that might otherwise
be solved in seconds had they seen the entity in the inspector.

Chrome DevTools does this too, and the list of characters to replace
was actually copied from their source code here:
https://github.com/ChromeDevTools/devtools-frontend/blob/57f033561fe0d35b51e1c2825a466852cdc1ee4e/front_end/elements/ElementsTreeOutline.js#L1583

Note that this is a bit special. The DOM is not the HTML. HTML is the
language used to write the document sent to the browser. The DOM
is the interpretation of that document after parsing. So entities
are gone from the DOM.

This only attempts to resurrect some entities because they are often
used. This does not actually load the HTML document from the
server again to find the real entities and copy them from there.
And in fact this might be resurrecting entities that were not even
part of the original HTML document.

But, this is what people need, and what Chrome does.
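
A minimal sketch of the replacement described above, assuming a small illustrative subset of characters. The actual list is the one in the Chrome DevTools source linked above and is not reproduced here; ENTITY_NAMES and showSpaceEntities are hypothetical names.

```javascript
// Illustrative subset only (assumption): the real character list is in the
// Chrome DevTools source linked above.
const ENTITY_NAMES = new Map([
  ["\u00A0", "nbsp"],
  ["\u2002", "ensp"],
  ["\u2003", "emsp"],
  ["\u2009", "thinsp"],
]);

// Hypothetical helper: replace each listed character in DOM text with the
// named entity that would produce it, purely for display.
function showSpaceEntities(text) {
  return text.replace(
    /[\u00A0\u2002\u2003\u2009]/g,
    (ch) => `&${ENTITY_NAMES.get(ch)};`
  );
}

console.log(showSpaceEntities("a\u00A0b\u2003c")); // prints: a&nbsp;b&emsp;c
```

Because the input is DOM text rather than the authored source, the output may show an entity even when the author typed the raw character.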

Keywords: parity-chrome

Note also that Firebug already allowed displaying whitespace characters: ASCII ones with a dot and non-ASCII ones as HTML entities, among other display forms. It did so by using the nsIEntityConverter API, which unfortunately doesn't seem to exist anymore.

Sebastian

Severity: normal → S3