1296313 - Show non ASCII spaces as their HTML entities in the markup-view

Reporter

Description

•

8 years ago

Attached file entity.html — Details

- Open the attached HTML test case in a tab,
- Open the devtools inspector,
- Notice how the 3 HTML entities present in the source of the page are shown as ≡, < and > respectively

The inspector shows the DOM tree, not the authored markup as it came from the server. So showing the evaluated entities makes sense from that point of view.

However, this can confuse users. We've had reports of people thinking that the <input /> shown in the inspector was an element, when it really is just text in a text node.

And we've had reports of people using &emsp; for instance and not seeing it in the inspector, and therefore not understanding where the extra space was coming from.

So far we've WONTFIX'd these because there was no way for devtools to get to the authored content. But I'd like to use this bug to investigate whether we could fix this.

The inspector uses an inIDeepTreeWalker instance to walk the DOM [1] and display nodes in the UI. At this stage, the HTML has already been parsed and we only walk the DOM tree, so we don't have access to the original content source, right?

Would there be a way to get, for each DOM node in the tree, either the original source text for that node, or location information that would let us find it in the HTML response?

We do something sort of similar for the CSS Rules that are shown in the inspector in the sidebar. We use the CSSOM to retrieve the list of rules, but we also get the location information for these rules and properties, so we then extract the authored text from the stylesheets.
Do you think something like this would be possible for the HTML panel?

[1] http://searchfox.org/mozilla-central/rev/ae78ab94fadabc89fc6258d03c4a1a70f763f43a/devtools/server/actors/inspector.js#2842-2865

Patrick Brosset <:pbro>

Reporter

Updated

•

8 years ago

Comment 1

•

8 years ago

This would probably be a bit slow and memory hungry, so I think it should happen only when loading with devtools open.
But the parser could attach some extra information to nodes. The easiest way would be to use nsINode::SetProperty. What can be a bit tricky is detecting when data comes from a script, in which case the property the parser had set should be removed.

hsivonen might have some ideas here.

Flags: needinfo?(hsivonen)

Codacoder

Comment 2

•

8 years ago

(In reply to Patrick Brosset <:pbro> from comment #0)
> So far we've WONTFIX'd these because there was no way for devtools to get to
> the authored content. But I'd like to use this bug to investigate whether we
> could fix this.

... and be the ONLY browser manufacturer to have such a facility!

Henri Sivonen (:hsivonen)

Comment 3

•

8 years ago

(In reply to Patrick Brosset <:pbro> from comment #0)
> However, this can lead to many confusions for users. We've had reports of
> people thinking that the <input /> shown in the inspector was an element,
> where it really is just the text in a text node.
> 
> And we've had reports of people using &emsp; for instance and not seeing it
> in the inspector, and therefore not understanding where the extra space was
> coming from.

I observe that all the characters cited as actually confusing are ones that the innerHTML getter escapes.
 
> So far we've WONTFIX'd these because there was no way for devtools to get to
> the authored content. But I'd like to use this bug to investigate whether we
> could fix this.

Addressing the confusion of users who lack conceptual clarity between the source and the DOM by showing the *actual* source is total overkill if the confusion could be addressed by escaping just the few characters that the innerHTML getter escapes.
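
As a rough illustration of the escaping referred to here, the following is a minimal Python sketch of the text-node escaping applied when serializing HTML fragments, which is what the innerHTML getter relies on. The function name is hypothetical and this is not DevTools code; it only shows which characters are affected in text content.

```python
def escape_text_like_innerhtml(text: str) -> str:
    """Escape text-node content the way HTML fragment serialization does.

    Hypothetical helper for illustration only. In text (non-attribute)
    mode, four characters are escaped: "&", U+00A0, "<" and ">".
    """
    return (
        text.replace("&", "&amp;")       # must run first, or later "&" get re-escaped
            .replace("\u00a0", "&nbsp;")  # no-break space
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )


if __name__ == "__main__":
    print(escape_text_like_innerhtml("<input />"))
    print(escape_text_like_innerhtml("a\u00a0b"))
```

Under this rule a text node containing "<input />" would be displayed with &lt; and &gt;, and a no-break space would show up as &nbsp;. Other space-like characters, such as the em space (U+2003), pass through unchanged.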

> The inspector uses an inIDeepTreeWalker instance to walk the DOM [1] and
> therefore display nodes in the UI. At this stage, the HTML has already been
> parsed and we only walk the DOM tree, we don't have access to the original
> content source, right?

Right.

> Would there be a way to get, for each DOM node in the tree, either the
> original source text for that node, or a location information for us to find
> it in the HTML response?

At present, no.

> We do something sort of similar for the CSS Rules that are shown in the
> inspector in the sidebar. We use the CSSOM to retrieve the list of rules,
> but we also get the location information for these rules and properties, so
> we then extract the authored text from the stylesheets.
> Do you think something like this would be possible for the HTML panel?

Making the HTML parser attach location info to all parser-created DOM nodes is in theory possible. In terms of performance, this would have the cost of making the parser track the column in addition to tracking the line, the performance cost of transferring more data from the parser thread to the main thread and the post-parse memory cost of having that data in the DOM.

Column tracking exists in the Java original but has been removed from the C++ as a micro optimization, since Gecko didn't want to see the column info. Validator.nu effectively annotates the source text with character runs that correspond to tokens in order to be able to highlight erroneous tags in the source view. (Note the tag "</head>" being highlighted here: https://html5.validator.nu/?doc=https%3A%2F%2Fbug1296313.bmoattachments.org%2Fattachment.cgi%3Fid%3D8782474&showsource=yes )
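
To make the idea of parser-attached location info concrete, here is a small sketch using Python's standard-library html.parser rather than Gecko's parser. It records the line and column at which each text node starts, then uses that location to recover the authored text from the source. The class and function names are hypothetical, and the slicing assumes plain text content that ends at the next "<" (it would not hold for script or textarea contents).

```python
from html.parser import HTMLParser


class LocatingParser(HTMLParser):
    """Records (decoded_text, line, column) for every text node seen."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_nodes = []

    def handle_data(self, data):
        # getpos() returns a 1-based line number and a 0-based column offset.
        line, column = self.getpos()
        self.text_nodes.append((data, line, column))


def authored_slice(source: str, line: int, column: int) -> str:
    """Return the source text starting at (line, column) up to the next '<'."""
    lines = source.splitlines(keepends=True)
    start = sum(len(l) for l in lines[: line - 1]) + column
    end = source.find("<", start)
    return source[start:] if end < 0 else source[start:end]


if __name__ == "__main__":
    source = "<p>a&emsp;b</p>\n<div>x</div>"
    parser = LocatingParser()
    parser.feed(source)
    parser.close()
    for text, line, column in parser.text_nodes:
        print(repr(text), line, column, repr(authored_slice(source, line, column)))
```

The decoded text node holds the em space character itself, while the slice taken from the recorded location still contains the entity as authored. The extra bookkeeping per node is the kind of cost discussed above.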

If we did this only when loading with the dev tools open, chances are we wouldn't end up addressing the user confusion issue.

Flags: needinfo?(hsivonen)

Sebastian Zartner [:sebo]

Comment 4

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #3)
> (In reply to Patrick Brosset <:pbro> from comment #0)
> > However, this can lead to many confusions for users. We've had reports of
> > people thinking that the <input /> shown in the inspector was an element,
> > where it really is just the text in a text node.
> > 
> > And we've had reports of people using &emsp; for instance and not seeing it
> > in the inspector, and therefore not understanding where the extra space was
> > coming from.
> 
> I observe that all the characters cited as actually confusing are ones that
> the innerHTML getter escapes.
>  
> > So far we've WONTFIX'd these because there was no way for devtools to get to
> > the authored content. But I'd like to use this bug to investigate whether we
> > could fix this.
> 
> Addressing the confusion of users who lack conceptual clarity between the
> source and the DOM by showing the *actual* source is a total overkill if the
> confusion could be addressed by escaping just the few characters that the
> innerHTML getter escapes.

That's basically what the 'names' option in bug 1277828 is meant to do. For the use cases mentioned so far the HTML entity display is enough and it is irrelevant how the characters were actually authored.

Sebastian

Comment 5

•

8 years ago

A couple more thoughts:

 * The nbsp issue could be addressed by rendering space-like characters that aren't plain ASCII spaces with e.g. a gray background or maybe as a middle dot, similar to how word processors optionally show non-printing characters.

 * To the extent users are confused by text nodes containing < or >, showing the actual source wouldn't fully address the confusion with e.g. the contents of script, iframe and textarea nodes. It might be a good idea to try to come up with some way of making users understand that they are looking at post-parse text nodes instead of trying to cater to the misconception that they are looking at pre-parse HTML.
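
The middle-dot idea from the first point could be sketched as follows. This is a minimal Python illustration, not inspector code: the function name is hypothetical, and "space-like" is assumed here to mean Unicode space separators (general category Zs) other than the ASCII space.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def mark_non_ascii_spaces(text: str, marker: str = MIDDLE_DOT) -> str:
    """Replace space separators other than the ASCII space with a visible marker."""
    return "".join(
        marker if ch != " " and unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


if __name__ == "__main__":
    # The no-break space and the em space become visible; the ASCII space does not.
    print(mark_non_ascii_spaces("a\u00a0b c\u2003d"))
```

Tabs and newlines are control characters rather than space separators, so this sketch leaves them untouched.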

Codacoder

Comment 6

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #5)
> A couple of more thoughts:
> 
>  * The nbsp issue could be addressed by rendering space-like characters that
> aren't plain ASCII spaces with e.g. a gray background or maybe as a middle
> dot similarly to how word processor optionally show non-printing characters.

Yep. This would be immensely beneficial (as I previously suggested).

>  * To the extent users are confused by text nodes containing < or >, showing
> the actual source wouldn't fully address the confusion with e.g. the
> contents of script, iframe and textarea nodes. It might be a good idea to
> try to come up with some way of making users understand that they are
> looking at post-parse text nodes instead of trying to cater to the
> misconception that they are looking at pre-parse HTML.

Well, as a first "stab", make it clear in the documentation that uncolorized text nodes, *whatever their content*, are just that: TEXT. There's a system I work on that uses textareas to host HTML templates:
<textarea id="template-id" appendto="something">
  <div> blah {macro} blah </div>
</textarea>

etc. So yes, this would be beneficial.

I realize that text nodes are already uncolored, but I believe this needs to be formally documented as part of the tool's behavior.

Hsin-Yi Tsai (she/her) [:hsinyi]

Updated

•

8 years ago

Priority: -- → P3

Patrick Brosset <:pbro>

Reporter

Comment 8

•

5 years ago

Chrome DevTools seems to be doing what Henri said in comment 5:

> The nbsp issue could be addressed by rendering space-like characters that aren't plain ASCII spaces with e.g. a gray background or maybe as a middle dot similarly to how word processor optionally show non-printing characters.

This appears to be the best we can do here, and would help reduce people's confusion considerably.
So let me move this bug back to DevTools and see if we can get that done.

Type: defect → enhancement

Component: DOM: Core & HTML → Inspector

Product: Core → DevTools

Summary: Investigate a way to display authored text content in the DOM inspector → Show non ASCII spaces as their HTML entities in the markup-view

Patrick Brosset <:pbro>

Reporter

Comment 9

•

5 years ago

Attached file Bug 1296313 - Show non-ASCII spaces as HTML entities in text nodes — Details

This is to avoid cases where people are confused because they wrote &nbsp;
in their HTML documents but are not seeing it in the markup
view and spend a long time debugging a problem that might otherwise
be solved in seconds had they seen the entity in the inspector.

Chrome DevTools does this too, and the list of characters to replace
was actually copied from their source code here:
https://github.com/ChromeDevTools/devtools-frontend/blob/57f033561fe0d35b51e1c2825a466852cdc1ee4e/front_end/elements/ElementsTreeOutline.js#L1583

Note that this is a bit special. The DOM is not the HTML. HTML is the
language used to write the document sent to the browser. The DOM
is the interpretation of that document after parsing. So entities
are gone from the DOM.

This only attempts to resurrect some entities because they are often
used. It does not actually load the HTML document from the
server again to find the real entities and copy them from there.
In fact, this might resurrect entities that were not even
part of the original HTML document.

But, this is what people need, and what Chrome does.
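
For illustration, the replacement described above could look roughly like this Python sketch. The set of code points is a small hypothetical sample, not the list copied from Chrome DevTools, and the entity names come from Python's html.entities table rather than from any DevTools code.

```python
from html.entities import codepoint2name

# Hypothetical sample of space-like characters to display as entities.
SPACE_LIKE = {0x00A0, 0x2002, 0x2003, 0x2009}


def show_space_entities(text: str) -> str:
    """Replace selected non-ASCII spaces with their named HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in SPACE_LIKE and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(show_space_entities("a\u00a0b\u2003c d"))
```

Because the mapping works from the character alone, a no-break space is shown as &nbsp; whether or not the author actually typed that entity, which matches the caveat above.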

Patrick Brosset <:pbro>

Reporter

Updated

•

5 years ago

Keywords: parity-chrome

Sebastian Zartner [:sebo]

Comment 10

•

5 years ago

Attached image Firebug showing whitespace characters.png — Details

Note that Firebug already allowed displaying whitespace characters: ASCII ones as a dot and non-ASCII ones as HTML entities, among other display forms. It did so by using the nsIEntityConverter API, which unfortunately doesn't seem to exist anymore.

Sebastian

Sebastian Zartner [:sebo]

Updated

•

5 years ago

Blocks: firebug-gaps

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

entity.html, 8 years ago, Patrick Brosset <:pbro>, 193 bytes, text/html
Bug 1296313 - Show non-ASCII spaces as HTML entities in text nodes, 5 years ago, Patrick Brosset <:pbro>, 47 bytes, text/x-phabricator-request
Firebug showing whitespace characters.png, 5 years ago, Sebastian Zartner [:sebo], 26.64 KB, image/png

Categories

(DevTools :: Inspector, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: pbro, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: parity-chrome)

Crash Data

Security

(public)

User Story

Attachments

(3 files)
