T368336 Select a chart library

		Status	Subtype	Assigned	Task
		Open		None	T368338 Document chart library selection
		Resolved		Catrope	T368336 Select a chart library

Catrope created this task. Jun 25 2024, 2:02 AM

Catrope mentioned this in T368338: Document chart library selection. Jun 25 2024, 2:06 AM

Note: If charts are a client-side JS-only feature, they will not be shown in various places such as PDF exports or apps. We first need to consider whether to render charts server-side. Having a client-only feature is not really a good idea.

Snowmanonahoe subscribed. Jun 25 2024, 4:30 PM

In T368336#9920402, @Bugreporter wrote:

Note: If charts are a client-side JS-only feature, they will not be shown in various places such as PDF exports or apps. We first need to consider whether to render charts server-side. Having a client-only feature is not really a good idea.

We plan to start with only rendering charts server-side, and adding client-side enhancements at a later stage.

JJMC89 edited parent tasks, added: T368338: Document chart library selection; removed: T368335: [Epic] Make and document key blocking decisions for the Charts project. Jun 26 2024, 2:03 AM

Nemoralis subscribed. Jun 26 2024, 2:06 AM

Some other charting libraries:

RGraph
TeeChart
Plotly.js
Chart.js (HTML canvas only; canvas output can easily be converted to raster images, but not to vector ones)
Mermaid
DataDraw (cf. T338098, but it appears to have been inactive since 2023)

Another alternative is not to use an existing library at all, and instead to build a service that generates SVG from scratch.
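To give a sense of what the from-scratch route involves, here is a minimal sketch (not a proposed design) that turns a list of labelled values into a static SVG bar chart using only the Python standard library. The function name `render_bar_chart_svg` and its layout parameters are hypothetical. Everything a library would normally provide (axes, ticks, legends, i18n, RTL layout, accessibility patterns) would still have to be built on top of something like this.

```python
from xml.sax.saxutils import escape


def render_bar_chart_svg(data, width=400, height=200, gap=10, label_space=20):
    """Render (label, value) pairs as a bare-bones SVG bar chart.

    Hypothetical helper for illustration: values must be non-negative, and there
    are no axes, ticks, legend or RTL handling.
    """
    if not data:
        raise ValueError("need at least one data point")
    if any(value < 0 for _, value in data):
        raise ValueError("this sketch only handles non-negative values")
    max_value = max(value for _, value in data) or 1  # avoid dividing by zero
    plot_height = height - label_space  # vertical space left for the bars
    bar_width = (width - gap * (len(data) + 1)) / len(data)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}" role="img">'
    ]
    for i, (label, value) in enumerate(data):
        x = gap + i * (bar_width + gap)
        bar_height = plot_height * value / max_value
        # Escape labels so user input cannot inject markup into the SVG.
        text = escape(str(label))
        parts.append(
            f'<rect x="{x:.1f}" y="{plot_height - bar_height:.1f}" width="{bar_width:.1f}" '
            f'height="{bar_height:.1f}"><title>{text}: {value}</title></rect>'
        )
        parts.append(
            f'<text x="{x + bar_width / 2:.1f}" y="{height - 5}" text-anchor="middle">{text}</text>'
        )
    parts.append("</svg>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_bar_chart_svg([("2022", 3), ("2023", 5), ("2024", 8)]))
```

Even this toy version has to deal with escaping user-supplied text, which hints at how much sanitization and layout work a full in-house renderer would take on.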

Here are some high-level criteria to consider when evaluating a library:

Must-have

Open source
Actively being developed
Stable and likely to be supported for a long time
Ability to server-render charts (preferably as SVGs)
Support for layering more interactivity (through CSS animations and/or JS hydration client-side)
Geo/map-based visualizations (minimum 2D)
Customizable visual design (like color palette and patterns/textures for better accessibility)
i18n-ready (essentially: can we support multilingualism with it?)
Intuitive / easy-to-learn syntax for building charts

Nice-to-have

Support for hyperlinks (so we could do things like link to articles from chart labels or descriptions)
Time-based animations
3D visualizations

I'm not going into the details of the differences in very specific types of visualizations, like exactly what interactivity and customization is possible. Obviously, plotting data on common charts such as line, bar, and pie charts is the baseline. All the libraries considered above are quite powerful and support a wide range of visualization options, and most of them support the nice-to-have features anyway. Things like the level of support for "good" server rendering/hydration and "intuitive" syntax matter more, but are also squishier concepts. At first glance, Apache ECharts looks promising, especially its server-side rendering and different levels of hydration.

I welcome edits/additions to this list -- I'm sure I've missed things. I anticipate we'll come up with new criteria once we start our analysis.

DTorsani-WMF subscribed. Jun 28 2024, 5:16 PM

LGoto triaged this task as High priority. Jul 1 2024, 5:42 PM

LGoto set the point value for this task to 8.

LGoto moved this task from Needs Triage to Estimated on the Charts board.

@CCiufo-WMF Could you add a few concrete examples of charts that we would want to be able to build?

LGoto edited projects, added Charts (Sprint 1); removed Charts. Jul 1 2024, 6:00 PM

Catrope moved this task from Incoming to Ready for Dev on the Charts (Sprint 1) board. Jul 1 2024, 6:02 PM

Catrope claimed this task. Jul 1 2024, 6:06 PM

Catrope updated Other Assignee, added: aude.

Customizable visual design (like color palette and patterns/textures for better accessibility)

Pre-work that will eventually lead to identifying a color palette for charts is happening in T360494.

Jrbranaa subscribed. Jul 1 2024, 6:20 PM

Catrope moved this task from Ready for Dev to Doing on the Charts (Sprint 1) board. Jul 2 2024, 7:08 PM

Additional technical considerations we should look at (thank you @Jdlrobson for these ideas):

Bundle size of the library (if/when we need to load it client-side)
Output size
Ability to integrate with Vue
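For the output-size criterion, one rough way to compare libraries is to render the same chart with each candidate and compare the resulting SVG sizes, both raw and gzip-compressed (the latter is closer to what is actually transferred). A minimal sketch follows; it assumes the SVG strings have already been produced by each library's own renderer, and the helper names are made up for illustration.

```python
import gzip


def output_size(svg):
    """Return the raw (UTF-8) and gzip-compressed size of an SVG document in bytes."""
    raw = svg.encode("utf-8")
    return {"raw_bytes": len(raw), "gzip_bytes": len(gzip.compress(raw))}


def compare_output_sizes(rendered):
    """Rank candidates by compressed output size, smallest first.

    `rendered` maps a library name to the SVG it produced for the same input
    chart; producing those SVGs is outside the scope of this sketch.
    """
    sizes = {name: output_size(svg) for name, svg in rendered.items()}
    return sorted(sizes.items(), key=lambda item: item[1]["gzip_bytes"])
```

Bundle size could be compared the same way by reading each library's minified build file instead of an SVG string.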

Catrope added a subscriber: Jdlrobson. Jul 3 2024, 8:35 PM

Wostr subscribed. Thu, Jul 4, 6:01 PM

Prototyperspective subscribed. Mon, Jul 8, 9:46 PM

Krinkle subscribed. Tue, Jul 9, 7:00 PM

Catrope updated the task description. Wed, Jul 10, 7:00 PM

We began with a quick look at each library to see if it looked potentially promising, and this ruled out most of the options:

TeeChart: is not open source
DataDraw: appears to be inactive; its documentation reference page says "In preparation (Mar '23)", and it's now July 2024
Chart.js: does not appear to support server-side rendering
Plotly.js: does not appear to support server-side rendering
RGraph: server-side rendering is not built in, only supported through PhantomJS. Documentation leaves how to implement that as an exercise for the reader.
Our World in Data grapher: documentation has a disclaimer saying it's not designed for external reuse. Appears to require React, which we don't use anywhere else.
D3: very low-level. Not intended as a high-level chart library, but as something for high-level chart libraries to be built on top of, or to build one-off interactive visualizations with. Observable Plot and Vega are both built on top of D3, so we should consider those instead.
Observable Plot: unclear if server-side rendering is supported. Maps don't appear to be supported. The website appears to try to obscure the fact that the library is open source and lead you to buying their hosted product instead, which raises concerns about whether we can rely on it to be maintained as open source long-term.
Mermaid: mainly for flow charts and other process-related things. Support for the kinds of charts we would want is basic. No support for maps. Could be useful for specific chart types in the future (e.g. timelines, flow charts).

That leaves just Apache eCharts and Vega, which we analyzed in more detail, based on the criteria in T368336#9935313 and other things that we found while looking at these libraries.

Criterion	eCharts	Vega
Open source	Yes	Yes
Actively being developed	Yes	Yes
Stable and likely to be supported for a long time	Yes	Yes (previously had 3 breaking releases in 3 years, but that was 5 years ago)
Ability to server-render charts as SVGs	Yes	Yes
Support for layering more interactivity	Yes (with some CSS animations / hover effects, but mostly client-side JS)	Yes (only with client-side JS)
Geo/map-based visualization	Yes	Yes
Customizable visual design	Yes	Yes
Labels translated into RTL languages work	Yes	Yes
Intuitive / easy-to-learn syntax for defining charts	Yes	No for Vega itself (syntax is very verbose); Yes for Vega-Lite, which has simpler syntax
Hyperlink support	Partial (can link the entire chart title, but not parts of it)	No
Time-based animations	Yes	No (this exists, but is not integrated upstream)
3D visualizations	Yes	No
Hover effects in server-side rendered SVGs	Limited	No
Size of full library (minified)	662 KB	515 KB
Supports tree-shaking	Yes	No?
Comes with CLI for server-side rendering	No (but we can write one ourselves)	Yes
Supports TopoJSON	No (but we can convert TopoJSON to GeoJSON with a separate CLI utility)	Yes
Has a history of serious security vulnerabilities	No	Yes
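On the TopoJSON row: the conversion to GeoJSON is mostly mechanical, which is why a separate utility is enough (for example, the topojson-client package ships a `topo2geo` command). The sketch below shows the core of it, following the TopoJSON format: quantized arcs are delta-encoded and mapped back through `transform.scale`/`transform.translate`, and a negative arc index `~i` means arc `i` reversed. It is a simplified illustration, not a replacement for that tool: it skips `null` geometries and bounding boxes, and the function names are hypothetical.

```python
def _transform_point(transform, position):
    """Map quantized integer coordinates back to real coordinates."""
    if transform is None:
        return list(position[:2])
    (sx, sy), (tx, ty) = transform["scale"], transform["translate"]
    return [position[0] * sx + tx, position[1] * sy + ty]


def _decode_arcs(topology):
    """Return every arc as absolute positions, undoing delta encoding if quantized."""
    transform = topology.get("transform")
    decoded = []
    for arc in topology["arcs"]:
        if transform is None:
            decoded.append([list(p[:2]) for p in arc])
            continue
        x = y = 0
        points = []
        for position in arc:
            x += position[0]  # quantized arcs store deltas from the previous point
            y += position[1]
            points.append(_transform_point(transform, (x, y)))
        decoded.append(points)
    return decoded


def _stitch(arcs, indexes):
    """Join arcs into one line; a negative index ~i refers to arc i reversed."""
    points = []
    for i in indexes:
        arc = arcs[~i][::-1] if i < 0 else arcs[i]
        # Consecutive arcs share an endpoint, so drop the duplicate.
        points.extend(arc[1:] if points else arc)
    return points


def _geometry(topology, arcs, obj):
    kind = obj["type"]
    transform = topology.get("transform")
    if kind == "GeometryCollection":
        return {"type": kind, "geometries": [_geometry(topology, arcs, g) for g in obj["geometries"]]}
    if kind == "Point":  # points are quantized but not delta-encoded
        coordinates = _transform_point(transform, obj["coordinates"])
    elif kind == "MultiPoint":
        coordinates = [_transform_point(transform, p) for p in obj["coordinates"]]
    elif kind == "LineString":
        coordinates = _stitch(arcs, obj["arcs"])
    elif kind in ("MultiLineString", "Polygon"):
        coordinates = [_stitch(arcs, part) for part in obj["arcs"]]
    elif kind == "MultiPolygon":
        coordinates = [[_stitch(arcs, ring) for ring in polygon] for polygon in obj["arcs"]]
    else:
        raise ValueError(f"unsupported geometry type: {kind}")
    return {"type": kind, "coordinates": coordinates}


def topology_to_geojson(topology, object_name):
    """Convert one named TopoJSON object into a GeoJSON FeatureCollection."""
    arcs = _decode_arcs(topology)
    obj = topology["objects"][object_name]
    members = obj["geometries"] if obj["type"] == "GeometryCollection" else [obj]
    features = []
    for member in members:
        feature = {
            "type": "Feature",
            "properties": member.get("properties", {}),
            "geometry": _geometry(topology, arcs, member),
        }
        if "id" in member:
            feature["id"] = member["id"]
        features.append(feature)
    return {"type": "FeatureCollection", "features": features}
```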

Based on the above, we have decided to use eCharts for now. However, we're not ruling out using Vega in the future just yet: if we experience issues with eCharts and think Vega might be better, we might switch to it later; or we might build most charts in eCharts but use Vega (or Mermaid, see above) for certain specific chart types if it turns out to be much better than eCharts for those specific things.

CCiufo-WMF updated the task description. Wed, Jul 10, 10:14 PM

+1 to this decision! But I wanted to mention

Chart.js: does not appear to support server-side rendering

https://www.npmjs.com/package/chartjs-node exists, and there's red-agate-svg-canvas if you want SVGs.

Nice pick

Hi,
At first glance this library looks really great. The project is mature, backed by the solid Apache trademark, and presumably has large stacks of money supplying it.

Has anyone considered Apache ECharts from an ethics / censorship / human rights perspective? Does MediaWiki really want to include and support this project in its codebase?

Maybe it would be a good idea to use a fork of this software, with a detailed independent audit regarding the above-mentioned aspects and data security. Still, forking it would "support" the questionable sponsors of this project.

The project's contributor/developer team doesn't inspire much confidence:

9 are from Baidu

https://en.wikipedia.org/wiki/Baidu#Censorship

In November 2022, Sustainalytics downgraded Baidu to "non-compliant" with the United Nations Global Compact principles due to complicity with censorship.

https://www.reuters.com/article/china-esg-downgrade-idUSL4N32320Q/

Tencent owns China's dominant messaging app WeChat, while Baidu and Weibo respectively run China's leading search engine and Twitter-like microblog services.
Content companies in China are required to comply with the country's strict censorship regime, which has tightened significantly in recent years. Platforms swiftly and regularly delete items, from complaints about COVID lockdowns to cryptic criticism of politics.

25 people out of 30 are from China.

https://en.wikipedia.org/wiki/Censorship_in_China

As of 2023, the World Press Freedom Index ranks China as the country with the second least press freedom in the world after North Korea.

https://en.wikipedia.org/wiki/Human_rights_in_China#Freedom_of_speech
https://en.wikipedia.org/wiki/Human_rights_in_China#Freedom_of_the_press

/edit 22-07-2024: order of text rearranged; hopefully it's clearer/

@Pietrasagh: What specific "ethics / censorship / human rights" concerns do you see in using FOSS-licensed chart rendering software hosted by the Apache Software Foundation, given only that a number of its developers or contributors were born or are located in a particular country (?; references for these statements are welcome)?

CCiufo-WMF mentioned this in T165118: Support Vega 5.0+. Thu, Jul 25, 3:31 PM

CCiufo-WMF mentioned this in T120319: Migrate graphs from Vega 2 to Vega 5 syntax on edit with VE.

CCiufo-WMF mentioned this in T335128: Update documentation and examples of Extension:Graph after deployment of Vega 5.

Vega/Vega-Lite author here. Vega does actually support hyperlinks via the href property. See for example https://vega.github.io/vega/docs/marks/rect/. Vega also supports timers (https://vega.github.io/vega/docs/event-streams/) which can make animated visualizations. Happy to talk through any specific needs you have from Vega to make it useful for Wikipedia.

I think Vega may make more sense than eCharts, since Vega is fully declarative and can therefore be sandboxed for safety more easily than a library that needs you to execute arbitrary code.

In T368336#10015903, @domoritz wrote:

Vega/Vega-Lite author here. Vega does actually support hyperlinks via the href property. See for example https://vega.github.io/vega/docs/marks/rect/. Vega also supports timers (https://vega.github.io/vega/docs/event-streams/) which can make animated visualizations. Happy to talk through any specific needs you have from Vega to make it useful for Wikipedia.

Happy to hear that these things are supported. I didn't realize that event streams contained timers, because I didn't see any documentation about animation specifically or any examples demonstrating this capability. Maybe this could be documented more clearly.

I think Vega may make more sense than eCharts, since Vega is fully declarative and can therefore be sandboxed for safety more easily than a library that needs you to execute arbitrary code.

How does eCharts need us to execute arbitrary code? As far as I can tell, it's just as declarative as Vega is.

Good point on the docs/examples for timers. One example (albeit not a chart in the strict sense) is this clock: https://vega.github.io/vega/examples/clock/. There is also this rotating globe: https://vega.github.io/vega/examples/earthquakes-globe/.

I'm not as familiar with eCharts but looking for example at https://echarts.apache.org/examples/en/editor.html?c=bar-waterfall2, the example passes a function to the formatter. Vega uses expressions for custom functions. These expressions always go through a custom parser that prevents dangerous code (that may send data or run loops) from being executed. IIRC we designed this originally for Wikipedia so that it could run arbitrary specifications client-side without exposing readers to any risks.
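The distinction here is between accepting executable code (an eCharts `formatter` can be a JavaScript function) and accepting expression strings that are parsed and checked against an allow-list before anything runs. The sketch below illustrates only that general allow-list technique, using Python's `ast` module on a tiny arithmetic language; it is not Vega's parser, and `safe_eval`, `ALLOWED_FUNCTIONS` and the chosen operators are made up for the example. A production sandbox would also need limits on execution time and memory, which this sketch does not impose.

```python
import ast
import math
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
            ast.Div: operator.truediv, ast.Mod: operator.mod}
_CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}
ALLOWED_FUNCTIONS = {"round": round, "abs": abs, "sqrt": math.sqrt}
# Every node type (including operator nodes) that may appear in an expression.
_ALLOWED_NODES = (ast.Expression, ast.Constant, ast.Name, ast.Load, ast.UnaryOp, ast.USub,
                  ast.BinOp, ast.Compare, ast.IfExp, ast.Call, *_BIN_OPS, *_CMP_OPS)


def _check(tree, variables):
    """Reject the whole expression up front if any node is not allow-listed."""
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
        if isinstance(node, ast.Name) and node.id not in variables and node.id not in ALLOWED_FUNCTIONS:
            raise ValueError(f"unknown name: {node.id}")
        if isinstance(node, ast.Compare) and len(node.ops) != 1:
            raise ValueError("chained comparisons are not allowed")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCTIONS and not node.keywords
        ):
            raise ValueError("only allow-listed functions may be called")


def _evaluate(node, variables):
    """Evaluate a tree that has already passed _check."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, variables)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise ValueError(f"{node.id!r} can only be called, not used as a value")
        return variables[node.id]
    if isinstance(node, ast.UnaryOp):  # only unary minus passes the check
        return -_evaluate(node.operand, variables)
    if isinstance(node, ast.BinOp):
        return _BIN_OPS[type(node.op)](_evaluate(node.left, variables), _evaluate(node.right, variables))
    if isinstance(node, ast.Compare):
        return _CMP_OPS[type(node.ops[0])](_evaluate(node.left, variables),
                                           _evaluate(node.comparators[0], variables))
    if isinstance(node, ast.IfExp):
        branch = node.body if _evaluate(node.test, variables) else node.orelse
        return _evaluate(branch, variables)
    # The check guarantees this is a call to an allow-listed function by name.
    return ALLOWED_FUNCTIONS[node.func.id](*[_evaluate(arg, variables) for arg in node.args])


def safe_eval(expression, variables):
    """Parse, check every node, and only then evaluate."""
    tree = ast.parse(expression, mode="eval")
    _check(tree, variables)
    return _evaluate(tree, variables)
```

For example, `safe_eval("round(value / 3)", {"value": 10})` is accepted, while anything involving attribute access, imports, lambdas or comprehensions is rejected before evaluation starts.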

Another example that uses code is https://echarts.apache.org/examples/en/editor.html?c=bar-race which uses custom JS code to run the animation. In Vega, you can specify the timer entirely within the declarative specification (all within JSON).
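To make the declarative alternative concrete, here is a sketch that assembles a small Vega 5 specification as plain data: bars link out through the `href` mark property, and a signal is updated by a `timer` event stream, so the animation lives entirely in the JSON. The property names follow the Vega documentation pages linked in this thread, but the spec has not been rendered here; the data, URLs and the `linked_bar_chart_spec` helper are placeholders.

```python
import json

# Placeholder data: labels, values and link targets are illustrative only.
ROWS = [
    {"label": "Bar chart", "value": 3, "url": "https://en.wikipedia.org/wiki/Bar_chart"},
    {"label": "Pie chart", "value": 5, "url": "https://en.wikipedia.org/wiki/Pie_chart"},
]


def linked_bar_chart_spec(rows):
    """Return a Vega 5 spec with hyperlinked bars and a timer-driven signal."""
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": 300,
        "height": 150,
        "signals": [
            {
                # Incremented once per second by a declarative timer event stream.
                "name": "tick",
                "value": 0,
                "on": [{"events": {"type": "timer", "throttle": 1000}, "update": "tick + 1"}],
            }
        ],
        "data": [{"name": "table", "values": rows}],
        "scales": [
            {"name": "x", "type": "band", "domain": {"data": "table", "field": "label"},
             "range": "width", "padding": 0.1},
            {"name": "y", "type": "linear", "domain": {"data": "table", "field": "value"},
             "range": "height", "nice": True},
        ],
        "marks": [
            {
                "type": "rect",
                "from": {"data": "table"},
                "encode": {
                    "enter": {
                        "x": {"scale": "x", "field": "label"},
                        "width": {"scale": "x", "band": 1},
                        "y": {"scale": "y", "field": "value"},
                        "y2": {"scale": "y", "value": 0},
                        # Clicking a bar should follow the URL stored in its data row.
                        "href": {"field": "url"},
                    },
                    # Depends on the tick signal, so the bars should pulse each second.
                    "update": {"fillOpacity": {"signal": "tick % 2 ? 0.6 : 1"}},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(linked_bar_chart_spec(ROWS), indent=2))
```

Note that `"tick + 1"` and `"tick % 2 ? 0.6 : 1"` are themselves Vega expressions, which is exactly the layer whose safety is debated further down in this thread.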

I see that the focus for now is on server-side but at some point it may make sense to also support client-side rendering again especially if you want interactivity. I also saw that the focus is on implementing specific charts but at some point you might want to offer a fallback to a fully customizable language and then Vega may make sense.

I'd be happy to jump on a video call if you want to discuss any other concerns or questions you have.

I would add one more argument in favour of Vega: converting existing graph code would be much easier with Vega than with a new library, because all existing code already uses Vega syntax.

In T368336#10017066, @Pamputt wrote:

I would add one more argument in favour of Vega: converting existing graph code would be much easier with Vega than with a new library, because all existing code already uses Vega syntax.

They are using Vega 2 syntax, which is different from the syntax of the more recent Vega 5.

I'd also note that the Vega dependency was the primary reason we disabled ext:Graph (twice), and that while Vega's expressions layer has since been hardened, it likely still poses more risk for our use cases than other options.

I’m not sure any other library supports custom but restricted expressions for extended functionality. If it’s a concern, we could probably add a way to disable these custom expressions to restrict functionality but make the library safer for custom charts.

In T368336#10017856, @domoritz wrote:

I’m not sure any other library supports custom but restricted expressions for extended functionality. If it’s a concern, we could probably add a way to disable these custom expressions to restrict functionality but make the library safer for custom charts.

T336595#8848425 indicated that safe-vega is not really a thing.

If I understand correctly, there are multiple phases here.

Phase 1: a custom charting API powered by a library such as eCharts or Vega. Here you can use whichever library you like (though the choice should be influenced by what specification format makes sense in phase 2), since you don’t expose the full API to authors.

Phase 2 (later): potentially expand flexibility and customizability by giving authors more access to the underlying library. This is where we need a declarative API with restrictions for authors, so that viewers don’t get exposed to potentially dangerous code.

I’m suggesting that in phase 1 you use Vega as is, but for phase 2 you could consider a restricted version depending on your security needs.

In T368336#10016026, @domoritz wrote:

Good point on the docs/examples for timers. One example (albeit not a chart in the strict sense) is this clock: https://vega.github.io/vega/examples/clock/. There is also this rotating globe: https://vega.github.io/vega/examples/earthquakes-globe/.

Thanks, those are helpful examples!

I'm not as familiar with eCharts but looking for example at https://echarts.apache.org/examples/en/editor.html?c=bar-waterfall2, the example passes a function to the formatter. Vega uses expressions for custom functions. These expressions always go through a custom parser that prevents dangerous code (that may send data or run loops) from being executed. IIRC we designed this originally for Wikipedia so that it could run arbitrary specifications client-side without exposing readers to any risks.

Another example that uses code is https://echarts.apache.org/examples/en/editor.html?c=bar-race which uses custom JS code to run the animation. In Vega, you can specify the timer entirely within the declarative specification (all within JSON).

I see what you mean: Vega supports animations declaratively using timers, whereas eCharts doesn't. Vega could really benefit from better documenting and advertising this feature, because when I was researching this the only thing I found was an MIT research paper with no evidence that it was ever upstreamed, while eCharts has bar race examples on its examples page (though as you say, on closer inspection those use code to periodically change the input data, not a built-in eCharts feature).

I see that the focus for now is on server-side but at some point it may make sense to also support client-side rendering again especially if you want interactivity. I also saw that the focus is on implementing specific charts but at some point you might want to offer a fallback to a fully customizable language and then Vega may make sense.

We do intend to support client-side rendering in the future for interactive graphs specifically, yes. However, we do not intend to offer direct access to the chart library, whether that's eCharts or Vega. (More about this in my next comment.)

In T368336#10016026, @domoritz wrote:

I also saw that the focus is on implementing specific charts but at some point you might want to offer a fallback to a fully customizable language and then Vega may make sense.

In T368336#10017930, @domoritz wrote:

Phase 2 (later): potentially expand flexibility and customizability by giving authors more access to the underlying library. This is where we need a declarative API with restrictions for authors, so that viewers don’t get exposed to potentially dangerous code.

Offering direct access to the underlying library to wiki users will not happen. It's not just that it isn't in our short-term plans, we consider it actively harmful for several reasons:

Security: we're currently rebuilding our charts system from scratch after disabling the old one over security issues that were directly caused by this. Even with the hardening Vega has done, we think it's too risky to allow direct user access to expressions in Vega, or to similar functionality in other libraries.
Schema migrations: libraries sometimes make breaking changes to the definition formats they accept (both eCharts and Vega have done this before). When that happens, migrating user-authored definitions (or user-authored template/Lua code that generates definitions) is really painful, takes a lot of work, and blocks upgrading to the new version of the library. We went through this with the Vega 2->5 upgrade, which involved writing our own code to remap Vega 2 definitions to the Vega 5 format, and was never finished. We don't want to do that again.
User-friendliness: in the old system, users rarely hand-wrote graph definitions; instead they used templates or Lua modules that took a simpler set of parameters and generated a graph definition. This is a clear sign that the definitions are too complex for users to write. In part this is because Vega's definition schema is especially verbose, and Vega-Lite fixes a lot of that, but even with Vega-Lite or eCharts, these definition syntaxes just aren't very user-friendly in general, and people would likely continue to rely on templating them.

For all these reasons, we will only support predefined graph types. This way, we are in control of the definition schema that users use and can make it simpler than the library's. We can do our own sanitization to make sure that user input doesn't make it anywhere dangerous, and library upgrades with breaking changes are much easier to handle. This also removes the need to have templates generate chart definitions, which makes managing them easier overall.
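As a rough illustration of the predefined-chart-type approach, the sketch below accepts a deliberately small, wiki-facing definition for a bar chart, validates and sanitizes it, and only then translates it into an eCharts option object. The input schema (`title`, `categories`, `series` entries with `name`/`values`), the length limit and the function name are hypothetical, not the Charts project's actual format. The output uses standard eCharts option keys (`xAxis.type: "category"`, `series[].type: "bar"`, and `aria.decal.show` for pattern fills). If eCharts makes a breaking change to those keys, only the final translation step needs updating, while user-authored definitions stay the same.

```python
import math

MAX_TEXT_LENGTH = 200  # hypothetical limit on user-supplied strings
ALLOWED_KEYS = {"title", "categories", "series"}
ALLOWED_SERIES_KEYS = {"name", "values"}


def _text(value, field):
    """Accept only short plain strings; anything else is rejected."""
    if not isinstance(value, str) or len(value) > MAX_TEXT_LENGTH:
        raise ValueError(f"{field} must be a string of at most {MAX_TEXT_LENGTH} characters")
    return value


def bar_chart_to_echarts(definition):
    """Validate a hypothetical wiki bar chart definition and map it to eCharts options."""
    unknown = set(definition) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if not isinstance(definition.get("categories"), list) or not isinstance(definition.get("series"), list):
        raise ValueError("categories and series must both be lists")
    categories = [_text(c, "category") for c in definition["categories"]]

    series = []
    for entry in definition["series"]:
        if not isinstance(entry, dict) or set(entry) != ALLOWED_SERIES_KEYS:
            raise ValueError("each series needs exactly a name and values")
        values = entry["values"]
        if not isinstance(values, list) or len(values) != len(categories):
            raise ValueError("each series needs one value per category")
        for v in values:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(v, bool) or not isinstance(v, (int, float)) or not math.isfinite(v):
                raise ValueError("series values must be finite numbers")
        series.append({"name": _text(entry["name"], "series name"), "type": "bar", "data": list(values)})

    # Library-specific keys start here: a breaking eCharts change means editing only this part.
    return {
        "title": {"text": _text(definition.get("title", ""), "title")},
        "aria": {"enabled": True, "decal": {"show": True}},  # pattern fills for accessibility
        "legend": {},
        "xAxis": {"type": "category", "data": categories},
        "yAxis": {"type": "value"},
        "series": series,
    }
```

On the server, the resulting object would then be handed to eCharts' SVG server-side renderer; in eCharts 5.3 and later this is documented as `echarts.init(null, null, {renderer: 'svg', ssr: true, width, height})` followed by `setOption(option)` and `renderToSVGString()`. Because user input never reaches the library except through this translation step, there is no path for functions such as a `formatter` callback to be injected.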

That all makes sense. Thanks for laying out all the considerations.

I’ll just add that one challenge with providing a new custom format for making charts is that you then have to supply documentation, examples, and issue triage for that new language. To make that manageable, it makes sense to keep the language small until the need for more features arises.

Select a chart library
Closed, Resolved, Public, 8 Estimated Story Points