
Select a chart library
Closed, Resolved | Public | 8 Estimated Story Points

Description

Details

Other Assignee
aude

Event Timeline

Note: If chart is client JS-only feature, it will not be shown in various places such as PDF export or apps. We need to first consider whether to render chart server side. Having a client-only feature is not really a good idea.

> Note: If chart is client JS-only feature, it will not be shown in various places such as PDF export or apps. We need to first consider whether to render chart server side. Having a client-only feature is not really a good idea.

We plan to start with only rendering charts server-side, and adding client-side enhancements at a later stage.

Some other charting libraries:

  • RGraph
  • Teechart
  • Plotly.js
  • Chart.js (HTML canvas only - canvas can be easily converted to raster images, but not vector ones)
  • Mermaid
  • DataDraw (cf T338098, but seems inactive since 2023)

Another alternative is not using an existing library at all and making a service to generate SVGs from scratch.
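As a rough illustration of what the "from scratch" route involves, here is a minimal sketch of a function that turns data into an SVG string without any DOM or charting library. Everything in it (the `BarDatum` shape, the `renderBarChartSvg` name, the layout constants and the fill color) is hypothetical and only meant to show the idea; a real service would also need axes, scales, i18n and accessibility work.

```typescript
// Hypothetical data shape for this sketch; not taken from any library.
interface BarDatum {
  label: string;
  value: number;
}

// Escape text so that labels cannot inject markup into the SVG output.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a simple bar chart as a standalone SVG string (no browser needed).
function renderBarChartSvg(data: BarDatum[], width = 400, height = 200): string {
  const labelSpace = 20; // room below the bars for the labels
  const plotHeight = height - labelSpace;
  // Largest value in the data; fall back to 1 to avoid dividing by zero.
  const maxValue = data.reduce((m, d) => Math.max(m, d.value), 0) || 1;
  const slot = data.length > 0 ? width / data.length : width;
  const barWidth = slot * 0.8;

  const bars = data.map((d, i) => {
    const barHeight = (Math.max(0, d.value) / maxValue) * plotHeight;
    const x = i * slot + (slot - barWidth) / 2;
    const y = plotHeight - barHeight;
    return (
      `<rect x="${x}" y="${y}" width="${barWidth}" height="${barHeight}" fill="#36c"/>` +
      `<text x="${x + barWidth / 2}" y="${height - 5}" text-anchor="middle" font-size="10">` +
      `${escapeXml(d.label)}</text>`
    );
  });

  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">${bars.join("")}</svg>`
  );
}
```

Even this toy version has to handle escaping and layout by hand, which hints at how much work a full in-house renderer would be compared with adopting a library.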

Here are some high-level criteria to consider for evaluating a library:

Must-have

  • Open source
  • Actively being developed
  • Stable and likely to be supported for a long time
  • Ability to server-render charts (preferably as SVGs)
  • Support for layering more interactivity (through CSS animations and/or JS hydration client-side)
  • Geo/map-based visualizations (minimum 2D)
  • Customizable visual design (like color palette and patterns/textures for better accessibility)
  • i18n ready (essentially, can we support multilingualism with it)
  • Intuitive / easy-to-learn syntax for building charts

Nice-to-have

  • Support for hyperlinks (so we could do things like link to articles from chart labels or descriptions)
  • Time-based animations
  • 3D visualizations

I'm not going into the details of the differences in very specific types of visualizations, like what exact interactivity and customization is possible. Obviously, plotting data on common charts like line, bar, pie, etc. is the baseline. All the libraries considered above are quite powerful and support a wide range of visualization options. Most of them support the nice-to-have stuff anyway. Things like the level of support for "good" server-rendering/hydration and "intuitive" syntax matter more, but are also squishier concepts. At first glance, Apache ECharts looks promising, especially the server-side rendering and different levels of hydration.

I welcome edits/additions to this list -- I'm sure I've missed things. I anticipate we'll come up with new criteria once we start our analysis.

LGoto triaged this task as High priority. Jul 1 2024, 5:42 PM
LGoto set the point value for this task to 8.
LGoto moved this task from Needs Triage to Estimated on the Charts board.

@CCiufo-WMF Could you add a few concrete examples of charts that we would want to be able to build?

Catrope updated Other Assignee, added: aude.

> Customizable visual design (like color palette and patterns/textures for better accessibility)

Pre-work that will eventually lead to a color palette for charts being identified is happening in T360494

Additional technical considerations we should look at (thank you @Jdlrobson for these ideas):

  • Bundle size of the library (if/when we need to load it client-side)
  • Output size
  • Ability to integrate with Vue

We began with a quick look at each library to see if it looked potentially promising, and this ruled out most of the options:

  • TeeChart: is not open source
  • DataDraw: appears to be inactive; its documentation reference page says "In preparation (Mar '23)", and it's now July 2024
  • Chart.js: does not appear to support server-side rendering
  • Plotly.js: does not appear to support server-side rendering
  • RGraph: server-side rendering is not built in, only supported through PhantomJS. Documentation leaves how to implement that as an exercise for the reader.
  • Our World in Data grapher: documentation has a disclaimer saying it's not designed for external reuse. Appears to require React, which we don't use anywhere else.
  • D3: very low-level. Not intended as a high-level chart library, but as something for high-level chart libraries to be built on top of, or to build one-off interactive visualizations with. Observable Plot and Vega are both built on top of D3, so we should consider those instead.
  • Observable Plot: unclear if server-side rendering is supported. Maps don't appear to be supported. The website appears to try to obscure the fact that the library is open source and lead you to buying their hosted product instead, which raises concerns about whether we can rely on it to be maintained as open source long-term.
  • Mermaid: mainly for flow charts and other process-related things. Support for the kinds of charts we would want is basic. No support for maps. Could be useful for specific chart types in the future (e.g. timelines, flow charts).

That leaves just Apache eCharts and Vega, which we analyzed in more detail, based on the criteria in T368336#9935313 and other things that we found while looking at these libraries.

| Criterion | eCharts | Vega |
| --- | --- | --- |
| Open source | Yes | Yes |
| Actively being developed | Yes | Yes |
| Stable and likely to be supported for a long time | Yes | Yes (previously had 3 breaking releases in 3 years, but that was 5 years ago) |
| Ability to server-render charts as SVGs | Yes | Yes |
| Support for layering more interactivity | Yes (with some CSS animations / hover effects, but mostly client-side JS) | Yes (only with client-side JS) |
| Geo/map-based visualization | Yes | Yes |
| Customizable visual design | Yes | Yes |
| Do labels translated into RTL languages work | Yes | Yes |
| Intuitive / easy-to-learn syntax for defining charts | Yes | No for Vega itself (syntax is very verbose); Yes for Vega-Lite, which has simpler syntax |
| Hyperlink support | Partial (can link the entire chart title, but not parts of it) | No |
| Time-based animations | Yes | No (this exists, but is not integrated upstream) |
| 3D visualizations | Yes | No |
| Hover effects in server-side rendered SVGs | Limited | No |
| Size of full library (minified) | 662 KB | 515 KB |
| Supports tree-shaking | Yes | No? |
| Comes with CLI for server-side rendering | No (but we can write one ourselves) | Yes |
| Supports TopoJSON | No (but we can convert TopoJSON to GeoJSON with a separate CLI utility) | Yes |
| Has a history of serious security vulnerabilities | No | Yes |

Based on the above, we have decided to use eCharts for now. However, we're not ruling out using Vega in the future just yet: if we experience issues with eCharts and think Vega might be better, we might switch to it later; or we might build most charts in eCharts but use Vega (or Mermaid, see above) for certain specific chart types if it turns out to be much better than eCharts for those specific things.

+1 to this decision! But I wanted to mention

> Chart.js: does not appear to support server-side rendering

https://www.npmjs.com/package/chartjs-node exists, and there's red-agate-svg-canvas if you want SVGs.

Hi,
at first glance this library looks really great. The project is mature, standing behind the solid Apache trademark and presumably with large stacks of money supplying it.

Has anyone considered Apache ECharts from an ethics / censorship / human rights perspective? Does MediaWiki really want to include and support this project in its codebase?

Maybe it would be a good idea to use a fork of this software, with a detailed independent audit regarding the above-mentioned aspects and data security. Still, forking it would "support" the questionable sponsors of this project.

The project's contributor / developer team doesn't inspire much confidence:

  • 9 are from Baidu

https://en.wikipedia.org/wiki/Baidu#Censorship

In November 2022, Sustainalytics downgraded Baidu to "non-compliant" with the United Nations Global Compact principles due to complicity with censorship.

https://www.reuters.com/article/china-esg-downgrade-idUSL4N32320Q/

Tencent owns China's dominant messaging app WeChat, while Baidu and Weibo respectively run China's leading search engine and Twitter-like microblog services.
Content companies in China are required to comply with the country's strict censorship regime, which has tightened significantly in recent years. Platforms swiftly and regularly delete items, from complaints about COVID lockdowns to cryptic criticism of politics.

  • 25 persons out of 30 are from China.

https://en.wikipedia.org/wiki/Censorship_in_China

As of 2023, the World Press Freedom Index ranks China as the country with the second least press freedom in the world after North Korea.

https://en.wikipedia.org/wiki/Human_rights_in_China#Freedom_of_speech
https://en.wikipedia.org/wiki/Human_rights_in_China#Freedom_of_the_press

/edit 22-07-2024, order of text rearranged, hopefully it's more clear/

@Pietrasagh: What are specific concerns about "ethics / censorship / human rights" because a number of developers or contributors were either born or located (?; references for statements are welcome) in some country when it comes to using code of a FOSS licensed chart rendering software hosted by the Apache Software Foundation?

Vega/Vega-Lite author here. Vega does actually support hyperlinks via the href property. See for example https://vega.github.io/vega/docs/marks/rect/. Vega also supports timers (https://vega.github.io/vega/docs/event-streams/) which can make animated visualizations. Happy to talk through any specific needs you have from Vega to make it useful for Wikipedia.

I think Vega maybe makes more sense than eCharts, since Vega is fully declarative and can therefore be sandboxed for safety more easily than a library that needs you to execute arbitrary code.

> Vega/Vega-Lite author here. Vega does actually support hyperlinks via the href property. See for example https://vega.github.io/vega/docs/marks/rect/. Vega also supports timers (https://vega.github.io/vega/docs/event-streams/) which can make animated visualizations. Happy to talk through any specific needs you have from Vega to make it useful for Wikipedia.

Happy to hear that these things are supported. I didn't realize that event streams contained timers, because I didn't see any documentation about animation specifically or any examples demonstrating this capability. Maybe this could be documented more clearly.

> I think Vega maybe makes more sense than eCharts, since Vega is fully declarative and can therefore be sandboxed for safety more easily than a library that needs you to execute arbitrary code.

How does eCharts need us to execute arbitrary code? As far as I can tell, it's just as declarative as Vega is.

Good point on the docs/examples for timers. One example (albeit not a chart in the strict sense) is this clock: https://vega.github.io/vega/examples/clock/. There is also this rotating globe: https://vega.github.io/vega/examples/earthquakes-globe/.

I'm not as familiar with eCharts but looking for example at https://echarts.apache.org/examples/en/editor.html?c=bar-waterfall2, the example passes a function to the formatter. Vega uses expressions for custom functions. These expressions always go through a custom parser that prevents dangerous code (that may send data or run loops) from being executed. IIRC we designed this originally for Wikipedia so that it could run arbitrary specifications client-side without exposing readers to any risks.

Another example that uses code is https://echarts.apache.org/examples/en/editor.html?c=bar-race which uses custom JS code to run the animation. In Vega, you can specify the timer entirely within the declarative specification (all within JSON).
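To make the contrast concrete, here is a rough sketch written as TypeScript object literals. The Vega-style spec is based on a reading of the Vega 5 docs linked above (a `timer` event stream updating a signal, and an `href` encoding on a mark); the signal name `nowMs`, the sizes and the link target are made up for illustration, and the spec has not been rendered with Vega here. The eCharts-style option mirrors the pattern in the linked examples, where `formatter` is given a JavaScript function. The helper `containsFunction` is hypothetical and only checks whether a definition is pure data.

```typescript
// Vega-style spec: animation and hyperlink expressed purely as JSON-compatible data.
const vegaStyleSpec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  width: 200,
  height: 20,
  signals: [
    {
      name: "nowMs", // hypothetical signal name
      init: "now()",
      // Timer event stream: re-evaluate the signal roughly once per second.
      on: [{ events: { type: "timer", throttle: 1000 }, update: "now()" }],
    },
  ],
  marks: [
    {
      type: "rect",
      encode: {
        enter: {
          x: { value: 0 },
          y: { value: 0 },
          height: { value: 20 },
          fill: { value: "steelblue" },
          // Clicking the mark follows this link (illustrative URL).
          href: { value: "https://en.wikipedia.org/wiki/Chart" },
        },
        // The bar width is driven by an expression string, not by JS code.
        update: { width: { signal: "(nowMs / 1000) % 60 * 3" } },
      },
    },
  ],
};

// eCharts-style option: the label formatter is an actual JavaScript function.
const echartsStyleOption = {
  xAxis: { type: "category", data: ["a", "b"] },
  yAxis: { type: "value" },
  series: [
    {
      type: "bar",
      data: [1, 2],
      label: {
        show: true,
        formatter: (params: { value: number }) => `${params.value} units`,
      },
    },
  ],
};

// Hypothetical check: does a definition contain executable code anywhere?
function containsFunction(value: unknown): boolean {
  if (typeof value === "function") return true;
  if (Array.isArray(value)) return value.some(containsFunction);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.keys(obj).some((key) => containsFunction(obj[key]));
  }
  return false;
}
```

A definition for which `containsFunction` returns `false` can be stored and passed around as plain JSON. Note that this says nothing about what the expression strings themselves are allowed to do; that is the job of Vega's expression parser.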

I see that the focus for now is on server-side but at some point it may make sense to also support client-side rendering again especially if you want interactivity. I also saw that the focus is on implementing specific charts but at some point you might want to offer a fallback to a fully customizable language and then Vega may make sense.

I'd be happy to jump on a video call if you want to discuss any other concerns or questions you have.

I would add one more argument in favour of Vega: the conversion of existing graph code would be much easier using Vega than a new library because all existing code already uses Vega syntax.

> I would add one more argument in favour of Vega: the conversion of existing graph code would be much easier using Vega than a new library because all existing code already uses Vega syntax.

They are using Vega 2 syntax, which is different from the more recent one, Vega 5.

I'd also note that the Vega dependency was the primary reason we disabled ext:Graph (twice). And that while Vega's expressions layer has since been hardened, it likely still poses more risk for our use-cases than other options.

I’m not sure any other library supports custom but restricted expressions for extended functionality. If it’s a concern, we could probably add a way to disable these custom expressions to restrict functionality but make the library safer for custom charts.

> I’m not sure any other library supports custom but restricted expressions for extended functionality. If it’s a concern, we could probably add a way to disable these custom expressions to restrict functionality but make the library safer for custom charts.

T336595#8848425 indicated that safe-vega is not really a thing.

If I understand correctly, there are multiple phases here.

Phase 1: custom charting API powered by some library like eCharts or Vega. Here you can use whatever library you like (although the choice should be influenced by what specification format makes sense in phase 2), since you don’t expose the full API to authors.

Phase 2 (later): potentially expand flexibility and customizability by giving authors more access to the underlying library. This is where we need a declarative API with restrictions for authors so that viewers don’t get exposed to potentially dangerous code.

I’m suggesting that in phase 1 you use Vega as is but for phase 2 you could consider a restricted version depending on your security needs.

> Good point on the docs/examples for timers. One example (albeit not a chart in the strict sense) is this clock: https://vega.github.io/vega/examples/clock/. There is also this rotating globe: https://vega.github.io/vega/examples/earthquakes-globe/.

Thanks, those are helpful examples!

> I'm not as familiar with eCharts but looking for example at https://echarts.apache.org/examples/en/editor.html?c=bar-waterfall2, the example passes a function to the formatter. Vega uses expressions for custom functions. These expressions always go through a custom parser that prevents dangerous code (that may send data or run loops) from being executed. IIRC we designed this originally for Wikipedia so that it could run arbitrary specifications client-side without exposing readers to any risks.
>
> Another example that uses code is https://echarts.apache.org/examples/en/editor.html?c=bar-race which uses custom JS code to run the animation. In Vega, you can specify the timer entirely within the declarative specification (all within JSON).

I see what you mean: Vega supports animations declaratively using timers, whereas eCharts doesn't. Vega could really benefit from better documenting and advertising this feature, because when I was researching this the only thing I found was an MIT research paper with no evidence that it was ever upstreamed, while eCharts has bar race examples on its examples page (though as you say, on closer inspection those use code to periodically change the input data, not a built-in eCharts feature).

> I see that the focus for now is on server-side but at some point it may make sense to also support client-side rendering again especially if you want interactivity. I also saw that the focus is on implementing specific charts but at some point you might want to offer a fallback to a fully customizable language and then Vega may make sense.

We do intend to support client-side rendering in the future for interactive graphs specifically, yes. However, we do not intend to offer direct access to the chart library, whether that's eCharts or Vega. (More about this in my next comment.)

> I also saw that the focus is on implementing specific charts but at some point you might want to offer a fallback to a fully customizable language and then Vega may make sense.

> Phase 2 (later): potentially expand flexibility and customizability by giving authors more access to the underlying library. This is where we need a declarative API with restrictions for authors so that viewers don’t get exposed to potentially dangerous code.

Offering direct access to the underlying library to wiki users will not happen. It's not just that it isn't in our short-term plans; we consider it actively harmful for several reasons:

  • Security: we're currently rebuilding our charts system from scratch after disabling the old one over security issues that were directly caused by this. Even with the hardening Vega has done, we think it's too risky to allow direct user access to expressions in Vega, or other similar functionality in other libraries
  • Schema migrations: libraries sometimes make breaking changes to the definition formats they accept (both eCharts and Vega have done this before). When that happens, migrating user-authored definitions (or user-authored template/Lua code that generates definitions) is really painful, takes a lot of work, and blocks upgrading to the new version of the library. We went through this with the Vega 2->5 upgrade, which involved writing our own code to remap Vega 2 definitions to the Vega 5 format, and was never finished. We don't want to do that again.
  • User-friendliness: in the old system, users rarely hand-wrote graph definitions; instead, they used templates or Lua modules that took a simpler set of parameters and generated a graph definition. This is a clear sign that the definitions are too complex for users to write. In part this is because Vega's definition schema is especially verbose, and Vega-Lite fixes a lot of that, but even with Vega-Lite or eCharts, these definition syntaxes just aren't very user friendly in general, and people would likely continue to rely on templating them.

For all these reasons, we will only support predefined graph types. This way, we are in control of the definition schema that users use and can make it simpler than the library's. We can do our own sanitization to make sure that user input doesn't make it anywhere dangerous, and library upgrades with breaking changes are much easier to handle. This also removes the need to have templates generate chart definitions, which makes managing them easier overall.
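The approach described above can be sketched roughly as follows. This is not the actual schema or implementation; the `ChartDefinition` shape, the function names and the validation rules are all hypothetical. The idea shown is that users write a small definition we control, it is strictly validated, and only then is it translated into the underlying library's format (here an eCharts-style option object using its `title`, `xAxis`, `yAxis` and `series` fields).

```typescript
// Hypothetical set of predefined chart types exposed to wiki users.
const ALLOWED_TYPES = ["bar", "line", "pie"] as const;
type ChartType = (typeof ALLOWED_TYPES)[number];

// Hypothetical simplified definition, much smaller than the library's own schema.
interface ChartDefinition {
  type: ChartType;
  title: string;
  labels: string[];
  values: number[];
}

// Validate untrusted input; anything outside the small schema is rejected.
function parseChartDefinition(input: unknown): ChartDefinition {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new Error("definition must be an object");
  }
  const raw = input as Record<string, unknown>;
  const allowedKeys = ["type", "title", "labels", "values"];
  for (const key of Object.keys(raw)) {
    if (allowedKeys.indexOf(key) === -1) throw new Error(`unknown key: ${key}`);
  }
  const type = raw.type;
  if (typeof type !== "string" || (ALLOWED_TYPES as readonly string[]).indexOf(type) === -1) {
    throw new Error("unsupported chart type");
  }
  const title = raw.title === undefined ? "" : raw.title;
  if (typeof title !== "string") throw new Error("title must be a string");
  const labels = raw.labels;
  const values = raw.values;
  if (!Array.isArray(labels) || !labels.every((l) => typeof l === "string")) {
    throw new Error("labels must be an array of strings");
  }
  if (!Array.isArray(values) || !values.every((v) => typeof v === "number" && isFinite(v))) {
    throw new Error("values must be an array of finite numbers");
  }
  if (labels.length !== values.length) throw new Error("labels and values must match in length");
  return { type: type as ChartType, title, labels: labels as string[], values: values as number[] };
}

// Translate the validated definition into an eCharts-style option object.
// Only this function needs to change if the library's format changes.
function toLibraryOption(def: ChartDefinition): Record<string, unknown> {
  const title = { text: def.title };
  if (def.type === "pie") {
    const data = def.labels.map((name, i) => ({ name, value: def.values[i] }));
    return { title, series: [{ type: "pie", data }] };
  }
  return {
    title,
    xAxis: { type: "category", data: def.labels },
    yAxis: { type: "value" },
    series: [{ type: def.type, data: def.values }],
  };
}
```

Because user input never reaches the library except through `toLibraryOption`, a breaking change in the library's definition format would only require updating that one translation layer rather than migrating user-authored definitions.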

That all makes sense. Thanks for laying out all the considerations.

I’ll just add that one challenge with providing a new custom format for making charts is that you then have to provide documentation, examples, and issue triaging for that new language. To make that manageable it makes sense to keep the language small until the need for more features arises.