Serving Hybrid-Cloud SQL Interactive Queries at Twitter
Authors:
Chunxu Tang,
Beinan Wang,
Huijun Wu,
Zhenzhao Wang,
Yao Li,
Vrushali Channapattan,
Zhenxiao Luo,
Ruchin Kabra,
Mainak Ghosh,
Nikhil Kantibhai Navadiya,
Prachi Mishra,
Prateek Mukhedkar,
Anneliese Lu
Abstract:
At Twitter, the demand for data analytics has increased steadily over the past years. To meet these requirements and provide a highly scalable and available query experience, Twitter relies heavily on a large-scale in-house SQL system. Recently, we evolved this SQL system into a hybrid-cloud SQL federation system, in line with Twitter's Partly Cloudy strategy. The hybrid-cloud SQL federation system can process queries across Twitter's data centers and the public cloud, interacting with around 10 PB of data per day.
In this paper, we present the design of the hybrid-cloud SQL federation system, which consists of query, cluster, and storage federations. We identify challenges in a modern SQL system and demonstrate how our system addresses them through several key design decisions. We also conduct qualitative examinations and summarize instructive lessons learned from developing and operating such a SQL system.
Submitted 9 July, 2022;
originally announced July 2022.
Taming Hybrid-Cloud Fast and Scalable Graph Analytics at Twitter
Authors:
Chunxu Tang,
Yao Li,
Zhenxiao Luo,
Mainak Ghosh,
Huijun Wu,
Lu Zhang,
Anneliese Lu,
Ruchin Kabra,
Nikhil Kantibhai Navadiya,
Prachi Mishra,
Prateek Mukhedkar,
Vrushali Channapattan
Abstract:
We have witnessed a surge in demand for graph analytics at Twitter in recent years, and graph analytics has become a key part of Twitter's large-scale data analytics and machine learning for driving engagement, serving the most relevant content, and promoting healthier conversations. However, infrastructure for graph analytics has historically not been an area of investment at Twitter, so each project that dealt with graphs at Twitter scale required a long timeline and substantial engineering effort. How do we build a unified graph analytics user experience that supports modern data analytics on graphs ranging from thousands to hundreds of billions of vertices and edges?
To bring fast and scalable graph analytics into production, we investigate the challenges of large-scale graph analytics at Twitter and propose a unified graph analytics platform for efficient, scalable, and reliable graph analytics across on-premises and cloud environments, meeting the requirements of diverse graph use cases at challenging scales. We also conduct quantitative benchmarking of popular graph analytics frameworks on Twitter's production-level graph use cases to validate our solution.
Submitted 25 August, 2022; v1 submitted 24 April, 2022;
originally announced April 2022.