Amin Vahdat

Amin Vahdat

Mountain View, California, United States
10K followers 500+ connections

About

At Google, I lead the global Machine Learning, Systems and Cloud AI team, with…

Activity

Join now to see all activity

Experience

Education

Publications

  • Pip: Detecting the Unexpected in Distributed Systems

    Proceedings of NSDI

    Bugs in distributed systems are often hard to find. Many bugs reflect discrepancies between a system's behavior and the programmer's assumptions about that behavior. We present Pip, an infrastructure for comparing actual behavior and expected behavior to expose structural errors and performance problems in distributed systems. Pip allows programmers to express, in a declarative language, expectations about the system's communications structure, timing, and resource consumption. Pip includes…

    Bugs in distributed systems are often hard to find. Many bugs reflect discrepancies between a system's behavior and the programmer's assumptions about that behavior. We present Pip, an infrastructure for comparing actual behavior and expected behavior to expose structural errors and performance problems in distributed systems. Pip allows programmers to express, in a declarative language, expectations about the system's communications structure, timing, and resource consumption. Pip includes system instrumentation and annotation tools to log actual system behavior, and visualization and query tools for exploring expected and unexpected behavior. Pip allows a developer to quickly understand and debug both familiar and unfamiliar systems.

    We applied Pip to several applications, including FAB, SplitStream, Bullet, and RanSub. We generated most of the instrumentation for all four applications automatically. We found the needed expectations easy to write, starting in each case with automatically generated expectations. Pip found unexpected behavior in each application, and helped to isolate the causes of poor performance and incorrect behavior.

    Other authors
    See publication
  • WAP5: Black-Box Performance Debugging for Wide-Area Systems

    Proceedings of WWW

    Wide-area distributed applications are challenging to debug, optimize, and maintain. We present Wide-Area Project 5 (WAP5), which aims to make these tasks easier by exposing the causal structure of communication within an application and by exposing delays that imply bottlenecks. These bottlenecks might not otherwise be obvious, with or without the application's source code. Previous research projects have presented algorithms to reconstruct application structure and the corresponding timing…

    Wide-area distributed applications are challenging to debug, optimize, and maintain. We present Wide-Area Project 5 (WAP5), which aims to make these tasks easier by exposing the causal structure of communication within an application and by exposing delays that imply bottlenecks. These bottlenecks might not otherwise be obvious, with or without the application's source code. Previous research projects have presented algorithms to reconstruct application structure and the corresponding timing information from black-box message traces of local-area systems. In this paper we present (1) a new algorithm for reconstructing application structure in both local- and wide-area distributed systems, (2) an infrastructure for gathering application traces in PlanetLab, and (3) our experiences tracing and analyzing three systems: CoDeeN and Coral, two content-distribution networks in PlanetLab; and Slurpee, an enterprise-scale incident-monitoring system.

    Other authors
    See publication

More activity by Amin

View Amin’s full profile

  • See who you know in common
  • Get introduced
  • Contact Amin directly
Join to view full profile

Other similar profiles

Explore collaborative articles

We’re unlocking community knowledge in a new way. Experts add insights directly into each article, started with the help of AI.

Explore More

Others named Amin Vahdat

Add new skills with these courses