The recorded video of my presentation of DiCE at USENIX ATC’11 finally surfaced on the Web.
Are the bugs in your OpenFlow application keeping you up all night? Today, despair no more!
As promised in our upcoming NSDI paper, we are releasing the first public version (0.7) of NICE, our tool for testing OpenFlow applications for the popular NOX controller platform.
You can read about how to use it here or simply grab the sources here.
Our paper A NICE Way to Test OpenFlow Applications has been accepted at NSDI 2012 (joint work with Jennifer Rexford from Princeton University).
The emergence of OpenFlow-capable switches enables exciting new network functionality, at the risk of programming errors that make communication less reliable. The centralized programming model, where a single controller program manages the network, seems to reduce the likelihood of bugs. However, the system is inherently distributed and asynchronous, with events happening at different switches and end hosts, and inevitable delays affecting communication with the controller. In this paper, we present efficient, systematic techniques for testing unmodified controller programs. Our NICE tool applies model checking to explore the state space of the entire system—the controller, the switches, and the hosts. Scalability is the main challenge, given the diversity of data packets, the large system state, and the many possible event orderings. To address this, we propose a novel way to augment model checking with symbolic execution of event handlers (to identify representative packets that exercise code paths on the controller). We also present a simplified OpenFlow switch model (to reduce the state space), and effective strategies for generating event interleavings likely to uncover bugs. Our prototype tests Python applications on the popular NOX platform. In testing three real applications—a MAC-learning switch, in-network server load balancing, and energy-efficient traffic engineering—we uncover eleven bugs.
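To give a feel for why event orderings matter, here is a toy sketch in Python. It is not NICE and does not reflect its implementation; it only illustrates the general idea of enumerating event interleavings and checking a property on each one. The event names, the two-channel setup, and the "no packet is dropped" property are all assumptions made up for this illustration.

```python
def interleavings(queues):
    """Yield every ordering of events that preserves the order within each queue."""
    if all(not q for q in queues):
        yield []
        return
    for i, q in enumerate(queues):
        if q:
            # Take the head of queue i, then recurse on what remains.
            rest = queues[:i] + [q[1:]] + queues[i + 1:]
            for tail in interleavings(rest):
                yield [q[0]] + tail


def dropped_packets(order):
    """Toy switch model (hypothetical): packets seen before the rule is installed are dropped."""
    rule_installed = False
    dropped = []
    for kind, payload in order:
        if kind == "install_rule":
            rule_installed = True
        elif kind == "packet" and not rule_installed:
            dropped.append(payload)
    return dropped


# Two independent channels: controller-to-switch and host-to-switch.
controller_channel = [("install_rule", "forward")]
host_channel = [("packet", "p1"), ("packet", "p2")]

all_orders = list(interleavings([controller_channel, host_channel]))
violations = [order for order in all_orders if dropped_packets(order)]

print(f"{len(violations)} of {len(all_orders)} orderings drop at least one packet")
for order in violations:
    print("  ", [payload for _, payload in order])
```

Even with three events, two of the three possible orderings violate the toy property. The number of orderings grows very quickly with the number of events and channels, which is the scalability problem the abstract refers to.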
Marco presented our DiCE demo at SIGCOMM 2011. The text of the submission is available here. Below is a screenshot of a prefix hijack (origin misconfiguration) attempt that our DiCE prototype detects on an Internet-like topology.
We went to Geneva on August 13 to watch the famous fireworks. Some of the photos are available on our photos page. The video of the finale is below. (Peter jokes that these explosions are nothing compared to the state space explosion we are facing in our work on automatically testing OpenFlow applications!)
Our paper on Automating the Testing of OpenFlow Applications has been accepted at WRIPE 2011 (joint work with Jennifer Rexford from Princeton University). Marco will present the work in October.
Software-defined networking, and the emergence of OpenFlow-capable switches, enables a wide range of new network functionality. However, enhanced programmability inevitably leads to more software faults (or bugs). We believe that tools for testing OpenFlow programs are critical to the success of the new technology. However, the way OpenFlow applications interact with the data plane raises several challenges.
First, the space of possible inputs (e.g., packet headers and inter-packet timings) is huge. Second, the centralized controller has an indirect view of the traffic and experiences unavoidable delays in installing rules in the switches. Third, external factors like user behavior (e.g., mobility) and higher-layer protocols (e.g., the TCP state machine) affect the correctness of OpenFlow programs.
In this work-in-progress paper, we extend techniques for symbolic execution to generate inputs that systematically explore the space of system executions. Initial experiences with our prototype, which symbolically executes NOX applications written in Python, suggest that our techniques can help programmers identify bugs in their OpenFlow programs.
Our paper titled “Toward Online Testing of Federated and Heterogeneous Distributed Systems” will appear at the 2011 USENIX Annual Technical Conference (USENIX ATC ’11).
In this paper, we argue that distributed system reliability should be improved by proactively identifying potential faults using an online testing functionality. We propose an approach called DiCE that continuously and automatically explores the system behavior, to check whether the system deviates from its desired behavior. This paper outlines our vision and the problem we want to tackle. Then, it focuses on describing our experience in integrating DiCE with an open-source BGP router. We evaluate DiCE’s ability to quickly detect origin misconfiguration (popularly known as ‘prefix hijacking’), a recurring operator mistake that causes Internet-wide outages. The most (in)famous instance is perhaps the one of YouTube hijacking.
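For readers unfamiliar with origin misconfiguration, the property at stake can be sketched in a few lines of Python. This is not how DiCE works (DiCE explores the behavior of a live system); the snippet only illustrates what a conflicting origin looks like. The function name, the table of known origins, and the example values (documentation prefixes and documentation AS numbers) are assumptions chosen for the illustration.

```python
import ipaddress


def find_origin_conflicts(known_origins, announcement):
    """Return known (prefix, AS) entries that overlap the announced prefix
    but list a different origin AS."""
    prefix, origin_as = announcement
    net = ipaddress.ip_network(prefix)
    conflicts = []
    for known_prefix, known_as in known_origins.items():
        known_net = ipaddress.ip_network(known_prefix)
        if net.version != known_net.version:
            continue  # never compare IPv4 with IPv6 prefixes
        overlaps = net.subnet_of(known_net) or known_net.subnet_of(net)
        if overlaps and origin_as != known_as:
            conflicts.append((known_prefix, known_as))
    return conflicts


# Hypothetical table: prefix -> AS expected to originate it.
known_origins = {"198.51.100.0/24": 64496}

# A more specific prefix announced by a different AS is flagged.
hijack = find_origin_conflicts(known_origins, ("198.51.100.0/25", 64511))
# The same prefix announced by the expected AS is not.
legit = find_origin_conflicts(known_origins, ("198.51.100.0/25", 64496))

print("conflicts for suspicious announcement:", hijack)
print("conflicts for legitimate announcement:", legit)
```

A more specific prefix is the dangerous case because routers prefer the longest matching prefix, so traffic follows the conflicting announcement.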
I presented DiCE in the Routing Working Group session at the RIPE 62 meeting in Amsterdam. A RIPE meeting is a large event that gathers ISPs and network operators, so it was a great opportunity to plant the seed.
Here is the recorded video of my presentation.
They suggested that we get in touch with the Global Routing Operations Working Group at the IETF, and we will look into that.
We have 2 Ph.D. student openings and we are actively looking for students to join the project.