A fine-grained network traffic analysis with Millisampler

What the research is:

Millisampler is one of Meta's latest characterization tools. It lets us observe, characterize, and debug network performance efficiently at high-granularity timescales. This lightweight network traffic characterization tool for continual monitoring operates at fine, configurable timescales. It collects time series of ingress and egress traffic volumes, the number of active flows, incoming ECN marks, and ingress and egress retransmissions. Millisampler can also distinguish in-region traffic from cross-region traffic (longer RTT). Millisampler runs on our server fleet, collecting short, periodic snapshots of this data at 100μs, 1ms, and 10ms time granularities. It stores the data on local disk and makes it available for several days for on-demand analysis. Because the data is only aggregated flow-level header information, it does not contain any personally identifiable information (PII). Even with the minimal amount of information it collects, Millisampler data has proven very useful in practice, particularly when combined with existing coarser-grained data: we can see clearly how switch buffers or host NICs, for example, might be unable to handle the ingress traffic pattern.


How it works:

Millisampler comprises userspace code to schedule runs, store data, and serve data, and an eBPF-based tc filter that runs in the kernel to collect fine-timescale data. The userspace code attaches the tc filter and enables data collection. A tc filter is among the first programmable steps on receipt of a packet and near the last step on transmission. On ingress, this means that the eBPF code executes on the CPU core that is processing the softirq (bottom half) as the packet is directed toward the owning socket. Because processing happens on many CPU cores, we use per-CPU variables to avoid locks; this increases the memory requirement but eliminates the risk of contention. To minimize overhead, we sample periodically and for short periods of time. Userspace therefore configures two parameters in Millisampler: the sampling interval and the number of samples. We schedule runs with three sampling intervals (10ms, 1ms, and 100μs) and a fixed count of 2,000 samples for every interval. This means that our observation periods range from 200ms (100μs sampling interval) to 20s (10ms sampling interval), allowing us to observe events at sub-RTT to cross-region RTT time scales. At the same time, it fixes the memory footprint of each run at 2,000 64-bit counters per CPU core for each value we measure.
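The arithmetic behind those observation periods, and the idea of lock-free per-CPU buckets, can be illustrated with a small userspace model. This is only a sketch in Python, not the eBPF filter itself: the names `observation_window_us`, `per_metric_footprint_bytes`, and `PerCpuSeries` are hypothetical, and it assumes that packets arriving outside the run window are simply ignored.

```python
from dataclasses import dataclass, field

NUM_SAMPLES = 2_000   # fixed number of samples per run (from the text)
COUNTER_BYTES = 8     # each sample is one 64-bit counter


def observation_window_us(interval_us: int, num_samples: int = NUM_SAMPLES) -> int:
    """Length of one run in microseconds: sampling interval x number of samples."""
    return interval_us * num_samples


def per_metric_footprint_bytes(num_cpus: int, num_samples: int = NUM_SAMPLES) -> int:
    """Memory for one metric: one 64-bit counter per sample per CPU core."""
    return num_cpus * num_samples * COUNTER_BYTES


@dataclass
class PerCpuSeries:
    """Hypothetical model of one metric's per-CPU time series.

    Each CPU writes only to its own row, so no lock is needed on the
    hot path; rows are summed once, after the run has finished.
    """
    interval_us: int
    num_cpus: int
    num_samples: int = NUM_SAMPLES
    start_us: int = 0
    buckets: list = field(init=False)

    def __post_init__(self) -> None:
        self.buckets = [[0] * self.num_samples for _ in range(self.num_cpus)]

    def record(self, cpu: int, now_us: int, nbytes: int) -> bool:
        """Add nbytes to the bucket covering now_us; False if outside the run."""
        idx = (now_us - self.start_us) // self.interval_us
        if idx < 0 or idx >= self.num_samples:
            return False  # assumption: out-of-window packets are dropped from the sample
        self.buckets[cpu][idx] += nbytes
        return True

    def aggregate(self) -> list:
        """Sum the per-CPU rows into a single time series."""
        return [sum(column) for column in zip(*self.buckets)]
```

With these definitions, `observation_window_us(100)` gives 200,000μs (200ms) and `observation_window_us(10_000)` gives 20,000,000μs (20s), matching the range quoted above. On a hypothetical 32-core host, one metric would occupy 32 × 2,000 × 8 = 512,000 bytes per run.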

Millisampler collects a variety of metrics. It computes ingress and egress total bytes and ingress ECN-marked bytes from the lengths and CE bits of the packets. Millisampler also counts TTLd-marked retransmits. Millisampler uses a 128-bit sketch to estimate the number of active (incoming and outgoing) connections. The sketch yields an approximate connection count that is precise up to about a dozen connections and saturates at around 500 connections per sampling interval. Although there is room for more precision, in practice the exact number of connections matters less than the qualitative variation between a few connections and dozens or hundreds of connections. That variation has helped us distinguish traffic with many connections (heavy incast) from heavier traffic carried over fewer connections.
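To give a feel for why a 128-bit sketch is accurate for small counts and saturates in the hundreds, here is a minimal Python illustration. The text does not say which estimator Millisampler uses; this sketch assumes a linear-counting bitmap, and the hash choice and the function names (`flow_bit`, `add_flow`, `estimate_flows`) are hypothetical.

```python
import hashlib
import math

SKETCH_BITS = 128  # sketch size stated in the text


def flow_bit(flow: tuple) -> int:
    """Map a flow identifier (e.g. a 4-tuple) to one of the 128 bit positions."""
    digest = hashlib.blake2b(repr(flow).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % SKETCH_BITS


def add_flow(sketch: int, flow: tuple) -> int:
    """Return the sketch with the flow's bit set; repeated flows change nothing."""
    return sketch | (1 << flow_bit(flow))


def estimate_flows(sketch: int) -> float:
    """Linear-counting estimate: n ~= -m * ln(zero_bits / m).

    Assumption: a fully set sketch is treated as having one zero bit,
    so the estimate is capped at 128 * ln(128), roughly 621.
    """
    zeros = SKETCH_BITS - bin(sketch).count("1")
    zeros = max(zeros, 1)
    return -SKETCH_BITS * math.log(zeros / SKETCH_BITS)


# Usage: fold every flow seen in one sampling interval into the sketch.
sketch = 0
for port in range(10):
    sketch = add_flow(sketch, ("10.0.0.1", 40000 + port, "10.0.0.2", 443))
approx_connections = estimate_flows(sketch)
```

Under this assumed estimator, collisions are rare while only a handful of bits are set, so small counts come out close to exact. Once nearly all 128 bits are set the estimate stops growing, which is the same qualitative behavior as the saturation described above.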

Why it matters:

Millisampler is a powerful tool for troubleshooting and performance analysis. Two contrasting network performance faults that we solved at Meta in the last few years required a fine-grained view of traffic. The first featured synchronized traffic bursts at fine time scales, and seeing this motivated us to build and deploy Millisampler so that we could catch it quickly if it happened again. The second, which an early Millisampler prototype helped root-cause, featured a NIC driver bug that caused the NIC to stop delivering packets for milliseconds at a time, proving the value of Millisampler in complex investigations. While Millisampler (or Millisampler-like data) played an important role in these investigations, it did so only as part of our rich ecosystem of data collection tools that track a dizzying array of metrics across hosts and the network.

Beyond such incidents, Millisampler data has also proven useful in characterizing and analyzing the traffic characteristics of services, allowing us to design and deploy a range of solutions that improve their performance. For example, we have been able to characterize the nature of bursts across a number of services in order to understand the intensity of incast and tune transport performance accordingly. We have also been able to look at complex interactions between short-RTT and long-RTT flows and understand how bursts of either affect fairness for the other. In a following post, we will look at an extension of Millisampler, Syncmillisampler, in which we run Millisampler synchronously across all hosts in a rack and use that data to identify buffer contention in the top-of-rack ASICs.

Read the full paper:


Ehab Ghabashneh, Cristian Lumezanu, Raghu Nallamothu, and Rob Sherwood additionally contributed to the design and implementation of Millisampler.