
LLM-Sketch: Enhancing Network Sketches with LLM

Yuanpeng Li
Peking University
Zhongguancun Laboratory
Beijing, China

Zhen Xu
Zhejiang University
Hangzhou, China

Zongwei Lv
Peking University
Beijing, China

Yannan Hu∗
Zhongguancun Laboratory
Beijing, China

Yong Cui
Tsinghua University
Zhongguancun Laboratory
Beijing, China

Tong Yang∗
Peking University
Zhongguancun Laboratory
Beijing, China

arXiv:2502.07495v1 [cs.NI] 11 Feb 2025

Abstract
Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for network stream mining. Recent studies attempt to optimize sketches using machine learning; however, these approaches face the challenges of lacking adaptivity to dynamic networks and incurring high training costs. In this paper, we propose LLM-Sketch, based on the insight that fields beyond the flow IDs in packet headers can also help infer flow sizes. By using a two-tier data structure and separately recording large and small flows, LLM-Sketch improves accuracy while minimizing memory usage. Furthermore, it leverages fine-tuned large language models (LLMs) to reliably estimate flow sizes. We evaluate LLM-Sketch on three representative tasks, and the results demonstrate that LLM-Sketch outperforms state-of-the-art methods by achieving a 7.5× accuracy improvement.

Keywords
Network stream mining, Sketches, Flow classification, LLM

ACM Reference Format:
Yuanpeng Li, Zhen Xu, Zongwei Lv, Yannan Hu, Yong Cui, and Tong Yang. 2025. LLM-Sketch: Enhancing Network Sketches with LLM. In . ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

∗ Tong Yang ([email protected]) and Yannan Hu ([email protected]) are the corresponding authors.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Conference'17, July 2017, Washington, DC, USA
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-x-xxxx-xxxx-x/YYYY/MM
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 Introduction
Network stream mining is a cornerstone of modern computer networking, underpinning critical tasks such as DDoS victim detection [25, 38], load balancing [3, 26], congestion control [22], and traffic engineering [2, 34]. As network scale and traffic volumes continue to grow, ensuring efficient and accurate stream mining at scale becomes increasingly challenging. In response, sketches [9, 11, 12, 15] have gained traction for their compact nature and ability to provide small-yet-bounded error under stringent memory constraints, making them well-suited for large-scale network stream mining scenarios.

Despite their appeal, existing sketch solutions often struggle to maintain acceptable error rates in the face of massive-scale networks and highly skewed traffic distributions [7, 15]. In practice, a small fraction of large flows typically accounts for the majority of total traffic volume, while many small flows remain numerous yet contribute only modestly. A representative example is the Count-Min Sketch (CMS) [12], which updates and queries counters based on hashed flow IDs. Although CMS is simple and memory-efficient, it faces a fundamental trade-off: counters sized for small flows undercount the large ones, while counters sized for large flows waste memory on the many small ones. Consequently, CMS cannot accurately capture the minority of large flows without significantly inflating overall memory usage.

To address skewness, recent works have proposed splitting large and small flows into distinct data structures, typically a key-value (KV) table for large flows combined with a compact sketch for smaller ones [14, 20, 24, 35]. This approach reduces collisions among different flow sizes and avoids over-allocating memory for small flows. However, a major drawback remains unresolved: when a new flow arrives, it is difficult to know immediately if it will turn into a large flow or stay small. Meanwhile, learning-based sketches attempt to predict flow size directly, hoping to bypass dynamic flow classification. LCMS [17], for instance, trains a model to estimate whether a flow will be large, then updates either the KV table or CMS accordingly. Although this can reduce collisions when the prediction is correct, it often suffers from real-world accuracy issues and transfers poorly to dynamic network environments. Meta-sketch [10] takes a different route by learning the distribution of flow sizes rather than explicitly splitting flows into large and small. However, its training overhead is notably high, limiting its deployment in real-world scenarios. Other learning-based methods have explored optimizing hashing or query processing [8, 31], but they tend to share similar drawbacks – either relying heavily on ID-size correlations or incurring significant training cost. As a result, these approaches still struggle to handle unpredictable traffic shifts while maintaining low error rates and manageable resource usage.

In this paper, we propose LLM-Sketch, a new sketch algorithm that adapts to skewed network traffic by combining a two-tier data structure with an LLM-powered real-time flow classifier. Our key insight is that leveraging the full packet header – beyond just the
flow ID – enables more accurate predictions of future flow sizes. By incorporating these additional header fields, LLM-Sketch effectively infers whether a newly arriving flow is likely to become large or remain small, without relying on weak correlations between flow IDs and sizes. LLM-Sketch's design centers on two main techniques:

Technique 1: Two-tier data structure. LLM-Sketch's data structure features two tiers: a heavy part for large flows and a light part for small flows. This design more effectively handles skewed distributions by reducing collisions that typically arise when large and small flows share the same counters. By tracking large flows in a dedicated space, LLM-Sketch captures their sizes more accurately while preventing overflow in counters designated for small flows. Meanwhile, a compact sketch records small flows, limiting memory overhead. To determine whether a flow should be considered large, we rely on a real-time classifier. We also implement a simple lock flag mechanism to retain historical classification results and prevent genuinely large flows from being evicted prematurely.

Technique 2: LLM-powered flow classifier. LLM-Sketch employs a flow classifier built on a Large Language Model (LLM) adapted for network traffic. We embed each packet header (excluding specific IP addresses to avoid overfitting) into a token sequence and feed it into the model. Instead of imposing a hard threshold for labeling flows as large or small, the classifier employs a soft-label strategy: the model outputs a continuous value in [0, 1]. Flows that are significantly larger than the threshold receive labels near 1, while those that are considerably smaller receive labels near 0. In between, flows whose predicted sizes fall near the threshold are assigned intermediate values (e.g., 0.4–0.6), capturing the inherent uncertainty and thus mitigating errors associated with borderline misclassifications, thereby improving the overall accuracy of the sketch. By integrating these real-time predictions with the two-tier data structure, LLM-Sketch can more reliably track large flows while keeping memory usage low.

We demonstrate the versatility of LLM-Sketch through three representative network stream mining tasks: flow size query, heavy hitter query, and hierarchical heavy hitter (HHH) query. We implement LLM-Sketch in Python and conduct extensive experiments on three real-world datasets. Experimental results show that LLM-Sketch achieves, on average, a 7.5× improvement in accuracy over state-of-the-art methods. All related source code is publicly available on GitHub¹.

¹ https://github.com/LLM-Sketch/LLM-Sketch

2 Background

2.1 Problem Definition
Definition 2.1. Data Stream: A data stream S = {p_1, p_2, · · · , p_i, · · · } is a sequence of packets, where each packet p_i has a flow ID f, drawn from a universe U. In this paper, we focus on flow measurement within a fixed time window, so the data stream is finite. Note that each packet can only be processed once.

Definition 2.2. Flow size query: Given a data stream S, report the size of every flow f ∈ U, where flow size is defined as the number of packets with ID f, i.e., n(f) = |{i : p_i = f}|.

Definition 2.3. Heavy hitter query: Given a data stream S and a threshold T, report the ID of every flow with size larger than T.

Definition 2.4. Hierarchical Heavy Hitters (HHH) Query: Consider a hierarchy H imposed on the flow IDs in U. For example, each IP address can be subdivided into multiple levels of prefixes (e.g., /8, /16, /24). Given a data stream S and a threshold T, an HHH query aims to find all nodes in the hierarchy whose aggregated flow size exceeds T [6].

2.2 Large Language Model
A large language model (LLM) is a neural network designed to process and generate natural language using vast amounts of training data. In recent years, LLMs have also found success in other fields, such as biomedical research, code generation, and data analysis. Typical LLMs include BERT [13], GPT [1], and Llama [30]. Among them, BERT (Bidirectional Encoder Representations from Transformers) introduces masked language modeling (MLM), enabling the model to learn context from both directions. It also leverages next-sentence prediction to capture relationships between sentences. Variations of BERT include DistilBERT [27], RoBERTa [36], and ALBERT [21], each offering improvements in efficiency or performance.

2.3 The Count-Min Sketch
As shown in Figure 1, the data structure of the Count-Min sketch (CMS) consists of d counter arrays. The i-th counter array A_i consists of w counters, and is also associated with a hash function h_i(·) (1 ⩽ h_i(·) ⩽ w). When inserting a packet of flow f, CMS computes the d hash functions to locate the d mapped counters, A_1[h_1(f)], · · · , A_d[h_d(f)], and increments each mapped counter by 1. When querying the size of flow f, CMS reports the minimum value among the mapped counters as the estimated flow size, i.e., min{A_i[h_i(f)]} (1 ⩽ i ⩽ d).

Figure 1: The Count-Min sketch. [Diagram: a flow f is hashed by h1, h2, and h3, and each of the three mapped counters is incremented by 1.]

3 The LLM-Sketch Algorithm
In this section, we first propose the data structure and operations of LLM-Sketch. Then we present how the flow classifier is designed. After that, we describe the application of LLM-Sketch.

3.1 Data Structure and Operations
Data structure: As shown in Figure 2, the data structure of LLM-Sketch consists of three parts: a heavy part, a light part, and a flow classifier. The heavy part is a key-value (KV) table with w_h buckets. Each bucket contains d_h cells, and each cell records a flow, including its flow ID f and flow size n̂. The heavy part is also associated with a hash function h(·) (1 ⩽ h(·) ⩽ w_h), which maps flows to buckets. The light part is a CMS, which maintains the sizes of small flows using small counters (e.g., 8-bit) to save memory. The flow classifier is a model that infers whether an incoming packet belongs to a large flow or a small flow. Its design is detailed in Section 3.2. Ideally,
large flows are recorded in the heavy part, whereas the light part is used only for small flows.

Figure 2: Workflow of LLM-Sketch. [Diagram: each packet header is embedded and fed to the LLM, which outputs a soft label. If the flow is already in the heavy part, the heavy part is not full, or the flow is predicted as large, the packet is recorded in the heavy part (a KV table); otherwise it is recorded in the light part.]

Insertion: When inserting a packet of flow f, LLM-Sketch locates the mapped bucket B[h(f)] using the hash function h. There are three cases:
Case 1: If f is already recorded in B[h(f)], LLM-Sketch simply increments its flow size by 1.
Case 2: If f is not in B[h(f)] and there is an empty cell, LLM-Sketch inserts (f, 1) into that cell.
Case 3: If f is not in B[h(f)] and all cells in the bucket are full, LLM-Sketch uses the flow classifier to predict whether f is a large flow or a small flow. Based on the classifier's output, there are two sub-cases: 1) If f is a large flow, let f_min be the flow with the minimum flow size in B[h(f)]. LLM-Sketch evicts f_min from the bucket, inserts it into the light part, and then inserts (f, 1) into B[h(f)]. 2) If f is a small flow, LLM-Sketch directly inserts f into the light part.
Query: When querying the size of a flow f, LLM-Sketch first checks if f is in the heavy part. If so, it reports the recorded flow size; otherwise, it reports the result from the light part.

Figure 3: An example of LLM-Sketch. [Diagram: packets of f1, f2, f5, and f8 are inserted into heavy-part buckets B1–B4; f1's counter goes from 1000 to 1001, f2 is placed in an empty cell of B2, f5 is diverted to the light part, and f8 replaces f7 in B4, with f7 moved to the light part.]

Example 1: Figure 3 illustrates the different cases in the insertion process of LLM-Sketch. When inserting a packet of flow f1, LLM-Sketch computes the hash function h to locate bucket B1. Since f1 is already recorded in B1, LLM-Sketch simply increments its flow size by 1.
Example 2: When inserting a packet of f2, LLM-Sketch locates bucket B2. Since B2 has an empty cell, LLM-Sketch inserts (f2, 1) into that cell.
Example 3: When inserting a packet of f5, LLM-Sketch locates bucket B3. Since B3 is full, LLM-Sketch uses the flow classifier to predict that f5 is a small flow and therefore inserts (f5, 1) into the light part.
Example 4: When inserting a packet of f8, LLM-Sketch locates bucket B4. Since B4 is full, LLM-Sketch predicts f8 as a large flow. LLM-Sketch then evicts the flow with the minimum flow size (i.e., f7) from B4, inserts (f8, 1) into B4, and inserts f7 into the light part.
Optimization: Large-flow Locking. In the insertion process described above, if a hash collision occurs, LLM-Sketch evicts the flow with the smallest recorded size. Although this approach generally works well, it may inadvertently evict newly arrived flows that are actually large but have not yet accumulated a significant size. To address this issue, we introduce a lock flag in each cell of the heavy part. This flag tracks how often a flow is predicted to be large, thereby reducing the likelihood of evicting flows that were previously identified as large, even if their current size is still small.
Lock flag update: Whenever a packet is inserted and its flow is (re)classified, we update the lock flag based on the classifier's prediction and the flow's recorded size. Let ŷ ∈ {0, 1} be the predicted label for this packet, where ŷ = 1 indicates a large flow and ŷ = 0 indicates a small flow. Recall that n̂ represents the flow's recorded size. Then the lock flag L ∈ {0, 1} is updated as follows:

L ← 1 with probability (L · n̂ + ŷ) / (n̂ + 1), and L ← 0 otherwise.

This rule can be viewed as an unbiased estimator of the fraction of times the flow is predicted to be large, accumulated over its updates (see Theorem 4.3). If L = 1 after the update, we treat the flow as large and therefore lock it. Otherwise, if L = 0, it is more likely to be a small flow and can be safely evicted if necessary.
Eviction policy: When a hash collision occurs and LLM-Sketch needs to evict a flow from a full bucket, it first checks whether any flows in that bucket have L = 0. 1) If there is at least one unlocked flow (L = 0), LLM-Sketch evicts the one with the smallest size among them. 2) If all flows in the bucket are locked (L = 1), LLM-Sketch must evict the flow with the minimum size regardless of its lock flag. Although the latter scenario should be rare, it can still occur due to classification errors or the probabilistic nature of the lock flag update.

3.2 Model Design
We choose to adapt a Large Language Model (LLM) as our flow classifier due to its ability to capture complex patterns in packet headers. By leveraging an LLM, we can process each packet header in a contextual manner, enabling the classifier to learn nuanced relationships that simpler models might overlook. Furthermore, the inherent flexibility of LLMs makes them well-suited for handling packet headers of varying lengths and formats.
Embedding: Since the raw packet header data cannot be directly interpreted by a language model, we introduce an embedding layer that transforms the packet headers into token embeddings. Specifically, we treat the packet header as a binary string and segment it into two-byte chunks, each serving as a token for the embedding layer. This approach circumvents the need for a cumbersome,
field-by-field parsing. In practice, to prevent overfitting to specific IP addresses, we remove the source and destination IP fields before feeding the remaining header data into the model. Consequently, the classifier focuses on more generalizable features—such as transport-layer information—rather than memorizing particular endpoints in the training data.
Objective Function: A straightforward strategy might be to define a hard threshold T (e.g., 64) to classify flows into large (⩾ T) and small (< T) categories. However, directly optimizing for a strict binary cutoff can lead to the following issues:
• Flows near the threshold (e.g., those with sizes 60–70) often share similar characteristics, making a single sharp boundary somewhat arbitrary.
• Misclassifying flows near the threshold has relatively little impact on the overall sketch accuracy, so aggressively fitting a binary boundary can introduce unnecessary complexity.
To address these concerns, we adopt a soft-label approach that smooths the discrete large-versus-small boundary. Concretely, we assign a label to each flow based on

label = σ(a(log(n) − log(T))),

where n is the flow size, T is the threshold, σ(·) is the sigmoid function, and a is a scaling parameter. This design has several advantages:
• Continuity: Rather than a hard 0/1 label, flows receive labels in the continuous range (0, 1), enabling a smooth transition around the threshold.
• Reduced sensitivity: Flows that are significantly larger than T yield labels near 1, while those much smaller than T yield labels near 0. Flows in the ambiguous region around T hover near 0.5, making misclassifications less punitive.
• Smoother optimization: Training as a regression task on these soft labels typically exhibits more stable convergence than a strict classification objective.
In practice, this soft-label mechanism helps the classifier learn a nuanced notion of flow size, rather than fixating on a single, potentially noisy threshold. Large flows (e.g., above 1,000) naturally produce labels close to 1, whereas small flows (e.g., below 5) produce labels close to 0. Flows near T fall around 0.5, diminishing the adverse impact of uncertain classifications. Consequently, the classifier achieves better overall performance and adaptability across diverse network environments.

3.3 Application
In this section, we describe how to apply LLM-Sketch to three typical tasks: flow size query, heavy hitter query, and hierarchical heavy hitter query.
Flow size query: LLM-Sketch can be used directly to measure flow sizes.
Optimization: using fingerprints. Following many existing works [33, 40], LLM-Sketch also supports the use of fingerprints in place of full flow IDs. This is particularly beneficial when the original ID is large (e.g., the 13-byte 5-tuple). Although fingerprints may introduce collisions, they substantially reduce memory usage. Consequently, LLM-Sketch can achieve higher accuracy under the same memory budget compared with recording the full flow IDs.
Heavy hitter query: For heavy hitter query, LLM-Sketch maintains the same insertion procedure described earlier. When querying heavy hitters, LLM-Sketch simply scans the heavy part to find all flows whose recorded sizes exceed the given threshold. Those flows are subsequently reported as heavy hitters.
Hierarchical heavy hitter (HHH) query: To support HHH query, LLM-Sketch replaces the light part with a CocoSketch [39]. The insertion process remains largely unchanged, except that flows which would originally be inserted into CMS are now inserted into CocoSketch. When performing an HHH query, LLM-Sketch first merges the heavy and light parts into a single key-value table and then obtains HHHs using the aggregation approach proposed by CocoSketch.

4 Theoretical Analysis
4.1 Accuracy and Error Bound
We make the following assumptions in our analysis.
• Assumption 1 (Classification Consistency): The predicted label of a flow does not change from large to small (or vice versa) during its lifetime.
• Assumption 2 (Sufficient Heavy Part): The heavy part is large enough so that any large flow correctly classified as large is never evicted once inserted.

Theorem 4.1. The probability that a large flow is fully accurate (i.e., tracked with zero error) in LLM-Sketch is

P_LLMS = A + (1 − A) · P_CMS(w_light, d_light, N_light),

where A is the classifier's accuracy for large flows, i.e., the probability that a large flow is correctly identified as large; w_light and d_light are the width and depth of the light part (CMS); and N_light is the number of flows that end up in the light part. P_CMS(w, d, N) is the probability that a single flow is fully accurate in a CMS of width w and depth d, when there are N total flows in that sketch. One common formula, based on a Poisson approximation of collisions, is:

P_CMS(w, d, N) = 1 − (1 − e^(−(N−1)/w))^d.

Proof. Consider a large flow f:
Case 1: Correct Classification (probability A). Under Assumption 2, once inserted into the heavy part, f is never evicted. Consequently, f is fully accurate, so the probability in this scenario is 1.
Case 2: Misclassification (probability 1 − A). If f is misclassified as small, it goes into the light part. The probability that f is fully accurate in CMS is P_CMS(w_light, d_light, N_light).
By the law of total probability, we sum over these two disjoint cases:

P_LLMS = A × 1 + (1 − A) × P_CMS(w_light, d_light, N_light) = A + (1 − A) · P_CMS(w_light, d_light, N_light). □

Theorem 4.2. Let n̂(f) and n(f) be the estimated and actual size of a flow f. Let ∥f_light∥₁ be the total number of packets sent to the light part. We have

∥f_light∥₁ ⩽ ∥f∥₁ − A × N_large × T,
where ∥f∥₁ is the total number of packets. Let δ = (1 − A)e^(−d_light) and ε = e/w_light. With probability at least 1 − δ,

n̂(f) ⩽ n(f) + ε∥f_light∥₁.

Proof. We split into two cases according to whether the flow f is correctly or incorrectly classified:
Case 1: Correct Classification (probability A). If f is a large flow and the classifier labels it as large, f is inserted into the heavy part. Under our assumptions, once in the heavy part, f is fully accurate. Thus, in this scenario, we trivially have

n̂(f) = n(f) ⩽ n(f) + ε∥f_light∥₁.

Case 2: Misclassification (probability 1 − A). If f is a large flow but is incorrectly labeled as small, it goes into the light part. Standard CMS analysis shows that with probability 1 − e^(−d_light),

n̂(f) ⩽ n(f) + ε∥f_light∥₁.

Equivalently,

P(n̂(f) > n(f) + ε∥f_light∥₁ | misclassified) ⩽ e^(−d_light).

Hence, unconditionally, the probability that f ends up in CMS and is over-counted by more than ε∥f_light∥₁ is at most (1 − A)e^(−d_light).
Therefore, with probability at least 1 − (1 − A)e^(−d_light),

n̂(f) ⩽ n(f) + ε∥f_light∥₁.

Combining both cases, we set δ = (1 − A)e^(−d_light), which gives

P(n̂(f) ⩽ n(f) + ε∥f_light∥₁) ⩾ 1 − δ.

Finally, observe that

∥f_light∥₁ ⩽ ∥f∥₁ − A × N_large × T,

because at least A × N_large large flows are correctly classified and stored in the heavy part, each contributing at least T packets. Hence, the light part contains at most ∥f∥₁ − A × N_large × T packets in total, ensuring that

n̂(f) ⩽ n(f) + ε(∥f∥₁ − A × N_large × T). □

4.2 Lock Flag Estimation
Theorem 4.3. The lock flag L is an unbiased estimator of the flow's predicted labels.

Proof. Let n̂_t be the flow size for a given flow up to the t-th packet, and ŷ_t ∈ {0, 1} be the classifier's predicted label at the t-th insertion. Let L_t be the value of the lock flag after the t-th insertion. The lock flag is updated as follows:

L_{t+1} = 1 with probability (L_t · n̂_t + ŷ_{t+1}) / (n̂_t + 1), and 0 otherwise.

We aim to show that

E[L_t] = (1/t) Σ_{i=1}^{t} E[ŷ_i].

Base Case (t = 1). When t = 1, the flow has been inserted exactly once. Thus, L_1 = 1 with probability ŷ_1, and 0 otherwise. Hence,

E[L_1] = 1 × ŷ_1 + 0 × (1 − ŷ_1) = ŷ_1.

Since the expectation of the predicted labels for just one insertion is ŷ_1, we have E[L_1] = (1/1) Σ_{i=1}^{1} E[ŷ_i] = E[ŷ_1]. Thus, the base case holds.
Inductive Step. Assume that after t insertions,

E[L_t] = (1/t) Σ_{i=1}^{t} E[ŷ_i].

We want to show that the statement also holds for t + 1, i.e.,

E[L_{t+1}] = (1/(t+1)) Σ_{i=1}^{t+1} E[ŷ_i].

By the law of total expectation and the update rule for L, we have

E[L_{t+1}] = E[E[L_{t+1} | L_t, n̂_t, ŷ_{t+1}]] = E[(L_t · n̂_t + ŷ_{t+1}) / (n̂_t + 1)].

Since n̂_t = t at the (t + 1)-th insertion, we have

E[L_{t+1}] = E[(L_t · t + ŷ_{t+1}) / (t + 1)] = (1/(t+1)) E[L_t · t + ŷ_{t+1}] = (1/(t+1)) (t · E[L_t] + E[ŷ_{t+1}]).

Using the inductive hypothesis E[L_t] = (1/t) Σ_{i=1}^{t} E[ŷ_i], we have

E[L_{t+1}] = (1/(t+1)) (t · (1/t) Σ_{i=1}^{t} E[ŷ_i] + E[ŷ_{t+1}]) = (1/(t+1)) Σ_{i=1}^{t+1} E[ŷ_i].

This completes the inductive step.
Conclusion. By mathematical induction, we have

E[L_t] = (1/t) Σ_{i=1}^{t} E[ŷ_i]

for all t. Therefore, L_t is an unbiased estimator of the average predicted label up to the t-th insertion. □
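Theorem 4.3 can be sanity-checked numerically. Below is a minimal simulation of the lock-flag update rule (an illustrative sketch only, not the paper's released code; the per-packet label sequence is hypothetical): averaging the final flag over many independent runs should recover the mean of the predicted labels.

```python
import random

def update_lock_flag(L, n_hat, y_hat, rng):
    # L becomes 1 with probability (L * n_hat + y_hat) / (n_hat + 1), else 0
    return 1 if rng.random() < (L * n_hat + y_hat) / (n_hat + 1) else 0

def simulate(labels, trials, seed=0):
    # Average lock-flag value over many independent runs of one flow's insertions.
    # n_hat equals the number of packets already recorded (t) at the (t+1)-th insertion.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        L = 0
        for t, y in enumerate(labels):
            L = update_lock_flag(L, t, y, rng)
        total += L
    return total / trials

labels = [1, 0, 1, 1, 0]                     # hypothetical per-packet predictions
estimate = simulate(labels, trials=100_000)  # ≈ sum(labels) / len(labels) = 0.6
```

Note that the update is probabilistic but memory-free: a single bit per cell suffices, which is why the heavy part can afford one lock flag per recorded flow.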
5 Experimental Results

Figure 4: Accuracy vs. # bucket size. [Subfigures (a) ARE and (b) AAE vs. memory (200–1000 KB) for bucket sizes 1, 2, 4, 8, and 16.]

5.1 Experimental Setup
Computation platform and implementation: We conduct all experiments on a GPU server (Intel i9-10980XE) equipped with an NVIDIA 4090 GPU (with 24 GB memory). We implement LLM-Sketch (Ours), Learned CM sketch (LCMS) [17], meta-sketch (MS) [10], ElasticSketch (ES) [35], and CocoSketch (CocoS) [39] in Python. For heavy hitter query, we set the threshold T = 0.01% · ∥f∥₁, as many papers do. For LLM-Sketch, we use RoBERTa [36] as the base model, and set the soft label to

label = σ(2.298(log₂(n) − log₂(64))).

Hence, for flows whose sizes exceed 256, the label is above 0.99, while for flows whose sizes are below 16, the label is below 0.01. We apply LoRA (Low-Rank Adaptation) [18] to fine-tune our model. We limit the training process to 1 epoch due to fast convergence. For other algorithms, we set their parameters as the original papers recommended.
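For concreteness, the soft-label mapping with these parameters (a = 2.298, T = 64) can be computed directly. The snippet below is a standalone illustration of the formula, not the paper's training code:

```python
import math

def soft_label(n, T=64, a=2.298):
    # sigmoid(a * (log2(n) - log2(T))): smooth label in (0, 1)
    x = a * (math.log2(n) - math.log2(T))
    return 1.0 / (1.0 + math.exp(-x))

print(soft_label(512))  # well above the threshold -> label near 1
print(soft_label(64))   # exactly at the threshold -> 0.5
print(soft_label(8))    # well below the threshold -> label near 0
```

The scale a = 2.298 is what makes the stated cutoffs work out: since σ(2 · 2.298) ≈ 0.99, sizes two octaves above T = 64 (i.e., 256) map above 0.99 and sizes two octaves below (i.e., 16) map below 0.01.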
Figure 5: Accuracy vs. heavy ratio. [Subfigures (a) ARE and (b) AAE vs. memory (200–1000 KB) for heavy ratios 10%–50%.]

Datasets: The datasets used for the evaluation are listed as follows.
• CAIDA dataset² is a passive trace dataset collected from high-speed monitors on a commercial backbone link. We split the dataset into sub-datasets with a time window of 5 s. Each sub-dataset consists of approximately 190k flows and 2.1M packets. We use 12 adjacent sub-datasets as the training set, and 1 sub-dataset as the test set.
• MAWI dataset³ is a traffic dataset maintained by the MAWI Working Group of the WIDE Project, collected at the transit link of WIDE to the upstream ISP. We split the dataset into sub-datasets with a time window of 5 s. Each sub-dataset consists of approximately 670k flows and 1.2M packets. We use 12 adjacent sub-datasets as the training set, and 1 sub-dataset as the test set.
• IMC DC dataset⁴ contains data from the data centers studied in [7]. The dataset consists of multiple sub-datasets, each consisting of approximately 63k flows and 0.9M packets. We use 10 sub-datasets as the training set, and 1 sub-dataset as the test set.
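The fixed-window preprocessing described above can be sketched as follows. The in-memory packet representation here is a hypothetical simplification (real CAIDA/MAWI traces would first be parsed from pcap files):

```python
from collections import defaultdict

def split_into_windows(packets, window=5.0):
    # Group (timestamp, flow_id) packets into fixed time windows and
    # report per-window distinct-flow and packet counts.
    buckets = defaultdict(list)
    for ts, flow_id in packets:
        buckets[int(ts // window)].append(flow_id)
    return {w: {"flows": len(set(ids)), "packets": len(ids)}
            for w, ids in sorted(buckets.items())}

packets = [(0.1, "a"), (1.2, "b"), (4.9, "a"), (5.1, "c"), (9.9, "c")]
stats = split_into_windows(packets)
# window 0 holds flows {a, b} over 3 packets; window 1 holds flow {c} over 2 packets
```

Each resulting window plays the role of one sub-dataset, with adjacent windows used for training and a held-out window for testing.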
Metrics: The metrics used for the evaluation are listed as follows.
• Average Relative Error (ARE): (1/|U|) Σ_{f∈U} |n(f) − n̂(f)| / n(f), where U is the universe, and n(f) and n̂(f) are the actual and estimated sizes of flow f, respectively.
• Average Absolute Error (AAE): (1/|U|) Σ_{f∈U} |n(f) − n̂(f)|.
• F1 Score: 2·PR·RR / (PR + RR), where PR (Precision Rate) is the ratio of true positive flows to all reported flows, and RR (Recall Rate) is the ratio of true positive flows to all actual flows.

Figure 6: Accuracy vs. # hash functions. ((a) ARE; (b) AAE.)

5.2 Experiment on Parameter Settings
# bucket size (Figure 4): We vary the bucket size used in the heavy part of LLM-Sketch. We find that when the bucket size is 1, the error is highest because two large flows may collide in the same bucket. As the bucket size increases, accuracy improves. When the bucket size is at least 2, even if collisions occur, the two large flows can still be recorded in different cells within the same bucket, thus avoiding collision-induced errors. Once the bucket size reaches 8, further increases lead to only marginal improvements. Therefore, we set the bucket size in the heavy part to 8.
Heavy ratio (Figure 5): We adjust the proportion of total memory allocated to the heavy part (referred to as the heavy ratio) and measure the accuracy. We find that a heavy ratio of 10% consistently yields the lowest ARE, while the heavy ratio that achieves the lowest AAE varies with the total memory size. This is because increasing the heavy ratio improves the accuracy of large flows but lowers the accuracy of small flows. When the total memory is small, the accuracy of small flows has a greater impact on overall accuracy, so a smaller heavy ratio yields better results. Conversely, when the total memory is large, the accuracy of large flows has a bigger influence, and AAE becomes more sensitive to large-flow accuracy, so a larger heavy ratio leads to better overall performance. Therefore, for flow size query, we set the heavy ratio to 20% as a balance between the accuracy of large flows and small flows. Note

2 https://www.caida.org/catalog/datasets/passive_dataset
3 https://mawi.wide.ad.jp/mawi
4 https://pages.cs.wisc.edu/~tbenson/IMC10_Data.html
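The ARE, AAE, and F1 metrics defined above can be computed directly from per-flow ground truth and estimates. A minimal sketch with hypothetical flow counts and a hypothetical heavy-hitter threshold of 100 packets (both the counts and the threshold are ours for illustration, not the paper's settings):

```python
# Hypothetical ground-truth and estimated flow sizes (flow ID -> packet count).
actual = {"f1": 1000, "f2": 8, "f3": 3, "f4": 120}
estimated = {"f1": 1002, "f2": 10, "f3": 3, "f4": 110}

# ARE: mean of |n(f) - n_hat(f)| / n(f) over the universe of flows.
are = sum(abs(actual[f] - estimated[f]) / actual[f] for f in actual) / len(actual)

# AAE: mean of |n(f) - n_hat(f)| over the universe of flows.
aae = sum(abs(actual[f] - estimated[f]) for f in actual) / len(actual)

# F1 for heavy hitter query: flows at or above a size threshold are "heavy".
THRESHOLD = 100  # illustrative threshold, not the paper's setting
reported = {f for f, v in estimated.items() if v >= THRESHOLD}
heavy = {f for f, v in actual.items() if v >= THRESHOLD}
tp = len(reported & heavy)
pr = tp / len(reported)  # precision: true positives over all reported flows
rr = tp / len(heavy)     # recall: true positives over all actual heavy flows
f1 = 2 * pr * rr / (pr + rr)
```

Note that ARE weights every flow equally, so the many small flows dominate it, whereas AAE is dominated by the absolute errors of large flows; this asymmetry is what drives the heavy-ratio trade-off discussed above.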
LLM-Sketch: Enhancing Network Sketches with LLM — Conference'17, July 2017, Washington, DC, USA

that for heavy hitter query, we only use the heavy part, because heavy hitter query focuses solely on large-flow accuracy.
# hash functions (Figure 6): We vary the number of hash functions used in the light part of LLM-Sketch and examine its accuracy. We observe that when there is only 1 hash function, the error is highest due to the lack of multi-hash error reduction. When the number of hash functions is at least two, the best-performing number of hashes depends on the memory size. Nevertheless, with 3 hash functions, LLM-Sketch consistently achieves near-optimal accuracy. Therefore, we set the number of hash functions in the light part to 3.

5.3 End-to-end Performance
Accuracy of flow size query (Figure 7): We compare LLM-Sketch with LCMS, MS, and ES. We find that LLM-Sketch achieves the highest accuracy among all 4 algorithms. On average, the ARE of LLM-Sketch is 11.8 and 18.8 times lower than those of LCMS and ES, respectively, while its AAE is also 8.1 and 12.1 times lower than those of LCMS and ES, respectively. It is worth noting that meta-sketch is not practical at our current data scale for two main reasons:
(1) When the memory budget is set to 200 KB, training a 500k-step model fails to converge, leading to poor accuracy.
(2) When the memory is increased to 400 KB, meta-sketch requires more than 24 GB of GPU memory for training.

Figure 7: Accuracy of flow size query on CAIDA dataset. ((a) ARE; (b) AAE.)

Accuracy of heavy hitter query (Figure 8): We compare LLM-Sketch with ES. We find that LLM-Sketch achieves higher accuracy among the 2 algorithms. Under a memory budget of 50KB, LLM-Sketch reaches an F1 score of 0.94, whereas the F1 score of ES is 0.74. In addition, the ARE of LLM-Sketch is also on average 2.6 times lower than that of ES.

Figure 8: Accuracy of heavy hitter query on CAIDA dataset. ((a) F1 score; (b) ARE.)

Accuracy of HHH query (Figure 9): We compare LLM-Sketch with CocoS. We find that LLM-Sketch achieves higher accuracy among the 2 algorithms. Under a memory budget of 50KB, LLM-Sketch reaches an F1 score of 0.94, whereas the F1 score of CocoS is 0.82. In addition, the ARE of LLM-Sketch is also on average 1.9 times lower than that of CocoS.

Figure 9: Accuracy of HHH query on CAIDA dataset. ((a) F1 score; (b) ARE.)

Accuracy on other datasets (Figures 10-11): Apart from the CAIDA dataset, we evaluate the accuracy of flow size query and heavy hitter query on two additional datasets and find that LLM-Sketch achieves high accuracy. For flow size query, on average, LLM-Sketch's ARE is 5.3 and 13.2 times lower than those of LCMS and ES, respectively, while its AAE is also 3.9 and 8.2 times lower than those of LCMS and ES, respectively. For heavy hitter query, under a memory budget of 50KB, LLM-Sketch reaches an F1 score of 0.99 and 0.98 on the two datasets, respectively, whereas the F1 score of ES is 0.94 and 0.89, respectively. It is worth noting that, on the IMC DC dataset, when the memory size is large, ElasticSketch achieves higher accuracy. This is because, with ample memory, ElasticSketch's heavy part has sufficient cells to store and identify potential large flows, thus avoiding errors from misclassifying large flows. In this scenario, LLM-Sketch also maintains high accuracy (F1 score > 0.99, ARE < 0.05).

5.4 Micro Benchmark
Model accuracy (Figure 12): We evaluate the classification accuracy for flows of different sizes and observe that LLM-Sketch accurately classifies both very large and very small flows, which meets our expectations. For flows smaller than 16 packets and those larger than 256 packets, the model's classification accuracy exceeds 90%. Although accuracy for flows within the [32, 64) range is relatively low, it has little impact on the overall performance and is thus acceptable.
Enhancement by Using Full Packet Headers (Figure 13): We compare the accuracy of LLM-Sketch with a baseline algorithm that uses only flow IDs for classification. We train the classifier on a training set and test it on 4 test sets, each collected at a different time interval from the training set. We also evaluate the accuracy of heavy hitter query under 50KB. We find that when using full packet headers, the classifier can accurately infer flow sizes over an extended period, thereby preserving the sketch's high accuracy. As shown in Figure 13, although at minute 0 the baseline and LLM-Sketch yield essentially the same results, the baseline's classifier accuracy and its sketch accuracy decline rapidly over time, whereas LLM-Sketch's
accuracy shows only a slight decrease. At the 20-minute mark, the classifier's F1 scores for LLM-Sketch and the baseline drop by 0.059 and 0.113, respectively, while the sketch's end-to-end F1 scores also drop by 0.017 and 0.069, respectively.

Figure 10: Accuracy of flow size query on other datasets. ((a) ARE on MAWI; (b) AAE on MAWI; (c) ARE on IMC DC; (d) AAE on IMC DC. When memory is 200KB, the ARE and AAE of MS on MAWI are 224 and 271, respectively; those on IMC DC are 29 and 108, respectively.)

Figure 11: Accuracy of heavy hitter query on other datasets. ((a) F1 on MAWI; (b) ARE on MAWI; (c) F1 on IMC DC; (d) ARE on IMC DC.)

Figure 12: Model accuracy. (Classification accuracy per flow size bin.)

Figure 13: Accuracy over time. ((a) F1 score of model; (b) F1 score of sketch; x-axis: time from training data in minutes.)

6 Related Work
Traditional sketches: Sketches can broadly be categorized into two types. 1) Classic sketches consist of a counter matrix and multiple hash functions. During updates and queries, flow IDs are hashed into multiple counters, and the mapped counters are then updated and queried accordingly. Typical classic sketches include the Count-Min Sketch (CMS) [12], the Conservative Update Sketch (CUS) [15], and the Count Sketch (CS) [11]. However, classic sketches fail to account for the highly skewed nature of network traffic, resulting in memory waste. 2) Sophisticated sketches address this problem by separating large flows from small flows [14, 19, 20, 24, 35]. These sketches typically consist of multiple parts, with different parts using different data structures to record flows of varying sizes. A typical sophisticated sketch is ElasticSketch [35], which is composed of a heavy part and a light part. The heavy part is a key-value table, while the light part is a CM sketch. Packets are first inserted into the heavy part. When the heavy part becomes full, ElasticSketch uses an eviction method to remove the flow that is most likely to be small and inserts it into the light part. Researchers have attempted to improve sketch accuracy by adjusting the flow selection method for eviction, but most of these attempts have been based on experience. Additional sketches have been proposed for specialized tasks, such as heavy hitter query [4, 5, 28], hierarchical heavy hitter query [6, 39], and DDoS victim/super-spreader query [29, 37].
ML-based sketches: In recent years, researchers have attempted to use machine learning methods to improve sketch performance. Learned Count-Min Sketch (LCMS) [17] employs an RNN to learn and infer whether a flow is large and uses an additional hash table to record large flows. Other solutions [8, 10, 31] use ML to enhance
hashing, updating, and querying processes within the sketch, leading to improved performance. The main difference between these works and ours is that they only learn features from the flow ID and distribution, without utilizing other information carried by the packets.
LLM for other network tasks: Some works explore how to adapt LLMs to other network operations, such as traffic classification, viewport prediction, adaptive bitrate streaming, and cluster job scheduling. Typical works include PERT [16], ET-BERT [23], YaTC [41], and NetLLM [32].

7 Conclusion
In this paper, we propose LLM-Sketch, which combines a novel two-part data structure with an LLM-powered flow classifier to effectively separate large flows from small ones and reduce collisions. By employing a soft-label mechanism for real-time flow predictions, it accurately identifies potential heavy flows while minimizing misclassifications. Experimental results on real-world datasets confirm that LLM-Sketch achieves a significant 7.5× improvement in accuracy over state-of-the-art methods, demonstrating its versatility for diverse network stream mining tasks. All related code is open-sourced on GitHub.

References
[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
[2] Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat, et al. 2010. Hedera: Dynamic flow scheduling for data center networks. In NSDI, Vol. 10. San Jose, USA, 89–92.
[3] Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, et al. 2014. CONGA: Distributed congestion-aware load balancing for datacenters. In Proceedings of the 2014 ACM Conference on SIGCOMM. 503–514.
[4] Ran Ben Basat, Xiaoqi Chen, Gil Einziger, and Ori Rottenstreich. 2020. Designing heavy-hitter detection algorithms for programmable switches. IEEE/ACM Transactions on Networking 28, 3 (2020), 1172–1185.
[5] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. 2016. Heavy hitters in streams and sliding windows. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. IEEE, 1–9.
[6] Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo C Luizelli, and Erez Waisbard. 2017. Constant time updates in hierarchical heavy hitters. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. 127–140.
[7] Theophilus Benson, Aditya Akella, and David A Maltz. 2010. Network traffic characteristics of data centers in the wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. 267–280.
[8] Dimitris Bertsimas and Vassilis Digalakis. 2021. Frequency estimation in data streams: Learning the optimal hashing scheme. IEEE Transactions on Knowledge and Data Engineering 35, 2 (2021), 1541–1553.
[9] Burton H Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (1970), 422–426.
[10] Yukun Cao, Yuan Feng, and Xike Xie. 2023. Meta-sketch: A neural data structure for estimating item frequencies of data streams. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 6916–6924.
[11] Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2002. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming. Springer, 693–703.
[12] Graham Cormode and Shan Muthukrishnan. 2005. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms 55, 1 (2005), 58–75.
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[14] Rui Ding, Shibo Yang, Xiang Chen, and Qun Huang. 2023. BitSense: Universal and nearly zero-error optimization for sketch counters with compressive sensing. In Proceedings of the ACM SIGCOMM 2023 Conference. 220–238.
[15] Cristian Estan and George Varghese. 2002. New directions in traffic measurement and accounting. In Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 323–336.
[16] Hong Ye He, Zhi Guo Yang, and Xiang Ning Chen. 2020. PERT: Payload encoding representation from transformer for encrypted traffic classification. In 2020 ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K). IEEE, 1–8.
[17] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. 2019. Learning-based frequency estimation algorithms. In International Conference on Learning Representations.
[18] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
[19] Qun Huang, Patrick PC Lee, and Yungang Bao. 2018. SketchLearn: Relieving user burdens in approximate measurement with automated statistical inference. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. 576–590.
[20] Qun Huang, Siyuan Sheng, Xiang Chen, Yungang Bao, Rui Zhang, Yanwei Xu, and Gong Zhang. 2021. Toward nearly-zero-error sketching via compressive sensing. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 1027–1044.
[21] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019).
[22] Yuliang Li, Rui Miao, Hongqiang Harry Liu, Yan Zhuang, Fei Feng, Lingbo Tang, Zheng Cao, Ming Zhang, Frank Kelly, Mohammad Alizadeh, et al. 2019. HPCC: High precision congestion control. In Proceedings of the ACM Special Interest Group on Data Communication. 44–58.
[23] Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, and Jing Yu. 2022. ET-BERT: A contextualized datagram representation with pre-training transformers for encrypted traffic classification. In Proceedings of the ACM Web Conference 2022. 633–642.
[24] Zaoxing Liu, Ran Ben-Basat, Gil Einziger, Yaron Kassner, Vladimir Braverman, Roy Friedman, and Vyas Sekar. 2019. NitroSketch: Robust and general sketch-based monitoring in software switches. In Proceedings of the ACM Special Interest Group on Data Communication. 334–350.
[25] Zaoxing Liu, Hun Namkung, Georgios Nikolaidis, Jeongkeun Lee, Changhoon Kim, Xin Jin, Vladimir Braverman, Minlan Yu, and Vyas Sekar. 2021. Jaqen: A high-performance switch-native approach for detecting and mitigating volumetric DDoS attacks with programmable switches. In 30th USENIX Security Symposium (USENIX Security 21). 3829–3846.
[26] Rui Miao, Hongyi Zeng, Changhoon Kim, Jeongkeun Lee, and Minlan Yu. 2017. SilkRoad: Making stateful layer-4 load balancing fast and cheap using switching ASICs. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. 15–28.
[27] Victor Sanh, L Debut, J Chaumond, and T Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).
[28] Lu Tang, Qun Huang, and Patrick PC Lee. 2019. MV-Sketch: A fast and compact invertible sketch for heavy flow detection in network data streams. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 2026–2034.
[29] Lu Tang, Qun Huang, and Patrick PC Lee. 2020. SpreadSketch: Toward invertible and network-wide detection of superspreaders. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. IEEE, 1608–1617.
[30] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
[31] Hengrui Wang, Huiping Lin, Zheng Zhong, Tong Yang, and Muhammad Shahzad. 2022. Enhanced machine learning sketches for network measurements. IEEE Trans. Comput. 72, 4 (2022), 957–970.
[32] Duo Wu, Xianda Wang, Yaqi Qiao, Zhi Wang, Junchen Jiang, Shuguang Cui, and Fangxin Wang. 2024. NetLLM: Adapting large language models for networking. In Proceedings of the ACM SIGCOMM 2024 Conference. 661–678.
[33] Yuchen Xu, Wenfei Wu, Bohan Zhao, Tong Yang, and Yikai Zhao. 2023. MimoSketch: A framework to mine item frequency on multiple nodes with sketches. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2838–2849.
[34] Kaicheng Yang, Yuanpeng Li, Sheng Long, Tong Yang, Ruijie Miao, Yikai Zhao, Chaoyang Ji, Penghui Mi, Guodong Yang, Qiong Xie, et al. 2023. AAsclepius: Monitoring, diagnosing, and detouring at the internet peering edge. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). 655–671.
[35] Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, and Steve Uhlig. 2018. Elastic sketch: Adaptive and fast network-wide measurements. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. 561–575.
[36] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, and Mike Lewis. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[37] Minlan Yu, Lavanya Jose, and Rui Miao. 2013. Software defined traffic measurement with OpenSketch. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13). 29–42.
[38] Menghao Zhang, Guanyu Li, Shicheng Wang, Chang Liu, Ang Chen, Hongxin Hu, Guofei Gu, Qianqian Li, Mingwei Xu, and Jianping Wu. 2020. Poseidon: Mitigating volumetric DDoS attacks with programmable switches. In the 27th Network and Distributed System Security Symposium (NDSS 2020).
[39] Yinda Zhang, Zaoxing Liu, Ruixin Wang, Tong Yang, Jizhou Li, Ruijie Miao, Peng Liu, Ruwen Zhang, and Junchen Jiang. 2021. CocoSketch: High-performance sketch-based measurement over arbitrary partial key query. In Proceedings of the 2021 ACM SIGCOMM Conference. 207–222.
[40] Bohan Zhao, Xiang Li, Boyu Tian, Zhiyu Mei, and Wenfei Wu. 2021. DHS: Adaptive memory layout organization of sketch slots for fast and accurate data stream processing. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2285–2293.
[41] Ruijie Zhao, Mingwei Zhan, Xianwen Deng, Yanhao Wang, Yijun Wang, Guan Gui, and Zhi Xue. 2023. Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 5420–5427.
