[None][doc] add adp balance blog #7213
📝 Walkthrough
Adds three PNG patterns to Git LFS via
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (8)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (8)
70-75: Specify code-fence languages to satisfy linters and improve readability.
markdownlint flagged MD040. Add explicit languages: use `text` for illustrative timelines and `yaml` for the config snippet.
Apply this diff:
@@
-```
+```text
 iter_i: [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1: [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
@@
-```
+```text
 iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
@@
-```
+```yaml
Also applies to: 79-84, 138-138
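To catch MD040 findings like this one before review, a small local check can list opening fences that carry no language. The following is a minimal sketch, not markdownlint's own implementation: the function name `fences_without_language` is hypothetical, and it only handles backtick fences (not `~~~`).

```python
import re

# An opening or closing fence: up to three spaces of indent, then 3+ backticks.
FENCE = re.compile(r"^\s{0,3}(`{3,})(.*)$")


def fences_without_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences with no info string."""
    missing = []
    open_len = 0  # length of the currently open fence; 0 means "outside a block"
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        ticks, info = match.group(1), match.group(2).strip()
        if open_len == 0:
            # Opening fence: remember its length and flag it if unlabeled.
            open_len = len(ticks)
            if not info:
                missing.append(lineno)
        elif len(ticks) >= open_len and not info:
            # Closing fence: at least as long as the opener, no info string.
            open_len = 0
    return missing
```

Running it over the blog source should report the same lines markdownlint lists under MD040, assuming the file uses only backtick fences.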
54-54: Add alt text to images (MD045) and improve accessibility.
All images lack alt text and are flagged by markdownlint (MD045). Add descriptive alt attributes to the `<img>` tags.
Apply this diff:
@@
-  <img src="./../media/ADP Balance.png">
+  <img src="./../media/ADP Balance.png" alt="Baseline: sort by tokens then round-robin across ranks">
@@
-  <img src="./../media/combined_token_distribution.png">
+  <img src="./../media/combined_token_distribution.png" alt="Distribution of input and output token lengths">
@@
-  <img src="./../media/si2_ei30235_plot1_token_distribution.png">
+  <img src="./../media/si2_ei30235_plot1_token_distribution.png" alt="Avg tokens and balance_ratio vs iteration (baseline)">
@@
-  <img src="./../media/si100_ei12000_plot1_token_distribution.png">
+  <img src="./../media/si100_ei12000_plot1_token_distribution.png" alt="Avg tokens and balance_ratio in iterations 100–12000 (baseline)">
@@
-  <img src="./../media/balanced_si100_ei12000_plot1_token_distribution.png">
+  <img src="./../media/balanced_si100_ei12000_plot1_token_distribution.png" alt="Avg tokens and balance_ratio in iterations 100–12000 (ADP Balance)">
@@
-  <img src="./../media/combined_charts.png">
+  <img src="./../media/combined_charts.png" alt="Pareto curve of TPS vs timeout_iters and batching_wait_iters">
Optional: consider renaming the asset with a space (`ADP Balance.png`) to a hyphenated name (e.g., `adp-balance.png`) to avoid path quoting issues in some build systems.
Also applies to: 97-97, 119-119, 128-128, 147-147, 160-160
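The MD045 findings can also be reproduced locally with a few lines of Python. This is a rough sketch under the assumption that each `<img>` tag sits on a single line, as in the diff above; `images_missing_alt` is a hypothetical helper, not part of markdownlint.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ALT_ATTR = re.compile(r"\balt\s*=", re.IGNORECASE)


def images_missing_alt(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, tag) pairs for <img> tags without an alt attribute."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for tag in IMG_TAG.findall(line):
            # A tag counts as compliant if it carries any alt= attribute.
            if not ALT_ATTR.search(tag):
                hits.append((lineno, tag))
    return hits
```

The check only tests for the presence of `alt=`; whether the text is actually descriptive still needs a human pass.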
22-22: Fix non-ASCII punctuation and tighten the sentence.
There is a Chinese comma and the sentence can be made crisper.
Apply this diff:
-With non-extremely-long context scenarios，we can assume that the overall latency is dominated by the non-MHA portion; hence the time complexity is effectively O(N).
+In non-extremely-long-context scenarios, we can assume the overall latency is dominated by non-MHA components; hence, the per-iteration time scales roughly linearly with the number of ranks.
11-11: Avoid multiple H1 headings; rename the second header to clarify scope.
There are two top-level headers (“Theoretical Analysis and Modeling” and “Theoretical Analysis”). Prefer a single H1 per page.
Apply this diff:
-# Theoretical Analysis
+## Notation and Metrics
Also applies to: 15-15
71-75: Correct subscript typos in illustrative timelines (gnM/gnN → gNM/gNN).
A few indices flip case/order and may confuse readers.
Apply this diff:
@@
-iter_i: [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
-iter_i+1: [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
+iter_i: [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
+iter_i+1: [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
@@
-iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
-iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
+iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
+iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
Also applies to: 80-84
156-156: Capitalize “Pareto”.
Stylistic consistency in section titles.
Apply this diff:
-The following Figure shows the pareto curve of impact of different `timeout_iters` and `batch_waiting_iters` values on TPS.
+The following figure shows the Pareto curve for the impact of different `timeout_iters` and `batching_wait_iters` values on TPS.
50-60: Image path with space may be brittle across doc toolchains.
`./../media/ADP Balance.png` contains a space. Some static-site generators and link-checkers stumble on unencoded spaces. Prefer `adp-balance.png`.
If you decide to rename the asset, update the `<img>` `src` accordingly and run the docs build/link check. I can supply a follow-up diff once the asset is renamed.
110-116: Strengthen experimental methodology notes (hardware, seeds, repetitions).
The reported TPS gains are compelling; adding hardware details (GPU model/count, driver/CUDA/TensorRT-LLM commit), random seed control, and number of runs with mean±std would increase reproducibility and credibility.
I can draft a short “Experimental Setup” subsection with placeholders for these fields if you’d like.
Also applies to: 133-136
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md
[grammar] ~11-~11: There might be a mistake here.
Context: ...ks. # Theoretical Analysis and Modeling Optimization Goals: - Minimize the load ...
(QB_NEW_EN)
[grammar] ~12-~12: There might be a mistake here.
Context: ...nalysis and Modeling Optimization Goals: - Minimize the load difference between dif...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...ecution time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...tion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...= \max_{0 \leq m < N} iter_time_{im} $$ where
(QB_NEW_EN)
[grammar] ~22-~22: Ensure spelling is correct
Context: ...odule). With non-extremely-long context scenarios,we can assume that the overall latency is ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~22-~22: There might be a mistake here.
Context: ...the time complexity is effectively O(N). Since scheduling is intended to resolve ...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...
(QB_NEW_EN)
[grammar] ~26-~26: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N to...
(QB_NEW_EN)
[grammar] ~27-~27: There might be a mistake here.
Context: ...ce_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N token...
(QB_NEW_EN)
[grammar] ~30-~30: There might be a mistake here.
Context: ... the theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ...e theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...
(QB_NEW_EN)
[grammar] ~32-~32: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...
(QB_NEW_EN)
[grammar] ~35-~35: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...
(QB_NEW_EN)
[grammar] ~77-~77: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...
(QB_NEW_EN)
[grammar] ~90-~90: There might be a mistake here.
Context: ...t a critical requirement. # Experiments ## Dataset The dataset contains 16,000 req...
(QB_NEW_EN)
[grammar] ~106-~106: There might be a mistake here.
Context: ...rameter Setting model=DeepSeekV3-NVFP4, TP=8, EP=8, WideEP=ON, max_num_tokens = ...
(QB_NEW_EN)
[grammar] ~156-~156: There might be a mistake here.
Context: ...following Figure shows the pareto curve of impact of different timeout_iters and...
(QB_NEW_EN)
[grammar] ~166-~166: There might be a mistake here.
Context: ...al1024_wideep_balance_ep_20250810_764586 [2] ADP balance: to50bw10: log_gb200_ifb...
(QB_NEW_EN)
[grammar] ~167-~167: There might be a mistake here.
Context: ...eep_adpbalance_to50_bw10_20250812_766150
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md
54-54: Images should have alternate text (alt text)
(MD045, no-alt-text)
70-70: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
79-79: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
97-97: Images should have alternate text (alt text)
(MD045, no-alt-text)
119-119: Images should have alternate text (alt text)
(MD045, no-alt-text)
128-128: Images should have alternate text (alt text)
(MD045, no-alt-text)
138-138: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: Images should have alternate text (alt text)
(MD045, no-alt-text)
160-160: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (2)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (2)
166-167: Double-check log directory names: “streamterval1024” looks like a typo.
If these are literal experiment paths, keep them as-is; if they’re human-readable labels, consider correcting to “streaming_interval1024”.
Would you confirm whether the suffix is “streaming_interval1024” in your logs? If it is a typo here, I can provide a diff to fix it.
1-10: Overall: Clear motivation, solid modeling, and practical guidance.
The narrative flows well from problem statement to method, with useful metrics and an actionable config snippet. The trade-offs (TTFT vs throughput) are articulated. Once the minor consistency and lint fixes land, this will be a strong addition to the tech blog.
Also applies to: 46-49, 61-68, 86-89, 152-163
a7c2bd0 to b107017
Actionable comments posted: 2
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (3)
156-156: Capitalize “Pareto” and fix parameter name to `batching_wait_iters`.
-The following Figure shows the pareto curve of impact of different `timeout_iters` and `batch_waiting_iters` values on TPS.
+The following figure shows the Pareto curve of the impact of different `timeout_iters` and `batching_wait_iters` values on TPS.
64-67: Unify parameter name to `timeout_iters` (avoid copy/paste config errors).
The prose uses `time_out_iters`, while examples/YAML use `timeout_iters`. Please normalize to `timeout_iters`.
-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not.
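For readers of this thread, the waiting rule described by the `timeout_iters` bullet can be pictured roughly as follows. This is an illustrative sketch only: `should_run_context` and its arguments are hypothetical names, and the logic is one plausible reading of the blog's prose, not the scheduler code in TensorRT-LLM. The role of `batching_wait_iters` is not modeled here.

```python
def should_run_context(
    ranks_with_context: int,
    num_ranks: int,
    waited_iters: int,
    timeout_iters: int,
) -> bool:
    """Decide whether pending context requests are scheduled this iteration.

    Assumption: context work is deferred until every rank has a context
    request, or until the wait has lasted `timeout_iters` iterations.
    """
    if ranks_with_context == 0:
        return False  # nothing to schedule
    if ranks_with_context == num_ranks:
        return True  # all ranks can process context together
    # Only some ranks have context requests: keep waiting until the timeout.
    return waited_iters >= timeout_iters
```

Under this reading, a larger `timeout_iters` gives the other ranks more time to receive context requests, at the cost of delaying the first token for the request that is waiting.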
135-136: Unify parameter name to `batching_wait_iters`.
Prose uses `batch_waiting_iters` but YAML uses `batching_wait_iters`. This mismatch can cause config errors.
-This is achieved by adding `timeout_iters=50` and `batch_waiting_iters=10` in `config.yaml` to balance ADP tokens through waiting.
+This is achieved by adding `timeout_iters=50` and `batching_wait_iters=10` in `config.yaml` to balance ADP tokens through waiting.
🧹 Nitpick comments (12)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (12)
70-75: Specify a language for fenced blocks (MD040).
Label these illustrative schedules as plain text to satisfy markdownlint and improve rendering.
-```
+```text
 iter_i: [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1: [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
79-84: Specify a language for fenced blocks (MD040).
Same as above.
-```
+```text
 iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
97-101: Add alt text for the image (MD045).
-  <img src="./../media/combined_token_distribution.png">
+  <img alt="Input/output token length distributions (combined)" src="./../media/combined_token_distribution.png">
115-116: Pluralize metric name for consistency (“avg_tokens”).
Align with earlier definition and figure captions.
-tps: 25664, avg_balance_ratio=54.11%, sol_tps=39552. The curves for avg_token and balance_ratio over iterations are shown below.
+tps: 25664, avg_balance_ratio=54.11%, sol_tps=39552. The curves for avg_tokens and balance_ratio over iterations are shown below.
119-123: Add alt text for the image (MD045).
-  <img src="./../media/si2_ei30235_plot1_token_distribution.png">
+  <img alt="Avg tokens and balance_ratio by iteration (full range)" src="./../media/si2_ei30235_plot1_token_distribution.png">
128-132: Add alt text for the image (MD045).
-  <img src="./../media/si100_ei12000_plot1_token_distribution.png">
+  <img alt="Avg tokens and balance_ratio by iteration [100, 12000]" src="./../media/si100_ei12000_plot1_token_distribution.png">
138-143: Annotate the config block as YAML (MD040) and keep keys consistent.
-```
+```yaml
 attention_dp_config:
   enable_balance: true
   batching_wait_iters: 10
   timeout_iters: 50
147-151: Add alt text for the image (MD045).
-  <img src="./../media/balanced_si100_ei12000_plot1_token_distribution.png">
+  <img alt="Avg tokens and balance_ratio with ADP Balance [100, 12000]" src="./../media/balanced_si100_ei12000_plot1_token_distribution.png">
160-163: Add alt text for the image (MD045).
-  <img src="./../media/combined_charts.png">
+  <img alt="Pareto analyses across timeout_iters and batching_wait_iters (TPS vs TTFT trade-offs)" src="./../media/combined_charts.png">
5-5: Use “and” instead of “&&” in heading.
Improves readability and aligns with house style.
-# Motivation && Background
+# Motivation and Background
49-51: Minor grammar: “can only reduce” (word order).
-One approach is to sort the requests in the request queue by `num_tokens` and then dispatch them to different ranks in a round-robin fashion, as shown in the figure below. This method balances the total number of tokens across ranks from a global perspective, and only can reduce the disparity in token numbers between different contexts when all ranks are processing context requests.
+One approach is to sort the requests in the queue by `num_tokens` and then dispatch them to ranks in a round-robin fashion, as shown below. This method balances tokens across ranks from a global perspective, and can only reduce the disparity in token numbers between different contexts when all ranks are processing context requests.
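The baseline in this paragraph (sort by `num_tokens`, then round-robin across ranks) is short enough to sketch. The version below is illustrative only: `dispatch_round_robin` is a hypothetical name, and descending sort order is an assumption, since the blog text does not state the direction.

```python
def dispatch_round_robin(num_tokens: list[int], num_ranks: int) -> list[list[int]]:
    """Sort requests by token count, then deal them out to ranks in turn."""
    # Assumption: largest requests first; the source only says "sort by num_tokens".
    ordered = sorted(num_tokens, reverse=True)
    ranks: list[list[int]] = [[] for _ in range(num_ranks)]
    for i, tokens in enumerate(ordered):
        ranks[i % num_ranks].append(tokens)
    return ranks
```

For six requests of 100 to 900 tokens on two ranks, this yields totals of 1600 and 1100 tokens, which shows why the baseline narrows but does not eliminate the gap between ranks.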
22-23: Fix full-width comma and clarity.
Replace the full-width comma with a standard comma and streamline the clause.
-With non-extremely-long context scenarios，we can assume that the overall latency is dominated by the non-MHA portion; hence the time complexity is effectively O(N).
+In non–extremely-long context scenarios, we assume the overall latency is dominated by the non-MHA portion; hence the time complexity is effectively O(N).
⛔ Files ignored due to path filters (6)
- docs/source/blogs/media/ADP_Balance.png is excluded by !**/*.png
- docs/source/blogs/media/balanced_si100_ei12000_plot1_token_distribution.png is excluded by !**/*.png
- docs/source/blogs/media/combined_charts.png is excluded by !**/*.png
- docs/source/blogs/media/combined_token_distribution.png is excluded by !**/*.png
- docs/source/blogs/media/si100_ei12000_plot1_token_distribution.png is excluded by !**/*.png
- docs/source/blogs/media/si2_ei30235_plot1_token_distribution.png is excluded by !**/*.png
📒 Files selected for processing (2)
- .gitattributes (1 hunks)
- docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitattributes
🔇 Additional comments (1)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (1)
166-167: Typo in log path: “streamterval” vs “streaminterval”
It looks like both entries use `streamterval1024`, but you probably meant `streaminterval1024`. Our repo search shows these strings only in this markdown, with no other occurrences of either spelling, so before updating, make sure that any real log folders or tags on disk (or in your storage/location naming) also use the corrected spelling to avoid breaking links.
• docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md: lines 166–167 [streamterval1024 only appears here]
• Confirm the actual on-disk/log-tag spelling and update this snippet if it’s safe.
b107017 to a0d4efc
Actionable comments posted: 3
🧹 Nitpick comments (6)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (6)
53-59: Replace HTML images with MyST figures and add alt text to satisfy MD045 and improve a11y.
Use MyST `{figure}` (auto-numbering, captions) and provide `:alt:`. Example for Lines 53–59; please apply similarly to all image blocks listed in this comment.
-<div align="center">
-<figure>
-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
-</figure>
-</div>
-<p align="center"><sub><em>Figure 1: Baseline: balance number of context request tokens across ranks through sort and round-robin </em></sub></p>
+```{figure} ./../media/tech_blog10_baseline_round_robin_strategy.png
+:alt: Baseline round-robin scheduling balances context tokens across ranks.
+:align: center
+
+Figure 1: Baseline: balance number of context request tokens across ranks through sort and round-robin.
+```
Also applies to: 99-105, 128-134, 137-143, 149-155, 169-175, 186-192
31-45: Clarify summation bounds and variable naming in SOL formulas.
Use a finite iteration count `T` and rename `elp_time` to `elapsed_time` for readability.
-`sol_tps` represents the theoretical upper-bound throughput.
-$$
-  sol\_time = \sum_{i=0}^{\infty} (iter\_time_i / balance\_ratio)
-$$
+`sol_tps` represents the theoretical upper-bound throughput. Let `T` be the total number of scheduling iterations.
+$$
+  sol\_time = \sum_{i=0}^{T-1} \frac{iter\_time_i}{balance\_ratio}
+$$
@@
-$$
-  sol\_tps = elp\_time / sol\_time * tps
-$$
+$$
+  sol\_tps = \frac{elapsed\_time}{sol\_time} \cdot tps
+$$
@@
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end time.
Also applies to: 33-38
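The per-iteration quantities that feed these SOL formulas are defined earlier in the blog as `balance_ratio = avg_tokens / max_tokens` and `iter_time_i = max over ranks of iter_time_{im}`. A minimal sketch of both, with hypothetical function names and one labeled assumption for the empty case:

```python
def balance_ratio(tokens_per_rank: list[int]) -> float:
    """avg_tokens / max_tokens across ranks for one iteration."""
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        # Assumption: an iteration with no tokens is treated as perfectly balanced.
        return 1.0
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    return avg_tokens / max_tokens


def iter_time(per_rank_times: list[float]) -> float:
    """An iteration finishes only when its slowest rank finishes."""
    return max(per_rank_times)
```

With per-rank token counts of 4000, 1000, 1000 and 2000, the ratio is 0.5: the busiest rank carries twice the average load, so the other ranks idle for part of the iteration.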
7-9: Minor wording tweaks for precision.
“dis-aggregated” → “disaggregated”; “Attention velocity” → “Attention throughput”.
-... dis-aggregated architecture, or for scenarios ...
+... disaggregated architecture, or for scenarios ...
@@
-... which slows down the Attention velocity.
+... which slows down the Attention throughput.
50-61: Avoid manual figure numbering and alignment HTML; let Sphinx/MyST handle captions and numbering.
Switch all figures to `{figure}` blocks (see earlier example). This prevents drift when figures are added/removed.
Also applies to: 144-156, 156-168, 176-184
116-121: Table terms: align column header with term used in text/code.
Use “SOL TPS” or “sol_tps” consistently; pick one and use it throughout (including formulas, captions, and text).
23-30: Clarify the “dominated by the non-MHA portion; hence O(N)” sentence.
This reads counterintuitively: if non-MHA dominates, attention balance wouldn’t drive end-to-end. Consider rephrasing to state assumptions precisely or drop the O(N) remark.
⛔ Files ignored due to path filters (7)
- docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
- .gitattributes (1 hunks)
- docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~13-~13: There might be a mistake here.
Context: ...ks. # Theoretical Analysis and Modeling Optimization Goals: - Minimize the load ...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...nalysis and Modeling Optimization Goals: - Minimize the load difference between dif...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...ecution time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...tion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...= \max_{0 \leq m < N} iter_time_{im} $$ where
(QB_NEW_EN)
[grammar] ~26-~26: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...
(QB_NEW_EN)
[grammar] ~27-~27: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N to...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...ce_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N token...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ... the theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (...
(QB_NEW_EN)
[grammar] ~32-~32: There might be a mistake here.
Context: ...e theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...
(QB_NEW_EN)
[grammar] ~33-~33: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...
(QB_NEW_EN)
[grammar] ~37-~37: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...
(QB_NEW_EN)
[grammar] ~78-~78: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...
(QB_NEW_EN)
[grammar] ~116-~116: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...
(QB_NEW_EN)
[grammar] ~117-~117: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...
(QB_NEW_EN)
[grammar] ~118-~118: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...
(QB_NEW_EN)
[grammar] ~119-~119: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...
(QB_NEW_EN)
[style] ~198-~198: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...
(PROVE_TO_BE_WORDY)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
55-55: Images should have alternate text (alt text)
(MD045, no-alt-text)
71-71: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
80-80: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
101-101: Images should have alternate text (alt text)
(MD045, no-alt-text)
130-130: Images should have alternate text (alt text)
(MD045, no-alt-text)
139-139: Images should have alternate text (alt text)
(MD045, no-alt-text)
151-151: Images should have alternate text (alt text)
(MD045, no-alt-text)
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
171-171: Images should have alternate text (alt text)
(MD045, no-alt-text)
188-188: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
160-165: Review resolved: `attention_dp_config` keys are valid

The keys `enable_balance`, `batching_wait_iters`, and `timeout_iters` are defined in the current release’s `AttentionDpConfig` schema (`tensorrt_llm/llmapi/llm_args.py`) and honored at runtime (in both the LLM API and PyTorch executor configurations). No changes to the blog snippet are necessary.
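A quick way to catch the misspelled variants that appear elsewhere in the blog text (for example `batch_waiting_iters`) is to check a parsed config against the three key names listed above. The following is only a sketch: `check_attention_dp_config` and `EXPECTED_KEYS` are hypothetical helpers written for illustration, not part of TensorRT-LLM, and the expected value types are an assumption based on the values used in the blog snippet.

```python
from __future__ import annotations

# Hypothetical helper for illustration; not part of TensorRT-LLM.
# Key names come from the blog's config snippet; the types are assumed.
EXPECTED_KEYS = {
    "enable_balance": bool,
    "batching_wait_iters": int,
    "timeout_iters": int,
}


def check_attention_dp_config(config: dict) -> list[str]:
    """Return a list of problems found in config['attention_dp_config']."""
    section = config.get("attention_dp_config")
    if not isinstance(section, dict):
        return ["missing 'attention_dp_config' section"]
    problems = []
    # Report expected keys that are absent or have an unexpected type.
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in section:
            problems.append(f"missing key: {key}")
        elif type(section[key]) is not expected_type:
            problems.append(f"{key} should be {expected_type.__name__}")
    # Report keys that are not in the expected set (likely typos).
    for key in section:
        if key not in EXPECTED_KEYS:
            problems.append(f"unexpected key: {key}")
    return problems


# The values used in the blog's full-strategy experiment.
blog_config = {
    "attention_dp_config": {
        "enable_balance": True,
        "batching_wait_iters": 10,
        "timeout_iters": 50,
    }
}
```

Calling `check_attention_dp_config(blog_config)` returns an empty list, while a config that uses `batch_waiting_iters` is reported as having one missing and one unexpected key.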
a0d4efc to
20c5c1e
Compare
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
1-204: Add missing alt text to blog images and verify Git LFS tracking

All referenced images under docs/source/blogs/media/ are present, but none include an `alt` attribute, and Git LFS tracking couldn’t be confirmed in CI. To meet accessibility standards and keep large assets in LFS, please:

• In docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md, update each `<img>` tag to include a descriptive `alt`, for example: `<img src="../media/tech_blog10_baseline_round_robin_strategy.png" alt="Baseline round-robin strategy distributing context tokens evenly across DP ranks">`

Repeat for:
– tech_blog10_baseline_round_robin_strategy.png
– tech_blog10_dataset_token_distribution.png
– tech_blog10_baseline_performance_overview.png
– tech_blog10_baseline_performance_detail.png
– tech_blog10_context_wait_performance.png
– tech_blog10_full_strategy_performance.png
– tech_blog10_tps_ttft_pareto_curve.png

• Run `git lfs ls-files` (or install/configure Git LFS) to ensure all large PNGs in docs/source/blogs/media/ are tracked. If any are missing, add them via `git lfs track "docs/source/blogs/media/*.png"` and recommit.
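The alt-text half of the request above can be checked mechanically. The sketch below scans Markdown text for HTML `<img>` tags that lack a non-empty `alt` attribute; `images_missing_alt` is a hypothetical helper written for this illustration (it is not part of markdownlint or the repository tooling), and it only handles plain `<img ...>` tags with quoted attribute values.

```python
from __future__ import annotations

import re

# Matches a whole <img ...> tag.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches alt="..." or alt='...' (but not data-alt=...), capturing the value.
ALT_ATTR = re.compile(r"""(?<![\w-])alt\s*=\s*(?:"([^"]*)"|'([^']*)')""", re.IGNORECASE)


def has_alt_text(tag: str) -> bool:
    """True if the tag carries an alt attribute with non-blank text."""
    match = ALT_ATTR.search(tag)
    if match is None:
        return False
    value = match.group(1) if match.group(1) is not None else match.group(2)
    return bool(value.strip())


def images_missing_alt(markdown_text: str) -> list[str]:
    """Return every <img> tag in the text that has no usable alt text."""
    return [tag for tag in IMG_TAG.findall(markdown_text) if not has_alt_text(tag)]


sample = (
    '<img src="./../media/a.png">\n'
    '<img src="./../media/b.png" alt="Pareto curve">\n'
    '<img alt="" src="./../media/c.png">\n'
)
missing = images_missing_alt(sample)  # the first and third tags
```

Running the same function over the blog file's contents would list the tags that still trigger MD045.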
♻️ Duplicate comments (8)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (8)
65-67: Unify parameter names: timeout_iters and batching_wait_iters.

Use the same names as elsewhere and in the YAML.

```diff
-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
```
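The waiting rule that the `timeout_iters` text above describes can be illustrated with a small decision function. This is a minimal sketch of the described behavior only, not the TensorRT-LLM scheduler: the function name `should_run_context` and its arguments `ranks_with_context`, `num_ranks`, and `waited_iters` are assumptions introduced for illustration.

```python
def should_run_context(
    ranks_with_context: int,
    num_ranks: int,
    waited_iters: int,
    timeout_iters: int,
) -> bool:
    """Decide whether a rank holding a context request schedules it now.

    Sketch of the rule described in the blog text: keep waiting while some
    ranks have no pending context request, but never wait for more than
    timeout_iters iterations.
    """
    if ranks_with_context >= num_ranks:
        # Every rank has context work, so running it now keeps ranks balanced.
        return True
    # Otherwise run only once the wait budget has been used up.
    return waited_iters >= timeout_iters


# Example with 8 ranks and timeout_iters=50 (the value used in the blog).
all_ready = should_run_context(8, 8, waited_iters=0, timeout_iters=50)
still_waiting = should_run_context(3, 8, waited_iters=10, timeout_iters=50)
timed_out = should_run_context(3, 8, waited_iters=50, timeout_iters=50)
```

With these inputs the first and third calls return `True` and the second returns `False`, matching the trade-off the blog discusses: a larger `timeout_iters` gives ranks more chances to line up their context work at the cost of extra waiting iterations.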
71-76: Add fence language for timeline block (MD040).

````diff
-```
+```text
 iter_i: [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1: [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
````

80-85: Add fence language for timeline block (MD040).

````diff
-```
+```text
 iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
````

158-165: Fix param name and add fence language to YAML block.

```diff
-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
```

````diff
-```
+```yaml
 attention_dp_config:
   enable_balance: true
   batching_wait_iters: 10
   timeout_iters: 50
````

53-57: Add alt text and ensure LFS tracking for baseline RR figure.

Missing alt breaks MD045; this image was also flagged as not tracked by LFS earlier.

Apply:

```diff
-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png" alt="Baseline round-robin scheduling strategy illustration">
```

Also add to .gitattributes (outside this file):

```diff
+docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png filter=lfs diff=lfs merge=lfs -text
```
100-103: Add alt text and ensure LFS tracking for dataset distribution figure.

Alt missing (MD045); image was previously untracked by LFS.

```diff
-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img src="./../media/tech_blog10_dataset_token_distribution.png" alt="Input and output token length distribution histogram">
```

.gitattributes addition:

```diff
+docs/source/blogs/media/tech_blog10_dataset_token_distribution.png filter=lfs diff=lfs merge=lfs -text
```

128-133: Add alt text and ensure LFS tracking for baseline overview figure.

Alt missing; this file was also called out as not in LFS.

```diff
-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img src="./../media/tech_blog10_baseline_performance_overview.png" alt="Baseline average tokens per rank and balance ratio over iterations">
```

.gitattributes addition:

```diff
+docs/source/blogs/media/tech_blog10_baseline_performance_overview.png filter=lfs diff=lfs merge=lfs -text
```

186-191: Add alt text and ensure LFS tracking for Pareto curve figure.

Alt missing; this image was previously untracked by LFS.

```diff
-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png" alt="Pareto curve of TPS/GPU vs TTFT across timeout_iters and batching_wait_iters settings">
```

.gitattributes addition:

```diff
+docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png filter=lfs diff=lfs merge=lfs -text
```
🧹 Nitpick comments (5)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (5)
110-111: Pluralize “GPU”.

```diff
-The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPU, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
+The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPUs, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
```

137-142: Add alt text for baseline detail figure (MD045).

```diff
-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img src="./../media/tech_blog10_baseline_performance_detail.png" alt="Baseline detail: tokens and balance ratio between iterations 100 and 12000">
```

149-154: Add alt text for context-wait figure (MD045).

```diff
-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img src="./../media/tech_blog10_context_wait_performance.png" alt="ADP Balance with Context Wait: tokens and balance ratio over iterations">
```

169-174: Add alt text for full-strategy figure (MD045).

```diff
-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img src="./../media/tech_blog10_full_strategy_performance.png" alt="ADP Balance full strategy: tokens and balance ratio over iterations">
```

116-121: Optional: clarify table units and formatting.

Consider adding units (TPS) in header and aligning thousands with commas; minor readability win.

```diff
-| Method | TPS | Avg Balance Ratio | Sol TPS |
+| Method | TPS (tokens/s) | Avg Balance Ratio | Sol TPS (tokens/s) |
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- `.gitattributes`
20c5c1e to
c3778f8
Compare
Actionable comments posted: 0
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)
75-80: Label fenced code blocks with languages (MD040).

Use text for timelines and yaml for config.

````diff
-```
+```text
 iter_i: [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1: [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
````

````diff
-```
+```text
 iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
````

````diff
-```
+```yaml
 attention_dp_config:
   enable_balance: true
   batching_wait_iters: 10
   timeout_iters: 50
````

Also applies to: 84-89, 164-169
69-71: Unify parameter names and terminology across text, tables, and config.

Use timeout_iters and batching_wait_iters everywhere; refer to “Context Wait” consistently.

```diff
-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
```

```diff
-The experiments are divided into three groups: a baseline, ADP Balance with `ctx_wait`, and ADP Balance with both `ctx_wait` and `batching_wait`. As the results show, enabling `ctx_wait` provides a significant TPS improvement. Further enabling `batching_wait` enhances the balancing effect even more.
+The experiments are divided into three groups: Baseline, ADP Balance with `timeout_iters` (“Context Wait”), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
```

```diff
-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted.
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted.
```

Also applies to: 117-126, 160-169
57-63: Verify all referenced images exist and are tracked by Git LFS.

Past checks flagged four PNGs not tracked by LFS; re-verify after recent edits.

````bash
#!/usr/bin/env bash
set -euo pipefail
md="docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md"
media_dir="docs/source/blogs/media"
# Gather image refs from HTML <img> and MyST {figure}
refs=$(rg -No '<img\s+src="\K[^"]+' "$md"; rg -No '^\s*```{figure}\s+\K\S+' "$md" || true)
echo "--- Existence & LFS tracking ---"
while read -r ref; do
  f="$media_dir/$(basename "$ref")"
  status="MISSING"
  [[ -f "$f" ]] && status="FOUND"
  lfs="NOT_TRACKED"
  if git lfs ls-files --name-only | rg -q "^$(printf '%q' "$f" | sed 's/\\//\//g')$"; then
    lfs="TRACKED"
  fi
  printf "%-55s %s, LFS: %s\n" "$f" "$status" "$lfs"
done <<< "$refs"
echo
echo "Patterns in .gitattributes involving png:"
rg -n 'filter=lfs' .gitattributes | rg -n '\.png' -n
````

Also applies to: 99-109, 128-138, 148-159, 160-169, 171-179, 182-196
🧹 Nitpick comments (4)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (4)
57-61: Add alt text to all images to satisfy MD045 and improve accessibility.

Provide descriptive alt text via the img alt attribute.

```diff
-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img alt="Baseline round-robin strategy balancing context tokens across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
```

```diff
-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img alt="Token distribution of input and output lengths across 16k requests" src="./../media/tech_blog10_dataset_token_distribution.png">
```

```diff
-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img alt="Baseline performance overview: average tokens per rank and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
```

```diff
-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img alt="Baseline performance detail in iterations [0, 12000]: tokens and balance ratio" src="./../media/tech_blog10_baseline_performance_detail.png">
```

```diff
-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img alt="Context Wait (timeout_iters) performance: tokens and balance ratio over iterations" src="./../media/tech_blog10_context_wait_performance.png">
```

```diff
-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img alt="Full strategy (timeout_iters + batching_wait_iters) performance: tokens and balance ratio" src="./../media/tech_blog10_full_strategy_performance.png">
```

```diff
-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img alt="Pareto curves: TPS/GPU vs TTFT for different timeout_iters and batching_wait_iters" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
```

Also applies to: 103-107, 132-136, 141-145, 153-157, 173-177, 190-194

180-180: Polish phrasing and capitalization.

```diff
-Negative impact: The number of iterations increases due to the waiting mechanism, which had bad impact on tps.
+Negative impact: The number of iterations increases due to the waiting mechanism, which has a negative impact on TPS.
```

114-115: Minor wording tweak for hardware description.

```diff
-The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPU, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
+The experiments used the DeepSeekV3 model on an NVIDIA GB200 system (8 GPUs), with DP size 8 for Attention and EP size 8 for MoE.
```

41-48: Consider renaming elp_time to elapsed_time for clarity.

Also update the formula and definitions to match.

```diff
-  sol\_tps = elp\_time / sol\_time * tps
+  sol\_tps = elapsed\_time / sol\_time * tps
```

```diff
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end elapsed time.
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- `.gitattributes`
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~13-~13: There might be a mistake here.
Context: ...s. ## Theoretical Analysis and Modeling Optimization Goals: - Minimize the loa...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...sis and Modeling Optimization Goals: - Minimize the load difference between dif...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...ion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...
(QB_NEW_EN)
[grammar] ~29-~29: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N t...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ... theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...
(QB_NEW_EN)
[grammar] ~37-~37: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...
(QB_NEW_EN)
[grammar] ~40-~40: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...
(QB_NEW_EN)
[grammar] ~41-~41: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...
(QB_NEW_EN)
[grammar] ~120-~120: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...
(QB_NEW_EN)
[grammar] ~121-~121: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...
(QB_NEW_EN)
[grammar] ~122-~122: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...
(QB_NEW_EN)
[grammar] ~123-~123: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...
(QB_NEW_EN)
[style] ~202-~202: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...
(PROVE_TO_BE_WORDY)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
59-59: Images should have alternate text (alt text)
(MD045, no-alt-text)
75-75: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
84-84: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
105-105: Images should have alternate text (alt text)
(MD045, no-alt-text)
134-134: Images should have alternate text (alt text)
(MD045, no-alt-text)
143-143: Images should have alternate text (alt text)
(MD045, no-alt-text)
155-155: Images should have alternate text (alt text)
(MD045, no-alt-text)
164-164: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
175-175: Images should have alternate text (alt text)
(MD045, no-alt-text)
192-192: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
c3778f8 to
deefc99
Compare
Actionable comments posted: 2
♻️ Duplicate comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)
118-126: Unify parameter names and terminology across text and config.

Replace ctx_wait/batching_wait/batch_waiting_iters with timeout_iters/batching_wait_iters and refer to “Context Wait.” Prevent copy-paste config errors.

```diff
-The experiments are divided into three groups: a baseline, ADP Balance with `ctx_wait`, and ADP Balance with both `ctx_wait` and `batching_wait`. As the results show, enabling `ctx_wait` provides a significant TPS improvement. Further enabling `batching_wait` enhances the balancing effect even more.
+The experiments are divided into three groups: Baseline, ADP Balance with `timeout_iters` (“Context Wait”), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
```

```diff
-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted.
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted.
```

```diff
-The curves demonstrate the impact of different `timeout_iters` (TO) and `batch_waiting_iters` (BI) configurations on system performance.
+The curves demonstrate the impact of different `timeout_iters` (TO) and `batching_wait_iters` (BI) configurations on system performance.
```

```diff
-In low-load scenarios, `batch_waiting_iters` provides no significant TPS benefit while still increasing TTFT.
+In low-load scenarios, `batching_wait_iters` provides no significant TPS benefit while still increasing TTFT.
```

Also applies to: 160-168, 182-189
57-63: Track remaining images in Git LFS to avoid repo bloat.

Four referenced PNGs appear not tracked by LFS yet. Add to .gitattributes and re-add via LFS.

```diff
+# .gitattributes (append)
+docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png filter=lfs diff=lfs merge=lfs -text
+docs/source/blogs/media/tech_blog10_dataset_token_distribution.png filter=lfs diff=lfs merge=lfs -text
+docs/source/blogs/media/tech_blog10_baseline_performance_overview.png filter=lfs diff=lfs merge=lfs -text
+docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png filter=lfs diff=lfs merge=lfs -text
```

Then:

```bash
#!/bin/bash
git lfs track "docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png" \
  "docs/source/blogs/media/tech_blog10_dataset_token_distribution.png" \
  "docs/source/blogs/media/tech_blog10_baseline_performance_overview.png" \
  "docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png"
git add .gitattributes \
  docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png \
  docs/source/blogs/media/tech_blog10_dataset_token_distribution.png \
  docs/source/blogs/media/tech_blog10_baseline_performance_overview.png \
  docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png
```

Quick verification script:

```bash
#!/bin/bash
md="docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md"
rg -nPo '<img\s+src="\K[^"]+\.png' "$md" \
  | sed 's#^\./\.\./#docs/source/blogs/#' \
  | while read -r p; do
      printf "%-80s %s\n" "$p" "$(git lfs ls-files | awk '{print $NF}' | rg -q "^$p$" && echo LFS || echo NOT_LFS)"
    done
```

Also applies to: 103-109, 132-138, 190-196
🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)
180-181: Polish phrasing for clarity.

Minor grammar and casing fix.

```diff
-Negative impact: The number of iterations increases due to the waiting mechanism, which had bad impact on tps.
+Negative impact: The waiting mechanism increases the number of iterations, which negatively impacts TPS.
```

44-49: Naming nit: elp_time → elapsed_time for readability.

Improves self-documentation of the formula.

```diff
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end elapsed time.
```

```diff
-  sol\\_tps = elp\\_time / sol\\_time * tps
+  sol\\_tps = elapsed\\_time / sol\\_time * tps
```
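For readers checking the renamed formula, the two relations quoted from the blog (`balance_ratio = avg_tokens / max_tokens` and `sol_tps = elapsed_time / sol_time * tps`) translate directly into code. This is a hedged sketch: the function names are hypothetical, `max_tokens` is assumed to be the largest per-rank token count in one iteration, `sol_time` is taken as an input rather than derived, and the numbers are made up for illustration rather than taken from the blog's measurements.

```python
from __future__ import annotations


def balance_ratio(tokens_per_rank: list[int]) -> float:
    """avg_tokens / max_tokens for one iteration (assumes a non-zero max)."""
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    max_tokens = max(tokens_per_rank)
    return avg_tokens / max_tokens


def sol_tps(elapsed_time: float, sol_time: float, tps: float) -> float:
    """Speed-of-light throughput: scale measured TPS by elapsed_time / sol_time."""
    return elapsed_time / sol_time * tps


# Illustrative numbers only (not from the blog's experiments).
ratio = balance_ratio([100, 50, 50, 0])  # avg 50 / max 100
bound = sol_tps(elapsed_time=10.0, sol_time=8.0, tps=1000.0)
```

Here `ratio` is 0.5 because one rank carries twice the average load, and `bound` is 1250.0 because the idealized run would finish in 80% of the measured time.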
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- `.gitattributes`
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~13-~13: There might be a mistake here.
Context: ...s. ## Theoretical Analysis and Modeling Optimization Goals: - Minimize the loa...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...sis and Modeling Optimization Goals: - Minimize the load difference between dif...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...ion time. The formula is as follows: $$ iter\time_i = \max{0 \leq m < N} iter...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...i = \max_{0 \leq m < N} iter\time{im} $$ where $iter\time{im}$ represents ...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance\_ratio = avg\_tokens / max\_t...
(QB_NEW_EN)
[grammar] ~29-~29: There might be a mistake here.
Context: ...ce\_ratio = avg\_tokens / max\_tokens $$ where $avg\tokens = {\sum{i=1}^N ...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ... theoretical upper-bound throughput. $$ sol\time = \sum{i=0}^{\infty} (iter\...
(QB_NEW_EN)
[grammar] ~37-~37: There might be a mistake here.
Context: ...infty} (iter\_time_i / balance\_ratio) $$ $$ sol\_tps = elp\_time / sol...
(QB_NEW_EN)
[grammar] ~40-~40: There might be a mistake here.
Context: ...(iter\_time_i / balance\_ratio) $$ $$ sol\_tps = elp\_time / sol\_time * tp...
(QB_NEW_EN)
[grammar] ~41-~41: There might be a mistake here.
Context: ...ol\_tps = elp\_time / sol\_time * tps $$ Where: - iter_time_i denotes the e...
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...
(QB_NEW_EN)
[grammar] ~120-~120: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...
(QB_NEW_EN)
[grammar] ~121-~121: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...
(QB_NEW_EN)
[grammar] ~122-~122: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...
(QB_NEW_EN)
[grammar] ~123-~123: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...
(QB_NEW_EN)
[style] ~202-~202: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...
(PROVE_TO_BE_WORDY)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
59-59: Images should have alternate text (alt text)
(MD045, no-alt-text)
84-84: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
105-105: Images should have alternate text (alt text)
(MD045, no-alt-text)
134-134: Images should have alternate text (alt text)
(MD045, no-alt-text)
143-143: Images should have alternate text (alt text)
(MD045, no-alt-text)
155-155: Images should have alternate text (alt text)
(MD045, no-alt-text)
175-175: Images should have alternate text (alt text)
(MD045, no-alt-text)
192-192: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
deefc99 to
567a3f7
Compare
Actionable comments posted: 4
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)
59-60: Add alt text to all images (MD045). Provide concise, descriptive alt text for accessibility and lint compliance.
- <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+ <img alt="Baseline: balance context tokens across ranks via sort + round-robin" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
...
- <img src="./../media/tech_blog10_dataset_token_distribution.png">
+ <img alt="Input and output token length distribution" src="./../media/tech_blog10_dataset_token_distribution.png">
...
- <img src="./../media/tech_blog10_baseline_performance_overview.png">
+ <img alt="Baseline average tokens and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
...
- <img src="./../media/tech_blog10_baseline_performance_detail.png">
+ <img alt="Baseline average tokens and balance ratio for iterations [100, 12000]" src="./../media/tech_blog10_baseline_performance_detail.png">
...
- <img src="./../media/tech_blog10_context_wait_performance.png">
+ <img alt="Context Wait: average tokens and balance ratio over iterations" src="./../media/tech_blog10_context_wait_performance.png">
...
- <img src="./../media/tech_blog10_full_strategy_performance.png">
+ <img alt="Full Strategy: average tokens and balance ratio over iterations" src="./../media/tech_blog10_full_strategy_performance.png">
...
- <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+ <img alt="Pareto curves of TPS/GPU vs TTFT" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
Also applies to: 105-106, 134-135, 143-144, 155-156, 175-176, 192-193
84-89: Label the second timeline code fence (MD040). Add a language hint for the illustrative block.
-```
+```text
 iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
69-71: Unify parameter name: time_out_iters → timeout_iters. Use `timeout_iters` consistently to match config and prior references.
-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
 - `batching_wait_iters`: The maximum number of batch iterations to wait to balance the number of context batches across ranks. After the initial wait, all ranks will have context batches. However, the ranks that have waited the longest might accumulate multiple context batches, while the ranks that were just freed up might only have one. This can still lead to an imbalance in the number of context batches. By introducing an additional waiting period, newly scheduled ranks can accumulate more context batches, leading to a more balanced load distribution.
🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)
114-115: Minor grammar: pluralize GPU. Change “8 GPU” → “8 GPUs”; tighten phrasing.
-The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPU, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
+The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPUs, with DP size 8 for the Attention module and EP size 8 for the MoE module.
41-49: Optional: rename elp_time → elapsed_time for clarity. Improves readability of formulas and definitions.
- sol\\_tps = elp\\_time / sol\\_time * tps
+ sol\\_tps = elapsed\\_time / sol\\_time * tps
@@
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end elapsed time.
📜 Review details
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
59-59: Images should have alternate text (alt text)
(MD045, no-alt-text)
84-84: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
105-105: Images should have alternate text (alt text)
(MD045, no-alt-text)
118-118: Spaces inside code span elements
(MD038, no-space-in-code)
134-134: Images should have alternate text (alt text)
(MD045, no-alt-text)
143-143: Images should have alternate text (alt text)
(MD045, no-alt-text)
155-155: Images should have alternate text (alt text)
(MD045, no-alt-text)
175-175: Images should have alternate text (alt text)
(MD045, no-alt-text)
192-192: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
ce51c72 to
db8d4a9
Compare
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
1-210: Fix documentation naming, alt text, and code-fence labels
- Consolidate parameter naming: replace all occurrences of `time_out_iters` with `timeout_iters` in the prose to match the config snippet.
- Add meaningful `alt` attributes to every `<img>` tag (currently missing on seven images).
- Label all unlabeled fenced code blocks with the correct language (e.g., `yaml`, `text`).
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)
71-73: Unify parameter name: time_out_iters → timeout_iters (consistency with later sections). Readers will copy/paste; mismatch causes confusion.
Apply:
-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
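To make the waiting rule in the quoted paragraph concrete, here is a minimal Python sketch of the `timeout_iters` behaviour as that paragraph describes it. The function name `should_schedule_context` and its arguments are hypothetical, chosen only for illustration; this is not the TensorRT-LLM scheduler code.

```python
from typing import Sequence


def should_schedule_context(
    ranks_with_context: Sequence[bool],
    waited_iters: int,
    timeout_iters: int,
) -> bool:
    """Decide whether context requests should be scheduled this iteration.

    Hypothetical helper: ranks_with_context[r] is True when rank r has a
    pending context request. Ranks keep waiting until every rank has one,
    or until the wait has lasted timeout_iters iterations.
    """
    all_ranks_ready = all(ranks_with_context)
    timed_out = waited_iters >= timeout_iters
    return all_ranks_ready or timed_out


# One rank still has no context request and the timeout has not expired: keep waiting.
keep_waiting = not should_schedule_context([True, False, True], waited_iters=3, timeout_iters=50)

# Every rank has a context request: schedule immediately.
schedule_now = should_schedule_context([True, True, True], waited_iters=0, timeout_iters=50)

# The timeout has expired: schedule even though one rank is still empty.
schedule_on_timeout = should_schedule_context([True, False, True], waited_iters=50, timeout_iters=50)
```

The timeout bounds the extra latency a waiting rank can add, which is why the sketch treats it as an unconditional release.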
120-120: Fix inline code backticks and wording (MD038).
-The experiments are divided into three groups: a baseline, ADP Balance with ``timeout_iters` (“Context Wait”), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
+The experiments are divided into three groups: Baseline, ADP Balance with `timeout_iters` ("Context Wait"), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
164-171: Align naming and fence label; fix config section. Use `batching_wait_iters` and ```yaml (no space).
-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
-``` yaml
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
+```yaml
 attention_dp_config:
   enable_balance: true
-  batching_wait_iters: 10
+  batching_wait_iters: 10
   timeout_iters: 50
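As a quick check on the numbers quoted in this suggestion, the sketch below recomputes the speedup from the reported TPS values and spells out the `balance_ratio` definition (average tokens per rank divided by the maximum tokens on any rank). The helper name and the sample token counts are illustrative assumptions; the TPS figures are the ones quoted in this review.

```python
from typing import Sequence


def balance_ratio(tokens_per_rank: Sequence[int]) -> float:
    """Return avg_tokens / max_tokens for one iteration (1.0 = perfectly balanced)."""
    if not tokens_per_rank:
        raise ValueError("tokens_per_rank must not be empty")
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        # No rank has work, so there is nothing to be imbalanced about.
        return 1.0
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    return avg_tokens / max_tokens


# Illustrative token counts for four ranks (assumed, not measured data).
example_ratio = balance_ratio([100, 50, 50, 200])

# TPS values quoted in the review: baseline, full strategy, and full-strategy sol_tps.
baseline_tps = 25664
full_strategy_tps = 34140
full_strategy_sol_tps = 37912

speedup = full_strategy_tps / baseline_tps                 # about 1.33
sol_fraction = full_strategy_tps / full_strategy_sol_tps   # about 0.90

print(f"balance_ratio={example_ratio:.2f} speedup={speedup:.2f}x sol_fraction={sol_fraction:.2%}")
```

The ratio of measured TPS to `sol_tps` gives one number for how much headroom the balance model still predicts.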
🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)
19-51: Optional: prefer MyST math blocks and clearer variable name. Consider MyST’s math directives for Sphinx stability and rename `elp_time` → `e2e_time` (or `elapsed_time`) in text and formulas for clarity.
-$$
- sol\_tps = elp\_time / sol\_time * tps
-$$
+$$
+ sol\_tps = e2e\_time / sol\_time \times tps
+$$
@@
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `e2e_time` denotes the empirically measured end-to-end elapsed time.
If desired, convert all display math blocks to:
```{math}
:label: sol_tps
\mathrm{sol\_tps} = \frac{\mathrm{e2e\_time}}{\mathrm{sol\_time}} \times \mathrm{tps}
```
59-65: Nit: captions are good; ensure figures reference numbers match and are sequential. Skim after edits to keep Figure N labels and references consistent.
Also applies to: 103-111, 132-140, 141-149, 155-161, 173-181, 192-198
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📥 Commits
Reviewing files that changed from the base of the PR and between 567a3f708b7916132c009bb74e217a6808e7c179 and db8d4a95c40315519f5ebff449c174a77fba98c7.
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~13-~13: There might be a mistake here.
Context: ...s. ## Theoretical Analysis and Modeling **Optimization Goals**: - Minimize the loa...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...sis and Modeling **Optimization Goals**: - Minimize the load difference between dif...
(QB_NEW_EN)
[grammar] ~21-~21: There might be a mistake here.
Context: ...ion time. The formula is as follows: $$ iter\_time_i = \max_{0 \leq m < N} iter\...
(QB_NEW_EN)
[grammar] ~22-~22: There might be a mistake here.
Context: ..._i = \max_{0 \leq m < N} iter\_time_{im} $$ where $iter\_time_{im}$ represents t...
(QB_NEW_EN)
[grammar] ~30-~30: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance\_ratio = avg\_tokens / max\_toke...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ...lance\_ratio = avg\_tokens / max\_tokens $$ where $avg\_tokens = {\sum_{i=1}^N t...
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ... theoretical upper-bound throughput. $$ sol\_time = \sum_{i=0}^{\infty} (iter\_t...
(QB_NEW_EN)
[grammar] ~39-~39: There might be a mistake here.
Context: ...{\infty} (iter\_time_i / balance\_ratio) $$ $$ sol\_tps = elp\_time / sol\_t...
(QB_NEW_EN)
[grammar] ~42-~42: There might be a mistake here.
Context: ...} (iter\_time_i / balance\_ratio) $$ $$ sol\_tps = elp\_time / sol\_time * tps $...
(QB_NEW_EN)
[grammar] ~43-~43: There might be a mistake here.
Context: ... sol\_tps = elp\_time / sol\_time * tps $$ Where: - `iter_time_i` denotes the e...
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...
(QB_NEW_EN)
[grammar] ~122-~122: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...
(QB_NEW_EN)
[grammar] ~123-~123: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...
(QB_NEW_EN)
[grammar] ~124-~124: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...
(QB_NEW_EN)
[grammar] ~125-~125: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...
(QB_NEW_EN)
[style] ~204-~204: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...
(PROVE_TO_BE_WORDY)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
61-61: Images should have alternate text (alt text)
(MD045, no-alt-text)
86-86: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
107-107: Images should have alternate text (alt text)
(MD045, no-alt-text)
120-120: Spaces inside code span elements
(MD038, no-space-in-code)
136-136: Images should have alternate text (alt text)
(MD045, no-alt-text)
145-145: Images should have alternate text (alt text)
(MD045, no-alt-text)
157-157: Images should have alternate text (alt text)
(MD045, no-alt-text)
177-177: Images should have alternate text (alt text)
(MD045, no-alt-text)
194-194: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
1-12: Clear intro and motivation. Good framing and scope for the blog; structure reads well.
db8d4a9 to
edffceb
Compare
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
1-369: Run markdownlint and fix reported doc errors (MD040/MD045/MD051 + others).
Lint (markdownlint-cli2 v0.17.2) shows multiple failures in docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (lines 1–369) and across the docs tree. Actionable fixes:
- add alt text for images (MD045)
- remove/replace inline HTML with Markdown (MD033)
- specify fenced-code languages (MD040)
- correct link fragment targets (MD051)
- ensure blank lines around headings/lists/code fences (MD022/MD032/MD031)
- wrap/shorten long lines (MD013)
Re-run: npx --yes markdownlint-cli2 "/docs//*.md" and resolve all errors before merging.
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)
150-155: Fix generation index in the second timeline (use M, not N). Indexes in the ADP timeline should mirror the Baseline block (M for generation). This also keeps the narrative consistent with the figure captions and text.
Apply:
-iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
-iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
+iter_i: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
+iter_i+1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
341-344: Rename to batching_wait_iters for consistency with config and prior usage. The bullet uses `batch_waiting_iters`; everywhere else and in YAML it’s `batching_wait_iters`.
- - **Low-load scenarios**: `batch_waiting_iters` provides minimal benefit while adding latency overhead
+ - **Low-load scenarios**: `batching_wait_iters` provides minimal benefit while adding latency overhead
100-100: Add alt text to all images (MD045, accessibility). Every `<img>` lacks alt; add concise descriptions.
- <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+ <img alt="Baseline: round-robin distribution across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
- <img src="./../media/tech_blog10_dataset_token_distribution.png">
+ <img alt="Input and output token length distributions" src="./../media/tech_blog10_dataset_token_distribution.png">
- <img src="./../media/tech_blog10_baseline_performance_overview.png">
+ <img alt="Baseline: avg tokens and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
- <img src="./../media/tech_blog10_baseline_performance_detail.png">
+ <img alt="Baseline detail: balance ratio for iterations 100–12,000" src="./../media/tech_blog10_baseline_performance_detail.png">
- <img src="./../media/tech_blog10_context_wait_performance.png">
+ <img alt="Context Wait: improved balance stability 100–12,000" src="./../media/tech_blog10_context_wait_performance.png">
- <img src="./../media/tech_blog10_full_strategy_performance.png">
+ <img alt="Full Strategy: balance stability 100–12,000" src="./../media/tech_blog10_full_strategy_performance.png">
- <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+ <img alt="Pareto curves: TPS/GPU vs TTFT across TO/BW settings" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
Also applies to: 183-183, 229-229, 243-243, 264-264, 304-304, 328-328
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...
(QB_NEW_EN)
[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...
(QB_NEW_EN)
[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...
(QB_NEW_EN)
[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Performance Metrics](#performance-metric...
(QB_NEW_EN)
[grammar] ~11-~11: There might be a mistake here.
Context: ...cal-modeling) - Performance Metrics - [Scheduling Strategies for Load Balancing...
(QB_NEW_EN)
[grammar] ~12-~12: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...
(QB_NEW_EN)
[grammar] ~15-~15: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...
(QB_NEW_EN)
[grammar] ~21-~21: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...
(QB_NEW_EN)
[grammar] ~22-~22: There might be a mistake here.
Context: ...e-summary) - Baseline Performance - [ADP Balance with Context Wait Implementa...
(QB_NEW_EN)
[grammar] ~23-~23: There might be a mistake here.
Context: ...Balance with Context Wait Implementation](#adp-balance-with-context-wait-implementation) - [ADP Balance with Full Strategy Implement...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ...alance with Full Strategy Implementation](#adp-balance-with-full-strategy-implementation) - [Pareto Analysis: Throughput-Latency Trad...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...hroughput-Latency Trade-off Optimization](#pareto-analysis-throughput-latency-trade-off-optimization) - Conclusion - [Acknowledg...
(QB_NEW_EN)
[grammar] ~26-~26: There might be a mistake here.
Context: ...-trade-off-optimization) - Conclusion - Acknowledgement ## M...
(QB_NEW_EN)
[grammar] ~33-~33: There might be a mistake here.
Context: ...ociated with disaggregated architectures - Specific workload patterns: Scenarios ...
(QB_NEW_EN)
[grammar] ~34-~34: There might be a mistake here.
Context: ...) and long output sequence lengths (OSL) - Offline inference: Batch processing en...
(QB_NEW_EN)
[grammar] ~57-~57: There might be a mistake here.
Context: ... two key metrics: #### 1. Balance Ratio The balance_ratio measures the load di...
(QB_NEW_EN)
[grammar] ~64-~64: There might be a mistake here.
Context: ...tokens}}{\text{max_tokens}} ``` where: - $\text{avg_tokens} = \frac{\sum_{i=1}^{...
(QB_NEW_EN)
[grammar] ~65-~65: There might be a mistake here.
Context: ...tokens}_i}{N}$ (average tokens per rank) - $\text{max_tokens} = \max(\text{tokens}...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ...tokens}_i)$ (maximum tokens on any rank) -
(QB_NEW_EN)
[grammar] ~71-~71: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The sol_tps represents the theoretical...
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...}} \times \text{actual_tps} ``` where: - iter_time_i: Measured execution time of iteration $...
(QB_NEW_EN)
[grammar] ~83-~83: There might be a mistake here.
Context: ...Measured execution time of iteration elapsed_time: Total empirically measured end-to-end ...
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...cally measured end-to-end execution time - actual_tps: Observed throughput in tokens per seco...
(QB_NEW_EN)
[grammar] ~85-~85: There might be a mistake here.
Context: ...Observed throughput in tokens per second - sol_tps: Theoretical maximum throughput under p...
(QB_NEW_EN)
[grammar] ~113-~113: There might be a mistake here.
Context: ...ontext Synchronization (timeout_iters)** - Purpose: Ensures temporal alignment of...
(QB_NEW_EN)
[grammar] ~118-~118: There might be a mistake here.
Context: ...ch Equilibration (batching_wait_iters)** - Purpose: Balances the number of accumu...
(QB_NEW_EN)
[grammar] ~128-~128: There might be a mistake here.
Context: ...me(ctx)>> Generation processing time:time(gen)` - N new requests arrive sequentially over ...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...ver N iterations Baseline Behavior: In the traditional approach, contexts ar...
(QB_NEW_EN)
[grammar] ~143-~143: There might be a mistake here.
Context: ...(ctx)(dominated by context processing) - **Total execution time**:time(ctx) × N` ...
(QB_NEW_EN)
[grammar] ~144-~144: There might be a mistake here.
Context: ...processing) - Total execution time: time(ctx) × N - Balance ratio: `(ctx_len + (M-1) + M ×...
(QB_NEW_EN)
[grammar] ~147-~147: There might be a mistake here.
Context: ...poor balance) ADP Balance Strategy: Our method synchronizes context processi...
(QB_NEW_EN)
[grammar] ~157-~157: There might be a mistake here.
Context: ...rations, time(ctx) for final iteration - Total execution time: `time(gen) × (N-...
(QB_NEW_EN)
[grammar] ~158-~158: There might be a mistake here.
Context: ...l iteration - Total execution time: time(gen) × (N-1) + time(ctx) - Balance ratio: 1.0 (perfect balance) -...
(QB_NEW_EN)
[grammar] ~159-~159: There might be a mistake here.
Context: ...Balance ratio: 1.0 (perfect balance) - Time savings: `(time(ctx) - time(gen))...
(QB_NEW_EN)
[grammar] ~164-~164: There might be a mistake here.
Context: ...rovement** due to optimal load balancing - ✅ Maximized GPU utilization across a...
(QB_NEW_EN)
[grammar] ~165-~165: There might be a mistake here.
Context: ...mized GPU utilization** across all ranks -
(QB_NEW_EN)
[grammar] ~166-~166: There might be a mistake here.
Context: ...TFT** due to strategic waiting mechanism - 📋 Best suited for throughput-orient...
(QB_NEW_EN)
[grammar] ~167-~167: There might be a mistake here.
Context: ...ted scenarios where TTFT is not critical ## Experiments ### Setting #### Dataset C...
(QB_NEW_EN)
[grammar] ~173-~173: There might be a mistake here.
Context: ... ### Setting #### Dataset Configuration We evaluate our approach using a compreh...
(QB_NEW_EN)
[grammar] ~176-~176: There might be a mistake here.
Context: ...Request volume*: 16,000 total requests - Average input length: 803 tokens - **A...
(QB_NEW_EN)
[grammar] ~177-~177: There might be a mistake here.
Context: ...s - Average input length: 803 tokens - Average output length: 3,653 tokens - ...
(QB_NEW_EN)
[grammar] ~178-~178: There might be a mistake here.
Context: ... Average output length: 3,653 tokens - Token distribution: Figure 2 illustrat...
(QB_NEW_EN)
[grammar] ~190-~190: There might be a mistake here.
Context: .... #### Hardware and Model Configuration Infrastructure: - Platform: NVIDIA...
(QB_NEW_EN)
[grammar] ~191-~191: There might be a mistake here.
Context: ... Model Configuration Infrastructure: - Platform: NVIDIA Blackwell GB200 syste...
(QB_NEW_EN)
[grammar] ~192-~192: There might be a mistake here.
Context: ...latform**: NVIDIA Blackwell GB200 system - GPU Count: 8 × GB200 GPUs - Model:...
(QB_NEW_EN)
[grammar] ~193-~193: There might be a mistake here.
Context: ...0 system - GPU Count: 8 × GB200 GPUs - Model: DeepSeek V3 - **Parallelization...
(QB_NEW_EN)
[grammar] ~194-~194: There might be a mistake here.
Context: ... 8 × GB200 GPUs - Model: DeepSeek V3 - Parallelization Strategy: - Attentio...
(QB_NEW_EN)
[grammar] ~195-~195: There might be a mistake here.
Context: ...pSeek V3 - Parallelization Strategy: - Attention module: Data Parallel (DP) siz...
(QB_NEW_EN)
[grammar] ~196-~196: There might be a mistake here.
Context: ...tion module: Data Parallel (DP) size = 8 - MoE module: Expert Parallel (EP) size = ...
(QB_NEW_EN)
[grammar] ~204-~204: There might be a mistake here.
Context: ... 1. Baseline: Round-robin scheduling 2. ADP Balance (Context Wait): Implementi...
(QB_NEW_EN)
[grammar] ~205-~205: There might be a mistake here.
Context: ...lementing timeout_iters parameter only 3. ADP Balance (Full Strategy): Complete ...
(QB_NEW_EN)
[grammar] ~210-~210: There might be a mistake here.
Context: ... Avg Balance Ratio | SOL TPS | Speedup | |---------------|------------|----------...
(QB_NEW_EN)
[grammar] ~211-~211: There might be a mistake here.
Context: ...---------|-------------------|---------| | Baseline | 25,664 | 54.11% | 39,552 | ...
(QB_NEW_EN)
[grammar] ~212-~212: There might be a mistake here.
Context: ...ine | 25,664 | 54.11% | 39,552 | 1.00× | | ADP Balance (Context Wait) | 33,499 | ...
(QB_NEW_EN)
[grammar] ~213-~213: There might be a mistake here.
Context: ...it) | 33,499 | 84.33% | 38,312 | 1.31× | | ADP Balance (Full Strategy) | 34,140 |...
(QB_NEW_EN)
[grammar] ~216-~216: There might be a mistake here.
Context: ... 37,912 | 1.33× | Key Observations: - Context Wait alone delivers a substantia...
(QB_NEW_EN)
[grammar] ~234-~234: There might be a mistake here.
Context: ...s
(QB_NEW_EN)
[grammar] ~237-~237: There might be a mistake here.
Context: ...eration 12,000, all requests transition to generation phase, naturally reducing im...
(QB_NEW_EN)
[grammar] ~248-~248: There might be a mistake here.
Context: ...
(QB_NEW_EN)
[grammar] ~257-~257: There might be a mistake here.
Context: ...nization. Performance Achievements: - Throughput: 33,499 TPS (1.31× speedup)...
(QB_NEW_EN)
[grammar] ~258-~258: There might be a mistake here.
Context: ...Throughput**: 33,499 TPS (1.31× speedup) - Balance improvement: 84.33% average (v...
(QB_NEW_EN)
[grammar] ~259-~259: There might be a mistake here.
Context: ...**: 84.33% average (vs. 54.11% baseline) - Efficiency: Actual TPS significantly c...
(QB_NEW_EN)
[grammar] ~269-~269: There might be a mistake here.
Context: ...em>
(QB_NEW_EN)
[grammar] ~284-~284: There might be a mistake here.
Context: ...` Performance Optimization Results: - Peak throughput: 34,140 TPS (1.33× spe...
(QB_NEW_EN)
[grammar] ~285-~285: There might be a mistake here.
Context: ...throughput**: 34,140 TPS (1.33× speedup) - Optimal balance: 87.70% average balanc...
(QB_NEW_EN)
[grammar] ~286-~286: There might be a mistake here.
Context: ... balance**: 87.70% average balance ratio - Near-theoretical efficiency: Actual TP...
(QB_NEW_EN)
[grammar] ~287-~287: There might be a mistake here.
Context: ...TPS (34,140) approaches SOL TPS (37,912) - System stability: Dramatically reduced...
(QB_NEW_EN)
[grammar] ~290-~290: There might be a mistake here.
Context: ...terations Production Configuration: Users can enable the full ADP Balance st...
(QB_NEW_EN)
[grammar] ~309-~309: There might be a mistake here.
Context: ... Key Improvements Over Context Wait: - Enhanced stability: Balance ratio main...
(QB_NEW_EN)
[grammar] ~314-~314: There might be a mistake here.
Context: ...erations Implementation Trade-offs: - ✅ Maximum throughput improvement: 33...
(QB_NEW_EN)
[grammar] ~315-~315: There might be a mistake here.
Context: ...ut improvement**: 33% gain over baseline - ✅ Near-optimal load balancing: 87.70...
(QB_NEW_EN)
[grammar] ~316-~316: There might be a mistake here.
Context: ...alancing**: 87.70% average balance ratio -
(QB_NEW_EN)
[grammar] ~317-~317: There might be a mistake here.
Context: ...echanisms increase total iteration count -
(QB_NEW_EN)
[grammar] ~341-~341: There might be a mistake here.
Context: ...itiation 4. Configuration Guidance: - Low-load scenarios: `batch_waiting_ite...
(QB_NEW_EN)
[grammar] ~356-~356: There might be a mistake here.
Context: ...mbalances. Experimental Validation: Comprehensive evaluation on the DeepSeek...
(QB_NEW_EN)
[grammar] ~357-~357: There might be a mistake here.
Context: ...tes compelling performance improvements: - 33% throughput increase: From 25,664 t...
(QB_NEW_EN)
[grammar] ~358-~358: There might be a mistake here.
Context: ...ut increase**: From 25,664 to 34,140 TPS - 87% load balance achievement: Dramatic...
(QB_NEW_EN)
[grammar] ~359-~359: There might be a mistake here.
Context: ...: Dramatic improvement from 54% baseline - Near-theoretical efficiency: Actual pe...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
11-11: Link fragments should be valid
(MD051, link-fragments)
100-100: Images should have alternate text (alt text)
(MD045, no-alt-text)
183-183: Images should have alternate text (alt text)
(MD045, no-alt-text)
229-229: Images should have alternate text (alt text)
(MD045, no-alt-text)
243-243: Images should have alternate text (alt text)
(MD045, no-alt-text)
264-264: Images should have alternate text (alt text)
(MD045, no-alt-text)
304-304: Images should have alternate text (alt text)
(MD045, no-alt-text)
328-328: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)
293-298: Config snippet looks correct and consistent. Keys and naming (`enable_balance`, `batching_wait_iters`, `timeout_iters`) align with text and guidance.
190-198: Naming confirmed: ‘NVIDIA Blackwell GB200’ and ‘DeepSeek V3’ align with official public branding and are used consistently across existing documentation.
edffceb to 97d6533
Actionable comments posted: 2
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)
342-344: Rename parameter to batching_wait_iters for consistency. Reintroduces an earlier naming bug.
    - - **Low-load scenarios**: `batch_waiting_iters` provides minimal benefit while adding latency overhead
    + - **Low-load scenarios**: `batching_wait_iters` provides minimal benefit while adding latency overhead
150-155: Fix generation index in second timeline (use M, not N). Inconsistent with definitions and the final line in the same block.
    -iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
    -iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
    +iter_i: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
    +iter_i+1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
143-146: Correct balance ratio expression to match its definition. The current formula conflicts with balance_ratio = avg_tokens / max_tokens.
    -- **Balance ratio**: `(ctx_len + (M-1) + M × (N-1)) / (N × ctx_len)` (poor balance)
    +- **Avg tokens per rank**: `((ctx_len + (M - 1) + M × (N - 1)) / N)`
    +- **Balance ratio**: `avg_tokens / ctx_len` (poor balance)
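A quick numeric check may help when weighing this suggestion. The sketch below evaluates the removed expression and the proposed two-step form side by side. The function names and sample values are illustrative assumptions and do not come from the blog. Both forms approximate `max_tokens` by `ctx_len`, so they should agree up to floating-point rounding.

```python
import math


def original_balance_ratio(ctx_len: int, m: int, n: int) -> float:
    # Expression on the removed line: total tokens / (N * ctx_len).
    return (ctx_len + (m - 1) + m * (n - 1)) / (n * ctx_len)


def suggested_balance_ratio(ctx_len: int, m: int, n: int) -> float:
    # Two-step form from the added lines: avg_tokens first, then avg_tokens / ctx_len.
    avg_tokens = (ctx_len + (m - 1) + m * (n - 1)) / n
    return avg_tokens / ctx_len


# Hypothetical values: 8 ranks, 64 generation requests per rank, 4096-token context.
ctx_len, m, n = 4096, 64, 8
same = math.isclose(
    original_balance_ratio(ctx_len, m, n),
    suggested_balance_ratio(ctx_len, m, n),
)
print(f"expressions agree: {same}")
```

If the two forms agree, the change would mainly improve readability by naming the intermediate `avg_tokens` term.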
🧹 Nitpick comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
64-67: Unify index notation with earlier zero-based definition. Earlier, ranks use m∈[0,N); here the sum uses i∈[1,N]. Align for consistency.
    -- $\text{avg\_tokens} = \frac{\sum_{i=1}^{N} \text{tokens}_{i}}{N}$ (average tokens per rank)
    -- $\text{max\_tokens} = \max(\text{tokens}_{i})$ (maximum tokens on any rank)
    -- $\text{tokens}_{i}$ represents the number of tokens processed by rank $i$
    +- $\text{avg\_tokens} = \frac{1}{N}\sum_{m=0}^{N-1} \text{tokens}_{m}$ (average tokens per rank)
    +- $\text{max\_tokens} = \max_{0 \le m < N}(\text{tokens}_{m})$ (maximum tokens on any rank)
    +- $\text{tokens}_{m}$ represents the number of tokens processed by rank $m$
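The zero-based definition can be sketched in code as follows. This is a minimal illustration of `balance_ratio = avg_tokens / max_tokens` over ranks m in [0, N). The function name, the list-based input, and the handling of an all-idle iteration are assumptions made for this example and are not taken from TensorRT-LLM.

```python
from typing import Sequence


def balance_ratio(tokens_per_rank: Sequence[int]) -> float:
    """Return avg_tokens / max_tokens, where index m runs over ranks 0..N-1."""
    n = len(tokens_per_rank)
    if n == 0:
        raise ValueError("at least one rank is required")
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        # Assumption: an iteration with no tokens on any rank counts as balanced.
        return 1.0
    avg_tokens = sum(tokens_per_rank) / n
    return avg_tokens / max_tokens


# Hypothetical per-rank token counts for one iteration on 4 ranks.
print(balance_ratio([4, 4, 4, 4]))  # every rank carries the same load
print(balance_ratio([8, 0, 0, 0]))  # one rank carries the whole load
```

Iterating a Python sequence is naturally zero-based, which matches the m∈[0,N) convention the comment asks for.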
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
- docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
- .gitattributes (1 hunks)
- docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitattributes
🔇 Additional comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
74-80: Double-check SOL TPS formula clarity and units. Consider adding a brief derivation to justify `sol_tps = (elapsed_time/sol_time) × actual_tps` and define what “elapsed_time” and “actual_tps” measure over (same token budget/time window).
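One possible derivation, sketched under stated assumptions: if `actual_tps` and `elapsed_time` are measured over the same run, the run produced `actual_tps × elapsed_time` tokens, and dividing that token count by `sol_time` yields the formula. The excerpt quoted here does not show how `sol_time` is defined. The sketch assumes it is the sum of per-iteration times, each scaled by that iteration's balance ratio, and all function names are hypothetical.

```python
from typing import Sequence


def sol_time(iter_times: Sequence[float], balance_ratios: Sequence[float]) -> float:
    # Assumption: a perfectly balanced iteration would take
    # iter_time_i * balance_ratio_i, since the slowest rank sets the pace.
    if len(iter_times) != len(balance_ratios):
        raise ValueError("one balance ratio is required per iteration")
    return sum(t * r for t, r in zip(iter_times, balance_ratios))


def sol_tps(elapsed_time: float, sol_time_s: float, actual_tps: float) -> float:
    # Tokens produced during the measured run (same window as actual_tps).
    total_tokens = actual_tps * elapsed_time
    # The same tokens delivered in the shorter idealized time.
    return total_tokens / sol_time_s


# Hypothetical run: two 2-second iterations with balance ratios 0.5 and 1.0.
ideal = sol_time([2.0, 2.0], [0.5, 1.0])
print(ideal, sol_tps(elapsed_time=4.0, sol_time_s=ideal, actual_tps=300.0))
```

Written this way, the units are explicit: `elapsed_time` and `sol_time` are both in seconds, so their ratio is a dimensionless scale factor applied to `actual_tps`.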
97d6533 to 8e5859b
Actionable comments posted: 0
♻️ Duplicate comments (10)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (10)
150-155: Correct generation index in ADP timeline (use M, not N). Keeps indices consistent with the legend and prior timeline.
    -iter_i: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
    -iter_i+1: [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
    +iter_i: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
    +iter_i+1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
341-345: Rename parameter to batching_wait_iters for consistency. Fixes mismatch with earlier sections and YAML.
    - - **Low-load scenarios**: `batch_waiting_iters` provides minimal benefit while adding latency overhead
    + - **Low-load scenarios**: `batching_wait_iters` provides minimal benefit while adding latency overhead
262-268: Add alt text to Figure 5 image for accessibility (MD045).
    - <img src="./../media/tech_blog10_context_wait_performance.png">
    + <img alt="Context Wait: improved per-iteration balance ratio stability (100–12,000)" src="./../media/tech_blog10_context_wait_performance.png">
100-104: Add alt text to Figure 1 image for accessibility (MD045).
    - <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
    + <img alt="Baseline: sort then round-robin distribution of context tokens across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
302-308: Add alt text to Figure 6 image for accessibility (MD045).
    - <img src="./../media/tech_blog10_full_strategy_performance.png">
    + <img alt="Full Strategy: highest balance stability and throughput (100–12,000)" src="./../media/tech_blog10_full_strategy_performance.png">
227-233: Add alt text to Figure 3 image for accessibility (MD045).
    - <img src="./../media/tech_blog10_baseline_performance_overview.png">
    + <img alt="Baseline: per-iteration average tokens per rank (top) and balance ratio (bottom)" src="./../media/tech_blog10_baseline_performance_overview.png">
326-332: Add alt text to Figure 7 image for accessibility (MD045).
    - <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
    + <img alt="Pareto curves: TPS/GPU vs TTFT under different timeout and batching-wait settings" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
55-61: Add missing "Performance Metrics" heading to fix broken ToC anchor (MD051).
The ToC links to #performance-metrics but there’s no matching heading.
 To quantify load balance and theoretical performance bounds, we define two key metrics:
+
+### Performance Metrics
241-247: Add alt text to Figure 4 image for accessibility (MD045).
- <img src="./../media/tech_blog10_baseline_performance_detail.png">
+ <img alt="Baseline detail: iterations 100–12,000 average tokens and balance ratio" src="./../media/tech_blog10_baseline_performance_detail.png">
181-186: Add alt text to Figure 2 image for accessibility (MD045).
- <img src="./../media/tech_blog10_dataset_token_distribution.png">
+ <img alt="Histogram of input and output token lengths across 16k requests" src="./../media/tech_blog10_dataset_token_distribution.png">
🧹 Nitpick comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)
31-39: Standardize ADP/DP terminology and define abbreviations on first use.
Use “Attention Data Parallel (ADP)” consistently; expand TTFT/TPOT on first mention.
-In DeepSeek MLA + MoE architectures under maximum-throughput scenarios, an Attention Data Parallel (DP) + MoE Expert Parallel (EP) strategy is commonly employed to eliminate redundant KV cache storage, and utilize disaggregated serving to prevent Attention Data Parallel (ADP) imbalances. However, certain deployment scenarios still favor In-Flight Batching (IFB) inference, including:
+In DeepSeek MLA + MoE architectures under maximum-throughput scenarios, an Attention Data Parallel (ADP) + MoE Expert Parallel (EP) strategy is commonly employed to eliminate redundant KV cache storage, and utilize disaggregated serving to prevent ADP imbalances. However, certain deployment scenarios still favor In-Flight Batching (IFB) inference, including:
@@
-- **Offline inference**: Batch processing environments where TTFT and TPOT are more relaxed
+- **Offline inference**: Batch processing environments where Time-To-First-Token (TTFT) and Time Per Output Token (TPOT) requirements are more relaxed
3-4: Minor grammar nit: add article.
-By NVIDIA TensorRT-LLM team
+By the NVIDIA TensorRT-LLM team
143-146: Clarify balance ratio formula and assumptions (lines 143–146)
- Replace the raw expression with the average-workload definition:
  Balance ratio = avg_tokens / ctx_len, where
  avg_tokens = (ctx_len + M × (N − 1) + (M − 1)) / N
- Define variables:
  • ctx_len = number of context tokens
  • M = generation tokens per iteration
  • N = number of ranks
- Note assumption: one rank processes the full context while the others handle generation tokens
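A quick numeric check of the definition suggested above, as a minimal sketch. The function name `baseline_balance_ratio` and the input values are hypothetical and chosen only for illustration; the formula and its single-context-rank assumption are taken directly from this comment.

```python
def baseline_balance_ratio(ctx_len: int, m: int, n: int) -> float:
    """Balance ratio for one baseline iteration, using the formula above.

    Assumption (from the comment): one rank holds the context plus (m - 1)
    generation tokens, and each of the other (n - 1) ranks holds m
    generation tokens. The denominator is ctx_len, as in the comment.
    """
    if ctx_len <= 0 or m <= 0 or n <= 0:
        raise ValueError("ctx_len, m, and n must be positive")
    avg_tokens = (ctx_len + m * (n - 1) + (m - 1)) / n
    return avg_tokens / ctx_len


# Illustrative values only; they are not taken from the blog's experiments.
example_ratio = baseline_balance_ratio(ctx_len=1000, m=32, n=8)
print(f"balance ratio = {example_ratio:.3f}")
```

With a context much longer than the per-rank generation batch, the ratio falls well below 1, which is the imbalance the baseline analysis describes.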
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...
(QB_NEW_EN)
[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...
(QB_NEW_EN)
[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...
(QB_NEW_EN)
[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Performance Metrics](#performance-metric...
(QB_NEW_EN)
[grammar] ~11-~11: There might be a mistake here.
Context: ...cal-modeling) - Performance Metrics - [Scheduling Strategies for Load Balancing...
(QB_NEW_EN)
[grammar] ~12-~12: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...
(QB_NEW_EN)
[grammar] ~15-~15: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...
(QB_NEW_EN)
[grammar] ~21-~21: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...
(QB_NEW_EN)
[grammar] ~22-~22: There might be a mistake here.
Context: ...e-summary) - Baseline Performance - [ADP Balance with Context Wait Implementa...
(QB_NEW_EN)
[grammar] ~23-~23: There might be a mistake here.
Context: ...Balance with Context Wait Implementation](#adp-balance-with-context-wait-implementation) - [ADP Balance with Full Strategy Implement...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ...alance with Full Strategy Implementation](#adp-balance-with-full-strategy-implementation) - [Pareto Analysis: Throughput-Latency Trad...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...hroughput-Latency Trade-off Optimization](#pareto-analysis-throughput-latency-trade-off-optimization) - Conclusion - [Acknowledg...
(QB_NEW_EN)
[grammar] ~26-~26: There might be a mistake here.
Context: ...-trade-off-optimization) - Conclusion - Acknowledgement ## M...
(QB_NEW_EN)
[grammar] ~33-~33: There might be a mistake here.
Context: ...ociated with disaggregated architectures - Specific workload patterns: Scenarios ...
(QB_NEW_EN)
[grammar] ~34-~34: There might be a mistake here.
Context: ...) and long output sequence lengths (OSL) - Offline inference: Batch processing en...
(QB_NEW_EN)
[grammar] ~57-~57: There might be a mistake here.
Context: ... two key metrics: #### 1. Balance Ratio The balance_ratio measures the load di...
(QB_NEW_EN)
[grammar] ~64-~64: There might be a mistake here.
Context: ...ac{avg_tokens}{max_tokens} ``` where: -
(QB_NEW_EN)
[grammar] ~71-~71: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...ol_time} \times actual_tps ``` where: -
(QB_NEW_EN)
[grammar] ~83-~83: There might be a mistake here.
Context: ...Measured execution time of iteration
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...cally measured end-to-end execution time -
(QB_NEW_EN)
[grammar] ~85-~85: There might be a mistake here.
Context: ...Observed throughput in tokens per second -
(QB_NEW_EN)
[grammar] ~113-~113: There might be a mistake here.
Context: ...ontext Synchronization (timeout_iters)** - Purpose: Ensures temporal alignment of...
(QB_NEW_EN)
[grammar] ~118-~118: There might be a mistake here.
Context: ...ch Equilibration (batching_wait_iters)** - Purpose: Balances the number of accumu...
(QB_NEW_EN)
[grammar] ~128-~128: There might be a mistake here.
Context: ...me(ctx)>> Generation processing time:time(gen)` - N new requests arrive sequentially over ...
(QB_NEW_EN)
[grammar] ~131-~131: There might be a mistake here.
Context: ...ver N iterations Baseline Behavior: In the traditional approach, contexts ar...
(QB_NEW_EN)
[grammar] ~143-~143: There might be a mistake here.
Context: ...(ctx)(dominated by context processing) - **Total execution time**:time(ctx) × N` ...
(QB_NEW_EN)
[grammar] ~144-~144: There might be a mistake here.
Context: ...processing) - Total execution time: time(ctx) × N - Balance ratio: `(ctx_len + (M-1) + M ×...
(QB_NEW_EN)
[grammar] ~147-~147: There might be a mistake here.
Context: ...poor balance) ADP Balance Strategy: Our method synchronizes context processi...
(QB_NEW_EN)
[grammar] ~157-~157: There might be a mistake here.
Context: ...rations, time(ctx) for final iteration - Total execution time: `time(gen) × (N-...
(QB_NEW_EN)
[grammar] ~158-~158: There might be a mistake here.
Context: ...l iteration - Total execution time: time(gen) × (N-1) + time(ctx) - Balance ratio: 1.0 (perfect balance) -...
(QB_NEW_EN)
[grammar] ~159-~159: There might be a mistake here.
Context: ...Balance ratio: 1.0 (perfect balance) - Time savings: `(time(ctx) - time(gen))...
(QB_NEW_EN)
[grammar] ~164-~164: There might be a mistake here.
Context: ...rovement** due to optimal load balancing - ✅ Maximized GPU utilization across a...
(QB_NEW_EN)
[grammar] ~165-~165: There might be a mistake here.
Context: ...mized GPU utilization** across all ranks -
(QB_NEW_EN)
[grammar] ~166-~166: There might be a mistake here.
Context: ...TFT** due to strategic waiting mechanism - 📋 Best suited for throughput-orient...
(QB_NEW_EN)
[grammar] ~167-~167: There might be a mistake here.
Context: ...ted scenarios where TTFT is not critical ## Experiments ### Setting #### Dataset C...
(QB_NEW_EN)
[grammar] ~173-~173: There might be a mistake here.
Context: ... ### Setting #### Dataset Configuration We evaluate our approach using a compreh...
(QB_NEW_EN)
[grammar] ~176-~176: There might be a mistake here.
Context: ...Request volume*: 16,000 total requests - Average input length: 803 tokens - **A...
(QB_NEW_EN)
[grammar] ~177-~177: There might be a mistake here.
Context: ...s - Average input length: 803 tokens - Average output length: 3,653 tokens - ...
(QB_NEW_EN)
[grammar] ~178-~178: There might be a mistake here.
Context: ... Average output length: 3,653 tokens - Token distribution: Figure 2 illustrat...
(QB_NEW_EN)
[grammar] ~190-~190: There might be a mistake here.
Context: .... #### Hardware and Model Configuration Infrastructure: - Platform: NVIDIA...
(QB_NEW_EN)
[grammar] ~191-~191: There might be a mistake here.
Context: ... Model Configuration Infrastructure: - Platform: NVIDIA Blackwell GB200 syste...
(QB_NEW_EN)
[grammar] ~192-~192: There might be a mistake here.
Context: ...latform**: NVIDIA Blackwell GB200 system - GPU Count: 8 × GB200 GPUs - Model:...
(QB_NEW_EN)
[grammar] ~193-~193: There might be a mistake here.
Context: ...0 system - GPU Count: 8 × GB200 GPUs - Model: DeepSeek V3 - **Parallelization...
(QB_NEW_EN)
[grammar] ~194-~194: There might be a mistake here.
Context: ... 8 × GB200 GPUs - Model: DeepSeek V3 - Parallelization Strategy: - Attentio...
(QB_NEW_EN)
[grammar] ~195-~195: There might be a mistake here.
Context: ...pSeek V3 - Parallelization Strategy: - Attention module: Data Parallel (DP) siz...
(QB_NEW_EN)
[grammar] ~196-~196: There might be a mistake here.
Context: ...tion module: Data Parallel (DP) size = 8 - MoE module: Expert Parallel (EP) size = ...
(QB_NEW_EN)
[grammar] ~204-~204: There might be a mistake here.
Context: ... 1. Baseline: Round-robin scheduling 2. ADP Balance (Context Wait): Implementi...
(QB_NEW_EN)
[grammar] ~205-~205: There might be a mistake here.
Context: ...lementing timeout_iters parameter only 3. ADP Balance (Full Strategy): Complete ...
(QB_NEW_EN)
[grammar] ~210-~210: There might be a mistake here.
Context: ... Avg Balance Ratio | SOL TPS | Speedup | |---------------|------------|----------...
(QB_NEW_EN)
[grammar] ~211-~211: There might be a mistake here.
Context: ...---------|-------------------|---------| | Baseline | 25,664 | 54.11% | 39,552 | ...
(QB_NEW_EN)
[grammar] ~212-~212: There might be a mistake here.
Context: ...ine | 25,664 | 54.11% | 39,552 | 1.00× | | ADP Balance (Context Wait) | 33,499 | ...
(QB_NEW_EN)
[grammar] ~213-~213: There might be a mistake here.
Context: ...it) | 33,499 | 84.33% | 38,312 | 1.31× | | ADP Balance (Full Strategy) | 34,140 |...
(QB_NEW_EN)
[grammar] ~216-~216: There might be a mistake here.
Context: ... 37,912 | 1.33× | Key Observations: - Context Wait alone delivers a substantia...
(QB_NEW_EN)
[grammar] ~234-~234: There might be a mistake here.
Context: ...s
(QB_NEW_EN)
[grammar] ~237-~237: There might be a mistake here.
Context: ...eration 12,000, all requests transition to generation phase, naturally reducing im...
(QB_NEW_EN)
[grammar] ~248-~248: There might be a mistake here.
Context: ...
(QB_NEW_EN)
[grammar] ~257-~257: There might be a mistake here.
Context: ...nization. Performance Achievements: - Throughput: 33,499 TPS (1.31× speedup)...
(QB_NEW_EN)
[grammar] ~258-~258: There might be a mistake here.
Context: ...Throughput**: 33,499 TPS (1.31× speedup) - Balance improvement: 84.33% average (v...
(QB_NEW_EN)
[grammar] ~259-~259: There might be a mistake here.
Context: ...**: 84.33% average (vs. 54.11% baseline) - Efficiency: Actual TPS significantly c...
(QB_NEW_EN)
[grammar] ~269-~269: There might be a mistake here.
Context: ...em>
(QB_NEW_EN)
[grammar] ~284-~284: There might be a mistake here.
Context: ...` Performance Optimization Results: - Peak throughput: 34,140 TPS (1.33× spe...
(QB_NEW_EN)
[grammar] ~285-~285: There might be a mistake here.
Context: ...throughput**: 34,140 TPS (1.33× speedup) - Optimal balance: 87.70% average balanc...
(QB_NEW_EN)
[grammar] ~286-~286: There might be a mistake here.
Context: ... balance**: 87.70% average balance ratio - Near-theoretical efficiency: Actual TP...
(QB_NEW_EN)
[grammar] ~287-~287: There might be a mistake here.
Context: ...TPS (34,140) approaches SOL TPS (37,912) - System stability: Dramatically reduced...
(QB_NEW_EN)
[grammar] ~290-~290: There might be a mistake here.
Context: ...terations Production Configuration: Users can enable the full ADP Balance st...
(QB_NEW_EN)
[grammar] ~309-~309: There might be a mistake here.
Context: ... Key Improvements Over Context Wait: - Enhanced stability: Balance ratio main...
(QB_NEW_EN)
[grammar] ~314-~314: There might be a mistake here.
Context: ...erations Implementation Trade-offs: - ✅ Maximum throughput improvement: 33...
(QB_NEW_EN)
[grammar] ~315-~315: There might be a mistake here.
Context: ...ut improvement**: 33% gain over baseline - ✅ Near-optimal load balancing: 87.70...
(QB_NEW_EN)
[grammar] ~316-~316: There might be a mistake here.
Context: ...alancing**: 87.70% average balance ratio -
(QB_NEW_EN)
[grammar] ~317-~317: There might be a mistake here.
Context: ...echanisms increase total iteration count -
(QB_NEW_EN)
[grammar] ~341-~341: There might be a mistake here.
Context: ...itiation 4. Configuration Guidance: - Low-load scenarios: `batch_waiting_ite...
(QB_NEW_EN)
[grammar] ~356-~356: There might be a mistake here.
Context: ...mbalances. Experimental Validation: Comprehensive evaluation on the DeepSeek...
(QB_NEW_EN)
[grammar] ~357-~357: There might be a mistake here.
Context: ...tes compelling performance improvements: - 33% throughput increase: From 25,664 t...
(QB_NEW_EN)
[grammar] ~358-~358: There might be a mistake here.
Context: ...ut increase**: From 25,664 to 34,140 TPS - 87% load balance achievement: Dramatic...
(QB_NEW_EN)
[grammar] ~359-~359: There might be a mistake here.
Context: ...: Dramatic improvement from 54% baseline - Near-theoretical efficiency: Actual pe...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
11-11: Link fragments should be valid
(MD051, link-fragments)
100-100: Images should have alternate text (alt text)
(MD045, no-alt-text)
183-183: Images should have alternate text (alt text)
(MD045, no-alt-text)
229-229: Images should have alternate text (alt text)
(MD045, no-alt-text)
243-243: Images should have alternate text (alt text)
(MD045, no-alt-text)
264-264: Images should have alternate text (alt text)
(MD045, no-alt-text)
304-304: Images should have alternate text (alt text)
(MD045, no-alt-text)
328-328: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
8e5859b to 6bd367f
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)
99-99: Add missing alt text for all images (MD045, accessibility).
Screen readers and markdownlint require alt attributes. Add concise, descriptive alt for each figure.
- <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+ <img alt="Baseline: round-robin distributes context tokens across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
- <img src="./../media/tech_blog10_dataset_token_distribution.png">
+ <img alt="Distribution of input and output token lengths" src="./../media/tech_blog10_dataset_token_distribution.png">
- <img src="./../media/tech_blog10_baseline_performance_overview.png">
+ <img alt="Baseline: average tokens and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
- <img src="./../media/tech_blog10_baseline_performance_detail.png">
+ <img alt="Baseline detail: balance ratio for iterations 100–12,000" src="./../media/tech_blog10_baseline_performance_detail.png">
- <img src="./../media/tech_blog10_context_wait_performance.png">
+ <img alt="Context Wait: improved balance ratio over iterations" src="./../media/tech_blog10_context_wait_performance.png">
- <img src="./../media/tech_blog10_full_strategy_performance.png">
+ <img alt="Full strategy: near-optimal balance across iterations" src="./../media/tech_blog10_full_strategy_performance.png">
- <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+ <img alt="Pareto frontier: TPS/GPU vs TTFT across parameter settings" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
Also applies to: 182-182, 228-228, 242-242, 263-263, 303-303, 327-327
🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)
73-79: Clarify SOL TPS equation; decouple from actual_tps for correctness and readability.
Express SOL TPS as tokens processed divided by idealized time; keep the measured token count explicit.
-```math
-sol\_time = \sum_{i=0}^{\infty} \frac{time_i}{balance\_ratio_i}
-```
-
-```math
-sol\_tps = \frac{elapsed\_time}{sol\_time} \times actual\_tps
-```
+```math
+sol\_time = \sum_{i=0}^{T-1} \frac{time_i}{balance\_ratio_i}
+```
+
+```math
+total\_tokens = actual\_tps \times elapsed\_time
+```
+
+```math
+sol\_tps = \frac{total\_tokens}{sol\_time}
+```
Minimal text tweak after the equations:
-where:
+where:
 - $time_i$: Measured execution time of iteration $i$
- - $elapsed\_time$: Total empirically measured end-to-end execution time
- - $actual\_tps$: Observed throughput in tokens per second
- - $sol\_tps$: Theoretical maximum throughput under perfect load balance
+ - $elapsed\_time$: Total measured end-to-end execution time
+ - $actual\_tps$: Observed throughput (tokens/second)
+ - $total\_tokens$: Total tokens produced during $elapsed\_time$
+ - $sol\_tps$: Theoretical throughput under perfect load balance
Also applies to: 81-86
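The three suggested equations can be evaluated literally as in the sketch below. The function name `sol_tps`, its argument names, and the sample numbers are hypothetical; the code assumes `T` is the number of measured iterations and that each balance ratio lies in (0, 1].

```python
from typing import Sequence


def sol_tps(iter_times: Sequence[float], balance_ratios: Sequence[float],
            actual_tps: float, elapsed_time: float) -> float:
    """Evaluate the suggested equations exactly as written in this comment."""
    if not iter_times or len(iter_times) != len(balance_ratios):
        raise ValueError("need one balance ratio per measured iteration")
    if any(t <= 0.0 for t in iter_times):
        raise ValueError("iteration times must be positive")
    if any(r <= 0.0 or r > 1.0 for r in balance_ratios):
        raise ValueError("balance ratios must lie in (0, 1]")
    # sol_time = sum over iterations of time_i / balance_ratio_i
    sol_time = sum(t / r for t, r in zip(iter_times, balance_ratios))
    # total_tokens = actual_tps * elapsed_time
    total_tokens = actual_tps * elapsed_time
    # sol_tps = total_tokens / sol_time
    return total_tokens / sol_time


# Toy numbers for illustration only; not measurements from the blog.
example_sol_tps = sol_tps([1.0, 1.0], [0.5, 1.0],
                          actual_tps=100.0, elapsed_time=2.0)
print(f"sol_tps = {example_sol_tps:.2f}")
```

In this toy run the result (about 66.7) is below `actual_tps` (100): dividing each `time_i` by a ratio of at most 1 can only lengthen `sol_time`. The blog's summary table reports SOL TPS above actual TPS, so it may be worth confirming with the authors whether the balance ratio is meant to multiply `time_i` instead.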
3-3: Nit: Add article for smoother prose.
Change to “By the NVIDIA TensorRT-LLM team.”
-By NVIDIA TensorRT-LLM team
+By the NVIDIA TensorRT-LLM team
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
- `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`
📒 Files selected for processing (2)
- `.gitattributes` (1 hunks)
- `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...
(QB_NEW_EN)
[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...
(QB_NEW_EN)
[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...
(QB_NEW_EN)
[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Scheduling Strategies for Load Balancing...
(QB_NEW_EN)
[grammar] ~11-~11: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...
(QB_NEW_EN)
[grammar] ~12-~12: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...
(QB_NEW_EN)
[grammar] ~14-~14: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...
(QB_NEW_EN)
[grammar] ~15-~15: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...
(QB_NEW_EN)
[grammar] ~21-~21: There might be a mistake here.
Context: ...e-summary) - Baseline Performance - [ADP Balance with Context Wait Implementa...
(QB_NEW_EN)
[grammar] ~22-~22: There might be a mistake here.
Context: ...Balance with Context Wait Implementation](#adp-balance-with-context-wait-implementation) - [ADP Balance with Full Strategy Implement...
(QB_NEW_EN)
[grammar] ~23-~23: There might be a mistake here.
Context: ...alance with Full Strategy Implementation](#adp-balance-with-full-strategy-implementation) - [Pareto Analysis: Throughput-Latency Trad...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ...hroughput-Latency Trade-off Optimization](#pareto-analysis-throughput-latency-trade-off-optimization) - Conclusion - [Acknowledg...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...-trade-off-optimization) - Conclusion - Acknowledgement ## M...
(QB_NEW_EN)
[grammar] ~32-~32: There might be a mistake here.
Context: ...ociated with disaggregated architectures - Specific workload patterns: Scenarios ...
(QB_NEW_EN)
[grammar] ~33-~33: There might be a mistake here.
Context: ...) and long output sequence lengths (OSL) - Offline inference: Batch processing en...
(QB_NEW_EN)
[grammar] ~56-~56: There might be a mistake here.
Context: ... two key metrics: #### 1. Balance Ratio The
(QB_NEW_EN)
[grammar] ~63-~63: There might be a mistake here.
Context: ...ac{avg_tokens}{max_tokens} ``` where: -
(QB_NEW_EN)
[grammar] ~70-~70: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The
(QB_NEW_EN)
[grammar] ~81-~81: There might be a mistake here.
Context: ...ol_time} \times actual_tps ``` where: -
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...Measured execution time of iteration
(QB_NEW_EN)
[grammar] ~83-~83: There might be a mistake here.
Context: ...cally measured end-to-end execution time -
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...Observed throughput in tokens per second -
(QB_NEW_EN)
[grammar] ~112-~112: There might be a mistake here.
Context: ...ontext Synchronization (timeout_iters)** - Purpose: Ensures temporal alignment of...
(QB_NEW_EN)
[grammar] ~117-~117: There might be a mistake here.
Context: ...ch Equilibration (batching_wait_iters)** - Purpose: Balances the number of accumu...
(QB_NEW_EN)
[grammar] ~127-~127: There might be a mistake here.
Context: ...s arrive sequentially over N iterations. - Context processing time: time(ctx) >> ...
(QB_NEW_EN)
[grammar] ~130-~130: There might be a mistake here.
Context: ...ime: time(gen) Baseline Behavior: In the traditional approach, contexts ar...
(QB_NEW_EN)
[grammar] ~142-~142: There might be a mistake here.
Context: ...(ctx)(dominated by context processing) - **Total execution time**:time(ctx) × N` ...
(QB_NEW_EN)
[grammar] ~143-~143: There might be a mistake here.
Context: ...processing) - Total execution time: time(ctx) × N - Balance ratio: `(ctx_len + (M-1) + M ×...
(QB_NEW_EN)
[grammar] ~146-~146: There might be a mistake here.
Context: ...poor balance) ADP Balance Strategy: Our method synchronizes context processi...
(QB_NEW_EN)
[grammar] ~156-~156: There might be a mistake here.
Context: ...rations, time(ctx) for final iteration - Total execution time: `time(gen) × (N-...
(QB_NEW_EN)
[grammar] ~157-~157: There might be a mistake here.
Context: ...l iteration - Total execution time: time(gen) × (N-1) + time(ctx) - Balance ratio: 1.0 (perfect balance) -...
(QB_NEW_EN)
[grammar] ~158-~158: There might be a mistake here.
Context: ...Balance ratio: 1.0 (perfect balance) - Time savings: `(time(ctx) - time(gen))...
(QB_NEW_EN)
[grammar] ~163-~163: There might be a mistake here.
Context: ...rovement** due to optimal load balancing - ✅ Maximized GPU utilization across a...
(QB_NEW_EN)
[grammar] ~164-~164: There might be a mistake here.
Context: ...mized GPU utilization** across all ranks -
(QB_NEW_EN)
[grammar] ~165-~165: There might be a mistake here.
Context: ...TFT** due to strategic waiting mechanism - 📋 Best suited for throughput-orient...
(QB_NEW_EN)
[grammar] ~166-~166: There might be a mistake here.
Context: ...ted scenarios where TTFT is not critical ## Experiments ### Setting #### Dataset C...
(QB_NEW_EN)
[grammar] ~172-~172: There might be a mistake here.
Context: ... ### Setting #### Dataset Configuration We evaluate our approach using a compreh...
(QB_NEW_EN)
[grammar] ~175-~175: There might be a mistake here.
Context: ...Request volume*: 16,000 total requests - Average input length: 803 tokens - **A...
(QB_NEW_EN)
[grammar] ~176-~176: There might be a mistake here.
Context: ...s - Average input length: 803 tokens - Average output length: 3,653 tokens - ...
(QB_NEW_EN)
[grammar] ~177-~177: There might be a mistake here.
Context: ... Average output length: 3,653 tokens - Token distribution: Figure 2 illustrat...
(QB_NEW_EN)
[grammar] ~189-~189: There might be a mistake here.
Context: .... #### Hardware and Model Configuration Infrastructure: - Platform: NVIDIA...
(QB_NEW_EN)
[grammar] ~190-~190: There might be a mistake here.
Context: ... Model Configuration Infrastructure: - Platform: NVIDIA Blackwell GB200 syste...
(QB_NEW_EN)
[grammar] ~191-~191: There might be a mistake here.
Context: ...latform**: NVIDIA Blackwell GB200 system - GPU Count: 8 × GB200 GPUs - Model:...
(QB_NEW_EN)
[grammar] ~192-~192: There might be a mistake here.
Context: ...0 system - GPU Count: 8 × GB200 GPUs - Model: DeepSeek V3 - **Parallelization...
(QB_NEW_EN)
[grammar] ~193-~193: There might be a mistake here.
Context: ... 8 × GB200 GPUs - Model: DeepSeek V3 - Parallelization Strategy: - Attentio...
(QB_NEW_EN)
[grammar] ~194-~194: There might be a mistake here.
Context: ...pSeek V3 - Parallelization Strategy: - Attention module: Data Parallel (DP) siz...
(QB_NEW_EN)
[grammar] ~195-~195: There might be a mistake here.
Context: ...tion module: Data Parallel (DP) size = 8 - MoE module: Expert Parallel (EP) size = ...
(QB_NEW_EN)
[grammar] ~203-~203: There might be a mistake here.
Context: ... 1. Baseline: Round-robin scheduling 2. ADP Balance (Context Wait): Implementi...
(QB_NEW_EN)
[grammar] ~204-~204: There might be a mistake here.
Context: ...lementing timeout_iters parameter only 3. ADP Balance (Full Strategy): Complete ...
(QB_NEW_EN)
[grammar] ~209-~209: There might be a mistake here.
Context: ... Avg Balance Ratio | SOL TPS | Speedup | |---------------|------------|----------...
(QB_NEW_EN)
[grammar] ~210-~210: There might be a mistake here.
Context: ...---------|-------------------|---------| | Baseline | 25,664 | 54.11% | 39,552 | ...
(QB_NEW_EN)
[grammar] ~211-~211: There might be a mistake here.
Context: ...ine | 25,664 | 54.11% | 39,552 | 1.00× | | ADP Balance (Context Wait) | 33,499 | ...
(QB_NEW_EN)
[grammar] ~212-~212: There might be a mistake here.
Context: ...it) | 33,499 | 84.33% | 38,312 | 1.31× | | ADP Balance (Full Strategy) | 34,140 |...
(QB_NEW_EN)
[grammar] ~215-~215: There might be a mistake here.
Context: ... 37,912 | 1.33× | Key Observations: - Context Wait alone delivers a substantia...
(QB_NEW_EN)
[grammar] ~233-~233: There might be a mistake here.
Context: ...s
(QB_NEW_EN)
[grammar] ~236-~236: There might be a mistake here.
Context: ...eration 12,000, all requests transition to generation phase, naturally reducing im...
(QB_NEW_EN)
[grammar] ~247-~247: There might be a mistake here.
Context: ...
(QB_NEW_EN)
[grammar] ~256-~256: There might be a mistake here.
Context: ...nization. Performance Achievements: - Throughput: 33,499 TPS (1.31× speedup)...
(QB_NEW_EN)
[grammar] ~257-~257: There might be a mistake here.
Context: ...Throughput**: 33,499 TPS (1.31× speedup) - Balance improvement: 84.33% average (v...
(QB_NEW_EN)
[grammar] ~258-~258: There might be a mistake here.
Context: ...**: 84.33% average (vs. 54.11% baseline) - Efficiency: Actual TPS significantly c...
(QB_NEW_EN)
[grammar] ~268-~268: There might be a mistake here.
Context: ...em>
(QB_NEW_EN)
[grammar] ~283-~283: There might be a mistake here.
Context: ...` Performance Optimization Results: - Peak throughput: 34,140 TPS (1.33× spe...
(QB_NEW_EN)
[grammar] ~284-~284: There might be a mistake here.
Context: ...throughput**: 34,140 TPS (1.33× speedup) - Optimal balance: 87.70% average balanc...
(QB_NEW_EN)
[grammar] ~285-~285: There might be a mistake here.
Context: ... balance**: 87.70% average balance ratio - Near-theoretical efficiency: Actual TP...
(QB_NEW_EN)
[grammar] ~286-~286: There might be a mistake here.
Context: ...TPS (34,140) approaches SOL TPS (37,912) - System stability: Dramatically reduced...
(QB_NEW_EN)
[grammar] ~289-~289: There might be a mistake here.
Context: ...terations Production Configuration: Users can enable the full ADP Balance st...
(QB_NEW_EN)
[grammar] ~308-~308: There might be a mistake here.
Context: ... Key Improvements Over Context Wait: - Enhanced stability: Balance ratio main...
(QB_NEW_EN)
[grammar] ~313-~313: There might be a mistake here.
Context: ...erations Implementation Trade-offs: - ✅ Maximum throughput improvement: 33...
(QB_NEW_EN)
[grammar] ~314-~314: There might be a mistake here.
Context: ...ut improvement**: 33% gain over baseline - ✅ Near-optimal load balancing: 87.70...
(QB_NEW_EN)
[grammar] ~315-~315: There might be a mistake here.
Context: ...alancing**: 87.70% average balance ratio -
(QB_NEW_EN)
[grammar] ~316-~316: There might be a mistake here.
Context: ...echanisms increase total iteration count -
(QB_NEW_EN)
[grammar] ~340-~340: There might be a mistake here.
Context: ...itiation 4. Configuration Guidance: - Low-load scenarios: `batching_wait_ite...
(QB_NEW_EN)
[grammar] ~355-~355: There might be a mistake here.
Context: ...mbalances. Experimental Validation: Comprehensive evaluation on the DeepSeek...
(QB_NEW_EN)
[grammar] ~356-~356: There might be a mistake here.
Context: ...tes compelling performance improvements: - 33% throughput increase: From 25,664 t...
(QB_NEW_EN)
[grammar] ~357-~357: There might be a mistake here.
Context: ...ut increase**: From 25,664 to 34,140 TPS - 87% load balance achievement: Dramatic...
(QB_NEW_EN)
[grammar] ~358-~358: There might be a mistake here.
Context: ...: Dramatic improvement from 54% baseline - Near-theoretical efficiency: Actual pe...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
99-99: Images should have alternate text (alt text)
(MD045, no-alt-text)
182-182: Images should have alternate text (alt text)
(MD045, no-alt-text)
228-228: Images should have alternate text (alt text)
(MD045, no-alt-text)
242-242: Images should have alternate text (alt text)
(MD045, no-alt-text)
263-263: Images should have alternate text (alt text)
(MD045, no-alt-text)
303-303: Images should have alternate text (alt text)
(MD045, no-alt-text)
327-327: Images should have alternate text (alt text)
(MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
LGTM
/bot run
/bot run
PR_Github #16857 [ run ] triggered by Bot
PR_Github #16858 [ run ] triggered by Bot
PR_Github #16857 [ run ] completed with state
PR_Github #16858 [ run ] completed with state
I reviewed it after the merge. LGTM
Summary by CodeRabbit
Documentation
Chores
Description
Test Coverage
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message.

See details below for each supported subcommand.

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with pull request.

skip

`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.