[None][doc] add adp balance blog by yunruis · Pull Request #7213 · NVIDIA/TensorRT-LLM · GitHub

@yunruis
Contributor

@yunruis yunruis commented Aug 25, 2025

Summary by CodeRabbit

  • Documentation

    • Added a technical blog detailing the ADP Balance Strategy for Attention Data Parallel scheduling with in‑flight batching: theory, two scheduling approaches, tunable controls, state transitions, worked examples, experimental results (throughput, balance metrics, SOL TPS), trade-offs (throughput vs time‑to‑first‑token), production guidance, and illustrative figures.
  • Chores

    • Extended Git LFS tracking for additional blog image assets to improve asset handling.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@yunruis yunruis requested a review from a team as a code owner August 25, 2025 11:08
@coderabbitai
Contributor

coderabbitai bot commented Aug 25, 2025

📝 Walkthrough

Walkthrough

Adds three PNG patterns to Git LFS via .gitattributes and adds a new technical blog post docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md documenting the ADP Balance Strategy, formulas, state transitions, experiments, results, and a YAML config snippet.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Git LFS configuration: `.gitattributes` | Added three PNG entries under docs/source/blogs/media: tech_blog10_baseline_performance_detail.png, tech_blog10_full_strategy_performance.png, tech_blog10_context_wait_performance.png to enable Git LFS filtering. |
| Documentation — Tech blog: `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` | Added new markdown describing the ADP Balance Strategy for Attention Data Parallel scheduling: per-iteration time model, balance_ratio and SOL TPS formulas, two scheduling approaches (Baseline vs ADP Balance), tunable controls (timeout_iters, batching_wait_iters), per-iteration state transitions, worked examples, experimental setup/results, Pareto analysis, and a YAML config snippet to enable the strategy. |
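For readers skimming the walkthrough, the two simplest metrics it names can be sketched in a few lines of Python. This is a minimal illustration only: it assumes `max_tokens` is the largest per-rank token count in one iteration and `avg_tokens` is the mean across ranks, and the function names are hypothetical rather than part of TensorRT-LLM.

```python
from typing import Sequence


def iter_time(rank_times: Sequence[float]) -> float:
    """Iteration time is set by the slowest rank (max over ranks)."""
    return max(rank_times)


def balance_ratio(tokens_per_rank: Sequence[int]) -> float:
    """avg_tokens / max_tokens for one iteration; 1.0 means perfectly balanced."""
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        return 1.0  # assumption: treat an idle iteration as balanced
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    return avg_tokens / max_tokens
```

A ratio near 1.0 means every rank carries about the same load; a single heavily loaded rank pulls the ratio toward `1 / N`.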

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

Community want to contribute

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (8)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (8)

70-75: Specify code-fence languages to satisfy linters and improve readability.

markdownlint flagged MD040. Add explicit languages: use text for illustrative timelines and yaml for the config snippet.

Apply this diff:

@@
-```
+```text
 iter_i:     [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1:   [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
@@
-```
+```text
 iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]
@@
-```
+```yaml

Also applies to: 79-84, 138-138
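As a rough local check for the MD040 findings, a short Python helper can list opening fences that carry no language. This is a simplified sketch, not a replacement for markdownlint: it only recognises backtick fences, ignores tilde fences and nested or longer fences, and the function name is hypothetical.

```python
from typing import List

FENCE = "`" * 3  # a triple-backtick fence marker


def fences_missing_language(markdown_text: str) -> List[int]:
    """Return 1-based line numbers of opening fences with no info string."""
    missing = []
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
        else:
            in_fence = True  # this line opens a new block
            if stripped == FENCE:
                missing.append(lineno)
    return missing
```

Running it over the blog's markdown text should point at the same opening fences the linter reports, assuming the file uses only plain triple-backtick fences.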


54-54: Add alt text to images (MD045) and improve accessibility.

All images lack alt text and are flagged by markdownlint (MD045). Add descriptive alt attributes to the <img> tags.

Apply this diff:

@@
-  <img src="./../media/ADP Balance.png">
+  <img src="./../media/ADP Balance.png" alt="Baseline: sort by tokens then round-robin across ranks">
@@
-  <img src="./../media/combined_token_distribution.png">
+  <img src="./../media/combined_token_distribution.png" alt="Distribution of input and output token lengths">
@@
-  <img src="./../media/si2_ei30235_plot1_token_distribution.png">
+  <img src="./../media/si2_ei30235_plot1_token_distribution.png" alt="Avg tokens and balance_ratio vs iteration (baseline)">
@@
-  <img src="./../media/si100_ei12000_plot1_token_distribution.png">
+  <img src="./../media/si100_ei12000_plot1_token_distribution.png" alt="Avg tokens and balance_ratio in iterations 100–12000 (baseline)">
@@
-  <img src="./../media/balanced_si100_ei12000_plot1_token_distribution.png">
+  <img src="./../media/balanced_si100_ei12000_plot1_token_distribution.png" alt="Avg tokens and balance_ratio in iterations 100–12000 (ADP Balance)">
@@
-  <img src="./../media/combined_charts.png">
+  <img src="./../media/combined_charts.png" alt="Pareto curve of TPS vs timeout_iters and batching_wait_iters">

Optional: consider renaming the asset with a space (ADP Balance.png) to a hyphenated name (e.g., adp-balance.png) to avoid path quoting issues in some build systems.

Also applies to: 97-97, 119-119, 128-128, 147-147, 160-160
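To find the remaining offenders before re-running the linter, a small Python sketch using the standard-library `html.parser` can list `<img>` tags without alt text. It is an approximation under stated assumptions: it only covers HTML `<img>` tags (not Markdown `![]()` images), it treats an empty `alt` as missing, and the class and function names are illustrative.

```python
from html.parser import HTMLParser
from typing import List


class ImgAltChecker(HTMLParser):
    """Collect the `src` of every <img> tag that has no non-empty `alt`."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if not attr_map.get("alt"):
            self.missing.append(attr_map.get("src") or "<no src>")


def images_missing_alt(markdown_text: str) -> List[str]:
    """Return the src of each <img> tag in the text that lacks alt text."""
    checker = ImgAltChecker()
    checker.feed(markdown_text)
    checker.close()
    return checker.missing
```

Feeding the blog source through `images_missing_alt` returns the image paths that still need a description.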


22-22: Fix non-ASCII punctuation and tighten the sentence.

There is a Chinese comma and the sentence can be made crisper.

Apply this diff:

-With non-extremely-long context scenarios,we can assume that the overall latency is dominated by the non-MHA portion; hence the time complexity is effectively O(N).
+In non-extremely-long-context scenarios, we can assume the overall latency is dominated by non-MHA components; hence, the per-iteration time scales roughly linearly with the number of ranks.

11-11: Avoid multiple H1 headings; rename the second header to clarify scope.

There are two top-level headers (“Theoretical Analysis and Modeling” and “Theoretical Analysis”). Prefer a single H1 per page.

Apply this diff:

-# Theoretical Analysis
+## Notation and Metrics

Also applies to: 15-15


71-75: Correct subscript typos in illustrative timelines (gnM/gnN → gNM/gNN).

A few indices flip case/order and may confuse readers.

Apply this diff:

@@
-iter_i:     [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
-iter_i+1:   [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
+iter_i:     [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
+iter_i+1:   [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
@@
-iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
-iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
+iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
+iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]

Also applies to: 80-84
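The difference between the two timelines can be made concrete with a toy Python model. This is a sketch under simplifying assumptions, not the scheduler's logic: every generation request contributes one token, a context request replaces one slot with `ctx_tokens` tokens, and all names and numbers are hypothetical.

```python
from typing import List


def schedule_tokens(num_ranks: int, ctx_tokens: int, gen_slots: int,
                    aligned: bool) -> List[List[int]]:
    """Per-rank token counts for `num_ranks` consecutive iterations."""
    schedule = []
    for it in range(num_ranks):
        row = []
        for rank in range(num_ranks):
            if aligned:
                # All ranks keep decoding, then run context together at the end.
                runs_ctx = it == num_ranks - 1
            else:
                # Staggered: exactly one rank runs its context per iteration.
                runs_ctx = rank == it
            row.append(ctx_tokens + gen_slots - 1 if runs_ctx else gen_slots)
        schedule.append(row)
    return schedule


def balance_ratio(tokens: List[int]) -> float:
    """Mean tokens per rank divided by the maximum tokens on any rank."""
    return (sum(tokens) / len(tokens)) / max(tokens)


staggered = [balance_ratio(r) for r in schedule_tokens(4, 1000, 8, aligned=False)]
aligned = [balance_ratio(r) for r in schedule_tokens(4, 1000, 8, aligned=True)]
```

With these toy numbers, every staggered iteration has one rank carrying far more tokens than the others, so its ratio sits close to `1 / num_ranks`, while every aligned iteration is perfectly balanced.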


156-156: Capitalize “Pareto”.

Stylistic consistency in section titles.

Apply this diff:

-The following Figure shows the pareto curve of impact of different `timeout_iters` and `batch_waiting_iters` values on TPS.
+The following figure shows the Pareto curve for the impact of different `timeout_iters` and `batching_wait_iters` values on TPS.

50-60: Image path with space may be brittle across doc toolchains.

./../media/ADP Balance.png contains a space. Some static-site generators and link-checkers stumble on unencoded spaces. Prefer adp-balance.png.

If you decide to rename the asset, update the <img> src accordingly and run the docs build/link check. I can supply a follow-up diff once the asset is renamed.


110-116: Strengthen experimental methodology notes (hardware, seeds, repetitions).

The reported TPS gains are compelling; adding hardware details (GPU model/count, driver/CUDA/TensorRT-LLM commit), random seed control, and number of runs with mean±std would increase reproducibility and credibility.

I can draft a short “Experimental Setup” subsection with placeholders for these fields if you’d like.

Also applies to: 133-136

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a1e03af and a7c2bd0.

📒 Files selected for processing (1)
  • docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md

[grammar] ~11-~11: There might be a mistake here.
Context: ...ks. # Theoretical Analysis and Modeling Optimization Goals: - Minimize the load ...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ...nalysis and Modeling Optimization Goals: - Minimize the load difference between dif...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...ecution time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...tion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents th...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...= \max_{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents the e...

(QB_NEW_EN)


[grammar] ~22-~22: Ensure spelling is correct
Context: ...odule). With non-extremely-long context scenarios,we can assume that the overall latency is ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~22-~22: There might be a mistake here.
Context: ...the time complexity is effectively O(N). Since scheduling is intended to resolve ...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N to...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...ce_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N token...

(QB_NEW_EN)


[grammar] ~30-~30: There might be a mistake here.
Context: ... the theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ...e theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...

(QB_NEW_EN)


[grammar] ~35-~35: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...

(QB_NEW_EN)


[grammar] ~90-~90: There might be a mistake here.
Context: ...t a critical requirement. # Experiments ## Dataset The dataset contains 16,000 req...

(QB_NEW_EN)


[grammar] ~106-~106: There might be a mistake here.
Context: ...rameter Setting model=DeepSeekV3-NVFP4, TP=8, EP=8, WideEP=ON, max_num_tokens = ...

(QB_NEW_EN)


[grammar] ~156-~156: There might be a mistake here.
Context: ...following Figure shows the pareto curve of impact of different timeout_iters and...

(QB_NEW_EN)


[grammar] ~166-~166: There might be a mistake here.
Context: ...al1024_wideep_balance_ep_20250810_764586 [2] ADP balance: to50bw10: log_gb200_ifb...

(QB_NEW_EN)


[grammar] ~167-~167: There might be a mistake here.
Context: ...eep_adpbalance_to50_bw10_20250812_766150

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md

54-54: Images should have alternate text (alt text)

(MD045, no-alt-text)


70-70: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


79-79: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


97-97: Images should have alternate text (alt text)

(MD045, no-alt-text)


119-119: Images should have alternate text (alt text)

(MD045, no-alt-text)


128-128: Images should have alternate text (alt text)

(MD045, no-alt-text)


138-138: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


147-147: Images should have alternate text (alt text)

(MD045, no-alt-text)


160-160: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (2)

166-167: Double-check log directory names: “streamterval1024” looks like a typo.

If these are literal experiment paths, keep them as-is; if they’re human-readable labels, consider correcting to “streaming_interval1024”.

Would you confirm whether the suffix is “streaming_interval1024” in your logs? If it is a typo here, I can provide a diff to fix it.


1-10: Overall: Clear motivation, solid modeling, and practical guidance.

The narrative flows well from problem statement to method, with useful metrics and an actionable config snippet. The trade-offs (TTFT vs throughput) are articulated. Once the minor consistency and lint fixes land, this will be a strong addition to the tech blog.

Also applies to: 46-49, 61-68, 86-89, 152-163

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from a7c2bd0 to b107017 Compare August 25, 2025 11:17
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (3)

156-156: Capitalize “Pareto” and fix parameter name to batching_wait_iters.

-The following Figure shows the pareto curve of impact of different `timeout_iters` and `batch_waiting_iters` values on TPS.
+The following figure shows the Pareto curve of the impact of different `timeout_iters` and `batching_wait_iters` values on TPS.

64-67: Unify parameter name to timeout_iters (avoid copy/paste config errors).

The prose uses time_out_iters, while examples/YAML use timeout_iters. Please normalize to timeout_iters.

-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not.
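One possible reading of the `timeout_iters` description, written as a small Python predicate. This is a hedged sketch derived only from the one-sentence definition above; it is not the TensorRT-LLM scheduler code, the function and argument names are hypothetical, and `batching_wait_iters` is left out because its semantics are not spelled out here.

```python
def should_run_context(has_context: bool, all_ranks_have_context: bool,
                       waited_iters: int, timeout_iters: int) -> bool:
    """Decide whether a rank runs its pending context request this iteration."""
    if not has_context:
        return False  # nothing to schedule
    if all_ranks_have_context:
        return True  # contexts are aligned across ranks; no need to wait
    # Otherwise keep waiting until the cap on waiting steps is reached.
    return waited_iters >= timeout_iters
```

Under this reading, a larger `timeout_iters` gives other ranks more time to receive a context request, at the cost of delaying the first token for the waiting request.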

135-136: Unify parameter name to batching_wait_iters.

Prose uses batch_waiting_iters but YAML uses batching_wait_iters. This mismatch can cause config errors.

-This is achieved by adding `timeout_iters=50` and `batch_waiting_iters=10` in `config.yaml` to balance ADP tokens through waiting.
+This is achieved by adding `timeout_iters=50` and `batching_wait_iters=10` in `config.yaml` to balance ADP tokens through waiting.
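A name mismatch like this is easy to catch with a small key check. The Python sketch below assumes the only valid keys are the three shown in the blog's YAML snippet (`enable_balance`, `batching_wait_iters`, `timeout_iters`); that set may be incomplete, and whether TensorRT-LLM itself rejects unknown keys is not stated in this review. The helper name is hypothetical.

```python
from typing import Dict, List

# Keys taken from the YAML snippet quoted in this review; possibly incomplete.
KNOWN_KEYS = {"enable_balance", "batching_wait_iters", "timeout_iters"}


def unknown_adp_keys(attention_dp_config: Dict[str, object]) -> List[str]:
    """Return keys that are not in KNOWN_KEYS, sorted for stable output."""
    return sorted(k for k in attention_dp_config if k not in KNOWN_KEYS)
```

Passing a config copied from the prose spellings (`time_out_iters`, `batch_waiting_iters`) would return both names, while the YAML spellings return an empty list.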
🧹 Nitpick comments (12)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (12)

70-75: Specify a language for fenced blocks (MD040).

Label these illustrative schedules as plain text to satisfy markdownlint and improve rendering.

-```
+```text
 iter_i:     [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1:   [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]

79-84: Specify a language for fenced blocks (MD040).

Same as above.

-```
+```text
 iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]

97-101: Add alt text for the image (MD045).

-  <img src="./../media/combined_token_distribution.png">
+  <img alt="Input/output token length distributions (combined)" src="./../media/combined_token_distribution.png">

115-116: Pluralize metric name for consistency (“avg_tokens”).

Align with earlier definition and figure captions.

-tps: 25664, avg_balance_ratio=54.11%, sol_tps=39552. The curves for avg_token and balance_ratio over iterations are shown below.
+tps: 25664, avg_balance_ratio=54.11%, sol_tps=39552. The curves for avg_tokens and balance_ratio over iterations are shown below.

119-123: Add alt text for the image (MD045).

-  <img src="./../media/si2_ei30235_plot1_token_distribution.png">
+  <img alt="Avg tokens and balance_ratio by iteration (full range)" src="./../media/si2_ei30235_plot1_token_distribution.png">

128-132: Add alt text for the image (MD045).

-  <img src="./../media/si100_ei12000_plot1_token_distribution.png">
+  <img alt="Avg tokens and balance_ratio by iteration [100, 12000]" src="./../media/si100_ei12000_plot1_token_distribution.png">

138-143: Annotate the config block as YAML (MD040) and keep keys consistent.

-```
+```yaml
 attention_dp_config:
     enable_balance: true
     batching_wait_iters: 10
     timeout_iters: 50

147-151: Add alt text for the image (MD045).

-  <img src="./../media/balanced_si100_ei12000_plot1_token_distribution.png">
+  <img alt="Avg tokens and balance_ratio with ADP Balance [100, 12000]" src="./../media/balanced_si100_ei12000_plot1_token_distribution.png">

160-163: Add alt text for the image (MD045).

-  <img src="./../media/combined_charts.png">
+  <img alt="Pareto analyses across timeout_iters and batching_wait_iters (TPS vs TTFT trade-offs)" src="./../media/combined_charts.png">

5-5: Use “and” instead of “&&” in heading.

Improves readability and aligns with house style.

-# Motivation && Background
+# Motivation and Background

49-51: Minor grammar: “can only reduce” (word order).

-One approach is to sort the requests in the request queue by `num_tokens` and then dispatch them to different ranks in a round-robin fashion, as shown in the figure below. This method balances the total number of tokens across ranks from a global perspective, and only can reduce the disparity in token numbers between different contexts when all ranks are processing context requests.
+One approach is to sort the requests in the queue by `num_tokens` and then dispatch them to ranks in a round-robin fashion, as shown below. This method balances tokens across ranks from a global perspective, and can only reduce the disparity in token numbers between different contexts when all ranks are processing context requests.

22-23: Fix full-width comma and clarity.

Replace the full-width comma with a standard comma and streamline the clause.

-With non-extremely-long context scenarios,we can assume that the overall latency is dominated by the non-MHA portion; hence the time complexity is effectively O(N).
+In non–extremely-long context scenarios, we assume the overall latency is dominated by the non-MHA portion; hence the time complexity is effectively O(N).
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a7c2bd0 and b107017.

⛔ Files ignored due to path filters (6)
  • docs/source/blogs/media/ADP_Balance.png is excluded by !**/*.png
  • docs/source/blogs/media/balanced_si100_ei12000_plot1_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/combined_charts.png is excluded by !**/*.png
  • docs/source/blogs/media/combined_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/si100_ei12000_plot1_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/si2_ei30235_plot1_token_distribution.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitattributes
🔇 Additional comments (1)
docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md (1)

166-167: Typo in log path: “streamterval” vs “streaminterval”
It looks like both entries use streamterval1024, but you probably meant streaminterval1024. Our repo search shows these strings only in this markdown, with no other occurrences of either spelling—so before updating, make sure that any real log folders or tags on disk (or in your storage/location naming) also use the corrected spelling to avoid breaking links.

• docs/source/blogs/tech_blog/bloc_x_adp-scheduler-opt.md: lines 166–167 [streamterval1024 only appears here]
• Confirm the actual on-disk/log-tag spelling and update this snippet if it’s safe.

@nv-guomingz nv-guomingz requested a review from chenopis August 25, 2025 14:16
@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from b107017 to a0d4efc Compare August 27, 2025 03:19
@yunruis yunruis changed the title doc: add adp balance blog [None][doc] add adp balance blog Aug 27, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (6)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (6)

53-59: Replace HTML images with MyST figures and add alt text to satisfy MD045 and improve a11y.

Use MyST {figure} (auto-numbering, captions) and provide :alt:. Example for Lines 53–59; please apply similarly to all image blocks listed in this comment.

-<div align="center">
-<figure>
-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
-</figure>
-</div>
-<p align="center"><sub><em>Figure 1: Baseline: balance number of context request tokens across ranks through sort and round-robin </em></sub></p>
+```{figure} ./../media/tech_blog10_baseline_round_robin_strategy.png
+:alt: Baseline round-robin scheduling balances context tokens across ranks.
+:align: center
+
+Figure 1: Baseline: balance number of context request tokens across ranks through sort and round-robin.
+```

Also applies to: 99-105, 128-134, 137-143, 149-155, 169-175, 186-192


31-45: Clarify summation bounds and variable naming in SOL formulas.

Use a finite iteration count T and rename elp_time to elapsed_time for readability.

-`sol_tps` represents the theoretical upper-bound throughput.
-$$
-    sol\_time = \sum_{i=0}^{\infty} (iter\_time_i / balance\_ratio)
-$$
+`sol_tps` represents the theoretical upper-bound throughput. Let `T` be the total number of scheduling iterations.
+$$
+    sol\_time = \sum_{i=0}^{T-1} \frac{iter\_time_i}{balance\_ratio}
+$$
@@
-$$
-    sol\_tps = elp\_time / sol\_time * tps
-$$
+$$
+    sol\_tps = \frac{elapsed\_time}{sol\_time} \cdot tps
+$$
@@
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end time.

Also applies to: 33-38
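To make the quantities in these formulas concrete, here is a minimal Python sketch of the two per-iteration building blocks the blog defines: `balance_ratio = avg_tokens / max_tokens` and an iteration time set by the slowest rank. The function names and token counts are hypothetical and chosen for illustration only; this is not code from TensorRT-LLM.

```python
from typing import List


def balance_ratio(tokens_per_rank: List[int]) -> float:
    """avg_tokens / max_tokens for one iteration; 1.0 means perfectly balanced."""
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        # Assumption for this sketch: an iteration with no tokens counts as balanced.
        return 1.0
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    return avg_tokens / max_tokens


def iter_time(per_rank_times: List[float]) -> float:
    """An iteration completes only when its slowest rank completes."""
    return max(per_rank_times)


# Hypothetical token counts on 4 ranks with the same total work (4288 tokens).
skewed = [4096, 64, 64, 64]      # one rank runs a long context request
even = [1072, 1072, 1072, 1072]  # the same tokens spread evenly

print(balance_ratio(skewed))  # 0.26171875
print(balance_ratio(even))    # 1.0
```

With the same total token count, the skewed iteration has a balance ratio of about 0.26, so three of the four ranks sit mostly idle while the slowest rank finishes.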


7-9: Minor wording tweaks for precision.

“dis-aggregated” → “disaggregated”; “Attention velocity” → “Attention throughput”.

-... dis-aggregated architecture, or for scenarios ...
+... disaggregated architecture, or for scenarios ...
@@
-... which slows down the Attention velocity.
+... which slows down the Attention throughput.

50-61: Avoid manual figure numbering and alignment HTML; let Sphinx/MyST handle captions and numbering.

Switch all figures to {figure} blocks (see earlier example). This prevents drift when figures are added/removed.

Also applies to: 144-156, 156-168, 176-184


116-121: Table terms: align column header with term used in text/code.

Use “SOL TPS” or “sol_tps” consistently; pick one and use it throughout (including formulas, captions, and text).


23-30: Clarify the “dominated by the non-MHA portion; hence O(N)” sentence.

This reads counterintuitively: if non-MHA dominates, attention balance wouldn’t drive end-to-end. Consider rephrasing to state assumptions precisely or drop the O(N) remark.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b107017 and a0d4efc.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~13-~13: There might be a mistake here.
Context: ...ks. # Theoretical Analysis and Modeling Optimization Goals: - Minimize the load ...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...nalysis and Modeling Optimization Goals: - Minimize the load difference between dif...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...ecution time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...tion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents th...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...= \max_{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents the e...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N to...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ce_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N token...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ... the theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...e theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...

(QB_NEW_EN)


[grammar] ~37-~37: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...

(QB_NEW_EN)


[grammar] ~116-~116: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...

(QB_NEW_EN)


[grammar] ~119-~119: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...

(QB_NEW_EN)


[style] ~198-~198: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...

(PROVE_TO_BE_WORDY)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

55-55: Images should have alternate text (alt text)

(MD045, no-alt-text)


71-71: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


80-80: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


101-101: Images should have alternate text (alt text)

(MD045, no-alt-text)


130-130: Images should have alternate text (alt text)

(MD045, no-alt-text)


139-139: Images should have alternate text (alt text)

(MD045, no-alt-text)


151-151: Images should have alternate text (alt text)

(MD045, no-alt-text)


160-160: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


171-171: Images should have alternate text (alt text)

(MD045, no-alt-text)


188-188: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

160-165: Review resolved: attention_dp_config keys are valid
The keys enable_balance, batching_wait_iters, and timeout_iters are defined in the current release’s AttentionDpConfig schema (tensorrt_llm/llmapi/llm_args.py) and honored at runtime (in both the LLM API and PyTorch executor configurations). No changes to the blog snippet are necessary.

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from a0d4efc to 20c5c1e Compare August 27, 2025 03:29

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

1-204: Add missing alt text to blog images and verify Git LFS tracking

All referenced images under docs/source/blogs/media/ are present, but none include an alt attribute, and Git LFS tracking couldn’t be confirmed in CI. To meet accessibility standards and keep large assets in LFS, please:

• In docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md, update each <img> tag to include a descriptive alt, for example:

<img src="../media/tech_blog10_baseline_round_robin_strategy.png" alt="Baseline round-robin strategy distributing context tokens evenly across DP ranks">

Repeat for:
– tech_blog10_baseline_round_robin_strategy.png
– tech_blog10_dataset_token_distribution.png
– tech_blog10_baseline_performance_overview.png
– tech_blog10_baseline_performance_detail.png
– tech_blog10_context_wait_performance.png
– tech_blog10_full_strategy_performance.png
– tech_blog10_tps_ttft_pareto_curve.png

• Run git lfs ls-files (or install/configure Git LFS) to ensure all large PNGs in docs/source/blogs/media/ are tracked. If any are missing, add them via git lfs track "docs/source/blogs/media/*.png" and recommit.

♻️ Duplicate comments (8)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (8)

65-67: Unify parameter names: timeout_iters and batching_wait_iters.

Use the same names as elsewhere and in the YAML.

-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
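
The waiting rule described for `timeout_iters` can be sketched as a small decision function. This is a toy model under stated assumptions, not the TensorRT-LLM scheduler: the names `should_run_context` and `first_context_iter` are hypothetical, and the model only tracks how many ranks currently hold a pending context request.

```python
from typing import List, Optional


def should_run_context(ranks_with_ctx: int, num_ranks: int,
                       waited_iters: int, timeout_iters: int) -> bool:
    """Decide whether pending context requests are scheduled this iteration."""
    if ranks_with_ctx == 0:
        return False  # nothing to schedule
    if ranks_with_ctx == num_ranks:
        return True   # every rank has context work, so the step is balanced
    # Some ranks have no context request: keep waiting until the timeout.
    return waited_iters >= timeout_iters


def first_context_iter(ctx_ranks_per_iter: List[int], num_ranks: int,
                       timeout_iters: int) -> Optional[int]:
    """Return the first iteration index at which context requests run."""
    waited = 0
    for it, ranks_with_ctx in enumerate(ctx_ranks_per_iter):
        if should_run_context(ranks_with_ctx, num_ranks, waited, timeout_iters):
            return it
        if ranks_with_ctx > 0:
            waited += 1  # a rank held back its context request this iteration
    return None


# Hypothetical arrival pattern: one more rank receives a context request each iteration.
arrivals = [1, 2, 3, 4]
print(first_context_iter(arrivals, num_ranks=4, timeout_iters=50))  # 3
print(first_context_iter(arrivals, num_ranks=4, timeout_iters=2))   # 2
print(first_context_iter(arrivals, num_ranks=4, timeout_iters=0))   # 0
```

In this toy model, `timeout_iters=0` schedules a context request as soon as any rank has one, while a larger value delays the first context step until more ranks can join it.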

71-76: Add fence language for timeline block (MD040).

-```
+```text
 iter_i:     [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1:   [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]

80-85: Add fence language for timeline block (MD040).

-```
+```text
 iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]

158-165: Fix param name and add fence language to YAML block.

-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
-```
+```yaml
 attention_dp_config:
     enable_balance: true
     batching_wait_iters: 10
     timeout_iters: 50
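
The speedup quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only figures that appear in this thread (baseline TPS 25,664 from the results table, plus the full-strategy TPS and `sol_tps` quoted above); the variable names are illustrative.

```python
# Figures quoted in the blog text and results table.
baseline_tps = 25_664
full_tps = 34_140      # full ADP Balance strategy
full_sol_tps = 37_912  # sol_tps quoted for the full strategy

speedup = full_tps / baseline_tps
sol_fraction = full_tps / full_sol_tps

print(f"speedup over baseline: {speedup:.2f}x")          # 1.33x
print(f"fraction of sol_tps reached: {sol_fraction:.1%}")  # 90.1%
```

The measured throughput is about 90% of the quoted `sol_tps`, which is the sense in which the paragraph calls the two values close.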

53-57: Add alt text and ensure LFS tracking for baseline RR figure.

Missing alt breaks MD045; this image was also flagged as not tracked by LFS earlier.

Apply:

-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png" alt="Baseline round-robin scheduling strategy illustration">

Also add to .gitattributes (outside this file):

+docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png filter=lfs diff=lfs merge=lfs -text

100-103: Add alt text and ensure LFS tracking for dataset distribution figure.

Alt missing (MD045); image was previously untracked by LFS.

-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img src="./../media/tech_blog10_dataset_token_distribution.png" alt="Input and output token length distribution histogram">

.gitattributes addition:

+docs/source/blogs/media/tech_blog10_dataset_token_distribution.png filter=lfs diff=lfs merge=lfs -text

128-133: Add alt text and ensure LFS tracking for baseline overview figure.

Alt missing; this file was also called out as not in LFS.

-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img src="./../media/tech_blog10_baseline_performance_overview.png" alt="Baseline average tokens per rank and balance ratio over iterations">

.gitattributes addition:

+docs/source/blogs/media/tech_blog10_baseline_performance_overview.png filter=lfs diff=lfs merge=lfs -text

186-191: Add alt text and ensure LFS tracking for Pareto curve figure.

Alt missing; this image was previously untracked by LFS.

-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png" alt="Pareto curve of TPS/GPU vs TTFT across timeout_iters and batching_wait_iters settings">

.gitattributes addition:

+docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png filter=lfs diff=lfs merge=lfs -text
🧹 Nitpick comments (5)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (5)

110-111: Pluralize “GPU”.

-The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPU, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
+The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPUs, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.

137-142: Add alt text for baseline detail figure (MD045).

-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img src="./../media/tech_blog10_baseline_performance_detail.png" alt="Baseline detail: tokens and balance ratio between iterations 100 and 12000">

149-154: Add alt text for context-wait figure (MD045).

-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img src="./../media/tech_blog10_context_wait_performance.png" alt="ADP Balance with Context Wait: tokens and balance ratio over iterations">

169-174: Add alt text for full-strategy figure (MD045).

-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img src="./../media/tech_blog10_full_strategy_performance.png" alt="ADP Balance full strategy: tokens and balance ratio over iterations">

116-121: Optional: clarify table units and formatting.

Consider adding units (TPS) in header and aligning thousands with commas; minor readability win.

-| Method | TPS | Avg Balance Ratio | Sol TPS |
+| Method | TPS (tokens/s) | Avg Balance Ratio | Sol TPS (tokens/s) |
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between a0d4efc and 20c5c1e.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~13-~13: There might be a mistake here.
Context: ...s. ## Theoretical Analysis and Modeling Optimization Goals: - Minimize the load ...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...nalysis and Modeling Optimization Goals: - Minimize the load difference between dif...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...ecution time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...tion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents th...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...= \max_{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents the e...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...

(QB_NEW_EN)


[grammar] ~27-~27: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N to...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ce_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N token...

(QB_NEW_EN)


[grammar] ~31-~31: There might be a mistake here.
Context: ... the theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...e theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...

(QB_NEW_EN)


[grammar] ~37-~37: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...

(QB_NEW_EN)


[grammar] ~116-~116: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...

(QB_NEW_EN)


[grammar] ~119-~119: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...

(QB_NEW_EN)


[style] ~198-~198: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...

(PROVE_TO_BE_WORDY)


@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from 20c5c1e to c3778f8 Compare August 27, 2025 05:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)

75-80: Label fenced code blocks with languages (MD040).

Use text for timelines and yaml for config.

-```
+```text
 iter_i:     [*C0*, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 iter_i+1:   [g00, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [gN0, gN1, ..., gnM]
 ...
 iter_i+N-1: [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]

-```
+```text
 iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]


-```
+```yaml
attention_dp_config:
    enable_balance: true
    batching_wait_iters: 10
    timeout_iters: 50

Also applies to: 84-89, 164-169


69-71: Unify parameter names and terminology across text, tables, and config.

Use timeout_iters and batching_wait_iters everywhere; refer to “Context Wait” consistently.

-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
-The experiments are divided into three groups: a baseline, ADP Balance with `ctx_wait`, and ADP Balance with both `ctx_wait` and `batching_wait`. As the results show, enabling `ctx_wait` provides a significant TPS improvement. Further enabling `batching_wait` enhances the balancing effect even more.
+The experiments are divided into three groups: Baseline, ADP Balance with `timeout_iters` (“Context Wait”), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted.
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted.

Also applies to: 117-126, 160-169


57-63: Verify all referenced images exist and are tracked by Git LFS.

Past checks flagged four PNGs not tracked by LFS; re-verify after recent edits.

#!/usr/bin/env bash
set -euo pipefail

md="docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md"
media_dir="docs/source/blogs/media"

# Gather image refs from HTML <img> and MyST {figure}
# -P enables PCRE2, which the \K (match reset) escape requires
refs=$(rg -NoP '<img\s+src="\K[^"]+' "$md"; rg -NoP '^\s*```\{figure\}\s+\K\S+' "$md" || true)

echo "--- Existence & LFS tracking ---"
while read -r ref; do
  f="$media_dir/$(basename "$ref")"
  status="MISSING"
  [[ -f "$f" ]] && status="FOUND"
  lfs="NOT_TRACKED"
  # Fixed-string, whole-line match against the list of LFS-tracked paths
  if git lfs ls-files --name-only | grep -Fx -- "$f" >/dev/null; then
    lfs="TRACKED"
  fi
  printf "%-55s %s, LFS: %s\n" "$f" "$status" "$lfs"
done <<< "$refs"

echo
echo "Patterns in .gitattributes involving png:"
rg -n 'filter=lfs' .gitattributes | rg '\.png' || true

Also applies to: 99-109, 128-138, 148-159, 160-169, 171-179, 182-196

🧹 Nitpick comments (4)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (4)

57-61: Add alt text to all images to satisfy MD045 and improve accessibility.

Provide descriptive alt text via the img alt attribute.

-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img alt="Baseline round-robin strategy balancing context tokens across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img alt="Token distribution of input and output lengths across 16k requests" src="./../media/tech_blog10_dataset_token_distribution.png">
-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img alt="Baseline performance overview: average tokens per rank and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img alt="Baseline performance detail in iterations [0, 12000]: tokens and balance ratio" src="./../media/tech_blog10_baseline_performance_detail.png">
-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img alt="Context Wait (timeout_iters) performance: tokens and balance ratio over iterations" src="./../media/tech_blog10_context_wait_performance.png">
-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img alt="Full strategy (timeout_iters + batching_wait_iters) performance: tokens and balance ratio" src="./../media/tech_blog10_full_strategy_performance.png">
-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img alt="Pareto curves: TPS/GPU vs TTFT for different timeout_iters and batching_wait_iters" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">

Also applies to: 103-107, 132-136, 141-145, 153-157, 173-177, 190-194


180-180: Polish phrasing and capitalization.

-Negative impact: The number of iterations increases due to the waiting mechanism, which had bad impact on tps.
+Negative impact: The number of iterations increases due to the waiting mechanism, which has a negative impact on TPS.

114-115: Minor wording tweak for hardware description.

-The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPU, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
+The experiments used the DeepSeekV3 model on an NVIDIA GB200 system (8 GPUs), with DP size 8 for Attention and EP size 8 for MoE.

41-48: Consider renaming elp_time to elapsed_time for clarity.

Also update the formula and definitions to match.

-    sol\_tps = elp\_time / sol\_time * tps
+    sol\_tps = elapsed\_time / sol\_time * tps
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end elapsed time.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 20c5c1e and c3778f8.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~13-~13: There might be a mistake here.
Context: ...s. ## Theoretical Analysis and Modeling Optimization Goals: - Minimize the loa...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...sis and Modeling Optimization Goals: - Minimize the load difference between dif...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...ion time. The formula is as follows: $$ iter_time_i = \max_{0 \leq m < N} iter...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...i = \max{0 \leq m < N} iter_time_{im} $$ where $iter_time_{im}$ represents t...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance_ratio = avg_tokens / max_toke...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...lance_ratio = avg_tokens / max_tokens $$ where $avg_tokens = {\sum_{i=1}^N t...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ... theoretical upper-bound throughput. $$ sol_time = \sum_{i=0}^{\infty} (iter_t...

(QB_NEW_EN)


[grammar] ~37-~37: There might be a mistake here.
Context: ...{\infty} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_t...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ...} (iter_time_i / balance_ratio) $$ $$ sol_tps = elp_time / sol_time * tps $...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ... sol_tps = elp_time / sol_time * tps $$ Where: - iter_time_i denotes the e...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...

(QB_NEW_EN)


[style] ~202-~202: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...

(PROVE_TO_BE_WORDY)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

59-59: Images should have alternate text (alt text)

(MD045, no-alt-text)


75-75: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


84-84: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


105-105: Images should have alternate text (alt text)

(MD045, no-alt-text)


134-134: Images should have alternate text (alt text)

(MD045, no-alt-text)


143-143: Images should have alternate text (alt text)

(MD045, no-alt-text)


155-155: Images should have alternate text (alt text)

(MD045, no-alt-text)


164-164: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


175-175: Images should have alternate text (alt text)

(MD045, no-alt-text)


192-192: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from c3778f8 to deefc99 Compare August 27, 2025 09:03

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)

118-126: Unify parameter names and terminology across text and config.

Replace ctx_wait/batching_wait/batch_waiting_iters with timeout_iters/batching_wait_iters, and refer to the first mechanism as “Context Wait.” Consistent names prevent copy-paste config errors.

-The experiments are divided into three groups: a baseline, ADP Balance with `ctx_wait`, and ADP Balance with both `ctx_wait` and `batching_wait`. As the results show, enabling `ctx_wait` provides a significant TPS improvement. Further enabling `batching_wait` enhances the balancing effect even more.
+The experiments are divided into three groups: Baseline, ADP Balance with `timeout_iters` (“Context Wait”), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted.
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted.
-The curves demonstrate the impact of different `timeout_iters` (TO) and `batch_waiting_iters` (BI) configurations on system performance.
+The curves demonstrate the impact of different `timeout_iters` (TO) and `batching_wait_iters` (BI) configurations on system performance.
-In low-load scenarios, `batch_waiting_iters` provides no significant TPS benefit while still increasing TTFT.
+In low-load scenarios, `batching_wait_iters` provides no significant TPS benefit while still increasing TTFT.

Also applies to: 160-168, 182-189
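
One way to catch this kind of naming drift before it reaches readers is a small key check on the config. The sketch below is a hypothetical helper, not part of TensorRT-LLM: the set of known keys is taken only from the `attention_dp_config` snippet quoted in this review and may be incomplete, and the misspelling table simply lists the variants mentioned above.

```python
from typing import Any, Dict, List

# Keys that appear in the attention_dp_config snippet quoted in this review.
# Assumption: the real option set may contain more keys than these.
KNOWN_KEYS = {"enable_balance", "timeout_iters", "batching_wait_iters"}

# Variants seen in the blog prose, mapped to the name used in the config.
MISSPELLINGS = {
    "time_out_iters": "timeout_iters",
    "ctx_wait": "timeout_iters",
    "batch_waiting_iters": "batching_wait_iters",
    "batching_wait": "batching_wait_iters",
}


def check_adp_config(cfg: Dict[str, Any]) -> List[str]:
    """Return one human-readable problem per unexpected key."""
    problems = []
    for key in cfg:
        if key in MISSPELLINGS:
            problems.append(f"{key!r}: did you mean {MISSPELLINGS[key]!r}?")
        elif key not in KNOWN_KEYS:
            problems.append(f"{key!r}: unknown key")
    return problems


if __name__ == "__main__":
    copied_from_prose = {"enable_balance": True, "timeout_iters": 50, "batch_waiting_iters": 10}
    for problem in check_adp_config(copied_from_prose):
        print(problem)
```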


57-63: Track remaining images in Git LFS to avoid repo bloat.

Four referenced PNGs do not appear to be tracked by LFS yet. Add them to .gitattributes and re-add them via LFS.

+# .gitattributes (append)
+docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png filter=lfs diff=lfs merge=lfs -text
+docs/source/blogs/media/tech_blog10_dataset_token_distribution.png    filter=lfs diff=lfs merge=lfs -text
+docs/source/blogs/media/tech_blog10_baseline_performance_overview.png filter=lfs diff=lfs merge=lfs -text
+docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png         filter=lfs diff=lfs merge=lfs -text

Then:

#!/bin/bash
git lfs track "docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png" \
              "docs/source/blogs/media/tech_blog10_dataset_token_distribution.png" \
              "docs/source/blogs/media/tech_blog10_baseline_performance_overview.png" \
              "docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png"
git add .gitattributes \
        docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png \
        docs/source/blogs/media/tech_blog10_dataset_token_distribution.png \
        docs/source/blogs/media/tech_blog10_baseline_performance_overview.png \
        docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png

Quick verification script:

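#!/bin/bash
# List every PNG referenced by the blog and report whether Git LFS tracks it.
md="docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md"
lfs_files="$(git lfs ls-files | awk '{print $NF}')"  # query LFS once, not per image
# -N suppresses line-number prefixes so the sed anchor (^) matches the path itself.
rg -NPo '<img\s+src="\K[^"]+\.png' "$md" \
| sed 's#^\./\.\./#docs/source/blogs/#' \
| while read -r p; do
  # -Fx matches the whole line literally, so dots in the path are not wildcards.
  printf "%-80s %s\n" "$p" "$(printf '%s\n' "$lfs_files" | rg -qFx -- "$p" && echo LFS || echo NOT_LFS)"
done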

Also applies to: 103-109, 132-138, 190-196

🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)

180-181: Polish phrasing for clarity.

Minor grammar and casing fix.

-Negative impact: The number of iterations increases due to the waiting mechanism, which had bad impact on tps.
+Negative impact: The waiting mechanism increases the number of iterations, which negatively impacts TPS.

44-49: Naming nit: elp_time → elapsed_time for readability.

Improves self-documentation of the formula.

-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end elapsed time.
-    sol\\_tps = elp\\_time / sol\\_time * tps
+    sol\\_tps = elapsed\\_time / sol\\_time * tps
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c3778f8 and deefc99.

🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~13-~13: There might be a mistake here.
Context: ...s. ## Theoretical Analysis and Modeling Optimization Goals: - Minimize the loa...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...sis and Modeling Optimization Goals: - Minimize the load difference between dif...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...ion time. The formula is as follows: $$ iter\time_i = \max{0 \leq m < N} iter...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...i = \max_{0 \leq m < N} iter\time{im} $$ where $iter\time{im}$ represents ...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows: $$ balance\_ratio = avg\_tokens / max\_t...

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...ce\_ratio = avg\_tokens / max\_tokens $$ where $avg\tokens = {\sum{i=1}^N ...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ... theoretical upper-bound throughput. $$ sol\time = \sum{i=0}^{\infty} (iter\...

(QB_NEW_EN)


[grammar] ~37-~37: There might be a mistake here.
Context: ...infty} (iter\_time_i / balance\_ratio) $$ $$ sol\_tps = elp\_time / sol...

(QB_NEW_EN)


[grammar] ~40-~40: There might be a mistake here.
Context: ...(iter\_time_i / balance\_ratio) $$ $$ sol\_tps = elp\_time / sol\_time * tp...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...ol\_tps = elp\_time / sol\_time * tps $$ Where: - iter_time_i denotes the e...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...

(QB_NEW_EN)


[grammar] ~120-~120: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...

(QB_NEW_EN)


[grammar] ~121-~121: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 | |...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 | | ADP Balance with Context Wait | 33499 ...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% | 38312 | | ADP Balance with Full Strategy | 34140...

(QB_NEW_EN)


[style] ~202-~202: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...

(PROVE_TO_BE_WORDY)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

59-59: Images should have alternate text (alt text)

(MD045, no-alt-text)


84-84: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


105-105: Images should have alternate text (alt text)

(MD045, no-alt-text)


134-134: Images should have alternate text (alt text)

(MD045, no-alt-text)


143-143: Images should have alternate text (alt text)

(MD045, no-alt-text)


155-155: Images should have alternate text (alt text)

(MD045, no-alt-text)


175-175: Images should have alternate text (alt text)

(MD045, no-alt-text)


192-192: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from deefc99 to 567a3f7 Compare August 27, 2025 09:17

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)

59-60: Add alt text to all images (MD045).

Provide concise, descriptive alt text for accessibility and lint compliance.

-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img alt="Baseline: balance context tokens across ranks via sort + round-robin" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
...
-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img alt="Input and output token length distribution" src="./../media/tech_blog10_dataset_token_distribution.png">
...
-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img alt="Baseline average tokens and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
...
-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img alt="Baseline average tokens and balance ratio for iterations [100, 12000]" src="./../media/tech_blog10_baseline_performance_detail.png">
...
-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img alt="Context Wait: average tokens and balance ratio over iterations" src="./../media/tech_blog10_context_wait_performance.png">
...
-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img alt="Full Strategy: average tokens and balance ratio over iterations" src="./../media/tech_blog10_full_strategy_performance.png">
...
-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img alt="Pareto curves of TPS/GPU vs TTFT" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">

Also applies to: 105-106, 134-135, 143-144, 155-156, 175-176, 192-193


84-89: Label the second timeline code fence (MD040).

Add a language hint for the illustrative block.

-```
+```text
 iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gnN]
 ...
 iter_i+N-1: [*C0*, g01, ..., g0M], [*C1*, g11, ..., g1M], ..., [*CN*, gN1, ..., gNM]

69-71: Unify parameter name: time_out_iters → timeout_iters.

Use timeout_iters consistently to match config and prior references.

-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
 - `batching_wait_iters`: The maximum number of batch iterations to wait to balance the number of context batches across ranks. After the initial wait, all ranks will have context batches. However, the ranks that have waited the longest might accumulate multiple context batches, while the ranks that were just freed up might only have one. This can still lead to an imbalance in the number of context batches. By introducing an additional waiting period, newly scheduled ranks can accumulate more context batches, leading to a more balanced load distribution.
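
The waiting rule described for `timeout_iters` can be sketched as a small state machine. This is only one reading of the prose above, under stated assumptions: the class and method names are hypothetical, the decision is modeled as a single global counter rather than the scheduler's actual per-rank logic, and `batching_wait_iters` is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ContextWaitState:
    """Tracks how many iterations context scheduling has been deferred."""

    timeout_iters: int
    waited: int = 0

    def should_schedule_context(self, ranks_with_context: int, num_ranks: int) -> bool:
        # Nothing to schedule: clear the counter and keep generating.
        if ranks_with_context == 0:
            self.waited = 0
            return False
        # Every rank has a context request, or we have waited long enough.
        if ranks_with_context == num_ranks or self.waited >= self.timeout_iters:
            self.waited = 0
            return True
        # Only some ranks have context requests: defer and count the wait.
        self.waited += 1
        return False


if __name__ == "__main__":
    state = ContextWaitState(timeout_iters=2)
    # One of eight ranks has a pending context request for three iterations.
    for _ in range(3):
        print(state.should_schedule_context(ranks_with_context=1, num_ranks=8))
```

With `timeout_iters=2`, the lone context request is deferred for two iterations and scheduled on the third, which bounds the extra time-to-first-token a request can incur from waiting.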
🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)

114-115: Minor grammar: pluralize GPU.

Change “8 GPU” → “8 GPUs”; tighten phrasing.

-The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPU, with a DP size of 8 for the Attention module and an EP size of 8 for the MoE module.
+The experiments were conducted using the DeepSeekV3 model on a Blackwell GB200 machine, 8 GPUs, with DP size 8 for the Attention module and EP size 8 for the MoE module.

41-49: Optional: rename elp_time → elapsed_time for clarity.

Improves readability of formulas and definitions.

-    sol\\_tps = elp\\_time / sol\\_time * tps
+    sol\\_tps = elapsed\\_time / sol\\_time * tps
@@
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `elapsed_time` denotes the empirically measured end-to-end elapsed time.
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between deefc99 and 567a3f7.

🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

59-59: Images should have alternate text (alt text)

(MD045, no-alt-text)


84-84: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


105-105: Images should have alternate text (alt text)

(MD045, no-alt-text)


118-118: Spaces inside code span elements

(MD038, no-space-in-code)


134-134: Images should have alternate text (alt text)

(MD045, no-alt-text)


143-143: Images should have alternate text (alt text)

(MD045, no-alt-text)


155-155: Images should have alternate text (alt text)

(MD045, no-alt-text)


175-175: Images should have alternate text (alt text)

(MD045, no-alt-text)


192-192: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch 2 times, most recently from ce51c72 to db8d4a9 Compare August 28, 2025 05:06

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

1-210: Fix documentation naming, alt text, and code‐fence labels

  • Consolidate parameter naming: replace all occurrences of time_out_iters with timeout_iters in the prose to match the config snippet.
  • Add meaningful alt attributes to every <img> tag (currently missing on seven images).
  • Label all unlabeled fenced code blocks with the correct language (e.g., yaml, text).
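
The third item can be checked mechanically. Below is a minimal Python sketch in the spirit of MD040; it is an approximation under stated assumptions (only triple-backtick fences are handled, not tildes or longer fences), and the function name is hypothetical.

```python
from typing import List

# Built programmatically so this example can itself live inside a fenced block.
FENCE = "`" * 3


def unlabeled_fences(markdown_text: str) -> List[int]:
    """Return line numbers of opening fences that carry no language label."""
    flagged = []
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            # This fence line closes the current block.
            in_fence = False
        else:
            in_fence = True
            if stripped[len(FENCE):].strip() == "":
                flagged.append(lineno)
    return flagged


if __name__ == "__main__":
    sample = "\n".join([
        FENCE,
        "iter_i: [g00, g01]",
        FENCE,
        FENCE + "yaml",
        "attention_dp_config:",
        "    enable_balance: true",
        FENCE,
    ])
    print(unlabeled_fences(sample))
```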
♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)

71-73: Unify parameter name: time_out_iters → timeout_iters (consistency with later sections).

Readers will copy/paste; mismatch causes confusion.

Apply:

-Specifically, we introduce two parameters: `time_out_iters` and `batching_wait_iters`.
-- `time_out_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.
+Specifically, we introduce two parameters: `timeout_iters` and `batching_wait_iters`.
+- `timeout_iters`: The maximum number of steps a rank will wait when it has a context request while others do not. This waiting mechanism ensures that all ranks have pending context requests, thus avoiding the imbalance caused by some ranks processing context requests while others are in the generation phase.

120-120: Fix inline code backticks and wording (MD038).

-The experiments are divided into three groups: a baseline, ADP Balance with ``timeout_iters` (“Context Wait”), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.
+The experiments are divided into three groups: Baseline, ADP Balance with `timeout_iters` ("Context Wait"), and ADP Balance with both `timeout_iters` and `batching_wait_iters`. Enabling Context Wait provides a significant TPS improvement; adding `batching_wait_iters` further enhances balance.

164-171: Align naming and fence label; fix config section.

Use batching_wait_iters and ```yaml (no space).

-With the full ADP Balance strategy (`timeout_iters=50` and `batch_waiting_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
-``` yaml
+With the full ADP Balance strategy (`timeout_iters=50` and `batching_wait_iters=10`), the system throughput is further boosted. The output TPS reaches 34,140, achieving a 1.33x speedup. The overall `balance_ratio` improves to 87.70%, and the `sol_tps` of 37,912 is very close to the actual `tps` of 34,140, indicating a highly balanced system. Users can enable this feature by adding the following configuration to `config.yaml`:
+```yaml
 attention_dp_config:
     enable_balance: true
-    batching_wait_iters: 10
+    batching_wait_iters: 10
     timeout_iters: 50
🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)

19-51: Optional: prefer MyST math blocks and clearer variable name.

Consider MyST’s math directives for Sphinx stability, and rename elp_time → e2e_time (or elapsed_time) in the text and formulas for clarity.

-$$
-    sol\_tps = elp\_time / sol\_time * tps
-$$
+$$
+    sol\_tps = e2e\_time / sol\_time \times tps
+$$
@@
-- `elp_time` denotes the empirically measured end-to-end elapsed time.
+- `e2e_time` denotes the empirically measured end-to-end elapsed time.

If desired, convert all display math blocks to:

```{math}
:label: sol_tps
\mathrm{sol\_tps} = \frac{\mathrm{e2e\_time}}{\mathrm{sol\_time}} \times \mathrm{tps}
```

---

`59-65`: **Nit: captions are good; ensure figure reference numbers match and are sequential.**

Skim after edits to keep Figure N labels and references consistent.



Also applies to: 103-111, 132-140, 141-149, 155-161, 173-181, 192-198

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**💡 Knowledge Base configuration:**

- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 567a3f708b7916132c009bb74e217a6808e7c179 and db8d4a95c40315519f5ebff449c174a77fba98c7.

</details>

<details>
<summary>⛔ Files ignored due to path filters (7)</summary>

* `docs/source/blogs/media/tech_blog10_baseline_performance_detail.png` is excluded by `!**/*.png`
* `docs/source/blogs/media/tech_blog10_baseline_performance_overview.png` is excluded by `!**/*.png`
* `docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png` is excluded by `!**/*.png`
* `docs/source/blogs/media/tech_blog10_context_wait_performance.png` is excluded by `!**/*.png`
* `docs/source/blogs/media/tech_blog10_dataset_token_distribution.png` is excluded by `!**/*.png`
* `docs/source/blogs/media/tech_blog10_full_strategy_performance.png` is excluded by `!**/*.png`
* `docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png` is excluded by `!**/*.png`

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `.gitattributes` (1 hunks)
* `docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md` (1 hunks)

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary>

* .gitattributes

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md</summary>

[grammar] ~13-~13: There might be a mistake here.
Context: ...s.  ## Theoretical Analysis and Modeling **Optimization Goals**: - Minimize the loa...

(QB_NEW_EN)

---

[grammar] ~14-~14: There might be a mistake here.
Context: ...sis and Modeling **Optimization Goals**: - Minimize the load difference between dif...

(QB_NEW_EN)

---

[grammar] ~21-~21: There might be a mistake here.
Context: ...ion time. The formula is as follows:  $$     iter\_time_i = \max_{0 \leq m < N} iter\...

(QB_NEW_EN)

---

[grammar] ~22-~22: There might be a mistake here.
Context: ..._i = \max_{0 \leq m < N} iter\_time_{im} $$  where $iter\_time_{im}$ represents t...

(QB_NEW_EN)

---

[grammar] ~30-~30: There might be a mistake here.
Context: ...ol_tps`) can be computed as follows:  $$     balance\_ratio = avg\_tokens / max\_toke...

(QB_NEW_EN)

---

[grammar] ~31-~31: There might be a mistake here.
Context: ...lance\_ratio = avg\_tokens / max\_tokens $$  where $avg\_tokens = {\sum_{i=1}^N t...

(QB_NEW_EN)

---

[grammar] ~38-~38: There might be a mistake here.
Context: ... theoretical upper-bound throughput.  $$     sol\_time = \sum_{i=0}^{\infty} (iter\_t...

(QB_NEW_EN)

---

[grammar] ~39-~39: There might be a mistake here.
Context: ...{\infty} (iter\_time_i / balance\_ratio) $$  $$     sol\_tps = elp\_time / sol\_t...

(QB_NEW_EN)

---

[grammar] ~42-~42: There might be a mistake here.
Context: ...} (iter\_time_i / balance\_ratio) $$  $$     sol\_tps = elp\_time / sol\_time * tps $...

(QB_NEW_EN)

---

[grammar] ~43-~43: There might be a mistake here.
Context: ...  sol\_tps = elp\_time / sol\_time * tps $$  Where: - `iter_time_i` denotes the e...

(QB_NEW_EN)

---

[grammar] ~84-~84: There might be a mistake here.
Context: ...it for the first N-1 iterations. In the N-th iteration, when the number of context r...

(QB_NEW_EN)

---

[grammar] ~122-~122: There might be a mistake here.
Context: ...od | TPS | Avg Balance Ratio | Sol TPS | | ------ | --- | ----------------- | ---...

(QB_NEW_EN)

---

[grammar] ~123-~123: There might be a mistake here.
Context: ...-- | --- | ----------------- | ------- | | Baseline | 25664 | 54.11% | 39552 |  |...

(QB_NEW_EN)

---

[grammar] ~124-~124: There might be a mistake here.
Context: ...| | Baseline | 25664 | 54.11% | 39552 |  | ADP Balance with Context Wait | 33499 ...

(QB_NEW_EN)

---

[grammar] ~125-~125: There might be a mistake here.
Context: ...Context Wait | 33499 | 84.33% |  38312 | | ADP Balance with Full Strategy | 34140...

(QB_NEW_EN)

---

[style] ~204-~204: Consider replacing ‘prove to be’ with a shorter or less frequently used alternative.
Context: ...nimum latency. The ADP Balance strategy proves to be a powerful tool for optimizing LLM infe...

(PROVE_TO_BE_WORDY)

</details>

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

<details>
<summary>docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md</summary>

61-61: Images should have alternate text (alt text)

(MD045, no-alt-text)

---

86-86: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

107-107: Images should have alternate text (alt text)

(MD045, no-alt-text)

---

120-120: Spaces inside code span elements

(MD038, no-space-in-code)

---

136-136: Images should have alternate text (alt text)

(MD045, no-alt-text)

---

145-145: Images should have alternate text (alt text)

(MD045, no-alt-text)

---

157-157: Images should have alternate text (alt text)

(MD045, no-alt-text)

---

177-177: Images should have alternate text (alt text)

(MD045, no-alt-text)

---

194-194: Images should have alternate text (alt text)

(MD045, no-alt-text)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)</summary>

* GitHub Check: Pre-commit Check

</details>

<details>
<summary>🔇 Additional comments (1)</summary><blockquote>

<details>
<summary>docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)</summary><blockquote>

`1-12`: **Clear intro and motivation.**

Good framing and scope for the blog; structure reads well.

</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from db8d4a9 to edffceb Compare August 28, 2025 08:39

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

1-369: Run markdownlint and fix reported doc errors (MD040/MD045/MD051 + others).

Lint (markdownlint-cli2 v0.17.2) reports multiple failures in docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (lines 1–369) and across the docs tree. Actionable fixes: add alt text for images (MD045), replace inline HTML with Markdown (MD033), specify fenced-code languages (MD040), correct link fragment targets (MD051), ensure blank lines around headings, lists, and code fences (MD022/MD032/MD031), and wrap or shorten long lines (MD013). Re-run `npx --yes markdownlint-cli2 "**/docs/**/*.md"` and resolve all errors before merging.

♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)

150-155: Fix generation index in the second timeline (use M, not N).

Indexes in the ADP timeline should mirror the Baseline block (M for generation). This also keeps the narrative consistent with the figure captions and text.

Apply:

-iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
-iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
+iter_i:     [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
+iter_i+1:   [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]

341-344: Rename to batching_wait_iters for consistency with config and prior usage.

The bullet uses batch_waiting_iters; everywhere else and in YAML it’s batching_wait_iters.

-  - **Low-load scenarios**: `batch_waiting_iters` provides minimal benefit while adding latency overhead
+  - **Low-load scenarios**: `batching_wait_iters` provides minimal benefit while adding latency overhead

100-100: Add alt text to all images (MD045, accessibility).

Every `<img>` lacks alt text; add concise descriptions.

-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img alt="Baseline: round-robin distribution across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img alt="Input and output token length distributions" src="./../media/tech_blog10_dataset_token_distribution.png">
-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img alt="Baseline: avg tokens and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img alt="Baseline detail: balance ratio for iterations 100–12,000" src="./../media/tech_blog10_baseline_performance_detail.png">
-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img alt="Context Wait: improved balance stability 100–12,000" src="./../media/tech_blog10_context_wait_performance.png">
-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img alt="Full Strategy: balance stability 100–12,000" src="./../media/tech_blog10_full_strategy_performance.png">
-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img alt="Pareto curves: TPS/GPU vs TTFT across TO/BW settings" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">

Also applies to: 183-183, 229-229, 243-243, 264-264, 304-304, 328-328

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between db8d4a9 and edffceb.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Performance Metrics](#performance-metric...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...cal-modeling) - Performance Metrics - [Scheduling Strategies for Load Balancing...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...

(QB_NEW_EN)


[grammar] ~21-~21: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ...e-summary) - Baseline Performance - [ADP Balance with Context Wait Implementa...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...Balance with Context Wait Implementation](#adp-balance-with-context-wait-implementation) - [ADP Balance with Full Strategy Implement...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ...alance with Full Strategy Implementation](#adp-balance-with-full-strategy-implementation) - [Pareto Analysis: Throughput-Latency Trad...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...hroughput-Latency Trade-off Optimization](#pareto-analysis-throughput-latency-trade-off-optimization) - Conclusion - [Acknowledg...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...-trade-off-optimization) - Conclusion - Acknowledgement ## M...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...ociated with disaggregated architectures - Specific workload patterns: Scenarios ...

(QB_NEW_EN)


[grammar] ~34-~34: There might be a mistake here.
Context: ...) and long output sequence lengths (OSL) - Offline inference: Batch processing en...

(QB_NEW_EN)


[grammar] ~57-~57: There might be a mistake here.
Context: ... two key metrics: #### 1. Balance Ratio The balance_ratio measures the load di...

(QB_NEW_EN)


[grammar] ~64-~64: There might be a mistake here.
Context: ...tokens}}{\text{max_tokens}} ``` where: - $\text{avg_tokens} = \frac{\sum_{i=1}^{...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...tokens}_i}{N}$ (average tokens per rank) - $\text{max_tokens} = \max(\text{tokens}...

(QB_NEW_EN)


[grammar] ~66-~66: There might be a mistake here.
Context: ...tokens}_i)$ (maximum tokens on any rank) - $\text{tokens}_i$ represents the number ...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The sol_tps represents the theoretical...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...}} \times \text{actual_tps} ``` where: - iter_time_i: Measured execution time of iteration $...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...Measured execution time of iteration $i$ - elapsed_time: Total empirically measured end-to-end ...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...cally measured end-to-end execution time - actual_tps: Observed throughput in tokens per seco...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...Observed throughput in tokens per second - sol_tps: Theoretical maximum throughput under p...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...ontext Synchronization (timeout_iters)** - Purpose: Ensures temporal alignment of...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...ch Equilibration (batching_wait_iters)** - Purpose: Balances the number of accumu...

(QB_NEW_EN)


[grammar] ~128-~128: There might be a mistake here.
Context: ...me(ctx)>> Generation processing time:time(gen)` - N new requests arrive sequentially over ...

(QB_NEW_EN)


[grammar] ~131-~131: There might be a mistake here.
Context: ...ver N iterations Baseline Behavior: In the traditional approach, contexts ar...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ...(ctx)(dominated by context processing) - **Total execution time**:time(ctx) × N` ...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ...processing) - Total execution time: time(ctx) × N - Balance ratio: `(ctx_len + (M-1) + M ×...

(QB_NEW_EN)


[grammar] ~147-~147: There might be a mistake here.
Context: ...poor balance) ADP Balance Strategy: Our method synchronizes context processi...

(QB_NEW_EN)


[grammar] ~157-~157: There might be a mistake here.
Context: ...rations, time(ctx) for final iteration - Total execution time: `time(gen) × (N-...

(QB_NEW_EN)


[grammar] ~158-~158: There might be a mistake here.
Context: ...l iteration - Total execution time: time(gen) × (N-1) + time(ctx) - Balance ratio: 1.0 (perfect balance) -...

(QB_NEW_EN)


[grammar] ~159-~159: There might be a mistake here.
Context: ...Balance ratio: 1.0 (perfect balance) - Time savings: `(time(ctx) - time(gen))...

(QB_NEW_EN)


[grammar] ~164-~164: There might be a mistake here.
Context: ...rovement** due to optimal load balancing - ✅ Maximized GPU utilization across a...

(QB_NEW_EN)


[grammar] ~165-~165: There might be a mistake here.
Context: ...mized GPU utilization** across all ranks - ⚠️ Increased TTFT due to strategic w...

(QB_NEW_EN)


[grammar] ~166-~166: There might be a mistake here.
Context: ...TFT** due to strategic waiting mechanism - 📋 Best suited for throughput-orient...

(QB_NEW_EN)


[grammar] ~167-~167: There might be a mistake here.
Context: ...ted scenarios where TTFT is not critical ## Experiments ### Setting #### Dataset C...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ... ### Setting #### Dataset Configuration We evaluate our approach using a compreh...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...Request volume*: 16,000 total requests - Average input length: 803 tokens - **A...

(QB_NEW_EN)


[grammar] ~177-~177: There might be a mistake here.
Context: ...s - Average input length: 803 tokens - Average output length: 3,653 tokens - ...

(QB_NEW_EN)


[grammar] ~178-~178: There might be a mistake here.
Context: ... Average output length: 3,653 tokens - Token distribution: Figure 2 illustrat...

(QB_NEW_EN)


[grammar] ~190-~190: There might be a mistake here.
Context: .... #### Hardware and Model Configuration Infrastructure: - Platform: NVIDIA...

(QB_NEW_EN)


[grammar] ~191-~191: There might be a mistake here.
Context: ... Model Configuration Infrastructure: - Platform: NVIDIA Blackwell GB200 syste...

(QB_NEW_EN)


[grammar] ~192-~192: There might be a mistake here.
Context: ...latform**: NVIDIA Blackwell GB200 system - GPU Count: 8 × GB200 GPUs - Model:...

(QB_NEW_EN)


[grammar] ~193-~193: There might be a mistake here.
Context: ...0 system - GPU Count: 8 × GB200 GPUs - Model: DeepSeek V3 - **Parallelization...

(QB_NEW_EN)


[grammar] ~194-~194: There might be a mistake here.
Context: ... 8 × GB200 GPUs - Model: DeepSeek V3 - Parallelization Strategy: - Attentio...

(QB_NEW_EN)


[grammar] ~195-~195: There might be a mistake here.
Context: ...pSeek V3 - Parallelization Strategy: - Attention module: Data Parallel (DP) siz...

(QB_NEW_EN)


[grammar] ~196-~196: There might be a mistake here.
Context: ...tion module: Data Parallel (DP) size = 8 - MoE module: Expert Parallel (EP) size = ...

(QB_NEW_EN)


[grammar] ~204-~204: There might be a mistake here.
Context: ... 1. Baseline: Round-robin scheduling 2. ADP Balance (Context Wait): Implementi...

(QB_NEW_EN)


[grammar] ~205-~205: There might be a mistake here.
Context: ...lementing timeout_iters parameter only 3. ADP Balance (Full Strategy): Complete ...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ... Avg Balance Ratio | SOL TPS | Speedup | |---------------|------------|----------...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ...---------|-------------------|---------| | Baseline | 25,664 | 54.11% | 39,552 | ...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...ine | 25,664 | 54.11% | 39,552 | 1.00× | | ADP Balance (Context Wait) | 33,499 | ...

(QB_NEW_EN)


[grammar] ~213-~213: There might be a mistake here.
Context: ...it) | 33,499 | 84.33% | 38,312 | 1.31× | | ADP Balance (Full Strategy) | 34,140 |...

(QB_NEW_EN)


[grammar] ~216-~216: There might be a mistake here.
Context: ... 37,912 | 1.33× | Key Observations: - Context Wait alone delivers a substantia...

(QB_NEW_EN)


[grammar] ~234-~234: There might be a mistake here.
Context: ...s

Critical Insights: - Imbalance window: Most severe imbalanc...

(QB_NEW_EN)


[grammar] ~237-~237: There might be a mistake here.
Context: ...eration 12,000, all requests transition to generation phase, naturally reducing im...

(QB_NEW_EN)


[grammar] ~248-~248: There might be a mistake here.
Context: ...

Performance Bottlenecks: - Balance ratio frequently drops to **0.4 ...

(QB_NEW_EN)


[grammar] ~257-~257: There might be a mistake here.
Context: ...nization. Performance Achievements: - Throughput: 33,499 TPS (1.31× speedup)...

(QB_NEW_EN)


[grammar] ~258-~258: There might be a mistake here.
Context: ...Throughput**: 33,499 TPS (1.31× speedup) - Balance improvement: 84.33% average (v...

(QB_NEW_EN)


[grammar] ~259-~259: There might be a mistake here.
Context: ...**: 84.33% average (vs. 54.11% baseline) - Efficiency: Actual TPS significantly c...

(QB_NEW_EN)


[grammar] ~269-~269: There might be a mistake here.
Context: ...em>

Remaining Challenges: Despite significant improvements, residu...

(QB_NEW_EN)


[grammar] ~284-~284: There might be a mistake here.
Context: ...` Performance Optimization Results: - Peak throughput: 34,140 TPS (1.33× spe...

(QB_NEW_EN)


[grammar] ~285-~285: There might be a mistake here.
Context: ...throughput**: 34,140 TPS (1.33× speedup) - Optimal balance: 87.70% average balanc...

(QB_NEW_EN)


[grammar] ~286-~286: There might be a mistake here.
Context: ... balance**: 87.70% average balance ratio - Near-theoretical efficiency: Actual TP...

(QB_NEW_EN)


[grammar] ~287-~287: There might be a mistake here.
Context: ...TPS (34,140) approaches SOL TPS (37,912) - System stability: Dramatically reduced...

(QB_NEW_EN)


[grammar] ~290-~290: There might be a mistake here.
Context: ...terations Production Configuration: Users can enable the full ADP Balance st...

(QB_NEW_EN)


[grammar] ~309-~309: There might be a mistake here.
Context: ... Key Improvements Over Context Wait: - Enhanced stability: Balance ratio main...

(QB_NEW_EN)


[grammar] ~314-~314: There might be a mistake here.
Context: ...erations Implementation Trade-offs: - ✅ Maximum throughput improvement: 33...

(QB_NEW_EN)


[grammar] ~315-~315: There might be a mistake here.
Context: ...ut improvement**: 33% gain over baseline - ✅ Near-optimal load balancing: 87.70...

(QB_NEW_EN)


[grammar] ~316-~316: There might be a mistake here.
Context: ...alancing**: 87.70% average balance ratio - ⚠️ Iteration overhead: Waiting mecha...

(QB_NEW_EN)


[grammar] ~317-~317: There might be a mistake here.
Context: ...echanisms increase total iteration count - ⚠️ TTFT impact: Strategic delays aff...

(QB_NEW_EN)


[grammar] ~341-~341: There might be a mistake here.
Context: ...itiation 4. Configuration Guidance: - Low-load scenarios: `batch_waiting_ite...

(QB_NEW_EN)


[grammar] ~356-~356: There might be a mistake here.
Context: ...mbalances. Experimental Validation: Comprehensive evaluation on the DeepSeek...

(QB_NEW_EN)


[grammar] ~357-~357: There might be a mistake here.
Context: ...tes compelling performance improvements: - 33% throughput increase: From 25,664 t...

(QB_NEW_EN)


[grammar] ~358-~358: There might be a mistake here.
Context: ...ut increase**: From 25,664 to 34,140 TPS - 87% load balance achievement: Dramatic...

(QB_NEW_EN)


[grammar] ~359-~359: There might be a mistake here.
Context: ...: Dramatic improvement from 54% baseline - Near-theoretical efficiency: Actual pe...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

11-11: Link fragments should be valid

(MD051, link-fragments)


100-100: Images should have alternate text (alt text)

(MD045, no-alt-text)


183-183: Images should have alternate text (alt text)

(MD045, no-alt-text)


229-229: Images should have alternate text (alt text)

(MD045, no-alt-text)


243-243: Images should have alternate text (alt text)

(MD045, no-alt-text)


264-264: Images should have alternate text (alt text)

(MD045, no-alt-text)


304-304: Images should have alternate text (alt text)

(MD045, no-alt-text)


328-328: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)

293-298: Config snippet looks correct and consistent.

Keys and naming (enable_balance, batching_wait_iters, timeout_iters) align with text and guidance.


190-198: Naming confirmed: ‘NVIDIA Blackwell GB200’ and ‘DeepSeek V3’ align with official public branding and are used consistently across existing documentation.

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from edffceb to 97d6533 Compare August 28, 2025 09:08

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)

342-344: Rename parameter to batching_wait_iters for consistency.

Reintroduces an earlier naming bug.

-   - **Low-load scenarios**: `batch_waiting_iters` provides minimal benefit while adding latency overhead
+   - **Low-load scenarios**: `batching_wait_iters` provides minimal benefit while adding latency overhead
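Since this naming slip has now recurred, a small check could catch it before review. The following is a minimal sketch, not part of the PR: it scans Markdown text for backticked identifiers that are close to, but not equal to, one of the three key names confirmed in the config comment (`enable_balance`, `batching_wait_iters`, `timeout_iters`). The function name `find_near_miss_keys` and the similarity cutoff are assumptions chosen for illustration.

```python
import difflib
import re

# Key names taken from the review's config comment.
KNOWN_KEYS = ("enable_balance", "batching_wait_iters", "timeout_iters")


def find_near_miss_keys(markdown_text, known=KNOWN_KEYS, cutoff=0.75):
    """Return {suspect: suggestion} for backticked identifiers that
    resemble a known key without matching it exactly.

    The cutoff is a hypothetical tuning value, not a project setting.
    """
    suspects = {}
    # Only plain identifiers inside single backticks are considered.
    for ident in re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", markdown_text):
        if ident in known:
            continue
        match = difflib.get_close_matches(ident, known, n=1, cutoff=cutoff)
        if match:
            suspects[ident] = match[0]
    return suspects


if __name__ == "__main__":
    line = (
        "- **Low-load scenarios**: `batch_waiting_iters` provides minimal "
        "benefit; see also `timeout_iters`"
    )
    print(find_near_miss_keys(line))
```

A fuzzy match is used here rather than an exact allow-list so that unrelated inline code in the blog is ignored; only near-misses of the known keys are reported.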

150-155: Fix generation index in second timeline (use M, not N).

Inconsistent with definitions and the final line in the same block.

-iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
-iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
+iter_i:     [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
+iter_i+1:   [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]

143-146: Correct balance ratio expression to match its definition.

The current formula conflicts with balance_ratio = avg_tokens / max_tokens.

-- **Balance ratio**: `(ctx_len + (M-1) + M × (N-1)) / (N × ctx_len)` (poor balance)
+- **Avg tokens per rank**: `((ctx_len + (M - 1) + M × (N - 1)) / N)`
+- **Balance ratio**: `avg_tokens / ctx_len` (poor balance)
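To make the suggested correction concrete, the definition `balance_ratio = avg_tokens / max_tokens` can be sketched in a few lines of Python. This is an illustrative sketch, not code from the blog or from TensorRT-LLM. It assumes the baseline iteration is laid out as one rank holding a context of `ctx_len` tokens plus `M - 1` generation tokens, and the other `N - 1` ranks holding `M` generation tokens each, which is how the numerator `ctx_len + (M - 1) + M × (N - 1)` reads. The helper names and the sample values are hypothetical.

```python
def balance_ratio(tokens_per_rank):
    """avg_tokens / max_tokens over the ranks of one iteration."""
    if not tokens_per_rank:
        raise ValueError("need at least one rank")
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        # Assumption: an idle iteration is treated as perfectly balanced.
        return 1.0
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    return avg_tokens / max_tokens


def baseline_tokens(n_ranks, gen_per_rank, ctx_len):
    """Assumed baseline layout: rank 0 runs one context request plus
    (M - 1) generation tokens; every other rank runs M generation tokens."""
    return [ctx_len + (gen_per_rank - 1)] + [gen_per_rank] * (n_ranks - 1)


if __name__ == "__main__":
    # Illustrative values only: N = 8 ranks, M = 4, ctx_len = 803.
    tokens = baseline_tokens(n_ranks=8, gen_per_rank=4, ctx_len=803)
    print(f"baseline balance ratio: {balance_ratio(tokens):.3f}")
    print(f"balanced ranks:         {balance_ratio([803] * 8):.3f}")
```

Note that this sketch divides by the exact maximum, `ctx_len + (M - 1)`, whereas the suggested bullet divides by `ctx_len`; the two agree closely when `ctx_len` is much larger than `M`.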
🧹 Nitpick comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

64-67: Unify index notation with earlier zero-based definition.

Earlier, ranks use m∈[0,N); here the sum uses i∈[1,N]. Align for consistency.

-- $\text{avg\_tokens} = \frac{\sum_{i=1}^{N} \text{tokens}_{i}}{N}$ (average tokens per rank)
-- $\text{max\_tokens} = \max(\text{tokens}_{i})$ (maximum tokens on any rank)
-- $\text{tokens}_{i}$ represents the number of tokens processed by rank $i$
+- $\text{avg\_tokens} = \frac{1}{N}\sum_{m=0}^{N-1} \text{tokens}_{m}$ (average tokens per rank)
+- $\text{max\_tokens} = \max_{0 \le m < N}(\text{tokens}_{m})$ (maximum tokens on any rank)
+- $\text{tokens}_{m}$ represents the number of tokens processed by rank $m$
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edffceb and 97d6533.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Performance Metrics](#performance-metric...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...cal-modeling) - Performance Metrics - [Scheduling Strategies for Load Balancing...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...

(QB_NEW_EN)


[grammar] ~21-~21: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ...e-summary) - Baseline Performance - [ADP Balance with Context Wait Implementa...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...Balance with Context Wait Implementation](#adp-balance-with-context-wait-implementation) - [ADP Balance with Full Strategy Implement...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ...alance with Full Strategy Implementation](#adp-balance-with-full-strategy-implementation) - [Pareto Analysis: Throughput-Latency Trad...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...hroughput-Latency Trade-off Optimization](#pareto-analysis-throughput-latency-trade-off-optimization) - Conclusion - [Acknowledg...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ...-trade-off-optimization) - Conclusion - Acknowledgement ## M...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...ociated with disaggregated architectures - Specific workload patterns: Scenarios ...

(QB_NEW_EN)


[grammar] ~34-~34: There might be a mistake here.
Context: ...) and long output sequence lengths (OSL) - Offline inference: Batch processing en...

(QB_NEW_EN)


[grammar] ~57-~57: There might be a mistake here.
Context: ... two key metrics: #### 1. Balance Ratio The balance_ratio measures the load di...

(QB_NEW_EN)


[grammar] ~64-~64: There might be a mistake here.
Context: ...tokens}}{\text{max_tokens}} ``` where: - $\text{avg_tokens} = \frac{\sum_{i=1}^{...

(QB_NEW_EN)


[grammar] ~65-~65: There might be a mistake here.
Context: ...kens}_{i}}{N}$ (average tokens per rank) - $\text{max_tokens} = \max(\text{tokens}...

(QB_NEW_EN)


[grammar] ~66-~66: There might be a mistake here.
Context: ...kens}{i})$ (maximum tokens on any rank) - $\text{tokens}{i}$ represents the numbe...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The sol_tps represents the theoretical...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...}} \times \text{actual_tps} ``` where: - iter_time_i: Measured execution time of iteration $...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...Measured execution time of iteration $i$ - elapsed_time: Total empirically measured end-to-end ...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...cally measured end-to-end execution time - actual_tps: Observed throughput in tokens per seco...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...Observed throughput in tokens per second - sol_tps: Theoretical maximum throughput under p...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...ontext Synchronization (timeout_iters)** - Purpose: Ensures temporal alignment of...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ...ch Equilibration (batching_wait_iters)** - Purpose: Balances the number of accumu...

(QB_NEW_EN)


[grammar] ~128-~128: There might be a mistake here.
Context: ...me(ctx)>> Generation processing time:time(gen)` - N new requests arrive sequentially over ...

(QB_NEW_EN)


[grammar] ~131-~131: There might be a mistake here.
Context: ...ver N iterations Baseline Behavior: In the traditional approach, contexts ar...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ...(ctx)(dominated by context processing) - **Total execution time**:time(ctx) × N` ...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ...processing) - Total execution time: time(ctx) × N - Balance ratio: `(ctx_len + (M-1) + M ×...

(QB_NEW_EN)


[grammar] ~147-~147: There might be a mistake here.
Context: ...poor balance) ADP Balance Strategy: Our method synchronizes context processi...

(QB_NEW_EN)


[grammar] ~157-~157: There might be a mistake here.
Context: ...rations, time(ctx) for final iteration - Total execution time: `time(gen) × (N-...

(QB_NEW_EN)


[grammar] ~158-~158: There might be a mistake here.
Context: ...l iteration - Total execution time: time(gen) × (N-1) + time(ctx) - Balance ratio: 1.0 (perfect balance) -...

(QB_NEW_EN)


[grammar] ~159-~159: There might be a mistake here.
Context: ...Balance ratio: 1.0 (perfect balance) - Time savings: `(time(ctx) - time(gen))...

(QB_NEW_EN)


[grammar] ~164-~164: There might be a mistake here.
Context: ...rovement** due to optimal load balancing - ✅ Maximized GPU utilization across a...

(QB_NEW_EN)


[grammar] ~165-~165: There might be a mistake here.
Context: ...mized GPU utilization** across all ranks - ⚠️ Increased TTFT due to strategic w...

(QB_NEW_EN)


[grammar] ~166-~166: There might be a mistake here.
Context: ...TFT** due to strategic waiting mechanism - 📋 Best suited for throughput-orient...

(QB_NEW_EN)


[grammar] ~167-~167: There might be a mistake here.
Context: ...ted scenarios where TTFT is not critical ## Experiments ### Setting #### Dataset C...

(QB_NEW_EN)


[grammar] ~173-~173: There might be a mistake here.
Context: ... ### Setting #### Dataset Configuration We evaluate our approach using a compreh...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...Request volume*: 16,000 total requests - Average input length: 803 tokens - **A...

(QB_NEW_EN)


[grammar] ~177-~177: There might be a mistake here.
Context: ...s - Average input length: 803 tokens - Average output length: 3,653 tokens - ...

(QB_NEW_EN)


[grammar] ~178-~178: There might be a mistake here.
Context: ... Average output length: 3,653 tokens - Token distribution: Figure 2 illustrat...

(QB_NEW_EN)


[grammar] ~190-~190: There might be a mistake here.
Context: .... #### Hardware and Model Configuration Infrastructure: - Platform: NVIDIA...

(QB_NEW_EN)


[grammar] ~191-~191: There might be a mistake here.
Context: ... Model Configuration Infrastructure: - Platform: NVIDIA Blackwell GB200 syste...

(QB_NEW_EN)


[grammar] ~192-~192: There might be a mistake here.
Context: ...latform**: NVIDIA Blackwell GB200 system - GPU Count: 8 × GB200 GPUs - Model:...

(QB_NEW_EN)


[grammar] ~193-~193: There might be a mistake here.
Context: ...0 system - GPU Count: 8 × GB200 GPUs - Model: DeepSeek V3 - **Parallelization...

(QB_NEW_EN)


[grammar] ~194-~194: There might be a mistake here.
Context: ... 8 × GB200 GPUs - Model: DeepSeek V3 - Parallelization Strategy: - Attentio...

(QB_NEW_EN)


[grammar] ~195-~195: There might be a mistake here.
Context: ...pSeek V3 - Parallelization Strategy: - Attention module: Data Parallel (DP) siz...

(QB_NEW_EN)


[grammar] ~196-~196: There might be a mistake here.
Context: ...tion module: Data Parallel (DP) size = 8 - MoE module: Expert Parallel (EP) size = ...

(QB_NEW_EN)


[grammar] ~204-~204: There might be a mistake here.
Context: ... 1. Baseline: Round-robin scheduling 2. ADP Balance (Context Wait): Implementi...

(QB_NEW_EN)


[grammar] ~205-~205: There might be a mistake here.
Context: ...lementing timeout_iters parameter only 3. ADP Balance (Full Strategy): Complete ...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ... Avg Balance Ratio | SOL TPS | Speedup | |---------------|------------|----------...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ...---------|-------------------|---------| | Baseline | 25,664 | 54.11% | 39,552 | ...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...ine | 25,664 | 54.11% | 39,552 | 1.00× | | ADP Balance (Context Wait) | 33,499 | ...

(QB_NEW_EN)


[grammar] ~213-~213: There might be a mistake here.
Context: ...it) | 33,499 | 84.33% | 38,312 | 1.31× | | ADP Balance (Full Strategy) | 34,140 |...

(QB_NEW_EN)


[grammar] ~216-~216: There might be a mistake here.
Context: ... 37,912 | 1.33× | Key Observations: - Context Wait alone delivers a substantia...

(QB_NEW_EN)


[grammar] ~234-~234: There might be a mistake here.
Context: ...s

Critical Insights: - Imbalance window: Most severe imbalanc...

(QB_NEW_EN)


[grammar] ~237-~237: There might be a mistake here.
Context: ...eration 12,000, all requests transition to generation phase, naturally reducing im...

(QB_NEW_EN)


[grammar] ~248-~248: There might be a mistake here.
Context: ...

Performance Bottlenecks: - Balance ratio frequently drops to **0.4 ...

(QB_NEW_EN)


[grammar] ~257-~257: There might be a mistake here.
Context: ...nization. Performance Achievements: - Throughput: 33,499 TPS (1.31× speedup)...

(QB_NEW_EN)


[grammar] ~258-~258: There might be a mistake here.
Context: ...Throughput**: 33,499 TPS (1.31× speedup) - Balance improvement: 84.33% average (v...

(QB_NEW_EN)


[grammar] ~259-~259: There might be a mistake here.
Context: ...**: 84.33% average (vs. 54.11% baseline) - Efficiency: Actual TPS significantly c...

(QB_NEW_EN)


[grammar] ~269-~269: There might be a mistake here.
Context: ...em>

Remaining Challenges: Despite significant improvements, residu...

(QB_NEW_EN)


[grammar] ~284-~284: There might be a mistake here.
Context: ...` Performance Optimization Results: - Peak throughput: 34,140 TPS (1.33× spe...

(QB_NEW_EN)


[grammar] ~285-~285: There might be a mistake here.
Context: ...throughput**: 34,140 TPS (1.33× speedup) - Optimal balance: 87.70% average balanc...

(QB_NEW_EN)


[grammar] ~286-~286: There might be a mistake here.
Context: ... balance**: 87.70% average balance ratio - Near-theoretical efficiency: Actual TP...

(QB_NEW_EN)


[grammar] ~287-~287: There might be a mistake here.
Context: ...TPS (34,140) approaches SOL TPS (37,912) - System stability: Dramatically reduced...

(QB_NEW_EN)


[grammar] ~290-~290: There might be a mistake here.
Context: ...terations Production Configuration: Users can enable the full ADP Balance st...

(QB_NEW_EN)


[grammar] ~309-~309: There might be a mistake here.
Context: ... Key Improvements Over Context Wait: - Enhanced stability: Balance ratio main...

(QB_NEW_EN)


[grammar] ~314-~314: There might be a mistake here.
Context: ...erations Implementation Trade-offs: - ✅ Maximum throughput improvement: 33...

(QB_NEW_EN)


[grammar] ~315-~315: There might be a mistake here.
Context: ...ut improvement**: 33% gain over baseline - ✅ Near-optimal load balancing: 87.70...

(QB_NEW_EN)


[grammar] ~316-~316: There might be a mistake here.
Context: ...alancing**: 87.70% average balance ratio - ⚠️ Iteration overhead: Waiting mecha...

(QB_NEW_EN)


[grammar] ~317-~317: There might be a mistake here.
Context: ...echanisms increase total iteration count - ⚠️ TTFT impact: Strategic delays aff...

(QB_NEW_EN)


[grammar] ~341-~341: There might be a mistake here.
Context: ...itiation 4. Configuration Guidance: - Low-load scenarios: `batch_waiting_ite...

(QB_NEW_EN)


[grammar] ~356-~356: There might be a mistake here.
Context: ...mbalances. Experimental Validation: Comprehensive evaluation on the DeepSeek...

(QB_NEW_EN)


[grammar] ~357-~357: There might be a mistake here.
Context: ...tes compelling performance improvements: - 33% throughput increase: From 25,664 t...

(QB_NEW_EN)


[grammar] ~358-~358: There might be a mistake here.
Context: ...ut increase**: From 25,664 to 34,140 TPS - 87% load balance achievement: Dramatic...

(QB_NEW_EN)


[grammar] ~359-~359: There might be a mistake here.
Context: ...: Dramatic improvement from 54% baseline - Near-theoretical efficiency: Actual pe...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

11-11: Link fragments should be valid

(MD051, link-fragments)


100-100: Images should have alternate text (alt text)

(MD045, no-alt-text)


183-183: Images should have alternate text (alt text)

(MD045, no-alt-text)


229-229: Images should have alternate text (alt text)

(MD045, no-alt-text)


243-243: Images should have alternate text (alt text)

(MD045, no-alt-text)


264-264: Images should have alternate text (alt text)

(MD045, no-alt-text)


304-304: Images should have alternate text (alt text)

(MD045, no-alt-text)


328-328: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

74-80: Double-check SOL TPS formula clarity and units.

Consider adding a brief derivation to justify sol_tps = (elapsed_time/sol_time) × actual_tps, and state the window over which “elapsed_time” and “actual_tps” are measured (both should cover the same token budget and time window).
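
One way to sketch that derivation, in Python and under stated assumptions: if the same `total_tokens` are produced in both the measured and the ideal run, then `actual_tps = total_tokens / elapsed_time` and `sol_tps = total_tokens / sol_time`, which gives `sol_tps = (elapsed_time / sol_time) × actual_tps`. This comment does not define `sol_time`; the sketch assumes it is the sum of each iteration's measured time scaled by that iteration's balance ratio. The function names and the numbers are hypothetical and purely illustrative.

```python
from typing import Sequence


def balance_ratio(tokens_per_rank: Sequence[int]) -> float:
    """Return avg_tokens / max_tokens for one iteration (1.0 = perfectly balanced)."""
    max_tokens = max(tokens_per_rank)
    if max_tokens == 0:
        return 1.0  # no tokens scheduled on any rank, treat as balanced
    avg_tokens = sum(tokens_per_rank) / len(tokens_per_rank)
    return avg_tokens / max_tokens


def sol_tps(iter_times: Sequence[float], ratios: Sequence[float],
            elapsed_time: float, total_tokens: int) -> float:
    """Throughput if every iteration had been perfectly balanced.

    ASSUMPTION (not stated in the review comment):
    sol_time = sum(iter_time_i * balance_ratio_i).
    """
    actual_tps = total_tokens / elapsed_time
    sol_time = sum(t * r for t, r in zip(iter_times, ratios))
    # Same tokens in less time, i.e. total_tokens / sol_time.
    return elapsed_time / sol_time * actual_tps


# Illustrative numbers only, not measurements from the blog.
ratios = [balance_ratio([800, 10, 10, 10]), balance_ratio([100, 100, 100, 100])]
example_sol_tps = sol_tps(iter_times=[1.0, 1.0], ratios=ratios,
                          elapsed_time=2.0, total_tokens=300)
```

The identity only holds when `elapsed_time` and `actual_tps` describe the same run and the same set of tokens, which is why the measurement window needs to be spelled out.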

@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from 97d6533 to 8e5859b Compare August 28, 2025 10:15
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (10)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (10)

150-155: Correct generation index in ADP timeline (use M, not N).

Keeps indices consistent with the legend and prior timeline.

-iter_i:     [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
-iter_i+1:   [g00, g01, ..., g0N], [g10, g11, ..., g1N], ..., [gN0, gN1, ..., gNN]
+iter_i:     [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]
+iter_i+1:   [g00, g01, ..., g0M], [g10, g11, ..., g1M], ..., [gN0, gN1, ..., gNM]

341-345: Rename parameter to batching_wait_iters for consistency.

Fixes mismatch with earlier sections and YAML.

-   - **Low-load scenarios**: `batch_waiting_iters` provides minimal benefit while adding latency overhead
+   - **Low-load scenarios**: `batching_wait_iters` provides minimal benefit while adding latency overhead

262-268: Add alt text to Figure 5 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img alt="Context Wait: improved per-iteration balance ratio stability (100–12,000)" src="./../media/tech_blog10_context_wait_performance.png">

100-104: Add alt text to Figure 1 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img alt="Baseline: sort then round-robin distribution of context tokens across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">

302-308: Add alt text to Figure 6 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img alt="Full Strategy: highest balance stability and throughput (100–12,000)" src="./../media/tech_blog10_full_strategy_performance.png">

227-233: Add alt text to Figure 3 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img alt="Baseline: per-iteration average tokens per rank (top) and balance ratio (bottom)" src="./../media/tech_blog10_baseline_performance_overview.png">

326-332: Add alt text to Figure 7 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img alt="Pareto curves: TPS/GPU vs TTFT under different timeout and batching-wait settings" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">

55-61: Add missing "Performance Metrics" heading to fix broken ToC anchor (MD051).

The ToC links to #performance-metrics but there’s no matching heading.

 To quantify load balance and theoretical performance bounds, we define two key metrics:
+
+### Performance Metrics

241-247: Add alt text to Figure 4 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img alt="Baseline detail: iterations 100–12,000 average tokens and balance ratio" src="./../media/tech_blog10_baseline_performance_detail.png">

181-186: Add alt text to Figure 2 image for accessibility (MD045).

-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img alt="Histogram of input and output token lengths across 16k requests" src="./../media/tech_blog10_dataset_token_distribution.png">
🧹 Nitpick comments (3)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (3)

31-39: Standardize ADP/DP terminology and define abbreviations on first use.

Use “Attention Data Parallel (ADP)” consistently; expand TTFT/TPOT on first mention.

-In DeepSeek MLA + MoE architectures under maximum-throughput scenarios, an Attention Data Parallel (DP) + MoE Expert Parallel (EP) strategy is commonly employed to eliminate redundant KV cache storage, and utilize disaggregated serving to prevent Attention Data Parallel (ADP) imbalances. However, certain deployment scenarios still favor In-Flight Batching (IFB) inference, including:
+In DeepSeek MLA + MoE architectures under maximum-throughput scenarios, an Attention Data Parallel (ADP) + MoE Expert Parallel (EP) strategy is commonly employed to eliminate redundant KV cache storage, and utilize disaggregated serving to prevent ADP imbalances. However, certain deployment scenarios still favor In-Flight Batching (IFB) inference, including:
@@
-- **Offline inference**: Batch processing environments where TTFT and TPOT are more relaxed
+- **Offline inference**: Batch processing environments where Time-To-First-Token (TTFT) and Time-Per-Output-Token (TPOT) requirements are more relaxed

3-4: Minor grammar nit: add article.

-By NVIDIA TensorRT-LLM team
+By the NVIDIA TensorRT-LLM team

143-146: Clarify balance ratio formula and assumptions.

  • Replace the raw expression with the average-workload definition:
    Balance ratio = avg_tokens / ctx_len, where
    avg_tokens = (ctx_len + M × (N − 1) + (M − 1)) / N
  • Define variables:
    • ctx_len = number of context tokens
    • M = generation tokens per iteration
    • N = number of ranks
  • Note assumption: one rank processes the full context while the others handle generation tokens
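
A minimal Python sketch of the formula proposed above, using the variable definitions from this comment. The function name and the example values are hypothetical. Note that dividing by `ctx_len` approximates the busiest rank's load; under the stated assumption that rank actually holds `ctx_len + (M − 1)` tokens, so the approximation is reasonable only when `ctx_len` is much larger than `M`.

```python
def baseline_balance_ratio(ctx_len: int, m: int, n: int) -> float:
    """Balance ratio for the baseline case described in the review comment.

    ctx_len: number of context tokens
    m:       generation tokens per iteration
    n:       number of ranks
    """
    # One rank carries the full context plus (m - 1) generation tokens;
    # the remaining (n - 1) ranks carry m generation tokens each.
    avg_tokens = (ctx_len + m * (n - 1) + (m - 1)) / n
    # The comment's formula uses ctx_len as the maximum per-rank load.
    return avg_tokens / ctx_len


# Hypothetical values: 803 context tokens, 32 generation tokens, 8 ranks.
example_ratio = baseline_balance_ratio(ctx_len=803, m=32, n=8)
```

As `ctx_len` grows relative to `M`, the ratio tends toward `1 / N`, which is the poor-balance regime the blog's baseline analysis describes.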
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 97d6533 and 8e5859b.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Performance Metrics](#performance-metric...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...cal-modeling) - Performance Metrics - [Scheduling Strategies for Load Balancing...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...

(QB_NEW_EN)


[grammar] ~21-~21: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...

(QB_NEW_EN)


[grammar] ~64-~64: There might be a mistake here.
Context: ...ac{avg_tokens}{max_tokens} ``` where: - $avg\_tokens$ represents the average nu...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The $sol\_tps$ represents the theoretic...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...ol_time} \times actual_tps ``` where: - $time_i$: Measured execution time of ite...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...Measured execution time of iteration $i$ - $elapsed\_time$: Total empirically meas...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...cally measured end-to-end execution time - $actual\_tps$: Observed throughput in t...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...Observed throughput in tokens per second - $sol\_tps$: Theoretical maximum through...

(QB_NEW_EN)



Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis yunruis force-pushed the user/yunruis/adp_balance_blog branch from 8e5859b to 6bd367f Compare August 28, 2025 12:09
@Kefeng-Duan Kefeng-Duan enabled auto-merge (squash) August 28, 2025 12:12
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1)

99-99: Add missing alt text for all images (MD045, accessibility).

Screen readers rely on alt attributes, and markdownlint (MD045) flags images that lack them. Add concise, descriptive alt text to each figure.

-  <img src="./../media/tech_blog10_baseline_round_robin_strategy.png">
+  <img alt="Baseline: round-robin distributes context tokens across ranks" src="./../media/tech_blog10_baseline_round_robin_strategy.png">
-  <img src="./../media/tech_blog10_dataset_token_distribution.png">
+  <img alt="Distribution of input and output token lengths" src="./../media/tech_blog10_dataset_token_distribution.png">
-  <img src="./../media/tech_blog10_baseline_performance_overview.png">
+  <img alt="Baseline: average tokens and balance ratio over iterations" src="./../media/tech_blog10_baseline_performance_overview.png">
-  <img src="./../media/tech_blog10_baseline_performance_detail.png">
+  <img alt="Baseline detail: balance ratio for iterations 100–12,000" src="./../media/tech_blog10_baseline_performance_detail.png">
-  <img src="./../media/tech_blog10_context_wait_performance.png">
+  <img alt="Context Wait: improved balance ratio over iterations" src="./../media/tech_blog10_context_wait_performance.png">
-  <img src="./../media/tech_blog10_full_strategy_performance.png">
+  <img alt="Full strategy: near-optimal balance across iterations" src="./../media/tech_blog10_full_strategy_performance.png">
-  <img src="./../media/tech_blog10_tps_ttft_pareto_curve.png">
+  <img alt="Pareto frontier: TPS/GPU vs TTFT across parameter settings" src="./../media/tech_blog10_tps_ttft_pareto_curve.png">

Also applies to: 182-182, 228-228, 242-242, 263-263, 303-303, 327-327
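
A check like the one markdownlint performs for inline HTML images can be approximated locally before pushing. The following is a minimal sketch, not markdownlint's actual implementation: the function name `images_missing_alt` is hypothetical, and it only inspects single-line `<img ...>` tags of the kind quoted in the diff above, not Markdown `![](...)` image syntax.

```python
import re

# Matches a single-line inline HTML <img ...> tag (assumption: the tag does
# not span multiple lines, as in the blog source quoted above).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches an alt attribute inside a tag.
ALT_ATTR = re.compile(r"\balt\s*=", re.IGNORECASE)


def images_missing_alt(markdown_text):
    """Return (line_number, tag) pairs for <img> tags that have no alt attribute."""
    missing = []
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        for tag in IMG_TAG.findall(line):
            if not ALT_ATTR.search(tag):
                missing.append((line_no, tag))
    return missing


if __name__ == "__main__":
    sample = (
        '<img src="./../media/tech_blog10_dataset_token_distribution.png">\n'
        '<img alt="Distribution of input and output token lengths" '
        'src="./../media/tech_blog10_dataset_token_distribution.png">\n'
    )
    for line_no, tag in images_missing_alt(sample):
        print(f"line {line_no}: missing alt text in {tag}")
```

Running it on the blog file before and after applying the suggested diff should show the flagged lines disappear.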

🧹 Nitpick comments (2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (2)

73-79: Clarify SOL TPS equation; decouple from actual_tps for correctness and readability.

Express SOL TPS as tokens processed divided by idealized time; keep the measured token count explicit.

-```math
-sol\_time = \sum_{i=0}^{\infty} \frac{time_i}{balance\_ratio_i}
-```
-
-```math
-sol\_tps = \frac{elapsed\_time}{sol\_time} \times actual\_tps
-```
+```math
+sol\_time = \sum_{i=0}^{T-1} \frac{time_i}{balance\_ratio_i}
+```
+
+```math
+total\_tokens = actual\_tps \times elapsed\_time
+```
+
+```math
+sol\_tps = \frac{total\_tokens}{sol\_time}
+```

Minimal text tweak after the equations:

-where:
+where:
   - $time_i$: Measured execution time of iteration $i$
-  - $elapsed\_time$: Total empirically measured end-to-end execution time
-  - $actual\_tps$: Observed throughput in tokens per second
-  - $sol\_tps$: Theoretical maximum throughput under perfect load balance
+  - $elapsed\_time$: Total measured end-to-end execution time
+  - $actual\_tps$: Observed throughput (tokens/second)
+  - $total\_tokens$: Total tokens produced during $elapsed\_time$
+  - $sol\_tps$: Theoretical throughput under perfect load balance

Also applies to: 81-86
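
The suggested rewrite only regroups the terms, so both forms of the equation should give the same value for any inputs. The sketch below illustrates this with made-up numbers; the function names and the sample values are hypothetical and are not taken from the blog or its measurements. `sol_time` is passed in directly rather than computed from per-iteration data.

```python
import math


def sol_tps_original(elapsed_time, sol_time, actual_tps):
    """Original form: scale the measured throughput by elapsed_time / sol_time."""
    return elapsed_time / sol_time * actual_tps


def sol_tps_decoupled(elapsed_time, sol_time, actual_tps):
    """Suggested form: recover the token count first, then divide by sol_time."""
    total_tokens = actual_tps * elapsed_time
    return total_tokens / sol_time


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    elapsed_time = 120.0   # seconds, measured end to end
    sol_time = 80.0        # seconds, idealized time under perfect balance
    actual_tps = 25_000.0  # tokens per second, measured

    a = sol_tps_original(elapsed_time, sol_time, actual_tps)
    b = sol_tps_decoupled(elapsed_time, sol_time, actual_tps)
    assert math.isclose(a, b)
    print(a, b)
```

The decoupled form makes the intermediate quantity `total_tokens` visible, which is the readability point the review comment is making.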


3-3: Nit: Add article for smoother prose.

Change to “By the NVIDIA TensorRT-LLM team.”

-By NVIDIA TensorRT-LLM team
+By the NVIDIA TensorRT-LLM team
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5859b and 6bd367f.

⛔ Files ignored due to path filters (7)
  • docs/source/blogs/media/tech_blog10_baseline_performance_detail.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_performance_overview.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_baseline_round_robin_strategy.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_context_wait_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_dataset_token_distribution.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_full_strategy_performance.png is excluded by !**/*.png
  • docs/source/blogs/media/tech_blog10_tps_ttft_pareto_curve.png is excluded by !**/*.png
📒 Files selected for processing (2)
  • .gitattributes (1 hunks)
  • docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitattributes
🧰 Additional context used
🪛 LanguageTool
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

[grammar] ~5-~5: There might be a mistake here.
Context: ... TensorRT-LLM team ## Table of Contents - [ADP Balance Strategy](#adp-balance-strat...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...able of Contents - ADP Balance Strategy - Table of Contents ...

(QB_NEW_EN)


[grammar] ~7-~7: There might be a mistake here.
Context: ...balance-strategy) - Table of Contents - [Motivation and Background](#motivation-a...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...contents) - Motivation and Background - [Theoretical Analysis and Modeling](#theo...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...) - Theoretical Analysis and Modeling - [Mathematical Modeling](#mathematical-mod...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...d-modeling) - Mathematical Modeling - [Scheduling Strategies for Load Balancing...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...Scheduling Strategies for Load Balancing](#scheduling-strategies-for-load-balancing) - [Baseline: Round-Robin Token Distribution...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ...Baseline: Round-Robin Token Distribution](#baseline-round-robin-token-distribution) - [ADP Balance Strategy: Coordinated Waitin...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ... Strategy: Coordinated Waiting Mechanism](#adp-balance-strategy-coordinated-waiting-mechanism) - [Performance Analysis: Baseline vs. ADP B...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ...mance Analysis: Baseline vs. ADP Balance](#performance-analysis-baseline-vs-adp-balance) - Experiments - [Settin...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ...aseline-vs-adp-balance) - Experiments - Setting - [Dataset Conf...

(QB_NEW_EN)


[grammar] ~16-~16: There might be a mistake here.
Context: ...xperiments](#experiments) - Setting - [Dataset Configuration](#dataset-configur...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ...#setting) - Dataset Configuration - [Hardware and Model Configuration](#hardw...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ... - Hardware and Model Configuration - [Performance Results](#performance-result...

(QB_NEW_EN)


[grammar] ~19-~19: There might be a mistake here.
Context: ...onfiguration) - Performance Results - [Performance Summary](#performance-summar...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...ce-results) - Performance Summary - [Baseline Performance](#baseline-performa...

(QB_NEW_EN)


[grammar] ~21-~21: There might be a mistake here.
Context: ...e-summary) - Baseline Performance - [ADP Balance with Context Wait Implementa...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ...Balance with Context Wait Implementation](#adp-balance-with-context-wait-implementation) - [ADP Balance with Full Strategy Implement...

(QB_NEW_EN)


[grammar] ~23-~23: There might be a mistake here.
Context: ...alance with Full Strategy Implementation](#adp-balance-with-full-strategy-implementation) - [Pareto Analysis: Throughput-Latency Trad...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ...hroughput-Latency Trade-off Optimization](#pareto-analysis-throughput-latency-trade-off-optimization) - Conclusion - [Acknowledg...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...-trade-off-optimization) - Conclusion - Acknowledgement ## M...

(QB_NEW_EN)


[grammar] ~32-~32: There might be a mistake here.
Context: ...ociated with disaggregated architectures - Specific workload patterns: Scenarios ...

(QB_NEW_EN)


[grammar] ~33-~33: There might be a mistake here.
Context: ...) and long output sequence lengths (OSL) - Offline inference: Batch processing en...

(QB_NEW_EN)


[grammar] ~56-~56: There might be a mistake here.
Context: ... two key metrics: #### 1. Balance Ratio The $balance\_ratio$ measures the load ...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...ac{avg_tokens}{max_tokens} ``` where: - $avg\_tokens$ represents the average nu...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ...# 2. Speed-of-Light Throughput (SOL TPS) The $sol\_tps$ represents the theoretic...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...ol_time} \times actual_tps ``` where: - $time_i$: Measured execution time of ite...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...Measured execution time of iteration $i$ - $elapsed\_time$: Total empirically meas...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...cally measured end-to-end execution time - $actual\_tps$: Observed throughput in t...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ...Observed throughput in tokens per second - $sol\_tps$: Theoretical maximum through...

(QB_NEW_EN)


[grammar] ~112-~112: There might be a mistake here.
Context: ...ontext Synchronization (timeout_iters)** - Purpose: Ensures temporal alignment of...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...ch Equilibration (batching_wait_iters)** - Purpose: Balances the number of accumu...

(QB_NEW_EN)


[grammar] ~127-~127: There might be a mistake here.
Context: ...s arrive sequentially over N iterations. - Context processing time: time(ctx) >> ...

(QB_NEW_EN)


[grammar] ~130-~130: There might be a mistake here.
Context: ...ime: time(gen) Baseline Behavior: In the traditional approach, contexts ar...

(QB_NEW_EN)


[grammar] ~142-~142: There might be a mistake here.
Context: ...(ctx)(dominated by context processing) - **Total execution time**:time(ctx) × N` ...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ...processing) - Total execution time: time(ctx) × N - Balance ratio: `(ctx_len + (M-1) + M ×...

(QB_NEW_EN)


[grammar] ~146-~146: There might be a mistake here.
Context: ...poor balance) ADP Balance Strategy: Our method synchronizes context processi...

(QB_NEW_EN)


[grammar] ~156-~156: There might be a mistake here.
Context: ...rations, time(ctx) for final iteration - Total execution time: `time(gen) × (N-...

(QB_NEW_EN)


[grammar] ~157-~157: There might be a mistake here.
Context: ...l iteration - Total execution time: time(gen) × (N-1) + time(ctx) - Balance ratio: 1.0 (perfect balance) -...

(QB_NEW_EN)


[grammar] ~158-~158: There might be a mistake here.
Context: ...Balance ratio: 1.0 (perfect balance) - Time savings: `(time(ctx) - time(gen))...

(QB_NEW_EN)


[grammar] ~163-~163: There might be a mistake here.
Context: ...rovement** due to optimal load balancing - ✅ Maximized GPU utilization across a...

(QB_NEW_EN)


[grammar] ~164-~164: There might be a mistake here.
Context: ...mized GPU utilization** across all ranks - ⚠️ Increased TTFT due to strategic w...

(QB_NEW_EN)


[grammar] ~165-~165: There might be a mistake here.
Context: ...TFT** due to strategic waiting mechanism - 📋 Best suited for throughput-orient...

(QB_NEW_EN)


[grammar] ~166-~166: There might be a mistake here.
Context: ...ted scenarios where TTFT is not critical ## Experiments ### Setting #### Dataset C...

(QB_NEW_EN)


[grammar] ~172-~172: There might be a mistake here.
Context: ... ### Setting #### Dataset Configuration We evaluate our approach using a compreh...

(QB_NEW_EN)


[grammar] ~175-~175: There might be a mistake here.
Context: ...Request volume*: 16,000 total requests - Average input length: 803 tokens - **A...

(QB_NEW_EN)


[grammar] ~176-~176: There might be a mistake here.
Context: ...s - Average input length: 803 tokens - Average output length: 3,653 tokens - ...

(QB_NEW_EN)


[grammar] ~177-~177: There might be a mistake here.
Context: ... Average output length: 3,653 tokens - Token distribution: Figure 2 illustrat...

(QB_NEW_EN)


[grammar] ~189-~189: There might be a mistake here.
Context: .... #### Hardware and Model Configuration Infrastructure: - Platform: NVIDIA...

(QB_NEW_EN)


[grammar] ~190-~190: There might be a mistake here.
Context: ... Model Configuration Infrastructure: - Platform: NVIDIA Blackwell GB200 syste...

(QB_NEW_EN)


[grammar] ~191-~191: There might be a mistake here.
Context: ...latform**: NVIDIA Blackwell GB200 system - GPU Count: 8 × GB200 GPUs - Model:...

(QB_NEW_EN)


[grammar] ~192-~192: There might be a mistake here.
Context: ...0 system - GPU Count: 8 × GB200 GPUs - Model: DeepSeek V3 - **Parallelization...

(QB_NEW_EN)


[grammar] ~193-~193: There might be a mistake here.
Context: ... 8 × GB200 GPUs - Model: DeepSeek V3 - Parallelization Strategy: - Attentio...

(QB_NEW_EN)


[grammar] ~194-~194: There might be a mistake here.
Context: ...pSeek V3 - Parallelization Strategy: - Attention module: Data Parallel (DP) siz...

(QB_NEW_EN)


[grammar] ~195-~195: There might be a mistake here.
Context: ...tion module: Data Parallel (DP) size = 8 - MoE module: Expert Parallel (EP) size = ...

(QB_NEW_EN)


[grammar] ~203-~203: There might be a mistake here.
Context: ... 1. Baseline: Round-robin scheduling 2. ADP Balance (Context Wait): Implementi...

(QB_NEW_EN)


[grammar] ~204-~204: There might be a mistake here.
Context: ...lementing timeout_iters parameter only 3. ADP Balance (Full Strategy): Complete ...

(QB_NEW_EN)


[grammar] ~209-~209: There might be a mistake here.
Context: ... Avg Balance Ratio | SOL TPS | Speedup | |---------------|------------|----------...

(QB_NEW_EN)


[grammar] ~210-~210: There might be a mistake here.
Context: ...---------|-------------------|---------| | Baseline | 25,664 | 54.11% | 39,552 | ...

(QB_NEW_EN)


[grammar] ~211-~211: There might be a mistake here.
Context: ...ine | 25,664 | 54.11% | 39,552 | 1.00× | | ADP Balance (Context Wait) | 33,499 | ...

(QB_NEW_EN)


[grammar] ~212-~212: There might be a mistake here.
Context: ...it) | 33,499 | 84.33% | 38,312 | 1.31× | | ADP Balance (Full Strategy) | 34,140 |...

(QB_NEW_EN)


[grammar] ~215-~215: There might be a mistake here.
Context: ... 37,912 | 1.33× | Key Observations: - Context Wait alone delivers a substantia...

(QB_NEW_EN)


[grammar] ~233-~233: There might be a mistake here.
Context: ...s Critical Insights: - Imbalance window: Most severe imbalanc...

(QB_NEW_EN)


[grammar] ~236-~236: There might be a mistake here.
Context: ...eration 12,000, all requests transition to generation phase, naturally reducing im...

(QB_NEW_EN)


[grammar] ~247-~247: There might be a mistake here.
Context: ... Performance Bottlenecks: - Balance ratio frequently drops to **0.4 ...

(QB_NEW_EN)


[grammar] ~256-~256: There might be a mistake here.
Context: ...nization. Performance Achievements: - Throughput: 33,499 TPS (1.31× speedup)...

(QB_NEW_EN)


[grammar] ~257-~257: There might be a mistake here.
Context: ...Throughput**: 33,499 TPS (1.31× speedup) - Balance improvement: 84.33% average (v...

(QB_NEW_EN)


[grammar] ~258-~258: There might be a mistake here.
Context: ...**: 84.33% average (vs. 54.11% baseline) - Efficiency: Actual TPS significantly c...

(QB_NEW_EN)


[grammar] ~268-~268: There might be a mistake here.
Context: ...em> Remaining Challenges: Despite significant improvements, residu...

(QB_NEW_EN)


[grammar] ~283-~283: There might be a mistake here.
Context: ...` Performance Optimization Results: - Peak throughput: 34,140 TPS (1.33× spe...

(QB_NEW_EN)


[grammar] ~284-~284: There might be a mistake here.
Context: ...throughput**: 34,140 TPS (1.33× speedup) - Optimal balance: 87.70% average balanc...

(QB_NEW_EN)


[grammar] ~285-~285: There might be a mistake here.
Context: ... balance**: 87.70% average balance ratio - Near-theoretical efficiency: Actual TP...

(QB_NEW_EN)


[grammar] ~286-~286: There might be a mistake here.
Context: ...TPS (34,140) approaches SOL TPS (37,912) - System stability: Dramatically reduced...

(QB_NEW_EN)


[grammar] ~289-~289: There might be a mistake here.
Context: ...terations Production Configuration: Users can enable the full ADP Balance st...

(QB_NEW_EN)


[grammar] ~308-~308: There might be a mistake here.
Context: ... Key Improvements Over Context Wait: - Enhanced stability: Balance ratio main...

(QB_NEW_EN)


[grammar] ~313-~313: There might be a mistake here.
Context: ...erations Implementation Trade-offs: - ✅ Maximum throughput improvement: 33...

(QB_NEW_EN)


[grammar] ~314-~314: There might be a mistake here.
Context: ...ut improvement**: 33% gain over baseline - ✅ Near-optimal load balancing: 87.70...

(QB_NEW_EN)


[grammar] ~315-~315: There might be a mistake here.
Context: ...alancing**: 87.70% average balance ratio - ⚠️ Iteration overhead: Waiting mecha...

(QB_NEW_EN)


[grammar] ~316-~316: There might be a mistake here.
Context: ...echanisms increase total iteration count - ⚠️ TTFT impact: Strategic delays aff...

(QB_NEW_EN)


[grammar] ~340-~340: There might be a mistake here.
Context: ...itiation 4. Configuration Guidance: - Low-load scenarios: `batching_wait_ite...

(QB_NEW_EN)


[grammar] ~355-~355: There might be a mistake here.
Context: ...mbalances. Experimental Validation: Comprehensive evaluation on the DeepSeek...

(QB_NEW_EN)


[grammar] ~356-~356: There might be a mistake here.
Context: ...tes compelling performance improvements: - 33% throughput increase: From 25,664 t...

(QB_NEW_EN)


[grammar] ~357-~357: There might be a mistake here.
Context: ...ut increase**: From 25,664 to 34,140 TPS - 87% load balance achievement: Dramatic...

(QB_NEW_EN)


[grammar] ~358-~358: There might be a mistake here.
Context: ...: Dramatic improvement from 54% baseline - Near-theoretical efficiency: Actual pe...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

99-99: Images should have alternate text (alt text)

(MD045, no-alt-text)


182-182: Images should have alternate text (alt text)

(MD045, no-alt-text)


228-228: Images should have alternate text (alt text)

(MD045, no-alt-text)


242-242: Images should have alternate text (alt text)

(MD045, no-alt-text)


263-263: Images should have alternate text (alt text)

(MD045, no-alt-text)


303-303: Images should have alternate text (alt text)

(MD045, no-alt-text)


327-327: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

Collaborator

@nv-guomingz nv-guomingz left a comment

LGTM

@Kefeng-Duan
Collaborator

/bot run

@nv-guomingz
Collaborator

/bot run

@Kefeng-Duan Kefeng-Duan requested review from chenopis and removed request for chenopis August 28, 2025 12:15
@tensorrt-cicd
Collaborator

PR_Github #16857 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #16858 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #16857 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #16858 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #12656 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@Kefeng-Duan Kefeng-Duan merged commit c4f8233 into NVIDIA:main Aug 28, 2025
5 checks passed
@chenopis
Collaborator

I reviewed it after the merge. LGTM

5 participants