KEMBAR78
[CI] Adds support for selecting experiments for workflows on runner determinator by jeanschmidt · Pull Request #137614 · pytorch/pytorch · GitHub
Skip to content

Conversation

@jeanschmidt
Copy link
Contributor

@jeanschmidt jeanschmidt commented Oct 9, 2024

adds a default tag to experiment configurations, allowing to remove some experiments by default on the random draw:

        experiments:
            lf:
                rollout_perc: 25
            otherExp:
                rollout_perc: 25
                default: false
        ---

and includes the configuration to filter what experiments are of interest for a particular workflow (comma separated):

  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"

The end goal, is to enable us to run multiple experiments, that are independent from one another. For example, while we still runs the LF infra experiment, we want to migrate other runners leveraging the current solution. A immediate UC is for the A100 instances, where we want to migrate to AWS.

Those new instances will during the migration period be labeled both awsa100.linux.gcp.a100 and linux.aws.a100. Once the experiment ends, we will remove the first confusing one.

jobs:
  get-build-label-type:
    name: get-build-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...

  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"
      
  linux-focal-cuda12_1-py3_10-gcc9-inductor-build:
    name: cuda12.1-py3.10-gcc9-sm80
    uses: ./.github/workflows/_linux-build.yml
    needs:
      - get-build-label-type
      - get-test-label-type
    with:
      runner_prefix: "${{ needs.get-build-label-type.outputs.label-type }}"
      ...
      test-matrix: |
        { include: [
          { config: "inductor_huggingface_perf_compare", shard: 1, num_shards: 1, runner: "${{ needs.get-test-label-type.outputs.label-type }}linux.gcp.a100" },
          ...
        ]}
      ...
experiments:
    lf:
        rollout_perc: 50
    awsa100:
        rollout_perc: 50
         default: false

@jeanschmidt jeanschmidt requested a review from a team as a code owner October 9, 2024 18:22
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137614

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3118452 with merge base b71d0ac (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 9, 2024
@jeanschmidt jeanschmidt changed the title Adds support for select experiment on runner determinator Adds support for selecting experiments for workflows on runner determinator Oct 9, 2024
)
parser.add_argument(
"--check-experiments",
type=_str_comma_separated_to_set,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: you can pass in a function here!

Comment on lines 396 to 401
do_check = True
if check_experiments:
if experiment_name not in check_experiments:
exp_list = ", ".join(check_experiments)
log.info(f"Skipping experiment '{experiment_name}', as it is not in the check_experiments list: {exp_list}")
do_check = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When an experiment is not a default experiment and not a check-experiment, even if the user is explicitly opted in the experiment should be disabled for them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh good point...

exp_list = ", ".join(check_experiments)
log.info(f"Skipping experiment '{experiment_name}', as it is not in the check_experiments list: {exp_list}")
do_check = False
elif not experiment_settings.default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if check experiments is enabled, then we should ignore the default setting and only use the check experiments settings.

Else you'll pull in the lf prefix as well

help="Current GitHub ref type, branch or tag",
)
parser.add_argument(
"--check-experiments",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestion:

Suggested change
"--check-experiments",
"--eligible-experiments",

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please commit the suggested changes from pytorch's linter.

Comment on lines 233 to 238
"""
prefix = rd.get_runner_prefix(settings_text, ["User2"], USER_BRANCH, {"otherExp"})
self.assertEqual(
"otherExp.", prefix, "Runner prefix not correct for User2"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
prefix = rd.get_runner_prefix(settings_text, ["User2"], USER_BRANCH, {"otherExp"})
self.assertEqual(
"otherExp.", prefix, "Runner prefix not correct for User2"
)
"""
prefix = rd.get_runner_prefix(
settings_text, ["User2"], USER_BRANCH, {"otherExp"}
)
self.assertEqual("otherExp.", prefix, "Runner prefix not correct for User2")

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please commit the suggested changes from pytorch's linter.

Comment on lines 376 to 378
for experiment_name, experiment_settings in settings.experiments.items():
enabled = False

if not experiment_settings.all_branches and is_exception_branch(branch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for experiment_name, experiment_settings in settings.experiments.items():
enabled = False
if not experiment_settings.all_branches and is_exception_branch(branch):
for experiment_name, experiment_settings in settings.experiments.items():
if not experiment_settings.all_branches and is_exception_branch(branch):

@jeanschmidt jeanschmidt changed the title Adds support for selecting experiments for workflows on runner determinator [CI] Adds support for selecting experiments for workflows on runner determinator Oct 11, 2024
@jeanschmidt jeanschmidt dismissed ZainRizvi’s stale review October 11, 2024 16:38

issues addressed

@jeanschmidt
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jeanschmidt jeanschmidt self-assigned this Oct 11, 2024
@zxiiro zxiiro self-requested a review October 11, 2024 18:18
@github-actions github-actions bot deleted the jeanschmidt/runner_determinator_default branch November 11, 2024 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants