ps sparse rpc #58003
💊 CI failures summary and remediations
As of commit 0b479e7 (more details on the Dr. CI page and at hud.pytorch.org/pr/58003):
🕵️ 2 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
ghstack-source-id: b4f2a86 Pull Request resolved: pytorch#58003
I tested this earlier with workarounds and everything passed. I can't test again until the pull request for the lock is merged and the issue for the RuntimeError is closed.
    ):
        super().__init__(rank)

        self.lock = threading.Lock()
pull request to fix issue with instance lock #57943
No need to change anything, just FYI. As a temporary solution, IIUC, we can add a __getstate__ method to avoid pickling RRefs.
    def __getstate__(self):
        return {}
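For illustration, here is a minimal sketch of how Python's pickle hooks can keep an unpicklable attribute such as a threading.Lock out of the serialized state. The class name LockedServer and the helper round_trip are hypothetical and are not taken from this PR. The sketch uses the standard pickle module and assumes the RPC layer's pickler honors the same __getstate__/__setstate__ hooks. Unlike the suggestion of returning an empty dict, it keeps the other attributes and only drops the lock.

```python
import pickle
import threading


class LockedServer:
    """Hypothetical stand-in for a server object that guards state with a lock."""

    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()  # a lock object cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the lock before serialization.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes and create a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()


def round_trip(obj):
    """Serialize and deserialize obj with the standard pickle module."""
    return pickle.loads(pickle.dumps(obj))
```

Calling round_trip(LockedServer(0)) from a script should return a copy with the same rank and a new, unlocked lock. Without the two hooks, pickle.dumps would raise a TypeError when it reaches the lock attribute.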
adds base trainer class RpcTrainerBase
adds trainer class DdpSparseRpcTrainer
adds base server classes ParameterServerBase and AverageParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce
quip document https://fb.quip.com/iQUtAeKIxWpF
adds trainer class DdpTrainer
adds trainer class DdpRpcTrainer
adds trainer class DdpSparseRpcTrainer
adds base server classes ParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce
quip document https://fb.quip.com/iQUtAeKIxWpF
@gcramer23 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@gcramer23 merged this pull request in 4ed2d5d.
Summary: Pull Request resolved: pytorch#58003
adds trainer class DdpTrainer
adds trainer class DdpSparseRpcTrainer
adds server class ParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce
quip document https://fb.quip.com/iQUtAeKIxWpF
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D29379696
Pulled By: gcramer23
fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f
Stack from ghstack:
adds trainer class DdpTrainer
adds trainer class DdpSparseRpcTrainer
adds server class ParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce
quip document https://fb.quip.com/iQUtAeKIxWpF
Differential Revision: D29379696