Translate intermediate_source/FSDP_tutorial.rst #953

uddk6215 · 2024-10-02T13:34:49Z

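License agreement

You must agree that the BSD 3-Clause License applies to the changes you contribute.

For details, see the contributing guide.

If you agree, change the [ ] below to [x].

I have read the contributing guide and agree that the BSD 3-Clause License applies to the contents of this PR.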

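PR type

Change the [ ] in front of the type that applies to this PR to [x].

A contribution that fixes typos or improves a translation
A contribution that translates an untranslated tutorial
A contribution that reflects changes in the official tutorial
A contribution that does not fit any of the types above

PR description

I have translated part of the FSDP tutorial document. I plan to finish the full translation soon.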

hyoyoung

Thank you for the hard work of translating such a long document.
A few spots read awkwardly; please check and revise them.

hyoyoung · 2024-10-23T15:09:52Z

intermediate_source/FSDP_tutorial.rst

-Training AI models at a large scale is a challenging task that requires a lot of compute power and resources. 
-It also comes with considerable engineering complexity to handle the training of these very large models.
-`PyTorch FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`__, released in PyTorch 1.11 makes this easier.
+대규모 AI 모델을 학습하는 것은 많은 컴퓨팅 파워와 리소스를 필요로 하는 어려운 작업입니다.


The loanword "리소스" (resource) could be replaced with the native term "자원".

hyoyoung · 2024-10-23T15:11:11Z

intermediate_source/FSDP_tutorial.rst

+`DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__ (DDP) 학습에서는,
+각 process/ worker가 모델의 복제본을 소유하고 데이터 배치를 처리한 후, 최종적으로 all-reduce를 사용하여 서로 다른 worker들의 변화도를 합산합니다. 
+DDP에서는 모델 가중치와 옵티마이저 상태가 모든 worker들에 걸쳐 복제됩니다. 
+FSDP는 모델 파라미터, 옵티마이저 상태, 변화도를 DDP rank들에 걸쳐 샤딩하는 데이터 병렬 처리 방식입니다.


The glossary recommends translating "parameter" as "매개변수".
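
For readers checking the translated passage against the behavior it describes, the difference between DDP replication and FSDP sharding can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use PyTorch, and the helper names `all_reduce_sum` and `shard`, the zero-padding of the last shard, and the toy values are all hypothetical.

```python
# Toy simulation only: no PyTorch, no real communication.
# all_reduce_sum and shard are hypothetical helpers for illustration.

def all_reduce_sum(per_worker_grads):
    """Simulate all-reduce: every worker ends up with the element-wise sum."""
    total = [sum(values) for values in zip(*per_worker_grads)]
    return [list(total) for _ in per_worker_grads]


def shard(flat_params, world_size):
    """Split a flat parameter list into equal contiguous shards.

    The last shard is zero-padded so all shards have the same length
    (an assumption of this sketch).
    """
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padding = [0.0] * (shard_len * world_size - len(flat_params))
    padded = flat_params + padding
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


params = [0.1, 0.2, 0.3, 0.4, 0.5]
world_size = 2

# DDP: every worker holds a full replica of the parameters.
ddp_replicas = [list(params) for _ in range(world_size)]

# FSDP: every rank holds only its own shard of the parameters.
fsdp_shards = shard(params, world_size)

# DDP gradient synchronization: sum the gradients of all workers.
grads = [[1.0, 2.0], [3.0, 4.0]]
synced = all_reduce_sum(grads)
```

In this toy run each DDP worker stores all five values, while each FSDP rank stores three (including padding).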

hyoyoung · 2024-10-23T15:11:52Z

intermediate_source/FSDP_tutorial.rst

-When training with FSDP, the GPU memory footprint is smaller than when training with DDP across all workers. This makes the training of some very large models feasible by allowing larger models or batch sizes to fit on device. This comes with the cost of increased communication volume. The communication overhead is reduced by internal optimizations like overlapping communication and computation.
+FSDP로 학습할 때, GPU 메모리 사용량은 모든 work들에 걸쳐 DDP로 학습할 때보다 작습니다. 
+이로 인해 더 큰 모델이나 배치 크기를 디바이스에 맞출 수 있어 매우 큰 모델의 학습이 가능해집니다. 
+다만 이는 통신량 증가라는 비용을 수반합니다. 이 때 발생하는 오버헤드는 통신과 계산을 중첩하는 등의 내부 최적화를 통해 줄어듭니다.


Writing "이 때 발생하는 통신 오버헤드" (the communication overhead incurred here) would be clearer.
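
The memory claim in this hunk can be illustrated with rough arithmetic. The sketch below is an assumption-laden estimate, not a measurement: `per_rank_elements` is a hypothetical helper, two optimizer state values per parameter is an assumed figure, and activations and temporarily gathered full parameters are ignored.

```python
# Back-of-the-envelope estimate only; all numbers are illustrative assumptions.

def per_rank_elements(num_params, world_size, optimizer_states_per_param=2,
                      sharded=False):
    """Count stored elements per rank: parameters + gradients + optimizer state.

    With sharded=True the total is divided across ranks (FSDP-style);
    otherwise every rank stores everything (DDP-style).
    """
    total = num_params * (1 + 1 + optimizer_states_per_param)
    if not sharded:
        return total
    return -(-total // world_size)  # ceiling division


ddp_elements = per_rank_elements(1_000_000, world_size=4)
fsdp_elements = per_rank_elements(1_000_000, world_size=4, sharded=True)
```

Under these assumptions, each of 4 ranks stores about a quarter of what a DDP worker stores, which is the saving the passage trades against extra communication.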

hyoyoung · 2024-10-23T15:12:17Z

intermediate_source/FSDP_tutorial.rst

+*생성자에서*

-* Shard model parameters and each rank only keeps its own shard
+* 모델 파라미터들을 샤딩하고 각 랭크는 자신의 샤드만 유지합니다.


For "parameter", see the glossary.

hyoyoung · 2024-10-23T15:12:44Z

intermediate_source/FSDP_tutorial.rst

+* all_gather를 실행하여 모든 랭크로부터 모든 샤드를 수집해 이 FSDP 유닛의 전체 파라미터를 복원합니다.
+* 역전파 연산을 실행합니다.
+* reduce_scatter를 실행하여 변화도를 동기화합니다.
+* 파라미터를 폐기합니다.


"버린다" (throw away) would be easier to understand than "폐기" (discard).

All of the suggestions have been applied. Thank you!
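
The four steps in this hunk (all_gather, backward computation, reduce_scatter, discarding the parameters) can be walked through with a small pure-Python simulation. It is a sketch under assumptions: the functions below are hypothetical stand-ins rather than the `torch.distributed` collectives, the "backward pass" is a made-up per-rank scaling, and gradients are summed here without any averaging.

```python
# Toy simulation of one FSDP unit's backward path on 2 ranks.
# all_gather and reduce_scatter are hypothetical stand-ins, not torch.distributed.

def all_gather(shards):
    """Every rank receives the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


def reduce_scatter(per_rank_grads):
    """Sum gradients element-wise across ranks, then hand rank r the r-th chunk."""
    world_size = len(per_rank_grads)
    summed = [sum(values) for values in zip(*per_rank_grads)]
    chunk = len(summed) // world_size
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world_size)]


param_shards = [[1.0, 2.0], [3.0, 4.0]]  # rank 0 and rank 1 each own one shard

# Step 1: all_gather restores the full parameters of the unit on every rank.
full_params = all_gather(param_shards)

# Step 2: stand-in for backward; each rank computes a different local gradient.
local_grads = [[(rank + 1) * p for p in full_params[rank]] for rank in range(2)]

# Step 3: reduce_scatter synchronizes gradients; each rank keeps only its shard.
grad_shards = reduce_scatter(local_grads)

# Step 4: the gathered full parameters are thrown away; only shards remain.
full_params = None
```

After the last step each rank holds a parameter shard and the matching gradient shard, and nothing else from the unit.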

hyoyoung

Please check a few items.

hyoyoung · 2024-11-10T13:07:52Z

intermediate_source/FSDP_tutorial.rst

-2.4 Define a distributed train function that wraps the model in FSDP
-
-**Note: to save the FSDP model, we need to call the state_dict on each rank then on Rank 0 save the overall states.**
+2.4 모델을 FSDP로 래핑하는 분산 학습 함수 정의


Here "래핑" (wrapping) could be replaced with the native verb "감싸다".

hyoyoung · 2024-11-10T13:08:15Z

intermediate_source/FSDP_tutorial.rst


-Wrapping the model with FSDP, the model will look as follows, we can see the model has been wrapped in one FSDP unit.
-Alternatively, we will look at adding the fsdp_auto_wrap_policy next and will discuss the differences. 
+FSDP로 모델을 래핑하면, 모델은 다음과 같이 보일 것입니다. 모델이 하나의 FSDP 유닛으로 래핑된 것을 볼 수 있습니다.


The "랩핑" (wrapping) here could be replaced in the same way.

hyoyoung · 2024-11-10T13:08:53Z

intermediate_source/FSDP_tutorial.rst

+이 API는 변경될 수 있습니다. 기본값은 None이며, 이 경우 오프로딩이 수행되지 않습니다.

-Using this feature may slow down the training considerably, due to frequent copying of tensors from host to device, but it could help improve memory efficiency and train larger scale models. 
+이 기능을 사용하면 호스트와 디바이스 간 텐서의 빈번한 복사로 인해 학습 속도가 상당히 느려질 수 있지만, 


"텐서" is generally left as "tensor" and not translated.

The fixes have been made.

hyoyoung

good

uddk6215 added 2 commits October 2, 2024 22:32

Draft FSDP translataion

97ac83b

Complete translation of the FSDP tutorial

935ebc5

uddk6215 marked this pull request as ready for review October 5, 2024 14:01

hyoyoung requested changes Oct 23, 2024

Fixes applied

3c8232c

uddk6215 force-pushed the translate_FSDP branch from 972f6f2 to 3c8232c Compare November 2, 2024 02:47

uddk6215 requested a review from hyoyoung November 2, 2024 02:53

hyoyoung requested changes Nov 10, 2024

2nd: Fixes applied

e102132

uddk6215 requested a review from hyoyoung November 17, 2024 07:56

hyoyoung approved these changes Nov 25, 2024

hyoyoung merged commit 0b3273a into PyTorchKorea:master Nov 30, 2024
