Translate recipes_source/recipes/tuning_guide.py #879

ohkingtaek · 2024-08-18T14:52:12Z

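License agreement

You must agree that the BSD 3-Clause License applies to the changes you contribute.

For more details, please refer to the contributing guide.

If you agree, change the [ ] below to [x].

I have read the contributing guide and agree that the BSD 3-Clause License applies to the contents of this PR.

PR type

Change the [ ] in front of the type that applies to this PR to [x].

A contribution that fixes typos or improves a translation
A contribution that translates an untranslated tutorial
A contribution that reflects the contents of the official tutorial
A contribution that does not fall under the types above

PR description

The translation of recipes_source/recipes/tuning_guide.py is complete. Please review.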

jkworldchampion · 2024-09-07T11:04:50Z

recipes_source/recipes/tuning_guide.py


 ###############################################################################
-# Enable asynchronous data loading and augmentation
+# 비동기식으로 데이터 가져오기 및 데이터 증강을 활성화하는 방법


How about changing '데이터 증강을 활성화하는 방법' ("how to enable data augmentation") to '데이터 증강법' ("data augmentation method")?

hyoyoung

You have organized this long document so that it reads well overall.

I am leaving a few corrections and suggestions; please take a look and apply the ones you can.

hyoyoung · 2024-09-11T15:39:25Z

recipes_source/recipes/tuning_guide.py

+# 는 각각 워커의 subprocess에서 비동기식 데이터 로딩과 데이터 증강을 지원합니다. 
+# ``DataLoader`` 의 num_worker 기본 설정은 ``num_worker=0`` 으로, 이는 데이터 로딩이 
+# 동기적으로 이루어지며 메인 프로세스에서 실행됨을 의미합니다. 결론은 메인 학습 프로세스는 데이터를 
+# 사용할 수 있을 때까지 기다려야 실행할 수 있습니다.


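"As a result" does not really mean '결론' ("the conclusion") here.
How about changing it to something like '따라서' ("therefore") or '결과적으로' ("as a result")?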

hyoyoung · 2024-09-11T15:44:40Z

recipes_source/recipes/tuning_guide.py

-# requirements enables increasing the batch size that can improve utilization.
+# 버퍼 체크포인트 저장은 모델 학습 중 메모리 용량 부담을 완화하기 위한 기법입니다. 역전파에서 앞부분의 
+# 변화도를 계산하기 위해 모든 계층의 입력을 저장하는 대신, 일부 계층의 입력만 저장하고 나머지는
+# 역전파 중에 재계산합니다. 메모리 요구 사항이 줄어들어 배치 크기를 증가시킬 수 있으며, 이는 활용도를 


Here, '활용 효율' ("utilization efficiency") seems a slightly better fit for "utilization".

hyoyoung · 2024-09-11T15:47:22Z

recipes_source/recipes/tuning_guide.py

-# up/down sampling and matrix-vector operations with small accumulation depth.
+# 체크포인트 저장할 대상은 신중하게 선택해야 합니다. 가장 좋은 방법은 재계산 비용이 적은 대규모 
+# 레이어의 출력을 저장하지 않는 것입니다. 예를 들어, 활성화 함수(예: ``ReLU`` , ``Sigmoid`` , 
+# ``Tanh`` ), up/down 샘플링, 그리고 적은 축적 뎁스를 가진 행렬-벡터 연산 등이 체크포인트 저장 


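For the English pattern "a, b and c", there is no need to render it as 'a, b, 그리고 c' ("a, b, and also c");
plain 'a, b, c' reads naturally in Korean.

'축적 뎁스' sounds a little awkward.
How about something like '작은 누적 깊이(accumulation depth)' ("small accumulation depth")?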
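
For readers following the two hunks above, the checkpointing technique they describe can be sketched roughly as follows. This is a minimal illustration, not code from the tutorial: the `Block` and `Net` classes, their sizes, and the `use_checkpoint` flag are hypothetical names chosen for this example. It assumes a PyTorch version in which `torch.utils.checkpoint.checkpoint` accepts `use_reentrant=False`.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical block: a linear layer followed by a cheap activation."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


class Net(nn.Module):
    """Hypothetical model that can optionally checkpoint its second block."""

    def __init__(self, dim: int = 8, use_checkpoint: bool = True) -> None:
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.block1 = Block(dim)
        self.block2 = Block(dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.block1(x)
        if self.use_checkpoint:
            # Intermediate activations of block2 are not stored;
            # they are recomputed during the backward pass.
            x = checkpoint(self.block2, x, use_reentrant=False)
        else:
            x = self.block2(x)
        return self.head(x)


model = Net()
loss = model(torch.randn(4, 8)).sum()
loss.backward()  # block2 runs a second time here to rebuild its activations
```

Checkpointing trades extra forward computation for lower activation memory, so the gradients should match those of the non-checkpointed model.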

hyoyoung · 2024-09-11T15:51:26Z

recipes_source/recipes/tuning_guide.py

 ###############################################################################
-# Match the order of layers in constructors and during the execution if using ``DistributedDataParallel(find_unused_parameters=True)``
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# ``DistributedDataParallel(find_unused_parameters=True)`` 를 사용할 때 모델 생성자와 실제 실행 중인 것의 레이어 순서를 일치시키는 방법


DistributedDataParallel(find_unused_parameters=True) 를 사용할 때 생성자와 실행 레이어 순서를 일치시키는 방법

Wouldn't this wording work as well?

hyoyoung · 2024-09-11T15:52:53Z

recipes_source/recipes/tuning_guide.py

+# 은 ``find_unused_parameters=True`` 와 함께 모델 생성자에서의 레이어와 파라미터 순서를 
+# 사용하여 ``DistributedDataParallel`` 변화도 all-reduce를 위한 버킷을 만듭니다. 
+# ``DistributedDataParallel`` 은 all-reduce를 역전파와 겹치게 수행합니다. 특정 버킷에 대한 
+# all-reduce는 주어진 버킷의 모든 파라미터에 대한 변화도가 모두 준비되었을 때 비동기적으로 트리거됩니다.


How about softening '트리거됩니다' ("is triggered") to a native expression such as '작동되다' ("is activated")?

hyoyoung · 2024-09-11T15:53:09Z

recipes_source/recipes/tuning_guide.py

+# 파라미터의 순서를 재조정할 필요가 없습니다.
+
+###############################################################################
+# 분산 설정에서 작업 로드 밸런싱하는 방법


'로드 밸런싱' ("load balancing") could be rendered with the native term '부하 분산'.

uddk6215

Overall, this looks like a good translation.

ohkingtaek · 2024-09-17T11:59:12Z

@jkworldchampion @hyoyoung @uddk6215 Thank you for the feedback! I have made further revisions based on the feedback and the translation best practices.

jason9865 · 2024-09-22T07:05:43Z

recipes_source/recipes/tuning_guide.py

+# 사용할 수 있을 때까지 기다려야 실행할 수 있습니다.
+#
+# ``num_workers > 0`` 으로 설정하면 비동기식 데이터 로딩과 학습과 데이터 로딩의 동시 처리가 
+# 가능해집니다. ``num_workers`` 값은 작업량, CPU, GPU, 학습 데이터의 위치에 따라 조정해야 


Changing the passive voice to the active voice would improve readability.
가능해집니다 ("becomes possible") -> 가능합니다 ("is possible")

jason9865 · 2024-09-22T07:07:02Z

recipes_source/recipes/tuning_guide.py

+#
+# ``DataLoader`` 는 ``pin_memory`` 인자를 받으며 기본값은 ``False`` 입니다. GPU를 
+# 사용하는 경우 ``pin_memory=True`` 로 설정하는 것이 좋습니다. 이는 ``DataLoader`` 가 
+# 고정된 메모리를 사용하게 되고, 호스트에서 GPU로 더 빠르고 비동기적인 메모리 복사를 가능하게 합니다.


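Changing the passive voice to the active voice would improve readability.

사용하게 되고 ("comes to use") -> 사용하고 ("uses")

비동기적인 메모리 복사를 가능하게 합니다 ("enables asynchronous memory copies") -> 비동기적으로 메모리를 복사합니다 ("copies memory asynchronously")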
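
For context, the `DataLoader` settings discussed in the hunks above (`num_workers` and `pin_memory`) might be used as in the sketch below. The `make_loader` helper and the toy tensors are hypothetical and exist only for illustration. Pinned memory is requested only when CUDA is available, and `non_blocking=True` is what lets the host-to-GPU copy run asynchronously.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers: int = 2) -> DataLoader:
    """Build a loader over a toy dataset (hypothetical helper for illustration)."""
    features = torch.randn(64, 3)
    labels = torch.randint(0, 2, (64,))
    return DataLoader(
        TensorDataset(features, labels),
        batch_size=16,
        # 0 loads data synchronously in the main process;
        # values above 0 load batches in worker subprocesses.
        num_workers=num_workers,
        # Pinned host memory only helps when copying to a GPU.
        pin_memory=torch.cuda.is_available(),
    )


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# num_workers=0 keeps this demo in a single process.
for batch_features, batch_labels in make_loader(num_workers=0):
    # With pinned memory, non_blocking=True allows an asynchronous copy.
    batch_features = batch_features.to(device, non_blocking=True)
    batch_labels = batch_labels.to(device, non_blocking=True)
```

On platforms that start worker processes with `spawn`, code that iterates a loader with `num_workers > 0` generally needs to sit under an `if __name__ == "__main__":` guard.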

jason9865 · 2024-09-22T07:14:33Z

recipes_source/recipes/tuning_guide.py

+# 연산들을 결합하여 최적화하는 방법
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~
+# 행렬에서 element-wise 덧셈, 곱셈 같은 연산과 `sin()` , `cos()` , `sigmoid()` 같은 수학 
+# 함수 등의 point-wise 연산들은 하나의 커널로 결합할 수 있습니다. 이러한 결합은 메모리 접근과 커널 


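This is a minor point, but

연산들 ("operations") -> 연산 ("operation")
As specified in the translation best practices and the KIGO standard style guide,
it would be better to render plural expressions in the singular, as above.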

jason9865 · 2024-09-22T07:29:11Z

recipes_source/recipes/tuning_guide.py

+# OpenMP는 병렬 계산 작업의 성능을 향상시키기 위해 사용됩니다.
+# ``OMP_NUM_THREADS`` 는 계산 속도를 높이는 가장 간단한 환경 변수입니다. 이는 OpenMP 계산에 
+# 사용되는 스레드 수를 결정합니다.
+# CPU 친화도 설정은 작업이 여러 코어에 분배되는 방식을 제어합니다. 이는 통신 오버헤드와 캐시 라인 


How about keeping "Affinity" as-is instead of '친화도'?

CPU 친화도 -> CPU Affinity
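
As a side note on the hunk above, `OMP_NUM_THREADS` is an ordinary environment variable, so it can be set from Python before any OpenMP-backed library is imported. The sketch below is only an illustration: the `configure_omp_threads` helper is a hypothetical name, and halving `os.cpu_count()` assumes two logical CPUs per physical core, which does not hold on every machine.

```python
import os


def configure_omp_threads(num_threads: int) -> str:
    """Set OMP_NUM_THREADS unless the user already exported it (hypothetical helper)."""
    # setdefault keeps a value that was already set in the shell.
    os.environ.setdefault("OMP_NUM_THREADS", str(num_threads))
    return os.environ["OMP_NUM_THREADS"]


# Assumption: two logical CPUs per physical core; adjust for the actual machine.
physical_core_guess = max(1, (os.cpu_count() or 1) // 2)
configure_omp_threads(physical_core_guess)

# Import OpenMP-backed libraries only after this point, because the
# variable is typically read when the OpenMP runtime initializes.
```

Using `setdefault` rather than plain assignment lets a value exported on the command line take precedence over the in-script default.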

jaeseong98

Thank you for all your hard work on this!

jaeseong98 · 2024-09-22T07:39:58Z

recipes_source/recipes/tuning_guide.py

+# 수행하고 (종종 가장 시간이 적게 걸리는 단계) 결과를 다시 메모리에 쓰는 과정이 필요합니다.
+#
+# 결합된 연산자를 사용하면 여러 point-wise 연산을 위해 단 하나의 커널만 실행되고, 데이터는 한 
+# 번만 로드되고 저장됩니다. 특히 이러한 효율적인 방법은 활성화 함수, 옵티마이저, 직접 수정한 RNN 셀 


Since line 95 just above renders "load" as '불러오다', how about also changing '로드되고' -> '불러오고' on line 99? The same applies to line 337!

jason9865 · 2024-09-22T07:41:22Z

Thank you for the hard work translating such a long document.
I have left a few comments, so please take a look!

hyoyoung · 2024-10-06T10:21:37Z

recipes_source/recipes/tuning_guide.py

 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# For small scale models or memory-bound models, such as DLRM, training on CPU is also a good choice. On a machine with multiple sockets, distributed training brings a high-efficient hardware resource usage to accelerate the training process. `Torch-ccl <https://github.com/intel/torch-ccl>`_, optimized with Intel(R) ``oneCCL`` (collective communications library) for efficient distributed deep learning training implementing such collectives like ``allreduce``, ``allgather``, ``alltoall``, implements PyTorch C10D ``ProcessGroup`` API and can be dynamically loaded as external ``ProcessGroup``. Upon optimizations implemented in PyTorch DDP module, ``torch-ccl`` accelerates communication operations. Beside the optimizations made to communication kernels, ``torch-ccl`` also features simultaneous computation-communication functionality.
+# DLRM과 같은 소규모 모델 또는 메모리에 바인딩 된 모델의 경우 CPU에서 학습하는 것도 좋은 선택입니다. 
+# 다중 소켓을 가진 머신에서는 분산 학습이 고효율의 하드웨어 자원 사용을 통해 학습 과정을 가속화합니다. 


여러 소켓을 가진 머신에서는, 분산 학습으로 고효율의 하드웨어 자원 사용하여 학습 과정을 가속할 수 있습니다.

How about a freer translation along these lines?

hyoyoung · 2024-10-06T10:22:43Z

recipes_source/recipes/tuning_guide.py

-#   to run convolutional networks with autotuner disabled to avoid the overhead
-#   associated with algorithm selection for each input size.
+#   를 참조하세요.
+# * 입력 크기가 매우 가변적인 경우처럼 드문 상황에서는, 각 입력 크기에 대해 알고리즘 선택과 관련된 


드문 상황에서, 예를 들면 입력 크기가 가변적인 경우,

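Rewording it like this ("in rare situations, for example when the input size is variable,") would make the context easier to follow.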

hyoyoung

good

ohkingtaek added 2 commits August 18, 2024 23:49

first commit for draft pr

662a848

Translate recipes_source/recipes/Tuning_guide.py

3d14e08

ohkingtaek marked this pull request as ready for review September 4, 2024 16:13

jkworldchampion reviewed Sep 7, 2024

hyoyoung requested changes Sep 11, 2024

uddk6215 approved these changes Sep 14, 2024

Revise recipes_source/recipes/tuning_guide.py per translation best practices and PR review

b0c8ee2

jason9865 reviewed Sep 22, 2024

jaeseong98 reviewed Sep 22, 2024

Revise recipes_source/recipes/tuning_guide.py per translation feedback

ff49f67

hyoyoung reviewed Oct 6, 2024

Revise recipes_source/recipes/tuning_guide.py per translation feedback

3d30a08

hyoyoung approved these changes Oct 14, 2024

hyoyoung merged commit 9b1ac85 into PyTorchKorea:master Oct 15, 2024


6 participants
