remove text nlp collection by dimapihtar · Pull Request #14110 · NVIDIA-NeMo/NeMo · GitHub

Conversation

@dimapihtar
Collaborator

@dimapihtar dimapihtar commented Jul 2, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Removes the following parts of the NLP collection, along with their examples: text_normalization_as_tagging, text_classification, text2sparql, spellchecking_asr_customization, duplex_text_normalization.
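Downstream code that still imports one of these parts will fail after this change. The following is a minimal sketch of how a project might check for that, not something taken from this PR. The helper name `missing_modules` is hypothetical, and the dotted paths in `REMOVED_PARTS` assume the parts lived under `nemo.collections.nlp.models`, which the PR description does not state; adjust them to your installation.

```python
from importlib.util import find_spec

# Assumption: the removed parts were importable under this prefix.
# The PR description only lists the bare names.
ASSUMED_PREFIX = "nemo.collections.nlp.models"
REMOVED_PARTS = [
    "text_normalization_as_tagging",
    "text_classification",
    "text2sparql",
    "spellchecking_asr_customization",
    "duplex_text_normalization",
]


def missing_modules(dotted_names):
    """Return the dotted module names that cannot be located."""
    missing = []
    for name in dotted_names:
        try:
            # find_spec imports parent packages, so a missing parent
            # raises ModuleNotFoundError instead of returning None.
            spec = find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Calling `missing_modules(f"{ASSUMED_PREFIX}.{part}" for part in REMOVED_PARTS)` in an environment built from a revision that includes this PR should report every part whose path assumption is correct. Note that `find_spec` imports the parent packages, which can be slow for a large library.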

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
@github-actions github-actions bot added the NLP label Jul 2, 2025
dimapihtar and others added 4 commits July 2, 2025 10:51
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
…NeMo into dpykhtar/remove_text_nlp

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
from nemo.collections.nlp.data.data_utils import *
from nemo.collections.nlp.data.entity_linking.entity_linking_dataset import EntityLinkingDataset
from nemo.collections.nlp.data.information_retrieval.information_retrieval_dataset import (
from nemo.collections.nlp.data.data_utils import * # noqa: F401

Check notice

Code scanning / CodeQL

'import *' may pollute namespace Note

Import pollutes the enclosing namespace, as the imported module
nemo.collections.nlp.data.data_utils
does not define '__all__'.
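To make the notice concrete, here is a small sketch of what a star import binds with and without `__all__`. It builds a throwaway in-memory module; `demo_utils` and its attributes are hypothetical names chosen for illustration and have nothing to do with the real `data_utils` module.

```python
import sys
import types


def star_import(module_name: str) -> set:
    """Return the names that `from module_name import *` would bind."""
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return set(namespace)


# Hypothetical module registered in memory so it can be imported by name.
demo = types.ModuleType("demo_utils")
demo.normalize = lambda text: text.lower()
demo.helper_cache = {}
demo._private = object()
sys.modules["demo_utils"] = demo

# Without __all__: every name not starting with an underscore is bound.
without_all = star_import("demo_utils")

# With __all__: only the declared names are bound.
demo.__all__ = ["normalize"]
with_all = star_import("demo_utils")
```

Here `without_all` contains both `normalize` and `helper_cache`, while `with_all` contains only `normalize`. Defining `__all__` in the imported module is therefore an alternative to replacing the star import, if the module's public surface is known.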

Copilot Autofix

AI 4 months ago

To fix the issue, replace the from nemo.collections.nlp.data.data_utils import * statement with explicit imports of the specific names required from the data_utils module. This ensures that only the necessary names are imported, avoiding namespace pollution.

Steps:

  1. Identify the specific names used from the data_utils module in the current file or elsewhere in the codebase.
  2. Replace the import * statement with explicit imports of those names.
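Step 1 can be approximated mechanically. The sketch below scans a source file for references to a set of candidate names using the standard `ast` module; the function name `names_used` is hypothetical, and the candidate set is assumed to come from inspecting the star-imported module.

```python
import ast


def names_used(source: str, candidates: set) -> list:
    """Return the candidate names that appear as identifiers in source."""
    tree = ast.parse(source)
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id in candidates
    }
    return sorted(used)
```

For example, `names_used("x = normalize(text)\n", {"normalize", "helper"})` reports only `normalize`. This is a heuristic: an `__init__.py` typically uses a star import to re-export names for other modules, so a scan of the file itself may find nothing, and the rest of the codebase has to be searched as well.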

Suggested changeset 1
nemo/collections/nlp/data/__init__.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/nlp/data/__init__.py b/nemo/collections/nlp/data/__init__.py
--- a/nemo/collections/nlp/data/__init__.py
+++ b/nemo/collections/nlp/data/__init__.py
@@ -14,3 +14,8 @@
 
-from nemo.collections.nlp.data.data_utils import *  # noqa: F401
+from nemo.collections.nlp.data.data_utils import (  # noqa: F401
+    function_name_1,
+    function_name_2,
+    class_name_1,
+    class_name_2,
+)
 from nemo.collections.nlp.data.entity_linking.entity_linking_dataset import EntityLinkingDataset  # noqa: F401
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
dimapihtar and others added 12 commits July 2, 2025 04:20
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
@github-actions
Contributor

github-actions bot commented Jul 2, 2025

[🤖]: Hi @dimapihtar 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@github-actions github-actions bot removed the Run CICD label Jul 2, 2025
dimapihtar and others added 2 commits July 3, 2025 04:40
yaoyu-33 previously approved these changes Jul 3, 2025
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
@github-actions
Contributor

github-actions bot commented Jul 5, 2025

[🤖]: Hi @dimapihtar 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@dimapihtar dimapihtar requested a review from yaoyu-33 July 6, 2025 12:18
@dimapihtar dimapihtar merged commit c67474d into main Jul 8, 2025
197 checks passed
@dimapihtar dimapihtar deleted the dpykhtar/remove_text_nlp branch July 8, 2025 16:40
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Jul 23, 2025
* remove text nlp collection

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix style

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* remove duplex_text

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix imports

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* remove spellchecking

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix import

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix style

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix import

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* remove examples

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: Amir Hussein <amhussein@nvidia.com>
AmirHussein96 pushed a commit to AmirHussein96/NeMo that referenced this pull request Aug 5, 2025
nasretdinovr pushed a commit to nasretdinovr/NeMo that referenced this pull request Aug 8, 2025
guyueh1 pushed a commit to guyueh1/NeMo that referenced this pull request Aug 25, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>

3 participants