KEMBAR78
Fix SnippetsFilter Functional test by bjee19 · Pull Request #3751 · nginx/nginx-gateway-fabric · GitHub
Skip to content

Conversation

@bjee19
Copy link
Contributor

@bjee19 bjee19 commented Aug 15, 2025

Add a check to wait for the GRPCRoute to be ready and increase the request timeout to cover an issue regarding an nginx worker process taking an extended period of time to shut down.

Closes: #3753

@bjee19 bjee19 requested a review from a team as a code owner August 15, 2025 20:59
@bjee19 bjee19 changed the title Fix SnippetsFilter Functional test DRAFT: Fix SnippetsFilter Functional test Aug 15, 2025
@github-actions github-actions bot added bug Something isn't working tests Pull requests that update tests labels Aug 15, 2025
@codecov
Copy link

codecov bot commented Aug 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.99%. Comparing base (fc6aa9a) to head (bb32261).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3751      +/-   ##
==========================================
- Coverage   87.02%   86.99%   -0.04%     
==========================================
  Files         128      128              
  Lines       16404    16404              
  Branches       62       62              
==========================================
- Hits        14276    14271       -5     
- Misses       1953     1957       +4     
- Partials      175      176       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sarthyparty
sarthyparty previously approved these changes Aug 15, 2025
@sarthyparty sarthyparty dismissed their stale review August 18, 2025 14:28

it was draft

@bjee19 bjee19 force-pushed the bug/functional-tests-sf-fail branch from af17ad4 to 4469ac3 Compare August 18, 2025 23:30
@bjee19
Copy link
Contributor Author

bjee19 commented Aug 19, 2025

Adding flake attempts, not quite sure whats causing these 502 errors and traffic not flowing to the httproutes as seen here: https://github.com/nginx/nginx-gateway-fabric/actions/runs/17055138233/job/48355054192#step:17:729.

The "no live upstreams while connecting to upstream" is a common error we've seen in other tests so it could be related to that. This error also seems to be quite inconsistent, in the past 8 local runs and 12 (4 times per pipeline run, and i ran it 3 times) runs of a functional test in the pipeline it only occurred once.

@sjberman
Copy link
Collaborator

no live upstreams is indicative of a problem with the backend apps themselves, not nginx. It would be nice if we could diagnose why those might be having issues (like logs, for example).

@bjee19
Copy link
Contributor Author

bjee19 commented Aug 19, 2025

I guess in our case the backend apps are just nginxdemos/nginx-hello:plain-text which shouldn't really be facing issues, do you think its worth diagnosing them? Especially when its very infrequent? @sjberman

Actually, it looks like all the non-plus SF tests fail on the GRPC test case which should be solved by the waiting for apps to be ready check. But the Plus function test runs fail on the httproute not being able to send traffic. See https://github.com/nginx/nginx-gateway-fabric/actions/runs/17075769277/job/48417202770 for an example. I'll keep taking a look.

@sjberman
Copy link
Collaborator

@bjee19 Well there is clearly something going wrong here. If the apps themselves are having problems (or k8s is restarting the pods, or something else), it's causing our tests to be flaky, which is annoying. We do still have to ensure our infrastructure works. Can't ignore the potential for that issue just because it isn't a product issue. Or the other potential issue is that nginx has the wrong IP addresses for the endpoints. In that case, it could be our problem directly.

There are definitely cases where rerunning flaky tests is valid, but I also don't want us to hide issues.

@bjee19
Copy link
Contributor Author

bjee19 commented Aug 21, 2025

There are two errors/failing tests related to the snippets filter functional test.

 [FAIL] SnippetsFilter when SnippetsFilters are applied to the resources nginx directives are set properly for [It] GRPCRoute [functional, snippets-filter]
  /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/snippets_filter_test.go:130

This error has been occurring more frequently and will get fixed with the check to see if the GRPCRoute is ready.

  [FAIL] SnippetsFilter when SnippetsFilters are applied to the resources verify working traffic [It] should return a 200 response for HTTPRoute [functional, snippets-filter]
  /home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/tests/suite/snippets_filter_test.go:105

This error has been occurring less frequently and only occurs when using NGINX Plus. This error is caused when NGF updates the upstreams and in between sending the nginx configuration and sending an nginx plus api call to update the upstreams, there is a singular nginx worker process which fails to shut down when the processes are reloaded. This causes a hangup in the NGINX Agent code which waits for all the nginx worker processes to shutdown before sending back a successful configuration update. The NGINX Agent code blocks for a maximum of 30 seconds which is longer than our 10 second httprequest timeout.

This error is still under investigation of why a singular nginx worker process would take exactly 60 seconds to shut down while all the other worker processes take a few seconds. However, an interim fix is increasing the request timeout to 30 seconds. In testing, I saw that when the test would fail to pass in 0.5 seconds, it would instead pass in around ~27 seconds.

As a side note, this second error would not occur when running nginx in debug mode because Agent did not block on those worker processes because of a bug on their side.

@bjee19 bjee19 changed the title DRAFT: Fix SnippetsFilter Functional test Fix SnippetsFilter Functional test Aug 21, 2025
@bjee19 bjee19 force-pushed the bug/functional-tests-sf-fail branch from b76fd21 to bb32261 Compare August 21, 2025 18:22
@bjee19 bjee19 merged commit 8f6d24b into main Aug 21, 2025
65 of 66 checks passed
@bjee19 bjee19 deleted the bug/functional-tests-sf-fail branch August 21, 2025 19:25
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in NGINX Gateway Fabric Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working tests Pull requests that update tests

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Fix failing SnippetsFilter functional test

4 participants