Fix SnippetsFilter Functional test #3751
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #3751      +/-   ##
==========================================
- Coverage   87.02%   86.99%   -0.04%
==========================================
  Files         128      128
  Lines       16404    16404
  Branches       62       62
==========================================
- Hits        14276    14271       -5
- Misses       1953     1957       +4
- Partials      175      176       +1

☔ View full report in Codecov by Sentry.
Adding flake attempts; not quite sure what's causing these 502 errors and the traffic not flowing to the HTTPRoutes, as seen here: https://github.com/nginx/nginx-gateway-fabric/actions/runs/17055138233/job/48355054192#step:17:729. The "no live upstreams while connecting to upstream" message is a common error we've seen in other tests, so it could be related to that. The failure also seems to be quite inconsistent: across the past 8 local runs and 12 pipeline runs (the functional test runs 4 times per pipeline run, and I ran the pipeline 3 times), it only occurred once.
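For illustration, here is a minimal sketch of how flake retries can be attached to a spec, assuming the suite uses Ginkgo v2 (where `FlakeAttempts` is a built-in decorator); the test URL and spec wording are hypothetical placeholders, not the repo's actual test code:

```go
package functional_test

import (
	"net/http"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("SnippetsFilter traffic", func() {
	// FlakeAttempts(3) re-runs this spec up to 3 times before reporting a failure,
	// papering over intermittent 502s while the root cause is investigated.
	It("sends traffic through the HTTPRoute", FlakeAttempts(3), func() {
		client := &http.Client{Timeout: 10 * time.Second}
		resp, err := client.Get("http://cafe.example.com/coffee") // hypothetical test URL
		Expect(err).ToNot(HaveOccurred())
		defer resp.Body.Close()
		Expect(resp.StatusCode).To(Equal(http.StatusOK))
	})
})
```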
Actually, it looks like all the non-Plus SF tests fail on the GRPC test case, which should be solved by the wait-for-apps-to-be-ready check. But the Plus functional test runs fail on the HTTPRoute not being able to send traffic. See https://github.com/nginx/nginx-gateway-fabric/actions/runs/17075769277/job/48417202770 for an example. I'll keep taking a look.
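As a rough sketch of the kind of "wait for the backend apps to be ready" check mentioned above, assuming a controller-runtime client and the apps/v1 API; the function name, polling interval, and timeout are assumptions for illustration, not the repo's actual helper:

```go
package functional_test

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForAppsReady polls until the backend Deployment reports all replicas ready,
// so traffic tests don't start before the app can actually serve requests.
func waitForAppsReady(ctx context.Context, k8sClient client.Client, dep types.NamespacedName) error {
	return wait.PollUntilContextTimeout(ctx, 1*time.Second, 3*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			var d appsv1.Deployment
			if err := k8sClient.Get(ctx, dep, &d); err != nil {
				return false, nil // Deployment may not exist yet; keep polling
			}
			return d.Spec.Replicas != nil && d.Status.ReadyReplicas == *d.Spec.Replicas, nil
		},
	)
}
```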
@bjee19 Well, there is clearly something going wrong here. If the apps themselves are having problems (or k8s is restarting the pods, or something else), it's causing our tests to be flaky, which is annoying. We do still have to ensure our infrastructure works; we can't ignore that potential issue just because it isn't a product issue. The other potential issue is that nginx has the wrong IP addresses for the endpoints, in which case it could be our problem directly. There are definitely cases where rerunning flaky tests is valid, but I also don't want us to hide issues.
There are two errors/failing tests related to the SnippetsFilter functional test.

The first error has been occurring more frequently and will be fixed by the check that waits for the GRPCRoute to be ready.

The second error has been occurring less frequently and only occurs when using NGINX Plus. It happens when NGF updates the upstreams: in between sending the nginx configuration and sending the NGINX Plus API call to update the upstreams, a single nginx worker process fails to shut down when the processes are reloaded. This causes a hang in the NGINX Agent code, which waits for all nginx worker processes to shut down before reporting a successful configuration update. The Agent code blocks for a maximum of 30 seconds, which is longer than our 10-second HTTP request timeout. It is still under investigation why a single nginx worker process takes exactly 60 seconds to shut down while all the other worker processes take a few seconds; an interim fix is to increase the request timeout to 30 seconds. In testing, I saw that when the test hit this issue, a request that would normally complete in about 0.5 seconds instead completed in around 27 seconds, which now passes within the 30-second timeout. As a side note, the second error does not occur when running nginx in debug mode, because Agent did not block on those worker processes due to a bug on their side.
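The GRPCRoute readiness check could look roughly like the following, assuming the Gateway API v1 types and a controller-runtime client; the function name and polling parameters are assumptions, and the repo's actual check may inspect different status conditions:

```go
package functional_test

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// waitForGRPCRouteAccepted polls until the GRPCRoute reports an Accepted condition
// on at least one parent Gateway, i.e. nginx should be ready to route to it.
func waitForGRPCRouteAccepted(ctx context.Context, k8sClient client.Client, name types.NamespacedName) error {
	return wait.PollUntilContextTimeout(ctx, 1*time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			var route gatewayv1.GRPCRoute
			if err := k8sClient.Get(ctx, name, &route); err != nil {
				return false, nil // route may not be created yet; keep polling
			}
			for _, parent := range route.Status.Parents {
				for _, cond := range parent.Conditions {
					if cond.Type == string(gatewayv1.RouteConditionAccepted) &&
						cond.Status == metav1.ConditionTrue {
						return true, nil
					}
				}
			}
			return false, nil
		},
	)
}
```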
Add a check to wait for the GRPCRoute to be ready, and increase the request timeout to cover an issue where an nginx worker process takes an extended period of time to shut down.
Closes: #3753
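For reference, the interim timeout change amounts to something like the following on the test-client side; this is a sketch with placeholder names, assuming a plain net/http client is used for the requests:

```go
package functional_test

import (
	"net/http"
	"time"
)

// Previously 10 * time.Second; raised so that a reload blocked by a slow-to-exit
// worker process (Agent waits up to ~30s) no longer fails the request outright.
const requestTimeout = 30 * time.Second

var httpClient = &http.Client{Timeout: requestTimeout}
```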