-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
gh-126835: make CFG optimizer skip over NOP's when looking for const sequence construction #129703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Good job. This problem also blocks #128802 where's NOP isn't taken into account in between. |
|
One thing to keep in mind about I don't see a problem related to this here, just want you to be aware of this. |
|
I don't think adding complexity to the individual optimizations is the way to go. Take the constant |
The issue here is skipping NOPs that are not going to be removed (because source locations), but they are in the middle of a const sequence construction. |
Current Python version does not fold it to constant
Do you suggest this is something we will be adding? |
|
The example I give could be folded using AST transforms, even if it happens not to be. @iritkatriel what's your take on this? |
|
But this particular logic does not interfere with potential control flow simplification. We just fold inside individual basic blocks. Sometimes multiple such foldings happen together and we need to take into account intermediate |
We have some of that in |
|
Oh, turns out @iritkatriel merged a1417b2, so tests are failing because |
2fe4e7d to
5866faa
Compare
|
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks, Yan! Just to make sure there are no refleaks, I will run the test suite locally in huntrefleak mode.
is_constant_sequence to take NOP's into accountCo-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Results: == Tests result: SUCCESS ==
22 tests skipped:
test.test_asyncio.test_windows_events
test.test_asyncio.test_windows_utils test_android test_apple
test_devpoll test_free_threading test_idle test_kqueue
test_launcher test_msvcrt test_smtpnet test_ssl test_startfile
test_tcl test_tkinter test_ttk test_ttk_textonly test_turtle
test_winapi test_winconsoleio test_winreg test_wmi
8 tests skipped (resource denied):
test_curses test_peg_generator test_pyrepl test_socketserver
test_urllib2net test_urllibnet test_winsound test_zipfile64
454 tests OK.
Total duration: 13 min 49 sec
Total tests: run=44,920 skipped=2,362
Total test files: run=476/484 skipped=22 resource_denied=8
Result: SUCCESS |
We need to account for optimizations interfering with each other when moving more and more AST optimizations to CFG. Currently
is_constant_sequence& (fold_tuple_on_constants&optimize_if_const_list_or_setthat use it) do not take into accountNOPs in between. This becomes problematic when there are several optimizations performed one after another as theyNOPout unused instructions. Example:This expression will trigger subscript & binop foldings. With current
is_constant_sequenceimplementation, resulting instruction sequence cannot be optimized correctly. Here is final dis:1 RESUME 0 2 LOAD_SMALL_INT 1 LOAD_CONST 1 ((3,)) BUILD_TUPLE 2 LOAD_SMALL_INT 1 BINARY_SUBSCR LOAD_SMALL_INT 0 BINARY_SUBSCR RETURN_VALUEWhen correct sequence should be:
1 RESUME 0 2 LOAD_SMALL_INT 3 RETURN_VALUEHere is basic block dump in a state when
BUILD_TUPLEfails to be optimized byfold_tuple_on_constants:As you can see,
BUILD_TUPLE 2has 2 consts before it but they are separated byNOPs due to previous binop folding. Obvious solution would be to removeNOPs on every iteration inoptimize_basic_blockwhich makes sense, but we cannot do that because we would be changing basic block size while iterating over it which breaks everything. So a different approach should be implemented. This PR presents one of the possible approaches to prepare us for the next optimizations migration to CFG. My main concern is complexity & readability as I think they suffer a bit from it. Another cleaner way I see is to usePyMem_Mallocto store indexes of corresponding instructions which seems cleaner but should be discussed.cc @Eclips4 @tomasr8 @iritkatriel