KEMBAR78
Perf. improvement for `CuStateVecCircuitSimulator::observe` by 1tnguyen · Pull Request #1002 · NVIDIA/cuda-quantum · GitHub
Skip to content

Conversation

@1tnguyen
Copy link
Collaborator

@1tnguyen 1tnguyen commented Dec 5, 2023

Description

Currently, we don't use the batched Pauli expectation API for multi-term spin_op expectation calculation.

This API is substantially faster than applying change-of-basis gates then observing <ZZZ..ZZ> for each term.

Hence, this PR modifies custatevec backend to use batched Pauli expectation API by default for deterministic cudaq::observe (no shots)

(1) Change CircuitSimulator::observe to return observe_result, which can encapsulate the final expectation value as well as individual terms' expectation values.

(2) Use custatevec batched API and populate the observe_result accordingly.

(3) Add a backward compatibility test for the term-by-term mode.

See also: #1477, #1430

This mode is only activated with env. variable
CUDAQ_OBSERVE_FROM_SAMPLING=OFF/0/FALSE.

Currently, we didn't use the batched Pauli APIs and skipped it entirely
for fp-32.

Modify it to use batched Pauli expectation APIs for perf.
@github-actions
Copy link

github-actions bot commented Dec 5, 2023

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Dec 5, 2023
Copy link
Collaborator

@schweitzpgi schweitzpgi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Collaborator

@bmhowe23 bmhowe23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we need some docs for the CUDAQ_OBSERVE_FROM_SAMPLING environment variable.

@amccaskey
Copy link
Collaborator

@1tnguyen seems like we should also update CircuitSimulator::shouldObserveFromSampling() to return false by default. What do you think? Should this new approach be the default? Is it always faster?

@1tnguyen
Copy link
Collaborator Author

@1tnguyen seems like we should also update CircuitSimulator::shouldObserveFromSampling() to return false by default. What do you think? Should this new approach be the default? Is it always faster?

Yes, it's always faster: a lot faster for large Hamiltonians since the time to run change-of-basis gates and then uncompute them is not insignificant.

There is an issue with making it the default. Currently, we expect that cudaq::observe does return term-by-term results like in this test.
Hence, we need to make an additional change to make CircuitSimulator::observe able to return term-by-term results if it can (like in this case, we perform batch exp-val computation and do have all the individual results). Then, we can make CuStateVecCircuitSimulator::shouldObserveFromSampling return true by default.

What do you think?

@amccaskey
Copy link
Collaborator

@1tnguyen seems like we should also update CircuitSimulator::shouldObserveFromSampling() to return false by default. What do you think? Should this new approach be the default? Is it always faster?

Yes, it's always faster: a lot faster for large Hamiltonians since the time to run change-of-basis gates and then uncompute them is not insignificant.

There is an issue with making it the default. Currently, we expect that cudaq::observe does return term-by-term results like in this test. Hence, we need to make an additional change to make CircuitSimulator::observe able to return term-by-term results if it can (like in this case, we perform batch exp-val computation and do have all the individual results). Then, we can make CuStateVecCircuitSimulator::shouldObserveFromSampling return true by default.

What do you think?

I say go for it - make that change here in this PR.

@github-actions
Copy link

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Jan 30, 2024
This allows a fast batched observation while still allowing
per-term data to be progagated up the stack.
- Use Shots > 0 condition to know if sampling is required.

- Adjust base shouldObserveFromSampling to set default if needed. Plus,
  the environment variable can be used to both turn the feature on or off.
  e.g., for benchmarking purposes.
@github-actions
Copy link

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Mar 18, 2024
@github-actions
Copy link

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Mar 21, 2024
@1tnguyen
Copy link
Collaborator Author

This PR is ready for (re-)review. Ping reviewers... :-)

@github-actions
Copy link

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Mar 21, 2024
Copy link
Collaborator

@bmhowe23 bmhowe23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a very nice optimization. Are all the interface changes invisible to the end user?

Copy link
Collaborator

@bmhowe23 bmhowe23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@github-actions
Copy link

CUDA Quantum Docs Bot: A preview of the documentation can be found here.

github-actions bot pushed a commit that referenced this pull request Mar 21, 2024
@1tnguyen 1tnguyen merged commit c1f0af9 into NVIDIA:main Mar 21, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 21, 2024
@bettinaheim bettinaheim added this to the release 0.7.1 milestone Apr 17, 2024
@bettinaheim bettinaheim added the enhancement New feature or request label Apr 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

enhancement New feature or request performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants