KEMBAR78
Fix multi pages search for gh search by leudz · Pull Request #10767 · cli/cli · GitHub
Skip to content

Conversation

@leudz
Copy link
Contributor

@leudz leudz commented Apr 11, 2025

This fixes how gh search retrieves items when more than 100 items are requested and not a multiple of 100.

When using pagination, two variables are used: page and perPage. Pages can contain a maximum of 100 items.
The previous implementation tried to save the API some work by requesting less items for the last page.
But it didn't handle pagination correctly and ended up requesting some items a second time.

For example if 150 items are queried, the first page returns items 0->99.
Then it would try to query 50 items by asking for page 2 with 50 per page.
But this would return items 50->99 which isn't what we want.

This new version instead picks a fixed amount per page, in this case 100.
The first page returns items 0->99 and the second 100->199.
Then we discard items 150->199.

When less than 100 items are requested there is no change, we only request what we need.


In theory it's possible to both save the API some work and keep track of pagination but that would be more complex.

Fixes #9749

@leudz leudz temporarily deployed to cli-automation April 11, 2025 01:54 — with GitHub Actions Inactive
// otherwise add all the results.
itemsToAdd := min(len(page.Items), toRetrieve)

result.Total += itemsToAdd
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what this total is.
I assume it's the total number of elements returned.

It used to only return the last page's total which affect the pagination test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total is used in displaying the results for communicating how many results out of the total results are being shown.

For example, gh search code:

func displayResults(io *iostreams.IOStreams, results search.CodeResult) error {
cs := io.ColorScheme()
if io.IsStdoutTTY() {
fmt.Fprintf(io.Out, "\nShowing %d of %d results\n\n", len(results.Items), results.Total)
for i, code := range results.Items {
if i > 0 {
fmt.Fprint(io.Out, "\n")
}
fmt.Fprintf(io.Out, "%s %s\n", cs.Blue(code.Repository.FullName), cs.GreenBold(code.Path))
for _, match := range code.TextMatches {
lines := formatMatch(match.Fragment, match.Matches, io)
for _, line := range lines {
fmt.Fprintf(io.Out, "\t%s\n", strings.TrimSpace(line))
}
}
}
return nil
}
for _, code := range results.Items {
for _, match := range code.TextMatches {
lines := formatMatch(match.Fragment, match.Matches, io)
for _, line := range lines {
fmt.Fprintf(io.Out, "%s:%s: %s\n", cs.Blue(code.Repository.FullName), cs.GreenBold(code.Path), strings.TrimSpace(line))
}
}
}
return nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I have reverted this logic because it causes the X of Y results header to report --limit for Y instead of the total number of matching search results.

		result.Total = page.Total

Demo

Below, we see this 150-limited code search reporting exactly 150 total matches are found:

$ bin/gh search code "fmt.Fprintf" --repo cli/cli --limit 150

Showing 150 of 150 results

cli/cli script/build.go
	fmt.Fprintf(os.Stderr, "Don't know how to build task `%s`.\n", task)
	fmt.Fprintf(os.Stderr, "%s: building task `%s` failed.\n", self, task)

with original logic, we see 150 of 161 results which is the total number of matching search entries:

$ bin/gh search code "fmt.Fprintf" --repo cli/cli --limit 150

Showing 150 of 161 results

cli/cli script/build.go
	fmt.Fprintf(os.Stderr, "Don't know how to build task `%s`.\n", task)
	fmt.Fprintf(os.Stderr, "%s: building task `%s` failed.\n", self, task)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so it's the total matching the search query... sorry for the useless changes. Would it be worth add a small comment on the field?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a solid call out as it really isn't obvious, yeah 💯

reg.Register(firstReq, firstRes)
reg.Register(secondReq, secondRes)
},
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only 4 small changes in this tests.
per_page is 30, both in the header and the second request.

The total for the first and second responses have gone from 2 to 1.
This link to a previous review comment. I assume this total is the number of items in the response.
In that case, it has always been 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leudz : As the .Total logic changes needed to be reverted, there are going to be some changes to these tests as .Total is not the number of items in the page, but the total number of matches to the search query.

Additionally, I've made some enhancements around test table data initialization.

@leudz leudz marked this pull request as ready for review April 11, 2025 02:05
@leudz leudz requested a review from a team as a code owner April 11, 2025 02:05
@leudz leudz requested a review from andyfeller April 11, 2025 02:05
@leudz leudz temporarily deployed to cli-automation April 11, 2025 02:05 — with GitHub Actions Inactive
@cliAutomation cliAutomation added the external pull request originating outside of the CLI core team label Apr 11, 2025
@andyfeller
Copy link
Member

Thank you for your patience and efforts to improve the GitHub CLI, @leudz! ❤️ I'm working to provide some initial feedback today so we can get this shaped up and shipped out. 💪

This change restores the original logic of passing the search total count logic as is to the result.

Additionally, this undoes some of the contributor's formatting changes that increase the changed lines to review.
@andyfeller
Copy link
Member

Demo

After restoring the .Total logic, I wanted to manually confirm the experience relative to the GitHub REST Search API results.

gh search code

$ gh api "search/code?q=fmt.Fprintf+repo:cli/cli" -X GET --jq '.total_count'
161

$ gh search code fmt.Fprintf --repo cli/cli

Showing 30 of 161 results

gh search commits

$ gh api "search/commits?q=author:andyfeller+repo:cli/cli" -X GET --jq '.total_count'
335

$ gh search commits --repo cli/cli author:andyfeller      

Showing 30 of 335 commits

gh search issues

$ gh api "search/issues?q=author:andyfeller+repo:cli/cli+type:issue" -X GET --jq '.total_count'
39

$ gh search issues --repo cli/cli author:andyfeller                                            

Showing 30 of 39 issues

gh search prs

$ gh api "search/issues?q=author:andyfeller+repo:cli/cli+type:pr" -X GET --jq '.total_count'
51

$ gh search prs --repo cli/cli author:andyfeller                                       

Showing 30 of 51 pull requests

gh search repos

$ gh api "search/repositories?q=user:andyfeller" -X GET --jq '.total_count'
65

$ gh search repos --owner andyfeller

Showing 30 of 65 repositories

This commit is focused on fixing the `searcher` tests for a few reasons:

1. Correcting the `.Total` logic in the previous commit caused changes to tests to fail
2. Tests involving results that exceed the max per page have been improved with new initialize helper, allowing testing table scenarios to be self contained
3. Tests that stub JSON response payloads have been standardized on maps rather than GitHub type primitives (Repository, Issue, Commit, Code, etc)
4. Tests had some minor formatting changes to make them easier to understand and maintain
This commit moves the remaining searcher tests from using JSON marshaled types to using JSON responses for consistency.  There appears to be a weird JSON marshaling error with search.Repository that does not map `Name` field in the process.

Additionally, the test scenarios around pulling multiple pages beneath the total results have been updated to demonstrate that the REST API returns full pages in both of these cases, which is below the total number of results.
@andyfeller andyfeller requested a review from BagToad April 15, 2025 21:20
Comment on lines +600 to +637
httpStubs: func(reg *httpmock.Registry) {
firstReq := httpmock.QueryMatcher("GET", "search/repositories", url.Values{
"page": []string{"1"},
"per_page": []string{"100"},
"order": []string{"desc"},
"sort": []string{"stars"},
"q": []string{"keyword stars:>=5 topic:topic"},
})
firstRes := httpmock.JSONResponse(map[string]interface{}{
"incomplete_results": false,
"total_count": 287,
"items": initialize(0, 100, func(i int) interface{} {
return map[string]interface{}{
"name": fmt.Sprintf("name%d", i),
}
}),
})
firstRes = httpmock.WithHeader(firstRes, "Link", `<https://api.github.com/search/repositories?page=2&per_page=100&q=org%3Agithub>; rel="next"`)
secondReq := httpmock.QueryMatcher("GET", "search/repositories", url.Values{
"page": []string{"2"},
"per_page": []string{"100"},
"order": []string{"desc"},
"sort": []string{"stars"},
"q": []string{"keyword stars:>=5 topic:topic"},
})
secondRes := httpmock.JSONResponse(map[string]interface{}{
"incomplete_results": false,
"total_count": 287,
"items": initialize(100, 200, func(i int) interface{} {
return map[string]interface{}{
"name": fmt.Sprintf("name%d", i),
}
}),
})
reg.Register(firstReq, firstRes)
reg.Register(secondReq, secondRes)
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concern: originally, these tests relied upon the httpmock.Registry to JSON marshal the *Result types to JSON, however there is something about the search.Repository type where Name field was not being preserved in the marshaling.

Rather than having inconsistent test setups, I aligned the tests on capturing the JSON we expect from the REST API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing the loop here, it appears that search.Repository JSON marshaling is a problem because of this logic used for exporting gh search code as it also contains a repository field:

cli/pkg/search/result.go

Lines 356 to 385 in 5d8dbc0

func (repo Repository) ExportData(fields []string) map[string]interface{} {
v := reflect.ValueOf(repo)
data := map[string]interface{}{}
for _, f := range fields {
switch f {
case "license":
data[f] = map[string]interface{}{
"key": repo.License.Key,
"name": repo.License.Name,
"url": repo.License.URL,
}
case "owner":
data[f] = repo.Owner.ExportData()
default:
sf := fieldByName(v, f)
data[f] = sf.Interface()
}
}
return data
}
func (repo Repository) MarshalJSON() ([]byte, error) {
return json.Marshal(map[string]interface{}{
"id": repo.ID,
"nameWithOwner": repo.FullName,
"url": repo.URL,
"isPrivate": repo.IsPrivate,
"isFork": repo.IsFork,
})
}

Given this surprising behavior, the move away from using the Results type for response mocking makes more sense.

secondRes := httpmock.JSONResponse(map[string]interface{}{
"incomplete_results": false,
"total_count": 287,
"items": initialize(100, 200, func(i int) interface{} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: while discussing these tests with @jtmcg, Tyler made a call out how the REST API will return a full second page. He suggested initializing this to be a full page, which is closer to the behavior we will encounter in use.

},
},
{
name: "collect full and partial pages under total number of matching search results",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: after talking with @jtmcg, Tyler suggested renaming these test scenarios to explain they are ensuring these commands process REST responses and return a collection of full and partial pages to the user while still being beneath the total number of matching search results.

})
firstRes := httpmock.JSONResponse(map[string]interface{}{
"incomplete_results": false,
"total_count": 287,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: the REST search API returns the total number of matching search results on every page. In order to have a number stand out separate from the client-side limit, Tyler suggested 287, which is as good as any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very nice change 👍

@andyfeller
Copy link
Member

@leudz : Just because I did a bit more changes to this PR rather than opening up a ton of suggestions, I've asked a fellow maintainer to be the final PR reviewer as I'd value a 2nd set of eyes. 🙇 Thank you again for your patience!

andyfeller and others added 2 commits April 16, 2025 14:22
- update local variables to communicate what they are
- added docblock explaining search results populated
@leudz
Copy link
Contributor Author

leudz commented Apr 16, 2025

@andyfeller I've added a few comments for Total like we talked about.
With the doc on search you've added it's a bit less valuable but it might still avoid someone from missing it.

Copy link
Member

@BagToad BagToad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ✨ The problem and fix make sense, and this branch does fix the problem when I tested it out myself.

@andyfeller andyfeller merged commit 0ef0c42 into cli:trunk Apr 16, 2025
9 checks passed
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request May 10, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [cli/cli](https://github.com/cli/cli) | minor | `v2.69.0` -> `v2.72.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>cli/cli (cli/cli)</summary>

### [`v2.72.0`](https://github.com/cli/cli/releases/tag/v2.72.0): GitHub CLI 2.72.0

[Compare Source](cli/cli@v2.71.2...v2.72.0)

#### :accessibility: Accessibility public preview

This release marks the public preview of several accessibility improvements to the GitHub CLI that have been under development over the past year in partnership with our friends at [Charm](https://github.com/charmbracelet) including:

-   customizable and contrasting colors
-   non-interactive user input prompting
-   text-based spinners

These new experiences are captured in a new `gh a11y` help topic command, which goes into greater detail into the motivation behind each of them as well as opt-in configuration settings / environment variables.

We would like you to share your feedback and join us on this journey through one of [GitHub Accessibility feedback channels](https://accessibility.github.com/feedback)! 🙌

#### What's Changed

##### ✨ Features

-   Introduce `gh accessibility` help topic highlighting GitHub CLI accessibility experiences by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10890
-   \[gh pr view] Support `closingIssuesReferences` JSON field by [@&#8203;iamazeem](https://github.com/iamazeem) in cli/cli#10544

##### 🐛 Fixes

-   Fix expected error output of `TestRepo/repo-set-default` by [@&#8203;aconsuegra](https://github.com/aconsuegra) in cli/cli#10884
-   Ensure accessible password and auth token prompters disable echo mode by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10885
-   Fix: Accessible multiselect prompt respects default selections by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10901

#### New Contributors

-   [@&#8203;aconsuegra](https://github.com/aconsuegra) made their first contribution in cli/cli#10884

**Full Changelog**: cli/cli@v2.71.2...v2.72.0

### [`v2.71.2`](https://github.com/cli/cli/releases/tag/v2.71.2): GitHub CLI 2.71.2

[Compare Source](cli/cli@v2.71.1...v2.71.2)

#### What's Changed

-   Fix pr create when push.default tracking and no merge ref by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10863

**Full Changelog**: cli/cli@v2.71.1...v2.71.2

### [`v2.71.1`](https://github.com/cli/cli/releases/tag/v2.71.1): GitHub CLI 2.71.1

[Compare Source](cli/cli@v2.71.0...v2.71.1)

#### What's Changed

-   Fix pr create when branch name contains slashes by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10859

**Full Changelog**: cli/cli@v2.71.0...v2.71.1

### [`v2.71.0`](https://github.com/cli/cli/releases/tag/v2.71.0): GitHub CLI 2.71.0

[Compare Source](cli/cli@v2.70.0...v2.71.0)

#### What's Changed

##### ✨ Features

-   `gh pr create`: Support Git's `@{push}` revision syntax for determining head ref by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10513
-   Introduce option to opt-out of spinners by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10773
-   Update configuration support for accessible colors by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10820
-   `gh config`: add config settings for accessible prompter and disabling spinner by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10846

##### 🐛 Fixes

-   Fix multi pages search for gh search by [@&#8203;leudz](https://github.com/leudz) in cli/cli#10767
-   Fix: `project` commands use shared progress indicator by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10817
-   Issue commands should parse args early by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10811
-   Feature detect v1 projects on `issue view` by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10813
-   Feature detect v1 projects on non web-mode `issue create` by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10815
-   Feature detect v1 projects on web mode issue create by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10818
-   Feature detect v1 projects on issue edit by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10819

##### 📚 Docs & Chores

-   Refactor Sigstore verifier logic by [@&#8203;malancas](https://github.com/malancas) in cli/cli#10750

##### :dependabot: Dependencies

-   chore(deps): bump github.com/sigstore/sigstore-go from 0.7.1 to 0.7.2 by [@&#8203;dependabot](https://github.com/dependabot) in cli/cli#10787
-   Bump google.golang.org/grpc from 1.71.0 to 1.71.1 by [@&#8203;dependabot](https://github.com/dependabot) in cli/cli#10758

#### New Contributors

-   [@&#8203;leudz](https://github.com/leudz) made their first contribution in cli/cli#10767

**Full Changelog**: cli/cli@v2.70.0...v2.71.0

### [`v2.70.0`](https://github.com/cli/cli/releases/tag/v2.70.0): GitHub CLI 2.70.0

[Compare Source](cli/cli@v2.69.0...v2.70.0)

#### Accessibility

This release contains dark shipped changes that are part of a larger GitHub CLI accessibility preview still under development.  More information about these will be announced later this month including various channels to work with GitHub and GitHub CLI maintainers on shaping these experiences.

##### Ensure table headers are thematically contrasting

[#&#8203;8292](cli/cli#8292) is a long time issue where table headers were difficult to see in terminals with light background.  Ahead of the aforementioned preview, `v2.70.0` has shipped changes that improve the out-of-the-box experience based on terminal background detection.

The following screenshots demonstrate the Mac Terminal using the Basic profile, which responds to user's appearance preferences:

<img width="1512" alt="Screenshot of gh repo list in light background terminal" src="https://github.com/user-attachments/assets/87413dde-eec8-43eb-9c16-dc84f8249ddf" />

<img width="1512" alt="Screenshot of gh repo list in dark background terminal" src="https://github.com/user-attachments/assets/7430b42c-7267-402b-b565-a296beb4d5ea" />

For more information including demos from various official distributions, see [#&#8203;10649](cli/cli#10649).

#### What's Changed

##### ✨ Features

-   Update go-gh and document available sprig funcs by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10680
-   Introducing experimental support for rendering markdown with customizable, accessible colors by [@&#8203;andyfeller](https://github.com/andyfeller) [@&#8203;jtmcg](https://github.com/jtmcg) in cli/cli#10680
-   Ensure table datetime columns have thematic, customizable muted text by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10709
-   Ensure table headers are thematically contrasting by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10649
-   Introduce configuration setting for displaying issue and pull request labels in rich truecolor by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10720
-   Ensure muted text is thematic and customizable by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10737
-   \[gh repo create] Show host name in repo creation prompts by [@&#8203;iamazeem](https://github.com/iamazeem) in cli/cli#10516
-   Introduce accessible prompter for screen readers (preview) by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10710

##### 🐛 Fixes

-   `run list`: do not fail on organization/enterprise ruleset imposed workflows by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10660
-   Implement safeguard for `gh alias delete` test, prevent wiping out GitHub CLI configuration by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#10683
-   Pin third party actions to commit sha by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#10731
-   Fallback to job run logs when step logs are missing by [@&#8203;babakks](https://github.com/babakks) in cli/cli#10740
-   \[gh ext] Fix `GitKind` extension directory path by [@&#8203;iamazeem](https://github.com/iamazeem) in cli/cli#10609
-   Fix job log resolution to skip legacy logs in favour of normal/new ones by [@&#8203;babakks](https://github.com/babakks) in cli/cli#10769

##### 📚 Docs & Chores

-   `./script/sign` cleanup by [@&#8203;iamazeem](https://github.com/iamazeem) in cli/cli#10599
-   Fix typos in CONTRIBUTING.md by [@&#8203;rylwin](https://github.com/rylwin) in cli/cli#10657
-   Improve `gh at verify --help`, document json output by [@&#8203;phillmv](https://github.com/phillmv) in cli/cli#10685
-   Acceptance test issue/pr create/edit with project by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10707
-   Escape dots in regexp pattern in `README.md` by [@&#8203;babakks](https://github.com/babakks) in cli/cli#10742
-   Simplify cosign verification example by not using a regex. by [@&#8203;kommendorkapten](https://github.com/kommendorkapten) in cli/cli#10759
-   Document UNKNOWN STEP in run view by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#10770

##### :dependabot: Dependencies

-   Update github.com/sigstore/sigstore-go to 0.7.1 and fix breaking function change by [@&#8203;malancas](https://github.com/malancas) in cli/cli#10749

#### New Contributors

-   [@&#8203;rylwin](https://github.com/rylwin) made their first contribution in cli/cli#10657

**Full Changelog**: cli/cli@v2.69.0...v2.70.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4yNTkuMCIsInVwZGF0ZWRJblZlciI6IjM5LjI2NC4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external pull request originating outside of the CLI core team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gh search commands returns duplicate data when --limit greater than 100 but not a multiple of 100

4 participants