KEMBAR78
implement closed-notebook search by andreamah · Pull Request #174287 · microsoft/vscode · GitHub
Skip to content

Conversation

@andreamah
Copy link
Contributor

@andreamah andreamah commented Feb 13, 2023

Fixes #173726

An introductory PR for closed-notebook search. This only searches inputs in closed notebooks, and might have a few extra bugs with opening at the correct position, but it is behind an experimental flag (off by default for now).

@andreamah andreamah changed the title deserialize script to measure perf implement closed-notebook search Jun 21, 2023
}


setCellModel(cell: ICellViewModel) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the model for the cell will be CellSearchModel until it's opened. Then, you can set the proper ICellViewModel. CellSearchModel just has simple info from deserialization like inputs.

const fileMatch = cellMatches.length > 0 ? {
resource: uri, cellResults: cellMatches
} : null;
results.set(uri, fileMatch);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't create a race condition, since all URIs are unique

@andreamah andreamah requested review from rebornix and roblourens June 23, 2023 16:52
@andreamah andreamah marked this pull request as ready for review June 23, 2023 16:55
@vscodenpa vscodenpa added this to the June 2023 milestone Jun 23, 2023
@andreamah andreamah modified the milestones: June 2023, July 2023 Jun 27, 2023
Copy link
Member

@roblourens roblourens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks like the right approach to me, nice work

const uris = new ResourceSet(result, uri => this.uriIdentityService.extUri.getComparisonKey(uri));
return Array.from(uris.keys());
}
private async getClosedNotebookResults(textQuery: ITextQuery, scannedFiles: ResourceSet, token: CancellationToken): Promise<{ results: ResourceMap<IFileMatchWithCells | null>; limitHit: boolean }> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a code organization note, I feel like we should try to move code that does the search out of SearchModel and into some service (SearchService or a NotebookSearchService?), and try to keep SearchModel as just a model for the results. We were already doing the local file update from here since forever which maybe we should have avoided. This is fine for now but something to think about

filePattern: include,
folderQueries: textQuery.folderQueries
};
return this.searchService.fileSearch(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many file searches does this trigger? One per notebook type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply the max result count by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, yes, it is once per notebook. Each of them provide their own glob pattern. I was originally planning on concatenating all the globs into one glob using brace expansion, but ripgrep doesn't support nested braces in glob patterns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there's a smart way to parse the globs and build one single pattern than matches the others

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most straightforward way of doing this would likely be with using braces ({ globA, globB, globC }), but if the individual globs have braces in them, ripgrep won't support this because it doesn't support nested alternate groups.
image

We could perform brace expansion, but I was thinking that it'd be an optimization for later.


const start = Date.now();

const filesToScan = await this.runFileQueries(includes, token, textQuery);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better if we could stream file results back, alas file search does not support it. But you could structure it so that you can handle results from each query as they complete, rather than wait for all queries to finish

};
}

async notebookSearch(query: ITextQuery, token: CancellationToken, onProgress?: (result: ISearchProgressItem) => void): Promise<{ completeData: ISearchComplete; scannedFiles: ResourceSet }> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the notebook search could be done in parallel with the basic text search

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that'd be ideal. I think there can be a lot more parallel code than what's there now, but we also need to have a list of open notebooks that the basic text search shouldn't find results for. Either that, or it does the whole search in parallel and then we dispose of results we don't need. I'll have to see what performs better.

let closedResults = undefined;
if (experimentalNotebooksEnabled) {
const locallyScannedNotebookFiles = new ResourceSet([...localResults.results.keys()], uri => this.uriIdentityService.extUri.getComparisonKey(uri));
closedResults = await this.getClosedNotebookResults(query, locallyScannedNotebookFiles, token);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can closed notebooks be searched in parallel with open notebooks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it currently needs a list of notebooks that are already open, but that shouldn't be too bad to provide ahead of time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies to this as #174287 (comment)

const deserializedNotebooks = new ResourceMap<NotebookTextModel>();
const textModels = this.notebookService.getNotebookTextModels();
for (const notebook of textModels) {
deserializedNotebooks.set(notebook.uri, notebook);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these notebooks that are already open? But won't you find them from the local notebook search?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are notebooks that have already been deserialized and the notebook side has cached the deserialized info.

this._entries = new ResourceMap<INotebookDataEditInfo>(uri => this.uriIdentityService.extUri.getComparisonKey(uri));
}

async getSerializer(notebookUri: URI): Promise<INotebookSerializer | undefined> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be private

}

async getNotebookData(notebookUri: URI): Promise<NotebookData> {
const mTime = (await this.fileService.stat(notebookUri)).mtime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of back and forth calls to the EH for this code to run and I'm wondering whether more of it could actually live in the extension host process. That could hugely cut down on your latency. However it might be hard to interact with notebook serializers from the EH (even though that's where they live) and the code that does things like checking whether a notebook is already open and using its model of course has to stay in the renderer process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that I talked about with @rebornix. I'll probably move it over in a future PR (or at least try to reduce the number of calls)

@andreamah
Copy link
Contributor Author

andreamah commented Jul 1, 2023

I probably want to merge in what I have now for this, but my next work forward would look something like:

  • Improve performance (move closed notebook search to extension host, stream results back while we get them in closed notebook search, make text search and notebook search in parallel).
  • Possibly search in outputs for closed notebooks (would need to think a bit more about how filtering would work for this).
  • unit tests

@andreamah andreamah closed this Jul 1, 2023
@andreamah andreamah reopened this Jul 1, 2023
@andreamah andreamah requested a review from roblourens July 4, 2023 18:01
@andreamah andreamah merged commit 60578ed into main Jul 11, 2023
@andreamah andreamah deleted the andreamah/notebook-deserialize-experiment branch July 11, 2023 19:11
@github-actions github-actions bot locked and limited conversation to collaborators Aug 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

investigate searching is closed notebooks

4 participants