Exploration on how to improve the diff algorithm · Issue #227928 · microsoft/vscode · GitHub

Exploration on how to improve the diff algorithm #227928

@DonJayamanne

Description

  • Improved algorithm for matching/pairing of cells
    • Lots of tests
    • More improvements
    • If a cell has 500 characters and needs 50 edits, that's only a 10% change; it is a likely match and should be considered in the second pass.
    • What if a notebook has Markdown cells or simple code cells, with lots of cells sharing the same content scattered all over? In that case matching goes out of whack.
    • What if we encounter swapped cells? Today we exclude them; we should probably ignore the swap and let LCS handle the diffing (treat them as modified cells).
  • Setting
    • New algorithm enabled for insiders
    • New algorithm behind a setting so stable users can revert if things go pear-shaped
    • In 1-2 months enable for all in stable
    • In 3-4 months, remove this setting (if no issues reported).
  • Avoid generating hashes of outputs, metadata, etc.
  • Avoid the need to synchronize anything other than cell input to the worker (i.e. no need to send outputs or metadata to the worker)
  • Support internalMetadata.cellId as id for diffing (https://github.com/microsoft/vscode/pull/240733/files)
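To make the "500 characters, 50 edits" point above concrete, here is a minimal TypeScript sketch of a two-pass pairing over cell input only. It is not the VS Code implementation: `editDistance`, `changeRatio`, `matchCells` and the default 10% threshold are all hypothetical names and values chosen for illustration. Pass 1 pairs cells with identical text; pass 2 pairs the leftovers when the edit distance is a small fraction of the cell length.

```typescript
// Hypothetical sketch: names and the 10% threshold are illustrative assumptions,
// not taken from the VS Code code base.

/** Levenshtein distance between two strings (two-row dynamic programming). */
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Edits needed as a fraction of the longer text; 0 means identical. */
function changeRatio(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : editDistance(a, b) / longest;
}

/**
 * Pass 1 pairs cells whose input is identical.
 * Pass 2 pairs each remaining original cell with the closest remaining
 * modified cell, provided the change ratio is at most `maxRatio`.
 * Returns [originalIndex, modifiedIndex] pairs sorted by original index.
 */
function matchCells(
  original: string[],
  modified: string[],
  maxRatio = 0.1
): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  const usedModified = new Set<number>();
  const unmatchedOriginal: number[] = [];

  // Pass 1: exact matches (first unused cell with the same text wins).
  for (let i = 0; i < original.length; i++) {
    const j = modified.findIndex((m, idx) => !usedModified.has(idx) && m === original[i]);
    if (j >= 0) {
      usedModified.add(j);
      pairs.push([i, j]);
    } else {
      unmatchedOriginal.push(i);
    }
  }

  // Pass 2: near matches within the change-ratio threshold.
  for (const i of unmatchedOriginal) {
    let best = -1;
    let bestRatio = Infinity;
    for (let j = 0; j < modified.length; j++) {
      if (usedModified.has(j)) continue;
      const ratio = changeRatio(original[i], modified[j]);
      if (ratio <= maxRatio && ratio < bestRatio) {
        best = j;
        bestRatio = ratio;
      }
    }
    if (best >= 0) {
      usedModified.add(best);
      pairs.push([i, best]);
    }
  }
  return pairs.sort((x, y) => x[0] - y[0]);
}
```

Note that pass 1 simply takes the first unused cell with identical text, which is exactly where notebooks with many duplicate cells can pair badly; a real implementation would need a tie-break (for example position, or a cell id where one is available).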
