KEMBAR78
Add support for LSP `positionEncoding` `utf-8` and `utf-32` by tmtm · Pull Request #14492 · rubocop/rubocop · GitHub
Skip to content

Conversation

@tmtm
Copy link
Contributor

@tmtm tmtm commented Aug 30, 2025

Add support for positionEncoding 'utf-8' and 'utf-32'


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

@tmtm tmtm force-pushed the positionEncoding-utf8-utf32 branch from 3d31eef to dca0ea6 Compare August 30, 2025 05:03
@tmtm tmtm changed the title Add support for positionEncoding 'utf-8' and 'utf-32' Add support for positionEncoding utf-8 and utf-32 Aug 30, 2025
@tmtm tmtm marked this pull request as ready for review August 30, 2025 05:27
uri
end

def char_pos(nchars = nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems better to take this opportunity to use an unabbreviated name instead of character_position. The argument nchars should also be given a non-abbreviated name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to to_position_character(utf8_index)

def char_pos(nchars = nil)
str = nchars ? @offense.source_line[0, nchars] : @offense.source_line
case @position_encoding
when 'utf-32'
Copy link
Member

@koic koic Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since UTF-8 is considered the most likely match as Ruby's default encoding, it seems more appropriate to place the handling of 'utf-8' in the first when.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, is it correct to understand that in LSP the default position encoding is UTF-16? If so, can you add the specification link and a brief explanation as a source code comment?
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the order to utf-8, utf-32, and default(utf-16) and add comment.


def line_pos(line, char)
case @position_encoding
when 'utf-32'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

end

def offenses(path, text, document_encoding = nil, prism_result: nil)
def offenses(path, text, _document_encoding = nil, prism_result: nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this unused variable is truly unnecessary, the best course would be to remove it.

diff --git a/lib/rubocop/lsp/runtime.rb b/lib/rubocop/lsp/runtime.rb
index 86db8ba8d..8163929e1 100644
--- a/lib/rubocop/lsp/runtime.rb
+++ b/lib/rubocop/lsp/runtime.rb
@@ -45,7 +45,7 @@ module RuboCop
         @runner.formatted_source
       end

-      def offenses(path, text, _document_encoding = nil, prism_result: nil)
+      def offenses(path, text, prism_result: nil)
         diagnostic_options = {}
         diagnostic_options[:only] = config_only_options if @lint_mode || @layout_mode

diff --git a/lib/ruby_lsp/rubocop/runtime_adapter.rb b/lib/ruby_lsp/rubocop/runtime_adapter.rb
index ad2ec72c6..1e668c502 100644
--- a/lib/ruby_lsp/rubocop/runtime_adapter.rb
+++ b/lib/ruby_lsp/rubocop/runtime_adapter.rb
@@ -16,12 +16,7 @@ module RubyLsp
       end

       def run_diagnostic(uri, document)
-        @runtime.offenses(
-          uri_to_path(uri),
-          document.source,
-          document.encoding,
-          prism_result: prism_result(document)
-        )
+        @runtime.offenses(uri_to_path(uri), document.source, prism_result: prism_result(document))
       end

       def run_formatting(uri, document)

@@ -0,0 +1 @@
* [#14492](https://github.com/rubocop/rubocop/pull/14492): Add support for `positionEncoding` `utf-8` and `utf-32`. ([@tmtm][])
Copy link
Member

@koic koic Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adjust the changelog text in a bit more detail for users? At the very least, it seems necessary to include an explanation that this is in the context of LSP.

'utf-32'
else
'utf-16'
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems better to align the ordering with the other UTF encodings.

@tmtm tmtm force-pushed the positionEncoding-utf8-utf32 branch 2 times, most recently from d3f4ae2 to 22e4dac Compare September 7, 2025 15:57
@bbatsov
Copy link
Collaborator

bbatsov commented Sep 8, 2025

@tmtm Can you rebase this on top of master?

document.encoding,
prism_result: prism_result(document)
)
@runtime.offenses(uri_to_path(uri), document.source, prism_result: prism_result(document))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ignoring the document encoding for the ruby lsp addon is correct. They explicitly pass us an encoding, why should we ignore that? For the addon, the routes never get called, so we have no idea what the encoding actually is unless we look at it from the document they provide.

Do note that the addon is missing tests for different encodings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As LSP is a JSON-RPC based protocol, I would expect source code to always be transmitted in UTF-8. I'm not very familiar with RubyLSP, but could document.encoding ever be something other than UTF-8?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document.encoding for the ruby-lsp addon is pretty much also just positionEncodings. Basically, just pass it along for both, something like this should be ok:

diff --git a/lib/rubocop/lsp/routes.rb b/lib/rubocop/lsp/routes.rb
index 7da8a4c56..ef1a64512 100644
--- a/lib/rubocop/lsp/routes.rb
+++ b/lib/rubocop/lsp/routes.rb
@@ -228,12 +228,13 @@ module RuboCop
 
       def diagnostic(file_uri, text)
         @text_cache[file_uri] = text
+        offenses = @server.offenses(convert_file_uri_to_path(file_uri), text, @position_encoding)
 
         {
           method: 'textDocument/publishDiagnostics',
           params: {
             uri: file_uri,
-            diagnostics: @server.offenses(convert_file_uri_to_path(file_uri), text)
+            diagnostics: offenses
           }
         }
       end
diff --git a/lib/rubocop/lsp/runtime.rb b/lib/rubocop/lsp/runtime.rb
index 8163929e1..5135dd8cf 100644
--- a/lib/rubocop/lsp/runtime.rb
+++ b/lib/rubocop/lsp/runtime.rb
@@ -17,7 +17,7 @@ module RuboCop
     # Runtime for Language Server Protocol of RuboCop.
     # @api private
     class Runtime
-      attr_writer :safe_autocorrect, :lint_mode, :layout_mode, :position_encoding
+      attr_writer :safe_autocorrect, :lint_mode, :layout_mode
 
       def initialize(config_store)
         RuboCop::LSP.enable
@@ -28,7 +28,6 @@ module RuboCop
         @safe_autocorrect = true
         @lint_mode = false
         @layout_mode = false
-        @position_encoding = 'utf-16'
       end
 
       def format(path, text, command:, prism_result: nil)
@@ -45,14 +44,14 @@ module RuboCop
         @runner.formatted_source
       end
 
-      def offenses(path, text, prism_result: nil)
+      def offenses(path, text, position_encoding, prism_result: nil)
         diagnostic_options = {}
         diagnostic_options[:only] = config_only_options if @lint_mode || @layout_mode
 
         @runner.run(path, text, diagnostic_options, prism_result: prism_result)
         @runner.offenses.map do |offense|
           Diagnostic.new(
-            @position_encoding, offense, path, @cop_registry[offense.cop_name]&.first
+            position_encoding, offense, path, @cop_registry[offense.cop_name]&.first
           ).to_lsp_diagnostic(@runner.config_for_working_directory)
         end
       end
diff --git a/lib/rubocop/lsp/server.rb b/lib/rubocop/lsp/server.rb
index e435a7fce..ea463097e 100644
--- a/lib/rubocop/lsp/server.rb
+++ b/lib/rubocop/lsp/server.rb
@@ -51,15 +51,14 @@ module RuboCop
         @runtime.format(path, text, command: command)
       end
 
-      def offenses(path, text)
-        @runtime.offenses(path, text)
+      def offenses(path, text, position_encoding)
+        @runtime.offenses(path, text, position_encoding)
       end
 
       def configure(options)
         @runtime.safe_autocorrect = options[:safe_autocorrect]
         @runtime.lint_mode = options[:lint_mode]
         @runtime.layout_mode = options[:layout_mode]
-        @runtime.position_encoding = options[:position_encoding]
       end
 
       def stop(&block)
diff --git a/lib/ruby_lsp/rubocop/runtime_adapter.rb b/lib/ruby_lsp/rubocop/runtime_adapter.rb
index 7c7cd21c2..a533dd1a5 100644
--- a/lib/ruby_lsp/rubocop/runtime_adapter.rb
+++ b/lib/ruby_lsp/rubocop/runtime_adapter.rb
@@ -18,7 +18,12 @@ module RubyLsp
       end
 
       def run_diagnostic(uri, document)
-        @runtime.offenses(uri_to_path(uri), document.source, prism_result: prism_result(document))
+        @runtime.offenses(
+          uri_to_path(uri),
+          document.source,
+          document.encoding,
+          prism_result: prism_result(document)
+        )
       end
 
       def run_formatting(uri, document)

This way it doesn't change behaviour for the ruby-lsp addon. I can see about adding proper tests for this later, I wanted to import some tests anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do note that I'm not actually very familiar with the encoding stuff. It looks like it only gets used for calculating positions (formatting doesn't need to know about the encoding at all), so what you wrote about the actual source being in utf-8 is probably correct.

@tmtm tmtm force-pushed the positionEncoding-utf8-utf32 branch from 22e4dac to c81f982 Compare September 14, 2025 09:02
@tmtm tmtm changed the title Add support for positionEncoding utf-8 and utf-32 Add support for LSP positionEncoding utf-8 and utf-32 Sep 14, 2025
@tmtm tmtm force-pushed the positionEncoding-utf8-utf32 branch from c81f982 to cd3c71a Compare September 14, 2025 23:25
@bbatsov bbatsov merged commit 422cde4 into rubocop:master Sep 15, 2025
22 checks passed
@bbatsov
Copy link
Collaborator

bbatsov commented Sep 15, 2025

I think that's in a good enough shape for now. We can iterate on it additionally down the road if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants