Add support for LSP `positionEncoding` `utf-8` and `utf-32` #14492

tmtm · 2025-08-30T04:59:00Z

Add support for positionEncoding 'utf-8' and 'utf-32'

Before submitting the PR make sure the following are checked:

The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
Wrote good commit messages.
Commit message starts with [Fix #issue-number] (if the related issue exists).
Feature branch is up-to-date with master (if not - rebase it).
Squashed related commits together.
Added tests.
Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

koic · 2025-08-30T15:49:35Z

lib/rubocop/lsp/diagnostic.rb

        uri
      end
+
+      def char_pos(nchars = nil)


It seems better to take this opportunity to use an unabbreviated name instead of character_position. The argument nchars should also be given a non-abbreviated name.

Changed to to_position_character(utf8_index)

koic · 2025-08-30T15:49:38Z

lib/rubocop/lsp/diagnostic.rb

+      def char_pos(nchars = nil)
+        str = nchars ? @offense.source_line[0, nchars] : @offense.source_line
+        case @position_encoding
+        when 'utf-32'


Since UTF-8 is considered the most likely match as Ruby's default encoding, it seems more appropriate to place the handling of 'utf-8' in the first when.

Ah, is it correct to understand that in LSP the default position encoding is UTF-16? If so, can you add the specification link and a brief explanation as a source code comment?
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/

Changed the order to utf-8, utf-32, and default(utf-16) and add comment.

koic · 2025-08-30T15:49:40Z

lib/rubocop/lsp/routes.rb


+      def line_pos(line, char)
+        case @position_encoding
+        when 'utf-32'


koic · 2025-08-30T15:54:19Z

lib/rubocop/lsp/runtime.rb

      end

-      def offenses(path, text, document_encoding = nil, prism_result: nil)
+      def offenses(path, text, _document_encoding = nil, prism_result: nil)


If this unused variable is truly unnecessary, the best course would be to remove it.

diff --git a/lib/rubocop/lsp/runtime.rb b/lib/rubocop/lsp/runtime.rb index 86db8ba8d..8163929e1 100644 --- a/lib/rubocop/lsp/runtime.rb +++ b/lib/rubocop/lsp/runtime.rb @@ -45,7 +45,7 @@ module RuboCop @runner.formatted_source end - def offenses(path, text, _document_encoding = nil, prism_result: nil) + def offenses(path, text, prism_result: nil) diagnostic_options = {} diagnostic_options[:only] = config_only_options if @lint_mode || @layout_mode diff --git a/lib/ruby_lsp/rubocop/runtime_adapter.rb b/lib/ruby_lsp/rubocop/runtime_adapter.rb index ad2ec72c6..1e668c502 100644 --- a/lib/ruby_lsp/rubocop/runtime_adapter.rb +++ b/lib/ruby_lsp/rubocop/runtime_adapter.rb @@ -16,12 +16,7 @@ module RubyLsp end def run_diagnostic(uri, document) - @runtime.offenses( - uri_to_path(uri), - document.source, - document.encoding, - prism_result: prism_result(document) - ) + @runtime.offenses(uri_to_path(uri), document.source, prism_result: prism_result(document)) end def run_formatting(uri, document)

koic · 2025-08-30T17:09:28Z

changelog/change_support_for_position_encoding_utf_8_and_utf_32_20250830135957.md

@@ -0,0 +1 @@
+* [#14492](https://github.com/rubocop/rubocop/pull/14492): Add support for `positionEncoding` `utf-8` and `utf-32`. ([@tmtm][])


Can you adjust the changelog text in a bit more detail for users? At the very least, it seems necessary to include an explanation that this is in the context of LSP.

koic · 2025-09-01T01:24:08Z

lib/rubocop/lsp/routes.rb

+          'utf-32'
+        else
+          'utf-16'
+        end


It seems better to align the ordering with the other UTF encodings.

bbatsov · 2025-09-08T05:20:01Z

@tmtm Can you rebase this on top of master?

Earlopain · 2025-09-08T12:10:27Z

lib/ruby_lsp/rubocop/runtime_adapter.rb

-          document.encoding,
-          prism_result: prism_result(document)
-        )
+        @runtime.offenses(uri_to_path(uri), document.source, prism_result: prism_result(document))


I don't think ignoring the document encoding for the ruby lsp addon is correct. They explicitly pass us an encoding, why should we ignore that? For the addon, the routes never get called, so we have no idea what the encoding actually is unless we look at it from the document they provide.

Do note that the addon is missing tests for different encodings.

As LSP is a JSON-RPC based protocol, I would expect source code to always be transmitted in UTF-8. I'm not very familiar with RubyLSP, but could document.encoding ever be something other than UTF-8?

document.encoding for the ruby-lsp addon is pretty much also just positionEncodings. Basically, just pass it along for both, something like this should be ok:

diff --git a/lib/rubocop/lsp/routes.rb b/lib/rubocop/lsp/routes.rb index 7da8a4c56..ef1a64512 100644 --- a/lib/rubocop/lsp/routes.rb +++ b/lib/rubocop/lsp/routes.rb @@ -228,12 +228,13 @@ module RuboCop def diagnostic(file_uri, text) @text_cache[file_uri] = text + offenses = @server.offenses(convert_file_uri_to_path(file_uri), text, @position_encoding) { method: 'textDocument/publishDiagnostics', params: { uri: file_uri, - diagnostics: @server.offenses(convert_file_uri_to_path(file_uri), text) + diagnostics: offenses } } end diff --git a/lib/rubocop/lsp/runtime.rb b/lib/rubocop/lsp/runtime.rb index 8163929e1..5135dd8cf 100644 --- a/lib/rubocop/lsp/runtime.rb +++ b/lib/rubocop/lsp/runtime.rb @@ -17,7 +17,7 @@ module RuboCop # Runtime for Language Server Protocol of RuboCop. # @api private class Runtime - attr_writer :safe_autocorrect, :lint_mode, :layout_mode, :position_encoding + attr_writer :safe_autocorrect, :lint_mode, :layout_mode def initialize(config_store) RuboCop::LSP.enable @@ -28,7 +28,6 @@ module RuboCop @safe_autocorrect = true @lint_mode = false @layout_mode = false - @position_encoding = 'utf-16' end def format(path, text, command:, prism_result: nil) @@ -45,14 +44,14 @@ module RuboCop @runner.formatted_source end - def offenses(path, text, prism_result: nil) + def offenses(path, text, position_encoding, prism_result: nil) diagnostic_options = {} diagnostic_options[:only] = config_only_options if @lint_mode || @layout_mode @runner.run(path, text, diagnostic_options, prism_result: prism_result) @runner.offenses.map do |offense| Diagnostic.new( - @position_encoding, offense, path, @cop_registry[offense.cop_name]&.first + position_encoding, offense, path, @cop_registry[offense.cop_name]&.first ).to_lsp_diagnostic(@runner.config_for_working_directory) end end diff --git a/lib/rubocop/lsp/server.rb b/lib/rubocop/lsp/server.rb index e435a7fce..ea463097e 100644 --- a/lib/rubocop/lsp/server.rb +++ b/lib/rubocop/lsp/server.rb @@ -51,15 +51,14 @@ module RuboCop @runtime.format(path, text, command: command) end - def offenses(path, text) - @runtime.offenses(path, text) + def offenses(path, text, position_encoding) + @runtime.offenses(path, text, position_encoding) end def configure(options) @runtime.safe_autocorrect = options[:safe_autocorrect] @runtime.lint_mode = options[:lint_mode] @runtime.layout_mode = options[:layout_mode] - @runtime.position_encoding = options[:position_encoding] end def stop(&block) diff --git a/lib/ruby_lsp/rubocop/runtime_adapter.rb b/lib/ruby_lsp/rubocop/runtime_adapter.rb index 7c7cd21c2..a533dd1a5 100644 --- a/lib/ruby_lsp/rubocop/runtime_adapter.rb +++ b/lib/ruby_lsp/rubocop/runtime_adapter.rb @@ -18,7 +18,12 @@ module RubyLsp end def run_diagnostic(uri, document) - @runtime.offenses(uri_to_path(uri), document.source, prism_result: prism_result(document)) + @runtime.offenses( + uri_to_path(uri), + document.source, + document.encoding, + prism_result: prism_result(document) + ) end def run_formatting(uri, document)

This way it doesn't change behaviour for the ruby-lsp addon. I can see about adding proper tests for this later, I wanted to import some tests anyways.

Do note that I'm not actually very familiar with the encoding stuff. It looks like it only gets used for calculating positions (formatting doesn't need to know about the encoding at all), so what you wrote about the actual source being in utf-8 is probably correct.

bbatsov · 2025-09-15T06:21:03Z

I think that's in a good enough shape for now. We can iterate on it additionally down the road if needed.

tmtm force-pushed the positionEncoding-utf8-utf32 branch from 3d31eef to dca0ea6 Compare August 30, 2025 05:03

tmtm changed the title ~~Add support for positionEncoding 'utf-8' and 'utf-32'~~ Add support for positionEncoding utf-8 and utf-32 Aug 30, 2025

tmtm marked this pull request as ready for review August 30, 2025 05:27

koic reviewed Aug 30, 2025

View reviewed changes

koic reviewed Sep 1, 2025

View reviewed changes

tmtm force-pushed the positionEncoding-utf8-utf32 branch 2 times, most recently from d3f4ae2 to 22e4dac Compare September 7, 2025 15:57

Earlopain reviewed Sep 8, 2025

View reviewed changes

tmtm force-pushed the positionEncoding-utf8-utf32 branch from 22e4dac to c81f982 Compare September 14, 2025 09:02

tmtm changed the title ~~Add support for positionEncoding utf-8 and utf-32~~ Add support for LSP positionEncoding utf-8 and utf-32 Sep 14, 2025

Add support for LSP positionEncoding 'utf-8' and 'utf-32'

cd3c71a

tmtm force-pushed the positionEncoding-utf8-utf32 branch from c81f982 to cd3c71a Compare September 14, 2025 23:25

bbatsov merged commit 422cde4 into rubocop:master Sep 15, 2025
22 checks passed

		@@ -0,0 +1 @@
		* [#14492](https://github.com/rubocop/rubocop/pull/14492): Add support for `positionEncoding` `utf-8` and `utf-32`. ([@tmtm][])

Uh oh!

Add support for LSP positionEncoding utf-8 and utf-32 #14492

Add support for LSP positionEncoding utf-8 and utf-32 #14492

Uh oh!

Conversation

tmtm commented Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

koic Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

koic Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bbatsov commented Sep 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bbatsov commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add support for LSP `positionEncoding` `utf-8` and `utf-32` #14492

Add support for LSP `positionEncoding` `utf-8` and `utf-32` #14492

tmtm commented Aug 30, 2025 •

edited

Loading

koic Aug 30, 2025 •

edited

Loading

koic Aug 30, 2025 •

edited

Loading