(#2890) - websql: avoid hex() for binaries by nolanlawson · Pull Request #2900 · pouchdb/pouchdb · GitHub

Conversation

nolanlawson
Copy link
Member

See #2899 for backstory. Rebased on #2818 because I didn't want to bitrot myself.

For a demonstration, compare these two blocks: before and after. The difference should be obvious.

Using hex() was a nice hack, but decoding hex strings in JavaScript is slow. Apparently just using replace() to swap out the \u0000 is much faster. I borrowed @neojski's trick from pouch-collate for this.
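The trick itself isn't quoted in the comment, but the idea is a small escaping scheme built from chained replace() calls, so the stored string never contains \u0000. A minimal sketch (the function names are illustrative, not necessarily what the PR uses):

```javascript
// Escape \u0000 (and the escape characters themselves) so the result
// contains no null code units and can round-trip through WebSQL.
function escapeBlob(str) {
  return str
    .replace(/\u0002/g, '\u0002\u0002')
    .replace(/\u0001/g, '\u0001\u0002')
    .replace(/\u0000/g, '\u0001\u0001');
}

function unescapeBlob(str) {
  return str
    .replace(/\u0001\u0001/g, '\u0000')
    .replace(/\u0001\u0002/g, '\u0001')
    .replace(/\u0002\u0002/g, '\u0002');
}
```

Each escaped character becomes a two-character sequence, which is why three global replace() calls suffice in each direction.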

@nolanlawson
Copy link
Member Author

re-rebased on #2818

@calvinmetcalf
Copy link
Member

rebased off master

@nolanlawson
Copy link
Member Author

I also kept the perf improvements from #2899 for hex-parsing, because we're still using hex-parsing for IDs, since those are short and would be kinda nuts to migrate.
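For reference, the usual speedup for hex decoding in JavaScript is to map character codes to nibbles directly instead of calling parseInt() on every byte. A sketch of that general idea (an assumption about the shape of the #2899 change, not a quote of it), assuming uppercase hex as produced by SQLite's hex():

```javascript
// '0'-'9' (codes 48-57) -> 0-9, 'A'-'F' (codes 65-70) -> 10-15
function hexCharToNibble(code) {
  return code < 65 ? code - 48 : code - 55;
}

// Decode an uppercase hex string into a "binary string" with one
// character per byte, without a parseInt() call per byte.
function parseHexString(str) {
  var result = '';
  for (var i = 0; i < str.length; i += 2) {
    result += String.fromCharCode(
      (hexCharToNibble(str.charCodeAt(i)) << 4) |
      hexCharToNibble(str.charCodeAt(i + 1)));
  }
  return result;
}
```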

@daleharvey
Copy link
Member

+1 when green


@nolanlawson
Copy link
Member Author

thanks, 2693736

@nolanlawson nolanlawson deleted the 2890-3 branch October 31, 2014 16:37
@jason-s
Copy link

jason-s commented Apr 26, 2015

COBS would be more efficient for preventing embedded nulls. Use a 16-bit version of it if the null being avoided is a 16-bit 0000.
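For the curious, a 16-bit (word-oriented) COBS over UTF-16 code units might look like the following sketch. This is hypothetical illustration, not code from the PR: each group starts with a length code, a code of n means "n - 1 literal units follow", and every group except the last (and any maximal group) implies a trailing \u0000.

```javascript
var MAX_CODE = 0xFFFF; // a maximal group holds 0xFFFE literal units

function cobsEncode(str) {
  var out = '';
  var block = '';
  for (var i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) === 0) {
      // flush the current group; the zero is implied by its length code
      out += String.fromCharCode(block.length + 1) + block;
      block = '';
    } else {
      block += str.charAt(i);
      if (block.length === MAX_CODE - 1) { // flush a maximal group
        out += String.fromCharCode(MAX_CODE) + block;
        block = '';
      }
    }
  }
  return out + String.fromCharCode(block.length + 1) + block;
}

function cobsDecode(str) {
  var out = '';
  var i = 0;
  while (i < str.length) {
    var code = str.charCodeAt(i);
    out += str.slice(i + 1, i + code);
    i += code;
    // a non-maximal group implies a zero, except at the very end
    if (code < MAX_CODE && i < str.length) {
      out += '\u0000';
    }
  }
  return out;
}
```

The encoded string contains no \u0000 and, for inputs shorter than 0xFFFE units, is exactly one code unit longer than the input, versus the worst-case doubling of a replace()-based escape.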

@nolanlawson
Copy link
Member Author

That's interesting! I would want to see a jsperf, though, to confirm that running COBS over strings is faster in most browsers than replace()ing a single character 3 times. E.g. I wonder whether replacing a single character with more than one character causes an array shift under the hood, which would actually slow things down.

@jason-s
Copy link

jason-s commented Apr 27, 2015

to confirm that running COBS over strings is faster

Faster: no (though comparable, I would think, depending on how string manipulation is implemented). You can run the COBS algorithm in place; it creates a string that is only one byte (or word) longer than the input.

More efficient in storage: yes. The thing about replacing Unicode 0000, 0001, and 0002 is that although these are unusual string characters, they are likely to occur in binary data; if your data contains lots of 0000, the approach used here essentially doubles its size.

@nolanlawson
Copy link
Member Author

Ah, that's true. I forgot that we're replacing a single character with two characters here. OK, it's worth giving it a shot! :)
