KEMBAR78
GitHub - TkTech/can_ada: Python bindings for Ada, a fast and spec-compliant URL parser.
Skip to content
/ can_ada Public

Python bindings for Ada, a fast and spec-compliant URL parser.

License

Notifications You must be signed in to change notification settings

TkTech/can_ada

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

can_ada

[Fast] Python bindings for Ada, a fast and WHATWG spec-compliant URL parser. This is the URL parser used in projects like Node.js.

Installation

pip install can_ada

Binary wheels are available for most platforms. If not available, a C++17-or-greater compiler will be required to build the underlying Ada library.

WHATWG URL compliance

Unlike the standard library's urllib.parse module, this library is compliant with the WHATWG URL specification.

import can_ada
urlstring = "https://www.GOoglé.com/./path/../path2/"
url = can_ada.parse(urlstring)
# prints www.xn--googl-fsa.com, the correctly parsed domain name according
# to WHATWG
print(url.hostname)
# prints /path2/, which is the correctly parsed pathname according to WHATWG
print(url.pathname)

import urllib.parse
urlstring = "https://www.GOoglé.com/./path/../path2/"
url = urllib.parse.urlparse(urlstring)
# prints www.googlé.com
print(url.hostname)
# prints /./path/../path2/
print(url.path)

Usage

Parsing is simple:

from can_ada import parse

url = parse("https://tkte.ch/search?q=canada")
print(url.protocol) # https:
print(url.host) # tkte.ch
print(url.pathname) # /search
print(url.search) # ?q=canada

You can also modify URLs:

from can_ada import parse

url = parse("https://tkte.ch/search?q=canada")
url.host = "google.com"
url.search = "?q=canada&safe=off"
print(url) # https://google.com/search?q=canada&safe=off

can_ada also supports the URLSearchParams API:

from can_ada import URLSearchParams

params = URLSearchParams("q=canada&safe=off")
params.append("page", "2")
params.append("page", "3")
params["q"] = "usa"
print(params) # q=usa&safe=off&page=2&page=3
print(params.has("q")) # True
print(params.get("page")) # 2
print(params.get_all("page")) # [2, 3]
print(params.keys()) # ["q", "safe", "page"]
print(params.values()) # ["usa", "off", "2", "3"]

Performance

We find that can_ada is typically ~4x faster than urllib:

---------------------------------------------------------------------------------
Name (time in ms)              Min                 Max                Mean       
---------------------------------------------------------------------------------
test_can_ada_parse         54.1304 (1.0)       54.6734 (1.0)       54.3699 (1.0) 
test_ada_python_parse     107.5653 (1.99)     108.1666 (1.98)     107.7817 (1.98)
test_urllib_parse         251.5167 (4.65)     255.1327 (4.67)     253.2407 (4.66)
---------------------------------------------------------------------------------

To run the benchmarks locally, use:

pytest --runslow

About

Python bindings for Ada, a fast and spec-compliant URL parser.

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Contributors 2

  •  
  •