atgreen/cl-text-splitter

cl-text-splitter

A Common Lisp text splitting library

Usage

text-splitter is available via ocicl. Install it like so:

$ ocicl install text-splitter

Load and split documents like so:

(split (make-document-from-file "report.pdf"))

This will produce a list of strings split from report.pdf using the default size and overlap values (5000 and 200 characters respectively).
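
The sketch below illustrates what the size and overlap parameters mean, using plain fixed-size character windows. naive-split is a hypothetical helper written only for this illustration. It is not part of text-splitter, and it ignores document structure, so the library's actual chunk boundaries will differ. Each window starts size - overlap characters after the previous one. With the defaults, consecutive chunks therefore start 4800 characters apart and share 200 characters.

```lisp
(defun naive-split (text &key (size 5000) (overlap 200))
  "Hypothetical helper (not part of text-splitter): split TEXT into chunks
of at most SIZE characters, each starting SIZE - OVERLAP characters after
the previous one, so consecutive chunks share OVERLAP characters.
Ignores document structure entirely."
  (assert (< -1 overlap size) ()
          "OVERLAP must be non-negative and smaller than SIZE.")
  (let ((stride (- size overlap))   ; distance between chunk starts
        (len (length text)))
    (loop for start from 0 by stride
          for chunk-end = (min len (+ start size))
          while (< start len)                     ; never emit an empty chunk
          collect (subseq text start chunk-end)
          until (>= chunk-end len))))             ; stop once a chunk reaches the end
```

For example, (naive-split "0123456789" :size 4 :overlap 1) yields ("0123" "3456" "6789"). Each chunk repeats the last character of the one before it.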

You can also create document instances manually like so:

(split (make-instance 'html-document :text MY-HTML-STRING) :size 10000 :overlap 0)

The split function uses the document's structure when computing the splits, which is why it helps to know what kind of document is being split.
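
This README does not describe how split uses document structure. As one illustration of the general idea, the sketch below breaks plain text at paragraph boundaries (blank lines) instead of at fixed character offsets, then greedily packs whole paragraphs into chunks of at most size characters. split-paragraphs and pack-paragraphs are hypothetical helpers and are not the library's algorithm. For brevity, they assume LF line endings and apply no overlap.

```lisp
(defun split-paragraphs (text)
  "Hypothetical helper: return the non-blank paragraphs of TEXT, treating
two consecutive newlines as the paragraph separator."
  (let ((separator (format nil "~%~%"))
        (paragraphs '()))
    (loop with start = 0
          for pos = (search separator text :start2 start)
          ;; Collect the text between START and the next separator (or the end).
          do (let ((para (string-trim '(#\Space #\Tab #\Newline #\Return)
                                      (subseq text start (or pos (length text))))))
               (when (plusp (length para))
                 (push para paragraphs)))
          while pos
          do (setf start (+ pos (length separator))))
    (nreverse paragraphs)))

(defun pack-paragraphs (text &key (size 5000))
  "Hypothetical helper: greedily join whole paragraphs of TEXT into chunks
of at most SIZE characters.  A paragraph longer than SIZE becomes its own
oversized chunk instead of being cut mid-paragraph."
  (let ((joiner (format nil "~%~%"))
        (chunks '())
        (current nil))                        ; chunk being built, or NIL
    (dolist (para (split-paragraphs text))
      (let ((candidate (if current
                           (concatenate 'string current joiner para)
                           para)))
        (if (or (null current) (<= (length candidate) size))
            (setf current candidate)          ; paragraph fits, or starts a chunk
            (progn (push current chunks)      ; close the full chunk
                   (setf current para)))))
    (when current (push current chunks))
    (nreverse chunks)))
```

The trade-off is that chunk sizes vary, but no paragraph is split across two chunks unless it is too large to fit on its own.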

split will return nil if it doesn't recognize the document type.
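
In Common Lisp, nil is also the empty list. Code that iterates over the result of split will therefore silently do nothing for an unrecognized document. The hypothetical wrapper split-or-error signals an error instead. It takes the splitting function as an argument, so it can be tried without loading text-splitter. Note that it treats any nil result as a failure, which may also include cases other than an unrecognized document type.

```lisp
(defun split-or-error (splitter document &rest args)
  "Hypothetical wrapper: apply SPLITTER to DOCUMENT and ARGS, and signal an
error instead of returning NIL (e.g. for an unrecognized document type)."
  (or (apply splitter document args)
      (error "Could not split ~S (unrecognized document type?)" document)))
```

With the library loaded and its symbols accessible in the current package, a call might look like (split-or-error #'split (make-document-from-file "report.pdf") :size 10000 :overlap 0).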

Related Projects

Related projects include:

Author and License

cl-text-splitter was written by Anthony Green and is distributed under the terms of the MIT license.
