This repository contains sample Go code that uses regular-expression matching to split text documents into chunks small enough to be turned into vector embeddings.
- Go 1.16 or higher. You can download it from the official Go website.
- Clone the repository:
  git clone https://github.com/projectrefuge/resolutions-data-engr-sample.git
  cd resolutions-data-engr-sample
- Install dependencies (if any):
  go mod tidy
To run the code, execute the following command in the root directory of the repository:
go run main.go
After running the code, you should see text files created in the resolutions
directory, each containing chunks of the input text document.
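The following is a minimal sketch of regex-based chunking, included only to illustrate the general idea; it is not the code in main.go. The function name chunkText, the sentence-matching pattern, and the 40-byte limit are all assumptions chosen for the example. It splits the input into sentences with a regular expression and then packs consecutive sentences into chunks that stay within a size limit.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sentenceRe is an assumed pattern: a run of non-terminator characters
// followed by any sentence-ending punctuation (. ! ?).
var sentenceRe = regexp.MustCompile(`[^.!?]+[.!?]*`)

// chunkText (hypothetical helper) splits text into sentences with a regex
// and packs them into chunks of at most maxLen bytes. A single sentence
// longer than maxLen is kept whole as its own chunk.
func chunkText(text string, maxLen int) []string {
	var chunks []string
	var current strings.Builder

	for _, s := range sentenceRe.FindAllString(text, -1) {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		// Flush the current chunk if adding this sentence (plus a
		// separating space) would exceed the limit.
		if current.Len() > 0 && current.Len()+1+len(s) > maxLen {
			chunks = append(chunks, current.String())
			current.Reset()
		}
		if current.Len() > 0 {
			current.WriteByte(' ')
		}
		current.WriteString(s)
	}
	if current.Len() > 0 {
		chunks = append(chunks, current.String())
	}
	return chunks
}

func main() {
	text := "Resolved, that the committee meets. It shall report annually! Is that clear? Yes."
	for i, c := range chunkText(text, 40) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

In practice the size limit would be chosen to fit the input limit of whichever embedding model is used, and the pattern would be adapted to the structure of the documents being processed.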
Once the text is chunked, each chunk can be converted into a vector embedding for use with MongoDB Atlas Vector Search. The MongoDB Atlas documentation describes how to create embeddings from text data, with examples in Go, Python, Java, and Node.js.
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.