projectrefuge/resolutions-data-engr-sample: sample data engineering workflow

resolutions-data-engr-sample

This repository contains sample Go code that uses regular expressions to split text documents into chunks small enough to create vector embeddings from.
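The chunking idea can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes paragraphs are separated by blank lines, splits on that regular expression, and then packs consecutive paragraphs into chunks of at most `maxChars` bytes. The pattern, the `chunkText` name, and the byte limit are all hypothetical. Embedding models usually limit input by tokens, so a character budget is only a rough proxy.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// paragraphBreak matches a newline, optional whitespace, and another newline,
// i.e. one or more blank lines. This pattern is an assumption for
// illustration; the repository's actual pattern may differ.
var paragraphBreak = regexp.MustCompile(`\n\s*\n`)

// chunkText (hypothetical) splits text into paragraphs, then packs
// consecutive paragraphs into chunks of at most maxChars bytes. A single
// paragraph longer than maxChars becomes its own oversized chunk.
func chunkText(text string, maxChars int) []string {
	var chunks []string
	var current strings.Builder

	for _, para := range paragraphBreak.Split(text, -1) {
		para = strings.TrimSpace(para)
		if para == "" {
			continue // skip whitespace-only pieces
		}
		// Flush the current chunk if adding this paragraph plus a
		// blank-line separator would exceed the limit.
		if current.Len() > 0 && current.Len()+2+len(para) > maxChars {
			chunks = append(chunks, current.String())
			current.Reset()
		}
		if current.Len() > 0 {
			current.WriteString("\n\n")
		}
		current.WriteString(para)
	}
	if current.Len() > 0 {
		chunks = append(chunks, current.String())
	}
	return chunks
}

func main() {
	doc := "First paragraph.\n\nSecond paragraph.\n\n\nThird, much longer paragraph."
	for i, c := range chunkText(doc, 40) {
		fmt.Printf("chunk %d: %q\n", i+1, c)
	}
}
```

With a 40-byte limit, the first two paragraphs fit together in one chunk and the third starts a new one. Packing paragraphs rather than cutting at a fixed offset keeps sentences intact, which tends to produce more meaningful embeddings.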

Prerequisites

  • Go 1.16 or higher. You can download it from the official Go downloads page: https://go.dev/dl/

Installation

  1. Clone the repository:

    git clone https://github.com/projectrefuge/resolutions-data-engr-sample.git
    cd resolutions-data-engr-sample
  2. Install dependencies (if any):

    go mod tidy

Usage

To run the code, execute the following command in the root directory of the repository:

go run main.go

Example Output

After running the code, you should see text files created in the resolutions directory, each containing chunks of the input text document.

Next Steps

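Once the document is chunked, the next step is to create a vector embedding for each chunk. MongoDB Atlas can store embeddings created from text data and run similarity searches over them with the Atlas Vector Search feature. You can learn more in the Atlas Vector Search documentation, which includes examples of creating embeddings in Go, Python, Java, and Node.js.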

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
