This project provides a blueprint for a single-page application (SPA) designed to facilitate fast, scalable, and accurate semantic search for UN General Assembly resolutions, Security Council resolutions, and Presidential statements. The solution integrates a Vue.js frontend, a serverless AWS Lambda/API Gateway backend, and a preconfigured MongoDB Atlas cluster to enable users to perform natural language queries with instant results.
- Text Extraction and Structuring: Raw text was extracted from resolution PDFs using Amazon Textract. A custom Go script then parsed the text and segmented it into individual resolutions using regular expressions (a link to the script will be added once it is publicly available). This converted decades of resolutions into structured, query-ready data.
- Search-Ready Database: Resolutions are stored in MongoDB Atlas, optimized for semantic search. An adapted Node.js script processed the data and uploaded it as 1,536-dimension vector embeddings.
- User-Friendly Interface: Built with Vue.js, the frontend provides a simple, intuitive interface. Users can type natural language queries (e.g., “respect for humanitarian law”) and receive relevant results right away.
- Scalable Backend on AWS: Backend services use AWS Lambda and API Gateway for scalability and cost efficiency. The entire application is hosted on a subdomain for easy access.
- Collaboration and Open Source: The project is designed as a blueprint for others. In collaboration with the ICRC, the solution is being prepared for release as open source under their Open Source Program Office (OSPO).

Usage
- MongoDB Atlas Cluster: Ensure you have a MongoDB Atlas cluster set up. You can create one here. See this repository for a sample script and docs to upload data to MongoDB Atlas.
- AWS Account: You need an AWS account to access AWS services.
- Node.js: Make sure you have Node.js installed. You can download it from here.
- AWS CLI: Install the AWS CLI to interact with AWS services from your terminal. You can install it from here.
- Git: Ensure you have Git installed to clone the repository. You can download it from here.
- Personal Access Token (PAT): Generate a Personal Access Token (PAT) with fine-grained permissions for your GitHub organization. Optionally, test access to ensure the token works:
    curl -H "Authorization: token <your_github_pat>" https://api.github.com/repos/[your-org]/[your-repository]
- Environment Variables: Set up the necessary environment variables for your application, such as the MongoDB connection string and AWS credentials.
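As an illustration of the environment variable prerequisite, the following is a minimal sketch of how a Node.js script could read and validate its configuration at startup. The variable names `MONGODB_URI` and `AWS_REGION` and the `loadConfig` helper are assumptions made for this example, not names taken from the repository; substitute whatever names your code expects.

```javascript
// Minimal sketch: read required settings from environment variables and
// fail early with a clear message if any are missing.
// NOTE: the variable names below are hypothetical examples.
const REQUIRED_VARS = ["MONGODB_URI", "AWS_REGION"];

function loadConfig(env = process.env) {
  // Collect every missing variable so the error lists all of them at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    mongodbUri: env.MONGODB_URI,
    awsRegion: env.AWS_REGION,
  };
}
```

Calling `loadConfig()` once at startup surfaces a missing connection string immediately, rather than at the first database call.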
- Create an S3 Bucket: Create an S3 bucket to store your Lambda function code.
- Update Lambda Code: Update the Lambda code in the `function` directory to include your MongoDB connection string. Then, zip the contents and upload the archive to the S3 bucket with the path `lambda.zip`. (A skeleton shell script is provided in the root directory to help with this process.)
- Create SSM Parameter: Create an SSM parameter for the GitHub PAT created in the prerequisites section.
- Create a New Stack: Go to the CloudFormation console and create a new stack using the `cfn-init.yml` template file in the `init` directory.
- Add Stack Name and Parameters: Add a stack name and populate the parameters with the resource names you created in the AWS steps above.
- Deploy the Stack: Deploy the stack with default settings.
- Reconnect Repository: In AWS Amplify, go to the app's settings and, under Branch Settings, click `Reconnect Repository`.
- Configure GitHub App: Click `Configure GitHub App` and authorize the app to access your repository.
- Set Production Branch: Set the `main` branch as the production branch in AWS Amplify. To deploy, navigate to the Overview tab, select your production branch, and run the job. This deploys your app; future pushes to the `main` branch will automatically trigger new deployments.
- Optional: Set up a custom domain: To set up a custom domain, follow the instructions in the AWS Amplify documentation.
- Optional: Add branch: To add a branch, navigate to the app's Branches tab, click `Add Branch`, and select the branch you want to add. This creates a staging environment on a new subdomain where you can review changes before they reach the `main` branch.
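The Lambda function deployed in the steps above sits behind API Gateway and receives search requests from the frontend. The sketch below shows one possible shape for such a handler, assuming an API Gateway proxy integration. The query parameter name `q`, the `createHandler` factory, and the injected `searchResolutions` function are hypothetical and do not reflect the code in the `function` directory.

```javascript
// Sketch of an API Gateway (proxy integration) handler for a search endpoint.
// `searchResolutions` is a hypothetical async function that runs the query
// against MongoDB Atlas; injecting it keeps the handler easy to test.
function createHandler(searchResolutions) {
  return async function handler(event) {
    // queryStringParameters is null when the request has no query string.
    const query = (event.queryStringParameters?.q ?? "").trim();
    if (!query) {
      return jsonResponse(400, { error: "Missing query parameter 'q'" });
    }
    try {
      const results = await searchResolutions(query);
      return jsonResponse(200, { query, results });
    } catch (err) {
      console.error("Search failed:", err);
      return jsonResponse(500, { error: "Search failed" });
    }
  };
}

// Build a response in the shape API Gateway expects from a proxy integration.
function jsonResponse(statusCode, body) {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```

In a real deployment the result of `createHandler(...)` would be exported as the function's entry point; the export style depends on how the Lambda package is configured.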
- Natural Language Search: Query decades of resolutions instantly using semantic search.
- Scalable Infrastructure: Built on AWS Lambda for cost-effectiveness and high availability.
- Open Source Blueprint: Enable other organizations to adapt and reuse the tool.
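To make the semantic search feature more concrete, here is a minimal sketch of an aggregation pipeline using the MongoDB Atlas `$vectorSearch` stage. The index name `vector_index`, the field names `embedding`, `title`, and `text`, and the default limits are assumptions for illustration only; the 1,536 dimension check follows the embedding size stated earlier. The query vector must be produced by the same embedding model that was used when the data was uploaded.

```javascript
// Sketch: build an Atlas Vector Search aggregation pipeline for a query vector.
// Index and field names are hypothetical; adjust them to your collection.
const EMBEDDING_DIMENSIONS = 1536; // embedding size stated in this README

function buildVectorSearchPipeline(queryVector, { limit = 10, numCandidates = 100 } = {}) {
  if (!Array.isArray(queryVector) || queryVector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected a ${EMBEDDING_DIMENSIONS}-dimension query vector`);
  }
  return [
    {
      $vectorSearch: {
        index: "vector_index", // hypothetical Atlas Vector Search index name
        path: "embedding",     // hypothetical field holding the stored vectors
        queryVector,
        numCandidates,         // nearest-neighbor candidates to consider
        limit,                 // number of documents to return
      },
    },
    {
      // Return only display fields plus the similarity score.
      $project: {
        _id: 0,
        title: 1,
        text: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}
```

With the official `mongodb` Node.js driver, the pipeline would be passed to `collection.aggregate(pipeline).toArray()`. Keeping the pipeline in a pure function makes it possible to check its shape without a live cluster.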
This project is licensed under the Apache License 2.0.