LLM-INTEGRATED APPLICATIONS

LLM-Powered Applications
• Knowledge can be out of date.
• LLMs struggle with certain tasks (e.g., math).
• LLMs can confidently provide wrong answers ("hallucination").

The LLM should serve as a reasoning engine that leverages external applications or data sources. The prompt and the completion are important!

In an LLM-integrated application, the user interacts with a frontend, and an orchestrator coordinates the LLM with external data sources and external applications (e.g., through APIs or Python). The LLM must:
1. Plan actions (e.g., Step 1: get the customer ID; Step 2: reset the password).
2. Format outputs: the set of instructions requires formatting that allows applications to understand actions.
3. Validate actions: collect information that allows validation of an action.
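A minimal sketch of points 2 and 3, assuming a hypothetical JSON action format and a hard-coded list of allowed actions (both illustrative, not from the notes): the orchestrator parses the LLM completion and validates it before executing anything.

```python
import json

# Hypothetical action schema: the allowed actions and their required fields
# are illustrative assumptions, not part of the original notes.
ALLOWED_ACTIONS = {
    "get_customer_id": ["email"],
    "reset_password": ["customer_id"],
}

def parse_and_validate(completion: str) -> dict:
    """Parse the LLM completion as JSON and validate it as an action."""
    action = json.loads(completion)          # format check: must be valid JSON
    name = action.get("action")
    if name not in ALLOWED_ACTIONS:          # validation: predetermined list only
        raise ValueError(f"Unknown action: {name}")
    missing = [f for f in ALLOWED_ACTIONS[name] if f not in action.get("args", {})]
    if missing:                              # validation: all required fields present
        raise ValueError(f"Missing fields for {name}: {missing}")
    return action

# Example: a well-formatted completion passes validation.
completion = '{"action": "reset_password", "args": {"customer_id": "42"}}'
print(parse_and_validate(completion))
```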
Retrieval-Augmented Generation (RAG)

AI framework that integrates external data sources and apps (e.g., documents, private databases, etc.). Multiple implementations exist; the right one depends on the details of the task and the data format.
• The retriever combines a query encoder with an external knowledge source.
• We retrieve the documents most similar to the input query in the external data.
• We combine the retrieved documents with the input query and send the resulting prompt to the LLM to receive the answer.

! The size of the context window can be a limitation. Split the data into multiple chunks (e.g., with LangChain).
! The data must be in a format that allows its relevance to be assessed at inference time. Use embedding vectors (a vector store).

Vector database: stores vectors and associated metadata, enabling efficient nearest-neighbor vector search.
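A minimal sketch of the retrieve-then-prompt flow. The `embed()` placeholder produces deterministic but semantically meaningless vectors, so a real encoder is needed in practice; cosine similarity over an in-memory list stands in for a vector database, and `llm()` is a hypothetical model call.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder query/document encoder (assumption, not a real model)."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(128)
    return v / np.linalg.norm(v)

# In-memory stand-in for a vector database: (embedding, document) pairs.
documents = [
    "Refunds are processed within 5 business days.",
    "Passwords can be reset from the account settings page.",
]
store = [(embed(doc), doc) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Nearest-neighbor search by cosine similarity (vectors are unit-norm)."""
    q = embed(query)
    ranked = sorted(store, key=lambda pair: -float(q @ pair[0]))
    return [doc for _, doc in ranked[:k]]

# Combine the retrieved documents with the input query into one prompt.
query = "How do I reset my password?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# completion = llm(prompt)  # hypothetical LLM call (assumption)
print(prompt)
```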
LLM REASONING WITH PROGRAM-AIDED LANGUAGE & REACT

Complex reasoning is challenging for LLMs, e.g., problems with multiple steps or mathematical reasoning.

Chain-of-Thought (CoT) Prompting
• Prompts the model to break down problems into sequential steps.
• Operates by integrating intermediate reasoning steps into the examples used for one- or few-shot inference.

Prompt:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?

Completion:
A: The cafeteria had 23 apples. They used 20 to make lunch. 23 - 20 = 3. They bought 6 more apples, so 3 + 6 = 9. The answer is 9.

CoT improves performance but struggles with precision-demanding tasks like tax computation or discount application.
Solution: allow the LLM to communicate with a program that is proficient at math, such as a Python interpreter.
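A minimal sketch of how the one-shot CoT prompt above can be assembled in code; `llm()` is a hypothetical completion function, not a real API.

```python
# One-shot CoT: the worked example demonstrates the intermediate reasoning
# steps the model should imitate on the new question.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str) -> str:
    """Prepend the worked example so the model reasons step by step."""
    return f"{COT_EXAMPLE}Q: {question}\nA:"

prompt = cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch "
    "and bought 6 more, how many apples do they have?"
)
# completion = llm(prompt)  # hypothetical model call (assumption)
print(prompt)
```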
Program-Aided Language (PAL)

The LLM generates scripts and passes them to an interpreter. The examples in the prompt carry the CoT reasoning as comments, with each step written out as code:

Prompt:
Q: Roger has 5 tennis balls. [...]
A:
    # Roger started with 5 tennis balls
    tennis_balls = 5
    # 2 cans of 3 tennis balls each is
    bought_balls = 2 * 3
    # tennis balls. The answer is
    answer = tennis_balls + bought_balls
Q: [...]

PAL execution: the completion is handed off to a Python interpreter, so the calculations are accurate and reliable.
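A minimal sketch of the execution step: the generated script is run by the Python interpreter and the `answer` variable is read back. The `generated_code` string stands in for a real completion, and `exec()` on untrusted model output is unsafe outside a sandbox; this only shows the handoff.

```python
# This completion stands in for what the LLM would generate (assumption).
generated_code = """
# Roger started with 5 tennis balls
tennis_balls = 5
# 2 cans of 3 tennis balls each is
bought_balls = 2 * 3
# tennis balls. The answer is
answer = tennis_balls + bought_balls
"""

def pal_execute(code: str) -> object:
    """Hand the generated script to the Python interpreter, read back `answer`.

    Warning: exec() on untrusted model output is unsafe without sandboxing.
    """
    namespace: dict = {}
    exec(code, namespace)
    return namespace["answer"]

print(pal_execute(generated_code))  # -> 11
```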
ReAct

Prompting strategy that combines CoT reasoning and action planning, employing structured examples to guide an LLM in problem-solving and decision-making. A ReAct prompt contains:
• Instructions: define the task, what a thought is, and which actions are available.
• Question: the question to be answered.
• Thought: analysis of the current situation and the next steps to take.
• Action: the actions are taken from a predetermined list defined in the set of instructions in the prompt.
• Observation: the result of the previous action.

The thought / action / observation loop repeats; it ends when the action is finish[]. In the completion, the whole prompt is included. ReAct reduces the risk of errors.
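A minimal sketch of the thought/action/observation loop. The tool registry and the scripted `llm()` stub are illustrative assumptions; action strings are parsed in the `tool[input]` style used by ReAct prompts.

```python
import re

# Toy tool registry: the "predetermined list" of actions (assumption).
TOOLS = {"search": lambda q: f"(stub search results for {q!r})"}

def llm(prompt: str) -> str:
    """Stand-in for a real model call, scripted to show the loop shape."""
    if "Observation:" not in prompt:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: I now know the answer.\nAction: finish[Paris]"

def react(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)
        prompt += step + "\n"  # each turn re-sends the whole prompt so far
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match is None:
            break
        name, arg = match.group(1), match.group(2)
        if name == "finish":                # the loop ends on finish[...]
            return arg
        observation = TOOLS[name](arg)      # execute the chosen action
        prompt += f"Observation: {observation}\n"
    return "no answer within the step budget"

print(react("What is the capital of France?"))  # -> Paris
```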
LangChain can be used to connect multiple components through agents, tools, etc.
Agents: interpret the user input and determine which tool to use for the task (LangChain includes agents for PAL & ReAct).
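A sketch using LangChain's classic agent API (the langchain 0.0.x `initialize_agent` / `load_tools` interface, since deprecated, so treat the exact imports as version-dependent); the OpenAI model and the llm-math tool are illustrative choices.

```python
# Classic LangChain agent API (langchain 0.0.x; deprecated in newer
# releases, so exact imports and calls are version-dependent).
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI  # illustrative model choice (assumption)

llm = OpenAI(temperature=0)
tools = load_tools(["llm-math"], llm=llm)  # calculator tool driven by the LLM
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # ReAct-style agent
    verbose=True,  # print the thought / action / observation steps
)
agent.run("I have 5 tennis balls and buy 2 cans of 3. How many do I have now?")
```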
MODEL OPTIMIZATION FOR DEPLOYMENT

Inference challenges: high computing and storage demands. The goal is to shrink model size while maintaining performance.

Model Distillation
• Scale down model complexity while preserving accuracy.
• Train a small student model to mimic a large, frozen teacher model.
• Soft labels: the teacher's completions serve as ground-truth labels. The distillation loss compares the student's soft predictions to the teacher's soft labels, while the student loss compares the student's hard predictions to the hard labels of the training data.
• The student and distillation losses update the student model weights via backpropagation.
• The student LLM can then be used for inference.
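A minimal PyTorch sketch of the two losses, using toy logits; the temperature and the loss weighting are common defaults assumed here, not values from the notes.

```python
import torch
import torch.nn.functional as F

T, alpha = 2.0, 0.5   # temperature and loss weighting (assumed defaults)

teacher_logits = torch.randn(8, 100)                 # frozen teacher outputs (toy)
student_logits = torch.randn(8, 100, requires_grad=True)
hard_labels = torch.randint(0, 100, (8,))            # ground-truth classes (toy)

# Distillation loss: match the teacher's softened distribution (soft labels).
distill_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T**2

# Student loss: standard cross-entropy against the hard labels.
student_loss = F.cross_entropy(student_logits, hard_labels)

# The combined loss backpropagates into the student only.
loss = alpha * distill_loss + (1 - alpha) * student_loss
loss.backward()
```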
Post-Training Quantization (PTQ)

PTQ reduces model weight precision to 16-bit float or 8-bit integer.
• Can target the weights alone, or both the weights and the activation layers for greater impact.
• May sacrifice some model accuracy, yet it is beneficial for cost savings and inference performance gains.
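A minimal sketch using PyTorch's dynamic post-training quantization, which stores the weights of the selected layer types as 8-bit integers (in newer releases the same utility lives under torch.ao.quantization); the toy model stands in for an LLM.

```python
import torch

# Toy model standing in for an LLM (assumption).
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

# Dynamic PTQ: Linear weights stored as 8-bit integers, activations
# quantized on the fly at inference time; no retraining required.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)
```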
Model Pruning

Removes redundant model parameters that contribute little to the model's performance.
Some methods require full model retraining, while others fall into the PEFT category (e.g., LoRA).
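A minimal sketch of magnitude pruning with PyTorch's pruning utilities; the 30% sparsity level and the single toy layer are illustrative assumptions.

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(512, 512)  # toy layer standing in for an LLM weight matrix

# L1-magnitude pruning: zero out the 30% of weights with the smallest
# absolute value (the sparsity level is an illustrative assumption).
prune.l1_unstructured(layer, name="weight", amount=0.3)
prune.remove(layer, "weight")      # make the pruning permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")  # ~30%
```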