The Snake Eyes Project Tasking Handbook
Prompts and RLHF
Daily Webinars/Office Hours: 8a PT/11a ET AND 11a PT/2p ET
Link Posted Daily in Outlier Community Threads
⚠️Knowledge Cutoff Date: Jan 31, 2025
Update Log
As projects move along, we will add updates here. Make sure you keep coming back to these instructions, which are the ground truth for tasking well on this project.
Date: Announcement/Updates
Mar 31, 2025: Introduced “Part 0: Plan for Your Conversation” section to emphasize having user intent in mind when initiating a conversation
Mar 31, 2025: Additional instructions on taking a “real user perspective” to plan for multi-turn conversations and create effective prompts
Mar 31, 2025: Guidance on regenerating responses if neither one is justifiably better than the other
Mar 28, 2025: Model cutoff date is now JANUARY 31, 2025
Project Overview
The purpose of this project is to help a customer hone its model to take on increasingly challenging everyday prompts. Tasks are a mix of single- and multi-turn, with two main components: creating prompts and rating model responses. AI models will learn from your ratings. 🤖
Turn 1:
Write an initial prompt in your assigned category and topic, verifying that you can envision
a realistic user, goal, and ways to progress the conversation for this prompt
Rate both model responses using the RLHF dimensions
Select the best response out of the two, or regenerate the responses if neither one is truly
better than the other
Turn 2 through Final Turn:
Create a follow-up prompt that continues the conversation naturally
You are encouraged to use diverse prompt categories across turns
Rate both model responses for that turn
Select the best response out of the two, or regenerate the responses if neither one is truly
better than the other
Special Notes:
Each complete exchange (your prompt + model responses) counts as one "turn"
Tasks have specific conversation lengths (between 3-15 turns). Complete each task using exactly
the number of turns specified, staying on the same topic throughout the entire conversation.
Keep the conversation flowing naturally on the same topic, as if talking to a real assistant
Complete all rating dimensions for both model responses in each turn
🎯 Please read this thoroughly before beginning tasking and/or reviewing on this project.
🧐Part 0: Plan for your Conversation
💡 Key Mindset Shift: You're starting a multi-turn conversation with real user needs in mind, not
just asking a single question or trying to trip up the model.
Who is your user and why are they using the model?
A rough mental outline of who could be entering this prompt, and why, allows us to sense check that our
first prompt is realistic, set up natural constraints that could be helpful to this user, and chart
a meaningful progression from one turn to the next.
If your first prompt is truly realistic and can sustain a conversation, you should be able to answer all of
the following questions:
1. What kind of person might use this prompt?
2. Why would this person use this prompt?
3. How might this person continue the conversation?
Quick Self-Assessment
As a basic guideline, if you cannot envision answers to any of these questions for your first prompt, it is
not realistic or sustainable enough for this project. Rethink your initial prompt before proceeding.
✍️ Part 1: Write a Great Prompt
✍️ Task Workflow
Step 1: Acknowledge the Suggested Topic and Prompt Category (1st turn)
Step 2: Write an initial Prompt with your “Who” and “Why” in mind
⚠️ You MUST adhere to the assigned category for the first turn.
For subsequent turns, you are strongly encouraged to choose different categories for the follow-up prompts you create, to ensure your prompts are diverse.
Prompt Category
For examples of good prompts for each category, look at this table in the Appendix!
Note: Be careful that your Brainstorming and Chitchat prompts are not actually Open QA! As a
rule, if the prompt asks for structured and non-creative advice on a decision (or possibilities for
that decision), it is Open QA.
For Classification prompts, it is crucial that you include all classification groups in your prompt
and define them if they do not have universally-accepted definitions.
Suggested Topic
It’s critical that we deliver diverse data to the customer. You can occasionally skip or ignore the suggested topic if it’s something you’re not an expert in, but consistently using the same topic in all of your attempts will be flagged when your quality is reviewed.
Key Do & Don’ts for Great Prompts
The best prompts read like authentic human communication - straightforward, purposeful, and focused on
the task at hand.
📎 How to Use Reference Texts
Reference texts are required for Closed QA, Extraction, Rewriting, and Summarization tasks. To attach a reference text to your prompt, click the “+” button (shown below) and paste the text into the New Reference Text box. If the reference text was accessed via an online source, also include the corresponding link in Reference URL, then click Add Reference Text to save.
Note: Reference texts do not need to be included in the prompt for well-known texts or famous speeches
(e.g. a presidential address).
It is also okay to attach a reference text more recent than Jan 2025, as long as the model is not expected to independently recall information past its knowledge cutoff date.
Common Errors to avoid when writing prompts
1) Do NOT ask unrealistic, contrived questions that a typical person would never ask.
❌ Bad examples:
Extract the number of words starting with "t" and add the number of words starting with "Q", then divide by 5
List every single important person in history with the first name “Matthew”
What real user would ever be interested in these questions? What real-life tasks or learning opportunities could these possibly contribute to?
2) Do NOT add unnecessary or unhelpful constraints!
The constraints we use to add complexity and direction to prompts should not be random or designed specifically to trigger model failure - instead, they should align with our image of the prompt’s user and their goal.
Suppose we are writing a prompt from the perspective of an American college student who wants to find books on the Vietnam War from a mix of perspectives. Let’s look at two different ways to use constraints:
❌ What are some good books on the Vietnam War? They should all include some reference to an American military operation name, and should be organized in reverse alphabetical order of title.
The constraints (include references to an American military operation name and reverse alphabetical order) serve no purpose other than to add complexity to the prompt and do not help our user learn more about the Vietnam War.
✅ What are some good books for a mix of perspectives on the Vietnam War? I’m looking for one from a Vietnamese writer from the northern side, one from the southern, and an American. I’d prefer only ones with an English translation.
The constraints (one book from a writer from each side of the conflict, English translations only) all align with our user and their goal. The diversity of authors furthers knowledge of the conflict through multilateral perspectives, while the English-translation requirement makes sense for an American student.
3) Do NOT stack questions - focus prompts on one question/ask.
❌ Bad example: Tell me who the first president is, and then tell me their last name, then give me four more examples of leaders with the same name, but different birth years
Even if these questions are related, no one would ever overload a single prompt in this fashion!
4) Do NOT ask simple “trivia” questions.
❌ Bad examples:
What year was “The Godfather” released?
Who was the CEO of Apple when the iPhone was invented?
Your prompt should not be resolvable through a couple of words or a simple Internet search - remember, the model should have to reason.
Prompt Quality Checklist
✅ Realistic: a genuine question someone would ask an AI assistant
✅ Strategic: opens paths for follow-up and further exploration
✅ Challenging: makes the model reason instead of directly retrieving facts
✅ Natural: avoids artificial constraints that feel contrived
✅ Sensibly constrained: any constraints make sense for the prompt’s target user and goal
✅ On assignment: follows the assigned prompt category & suggested topic in the first turn, and diversifies in the following turns
🔢Part 2: RLHF Grading
✍️ Task Workflow
Step 1: Rate Model Responses on 6 dimensions
Step 2: Select the Better Response
Six Dimensions to Evaluate the 2 Model Responses
You will be asked to compare 2 model responses side by side. First, grade each response on the following dimensions:
Harmfulness: Evaluates whether the response contains any harmful, offensive, or inappropriate content that could negatively affect the user.
Instruction Following: Evaluates how well the response addresses all explicit and reasonably implied elements of the prompt. It assesses the model's ability to fulfill both direct requests and constraints intended to guide those requests.
Truthfulness: Checks whether all claims are accurate and supported by reputable sources, such as trusted news outlets or scientific publications.
Writing Style and Tone: Assesses the clarity, organization, and readability of the response. The tone should be natural and conversational, encouraging next steps, without being preachy or overly formal.
Content Completeness: Evaluates whether the response provides enough relevant information to fully address the prompt, without omitting key details or important content.
Content Conciseness & Relevance: Measures whether the response is concise, containing only necessary content. Each sentence should add value, with additional suggestions or conversational elements being relevant and non-repetitive.
Each dimension is evaluated from 1-3:
Major issues → 1: The model fails to meet the dimension or has significant flaws.
Minor issues → 2: The model partially meets the dimension with some small gaps or imperfections.
No issues → 3: The model fully meets the dimension.
You will be asked to provide justification for issues in Instruction Following, Harmfulness and
Truthfulness:
After evaluating the preferred response using the RLHF dimensions, you’ll provide an Overall Score to
the preferred Model Response from 1-5.
EXCELLENT → 5: Response doesn’t have ANY flaw and cannot be meaningfully improved. There are NO major or minor issues in any dimension of the rubric. In other words, the response addresses the main user intent and instructions exceptionally well, in a way that is extremely clear, fluent, natural in its use of language and organization, and does not have any repetitive or unnecessary information.
GOOD → 4: The response is good overall, with NO major issues and just a few minor issues. The response successfully fulfills the user’s intent.
ADEQUATE → 3: Response addresses the main user intent and instructions with NO major issues, but has several minor issues (e.g. includes unnecessary details, misses certain elements in following the instructions, etc.).
BAD → 2: The response has a major issue (whether in one of the above dimensions, or along some other dimension you observed) and/or does not really satisfy the user’s intent, with the exception of avoiding safety issues.
POOR → 1: Response has multiple major issues and is really unhelpful and frustrating.
Select Your Preferred Response
After evaluating each dimension, select your overall preference between the two model responses using
the 6-point scale shown below.
⚠️Important:
You must choose one response over the other - no ties permitted
Neither response needs a failure in order for this to be a good turn, but one response does need to
be justifiably better!
Your overall preference must be consistent with your dimensional ratings
Example of inconsistency: Rating all dimensions favorably toward Response 1 but giving
an overall preference that favors Response 2
Rating inconsistencies will reduce the quality of your evaluation and the model's learning.
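To make the consistency rule concrete, it can be sketched as a small check in Python. This is purely a hypothetical illustration for reasoning about your own ratings: the tasking interface does not run any such function, and the dimension names, function name, and the 'A'/'B' labels here are invented for the example.

```python
# Hypothetical sketch: each dimension is scored 1-3 (higher is better).
# If every dimension favors (or ties for) one response, the overall
# preference must not favor the other response.

DIMENSIONS = [
    "harmfulness", "instruction_following", "truthfulness",
    "style_tone", "completeness", "conciseness",
]

def preference_is_consistent(scores_a, scores_b, preferred):
    """scores_a/scores_b map each dimension to 1..3; preferred is 'A' or 'B'."""
    a_never_worse = all(scores_a[d] >= scores_b[d] for d in DIMENSIONS)
    b_never_worse = all(scores_b[d] >= scores_a[d] for d in DIMENSIONS)
    if a_never_worse and not b_never_worse and preferred == "B":
        return False  # every dimension favors A, but B was preferred
    if b_never_worse and not a_never_worse and preferred == "A":
        return False  # every dimension favors B, but A was preferred
    return True  # mixed ratings can justify either preference
```

In the "example of inconsistency" above, all dimensions favor Response 1 but Response 2 is preferred, so a check like this would flag it; when the dimensions are split between the two responses, either preference can be justified.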
Justify Your Preference
After ranking, write down specific reasons why you preferred one response over the other.
⚠️ If you are not confident that either response is even slightly better, or you feel like you are “forcing” this justification, you should REGENERATE the responses until you can justify that one response is better.
Examples of “forced” justifications:
Both responses present the same information without issues, but one is marginally more
concise than the other
Both responses have a major factual error, and you are “splitting hairs” trying to explain why
one is more impactful than the other
Now, Take More Turns!
✍️ Task Workflow (for each turn after the first)
Step 1: Create a follow-up prompt
Step 2: Rate both responses on 6 dimensions
Step 3: Select the Better Response
Each conversation has a minimum of 3 and a maximum of 15 turns; the number of turns is specified for each task.
Write Follow Up Prompts
Once you have confirmed there are no catastrophic errors, continue the chat by writing follow-up prompts. Keep the conversation flowing naturally on the same topic, as if talking to a real assistant.
How to Create High-quality Follow-up Prompts
🏖️ DOs: Keep It Natural and On Point
Dig deeper into interesting points
Ask about real-world applications
Respectfully challenge ideas and assumptions when relevant
Connect related topics within the same domain
🙅 DON’Ts: Avoid Derails, Detours, or Duplicates
Change topics completely
Reply with minimal responses that don't advance the conversation (e.g., "Thanks!" or "Sounds good")
Repeatedly ask the model to rewrite or rephrase previous responses, since the model only sees the winning response. Limit rewrite requests to 1 in shorter conversations (<7 turns) and 2 in longer conversations
Send identical or very similar prompts over and over again
GOOD EXAMPLES
"You mentioned meditation helps with stress - what specific techniques work best for beginners?"
"That's interesting about renewable energy - how would it work for someone living in an apartment?"
"I see your point about team leadership, but what happens when team members strongly disagree?"
BAD EXAMPLES
"Thanks for explaining gardening. Can you tell me about black holes?"
"Cool, thanks!"
"I don't like how you explained that recipe. Do it again."
Good Examples: A Natural Progression of Prompts
In the following multi-turn prompt progressions, a hypothetical user goal is pursued through a series of
questions that seek follow-up details, clarify responses to previous prompts, explore new facets of the
main topics, and utilize diverse prompt categories.
Example 1: the user’s goal is specific and targeted, with every prompt building on previous information
to help inform the destination and build an itinerary and budget for the weekend.
Example 2: Here, the user begins with a more general-interest goal of understanding baseball history
and expands to questions specifically about David Ortiz based on previous prompts’ answers - though
more flexible in terms of topics, this is still a natural progression!
Initial Prompt Category: Rewriting
User Perspective (Knowledge Seeking): A user who is not too knowledgeable about sports
wants to learn more about the factors underlying the 2004 Red Sox comeback, beginning with
getting a better understanding of an article on the topic.
Prompt 1: Rewrite the section on the key players with notes explaining any baseball jargon (e.g.
RBI, walk-offs).
Prompt 2: Give me a summary, using that same article, of how the Red Sox played differently in
2004 from previous seasons.
Prompt 3: Across all the listed metrics, which Red Sox player was most impactful in terms of
scoring runs in the 2004 season? I just want one player, but break down into runs batted in versus
home runs.
Prompt 4: I remember hearing about David Ortiz a lot as a kid… what was special about his
playing style?
Prompt 5: What were some of the other factors that made David Ortiz so culturally significant?
Prompt 6: What were the key points David Ortiz made in his famous 2013 speech after the
Boston Marathon?
Catastrophic Errors Check
✅ Before you proceed to the next turn, you need to confirm there are no catastrophic errors.
🛑 You may terminate the task if you spot any catastrophic errors.
⚠️ Make sure that you complete all rating dimensions for both model responses & select the preferred response in each turn.
Appendix: RLHF Grading Rubric
See more details of each dimension. You will also see the rubrics again in the Task Instructions!
(1) Instruction Following
No Issues
Response completely follows all instructions from the prompt, fully respecting the user’s
prompt.
Minor Issue(s)
Response follows most of the instructions from the prompt, satisfying the user’s primary intent,
but misses certain elements.
Major Issue(s)
The response ignores, circumvents, or violates key components of the prompt, rendering the
response unhelpful to the user. The response punts unnecessarily on a non-safety related issue.
(2) Truthfulness
TAKE THE TIME YOU NEED to fact-check responses that contain factual claims. Spending 10ish minutes
across responses is not unreasonable.
No issues
All claims are accurate based on reputable web evidence.
Minor issue(s)
Primary claims (necessary for addressing the prompt) are accurate, but at least one secondary
claim is inaccurate, unsupported, or disputed according to reputable web evidence.
Major issue(s)
At least one primary claim is inaccurate, unsupported, or disputed according to reputable web
evidence.
(3) Harmfulness
No Issues
Response has no unsafe or toxic language.
Minor Issue(s)
Response contains minor/questionable aspects related to unsafe or toxic language, but they are
not highly concerning.
Major Issue(s)
Response contains significant safety or toxic language issue(s).
(4) Content Conciseness & Relevance
No Issues
Response contains only necessary content. Each sentence is relevant to the prompt and rich in
value. Any additional summaries, suggestions, considerations, and conversational questions are
clearly helpful and relevant and not repetitive.
Minor Issue(s)
Response is generally relevant to the prompt but contains a small portion of unnecessary
content that is repetitive, unhelpful, or irrelevant.
Major Issue(s)
Response contains a significant amount of unnecessary content that is repetitive, unhelpful, or
irrelevant.
(5) Content Completeness
No Issues
The response gives enough information and sufficient detail to helpfully fulfill the prompt;
there is no important and relevant content missing.
Minor Issue(s)
There is some relevant information missing from the response, reducing its helpfulness. For example, the response might be technically correct but far too terse, leaving the user dissatisfied.
Major Issue(s)
Relevant content is missing to such an extent that the response does not at all fulfill the user’s
intent.
(6) Writing Style & Tone
No Issues
Response is written and organized such that it’s easy to understand and take next steps.
Response is communicated in a natural-sounding, conversational tone that makes it engaging.
Response does not preach at or lecture the user.
Minor Issue(s)
Response has minor issues of writing quality, such as being stilted or unnatural. Phrasing could be more concise or appropriate for the conversational context. Response may contain some stylistic issues that reduce how engaging it is, or be overly formatted in a distracting way (e.g. unnecessarily nested bullet points or over-bolding).
Major Issue(s)
Response is stylistically unnatural, unengaging, or formatted poorly enough that it is difficult to
read and understand. Or, the response preaches to or lectures the user.
Overall Quality
Cannot be improved
Response doesn’t have ANY flaw and cannot be meaningfully improved. There are NO major
or minor issues in any dimensions of the rubric. In other words, the response addresses the
main user intent and instructions exceptionally well, in a way that is extremely clear, fluent,
natural in its use of language and organization, and does not have any repetitive or unnecessary
information.
Minor room for improvement
The response is good overall, with NO major issues and just a few minor issues. Response
successfully fulfills the user’s intent.
Okay
Response addresses the main user intent and instructions with NO major issues, but has several
minor issues (e.g. includes unnecessary details, misses certain elements in following the
instructions, etc).
Pretty bad
The response has a major issue (whether in one of the above dimensions, or along some other
dimension you observed) and/or does not really satisfy the user’s intent, with the exception of
avoiding safety issues.
Horrible
Response has multiple major issues and is really unhelpful and frustrating.
Appendix: Good Prompt Examples
Note: reference texts have a MAX of 2000 words.

Open QA (reference text: NOT ALLOWED)
Advice and guidance: the model is asked to reason through a problem; a good response will provide a framework or set of considerations that could solve the problem.
Good examples:
"What are some good alliterative titles for this film essay I wrote? I’d like it to include a good caption."
"What's a good creatine brand, and how should I take it to optimize strength while avoiding any risks? (budget is not an issue)"

Closed QA (reference text: REQUIRED)
The model is asked a question that can only be answered by THINKING THROUGH information contained entirely within a reference text.
⚠️ ALL information necessary to respond to the prompt must be contained within the pasted-in reference text.
Good example: "Based on this article, how can I get access to weight-loss medications like Ozempic? I live in California if that matters." [Text from a website that includes information about lots of weight-loss options, including Ozempic]

Extraction (reference text: REQUIRED)
The model is asked to retrieve information from within a reference text.
⚠️ ALL information to be extracted must be contained within the pasted-in reference text.
🚫 Prompts that simply require the model to find a list of names/events/dates are not allowed.
Good example: "According to this document, which women were involved in the events of the revolution? I'd like to know a bit about their involvement and, if applicable, their legacy." [Reference text about the French wars; mentions 4 women related to the Revolution, 1 of whom had a long-lasting legacy]

Rewriting (reference text: REQUIRED)
The model is asked to re-write, adjust, annotate, summarize, stylize, re-organize, or otherwise modify an existing reference text.
⚠️ CORE information necessary for the re-write must be contained within the pasted-in reference text.
Good example: "Modify the product spec below into a sales pitch, targeting a 20-30 year old, male, American demographic. ref text: [Product Description]"

Classification (reference text: OPTIONAL)
Categorizes data or content into defined groups or labels. You MUST provide definitions for categories included in the prompt.
Good example: "Of the countries that participated in World War II, which aligned with the Allied Powers, Axis Powers, or remained neutral? If any switched sides, include them in the group they aligned with the longest."

Chitchat (reference text: OPTIONAL)
Casual, open-ended conversations on general topics, often light and informal.
Good example: "I've been thinking about getting a pet but I never know if I’m actually ready for one. I’m in New York."

Brainstorming (reference text: OPTIONAL)
Offers creative ideas or solutions for a given problem or topic.
Good example: "I need to create a unique team-building activity for our remote work group of 15 people."

Roleplay (reference text: OPTIONAL)
Simulates interactions with the model adopting a specific persona or expertise area.
Good example: "You're a historical tour guide at the Roman Colosseum. I'm a tourist who knows very little about ancient Rome. Give me a 5-minute introduction to what I'm seeing."

Summarization (reference text: REQUIRED)
Extracts and presents main points and essential information from longer texts.
⚠️ ALL information necessary for the summary must be contained within the pasted-in reference text.
Good example: "I need a concise summary of this article on climate change adaptation strategies in coastal cities. Focus on the key findings and recommendations for urban planners. [Add article text]"

Other (reference text: OPTIONAL)
Anything you want.