Example Task OUTLIER
Here is an example task! Please go through the task and ensure that you thoroughly understand every
answer, as the next part of the course is a graded assessment.
Note: Your justifications should be more precise than the ones in the example.
Complexity: Medium
Provide a prompt related to HTML, under the category "The overarching discipline encompasses the
design, creation, testing, and deployment of software applications across various platforms and
purposes, utilizing diverse programming languages, frameworks, and tools to meet specific user needs
and functionalities and can be deployed across different devices and environments", in the domain "The
practice of optimizing web applications to improve load times, responsiveness, and resource efficiency
for better user experience and search engine optimization".
Use at least one of the following libraries/frameworks according to your programming language:
Node.js, ASP.NET, React, Angular, Vue.js, Spring, Hibernate, Django, Flask, Ruby on Rails, Sass, Less,
jQuery, AngularJS, Bootstrap
Your task has a pair prompt/response related to Application Development, whose definition is:
The overarching discipline encompasses the design, creation, testing, and deployment of software
applications across various platforms and purposes, utilizing diverse programming languages,
frameworks, and tools to meet specific user needs and functionalities and can be deployed across
different devices and environments.
The process of designing, building, and maintaining websites, web applications, and web services using
programming languages such as HTML, CSS, JavaScript, and server-side languages like PHP, Python, and
Ruby. This includes front-end development (client-side), back-end development (server-side), devOps,
deployment, testing, tuning, architecture, as well as web frameworks, libraries, and databases.
The practice of optimizing web applications to improve load times, responsiveness, and resource
efficiency for better user experience and search engine optimization.
Finally, according to the programming language assigned in this task, the following libraries/frameworks
could be used in your prompt/response pair (you may also use other libraries/frameworks that suit the
task's programming language):
Node.js, ASP.NET, React, Angular, Vue.js, Spring, Hibernate, Django, Flask, Ruby on Rails, Sass, Less,
jQuery, AngularJS, Bootstrap.
A front-line LLM is any publicly accessible, state-of-the-art model that engineering teams already rely on
for day-to-day coding help. Examples today include ChatGPT-4-turbo, Gemini 1.5 Pro, Claude 3 Opus,
etc.
When we write a prompt, we need to ask ourselves: “Does the prompt feel too easy for testing a
front-line LLM?” By this we mean: would one of these top models likely solve it fully, on the first try,
with little-to-no reasoning or iteration?
If the answer is “yes,” the prompt is too easy for this project.
In this project we MUST produce prompts of Medium/Hard complexity. Avoid the Easy category.
Use this table when deciding whether a prompt is Easy, Medium, or Hard. We only accept Medium or
Hard:
Level | Experience Assumed | Knowledge & Context Required | Nature of a Valid Solution
Easy (✗ not allowed) | Upper-undergraduate CS student. | Only basic language syntax, trivial APIs, or one self-contained function. No architectural background needed. | Single-concept fix or toy script; no tricky edge cases. A front-line LLM usually solves it in ≤ 1 answer.
Hard (✓ encouraged) | Domain expert (5+ years) or specialist (e.g., ML ops, low-latency C++, Kubernetes SRE). | Deep API surface, multi-file context, precise performance/security constraints, or production deployment details. | Involves multiple inter-dependent sub-problems, subtle bugs, or performance/security traps. Even a front-line LLM is likely to miss parts without careful prompting and chain-of-thought. Extensive explanation is needed for a non-expert reader.
Rule of thumb:
If you would confidently give the prompt to ChatGPT-4-turbo and expect a perfect answer in one
response, mark it Easy (and reject it). Otherwise, craft it to meet Medium or Hard as per the table
above.
Visit the Complexity Section in the instructions, "How to craft a Medium / Hard coding-prompt (one-turn
answerable, but challenging)":
https://docs.google.com/document/d/e/2PACX-1vTXfzW93QLJIht7ITKMVhIz-L3I4FiTe_cTtecsvQr4mwyYQTra0t8_2eZdTlXbD77WR07O5sIGtPU3/pub
The prompt/response pair must match the assigned intent coverage, Code Edit - Corrective (e.g. repair,
debugging). The table below gives the intent coverage definitions.
Code Generation/Completion | Generating code from starter code, examples and/or with user’s description.
Code Edit - Corrective (e.g. repair, debugging) | Fixing errors or bugs in code, assisting developers in debugging
Code Search | Locating relevant code snippets from context and user description
Test Case Generation | Generating test case for automatic regression testing or bug reproduction
Code Summarization | Generating commit messages from two versions of code. Generating documentation for a script.
Code Explanation | Generating docstrings or comments. Explain the functionality, purpose and I/O reasoning of a function/query/script/class
Code Review | Reviewing code changes and evaluating whether they align with the intended functionality, project and style requirements, efficiency etc.
Prompt
Turn #1 - Prompt
I have this blog HTML code, but when I try to load the images in this section on a mobile, the page
freezes, like loading eternally. Then, I tried to check if the accessibility is ok, testing a screen-reader, and
it is not working. Can you tell me why this is happening and how to correct it?
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Storyteller Blog</title>
<style>
body {
margin: 0;
padding: 1rem;
line-height: 1.6;
}
header {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar a {
margin: 0 .5rem;
color: #333;
font-size: 1.5rem;
text-decoration: none;
}
.post {
max-width: 700px;
padding-bottom: 2rem;
}
.post h2 {
margin-bottom: 0.5rem;
}
.post .meta {
color: #666;
font-size: 0.9rem;
margin-bottom: 1rem;
}
.post img {
width: 100%;
height: auto;
display: block;
margin: 1rem 0;
}
.share-btn {
background: none;
border: none;
cursor: pointer;
font-size: 1.2rem;
margin-left: 0.5rem;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
<section class="post">
<img
src="forest-small.jpg"
srcset="
forest-small.jpg 400w,
forest-medium.jpg 800w,
forest-large.jpg 1200w
"
</button>
</section>
<section class="post">
<img
src="starship-small.jpg"
srcset="
starship-small.jpg 400w,
starship-medium.jpg 800w,
starship-large.jpg 1200w
"
</button>
</section>
<script>
let fetchCount = 0;
try {
fetchCount++;
if (!article) {
document.getElementById('article-content').innerHTML =
return;
document.getElementById('article-content').innerHTML = `
<h2>${article.title}</h2>
${article.body}
`;
} catch (err) {
document.getElementById('article-content').innerHTML =
</body>
</html>
Before you rate the model responses, please answer if the L0 Application Development fits the prompt.
Yes
No
Before you rate the model responses, please answer if the L1 Web Development fits the prompt.
Yes
No
Before you rate the model responses, please answer if the L2 Front-end Development fits the prompt.
Yes
No
Before you rate the model responses, please answer if the intent coverage Code Edit - Corrective (e.g.
repair, debugging) fits the prompt.
Yes
No
Now you will see four (4) model responses that must be evaluated following the rubric related to the L1
domain.
Evaluation Rubrics
We define a rating rubric by each domain, with general quality dimensions (correctness, completeness,
clarity & readability) and domain-specific dimensions.
The criteria for “correctness” should also be tailored to the specific intent types, e.g.:
Code generation: Did the code execute correctly, adhere to prompt and produce valid output under all
possible conditions?
Code edit (corrective, perfective, adaptive): Did it correctly locate the root cause? Was the fix/edit
appropriate? Does the edit improve the code? Does it preserve behavior? Is the reasoning for the
change sound?
Test generation: Were edge cases, failure paths, and expected behavior included? Is test logic clean and
self-contained? Is it thorough?
Code explanation: Was the explanation accurate, conceptual, and helpful for the intended audience?
Rubric:
Correctness | All implemented features function correctly across different scenarios and devices. Debug fixes, edits, or explanations accurately resolve or describe the underlying issue. | Application works mostly as intended with minor issues that do not affect the overall experience. Fixes or descriptions are mostly correct. | Specific interactions or devices face occasional functionality issues. Fixes are partial or only superficially relevant. | Application fails to function correctly on most platforms. Critical features are non-functional or break often. Fix or explanation is wrong or misleading.
Completeness | All UI and functionality requirements are fully met, with comprehensive handling of edge cases. Responses cover full context, dependencies, and behaviors. | Key interactions are present, but some functionalities or rationale could be expanded. | Some key components, cases, or insights are missing or incomplete. | Minimal functionality or insight provided. Large gaps make the result barely usable.
Clarity & Readability | Code is well-structured with clean components and follows naming conventions. Comments or explanations significantly enhance understanding. | Overall clarity is maintained with minor bloat or repetition. Comments and structure support comprehension. | Readability is hindered due to sparse comments, verbose or inconsistent structure. | Code or explanation lacks organization and clarity. Major refactoring or restructuring is needed.
Visual Appeal / UX | Consistent styling and clean design enhance aesthetic appeal and usability. UI follows modern web design and UX best practices. | Design is generally appealing with small inconsistencies. UX is solid with minor flaws in flow or layout. | The UI is functional but lacks polish. Layout or visual design appears default or underdeveloped. | Poor styling choices impair usability. Cluttered or broken UI makes interaction unpleasant or difficult.
Responsiveness | Fully responsive design adapts cleanly to all screen sizes and devices. Includes mobile-first practices and accessibility considerations. | Mostly responsive with minor layout or styling issues on some devices. | Limited responsiveness with usability issues on tablets or mobile. Accessibility is minimal. | Lacks responsiveness, with layout and performance problems across non-desktop devices.
Framework Proficiency | Effective use of advanced framework features (e.g., routing, state management, component lifecycles). Code reflects idiomatic use of chosen tools (React, Vue, etc.). | General framework usage is correct and efficient. May miss advanced optimizations or idioms. | Limited understanding; avoids advanced features or contains inefficient patterns. | Heavy reliance on boilerplate or plugins due to poor framework understanding. Misuses key patterns or APIs.
Turn #1 - Response
Response 1
Okay, let's break down the potential issues and how to fix them.
Problem: You're loading the entire FontAwesome library via JavaScript (all.js). This file is quite large (can
be over 1MB). On a slower mobile connection, downloading, parsing, and executing this JavaScript can
block rendering or make the page feel very sluggish, especially while other resources like images are also
trying to load. This is a very likely culprit for the perceived "eternal loading."
Solution: Use the CSS version of FontAwesome. It's generally much more performant as the browser
handles it more efficiently.
Problem: While you're using srcset and sizes (which is great!), if forest-small.jpg (400w) and
starship-small.jpg (400w) are still large in file size (e.g., hundreds of KBs or >1MB), they can still
take a significant time to load on mobile, contributing to the freeze. The browser will try to download
the image specified by src first as a fallback, and then evaluate srcset.
Solution: Ensure your images are properly optimized for the web:
Use appropriate formats (JPEG for photos, PNG for graphics with transparency, WebP for better
compression if browser support is adequate for your audience).
Compress them aggressively (e.g., using tools like TinyPNG/TinyJPG, Squoosh, or image editing
software's "Save for Web" features). A "small" 400px wide image should ideally be well under 100KB,
often much less.
Problem: The loadArticle(1) call on DOMContentLoaded fetches stories.json and then manipulates the
DOM. If stories.json is very large or the article.body content it injects is complex (e.g., contains many
more images or heavy elements), this could also contribute to slowness, though it's less likely to cause
an "eternal freeze" for the initial images unless it's extremely heavy. The main issue here is fetching all
stories just to display one.
Solution (Minor optimization, less related to initial freeze): If stories.json becomes very large, consider
fetching only the specific article data if your backend/API supports it (e.g., fetch('stories/1.json')). For
now, with a small number of stories, it's less critical.
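As a rough illustration of the per-article fetch suggested above, the sketch below builds the article URL from an id and keeps a lookup helper for the case where only the combined stories.json is available. The function names are hypothetical, and the stories/<id>.json layout is an assumption taken from the example URL in the text, not something the original page is known to serve.

```javascript
// Hypothetical helpers; the stories/<id>.json layout is assumed from the example URL above.
function articleUrl(id) {
  const n = Number(id);
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError(`invalid article id: ${id}`);
  }
  return `stories/${n}.json`;
}

// Fallback for when only the combined stories.json array is available.
function findArticle(stories, id) {
  return stories.find((story) => story.id === Number(id)) ?? null;
}

// Fetches a single article; `fetchImpl` is injectable so the function can be tested offline.
async function fetchArticle(id, fetchImpl = fetch) {
  const url = articleUrl(id);
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  return res.json();
}
```

Validating the id before building the URL keeps malformed input from turning into a request at all.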
Problem: Your icon bar links look like this:
<a href="/category/fantasy" title="Fantasy"><i class="fas fa-hat-wizard"></i></a>.
The title attribute is often not read by screen readers for links if there's other content (or in this case, no
direct text content).
The <i> tag with a FontAwesome class provides a visual icon, but no text for a screen reader. Screen
readers will likely announce "link" and then silence, or "link, hat wizard icon" if FontAwesome JS does its
magic, which isn't ideal.
Solution: Provide accessible text for each link. You can do this in a few ways:
href="#" is a placeholder that often signifies a link that doesn't navigate in the traditional sense but
triggers JavaScript. Screen readers will announce "link, The Enchanted Forest, hash" or similar.
While functional for mouse users, it's not ideal. If JavaScript fails, the link goes nowhere useful (top of
the page).
Ideally, these links should point to a real URL for the article (e.g., /posts/the-enchanted-forest). Your
loadArticle JavaScript can then event.preventDefault() and load content dynamically for an SPA-like feel,
but it provides a graceful fallback.
If you must keep it as a JS-only trigger, consider using a <button> styled as a link, or add role="button" to
the <a> tag to better signify its action. However, for a post title, a true link is semantically more correct.
For now, let's focus on the immediate screen reader improvements.
Problem: While your share buttons are good, for the icon bar, if you add accessible text (e.g. via
aria-label on the <a>), the icon itself becomes decorative.
Solution: Add aria-hidden="true" to the <i> tags within links if the link itself has an aria-label or
screen-reader-only text. This prevents the screen reader from redundantly announcing the icon if it has a
default accessible name from FontAwesome.
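A minimal sketch of the combination described above, with the accessible name on the link and the icon hidden from assistive technology. The helper name and its argument shape are illustrative assumptions; the URL and class names come from the example link quoted earlier.

```javascript
// Builds the markup for one icon-bar link.
// `iconLinkHtml` and its argument object are hypothetical, for illustration only.
function iconLinkHtml({ href, label, iconClass }) {
  // Escape characters that would break out of an attribute value.
  const esc = (s) =>
    String(s).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
  return (
    `<a href="${esc(href)}" aria-label="${esc(label)}">` +
    `<i class="${esc(iconClass)}" aria-hidden="true"></i>` +
    `</a>`
  );
}

const fantasyLink = iconLinkHtml({
  href: "/category/fantasy",
  label: "Fantasy",
  iconClass: "fas fa-hat-wizard",
});
```

Here the link's name comes from aria-label alone, so the icon contributes nothing to what a screen reader announces.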
Problem: When loadArticle injects content into <div id="article-content">, screen readers might not
automatically announce this new content to the user.
Solution: Use ARIA live regions. Add aria-live="polite" to the #article-content div so that changes to its
content are announced by screen readers.
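The live-region suggestion can be sketched as a small helper that is applied once, before any article content is injected. The helper name is hypothetical, and the id "article-content" is taken from the page under discussion. Whether aria-atomic is wanted depends on how the content is updated, so treat that line as an assumption.

```javascript
// Marks an element as a polite live region.
// `region` can be any object exposing setAttribute, such as a DOM element.
function makeLiveRegion(region) {
  region.setAttribute("aria-live", "polite"); // announce when the user is idle
  region.setAttribute("aria-atomic", "true"); // read the whole region, not only the changed nodes
  return region;
}

// In a browser this would typically run once on page load, before content is injected:
//   makeLiveRegion(document.getElementById("article-content"));
```

Live regions are generally more reliable when the attributes are already present on the element before its content changes, which is why the call belongs in page setup and not inside the load function.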
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Storyteller Blog</title>
<style>
body {
margin: 0;
padding: 1rem;
line-height: 1.6;
}
header {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar a {
margin: 0 .5rem;
color: #333;
font-size: 1.5rem;
text-decoration: none;
}
.post {
max-width: 700px;
padding-bottom: 2rem;
}
.post h2 {
margin-bottom: 0.5rem;
}
.post .meta {
color: #666;
font-size: 0.9rem;
margin-bottom: 1rem;
}
.post img {
width: 100%;
display: block;
margin: 1rem 0;
}
.share-btn {
background: none;
border: none;
cursor: pointer;
font-size: 1.2rem;
margin-left: 0.5rem;
}
.sr-only {
position: absolute;
width: 1px;
height: 1px;
padding: 0;
margin: -1px;
overflow: hidden;
white-space: nowrap;
border-width: 0;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
</a>
</a>
</a>
</a>
</a>
</div>
<main> <!-- Added main landmark for better page structure -->
<!-- **CORRECTION 4: Consider better link targets, but keeping JS for now. Added aria-labelledby**
-->
<img
src="forest-small.jpg"
srcset="
forest-small.jpg 400w,
forest-medium.jpg 800w,
forest-large.jpg 1200w
"
>
<button class="share-btn" aria-label="Share The Enchanted Forest"> <!-- Made label more specific -->
</button>
</section>
<img
src="starship-small.jpg"
srcset="
starship-small.jpg 400w,
starship-medium.jpg 800w,
starship-large.jpg 1200w
"
sizes="(max-width:600px) 400px, (max-width:900px) 800px, 1200px"
>
<button class="share-btn" aria-label="Share Tales of the Starship"> <!-- Made label more specific -->
</button>
</section>
</main>
<script>
let fetchCount = 0;
try {
fetchCount++;
if (!article) {
articleContentElement.innerHTML =
return;
articleContentElement.innerHTML = `
<h2>${article.title}</h2>
</article>
`;
// if (firstFocusableElement) firstFocusableElement.focus();
} catch (err) {
articleContentElement.innerHTML =
}
// Load the first article on page load as an example
window.addEventListener('DOMContentLoaded', () => {
// You might not want to auto-load an article if the page is just a list of posts.
});
// Add event listeners to post title links if you use the href + event.preventDefault() pattern
link.addEventListener('click', function(event) {
if (articleIdAttr) {
loadArticle(parseInt(match[1]));
});
});
</script>
</body>
</html>
Added a standard CSS class to visually hide text but keep it accessible to screen readers. This is an
alternative to aria-label for providing text inside links if you prefer.
Added aria-hidden="true" to the <i> tags inside these links. Since the <a> now has an aria-label, the icon
is purely decorative and should be hidden from assistive technologies to avoid redundancy.
Added event.preventDefault() to the onclick handler if you want to maintain the dynamic loading
behavior while having a real fallback URL.
Added id to <h2> and aria-labelledby to the section.post to associate the heading with its section,
improving screen reader navigation.
Added loading="lazy" to your <img> tags. Modern browsers will defer loading these images until they
are close to entering the viewport, which can significantly improve initial page load time and reduce
data usage, especially on mobile.
aria-live="polite": Tells screen readers to announce changes to this region when the user is idle.
aria-atomic="true": Tells screen readers to announce the entire content of the region when it changes,
not just the parts that were updated. This is usually better for article content.
JavaScript Enhancements:
The content injected now uses an <article> tag for better semantics.
Important: If stories.json is not in the same directory as your HTML file when testing locally,
fetch('stories.json') will fail. Ensure the path is correct or use an absolute path if it's served from a
specific public directory (e.g., fetch('/data/stories.json')).
Modified the DOMContentLoaded listener to be more explicit about not auto-loading an article unless
intended.
Added a more robust way to attach event listeners to post title links rather than using inline onclick. This
example still parses the onclick attribute, but using data-article-id attributes on the links would be
cleaner.
CRITICAL: Optimize your actual image files (forest-small.jpg, starship-small.jpg, etc.). Make sure they are
small in file size for their dimensions. This, combined with using FontAwesome CSS, will likely resolve the
freezing.
Verify stories.json path: Make sure fetch('stories.json') can actually find your JSON file. Open your
browser's developer console (usually F12) and check the "Network" tab for any 404 errors when
loadArticle runs. Create a simple stories.json file for testing if you don't have one, e.g.:
JSON
[
{
"id": 1,
"title": "The Enchanted Forest (Full Article)",
"body": "<p>This is the full story of the enchanted forest. It was a dark and stormy night...</p><img src='forest-large.jpg' alt='Large view of the mystical forest path' style='width:100%; height:auto;'/>"
},
{
"id": 2,
"body": "<p>The starship Voyager drifted silently. Captain Eva Rostova gazed out at the swirling nebula...</p><img src='starship-large.jpg' alt='Detailed view of starship against nebula' style='width:100%; height:auto;'/>"
}
]
By implementing these changes, your blog should be significantly more performant on mobile and much
more accessible to screen reader users.
Response 2
Page Freezing on Mobile: The page freezing issue on mobile could be due to several factors such as large
image sizes, excessive JavaScript execution, or other resource-intensive tasks. In this code, one potential
cause is the large image sizes used in the srcset attribute of the img tags.
Screen Reader Not Working: The screen reader issue could be related to the accessibility attributes used
in the HTML. In this code, some accessibility attributes are missing or not properly used.
Large Image Sizes: The srcset attribute is used to specify different image sizes for different screen sizes.
However, the image sizes used (400w, 800w, 1200w) might still be too large for mobile devices, causing
the page to freeze.
Missing loading Attribute: The loading attribute can be used to specify whether an image should be
loaded lazily or not. By default, images are loaded eagerly, which can cause performance issues on
mobile devices.
Insufficient Accessibility Attributes: Some accessibility attributes are missing or not properly used. For
example, the alt attribute is used, but the aria-label attribute is used for the share button, which is
correct. However, the aria-hidden attribute is used on the Font Awesome icon inside the share button,
which is correct but not consistently used.
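To make the srcset point concrete, here is a simplified model of how a width-descriptor candidate might be chosen. It is a sketch, not the browser's actual algorithm: real browsers are free to pick differently (for example, by reusing a larger image that is already cached), and the function name is hypothetical. The slot width normally comes from the sizes attribute; when sizes is absent, the slot is assumed to be the full viewport width.

```javascript
// Simplified selection: the smallest candidate that covers slot width × device pixel ratio.
function pickCandidate(candidates, slotWidthCssPx, devicePixelRatio = 1) {
  const needed = slotWidthCssPx * devicePixelRatio;
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  // Fall back to the largest candidate when none is wide enough.
  return sorted.find((c) => c.width >= needed) ?? sorted[sorted.length - 1];
}

// Candidates taken from the srcset in the page under discussion.
const forest = [
  { url: "forest-small.jpg", width: 400 },
  { url: "forest-medium.jpg", width: 800 },
  { url: "forest-large.jpg", width: 1200 },
];

pickCandidate(forest, 360, 1).url; // "forest-small.jpg"
pickCandidate(forest, 360, 2).url; // "forest-medium.jpg"
```

Under this model a 360px-wide phone with a device pixel ratio of 3 would be served the 1200w file, so the descriptors themselves say nothing about download cost; the file size behind each candidate is what matters.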
Fixed Solution:
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Storyteller Blog</title>
<style>
body {
margin: 0;
padding: 1rem;
line-height: 1.6;
}
header {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar a {
margin: 0 .5rem;
color: #333;
font-size: 1.5rem;
text-decoration: none;
}
.post {
max-width: 700px;
padding-bottom: 2rem;
}
.post h2 {
margin-bottom: 0.5rem;
}
.post .meta {
color: #666;
font-size: 0.9rem;
margin-bottom: 1rem;
}
.post img {
width: 100%;
height: auto;
display: block;
margin: 1rem 0;
}
.share-btn {
background: none;
border: none;
cursor: pointer;
font-size: 1.2rem;
margin-left: 0.5rem;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
<section class="post">
<!-- Responsive image with srcset & sizes, and lazy loading -->
<img
src="forest-small.jpg"
srcset="
forest-small.jpg 400w,
forest-medium.jpg 800w,
forest-large.jpg 1200w
"
loading="lazy">
</button>
</section>
<section class="post">
<!-- Responsive image with srcset & sizes, and lazy loading -->
<img
src="starship-small.jpg"
srcset="
starship-small.jpg 400w,
starship-medium.jpg 800w,
starship-large.jpg 1200w
"
loading="lazy">
</button>
</section>
<script>
let fetchCount = 0;
try {
if (!article) {
document.getElementById('article-content').innerHTML =
return;
document.getElementById('article-content').innerHTML = `
<h2>${article.title}</h2>
${article.body}
`;
} catch (err) {
document.getElementById('article-content').innerHTML =
}
// Load the first article on page load as an example
</script>
</body>
</html>
Explanation of Changes:
Added loading="lazy" attribute: To lazy load images, which can improve performance on mobile devices.
Added aria-hidden="true" attribute: To Font Awesome icons to hide them from screen readers, as they
are decorative and not essential for understanding the content.
Use a screen reader to test the accessibility of the page. The screen reader should now be able to read
the content correctly.
Response 3
I've identified multiple issues that could be causing your mobile freeze problem and accessibility
challenges:
Problem: The webpage tries to load images that may not exist (forest-small.jpg, etc.), which can cause
infinite loading if the images aren't available.
JAVASCRIPT
images.forEach(img => {
img.onerror = function() {
this.onerror = null;
};
});
});
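For reference, an image error fallback of the kind this response gestures at could look roughly like the sketch below. The helper name and the placeholder URL are hypothetical and do not come from the original page.

```javascript
// Swaps in a placeholder when an image fails to load.
// `placeholderUrl` is a hypothetical fallback image, not a file from the original page.
function attachImageFallback(img, placeholderUrl) {
  img.onerror = function () {
    this.onerror = null; // avoid an endless error loop if the placeholder also fails
    this.src = placeholderUrl;
  };
  return img;
}

// In a browser, this could be applied to every post image:
//   document.querySelectorAll(".post img")
//     .forEach((img) => attachImageFallback(img, "placeholder.png"));
```

Clearing onerror before changing src is the important step: without it, a missing placeholder would retrigger the handler indefinitely.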
2. Accessibility Issues
Solution:
Problem: The page tries to load an article when the content loads, but if stories.json doesn't exist or is
incorrect, it can freeze.
JAVASCRIPT
try {
// Set a timeout for fetch
clearTimeout(timeoutId);
} catch (err) {
document.getElementById('article-content').innerHTML =
Problem: Loading all high-resolution images at once on mobile can cause performance issues.
HTML
<img
loading="lazy"
src="forest-small.jpg"
srcset="..."
sizes="..."
CSS
a:focus, button:focus {
outline-offset: 2px;
}
Solution: The aria-hidden="true" attribute should be on all icons, and you should add descriptive text
visually hidden but available to screen readers:
CSS
.sr-only {
position: absolute;
width: 1px;
height: 1px;
padding: 0;
margin: -1px;
overflow: hidden;
white-space: nowrap;
border-width: 0;
}
Then use:
HTML
</a>
These changes should resolve both the mobile freezing issues and improve accessibility for screen
readers.
Response 4
The code you've provided seems to be mostly correct, but there are a few potential issues that could be
causing the problems you're experiencing:
Images Loading Eternally: The issue with images loading eternally on mobile devices could be due to the
srcset and sizes attributes on the img tags. These attributes are used to provide different versions of an
image for different screen sizes, but they can sometimes cause issues if not implemented correctly.
To fix this, you can try removing the srcset and sizes attributes and just use the src attribute to load a
single version of the image. If you still want to use responsive images, make sure that the images are
correctly sized and compressed for web use.
Accessibility Issues: The accessibility issues you're experiencing could be due to the lack of alt text on
some of the icons. For example, the i tags inside the button elements do not have alt text, which could
cause issues for screen readers.
To fix this, you can add alt text to all icons, or use a different method to provide a text description of the
icon. You can also use ARIA attributes to provide more information about the icons and buttons.
Fetch API Issues: The Fetch API is used to load the stories.json file, but there is no error handling if the
file does not exist or cannot be loaded.
To fix this, you can add more robust error handling to the loadArticle function to handle cases where the
file does not exist or cannot be loaded.
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Storyteller Blog</title>
<style>
body {
font-family: 'Roboto', sans-serif;
margin: 0;
padding: 1rem;
line-height: 1.6;
}
header {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar {
text-align: center;
margin-bottom: 2rem;
}
.icon-bar a {
margin: 0 .5rem;
color: #333;
font-size: 1.5rem;
text-decoration: none;
}
.post {
max-width: 700px;
padding-bottom: 2rem;
}
.post h2 {
margin-bottom: 0.5rem;
}
.post .meta {
color: #666;
font-size: 0.9rem;
margin-bottom: 1rem;
}
.post img {
width: 100%;
height: auto;
display: block;
margin: 1rem 0;
}
.share-btn {
background: none;
border: none;
cursor: pointer;
font-size: 1.2rem;
margin-left: 0.5rem;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
</div>
<section class="post">
<img
src="forest-small.jpg"
</button>
</section>
<section class="post">
src="starship-small.jpg"
</button>
</section>
<script>
let fetchCount = 0;
try {
fetchCount++;
if (!res.ok) {
if (!article) {
document.getElementById('article-content').innerHTML =
return;
document.getElementById('article-content').innerHTML = `
<h2>${article.title}</h2>
${article.body}
`;
} catch (err) {
document.getElementById('article-content').innerHTML =
</script>
</body>
</html>
I hope this updated code helps to fix the issues you're experiencing. Let me know if you have any further
questions or if there's anything else I can help with!
Model Failure
If NO, please go back to the prompt, edit the prompt, and re-try the four model responses.
Yes
No
In what ways did the model fail? (Select all that apply) *
Correctness
Completeness
Clarity/Readability/Reasoning
Provide a brief justification explaining what the failed model response is. Also, add the dimension that
failed and a brief justification.
Yes
No
In what ways did the model fail? (Select all that apply) *
Correctness
Completeness
Clarity/Readability/Reasoning
Provide a brief justification explaining what the failed model response is. Also, add the dimension that
failed and a brief justification.
Yes
No
In what ways did the model fail? (Select all that apply) *
Correctness
Completeness
Clarity/Readability/Reasoning
Provide a brief justification explaining what the failed model response is. Also, add the dimension that
failed and a brief justification.
No
In what ways did the model fail? (Select all that apply) *
Correctness
Completeness
Clarity/Readability/Reasoning
Provide a brief justification explaining what the failed model response is. Also, add the dimension that
failed and a brief justification.
Select the rating that you consider best fits the rubric dimension.
Response 1
Okay, let's break down the potential issues and how to fix them.
Problem: You're loading the entire FontAwesome library via JavaScript (all.js). This file is quite large (can
be over 1MB). On a slower mobile connection, downloading, parsing, and executing this JavaScript can
block rendering or make the page feel very sluggish, especially while other resources like images are also
trying to load. This is a very likely culprit for the perceived "eternal loading."
Solution: Use the CSS version of FontAwesome. It's generally much more performant as the browser
handles it more efficiently.
Problem: While you're using srcset and sizes (which is great!), if forest-small.jpg (400w) and
starship-small.jpg (400w) are still large in file size (e.g., hundreds of KBs or >1MB), they can still
take a significant time to load on mobile, contributing to the freeze. The browser will try to download
the image specified by src first as a fallback, and then evaluate srcset.
Solution: Ensure your images are properly optimized for the web:
Use appropriate formats (JPEG for photos, PNG for graphics with transparency, WebP for better
compression if browser support is adequate for your audience).
Compress them aggressively (e.g., using tools like TinyPNG/TinyJPG, Squoosh, or image editing
software's "Save for Web" features). A "small" 400px wide image should ideally be well under 100KB,
often much less.
Problem: The loadArticle(1) call on DOMContentLoaded fetches stories.json and then manipulates the
DOM. If stories.json is very large or the article.body content it injects is complex (e.g., contains many
more images or heavy elements), this could also contribute to slowness, though it's less likely to cause
an "eternal freeze" for the initial images unless it's extremely heavy. The main issue here is fetching all
stories just to display one.
Solution (Minor optimization, less related to initial freeze): If stories.json becomes very large, consider
fetching only the specific article data if your backend/API supports it (e.g., fetch('stories/1.json')). For
now, with a small number of stories, it's less critical.
Problem: Your icon bar links look like this:
<a href="/category/fantasy" title="Fantasy"><i class="fas fa-hat-wizard"></i></a>.
The title attribute is often not read by screen readers for links if there's other content (or in this case, no
direct text content).
The <i> tag with a FontAwesome class provides a visual icon, but no text for a screen reader. Screen
readers will likely announce "link" and then silence, or "link, hat wizard icon" if FontAwesome JS does its
magic, which isn't ideal.
Solution: Provide accessible text for each link. You can do this in a few ways:
href="#" is a placeholder that often signifies a link that doesn't navigate in the traditional sense but
triggers JavaScript. Screen readers will announce "link, The Enchanted Forest, hash" or similar.
While functional for mouse users, it's not ideal. If JavaScript fails, the link goes nowhere useful (top of
the page).
Ideally, these links should point to a real URL for the article (e.g., /posts/the-enchanted-forest). Your
loadArticle JavaScript can then event.preventDefault() and load content dynamically for an SPA-like feel,
but it provides a graceful fallback.
If you must keep it as a JS-only trigger, consider using a <button> styled as a link, or add role="button" to
the <a> tag to better signify its action. However, for a post title, a true link is semantically more correct.
For now, let's focus on the immediate screen reader improvements.
Solution: Add aria-hidden="true" to the <i> tags within links if the link itself has an aria-label or screen-
reader-only text. This prevents the screen reader from redundantly announcing the icon if it has a
default accessible name from FontAwesome.
Problem: When loadArticle injects content into <div id="article-content">, screen readers might not
automatically announce this new content to the user.
Solution: Use ARIA live regions. Add aria-live="polite" to the #article-content div so that changes to its
content are announced by screen readers.
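As a rough sketch, the live-region attribute could also be applied from script instead of being written into the markup. The function name markAsLiveRegion is hypothetical; it relies only on the standard setAttribute method, so it should work on any DOM element.

```javascript
// Hypothetical helper: marks an element as a polite ARIA live region.
function markAsLiveRegion(element) {
  // "polite" lets the screen reader finish what it is saying before announcing updates.
  element.setAttribute("aria-live", "polite");
  return element;
}

// Browser usage (assumes an element with id "article-content" exists on the page):
// markAsLiveRegion(document.getElementById("article-content"));
```

Setting the attribute before any content is injected matters, because screen readers generally only announce changes to regions they already know are live.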
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Storyteller Blog</title>
<style>
body {
  margin: 0;
  padding: 1rem;
  line-height: 1.6;
}
header {
  text-align: center;
  margin-bottom: 2rem;
}
.icon-bar {
  text-align: center;
  margin-bottom: 2rem;
}
.icon-bar a {
  margin: 0 .5rem;
  color: #333;
  font-size: 1.5rem;
  text-decoration: none;
}
.post {
  max-width: 700px;
  padding-bottom: 2rem;
}
.post h2 {
  margin-bottom: 0.5rem;
}
.post .meta {
  color: #666;
  font-size: 0.9rem;
  margin-bottom: 1rem;
}
.post img {
  width: 100%;
  display: block;
  margin: 1rem 0;
}
.share-btn {
  background: none;
  border: none;
  cursor: pointer;
  font-size: 1.2rem;
  margin-left: 0.5rem;
}
.sr-only {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  white-space: nowrap;
  border-width: 0;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
</a>
</a>
</a>
</a>
</a>
</div>
<main> <!-- Added main landmark for better page structure -->
<!-- **CORRECTION 4: Consider better link targets, but keeping JS for now. Added aria-labelledby** -->
<img
src="forest-small.jpg"
srcset="
forest-small.jpg 400w,
forest-medium.jpg 800w,
forest-large.jpg 1200w
"
>
<button class="share-btn" aria-label="Share The Enchanted Forest"> <!-- Made label more specific -->
</button>
</section>
<section class="post" aria-labelledby="post-title-2">
<img
src="starship-small.jpg"
srcset="
starship-small.jpg 400w,
starship-medium.jpg 800w,
starship-large.jpg 1200w
"
>
<button class="share-btn" aria-label="Share Tales of the Starship"> <!-- Made label more specific -->
</button>
</section>
</main>
let fetchCount = 0;
try {
fetchCount++;
const res = await fetch('stories.json'); // Assuming stories.json is in the same directory or public path
if (!article) {
articleContentElement.innerHTML =
return;
articleContentElement.innerHTML = `
<h2>${article.title}</h2>
<div class="meta">Published on ${article.published}</div>
</article>
`;
// if (firstFocusableElement) firstFocusableElement.focus();
} catch (err) {
articleContentElement.innerHTML =
window.addEventListener('DOMContentLoaded', () => {
// You might not want to auto-load an article if the page is just a list of posts.
});
// Add event listeners to post title links if you use the href + event.preventDefault() pattern
link.addEventListener('click', function(event) {
loadArticle(parseInt(match[1]));
});
});
</script>
</body>
</html>
Added a standard CSS class to visually hide text but keep it accessible to screen readers. This is an
alternative to aria-label for providing text inside links if you prefer.
Added aria-hidden="true" to the <i> tags inside these links. Since the <a> now has an aria-label, the icon
is purely decorative and should be hidden from assistive technologies to avoid redundancy.
Added event.preventDefault() to the onclick handler if you want to maintain the dynamic loading
behavior while having a real fallback URL.
Added id to <h2> and aria-labelledby to the section.post to associate the heading with its section,
improving screen reader navigation.
Added loading="lazy" to your <img> tags. Modern browsers will defer loading these images until they
are close to entering the viewport, which can significantly improve initial page load time and reduce
data usage, especially on mobile.
aria-live="polite": Tells screen readers to announce changes to this region when the user is idle.
aria-atomic="true": Tells screen readers to announce the entire content of the region when it changes,
not just the parts that were updated. This is usually better for article content.
JavaScript Enhancements:
The content injected now uses an <article> tag for better semantics.
Important: If stories.json is not in the same directory as your HTML file when testing locally,
fetch('stories.json') will fail. Ensure the path is correct or use an absolute path if it's served from a
specific public directory (e.g., fetch('/data/stories.json')).
Modified the DOMContentLoaded listener to be more explicit about not auto-loading an article unless
intended.
Added a more robust way to attach event listeners to post title links rather than using inline onclick. This
example still parses the onclick attribute, but using data-article-id attributes on the links would be
cleaner.
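A minimal sketch of the data-article-id approach mentioned above. It assumes each post title link carries a data-article-id attribute and a post-title-link class (both hypothetical, neither appears in the original markup) and that loadArticle is the page's existing function. The helper names are illustrative only.

```javascript
// Hypothetical helper: reads a numeric article id from a link's data-article-id attribute.
function readArticleId(link) {
  const id = Number.parseInt(link.dataset.articleId, 10);
  return Number.isNaN(id) ? null : id;
}

// Hypothetical click-handler factory; loadArticle is passed in so the handler stays testable.
function makeTitleClickHandler(loadArticle) {
  return function handleTitleClick(event) {
    const id = readArticleId(event.currentTarget);
    if (id === null) return; // No usable id: let the link navigate normally.
    event.preventDefault(); // The real href remains as the no-JS fallback.
    loadArticle(id);
  };
}

// Browser usage (assumes links such as <a class="post-title-link" data-article-id="1" href="...">):
// document.querySelectorAll(".post-title-link").forEach((link) =>
//   link.addEventListener("click", makeTitleClickHandler(loadArticle)));
```

Reading the id from a data attribute avoids parsing the inline onclick string, so the inline handlers can be removed from the markup altogether.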
To Do For You (Regarding Freezing):
CRITICAL: Optimize your actual image files (forest-small.jpg, starship-small.jpg, etc.). Make sure they are
small in file size for their dimensions. This, combined with using FontAwesome CSS, will likely resolve the
freezing.
Verify stories.json path: Make sure fetch('stories.json') can actually find your JSON file. Open your
browser's developer console (usually F12) and check the "Network" tab for any 404 errors when
loadArticle runs. Create a simple stories.json file for testing if you don't have one, e.g.:
JSON
"id": 1,
"body": "<p>This is the full story of the enchanted forest. It was a dark and stormy night...</p><img
src='forest-large.jpg' alt='Large view of the mystical forest path' style='width:100%; height:auto;'/>"
},
"id": 2,
"body": "<p>The starship Voyager drifted silently. Captain Eva Rostova gazed out at the swirling
nebula...</p><img src='starship-large.jpg' alt='Detailed view of starship against nebula'
style='width:100%; height:auto;'/>"
By implementing these changes, your blog should be significantly more performant on mobile and much
more accessible to screen reader users.
2,540 words
Correctness
Poor
Fair
Good
Excellent
Correctness Explanation *
28 words
Completeness
Poor
Fair
Good
Excellent
Completeness Explanation *
Please provide a brief description.
Covers major performance and accessibility fixes, but misses finer points (e.g., CSS media-queries for
image breakpoints, fallback for no-JS).
22 words
Poor
Fair
Good
Excellent
Clarity & Readability Explanation *
Explanations are thorough but overly verbose; mixing code and commentary makes it hard to scan key
changes quickly.
18 words
Visual Appeal / UX
N/A
Poor
Fair
Good
Excellent
Introduces lazy-loading and semantics, yet doesn’t address styling polish (e.g., loading indicators,
skeleton UIs) for better perceived performance.
21 words
Responsiveness
2
3
N/A
Poor
Fair
Good
Excellent
Responsiveness Explanation *
Adds loading="lazy" but doesn’t refine CSS breakpoints or optimize image delivery (e.g., use of WebP or
adaptive formats).
21 words
Framework Proficiency
0
1
N/A
Poor
Fair
Good
Excellent
14 words
Response 2
Page Freezing on Mobile: The page freezing issue on mobile could be due to several factors such as large
image sizes, excessive JavaScript execution, or other resource-intensive tasks. In this code, one potential
cause is the large image sizes used in the srcset attribute of the img tags.
Screen Reader Not Working: The screen reader issue could be related to the accessibility attributes used
in the HTML. In this code, some accessibility attributes are missing or not properly used.
Large Image Sizes: The srcset attribute is used to specify different image sizes for different screen sizes.
However, the image sizes used (400w, 800w, 1200w) might still be too large for mobile devices, causing
the page to freeze.
Missing loading Attribute: The loading attribute can be used to specify whether an image should be
loaded lazily or not. By default, images are loaded eagerly, which can cause performance issues on
mobile devices.
Insufficient Accessibility Attributes: Some accessibility attributes are missing or not properly used. For
example, the alt attribute is used, but the aria-label attribute is used for the share button, which is
correct. However, the aria-hidden attribute is used on the Font Awesome icon inside the share button,
which is correct but not consistently used.
Fixed Solution:
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
body {
  margin: 0;
  padding: 1rem;
  line-height: 1.6;
}
header {
  text-align: center;
  margin-bottom: 2rem;
}
.icon-bar {
  text-align: center;
  margin-bottom: 2rem;
}
.icon-bar a {
  margin: 0 .5rem;
  color: #333;
  font-size: 1.5rem;
  text-decoration: none;
}
.post {
  max-width: 700px;
  padding-bottom: 2rem;
}
.post h2 {
  margin-bottom: 0.5rem;
}
.post .meta {
  color: #666;
  font-size: 0.9rem;
  margin-bottom: 1rem;
}
.post img {
  width: 100%;
  height: auto;
  display: block;
  margin: 1rem 0;
}
.share-btn {
  background: none;
  border: none;
  cursor: pointer;
  font-size: 1.2rem;
  margin-left: 0.5rem;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
</div>
<section class="post">
<!-- Responsive image with srcset & sizes, and lazy loading -->
<img
src="forest-small.jpg"
srcset="
forest-small.jpg 400w,
forest-medium.jpg 800w,
forest-large.jpg 1200w
"
loading="lazy">
</button>
</section>
<section class="post">
<!-- Responsive image with srcset & sizes, and lazy loading -->
<img
src="starship-small.jpg"
srcset="
starship-small.jpg 400w,
starship-medium.jpg 800w,
starship-large.jpg 1200w
"
loading="lazy">
<button class="share-btn" aria-label="Share this story">
</button>
</section>
<script>
let fetchCount = 0;
try {
fetchCount++;
if (!article) {
document.getElementById('article-content').innerHTML =
document.getElementById('article-content').innerHTML = `
<h2>${article.title}</h2>
${article.body}
`;
} catch (err) {
document.getElementById('article-content').innerHTML =
</script>
</body>
</html>
Explanation of Changes:
Added loading="lazy" attribute: To lazy load images, which can improve performance on mobile devices.
Added aria-hidden="true" attribute: To Font Awesome icons to hide them from screen readers, as they
are decorative and not essential for understanding the content.
Use a screen reader to test the accessibility of the page. The screen reader should now be able to read
the content correctly.
1,061 words
Correctness
Poor
Fair
Good
Excellent
Correctness Explanation *
27 words
Completeness
Poor
Fair
Good
Excellent
Completeness Explanation *
Please provide a brief description.
Only two fixes provided, lazy loading and decorative-icon hiding, while neglecting link semantics, ARIA
live regions, icon-bar labeling, script path issues, and general performance improvements.
27 words
Poor
Fair
Good
Excellent
Clarity & Readability Explanation *
Response is brief and to the point, but too shallow; doesn’t clearly structure which issues remain or
prioritize fixes, leaving the reader unsure of next steps.
26 words
Visual Appeal / UX
N/A
Poor
Fair
Good
Excellent
No UX enhancements beyond lazy loading; doesn’t address loading indicators, fallback content, or
styling adjustments to improve perceived performance or polish.
21 words
Responsiveness
3
4
N/A
Poor
Fair
Good
Excellent
Responsiveness Explanation *
Introduces lazy loading (a plus) but ignores proper responsive image formats (WebP), CSS breakpoints,
or compression, so mobile performance gains are minimal.
23 words
Framework Proficiency
0
1
N/A
Poor
Fair
Good
Excellent
Vanilla HTML/JS only, with no use of any framework patterns or best practices (e.g., modular code,
components, state handling).
21 words
Response 3
I've identified multiple issues that could be causing your mobile freeze problem and accessibility
challenges:
Problem: The webpage tries to load images that may not exist (forest-small.jpg, etc.), which can cause
infinite loading if the images aren't available.
JAVASCRIPT
document.addEventListener('DOMContentLoaded', function() {
images.forEach(img => {
img.onerror = function() {
this.onerror = null;
};
});
});
2. Accessibility Issues
Solution:
Problem: The page tries to load an article when the content loads, but if stories.json doesn't exist or is
incorrect, it can freeze.
JAVASCRIPT
try {
clearTimeout(timeoutId);
document.getElementById('article-content').innerHTML =
Problem: Loading all high-resolution images at once on mobile can cause performance issues.
HTML
<img
loading="lazy"
src="forest-small.jpg"
srcset="..."
sizes="..."
Problem: Interactive elements lack proper focus states for keyboard navigation.
CSS
a:focus, button:focus {
outline-offset: 2px;
}
Solution: The aria-hidden="true" attribute should be on all icons, and you should add descriptive text
visually hidden but available to screen readers:
CSS
.sr-only {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  white-space: nowrap;
  border-width: 0;
}
Then use:
HTML
</a>
These changes should resolve both the mobile freezing issues and improve accessibility for screen
readers.
437 words
Correctness
Poor
Fair
Good
Excellent
Correctness Explanation *
Please provide a brief description.
Suggests error handlers and timeouts, but doesn’t address the core render-blocking FontAwesome JS or
unoptimized assets; image-existence fallback won’t fix initial freeze.
25 words
Completeness
Poor
Fair
Good
Excellent
Completeness Explanation *
Covers a few concerns (error handling, lazy loading, focus states), yet omits ARIA live regions, proper
landmark roles, semantic link fixes, and network-path checks.
25 words
Poor
Fair
Good
Excellent
Organized into clear sections, each with its own heading, problem/solution pair, and code snippet, which
makes the fixes easy to follow. However, some sections (Missing Image Error Handling and Initial Article
Loading) stay at the level of surface suggestions and do not explore best practices.
39 words
Visual Appeal / UX
4
N/A
Poor
Fair
Good
Excellent
Adds focus outlines and lazy loading, but doesn't apply user feedback patterns (loading indicators). The
style is not polished.
19 words
Responsiveness
2
3
N/A
Poor
Fair
Good
Excellent
Responsiveness Explanation *
Proposes lazy loading and error timeouts, but does not optimize image delivery (compression, modern
formats, breakpoints).
16 words
Framework Proficiency
0
N/A
Poor
Fair
Good
Excellent
18 words
Response 4
The code you've provided seems to be mostly correct, but there are a few potential issues that could be
causing the problems you're experiencing:
Images Loading Eternally: The issue with images loading eternally on mobile devices could be due to the
srcset and sizes attributes on the img tags. These attributes are used to provide different versions of an
image for different screen sizes, but they can sometimes cause issues if not implemented correctly.
To fix this, you can try removing the srcset and sizes attributes and just use the src attribute to load a
single version of the image. If you still want to use responsive images, make sure that the images are
correctly sized and compressed for web use.
Accessibility Issues: The accessibility issues you're experiencing could be due to the lack of alt text on
some of the icons. For example, the i tags inside the button elements do not have alt text, which could
cause issues for screen readers.
To fix this, you can add alt text to all icons, or use a different method to provide a text description of the
icon. You can also use ARIA attributes to provide more information about the icons and buttons.
Fetch API Issues: The Fetch API is used to load the stories.json file, but there is no error handling if the
file does not exist or cannot be loaded.
To fix this, you can add more robust error handling to the loadArticle function to handle cases where the
file does not exist or cannot be loaded.
Here is an updated version of the code with these issues fixed:
HTML
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Storyteller Blog</title>
<style>
body {
  margin: 0;
  padding: 1rem;
  line-height: 1.6;
}
header {
  text-align: center;
  margin-bottom: 2rem;
}
.icon-bar {
  text-align: center;
  margin-bottom: 2rem;
}
.icon-bar a {
  margin: 0 .5rem;
  color: #333;
  font-size: 1.5rem;
  text-decoration: none;
}
.post {
  max-width: 700px;
  padding-bottom: 2rem;
}
.post h2 {
  margin-bottom: 0.5rem;
}
.post .meta {
  color: #666;
  font-size: 0.9rem;
  margin-bottom: 1rem;
}
.post img {
  width: 100%;
  height: auto;
  display: block;
  margin: 1rem 0;
}
.share-btn {
  background: none;
  border: none;
  cursor: pointer;
  font-size: 1.2rem;
  margin-left: 0.5rem;
}
</style>
</head>
<body>
<header>
<h1>Storyteller Blog</h1>
</header>
</div>
<!-- Post: The Enchanted Forest -->
<section class="post">
<img
src="forest-small.jpg"
</button>
</section>
<section class="post">
<img
src="starship-small.jpg"
</section>
<script>
let fetchCount = 0;
try {
fetchCount++;
if (!res.ok) {
if (!article) {
document.getElementById('article-content').innerHTML =
return;
}
document.getElementById('article-content').innerHTML = `
<h2>${article.title}</h2>
${article.body}
`;
} catch (err) {
document.getElementById('article-content').innerHTML =
</script>
</body>
</html>
I hope this updated code helps to fix the issues you're experiencing. Let me know if you have any further
questions or if there's anything else I can help with!
1,003 words
Correctness
1
2
Poor
Fair
Good
Excellent
Correctness Explanation *
Recommends dropping srcset entirely (losing responsive images) and misuses alt on <i> elements;
ignores the real freeze culprits like large JS bundles. Also, the code introduces aria-hidden="true", but
the section Accessibility Issues doesn't explain this change.
38 words
Completeness
1
Poor
Fair
Good
Excellent
Completeness Explanation *
Only three superficial fixes (removing srcset, adding alt to icons, and basic error handling), while
omitting performance, accessibility, and UX best practices (lazy loading, ARIA live regions, semantic
improvements).
29 words
Clarity & Readability
Poor
Fair
Good
Excellent
25 words
Visual Appeal / UX
N/A
Poor
Fair
Good
Excellent
No enhancements to user feedback, loading states, or overall design polish; removing responsive image
support can worsen UX on different devices.
21 words
Responsiveness
4
N/A
Poor
Fair
Good
Excellent
Responsiveness Explanation *
Eliminates responsive image techniques rather than improving them; doesn’t introduce lazy loading,
compression, or adaptive formats, so mobile performance remains poor.
21 words
Framework Proficiency
2
3
N/A
Poor
Fair
Good
Excellent
Still pure HTML/JS with no demonstration of modern framework or modular coding patterns.
14 words
Evaluate the quality of this turn, specifically focusing on quality of work on this turn before you make
any changes. We'll use these results to add additional review on the task as needed and surface
feedback to earlier contributors when relevant!
Turn Quality: Prompt
Poor
Adequate
Excellent
Yes
No
Poor
Adequate
Excellent
Poor
Adequate
Excellent
Quality of the Task
Evaluate the quality of the overall task, specifically focusing on quality of work on the task before you
make any changes. We'll use these results to add additional review on the task as needed and surface
feedback to earlier contributors when relevant!
Poor
Adequate
Excellent
Does the task follow all instructions provided in tasking instructions or instructional steps?
Yes
No
Write a few sentences of feedback to the contributor who last worked on this task. Try to be as
actionable and specific as possible; this feedback will be surfaced directly to the relevant contributors!
Good task
2 words
Guidelines
CONFIDENTIAL INFORMATION
This document contains confidential and proprietary information intended solely for the use of the
individual or entity to whom it is disclosed.
Prompt + Eval ST
Check for periodic updates and review any changes to stay successful on this task.
Change Log
Date
Summary of changes
Changes requested by (link to reference)
Release date.
Internal.
Internal / Client
Internal
Task Overview
You will receive a prompt and generate four (4) model responses.
Step
What you do
Purpose
Draft (or revise) a prompt that meets all prompt-writing criteria—correct programming-language tag,
domain, full intent coverage, and an explicit complexity level.
Re-read the prompt you just received/wrote and check it against the same criteria (language, domain,
intent coverage, complexity). Confirm that it is acceptable or fix it before proceeding.
Generate the responses. Read each model’s response explanation. Decide whether at least one model
failed to meet the task requirements. Record Yes/No. A model fails if it is rated 1 or 2 on any single
dimension.
D – Per-model response ratings & justifications (repeat for all four models)
2. For each rating you provide, write a justification of at least 80 characters explaining the weakness, or
why you chose your score.
3. If a dimension is truly not applicable (N/A) → mark N/A. You must provide a justification for N/A.
Produces structured, rubric-based feedback and written reasoning for low or non-applicable scores.
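The failure rule described above can be expressed as a small check. This is only an illustrative sketch: the function names and the shape of the ratings object (a dimension name mapped to a score from 1 to 4, or to the string "N/A") are assumptions and do not describe the actual tasking tool.

```javascript
// Hypothetical helper: a model fails when any applicable dimension is rated 1 or 2.
function modelFailed(ratings) {
  return Object.values(ratings).some(
    (score) => score !== "N/A" && Number(score) <= 2
  );
}

// Answers the Yes/No question: did at least one of the models fail?
function anyModelFailed(allRatings) {
  return allRatings.some(modelFailed);
}
```

For example, a response rated 4 everywhere except a 2 on Responsiveness counts as a failure, while a response with only 3s, 4s, and N/A does not.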
Follow these instructions carefully so the dataset contains high-quality prompts and reliable model-
evaluation labels.
Note 1: If the task does not involve code (e.g., pure design advice), simply ignore the rubric rows marked
“(if code is present).”
Please DO NOT plagiarize prompts from any other source. Doing so will result in a subsequent review
of your account.
Your code should be self-contained or contain all context necessary for a reviewer to test the solution
and verify the output.
Prompt Quality
Context
Each task will be frontloaded with certain requirements. For the next few steps, it is your job to make
sure the prompt completely meets all requirements.
This is what you will see in each task, and a quick visual guide for how to think of each dimension.
Requirements
MUST NOT copy from Stack Overflow, public GitHub, or other LLM outputs.
MUST align the prompt with the task’s L0 category, L1 category, and L2 domain.
MUST align the prompt with the task’s specified intent coverage.
MUST be rewritten until all of the above “MUST” conditions are simultaneously satisfied.
MAY use any appropriate libraries or frameworks, including—but not limited to—those listed.
IMPORTANT!
We need realistic coding prompts to test how well different large-language models (LLMs) can handle
programming tasks.
High-quality prompts help researchers understand current limitations and build more effective future
models.
L0
L1
L2 Domain
Programming Languages
Libraries / Framework
Core Foundations
Software Design Patterns & Development, Programming Paradigms, Debugging & Testing, Code
Documentation, Version Control, Algorithm Design & Data Structures
C, C++, C#, Java, JavaScript, TypeScript, Python, Go, Rust, Swift, Ruby, Kotlin, Scala, Haskell, Clojure,
Pascal, Fortran, Lisp, SQL, Prolog, R, HTML/CSS, Julia, PHP
.NET, Node.js, LLVM, GCC, JDK, NumPy, Rust Standard Library, Spring Framework, Django, Angular,
React, Ruby on Rails, Laravel, Boost, JUnit, Jest, Git, Swagger
Python, Ruby, Perl, Bash, Batch, PowerShell, Zsh, Fish, Csh/Tcsh, Dash, VBScript, JScript, JavaScript, Java,
Go, PHP, C#
AWK, Sed, Grep, jq, yq, Expect, NCurses, Zenity, WMIC, PsExec, AutoIt, Windows Script Host, WinSCP,
Ansible, Puppet, Chef, Selenium, Appium, TestNG, Cucumber, Node.js, BeautifulSoup, Puppeteer.
Database Management
Database Implementation & Query Processing, Data Architecture & Storage Systems, Database
Administration
SQL, Python, Java, PL/SQL, C#, T-SQL, C++, Bash, PowerShell, Go, PHP
MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, MongoDB, MySQL Workbench, SQL Server
Management Studio, pgAdmin, Apache Cassandra, MariaDB, SQLite, AWS RDS, Azure SQL Database,
Elasticsearch, DynamoDB, Hadoop HDFS, Google Cloud SQL, phpMyAdmin, Oracle Enterprise Manager,
Lucidchart
Data Architecture & Infrastructure, Data Engineering & DataOps, Data Analysis & Visualization
Apache Hadoop, Apache Airflow, Apache Spark, Apache Kafka, Pandas, NumPy, Matplotlib, Seaborn,
Tableau, Power BI, Jupyter, Snowflake, Google BigQuery, Databricks, dbt, D3.js, Plotly, Scikit-learn,
Hadoop, Azure Synapse, Kubernetes, Amazon Redshift, Terraform, Jenkins, GitLab CI, Luigi, Bokeh,
ggplot2, Apache NiFi
Natural Language Processing (NLP) & Speech, Computer Vision, Generative AI , Robotics, ML Ops
TensorFlow, PyTorch, Keras, Scikit-learn, Hugging Face, OpenCV, NumPy, Pandas, NLTK, spaCy, XGBoost,
MLflow, Kubeflow, LangChain, ROS, FastAI, Transformers, Docker, Kubernetes, Airflow, TFX, DeepSpeed,
Gensim, OpenAI GPT API, MATLAB
Application Development
Web Development
Front-end Development, Back-end Development, DevOps & Deployment, Web Testing & Performance
Optimization, Architecture & Patterns
Node.js, ASP.NET, React, Angular, Vue.js, Spring, Hibernate, Django, Flask, Ruby on Rails, Sass, Less,
jQuery, AngularJS, Bootstrap
Android App Development, iOS App Development, Cross-platform Development, Mobile App Testing,
Mobile App Performance Tuning
Android SDK, iOS SDK, React Native, Flutter, Xcode, CocoaPods, Xamarin, Ionic, PhoneGap, Appium,
Selenium, JUnit, TestNG, Cucumber
Game Development
Game Systems & Engine Development, Art & Animation, Sound Design & Audio, Platform Deployment &
Testing, Interactive & Immersive Experiences
MonoGame, LibGDX, Pygame, Phaser, Cocos2d-x, and SFML, Unity, Unreal Engine, Godot, CryEngine,
PhysX, Havok, Box2D, OpenCV
Intent Coverage
Task/Intent Coverage
Description/Example
Defect/Vulnerability Detection
Discovering security vulnerabilities or potential weaknesses in code that could be exploited, e.g. SQL
injection vulnerability in a web application's login form
Code Generation/Completion
Generating code from starter code, examples and/or with user’s description.
Code Edit
Corrective
Adaptive
Modifying an existing function to serve a new purpose or integrate into a different context while
retaining its core functionality
Code Search
NOTE: You're asking the assistant to find relevant code within the provided context, not to create or
rewrite it.
Code Summarization
Generating commit messages from two versions of code. Generating documentation for a script.
Code Explanation
Generating docstrings or comments. Explain the functionality, purpose and I/O reasoning of a
function/query/script/class
Code Review
Reviewing code changes and evaluating whether they align with the intended functionality, project and
style requirements, efficiency etc.
Complexity
Use this table when deciding whether a prompt is Easy, Medium, or Hard.
Level
Experience Assumed
Upper-undergraduate CS student.
Only basic language syntax, trivial APIs, or one self-contained function. No architectural background
needed.
Single-concept fix or toy script; no tricky edge cases. A front-line LLM usually solves it in ≤ 1 answer.
Medium (✓ acceptable)
Familiarity with standard algorithms or mainstream libraries (e.g., pandas, Spring, Express, SQL joins)
plus several lines of surrounding code or config.
Requires decomposing the task into 2-3 logical steps, handling corner cases, or refactoring non-trivial
code. A front-line LLM may need iterative reasoning/tests, not a one-shot reply.
Hard (✓ encouraged)
Domain expert (5 + years) or specialist (e.g., ML ops, low-latency C++, Kubernetes SRE).
Rule of thumb:
If you would confidently give the prompt to ChatGPT-4-turbo and expect a perfect answer in one
response, then the prompt is Easy - rewrite it! Otherwise, craft it to meet Medium or Hard as per the
table above.
What is a “front-line LLM”?
A front-line LLM is any publicly accessible, state-of-the-art model that engineering teams already rely on
for day-to-day coding help. Examples today include ChatGPT-4-turbo, Chat GPT 4o, Gemini 1.5 Pro,
Claude 3 Opus, etc.
When writing a prompt, ask yourself: “Does the prompt feel too easy for testing a front-line LLM?”
By this we mean: would one of these top models likely solve it fully, on the first try, with little-to-no
reasoning or iteration?
If the answer is “yes,” the prompt is too easy for this project.
Use the checklist as guidance; you don’t need every item, but hitting several usually lifts a prompt to
Medium/Hard.
Checklist item
Forces the model to read, cross-reference, and reason—rather than generate from scratch.
“Refactor for O(n log n) time and add a retry mechanism with exponential backoff.”
“Return the patched function plus a 3-line docstring that explains the algorithm.”
REQUIREMENTS:
When a Dimension (e.g., Engine Tool Proficiency) is not relevant to the prompt and response, mark
N/A.
If a specific Dimension relates to code and there is no code available from the response, then mark N/A.
MUST be at least 3 sentences long. This provides enough detail to explain your reasoning thoroughly.
Now that you have at least one (1) model failure, you will need to evaluate the four (4) model responses
under a rubric that is aligned with the L1 domain of the task.
To evaluate the response, focus on the quality of the response itself, not on additional work the model
could have done. Be objective and cite concrete evidence (e.g., line numbers, missing edge cases, style
violations, performance estimates).
Database Management
Web Development
Game Development
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
Response is factually and technically correct for the task — e.g., the fix works, the explanation is
accurate, or the code is valid.
Completeness
Fully addresses all task requirements, including edge cases, context, or follow-up considerations.
Clear, well-structured explanation and/or code. Easy to follow, with meaningful naming, good
formatting, and logical flow.
Code is modular and reusable, demonstrating appropriate abstraction and effective use of design
patterns.
Scalability
The solution is efficient, highly optimized for performance and resource usage and would scale well in
production.
Rationale is clear, technically sound, and insightful. Explains why the approach works or what the fix
does.
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
All implemented features function correctly across different scenarios and devices. Debug fixes, edits, or
explanations accurately resolve or describe the underlying issue.
Application works mostly as intended with minor issues that do not affect the overall experience. Fixes
or descriptions are mostly correct.
Specific interactions or devices face occasional functionality issues. Fixes are partial or only superficially
relevant.
Application fails to function correctly on most platforms. Critical features are non-functional or break
often. Fix or explanation is wrong or misleading.
Completeness
All UI and functionality requirements are fully met, with comprehensive handling of edge cases.
Responses cover full context, dependencies, and behaviors.
Key interactions are present, but some functionalities or rationale could be expanded.
Minimal functionality or insight provided. Large gaps make the result barely usable.
Code is well-structured with clean components and follows naming conventions. Comments or
explanations significantly enhance understanding.
Overall clarity is maintained with minor bloat or repetition. Comments and structure support
comprehension.
Readability is hindered due to sparse comments, verbose or inconsistent structure.
Code or explanation lacks organization and clarity. Major refactoring or restructuring is needed.
Visual Appeal / UX
Consistent styling and clean design enhance aesthetic appeal and usability. UI follows modern web
design and UX best practices.
Design is generally appealing with small inconsistencies. UX is solid with minor flaws in flow or layout.
The UI is functional but lacks polish. Layout or visual design appears default or underdeveloped.
Poor styling choices impair usability. Cluttered or broken UI makes interaction unpleasant or difficult.
Responsiveness
Fully responsive design adapts cleanly to all screen sizes and devices. Includes mobile-first practices and
accessibility considerations.
Lacks responsiveness, with layout and performance problems across non-desktop devices.
Framework Proficiency
Effective use of advanced framework features (e.g., routing, state management, component lifecycles).
Code reflects idiomatic use of chosen tools (React, Vue, etc.).
General framework usage is correct and efficient. May miss advanced optimizations or idioms.
Heavy reliance on boilerplate or plugins due to poor framework understanding. Misuses key patterns or
APIs.
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
Code, analysis, or explanations are statistically valid and technically correct. Computations, logic, and
results align with standard practices.
Mostly correct with minor mistakes that don’t significantly affect outcomes.
Contains notable errors in logic, math, or assumptions that could mislead.
Completeness
Fully addresses the task with necessary components (e.g., preprocessing, validation, edge cases).
Nothing critical is omitted.
Severely incomplete — core steps like loading, cleaning, modeling, or evaluating are absent.
Code and/or explanations are well-structured, commented, and logically organized. Output is
interpretable and easy to follow.
Mostly clear, with small areas of clutter or verbosity. Variable names and comments generally support
understanding.
Statistical Soundness
Uses appropriate statistical methods, distributions, or metrics. Shows awareness of biases, assumptions,
and modeling limitations.
Demonstrates robust data loading, cleaning, transformation, and validation practices. Handles edge
cases and missing values gracefully.
Poor or no data preprocessing. Code breaks or misbehaves with dirty, missing, or unexpected data.
(if applicable)
Most visualizations are effective but could be refined. Useful plots and tables included despite minor
formatting issues.
Visuals are present but lack essential details. Plots and tables are unclear or messy, hindering
understanding.
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
Response is technically accurate with no factual errors. Aligns with current ML practices, theories, and
tool behavior.
Mostly accurate with minor oversights or simplifications that don’t affect core validity.
Fully addresses all aspects of the user's question, including assumptions, caveats, and relevant details
(e.g., data shape, model compatibility, evaluation logic).
Covers core points but omits some secondary context or best practices.
Superficial or partially complete. Misses key components or steps needed to apply the answer.
Only a fragmentary or generic response. Ignores the actual question or task structure.
Clear and logically structured explanation. Progresses step-by-step, justifies recommendations, and
communicates assumptions.
Mostly clear with minor jumps in logic or imprecise language. Reasoning is understandable.
Conceptual Soundness
Demonstrates understanding of underlying ML/AI theory (e.g., optimization, generalization, model bias,
overfitting). Accurately links theory to practice.
Misrepresents core AI/ML principles. Shows poor grasp of theory or its practical relevance.
Practical Relevance
Response provides actionable advice that maps to real-world tools (e.g., PyTorch, TensorFlow, Hugging
Face) or typical workflows (e.g., training, evaluation, deployment).
Provides generally useful guidance but lacks direct tie-in to practical tools or scenarios.
Evaluation Awareness
Demonstrates appropriate selection and interpretation of metrics (e.g., accuracy vs. F1, AUC, loss
curves). Discusses evaluation trade-offs clearly.
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
SQL syntax, logic, and DBMS-specific behaviors are correct. Joins, conditions, and outputs are valid and
accurate.
Severely flawed. Invalid SQL, broken assumptions, or completely incorrect database logic.
Completeness
Fully answers the query or task. Handles edge cases, covers constraints, indexes, or joins as needed.
Covers main components of the task. Some advanced handling or edge cases may be omitted.
Partial answer — key clauses, constraints, or considerations are missing.
Clear structure and explanation of queries, schema, or DB principles. Easy to follow and educational.
Somewhat confusing or shallow in explanation. Relies too much on code without context.
Poorly explained, confusing, or misleading. Offers no insight into DB concepts or query logic.
Suggests efficient queries with awareness of indexing, filtering, JOIN strategies, and query plans.
Mostly efficient with minor missed opportunities (e.g., unindexed filters, redundant clauses).
Inefficient or dangerous (e.g., full table scans, Cartesian joins). Ignores performance entirely.
Poor design — violates normalization, uses ambiguous types or keys, lacks integrity.
Context Awareness
Adapts answer to DBMS (e.g., MySQL vs. PostgreSQL), user goals, data size, and usage pattern. Mentions
relevant tradeoffs.
Completely ignores user’s context, DBMS, or use case. Offers irrelevant or risky advice.
Game Development
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
Code logic and explanations are fully aligned with game engine behavior and programming language
specifics (e.g., Unity C#, Unreal Blueprints). No technical flaws.
Mostly correct with minor mistakes or outdated syntax that don’t block functionality.
Fundamentally flawed. Shows lack of understanding of game engine, physics systems, or scripting.
Completeness
Fully addresses the task. Covers input, output, timing, interactions, and integration (e.g., character
movement, animation triggers).
Covers the main logic but skips secondary systems or edge handling (e.g., state resets, collisions).
Partially complete or too abstract. Key components like input handling or UI feedback are missing.
Clear, modular explanation of game logic and design decisions. Makes assumptions explicit and explains
trade-offs.
Generally understandable, but some steps or reasoning may be implicit or oversimplified.
Difficult to follow or jumps between ideas. Doesn’t clearly explain how the game logic fits together.
Engine/Tool Proficiency
Uses game engine features effectively (e.g., Unity Events, Unreal Components). Understands scene
setup, asset use, prefabs/blueprints.
Uses appropriate tools with some missed optimization or slightly verbose solutions.
Limited grasp of built-in engine systems. Relies too heavily on manual or brute-force approaches.
Gameplay Awareness
Demonstrates understanding of game design principles (e.g., feedback loops, pacing, affordances).
Suggestions enhance player experience.
Mechanics are functional but lack polish or feedback. User experience feels secondary.
No regard for game feel or player experience. May produce jarring or unplayable results.
Performance Consideration
Aware of performance best practices (e.g., update loops, physics load, draw calls, garbage collection).
Avoids unnecessary computation.
Dimension
4 - Excellent
3 - Good
2 - Fair
1 - Poor
Correctness
Script or automation flow is technically sound, logically coherent, and compatible with the intended
runtime environment (e.g., Bash, Windows, GitHub Actions).
Mostly correct with only minor platform-specific oversights or slight inefficiencies.
Contains noticeable logic or syntax issues; may not work without user intervention.
Completeness
Fully addresses task requirements: setup, dependencies, error handling, logging, and platform nuances.
Core functionality is covered, but missing optional flags, error checks, or system integration steps.
Barebones response with major omissions. Fails to fulfill most of the intended automation task.
Response uses well-structured, readable, and reusable scripting practices (e.g., named functions,
comments, parameterization).
Generally clear with slight verbosity or tight coupling. Mostly easy to understand and reuse.
Rudimentary or verbose usage of tools. Might rely on outdated patterns or unnecessary complexity.
Misuses tools or uses incorrect conventions (e.g., bad quoting, insecure paths, wrong YAML keys).
Includes solid input validation, exit codes, logging, retries, and cleanup logic. Suitable for production use.
Basic error handling present. May miss edge cases or assume ideal inputs.
Minimal validation or error handling. Could break under common real-world conditions.
Correctly considers OS, shell, file system structure, environment variables, and external tools (e.g., cron,
systemd, pip).
IMPORTANT!
To maintain fairness and scoring consistency across all responses, please refer to and complete the
following checklist: Link to Checklist Document
Appendix
L0
L1
L2
Definition
Core Foundations
Essential principles of software engineering, scripting, and automation focusing on building reliable,
maintainable software systems and streamlining tasks through automated scripts to meet user and
business needs.
General Software Engineering
The discipline of designing, developing, testing, documenting, and maintaining software systems by
applying software engineering principles and programming methodologies, with the goal of building reliable,
scalable, and maintainable software solutions that meet user and business needs.
The process of identifying, analyzing, and applying reusable solutions including creational, structural,
and behavioral patterns to common software design problems with the goal of improving code
organization, modularity, and maintainability. The process of writing, organizing, and refining source
code using programming languages, tools, and techniques to create functional software components
and applications.
Programming Paradigms
The set of methodologies used to organize and structure software solutions based on distinct styles and
principles of programming, such as imperative, declarative, functional, procedural programming, etc.
The process of identifying, analyzing, and fixing errors, bugs, and defects in software systems through
various testing methods including unit, integration, and system testing to ensure correctness,
robustness, reliability, and optimal performance of the system.
Code Documentation
The process of writing and maintaining descriptive and structured text, comments, and explanations of
the code, its functionality, structure, and usage with the goal of facilitating understanding, collaboration,
and future maintenance.
Version Control
The process of managing changes to source code and related files over time using systems that record
revisions, track history of modification, and facilitate collaboration among multiple developers and
collaborators.
The process of designing, creating, and managing data structures and algorithms for solving
computational problems and structuring data in a way that optimizes access, storage and manipulation
by software programs.
The process of writing scripts to automate tasks, including shell scripting, batch scripting, automation
scripting, and web scripting.
Shell Scripting
The process of writing scripts designed to be run on Unix-like operating systems, using shells such as bash, zsh, and
fish.
Batch Scripting
The process of writing scripts designed to be run in DOS, OS/2, and Windows, including batch files and
PowerShell.
Web Scripting
The process of writing scripts to automate web-related tasks such as web scraping, browser automation,
and form submission.
Data Management & Intelligence
The comprehensive discipline of organizing, maintaining, and analyzing data through structured systems
and intelligent algorithms to extract actionable insights that drive informed decision-making and create
strategic value.
Database Management
The process of designing, implementing, storing, and maintaining databases to ensure efficient data
organization, retrieval, and integrity, including defining structures, optimizing queries, managing storage
models, and performing administrative tasks such as configuration and recovery.
The process of implementing a database system based on the design specifications and optimizing query
execution to ensure efficient data retrieval, manipulation, and management through effective indexing,
sorting and joining techniques.
The process of conceptualizing, designing, and modeling the structure of a database and creating logical
and physical schemas that ensure efficient data storage, retrieval, and integrity. This includes defining
entities, relationships, and normalization principles. It also includes defining and managing physical
storage structures within a database system, including row-based, column-based, object-oriented, and
NoSQL storage models to optimize data access, storage efficiency, and scalability.
Database Administration
The process of managing and maintaining databases, including configuration, backup and recovery, and
performance optimization.
The discipline of designing, building, maintaining, and leveraging data systems to transform raw data
into actionable insights, encompassing the full data lifecycle from infrastructure design and pipeline
development to analysis and end visual representation, ensuring organizations can effectively utilize
data as a strategic asset for informed decision-making.
The process of designing, developing, and managing the structures and systems that store, process, and
govern data throughout its lifecycle, including databases, data lakes, data warehouses, and cloud
infrastructure to ensure scalability, reliability, and security.
The practice of building and maintaining reliable, scalable data pipelines and applying DevOps principles
to data workflows for continuous integration, delivery, monitoring, and automation.
The process of exploring, visualizing, and interpreting structured and unstructured data to discover
patterns, generate insights, and support data-driven decision making. This includes presenting complex
data in a clear and meaningful way using visualizations like charts, graphs, and maps to enhance
understanding and communication of key findings.
The process of developing algorithms and models that can understand, interpret, and generate human
language, including text and speech.
Computer Vision
The process of developing algorithms and models that can interpret and understand visual data from
images and videos.
Generative AI
The process of developing AI models that can create new content, including text, images, music, and
videos, by learning patterns from existing data.
Robotics
The process of designing, developing, and operating robotic systems that interact with the physical
world, utilizing sensors, actuators, and control systems alongside artificial intelligence and machine
learning to perform tasks autonomously.
ML Ops
The process of automating and managing the lifecycle of machine learning models in production,
encompassing deployment, monitoring, versioning, scalability, and collaboration across data science and
engineering teams.
Application Development
The overarching discipline encompassing the design, creation, testing, and deployment of software
applications across various platforms and purposes, utilizing diverse programming languages,
frameworks, and tools to meet specific user needs and functionalities, with deployment across
different devices and environments.
Web Development
The process of designing, building, and maintaining websites, web applications, and web services using
programming languages such as HTML, CSS, JavaScript, and server-side languages like PHP, Python, and
Ruby. This includes front-end development (client-side), back-end development (server-side), DevOps,
deployment, testing, tuning, architecture, as well as web frameworks, libraries, and databases.
Front-end Development
The process of building the user interface and user experience of a website or web application using
programming languages like HTML, CSS, and JavaScript, as well as frameworks like React and Angular.
Back-end Development
The process of building the server-side logic and database integration of a website or web application
using programming languages like Java, Python, and Ruby, as well as frameworks like Node.js and
Django.
The process of verifying and validating the functionality, usability, performance, and security of different
components of web applications and their compliance with best practices, ensuring reliability and user
satisfaction. This also includes the practice of optimizing web applications to improve load times,
responsiveness, and resource efficiency for better user experience and search engine optimization.
Design principles and structural approaches that organize code and systems to enhance maintainability,
scalability, and reliability of web applications.
The process of designing, building, and deploying mobile applications for Android and iOS devices, using
programming languages like Java, Swift, and Kotlin. This includes native app development,
cross-platform development using frameworks like React Native and Flutter, and hybrid app development, as
well as mobile app testing, debugging, and tuning.
The process of building mobile applications for Android devices using programming languages like Java
and Kotlin, as well as tools like Android Studio.
The process of building mobile applications for iOS devices using programming languages like Swift and
Objective-C, as well as tools like Xcode.
Cross-platform Development
The process of building mobile applications that can run on multiple platforms, including Android and
iOS, using frameworks like React Native and Flutter.
The process of testing and debugging mobile applications to ensure they are stable, secure, and meet
user requirements.
The process of optimizing mobile applications to improve speed, responsiveness, resource efficiency,
and battery usage, identifying bottlenecks, reducing memory leaks, optimizing network calls, and
enhancing UI rendering to ensure smooth user experiences across devices and OS versions.
Game Development
The process of designing, building, and testing video games using programming languages like C++, Java,
and Python, as well as game engines like Unity and Unreal Engine.
The process of conceptualizing and designing gameplay mechanics, rules, storylines, characters, and
user experiences, alongside developing the core systems that power games. This includes building and
refining game engines to handle rendering, animation, and physics, as well as integrating artificial
intelligence and physics simulations to enable realistic behaviors and dynamic interactions within virtual
environments.
The process of creating, editing, and integrating sound effects, background music, voiceovers, and
ambient audio to enhance the atmosphere and immersion of a game.
The process of building, deploying, testing, and debugging games across multiple platforms such as PC,
consoles, and mobile devices, ensuring compatibility, performance, and compliance.
The process of designing and developing immersive applications that blend real and virtual
environments using technologies such as Virtual Reality, Augmented Reality, and Mixed Reality, focusing
on user interaction and sensory engagement.
Aspect
What it means
Goal
Identify security flaws, logic errors, or mis-configurations that could be exploited.
Typical inputs
Typical outputs
• A list of vulnerable lines or blocks, each with an explanation (“unsanitised SQL parameter → SQL
injection”).
Focus areas
Authentication / authorisation gaps, input validation, injection, XSS, insecure crypto, race conditions,
hard-coded secrets, dependency CVEs, overly permissive IAM policies.
Granularity
From single-line nits (e.g. eval(user_input)) to architectural issues (no rate-limiting on login endpoint).
Audience
Examples
Analyse the attached deploy_cluster.sh and its companion Ansible playbook. Identify command-injection
or privilege-escalation risks when run by junior ops staff. For each risk, explain an exploitation scenario
and propose a mitigated snippet (e.g., -- to end ssh options, set -euo pipefail, quoting). Prioritise findings
with High/Med/Low severity.
deploy_cluster.sh
```shell
#!/usr/bin/env bash
# ------------------------------------------------------------
# ------------------------------------------------------------
src=$2
dest=$3
```
playbook.yml
```yaml
---
- hosts: app
vars:
tasks:
become: yes
copy:
become: yes
- name: Restart service # <-- runs as root, no become_user
service:
name: myapp
state: restarted
become: yes
```
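As a rough illustration of the quoting and `--` mitigations the example prompt asks for, here is a minimal Python sketch. It is not derived from the elided contents of deploy_cluster.sh: the `scp` command, the helper names, and the host name are all hypothetical, chosen only to show the pattern of passing untrusted values as separate arguments instead of splicing them into a shell string.

```python
import shlex
from typing import List


def build_copy_command_unsafe(src: str, dest: str, host: str) -> str:
    """Vulnerable pattern: untrusted values are spliced into one shell string."""
    return f"scp {src} {host}:{dest}"


def build_copy_argv(src: str, dest: str, host: str) -> List[str]:
    """Safer pattern: one list element per argument, so no shell parsing occurs.

    The "--" marks the end of options, so a value that starts with "-"
    is treated as an operand rather than as a flag.
    """
    return ["scp", "--", src, f"{host}:{dest}"]


def render_for_log(argv: List[str]) -> str:
    """Quote each argument so the logged line is safe to paste into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    hostile = "notes.txt; rm -rf /"  # hypothetical attacker-controlled input
    print(build_copy_command_unsafe(hostile, "/srv/app", "web01"))
    print(render_for_log(build_copy_argv(hostile, "/srv/app", "web01")))
```

Passing the list to `subprocess.run(argv, check=True)` without `shell=True` keeps the hostile value as a single argument. A review response for the Bash script itself would apply the same idea with quoted variable expansions.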
Aspect
What it means
Goal
Produce new code that satisfies the user’s description or finishes partially written code.
Typical inputs
Typical outputs
Focus areas
Correctness, idiomatic style for the chosen language, minimal external dependencies unless requested,
compliance with provided patterns or frameworks.
Granularity
Anything from a single line (“complete this list comprehension”) to a multi-file scaffold (“generate CRUD
API with tests”).
Audience
Developers who want rapid prototyping, boilerplate avoidance, or inspiration for tricky logic; educators
demonstrating patterns.
Examples
in dag.rs.
a simple DAG,
detection of a cycle.
Keep time complexity ≤ O(|V| + |E|) and memory complexity ≤ O(|V| + |E|).
```rust
//! dag.rs
//! -------------------------------------------
//
//! -------------------------------------------
/// Error returned when the input graph contains at least one cycle.
///
/// # Arguments
/// * `graph` - adjacency list where each key maps to the list of its outgoing-edge targets.
///
/// # Returns
///
/// # Complexity
/// Must run in **O(|V| + |E|)** time and use **O(|V| + |E|)** additional memory.
pub fn topological_sort(
//
// 1. Compute indegrees.
//
//
/* ---------------------------------------------------------------- */
#[cfg(test)]
mod tests {
use super::*;
use std::iter::FromIterator;
HashMap::from_iter(edges.iter().map(|(src, dsts)| {
}))
#[test]
fn simple_dag() {
let _ = g;
// assert_eq!(order.len(), 3);
#[test]
fn multiple_valid_orders() {
// TODO
#[test]
fn detects_cycle() {
// TODO
}
```
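The stub's first step, "Compute indegrees", suggests an indegree-based (Kahn-style) approach, although the elided parts of dag.rs may intend something else. The sketch below shows that approach in Python rather than Rust, purely to illustrate how the O(|V| + |E|) bound can be met. The `CycleError` name and the dict-of-lists input shape are assumptions, not the signature the task expects.

```python
from collections import deque


class CycleError(ValueError):
    """Raised when the input graph contains at least one cycle."""


def topological_sort(graph):
    """Return one valid topological order of `graph` (node -> list of targets)."""
    # 1. Compute indegrees; targets that never appear as keys are included too.
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # 2. Start from every node that has no incoming edge.
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    order = []

    # 3. Remove nodes one by one; each edge is visited exactly once.
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in graph.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)

    # 4. Any node left unprocessed must lie on a cycle.
    if len(order) != len(indegree):
        raise CycleError("graph contains a cycle")
    return order
```

Each node enters the queue at most once and each edge is decremented once, which is where the linear time and memory bounds come from.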
Aspect
What it means
Goal
Diagnose and fix errors, crashes, failing tests, incorrect results, or mis-handled edge cases.
Typical inputs
Typical outputs
• Patched code that compiles/runs and passes provided tests.
Focus areas
Logic mistakes, off-by-one errors, null-pointer access, wrong API usage, incorrect type conversions, race
conditions.
Granularity
From a single line change to a focused patch touching a few related files.
Audience
Example
The quantile_normalize function intermittently returns NaNs when run in parallel on large DataFrames.
Debug and patch it so the result exactly matches the single-threaded reference output for all inputs ≤ 1
GB.
```python
"""
quantile_normalize.py
"""
import multiprocessing as mp
import numpy as np
import pandas as pd
ranked = chunk.stack().groupby(chunk.rank(method="first").stack().astype(int)).mean()
return chunk.rank(method="min").stack().astype(int).map(ranked).unstack()
def quantile_normalize(
) -> pd.DataFrame:
"""
----------
df : pd.DataFrame
n_workers : int
chunk_size : int
Returns
-------
pd.DataFrame
"""
return _qn_single(df)
normalised = pd.concat(partials)
return normalised.loc[df.index]
```
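The prompt's "single-threaded reference output" implies a baseline against which the parallel result is compared. The elided `_qn_single` body is not reproduced here. The sketch below is an independent, pure-Python reference for quantile normalisation over equal-length columns, with ties broken by position (an assumption that may differ from the pandas rank methods used in the original). The function name `quantile_normalize_reference` is hypothetical.

```python
def quantile_normalize_reference(columns):
    """Quantile-normalise a list of equal-length numeric columns.

    Each value is replaced by the mean, across columns, of the values that
    share its rank. Ties are broken by original position (stable sort).
    """
    if not columns:
        return []
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")

    # Mean of the k-th smallest value across all columns, for each rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_rows)
    ]

    result = []
    for col in columns:
        # Row indices ordered by value; Python's sort is stable.
        order = sorted(range(n_rows), key=lambda i: col[i])
        normalised = [0.0] * n_rows
        for rank, row in enumerate(order):
            normalised[row] = rank_means[rank]
        result.append(normalised)
    return result
```

Because the rank means depend on every row of every column, a regression test can compare a chunked or parallel implementation against this kind of whole-input baseline.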
Aspect
What it means
Goal
Typical inputs
Typical outputs
Focus areas
Big-O improvements, removal of dead code, extracting reusable modules, applying design patterns,
conforming to lint rules.
Granularity
Example
usp_generate_invoice intermittently throws ORA-01403: no data found when called from the nightly
job. Debug and patch the procedure so it handles missing customer rows gracefully and logs anomalies
to invoice_errors. Provide a one-paragraph root-cause explanation and a script that recreates the failing
case.
```sql
p_customer_id IN customers.customer_id%TYPE,
p_order_id IN orders.order_id%TYPE,
) AS
l_customer customers%ROWTYPE;
l_order orders%ROWTYPE;
BEGIN
----------------------------------------------------------------
----------------------------------------------------------------
SELECT *
INTO l_customer
FROM customers
----------------------------------------------------------------
SELECT *
INTO l_order
FROM orders
----------------------------------------------------------------
-- 3. Create invoice
----------------------------------------------------------------
invoice_id,
customer_id,
order_id,
inv_date,
amount
VALUES (
invoices_seq.NEXTVAL,
l_customer.customer_id,
l_order.order_id,
SYSDATE,
l_order.total_price
END usp_generate_invoice;
```
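In PL/SQL, a `SELECT ... INTO` that matches no row raises `NO_DATA_FOUND` (ORA-01403), so the requested fix amounts to handling that case and recording it. The Python/sqlite3 sketch below is only an analogue of that pattern, not the PL/SQL patch itself. The `invoice_errors` columns, the `generate_invoice` and `make_demo_db` helpers, and the demo data are all hypothetical.

```python
import sqlite3


def generate_invoice(conn, customer_id, order_id):
    """Insert an invoice, or log an anomaly and return False if a row is missing."""
    customer = conn.execute(
        "SELECT customer_id FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    order = conn.execute(
        "SELECT order_id, total_price FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()

    # fetchone() returns None where PL/SQL would raise NO_DATA_FOUND.
    if customer is None or order is None:
        missing = "customer" if customer is None else "order"
        conn.execute(
            "INSERT INTO invoice_errors (customer_id, order_id, reason) "
            "VALUES (?, ?, ?)",
            (customer_id, order_id, f"{missing} row not found"),
        )
        return False

    conn.execute(
        "INSERT INTO invoices (customer_id, order_id, amount) VALUES (?, ?, ?)",
        (customer[0], order[0], order[1]),
    )
    return True


def make_demo_db():
    """Build a tiny in-memory schema that can recreate the failing case."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_price REAL);
        CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY,
                               customer_id INTEGER, order_id INTEGER, amount REAL);
        CREATE TABLE invoice_errors (customer_id INTEGER, order_id INTEGER,
                                     reason TEXT);
        INSERT INTO customers VALUES (1);
        INSERT INTO orders VALUES (10, 99.5);
        """
    )
    return conn
```

Calling `generate_invoice(conn, 2, 10)` on the demo database reproduces the missing-customer case and leaves one row in `invoice_errors` instead of failing.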
Aspect
What it means
Goal
Alter existing functionality to satisfy new requirements or integrate with a different environment/API
while preserving core logic.
Typical inputs
• Legacy function plus new specification (“must accept JSON instead of XML”).
Typical outputs
• Modified code compatible with the new context.
• Adapter/wrapper layers.
Focus areas
Granularity
From a small parameter-list tweak to a full module rewrite that plugs into a new framework.
Audience
Example
report_plot.py rebuilds the same 80 k-row DataFrame three times. Refactor to compute once, cache in
memory, and rewrite the plotting section to use seaborn’s lineplot, reducing runtime from 25 s to < 5 s.
Keep CLI behaviour intact; update docstring and add a benchmark in README.md.
```python
#!/usr/bin/env python3
import argparse
import time
import pandas as pd
df = _load_raw(path)
df["hour"] = df["timestamp"].dt.floor("h")
return (
df.groupby("hour")["latency_ms"].mean().reset_index(name="latency_ms")
df = _load_raw(path)
df["hour"] = df["timestamp"].dt.floor("h")
return (
df.groupby("hour")["request_id"].count().reset_index(name="requests")
df = _load_raw(path)
df["hour"] = df["timestamp"].dt.floor("h")
return (
df[df["is_error"] == 1]
.groupby("hour")["is_error"]
.count()
.reset_index(name="errors")
# ----------------------------------------------------------------------
# ----------------------------------------------------------------------
ax.set_ylabel("Latency (ms)")
ax.legend(loc="upper left")
ax.set_ylabel("Requests")
ax.legend(loc="upper left")
ax.set_ylabel("Errors")
ax.legend(loc="upper left")
# ----------------------------------------------------------------------
# ----------------------------------------------------------------------
t0 = time.time()
args = parser.parse_args()
latency_df = _build_latency_df(args.input)
requests_df = _build_requests_df(args.input)
errors_df = _build_errors_df(args.input)
_plot_latency(axes[0], latency_df)
_plot_requests(axes[1], requests_df)
_plot_errors(axes[2], errors_df)
axes[2].set_xlabel("Hour")
plt.tight_layout()
fig.savefig(args.output, dpi=150)
if __name__ == "__main__":
main()
```
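The "compute once, cache in memory" requirement can be sketched with `functools.lru_cache` from the standard library. This is one possible approach under stated assumptions, not the expected answer: the row layout `(hour, latency_ms, is_error)` and the function names are hypothetical, and plain tuples stand in for the DataFrame so the example runs without pandas.

```python
import functools


@functools.lru_cache(maxsize=None)
def load_raw(path):
    """Stand-in for the expensive load; the body runs once per distinct path."""
    # Hypothetical rows: (hour, latency_ms, is_error)
    return ((0, 120.0, 0), (0, 80.0, 1), (1, 100.0, 0))


def latency_by_hour(path):
    """Mean latency per hour."""
    grouped = {}
    for hour, latency, _ in load_raw(path):
        grouped.setdefault(hour, []).append(latency)
    return {hour: sum(vals) / len(vals) for hour, vals in grouped.items()}


def requests_by_hour(path):
    """Number of requests per hour."""
    counts = {}
    for hour, _, _ in load_raw(path):
        counts[hour] = counts.get(hour, 0) + 1
    return counts


def errors_by_hour(path):
    """Number of error rows per hour."""
    counts = {}
    for hour, _, is_error in load_raw(path):
        if is_error == 1:
            counts[hour] = counts.get(hour, 0) + 1
    return counts
```

After all three aggregations run on the same path, `load_raw.cache_info().misses` is 1, showing the load happened once. With a real DataFrame the cached object is mutable, so the builders would need to work on a copy or avoid in-place column assignment.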
Code Search
Aspect
What it means
Goal
Identify and return the exact snippet(s) inside a supplied codebase that match a description, keyword,
signature, or behavior—no new code written.
Typical inputs
• Natural-language query (“find the function that hashes passwords with bcrypt”).
Typical outputs
Function names, class definitions, API endpoints, SQL queries, regex patterns—anything already present
that fulfills the search intent.
Granularity
Audience
Developers navigating large repositories, reviewers checking for duplicates, auditors verifying
compliance.
Example
(file excerpts provided; each block begins with the file path)
```swift
// MyFinance/Services/KeychainService.swift
import Security
@discardableResult
// write
@discardableResult
// write (update)
```
```swift
// MyFinance/Onboarding/AuthManager.swift
import Security
class AuthManager {
```
```swift
// MyFinance/Utilities/TokenStorage.swift
import Security
struct TokenStorage {
// direct write
```
Locate every function that writes to the Keychain.
File path
Function signature
Five lines before and after the write call (or the closest you can).
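A search of this kind can be approximated mechanically before reading the hits by hand. The sketch below assumes that Keychain writes go through the Security framework calls `SecItemAdd` and `SecItemUpdate`; the excerpts above elide the actual call sites, so that pattern is an assumption, and `find_write_calls` is a hypothetical helper. It returns the file path, the line number, and the surrounding context lines, but does not extract the enclosing function signature.

```python
import re

# Assumption: a "write" is a call to SecItemAdd or SecItemUpdate.
WRITE_CALL = re.compile(r"\bSecItem(Add|Update)\s*\(")


def find_write_calls(path, source, context=5):
    """Return one hit per matching line, with `context` lines on each side."""
    lines = source.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if WRITE_CALL.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append(
                {
                    "path": path,
                    "line": index + 1,  # 1-based, as editors display it
                    "snippet": lines[start:end],
                }
            )
    return hits
```

Clamping `start` and `end` to the file bounds covers the prompt's "or the closest you can" case when a call sits near the top or bottom of a file.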
“Write tests so the code stays fixed (or the bug is reproducible).”
Aspect
What it means
Goal
Typical inputs
Typical outputs
• Test files/functions with assertions.
Focus areas
Boundary values, error handling, regression paths, performance thresholds (if measurable), security
edge-cases.
Granularity
Audience
Example
Generate GoogleTest unit tests for AStarPathfinder::findPath to cover: unreachable target, diagonal
movement allowed, tile with zero cost, and map size > 10 000 nodes. Provide a fixture that loads
sample_map.json and assert that result.TotalCost matches a pre-computed baseline you derive inside
the test.
Return only the new test source file (AStarPathfinderTests.cpp). Do not modify the stub implementation.
```cpp
// Source/MyGame/AI/AStarPathfinder.h
#pragma once
#include "CoreMinimal.h"
class AStarPathfinder
public:
struct FResult
int32 TotalCost = 0;
};
bool bAllowDiagonal);
private:
int32 Width = 0;
int32 Height = 0;
TArray<TArray<FTile>> Grid;
};
```
```cpp
// Source/MyGame/AI/AStarPathfinder.cpp
#include "AStarPathfinder.h"
AStarPathfinder::AStarPathfinder(
: Grid(InGrid), bDiag(bAllowDiagonal)
Height = Grid.Num();
AStarPathfinder::FResult
return {};
```
```json
// Content/Sample/sample_map.json
"width": 5,
"height": 5,
"grid": [
[1, 1, 1, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 5, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 1, 1, 1]
```
Code Summarization
Aspect
What it means
Goal
Condense one or more changes or a whole file into a short, human-readable narrative.
Typical inputs
• A one-line commit title plus a multi-line body (conventional commit style, GitHub default, etc.).
• A “changelog” bullet.
• An executive summary paragraph at the top of a script (“This tool crawls… then exports…”).
Focus areas
What changed and why it matters at a bird’s-eye level—features added, bugs fixed, refactors,
performance gains.
Granularity
File-level or repo-level. Rarely discusses individual lines unless they illustrate a major change.
Audience
Reviewers, future maintainers, project managers—anyone who wants to grasp scope & intent quickly.
Examples
ML / AI Foundations — MATLAB
Read cnn_training_script.m, augment.m, and train_options.json. Produce a 5-line summary for the
project wiki: dataset used, augmentation steps, model architecture, training hyper-params, and
expected accuracy. Begin with a one-sentence overview, then four bullets.
cnn_training_script.m
```matlab
% cnn_training_script.m
% -------------------------------------------------------------
% from train_options.json.
% -------------------------------------------------------------
clear; clc;
%% 1) Dataset
"LabelSource", "foldernames");
"IncludeInfo", true);
layers = [
imageInputLayer([64 64 3],"Normalization","zerocenter")
convolution2dLayer(3,16,"Padding","same")
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,"Stride",2)
convolution2dLayer(3,32,"Padding","same")
batchNormalizationLayer
reluLayer
fullyConnectedLayer(numel(unique(imds.Labels)))
softmaxLayer
classificationLayer
];
optsJson = jsondecode(fileread("train_options.json"));
"Verbose", false);
%% 5) Train
save("trained_cnn.mat","net");
```
augment.m
```matlab
% augment.m
% -----------------------------------------------------------
img = dataIn{1};
img = fliplr(img);
end
```
train_options.json
```json
{
  "optimizer": "adam",
  "learningRate": 0.0005,
  "batchSize": 64,
  "epochs": 25,
  "expectedAccuracy": 0.92
}
```
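One possible way to derive the last two bullets of the requested summary (training hyper-params and expected accuracy) is to read them straight from the options file. The sketch below is only an illustration under that assumption: the helper names are hypothetical, the values are copied from train_options.json above, and the bullets for dataset, augmentation, and architecture would still have to be written from the two .m files.

```python
import json

# Values copied from train_options.json above.
TRAIN_OPTIONS = """
{
  "optimizer": "adam",
  "learningRate": 0.0005,
  "batchSize": 64,
  "epochs": 25,
  "expectedAccuracy": 0.92
}
"""


def hyperparameter_bullet(options):
    """Format the training hyper-parameters as a single summary bullet."""
    return (
        f"- Training: {options['optimizer']} optimizer, "
        f"learning rate {options['learningRate']}, "
        f"batch size {options['batchSize']}, "
        f"{options['epochs']} epochs"
    )


def accuracy_bullet(options):
    """Format the expected accuracy as a whole-number percentage."""
    return f"- Expected accuracy: {options['expectedAccuracy']:.0%}"


options = json.loads(TRAIN_OPTIONS)
print(hyperparameter_bullet(options))
print(accuracy_bullet(options))
```

Pulling the numbers from the file rather than retyping them keeps the summary consistent with the configuration actually used for training.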
Code Explanation
Aspect: What it means
Goal: Provide a line-by-line or section-by-section walkthrough of existing code so the reader understands logic, data flow, I/O, and edge cases.
Typical inputs:
Typical outputs:
• A prose explanation (“Step 1 fetches data… Step 2 filters… Step 3 returns a DataFrame”).
Focus areas: How it works, why each step exists, constraints, corner cases, Big-O, side effects.
Granularity:
Audience: Developers tasked with maintenance, debugging, or extension; juniors learning the codebase.
Examples
```powershell
# Rotate-IIS-Logs.ps1
param (
[string]$LogDir = "C:\inetpub\logs\LogFiles\W3SVC1",
[switch]$ZipArchives
$now = Get-Date
$cutoff = $now.AddDays(-$DaysToKeep)
if ($ZipArchives) {
$zipPath = "$dest.zip"
Remove-Item $file.FullName
} else {
}
}
} | Remove-Item -Force
```
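A section-by-section explanation of the script above would likely start with the retention cutoff, since that value decides which files get archived or removed. The sketch below restates that step in Python. It assumes a file counts as expired when its last-modified time is strictly earlier than the cutoff; the comparison itself is not visible in the excerpt, and the function name `is_expired` and the 7-day value are hypothetical.

```python
from datetime import datetime, timedelta


def is_expired(last_modified, now, days_to_keep):
    """Return True when a log file is older than the retention cutoff."""
    # Mirrors `$cutoff = $now.AddDays(-$DaysToKeep)` from the script.
    cutoff = now - timedelta(days=days_to_keep)
    # Assumption: strictly older than the cutoff means expired.
    return last_modified < cutoff


now = datetime(2024, 1, 31)
print(is_expired(datetime(2024, 1, 1), now, 7))   # a 30-day-old log
print(is_expired(datetime(2024, 1, 30), now, 7))  # a 1-day-old log
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the check deterministic, which makes the edge case at the cutoff boundary easy to explain and test.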
Code Review
“Judge this code—does it do the right thing and do it the right way?”
Aspect: What it means
Goal: Evaluate a change set or new file against correctness, style guidelines, architecture, performance, security, and maintainability.
Typical inputs:
Typical outputs:
• Inline review comments (“nit: rename idx → index”).
Focus: Find problems and offer improvements.
• Logic errors and failing edge cases
• Consistency with project conventions (naming, formatting, tests)
• Security flaws / resource leaks
• Performance hot spots
• Readability and documentation quality
Granularity: Line-level for nits, block-level for algorithmic concerns, file-level for architecture.
Audience: Authors of the PR, other reviewers, CI gatekeepers; anyone deciding whether to merge the change.
Examples
```go
// handlers/user.go
package handlers
import (
"encoding/json"
"net/http"
"time"
ID string `json:"id"`
start := time.Now()
id := r.URL.Query().Get("id")
if id == "" {
return
}
u := userStore[id]
if u == nil {
http.NotFound(w, r)
return
if err != nil {
return
lat := time.Since(start)
w.Header().Set("Content-Type", "application/json")
w.Write(js)
userStore[u.ID] = u
```
```go
// middleware/auth.go
package middleware
import (
"context"
"net/http"
"strings"
auth := r.Header.Get("Authorization")
if auth == "" {
return
return
}
if !ok {
return
next.ServeHTTP(w, r.WithContext(ctx))
})
// TODO: call remote auth service with timeout; dummy impl for now
if tok == "good" {
```