Classification After Evaluation

Slide Idea

Once outputs have been evaluated, failures should be classified by type to determine appropriate interventions: ambiguity failures (where specifications were insufficiently controlled) represent design failures requiring better constraint specification; model limitation failures (structural ceilings that prompting cannot overcome) indicate where further iteration becomes inefficient; and ethical boundary failures (where continued generation requires justification or must halt) recognize that evaluation involves judgment about whether generating additional outputs is appropriate, not merely whether current outputs succeed technically.

Key Concepts & Definitions

Design Failures vs. System Failures

Design failures occur when work does not satisfy requirements because those requirements were inadequately specified, poorly framed, or insufficiently constrained—the problem lies in decisions made during planning and specification, not in execution or system capabilities. System failures occur when clearly specified requirements cannot be satisfied due to technical limitations, capability gaps, or execution errors despite adequate specification. This distinction proves critical for remediation: design failures require revisiting and improving specifications, constraints, and problem framing; system failures require different approaches (alternative systems, workarounds, accepting limitations, awaiting capability improvements). Research on design thinking emphasizes that most real-world "failures" trace to inadequate problem framing rather than to execution inadequacy—the wrong problem was addressed, or the right problem was framed poorly, leading to solutions that technically succeed but don't address actual needs. The slide's note reinforces this research finding: design failures often stem from underspecified or poorly framed constraints, not from system inability to execute clear specifications.

Source: Schön, D. A. (1983). The reflective practitioner: How professionals think in action. Basic Books.

Structural Ceilings and Capability Limits

Structural ceilings are fundamental capability boundaries of current systems or approaches beyond which additional effort, refinement, or iteration cannot produce improvements—the limitation is architectural or systemic rather than a matter of better prompting, more attempts, or refined specifications. Recognizing structural ceilings prevents wasteful iteration that attempts to achieve what current capabilities cannot deliver. In generative AI contexts, structural ceilings include: anatomical accuracy limitations for complex poses, consistent character identity across generations, precise spatial reasoning, accurate text rendering, and reliable performance on underrepresented categories in training data. These aren't overcome through better prompting—they reflect current model architecture and training limitations. Research on AI system capabilities emphasizes distinguishing what systems can be made to do through better specification from what they fundamentally cannot currently do regardless of specification quality. The slide identifies model limitations as a "structural ceiling we cannot prompt past"—acknowledging this boundary determines when to stop iterating and consider alternative approaches.

Source: Mitchell, M., et al. (2019). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229).

Ethical Boundaries in Generative Systems

Ethical boundaries are normative limits on what should be generated, used, or deployed regardless of technical capability—recognition that just because systems can produce certain outputs doesn't mean those outputs should be created, and that continued generation sometimes requires explicit justification or should halt entirely. Ethical boundaries address concerns including: potential harms (outputs that could injure, deceive, or discriminate), consent and representation (depicting identifiable individuals without permission, appropriating cultural elements), resource consumption (environmental costs of continued iteration), and dignity (outputs that degrade or objectify). Responsible use of generative systems requires evaluating not just whether outputs satisfy technical specifications but whether generating them is ethically defensible. The slide identifies ethical boundaries as the "stopping point where continued generation requires justification—or the process must halt," emphasizing that ethics involves judgment about whether to continue, not just assessment of whether current outputs succeed. This shifts evaluation from purely technical conformance checking to values-based decision-making about appropriate use.

Source: Floridi, L., & Cowls, J. (2019). A unified framework of five principles for AI in society. Harvard Data Science Review, 1(1).

Appropriate Intervention Matching

Appropriate intervention matching is the practice of diagnosing failure types correctly and applying remediation strategies suited to each type rather than using generic responses for all failures. Different failure classifications require different responses: ambiguity/design failures require specification refinement (tightening constraints, clarifying requirements, improving problem framing); model limitation failures require working within constraints (accepting limitations, using different systems, adjusting goals, finding workarounds); ethical boundary failures require values assessment (determining whether continued generation is justified, whether outputs should be used despite technical success, whether the process should halt). Mismatched interventions prove ineffective: attempting to fix model limitations through specification refinement wastes effort; treating design failures as unavoidable limitations abandons achievable improvements; ignoring ethical concerns because outputs technically succeed abdicates responsibility. Professional practice emphasizes diagnostic precision: correctly identifying failure type enables targeted remediation rather than trial-and-error hoping something eventually works.

Source: Norman, D. A. (2013). The design of everyday things (Revised and expanded edition). Basic Books.
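
To make the dispatch concrete, here is a minimal Python sketch of intervention matching. The FailureType names and the recommendation strings are illustrative assumptions layered onto the slide's framework, not part of it.

```python
from enum import Enum, auto


class FailureType(Enum):
    """The slide's three failure categories (names are illustrative)."""
    DESIGN = auto()      # ambiguity: specification was not controlled tightly enough
    LIMITATION = auto()  # structural ceiling we cannot prompt past
    ETHICAL = auto()     # continued generation requires justification, or must halt


def recommend_intervention(failure: FailureType) -> str:
    """Match a diagnosed failure type to the remediation suited to it."""
    if failure is FailureType.DESIGN:
        return "Refine the specification: tighten constraints, clarify requirements, reframe the problem."
    if failure is FailureType.LIMITATION:
        return "Stop iterating on this system: find a workaround, switch approaches, or adjust goals."
    if failure is FailureType.ETHICAL:
        return "Assess values: justify continuation explicitly, or halt the process."
    raise ValueError(f"Unclassified failure: {failure}")
```

The point of the sketch is that remediation is selected by diagnosis; no branch applies a generic "try again" response.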

Justification Requirements for Continued Work

Justification requirements recognize that continued iteration, resource expenditure, or generation attempts should be defensible based on a reasonable expectation of improvement and ethical acceptability—not merely automatic repetition hoping for better results. When failures occur, decisions about whether to continue require evaluating: Is improvement achievable through identified interventions? Do expected benefits outweigh costs (time, computational resources, environmental impact)? Are there ethical concerns requiring justification for continued generation? Has iteration reached diminishing returns where further attempts are unlikely to yield substantially better outcomes? Professional practice emphasizes purposeful iteration guided by diagnostic understanding rather than reflexive repetition. The slide's formulation "continued generation requires justification—or the process must halt" establishes that continuation isn't the default—it requires positive justification that further work serves defensible purposes.

Source: Friedman, B., & Hendry, D. G. (2019). Value sensitive design: Shaping technology with moral imagination. MIT Press.
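
Read as a decision procedure, these questions form a conjunctive check in which halting is the default and every condition must hold for continuation. A minimal sketch, assuming invented field names and an arbitrary plateau threshold:

```python
from dataclasses import dataclass, field


@dataclass
class IterationState:
    """Hypothetical inputs to a continue-or-halt decision."""
    improvement_achievable: bool  # is a concrete intervention identified?
    expected_benefit: float       # estimated value of a better output
    expected_cost: float          # time, compute, and environmental cost of continuing
    ethics_justified: bool        # have raised ethical concerns been explicitly justified?
    recent_gains: list[float] = field(default_factory=list)  # quality gains of recent attempts


def should_continue(state: IterationState, plateau: float = 0.01) -> bool:
    """Continuation is not the default: every check must pass."""
    diminishing = (
        len(state.recent_gains) >= 3
        and max(state.recent_gains[-3:]) < plateau  # last three attempts barely improved
    )
    return (
        state.improvement_achievable
        and state.expected_benefit > state.expected_cost
        and state.ethics_justified
        and not diminishing
    )
```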

Why This Matters for Students' Work

Understanding failure classification and appropriate intervention matching fundamentally changes how students respond to unsatisfactory outcomes in creative and technical work—shifting from generic trial-and-error to diagnostic problem-solving.

Students often respond to failures uniformly: when outputs disappoint, they try again with minor variations, hoping different results emerge without understanding why initial attempts failed. This approach proves inefficient and often ineffective because it doesn't address root causes. The slide's three-category classification (ambiguity/design, model limitation, ethical boundary) provides a diagnostic framework enabling targeted responses. Ambiguity failures suggest "the specification was not controlled tightly enough"—remediation requires better constraint articulation, not merely additional generation attempts. Model limitations indicate "structural ceiling we cannot prompt past"—remediation requires different approaches (alternative systems, goal adjustment, acceptance), not iterative refinement hoping the same system suddenly performs differently. Ethical boundaries require values assessment and justification—remediation involves determining whether continued work is appropriate, not just whether it's technically possible.

The concept of design failures stemming from underspecified constraints has profound implications for where students locate responsibility. When work disappoints, students' default attribution often externalizes blame: "the system isn't good enough," "the tool is limited," "this just isn't possible." However, the slide's note emphasizes that design failures trace to specification decisions made earlier—often, failures result from insufficient constraint specification, poor problem framing, or unclear requirements rather than from system inadequacy. This reframes failure as potentially addressable through better upfront work: tightening specifications, articulating constraints explicitly, framing problems more carefully. Rather than passively accepting disappointing outcomes as beyond their control, students can recognize that clearer specification often enables better results.

Understanding structural ceilings prevents wasteful iteration. Students sometimes engage in futile refinement attempts: repeatedly adjusting prompts in an attempt to fix model limitation failures that no amount of specification improvement will resolve. Recognizing when limitations are structural—fundamental capability boundaries of current systems—enables strategic decisions about when to stop iterating with one approach and consider alternatives. Professional practice distinguishes productive iteration (refining based on diagnostic understanding of addressable problems) from unproductive repetition (hoping limitations magically resolve). Time spent attempting to overcome structural ceilings through prompting would be better invested in finding workarounds, using different systems, or adjusting goals to work within capabilities.
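
One hedged heuristic for spotting such a ceiling in practice: when the best achieved quality stops improving across several genuinely varied attempts, further prompting of the same system is probably unproductive repetition. The window size and improvement threshold below are arbitrary illustrations, not established values.

```python
def hit_structural_ceiling(scores: list[float],
                           window: int = 5,
                           epsilon: float = 0.02) -> bool:
    """Return True when the last `window` attempts failed to beat the earlier
    best score by at least `epsilon`, suggesting the limitation is structural
    rather than a matter of better prompting."""
    if len(scores) <= window:
        return False  # not enough history to judge
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before < epsilon
```

A positive result is a cue to pivot (different system, adjusted goals, a workaround), not proof that the goal is impossible.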

The ethical boundaries concept introduces values-based evaluation alongside technical evaluation. Students sometimes evaluate work purely on technical conformance: does it meet specifications? Does it work correctly? However, professional and academic contexts also require asking: Should this be created? Are there ethical concerns with generating or using this output? Does continued iteration consume resources disproportionate to benefits? The slide's formulation that ethical boundaries represent a "stopping point where continued generation requires justification—or the process must halt" establishes that ethics isn't an optional add-on consideration—it's integral to evaluation determining whether work should continue.

Understanding that "evaluation is judgment, not generation" clarifies the human role in AI-assisted workflows. Students sometimes treat evaluation as mechanical success checking: outputs either work or don't, determined automatically. However, the slide emphasizes evaluation as judgment requiring human assessment: classifying failure types, determining appropriate interventions, assessing ethical acceptability, deciding whether to continue. This judgment cannot be automated or delegated to systems—it requires human values, contextual understanding, and responsibility. Developing sound evaluative judgment represents a core learning goal, not merely learning to generate outputs.

For collaborative and professional contexts, failure classification creates shared diagnostic language. Rather than vague statements like "this didn't work" or "try again," teams can specify: "This exhibits an ambiguity failure—constraints need tightening" or "This hits a model limitation ceiling—we need an alternative approach" or "This raises ethical concerns requiring a justification discussion." This precision enables efficient coordination: team members understand what type of intervention is needed rather than all attempting random variations.

How This Shows Up in Practice (Non-Tool-Specific)

Filmmaking and Media Production

Film production systematically classifies failures during dailies review and post-production evaluation. When footage fails to satisfy creative intent, production teams diagnose failure types to determine response.

Design failures in filmmaking often trace to inadequate shot specification. If storyboards showed vague framing like "wide shot of building" without specifying camera height, lens focal length, or compositional emphasis, resulting footage might technically match that specification but fail to achieve intended visual impact. The failure isn't an execution error—the shot was framed as (vaguely) specified. The failure is specification inadequacy: constraints weren't controlled tightly enough. Remediation requires better specification for reshoot: precise camera position, specific lens, detailed composition description.

Technical limitations represent structural ceilings. If lighting setup cannot achieve both deep depth of field and low-light exposure simultaneously with available equipment, no amount of technique refinement overcomes that physical limitation. The constraint isn't prompt-able—it's fundamental to available tools and physics. Response options include: accepting limitations (shoot with shallow depth of field or brighter lighting), using different equipment (higher-sensitivity cameras, more powerful lights), or adjusting creative goals (changing scene timing to allow more light or accepting shallow focus as aesthetic choice).

Ethical boundaries in documentary and journalism require justification for continued access or coverage. If filming causes subject distress, continuing requires explicit justification that documentary value outweighs harm—or recognition that process must halt regardless of footage quality. Ethics review boards, editorial standards, and professional codes establish when continuation requires justification versus when it's prohibited.

Budget and schedule constraints create practical stopping points. When production reaches allocated time or budget without achieving all goals, evaluation determines: Are remaining goals achievable with available resources? Do goals justify additional expenditure? Should scope be reduced to match reality? These decisions require judgment about value versus cost, not merely technical assessment of what's possible.

Design

Interface design projects classify usability test failures diagnostically. When users struggle with interfaces, teams determine failure types to guide redesign.

Design failures often stem from underspecified constraints during initial design. If a design brief vaguely states "make it user-friendly" without specifying target user expertise, task contexts, or success metrics, resulting designs might satisfy that vague specification while failing actual usability. The problem isn't implementation—it's that specification didn't constrain design tightly enough. Remediation requires better requirements: specific user personas, defined use scenarios, measurable usability criteria.

Technical limitations create boundaries for certain interaction patterns. If a design requires gesture recognition precision that current touch sensors cannot achieve, no amount of design refinement overcomes that sensor limitation. Response involves: adjusting interaction design to work within sensor capabilities, using different input methods, or accepting limitations and designing around them.

Ethical boundaries emerge when design patterns prove manipulative or deceptive. Dark patterns might technically succeed at desired business metrics (increasing purchases, reducing cancellations) while ethically failing by exploiting cognitive biases. Ethical evaluation requires asking: Should we deploy this even though it works? Does this respect user autonomy? Continuation requires justifying why business goals outweigh ethical concerns—or recognizing deployment should halt.

Accessibility review creates mandatory stopping points. Designs failing accessibility standards cannot deploy until remediated, regardless of aesthetic success or functional performance for majority users. The evaluation isn't "does it work?" but "does it work for everyone it should serve?"

Writing

Academic writing evaluation classifies revision needs by failure type. When papers fail to satisfy requirements, understanding failure types guides revision.

Design failures in writing often trace to inadequate thesis specification or argument framing. If the writer began with a vague goal like "write about climate change" without specifying argument, audience, or scope, the resulting paper might technically address the topic while lacking coherent direction. The failure isn't execution—it's that initial framing didn't constrain work tightly enough. Remediation requires better specification: precise thesis statement, defined scope, clear argument structure.

Knowledge limitations create structural ceilings. If the argument requires empirical evidence students cannot access (paywalled journals, unavailable archives, proprietary data), no amount of writing refinement overcomes that limitation. Response involves: adjusting argument to use accessible evidence, seeking alternative sources, or reconsidering claim scope.

Ethical boundaries in research writing require justification for certain approaches. Studying vulnerable populations, using deceptive methods, or risking participant harm requires IRB approval and explicit justification—or recognition that research cannot proceed as planned. Ethics review determines whether continuation is defensible, not whether research is technically feasible.

Word count and deadline constraints create practical stopping points. When drafts exceed length limits or deadlines approach without completion, evaluation determines: Can content compress to meet constraints? Should scope be reduced? Does extension justify requesting exception? These involve judgment about priorities and trade-offs.

Computing and Engineering

Software development classifies bug and failure types to determine response strategies. When systems fail testing, teams diagnose causes to guide remediation.

Design failures often stem from inadequate requirements specification. If requirements vaguely stated "system should be fast" without defining performance targets, resulting implementation might satisfy that vague specification while failing user expectations. The problem isn't coding—it's that requirements didn't constrain implementation tightly enough. Remediation requires concrete specifications: specific latency limits, throughput requirements, resource constraints.
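
The difference is testability: a vague requirement like "fast" can never fail a check, while concrete budgets can. A brief sketch; the fields and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceSpec:
    """A concrete, testable replacement for 'the system should be fast'."""
    p95_latency_ms: float    # 95th-percentile response-time budget
    min_throughput_rps: int  # sustained requests per second
    max_memory_mb: int       # peak resident memory


def meets_spec(spec: PerformanceSpec, p95_ms: float, rps: int, mem_mb: int) -> bool:
    """Measured values either satisfy every budget or expose a concrete gap."""
    return (
        p95_ms <= spec.p95_latency_ms
        and rps >= spec.min_throughput_rps
        and mem_mb <= spec.max_memory_mb
    )


# Example: meets_spec(PerformanceSpec(200.0, 500, 1024), p95_ms=180.0, rps=620, mem_mb=900) -> True
```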

Algorithmic limitations create structural ceilings. If the problem is NP-complete and requires a real-time response for large inputs, no clever coding overcomes fundamental computational complexity. Response involves: accepting approximation algorithms with bounded error, constraining input size, relaxing real-time requirements, or determining the problem is infeasible as specified.
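
As a concrete instance, minimum vertex cover is NP-hard, yet a simple greedy routine built on a maximal matching guarantees a cover at most twice the optimal size, trading optimality for speed with a provable bound:

```python
def vertex_cover_2_approx(edges: list[tuple[int, int]]) -> set[int]:
    """Greedy 2-approximation for minimum vertex cover.

    For each edge with both endpoints uncovered, take both endpoints.
    The chosen edges form a maximal matching, and any optimal cover must
    include at least one endpoint of each chosen edge, so the result is
    at most twice the optimal size.
    """
    cover: set[int] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Example: vertex_cover_2_approx([(0, 1), (1, 2), (2, 3)]) -> {0, 1, 2, 3}
```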

Ethical boundaries in AI/ML require justification for deploying certain models. If a model exhibits demographic bias in hiring recommendations, deploying it requires justifying why business benefits outweigh discriminatory impacts—or recognizing deployment should halt until bias is addressed. Fairness audits and ethics reviews determine when systems should not deploy despite technical functionality.
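
One common screening heuristic compares per-group selection rates against the four-fifths rule. A failing check is a signal that deployment needs justification or remediation, not a complete fairness audit; the data shape below is assumed for the sketch.

```python
def selection_rates(outcomes: dict[str, list[bool]]) -> dict[str, float]:
    """Per-group selection rate: the fraction of candidates recommended."""
    return {group: sum(picks) / len(picks) for group, picks in outcomes.items() if picks}


def passes_four_fifths(outcomes: dict[str, list[bool]], threshold: float = 0.8) -> bool:
    """Flag disparate impact when any group's selection rate falls below
    `threshold` times the highest group's rate (the common 80% heuristic)."""
    rates = selection_rates(outcomes)
    if not rates:
        return True  # no data, nothing to compare
    highest = max(rates.values())
    if highest == 0:
        return True  # nobody was selected; no measurable disparity
    return all(rate >= threshold * highest for rate in rates.values())
```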

Performance and security reviews create mandatory stopping points. Code failing security audit cannot deploy to production regardless of feature completeness. Evaluation isn't "does it work for intended use?" but "is it safe to deploy?"

Common Misunderstandings

"All failures are either design failures or system failures—ethical concerns are separate considerations"

This compartmentalization treats ethics as an external constraint applied after technical evaluation rather than as an integral component of evaluation itself. The slide explicitly includes ethical boundaries as a third failure classification category alongside ambiguity and model limitations, establishing that ethical evaluation is not separate from technical evaluation—it's part of comprehensive assessment determining appropriate response. Ethical failures differ from technical failures: outputs might perfectly satisfy specifications and operate within system capabilities while still failing ethically (creating harmful content, violating consent, appropriating without permission, consuming disproportionate resources). Professional practice integrates ethical evaluation throughout development and deployment, not as an afterthought following technical validation. Academic and professional contexts increasingly recognize that "it works technically" doesn't imply "it should be deployed"—ethical acceptability constitutes an independent evaluation dimension.

"If specification was adequate, design failures wouldn't occur—all failures with clear specifications are system failures"

This oversimplification ignores that specification adequacy is relative to context and that even seemingly clear specifications can prove insufficient. The slide's "ambiguity" category acknowledges that specifications can be insufficiently controlled without being completely absent. A specification might articulate some constraints clearly while leaving others implicit, might specify requirements that prove contradictory in practice, or might fail to anticipate edge cases. Design failures include not just missing specifications but also poorly framed problems, contradictory constraints, specifications at the wrong level of granularity, or requirements that don't address actual needs. Research by Schön and Norman establishes that design failures frequently stem from inadequate problem framing rather than from absent specifications—the problem was conceptualized incorrectly from the start. Moreover, iterative work often reveals specification inadequacy only through attempted implementation: what seemed adequately specified proves insufficient when implementation exposes unstated assumptions or unrecognized requirements.

"Once model limitation ceiling is identified, no further work is possible—must accept failure or abandon goals"

This defeatist interpretation treats model limitations as absolute barriers rather than as constraints requiring strategic adaptation. The slide states model limitations represent a "structural ceiling we cannot prompt past"—not that no alternative approaches exist, but that continued prompting of the same system won't overcome the limitation. Responses to model limitations include: using different systems with different capability profiles, combining multiple systems and using each for the aspects matching its strengths, decomposing problems into components, some of which are solvable within limitations, adjusting goals to work within capabilities, accepting limitations for current work while planning future improvements when capabilities advance, or finding workarounds that achieve similar ends through different means. Professional practice treats limitations as design constraints requiring creative adaptation rather than as binary blockers forcing abandonment. The key insight is recognizing when a particular approach has reached the ceiling, enabling a strategic pivot to alternatives rather than wasteful continued iteration with an inadequate approach.

"Ethical boundaries only apply to obviously harmful content like violence or explicit material"

This narrow view dramatically underestimates the scope of ethical considerations in creative and technical work. Ethical boundaries encompass far more than content restrictions: resource consumption (environmental and computational costs of continued iteration), consent and representation (using likenesses without permission, appropriating cultural elements), fairness and bias (perpetuating stereotypes, excluding marginalized groups), transparency and deception (misrepresenting AI-generated content as human-created), labor impacts (displacing creative workers), and dignity (degrading or objectifying representations). Professional codes, institutional review boards, and organizational policies establish ethical guidelines addressing these dimensions. The slide's formulation that ethical boundaries require "justification" emphasizes that many contexts fall into the judgment zone—not clearly prohibited but requiring explicit reasoning about whether continuation serves defensible purposes. Students working in academic contexts particularly need to understand that ethical evaluation considers research ethics, informed consent, potential harms, and appropriate use beyond merely avoiding obviously offensive content.

Scholarly Foundations

Schön, D. A. (1983). The reflective practitioner: How professionals think in action. Basic Books.

Foundational work on reflective practice examining how professionals frame problems, make decisions, and learn from experience. Emphasizes that most professional failures stem from inadequate problem framing rather than from execution errors—the problem was conceptualized incorrectly, requirements were poorly specified, or constraints were insufficiently articulated. Directly supports the slide's emphasis that design failures trace to underspecified constraints and poor framing, not to system inadequacy. Essential for understanding why diagnosis matters: correctly identifying whether failures are design or execution determines appropriate remediation.

Norman, D. A. (2013). The design of everyday things (Revised and expanded edition). Basic Books.

Classic text on design principles emphasizing that user errors typically reflect design failures rather than user inadequacy—designers failed to constrain interactions appropriately, provide clear affordances, or anticipate failure modes. Introduces concepts of constraints (physical, semantic, cultural, logical) as design tools preventing errors and guiding behavior. Relevant for understanding that constraints enable good outcomes by limiting bad possibilities—"underspecified constraints" represents a design failure. Establishes a framework for classifying failures and determining whether problems require design changes versus user accommodation.

Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220-229).

Proposes model cards as standardized documentation of machine learning model capabilities and limitations, enabling users to understand what models can and cannot do reliably. Emphasizes the importance of clearly documenting performance boundaries, known failure modes, and appropriate use cases. Relevant for understanding model limitations as structural ceilings requiring documentation and communication rather than as failures to overcome through better prompting. Establishes that knowing system boundaries enables appropriate use rather than attempting impossible tasks.

Floridi, L., & Cowls, J. (2019). A unified framework of five principles for AI in society. Harvard Data Science Review, 1(1).

Proposes five ethical principles for AI: beneficence (promoting wellbeing), non-maleficence (avoiding harm), autonomy (preserving human decision-making), justice (ensuring fairness), and explicability (enabling understanding and accountability). Argues that AI ethics requires applying these principles throughout design, development, and deployment rather than as post-hoc constraints. Relevant for understanding ethical boundaries as integral evaluation dimension requiring justification for decisions, not external checklist. Establishes framework for evaluating whether AI use serves defensible purposes.

Friedman, B., & Hendry, D. G. (2019). Value sensitive design: Shaping technology with moral imagination. MIT Press.

Comprehensive treatment of value-sensitive design methodology integrating ethical values throughout design processes. Emphasizes that technology embeds values through design choices and that responsible design requires explicitly considering whose values are served, what harms might result, and whether outcomes align with intended purposes. Discusses when design processes should continue versus when they should halt based on values assessment. Directly supports the slide's ethical boundaries concept: evaluation requires judging whether continuation serves defensible purposes, not merely whether technical progress is possible.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).

Critical analysis of large language model limitations, costs, and risks. Discusses how model scale doesn't overcome fundamental capability gaps, how training data biases persist regardless of size, and how deployment decisions require weighing benefits against environmental costs, bias propagation, and other harms. Relevant for understanding model limitations as structural (not overcome through scale or prompting) and ethical boundaries (continuation requires justifying costs and risks). Establishes that evaluation must consider whether use is appropriate, not just whether it's technically possible.

Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 33-44).

Framework for algorithmic auditing emphasizing that failures typically trace to design decisions rather than execution errors. Proposes systematic auditing throughout the development lifecycle to identify where design choices embed values, create risks, or fail to address stakeholder needs. Supports the slide's note that failures often stem from underspecified design choices. Establishes that accountability requires examining design decisions and framing choices, not merely evaluating outputs.

Wachter, S., Mittelstadt, B., & Floridi, L. (2017). Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation. International Data Privacy Law, 7(2), 76-99.

Analysis of explainability requirements in automated decision systems, arguing that accountability requires not just explaining outputs but justifying whether automated decision-making is appropriate for context. Discusses when systems should not deploy regardless of technical accuracy. Relevant for understanding that evaluation includes determining whether use is justified, not merely whether outputs are correct. Establishes legal and ethical frameworks requiring justification for deployment decisions.

Boundaries of the Claim

The slide proposes three failure classification categories (ambiguity/design, model limitation, ethical boundary) as framework for determining appropriate interventions. This does not claim these are the only possible failure types, that all failures fit neatly into one category, or that classification always provides clear remediation paths.

The characterization of ambiguity failures as design failures resulting from insufficiently controlled specifications describes a common pattern but doesn't claim all ambiguity stems from specification inadequacy. Some ambiguity may be inherent in the problem domain, some may reflect deliberately flexible specifications allowing creative interpretation, and some may result from legitimate uncertainty about requirements that specification cannot resolve in advance.

The model limitation category describes structural ceilings "we cannot prompt past"—capability boundaries of current systems that specification refinement won't overcome. This doesn't claim limitations are permanent, that no alternative approaches exist, or that all apparent limitations are truly fundamental. Some perceived limitations may actually reflect inadequate specification or technique; distinguishing genuine structural ceilings from addressable challenges requires expertise and experimentation.

The ethical boundary category establishes that "continued generation requires justification—or the process must halt." This doesn't specify what justifications are adequate, who determines justification sufficiency, or precisely when continuation becomes ethically impermissible versus merely requiring explicit reasoning. Ethical evaluation involves contextual judgment, stakeholder consideration, and values assessment—not mechanical rule application. Different contexts, institutions, and ethical frameworks may reach different conclusions about when continuation is justified.

The note stating "design failures often stem from underspecified or poorly framed constraints" represents empirical observation from design research but doesn't claim zero failures result from other causes or that all underspecification produces failure. Well-designed systems may tolerate some specification ambiguity; whether underspecification causes failure depends on context, system robustness, and how critical precise specification is for success.

The framework doesn't specify decision procedures for: determining which category a particular failure belongs to, adjudicating borderline cases, prioritizing when failures span multiple categories, or resolving disagreements about classification. These require judgment informed by expertise, context, and stakeholder perspectives.

Reflection / Reasoning Check

1. Think about a project or assignment where you encountered failures and needed to decide whether to continue iterating or try a different approach. Try to classify the failures you experienced using the three categories from this slide: Were some failures due to ambiguity or underspecification in your initial planning (design failures)? Were some due to fundamental limitations of available tools, knowledge, or resources (model/structural limitations)? Were there some situations where you needed to decide whether continuing was appropriate or worthwhile (ethical/resource boundaries)? For each failure type, what would have been the appropriate intervention? Looking back, did you actually apply appropriate interventions, or did you treat all failures the same way (for example, just trying again with minor variations)? What would have changed if you had diagnosed failure types first before deciding how to respond?

This question tests understanding that different failure types require different remediation strategies and that diagnostic classification precedes appropriate response. An effective response would identify specific failures from actual experience, attempt to categorize them using the framework, articulate appropriate interventions for each type (design failures → better specification; limitation failures → alternative approaches or adjusted goals; ethical boundaries → justification assessment or stopping), and recognize whether initial responses matched appropriate interventions or whether all failures were treated uniformly. The response should demonstrate understanding that failure classification is actionable—it guides strategic decisions about what to do next rather than being an abstract categorization exercise. Students should recognize that treating all failures identically (always iterating with the same approach, or always giving up, or always seeking different tools) proves less effective than diagnostic matching of interventions to failure types.

2. The slide states that ethical boundaries represent "stopping point where continued generation requires justification—or the process must halt" and emphasizes that "evaluation is judgment, not generation." Think about a context where you've worked on something—creative work, research, coding, analysis—and consider: What would make continuing the work require justification beyond "I haven't achieved my goal yet"? What kinds of costs, risks, or concerns might create ethical stopping points even when technical progress remains possible? Who should make judgments about whether continuation is justified, and what considerations should inform those judgments? How does this differ from purely technical evaluation asking "does this work correctly?" What does it mean that evaluation involves judgment rather than just verification? Can evaluation be reduced to checklists and rubrics, or does responsible evaluation require human judgment that cannot be automated?

This question tests understanding that evaluation encompasses ethical and values-based assessment beyond technical conformance checking, and that some decisions require human judgment that cannot be reduced to mechanical verification. An effective response would recognize that continuation justification involves considering: resource consumption (time, computational cost, environmental impact) relative to benefits, risks of harmful outputs, opportunity costs of continuing versus pursuing alternatives, whether iteration has reached diminishing returns, and whether goals remain appropriate. The response should articulate that judgment involves contextual assessment, values consideration, and weighing trade-offs—activities requiring human deliberation rather than automated checking. Students should understand that "evaluation is judgment" means humans must assess appropriateness, defensibility, and alignment with values, not merely verify technical correctness. This demonstrates understanding that professional practice requires exercising judgment about when to continue, when to stop, and when to reconsider goals—judgment that education should develop rather than seeking to eliminate through purely objective metrics.
