Specification – Vague Prompt
Slide Idea
This slide presents a deliberately underspecified prompt—"A dog running happily in the city"—as an example of deferring creative decisions to a generative system rather than articulating them explicitly in the specification. The vague prompt leaves substantial interpretive work to the system, which must fill in details about composition, style, lighting, perspective, and context that the prompt never states.
Key Concepts & Definitions
Prompt Underspecification
Prompt underspecification occurs when natural language instructions to generative systems fail to explicitly articulate many requirements, constraints, or desired characteristics that matter for output quality or appropriateness. Research demonstrates that LLM prompts frequently omit user-important requirements—sometimes unintentionally because users do not recognize their implicit assumptions, sometimes deliberately because users expect systems to infer reasonable defaults. While underspecified prompts sometimes produce acceptable results when systems guess correctly at unstated requirements (approximately 41% success rate for requirement inference), this behavior proves fragile: underspecified prompts are twice as likely to regress when models update or prompts change, sometimes with performance drops exceeding 20%. The problem is amplified with generative AI because natural language interfaces create an illusion of understanding—users write prompts as if communicating with humans who share contextual knowledge, but systems lack this shared context.
Source: Yang, C., Shi, Y., Ma, Q., Liu, M. X., Kästner, C., & Wu, T. (2025). Understanding and managing underspecification in LLM prompts.
Creative Decision Deferral
Creative decision deferral describes the practice of leaving creative or aesthetic choices unspecified in instructions to systems, effectively delegating those decisions to the system's default behaviors, training biases, or randomization. When users provide vague prompts to text-to-image models, they defer decisions about composition, color palette, lighting, perspective, level of detail, artistic style, and countless other variables that determine visual outcomes. This deferral can be intentional (users wanting to see what the system produces without constraints) or unintentional (users not recognizing that specifications are needed). The critical issue is that deferred decisions still get made—by the system rather than the user—and these system-made decisions may not align with user intent, project requirements, or quality standards. Users retain ultimate responsibility for outputs even when they have delegated decision-making to systems.
Source: Liu, V., & Chilton, L. B. (2022). Design guidelines for prompt engineering text-to-image generative models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems.
Prompt Engineering
Prompt engineering is the practice of designing, refining, and optimizing natural language instructions to generative AI systems to achieve desired outputs consistently and reliably. Effective prompt engineering requires understanding how systems interpret different phrasings, what specificity levels produce optimal results, how to structure prompts to guide system behavior, and how to anticipate common failure modes. Research on text-to-image generation identifies that successful prompts typically specify both subject (what should appear) and style (how it should look), use concrete rather than abstract descriptors, and include sufficient detail to reduce ambiguity without over-constraining the system. However, prompt engineering proves challenging for non-experts: users often model systems as human-like conversational partners rather than as pattern-matching algorithms, leading to prompts that assume shared context, common sense, or interpretive flexibility that systems do not possess.
Source: Zamfirescu-Pereira, J. D., et al. (2023). Why Johnny can't prompt: How non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1-21).
Ambiguity vs. Vagueness
In linguistics and philosophy of language, ambiguity refers to expressions with multiple discrete possible meanings (e.g., "bank" meaning financial institution or riverbank), while vagueness refers to expressions with imprecise or fuzzy boundaries (e.g., "tall" without specifying a threshold). In prompt contexts, both create challenges but in different ways. Ambiguous prompts may cause systems to select interpretations different from user intent ("a bat" could mean animal or sports equipment). Vague prompts like "happily running" leave quantitative or qualitative details unspecified—how energetic is "happily"? What running speed? What expression? Systems must fill these gaps through default behaviors or probabilistic sampling. The example prompt exhibits vagueness more than ambiguity: "dog," "running," "happily," and "city" have relatively clear referents but lack specification about countless implementable details.
Source: Williamson, T. (1994). Vagueness. Routledge.
Default System Behaviors
Default system behaviors are the choices generative AI systems make when prompts do not explicitly specify requirements—determined by training data distributions, model architecture, sampling parameters, and built-in biases. When a text-to-image model receives "a dog in the city" without style specification, it does not abstain from making style choices; instead, it samples from its learned distribution of dog-in-city imagery, weighted toward whatever photographic or artistic styles were most common in training data. These defaults are not neutral—they reflect patterns in training corpora that may exhibit cultural biases, aesthetic conventions, or representation gaps. Users who provide underspecified prompts receive outputs shaped by these defaults whether they intend to or not. Understanding default behaviors requires either extensive experimentation with systems or consultation of documentation about training data and design decisions—knowledge non-expert users typically lack.
Source: Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185-5198).
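The frequency-driven nature of these defaults can be made concrete with a toy sketch. The style labels and weights below are invented for illustration and describe no particular model; the point is only that an unspecified choice is resolved by sampling whatever was most common in training data, not withheld.

```python
import random

# Toy illustration, not a real model: a "default" behaves like a draw from a
# frequency-weighted distribution learned from training data. The style names
# and weights here are invented for the example.
style_distribution = {
    "smartphone-style snapshot": 0.55,
    "stock-photography look": 0.25,
    "digital illustration": 0.12,
    "watercolor painting": 0.05,
    "film noir photograph": 0.03,
}

def sample_default_style(rng: random.Random) -> str:
    """Resolve an unspecified style the way an unconstrained system might: by frequency."""
    styles = list(style_distribution)
    weights = list(style_distribution.values())
    return rng.choices(styles, weights=weights, k=1)[0]

rng = random.Random(0)
for run in range(5):
    print(f"run {run}: 'a dog in the city' rendered as a {sample_default_style(rng)}")
```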
Why This Matters for Students' Work
The problem of vague prompts and deferred creative decisions has immediate implications for how students work with generative AI systems and, more broadly, for how they conceptualize creative and technical specification.
When students provide underspecified prompts to generative systems, they often view this as efficient—getting started quickly without extensive planning. However, vague prompts create multiple interconnected problems. First, they produce inconsistent results: the same vague prompt generates different outputs across runs due to sampling randomness, making it difficult to iterate systematically. A student seeking to refine an image based on a vague initial prompt must either accept whatever the system produces in subsequent generations or spend extensive effort trying to reverse-engineer which prompt specifications would reproduce desirable aspects while changing undesirable ones.
Second, vague prompts defer evaluation responsibility without eliminating it. Students must still judge whether outputs are suitable, but they lack explicit criteria for evaluation because they have not articulated what they wanted. This creates a problematic feedback loop: outputs do not match unstated expectations, but students cannot clearly identify what is wrong beyond vague dissatisfaction ("it's not quite right"). Without explicit success criteria, iteration becomes trial-and-error hoping something eventually "feels right" rather than systematic refinement toward defined goals.
Third, underspecified prompts obscure authorship and creative agency. When students submit work generated from vague prompts, determining their creative contribution becomes difficult. Did they make intentional choices about composition, style, tone, and content? Or did they delegate those choices to system defaults? The research on prompt underspecification demonstrates that systems make decisions about unspecified requirements whether users intend to delegate those decisions or not—the system's training data biases and default behaviors shape outputs in ways users may not recognize or control.
The concept of creative decision deferral challenges students to recognize that not making explicit decisions does not mean decisions are not being made—it means those decisions get made by default system behaviors rather than by deliberate user choices. This has particular significance for educational contexts that value demonstrating reasoning and intentionality. When students cannot articulate what they asked for, they cannot explain why they got what they got, and they cannot justify outputs as aligned with project goals.
For collaborative work, vague specifications create coordination problems. When team members need to generate related content, underspecified prompts produce inconsistent results—one person's "dog running happily in the city" may yield photorealistic images while another's yields cartoon-style images because they did not specify style consistently. Professional creative workflows require stylistic coherence that vague prompts cannot reliably produce.
The challenge of prompt engineering for non-experts reveals that effective system use requires domain knowledge that students may initially lack. Research shows that people unfamiliar with AI systems model them as human conversational partners, assuming shared context and common-sense inference that systems do not possess. Students write prompts as if addressing humans who can fill in unstated details reasonably—but systems fill in details based on statistical patterns in training data, not on human-like inference about what "makes sense" in context.
Understanding ambiguity versus vagueness helps students recognize different types of specification failures. Ambiguous prompts create discrete interpretation choices (which of several distinct meanings?), while vague prompts create continuous gaps requiring quantitative or qualitative details. Both degrade control, but in different ways requiring different remediation strategies.
How This Shows Up in Practice (Non-Tool-Specific)
Text-to-Image Generation
The slide's example—"A dog running happily in the city"—illustrates typical underspecification in text-to-image prompting. This prompt specifies subject matter (dog, urban setting, activity) but leaves vast decision space unspecified: What breed of dog? What size? What coat color and texture? Running how fast, in what gait? What city architecture style—modern glass towers, historic brownstones, narrow European streets, wide American boulevards? What time of day and lighting conditions? What weather? What camera angle—eye level, low angle looking up, high angle looking down, aerial? What depth of field—sharp focus throughout or selective focus on the dog? What artistic style—photorealistic, illustrative, painterly, graphic? What color palette and mood?
A system receiving this prompt must make decisions about all these unspecified variables. It does so by sampling from learned distributions in its training data. The results may align with user intent by chance, or they may diverge in ways users find unsuitable. Crucially, users who cannot articulate what they wanted beyond "dog running happily in the city" struggle to refine outputs systematically—they cannot specify what to change because they never specified what they wanted in the first place.
Experienced practitioners using text-to-image models typically develop detailed prompting vocabularies specifying: subject with specific attributes, composition and framing, lighting and atmosphere, artistic medium and style, mood and emotion, technical details (camera settings if photographic, materials if illustration), and quality modifiers. This specificity does not eliminate all uncertainty—stochastic systems still vary outputs—but it dramatically constrains the possibility space toward user intent.
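One way practitioners keep this level of detail manageable is to treat the prompt as a structured specification rather than a single sentence. The sketch below is a minimal, hypothetical version of that practice; the field names and the rendered prompt format are assumptions for illustration, not the interface of any particular text-to-image system.

```python
from dataclasses import dataclass, field

@dataclass
class ImagePromptSpec:
    """A hypothetical structured prompt: each field is a decision made explicitly."""
    subject: str
    action: str
    setting: str
    composition: str
    lighting: str
    style: str
    mood: str
    quality_modifiers: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Join the explicit decisions into one comma-separated prompt string.
        parts = [
            f"{self.subject} {self.action} in {self.setting}",
            self.composition,
            self.lighting,
            self.style,
            self.mood,
            *self.quality_modifiers,
        ]
        return ", ".join(p for p in parts if p)

spec = ImagePromptSpec(
    subject="a golden retriever",
    action="running at full stride, tongue out",
    setting="a narrow European old-town street",
    composition="low-angle shot, shallow depth of field focused on the dog",
    lighting="late-afternoon golden hour, soft shadows",
    style="photorealistic, 35mm photograph",
    mood="joyful, energetic",
    quality_modifiers=["sharp focus", "natural color grading"],
)
print(spec.render())
```

The value is not the dataclass itself but that every dimension the vague prompt left open becomes a named field that can be held constant or varied deliberately on the next iteration.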
Interface and Interaction Design
Designers specifying interfaces encounter analogous underspecification problems. A vague design brief like "create a friendly onboarding experience" leaves critical decisions unspecified: What does "friendly" mean—casual language? Humor? Reassurance? Minimal friction? What onboarding steps are necessary? What information must users provide? How much explanation versus learning-by-doing? What visual style conveys friendliness for the target audience? What accessibility requirements apply?
Design teams working from underspecified briefs either make arbitrary decisions (potentially misaligned with stakeholder expectations) or repeatedly seek clarification (slowing progress). Well-specified design briefs articulate user goals, required functionality, constraints, success metrics, target audience characteristics, brand guidelines, and technical requirements. This specification does not dictate implementation details—designers retain creative freedom about how to achieve specified goals—but it clarifies what success looks like.
Writing and Content Generation
Writers using AI writing assistants for content generation face similar specification challenges. A prompt like "write about climate change" dramatically underspecifies: What genre—news article, opinion piece, technical report, educational explainer? What audience—general public, policymakers, students, scientists? What specific aspect of climate change—causes, effects, solutions, politics, economics, science? What length? What tone—alarming, balanced, optimistic, technical? What perspective—global, regional, sectoral?
Without specification, generated content may address topics the writer did not intend, adopt inappropriate tones, include irrelevant information, or omit essential points. Writers who have not clarified their own intent before prompting struggle to evaluate whether outputs serve their purposes. They may accept mediocre content because they have not defined standards for quality, or they may reject adequate content because it does not match unstated expectations.
Professional writing assignments typically include detailed specifications: target word count, required sources, specific questions to address, audience description, publication context, and style guidelines. These specifications enable writers to work efficiently toward defined goals rather than exploring unlimited possibility spaces hoping to discover what they want.
Software Development and System Design
Developers using code generation tools with vague prompts encounter predictable problems. A prompt like "create a login function" underspecifies: What authentication method—username/password, social login, multi-factor? What security requirements—encryption, token-based, session-based? What error handling—invalid credentials, account locked, network errors? What user experience—remember me, password reset, account creation? What integration points—database schema, authentication service, session management?
Generated code from vague prompts may implement approaches developers did not intend, use patterns inconsistent with existing codebase architecture, omit error handling, or make security choices inappropriate for the application context. Developers who have not specified requirements clearly must either accept unsuitable implementations or invest significant effort reverse-engineering generated code to understand what it does, then modifying it to meet actual (but previously unstated) requirements.
Professional development practices emphasize requirements specification before implementation: functional requirements defining what systems must do, non-functional requirements defining performance and quality attributes, interface specifications defining integration points, and acceptance criteria defining how to verify correct implementation. This specification enables developers to evaluate whether implementations meet requirements rather than guessing what "good enough" means.
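As an illustration, the sketch below resolves a few of the unstated decisions behind "create a login function" explicitly. It is a simplified, hypothetical resolution (username/password with pre-hashed credentials, a lockout threshold, and distinct error outcomes), not a recommended production design; real systems would rely on a vetted password-hashing library and proper session management.

```python
import hashlib
import hmac
from enum import Enum, auto

class LoginResult(Enum):
    SUCCESS = auto()
    INVALID_CREDENTIALS = auto()   # decision: do not reveal whether user or password was wrong
    ACCOUNT_LOCKED = auto()        # decision: lock out after repeated failures

MAX_FAILED_ATTEMPTS = 5            # decision: lockout threshold

def _hash_password(password: str, salt: bytes) -> bytes:
    # Simplified for illustration; production code should use a dedicated library
    # such as bcrypt or argon2.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def login(username: str, password: str, user_store: dict) -> LoginResult:
    """Each branch below is a requirement the vague prompt never stated."""
    record = user_store.get(username)
    if record is None:
        # decision: unknown user and wrong password return the same result
        return LoginResult.INVALID_CREDENTIALS
    if record["failed_attempts"] >= MAX_FAILED_ATTEMPTS:
        return LoginResult.ACCOUNT_LOCKED
    candidate = _hash_password(password, record["salt"])
    if hmac.compare_digest(candidate, record["password_hash"]):
        record["failed_attempts"] = 0  # decision: reset the counter on success
        return LoginResult.SUCCESS
    record["failed_attempts"] += 1
    return LoginResult.INVALID_CREDENTIALS
```

Acceptance criteria such as "five failed attempts lock the account" map directly onto testable branches of this function, which is exactly the traceability a vague prompt cannot provide.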
Common Misunderstandings
"Vague prompts enable creative exploration by not constraining the system"
This misconception treats specificity as limitation and vagueness as creative freedom. However, vague prompts do not grant systems creative agency—they merely delegate decision-making to default behaviors and training data biases. A system receiving "dog running happily in the city" is not "freely creating" in a meaningful sense; it is sampling from probability distributions learned from training images, weighted toward common patterns. The apparent variety in outputs reflects sampling randomness, not intentional creative exploration. Moreover, truly exploratory workflows benefit from being able to systematically vary specific parameters (trying different styles while holding composition constant, for example), which requires enough specification to establish what is being held constant and what is being varied. Complete vagueness prevents systematic exploration because nothing is anchored.
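The difference between anchored and unanchored exploration can be sketched in a few lines. The generate_image function below is a stand-in, assumed for illustration only; real tools use different names and parameters. What matters is that a fixed seed and a fixed base description hold the composition constant while only the style dimension varies.

```python
# Stand-in for a real text-to-image call, assumed for illustration; here it only
# reports what it would request.
def generate_image(prompt: str, seed: int) -> None:
    print(f"[seed={seed}] {prompt}")

# Anchor the subject and composition once, then vary a single dimension (style).
base = "a golden retriever running at full stride down a narrow old-town street, low-angle shot"
styles = [
    "photorealistic 35mm photograph",
    "flat vector illustration",
    "loose watercolor sketch",
]

for style in styles:
    # Same subject, composition, and seed on every call: only style moves.
    generate_image(prompt=f"{base}, {style}", seed=1234)
```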
"Making prompts more specific always improves outputs"
This oversimplification assumes that additional specification universally enhances results. Research demonstrates a more nuanced reality: specifications help when they articulate genuinely important requirements, but excessive or misaligned specifications can degrade outputs. Systems have limited instruction-following capacity—prompts specifying too many requirements simultaneously can cause the system to ignore some or to satisfy requirements superficially. Additionally, overly specific prompts may encode incorrect assumptions or request impossible combinations of features, causing systems either to fail or to produce outputs that technically match the prompt but miss user intent. Effective prompting requires strategic specification—articulating requirements that meaningfully constrain toward user goals while avoiding over-specification that introduces conflicts or exceeds system capabilities.
"Systems should be smart enough to infer what I mean from vague prompts"
This expectation treats systems as human-like conversational partners capable of contextual inference, common-sense reasoning, and theory of mind about user intent. Current generative AI systems, despite sophisticated capabilities, do not "understand" user goals in human-like ways—they pattern-match against training data and optimize for statistical likelihood. The research on "Why Johnny Can't Prompt" demonstrates that non-experts systematically overestimate system inference capabilities, assuming systems will fill gaps in prompts with sensible defaults when systems actually fill gaps with whatever patterns were most frequent in training data. Users' inability to articulate specifications does not obligate systems to divine unstated intent correctly—it results in systems making default choices that may diverge from what users wanted but did not say.
"Vague prompts save time because I don't have to think through details upfront"
This apparent efficiency proves a false economy. While vague prompts require less initial time than detailed specifications, they typically produce outputs requiring extensive revision or regeneration, consuming more total time than would have been saved. The underspecification research demonstrates that vague prompts create fragile outputs prone to regression when iterated—small prompt changes produce unexpectedly large output changes because many variables were never anchored. Additionally, students working from vague prompts often cannot efficiently revise outputs toward their goals because they have not articulated what those goals are. They engage in unfocused trial-and-error rather than systematic refinement. Time invested in upfront specification enables more direct paths to satisfactory outcomes by reducing wasted iterations exploring directions ultimately deemed unsuitable.
Scholarly Foundations
Yang, C., Shi, Y., Ma, Q., Liu, M. X., Kästner, C., & Wu, T. (2025). Understanding and managing underspecification in LLM prompts.
Comprehensive analysis of prompt underspecification demonstrating that while LLMs can sometimes infer unspecified requirements (41% success rate for requirement inference), such behavior is fragile—underspecified prompts are twice as likely to regress with model changes, sometimes with 20%+ accuracy drops. Shows that simply adding more requirements does not reliably help due to limited instruction-following capacity. Essential empirical grounding for understanding costs of vague prompts and why specification matters for reliable system use.
Liu, V., & Chilton, L. B. (2022). Design guidelines for prompt engineering text-to-image generative models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems.
Study examining what prompt structures and keywords produce coherent outputs from text-to-image models through evaluation of 5,493 generations across 51 subjects and 51 styles. Identifies that successful prompts specify both subject and style, and provides concrete guidelines for constructing effective prompts. Directly relevant to understanding what makes prompts like the slide example underspecified and what specificity would improve outcomes.
Zamfirescu-Pereira, J. D., Wong, R. Y., Hartmann, B., & Agrawala, M. (2023). Why Johnny can't prompt: How non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1-21).
Empirical research studying how non-experts approach prompt design, finding that users systematically model AI systems as human conversational partners, assume shared context and common-sense reasoning systems do not possess, and struggle to debug prompt failures because their mental models do not match system behavior. Essential for understanding why students write vague prompts and why prompt engineering proves challenging for non-experts.
Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185-5198).
Theoretical analysis arguing that language models trained on form (text patterns) without grounded meaning do not truly "understand" language in human-like ways—they learn correlations in linguistic form but lack connection to communicative intent. Relevant for understanding why systems do not reliably infer user intent from underspecified prompts: they lack the grounded semantic understanding that would enable human-like inference about what unstated details "make sense."
Hao, Y., et al. (2023). Optimizing prompts for text-to-image generation. In Advances in Neural Information Processing Systems (Vol. 36).
Research proposing automated prompt optimization that adapts user input to model-preferred prompts through reinforcement learning. Demonstrates that user-friendly prompts (simple, vague) and model-preferred prompts (detailed, specific) are often misaligned, requiring adaptation. Shows that well-engineered prompts significantly outperform naive user inputs. Relevant for understanding the gap between how people naturally describe what they want and how systems need prompts structured.
Williamson, T. (1994). Vagueness. Routledge.
Philosophical treatment of vagueness in language, analyzing how vague predicates create borderline cases where truth is indeterminate. While focused on semantic theory rather than AI systems, provides conceptual foundation for understanding how vague prompts create underspecified requirements—not through ambiguity (multiple discrete interpretations) but through imprecise boundaries leaving quantitative or qualitative details undefined.
Oppenlaender, J. (2022). The creativity of text-to-image generation. In Proceedings of the 25th International Academic Mindtrek Conference (pp. 192-202).
Analysis of creative practices in text-to-image generation examining prompt engineering strategies and exploring the tension between control (detailed specification) and serendipity (underspecification allowing unexpected results). Discusses how practitioners balance specificity and openness depending on creative goals. Provides a nuanced view that vague prompts are not uniformly problematic—they serve some creative exploration purposes while undermining others requiring consistent control.
Steyvers, M., & Kumar, A. (2024). Three challenges for AI-assisted decision-making. Perspectives on Psychological Science.
Analysis of challenges in human-AI collaboration including calibration (knowing when to rely on AI), complementarity (combining human and AI strengths), and specification (articulating requirements to systems). Discusses how underspecified prompts create delegation without proper oversight. Relevant for understanding broader context: vague prompts exemplify specification challenge that undermines effective human-AI collaboration across domains.
Boundaries of the Claim
The slide presents "A dog running happily in the city" as an example of a vague prompt that defers creative decisions to the system. This does not claim that all brief prompts are necessarily inadequate or that optimal prompts must be extremely long. Effective prompts balance conciseness with sufficient specification—brevity itself is not problematic, but omitting requirements that matter for output quality is. Some contexts may benefit from deliberately open-ended prompts enabling exploratory generation, while others require precise specification for consistent results.
The characterization of this prompt as "vague" and involving "most creative decisions deferred to the system" describes it relative to the detailed multi-dimensional specification shown previously. Different contexts may require different specification levels. For rapid exploratory generation where users want to see varied possibilities, vague prompts serve legitimate purposes. For production work requiring consistency, control, and intentionality, such prompts prove inadequate.
The cited research on prompt underspecification primarily addresses language models and text-to-image models as of 2023-2025. System capabilities, training approaches, and user interface designs continue evolving—future systems might better handle underspecified prompts through improved inference, interactive clarification dialogues, or learned user preferences. The fundamental challenge that vague prompts delegate decisions remains, but how problematic that delegation proves may change as systems improve.
The framework does not specify optimal specification levels, what constitutes "sufficient" detail, or how to determine when prompts are adequately specific versus over-constrained. These are judgment calls requiring consideration of project goals, system capabilities, acceptable variation in outputs, and whether iteration is possible. The principle that underspecification has costs does not uniquely determine correct specification strategies.
The claim about creative decision deferral emphasizes that decisions happen whether users specify them or not—systems make default choices rather than abstaining. This does not imply users always want complete control over every detail or that all system defaults are problematic. Professional practice often intentionally leverages good defaults while specifying only what truly matters. The issue is unintentional deferral where users have not considered what they are delegating.
Reflection / Reasoning Check
1. Consider the example prompt "A dog running happily in the city." Make a list of at least ten specific decisions that would need to be made to create an actual image but that this prompt does not specify—think across multiple dimensions like subject characteristics, composition, lighting, style, and context. For each unspecified decision, consider: What choice might the system make by default? If generated images were reviewed and certain default choices found unsuitable, could the prompt be easily revised to change specifically those choices while keeping other aspects constant? What does this exercise reveal about the relationship between specification detail and revision control?
This question tests whether students can recognize the vast decision space left unspecified by seemingly straightforward prompts. An effective response would identify diverse unspecified dimensions (dog breed/size/color, running speed/gait, city architecture style/time period, camera angle/distance, lighting/weather, artistic style/medium, color palette/mood, depth of field, background population/activity, foreground/background emphasis) and articulate that systems make choices about all these dimensions through defaults or sampling. The response should recognize that without having specified these dimensions initially, selectively revising them proves difficult—changing one aspect may inadvertently change others because they were not independently controlled. This demonstrates understanding that specificity enables control: detailed prompts create stable reference points allowing systematic variation, while vague prompts make all dimensions simultaneously variable and interdependent.
2. Imagine two students working on the same project using generative systems. Student A writes detailed prompts specifying subject, composition, lighting, style, mood, and technical parameters. Student B writes brief vague prompts and regenerates multiple times, selecting outputs that "feel right." Both eventually produce work they are satisfied with. When the instructor asks them to explain their creative decisions and justify their choices, what differences would be expected in their ability to respond? What does this suggest about the relationship between specification, intentionality, and authorship? If both students' final outputs are of comparable aesthetic quality, does it matter whether one used detailed specifications and the other used vague prompts with selection? Why or why not?
This question tests understanding of authorship and intentionality beyond aesthetic outcomes. An effective response would recognize that Student A can articulate explicit decision-making—explaining why they specified particular attributes and how those choices serve project goals—while Student B can only describe selection from generated options without explaining why those particular attributes were appropriate. This difference matters in educational contexts valuing reasoning and justification even when outcomes appear equivalent. The response should grapple with whether authorship resides in generation, selection, or intentional specification of requirements, and whether selecting from system-generated alternatives demonstrates the same creative agency as specifying what should be created. This assesses whether students understand that professional and academic contexts often value demonstrable reasoning about choices, not merely acceptable final outputs, and that specification practices document that reasoning while vague prompting with selection obscures it.