
Introduction: From Prompt Soup to Assembly Lines – Why Multi-Stage Pipelines Matter

Many teams begin their large language model (LLM) projects with a single, complex prompt that tries to do everything at once—summarizing, extracting information, and classifying in one go. We call this approach "prompt soup": a fragile tangle where even a small change can break the whole system.

Prompt soup is hard to debug, difficult to extend, and nearly impossible to scale. If you need to adjust how sentiment is detected or add a new extraction rule, you risk disrupting everything else.

There’s a better way. Instead of one monolithic prompt, break your workflow into clear, modular stages—just like an assembly line in manufacturing. Each stage does one job, passing its results to the next. This is the essence of a multi-stage pipeline.

In business, assembly lines and workflow automation are proven strategies for reliability and growth. Each department or process focuses on a specific task, making the whole system easier to maintain and improve. AI pipelines can—and should—work the same way.

Let’s make this concrete with a simple example: processing a customer support email. The goal is to summarize the message, extract key details (like product names or issues), and classify sentiment.

First, here’s the monolithic approach:

Monolithic Prompt (Not Recommended)

prompt = (
    "Read the following customer email. Summarize it in two sentences, extract the main issue, and classify the sentiment as Positive, Neutral, or Negative.\n"
    "Email: {email_text}"
)
# This prompt is sent as a single request to the LLM.

This works for simple cases. But as soon as you need to tweak one part—like improving entity extraction—you must rewrite and retest the whole prompt. Debugging and updates become a headache.

Now, let’s see the modular pipeline approach using DSPy. Each stage is a focused module. (For details on DSPy modules, see Chapter 3: DSPy Fundamentals.)

Step 1: Summarization Module

import dspy

class SummarizeEmail(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict("email_text -> summary")

    def forward(self, email_text: str) -> str:
        """Return a concise summary of the customer email."""
        return self.predict(email_text=email_text).summary

# Tip: For production, enforce structured outputs using TypedPredictors or Pydantic validation (see Chapter 11).

The SummarizeEmail module handles only summarization. Its purpose is clear and limited.

Step 2: Entity Extraction Module

class ExtractEntities(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict("summary -> entities")

    def forward(self, summary: str) -> str:
        """Extract entities like product names and issues from the summary."""
        return self.predict(summary=summary).entities

# Tip: Use structured outputs and schema validation for reliability (see Chapter 11).

Entity extraction means identifying structured items—like product names or issues—from text. This module focuses only on that job.
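To make the schema-validation tip concrete, here is a minimal sketch assuming Pydantic is available. The `EmailEntities` model and its fields are illustrative examples, not part of DSPy:

```python
from pydantic import BaseModel, ValidationError

class EmailEntities(BaseModel):
    # Illustrative schema: the real fields depend on your domain.
    products: list[str]
    issues: list[str]

# A well-formed extraction result passes validation...
ok = EmailEntities(products=["router"], issues=["no connectivity"])

# ...while a malformed one raises a clear error instead of failing silently.
try:
    EmailEntities(products="router", issues=None)
except ValidationError as exc:
    print(f"Rejected malformed output: {len(exc.errors())} errors")
```

Validating LLM output against a schema like this turns silent downstream corruption into an immediate, debuggable failure at the stage boundary.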

Step 3: Sentiment Analysis Module

class AnalyzeSentiment(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict("summary -> sentiment")

    def forward(self, summary: str) -> str:
        """Classify sentiment of the summarized email as Positive, Neutral, or Negative."""
        return self.predict(summary=summary).sentiment

# Tip: DSPy modules can be automatically optimized for higher accuracy using built-in algorithms (see Chapter 8).
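Putting the stages together, the pipeline is just sequential composition: each module consumes the previous module's output. The sketch below shows that wiring with plain-Python stand-ins for the LLM calls (the stub logic is purely illustrative), so the data flow runs without a configured model:

```python
def summarize(email_text: str) -> str:
    # Stand-in for SummarizeEmail; a real pipeline would call the LLM here.
    return email_text.split(".")[0].strip() + "."

def extract_entities(summary: str) -> dict:
    # Stand-in for ExtractEntities.
    text = summary.lower()
    return {
        "products": [p for p in ("router", "modem") if p in text],
        "issues": ["malfunction"] if "broken" in text else [],
    }

def analyze_sentiment(summary: str) -> str:
    # Stand-in for AnalyzeSentiment.
    text = summary.lower()
    negative = any(w in text for w in ("broken", "angry", "refund"))
    return "Negative" if negative else "Neutral"

def process_email(email_text: str) -> dict:
    # The pipeline: each stage feeds the next, and each can be tested alone.
    summary = summarize(email_text)
    return {
        "summary": summary,
        "entities": extract_entities(summary),
        "sentiment": analyze_sentiment(summary),
    }

result = process_email("My router is broken. Please send a replacement.")
# result["sentiment"] is "Negative"; result["entities"]["products"] is ["router"]
```

In the DSPy version, each stub becomes a call to the corresponding module, and because the stages are separate, you can swap, test, or optimize any one of them without touching the others.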