LangGraph: Build Stateful AI Agents That Actually Work in Production
LangGraph models AI agent workflows as graphs — enabling branching, looping, parallel execution, and human-in-the-loop checkpoints that simple chain-based agents can't handle. This is how you build agents that don't break.
Why LangChain Chains Break in Production
The first generation of AI agents — built with LangChain's AgentExecutor — worked brilliantly in demos and broke frustratingly in production. The problems were structural: agents could loop infinitely with no interruption mechanism, there was no way to pause for human review before a destructive action, branching logic was awkward, error recovery required rebuilding the entire chain, and state was ephemeral — a crash meant restarting from scratch with no memory of progress.
LangGraph, released by LangChain in early 2024, fundamentally rearchitects agents as stateful graphs. Instead of a linear chain of steps, a LangGraph workflow is a directed graph where nodes are processing steps (LLM calls, tool executions, human review), edges define conditional control flow, and state is a typed Python object that persists across every node and across time (via a checkpointing backend). This graph model unlocks capabilities that chain-based agents simply cannot achieve.
Core Concepts
State: The Single Source of Truth
Every LangGraph workflow is built around a State object — a TypedDict (or Pydantic model) that holds all information about the current workflow execution. Every node reads from state and writes to state. The graph passes state between nodes; nodes never communicate directly.
from typing import TypedDict, Annotated, List
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[List, add_messages]  # accumulated conversation
    lead_data: dict       # enriched lead information
    email_draft: str      # AI-generated email
    human_approved: bool  # approval status
    retry_count: int      # loop control
    final_output: str
The Annotated[List, add_messages] annotation is important — it tells LangGraph to append new messages rather than replace the entire list. Custom reducers let you define exactly how state fields are updated when nodes write to them.
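As a minimal sketch of a custom reducer (the names `merge_dicts` and `EnrichmentState` are illustrative, not from the LangGraph API): a reducer is just a function that takes the existing value and a node's update and returns the merged result.

```python
from typing import Annotated, TypedDict

def merge_dicts(existing: dict, update: dict) -> dict:
    """Custom reducer: shallow-merge node updates into the existing dict."""
    return {**existing, **update}

class EnrichmentState(TypedDict):
    # Writes to lead_data are merged via merge_dicts, not replaced wholesale
    lead_data: Annotated[dict, merge_dicts]
```

When a node returns `{"lead_data": {...}}`, LangGraph calls the reducer with the old and new values instead of overwriting the field.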
Nodes: Processing Steps
Nodes are Python functions (or async functions) that take state as input and return a dict of state updates:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def research_lead(state: AgentState) -> dict:
    """Enrich lead data using AI analysis."""
    lead_info = state["lead_data"]
    response = llm.invoke([
        HumanMessage(content=f"""
Research this lead and provide:
- ICP fit score (1-10)
- Company summary (2 sentences)
- Best outreach angle

Lead: {lead_info}
""")
    ])
    return {
        "messages": [response],
        "lead_data": {**lead_info, "research": response.content},
        "retry_count": state.get("retry_count", 0) + 1,  # count attempts for loop control
    }

def draft_email(state: AgentState) -> dict:
    """Draft a personalised outreach email."""
    research = state["lead_data"].get("research", "")
    response = llm.invoke([
        HumanMessage(content=f"Write a 3-sentence cold email based on: {research}")
    ])
    return {"email_draft": response.content}
Edges: Control Flow
Edges connect nodes and define execution order. Conditional edges are the key capability that makes LangGraph powerful — they route execution based on the current state:
def route_after_research(state: AgentState) -> str:
    """Route based on the ICP fit score extracted from research."""
    research = state["lead_data"].get("research", "")
    if "score: 9" in research or "score: 10" in research:
        return "draft_email"      # high-value lead, proceed
    elif state["retry_count"] < 2:
        return "research_lead"    # low confidence, retry research
    else:
        return "disqualify_lead"  # give up after 2 retries
Building the Graph
from langgraph.graph import StateGraph, START, END

# Build the graph
builder = StateGraph(AgentState)

# Add nodes
builder.add_node("research_lead", research_lead)
builder.add_node("draft_email", draft_email)
builder.add_node("human_review", human_review_node)  # interrupt point
builder.add_node("send_email", send_email_node)
builder.add_node("disqualify_lead", disqualify_node)

# Add edges
builder.add_edge(START, "research_lead")
builder.add_conditional_edges("research_lead", route_after_research)
builder.add_edge("draft_email", "human_review")
builder.add_conditional_edges(
    "human_review",
    lambda state: "send_email" if state["human_approved"] else "draft_email",
)
builder.add_edge("send_email", END)
builder.add_edge("disqualify_lead", END)

graph = builder.compile()
Human-in-the-Loop: Interrupting for Approval
The single most important production feature of LangGraph is interrupts — the ability to pause graph execution, save the full state, wait indefinitely for human input, then resume exactly where it left off. This is what enables safe agentic systems.
from langgraph.checkpoint.postgres import PostgresSaver

# Use PostgreSQL as the checkpointing backend (survives server restarts).
# from_conn_string is a context manager; in a long-lived app you would
# re-create the checkpointer in whichever process resumes the thread.
with PostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
    checkpointer.setup()  # create the checkpoint tables on first use

    graph = builder.compile(
        checkpointer=checkpointer,
        interrupt_before=["send_email"],  # pause before this node every time
    )

    # First run: executes until send_email, then pauses
    thread = {"configurable": {"thread_id": "lead-abc123"}}
    for event in graph.stream(initial_state, thread):
        print(event)

    # The state is saved in PostgreSQL. The server can restart. Days can pass.
    # A human reviews the draft via your web app, then resumes:
    graph.update_state(thread, {"human_approved": True})
    for event in graph.stream(None, thread):  # None = resume from saved state
        print(event)
    # The send_email node now executes
This pattern — pause, save state, wait for human, resume — is impossible with simple chains or the original AgentExecutor. With LangGraph + PostgreSQL checkpointing, your agent workflows survive server restarts, can wait days for approval, and give humans a meaningful review step before irreversible actions.
Parallel Execution: Fan-Out / Fan-In
LangGraph supports parallel node execution natively. Research a company, scrape their website, and fetch their LinkedIn — all simultaneously — then merge the results:
# Fan out: one edge per parallel branch
builder.add_edge("start_research", "web_search")
builder.add_edge("start_research", "linkedin_fetch")
builder.add_edge("start_research", "news_search")
# All three nodes run in parallel in the same step

# Fan in: a list of source nodes means merge_research
# waits for all three branches to complete
builder.add_edge(["web_search", "linkedin_fetch", "news_search"], "merge_research")
This fan-out/fan-in pattern reduces a 3-step sequential workflow (3 × 2s = 6s) to parallel execution (~2s). For research agents, data enrichment pipelines, and multi-source retrieval, this is a significant latency improvement.
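A minimal sketch of the fan-in step, assuming each parallel branch wrote its result to its own state key (`web_results`, `linkedin_results`, and `news_results` are hypothetical names for this example):

```python
from typing import TypedDict

class ResearchState(TypedDict, total=False):
    web_results: str       # written by web_search
    linkedin_results: str  # written by linkedin_fetch
    news_results: str      # written by news_search
    research_summary: str  # written by merge_research

def merge_research(state: ResearchState) -> dict:
    """Fan-in node: runs only after all three parallel branches finish."""
    combined = "\n\n".join(
        state.get(key, "")
        for key in ("web_results", "linkedin_results", "news_results")
    )
    return {"research_summary": combined}
```

Writing each branch to a distinct key avoids concurrent writes to the same field; if branches must write to one shared key, that key needs a reducer (as with `add_messages` above).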
Subgraphs: Composing Complex Workflows
Large agentic systems can be decomposed into subgraphs — each a complete LangGraph workflow that can be embedded as a node in a parent graph. The lead enrichment graph becomes a node in the sales automation graph, which becomes a node in the CRM intelligence graph. This composability enables team-scale development of complex multi-agent systems with clear boundaries and testability.
LangGraph vs CrewAI: When to Use Each
- CrewAI: Higher-level abstraction. Define agents with roles, assign tasks, let CrewAI handle orchestration. Faster to prototype. Less control over exact execution flow. Better for content generation, research, and report-writing agents where the workflow is relatively linear.
- LangGraph: Lower-level, more control. You define every node and every edge. Required for complex conditional branching, human-in-the-loop, stateful long-running workflows, and production systems where reliability and predictability are critical. Steeper learning curve, substantially more production-ready.
In practice: prototype with CrewAI to validate the concept, rebuild with LangGraph for production deployment.