LangGraph: Build Stateful AI Agents That Actually Work in Production
LangGraph models AI agent workflows as graphs — enabling branching, looping, parallel execution, and human-in-the-loop checkpoints that simple chain-based agents can't handle. This is how you build agents that don't break.
Why LangChain Chains Break in Production
The first generation of AI agents — built with LangChain's AgentExecutor — worked brilliantly in demos and broke frustratingly in production. The problems were structural: agents could loop infinitely with no interruption mechanism, there was no way to pause for human review before a destructive action, branching logic was awkward, error recovery required rebuilding the entire chain, and state was ephemeral — a crash meant restarting from scratch with no memory of progress.
LangGraph, released by LangChain in early 2024, fundamentally rearchitects agents as stateful graphs. Instead of a linear chain of steps, a LangGraph workflow is a directed graph where nodes are processing steps (LLM calls, tool executions, human review), edges define conditional control flow, and state is a typed Python object that persists across every node and across time (via a checkpointing backend). This graph model unlocks capabilities that chain-based agents simply cannot achieve.
Core Concepts
State: The Single Source of Truth
Every LangGraph workflow is built around a State object — a TypedDict (or Pydantic model) that holds all information about the current workflow execution. Every node reads from state and writes to state. The graph passes state between nodes; nodes never communicate directly.
from typing import TypedDict, Annotated, List
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[List, add_messages]  # accumulated conversation
    lead_data: dict       # enriched lead information
    email_draft: str      # AI-generated email
    human_approved: bool  # approval status
    retry_count: int      # loop control
    final_output: str
The Annotated[List, add_messages] annotation is important — it tells LangGraph to append new messages rather than replace the entire list. Custom reducers let you define exactly how state fields are updated when nodes write to them.
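As a minimal sketch of a custom reducer (the names `merge_dicts` and `EnrichmentState` are illustrative, not from the LangGraph API): a reducer is just a function that takes the existing value and a node's update and returns the merged result.

```python
from typing import Annotated, TypedDict

def merge_dicts(existing: dict, update: dict) -> dict:
    """Custom reducer: shallow-merge node updates into the existing dict."""
    return {**existing, **update}

class EnrichmentState(TypedDict):
    # Writes to lead_data are merged via merge_dicts, not replaced wholesale
    lead_data: Annotated[dict, merge_dicts]
```

When a node returns `{"lead_data": {...}}`, LangGraph calls the reducer with the old and new values instead of overwriting the field.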
Nodes: Processing Steps
Nodes are Python functions (or async functions) that take state as input and return a dict of state updates:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def research_lead(state: AgentState) -> dict:
    """Enrich lead data using AI analysis."""
    lead_info = state["lead_data"]
    response = llm.invoke([
        HumanMessage(content=f"""
Research this lead and provide:
- ICP fit score (1-10)
- Company summary (2 sentences)
- Best outreach angle

Lead: {lead_info}
""")
    ])
    return {
        "messages": [response],
        "lead_data": {**lead_info, "research": response.content},
        "retry_count": state.get("retry_count", 0) + 1,  # count attempts for loop control
    }

def draft_email(state: AgentState) -> dict:
    """Draft a personalised outreach email."""
    research = state["lead_data"].get("research", "")
    response = llm.invoke([
        HumanMessage(content=f"Write a 3-sentence cold email based on: {research}")
    ])
    return {"email_draft": response.content}
Edges: Control Flow
Edges connect nodes and define execution order. Conditional edges are the key capability that makes LangGraph powerful — they route execution based on the current state:
def route_after_research(state: AgentState) -> str:
    """Route based on the ICP fit score extracted from research."""
    research = state["lead_data"].get("research", "")
    if "score: 9" in research or "score: 10" in research:
        return "draft_email"      # high-value lead, proceed
    elif state["retry_count"] < 2:
        return "research_lead"    # low confidence, retry research
    else:
        return "disqualify_lead"  # give up after 2 retries
Building the Graph
from langgraph.graph import StateGraph, START, END

# Build the graph
builder = StateGraph(AgentState)

# Add nodes
builder.add_node("research_lead", research_lead)
builder.add_node("draft_email", draft_email)
builder.add_node("human_review", human_review_node)  # interrupt point
builder.add_node("send_email", send_email_node)
builder.add_node("disqualify_lead", disqualify_node)

# Add edges
builder.add_edge(START, "research_lead")
builder.add_conditional_edges("research_lead", route_after_research)
builder.add_edge("draft_email", "human_review")
builder.add_conditional_edges(
    "human_review",
    lambda state: "send_email" if state["human_approved"] else "draft_email",
)
builder.add_edge("send_email", END)
builder.add_edge("disqualify_lead", END)

graph = builder.compile()
Human-in-the-Loop: Interrupting for Approval
The single most important production feature of LangGraph is interrupts — the ability to pause graph execution, save the full state, wait indefinitely for human input, then resume exactly where it left off. This is what enables safe agentic systems.
from langgraph.checkpoint.postgres import PostgresSaver

# Use PostgreSQL as the checkpointing backend (survives server restarts).
# from_conn_string is a context manager; in a long-lived app you would
# re-create the checkpointer in whichever process resumes the thread.
with PostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
    checkpointer.setup()  # create the checkpoint tables on first use

    graph = builder.compile(
        checkpointer=checkpointer,
        interrupt_before=["send_email"],  # pause before this node every time
    )

    # First run: executes until send_email, then pauses
    thread = {"configurable": {"thread_id": "lead-abc123"}}
    for event in graph.stream(initial_state, thread):
        print(event)

    # The state is saved in PostgreSQL. The server can restart. Days can pass.
    # A human reviews the draft via your web app, then resumes:
    graph.update_state(thread, {"human_approved": True})
    for event in graph.stream(None, thread):  # None = resume from saved state
        print(event)
    # The send_email node now executes
This pattern — pause, save state, wait for human, resume — is impossible with simple chains or the original AgentExecutor. With LangGraph + PostgreSQL checkpointing, your agent workflows survive server restarts, can wait days for approval, and give humans a meaningful review step before irreversible actions.
Parallel Execution: Fan-Out / Fan-In
LangGraph supports parallel node execution natively. Research a company, scrape their website, and fetch their LinkedIn — all simultaneously — then merge the results:
# Fan out: one edge per parallel branch
builder.add_edge("start_research", "web_search")
builder.add_edge("start_research", "linkedin_fetch")
builder.add_edge("start_research", "news_search")
# All three nodes run in parallel in the same step

# Fan in: a list of source nodes means merge_research
# waits for all three branches to complete
builder.add_edge(["web_search", "linkedin_fetch", "news_search"], "merge_research")
This fan-out/fan-in pattern reduces a 3-step sequential workflow (3 × 2s = 6s) to parallel execution (~2s). For research agents, data enrichment pipelines, and multi-source retrieval, this is a significant latency improvement.
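A minimal sketch of the fan-in step, assuming each parallel branch wrote its result to its own state key (`web_results`, `linkedin_results`, and `news_results` are hypothetical names for this example):

```python
from typing import TypedDict

class ResearchState(TypedDict, total=False):
    web_results: str       # written by web_search
    linkedin_results: str  # written by linkedin_fetch
    news_results: str      # written by news_search
    research_summary: str  # written by merge_research

def merge_research(state: ResearchState) -> dict:
    """Fan-in node: runs only after all three parallel branches finish."""
    combined = "\n\n".join(
        state.get(key, "")
        for key in ("web_results", "linkedin_results", "news_results")
    )
    return {"research_summary": combined}
```

Writing each branch to a distinct key avoids concurrent writes to the same field; if branches must write to one shared key, that key needs a reducer (as with `add_messages` above).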
Subgraphs: Composing Complex Workflows
Large agentic systems can be decomposed into subgraphs — each a complete LangGraph workflow that can be embedded as a node in a parent graph. The lead enrichment graph becomes a node in the sales automation graph, which becomes a node in the CRM intelligence graph. This composability enables team-scale development of complex multi-agent systems with clear boundaries and testability.
LangGraph vs CrewAI: When to Use Each
- CrewAI: Higher-level abstraction. Define agents with roles, assign tasks, let CrewAI handle orchestration. Faster to prototype. Less control over exact execution flow. Better for content generation, research, and report-writing agents where the workflow is relatively linear.
- LangGraph: Lower-level, more control. You define every node and every edge. Required for complex conditional branching, human-in-the-loop, stateful long-running workflows, and production systems where reliability and predictability are critical. Steeper learning curve, substantially more production-ready.
In practice: prototype with CrewAI to validate the concept, rebuild with LangGraph for production deployment.