# Persistent Memory with Vercel AI SDK
The Vercel AI SDK is the most popular way to build AI applications in JavaScript. This tutorial shows how to add persistent memory to any AI SDK application with just a few lines of code.
## How It Works
Recall provides a model wrapper that intercepts AI SDK calls to:
- Before generation: Query relevant memories and inject them into the prompt
- After generation: Extract new facts and store them for future use
```ts
// Without memory
streamText({ model: openai('gpt-4o'), messages })

// With memory - that's it!
streamText({ model: recall(openai('gpt-4o'), { userId }), messages })
```

## Quick Setup
### 1. Install Packages
```bash
npm install @youcraft/recall @youcraft/recall-ai-sdk \
  @youcraft/recall-adapter-sqlite @youcraft/recall-embeddings-openai \
  @youcraft/recall-extractor-openai
```

### 2. Create Memory Instance
```ts
import { createMemory } from '@youcraft/recall'
import { createRecall } from '@youcraft/recall-ai-sdk'
import { sqliteAdapter } from '@youcraft/recall-adapter-sqlite'
import { openaiEmbeddings } from '@youcraft/recall-embeddings-openai'
import { openaiExtractor } from '@youcraft/recall-extractor-openai'

export const memory = createMemory({
  db: sqliteAdapter({ filename: 'memories.db' }),
  embeddings: openaiEmbeddings({ apiKey: process.env.OPENAI_API_KEY! }),
  extractor: openaiExtractor({ apiKey: process.env.OPENAI_API_KEY! }),
})

export const recall = createRecall({ memory })
```

### 3. Wrap Your Model
```ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { recall } from '@/lib/memory'

const result = streamText({
  model: recall(openai('gpt-4o-mini'), { userId: 'user_123' }),
  messages: [{ role: 'user', content: 'Hello!' }],
})
```

## Works with Any Provider
Recall wraps the model, not the provider. Use it with any AI SDK provider:
```ts
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'
import { google } from '@ai-sdk/google'

// OpenAI
recall(openai('gpt-4o'), { userId })

// Anthropic
recall(anthropic('claude-sonnet-4-20250514'), { userId })

// Google
recall(google('gemini-1.5-pro'), { userId })
```

## Configuration Options
### Memory Query Options
Control how memories are retrieved:
```ts
const result = streamText({
  model: recall(openai('gpt-4o'), {
    userId: 'user_123',
    // Maximum memories to inject
    limit: 10,
    // Minimum similarity threshold (0-1)
    threshold: 0.7,
  }),
  messages,
})
```

### Custom System Prompt Integration
By default, memories are prepended to your system prompt:
```
<memories>
- User's name is Alex
- User prefers dark mode
- User is a software engineer
</memories>

Your actual system prompt here...
```

Customize this with `formatMemories`:
```ts
export const recall = createRecall({
  memory,
  formatMemories: memories => {
    if (memories.length === 0) return ''

    return `## What I Know About This User
${memories.map(m => `- ${m.content}`).join('\n')}

Use this information to personalize your response.`
  },
})
```

### Background Extraction
Memory extraction can be slow. Offload it to avoid blocking responses:
```ts
export const recall = createRecall({
  memory,
  onExtract: async ({ messages, userId }) => {
    // Option 1: Inngest
    await inngest.send({
      name: 'memory/extract',
      data: { messages, userId },
    })

    // Option 2: BullMQ
    await memoryQueue.add('extract', { messages, userId })

    // Option 3: Simple async (fire-and-forget)
    setImmediate(async () => {
      const text = messages.map(m => `${m.role}: ${m.content}`).join('\n')
      await memory.extract(text, { userId })
    })
  },
})
```

### Disable Auto-Extraction
If you want to control extraction manually:
```ts
export const recall = createRecall({
  memory,
  autoExtract: false, // Don't extract automatically
})

// Extract manually when you want
await memory.extract(conversationText, { userId })
```
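For example, you can run extraction yourself once a streamed response has finished. Here is a minimal sketch, assuming the `memory.extract(text, { userId })` call shown above and the AI SDK's `onFinish` callback; `chatWithManualExtraction` is a hypothetical helper name used only for illustration:

```ts
import { openai } from '@ai-sdk/openai'
import { streamText, type CoreMessage } from 'ai'
import { memory, recall } from '@/lib/memory'

// Hypothetical helper: recall still injects memories, but extraction is manual
export async function chatWithManualExtraction(messages: CoreMessage[], userId: string) {
  return streamText({
    model: recall(openai('gpt-4o-mini'), { userId }),
    messages,
    // Runs after the full response has streamed; extract from the whole exchange
    onFinish: async ({ text }) => {
      const conversationText = [
        ...messages.map(m => `${m.role}: ${m.content}`),
        `assistant: ${text}`,
      ].join('\n')
      await memory.extract(conversationText, { userId })
    },
  })
}
```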
## Full Example: Multi-User Chat API

```ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { recall, memory } from '@/lib/memory'

export async function POST(req: Request) {
  const { messages, userId } = await req.json()

  if (!userId) {
    return new Response('userId is required', { status: 400 })
  }

  const result = streamText({
    model: recall(openai('gpt-4o-mini'), {
      userId,
      limit: 5,
    }),
    system: `You are a helpful personal assistant. Be friendly and remember
what users tell you about themselves.`,
    messages,
  })

  return result.toDataStreamResponse()
}

// Optional: Endpoint to view a user's memories
export async function GET(req: Request) {
  const { searchParams } = new URL(req.url)
  const userId = searchParams.get('userId')

  if (!userId) {
    return new Response('userId is required', { status: 400 })
  }

  const memories = await memory.list(userId)
  return Response.json(memories)
}
```

## Streaming vs Non-Streaming
Recall works with all AI SDK functions:
```ts
// Streaming
const result = streamText({
  model: recall(openai('gpt-4o'), { userId }),
  messages,
})

// Non-streaming
const result = await generateText({
  model: recall(openai('gpt-4o'), { userId }),
  messages,
})

// Object generation
const result = await generateObject({
  model: recall(openai('gpt-4o'), { userId }),
  schema: mySchema,
  messages,
})
```

## Using with useChat Hook
The `useChat` hook works seamlessly:
```tsx
'use client'

import { useChat } from 'ai/react'

export default function Chat({ userId }: { userId: string }) {
  const { messages, input, handleSubmit, handleInputChange } = useChat({
    body: { userId }, // Pass userId to the API
  })

  return (
    <form onSubmit={handleSubmit}>
      {messages.map(m => (
        <div key={m.id}>{m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  )
}
```

## Direct Memory Operations
Access the memory API directly when needed:
```ts
import { memory } from '@/lib/memory'

// Query memories by semantic similarity
const relevant = await memory.query('user preferences', {
  userId: 'user_123',
  limit: 5,
})

// List all memories
const all = await memory.list('user_123')

// CRUD operations
await memory.update('memory_id', { content: 'Updated fact' })
await memory.delete('memory_id')
await memory.clear('user_123') // Delete all user memories
```

## Common Patterns
### Per-Conversation Memory
For chatbots where each conversation is separate:
```ts
const result = streamText({
  model: recall(openai('gpt-4o'), {
    userId: `${userId}_${conversationId}`,
  }),
  messages,
})
```

### Shared Team Memory
For team workspaces where memory is shared:
```ts
const result = streamText({
  model: recall(openai('gpt-4o'), {
    userId: `team_${teamId}`,
  }),
  messages,
})
```

### Hierarchical Memory
Combine personal and team memories:
```ts
// Query both personal and team memories
const personalMemories = await memory.query(query, { userId })
const teamMemories = await memory.query(query, { userId: `team_${teamId}` })
const combined = [...personalMemories, ...teamMemories]
```
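Because the wrapper takes a single `userId`, one way to use the combined list is to skip the wrapper for that call and inject the memories into the system prompt yourself. A minimal sketch, assuming each memory exposes a `content` field as in the `formatMemories` example above; `chatWithHierarchicalMemory` is a hypothetical name:

```ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
import { memory } from '@/lib/memory'

// Hypothetical helper: query both scopes, then inject the combined context manually
export async function chatWithHierarchicalMemory(query: string, userId: string, teamId: string) {
  const personalMemories = await memory.query(query, { userId })
  const teamMemories = await memory.query(query, { userId: `team_${teamId}` })
  const combined = [...personalMemories, ...teamMemories]

  return streamText({
    // Unwrapped model, since the combined memories are injected below
    model: openai('gpt-4o'),
    system: `<memories>\n${combined.map(m => `- ${m.content}`).join('\n')}\n</memories>`,
    messages: [{ role: 'user', content: query }],
  })
}
```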
## Next Steps

- Next.js Chatbot Tutorial — Complete example with UI
- Background Extraction — Production-ready pattern
- Structured Memory — For precise data tracking