Defining the Data Model
The core data model for an AI conversation archive needs to represent four entities:

- Conversation: a container with metadata (title, creation date, AI provider, model version, total tokens).
- Message: an individual turn with role, content, timestamp, and token count.
- Attachment: a file, image, or document provided as context.
- Annotation: a user-added note, tag, or highlight on any message.
Relationships: a Conversation has many Messages, ordered by their position in the conversation. A Message may have many Attachments, and any Message may have many Annotations. The Conversation entity should store a content_hash for integrity verification and a sync_status for tracking whether the local archive matches the provider's server state.
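The entities and relationships above can be sketched as a SQLite schema. This is a minimal illustration; table and column names are my own, not a prescribed layout:

```python
import sqlite3

SCHEMA = """
CREATE TABLE conversation (
    id            TEXT PRIMARY KEY,
    title         TEXT NOT NULL,
    created_at    TEXT NOT NULL,          -- ISO 8601
    provider      TEXT NOT NULL,          -- e.g. 'openai', 'anthropic'
    model_version TEXT,
    total_tokens  INTEGER DEFAULT 0,
    content_hash  TEXT,                   -- integrity verification
    sync_status   TEXT DEFAULT 'pending'  -- local vs. provider state
);
CREATE TABLE message (
    id              TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL REFERENCES conversation(id),
    seq             INTEGER NOT NULL,     -- position within the conversation
    role            TEXT NOT NULL,        -- 'user' | 'assistant' | 'system'
    content         TEXT NOT NULL,
    timestamp       TEXT NOT NULL,
    token_count     INTEGER,
    UNIQUE (conversation_id, seq)
);
CREATE TABLE attachment (
    id         TEXT PRIMARY KEY,
    message_id TEXT NOT NULL REFERENCES message(id),
    kind       TEXT NOT NULL,             -- 'file' | 'image' | 'document'
    path       TEXT NOT NULL
);
CREATE TABLE annotation (
    id         TEXT PRIMARY KEY,
    message_id TEXT NOT NULL REFERENCES message(id),
    kind       TEXT NOT NULL,             -- 'note' | 'tag' | 'highlight'
    body       TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The UNIQUE (conversation_id, seq) constraint enforces the "Messages in sequence" relationship at the database level rather than in application code.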
Connecting to AI Provider APIs
OpenAI exposes full conversation history through the ChatGPT web UI's data export, which delivers the archive as JSON; there is no official API endpoint for retrieving it. Claude's API doesn't offer a native history export either; capturing history requires intercepting conversations at the client layer and persisting them to your own storage. For production archive tools, a browser extension that intercepts API responses provides the most comprehensive capture across providers, without depending on official export features that may change.
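As a concrete illustration of ingesting the ChatGPT web export: observed exports store each conversation's messages as a "mapping" tree of nodes linked by parent/child IDs rather than a flat list. The field names below ('mapping', 'parent', 'children', 'author', 'parts') reflect exports seen in the wild and are not a documented, stable format; verify against your own export before relying on them:

```python
def linearize(conversation: dict) -> list[dict]:
    """Flatten a ChatGPT-export 'mapping' tree into an ordered message list.

    Walks from the root node (the one with no parent) down the first-child
    chain, collecting nodes that carry actual message content.
    """
    mapping = conversation["mapping"]
    node_id = next(nid for nid, n in mapping.items() if n.get("parent") is None)
    messages = []
    while node_id is not None:
        node = mapping[node_id]
        msg = node.get("message")
        if msg and msg.get("content", {}).get("parts"):
            messages.append({
                "role": msg["author"]["role"],
                # Join text parts; skip non-string parts (e.g. image refs).
                "content": "".join(p for p in msg["content"]["parts"]
                                   if isinstance(p, str)),
            })
        children = node.get("children") or []
        node_id = children[0] if children else None
    return messages
```

Following only the first child discards abandoned regeneration branches; an archiver that wants every branch would walk the full tree instead.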
Storage Architecture Options
For personal-scale tools (one to ten users), SQLite with FTS5 full-text search provides a zero-infrastructure, sub-millisecond query solution that fits in a single file. For team-scale (ten to a thousand users), PostgreSQL with pgvector for semantic search handles both structured queries and vector similarity at acceptable latency. For enterprise scale (thousands of users, billions of tokens), a data lakehouse architecture with Parquet storage, columnar processing (DuckDB or Apache Spark), and a separate vector database (Pinecone or Weaviate) provides the necessary query flexibility and storage efficiency.
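At the personal scale, the SQLite option needs almost no setup. A minimal sketch of FTS5 full-text search over message content (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNINDEXED columns are stored for retrieval but excluded from the text index.
conn.execute(
    "CREATE VIRTUAL TABLE message_fts USING fts5(content, conversation_id UNINDEXED)"
)
conn.executemany(
    "INSERT INTO message_fts (content, conversation_id) VALUES (?, ?)",
    [("How do I tune SQLite FTS5 ranking?", "c1"),
     ("Postgres tsvector basics", "c2")],
)
# FTS5 exposes a bm25-based 'rank'; ascending order puts the best match first.
rows = conn.execute(
    "SELECT conversation_id FROM message_fts WHERE message_fts MATCH ? ORDER BY rank",
    ("fts5",),
).fetchall()
```

In a real archive the FTS table would be kept in sync with the message table via triggers or FTS5's external-content mode, rather than inserted into directly.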
Encryption at Rest
Conversation archives contain sensitive personal data and must be encrypted at rest. The right architecture encrypts at two levels: storage-level encryption (disk encryption or provider-managed encryption in S3/GCS) and application-level encryption (encrypting content fields before writing to the database). Application-level encryption is critical for multi-tenant systems where the database operator should not be able to read user content without the user's key.
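A minimal sketch of the application-level layer, using Fernet from the third-party cryptography package (authenticated symmetric encryption); the function names and key-handling are illustrative, and a production system would keep per-user keys in a KMS or derive them from a user passphrase:

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

def encrypt_field(key: bytes, plaintext: str) -> bytes:
    """Encrypt a content field before it is written to the database."""
    return Fernet(key).encrypt(plaintext.encode("utf-8"))

def decrypt_field(key: bytes, token: bytes) -> str:
    """Decrypt a content field read back from the database."""
    return Fernet(key).decrypt(token).decode("utf-8")

key = Fernet.generate_key()  # per-user key; never stored beside the data
ciphertext = encrypt_field(key, "sensitive conversation text")
```

Because only content fields are encrypted, structural metadata (timestamps, roles, conversation IDs) stays queryable while the database operator cannot read message bodies without the user's key.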
Search and Retrieval Design
Build search in two layers: keyword search (SQLite FTS5 or PostgreSQL tsvector) for exact-match queries and semantic search (vector embeddings) for concept-based retrieval. Chunk conversations into 400-token segments with 50-token overlap before embedding. Use OpenAI's text-embedding-3-small (1536 dimensions) for cost efficiency or text-embedding-3-large (3072 dimensions) for maximum accuracy. Store embeddings in a vector database with metadata filters for date range, AI provider, and conversation topic to enable faceted semantic search.
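The chunking step is simple sliding-window arithmetic. A sketch, assuming the text has already been tokenized (e.g. with a tokenizer such as tiktoken; plain words stand in for tokens here):

```python
def chunk_tokens(tokens: list[str], size: int = 400, overlap: int = 50) -> list[list[str]]:
    """Split a token sequence into fixed-size windows that overlap,
    so a sentence straddling a boundary appears whole in some chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # advance 350 tokens per window by default
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the final window already covers the tail
    return chunks
```

Each chunk would then be embedded and stored alongside its conversation ID and offsets, so a semantic hit can be traced back to the exact span of the original conversation.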
Export Formats
Support at minimum three export formats: JSON (the native format, suitable for programmatic processing), Markdown (human-readable, compatible with Obsidian, Notion, and similar tools), and HTML (self-contained archives that render in any browser with no dependencies). For enterprise users, add CSV (for analysis in Excel or data tools) and PDF (for compliance and legal hold purposes).
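The Markdown exporter is the simplest of the three to sketch. This minimal version assumes the conversation dict shape used earlier (title, provider, and a list of role/content messages); the layout is one choice among many:

```python
def to_markdown(conversation: dict) -> str:
    """Render one conversation as a Markdown document:
    a title heading, a provider line, then one section per turn."""
    lines = [
        f"# {conversation['title']}",
        f"*Provider: {conversation['provider']}*",
        "",
    ]
    for msg in conversation["messages"]:
        lines.append(f"## {msg['role'].capitalize()}")
        lines.append(msg["content"])
        lines.append("")
    return "\n".join(lines)
```

Because message content from code-heavy conversations often contains Markdown itself (fences, headings), a production exporter would also need to escape or nest such content so it renders correctly in Obsidian or Notion.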