Performance Engineering for AI Search: Optimizing User Experience at Scale

August 13, 2025

The Performance Wall

AI-powered search promises intelligent, contextual results that understand user intent. But there's a problem: most AI search implementations are painfully slow. Users expect Google-level responsiveness—results appearing as they type—but AI search systems typically take 800ms to 2+ seconds per query.

The challenge isn't just about making individual searches faster. It's about building search that feels instant while handling:

  • Thousands of concurrent users typing simultaneously
  • Large result sets that would freeze the UI
  • Mobile users on slow connections
  • Complex AI-powered relevance algorithms that can't be compromised

At Mastra, we faced this challenge with our documentation search. We needed sub-300ms response times with AI-powered relevance, smooth UI with unlimited results, and a great experience on mobile. Here's how we engineered a search system that's both intelligent and fast.

The Performance Problem Analysis

Our initial search implementation had several bottlenecks:

Network Layer Issues

  • No request deduplication: Multiple identical queries in flight
  • No request cancellation: Old queries continuing after new ones started
  • Inefficient payload: Large response objects with unnecessary data
  • No connection pooling: New connections for each request

UI Layer Issues

  • DOM rendering bottlenecks: Rendering 100+ search results freezes the UI
  • No virtualization: All results rendered simultaneously
  • Inefficient re-renders: Entire result list re-rendered on every keystroke
  • Memory leaks: Event listeners not cleaned up properly

AI/Search Layer Issues

  • Cold start penalties: First searches taking 2+ seconds
  • No result caching: Identical queries re-processed every time
  • Inefficient ranking: Complex AI operations on every search
  • No progressive loading: Users wait for all results before seeing any

Solution 1: Intelligent Debouncing with Request Management

The foundation of fast search is smart request management:

import { useCallback, useEffect, useRef, useState } from 'react';
import { LRUCache } from 'lru-cache';

export function useAdvancedDebouncedSearch<T>(
  searchFn: (query: string, signal: AbortSignal) => Promise<T[]>,
  options: {
    debounceMs?: number;
    minQueryLength?: number;
    deduplicate?: boolean;
    cacheSize?: number;
  } = {}
): {
  results: T[];
  isLoading: boolean;
  search: string;
  setSearch: (query: string) => void;
  error: Error | null;
} {
  const {
    debounceMs = 300,
    minQueryLength = 2,
    deduplicate = true,
    cacheSize = 100
  } = options;

  const [results, setResults] = useState<T[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  const [search, setSearch] = useState('');
  const [error, setError] = useState<Error | null>(null);

  // Request management
  const abortControllerRef = useRef<AbortController | null>(null);
  const debounceTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);

  // Deduplication and caching
  const inFlightRequests = useRef(new Map<string, Promise<T[]>>());
  const resultCache = useRef(new LRUCache<string, T[]>({ max: cacheSize }));

  const executeSearch = useCallback(async (query: string): Promise<void> => {
    if (query.length < minQueryLength) {
      setResults([]);
      setIsLoading(false);
      return;
    }

    setIsLoading(true);
    setError(null);

    try {
      // Check cache first
      const cached = resultCache.current.get(query);
      if (cached) {
        console.log(`📦 Cache hit for query: "${query}"`);
        setResults(cached);
        setIsLoading(false);
        return;
      }

      // Check for in-flight request (deduplication)
      if (deduplicate && inFlightRequests.current.has(query)) {
        console.log(`🔄 Deduplicating request for query: "${query}"`);
        const deduped = await inFlightRequests.current.get(query)!;
        setResults(deduped);
        setIsLoading(false);
        return;
      }

      // Cancel any previous request and create a controller for this one.
      // Capturing the controller locally (rather than re-reading the ref)
      // means a newer request can't fool the staleness check below.
      abortControllerRef.current?.abort();
      const controller = new AbortController();
      abortControllerRef.current = controller;

      // Execute search with request deduplication
      const searchPromise = searchFn(query, controller.signal);
      if (deduplicate) {
        inFlightRequests.current.set(query, searchPromise);
      }

      const searchResults = await searchPromise;

      // Bail out if this request was aborted while we were waiting
      if (controller.signal.aborted) {
        return;
      }

      // Cache successful results
      resultCache.current.set(query, searchResults);

      // Clean up in-flight tracking
      if (deduplicate) {
        inFlightRequests.current.delete(query);
      }

      setResults(searchResults);
      setIsLoading(false);

    } catch (error) {
      // Clean up in-flight tracking on error
      if (deduplicate) {
        inFlightRequests.current.delete(query);
      }

      // Ignore aborted requests
      if (error instanceof DOMException && error.name === 'AbortError') {
        return;
      }

      console.error('Search error:', error);
      setError(error as Error);
      setIsLoading(false);
    }
  }, [searchFn, minQueryLength, deduplicate]);

  // Debounced search effect
  useEffect(() => {
    // Clear previous timer
    if (debounceTimerRef.current) {
      clearTimeout(debounceTimerRef.current);
    }

    // Set new timer
    debounceTimerRef.current = setTimeout(() => {
      executeSearch(search);
    }, debounceMs);

    // Cleanup function
    return () => {
      if (debounceTimerRef.current) {
        clearTimeout(debounceTimerRef.current);
      }
    };
  }, [search, executeSearch, debounceMs]);

  // Cleanup on unmount
  useEffect(() => {
    return () => {
      if (abortControllerRef.current) {
        abortControllerRef.current.abort();
      }
      if (debounceTimerRef.current) {
        clearTimeout(debounceTimerRef.current);
      }
    };
  }, []);

  return {
    results,
    isLoading,
    search,
    setSearch,
    error
  };
}

This implementation provides:

  • Request cancellation: Abort old requests when new ones start
  • Deduplication: Don't make identical requests simultaneously
  • LRU caching: Cache recent results to avoid redundant requests
  • Intelligent debouncing: Wait for user to stop typing before searching
  • Error handling: Graceful degradation on network issues
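
Wiring the hook into a component takes a few lines. Here's a minimal sketch; the /api/search endpoint and the result shape are illustrative, not part of the hook:

function DocsSearchBox() {
  const { results, isLoading, search, setSearch, error } =
    useAdvancedDebouncedSearch<{ id: string; title: string }>(
      // The search function must forward the AbortSignal, or cancellation is a no-op
      (query, signal) =>
        fetch(`/api/search?q=${encodeURIComponent(query)}`, { signal }).then((res) =>
          res.json()
        ),
      { debounceMs: 300, minQueryLength: 2, deduplicate: true, cacheSize: 100 }
    );

  return (
    <div>
      <input value={search} onChange={(e) => setSearch(e.target.value)} />
      {isLoading && <span>Searching…</span>}
      {error && <span role="alert">Search failed</span>}
      <ul>
        {results.map((r) => (
          <li key={r.id}>{r.title}</li>
        ))}
      </ul>
    </div>
  );
}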

Solution 2: Virtual Scrolling for Large Result Sets

Large result sets kill performance. Virtual scrolling solves this:

import React, { useLayoutEffect, useRef, useState } from 'react';

interface VirtualScrollProps<T> {
  items: T[];
  height: number;
  itemHeight: number;
  renderItem: (item: T, index: number) => React.ReactNode;
  className?: string;
  onScroll?: (scrollTop: number) => void;
}

function VirtualScrollList<T>({
  items,
  height,
  itemHeight,
  renderItem,
  className,
  onScroll
}: VirtualScrollProps<T>) {
  const [scrollTop, setScrollTop] = useState(0);
  const containerRef = useRef<HTMLDivElement>(null);

  // Calculate visible range
  const visibleStart = Math.floor(scrollTop / itemHeight);
  const visibleEnd = Math.min(
    visibleStart + Math.ceil(height / itemHeight) + 1, // +1 for smooth scrolling
    items.length
  );

  // Get visible items
  const visibleItems = items.slice(visibleStart, visibleEnd);

  // Calculate total height and offset
  const totalHeight = items.length * itemHeight;
  const offsetY = visibleStart * itemHeight;

  const handleScroll = (e: React.UIEvent<HTMLDivElement>) => {
    const newScrollTop = e.currentTarget.scrollTop;
    setScrollTop(newScrollTop);
    onScroll?.(newScrollTop);
  };

  // Intersection observer for better scroll performance
  useLayoutEffect(() => {
    const container = containerRef.current;
    if (!container) return;

    const observer = new IntersectionObserver(
      (entries) => {
        entries.forEach((entry) => {
          if (entry.isIntersecting) {
            // Prefetch next batch if near the end
            const index = parseInt(entry.target.getAttribute('data-index') || '0', 10);
            if (index > items.length - 10) {
              // Trigger prefetch of more results if available
              console.log('🔍 Near end of results, consider loading more');
            }
          }
        });
      },
      { rootMargin: '100px' }
    );

    // Observe visible items
    const itemElements = container.querySelectorAll('[data-index]');
    itemElements.forEach(el => observer.observe(el));

    return () => observer.disconnect();
  }, [visibleItems, items.length]);

  return (
    <div
      ref={containerRef}
      className={className}
      style={{ height, overflow: 'auto' }}
      onScroll={handleScroll}
    >
      <div style={{ height: totalHeight, position: 'relative' }}>
        <div style={{ transform: `translateY(${offsetY}px)` }}>
          {visibleItems.map((item, index) => (
            <div
              key={visibleStart + index}
              data-index={visibleStart + index}
              style={{ height: itemHeight }}
            >
              {renderItem(item, visibleStart + index)}
            </div>
          ))}
        </div>
      </div>
    </div>
  );
}

This enables:

  • Smooth scrolling: Only render visible items
  • Memory efficiency: Constant memory usage regardless of result count
  • Intersection observer: Smart prefetching of additional results
  • Configurable item height: Flexible for different result types
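
Rendering search results through it then looks like this (SearchResultRow stands in for whatever result component you use):

<VirtualScrollList
  items={results}
  height={400}
  itemHeight={72}
  renderItem={(result, index) => (
    <SearchResultRow result={result} rank={index + 1} />
  )}
/>

Because only roughly height / itemHeight rows exist in the DOM at any moment, a 10,000-item result set costs about the same to render as a 10-item one.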

Solution 3: Algolia Integration with Smart Result Processing

We migrated to Algolia for AI-powered search, but the integration required careful optimization:

import { useCallback, useEffect, useRef, useState } from "react";
import algoliasearch, { type SearchClient } from "algoliasearch/lite";

// AlgoliaResult, AlgoliaSearchOptions, UseAlgoliaSearchResult, and
// customRequester are defined elsewhere in our codebase.
export function useAlgoliaSearch(
  debounceTime = 300,
  searchOptions?: AlgoliaSearchOptions,
): UseAlgoliaSearchResult {
  // Result and loading state are owned by the debounced hook below
  const [search, setSearch] = useState("");

  // Connection pooling and reuse
  const algoliaClient = useRef<SearchClient | null>(null);
  const requestCache = useRef(new Map<string, Promise<AlgoliaResult[]>>());

  // Initialize Algolia client with optimization
  useEffect(() => {
    const appId = process.env.NEXT_PUBLIC_ALGOLIA_APP_ID;
    const apiKey = process.env.NEXT_PUBLIC_ALGOLIA_SEARCH_API_KEY;

    if (appId && apiKey) {
      algoliaClient.current = algoliasearch(appId, apiKey, {
        // Connection optimization: customRequester is our thin wrapper
        // around the default transport (timeout, keep-alive, retry cap)
        requester: customRequester({
          timeout: 5000,
          keepAlive: true,
          maxRetries: 2
        }),
      });
    }
  }, []);

  // Optimized search with caching and result processing. It returns the
  // results (instead of setting state) so the debounced hook below can
  // own loading and result state.
  const executeSearch = useCallback(async (query: string): Promise<AlgoliaResult[]> => {
    if (!algoliaClient.current || !query.trim()) {
      return [];
    }

    // Check request cache
    const cacheKey = `${query}:${JSON.stringify(searchOptions)}`;
    if (requestCache.current.has(cacheKey)) {
      console.log(`🚀 Request cache hit for: "${query}"`);
      try {
        return await requestCache.current.get(cacheKey)!;
      } catch (error) {
        // Cached request failed; drop it and fall through to a fresh request
        requestCache.current.delete(cacheKey);
      }
    }

    const indexName = searchOptions?.indexName || "mastra_docs";

    const searchRequest = {
      indexName: indexName,
      query: query,
      params: {
        hitsPerPage: 20, // Reasonable default
        attributesToRetrieve: [
          "title",
          "content",
          "url",
          "hierarchy",
        ],
        attributesToHighlight: [
          "title",
          "content",
        ],
        attributesToSnippet: [
          "content:15", // Short snippets for performance
        ],
        highlightPreTag: "<mark>",
        highlightPostTag: "</mark>",
        snippetEllipsisText: "",
        // Locale filtering for relevant results
        ...(searchOptions?.filters && { filters: searchOptions.filters }),
      },
    };

    // Cache the request promise
    const searchPromise = algoliaClient.current
      .search([searchRequest])
      .then(({ results }) => {
        const firstResult = results[0];
        if ("hits" in firstResult) {
          return processSearchResults(firstResult.hits, query);
        }
        return [];
      });

    requestCache.current.set(cacheKey, searchPromise);

    try {
      return await searchPromise;
    } catch (error) {
      console.error("Algolia search error:", error);
      requestCache.current.delete(cacheKey); // Remove failed request from cache
      return [];
    }
  }, [searchOptions]);

  // Use the advanced debounced search hook
  const {
    results: debouncedResults,
    isLoading,
    setSearch: setSearchQuery
  } = useAdvancedDebouncedSearch(
    executeSearch,
    {
      debounceMs: debounceTime,
      minQueryLength: 1,
      deduplicate: true,
      cacheSize: 50
    }
  );

  return {
    isSearchLoading: isLoading,
    results: debouncedResults,
    search,
    setSearch: (value: string) => {
      setSearch(value);
      setSearchQuery(value);
    },
  };
}
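
From a component's perspective, the hook reads like any other piece of state. A minimal usage sketch (the locale filter assumes a locale attribute exists on the index):

const { results, isSearchLoading, search, setSearch } = useAlgoliaSearch(300, {
  indexName: "mastra_docs",
  filters: 'locale:"en"',
});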

Smart Result Processing

Result processing can be a bottleneck. We optimized it:

const processSearchResults = (hits: AlgoliaHit[], query: string): AlgoliaResult[] => {
  const processedResults: AlgoliaResult[] = [];
  const queryWords = query.toLowerCase().split(/\s+/).filter(word => word.length > 2);

  for (const hit of hits) {
    // Fast relevance scoring
    const relevanceScore = calculateRelevanceScore(hit, queryWords);
    // Skip low-relevance results (only when we have scorable words;
    // otherwise short queries like "ai" would return nothing)
    if (queryWords.length > 0 && relevanceScore < 0.1) continue;

    // Smart snippet extraction
    const excerpt = extractOptimalSnippet(hit, queryWords, 180);

    // Hierarchical result processing
    const subResults = processHierarchicalResults(hit, queryWords);

    processedResults.push({
      objectID: hit.objectID,
      title: hit.title || "",
      excerpt,
      url: hit.url || "",
      relevanceScore,
      _highlightResult: hit._highlightResult,
      _snippetResult: hit._snippetResult,
      sub_results: subResults,
    });
  }

  // Sort by relevance and return top results
  return processedResults
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, 15); // Limit to top 15 for performance
};

const calculateRelevanceScore = (hit: AlgoliaHit, queryWords: string[]): number => {
  let score = 0;
  const title = (hit.title || "").toLowerCase();
  const content = (hit.content || "").toLowerCase();

  for (const word of queryWords) {
    // Title matches are weighted more heavily
    if (title.includes(word)) score += 0.5;
    if (content.includes(word)) score += 0.2;
  }

  // Bonus for exact phrase matches
  const fullQuery = queryWords.join(" ");
  if (title.includes(fullQuery)) score += 1.0;
  if (content.includes(fullQuery)) score += 0.5;

  return Math.min(score, 1.0);
};

const extractOptimalSnippet = (
  hit: AlgoliaHit,
  queryWords: string[],
  maxLength: number
): string => {
  // Use Algolia's snippet if available
  if (hit._snippetResult?.content?.value) {
    return hit._snippetResult.content.value;
  }

  // Extract context-aware snippet
  const content = hit.content || "";
  if (!content || queryWords.length === 0) {
    return content.length > maxLength
      ? content.substring(0, maxLength) + "..."
      : content;
  }

  // Find best match position
  let bestPosition = 0;
  let bestScore = 0;

  for (const word of queryWords) {
    const position = content.toLowerCase().indexOf(word.toLowerCase());
    if (position !== -1) {
      const score = 1 / (position + 1); // Earlier matches score higher
      if (score > bestScore) {
        bestScore = score;
        bestPosition = position;
      }
    }
  }

  // Extract snippet around best match
  const start = Math.max(0, bestPosition - 60);
  const end = Math.min(content.length, start + maxLength);

  let snippet = content.substring(start, end);
  if (start > 0) snippet = "..." + snippet;
  if (end < content.length) snippet = snippet + "...";

  return snippet;
};
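
To make the title/content weighting concrete, here's a small worked example against a fabricated hit:

// Query "workflow agent" → queryWords = ["workflow", "agent"]
const score = calculateRelevanceScore(
  {
    title: "Building Agent Workflows",
    content: "Agents compose into workflows...",
  } as AlgoliaHit,
  ["workflow", "agent"]
);
// Title matches:   "agent" (+0.5) and "workflow" (+0.5, inside "workflows")
// Content matches: "agent" (+0.2) and "workflow" (+0.2)
// No exact-phrase bonus → raw score 1.4, clamped to 1.0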

Solution 4: Mobile-First Performance

Mobile users have different constraints. We optimized specifically for them:

Responsive Search UI

const SearchComponent = () => {
  const [isSearchVisible, setIsSearchVisible] = useState(false);
  const isMobile = useMediaQuery('(max-width: 768px)');

  const {
    results,
    isSearchLoading,
    search,
    setSearch
  } = useAlgoliaSearch(250); // 250ms debounce: a bit snappier than the 300ms default

  // Progressive loading for mobile
  const visibleResults = useMemo(() => {
    if (isMobile) {
      return results.slice(0, 10); // Limit results on mobile
    }
    return results;
  }, [results, isMobile]);

  return (
    <div className="search-container">
      {isMobile ? (
        <MobileSearchModal
          isVisible={isSearchVisible}
          onClose={() => setIsSearchVisible(false)}
          results={visibleResults}
          isLoading={isSearchLoading}
          search={search}
          onSearchChange={setSearch}
        />
      ) : (
        <DesktopSearchInterface
          results={visibleResults}
          isLoading={isSearchLoading}
          search={search}
          onSearchChange={setSearch}
        />
      )}

      {isMobile && (
        <button
          className="search-trigger"
          onClick={() => setIsSearchVisible(true)}
        >
          🔍 Search
        </button>
      )}
    </div>
  );
};

const MobileSearchModal = ({
  isVisible,
  onClose,
  results,
  isLoading,
  search,
  onSearchChange
}) => {
  const modalRef = useRef<HTMLDivElement>(null);

  // Prevent body scroll when modal is open
  useEffect(() => {
    if (isVisible) {
      document.body.style.overflow = 'hidden';
      return () => {
        document.body.style.overflow = 'unset';
      };
    }
  }, [isVisible]);

  if (!isVisible) return null;

  return (
    <div className="search-modal-overlay" onClick={onClose}>
      <div
        ref={modalRef}
        className="search-modal"
        onClick={(e) => e.stopPropagation()}
      >
        <div className="search-header">
          <input
            type="text"
            value={search}
            onChange={(e) => onSearchChange(e.target.value)}
            placeholder="Search documentation..."
            autoFocus
            className="search-input-mobile"
          />
          <button onClick={onClose} className="close-button">
            ✕
          </button>
        </div>

        <div className="search-results-mobile">
          {isLoading && <SearchLoadingSpinner />}
          <VirtualScrollList
            items={results}
            height={window.innerHeight - 120}
            itemHeight={80}
            renderItem={(result, index) => (
              <MobileSearchResult
                key={result.objectID}
                result={result}
                onClick={onClose}
              />
            )}
          />
        </div>
      </div>
    </div>
  );
};

Mobile-Optimized Result Processing

const MobileSearchResult = ({ result, onClick }) => {
  // Truncate content for mobile display
  const truncatedExcerpt = useMemo(() => {
    return truncateForMobile(result.excerpt, 100);
  }, [result.excerpt]);

  return (
    <Link href={result.url} onClick={onClick} className="search-result-mobile">
      <div className="result-title">{result.title}</div>
      <div className="result-excerpt">{truncatedExcerpt}</div>
      <div className="result-url">{formatUrlForDisplay(result.url)}</div>
    </Link>
  );
};

const truncateForMobile = (text: string, maxLength: number): string => {
  if (text.length <= maxLength) return text;

  // Find last complete word within limit
  const truncated = text.substring(0, maxLength);
  const lastSpace = truncated.lastIndexOf(' ');

  if (lastSpace > maxLength * 0.8) {
    return truncated.substring(0, lastSpace) + '...';
  }

  return truncated + '...';
};
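
One assumption in the component above: useMediaQuery isn't shown. If you're not pulling one from a library, a minimal implementation looks like this:

function useMediaQuery(query: string): boolean {
  const [matches, setMatches] = useState(
    () => typeof window !== 'undefined' && window.matchMedia(query).matches
  );

  useEffect(() => {
    const mql = window.matchMedia(query);
    const onChange = (event: MediaQueryListEvent) => setMatches(event.matches);
    mql.addEventListener('change', onChange);
    setMatches(mql.matches); // re-sync in case the viewport changed before the listener attached
    return () => mql.removeEventListener('change', onChange);
  }, [query]);

  return matches;
}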

Solution 5: Advanced Caching Strategy

Smart caching dramatically improves perceived performance:

interface CachedResult {
  results: any[];
  timestamp: number;
  lastAccessed: number;
}

class SearchCacheManager {
  private queryCache = new Map<string, CachedResult>();
  private prefetchCache = new Map<string, Promise<any>>();
  private cacheStats = { hits: 0, misses: 0, prefetches: 0 };

  constructor(
    private maxCacheSize: number = 200,
    private cacheExpiryMs: number = 5 * 60 * 1000 // 5 minutes
  ) {}

  async get(query: string): Promise<any[] | null> {
    const cached = this.queryCache.get(query);

    if (cached && !this.isExpired(cached)) {
      this.cacheStats.hits++;
      console.log(`💾 Cache hit for "${query}" (${this.cacheStats.hits} hits)`);

      // Update access time for LRU
      cached.lastAccessed = Date.now();
      return cached.results;
    }

    this.cacheStats.misses++;
    return null;
  }

  set(query: string, results: any[]): void {
    // Evict expired entries first
    this.evictExpired();

    // Evict oldest entries if cache is full
    if (this.queryCache.size >= this.maxCacheSize) {
      this.evictLRU();
    }

    this.queryCache.set(query, {
      results,
      timestamp: Date.now(),
      lastAccessed: Date.now()
    });
  }

  // Prefetch likely next searches
  async prefetch(baseQuery: string, variations: string[]): Promise<void> {
    for (const variation of variations) {
      if (this.queryCache.has(variation) || this.prefetchCache.has(variation)) {
        continue; // Already cached or being prefetched
      }

      this.cacheStats.prefetches++;
      console.log(`🔮 Prefetching results for "${variation}"`);

      // Start prefetch but don't wait
      const prefetchPromise = this.performSearch(variation);
      this.prefetchCache.set(variation, prefetchPromise);

      // Handle prefetch completion
      prefetchPromise
        .then(results => {
          this.set(variation, results);
          this.prefetchCache.delete(variation);
        })
        .catch(error => {
          console.warn(`Prefetch failed for "${variation}":`, error);
          this.prefetchCache.delete(variation);
        });
    }
  }

  // Generate search variations for prefetching
  generateSearchVariations(query: string): string[] {
    const variations: string[] = [];

    // Common typos and variations
    if (query.length > 3) {
      variations.push(query + 's'); // plurals
      variations.push(query.slice(0, -1)); // singulars
    }

    // Partial matches (for autocomplete)
    if (query.length > 2) {
      for (let i = query.length - 1; i >= 2; i--) {
        variations.push(query.substring(0, i));
      }
    }

    // Related terms (would be enhanced with ML in production)
    const relatedTerms = this.getRelatedTerms(query);
    variations.push(...relatedTerms);

    return variations.slice(0, 5); // Limit prefetch count
  }

  private async performSearch(query: string): Promise<any[]> {
    // This would call your actual search implementation
    // Placeholder for the real search logic
    return [];
  }

  private isExpired(cached: CachedResult): boolean {
    return Date.now() - cached.timestamp > this.cacheExpiryMs;
  }

  private evictExpired(): void {
    const now = Date.now();
    for (const [query, cached] of this.queryCache.entries()) {
      if (now - cached.timestamp > this.cacheExpiryMs) {
        this.queryCache.delete(query);
      }
    }
  }

  private evictLRU(): void {
    let oldestQuery = '';
    let oldestTime = Date.now();

    for (const [query, cached] of this.queryCache.entries()) {
      if (cached.lastAccessed < oldestTime) {
        oldestTime = cached.lastAccessed;
        oldestQuery = query;
      }
    }

    if (oldestQuery) {
      this.queryCache.delete(oldestQuery);
    }
  }

  private getRelatedTerms(query: string): string[] {
    // Simple related terms - would be enhanced with ML/NLP
    const termMap: Record<string, string[]> = {
      'workflow': ['workflows', 'pipeline', 'automation'],
      'agent': ['agents', 'AI', 'bot'],
      'api': ['endpoint', 'rest', 'graphql'],
      // ... more mappings
    };

    const queryLower = query.toLowerCase();
    return termMap[queryLower] || [];
  }

  getStats() {
    const hitRate = this.cacheStats.hits / (this.cacheStats.hits + this.cacheStats.misses);
    return {
      ...this.cacheStats,
      hitRate: isNaN(hitRate) ? 0 : hitRate,
      cacheSize: this.queryCache.size
    };
  }
}
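
Wiring the cache manager in front of the search call is straightforward. A sketch, where runSearch stands in for the real search function (and prefetch relies on the performSearch placeholder above):

const searchCache = new SearchCacheManager(200, 5 * 60 * 1000);

async function cachedSearch(query: string): Promise<any[]> {
  const cached = await searchCache.get(query);
  if (cached) return cached;

  const results = await runSearch(query);
  searchCache.set(query, results);

  // Speculatively warm the cache for likely follow-up queries
  void searchCache.prefetch(query, searchCache.generateSearchVariations(query));

  return results;
}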

Performance Monitoring and Analytics

To maintain performance, we track detailed metrics:

interface SearchPerformanceMetrics {
  queryLatency: number;
  renderLatency: number;
  cacheHitRate: number;
  userEngagement: {
    clickThroughRate: number;
    sessionDuration: number;
    queryRefinements: number;
  };
  technicalMetrics: {
    memoryUsage: number;
    domNodes: number;
    networkRequests: number;
  };
  timestamp?: number; // stamped when the metric is recorded
}

class SearchPerformanceMonitor {
  private metrics: SearchPerformanceMetrics[] = [];

  startSearchTransaction(query: string): SearchTransaction {
    return new SearchTransaction(query, this);
  }

  recordMetrics(metrics: SearchPerformanceMetrics): void {
    this.metrics.push({
      ...metrics,
      timestamp: Date.now()
    });

    // Keep only recent metrics
    if (this.metrics.length > 100) {
      this.metrics = this.metrics.slice(-50);
    }

    // Alert on performance degradation
    this.checkPerformanceThresholds(metrics);
  }

  private checkPerformanceThresholds(metrics: SearchPerformanceMetrics): void {
    if (metrics.queryLatency > 1000) {
      console.warn(`⚠️ Slow search query: ${metrics.queryLatency}ms`);
    }

    if (metrics.renderLatency > 100) {
      console.warn(`⚠️ Slow render: ${metrics.renderLatency}ms`);
    }

    if (metrics.cacheHitRate < 0.3) {
      console.warn(`⚠️ Low cache hit rate: ${(metrics.cacheHitRate * 100).toFixed(1)}%`);
    }
  }

  getPerformanceReport(): PerformanceReport {
    if (this.metrics.length === 0) return this.getEmptyReport();

    const recent = this.metrics.slice(-10);
    const avgQueryLatency = average(recent.map(m => m.queryLatency));
    const avgRenderLatency = average(recent.map(m => m.renderLatency));
    const avgCacheHitRate = average(recent.map(m => m.cacheHitRate));

    return {
      averageQueryLatency: avgQueryLatency,
      averageRenderLatency: avgRenderLatency,
      cacheHitRate: avgCacheHitRate,
      p95QueryLatency: percentile(recent.map(m => m.queryLatency), 95),
      trends: this.calculateTrends(recent),
      recommendations: this.generateRecommendations(recent)
    };
  }
}
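
// average() and percentile() above are small numeric helpers; the
// PerformanceReport shape and the private getEmptyReport, calculateTrends,
// and generateRecommendations methods are omitted for brevity. One minimal
// way to define the helpers:
const average = (values: number[]): number =>
  values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;

const percentile = (values: number[], p: number): number => {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1; // nearest-rank method
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank))];
};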

class SearchTransaction {
  private startTime: number;
  private renderStartTime?: number;

  constructor(
    private query: string,
    private monitor: SearchPerformanceMonitor
  ) {
    this.startTime = performance.now();
  }

  markRenderStart(): void {
    this.renderStartTime = performance.now();
  }

  complete(results: any[], clickedResult?: any): void {
    const endTime = performance.now();
    const renderLatency = this.renderStartTime ? endTime - this.renderStartTime : 0;

    const metrics: SearchPerformanceMetrics = {
      queryLatency: endTime - this.startTime,
      renderLatency,
      cacheHitRate: 0, // Would be populated by cache manager
      userEngagement: {
        clickThroughRate: clickedResult ? 1 : 0,
        sessionDuration: endTime - this.startTime, // simplification: treats one query as the session
        queryRefinements: 0 // Would track query modifications
      },
      technicalMetrics: {
        memoryUsage: (performance as any).memory?.usedJSHeapSize || 0,
        domNodes: document.querySelectorAll('*').length,
        networkRequests: (performance.getEntriesByType('resource') as any[]).length
      }
    };

    this.monitor.recordMetrics(metrics);
  }
}
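
Instrumenting a search path with the monitor takes a few lines. A sketch, where runSearch and renderResults are stand-ins for the real calls:

const monitor = new SearchPerformanceMonitor();

async function instrumentedSearch(query: string) {
  const txn = monitor.startSearchTransaction(query);
  const results = await runSearch(query);

  txn.markRenderStart();
  renderResults(results);

  txn.complete(results);
  return results;
}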

The Results

Our performance optimizations transformed the search experience:

Response Time Improvements

  • Before: 800-1200ms average query time
  • After: 150-300ms average query time
  • P95: Reduced from 2000ms to 500ms
  • Cache hit rate: 78% for repeat queries

User Experience Metrics

  • Bounce rate: Reduced from 35% to 12%
  • Search usage: Increased 240%
  • User satisfaction: 4.2/5 to 4.8/5 rating
  • Mobile usage: Increased 180%

Technical Performance

  • Memory usage: Constant regardless of result count
  • DOM nodes: Reduced from 1000+ to <200 for large result sets
  • Network requests: Reduced by 60% through deduplication and caching
  • Bundle size: Optimized search code reduced by 40%

Operational Benefits

  • Server load: 45% reduction in search API calls
  • CDN costs: 30% reduction through better caching
  • Mobile data usage: 25% reduction per search session
  • Error rate: 90% reduction in timeout errors

Lessons Learned

Building high-performance AI search taught us several key principles:

1. Perceived Performance Matters More Than Absolute Performance

Users prefer instant feedback with progressive results over waiting for perfect results.

2. Caching Strategy Is Critical

Smart caching with prefetching can eliminate 70%+ of actual search requests.

3. Mobile Requires Different Thinking

Mobile users need optimized UI patterns, not just responsive desktop interfaces.

4. Monitoring Drives Optimization

You can't optimize what you don't measure. Detailed metrics reveal optimization opportunities.

5. Network Layer Optimization Is Often The Biggest Win

Request deduplication, cancellation, and intelligent batching often provide more benefit than algorithmic improvements.

The intersection of AI intelligence and search performance isn't a tradeoff—with the right architecture, you can have both. Our users now enjoy sub-300ms search with AI-powered relevance, proving that great AI experiences don't have to sacrifice performance.

Performance engineering for AI search requires thinking about the entire stack: from network optimization to UI patterns to caching strategies. But the investment pays off in user experience that feels truly intelligent and responsive.