Devlog #7: Building a Citation System for the AI Age - Balancing Generative AI with Academic Integrity

How I built an interactive citation system that maintains academic rigor while optimizing content for AI discovery. From inline tooltips to validation scripts, here's the complete development journey.

Writing about AI content optimization while maintaining academic credibility presented a fascinating challenge: how do you cite sources in a way that both serves human readers and signals trustworthiness to AI systems? Here’s how I built a citation system that bridges academic rigor with modern web UX.

The Credibility Paradox

As I developed the GEO series, I faced an uncomfortable irony. I was writing about how to optimize content for AI discovery, emphasizing the importance of citations and references as trust signals that boost AI visibility by 35% [?]. Yet my own platform lacked a proper citation system.

The problem became acute when I realized how easy it would be to fabricate impressive-sounding statistics. Want to claim that “87% of universities see improved enrollment through GEO strategies”? Just wrap it in academic-looking language and most readers would accept it at face value.

Warning

The AI Amplification Effect: Fake citations don’t just mislead human readers – they can be ingested by AI training data, spreading misinformation at scale. This creates a responsibility to model proper citation practices.

But traditional citation formats weren’t designed for interactive web content. Academic papers use numbered references like “[1]” that point to a bibliography, but web readers expect immediate context through tooltips or inline previews.

I needed a system that would:

  • Maintain academic rigor with proper attribution and verification
  • Enhance user experience with interactive, contextual information
  • Signal credibility to AI through structured, machine-readable citations
  • Prevent citation fraud through validation and health monitoring

Solution Architecture: Three-Layer Citation System

Rather than bolt on a simple citation plugin, I designed a comprehensive system with three complementary components:

1. CitedText Component - Interactive Inline Citations

The CitedText.astro component wraps cited content with structured metadata and provides rich hover interactions:

CitedText Usage Example
<CitedText 
  type="statistic"
  author="Aggarwal, P., et al."
  year="2023"
  title="GEO: Generative Engine Optimization"
  url="https://arxiv.org/abs/2311.09735"
>
  Research shows 40% improved visibility with proper GEO implementation
</CitedText>

This creates properly attributed content with interactive tooltips [?] that display full citation information on hover. Clicking the citation number opens the source in a new tab, ensuring readers can verify claims immediately.

The component supports different citation types (statistic, quote, fact, projection) with visual styling that helps readers understand the nature of the claim:

  • Statistics get a blue-purple gradient highlighting their quantitative nature
  • Quotes are styled in italic with a green-blue gradient
  • Facts use an orange-red gradient to denote established information
  • Projections maintain the base styling for future-looking claims

2. CitationList Component - Automatic Bibliography Generation

The CitationList.astro component automatically scans the page for CitedText elements and generates a properly formatted reference list:

CitationList.astro - Auto-generation Logic
// Auto-populate citation list from CitedText elements
document.addEventListener('DOMContentLoaded', function () {
  const citedElements = document.querySelectorAll('.cited-text');
  const citations = new Map();

  citedElements.forEach(element => {
    const citationNumber = element.getAttribute('data-citation-number');
    const author = element.getAttribute('data-author');
    const year = element.getAttribute('data-year');
    const title = element.getAttribute('data-title');
    const url = element.getAttribute('data-url');

    // Keep only the first occurrence of each citation number
    if (citationNumber && !citations.has(citationNumber)) {
      citations.set(citationNumber, {
        number: Number(citationNumber), // stored so the list can be sorted numerically
        author, year, title, url
      });
    }
  });

  // Generate formatted citation list, ordered by citation number
  const sortedCitations = Array.from(citations.values())
    .sort((a, b) => a.number - b.number);

  // Render with proper academic formatting
});

This eliminates the tedious manual work of maintaining bibliography sections while ensuring consistency between inline citations and the reference list.

3. Validation Script - Citation Health Monitoring

The validate-article.cjs script provides comprehensive citation quality assessment:

Citation Validation Output
🚀 Article Citation Validator Started
🔍 Validating article: 1-what-is-geo-and-why-higher-ed-needs-it-now.mdx

📊 Found 16 citations

✅ Citations with URLs: 14
❌ Citations without URLs: 2

🌐 VALIDATING URLs:
==================================================
✅ [1] 200 - GEO: Generative Engine Optimization
   https://arxiv.org/abs/2311.09735
✅ [2] 200 - The Economic Potential of Generative AI
   https://www.mckinsey.com/industries/education/...
❌ [8] ERROR - Industry Growth Projections
   https://broken-url.example.com/404

📊 Citation Health Score: 87%
🌟 Excellent! Citations are well-sourced.

The script performs several crucial checks:

  • URL Accessibility: Tests each citation URL to ensure sources are reachable
  • Citation Completeness: Identifies citations missing URLs (potential fakes)
  • Health Scoring: Generates an overall citation health percentage
  • Broken Link Detection: Flags inaccessible sources that need updating
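The completeness check can be sketched roughly as follows. This is a minimal illustration, not the actual validate-article.cjs code: the function names `extractCitations` and `checkCompleteness` are hypothetical, and it assumes citations appear in the MDX source as `<CitedText ...>` tags with double-quoted attributes, as in the usage example earlier.

```javascript
// Hypothetical helper: pull CitedText attributes out of raw MDX source.
// The real validate-article.cjs may parse articles differently.
function extractCitations(mdxSource) {
  const citations = [];
  // Match each opening <CitedText ...> tag, including multi-line ones.
  // Assumes attribute values never contain a literal ">".
  const tagPattern = /<CitedText\b([^>]*)>/g;
  // Match attribute="value" pairs inside a tag.
  const attrPattern = /([\w-]+)\s*=\s*"([^"]*)"/g;

  for (const tagMatch of mdxSource.matchAll(tagPattern)) {
    const attrs = {};
    for (const attrMatch of tagMatch[1].matchAll(attrPattern)) {
      attrs[attrMatch[1]] = attrMatch[2];
    }
    citations.push(attrs);
  }
  return citations;
}

// Split citations into those with and without a non-empty URL.
function checkCompleteness(citations) {
  const hasUrl = c => typeof c.url === 'string' && c.url.trim() !== '';
  return {
    withUrl: citations.filter(hasUrl),
    withoutUrl: citations.filter(c => !hasUrl(c)),
  };
}
```

Running `checkCompleteness(extractCitations(source))` on an article's text would give the two groups reported as "Citations with URLs" and "Citations without URLs" in the output above.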

Citation System Impact

Figure: Before-and-after comparison of citation health scores across sample AI-generated articles.

I derived these metrics by having AI generate web content with citations, then running that content through the validation script. When shown the results, the AI corrected its mistakes, which dramatically improved citation health scores. This experimental approach differs from my typical writing process, where I use generative AI as an assistant to help source statistics or recall paper citations, occasionally asking it to complete attribution details for citations I’ve lazily marked.

Technical Implementation Challenges

Tooltip Positioning and Performance

The biggest UX challenge was creating responsive, accessible tooltips that work across devices. Initial attempts using CSS-only solutions failed on mobile devices and created accessibility issues.

The solution involved a global tooltip system that:

  • Uses fixed positioning to avoid layout conflicts
  • Calculates viewport boundaries to prevent off-screen rendering
  • Implements touch-friendly delays for mobile users
  • Provides keyboard navigation support

Responsive Tooltip Positioning
// Global tooltip positioning logic
const rect = element.getBoundingClientRect();
const tooltipWidth = 320;

// Calculate horizontal position
let left = rect.left + rect.width / 2;

// Adjust if tooltip would go off-screen
if (left + tooltipWidth / 2 > window.innerWidth - 20) {
  left = window.innerWidth - tooltipWidth / 2 - 20;
}
if (left - tooltipWidth / 2 < 20) {
  left = tooltipWidth / 2 + 20;
}

globalTooltip.style.left = left + 'px';

Citation Deduplication and State Management

Managing citation state across multiple components required careful architecture. The same citation might appear multiple times in an article, but should only appear once in the reference list.

I solved this using a Map-based deduplication system that:

  • Collects all citations on page load
  • Deduplicates by citation number
  • Maintains consistent numbering across inline and list displays
  • Updates dynamically if citations are added via JavaScript
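One way to support both the initial page-load pass and later dynamic additions is a small registry that accepts citations one at a time. The sketch below is an assumption about how this could look, not the component's actual code; `CitationRegistry` and its methods are hypothetical names.

```javascript
// Hypothetical registry: deduplicates citations by number and keeps them sortable.
class CitationRegistry {
  constructor() {
    this.citations = new Map();
  }

  // Adds a citation; returns true if it was new, false if it was a
  // duplicate or had an unusable number.
  add({ number, author, year, title, url }) {
    const key = Number(number);
    if (!Number.isInteger(key) || key < 1 || this.citations.has(key)) {
      return false;
    }
    this.citations.set(key, { number: key, author, year, title, url });
    return true;
  }

  // Returns citations in ascending numeric order for the reference list.
  sorted() {
    return [...this.citations.values()].sort((a, b) => a.number - b.number);
  }
}
```

On page load, each `.cited-text` element's data attributes would be passed to `add()`. For citations inserted later, a `MutationObserver` callback could call `add()` for each new element and re-render the list only when it returns `true`.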

URL Validation Without False Positives

The validation script needed to distinguish genuinely broken links from temporary network issues or rate limiting.

The solution implements:

  • Timeout handling to avoid hanging on slow responses
  • Status code interpretation distinguishing between 404s and 503s
  • HEAD requests to minimize bandwidth while checking accessibility
  • Error categorization to help authors understand what needs fixing
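The checks above might be combined along these lines. This is a sketch under stated assumptions rather than the script's real implementation: the category names, the 10-second default timeout, and the function names are all hypothetical, and it relies on the global `fetch` and `AbortSignal.timeout` available in Node 18 and later.

```javascript
// Hypothetical categories; the real script's labels are not shown in this post.
function categorizeStatus(status) {
  if (status >= 200 && status < 400) return 'ok';
  if (status === 404 || status === 410) return 'broken';      // source is gone
  if (status === 401 || status === 403) return 'restricted';  // may need institutional access
  if (status === 429 || status >= 500) return 'temporary';    // rate limiting or server trouble
  return 'unknown';
}

// Check one URL with a HEAD request and a timeout.
async function checkUrl(url, timeoutMs = 10000) {
  try {
    const response = await fetch(url, {
      method: 'HEAD',                         // headers only, no body download
      redirect: 'follow',
      signal: AbortSignal.timeout(timeoutMs), // abort slow responses
    });
    return { url, status: response.status, category: categorizeStatus(response.status) };
  } catch (error) {
    // Timeouts and DNS/connection failures end up here.
    const timedOut = error.name === 'TimeoutError' || error.name === 'AbortError';
    return { url, status: null, category: timedOut ? 'timeout' : 'network-error' };
  }
}
```

Treating 429 and 5xx responses as `temporary` rather than `broken` is what keeps rate limiting from producing false positives. Some servers reject HEAD requests outright, so a fallback to GET for those cases may be worth considering.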

Insight

Validation Philosophy: The script flags potential issues but doesn’t automatically dismiss citations. Some historical sources may be archived or require institutional access, so human judgment remains essential.

Real-World Usage and Iteration

Deploying the citation system across the GEO series revealed several practical considerations:

Citation Type Evolution

Initially, I only supported generic citations. But as I wrote about different types of claims, clear categories emerged:

  • Statistics needed special highlighting due to their quantitative nature
  • Quotes required different formatting to show direct attribution
  • Projections needed visual distinction from established facts

Academic Format vs. Web Readability

Traditional academic citations are optimized for print and formal review. Web citations need to balance completeness with scannability.

My format prioritizes:

  1. Author and year for immediate credibility assessment
  2. Title for topical relevance
  3. Source for institutional authority
  4. URL for immediate access

Adaptive Citation Formatting
// Citation formatting hierarchy
let citationText = '';

// Lead with author and year, falling back to author alone, then source
if (citation.author && citation.year) {
  citationText += `${citation.author} (${citation.year}). `;
} else if (citation.author) {
  citationText += `${citation.author}. `;
} else if (citation.source) {
  citationText += `${citation.source}. `;
}

if (citation.title) {
  citationText += `<em>${citation.title}</em>. `;
}
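The hierarchy above can be extended into a single function covering all four elements in the priority list. The following is a sketch, not the component's actual code: `formatCitation` and `escapeHtml` are hypothetical names, and both the HTML escaping and the placement of the source after the title are my assumptions.

```javascript
// Escape text before placing it into HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build one reference-list entry as an HTML string.
function formatCitation(citation) {
  let text = '';

  // 1. Author and year, falling back to author alone, then source.
  if (citation.author && citation.year) {
    text += `${escapeHtml(citation.author)} (${escapeHtml(citation.year)}). `;
  } else if (citation.author) {
    text += `${escapeHtml(citation.author)}. `;
  } else if (citation.source) {
    text += `${escapeHtml(citation.source)}. `;
  }

  // 2. Title.
  if (citation.title) {
    text += `<em>${escapeHtml(citation.title)}</em>. `;
  }

  // 3. Source (assumption: shown here only if it was not already the lead).
  if (citation.source && citation.author) {
    text += `${escapeHtml(citation.source)}. `;
  }

  // 4. URL as a link that opens in a new tab.
  if (citation.url) {
    const safeUrl = escapeHtml(citation.url);
    text += `<a href="${safeUrl}" target="_blank" rel="noopener noreferrer">${safeUrl}</a>`;
  }

  return text.trim();
}
```

Escaping matters here because citation titles and author names come from article markup and end up inside generated HTML.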

Accessibility Improvements

Screen reader testing revealed several areas for enhancement:

  • Citation markers needed proper ARIA labels
  • Tooltip content required semantic structure
  • Keyboard navigation needed explicit support
  • Color-based type distinctions needed non-visual alternatives
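As an illustration of the first and last points, a citation marker could carry its type in an ARIA label rather than relying on color alone. This is a hedged sketch: `buildCitationMarker`, the `citation-marker` class, and the `#citation-N` anchor IDs are hypothetical, and the markup the real component emits may differ.

```javascript
// Hypothetical spoken labels for each citation type (non-visual alternative to color).
const TYPE_LABELS = {
  statistic: 'Statistic',
  quote: 'Quote',
  fact: 'Fact',
  projection: 'Projection',
};

// Build the HTML for one inline citation marker.
// Assumes tooltipId is a trusted, developer-supplied element ID.
function buildCitationMarker(number, type, tooltipId) {
  const n = Number(number);
  const typeLabel = TYPE_LABELS[type];
  const label = typeLabel ? `${typeLabel}, citation ${n}` : `Citation ${n}`;
  // A real link is focusable by keyboard without extra tabindex handling.
  return (
    `<a class="citation-marker" href="#citation-${n}" role="doc-noteref" ` +
    `aria-label="${label}" aria-describedby="${tooltipId}">[${n}]</a>`
  );
}
```

Using an anchor element gives keyboard focus for free, and `aria-describedby` points assistive technology at the tooltip content.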

Performance and Development Workflow

The citation system integrates seamlessly with my writing workflow:

  1. Writing: Use CitedText components inline while drafting
  2. Validation: Run node scripts/validate-article.cjs path/to/article.mdx
  3. Review: Check citation health score and fix broken links
  4. Deploy: Citation list auto-generates on build

The validation script has become an essential part of my quality assurance process, catching broken links and missing citations before publication.

Success

Workflow Integration: The citation system adds minimal overhead to the writing process while dramatically improving content credibility and reader experience.

Future Enhancements

Several improvements are on the roadmap:

Advanced Analytics

  • Track which citations get the most engagement
  • Monitor broken link frequency over time
  • Analyze citation type effectiveness

Integration with Reference Managers

  • Import from Zotero/Mendeley APIs
  • Export citations to standard formats (BibTeX, APA, etc.)
  • Bulk citation validation across all articles

AI-Assisted Citation Checking

  • Automatically verify citation accuracy against source content
  • Suggest related sources for incomplete citations
  • Flag potential predatory or unreliable sources

Lessons Learned: Beyond Technical Implementation

Building this citation system taught me that academic rigor and modern web UX aren’t opposing forces – they’re complementary when implemented thoughtfully.

For Content Creators:

  • Citations enhance rather than burden the reading experience when properly designed
  • Validation tools catch issues humans miss during editing
  • Structured citation data benefits both human readers and AI systems

For Developers:

  • Academic publishing has UX problems that web technologies can solve
  • Accessibility considerations are crucial for scholarly content
  • Performance matters even for “serious” academic features

For the AI Age:

  • Proper citations become competitive advantages as AI systems prioritize credible sources
  • Citation integrity affects not just individual articles but entire content ecosystems
  • The tools we build shape the information landscape AI learns from

The citation system has transformed how I approach content creation, making rigorous attribution feel natural rather than burdensome. Most importantly, it demonstrates that we can optimize content for AI discovery while maintaining – and even enhancing – academic integrity.

As AI systems become primary information intermediaries, the stakes for citation quality continue to rise. The tools and practices we establish now will shape how credible information flows through the AI-mediated web.

JELL

Innovator, Educator & Technologist

JELL is an innovator, educator, and technologist exploring the confluence of AI, higher education, and ethical technology. Through Signals & Systems, JELL shares insights, experiments, and reflections on building meaningful digital experiences, and other random things.