Devlog #7: Building a Citation System for the AI Age - Balancing Generative AI with Academic Integrity
How I built an interactive citation system that maintains academic rigor while optimizing content for AI discovery. From inline tooltips to validation scripts, here's the complete development journey.
Writing about AI content optimization while maintaining academic credibility presented a fascinating challenge: how do you cite sources in a way that both serves human readers and signals trustworthiness to AI systems? Here’s how I built a citation system that bridges academic rigor with modern web UX.
The Credibility Paradox
As I developed the GEO series, I faced an uncomfortable irony. I was writing about how to optimize content for AI discovery, emphasizing the importance of citations and references as trust signals that boost AI visibility by 35% [?] . Yet my own platform lacked a proper citation system.
The problem became acute when I realized how easy it would be to fabricate impressive-sounding statistics. Want to claim that “87% of universities see improved enrollment through GEO strategies”? Just wrap it in academic-looking language and most readers would accept it at face value.
Warning
The AI Amplification Effect: Fake citations don’t just mislead human readers – they can be ingested by AI training data, spreading misinformation at scale. This creates a responsibility to model proper citation practices.
But traditional citation formats weren’t designed for interactive web content. Academic papers use numbered references like “[1]” that point to a bibliography, but web readers expect immediate context through tooltips or inline previews.
I needed a system that would:
- Maintain academic rigor with proper attribution and verification
- Enhance user experience with interactive, contextual information
- Signal credibility to AI through structured, machine-readable citations
- Prevent citation fraud through validation and health monitoring
Solution Architecture: Three-Layer Citation System
Rather than bolt on a simple citation plugin, I designed a comprehensive system with three complementary components:
1. CitedText Component - Interactive Inline Citations
The CitedText.astro
component wraps cited content with structured metadata and provides rich hover interactions:
<CitedText
type="statistic"
author="Aggarwal, P., et al."
year="2023"
title="GEO: Generative Engine Optimization"
url="https://arxiv.org/abs/2311.09735"
>
Research shows 40% improved visibility with proper GEO implementation
</CitedText>
This creates properly attributed content with interactive tooltips [?] that display full citation information on hover. Clicking the citation number opens the source in a new tab, ensuring readers can verify claims immediately.
The component supports different citation types (statistic
, quote
, fact
, projection
) with visual styling that helps readers understand the nature of the claim:
- Statistics get a blue-purple gradient highlighting their quantitative nature
- Quotes are styled in italic with a green-blue gradient
- Facts use an orange-red gradient to denote established information
- Projections maintain the base styling for future-looking claims
2. CitationList Component - Automatic Bibliography Generation
The CitationList.astro
component automatically scans the page for CitedText
elements and generates a properly formatted reference list:
// Auto-populate citation list from CitedText elements
document.addEventListener('DOMContentLoaded', function() {
const citedElements = document.querySelectorAll('.cited-text');
const citations = new Map();
citedElements.forEach(element => {
const citationNumber = element.getAttribute('data-citation-number');
const author = element.getAttribute('data-author');
const year = element.getAttribute('data-year');
const title = element.getAttribute('data-title');
const url = element.getAttribute('data-url');
if (citationNumber && !citations.has(citationNumber)) {
citations.set(citationNumber, { author, year, title, url });
}
});
// Generate formatted citation list
const sortedCitations = Array.from(citations.values())
.sort((a, b) => a.number - b.number);
// Render with proper academic formatting
});
This eliminates the tedious manual work of maintaining bibliography sections while ensuring consistency between inline citations and the reference list.
3. Validation Script - Citation Health Monitoring
The validate-article.cjs
script provides comprehensive citation quality assessment:
🚀 Article Citation Validator Started
🔍 Validating article: 1-what-is-geo-and-why-higher-ed-needs-it-now.mdx
📊 Found 16 citations
✅ Citations with URLs: 14
❌ Citations without URLs: 2
🌐 VALIDATING URLs:
==================================================
✅ [1] 200 - GEO: Generative Engine Optimization
https://arxiv.org/abs/2311.09735
✅ [2] 200 - The Economic Potential of Generative AI
https://www.mckinsey.com/industries/education/...
❌ [8] ERROR - Industry Growth Projections
https://broken-url.example.com/404
📊 Citation Health Score: 87%
🌟 Excellent! Citations are well-sourced.
The script performs several crucial checks:
- URL Accessibility: Tests each citation URL to ensure sources are reachable
- Citation Completeness: Identifies citations missing URLs (potential fakes)
- Health Scoring: Generates an overall citation health percentage
- Broken Link Detection: Flags inaccessible sources that need updating
Citation System Impact
Before and after comparison of citation health scores across sample AI-generated articles.
I derived these metrics by having AI generate web content with citations, then running that content through the validation script. When confronted with the results, the AI fixed its mistakes - a process that dramatically improved citation health scores. This experimental approach differs from my typical writing process, where I use generative AI as an assistant to help source statistics or recall paper citations, occasionally asking it to complete attribution details for citations I’ve lazily marked.
Technical Implementation Challenges
Tooltip Positioning and Performance
The biggest UX challenge was creating responsive, accessible tooltips that work across devices. Initial attempts using CSS-only solutions failed on mobile devices and created accessibility issues.
The solution involved a global tooltip system that:
- Uses fixed positioning to avoid layout conflicts
- Calculates viewport boundaries to prevent off-screen rendering
- Implements touch-friendly delays for mobile users
- Provides keyboard navigation support
// Global tooltip positioning logic
const rect = element.getBoundingClientRect();
const tooltipWidth = 320;
// Calculate horizontal position
let left = rect.left + rect.width / 2;
// Adjust if tooltip would go off-screen
if (left + tooltipWidth / 2 > window.innerWidth - 20) {
left = window.innerWidth - tooltipWidth / 2 - 20;
}
if (left - tooltipWidth / 2 < 20) {
left = tooltipWidth / 2 + 20;
}
globalTooltip.style.left = left + 'px';
Citation Deduplication and State Management
Managing citation state across multiple components required careful architecture. The same citation might appear multiple times in an article, but should only appear once in the reference list.
I solved this using a Map-based deduplication system that:
- Collects all citations on page load
- Deduplicates by citation number
- Maintains consistent numbering across inline and list displays
- Updates dynamically if citations are added via JavaScript
URL Validation Without False Positives
The validation script needed to distinguish between legitimately inaccessible sources (broken links) and temporary network issues or rate limiting.
The solution implements:
- Timeout handling to avoid hanging on slow responses
- Status code interpretation distinguishing between 404s and 503s
- HEAD requests to minimize bandwidth while checking accessibility
- Error categorization to help authors understand what needs fixing
Insight
Validation Philosophy: The script flags potential issues but doesn’t automatically dismiss citations. Some historical sources may be archived or require institutional access, so human judgment remains essential.
Real-World Usage and Iteration
Deploying the citation system across the GEO series revealed several practical considerations:
Citation Type Evolution
Initially, I only supported generic citations. But as I wrote about different types of claims, clear categories emerged:
- Statistics needed special highlighting due to their quantitative nature
- Quotes required different formatting to show direct attribution
- Projections needed visual distinction from established facts
Academic Format vs. Web Readability
Traditional academic citations are optimized for print and formal review. Web citations need to balance completeness with scannability.
My format prioritizes:
- Author and year for immediate credibility assessment
- Title for topical relevance
- Source for institutional authority
- URL for immediate access
// Citation formatting hierarchy
if (citation.author && citation.year) {
citationText += `${citation.author} (${citation.year}). `;
} else if (citation.author) {
citationText += `${citation.author}. `;
} else if (citation.source) {
citationText += `${citation.source}. `;
}
if (citation.title) {
citationText += `<em>${citation.title}</em>. `;
}
Accessibility Improvements
Screen reader testing revealed several areas for enhancement:
- Citation markers needed proper ARIA labels
- Tooltip content required semantic structure
- Keyboard navigation needed explicit support
- Color-based type distinctions needed non-visual alternatives
Performance and Development Workflow
The citation system integrates seamlessly with my writing workflow:
- Writing: Use
CitedText
components inline while drafting - Validation: Run
node scripts/validate-article.cjs path/to/article.mdx
- Review: Check citation health score and fix broken links
- Deploy: Citation list auto-generates on build
The validation script has become an essential part of my quality assurance process, catching broken links and missing citations before publication.
Success
Workflow Integration: The citation system adds minimal overhead to the writing process while dramatically improving content credibility and reader experience.
Future Enhancements
Several improvements are on the roadmap:
Advanced Analytics
- Track which citations get the most engagement
- Monitor broken link frequency over time
- Analyze citation type effectiveness
Integration with Reference Managers
- Import from Zotero/Mendeley APIs
- Export citations to standard formats (BibTeX, APA, etc.)
- Bulk citation validation across all articles
AI-Assisted Citation Checking
- Automatically verify citation accuracy against source content
- Suggest related sources for incomplete citations
- Flag potential predatory or unreliable sources
Lessons Learned: Beyond Technical Implementation
Building this citation system taught me that academic rigor and modern web UX aren’t opposing forces – they’re complementary when implemented thoughtfully.
For Content Creators:
- Citations enhance rather than burden the reading experience when properly designed
- Validation tools catch issues humans miss during editing
- Structured citation data benefits both human readers and AI systems
For Developers:
- Academic publishing has UX problems that web technologies can solve
- Accessibility considerations are crucial for scholarly content
- Performance matters even for “serious” academic features
For the AI Age:
- Proper citations become competitive advantages as AI systems prioritize credible sources
- Citation integrity affects not just individual articles but entire content ecosystems
- The tools we build shape the information landscape AI learns from
The citation system has transformed how I approach content creation, making rigorous attribution feel natural rather than burdensome. Most importantly, it demonstrates that we can optimize content for AI discovery while maintaining – and even enhancing – academic integrity.
As AI systems become primary information intermediaries, the stakes for citation quality continue to rise. The tools and practices we establish now will shape how credible information flows through the AI-mediated web.